November 13, 2005

Dabbling with the 2.6 kernel

... or rather, my floundering with the 2.6 kernel.

Anyway, it has finally come to that. My Linux enthusiasm has grown steadily over the past two years, since I moved all my software development to Linux after flirting with it for over 10 years. What finally convinced me about Linux was that in the first half of this year I had to reboot my dev machine only twice, both times because of a power failure. But even in Linux land, it's not always happy happy joy joy.

I recently bought a dual-Opteron machine built on a Tyan S2892 board with ECC DRAM and all. It came with 64-bit Red Hat Enterprise Linux AS 3 (a 2.4.x kernel) with SATA support pre-installed, and it ran okay for a while, but there were a few things I was missing. Support for the on-board ATI graphics chipset was sad, to say the least, and NUMA (Non-Uniform Memory Access) support was only introduced in later kernel versions.

Well then, the road was open to choose any distro with a more up-to-date kernel, but I knew I didn't want to run unproven, unstable versions. After taking a look at SUSE (I was uneasy about Novell's flaky commitment to continuing it) and at both Debian and Ubuntu (a very nice distro, btw), I decided on Fedora. Red Hat was already a proven and familiar distro to me, and the Fedora community is big. Fedora Core 4 had been released recently, so I figured, why not use the latest stable version?

I wanted RAID 1 (mirroring), but software RAID was acceptable. The Tyan S2892 comes with nvRaid, but as I found out, it's really just plain software RAID, and Nvidia doesn't support nvRaid on Linux at all. So while support for BIOS-assisted RAID on Linux exists in the form of dmraid, why would I use it instead of the standard, proven software RAID that the Linux kernel provides?
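For anyone going the same software RAID 1 route, here is a minimal Python sketch of checking whether a mirror has lost a disk. It assumes the 2.6-era `/proc/mdstat` layout shown in the hypothetical `SAMPLE` text: a header line such as `md0 : active raid1 ...` followed by a line ending in something like `[2/2] [UU]`, where the numbers are total/working members and an underscore marks a missing one. The function name `parse_mdstat` and the device names are illustrative only, not part of any tool.

```python
import re

# Header line of an array, e.g. "md0 : active raid1 sdb1[1] sda1[0]".
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(active|inactive)\b(.*)$")
# RAID personality somewhere in the rest of the header line.
LEVEL_RE = re.compile(r"\b(raid\d+|linear|multipath)\b")
# Member status on the following line, e.g. "[2/1] [U_]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

# Hypothetical /proc/mdstat contents: md1 has lost one of its two mirrors.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      78043648 blocks [2/1] [U_]

unused devices: <none>
"""


def parse_mdstat(text):
    """Map each md array to its state, RAID level and mirror health."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            name, state, rest = header.groups()
            level = LEVEL_RE.search(rest)
            arrays[name] = {"state": state, "level": level.group(1) if level else None}
            current = name
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            total, working, flags = status.groups()
            arrays[current].update(
                devices=int(total),
                working=int(working),
                degraded="_" in flags,  # "_" marks a missing or failed member
            )
            current = None  # only the first status line belongs to this array
    return arrays
```

On a real machine you would pass it the contents of `/proc/mdstat` instead of `SAMPLE`; `mdadm --detail /dev/md0` reports the same health information in a more verbose form.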

I installed FC4 with LVM, a very useful technology that lives up to the famous quote popularized by Butler Lampson (who credits it to David Wheeler): "all problems in computer science can be solved by another level of indirection". I only wish there were an easy way to save my custom package selection for later installs (anyone know of one?). Anyway, when I was done tweaking the install options and feeding CDs to the installer, I booted my rebuilt computer for the first time and, of course, ended up with the daunting message "kernel PANIC".
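To make the "level of indirection" concrete: LVM2 splits storage into fixed-size extents (4 MiB by default) and keeps a table mapping each logical extent of a volume to an extent on some physical volume. The toy Python model below is only a sketch of that idea. The `LogicalVolume` class and the `pv0`/`pv1` names are hypothetical, and it ignores LVM's real on-disk metadata. It shows why the indirection helps: relocating an extent, roughly what `pvmove` does, changes only the table, while the addresses the filesystem uses stay the same.

```python
from dataclasses import dataclass, field

EXTENT_SIZE = 4 * 1024 * 1024  # 4 MiB, LVM2's default physical extent size


@dataclass
class LogicalVolume:
    """Toy model: a table from logical extent numbers to (PV, physical extent)."""
    name: str
    extents: dict = field(default_factory=dict)  # logical extent -> (pv, pe)

    def map_extent(self, le, pv, pe):
        """Back logical extent `le` with physical extent `pe` on `pv`."""
        self.extents[le] = (pv, pe)

    def resolve(self, byte_offset):
        """Translate an offset inside the LV into (pv, byte offset on that pv).

        Raises KeyError if the offset falls into an unmapped extent.
        """
        le, within = divmod(byte_offset, EXTENT_SIZE)
        pv, pe = self.extents[le]
        return pv, pe * EXTENT_SIZE + within

    def move_extent(self, le, new_pv, new_pe):
        """Relocate one extent: only the mapping changes, and the logical
        offsets seen by the filesystem stay exactly the same."""
        self.extents[le] = (new_pv, new_pe)
```

For example, after `lv.map_extent(1, "pv0", 11)`, the offset `EXTENT_SIZE + 100` resolves to byte `11 * EXTENT_SIZE + 100` on `pv0`; after `lv.move_extent(1, "pv1", 0)` the same offset resolves to byte `100` on `pv1`, without anything above the volume noticing.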

With Windows, you'd probably be pretty much hosed at this point, but since it's Linux, and everything had seemed to go fine during installation, I figured the problem really must be in the kernel (FC4 currently ships with 2.6.11, though 2.6.13 is available through up2date). And sure enough, the non-SMP kernel seemed to boot just fine. However, there are a number of kernel versions and each compilation takes *time*, so I didn't want to blindly try out different ones until I found one that worked. I had to find the reason for the panic, or at least make a good educated guess about which versions to try. Of course I also tried the newer, unstable ones to see if they fixed the problem, without success; in the end it was a Usenet post about an SMP panic from a fellow countryman that led me onto the right track.

There are some bugs in the NUMA implementation and the default memory model in 2.6.11 that don't seem to show up in 2.6.10. I learned several valuable lessons: even the stable community distros are not that stable (with a commercial distro like AS, Red Hat would undoubtedly have taken a lot of heat for something like this); kernel development and compilation is really not that different from any other software development; and lastly, a Linux kernel panic is not necessarily a big deal. That said, fixing a kernel problem can take a lot of time spent searching and reading posts from the Linux crowd.

Posted by thoughts at November 13, 2005 05:38 PM