Linux distributions





Terminology first: Linux is an operating system kernel. A Linux distribution is a kit consisting of a kernel, a whole bunch of packages -- which between them provide all the stuff you need to make a computer do useful things -- and a mechanism for installing, configuring, and running the whole bunch of packages.

It's actually surprisingly easy to produce a distribution, which is why at last count there were more than a hundred of them, tailored to fit the needs and prejudices of their creators. Only a handful have achieved commercial prominence, but they're all worth a look -- sometimes the small or weird ones can perform tasks that you just can't easily get Red Hat or SuSE to do.

In this article I'll be taking a look at distributions -- not just the mainstream ones that come in a box and turn your PC into a general purpose computer running Linux, but some specialised and oddball ones. I'll also discuss tools used by distributions (for installation and package management), and how you might go about rolling your own if you're that way inclined.

Where they came from

Back in the early 1970s, if you ordered a copy of UNIX from AT&T, they'd send you either a magnetic tape or a disk pack to plonk on your minicomputer. This would contain an image of a bootable filesystem. After installing it on a fixed disk you'd boot up -- and find yourself loading a UNIX kernel and facing a filesystem full of tools and source code.

Linux was originally distributed in a similar manner -- you'd copy a couple of disk images, including a floppy disk with a kernel on it. Then you'd laboriously boot the kernel disk, stick a different root floppy in your machine, mount it, and use the tools to format a hard disk partition. You'd then copy files over by hand, create device nodes in /dev, copy the kernel over (and that was quite a trick), then patch some locations in the kernel image by hand to tell it that it was booting off a different (fixed disk) device. Then reboot and hope everything hung together.

This wasn't exactly a turnkey OS -- even by hacker standards, installing Linux was a bit of an odyssey into the unknown. So in the early 1990s a number of people started work on writing distributions. The early distributions consisted of a set of disks containing tar archives of software (destined for installation on the target hard disk), and a boot disk: the boot disk contained a kernel, a small Linux root partition, and some shell scripts that would prompt the user to format a hard disk, then unpack the software archives onto it. As time went by, the early distributions acquired complex menu-driven systems to select precisely which floppies to install -- then got transferred to CDROM media, which allowed them to mushroom in size.

Two problems arrived at this point. The first was the need for a consistent filesystem. The Linux kernel doesn't care where utilities and files live; it doesn't even require a UNIX-like filesystem to run -- just some simple boot loader that will copy it into memory and start executing it. But other utilities written for UNIX tend to expect to find temporary scratch space in /tmp, common commands in /bin and /usr/bin, device nodes in /dev, configuration files in /etc, and so on. Early in the game it became apparent that a standard filesystem would be necessary in order to permit interoperability: today this has led to the Filesystem Hierarchy Standard draft (FHS), which specifies how a Linux root filesystem should be laid out, and where things should go. (You can find the FHS specifications here.)
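To make the layout problem concrete, here is a minimal Python sketch that checks whether a root filesystem has the handful of directories mentioned above. The directory list comes from this article, not from the full FHS document, and the function name missing_dirs is my own invention for illustration.

```python
import os

# Directories the text says UNIX utilities expect to find. This short list is
# taken from the article, not from the full FHS specification.
EXPECTED_DIRS = ["bin", "dev", "etc", "tmp", "usr/bin"]


def missing_dirs(root):
    """Return the expected directories that do not exist under root."""
    return [d for d in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(root, d))]
```

Calling missing_dirs("/") on a working system should normally return an empty list; pointing it at a half-built root filesystem (say, one you are assembling for a boot floppy) shows what still needs creating.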

The second problem was the dependency problem. Like Windows, MacOS, and other modern operating systems, Linux uses shared libraries -- also known as DLLs (dynamically linked libraries). These are chunks of software -- not whole programs, but libraries that perform special tasks -- which can be loaded by a program that uses them only when they're needed. Because shared libraries present a standard interface to the rest of the software, they save space: a "statically linked" graphical Xclock program, with all the X11 libraries compiled and linked into it, can take over 600Kb of disk space, while a "dynamically linked" one occupies only about 20Kb, plus separate shared libraries that run to another 580Kb or so. If all the other graphical utilities on the system are dynamically linked, and use the same shared libraries, the space required shrinks rapidly. (They're also more memory efficient. Run two dynamically linked programs that use the same library, and the shared code will only be loaded into memory once. Run statically linked programs instead, and they'll have to load the same library code twice.)
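The disk-space argument is simple arithmetic, sketched below in Python. The sizes are the Xclock figures quoted above; the simplifying assumption (mine) is that every program on the system is the same size as Xclock and uses exactly the same set of shared libraries.

```python
# Sizes in Kb, taken from the Xclock example in the text.
STATIC_PROGRAM_KB = 600   # one statically linked program
DYNAMIC_PROGRAM_KB = 20   # one dynamically linked program
SHARED_LIBS_KB = 580      # the shared X11 libraries, stored once


def static_total_kb(n_programs):
    """Disk space when every program carries its own copy of the libraries."""
    return n_programs * STATIC_PROGRAM_KB


def dynamic_total_kb(n_programs):
    """Disk space when all programs share a single copy of the libraries."""
    if n_programs == 0:
        return 0
    return SHARED_LIBS_KB + n_programs * DYNAMIC_PROGRAM_KB


for n in (1, 10, 50):
    print(n, static_total_kb(n), dynamic_total_kb(n))
# prints:
# 1 600 600
# 10 6000 780
# 50 30000 1580
```

With a single program there is no saving at all; with fifty, the dynamically linked set takes roughly a twentieth of the space under these assumptions.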

Shared libraries come at a cost: if you update a shared library that's already used by fifty programs because you're compiling a new program that requires an update, you may need to re-compile the first fifty (especially if the new version library's public interface has changed). Thus, systems based on shared libraries are "brittle" -- you have to keep track of which programs depend on which libraries, and ensure that when you apply an update, all the dependent software is updated at the same time.

To deal with the dependency problem, two main systems have evolved on Linux: Red Hat's RPM (Red Hat Package Manager), and the Debian project's Package system (DPKG). Both of these systems replace the original compressed tar archives with new archive formats. The new archives contain metainformation -- data describing the files they contain, the version numbers of the package, and other packages and version numbers that the current one depends on. When you install an RPM or DPKG, the installation tool checks a database on your system (containing details of all previously installed packages). If it determines that your new upgrade will break a dependency, it warns you -- or even goes and fetches (via the net) the packages necessary to satisfy the dependency. Once no dependency conflicts will occur, the package manager lets you install the package -- then adds its details to the database in turn. Alternatively, you can tell the package manager to remove a package: it will warn you about any problems before it goes ahead and does the job.
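The bookkeeping a package manager does can be sketched as a toy in Python. This is not how RPM or DPKG are implemented: it ignores version constraints, file lists, and fetching over the net, and the class and package names are hypothetical. It only shows the idea of consulting a database of installed packages before allowing an install or a removal.

```python
class DependencyError(Exception):
    """Raised when an install or removal would break a dependency."""


class PackageDB:
    """Toy database of installed packages (illustrative only)."""

    def __init__(self):
        # Maps package name -> (version, list of required package names).
        self.installed = {}

    def install(self, name, version, requires=()):
        """Record a package, refusing if anything it requires is absent."""
        missing = [dep for dep in requires if dep not in self.installed]
        if missing:
            raise DependencyError(f"{name} needs: {', '.join(missing)}")
        self.installed[name] = (version, list(requires))

    def remove(self, name):
        """Delete a package, refusing if another package still requires it."""
        dependents = [pkg for pkg, (_, reqs) in self.installed.items()
                      if name in reqs]
        if dependents:
            raise DependencyError(
                f"{name} is needed by: {', '.join(dependents)}")
        del self.installed[name]


db = PackageDB()
db.install("libX11", "6.1")
db.install("xclock", "1.0", requires=["libX11"])
try:
    db.remove("libX11")
except DependencyError as err:
    print(err)  # prints: libX11 is needed by: xclock
```

A real package manager stores far more metadata per package, but the refuse-or-warn behaviour comes from the same kind of lookup.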

Today, most large commercial distributions (CDROM-sized or bigger) use RPM; the technically superior Debian package system is used by Debian-derived distributions (including Corel Linux, Storm Linux, Progeny, and Libranet).

Rescue kits


Odds are that if you use Linux on your desktop or laptop PC, you've settled on one of the mainstream distributions -- Red Hat, SuSE, Slackware, Debian, or (smaller players, now) Mandrake, Corel, or Caldera. These kits are designed to install on your hard disk, boot from cold, and provide a choice of server and graphical desktop environments.

However, if something goes wrong with your Red Hat system, what are you going to do?

In some cases, you can try to recover by inserting the bootable installation CD, booting, then bypassing the installation process -- the bootable CDs provide a number of virtual terminals (accessible by pressing Alt-F1 through Alt-F6, or Ctrl-Alt-F1 to F6 if you're in a graphical, X-based installer) that variously display logs of the installation process and a root shell.

But these installation CDs aren't ideal. For one thing, they provide a kit that's designed for installing a distribution, not rescuing a damaged system: some essential tools are missing. For another thing, their use as rescue kits tends to be undocumented.

A number of special distributions exist that are designed to help you repair a damaged Linux system. The rescue distros started life as single-floppy toolkits: a floppy disk is created with a boot sector, a Linux kernel, and the remainder of the disk occupied by a compressed RAMdisk image. When the computer boots, it reads the boot sector, which loads the rescue disk kernel, which uncompresses the boot image as it copies it into memory, then mounts it as the root filesystem. A number of single-floppy kits exist; more recently, some of them have overflowed the bounds of a single floppy, resulting in tools such as Linuxcare's Bootable Business Card.

Tom's Root-Boot Kit, also known as tomsrtbt, was created by Thomas Oehser for his own use; you can download it here. (It's a totally open source project; there is no commercial backing behind it.) What you get in the current kit (1.7.358) is "the most GNU/Linux on one floppy disk" -- a single floppy disk image (for a 3.5" floppy, formatted to store 1.7Mb of data) with tools for rescue, recovery, and emergencies -- whenever you can't boot off a hard drive. Along with the tomsrtbt.raw disk image, Tom provides shell scripts to rebuild the disk image from sources, copy it to a floppy, configure various settings, and so on.

Installation is dead easy: just become root, unpack the archive, cd into the archive directory, stuff a blank 3.5" floppy in your drive (which is assumed to be /dev/fd0 by the install.s script -- you can edit this if it isn't), then type: ./install.s. The tomsrtbt.raw image will be blasted onto the floppy, overwriting whatever's already there: afterwards you can boot off the floppy to verify it's working. There's also an El Torito image kit suitable for building a bootable ISO9660 CDROM version of tomsrtbt; this is a great starting point if you want to experiment with writing your own tiny Linux distribution, or if you want to use a business-card CD writing service to make your own equivalent of the Linuxcare BBC kit.
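"Blasting" an image onto a disk just means copying it byte for byte onto the raw device, with no filesystem in between. The Python sketch below shows that step and nothing more. It is not the contents of install.s, which I haven't reproduced here; since the image is meant for a floppy formatted to 1.7Mb, the real script presumably has to deal with formatting as well. The function name write_image is mine.

```python
import shutil


def write_image(image_path, device_path):
    """Copy a raw disk image byte-for-byte onto a device (or a plain file).

    Everything already on the target is overwritten.
    """
    with open(image_path, "rb") as src, open(device_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Run as root, write_image("tomsrtbt.raw", "/dev/fd0") would attempt the raw copy. Double-check the device name first, because the same call pointed at a hard disk will cheerfully destroy it.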

tomsrtbt doesn't give you a graphical environment (what, you expected someone to cram a suite of tools and the X11 windowing system onto a single floppy?) but it gives you a comprehensive suite of command-line tools. The core of these is busybox. Busybox is a Swiss Army Knife utility -- a single program that provides minimalist replacements for most of the core POSIX command-line utilities. For example, if you install busybox and create a link to it called "ls", when you execute it via the ls link it behaves like ls. In addition to busybox, tomsrtbt installs tiny vi and emacs editors, tools for getting files out of RPM and Debian packages, wget for fetching files over the net via ftp and http protocols, a hex editor, the pax archiver (which understands tar and cpio formats), and the ash shell (which is significantly smaller than bash). There are, of course, tools for checking the partition table, scanning filesystems, and lilo (for resurrecting a dodgy boot sector). It even finds room for the core man pages, and for extras such as a DHCP client and PCMCIA drivers in case you need to rescue a problematic laptop.
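The trick of one program behaving differently depending on the name it was run under is easy to sketch. The Python below is an illustration of the idea only, not busybox's actual code (busybox is written in C); the two "applets" are deliberately trivial stand-ins that I made up for the example.

```python
import os


def applet_ls(args):
    """Tiny stand-in for ls: return the sorted names in a directory."""
    path = args[0] if args else "."
    return sorted(os.listdir(path))


def applet_echo(args):
    """Tiny stand-in for echo: return the arguments joined by spaces."""
    return " ".join(args)


# One table maps every supported command name to its implementation.
APPLETS = {"ls": applet_ls, "echo": applet_echo}


def dispatch(argv):
    """Pick an applet from the name the program was invoked under."""
    name = os.path.basename(argv[0])
    if name not in APPLETS:
        raise SystemExit(f"{name}: applet not found")
    return APPLETS[name](argv[1:])


print(dispatch(["/bin/echo", "hello", "world"]))  # prints: hello world
```

In a real multi-call program the argument vector comes from the operating system, so a link named "ls" pointing at the single binary arrives with "ls" as its first element. The saving comes from all the applets sharing one executable and one copy of the common code.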

If you run a desktop Linux system, there's no excuse for not having tomsrtbt or something similar to hand. It gives you all the basics you need to get an ailing system up again -- or at least to connect to internet resources from which you can pull in additional tools. The only shortcoming I can see with it is that it is constrained by how much you can cram into a floppy disk: in this day and age of multi-CD operating systems, something that fits on a single floppy is a bit of a miracle. As it is, you may want to throw away some bits (if, for example, you use one of emacs or vi, why keep both hanging around?), add others such as GNU parted, or the Linux Disk Editor -- if you can find room -- and build your own tailored version of tomsrtbt.

The swank alternative to tomsrtbt is the Linuxcare Bootable Business Card. You can't buy them; Linuxcare is a California-based corporation that provides enterprise support for Linux. However, if you are a member of a local Linux User Group, you can ask Linuxcare to send you a bunch for promotional purposes. A BBC is one of those business-card sized mini-CDROMs, designed to be handed out as promotional items; only instead of a boring PowerPoint presentation or other corporate nonsense, this one contains something useful -- 45Mb of Linux filesystem and tools.

The Linuxcare BBC is actually a minimal command-line Debian distribution, based on the Slink release with a 2.2.14 kernel. As its name implies, the BBC is a bootable CD; you stick it in a CDROM drive and boot (or use the lnx.img file in its root as a boot floppy). The BBC contains a fairly small filesystem, most of which is taken up by a 26Mb file named "singularity" -- this is a compressed Linux filesystem image which can be mounted using the cloop (compressed loopback filesystem) driver. When you execute "singularity" (which happens at start-up if you boot off the BBC) it basically gives you a compressed, read-only filesystem containing about 66Mb of stuff. "Stuff" includes a full core Debian kit, vim and emacs (rather than the cut-down versions in tomsrtbt), tools like the lde disk editor, and a whole mass of network sniffers, diagnostic tools for ethernet cards, hard disk analysis kits, and even a rudimentary SVGA-compatible X11 installation, with the fvwm2 window manager and a basic suite of tools hanging off it. (Note that this may not work with bleeding-edge graphics cards in any resolution worth having -- it's much easier to use the BBC command-line tools on the console, than having an X11 desktop that thinks your screen resolution is 320x200 pixels!) Gcc is provided, so if you really need to you can compile additional software: BBC is a minimal core of a complete Debian system, not a special-purpose rescue disk that has to be built using external tools.

If you can get a machine to boot at all, the BBC will let you crowbar your way into a hard disk with a damaged Linux filesystem; once there you can either use LDE, fsck and the other tools to fix it, or you can reformat and begin the process of installing Debian Linux over a network connection.

If the BBC has a single shortcoming, it's the lack of documentation. This is noted as a weakness in the README, so presumably Linuxcare are working on it. This isn't strictly Linuxcare's fault: some of the tools that they've bundled with this disk are indispensable, but they're written by experts to be used by experts who can presumably figure out how they work by reading the source code. BBC's user interface isn't currently as friendly as a DOS-based release of Norton Utilities from the late 1980s -- there's a way to go before it's usable by non-experts.

Last minute update: the day before this feature was finished, Seth David Schoen of Linuxcare publicly announced a new mini-distribution based on the Linuxcare Bootable Business Card. This distro is called LNX-BBC; its home page is www.lnx-bbc.org, and it's being actively developed independently from Linuxcare. The goals of the LNX-BBC project are about the same as they were under Linuxcare's auspices; the fork was required because several developers left Linuxcare, and in order to keep the project going it was necessary to make it independent of the company. Future planned upgrades include easier package maintenance, kernel 2.4 and XFree86 4.0, a switch to running from a ramfs rather than ramdisk, and possibly installation tools to allow a BBC to be used to start a network install of other major distributions onto a target machine's hard disk.

Firewalls, Routers, Servers

The Linux kernel incorporates a highly configurable firewalling toolkit. It can talk to multiple network cards at the same time, send and receive various protocols via the same NIC at the same time, and filter packets; in fact, Linux kernels are at the heart of some embedded network routers and firewall products. However, knowing that all this is possible and actually doing it are two different matters. While some larger distributions (such as Red Hat) come with simple firewall configuration tools (suitable for home use), taking a Linux kit and configuring it as a general purpose firewall suitable for connecting a large network to the internet is rather difficult.

A core group -- the Linux Router Project -- are working to put together a small distribution which, with a Linux kernel, provides a shrink-wrapped router/firewall; see www.linuxrouter.org for details. The core is a small floppy-based distribution similar to tomsrtbt, but with additional tools that allow you to customise it; the idea is that you build your own single-floppy kit, write configuration details to it, and then put it in the drive of the PC (with twin ethernet cards) that sits between your internal LAN and the outside world. When you boot it, the PC will act as a router and firewall.

In practice, configuring the raw LRP system isn't very simple. However, a couple of small distros based on LRP exist to make the job easier. One of these, by way of example, is Coyote Linux -- see www.coyotelinux.com. Coyote Linux is a single floppy distribution derived from the LRP: the main difference between LRP and Coyote is the way that it is configured and maintained.

Coyote Linux is designed to connect a small LAN to a broadband internet connection that is provided via ethernet using DHCP or PPPoE (such as a cable modem), or a PPP dial-up connection. "The primary focus of the Coyote design is to make it as easy as possible to configure and use," says the documentation. To install Coyote, you download a tar archive containing a version (such as 1.29, the most recent, based on kernel 2.2.19), and unpack it. In the directory, there's a script called makefloppy.sh; this prompts you to insert a floppy disk in your workstation's drive, asks you some questions about the configuration of your proposed firewall's connection (ethernet, PPP over ethernet, or PPP over dialup), then builds a bootable filesystem on the floppy disk. Alternatively you can edit the coyote.conf file yourself before running the script, if you're confident you understand the networking options.

You then need a firewall PC. This can be anything from a 16MHz 386SX upwards -- an ancient 486 with 8Mb of RAM will do fine -- all it needs is a floppy disk drive, two ethernet cards, and a display and keyboard. Boot off the floppy and Coyote Linux presents you with a menu that allows you to specify which ethernet card is connected to your internal network (assigned by default to the private class C network 192.168.0.x, which is reserved for internal use and never routed on the public internet) and which is connected to your external line. Tell the other machines on your network to use the Coyote system as a gateway, and everything should work -- you'll have masquerading (aka connection sharing), so the internal machines can see the outside world. Hosts outside your Coyote system won't be able to see inside your firewall unless you let them; you can forward incoming connections to a designated internal server using autoforwarding.
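If masquerading sounds like magic, the toy model below may help. It is a Python sketch of the bookkeeping only, not of how the Linux kernel or Coyote Linux actually implement it: the class name, the starting port number 40000, and the external address 203.0.113.1 (a documentation-only address) are all assumptions of mine.

```python
import ipaddress

# Default internal network from the text.
INTERNAL_NET = ipaddress.ip_network("192.168.0.0/24")


class Masquerader:
    """Toy connection-sharing table for a two-card gateway."""

    def __init__(self, external_ip, first_port=40000):
        self.external_ip = external_ip
        self.next_port = first_port
        self.outbound = {}  # (internal ip, internal port) -> external port
        self.inbound = {}   # external port -> (internal ip, internal port)

    def translate_out(self, src_ip, src_port):
        """Rewrite an outgoing connection's source to the external address."""
        if ipaddress.ip_address(src_ip) not in INTERNAL_NET:
            raise ValueError(f"{src_ip} is not on the internal network")
        key = (src_ip, src_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.external_ip, self.outbound[key]

    def forward(self, external_port, internal_ip, internal_port):
        """Forward an external port to a designated internal server."""
        self.inbound[external_port] = (internal_ip, internal_port)

    def translate_in(self, dst_port):
        """Return the internal destination, or None to drop the packet."""
        return self.inbound.get(dst_port)


nat = Masquerader("203.0.113.1")
print(nat.translate_out("192.168.0.5", 1025))  # prints: ('203.0.113.1', 40000)
print(nat.translate_in(80))                    # prints: None
nat.forward(80, "192.168.0.10", 80)
print(nat.translate_in(80))                    # prints: ('192.168.0.10', 80)
```

Replies to connections opened from inside find their way back through the table; unsolicited packets from outside match nothing and are dropped, unless you have explicitly forwarded that port to an internal server.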

This won't give you all the flexibility of a high-end router purchased from, say, Cisco -- but it's a very good way to secure your household LAN if you have a cable modem. The main obstacle is your understanding of TCP/IP networking; what you don't know can bite you, and while systems like Coyote Linux go a long way towards making the construction of a router using Linux an automatic process, there's still a bit of work involved.

Back in the real world

These special-purpose distributions are great if you want to resurrect a dead system, build a firewall, or carry around a couple of floppies with text-mode email client and web browser on them to run on any networked machine that comes to hand -- but they're not what most of us are using.

A number of heavyweight distributions -- mostly built by companies -- have formed a market. While these collections of programs are generally available for free download, most users prefer to buy a box containing CDROMs (seven, plus a single-disk DVD copy, for SuSE 7.1; ten for Red Hat 7.1 deluxe; seven for Mandrake 8.0). There isn't a whole lot to choose between any of these three distributions. All of them have a graphical installation manager that attempts to automate the process of installing on new hardware (and succeeds, to some extent). All of them use the RPM package manager to configure the components they install. All of them provide a 2.4-series Linux kernel and a huge slew of tools, including both the KDE 2.1 and GNOME 1.4 desktop environments. All of them include system administration tools (accessed via a graphical front-end) and some degree of support for simple firewalling, configuration of internet connectivity over a broadband (cable modem/ADSL/leased line) connection or modem, and so on. All of them provide sound support, and all of them include KOffice, the GNOME office tools, and Star Office -- along with the Netscape and Mozilla web browsers.

To make matters even harder, each of these distributions comes with two manuals in the box, and each of them allows you to pull in security updates over the internet using a graphical administration tool. With such a wide range of similarities, how can you choose between them?

The answer is: you don't. In general, the mainstream RPM-based Linux distributions are so similar that once you learn the ins and outs of one, there isn't much point in switching to another. There are subtle differences, of course. For example, SuSE is now closely tracking the FHS specification; Red Hat, in contrast, has moved a bunch of files away from the FHS layout, so that (for example) in Red Hat 7.x, the web server's HTML files live under /var/www/html -- while in previous releases they were stored under /home/httpd/html. But aside from such minor irritations, the differences are small compared to the differences between such a distribution and, say, Sun's Solaris 8. (In fact, the differences between Mandrake Linux ProSuite 8.0, SuSE 7.1 Professional, and Red Hat 7.1 deluxe edition are smaller than the differences between Windows 98 and Windows 98 Second Edition.)

Because these distributions are kitchen-sink kits (each containing more than 3.5 Gb of compressed files in over a thousand packages), the real trick is to decide what sub-set of these packages you want. For example, a minimal SuSE installation takes about 120Mb of disk space -- but a full install occupies nearly 5Gb. The distributions come with default system profiles that you can pick; for example, Red Hat lets you choose a standard desktop workstation (without all the server software), a server configuration (without X11 and a graphical interface), a laptop configuration (like the desktop, but with APM and PCMCIA support, and minus some other bits), or a customize-it-yourself option. These are useful starting points, but not entirely adequate (for example, the laptop installation misses out some items I consider essential on a laptop -- such as a heavyweight database engine). Ultimately, you will probably need to follow your nose and see where it leads you -- install a basic kit then spend some time adding packages as you find you need them, until you have some idea what you need.

