April 2001 Column




A brief history of X

The thing about Linux that most puzzles new users can be summed up in a letter: X. X -- or rather, the X Window System, Version 11, Release 6.4 -- is often glibly described as the UNIX graphical user interface. While this statement is not factually incorrect, it is at best misleading. What is X, how did it evolve, and what can you do with it?

Back in the 1960's, the first computers to have graphics systems didn't have an actual GUI in the modern sense; they could display images on a special terminal, or print them using a plotter, but the idea of using images to interact with computers using a desktop metaphor lay in the future. Interactive graphical computing was first demonstrated by Douglas Engelbart around 1968, using the first home-made mouse, and the idea of icons and pointers was worked out during the 1970's, but it wasn't until the early 1980's that computers powerful enough to run a GUI (and cheap enough to be available to non-specialists who needed one) became available.

In the early days of GUIs, there were two models for doing things: the Mac way and the Microsoft way. Apple's Lisa relied on an operating system with a graphical user interface built in at a very low level, and so did most subsequent systems -- the Macintosh (derived from Lisa), Atari's ST, AmigaOS, and so on. Windows NT follows this paradigm. The job of the graphical user interface isn't simply to move pixels around a screen; it's to provide high-level widgets (like buttons and dialog boxes and scroll bars), and policies for managing window placement (how and where new windows open, what happens when you click outside a window, and so on). And the GUI is built into these operating systems at a low level -- it isn't a separate program.

MSDOS was born without a GUI, and so was UNIX. Both were command-line systems, although MSDOS usually ran on computers that had a graphics-capable display. When Digital Research (with GEM) and later Microsoft began looking at a GUI for PCs, their idea was to build a GUI that ran as an application on top of MSDOS. This was the way Windows operated, prior to Windows 95, and the way GEM ran. (Arguably, descendants of Windows 95 still run on top of MSDOS -- it's just that Microsoft has made it harder and harder to get at the underlying DOS.)

Around the time that GEM and Windows were duking it out on the PC desktop, scientists and academics were getting their hands on the first real supercomputers -- big vector processor systems from Amdahl and Cray. These machines were used for cryptanalysis by the NSA and GCHQ, but found a ready market in meteorology, astrophysics, and other scientific fields that needed to do heavy-duty finite element analysis and visualisation of 3D data. With the supercomputers, the scientists discovered a new headache: how do you adequately display the output from a Cray-1, a machine that is basically a large circular sofa stuffed with electronics and cooling pipes and expects you to use a mainframe as a front-end processor to pipe data in and out of it?

The computer science department at MIT came up with a typically ambitious idea: to write a graphics library -- but one that could display the output from a program running on another machine, using the new internet technologies. A relatively cheap workstation could provide a graphical view onto a process running on a supercomputer attached to the academic/scientific network, even if it was sited on another continent.

X10, and the subsequent X11 release of the X graphics system, was not a GUI; it was a client/server graphics library. An X server runs on a computer with a display and a mouse and keyboard (and maybe other input devices); programs called clients connect to the server via TCP/IP, receive events from the server (such as mouse clicks and keypresses), process them, and send a stream of low-level graphics commands back to the X server, which displays them. In the simplest system, the client runs on the same computer as the server -- but the design goal was to allow cheap servers to provide scientists with a view of programs running anywhere on a network which might include file servers, supercomputers, and just about anything else.
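The division of labour described above can be sketched as a toy simulation in Python. This is not the X11 protocol: the network is left out entirely, and the class name ToyServer, the function names, and the "set_pixel" command are all invented for illustration. Only the event name ButtonPress is borrowed from real X11. The point is simply who does what: the server owns the screen and the input devices, while the client owns the program logic.

```python
from collections import deque


class ToyServer:
    """Stands in for an X server: it owns the input devices and the screen."""

    def __init__(self):
        self.events = deque()   # input events waiting to be sent to the client
        self.pixels = {}        # (x, y) -> colour; our pretend display

    def user_clicks(self, x, y):
        # The server notices the mouse click and queues an event for the client.
        self.events.append(("ButtonPress", x, y))

    def draw(self, commands):
        # The server carries out low-level drawing commands sent by the client.
        for op, x, y, colour in commands:
            if op == "set_pixel":          # hypothetical command name
                self.pixels[(x, y)] = colour


def client_handle(event):
    """Stands in for a client: turn one event into a list of drawing commands."""
    kind, x, y = event
    if kind == "ButtonPress":
        return [("set_pixel", x, y, "black")]
    return []


def pump(server):
    """Deliver every queued event to the client and draw whatever it replies."""
    while server.events:
        server.draw(client_handle(server.events.popleft()))
```

In real life the two halves sit at opposite ends of a network connection, and pump() is replaced by each side reading from and writing to a socket.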

The client/server separation is important -- other GUI systems don't make this distinction. While the graphical side of the operating system may be made available via a set of toolbox APIs, it's generally assumed that they are at least running on the same machine as the application that is driving them.

A second important difference between X and any other GUI you've used is that X11 doesn't make any policy decisions about how a program should interact with the user. It was assumed in the early days that different workstations might have different GUIs, so an X application might have to present itself in different ways to users on different machines. X doesn't provide buttons, scroll bars, or menus; all it provides are windows (bitmaps on screen) and ways of moving them around or filling them in with pixels. To get a user interface, you need more than just X11; you need a program called a Window Manager, that provides decorations for windows that let you grab hold of them and send signals to resize or move them, and a whole host of other things besides. It's the job of the Window Manager to provide a policy about how windows are placed on the screen, how you move them around, how cut and paste between windows is implemented, and so on. In fact, the Window Manager provides almost all the elements of a standard GUI -- it just happens to use the X11 graphics system as a back end.

The widgets, incidentally, don't come from nowhere; there are a whole bunch of libraries of tools (from scroll bars and mouse pointers to dialog boxes and multi-doc windows) available to the X11 programmer. They range from the minimal core Xt toolkit (which came out of MIT along with X11) to OSF/Motif, a large commercial toolbox used for workstation development (and now open-sourced in its dotage: see www.openmotif.org). Newer contenders include Troll Tech's Qt widget set and the open source GTk (GIMP Toolkit) backed by the Free Software Foundation.

There are lots and lots of different window managers for X11. Twm, the tab window manager, is about the most primitive still seen; it's useful on systems that don't have much memory or screen real estate. It has largely been superseded by more modern window managers that incorporate a virtual desktop (a miniature window showing a view of a virtual workspace divided into "panes"; click on a pane and the screen shows you the windows that are open in that workspace), provide easily modified menus of commands that pop up when you click on the background (such as fvwm), add a more complex look and feel (from fvwm2 to the horrendously over-the-top Enlightenment window manager), and so on.

Of course, a window manager doesn't have anything to do with a filesystem: how could it? All it knows about is the X11 graphics system, and any client programs that are hooked into the X11 server it's controlling. As X11 is a networked graphics system, it's not immediately obvious how it could interact with a filesystem. If you want to work on files you need to either open a window containing a terminal emulator and type commands on a virtual console (using Xterm, or a similar tool -- like a DOS window under Windows), or use a file manager. The older file managers have largely been superseded these days by full-blown desktop environments. The major commercial desktop, CDE (the Common Desktop Environment), was based on the Motif widget set and is now obsolescent; the GNOME desktop based on GTk has spread to Sun's Solaris OS, while KDE (based on Qt) is everywhere else. GNOME and KDE both include a set of user interface policies, file managers, text editors, mail tools, and all the other gubbins you need to work with a computer without being aware of the icky quicksand waiting below the slick surface to suck you down.

Anyway, next time you sit down in front of the graphical login on your Redhat or SuSE system, what you're actually facing is a tottering stack of software. At the bottom of the heap is a command-line Linux system. This is in turn running the X11 server (from the XFree86 project) which puts a bitmap on your screen and understands your mouse and keyboard. A program called xdm (or Kdm or Gdm) is waiting to log you in; when you enter your password correctly it will run a shell script that spawns a window manager. The window manager will take over the screen and will run a bunch of graphical programs that show you pretty icons and windows, while it passes your mouse movements and keystrokes through to them. The WM, and all the other applications, rely on a whole shedload of shared software libraries to provide their consistent appearance and it is these libraries that talk (via a socket interface) to the display server.

It's a miracle the whole thing hangs together, isn't it?

Of course, this arrangement has drawbacks and benefits. For example, all X11 based systems come with the equivalent of the Windows world's Winframe built in; you can log into a remote UNIX or Linux machine via telnet or ssh on a command line, set the mystical environment variable DISPLAY (to yourhostname:0.0, for example, where yourhostname is your computer's internet address, and 0.0 is the number of the X server (the "display") followed by the number of the screen on it that you're using -- almost always literally 0.0), then type "netscape" and watch it crash. And crash it will, at first, because the network-aware GUI has a security policy to prevent random intruders from talking to your X11 server and, for example, spamming you with adverts or trapping your keystrokes when you log into another machine.
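To make the DISPLAY format concrete, here is a minimal Python sketch that pulls a value such as yourhostname:0.0 apart into host, display (server) number, and screen number. The names Display, parse_display, and tcp_port are invented for this example, and it only handles the plain host:display.screen form discussed here. The port calculation relies on the usual convention that an X server listens on TCP port 6000 plus its display number.

```python
import re
from typing import NamedTuple


class Display(NamedTuple):
    host: str      # empty string means "this machine"
    display: int   # which X server on that host
    screen: int    # which screen belonging to that server


# host:display[.screen]; the host part may be empty, as in ":0"
_DISPLAY_RE = re.compile(r"^(?P<host>[^:]*):(?P<display>\d+)(?:\.(?P<screen>\d+))?$")


def parse_display(value: str) -> Display:
    """Split a DISPLAY string such as 'yourhostname:0.0' into its parts."""
    match = _DISPLAY_RE.match(value)
    if match is None:
        raise ValueError(f"not a host:display.screen string: {value!r}")
    screen = match.group("screen")
    return Display(
        host=match.group("host"),
        display=int(match.group("display")),
        screen=int(screen) if screen is not None else 0,  # screen defaults to 0
    )


def tcp_port(display: Display) -> int:
    """Conventional TCP port for an X server: 6000 + display number."""
    return 6000 + display.display


if __name__ == "__main__":
    d = parse_display("yourhostname:0.0")
    print(d, tcp_port(d))
```

So a client told to use yourhostname:0.0 will try to reach the first X server on yourhostname, which is also a handy thing to know when a firewall sits between the two machines.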

Clue: if you're on host1 and want to run netscape on host2, with it displaying everything on host1's monitor, you must first tell the X11 server on host1 that it's to permit connections from host2. You can do this using the xhost command -- "xhost +host2" tells the local X11 server to add host2 to its permitted contact list. Or you can use the more sophisticated xauth system to keep track of access permissions for a bunch of computers, or maybe set up a Kerberos network so that each computer knows who has permission to do what. But the general effect is the same: if you're sitting in front of host1 and you tell host1 to let host2 run X11 programs that display on host1, you can run netscape on host2 and it will show up on your screen -- even if host2 is in another building.
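The "permitted contact list" idea can be modelled in a few lines of Python. This is a toy, not the X server's real implementation: the class AccessList and its methods are invented for illustration, and it ignores details such as the local machine normally being allowed in by default. The argument forms it accepts ("+host2", "-host2", and a bare "+" or "-" to switch access control off or on) mirror the ones the xhost command takes.

```python
class AccessList:
    """Toy model of an X server's host-based access list."""

    def __init__(self):
        self.enabled = True   # access control switched on
        self.hosts = set()    # hosts allowed to connect

    def xhost(self, arg):
        """Apply one xhost-style argument: '+host', '-host', '+' or '-'."""
        if arg == "+":
            self.enabled = False          # anyone at all may connect
        elif arg == "-":
            self.enabled = True           # back to listed hosts only
        elif arg.startswith("+"):
            self.hosts.add(arg[1:])
        elif arg.startswith("-"):
            self.hosts.discard(arg[1:])
        else:
            self.hosts.add(arg)           # a bare name is treated like '+name'

    def permits(self, host):
        """Would a connection from this host be accepted?"""
        return not self.enabled or host in self.hosts


if __name__ == "__main__":
    acl = AccessList()
    print(acl.permits("host2"))   # before "xhost +host2"
    acl.xhost("+host2")
    print(acl.permits("host2"))   # after
```

The bare "+" case is the one to be wary of: it is the model's way of showing why switching access control off altogether lets in exactly the random intruders the policy exists to keep out.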

There are other problems with X11, of course. First and most perplexing among these is the problem of font handling. X11 was developed in the 1980's, back in the early days of GUIs. A font, as far as X11 is concerned, is simply a bunch of bitmaps that can be displayed on screen. When an X application requests a font to render some text with, X11 searches through a set of directories (defined, if you use XFree86, in the file /etc/X11/XF86Config under FontPath) which contain font files -- these directories contain a fonts.dir and fonts.alias file, which relate the X11 font names to the files containing the fonts.
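As a rough illustration of what a fonts.dir file holds, the following Python sketch reads one into a dictionary. It assumes the usual layout, in which the first line gives the number of entries and each following line pairs a font file name with an X11 font name. The function name parse_fonts_dir and the two-line sample are made up for the example; the sample stands in for a real file from one of your FontPath directories.

```python
SAMPLE_FONTS_DIR = """\
2
courR12.pcf.gz -adobe-courier-medium-r-normal--12-120-75-75-m-70-iso8859-1
helvR12.pcf.gz -adobe-helvetica-medium-r-normal--12-120-75-75-p-67-iso8859-1
"""


def parse_fonts_dir(text):
    """Return {X11 font name: font file name} from the text of a fonts.dir file."""
    lines = [line for line in text.splitlines() if line.strip()]
    count = int(lines[0])            # first line: how many entries follow
    entries = lines[1:]
    if len(entries) != count:
        raise ValueError(f"expected {count} entries, found {len(entries)}")
    mapping = {}
    for line in entries:
        # Split on the first run of whitespace only: font names may contain spaces.
        filename, font_name = line.split(None, 1)
        mapping[font_name.strip()] = filename
    return mapping


if __name__ == "__main__":
    for name, filename in parse_fonts_dir(SAMPLE_FONTS_DIR).items():
        print(filename, "<-", name)
```

This is essentially the lookup the X server performs when an application asks for a font by name: find the name, then open the file it points at.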

Because keeping fonts on a machine with each X server led to problems (excessive amounts of storage space being used, and X applications trying to allocate fonts available on the machine they were running on but unavailable on the X server's system), a new mechanism -- the font server -- was added. You'll find a font server in Redhat and SuSE distributions. A font server is a network daemon that takes a request for a font (in the format used by X11 to name and scale fonts) and returns a bitmap. Font servers get around the problem of applications requesting fonts that aren't available on the X server; they also allow X11 to support scalable vector fonts, such as TrueType and Adobe Type 1 fonts.
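The naming format mentioned above is a long hyphen-separated string with fourteen fields (the X Logical Font Description). Here is a small Python sketch that splits one up; the function names and the dictionary keys are labels chosen for this example rather than anything X11 itself defines. The point size field is given in tenths of a point, which is why 120 means 12 point.

```python
XLFD_FIELDS = (
    "foundry", "family", "weight", "slant", "setwidth", "add_style",
    "pixel_size", "point_size", "resolution_x", "resolution_y",
    "spacing", "average_width", "registry", "encoding",
)


def parse_xlfd(name):
    """Split a full X11 font name into its fourteen named fields (as strings)."""
    if not name.startswith("-"):
        raise ValueError(f"font name should start with '-': {name!r}")
    parts = name[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError(f"expected {len(XLFD_FIELDS)} fields, got {len(parts)}")
    return dict(zip(XLFD_FIELDS, parts))


def point_size(fields):
    """Point size in points; the name stores it in tenths of a point."""
    return int(fields["point_size"]) / 10


if __name__ == "__main__":
    f = parse_xlfd("-adobe-helvetica-medium-r-normal--12-120-75-75-p-67-iso8859-1")
    print(f["family"], f["weight"], point_size(f))
```

A scalable font is typically listed with zeroes in the size fields, and an application asks for a particular size by filling them in; the font server then does the scaling and hands back the bitmap.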

However, X11's built-in font handling (as a raw bitmap) is rather crude -- it can't deal with transparency, anti-aliasing, or weird rotations and distortions. Moreover, font handling under X has nothing to do with printing! X11 applications typically print by generating a postscript file and sending it to a postscript printer, or to Ghostscript (which takes a set of postscript commands and renders them into a bitmap which can be downloaded into some other type of printer, such as an HP Laserjet). However, Ghostscript doesn't use the same types of fonts as X11, nor are the fonts installed and located in the same places.

There are signs that this situation is improving. The traditional UNIX print support system (based on lpd, the line printer daemon) focussed on delivering preformatted files to printers -- it worked, but it didn't have any say in policy (such as telling applications how to generate a printable file, or what paper sizes were available). But some of the widget libraries now used for supporting GUI environments have quite a lot to say on the matter -- GTk and Qt both provide ways of generating postscript that can be fed to an lpd system with MagicFilter or CUPS (both of which provide for postscript files to be formatted via Ghostscript before being delivered to the destination printer). Work is in progress on extending X11R6.4's font model to include specifications for transparency, colour, and various other rendering options. If Ghostscript and X font servers can be made to work together from a common set of fonts, there's some hope that we'll see the beginnings of the sort of no-brainer WYSIWYG support that is common on other operating systems.

Another headache with X11 is sound support. Bluntly, X11 didn't have it -- it was never designed as a desktop graphics system, and nobody saw the need to have sound capabilities when remotely logging into supercomputers. Sound capabilities have been added to the KDE and GNOME desktops by the addition of "sound servers" which, by analogy with a font server, direct sound streams to the machine where the X11 server is running and ensure that they get mixed together and played appropriately. But sound support is actually external to X11.

Which leads back to the problems many beginners have with a Linux desktop system. Their problem is that what looks like an integrated, graphical environment isn't -- it's a bunch of subsystems flying together in loose formation, any one of which can peel off and crash without harming the others (in theory!).

If you're planning on using X11 a lot, it really helps to be familiar with the underlying guts -- not at the level of being able to write an X11 application, but so that you know what configuration files are used, how to edit them by hand, and so on. If you're going to use KDE or GNOME exclusively you probably don't need this, because those desktops come with their own control centres; but these only really let you mess around with the features that the GNOME and KDE teams feel it's appropriate for users to mess with. X11 is supremely configurable, indeed far more configurable than any other graphics system -- as long as you know what you're doing. Most X11 applications have resources (things like colour settings for icons) which are configured from various text files; you can edit these by hand if you want Netscape, for example, not to show you HTML BLINK tags, or those annoying "What's Hot" buttons.
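Resource files are plain text, one "name: value" setting per line, with "!" starting a comment. As a minimal sketch of what such a file contains, the Python below reads the settings into a dictionary. The function name parse_resources and the sample text are invented for the example, and the parser is deliberately simplified: it ignores refinements such as continuation lines and include directives.

```python
SAMPLE_RESOURCES = """\
! Colours for terminal windows
XTerm*background: black
XTerm*foreground: white
"""


def parse_resources(text):
    """Return {resource name: value} from X resource text (simplified)."""
    resources = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue                      # blank line or comment
        name, sep, value = line.partition(":")
        if not sep:
            continue                      # not a 'name: value' line; skipped here
        resources[name.strip()] = value.strip()
    return resources


if __name__ == "__main__":
    for name, value in parse_resources(SAMPLE_RESOURCES).items():
        print(name, "=", value)
```

The part of the name before the "*" picks out which application (or class of applications) the setting applies to, which is how one text file can hold settings for every X11 program you run.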

A really useful source of further information is "The Joy of X", by Niall Mansfield (pub. Addison-Wesley; ISBN 0-201-56512-9), subtitled (a bit less dubiously) "An Overview of the X Window System". This book explains in a clear and concise way just how X works as a client/server graphics system, and more importantly how you administer X, add fonts, make your X server run faster, customize applications, and generally troubleshoot it. The book is pitched at a computer-literate, but not highly technical, user: if you use X11 at all, reading the first few chapters will make things a lot clearer and in particular equip you with enough insight into the way X11 works to understand what's going on around you.

Of course, a book like "The Joy of X" is generic to X11 on all platforms; for Linux in particular, you're almost certainly going to want to know a bit more about XFree86, the open source X11 server that is found on all distributions. The online documentation at XFree86.org is comprehensive and vast, if not particularly accessible -- you'll do best to start with "The Joy of X" then get technical on your particular version of XFree86 (there are a variety of releases out there, and different compiled servers with drivers for different video cards). While utilities such as XF86Setup and Xsuse will correctly set up your video card and monitor to run X11 in about 95% of cases, the remaining 5% take a lot of sweat and brainwork to get right -- and you'll still need to dink with an XF86Config file if you want to add a secondary monitor, or increase your video RAM, or something equally exotic.

In particular, if you're planning to buy a new machine, check out XFree86's web site first. The project carries a comprehensive list of those video cards that they support -- make sure your new machine's video system is on it, or you may be disappointed! While a cheap PCI video card only costs twenty or thirty pounds these days, you'll find your first Linux experiences go a lot more smoothly if you have a supported video card right at the outset.
