Mail transport tools




These days it seems like the world runs on email. The article you're reading -- like most other articles in Shopper -- was filed via email. We organise everything from business plans to pub crawls through this ubiquitous new medium. It's hard to remember that email between computers, once described as the killer application of networking, is only thirty years old.

Linux, as a UNIX-descended operating system, is particularly good at transporting and delivering email. In this article we're going to dig deep into the world of email on UNIX -- how it's transported between machines, how it's delivered on a server, and how users can collect, read and send it.

How we got here

At its simplest, email is the process of sending files from one user's account on a machine to another user's account, either on the same system or on a different one. Each user account has a mailbox -- a collection of mail messages which can be viewed, sorted, or added to. A user can create a new piece of mail and send it to one or more recipients, and the mail delivery software saves a copy in each recipient's mailbox.

The first email systems linked users on a single computer. When you sent a mail message, it would be dumped into a spool directory; periodically, a mail daemon would run and, for each spooled message, identify the recipient, lock their mailbox (to stop other processes writing to it), and append the new message to it. The first email delivery to occur between two machines took place in 1971, when Ray Tomlinson, an engineer at BBN, delivered a mail message across a network link between two DEC PDP-10s. (If you want to know who to blame for the '@' symbol being used to separate the user and host parts of an email address, you need look no further!) Later, computer to computer delivery between UNIX machines was implemented using UUCP, the Unix-to-Unix Copy system; ARPANet (the predecessor to today's internet) relied on FTP over NCP (the TCP/IP networking protocols not yet having been developed). On the UUCP network, messages destined for a user on another machine would be copied to the UUCP spool directory, and whenever the UUCP daemon ran, it would dial up the remote host using a modem and send the files in its queue. Incoming files from UUCP could then be delivered to local users.

In those days, an email address was written as a bang-path: the series of machines to forward the message through to reach the recipient, separated by exclamation marks ("bangs"), with the user's name at the end. For example, if I wanted to send mail to a user called fred, on a machine called "fredbox", I might very well end up addressing mail to "mailhost!switchbox!fredbox!fred", where mailhost and switchbox were both intermediate computers. My own system would send the message to "mailhost", which would strip off its own name and forward the mail to "switchbox", thence to "fredbox" for delivery.
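As a rough illustration of the hop-by-hop forwarding just described, here is a minimal Python sketch. It assumes the conventional bang-path layout in which hosts are listed in order and the user name comes last (host1!host2!user); the function name route and the host name "mybox" are invented for the example, and real UUCP mailers did considerably more than this.

```python
def route(path, this_host):
    """Decide what this_host should do with a bang-path address.

    Returns (next_host, path_to_pass_on), or ("local", user) once the
    message has arrived. Illustrative only: not taken from any real mailer.
    """
    hops = path.split("!")
    # A host strips its own name off the front of the path.
    if hops[0] == this_host:
        hops = hops[1:]
    # Only the user name is left: deliver into a local mailbox.
    if len(hops) == 1:
        return ("local", hops[0])
    # Otherwise hand the message to the next machine named in the path.
    return (hops[0], "!".join(hops))


path = "mailhost!switchbox!fredbox!fred"
print(route(path, "mybox"))      # sending system: pass to mailhost
print(route(path, "mailhost"))   # mailhost strips its name, passes to switchbox
```

Each machine only needs to know how to reach the next name in the path, which is exactly why a change in who talks to whom could make mail bounce.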

By the early 1980s it was apparent that this form of addressing had major problems. Firstly, many more computers were coming on the network: knowing the routes between any two of them was difficult, because if the administrator of an intermediate machine like "mailhost" decided to start talking direct to "fredbox", or stopped talking to "switchbox", mail might bounce. Secondly, if a host was offline for some reason, large amounts of mail would start piling up on those systems that were queuing deliveries for it.

The solution, hammered out in a series of Internet standards documents during the early eighties, was the backbone of our current system -- and that's what we're going to look at next.

How mail transport works today

Behind the scenes of any email system is a piece of software called an MTA, or Mail Transfer Agent (sometimes called a Mail Transport Agent). An MTA doesn't deliver mail to your mailbox; all it does is transfer mail between queues. (A queue is essentially a holding cache into which your own mail tool, or Mail User Agent, drops outgoing messages, or which a local daemon checks for local mail to append to a mailbox.) MTAs read the 'envelope' on a piece of email (usually a separate file indicating where the mail is going) and forward it accordingly -- either to another local queue for delivery, or to an MTA running on a different machine.

Most MTAs talk to each other using a protocol called SMTP, the Simple Mail Transfer Protocol. (A variant, ESMTP, provides extra facilities.) SMTP is set out in a series of public standards documents or RFCs (Requests For Comments), starting with RFC 821; its companion, RFC 822, describes the format of the messages themselves. An MTA trying to deliver a message to a remote system will open a socket connection over the internet, asking for a connection at the other end to the receiving host's SMTP port; it can then introduce itself and transmit one or more messages for the receiving MTA to spool and deliver. SMTP is so central to this system, and so standard, that most mail transport agents are described as SMTP servers -- they're programs that talk SMTP (both to your email client and to other SMTP servers) and that route mail to the appropriate destination.
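To make the "introduce itself and transmit" step more concrete, the following Python sketch lists the lines a sending MTA would transmit during a basic SMTP session. It is a simplification under stated assumptions: it ignores the server's replies entirely, uses plain HELO rather than the ESMTP greeting, and the function name smtp_client_lines and the example addresses are invented for illustration.

```python
def smtp_client_lines(client_host, sender, recipients, message_text):
    """Return the lines a client sends in a minimal SMTP session.

    Note that the sender and recipients here form the envelope; they are
    given separately from whatever headers appear inside message_text.
    """
    lines = ["HELO " + client_host, "MAIL FROM:<%s>" % sender]
    for rcpt in recipients:
        # One RCPT TO per recipient: a single copy can go to many people.
        lines.append("RCPT TO:<%s>" % rcpt)
    lines.append("DATA")
    for line in message_text.splitlines():
        # A line starting with "." gets an extra "." so that it is not
        # mistaken for the end-of-message marker.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")   # a lone dot ends the message
    lines.append("QUIT")
    return lines


for line in smtp_client_lines("mybox.example.org", "charlie@antipope.org",
                              ["fred@example.org"], "Subject: hello\n\nHi Fred"):
    print(line)
```

A real client waits for a numeric reply after each command and gives up or retries later if the server refuses; that bookkeeping is left out here.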

The eight hundred pound gorilla of the email world is sendmail. It was written by Eric Allman while he was a graduate student at the University of California at Berkeley; its ancestor, delivermail, dealt with ARPANET mail sent via FTP/NCP. With the spread of TCP/IP, the specification of SMTP, and the spread of DNS (the Domain Name System) from 1986 onwards, sendmail became the standard UNIX mail transport system.

DNS, the domain name system, maps hostnames and domains to TCP/IP machine addresses. Each domain can have a number of DNS records associated with it; one type in particular, the MX record, is of interest. The MX record for a domain points to the SMTP server for that domain; so if you want to send mail to charlie@antipope.org, your mail server need only ask for the MX record for antipope.org to discover which machine to send mail to. What's more, there can be more than one MX record for a domain -- if the first machine fails to reply, the server can try the next, until it contacts a machine that will accept the mail.
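The fallback behaviour can be sketched in a few lines of Python. This assumes the MX records have already been looked up and are supplied as (preference, hostname) pairs, with lower preference numbers tried first; the function name deliver_via_mx, the try_host callback and the example hostnames are all hypothetical.

```python
def deliver_via_mx(mx_records, try_host):
    """Try each mail exchanger in order of preference.

    mx_records: list of (preference, hostname) pairs, lowest tried first.
    try_host:   a callable that attempts delivery to one host and returns
                True if that host accepted the mail (assumed, not shown).
    Returns the host that accepted the mail, or None if none did.
    """
    for _preference, host in sorted(mx_records):
        if try_host(host):
            return host
    # Nobody answered: a real MTA would leave the message queued and retry.
    return None


records = [(20, "backup.example.org"), (10, "mail.example.org")]
# Pretend the primary server is down and only the backup replies.
print(deliver_via_mx(records, lambda host: host == "backup.example.org"))
```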

Sendmail

Sendmail is a mail transport agent: it runs as a daemon and you don't normally have anything to do with it (as a user). If you're running a mainstream Linux distribution you are almost certainly running sendmail. So, for that matter, is your ISP -- sendmail is capable of handling huge workloads (running into the hundreds of millions of messages per day on a large enterprise server). Sendmail is open source, but commercial support (and enhancements) are available from Sendmail Inc. There's also a Nutshell guide to Sendmail ("Sendmail, 2nd edition", Bryan Costales and Eric Allman, pub. O'Reilly & Associates, Inc., ISBN: 1-56592-222-0). (Note that at 1020 pages this is not for the faint-hearted.)

Sendmail is rule-driven. Although it's most commonly used as an SMTP server, it isn't tied to SMTP; it uses the concept of a mail delivery agent, which is a tool (internal or external) for delivering mail. A local delivery agent like procmail (see below) can be used to drop mail into a local mailbox; or a remote delivery agent such as UUCP or the built-in SMTP delivery agent may be invoked to send mail to a different host. Sendmail's job is to act as a router for mail -- reading its header and working out where each piece is meant to go in accordance with a set of rules compiled into the /etc/sendmail.cf configuration file.

Rules are patterns: each recipient address is matched against a rule, and if it matches the target pattern the address is re-written. (This isn't the address inside the email message that you receive: it's a copy stored in a separate "envelope" for sendmail to mangle). By repeatedly matching address lines against the ruleset until a terminal rule is reached, sendmail works out where to direct the message. (A terminal rule is one that either concludes that the mail is undeliverable -- there isn't enough information to work out what to do next -- or one that triggers a mail delivery agent to take some action, such as dropping the message into a local mailbox.)
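The general idea of rewriting an address until a terminal rule fires can be modelled with a toy Python rule engine. This is emphatically not sendmail's rule syntax or its actual rulesets: the three rules, the agent names "local" and "smtp", and the function name resolve are all invented to show the shape of the loop.

```python
import re

# Each rule: (pattern, rewrite template, delivery agent or None).
# A rule with an agent is terminal; a rule without one just rewrites.
RULES = [
    (r"^(\w+)$", r"\1@localhost", None),       # bare user name: qualify it
    (r"^(\w+)@localhost$", r"\1", "local"),    # terminal: local mailbox
    (r"^(.+)@(.+)$", r"\1@\2", "smtp"),        # terminal: send to another host
]


def resolve(address, rules=RULES, max_passes=20):
    """Rewrite address until a terminal rule selects a delivery agent."""
    for _ in range(max_passes):          # guard against rules that loop forever
        for pattern, template, agent in rules:
            match = re.match(pattern, address)
            if match:
                address = match.expand(template)
                if agent is not None:
                    return (agent, address)
                break                    # rewritten: start again from the top
        else:
            # No rule matched at all: treat the mail as undeliverable.
            return ("error", address)
    return ("error", address)


print(resolve("fred"))
print(resolve("charlie@antipope.org"))
```

The envelope address is the only thing being rewritten here; the headers inside the message are untouched, as the text notes.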

If you examine your sendmail.cf file you will see something that looks like random noise, interspersed with comments. This is no accident: sendmail rules are among the most obscure mini-languages you will ever encounter on a UNIX system. However, in general your sendmail.cf will contain at least four different collections of line noise: options that set up sendmail's behaviour (such as delivery modes, where queue directories are located, timeout periods, and so on), rewriting rules (as described above), definitions of mail delivery agents (such as where to find procmail or a local delivery agent), and macros and class definitions (to make the whole thing more readable).

No sane sendmail guru writes their sendmail.cf file by hand; instead, they use the M4 macro processor. A large collection of macro configuration files comes with sendmail: these provide a library of possible sendmail rules. When rolling a custom sendmail.cf file from scratch, you usually write a text file that specifies which of the feature-specific macro files to include; you then run m4 over the text file and it spits out a properly-prepared sendmail.cf file. On an FHS-compliant Linux distribution, you can find the sendmail support files in /usr/share/sendmail; these directories include macro files for features like enabling RBL (Realtime Blackhole List) support (a system for blocking spam sites), providing a gateway to FIDONet or X.500 mailers, enabling masquerading (so that the internal hostname from which mail is sent on your local network is replaced by a public domain name), and so on.

If you're running a home or small office site, you will almost certainly not need to mess around with sendmail macros. The one exception is that you might want to manually set a couple of the variables in the default /etc/sendmail.cf file. For example, my SuSE 7.0 system's /etc/sendmail.cf contains the following:

-- CUT HERE --

##################
#   local info   #
##################

Cwlocalhost
# file containing names of hosts for which we receive email
Fw-o /etc/mail/sendmail.cw %[^\#]

# my official domain name
# ... define this only if sendmail cannot automatically determine your domain
#Dj$w.Foo.COM

CP.

# "Smart" relay host (may be null)
DS

-- CUT HERE --

If I was using sendmail locally to forward all mail from my system to an ISP, I might replace the DS macro with something like:

DSmail.demon.co.uk

(Forcing sendmail to send all non-local mail out to the smarthost at my ISP, who would then figure out where to send it on).

And again, if I was working behind a firewall configured in such a way that my machine couldn't work out what its domain name is, I'd have to uncomment and modify the Dj macro (which defines the official hostname).

In general, there is one vitally important thing you need to understand if you are going to use sendmail: and that is the significance of spammers.

Spamming, in the context of email, is the practice of sending huge numbers of commercial emails to people who don't want them. From your point of view, as a home or small business user with a modem or ADSL line, it can be poison -- far worse than a simple matter of receiving some annoying junk mail. Spammers seek out email servers because unfortunately they offer a way of making the spammer's own time go further. If your sendmail system is configured to "relay" mail (that is, to accept mail from host A to deliver to host B, where neither A nor B is in your local domain), a spammer can send a single copy of their ad to your server, along with an envelope that tells your server to send it to several thousand recipients. Your system will then spend the next several hours overloaded as it spews rubbish at people who will hate you for it -- your network connection will slow to a crawl, you'll receive hundreds of bounced messages and large numbers of complaints, and some of the recipients, mistaking you for the spammer, will complain to your ISP until they yank your account.

You do not want to run an SMTP server that is willing to act as a relay: there's no reason for anyone to do so these days, as the feature was generally used for providing bridges between networks with incompatible mail transport systems and today it merely makes you a sitting duck for thieving spammers. Luckily versions of sendmail from 8.9 onwards do not act as a relay unless you specifically switch this feature on (and SuSE 7.0 and Red Hat 7 use sendmail 8.10 and 8.11 respectively). If you are using a sendmail version prior to 8.9, you should either upgrade immediately, or look into blocking relaying!
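The relaying test itself is simple to state in code. The Python sketch below accepts a recipient only if either the recipient or the connecting machine belongs to one of your own domains. It is an illustration of the policy, not of how sendmail implements it (a real server would normally check the client's IP address rather than trust a hostname), and the names accept_recipient and in_domains are made up.

```python
def in_domains(host, domains):
    """True if host is one of the domains or a machine inside one of them."""
    host = host.lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def accept_recipient(client_host, recipient, local_domains):
    """Refuse to relay: either end of the transfer must be local."""
    recipient_domain = recipient.rsplit("@", 1)[-1]
    return (in_domains(recipient_domain, local_domains)
            or in_domains(client_host, local_domains))


local = ["antipope.org"]
# Mail coming in for a local user: fine.
print(accept_recipient("dialup.example.net", "charlie@antipope.org", local))
# A stranger asking us to pass mail on to another stranger: refused.
print(accept_recipient("dialup.example.net", "victim@example.com", local))
```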

Other mail servers

Sendmail isn't the only mail server for UNIX and Linux. A number of others exist, although some (such as MMDF) are effectively no longer supported. If you want to provide mail services for a network of Windows clients running Outlook, you may want to look into HP OpenMail. OpenMail isn't open source -- but is notable for supporting MAPI (Microsoft's mail API) fully, along with Lotus cc:Mail, POP3, IMAP4, and X.500; in fact, it's intended to be used as a drop-in replacement for Microsoft Exchange. (There is some question over its long-term future, but HP are committed to producing and supporting another commercial release, and a possible long-term open sourcing of the codebase has been discussed.)

A couple of heavyweight mail servers occupy the same niche as sendmail (that is: open source enterprise-grade servers that ISPs can rely on). For example, Qmail, written by Dan Bernstein, was designed specifically to replace sendmail. It has had far fewer security holes (such as buffer overruns) discovered in it, and is used on some extremely large sites: it handles Hotmail's outgoing mail, and is used by PayPal, Yahoo! Mail, and a host of other well-known large email users. Qmail is particularly good at handling virtual domain administration, and maintaining user-controlled mailing lists via the ezmlm system. If you're interested in Qmail, you may want to read the HOWTO.

There's a host of smaller or more obscure open source mail servers out there. For example, take Courier from Double Precision, Inc. Courier is an integrated mail server that provides ESMTP, IMAP, and POP3 services (ESMTP to talk to other mail servers and receive outgoing mail, IMAP and POP3 to deliver mail to clients -- more on this later). It also provides a webmail interface and manages mailing lists for you. It can be used as a drop-in replacement for most simple qmail or sendmail installations. It also incorporates a fairly powerful mail filtering engine designed to help you reject spam rather than accepting it and filtering it once it's been delivered.

Local delivery and mailbox formats

If you're using a Linux box as your workstation, and running sendmail (or another MTA) locally, chances are that you're using a mail tool (such as Pine, Mutt, Balsa or KMail) that can be configured to read a local mailbox.

The core requirement for a mailbox is that it needs to be able to store multiple messages, and it must be possible for a process (such as a mail delivery agent, like procmail) to lock the file -- otherwise if two messages come in very close together, two different delivery agents might try to write to the file at once, and one of them will overwrite the other copy (leading to lost messages).

That last clause is important. NFS, the network filesystem, is particularly bad at file locking operations -- the problem is that if a filesystem is exported via NFS to two or more client systems, processes running on each machine can try to obtain an exclusive lock on a file (meaning, nobody else is allowed to write to it) and because of caching of NFS filehandles, both processes can succeed. This means that in general it is very hazardous to export mailbox files via NFS. If you've got a single mail server and several users with clients they read mail on, don't be tempted to export /var/spool/mail! Instead, you'll need to install a POP or IMAP server (see below).
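On a local (non-NFS) filesystem, the lock-then-append step that a delivery agent performs can be sketched in Python as follows. This is a minimal illustration using the UNIX-only fcntl module; real delivery agents often use dot-lock files as well, and the function name append_message is invented for the example.

```python
import fcntl


def append_message(mailbox_path, message_text):
    """Append one message to a mailbox file while holding an exclusive lock."""
    with open(mailbox_path, "a") as box:
        # Blocks until no other cooperating process holds the lock.
        fcntl.flock(box, fcntl.LOCK_EX)
        try:
            box.write(message_text)
            box.flush()          # get the data out before releasing the lock
        finally:
            fcntl.flock(box, fcntl.LOCK_UN)
```

Locks of this kind are advisory: they only protect the mailbox if every program that writes to it asks for the lock first.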

Sendmail's default delivery utility, and most other mail servers, use a standard mailbox format called mbox. This is a single file with each message stored as text in it: messages are delimited by a line starting with "From " (then address and timestamp information) to indicate their start. Most mail clients can read mbox format mail folders happily.
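Because the format is so simple, most scripting languages can read it directly. As one illustration, Python's standard mailbox module understands mbox files; the sketch below lists the subject line of every message in one. The function name list_subjects and the example path are assumptions made for the example.

```python
import mailbox


def list_subjects(mbox_path):
    """Return the Subject header of each message in an mbox file."""
    box = mailbox.mbox(mbox_path)
    try:
        # The module splits the file at each line beginning "From ".
        return [message["Subject"] for message in box]
    finally:
        box.close()


# Example use (the path is hypothetical):
# print(list_subjects("/var/spool/mail/charlie"))
```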

The mostly-obsolete MMDF mail server uses a variant on the mbox format; mail messages are surrounded (top and bottom) with lines containing four control-A characters.

There's another common mailbox format out there: MH. The MH mail tools are extremely interesting because their approach to email is radically different from the monolithic mailbox approach adopted by tools like Pine or Mutt (that is: you have a mailbox file, and need a special mail tool to read it). An MH mailbox is a directory. Each message is stored in a separate file, the name of which is the message number. Instead of a monolithic mail reader, MH consists of a collection of small shell commands to do things like list the subjects of mail messages, reply to a message, or send a new message. (However, many MH users these days use Exmh -- a graphical, X-based mail tool that acts as a front-end to these commands and strongly resembles any other mail reader).

The newest mailbox format (pioneered by Qmail) is Maildir. This is similar to the MH system, but each mailbox directory contains three subdirectories -- cur, new, and tmp. Filenames are generated in a manner designed to ensure that they remain unique at all times -- this allows Maildir mailboxes to be exported over NFS or other systems that don't permit file locking.
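The reason no locking is needed becomes clearer with a sketch of a delivery. In the Python example below, a message is written in full under tmp and only then renamed into new, so a reader never sees a half-written file. The filename recipe used here (time, process ID, a per-process counter and the hostname) is one plausible scheme assumed for illustration, and maildir_deliver is an invented name.

```python
import itertools
import os
import socket
import time

_counter = itertools.count()   # distinguishes deliveries within one process


def maildir_deliver(maildir, message_text):
    """Deliver one message into a Maildir without any file locking."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Build a name that no other delivery should ever generate.
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    unique = "%d.P%dQ%d.%s" % (time.time(), os.getpid(), next(_counter), host)
    tmp_path = os.path.join(maildir, "tmp", unique)
    # Mode "x" refuses to overwrite an existing file.
    with open(tmp_path, "x") as handle:
        handle.write(message_text)
    # The rename makes the complete message appear in new/ in one step.
    new_path = os.path.join(maildir, "new", unique)
    os.rename(tmp_path, new_path)
    return new_path
```

A mail reader later moves messages it has seen from new into cur.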

In general, you won't need to worry about the mailbox format unless you want to NFS-export your mail spool directories or run a POP3 server: most modern mail tools (in particular, Mutt) can cope with any of these formats. It is possible to convert MH mailboxes to MMDF or mbox format without too much difficulty. However, it's worth remembering that the mailbox format depends on the tool used for local delivery. If you're using a stock Linux distribution, the commonest such tool is procmail, and it writes to mbox files (or maildir and MH mailspools, in recent versions).

Procmail is an automatic mail processor, invoked by sendmail. Its job is to write a mail message into a mailbox, using file locking. In addition, procmail can interpret an (obscure) macro language to scan messages for patterns and save them to different locations. In general, if you have a file called .procmailrc in your home directory (and procmail is being used to deliver mail), procmail will read it and scan each piece of mail for matching patterns. For example, my .procmailrc file contains the following:

-- CUT HERE --

:0:
* ^TObugtraq
mail/bugtraq

:0:
* ^TOwibble@yahoogroups.com
|/usr/local/bin/egroups.pl /var/spool/mail/charlie

-- CUT HERE --

The line beginning :0: starts a recipe; the trailing colon tells procmail to apply file locking using the default lockfile name (i.e. nothing is specified after the second colon). The condition line matches all messages in which the macro ^TO (meaning: To:, Cc: and similar header lines) is followed by the string "bugtraq" (the Bugtraq mailing list), and the final line delivers these messages into the mailbox mail/bugtraq.

The second directive tells procmail to look for messages sent to wibble@yahoogroups.com, and pipe them through a script called /usr/local/bin/egroups.pl (with the parameter /var/spool/mail/charlie -- this script strips out the annoying adverts that eGroups/YahooGroups appends to their messages).
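For readers who find the recipe syntax opaque, the logic of the first recipe can be approximated in Python. The sketch below looks only at the To: and Cc: headers (procmail's ^TO macro covers more than that), and the default mailbox path, the RECIPES table and the function name choose_mailbox are assumptions made for illustration rather than anything procmail itself provides.

```python
import re
from email import message_from_string

# (pattern to look for in the destination headers, mailbox to file into)
RECIPES = [
    (re.compile(r"bugtraq", re.IGNORECASE), "mail/bugtraq"),
]
DEFAULT_MAILBOX = "/var/spool/mail/charlie"   # assumed default inbox


def choose_mailbox(raw_message, recipes=RECIPES, default=DEFAULT_MAILBOX):
    """Pick a mailbox for a message by matching its destination headers."""
    message = message_from_string(raw_message)
    destinations = []
    for header in ("To", "Cc"):
        destinations.extend(message.get_all(header, []))
    joined = ", ".join(destinations)
    for pattern, mailbox_name in recipes:
        if pattern.search(joined):
            return mailbox_name      # first matching recipe wins
    return default


sample = "From: a@example.org\nTo: bugtraq@lists.example\nSubject: hi\n\nbody\n"
print(choose_mailbox(sample))
```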

You can find a reference and examples of the procmail macro language in the man pages procmailrc and procmailex; there's more information at procmail.org. Like sendmail, it's annoyingly obscure, but if you receive lots of mail from different lists it's a life-saver. (Unless your mail client supports its own filtering mechanism.)

POP and IMAP servers

So far, we've looked briefly at mail delivery on Linux machines, and methods for filtering mail into different folders. But very often, we run into a situation where a UNIX or Linux box is used as a mail server for a collection of client machines running other operating systems. In this case, we usually save the incoming mail into mailboxes (using procmail), then run a POP3 or IMAP server to serve the mail up to the clients.

POP3 (Post Office Protocol) is a simple protocol that allows mail clients to retrieve the contents of a single mailbox. The client system connects to a machine running a POP3 server, and can obtain a list of messages in the mailbox; it can then download messages and delete them from the mailbox on the server. POP3 is used by most ISPs as a way of delivering mail to an individual named user.
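The download-and-delete cycle maps directly onto Python's standard poplib module. In the sketch below the function takes an already logged-in connection object, so that the logic is separate from the login details; the function name download_and_delete is invented, and the hostname and credentials in the usage comment are placeholders.

```python
def download_and_delete(conn):
    """Fetch every message from a POP3 mailbox, deleting each from the server.

    conn is expected to behave like a logged-in poplib.POP3 object.
    """
    count, _total_size = conn.stat()          # how many messages are waiting
    messages = []
    for number in range(1, count + 1):        # POP3 numbers messages from 1
        _response, lines, _octets = conn.retr(number)
        messages.append(b"\r\n".join(lines))
        conn.dele(number)                     # mark for deletion on the server
    conn.quit()                               # deletions take effect at QUIT
    return messages


# Typical use (host, user name and password are placeholders):
# import poplib
# conn = poplib.POP3("pop.example.org")
# conn.user("charlie")
# conn.pass_("secret")
# mail = download_and_delete(conn)
```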

(In contrast, SMTP -- the protocol mail servers use to talk to each other, and which most mail clients use to send new mail out via a server -- is designed for handling large volumes of mail for multiple users.)

IMAP, the Internet Message Access Protocol (now at revision 4), allows users to maintain hierarchical collections of folders on a server. A POP3 server simply acts as a post office box for mail, which is downloaded to a client and dealt with there. IMAP, in contrast, is a true client/server system: the mail stays on the server, and the IMAP client allows the user to tell the server to create or delete folders, move messages around between them, and file copies of new messages (the sending itself is still done via SMTP).

(A detailed comparison of the pros and cons of POP3 and IMAP4 is beyond the scope of this article; however, here's a paper comparing IMAP with POP. Which protocol you choose to support depends to some extent on the mail client software your organisation mandates, as well as the nature of the organisation. IMAP in general provides more flexibility, but requires the mail server to store all the user mailboxes: POP3 assumes that mail is downloaded to the user's workstations.)

A variety of POP3 and IMAP servers are available for Linux. One of the more interesting is Qpopper from Qualcomm: originally an open source product, it is now available both in an open source version and in a commercial version that supports TLS or SSL encryption (see the Qpopper website for details). Courier (mentioned earlier) also includes IMAP and POP3 servers, and the CMU Cyrus IMAP server is also available as open source: it's scalable and can also act as a POP3 and KPOP server.

Getting at mail

Linux has a huge range of email clients, from the bare-bones mailx console tool, through to the relatively sophisticated mutt mailreader (possibly the best console-based reader: see www.mutt.org). On the graphical side, a variety of powerful mail tools rival anything available on Windows or MacOS. If you're running KDE 2.1, KMail is extremely adequate (and provides powerful filtering tools), while GNOME comes with the excellent Balsa client. And there's always Netscape ...

Almost all of these mail tools can access mail from a local spool directory containing mbox files. Most of them can also fetch mail via POP3 or IMAP. If you have difficulty getting your choice of mail tool to retrieve mail -- say, it can only cope with mbox files or POP3, and you want to use an IMAP server -- you will want to investigate fetchmail. Fetchmail is a mail retrieval and forwarding utility that can grab mail from remote servers and forward it to your local client system. It can retrieve mail using POP, IMAP, and SMTP protocols (and some extensions thereto), and can be configured to merge mail from multiple accounts -- for example, by fetching your mail from two or more ISPs and combining them into a single mailbox.

