Moving the Mail




In Shopper 160 we looked at mail servers on Linux, their history, and the range of programs available. But MTAs (or Mail Transport Agents, as they're more correctly known) aren't the only interesting tools you need to know about if you're going to use a UNIX-based mail service. In this feature we're going to look at a range of gadgets for doing things with mail when it arrives -- from managing mailing lists to delivering to different drop-boxes, filtering out spam, automatically replying to mail when you're on vacation, and finally reading the mail. The emphasis here is on mail client tools, not servers: Linux has a bewildering array, and we're going to go on a guided tour of them.

Note that this is not an article about mail user agents (MUAs) that you use to read the mail. It's about things you can plug in between a mail server and your mailbox, to do filtering and redirection and spam-blocking and auto-reply and mailing list management and other whizzy things.

First, some terminology. A Mail Transport Agent (MTA) is a program that runs continuously in the background (a daemon, or server) and transports mail from one place to another. MTAs almost always consist of a spooler (that stashes pending mail messages in a queue) and a server that checks the pending messages, works out where to send them, transmits them, and answers incoming connections. MTAs talk to each other over the network using a protocol called SMTP (Simple Mail Transfer Protocol), which acts as an envelope for messages -- it lets them tell each other who a piece of mail is addressed to, and confirm when it's been transmitted in its entirety. The commonest MTA is probably the venerable Sendmail, but there are a host of others, including Postfix, Exim, Qmail, and Microsoft Exchange (not available on Linux).
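To make the envelope idea concrete, here is a minimal sketch in Python of the commands a sending MTA issues during one SMTP session. The function name smtp_client_commands and the example.org addresses are invented for illustration, and the server's replies, error handling and dot-stuffing are all left out.

```python
def smtp_client_commands(helo_host, sender, recipients, message_text):
    """Return the client side of a bare-bones SMTP session as a list of lines.

    Illustration only: server replies, error handling and dot-stuffing
    of the message body are left out.
    """
    commands = [f"HELO {helo_host}", f"MAIL FROM:<{sender}>"]
    # One RCPT TO per envelope recipient; these need not match the To: header.
    commands += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    commands.append("DATA")
    commands += message_text.splitlines()
    commands.append(".")      # a lone dot marks the end of the message
    commands.append("QUIT")
    return commands


if __name__ == "__main__":
    msg = "From: charlie@example.org\nTo: writers@example.org\nSubject: hello\n\nHi all."
    for line in smtp_client_commands("mail.example.org", "charlie@example.org",
                                     ["karen@example.org", "fred@example.org"], msg):
        print(line)
```

Note that the recipients named in the RCPT TO lines are the envelope; the To: line inside the message is just part of the letter, which is how a mailing list can deliver one message to many people.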

When mail arrives at a destination MTA, it is queued up for local delivery -- and the MTA usually does this by running a second program called an MDA, or Mail Delivery Agent. The MDA doesn't understand network protocols or message routing, but it's very good at delivering mail into local mail folders. If the MTA is the Post Office's sorting office, the MDA is the postman who stuffs your mail through the letterbox.

Unlike a human postman, MDAs can do various things for you other than delivering your mail. For example, Procmail is an MDA that can be used to filter messages, redirect them, automatically send responses on your behalf, and a host of other things.

MTAs don't have to deliver mail straight to your mailbox by way of an MDA. One of the tasks of an MTA is re-writing addresses on envelopes, and feeding messages to different queues for delivery depending on where they're addressed. They can be configured to hand mail straight over to a mailing list manager (MLM), a program that maintains a database of subscribers and redirects mail to the list to everybody who wants to receive it. There are a number of MLMs, the commonest currently being Majordomo, but with Mailman rapidly catching up. Yahoo! Groups appears to be based on Majordomo: indeed, you can by-pass Yahoo's annoying, intrusive (and illegal -- under the Data Protection Act (1998)) registration interface to talk direct to the underlying MLM if you know what you're doing.

A major problem these days is spam -- unsolicited junk email sent without regard for the recipient's wishes. Mail you receive from a list you subscribed to isn't spam, nor is mail from a company with whom you registered a desire to hear about new products: but spam is on the increase, and it has been estimated that as much as 30% of email received by businesses from the outside world consists of spam (and self-propagating Microsoft Outlook viruses, which are functionally equivalent to spam -- the recipients didn't ask for them and don't want them). Many tools exist for filtering out spam, including some increasingly slick commercial systems (such as SpamCop.net's subscription-based product) and free ones -- Vipul's Razor, and SpamAssassin, which runs as a wrapper around Razor.

Mail delivered into a mailbox on a Linux system can be read in two ways. Firstly, you can log into the Linux system and use a mail reader on it: either a text-mode reader (such as the primitive mailx, the friendly PINE or Elm, or the powerful Mutt) or a graphical mail client, such as KMail, Balsa, Exmh, or Netscape Communicator. Alternatively, you can run a POP3 or IMAP4 server on the Linux system, which allows you to access your mail remotely from a Windows, Macintosh or other Linux client machine.

(As an aside, if you have mail accounts elsewhere that let you collect the post via POP3 or IMAP4, you can use the fetchmail tool to grab the mail and feed it straight to your MTA, re-addressed for delivery to your Linux mailbox.)

In summary, there's a vast range of things you can do with email on Linux. Linux, and UNIX, are email plumbing toolkits. Please don't take offense if I've missed your favourite utility out -- there are too many to discuss in one article.

Mailing lists

If you want to hold a group discussion via email, typing six or seven addresses rapidly becomes tiresome. In your own mail tool you can usually define an alias (that, when typed, expands to several recipients' addresses), but how do you know everyone else has gotten the list right? The answer is to set up a single email account that, upon receiving a message, re-addresses it and sends it to everyone on a list.

The simplest way to do this on a UNIX or Linux system running sendmail is through the /etc/aliases file -- a list of mappings that match email recipient addresses to user accounts. For example, we can add a line that says:

 articles:   charlie, karen, fred@somesite.org

And, after we've run the newaliases program to update the aliases database, mail sent to 'articles' will get forwarded to the three designated recipients.
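What the MTA does with such a file can be sketched in a few lines of Python. This is a toy model under stated assumptions, not sendmail's actual implementation: parse_aliases and expand are names I've made up, and the sketch ignores pipes to programs, :include: files and quoting, all of which real aliases files support.

```python
def parse_aliases(text):
    """Parse 'name: target, target' lines into a dict; '#' starts a comment line."""
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, targets = line.partition(":")
        aliases[name.strip().lower()] = [t.strip() for t in targets.split(",") if t.strip()]
    return aliases


def expand(address, aliases, seen=None):
    """Recursively expand an address; anything that is not an alias is final."""
    seen = set() if seen is None else seen
    key = address.lower()
    if key in seen:            # guard against alias loops and duplicates
        return []
    seen.add(key)
    if key not in aliases:
        return [address]
    result = []
    for target in aliases[key]:
        result.extend(expand(target, aliases, seen))
    return result


if __name__ == "__main__":
    table = parse_aliases("articles: charlie, karen, fred@somesite.org\n")
    print(expand("articles", table))
```

The recursion matters because an alias may point at another alias; every change of membership still means editing the aliases file by hand.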

This is not, however, ideal -- if fred@somesite.org goes to work for newcompany.com, someone has to log in as root and manually edit the aliases file. And we really don't want to be using this technique for a list with thousands of subscribers, where 20-50% of the membership will change in any given year.

The solution to the problem is to run an automated mailing list package. This consists of two parts: a specialised mail client that runs automatically, without human intervention, to read and send mail messages, and a management interface. The client side of the mailing list software reads a database of subscribers. Whenever it receives a message directed to the mailing list address, it resends the message to everyone in its subscriber database (optionally adding text such as a footer). It may do other tasks as well: for example, mailing lists can be configured to collect messages for a set period of time (a day, a week, until the volume of collected messages exceeds 64KB of text, and so on) and send them out in batched digest form. They can also be configured to reject mail containing attachments or HTML (unreadable by many users, because HTML is not an email standard, whatever AOL and Microsoft might believe). The critical point to note is that the software is designed for unattended operation; however, lists may be moderated, in which case the moderator has to read each message and explicitly send it on to the list before it will be allowed through. (This is usually used for formal company announcements, learned symposia, and lists where the owner has a lot of time on their hands and is sick and tired of flame wars.)

The mailing list management software is used by the list manager, or by the subscribers, to manage their subscriptions. By ancient convention, if you see a mailing list called "wibble@myhost.org", the associated management software will receive and obey commands mailed to "wibble-request@myhost.org", and the list manager will receive mail for "wibble-admin@myhost.org". A newer variation, used by the Majordomo and Mailman list management systems (discussed here) is to have an account for the mail robot: "majordomo@myhost.org" or "mailman@myhost.org", that can accept commands destined for any of the mailing lists based at myhost.org. (Both Majordomo and Mailman can manage multiple mailing lists and multiple virtual domains.)

If you want to subscribe to a list, you would send an email message to the server "wibble-request@myhost.org" containing the text message "subscribe"; alternatively, you'd mail "majordomo@myhost.org" with the message "subscribe wibble". To unsubscribe from a list, you send a message saying "unsubscribe" or "unsubscribe wibble" to the mailing list robot, not to the mailing list itself (where it will not be read by the list robot but will annoy everyone else).

Incidentally, this command-in-email management system -- which pre-dates the web -- is still used as a back door by Yahoo! Groups; you can subscribe to a Yahoo! group without disclosing personal information by mailing the -request alias at groups.yahoo.com.
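The command-by-mail convention described above is simple enough to sketch. The following Python is a toy parser, assuming only the two addressing styles mentioned in the text (a per-list -request address, or a shared robot address); parse_list_command is a made-up name, and real list managers understand many more commands than the three handled here.

```python
def parse_list_command(recipient, body):
    """Work out (action, list_name) from a command mail, or None if not understood.

    Assumes two addressing styles from the text: '<list>-request@host' with a
    bare command, or a shared robot address with '<command> <list>'.
    """
    local_part = recipient.split("@", 1)[0].lower()
    words = body.strip().split()
    if not words:
        return None
    action = words[0].lower()
    if action not in ("subscribe", "unsubscribe", "help"):
        return None
    if local_part.endswith("-request"):
        # The list name is implied by the address the command was sent to.
        return action, local_part[: -len("-request")]
    if action == "help":
        return action, None
    if len(words) >= 2:
        return action, words[1].lower()
    return None


if __name__ == "__main__":
    print(parse_list_command("wibble-request@myhost.org", "subscribe"))
    print(parse_list_command("majordomo@myhost.org", "unsubscribe wibble"))
```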

Mailing list robots do more than just let you subscribe and unsubscribe; you can get comprehensive help on various features by sending a message containing the word 'help' to the Majordomo or Mailman server. In addition, Mailman provides a really cool, easy-to-use, web-based interface. When you successfully subscribe to the list, a Mailman server will notify you of your password, which you can then use to edit various settings (such as whether you want to receive daily digests, whether you want your name to be publicly visible on the list of subscribers, whether you want to remain on the list but suspend deliveries temporarily because you're going on vacation, and so on).

One annoying form of email abuse that used to happen from time to time was for a script kiddie somewhere to send lots of 'subscribe' messages with forged mail headers, to add someone to tons and tons of mailing lists they neither knew nor cared about -- thus rendering their email service unusable. These days, mailing list systems by default do not automatically accept a subscribe message. Instead, they first send the subscribed address a piece of mail saying "someone has attempted to add you to this list; if you want to go ahead with this, please reply to this message". The purpose of this preliminary message is to ensure that the recipient of the subscription is willing, and as a result it's largely killed off this form of net abuse.
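One way such a confirmation step could work is sketched below in Python. This is my own illustration, not how Majordomo or Mailman actually implement it: the SECRET value, the token length and the function names are all assumptions. The idea is that the token can only be known by someone who received the mail sent to the address being subscribed.

```python
import hashlib
import hmac

SECRET = b"change-me"   # hypothetical server-side secret


def confirmation_token(list_name, address, secret=SECRET):
    """Derive a token that is mailed to the address being subscribed."""
    data = f"{list_name}:{address.lower()}".encode("utf-8")
    return hmac.new(secret, data, hashlib.sha256).hexdigest()[:16]


def confirm(list_name, address, token, subscribers, secret=SECRET):
    """Add the address only if the reply carries the token we mailed out."""
    expected = confirmation_token(list_name, address, secret)
    if hmac.compare_digest(expected, token):
        subscribers.add(address.lower())
        return True
    return False


if __name__ == "__main__":
    members = set()
    token = confirmation_token("wibble", "fred@somesite.org")
    print(confirm("wibble", "fred@somesite.org", token, members), members)
```

A forged subscribe request never sees the token, so it cannot complete the second step.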

From an administrative point of view, neither Majordomo nor Mailman is for the faint of heart. Majordomo is the older package: it is written in Perl, and designed for easy integration into sendmail by way of the /etc/aliases file. After configuring the software to run on your system, you need to edit /etc/aliases, along these example lines:

# Majordomo core installation -- pipe mail for majordomo to the server

majordomo:		"|/usr/lib/majordomo/wrapper majordomo"
owner-majordomo:	root,
majordomo-owner:	root,

# sample entry for a majordomo mailing-list called "wibble"
#
wibble:			"|/usr/lib/majordomo/wrapper resend -l wibble wibble-outgoing"
wibble-outgoing:		:include:/var/lib/majordomo/lists/wibble
wibble-request:		"|/usr/lib/majordomo/wrapper majordomo -l wibble"
wibble-approval:		owner-wibble,
owner-wibble-outgoing:	charlie,
owner-wibble-request:	charlie,
owner-wibble:		charlie,

Spot the metric ton of aliases needed to get this single list working. The program /usr/lib/majordomo/wrapper is a set-UID wrapper that runs the Majordomo Perl script safely; Majordomo itself reads and parses incoming mail and acts on the basis of any commands in the mail, and the alias the mail was addressed to.

In terms of functionality, Majordomo has lots -- but no easy web-based interface. (On the other hand, if you want to run an FTP-MAIL service, Mailman doesn't have that feature. FTP-MAIL was very useful back in the 1980s, before TCP/IP was universal; you could tell the mail server to retrieve a file for you, and it would do so then mail it to you in UUencoded chunks -- this was before MIME messages! Ah, the nostalgia.) Suffice to say that Majordomo is great for handling lists on a single server (no virtual hostnames), and as long as you don't want a web-based interface. If you want a web-based interface, you can try adding MajorCool on top, but you might be better off with Mailman, which was designed for it from the ground up.

Mailman is a newer and (arguably) better-written mailing list management tool, written in Python. It is slightly harder to install than the old war-horse Majordomo, because for full configuration it requires integration into an Apache web server. Mailman gives each mailing list a unique web page and allows users to subscribe, unsubscribe, and change their account options over the web. Even the list manager can administer their list entirely via the web. Mailman has a wide range of features, including built-in web-accessible archiving of lists, mail-to-news gateways, spam filters, bounce detection (to catch users who move accounts without telling the list manager), digest delivery, and so on.

The one point you need to bear in mind about both these systems is that they're complex tools that require some passing familiarity with a text editor to install, and you need to know your mail transport agent as well in order to get them to work with it. This is not a job for amateurs unless you've got a lot of time on your hands. If you want to run a small mailing list for personal use, you would be well advised to look at a public system such as Yahoo! Groups (formerly eGroups.com) and its mail-mediated, Majordomo-style interface.

Collecting the mail from elsewhere

If you use a mail reader like Netscape you're probably used to having multiple accounts on multiple machines. But how do you do that on Linux, if you want to collect mail from different ISPs? The answer is one word -- fetchmail. Fetchmail is about the ultimate tool for fetching mail from a remote mailbox. It talks POP2, POP3, IMAP, KPOP, APOP, and every other remote mailbox retrieval format in use on the net; it also supports IPSec encryption. It's designed to run in a user's account; you can either edit the .fetchmailrc configuration file using a text editor or, if you're running X11, use the fetchmailconf graphical configuration tool to enter the details of your mail accounts. You then type 'fetchmail'; if you have set a polling interval (with a 'set daemon' line in .fetchmailrc, or the -d option), it runs as a daemon under your user account.

By default fetchmail slurps up the mail from your remote POP3 account and opens up a connection to your local SMTP server. It can map remote user names to local ones, so if you're cstross at work and charlie at home it can tell your copy of sendmail that the mail is to be delivered to charlie. It can funnel multiple remote accounts to the same address. And you can tell it to deliver directly by feeding mail to an MDA like Deliver or Procmail instead of an SMTP server (see "Filtering incoming mail" below). In fact, it's glue for remote mail accounts and local mailboxes. If you want to use it, start by reading the manual (but not until you've read the rest of this article!) -- this should give you a feel for what you can do with it.

Filtering incoming mail

Once the MTA receives mail on your Linux box, it has to deliver it. This job is carried out by an MDA (Mail Delivery Agent). On the standard UNIX principle that anything worth doing once is worth doing twice and then turning into a special-purpose programming language, there are a couple of common MDAs: deliver and procmail.

Deliver (the BSD version is shipped with some Linuxen, and is a clone of the original AT&T deliver) is basically a shell-script wrapper. It reads mail messages which are passed to it by the MTA, scans them, and executes an intricate maze of shell scripts to figure out what mailbox to append them to. The deliver mechanism is arcane and sufficiently complex to conceal a whole family of security holes; for example, the system and user delivery control files (/usr/local/lib/deliver.sys and ~/.deliver) can use the standard hash-bang notation to execute under any shell language you like. Deliver understands a number of action commands, including append to mailbox, pipe to another command, throw an error condition, and resend (via the obsolescent UUCP networking protocol!) to another host.

This all makes deliver horribly powerful, if a bit opaque to beginners. For example, a .deliver file in your home directory containing the following line:

      ( cat $HEADER; tr '[A-Z]' '[a-z]' <$BODY ) | deliver -n "$1"

causes the body of all incoming mail to be translated into lowercase before storing it in your default mailbox.

Deliver is, if anything, too powerful: trivial errors in file ownership over deliver scripts can render your system insecure and have disastrous side effects. So a more modern solution is to use procmail. Procmail (from www.procmail.org, and present in most Linux distributions) is a pattern-matching language designed to control the delivery of email messages in much the way that awk is used as a language for processing text files. While deliver runs external shell scripts, procmail runs 'recipes' it finds in a file in the user's home directory called .procmailrc, or globally in /etc/procmailrc.

A procmail 'recipe' is a declaration of how a particular type of mail message is to be processed. There are delivering recipes -- which send the message to a mailbox and cease processing -- and non-delivering recipes, which do something to a message then continue processing it against the procmailrc script. Each recipe starts with a directive specifying the locking mode and scanning options to employ for the message, then a pattern to match against in the message header or body, and finally an action -- save the message, pipe it to the resend program with some arguments to send it elsewhere, or whatever.

A common procmail recipe looks like this:

  :0:
  * ^From:.*writers@some\.listhost\.org
  mail/writers

This means that a local lockfile is to be used, and the incoming message is to be scanned for a From header containing the address "writers@some.listhost.org"; if this is found, the message is to be saved in the mailbox mail/writers.

   :0
   * ^TOcharlie
   ! charlie-on-the-road@mydomain.com

This is a slightly different recipe: it picks up any mail addressed to "charlie" and forwards it (via the local MTA) to a different address -- charlie-on-the-road@mydomain.com (presumably a mobile account). No lockfile is needed here, because nothing is written to a local mailbox.
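If the recipe syntax looks cryptic, it may help to see the same decision logic spelled out in Python. This is a toy model of "first matching recipe wins", not a reimplementation of procmail: the RECIPES table, the deliver function and the default "inbox" folder are all invented here, and the (To|Cc) pattern is only a rough stand-in for procmail's ^TO macro.

```python
import re

# Each toy recipe: (regex matched against the header block, action, target).
RECIPES = [
    (r"^From:.*writers@some\.listhost\.org", "save", "mail/writers"),
    (r"^(To|Cc):.*charlie", "forward", "charlie-on-the-road@mydomain.com"),
]


def deliver(headers, recipes=RECIPES, default="inbox"):
    """Return (action, target) for the first recipe whose pattern matches."""
    for pattern, action, target in recipes:
        # procmail matches case-insensitively, line by line, by default.
        if re.search(pattern, headers, re.IGNORECASE | re.MULTILINE):
            return action, target
    return "save", default


if __name__ == "__main__":
    print(deliver("From: writers@some.listhost.org\nTo: someone@example.org\n"))
    print(deliver("From: bob@example.org\nTo: charlie@mydomain.com\n"))
```

Order matters: a message from the writers list that is also addressed to charlie is filed in mail/writers and never reaches the forwarding rule.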

You can pipe messages through external programs, too. Here's a complex one:

     SHELL=/bin/sh    # for other shells, this might need adjustment

      :0 Whc: vacation.lock
       # Perform a quick check to see if the mail was addressed to us
      * $^To:.*\<$\LOGNAME\>
       # Don't reply to daemons and mailinglists
      * !^FROM_DAEMON
       # Mail loops are evil
      * !^X-Loop: your@own.mail.address
      | formail -rD 8192 vacation.cache

        :0 ehc         # if the name was not in the cache
        | (formail -rI"Precedence: junk" \
             -A"X-Loop: your@own.mail.address" ; \
           echo "I received your mail,"; \
           echo "but I won't be back until Monday."; \
           echo "-- "; cat $HOME/.signature \
          ) | $SENDMAIL -oi -t

In this example, a bunch of conditions (check for the recipient, skip mail from daemons and mailing lists, skip mail loops) result in the formail program being triggered, to extract the sender's address and check it against a cache of addresses that have already been sent a reply. The next rule is then triggered only if the sender was not already in the cache: formail is invoked under the shell /bin/sh to generate an auto-reply header, add some new headers and an away-from-home message, and feed the outgoing reply message back to sendmail (the MTA).
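The decision the two recipes make between them can be restated in Python, which may be easier to follow than the flag soup. This is a simplified sketch: should_auto_reply is a made-up name, the Precedence test is a crude stand-in for procmail's much longer FROM_DAEMON pattern, and the check that the mail was addressed to us is omitted.

```python
def should_auto_reply(headers, seen_senders, loop_tag="your@own.mail.address"):
    """Decide whether to send a vacation reply; record the sender if so.

    headers is a dict of header name -> value. The daemon check here is a
    crude stand-in for procmail's much longer FROM_DAEMON pattern.
    """
    sender = headers.get("From", "").strip().lower()
    if not sender:
        return False
    if headers.get("Precedence", "").lower() in ("bulk", "junk", "list"):
        return False                      # mailing lists and robots
    if loop_tag in headers.get("X-Loop", ""):
        return False                      # our own reply coming back
    if sender in seen_senders:
        return False                      # already told this person
    seen_senders.add(sender)
    return True


if __name__ == "__main__":
    cache = set()
    print(should_auto_reply({"From": "bob@example.org"}, cache))   # first mail
    print(should_auto_reply({"From": "bob@example.org"}, cache))   # second mail
```

The cache plays the part of vacation.cache: each correspondent hears about your holiday once, not once per message.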

You can chain procmail and deliver together, but as procmail can do almost everything useful that deliver can, and is safer, it's probably best to stick to just procmail. There are a number of manual pages that come in handy when configuring your mail delivery options -- in particular the procmailex man page (type "man procmailex" at a command prompt to read it) which has lots of canned examples (including the one given above), and the procmailrc man page (which describes the procmail mini-language).

Spam, Spam, and more Spam

Spam -- unsolicited, unwanted, bulk email -- is a plague on the internet. It also tends to make people angry. Because it makes people angry, many programmers have devoted a ridiculous amount of time and energy to writing tools that block spam. And many of these tools run on UNIX and Linux.

The first spam filtering tools to be aware of interact with the MTA. If you run one you should always ensure that your MTA doesn't relay messages to third parties on behalf of other domains -- if it does, your open relay will be leeched on by spammers with cries of glee, for a relay lets them disguise the origin point of their mass-marketing campaign, defeat the cruder filters, and steal someone else's network bandwidth (yours).

Almost all current MTAs (including sendmail) can block incoming connections from hosts named in a blacklist file. But this doesn't help much, because spammers change their addresses faster than you or I change our underwear. A more useful solution is to use a DNS-based blacklist service. DNS, the domain name system, is a distributed database; you throw a name at a server and either it returns a number associated with the name, or it queries upstream servers on your behalf until it finds the answer. Normally DNS is used to map domain names onto TCP/IP addresses, the numeric codes that are the 'real' identifiers of a host on the internet. But special DNS services exist that are used to distribute databases of known spammers. You can configure sendmail 8.9 or later (and other MTAs) to do a DNS query to a special server on each incoming mail message, and reject the mail if the DNS query gets a reply indicating that the message is blacklisted. These so-called realtime blackhole services let central organisations coordinate lists of spamming sites and update them, adding and removing spam sources as soon as they show up.
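The lookup itself is a neat trick: the connecting host's IP address is reversed and glued onto the front of the blacklist's domain, and the result is looked up like any other hostname. Here is a minimal sketch in Python; the function names are mine and the zone dnsbl.example.org is a placeholder, not a real service. Note that this toy treats any lookup failure, including a network problem, as "not listed".

```python
import socket


def dnsbl_query_name(ip_address, zone):
    """Build the DNS name to look up: octets reversed, then the list's zone."""
    octets = ip_address.split(".")
    if len(octets) != 4 or not all(o.isdigit() and 0 <= int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip_address!r}")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip_address, zone):
    """True if the blacklist zone returns any address for this host."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip_address, zone))
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Placeholder zone: substitute the service you have chosen to use.
    print(dnsbl_query_name("192.0.2.25", "dnsbl.example.org"))
```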

You can find usage instructions for the MAPS RBL, one of the earliest such services, including instructions on how to configure sendmail to use it. More to the point, there are a variety of different RBL services: some of them have different policies on adding/removing spam hosts, so you may want to shop around. Here's a list of services.

While DNS-based filtering works at the server level, what about at the user level? There are potloads and potloads of Linux-based tools for filtering out spam; mostly you tie them into your mail delivery system via procmail. For example, here's a chunk of my personal .procmailrc file:

 :0f
 | /usr/bin/spamassassin -P

 :0:
 * ^Subject:.*\*\*\*\*SPAM\*\*\*\*
 mail/caughtspam

SpamAssassin is an extremely cool spam filtering utility that relies not only on a DNS check, but on a tool called Vipul's Razor (see below). Mail is filtered through spamassassin and comes back with a modified subject line (containing "****SPAM****" if the filter thinks it's spam). It then gets processed by the next rule, which scans the headers for a "****SPAM****" subject and dumps suspected spam into a mailbox called mail/caughtspam, to be dealt with later. (It could equally well invoke a formail recipe -- like the vacation alert above -- to bounce the spam back to the sender's ISP along with an automatic complaint.)

SpamAssassin is written in Perl, but to install it you don't need any knowledge of that language; just follow the directions in the README.

Vipul's Razor is a collaborative spam-tracking database, which works by taking a signature of spam messages. Since spam typically operates by sending an identical message to hundreds of people, Razor short-circuits this by allowing the first person to receive a spam to add it to the database -- at which point everyone else will automatically block it. Because it works at the message level, rather than by blacklisting entire domains and servers, it's more merciful -- and effective -- than the crude DNS-based blackhole services. (Interestingly, the more people who use VR, the more efficiently it works: a classic example of the 'network externalities' model of economics promoted by the internet.)
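The signature idea can be sketched in a few lines of Python. This is a deliberately naive illustration, not Razor's real algorithm or protocol: the fingerprint function, the whitespace normalisation and the ToySpamDatabase class are all my own inventions, and an in-memory set stands in for the shared server.

```python
import hashlib


def fingerprint(body):
    """Hash the body after collapsing whitespace, so trivial reflowing still matches."""
    normalised = " ".join(body.split()).lower()
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


class ToySpamDatabase:
    """Stands in for the shared server; holds fingerprints reported as spam."""

    def __init__(self):
        self._known = set()

    def report(self, body):
        """Called by the first recipient who recognises the message as spam."""
        self._known.add(fingerprint(body))

    def is_spam(self, body):
        """Called by everyone else before delivery."""
        return fingerprint(body) in self._known


if __name__ == "__main__":
    db = ToySpamDatabase()
    db.report("MAKE MONEY FAST!!!\n")
    print(db.is_spam("make money   fast!!!"), db.is_spam("Hello Charlie"))
```

Only fingerprints cross the network, so the server never needs to see anyone's legitimate mail.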

One of the problems with Vipul's Razor is that its efficiency is proportional to the number of people who use it. If you feed a message to a VR client, it generates a cryptographic 'fingerprint' of the message and polls the nearest VR server. The server checks its database for the fingerprint. If it's there, it tells the client, "yup, that's spam"; if it isn't, it's not known to be junk so the VR client passes the mail through. Someone has to be the first victim of the spam, and has to invoke their VR client in "notify" mode to tell the servers "the following fingerprint belongs to a piece of spam -- remember it".

However, those of us who've been on the net for long enough have lots of old email accounts that are moribund -- nobody except spammers remembers them. For example, if you send email to charlie@antipope.demon.co.uk (an account I've had since 1993) it won't get through to me; I don't use it any more, but it still exists, and it gets about ten or twenty pieces of spam a day. You can use fetchmail to slurp in your mail via POP3 from an old ISP account, and feed it straight to the Vipul's Razor servers.

It takes a simple .fetchmailrc configuration file like this:

  set logfile fetchmail.log
  set daemon 120
  set no bouncemail
  poll pop3.demon.co.uk protocol pop3
          user "oldaccount" with password "xyzzy" to "spamtrap" here
        and wants mda "/usr/bin/procmail -d %T"

And a simple .procmailrc, to process the incoming messages like so:

  :0f
  | /usr/bin/spamassassin -r -l /home/spamtrap/spam_added_to_razor

And anything that hits user@oldaccount.demon.co.uk is reported as spam (by a robot running as user "spamtrap" on a server with a broadband internet connection -- a cable modem will do).

