July 2000 Column




All about usenet

This year, usenet is twenty years old. It's not as well known to the public as the world wide web or IRC or new-fangled semi-proprietary protocols like Napster, but usenet -- the internet news -- has been around for a long time, and is one of the most useful services available on the net.

If you've ever used a bulletin board, or a Lotus Notes system, you'll have seen something a bit like usenet. A usenet server stores messages. Messages are partitioned among news groups -- essentially holding areas that identify the messages by general topic. In addition, messages can refer to one another, so that threaded discussions can build up. (Quite a few companies have found that usenet servers make a good -- and free -- alternative to a Notes server or some other commercial groupware system.)

What makes usenet different from a bulletin board is that it is distributed across tens of thousands of servers. Current news servers (and newsreaders, the client programs that let users read and post articles) talk to each other using a TCP/IP protocol called NNTP (Network News Transfer Protocol), originally defined in RFC 977. Say you write a new article, perhaps a reply to someone else's posting, and post (send) it to your local server. In due course your local server will call the neighbouring servers it knows about and offer each of them a copy. Each neighbour accepts it only if the article's message-ID isn't already in its database. Every server passes new messages on to all its neighbours in the same way, so your posting may eventually be relayed to every other news server on the internet.
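To make the message-ID check concrete, here's a minimal shell sketch of what a server does when a neighbour offers it an article. It is loosely modelled on NNTP's IHAVE exchange, where 335 means "send it" and 435 means "not wanted". The function name offer_article and the flat HISTORY file of message-IDs are illustrative assumptions. Real servers keep a much faster, indexed history database.

```shell
# Hypothetical sketch: decide whether to accept an article offered by a
# neighbouring server, based on whether its message-ID is already known.
# HISTORY is an assumed flat file holding one message-ID per line.
HISTORY=${HISTORY:-./history}

offer_article() {
    msgid=$1
    touch "$HISTORY"
    # -x: whole-line match, -F: fixed string (message-IDs aren't regexps)
    if grep -qxF -- "$msgid" "$HISTORY"; then
        echo "435 article not wanted"
    else
        # For brevity the ID is recorded at once; a real server would only
        # record it after the article itself had arrived intact.
        printf '%s\n' "$msgid" >> "$HISTORY"
        echo "335 send article"
    fi
}
```

If the same ID is offered twice, the second call answers 435. That refusal is what stops an article circulating around the network forever.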

Obviously, with millions of people using it, usenet builds up messages fast. It's therefore usual for usenet servers to "expire" articles, deleting those that are older than a set age. (Some servers, notably Deja.com's, never expire anything; that is what makes Deja an archive of usenet.) The volume of usenet postings is currently about two gigabytes per day. A large chunk of this is rubbish, though. It includes pirated copies of commercial software, spam (junk adverts), and cancel storms (messages sent by robots to cancel earlier postings by people the robots' owners don't like). Once you weed out the rubbish, you find roughly half a gigabyte of discussion a day, spread across some thirty thousand newsgroups. Nobody can read all that, but you don't need to. The newsgroups are arranged hierarchically, like the topics in a library catalogue, and you can subscribe to just one or two (or a dozen) of them.
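Expiry by age is easy to sketch with find(1), assuming a spool that holds one file per article. The function name expire_spool and the 14-day default are made up for illustration. Real servers read their expiry periods from a configuration file, often per newsgroup, and also tidy up their indexes afterwards.

```shell
# Hypothetical sketch: delete article files older than DAYS days from SPOOL.
expire_spool() {
    spool=$1
    days=${2:-14}
    # -mtime +N matches files whose age, rounded down to whole days,
    # exceeds N; each expired file is printed before it is removed.
    find "$spool" -type f -mtime +"$days" -print -exec rm -f {} \;
}
```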

In the beginning, usenet linked a couple of academic computing sites in the USA, and there were no NNTP servers. Instead, fairly simple-minded file-copying tools (UUCP) moved articles between sites. Each site maintained a spool area: a directory tree containing newsgroup subdirectories. (Each article was stored as a separate file in a general article holding area, with a link from the appropriate newsgroup subdirectories pointing to it. In this way, an article could be "cross posted" so that it would show up under two or more newsgroups if it was considered to be of interest to several groups of readers who might otherwise not see it.)
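Here's a sketch of how a cross-posted article can sit in several groups for the price of one copy, using hard links. It assumes a spool where a group such as comp.os.linux maps to the directory comp/os/linux, and where each article is stored once in an "articles" holding directory. The function spool_article and that directory name are hypothetical. For brevity it also reuses one article number in every group, whereas real servers keep a separate counter per group.

```shell
# Hypothetical sketch: store ARTICLE once, then hard-link it into the
# directory of every newsgroup listed after the article number.
spool_article() {
    spool=$1 article=$2 num=$3
    shift 3
    mkdir -p "$spool/articles"
    stored="$spool/articles/$num"
    cp "$article" "$stored"
    for group in "$@"; do
        # comp.os.linux -> comp/os/linux
        dir="$spool/$(printf '%s' "$group" | tr . /)"
        mkdir -p "$dir"
        ln "$stored" "$dir/$num"   # hard link: same inode, no extra copy
    done
}
```

For example, `spool_article /tmp/spool article.txt 1 comp.os.linux uk.comp.os.linux` leaves one copy of the text on disk, visible in both group directories.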

Users could read news by looking at the files held in the spool directory. Later they could use a tool called "rn" (read news), written by Larry Wall (of Perl fame). rn was useful because it introduced the idea of the killfile: a file containing regular expressions to match against article header lines (such as the subject, the name of the person who posted the article, and so on). If someone annoyed you or simply talked nonsense, you could "kill" their articles so that you wouldn't see them any more. (Note that this is not the same as cancelling an article. To cancel an article, a special control message is posted that basically says "I am the author of message-ID foo and I hereby ask all servers to delete that article". No prizes for spotting the security-related fun and games that this sort of control message can be used for!)
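Killing an article boils down to matching its header lines against a list of regular expressions. In this minimal sketch the killfile simply holds one extended regular expression per line. That is an assumed, simplified format, not rn's actual KILL-file syntax, and is_killed is a hypothetical name.

```shell
# Hypothetical sketch: succeed (exit 0) if any header line of ARTICLE
# matches any pattern in KILLFILE. The article body is never examined.
is_killed() {
    article=$1 killfile=$2
    # Headers end at the first empty line; sed stops printing there.
    # -i: ignore case, -E: extended regexps, -f: read patterns from a file
    sed '/^$/q' "$article" | grep -qiE -f "$killfile"
}
```

A newsreader would then run something like `is_killed "$article" ~/.killfile || show_article "$article"`, where show_article stands for whatever displays the article.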

With the spread of usenet -- it reached the UK as early as 1985, although it still wasn't universally available even at universities connected to JANET until the early 1990s -- NNTP arrived, and so did more sophisticated newsreaders. One improvement was threading: the ability to group articles on a common topic hierarchically within the newsgroup, showing how they relate to one another. Another was scoring. Instead of simply junking anything posted by a hopeless dweeb, you could tell the newsreader to subtract points from the article's score. You could then perhaps add points back for the topic under discussion, or for some other attribute of the posting, and display everything above a certain threshold. Usenet wasn't a purely UNIX thing either; newsreaders for MacOS and DOS and Windows and VMS and just about anything under the sun showed up. Microsoft Outlook Express can read and post news (although we wish it wouldn't -- it has some weirdly nonstandard default behaviour), and so can Netscape.
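Here's a toy version of scoring. The rules are stored one per line, each a number of points followed by a regular expression. That file format, and the names score_article and worth_reading, are invented for illustration; slrn's real score files look quite different. The point is that several rules add up to one score, which is then compared with a threshold.

```shell
# Hypothetical sketch: sum the points of every rule in SCOREFILE whose
# regular expression matches one of ARTICLE's header lines.
score_article() {
    article=$1 scorefile=$2
    headers=$(sed '/^$/q' "$article")
    total=0
    # Each rule line is "points regex"; points may be negative.
    while read -r points regex; do
        [ -n "$points" ] || continue
        if printf '%s\n' "$headers" | grep -qiE -- "$regex"; then
            total=$((total + points))
        fi
    done < "$scorefile"
    echo "$total"
}

# Succeed if ARTICLE scores at or above THRESHOLD (default 0).
worth_reading() {
    [ "$(score_article "$1" "$2")" -ge "${3:-0}" ]
}
```

With a rule of -100 for the dweeb and +50 for subjects mentioning Linux, the dweeb's Linux posts score -50 and his other posts score -100. A threshold of -50 therefore lets only his on-topic articles through.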

Another thing that happened with the spread of usenet was "the great renaming". In the beginning, all newsgroups had names beginning with "net." As with the domain name system, this name space filled up rapidly. In 1986-87, the net newsgroup namespace was reorganised. Seven major, controlled newsgroup hierarchies were set up, and an uncontrolled alternative hierarchy soon grew up alongside them:

comp.*
discussion of computer software and hardware
rec.*
discussion of recreational activities (books, art, hang-gliding, whatever)
soc.*
discussion of society, social sciences, etc.
sci.*
discussion of hard sciences (math, physics, biochem, etc)
news.*
discussion of the usenet system itself
talk.*
general area for discussing things that need discussing
misc.*
miscellaneous topics that don't fit anywhere else (job adverts, items for sale, and so on)
alt.*
anything goes (more or less)
Alt takes some explanation. The alt hierarchy is a catch-all for any topic that can't pass the formal voting process needed to establish a group under one of the major hierarchies. Usually, when a mailing list overflows, an alt newsgroup is spawned; the process for doing this is fairly informal and doesn't require a large quorum. Once the group has shown its worth, someone may come along and organise a vote to get a similar group set up under one of the controlled hierarchies. For example, the newsgroup alt.history.what-if was set up for the discussion of alternate history scenarios. After a while, a formal vote resulted in the creation of soc.history.what-if, and much of the discussion moved there.

In addition to the main hierarchies above, a whole bunch of johnny-come-lately hierarchies have been spawned. Each country usually has its own news hierarchy (for example, the uk.* tree). Various companies, such as Microsoft and Borland, have also built their own in-house trees and made them publicly accessible.

Getting started with usenet

Usenet has a body of lore and cultural conventions. Because it's a public medium, anything you post will be seen by people you never expected to meet. What's more, it will be retained in archives like Deja.com. In general, it is a good idea to understand the basics before you try to ask any questions (or answer any).

The first place to go for answers is, interestingly enough, the newsgroup news.announce.newusers. You can't post here, because this is a moderated newsgroup: all postings go to a human being for approval before they show up in the group. What you will find here is a mass of good advice. There is information about how to use the usenet system and how to understand the usenet hierarchy. There is also some very good advice about netiquette -- how to engage in discussions with people you've never met without getting right up somebody's nose. (It's surprising how seemingly minor gaffes can get you a loud and irascible response. For example, one usenet convention is that TYPING IN CAPITALS LIKE THIS IS READ AS SHOUTING, so don't do that unless you really want to shout at your reader! Another bad thing is posting an article that consists of the entire quoted text of someone else's long essay or rant, followed by the immortal words "me too".) There are other ways of putting your foot in it publicly; to avoid them, start by reading news.announce.newusers.

You might then want to go and look in one of the most obviously interesting newsgroups -- such as uk.comp.os.linux, perhaps, which is a living, breathing example of why so many people find usenet useful. Simply put: you can ask questions, or answer them. If you ask a question, you will probably get several answers within a few hours or days. And in some areas -- especially technical questions in places like comp.os.linux.setup or comp.windows.x.* -- the responses will be as good as anything you can get off a commercial support hotline. (One caveat: nobody is paid to post on usenet, so you will have to gauge the quality of responses for yourself. Ask a silly question and you may well get a silly answer -- or a flea in your ear.)

News readers

This is all very well, but how do you go about reading usenet?

You need to use a program called a newsreader. If you're running Linux, there's an embarrassment of choice; chances are you've already got several on your system. If you use a console-based system, probably the best newsreader bar none is slrn; type "man slrn" to get the basics. You can get slrn from space.mit.edu, its author's local ftp server. If slrn is a bit intimidating, try tin. It has fewer features and a more basic display, but it's workable. (While the popular mail tool pine can read usenet groups, I really wouldn't recommend it -- it couldn't cope with threading, last time I looked. And I strongly advise against going in search of the original rn, unless you really enjoy writing regular expressions. It hasn't aged gracefully.)

Before firing up either tin or slrn, you will need to make sure that you're online with access to a news server. That's because these are online newsreaders: they expect to talk to an NNTP server, whose name should be given in the NNTPSERVER environment variable. For example, if you were online via Demon Internet, you'd set up the NNTPSERVER variable before running either newsreader:

  NNTPSERVER='news.demon.co.uk'; export NNTPSERVER
  slrn
(When running slrn for the first time, you also need to give it the --create option, which builds your newsrc file from the server's list of groups. See the man page for details.)

You may also need to set your EDITOR environment variable to the name of a text editor you're comfortable with. In general, if you don't know vi or emacs you'll be better off using pico (the editor from the Pine mail/usenet client) or nano (an open-source clone of pico).
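If you'd rather not type these settings every time, something like the following in your ~/.profile would do. It is a sketch: news.demon.co.uk is just the example server from above and pico is just one choice of editor, so substitute your own (or localhost, if you run your own news server). The ${VAR:-default} form leaves any value that is already set untouched.

```shell
# Possible ~/.profile lines: fall back to these values only if the
# variables aren't already set, then export them to child processes
# such as slrn or tin.
NNTPSERVER=${NNTPSERVER:-news.demon.co.uk}
EDITOR=${EDITOR:-pico}
export NNTPSERVER EDITOR
```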

If you don't want to stay online all the time you're reading news, you can install your own news server on your Linux system. You then set localhost as your NNTPSERVER, and your newsreader goes to your local server. Anything you post there should be forwarded to your ISP's server when you go online (and tell your server that you're online), and your server will pick up anything that interests you from the upstream host.

There are also offline newsreaders that maintain their own spool directory. In general, you run a separate program while you're online that pulls down new news and pumps out your new postings, then browse the spool directory when offline. (Just to add to the confusion, slrn comes with a special kit -- slrnpull -- to do just this, if you configure it right.)

Most of the GUI newsreaders for X11 are able to run offline; the cost of this is that each user's newsreader maintains a database of megabytes of postings. There's a newsreader built into Netscape Communicator, and another built into StarOffice, but I wouldn't recommend either of these -- apart from being memory hogs, they have weird default configuration settings. You may be better off starting with Krn (the KDE newsreader) or one of the GNOME newsreaders that maintains its own spool.

Casting aside journalistic objectivity for a moment, I use slrn. It doesn't look as cute as Krn or Netscape Communicator, but it's faster, uses less memory, has zillions more features than the GUI newsreaders, and I can run it remotely in a telnet or ssh session. (This lets me keep the same scorefiles and lists of interesting groups wherever I am, rather than having to carry a parcel of configuration files with me wherever I go. I tend to work on a lot of different machines ....)

Learning to use slrn is a bit harder, since most of the documentation is buried in the change notes, but when in doubt, just hit the question-mark key for some help. Once you're up to speed it's unbeatable. Its author also wrote a cool text editor, Jed, which would probably be my editor of choice if I didn't already have vi wired into my fingertips. Install slrn from your distribution, or download the sources (for slrn and the S-Lang screen-handling library it depends on) from ftp.space.mit.edu and follow the recipe to compile and install it. Or go and read news.readers and news.readers.offline for a comparison of features among various newsreaders, and make up your own mind.

In general, all newsreaders provide three views of usenet. The first is a newsgroup selection view, which lets you pick the newsgroups you're interested in reading and "subscribe" to them. Subscription is a bit of a misnomer, because all it means is that your newsreader downloads some overview files listing the current contents of those groups on your local server. The exception is an offline newsreader: when it's online, it will fetch everything in the subscribed groups for you to read later.

The second view, once you've picked a newsgroup from the list of subscribed groups, is a thread-level view; it should show you the articles currently in the group, possibly sorted by subject or author. (Most newsreaders also sort by thread -- you only see top-level topics, but if you select an article with follow-up postings it will unfold the outline of the discussion so you can see everything.)
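Under the hood, threading relies on a References: header. Each follow-up lists the message-IDs of the articles it descends from, oldest first, so the first entry identifies the article that started the thread. This little sketch finds that root; grouping articles by root gives you the threads. The name thread_root is hypothetical, and the sketch assumes each header fits on one line, which real headers needn't.

```shell
# Hypothetical sketch: print the message-ID of the article that started
# ARTICLE's thread. An article with no References: header is its own root.
thread_root() {
    sed '/^$/q' "$1" | awk '
        tolower($1) == "references:" { root = $2 }   # first listed ancestor
        tolower($1) == "message-id:" { self = $2 }
        END { print (root != "" ? root : self) }'
}
```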

Finally, having picked a thread, the newsreader lets you read the bodies of the articles in it and move between them as if you were looking at the contents of a mail folder in your email tool of choice. The difference is that as well as emailing a reply to the author (which most newsreaders will let you do), you can post a public follow-up, which goes into the usenet news system instead. You can also (if you know what you're doing) start a new thread.

When composing a usenet post, you'll find yourself editing a message. It's a lot like writing an email note, except that your posting will be seen by thousands, or tens of thousands, of readers. (So read the Emily Postnews articles in news.announce.newusers and try not to embarrass yourself in public, okay?)

News servers

A news server isn't just a TCP/IP server that talks the NNTP protocol. It's a suite of programs with a long list of jobs. It maintains a news spool, adds and deletes articles, processes control messages, and keeps a list of interesting newsgroups. It also makes sure that postings to moderated newsgroups are mailed to the newsgroup's moderator, and lets the system administrator know if the filesystem where the news spool lives is about to fill up. (Phew.)

In the beginning there was A-news, which begat B-news, and we don't talk about them much because they predate NNTP. The oldest news server kit still in use anywhere is C-news (written by net.god Henry Spencer, who also wrote the regular expression package perl and rn use), but you probably don't want to mess with C-news either. In general, there are currently two news servers of interest to you (unless you feel like writing your own): Inn, and Leafnode.

Inn (Inter-Net News) is a very powerful, rather complex software suite. It comes with most Linux distributions, notably Red Hat (and derivatives) and Debian. Inn is suitable for running medium-sized ISP-level newsfeeds, and large ISPs tend to run modified versions of it. For example, Demon run their news service using a slightly modified Inn kit (actually a bunch of Inn daemons on a pool of machines that use the same network filesystem to store the news spool).

Inn may come with Red Hat and relatives, but if all you want to do is provide a service for one or two local users who read five or ten groups, it represents massive overkill. (Who wants to have to edit twenty-three configuration files before they can read news?) If you're setting up a newsfeed for a company it may be worth it. In that case, start with the manual, and check for news about Inn at the Internet Software Consortium, who currently maintain the software. But Inn administration is a complex, hairy artform that really warrants a book, not a magazine column. Newer versions (notably inn 2.2) are much easier to cope with than older ones: you edit about three files, then control it by issuing commands through the ctlinnd program. Even so, it's still a bit of a beast; and besides, I'm not an Inn guru. I use Leafnode instead.

Leafnode was written as a throw-away piece of freeware by the guys at Troll Tech, who also wrote the Qt widget library that KDE is built on top of. Leafnode is a very small caching news server for leaf sites with a dialup connection -- which probably means you (or me). You install leafnode on your Linux system and tell it where your ISP's news server is. You then point your online newsreader at your leafnode server.

When you try to read news in a group that leafnode isn't currently fetching, it adds the group to a list of interesting groups (actually the files in the interesting.groups directory of your news spool area). When you go online, you run a program called fetchnews, which posts your outgoing articles and then fetches all new articles in the groups you're interested in. If you stop reading a group for a while (how long is set in the leafnode configuration file), leafnode will quietly stop fetching articles for it. If you run low on disk space or inodes, you run texpire, the news expirer, which gets rid of old threads that haven't been read for a while.
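The "stop fetching groups nobody reads" idea can be sketched with file timestamps. This hypothetical stale_groups function lists the marker files in an interesting-groups directory that haven't been touched for more than a given number of days. It assumes that a file's modification time records when its group was last read. That is an illustration of the principle only; leafnode's own bookkeeping and timeout settings may work differently.

```shell
# Hypothetical sketch: print the names of groups whose marker file in DIR
# hasn't been modified for more than DAYS days (default 7), sorted.
stale_groups() {
    dir=$1 days=${2:-7}
    # Strip the leading path so only the group name is printed.
    find "$dir" -type f -mtime +"$days" | sed 's|.*/||' | sort
}
```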

Leafnode has a single major problem, plus some minor ones. The big problem is security. If you connect to the internet for even a few hours, the odds are that some asshole -- no milder word suffices -- will find your server and try posting spam through it. The first you'll know about it is that your connection slows down; if you check what's going on using netstat, you'll see incoming NNTP connections. The next thing you know, someone will scream at you by email -- if you're lucky. If you're not lucky, your ISP will suspend your account for net abuse; spam is generally unwelcome everywhere and all reputable ISPs have terms of service that basically forbid their users from sending it.
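To spot uninvited guests, you can filter netstat's output for connections to the NNTP port, 119. This sketch reads `netstat -tn` style output on standard input and prints the distinct remote IPv4 addresses connected to your port 119, ignoring your own loopback connections. The column positions assume Linux's net-tools netstat, and nntp_callers is a made-up name. Run it as `netstat -tn | nntp_callers`.

```shell
# Hypothetical sketch: list remote IPv4 addresses with established
# connections to local port 119. Assumed netstat -tn columns: $4 = local
# address, $5 = foreign address, $6 = state.
nntp_callers() {
    awk '$1 ~ /^tcp/ && $4 ~ /:119$/ && $6 == "ESTABLISHED" {
             split($5, addr, ":")
             if (addr[1] != "127.0.0.1") print addr[1]
         }' | sort -u
}
```

Any address this prints that isn't one of your own machines deserves a closer look.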

The way to avoid this is to follow the installation instructions to the letter. In particular, ensure that TCP Wrappers (tcpd) are installed on your system, and add the following line to the file /etc/hosts.allow:

  nntp: LOCAL
and to /etc/hosts.deny:
  nntp: ALL
Assume that you have configured inetd to launch leafnode as the news server for all NNTP connections, using /sbin/tcpd to control who is allowed to connect (which is what the installation cookbook tells you to do). The entry in /etc/hosts.deny should then block all incoming NNTP connections, while the /etc/hosts.allow entry overrides it to make an exception for your local system. (Perhaps you use masquerading, so that other machines on a local class-C network in your office or home use your Linux server as their dial-up internet gateway; such a network might be in the 192.168 address space. If so, you can add that network to the hosts.allow entry as well, so that only NNTP clients on your side of the gateway can see the leafnode server.)
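For example, suppose the machines on your side of the gateway are numbered 192.168.1.something (an assumed address range, so use your own). The hosts.allow entry might then be extended like this, keeping the same daemon name as the entries above. In tcpd's pattern language, an address ending in a dot matches every host whose address starts with that prefix.

```
# /etc/hosts.allow: allow the local system plus the 192.168.1.x LAN
nntp: LOCAL, 192.168.1.
```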

In addition to the leafnode documentation, there's a mini-HOWTO document (News-Leafsite) that lays out the recipe, and this ships with Red Hat (although leafnode isn't part of the standard distribution). If you can't find Leafnode in your Linux distribution, you can download and compile a patched version from Leafnode, or the original (unpatched, less cool) version from Troll Tech.

One final point. News builds up over time. Articles you've read aren't erased automatically from the spool area -- other people might want to read them. So you need to periodically "expire" ancient articles, unless you want usenet to eat all your disk space. Leafnode provides a tool called "texpire" to purge old articles. If you're going to use a news server, it is a very good idea to read the documentation. Leafnode doesn't require a professional administrator (unlike a full-blown Inn server), but it can bite you if you aren't careful -- if your housemate decides to see what alt.binaries.unreasonably.big.pictures contains, you can end up downloading gigantic images for the next couple of weeks unless you know just what configuration item to delete to stop Leafnode downloading it. (Because Leafnode assumes that if someone tries to read a newsgroup it must be interesting. Which isn't always the case!)
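Rather than waiting for the disk to fill, you could have a nightly cron job run a check like this one. It is a sketch under stated assumptions: /var/spool/news as the spool location and the 90% limit are guesses to adjust to taste, and texpire is assumed to be on your PATH. The names spool_usage and maybe_expire are hypothetical.

```shell
# Hypothetical helper: read "df -P" output on stdin and print the
# percentage of space in use (column 5 of the second line, e.g. "87%").
spool_usage() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Run texpire only when the spool filesystem is at least LIMIT percent
# full. The spool path and the 90% default are assumptions.
maybe_expire() {
    limit=${1:-90}
    used=$(df -P /var/spool/news | spool_usage)
    if [ "$used" -ge "$limit" ]; then
        texpire
    fi
}
```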

Happy usenetting, and see you on uk.comp.os.linux!

