LINUX FORMAT PERL COLUMN

STRAP: Classy

Last month we began our exploration of object oriented programming in Perl by planning a pair of modules to let us cleanly edit a configuration file (in this case, /etc/hosts -- but it could equally well be a BIND configuration file under /var/named, or anything else for that matter). This month, we're going to flesh out these modules and get them working; along the way we'll look at file I/O, how we give instructions that affect the way the Perl compiler treats our programs, how inheritance works, and a few other important aspects of day-to-day programming in Perl.

The /etc/hosts file is used to map IP addresses to host names on a small network. (It predates the BIND internet daemon and DNS servers, but it's still useful if you have a handful of hosts sitting on a LAN.) Each line of the file starts with an IP address, then one or more aliases (names) by which the machine is known. A hash-sign introduces a comment, and anything after the hash-sign is ignored. For example, a line like this:

    192.168.1.10 frodo.localnet.net frodo

means that the names "frodo" or "frodo.localnet.net" apply to the machine with IP address 192.168.1.10.

Newer Linux systems that have IPv6 support also use /etc/hosts to store some IPv6-related data -- for example, the localnet, broadcast, and router addresses. But we're not going to bother with that.

What we're going to do is design a Perl module that manages an /etc/hosts file. Create a new object, and it reads in the hosts file's contents. Tell it to save, and it saves the hostfile. It should let you add new host records, delete records, and retrieve a record for a host using any of its aliases.

This sounds fairly straightforward, but remember -- while we may think of a hosts file as being an object, what about the records within it? Aren't they equally appropriate objects to manage? The answer is yes.
A record in an /etc/hosts file has attributes (IP address, any trailing comment, a list of hostnames; or maybe some unknown IPv6-related data). So the way we do it is to use two classes. We'll call them LF::Hosts and LF::Hosts::Entry.

An LF::Hosts object is the one that provides an API for programmers who want to mess around with /etc/hosts; it corresponds to the hostfile. It's a container for a bunch of LF::Hosts::Entry objects, each of which contains a single hosts entry (like the sample line above).

An LF::Hosts object has various attributes. It contains a whole bunch of host records (each of which is a unique description of a single IP address -- if there are two records for an IP address, this is an error). It may contain a hash table of aliases, each of which points to a host record, so we can look up the IP address associated with a name.

An LF::Hosts::Entry object is a bit simpler. It's basically a hash, containing some clearly-named fields: ip_addr (for a host IP address), alias (a reference to an array of names for the host), comment (for any associated comment text), type (which can be one of "addr", "comment" for pure comment lines, or "other" for IPv6 records), and other (for IPv6 content), all blessed as an LF::Hosts::Entry object. We can use AUTOLOAD methods to get at the contents of each field by name (so we can say $entry->ip_addr() to return its IP address, rather than poking around the guts of the hash), and there's a print() method that returns a formatted text string assembled from the LF::Hosts::Entry object's content and suitable for writing to a new copy of the file.

SUBHEADING: Building a container for data

First, let's examine LF::Hosts::Entry. This is really a simple container class; its job is to store named attributes (like IP addresses or aliases), and let us get/set attributes using methods with the same names.
In addition, it needs a method to parse a line from /etc/hosts into an object, and another method to take an object and turn it back into an /etc/hosts line. We can get away with only three methods in the class. We start by declaring a new package:

---BEGIN LISTING
use strict;
package LF::Hosts::Entry;
---END LISTING

The second line, "package LF::Hosts::Entry", declares that everything from this line on belongs to the package, until we reach the end of the enclosing block or file (or another package declaration). Astute readers will notice the lines that are missing after the package declaration: there's no assignment to @ISA, nor is there an Exporter declaration. That's because LF::Hosts::Entry doesn't inherit anything from LF::Hosts -- despite the similar name, it's an entirely separate class, not a child class. (@ISA lets us specify the classes that our current package inherits methods and data from. If we try to invoke a subroutine foo() in the current class, and the subroutine isn't defined, the classes listed in @ISA will be searched in turn for this method. See the boxout "Inheritance and automatic methods" for an explanation.)

We now have three methods. LF::Hosts::Entry::new() is a constructor; it takes a parameter (a line from an /etc/hosts file) and parses it, stashing the sections of the line in a hash referenced by $self. We're using the hash as a record; it has a number of named fields ("type", "comment", "ip_addr", "alias") which contain the entity in question (or an arrayref pointing to a list of them, in the case of "alias" -- there may be more than one name associated with an IP address).

Note one feature of a method (a subroutine that's defined in a package and is used in OOP): the first parameter supplied to the method is always the thing it was invoked on -- the name of the class for a class method such as new(), or the object itself for a method called on an object. This is a feature of the way we invoke a method. If we invoke a normal subroutine by writing fred("param1"), the first item found in the @_ (parameter passing array) inside fred()'s context is "param1".
If, on the other hand, we call it by saying MyModule->fred("param1"), this is taken by Perl to be equivalent to saying fred MyModule "param1"; the first item in @_ is "MyModule", and the second is "param1". It's therefore common to see methods in Perl begin like this:

    sub my_method {
        my $class = shift @_ ;
        my $param = shift @_ ;

(shift(@_) removes and returns the first item on the parameter list @_. If you refer to shift without a parameter, it defaults to working on @_, so you may also see code like "my $class = shift;".)

The goal of LF::Hosts::Entry->new() is to take a parameter like this:

    192.168.1.10 gateway.localnet gateway gw # address of network gateway

And turn it into a Perl hash structured like this:

    {
      "type"    => "addr",
      "ip_addr" => "192.168.1.10",
      "alias"   => [ "gateway.localnet", "gateway", "gw" ],
      "comment" => "# address of network gateway"
    }

This is a fairly straightforward parsing job. It's complicated by the need to recognize lines that are nothing but comments, like:

    # this line is a comment

Which produces a record like this:

    {
      "type"    => "comment",
      "comment" => "# this line is a comment"
    }

And weird IPv6 related stuff like this:

    ::1 localhost ipv6-localhost ipv6-loopback

Which is stored as this:

    {
      "type"    => "other",
      "comment" => undef,
      "other"   => "::1 localhost ipv6-localhost ipv6-loopback"
    }

Although LF::Hosts::Entry::new() is large, there isn't much magic in it. We use regular expressions to extract the comment, if any, and return the $self object immediately if there's nothing else in the line. Then we see if the line starts with a pattern that matches an IP address in dotted-quad format. If it does, we chop up the line, stick bits of it into $self, and return; otherwise we assume the line is something we don't understand, flag it as an "other" record, and return that.

The really confusing magic in this subroutine is how we refer to hash values. $self is just a reference to a hash.
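The parsing steps just described can be sketched roughly as follows. This is a minimal illustration rather than the column's actual listing: parse_hosts_line() is a hypothetical stand-alone function, it returns a plain (unblessed) hash reference, it pokes at the hash directly instead of going through accessor methods, and its dotted-quad pattern is deliberately loose.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one /etc/hosts line into a plain hash reference.
sub parse_hosts_line {
    my $line = shift;
    chomp $line;
    my %rec = (comment => undef);

    # Peel off a trailing comment, if there is one.
    if ($line =~ s/\s*(#.*)$//) {
        $rec{comment} = $1;
    }

    # Nothing left once the comment is gone: treat it as a pure comment record.
    if ($line =~ /^\s*$/) {
        $rec{type} = "comment";
        return \%rec;
    }

    # Does the line start with something that looks like a dotted quad?
    if ($line =~ /^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+(.+)$/) {
        $rec{type}    = "addr";
        $rec{ip_addr} = $1;
        $rec{alias}   = [ split ' ', $2 ];
        return \%rec;
    }

    # Anything else (IPv6 lines, for instance) is kept verbatim.
    $rec{type}  = "other";
    $rec{other} = $line;
    return \%rec;
}

my $addr    = parse_hosts_line("192.168.1.10 gateway.localnet gateway gw # address of network gateway\n");
my $comment = parse_hosts_line("# this line is a comment\n");
my $other   = parse_hosts_line("::1 localhost ipv6-localhost ipv6-loopback\n");

print "$addr->{type}: $addr->{ip_addr} = @{ $addr->{alias} }\n";
print "$comment->{type}: $comment->{comment}\n";
print "$other->{type}: $other->{other}\n";
```

Peeling the comment off first keeps the address pattern simple, because by the time it runs the line contains nothing but an address and its names.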
To understand how we say $self->type("other"), instead of $self->{type} = "other", see the text box "Inheritance and automatic methods".

The only other method in LF::Hosts::Entry is print(). This simply returns the content of the current object in a printable form -- rebuilding the /etc/hosts line in canonical form. It asks the object what type of line it corresponds to (an address, a pure comment, or something that the class doesn't understand), then returns an appropriate string assembled from the data stored in the object.

SUBHEADING: Building an object that marshalls other objects

LF::Hosts is a more complex class; its job is to marshall a bunch of LF::Hosts::Entry objects. We can add entries, delete entries, index them (by hostname), search them (by hostname), pretty-print them in readable form, and save them to a file.

LF::Hosts doesn't know anything at all about the contents of the LF::Hosts::Entry objects it pushes around. It deals with the LF::Hosts::Entry objects only via their method-based interface. For example, if it wants (in add_entry()) to know if an LF::Hosts::Entry object called $object is a comment or an address record, it asks the object by calling $object->type(). The object responds by returning its type: LF::Hosts itself doesn't have anything to do with the internals of the class it's marshalling.

The significance of this can't be overstated. Data hiding lies at the core of object orientation: the data owned by a class is hidden behind a public interface, and we can only get at it by calling methods -- subroutines that the class makes available in the public interface and which are designed to act as gatekeepers. In principle we can re-design LF::Hosts::Entry completely from scratch; as long as our new version provides methods with the same names for querying/setting its data, LF::Hosts will be able to run unchanged.
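To make the data-hiding point concrete, here is a small sketch of a print()-style method together with a caller that touches its objects only through methods. Demo::Line is a hypothetical stand-in for LF::Hosts::Entry, and its constructor takes ready-made fields (an assumption, to keep the example short) rather than parsing a line.

```perl
use strict;
use warnings;

package Demo::Line;    # hypothetical stand-in for LF::Hosts::Entry

sub new  { my ($class, %fields) = @_; return bless { %fields }, $class; }
sub type { return $_[0]->{type} }

# Rebuild the line in canonical form, depending on the record type.
sub print {
    my $self = shift;
    my $type = $self->type();
    return $self->{comment} . "\n" if $type eq "comment";
    return $self->{other} . "\n"   if $type eq "other";
    my $text = join " ", $self->{ip_addr}, @{ $self->{alias} };
    $text .= " " . $self->{comment} if defined $self->{comment};
    return $text . "\n";
}

package main;

my @lines = (
    Demo::Line->new(type => "comment", comment => "# gateway"),
    Demo::Line->new(type => "addr", ip_addr => "192.168.1.10",
                    alias => [ "gateway.localnet", "gw" ]),
);

# The caller never looks inside the hashes; it only calls methods.
my $text = join "", map { $_->print() } @lines;
print $text;
```

Because the caller only uses type() and print(), the hash inside Demo::Line could be swapped for some other storage without the calling code noticing.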
If we were deploying these modules as part of a huge system administration framework, this design principle would allow us to make improvements or fix bugs without running the risk of some other module being affected.

Let's take a quick walk through LF::Hosts. To start with, there are some interesting lines here that begin with an equal-sign: =pod, then =head1 NAME, then =cut. Text delimited by lines beginning with "=pod" and "=cut" is POD -- Plain Old Documentation, Perl's built-in documentation format. At heart, POD is a multi-line comment system; the text in a pod section isn't executed. However, it's more than that. POD lets you define various levels of heading, include lists and verbatim code, and mark up I<italic> or B<bold> text using simple tags. There are tools in the standard Perl distribution for parsing POD files (or Perl scripts containing pod sections) and turning them into HTML or man pages; to read the online documentation about POD type "perldoc perlpod" at the shell, or read Chapter 26 of "Programming Perl" (3rd edition).

Below the POD section, we run into the package line, which declares the start of LF::Hosts. It's immediately followed by a use statement, which loads up the module IO::File; IO::File lets us read and write files using object oriented semantics (see boxout, "Opening, reading, writing and closing files"). It's used in two places -- new() and save(). We also see a global variable, $LF::Hosts::HOSTFILE; this is the place where we expect to find a hosts file, and in practice it's a constant (/etc/hosts, according to the Filesystem Hierarchy Standard). In the example code this is changed to /tmp/hosts, so we can experiment without damaging the real thing.

The method new() is the object constructor for this class; it creates a hash, blesses it as an LF::Hosts, and then does something else: it opens the $LF::Hosts::HOSTFILE (as an IO::File object), then does the following:

    while (! $file->eof() ) {
        my $added = $self->add_entry($file->getline());
        $added->line_num(++$line_num);
    }

$file->getline() causes the IO::File object $file to return a new line from the hostfile. This is passed to add_entry(), which we'll see in a moment. add_entry() adds a new LF::Hosts::Entry to the LF::Hosts hash, keyed by its IP address, and returns a reference to the entry. On the next line, we use LF::Hosts::Entry's autoloader to add an attribute called "line_num" to the entry -- this indicates the line number within the file at which the entry was originally read. (This enables us to later save out the entries in the order they were read in; Perl's hashes, which we are marshalling the entries in, are unordered, so without the line number field there's no way of keeping track of the order in which the entries originally appeared.)

add_entry() itself is a wrapper around LF::Hosts::Entry->new(), passing a line read from /etc/hosts (by new()) to it. The main point of note is that we do some error-checking. Obviously, we can't have two entries in /etc/hosts for a single IP address; so we look to see if there's a duplicate, and call die (exiting with an error message) if we spot one. die() doesn't automatically kill a program if we execute the method that calls it (add_entry()) inside an eval block; we'll discuss error trapping and eval() in the next tutorial. For the time being, just take it as an indicator that duplicated IP addresses are a bad thing.

After the error check, add_entry() hangs the LF::Hosts::Entry off the LF::Hosts object in one of three places. Address records are stored directly in the object, keyed by their IP address; "comment" or "other" records are pushed onto arrays pointed to by the special keys "comment_lines" and "other_lines". (Because comments don't have IP addresses, we can't store them in the usual place.)
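The storage and error-checking behaviour of add_entry() might be sketched like this. Demo::Entry and Demo::Hosts are hypothetical stand-ins for the real classes; to keep things short, this add_entry() is handed a ready-made entry object instead of a raw line, and the duplicate is trapped with eval purely so the demonstration can carry on running.

```perl
use strict;
use warnings;

# Stand-in for LF::Hosts::Entry: takes already-parsed fields and exposes
# them through methods.
package Demo::Entry;
sub new     { my ($class, %fields) = @_; return bless { %fields }, $class; }
sub type    { return $_[0]->{type} }
sub ip_addr { return $_[0]->{ip_addr} }

# Stand-in for LF::Hosts.
package Demo::Hosts;
sub new { my $class = shift; return bless {}, $class; }

sub add_entry {
    my ($self, $entry) = @_;
    if ($entry->type() eq "addr") {
        my $ip = $entry->ip_addr();
        # Two records for one IP address is an error.
        die "duplicate entry for $ip\n" if exists $self->{$ip};
        $self->{$ip} = $entry;              # stored directly, keyed by IP address
    }
    else {
        # "comment" and "other" records have no IP address to key on, so
        # they are pushed onto the comment_lines / other_lines arrays.
        push @{ $self->{ $entry->type() . "_lines" } }, $entry;
    }
    return $entry;
}

package main;

my $hosts = Demo::Hosts->new();
$hosts->add_entry(Demo::Entry->new(type => "addr", ip_addr => "192.168.1.10"));
$hosts->add_entry(Demo::Entry->new(type => "comment", comment => "# a note"));

# Adding the same address again dies; eval traps the error.
my $ok = eval {
    $hosts->add_entry(Demo::Entry->new(type => "addr", ip_addr => "192.168.1.10"));
    1;
};
print "second add failed: $@" unless $ok;
```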
re_index() maintains an index of hostnames in LF::Hosts; this has the special key "alias_index", and is a hash of hostnames (the associated value of each being a reference to the LF::Hosts::Entry that it is defined in). Note that when we call re_index() it dies if it discovers that the same alias (hostname) is used for two different IP addresses -- another fatal inconsistency in an /etc/hosts file. Useful bits of Perl idiom:

    foreach my $entry (keys %$self) {
        next if (ref $self->{$entry} ne "LF::Hosts::Entry");
        foreach my $name (@{ $self->{$entry}->alias() }) {

foreach is a loop construct; it sets $entry (which is limited lexically to the scope of the loop) to each entry in the list (keys %$self) in turn. That is, $entry is set to each hash key in the LF::Hosts object in turn. We want to index the LF::Hosts::Entry objects, but skip those special lists of comments or other records -- so in the next line we call ref():

    next if (ref $self->{$entry} ne "LF::Hosts::Entry");

ref() returns the name of the class an object has been blessed into -- in this case, it is used to see if $self->{$entry} is an LF::Hosts::Entry, and if it isn't, we skip to the next one.

delete_entry() is used to delete an IP entry. (No, it doesn't cope with "other" or "comment" lines. Feel free to add that capability!) It deletes the reference to the LF::Hosts::Entry from the LF::Hosts marshall, then deletes all the surviving references to it in the alias_index hash, at which point it will be garbage-collected.

pretty_print() is the most interesting of the remaining methods; it is used by LF::Hosts to tell each LF::Hosts::Entry in turn to print itself (in a format suitable for splatting back into /etc/hosts). However, it does this carefully -- first, by compiling a hash, keyed on line number, of all the LF::Hosts::Entry objects with line numbers, then by building a list of the (presumably new) unnumbered entries, and then printing them in turn.
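The ordering step in pretty_print() could be sketched as below. This is only an illustration of the idea: plain hashes stand in for LF::Hosts::Entry objects, and the "text" field is a made-up placeholder for whatever each entry's print() method would return.

```perl
use strict;
use warnings;

# Plain hashes standing in for entries; "text" is a hypothetical field.
my @entries = (
    { text => "192.168.1.11 sam",   line_num => 3 },
    { text => "192.168.1.12 merry" },                 # new: never read from the file
    { text => "# local machines",   line_num => 1 },
    { text => "192.168.1.10 frodo", line_num => 2 },
);

# Split the entries into a hash keyed on line number and a list of
# unnumbered newcomers.
my (%by_line, @unnumbered);
foreach my $entry (@entries) {
    if (defined $entry->{line_num}) {
        $by_line{ $entry->{line_num} } = $entry;
    }
    else {
        push @unnumbered, $entry;
    }
}

# A numeric sort on the line numbers restores the original file order;
# the unnumbered entries are appended afterwards.
my @ordered = ( (map { $by_line{$_} } sort { $a <=> $b } keys %by_line), @unnumbered );
my $output  = join "", map { $_->{text} . "\n" } @ordered;
print $output;
```

Note the explicit numeric comparison in the sort: Perl's default sort compares strings, which would put line 10 ahead of line 2.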
Note that pretty_print() has to go through the address records, then loop over both the "comment" and "other" lists to compile a list of all the objects. The special LF::Hosts attribute date_stamp, if set to something non-zero (by the method date_stamp()), tells it to insert the current date as a comment above the new, unnumbered entries.

Finally we come to save(). save() merely opens a hostfile for output, then calls pretty_print() and sends the output to the file.

Warning: neither new() nor save() attempts to do any file locking. That's a topic for another tutorial; let's just say that it would be a bad idea to use these modules in a production environment without locking the hosts file before writing to it! (Otherwise another process using these toolboxes could overwrite the file, resulting in lost changes.)

BOXOUT: Opening, reading, writing, and closing files

Perl uses the idea of a file handle (a core data type) to keep track of input and output. The standard file handles STDIN, STDOUT, and STDERR are open by default in any Perl program -- these are the standard input (normally your keyboard, or whatever has been piped or redirected into the program), the standard output (which goes to your terminal), and the standard error (error messages, which by default go to your terminal but not in the same stream as STDOUT).

You use open() to associate a new file handle with a file (or process, or pipeline):

    open HANDLE, "<filename";

The character in front of the file name sets the mode: "<" (read from), ">" (write to) or ">>" (append to). (In fact, you really need to read what "Programming Perl" has to say about open() -- it has many subtle variations.) If the open() call fails, an error message will show up in the special variable $!.

You get rid of a file handle by close()'ing it:

    close HANDLE;

In between, you can read from a file handle; enclose it in angle-brackets to read in a line at a time, or use read() to read a specified number of bytes from HANDLE:

    read(HANDLE, $next_1024_bytes, 1024);
    my $nextline = <HANDLE>;

(Best not to mix and match these modes of access!)
And you can write to a file using print:

    print HANDLE "Hello, file!\n";

The semantics of file handles are rather ugly, and date to Perl's early C and shell heritage. In large projects it makes a lot of sense to use the modules IO::File and IO::Socket; these inherit from IO::Handle, which blesses a file handle into its class and supplies access methods, so we can say things like:

    my $file = new IO::File;
    $file->open(">logfile") or die "logfile: $!\n";
    $file->print(dump_some_state_data());
    $file->close();

Or even ...

    my $file = new IO::File;
    $file->open("<...");
    load_state_table($file);
    $file->close();

(See? We passed a file handle to a subroutine, as a parameter.)

END BOXOUT (opening ... closing files)

BOXOUT: Compiler pragmas

Compiler pragmas -- "pragmatic modules" -- are invoked with the command "use", which is more commonly used to load a Perl module; pragmas change the behaviour of Perl when it compiles and runs your code. When you invoke a pragma, rather than loading a module it causes the Perl compiler to adjust an internal setting. For example, "use integer" forces the compiler to use integer arithmetic rather than double-precision floating point.

The "strict" pragma causes a variety of ultra-strict compilation-stage checks to be imposed, a bit like the "-w" command line flag. When "strict vars" is in effect, all variables must be explicitly declared (with my, for example) or named in full with the namespace they belong to -- you can't arbitrarily create a global variable by using its name. Instead of simply assigning a value to $fred, you have to refer to $LF::Hosts::Entry::fred. When "strict refs" is in effect, soft references are banned. (A "soft" or symbolic reference lets you refer to some entity by name. For example, if we assign the string "my_func" to $thing, we can call the subroutine &my_func() by referring to &$thing -- a powerful, but dangerous capability.)
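A short sketch of both checks in action follows. The package name Demo and everything in it are invented for the illustration, and eval is used only so that the refused call doesn't stop the script.

```perl
use strict;
use warnings;

package Demo;

# With "strict vars" on, a package variable has to be declared (here with
# our) or written out in full as $Demo::fred.
our $fred = "global";

sub my_func { return "called my_func" }

my $thing = "Demo::my_func";

# With "strict refs" on, calling a subroutine through a string is refused
# at run time; eval traps the resulting error.
my $strict_result = eval { &$thing() };
my $strict_error  = $@;

# Switching the check off inside one small block lets the symbolic
# reference through.
my $loose_result = do { no strict 'refs'; &$thing() };

print "strict: ", (defined $strict_result ? $strict_result : "refused"), "\n";
print "loose:  $loose_result\n";
print "fully qualified: $Demo::fred\n";
```

Relaxing the pragma inside the smallest possible block, as the do { } here does, keeps the dangerous capability available without giving up the checks everywhere else.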
In general, it is a good idea to "use strict" at the beginning of any code you write that is going to be re-used elsewhere; it turns the Perl compiler into a pedantic nit-picker, but it massively reduces the risk of accidentally polluting the name space of another program when you re-use the code.

There are a number of other pragmas available. "use subs qw(foo bar)" predeclares foo and bar as subroutine names, allowing you to use them without parentheses or ampersand (i.e. as "foo", instead of "foo()" or "&foo") before their definition is parsed. "use lib "/opt/perl5/extras"" adds the path /opt/perl5/extras to the @INC path that is searched for loadable modules. And "use bytes" forces Perl to treat character data as 8-bit bytes rather than Unicode UTF-8 characters.

END BOXOUT (compiler pragmas)

BOXOUT: Inheritance and automatic methods

When we invoke a method on an object that belongs to a given class (say, MyClass), what happens if that method isn't defined by name in the package MyClass?

Perl maintains a special array called @ISA within each name space. @ISA contains a list of packages (classes); these are the classes that the current one is a child of. For example, if package MyClass has an @ISA array that contains MyParent, MyClass is a child of MyParent. When a method invocation within MyClass fails, it's booted up a level; Perl searches MyParent for a method with the right name. This carries on until all the classes in @ISA have been searched; Perl supports multiple inheritance (the idea that a child class can have two or more parents that supply it with methods).

Once the search for a method in @ISA fails, the current class is searched for an AUTOLOAD method. If that fails, it hunts for AUTOLOAD methods in the parent classes. In other words, the class and its parents are searched twice: first for explicitly named methods and then for AUTOLOAD methods.

An AUTOLOAD method is a special way of creating methods on the fly within a class.
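Here is a minimal sketch of the @ISA lookup, using the MyClass and MyParent names from the text. The greet() and name() methods are invented for the illustration.

```perl
use strict;
use warnings;

package MyParent;
sub new   { my $class = shift; return bless {}, $class; }
sub greet { my $self = shift; return "hello from MyParent, called on a " . ref($self); }

package MyClass;
our @ISA = ("MyParent");     # MyClass is a child of MyParent
sub name { return "MyClass" }

package main;

# Neither new() nor greet() is defined in MyClass, so Perl finds both
# in MyParent by searching @ISA.
my $obj      = MyClass->new();
my $greeting = $obj->greet();
print "$greeting\n";
```

The object is still blessed into MyClass even though the constructor lives in MyParent, because new() blesses into whatever class name it is handed as its first parameter.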
If we invoke a non-existent method in LF::Hosts::Entry (which doesn't have any parent classes defined in @ISA), Perl doesn't throw an immediate runtime error. Rather, it exhausts @ISA, then looks for a subroutine called AUTOLOAD and tells it the name of the method we're looking for. (This is placed in the $AUTOLOAD variable.) The AUTOLOAD method in LF::Hosts::Entry looks in the current object for a hash key of the given name, and either sets or returns its value. AUTOLOAD therefore creates new methods depending on what we call it with. (Watch out for typos in your method calls, or you'll end up sticking data somewhere unexpected!)

This AUTOLOAD method isn't doing anything other than providing syntactic sugar (by giving our LF::Hosts::Entry methods called "type" and "ip_addr" and so on, which any other program can call to set or get these attributes without knowing what goes on in the bowels of an LF::Hosts::Entry). But we can use AUTOLOAD to do sophisticated type checking, catch errors, or filter the method names we're prepared to use. (We'll examine the uses of AUTOLOAD in detail in a future article.)

Note that in addition to the classes listed in @ISA, Perl searches a special class called UNIVERSAL; this is effectively tacked onto the end of @ISA as a last-ditch attempt before searching for AUTOLOAD methods or failing completely.

END BOXOUT (inheritance and automatic methods)
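An AUTOLOAD accessor of the kind described in the boxout above might look something like this. It is a sketch under assumptions, not the LF::Hosts::Entry listing itself: Demo::Record is a hypothetical class, and the DESTROY check is a common precaution added here so that object clean-up isn't mistaken for a field access.

```perl
use strict;
use warnings;

package Demo::Record;
our $AUTOLOAD;

sub new { my $class = shift; return bless {}, $class; }

sub AUTOLOAD {
    my $self = shift;
    my $name = $AUTOLOAD;              # e.g. "Demo::Record::ip_addr"
    $name =~ s/.*:://;                 # strip the package name
    return if $name eq "DESTROY";      # don't treat object destruction as a field
    $self->{$name} = shift if @_;      # called with an argument: set the field
    return $self->{$name};             # either way, return the current value
}

package main;

my $rec = Demo::Record->new();
$rec->type("addr");
$rec->ip_addr("192.168.1.10");
print $rec->type(), " ", $rec->ip_addr(), "\n";

# A typo in the method name silently creates a brand new field.
$rec->ip_adr("10.0.0.1");
print join(",", sort keys %$rec), "\n";
```

The last two lines show the hazard the boxout warns about: the misspelt ip_adr() call succeeds, and the object ends up with an extra key.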