LINUX FORMAT PERL COLUMN


STRAP: Being objective


SUBHEADING: Object-orientation and Perl


Object-oriented programming is not new; it's been around since the late
1960s, although it only really caught on in the mid to late 1980s, as a
response to the increasing complexity of software. If you've written any
program in any language that was more than a hundred lines long, you'll
appreciate the need to wrap chunks of code up as separate subroutines. If
you've written a program that was more than a thousand lines long, you'll
probably have moved a bunch of utility subroutines out to a separate
library file, so that they don't confuse the flow of control of the main
program. But in really large projects, the proliferation of subroutines
and data types that they work on rapidly becomes uncontrollable, which
is where object orientation comes in. Object orientation is essentially
a way of looking at software that allows us to fence off chunks of a
project into "objects" (packages containing source code and the data
structures the source code works on), with well-defined interfaces,
so that we can concentrate on the big picture.


In its early days, Perl didn't do object orientation. If you were a
masochist you could emulate it using namespaces, just as you can emulate
object orientation in C (the Motif APIs require you to do just that!),
but that was about the limit. Perl 5 introduced some new keywords and
constructs that give Perl a very flexible model for doing object-oriented
programming, and that's what we're going to look at this month.


SUBHEADING: What is object-oriented programming?


Most programming work involves messing around with data structures --
collections of variables linked in weird and wonderful ways. In
object-oriented design (and programming) we try to keep our data structures
parcelled together with the subroutines that create, modify, access, or
destroy them. Access to a data structure is provided via some subroutines
which are globally visible, but what happens to the internals of an
object is a secret from the rest of the program, as is the internal
structure of the object. There may also be some private subroutines
that the rest of the software doesn't know about -- these are used
by the public routines, for their own purposes.


In general, object orientation relies on a handful of properties:
information hiding (data is only visible inside the object's own code),
inheritance (we can define a new type of object, incorporating an existing
one but adding new data and subroutines to access it), and modularity
(information and subroutines related to a class of object are bundled
together).


This month, and continuing next month, we're going to take a look at
a concrete example: a Perl module for editing the /etc/hosts file on a
Linux system.


/etc/hosts is a file that matches hostnames to internet addresses for 
computers on a network. (DNS, the domain name system, replaced the hosts
file for computers connected to the global internet, because it's a 
distributed database: a hosts file for the entire net would be gigantic
and require very frequent updates. However, we still use /etc/hosts
files for small office and home networks because it's convenient and 
easy to set up.) We might want to write a Perl script to read, update,
or modify an /etc/hosts file if we're planning a system administration
framework for a small local network.


Within /etc/hosts, we can write comments; they begin with a hash '#' symbol
and continue to the end of the current line. We can also include a host
record. A host record consists of an IP address, followed by a fully-qualified
domain name for the host, then zero or more aliases (such as the hostname
with no trailing domain information). Fields are separated by whitespace
(spaces or tabs), and each record is terminated by a comment character or
a newline.
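
To make the format concrete, here is a rough sketch of how one line of a
hosts file might be pulled apart. The subroutine name parse_hosts_line()
and the hash keys it returns are invented for this illustration; they are
not part of the LF::Hosts module described in this column.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LF::Hosts): split one line of a hosts
# file into its address, canonical name, aliases, and trailing comment.
sub parse_hosts_line {
    my ($line) = @_;
    chomp $line;

    my $comment = "";
    # Everything from the first '#' to the end of the line is a comment.
    if ($line =~ s/#(.*)$//) {
        $comment = $1;
        $comment =~ s/^\s+//;
    }

    # What is left is whitespace-separated: address, name, then aliases.
    my ($ip, $name, @aliases) = split ' ', $line;

    return { ip => $ip, name => $name, aliases => \@aliases, comment => $comment };
}

my $entry = parse_hosts_line("192.168.1.14\tbob.linuxformat.org bob  # Bob's desktop\n");
print "$entry->{ip} is $entry->{name} (also: @{ $entry->{aliases} })\n";
```

A line that holds nothing but a comment comes back with the address and
name undefined, which is one easy way to tell the two kinds of line apart.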


Within our (hypothetical) system administration tool, we might want to
hive off maintenance of /etc/hosts entries from other functions (say,
simultaneously updating entries in BIND's database of hosts). Typical
tasks include looking to see if a hostname has an IP address in the file,
or if an IP address has an associated name: also, deleting a host, adding
a new host, adding a new alias to a host, and changing the IP address of
a host. We may also want to future-proof ourselves: IPv6 (the next version
of the Internet Protocol) introduces a new, longer syntax for writing
addresses.


It's fairly clear that the core entity we're going to work with, the
object, corresponds to a file. We could pick different objects to work
with -- say, individual entries in an /etc/hosts file -- but we'd still
need an object corresponding to the hosts file, and its contents are
simple enough that we don't need to modularize it further. On the other
hand, we don't want to try to use a single class to update /etc/hosts,
BIND configuration, SMB configuration, and so on in one place -- that
would be excessively complex.


We want to be able to create a new hosts object by reading in the 
/etc/hosts file and populating some sort of internal data structure with
its contents. We want to be able to tell our object to update the version
on disk (saving its contents). We want to be able to look up the names
for an IP address, and vice versa. We want to be able to create a record
for a given IP address, change its associated aliases, or delete it.
Actually, this lot sounds like we *do* need another class, so we're
going to create one: a class of objects that consists of records in a 
hosts file. Our main program will never see this class, but it'll make
life easier inside the main class.


So what we're going to do is this:


* Write a class (let's call it LF::Hosts) that gives us a set of data
  structures and subroutines for messing around with an /etc/hosts file.


* Write a class (called LF::Hosts::Entry), to be used by
  LF::Hosts, that gives us data structures and subroutines for
  creating/querying/editing/deleting a host record.


Our main program will then be able to say something like this:


  my $hosts = LF::Hosts->new() or die "Could not open /etc/hosts file: $!\n";
  
  # get names associated with an IP address
  my @aliases = $hosts->identify("192.168.1.10");


  # and vice versa
  my $ip_addr = $hosts->identify("mike.linuxformat.org");


  # print comments associated with host $ip_addr
  print $hosts->comments($ip_addr);


  # print comments associated with no particular host
  print $hosts->comments();
  
  # insert a new host 
  $hosts->add("192.168.1.14", "bob.linuxformat.org", "bob");


  # modify (rename) an existing host from "bob" to "patricia"
  $hosts->edit("192.168.1.14", { "bob" => "patricia",
                                 "bob.linuxformat.org" => "patricia.linuxformat.org" });


  # delete a host
  $hosts->delete("192.168.1.100");
  
  # finally, save the file


  $hosts->commit();


SUBHEADING: Creating Perl objects


In Perl, a class of objects is defined by a package (that is, a set
of Perl subroutines that come with their own namespace, usually in a
separate file). All data associated with the class is stored in the
class's namespace, or in data structures hanging off a reference.


We usually put packages in separate files (with the suffix .pm); when
our program needs to use a class, we add the "use Classname" directive
to tell Perl to load the appropriate package while it compiles the 
program. For example:


   use MyPackage;


Tells Perl that it should locate the file containing MyPackage (i.e.,
a file called MyPackage.pm, in one of the directories listed in the
special array @INC) and compile it. 


(Note that "use" takes effect at compile time, before the script begins
to run; the similar "require" directive, a hangover from Perl 4, is
executed at run time, whenever the flow of control gets around to that
line.)


Unlike a normal data structure (such as an anonymous hash), an object
knows what class it belongs to: instead of being a HASHREF or an ARRAYREF,
it belongs to LF::Hosts, or LF::Hosts::Entry, or something. We tell an
object what class (package) it belongs to using the bless() function. For
example:


   bless $variable, "Fred";


This line tells the thing that $variable refers to (bless() needs a
reference) that it is a Fred (whatever a Fred is).
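
A quick way to watch bless() at work is to ask ref() what a reference
points to before and after. This is only a sketch: the greet() method is
made up so that our Fred has something to do.

```perl
use strict;
use warnings;

package Fred;

# A hypothetical method, just so that Fred has something to call.
sub greet {
    my $self = shift;
    return "Hello from a " . ref($self);
}

package main;

my $variable = {};                 # a reference to an ordinary anonymous hash
my $before   = ref($variable);     # "HASH"

bless $variable, "Fred";           # the hash now knows it is a Fred
my $after    = ref($variable);     # "Fred"

print "before: $before, after: $after\n";
print $variable->greet(), "\n";
```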


A side-effect of blessing a variable, so that it belongs to a specific
package like Fred, is that if we then call a subroutine ("method"
in object-oriented jargon) called do_something() on the blessed
variable, Perl will look first for a subroutine in its own class, called
&Fred::do_something(). If no such method exists, Perl searches the classes
listed in Fred's special array @ISA (literally, "is a"), and then the
built-in UNIVERSAL class from which every object inherits.
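
Here is a small sketch of that lookup in action. The Animal and Dog
classes are invented for the purpose; the point is simply that a method
missing from Dog is found in a class named in Dog's @ISA.

```perl
use strict;
use warnings;

package Animal;                    # hypothetical parent class
sub new   { my $class = shift; return bless {}, $class; }
sub speak { return "generic animal noise"; }
sub legs  { return 4; }

package Dog;                       # hypothetical child class
our @ISA = ("Animal");             # a Dog "is a" Animal
sub speak { return "Woof"; }       # overrides Animal::speak

package main;

my $dog = Dog->new();              # new() is found in Animal via @ISA
print ref($dog), " says ", $dog->speak(), " and has ", $dog->legs(), " legs\n";
```

Note that the object is blessed into Dog even though the constructor
lives in Animal, because new() blesses into whatever class name it was
called with.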


We use a special shorthand for calling methods (subroutines associated
with an object):


   $thing->do_something()


Means "run the subroutine do_something(), from whatever package $thing
belongs to, passing it $thing as its first parameter."
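
To see the hidden first parameter for yourself, compare a method call
with the equivalent plain subroutine call. This is a sketch only; the
constructor and the body of do_something() are assumptions made up for
the demonstration.

```perl
use strict;
use warnings;

package Fred;
sub new { my $class = shift; return bless { name => "fred" }, $class; }

# Report what arrived in @_ so we can see the hidden first parameter.
sub do_something {
    my ($self, @args) = @_;
    return ref($self) . " got: " . join(", ", @args);
}

package main;

my $thing  = Fred->new();
my $arrow  = $thing->do_something("a", "b");        # method call
my $direct = Fred::do_something($thing, "a", "b");  # plain subroutine call
print "$arrow\n$direct\n";
```

Both calls produce the same string. The difference is that the arrow form
works out which package to look in from $thing itself, and so respects
inheritance, whereas the direct call is hard-wired to Fred.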


In general, a class contains two types of method (subroutine): class methods
and instance methods. A class method is one that operates on the class as a
whole, and so affects all objects belonging to it -- for example, we might use
one to tell our LF::Hosts class that our systems all put the file /etc/hosts
somewhere unusual. An instance method is one that operates on a single object:
for example, to get or set its internal state.
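
The distinction is easier to see in code. The sketch below uses a
stand-in class called My::Hosts (deliberately not the real LF::Hosts);
the hostfile() and label() methods are invented for this example.

```perl
use strict;
use warnings;

package My::Hosts;                  # hypothetical stand-in, not the real LF::Hosts

my $hostfile = "/etc/hosts";        # class-wide setting shared by every object

# Class method: called on the class name, affects all objects.
sub hostfile {
    my $class = shift;
    $hostfile = shift if @_;
    return $hostfile;
}

sub new { my $class = shift; return bless { label => "" }, $class; }

# Instance method: called on one object, touches only that object's state.
sub label {
    my $self = shift;
    $self->{label} = shift if @_;
    return $self->{label};
}

package main;

My::Hosts->hostfile("/usr/local/etc/hosts");
my $first  = My::Hosts->new();
my $second = My::Hosts->new();
$first->label("office");

print "file: ", My::Hosts->hostfile(), "\n";
print "first: '", $first->label(), "' second: '", $second->label(), "'\n";
```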


We almost always need one specific type of class method called a
constructor. A constructor is called via the class name (rather than
via an existing object), and it returns a reference to a new object.
By convention, Perl classes usually call their constructors "new".


In the case of LF::Hosts, calling "new" should return a reference to a
data structure that embodies an /etc/hosts file, which has been blessed
so that it "knows" it is a member of the LF::Hosts class (and "knows"
what subroutines apply to it). In the case of LF::Hosts::Entry, calling
"new" should return a reference to the next entry in the parent object's
hosts file.


Like this:


package LF::Hosts;

use strict;
use warnings;

$LF::Hosts::HOSTFILE = "/etc/hosts";

sub new {
    my $class = shift;        # called as LF::Hosts->new(), so "LF::Hosts"
    my $self = {};            # create a reference to an empty hash
    bless $self, $class;      # tell $self that it is an LF::Hosts object

    # now we open the hosts file, and generate a bunch of
    # LF::Hosts::Entry objects, each of which is added to $self
    # as a hash key/value pair

    open(my $fh, "<", $LF::Hosts::HOSTFILE)
        or die "Cannot open $LF::Hosts::HOSTFILE: $!\n";
    while (! eof($fh)) {
        # each call reads one entry and returns a (key, value) pair
        my @line = LF::Hosts::Entry->new($fh);
        if ($line[0] eq "COMMENT") {
            push(@{$self->{COMMENT}}, $line[1]);
        } else {
            $self->{$line[0]} = $line[1];
        }
    }
    close $fh;
    return $self;
}


This is the constructor method for LF::Hosts; it returns a reference
(called $self within the subroutine) to a blessed object, which is actually
an LF::Hosts object. The object is a hashref containing various key/value
pairs; each value is a reference (pointing to either an array of comment
lines, or to an LF::Hosts::Entry object).


A point to note: LF::Hosts::Entry's new() method returns a list of
two items: a key and an LF::Hosts::Entry object. The key is either
the string COMMENT, or an incrementing number (stored in a class variable
maintained by LF::Hosts::Entry) that is unique for each instance. The line:


   push(@{$self->{COMMENT}}, $line[1]);


shows that we can push (append to an array) a variable (in this case, the
object referenced by $line[1]) onto an anonymous array that hangs off an
object. $self->{COMMENT} is an array reference; we use @{ $self->{COMMENT} }
to tell Perl to treat it as an array.
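
You can try the same trick in isolation. In this sketch a plain hash
reference stands in for the object, and the comment strings are made up.

```perl
use strict;
use warnings;

my $self = {};   # a plain hash reference standing in for the object

# $self->{COMMENT} does not exist yet; Perl creates the anonymous array
# automatically the first time we treat it as one.
push @{ $self->{COMMENT} }, "# first comment";
push @{ $self->{COMMENT} }, "# second comment";

my $count = scalar @{ $self->{COMMENT} };
print "$count comments, the last is: $self->{COMMENT}[-1]\n";
```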


Next month, we'll see how the instance methods are written, write a
child class (LF::Hosts::Entry), see how Perl's POD documentation system
works, and discuss some of the more interesting applications of OOP in
Perl.


BOXOUT: References


A reference is what Perl uses instead of pointers. You haven't met
pointers?  Don't worry ...


Computers organise data in their memory by putting each byte (or word)
into a separate cell. Each cell has a unique numerical address, just like
the position of an element in an array. Languages like C or Pascal let
us refer to data we've stored in memory either by giving it a variable
name, or by specifying the address of the memory cell it is stored
at. (Actually, a variable name is just a key in a special table of
memory addresses called a symbol table: when you refer to a variable
called fred in Pascal or C, the compiler looks fred up in the symbol
table to find the address where its data is stored, and generates code
that fetches it.)


A pointer is simply a raw memory address. We can grab a pointer to the
data associated with a variable, and stash it in another variable (or 
in part of a variable -- say, inside an element of an array.)


Perl references aren't pointers to physical chunks of your computer's
memory; they're merely an internal handle that the current Perl process
uses to store or retrieve a bit of information.  But they act like
pointers. We can obtain a reference to a variable by prefixing the
variable's name with a backslash, and store references in any scalar:


  my @an_array = ("red ", "blue ", "green ");
  my $reference_to_an_array = \@an_array;


$reference_to_an_array doesn't hold an actual array of data -- but
it holds a reference which points to the chunk of memory where the array is
stored.


If we print a reference, it doesn't show us anything useful:


  print $reference_to_an_array;


  ARRAY(0x80f8f28)


But we can dereference $reference_to_an_array (getting back to the
original contents) by prefixing the scalar containing the reference
with the symbol for the type of thing it refers to (here "@", for an array):


  print @$reference_to_an_array;


  red blue green


We can also use the ref() command to tell us what type of data is 
referenced:


  print ref($reference_to_an_array);
  ARRAY


(Valid things that ref() can return include CODE, HASH, SCALAR, ARRAY. It
returns an empty string (which counts as false) if the thing you call it on
isn't a reference.)
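
Here is a short sketch that runs ref() over one of each, plus an ordinary
number for comparison. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $number = 42;
my @kinds = (
    ref(\$number),           # SCALAR
    ref([1, 2, 3]),          # ARRAY
    ref({ a => 1 }),         # HASH
    ref(sub { return 1 }),   # CODE
    ref($number),            # not a reference: empty string
);
print join(",", map { "[$_]" } @kinds), "\n";
```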


END BOXOUT (References)


BOXOUT: Complex data structures


We can create data structures by storing references in arrays or hashes. For example:


  my @colours = (qw(red blue green));
  my @widgets = (qw(screw nail staple));
  my @colourful_widgets = (\@colours, \@widgets);


This is cumbersome, so we can employ an anonymous array constructor. Instead
of using round brackets to create a list, we use square brackets to return a
reference to an anonymous (unnamed) array. An anonymous array is just an
array without a name -- because we've saved a reference to it somewhere, it
continues to exist and we can get at data stored in it. Like this:


  my @colourful_widgets = ( [ qw(red blue green) ],
                            [ qw(screw nail staple) ] );


The array @colourful_widgets is not a two-dimensional array; it's a
one-dimensional array containing references to other arrays. But we can
use it as a two-dimensional array:


  my $colour = $colourful_widgets[0]->[1];


  # $colour contains "blue"


  my $thing = $colourful_widgets[1]->[2];


  # $thing contains "staple"


The arrows are inherited from C's syntax for dereferencing pointers, and have
pretty much the same meaning. Note that unlike C, they're optional (when
dealing with array subscripts, as above), so we can refer to
$colourful_widgets[0][2], just as if it's a true multidimensional array.
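
As a runnable sketch of the above, this fragment reads the same element
with and without the arrow, then walks the whole structure row by row.
The @rows array is just somewhere to collect the output.

```perl
use strict;
use warnings;

my @colourful_widgets = ( [ qw(red blue green) ],
                          [ qw(screw nail staple) ] );

# With and without the arrow between subscripts: both reach the same element.
my $with_arrow    = $colourful_widgets[0]->[1];
my $without_arrow = $colourful_widgets[0][1];

# Walk every row; each row is itself a reference to an array.
my @rows;
for my $row (@colourful_widgets) {
    push @rows, join(" ", @$row);
}
print "$with_arrow $without_arrow\n", join("\n", @rows), "\n";
```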


In addition to constructing anonymous arrays using [ ... ], we can build
anonymous hashes using { ... }. For example:


  my $properties = { "food" => "cheese",
                     "colour" => "blue",
                     "smell" => "strong" };


Which is equivalent to:


   my %properties = ("food" => "cheese",
                     "colour" => "blue",
                     "smell" => "strong" );
   my $properties = \%properties;


A common use of anonymous hashes in Perl is to provide variables with multiple
named fields -- like records in Pascal or structs in C. For example, we can
get the smell of our object by saying:


  print $properties->{smell};


In general, Perl lets us return a reference to most items -- even subroutines.
For example:


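  sub fred {
       # do something
  }
  my $subref = \&fred;    # no brackets: \&fred() would call fred() first
  $subref->();            # call the subroutine through the reference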
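
One popular use for subroutine references is a lookup table of actions.
The sketch below is an illustration only; the subroutine names and the
%operation hash are invented.

```perl
use strict;
use warnings;

sub add      { my ($x, $y) = @_; return $x + $y; }
sub multiply { my ($x, $y) = @_; return $x * $y; }

# A hash of subroutine references, keyed by operation name.
my %operation = (
    add      => \&add,
    multiply => \&multiply,
);

my $subref = $operation{multiply};
my $result = $subref->(6, 7);       # call the subroutine via its reference
print "6 multiply 7 = $result\n";
```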


Perl provides a powerful module for looking at complex data structures
consisting of nested arrays and hashes linked by references: Data::Dumper.
You use it like this:


  #!/usr/bin/perl
  use Data::Dumper;


  # do something to create a complex data structure pointed to by a 
  # scalar called "$fred" 


  # now we want to inspect the structure of whatever $fred points to ...


  print Dumper $fred;


prints something like this:


$VAR1 = {
          'colour' => [
                        'red',
                        'blue',
                        'green'
                      ],
          'type' => [
                      'screw',
                      'nail',
                      'staple'
                    ]
        };


Curly braces denote an anonymous hash; square brackets indicate an
array. So what we have here is $VAR1 (also known as $fred), pointing to
an anonymous hash with two keys, 'colour' and 'type'. Each key has
an associated value -- which is a reference to an array.
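
If you'd like to reproduce a dump like this one, here is a complete
sketch that builds a matching structure by hand. Setting
$Data::Dumper::Sortkeys is optional; it just makes the keys come out in
a predictable order.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # print hash keys in sorted order

my $fred = { colour => [ qw(red blue green) ],
             type   => [ qw(screw nail staple) ] };

my $dump = Dumper($fred);
print $dump;
```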


END BOXOUT (Complex data structures)


BOXOUT: Subroutines and parameter passing


Like all serious programming languages, Perl lets us define subroutines
(equivalent to C functions, or Pascal procedures and functions). We do it
like this:


  sub my_subroutine {
      # code goes here


      return $some_return_value;
  }


When we invoke my_subroutine(), $some_return_value is returned to the calling
context. We can invoke it either by prefixing its name with an ampersand
(like &my_subroutine), or following it with brackets (C style).


If you don't explicitly return a value from a subroutine, it returns 
the result of the last expression to be evaluated within its scope. So:


  sub return_true {
     1;
  }


always returns "1" (which is not false, by definition).


We can return more than one scalar value; in this case, whatever receives
the returned values must be able to cope with a list, and identify
whatever's been returned appropriately. If we try returning a hash, though,
it will be "flattened" into a list -- and if we try returning a hash and an
array, the results will be a messy collision. So if you want to emit
complex structures from a subroutine, the best policy is to return a list
of scalars containing references:


  sub returns_complex_stuff {
      # code goes here


      return ( \%my_internal_hash, \@some_array, $an_object);
  }


  # main program, now:


  ($my_returned_hash, $my_array, $my_object) = returns_complex_stuff();
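
To see the difference between the two approaches, compare a subroutine
that returns a hash and an array directly with one that returns
references to them. This is a sketch with made-up names and data.

```perl
use strict;
use warnings;

my %limits = (max => 10);
my @names  = ("alice", "bob");

sub flattened  { return (%limits, @names); }     # everything merges into one list
sub referenced { return (\%limits, \@names); }   # two scalars, structure intact

my @mess              = flattened();
my ($limits, $names)  = referenced();

print scalar(@mess), " loose items versus ",
      "max=$limits->{max} and ", scalar(@$names), " names\n";
```

The caller of flattened() gets four values back, with no way of knowing
where the hash stops and the array starts.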


A similar rule applies to getting parameters into a subroutine. We can
pass as many scalars as we like to a subroutine; but from the point of 
view of the subroutine, they all get squished into a special array, called
@_. So if we want to push a mix of different variables into a subroutine
it's best to pass them as references:


   sub complex_sub {
       my $incoming_array = shift @_;
       my $incoming_object = shift @_;
       # do something or other and return
   }


   $result = complex_sub(\@array_to_process, $object_ref);


Note that "shift" grabs the leftmost element of an array and returns it,
shortening the array by one element. Along with the corresponding commands
"unshift" (shove an item onto the "left" of an array), and push/pop (which
operate on the other end of an array) we can implement stacks, queues, and
a whole load of other useful structures using ordinary arrays.
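
Here is a small sketch of a stack and a queue built from ordinary arrays
with those four commands. The data is arbitrary.

```perl
use strict;
use warnings;

# A stack: push and pop work on the same (right-hand) end.
my @stack;
push @stack, "a", "b", "c";
my $top = pop @stack;              # "c": last in, first out

# A queue: push on the right, shift off the left.
my @queue;
push @queue, "a", "b", "c";
my $first = shift @queue;          # "a": first in, first out

# unshift puts an item back on the left-hand end.
unshift @queue, "z";

print "popped $top, shifted $first, queue is now @queue\n";
```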


END BOXOUT (Subroutines and parameter passing)


BOXOUT: Variable scope


In Perl, there are three mechanisms for defining the scope of a variable.


First, there's the namespace. If you refer to a variable like $thing, it
instantaneously springs into existence -- within the current namespace. If
you haven't used a 'package' command to specify a different package -- each
package comes with its own namespace -- this will be the namespace "main";
so your variable will actually be $main::thing.


A variable created in this way is global, which is a nuisance; if you
want something called $thing to be local to a subroutine, don't just use it
this way.


Tip: you can make Perl throw a compile-time error when you do this by using
the "use strict" compiler pragma: put a line like:


  use strict;


at the top of your script, and Perl will refuse to run it unless all variables
are explicitly qualified with their package name (like $main::thing), or have
been declared (for example, as lexicals; see below).


You can locally override a global variable by using the "local" command.
For example, if our script has a $thing floating around, we can say:


  sub mysub {
     local $thing;
     # and so on
  }


Thereafter, within the subroutine mysub() (and in any subroutines it
calls), $thing behaves like a fresh variable; the global's old value is
rendered invisible. But when we leave the scope of mysub(), the old value
of $thing reappears, and the temporary one vanishes mysteriously. (This is
because the command "local" causes Perl to stash the designated variable's
value on a stack, and restore it upon leaving the enclosing block of code.)
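
A short sketch shows the stash-and-restore behaviour. The show()
subroutine is invented so that we can peek at $thing from somewhere
other than mysub() itself.

```perl
use strict;
use warnings;

our $thing = "global";             # a package variable, $main::thing

sub show { return $thing; }        # reads whatever value $thing holds right now

sub mysub {
    local $thing = "temporary";    # saved on entry, restored on exit
    return show();                 # show() sees the temporary value
}

my $inside = mysub();
my $after  = show();
print "inside mysub: $inside, afterwards: $after\n";
```

Notice that show() picks up the temporary value even though it is a
separate subroutine: "local" changes what the global holds for as long as
mysub() is running, rather than creating a private variable.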


Local scope is sometimes handy, but what you probably want are true local
variables, the way a language like C or Pascal provides them. To declare
a lexically scoped variable -- one that exists only within the scope of
the lexical block of code enclosing the declaration -- use "my":


  sub mysub {
     my $thing = shift @_;
     # and so on
  }


The lexically scoped $thing doesn't exist within a symbol table; it's stored
somewhere else entirely. It's invisible outside of mysub(), unless you obtain
a reference to it and return the reference (in which case you can do neat
things with it). This gives us true control over variable scope, like a real
grown-up programming language. And in general, unless you want your variables
to be global, you should remember to "use strict" and always declare your
variables lexically (and initialise them to a sensible value!).
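
As a final sketch, here is a lexical $thing living alongside a global of
the same name, and surviving outside its subroutine because we kept a
reference to it. The strings are arbitrary.

```perl
use strict;
use warnings;

our $thing = "global";

sub mysub {
    my $thing = "lexical";   # a brand-new variable, visible only in this block
    return \$thing;          # hand back a reference to it
}

my $ref = mysub();
$$ref .= " (still alive)";   # the lexical survives because we hold a reference

print "global: $thing, via reference: $$ref\n";
```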


END BOXOUT (Variable scope)