Starting programming in Perl




Programming in Perl

In the preceding chapter we learned about the difference between Perl, the language, and perl, the application. We took a long detour around the language, choosing instead to concentrate on getting a working system installed. Now it's time to go back and examine Perl itself: what is it, and how can we use it?

In this chapter we will not be looking at Tk; before we can write a Perl program that uses Tk, there is a steep learning curve, and right now we're still standing at the bottom of it. Instead, we're going to start with the language basics: some fundamental elements of the language, how to start writing a Perl program, how perl works, and what constitutes a program.


A first program

Perl is a programming language; like all such languages it follows a much more constrained grammar than any human culture would be willing to put up with. Unlike most other languages, Perl was designed by a linguist, and Larry Wall has said (frequently) that one of his priorities was to bear in mind that computer languages are written for the convenience of human programmers. So Perl isn't quite as rigid as a more traditional language such as C or Pascal. (Of course, the old saw about donating a sufficiency of rope to facilitate a hanging applies: Perl's very flexibility can become a problem.)

Anyway, here's a first Perl program:

 print "hello world!\n"; 

Type this line into a text file (using an editor such as vi or emacs). Save the text file as ``myprog.pl''. Then type:

 perl myprog.pl

It will print:

 hello world!  

The meaning of the word ``print'' should be self-explanatory. But what are the quote marks there for, and what is the funny \n"; doing at the end of the line?

Last things last. The semi-colon is important: it marks the end of a statement. A statement is everything from the beginning of the file, or from the previous semi-colon, up to this one; a newline on its own does not end a statement. The perl interpreter executes statements one at a time. In effect, the semi-colon indicates the end of the print command: anything after it doesn't belong to ``print'' and is none of its business.

Perl is picky about semi-colons for the same reason that you (or any other reader) are picky about periods and commas in the book that you're reading -- if they're not there it makes it much harder to establish the context in which you're meant to interpret the text that you're reading. All Perl commands should be ended with a semi-colon, except under special circumstances (such as a group of commands, which is marked by a different delimiter).

print is a Perl command. It takes a series of arguments (chunks of Perl code that follow it, up to the next semi-colon or equivalent separator), and prints them. Before a double-quoted string is printed, perl carries out some basic interpolation operations on it. For example, it replaces any occurrence of \n in double-quoted text with a newline.

To be a bit more precise, the backslash '\' indicates that the character following it is escaped; rather than being a literal 'n', it has the special meaning 'newline'. There are other escape characters you can embed in double-quoted strings and that Perl will replace at run-time; we'll meet them later.
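As a small illustration of escapes in double-quoted strings, the sketch below uses \n together with \t (tab) and \\ (a literal backslash). The variable $banner is made up for this example, and the single-quoted string in the last line is an aside: single quotes leave escapes alone.

```perl
# Escape sequences are interpreted inside double quotes.
$banner = "col1\tcol2\n";          # \t becomes a tab, \n a newline
print $banner;

print "a backslash: \\\n";         # \\ is one literal backslash

# Inside single quotes, \n stays as the two characters \ and n.
print 'single quotes: \n', "\n";
```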

This program doesn't do much: but it demonstrates a central point -- that it's possible to write trivial Perl scripts very rapidly.

The perl program is an interpreter: it reads a file full of statements, and interprets them, one at a time. When you type 'perl myprog.pl', you are telling perl to open the file myprog.pl, read the contents, and treat them as a series of statements to execute. See ``How perl works'' for a deeper explanation of this process.

NOTE: Making programs that run themselves
The program we've seen above is interpreted (read, parsed, and executed) by perl; we tell perl the name of the program to run explicitly, on the command line.

It is usually more convenient to just type the name of a program and have it run itself. We can accomplish this on most modern UNIX systems using the ``hash-bang'' notation: at the beginning of the first line of the program, insert the characters #! , then follow them immediately with the absolute pathname of the perl interpreter. So if perl lives in /usr/bin/perl, the first line of the program should be:

  #!/usr/bin/perl 

Before a program can be executed by the UNIX shell, it needs to be tagged as 'executable'. To set the executable permission on a program, use the UNIX command chmod; for example:

  chmod a+x myprog.pl

Sets the executable permission for all users on myprog.pl. (That is, it is flagged as being executable by anyone.)

Once you've done this, you can type ./myprog.pl to run the program. The first line points to the perl interpreter; the system reads this line, runs the interpreter, and hands it the program file to execute.


Perl language fundamentals.

The ``Hello, world'' program serves to demonstrate a point; but beyond that, it's pretty useless. Suppose we want to say something else? What then? We need at least a pidgin version of this tongue -- enough to express ourselves. So in this section we're going to look at some basic components: a handful of extremely common nouns, verbs, and adjectives.

Like all other computer languages, Perl provides facilities for storing arbitrary data by name; a piece of information it knows about and can change is called a variable. Variables usually have names, which are used in conjunction with commands to identify which variable to apply the command to. You can think of a variable name as being similar to a noun, and a command as being equivalent to a verb.


Assignment

Here's how we give perl a new noun to think about:

 $fred = "Someone I know";

$fred is the name of a variable. The leading dollar sign '$' means that it's a singular item -- in Perl parlance, a scalar -- named 'fred'. In this statement, we are indicating that we are assigning the value ``Someone I know'' to the scalar variable $fred.

The equal sign '=' is an operator: it indicates a connection between the entity on its left ($fred) and the entity on its right (``Someone I know''). In this case, it simply means that $fred will subsequently contain whatever is to the right of the equal sign. We sometimes refer to the item on the left of an operator as an 'lvalue'; all this means is that it's the destination of some kind of operation.


String literals

The quoted text ``Someone I know'' is known as a string literal, because it consists of a string of text presented to the interpreter 'as-is'. (Yes, escape codes embedded in the string literal may be interpreted during an assignment operation.) We're not limited to putting just string literals on the right of an operator:

  $fred   = "howdy ";     # This is a comment; perl ignores it
  $joe    = "world";      # comments start with a # sign
  $output = $fred . $joe; # and extend to the end of a line

The ``.'' in the last line is the string concatenation operator. It expects to see two strings or scalar variables, one to either side of it; it returns a new string consisting of their concatenated contents. This expression is then assigned to $output. If we follow this line with another:

  print $output; 

We should see our program print:

  howdy world

The print command takes one or more expressions and simply prints them to the standard output device (usually the screen or terminal window you're working in). It can print string literals, scalars, or the output of some expression. For example:

  print $fred . $joe;

prints the concatenation of $fred and $joe's contents.

However, in Perl (as usual) There Is More Than One Way To Do It. We can equally well say:

  print "$fred $joe";

That's because the dollar sign and anything that follows it is subject to variable substitution in double-quoted strings. When you write ``foo $bar quux'' in a Perl script, perl understands this to mean foo, followed by the contents of the scalar $bar, followed by quux.

NOTE
Variable names occurring in double-quoted strings are subject to variable substitution: they will be replaced by the entity they name.

Likewise, escape sequences like \n and \t in double-quoted strings will be replaced by the appropriate special character.
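To see both kinds of replacement side by side, here is a short sketch. The names $joined, $interpolated and $tabbed are invented for the example; everything else comes from the text above.

```perl
$fred = "howdy";
$joe  = "world";

$joined       = $fred . " " . $joe;   # explicit concatenation
$interpolated = "$fred $joe";         # variable substitution
$tabbed       = "$fred\t$joe\n";      # substitution and escapes together

print "$joined\n";                    # howdy world
print "$interpolated\n";              # howdy world
print $tabbed;                        # howdy, a tab, world, a newline
```

Both approaches build the same string; which one reads better is largely a matter of taste.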


Arithmetic operators

What if we want to do some arithmetic?

  $value_1  = 4;
  $value_2  = 5;
  print $value_1 * $value_2;      # prints 20

The asterisk '*' is another operator. As in most computer languages, it means 'multiply'. Perl is smart enough to realize that if you're talking about multiplying two variables, and they both contain numbers, then you want to multiply two numbers. Unlike C, Pascal, and other strongly-typed languages, Perl doesn't make you declare the type of a variable explicitly. (Or rather, Perl has a different notion of variable typing from those languages.)

What happens if you get it wrong, though? As in:

  $value_1 = 4;
  $value_2 = "cheese";
  print $value_1 * $value_2;

Perl prints:

  0

That's right. It doesn't explode, fall over and wave its legs in the air, or otherwise complain (unless you have asked perl for warnings, in which case it grumbles that ``cheese'' isn't numeric). When a string is used where a number is expected, perl converts it, and a string that doesn't begin with a number converts to zero. So the expression is really 4 * 0, which is 0. And zero, as we're about to see, counts as ``false''. (This is why the comment in the earlier example records the result you expect your program to give. Comments, introduced by a '#' sign, come in really handy -- make sure you use lots of them in your programs.)

NOTE: True or False?
Perl needs to be able to evaluate logical expressions as well as simple numerical ones. Logical expressions (AND, OR, NOT ...) require logical values; so Perl has a mechanism for representing the two necessary values (TRUE, FALSE).

This mechanism is modeled on that provided by C:

  • A value of 0 (numeric), ``0'' (the string), ``'' (empty string), or the undefined value is false.

  • Anything else is true.

If an operation fails, it usually returns a FALSE value. But a numeric result of 0 is also FALSE, whether it comes from a legitimate calculation (for example, 2 - 2) or from a non-numeric string being treated as zero. So if a numeric operator unexpectedly returns 0, be suspicious of the inputs you gave it.


The if statement

We can see conditional behaviour more clearly if we write a short program to test it:

  #!/usr/local/bin/perl

  $value_1 = 4;
  $value_2 = "cheese";
  if ($value_1 * $value_2) {
      print "operation succeeded:";
      print ($value_1 * $value_2);
      print "\n";
  } else {
      print "operation failed\n";
  };  

Here we are looking at a new construct or three: the if statement, the list ``()'', and the block ``{}''. Let's take these in reverse order.

The curly brackets {} surround a block of statements. They indicate that the contents are some kind of functional unit; in the 'if' statement, they indicate the block of statements to be executed if (and only if) the conditional expression evaluates to TRUE.

The conditional expression in the short program above is surrounded in ordinary brackets. It's a list of subexpressions and operators; the contents are evaluated, and IF they evaluate to TRUE, the associated block of statements is executed; otherwise (ELSE) the second block is executed.

In this example, we're seeing if $value_1 * $value_2 evaluates to TRUE (non-zero) or FALSE (zero). If TRUE, then the first block (the three print statements that report success) is executed. If FALSE, the second block is executed. If we drop a different expression into this program we can see if it returns (evaluates) TRUE or FALSE.

Of course, because ``cheese'' isn't numeric, this program prints out ``operation failed'' every time, until you change $value_2 to something sensible.
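To experiment with truth values, a sketch like the following drops a few different expressions into if statements. The variable names are placeholders chosen for this example.

```perl
$zero   = 0;
$empty  = "";
$word   = "cheese";
$number = 4;

if ($zero)  { print "0 is true\n"; }        else { print "0 is false\n"; }
if ($empty) { print "'' is true\n"; }       else { print "'' is false\n"; }
if ($word)  { print "'cheese' is true\n"; } else { print "'cheese' is false\n"; }

if ($number * $word) {
    print "4 * 'cheese' is true\n";
} else {
    print "4 * 'cheese' is false\n";   # 'cheese' counts as 0 in arithmetic
}
```

Note that the string ``cheese'' is true when tested by itself; it is only when it is used as a number that it turns into zero.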


Brackets

A note about the brackets in the if statement. Brackets () are used to group a list of items. In the if statement, they enclose the conditional expression (variables, operators, and subexpressions), which perl evaluates to work out whether it is TRUE or FALSE.

Most perl commands that take one or more arguments can be written as either:

  command arg1, arg2, arg3, ...

or

  command(arg1, arg2, arg3);

For example, we can say either:

  print "Hello ", "there";

or:

  print ("Hello ", "there");

Many programmers prefer to use the brackets to indicate the logical scope of a command -- to make it clear where the arguments to a command begin and end. The brackets indicate that the contents are a list of items, and should be treated in list context; more about this later. For the time being, it's worth noting that the print command expects a list of arguments; you don't have to bracket them, but it will treat anything you feed it as a list.

Brackets are also used to indicate the precedence (priority) of operators with respect to one another. Operators in brackets are evaluated before operators outside brackets, as in conventional arithmetic. For example:

  $a = ($b * $c) + $d;

is not the same as:

  $a = $b * ($c + $d);

If we omit the brackets altogether, it's worth knowing that multiplication '*' has a higher precedence than addition '+'; multiplications are evaluated before additions in an unbracketed expression, as in conventional arithmetic. Thus:

  $a = $b * $c + $d;

is equivalent to:

  $a = ($b * $c) + $d;

There are numerous other arithmetic and string operators. Rather than describe them all in a long table here, we'll see them defined in use later on. You can find a complete list of them in ``Programming Perl 5''.
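A quick way to check how brackets affect an arithmetic expression is to try it with real numbers. The values and the names $plain, $product and $sum_first below are chosen purely for illustration.

```perl
$b = 2;
$c = 3;
$d = 4;

$plain     = $b * $c + $d;      # multiplication is done first: 6 + 4
$product   = ($b * $c) + $d;    # the same grouping, written out
$sum_first = $b * ($c + $d);    # brackets force the addition first: 2 * 7

print "$plain $product $sum_first\n";   # prints 10 10 14
```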


Logical conditions

The if statement depends on being able to evaluate whether its condition is true or false. It helps to have some additional relational operators, to test the relation between two or more variables:

  Operator   Meaning
  --------   -------
  ==         test for numerical equality: TRUE for $a == $b if $a and $b are the same number
  !=         test for numerical inequality: TRUE for $a != $b if $a and $b differ
  <          less than
  <=         less than or equal to
  >          greater than
  >=         greater than or equal to
  eq         test for string equality: TRUE for $a eq $b if $a and $b contain the same text
  ne         test for string inequality: TRUE for $a ne $b if $a and $b differ
  gt         string greater-than: TRUE for $a gt $b if $a is greater than $b (i.e. appears after $b in ASCII collating order)
  ge         string greater-than-or-equal-to
  lt         string less-than: TRUE for $a lt $b if $a is less than $b (i.e. appears before $b in ASCII collating order)
  le         string less-than-or-equal-to

So we can now write if statements with conditions like this:

  if ($a eq "roast potato") {
      print "your choice of vegetable is $a\n";
  } else {
      if ($a eq "sprouts") {
          print "your choice of vegetable is $a\n";
      } else {
          if ($a eq "peas") {
              print "your choice of vegetable is $a\n";
          } else {
              #
              # and so on ...
              #
              print "That's not a vegetable I know!\n";
          }
      }
  }

However, it saves a lot of space to be able to link a number of relational operators logically into a single statement. To that end, we use the logical operators:

  Operator   Meaning
  --------   -------
  &&         Logical-AND; true for (a) && (b) if both expression a and expression b are true. Returns the last value evaluated.
  ||         Logical-OR; true for (a) || (b) if at least one of expression a and expression b is true. Returns the last value evaluated.
  !          Logical-NOT; reverses the truth value of whatever it is applied to: for example, if (a) is true, !(a) is false. Returns a plain true or false value.

We can use these operators to re-write the above as something like:

  if ( ($a eq "roast potato") ||
       ($a eq "sprouts")      ||
       ($a eq "peas") ) {
      print "your choice of vegetable is $a\n";

  } else {
      print "That's not a vegetable I know!\n";
  }

Perl uses short-circuit evaluation to reduce the amount of work it needs to do in handling logical connectives like the ones above. Simply put, perl works from left to right through the expression: it returns after evaluating the fewest possible conditions. For example, consider the statement:

  if (("red" eq "green") && (1 == 1)) {

Before the body of the if statement can be executed, the condition must evaluate to true. For && to evaluate to true, the expressions on either side of it must both be true.

``red'' is not equal to ``green'', therefore, after getting as far as:

  ("red" eq "green") && 

Perl knows that it cannot return true under any circumstances, so it returns false immediately and doesn't bother evaluating (1 == 1).
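We can watch short-circuit evaluation happen by putting something with a visible side effect on the right-hand side. In this sketch the names $colour and $evaluated are invented, and the += operator (add the right-hand value to the variable on the left) is borrowed from later in the chapter.

```perl
$colour    = "red";
$evaluated = 0;

# The left-hand side is false, so the right-hand side never runs.
if (($colour eq "green") && ($evaluated += 1)) {
    print "condition was true\n";
}
print "after &&: evaluated = $evaluated\n";   # still 0

# With ||, the right-hand side runs only when the left-hand side is false.
($colour eq "green") || ($evaluated += 1);
print "after ||: evaluated = $evaluated\n";   # now 1
```

The second form, with a test on the left and an action on the right of ||, is the same pattern used later with open().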


Lists and arrays

Often, items in a list are separated by commas: and as the print command works on a list of values, we can replace the main block of the if statement above with:

  print "operation succeeded: ", ($value_1 * $value_2), "\n";

We can operate on an entire list in one go. A list can be represented by a different type of variable: an array. Arrays have names beginning with '@', indicating their plural nature. For example:

  @friends = ("Sue", "Joe", "Barbara" );

This defines an array @friends with three elements (components).

Arrays may contain any number of elements; they resize automatically as you add or delete elements from them. You can think of them as an endless series of adjacent pigeon-holes, ready to accept whatever information you want to stuff into them.

  +-------+-------+-------+-------+-----
  | item1 | item2 | item3 | item 4| ......
  +-------+-------+-------+-------+-----

You can refer to an individual element of an array by using its subscript. The subscript is its numerical position in the array. For the above array, we have:

                +-------+-------+---------+
  Value:        | Sue   |  Joe  | Barbara |
                +-------+-------+---------+
  Subscript:    0         1       2

To get the value of item #1 in the array @friends, we refer to $friends[1].

Note that the array subscripts start from zero, as with C. Older versions of perl let us change the number from which we count subscripts; there is a special built-in scalar ( $[ ) which contains the array subscript base. If we want to count our array subscripts from, say, 7, we can change $[ ; to get at the second element of @friends we then need to refer to $[ + 1 as our subscript. (Changing $[ has long been discouraged, and from Perl 5.30 onwards assigning it anything other than 0 is an error, so this example only runs on older perls.)

  @friends = ("Sue", "Joe", "Barbara");
  print "Subscript base is $[ \n";
  print $friends[1];
  $[ = 7;
  print "Subscript base is now $[ \n";
  print $friends[($[ + 1)];

What we get is this:

                +-------+-------+---------+
  Value:        | Sue   |  Joe  | Barbara |
                +-------+-------+---------+
  Subscript:    7        8       9

(Note the use of the brackets to cause $[+1 to be evaluated before it is used as an array subscript.)

There's another useful built-in variable we can use: $#arrayname, the subscript of the last element in an array called @arrayname:

                +-------+-------+---------+
  @friends:     | Sue   |  Joe  | Barbara |
                +-------+-------+---------+
  Subscript:      $[      $[ + 1   $#friends

We can add an element to an array by assigning to $arrayname[ ($#arrayname + 1) ], that is, by assigning a value to an array subscript immediately following the last element in the array. For example:

  $friends[ ($#friends + 1) ] = "Andy";

We can find out how many elements there are in an array by adding ( 1 - $[ ) to $#array. Or (and here is a big stumbling block for new Perl programmers!) we can simply evaluate the array in a scalar context:

  $number_of_friends = @friends;
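Pulling the array operations so far into one runnable sketch (assuming the default subscript base of 0). The push command is a Perl built-in not otherwise covered here; it appends to the end of an array, which does the same job as assigning to the subscript after the last one. The extra names are made up.

```perl
@friends = ("Sue", "Joe", "Barbara");

print "first friend: $friends[0]\n";            # Sue
print "last subscript: ", $#friends, "\n";      # 2

$friends[ ($#friends + 1) ] = "Andy";           # append by subscript
push(@friends, "Carol");                        # append with push

$number_of_friends = @friends;                  # array in scalar context
print "I have $number_of_friends friends\n";    # 5
print "last friend: ", $friends[$#friends], "\n";   # Carol
```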

NOTE: Context
Like a human language, Perl keeps track of the context of entities you are talking about. In other words, it knows the difference between singular and plural forms of a verb (or command).

When you apply an operator to an array element, such as $friends[1], you are referring to element 1 of @friends in a scalar context. That is: you're talking about $friends[1] as a single entity. You're talking about Joe.

But you can also refer to @friends[1]. In this case, we're referring to a small array consisting solely of element 1 of @friends; an array slice. You're talking about the set of people that includes Joe.

To make things clearer, you can select a range of subscripts to work on using the range operator '..'. For example, @friends[1..3] returns an array slice -- a list, or small array -- consisting of elements 1 to 3 of @friends.

As an example of how variable context works, consider these three statements:

  $people = $friends[1]; # assign element #1 of @friends to $people
  @people = @friends;    # assign array @friends to @people
  $people = @friends;    # assign the number of elements in @friends

The equal sign means quite different things in each of these cases.

In the first case, we are looking at a single (scalar) element of an array, and assigning it to another scalar.

In the second case, we are attempting to assign an array to another array. No problem: each element in @friends becomes an element in @people.

But in the third case, we're trying to assign a plural entity to a singular. That doesn't make sense in English, and it wouldn't make sense in Perl either except that Perl gives a special result when you evaluate an array in a scalar context: it returns the number of elements in the array (which is, of course, a scalar value).

This is actually a simple example of 'operator overloading'; the equal operator '=' means different things in different contexts (when dealing with different types of object). Most Perl statements, operators, and commands, do different things if you apply them in a different context. Some commands work only on arrays, others apply only to scalars -- but many will work on both, but do different things.

Here's another example of context problems. We can assign values to an array like this:

  @friends = ("Sue", "Joe", "Barbara" );

But if we mistakenly try:

  $friends = ("Sue", "Joe", "Barbara" );

Then $friends contains ``Barbara''. That's because of the comma operator inside the brackets. In a scalar context the comma effectively means ``evaluate what's on the left, throw it away, then evaluate what's on the right''; so Sue and Joe are discarded in turn, and $friends ends up containing the last scalar in the list. Because the list of values isn't an explicit array, we don't get the array-in-scalar-context behaviour (return the number of items in the array).

Array slices are subsets of an array. They are referred to in an array context, and usually indicated using the range '..' operator. For example, @foods[4..57] is a slice from the array @foods; it contains elements 4 to 57 inclusive.
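The different contexts can be compared directly in a short sketch. The variable names are placeholders, and the bracketed form ($first) = @friends is an extra not discussed above: brackets on the left make the assignment a list assignment, so $first receives the first element rather than the count.

```perl
@friends = ("Sue", "Joe", "Barbara", "Andy");

$one     = $friends[1];       # scalar: a single element
@pair    = @friends[1..2];    # slice: a list of two elements
$count   = @friends;          # array in scalar context: 4
($first) = @friends;          # list assignment: first element only

print "one:   $one\n";        # Joe
print "pair:  @pair\n";       # Joe Barbara
print "count: $count\n";      # 4
print "first: $first\n";      # Sue
```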


Associative arrays (hashes)

Elements in an array are uniquely identified by the name of the array, and their subscript (position within the array). An associative array (or hash, as Perl programmers prefer to call them) is a special type of array; while you refer to an element in an array by its position, you refer to an element of a hash by name.

Hashes don't consist of ordered elements in neat pigeon holes. Instead, they consist of a set of names, with associated values. You can think of a hash as being a kind of dictionary -- each key has an associated value, or entry. Hashes are prefixed with a '%' instead of a '@'. And you initialize them using ordinary brackets, just like an array, with the => symbol to link names to their corresponding values. (Curly brackets here would mean something different, and would not give you the hash you expect.) For example:

  %dogs = (
      "huge"     => "rotweiller",
      "tiny"     => "chihuahua",
      "medium"   => "labrador",
  );

The items on the left are names for the elements on the right of each ``=>'' sign. (Actually, we could replace ``=>'' with a comma, but it's clearer this way; it's easy to see that 'huge' points to 'rotweiller' in the hash %dogs).

You get at a value in a hash by referring to its name in a scalar context:

  print $dogs{tiny};

results in the output, ``chihuahua''.

You can add a new element to a hash by using a new name:

  $dogs{evil} = "yorkshire terrier";

The name ``evil'' is now indelibly associated with ``yorkshire terrier'' in %dogs.

You delete an element of a hash using the delete command:

  delete $dogs{tiny} ;

The items in a hash are not stored in any particular order. You can treat a hash as an array by interpreting it in an array context (for example, you can say @dogs = %dogs), but what you get in @dogs is an unordered series of name, value pairs. For example:

  @doglist = %dogs;
  foreach $item (@doglist) {
      print " ", $item, "| ";
  }
  print "\n";

may equally well result in:

  huge | rotweiller | evil | yorkshire terrier | medium | labrador

or ...

  evil | yorkshire terrier | medium | labrador | huge | rotweiller
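When a predictable order matters, one common approach is to sort the names before printing. This sketch assumes two built-in commands that have not been introduced yet: keys, which returns the names in a hash, and sort, which puts a list into order. The loop variable $size is made up.

```perl
# A hash is initialised from a list in ordinary brackets.
%dogs = (
    "huge"   => "rotweiller",
    "tiny"   => "chihuahua",
    "medium" => "labrador",
);

$dogs{evil} = "yorkshire terrier";   # add an entry
delete $dogs{tiny};                  # remove an entry

# keys returns the names in no particular order; sort makes it repeatable.
foreach $size (sort keys %dogs) {
    print "$size | $dogs{$size}\n";
}
```

This always lists evil, then huge, then medium, whatever order the hash happens to store them in.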

Hashes are extremely important in Perl, but we're not going to go into them in any depth in this chapter: we'll meet them again later on.


The foreach loop

The range operator isn't just used in arrays. We can use it for a range of values, for example in a loop:

  foreach $i (1 .. 45) {
      print "$i\n";
  }

The foreach command iteratively sets $i to each value in its list, then executes the associated block of statements. In this case, $i is set to each number in the range 1 to 45.

We can use foreach to iterate over an array:

  @friends = ("Sue", "Joe", "Barbara" );
  foreach $person (@friends) {
      print "I have a friend called $person\n";
  }

$person is set to each value in @friends, in turn.

This still works if we omit $person from the loop:

  @friends = ("Sue", "Joe", "Barbara" );
  foreach (@friends) {
      print "I have a friend called $_ \n";
  }

As with most things, Perl keeps track of context. It knows that foreach loops iteratively assign each element in an array to some scalar variable. When no scalar variable is specified, Perl assumes you mean the special, current variable: $_. $_ is whatever scalar you are currently talking about: you can think of it as the Perl equivalent of the English word ``it''.

In the statement foreach (@friends) { ..., we didn't specify a scalar to stick the values of @friends into; so Perl assumes we're talking about $_.

By default, most commands are applied to $_ if you don't specify a target variable. For example, if you have a line like:

  print;

The print command implicitly assumes that you want to print $_.

So the above loop can be rewritten as:

  @friends = ("Sue", "Joe", "Barbara" );
  foreach (@friends) {
      print ;
  }

(Although there won't be any spaces or newlines between the names of the friends).
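Here is a sketch that mixes the implicit and explicit uses of $_ in one loop, and puts the newlines back. The variable $greeting is invented for the example.

```perl
@friends = ("Sue", "Joe", "Barbara");

$greeting = "";
foreach (@friends) {
    print;                                  # prints $_ and nothing else
    print "\n";                             # so add the newline ourselves
    $greeting = $greeting . "hello $_, ";   # $_ works inside strings too
}
print "$greeting\n";
```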

NOTE Special variables
As you may have noticed, Perl goes in for these special variables in a big way. There are lots of them: see the Camel book (Programming Perl 5) for an exhaustive list.

In this chapter, we refer only to the commonest ones, although we'll use a lot more later on. Here they are:

$_
The default input and pattern-searching space. A nameless scalar variable; use it in Perl where you would say 'it' in English.

@_
The default array space. Inside a subroutine, it holds the list of arguments the subroutine was called with; some array commands (such as shift) work on it when you don't name an array.

$[
The subscript of the first element in an array.

$0
The name of the file containing the Perl script being executed.


Reading data

The Pidgin Perl we've met so far is a crippled language; it has variables, a loop, a test construct, but no way of getting data into a program.

This is very un-Perl-like. Perl can read and write files containing text, data, and just about anything else you can cram into a computer. It reads and writes through a different type of variable, called a file handle.

A file handle is a name which is assigned to a specific resource (such as a file) by the open() command. Once we've opened a file, it is explicitly associated with a file handle until we close it. Prod the file handle, and it disgorges some data from the associated file. For example:

  open (INPUT, "<myfile.dat") || print "could not open myfile.dat!\n";
  @myfile = <INPUT>;
  close INPUT;

Let's examine the first line:

  open (INPUT, "<myfile.dat") || print "could not open myfile.dat!\n";

open() requires two arguments (bits of information); the name of a file handle (which we can safely make up), and the file to associate with it. Here, we're telling open() to associate the handle INPUT with a file called myfile.dat.

open() normally returns true. If it doesn't return true, the second half of this line (following the logical-OR) is executed; it prints a warning message.

We can read from a filehandle by enclosing it in angle-brackets, like <INPUT>. If you read from a filehandle in a scalar context, it returns a single record (line); if you read from it in an array context, it returns its entire contents.

The line:

  @myfile = <INPUT>;

Reads from INPUT in an array context, stashing the entire contents of INPUT in @myfile, placing one record in each element of the array.

Finally, we use close to tell perl to forget about the file handle called INPUT.

Now we've got myfile.dat in an array, we can do things with it, like this:

  open (INPUT, "<myfile.dat") || print "could not open myfile.dat!\n";
  @myfile = <INPUT>;
  close INPUT;

  foreach $line (@myfile) {   # print the entire file
      print $line;
  }

  # now print the number of lines in @myfile

  print "\n lines: ", $#myfile + (1 - $[);  

  # now print the number of chars in @myfile

  foreach $line (@myfile) {   
      $chars += length($line); # $a += $b means: evaluate $b,
                               # then add its return value to $a
  }

  print "\n characters: ", $chars;

Note that length($var) returns the number of characters in $var.

NOTE: File access modes
You might not be able to open a file if you don't have read permission on it, or on the directory it is stored in; this is system-dependent. However, you can test the access permissions on a file under UNIX using the file test operators (introduced later).

More importantly, there are several different ways to open a file. You might want to write data to a file, read data from a file, append data to an existing file, or both read from and append to a file. You can also read data from a program!

To open a file for writing, clobbering any existing data stored in it, use the ``>'' prefix on the filename:

  open (FILEHANDLE, ">myfile.dat");

If you want to append to the end of an existing file, without destroying data already stored in it, use the ``>>'' prefix:

  open (FILEHANDLE, ">>myfile.dat");

In either case, you can write stuff into a file using print; just make FILEHANDLE the first item you pass to it. For example:

  print FILEHANDLE "Printing text to myfile.dat\n";

To open a file for reading (without permission to write to it), use the ``<'' prefix:

  open (FILEHANDLE, "<myfile.dat");

And to read data line-by-line from a program, use the ``|'' (pipe) suffix:

  open (FILEHANDLE, "/usr/local/bin/myprog|");

(If all these pipes and < and > symbols look familiar to you from UNIX shell syntax, then you're dead right; Perl has a lot in common with the syntax of the UNIX shells when it comes to input/output redirection.)
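The three file modes can be tried out together in a round trip: write a file, append to it, then read it back. This is a sketch under some assumptions: the scratch filename is hypothetical, and it uses three built-ins not covered in this chapter, namely die (print a message and stop the program), $! (the system's reason for the last failure) and unlink (delete a file).

```perl
$file = "perl_demo_scratch.txt";     # hypothetical name for this example

open(OUT, ">$file") || die "could not open $file for writing: $!\n";
print OUT "first line\n";
close OUT;

open(OUT, ">>$file") || die "could not open $file for appending: $!\n";
print OUT "second line\n";
close OUT;

open(IN, "<$file") || die "could not open $file for reading: $!\n";
@lines = <IN>;                       # array context: the whole file
close IN;

$line_count = @lines;
print "read $line_count lines:\n", @lines;

unlink($file);                       # tidy up the scratch file
```

Using die rather than print after the || means the program stops at once if the file can't be opened, instead of carrying on with a file handle that isn't connected to anything.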

There are three standard file handles we need to remember:

STDIN
The standard input. By default, if you type something on the keyboard from which you started a perl script, it goes to the standard input of the program.

STDOUT
The standard output. By default, whatever terminal window or console you ran the program on. This is the file handle print() uses by default.

STDERR
The standard error. Usually the same as STDOUT, this is where warning messages go.

The filehandle read operator <> reads an item (scalar or array) either from STDIN, or from whatever filehandle you put in it (for example, FILEHANDLE). So:

  $line = <FILEHANDLE>;

reads a line from FILEHANDLE, and:

  print FILEHANDLE $line;

prints $line to FILEHANDLE.

The default behaviour of <>, if you don't specify a filehandle, is quite interesting. If you ran your perl program with a feed of data to its standard input, for example:

  myprog.pl <somedata.file

The contents of somedata.file appear on the standard input and can be read from <>.

If instead you specified a list of files, like:

  myprog.pl file1 file2 file3

each line in file1, file2, and file3 will come through <> in turn.
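To see this without typing a command line, the sketch below fills in @ARGV (the array in which perl keeps the command-line arguments) by hand. The file names diamond_a.txt and diamond_b.txt are made up, and @ARGV, die and $! are not covered in this chapter:

```perl
# Create two small scratch files (made-up names) so that the sketch
# is self-contained.
open(OUT, ">diamond_a.txt") || die "cannot write diamond_a.txt: $!";
print OUT "alpha\n", "beta\n";
close OUT;
open(OUT, ">diamond_b.txt") || die "cannot write diamond_b.txt: $!";
print OUT "gamma\n";
close OUT;

# Pretend the program was started as:
#   myprog.pl diamond_a.txt diamond_b.txt
@ARGV = ("diamond_a.txt", "diamond_b.txt");

$n = 0;
foreach $line (<>) {     # every line of every file, in order
    $n += 1;
    print "$n: $line";
}
```

The lines are numbered straight through from the first file into the second, because <> treats the files as one continuous stream.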


A non-trivial program

Given all the material we've covered so far, we can begin to write some non-trivial programs. Here's a short command-line program (no Tk stuff yet!) that prompts you to enter the name of a file. When you enter it, it opens the file and tells you how many lines and characters there are in it:

  #!/usr/local/bin/perl

  print "Enter a filename --> ";
  $target = <STDIN>;
  chomp($target);    # remove the newline typed after the filename

  if (! open(INFILE, "<$target")) {
      # if we failed to open $target ...
      print "An error occurred: I was unable ",
            "to open $target for reading\n";
      # exit with an error status of 1
      exit 1;
  } else {
      # we must have opened $target
      @file = <INFILE>;          # read every line into @file
      close INFILE;
      # $#file is the index of the last line and $[ is the index of
      # the first, so this works out as the number of lines
      print "$target contains ",
            ($#file + (1 - $[)),
            " lines\n";
      $chars = 0;                # keep track of chars in file
      foreach (@file) {          # foreach line in the file
          $chars += length($_);  # add length of current line
                                 # to $chars
      }
      print "$target contains $chars characters\n";
  }
  # all done; we can now exit perl, with a return value of 0

  exit 0;


How perl works

Perl is an interpreted language. Like all interpreters, it reads your program (written in the Perl language) when you run it, then executes each command in turn. However, unlike most interpreted languages, perl is fast. This is because it isn't a true interpreter.

A true interpreter scans a line of program, figures out which commands to execute in what order, carries them out, then moves on to the next line and repeats the process. If it runs across a loop which repeats 100 times, then each line in the loop will be re-interpreted 100 times.

In contrast, perl scans your program just once. In the process, it builds a complex data structure called a parse tree; a Perl program consists of nothing but statements and blocks of statements, and each of these is represented by a node in the tree. Variables are stored in a separate data structure called the symbol table: the symbol table is like an associative array, in that for each variable name it stores a value. This is quite similar to the way a compiler works, except that compilers reduce the symbol table and parse tree into a form that can be represented by the computer's low level machine code, then save the machine code as a separate executable program. For this reason, perl is sometimes known as an 'interpiler'.

(A perl compiler does exist: it takes the parse tree and dumps it into either a compiled bytecode format or C source code that can be fed to a C compiler to produce a machine language program. But for most purposes the perl compiler is unnecessary: it is invoked as the last stage in preparing a program, to get the final burst of speed out of it once all the development work is finished.)

When perl begins to execute a program, it doesn't need to re-interpret each line; it just runs through the parse tree, where each node triggers a different internal (compiled) subroutine within perl itself. The loop that repeats 100 times is only interpreted once; statements inside it may be executed up to 100 times, but a considerable amount of work is saved.

You can tell perl to stop and re-interpret a new chunk of code, using the eval() command: thus, perl can modify its own programmatic instructions, unlike a compiled language. However, this does tend to slow things down. And perl is fast; according to some accounts it is up to three orders of magnitude faster than Tcl, about five times as fast as Java, and perhaps a third as fast as optimized C (the ne plus ultra for speed in high-level programming languages).
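A small sketch of eval() at work follows. It relies on two details not covered in this chapter: eval() returns the value of the last statement it ran, and a failure leaves an error message in the built-in variable $@:

```perl
# The string is only text until eval() hands it to perl's parser.
$code = 'print "compiled and run at run time\n"; 6 * 7;';
$result = eval($code);          # parse the string, then execute it
print "eval returned $result\n";

# A string that is not valid Perl never runs at all: the stray ")"
# is caught while the whole string is being parsed, so even the
# print in front of it is not executed.
$bad = eval('print "never printed\n"; )');
if ($@) {
    print "perl refused the second string: $@";
}
```

The second eval() shows the scan-once behaviour described above: perl parses the whole chunk before executing any of it.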


Summary

We've covered a lot of ground in this chapter. We've introduced a minimal skeleton of the Perl programming language, containing the following items:

Variables
        scalars
        arrays
        associative arrays
        file handles

Flow-of-control statements
        if
        logical-AND (&&)
        logical-OR (||)
        logical-NOT (!)

Simple loop statement
        foreach

Arithmetic and String operators
        +, =, ==, +=, *
        .

Relational operators
        ==, <, >, >=, <=, !=
        eq, lt, gt, ge, le, ne

Input operator
        <>

Simple commands
        print
        length
        open
        close

Grouping and punctuation constructs
        {}, (), "", ","

Comment character
        #
        #! (interpreter exec specifier)

Built-in variables
        $_, $[, $0, @_

We've also looked at several important topics, including variable substitution, true and false, variable context, file access modes, and grouping.

This is just scratching the surface of Perl's rich syntax, but it's enough to make it practical to start discussing more complex programs. In the next chapter we're going to examine regular expressions, Perl's language-within-a-language for matching patterns in text. Regular expressions turn Perl from a run-of-the-mill programming language into a powerful data processing tool. They also add an additional level of complexity which can be baffling if they're encountered at the same time as the rest of the language; they have what amounts to a programming language syntax of their very own.


Exercises

  1. Write a program that reads a file and counts the number of lines, characters, and words it contains.

    Hint: you will need two additional commands.

    chomp($line) -- if $line ends with a newline character, chomp removes it

    @array = split(/delimiter/, $line) -- takes the text in $line, scans it for ``delimiter'', and splits it at each delimiter string; the results are placed in an array (@array).

  2. Take the above program and modify it so that, for every word in the file, it prints a count of the number of times the word appeared.

    Hint: use associative arrays. When you get to printing out the results, if you're feeling ambitious, look in ``Programming Perl 5'' for the each() command; otherwise think about variable contexts. Oh, and you might want to investigate the sort() command.
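To get started on exercise 1, here is a quick look at how the two commands from its hint behave. The sample sentence is made up; this is a sketch of the commands, not a solution to the exercise:

```perl
$line = "the quick brown fox\n";
chomp($line);                    # $line no longer ends in "\n"
@words = split(/ /, $line);      # split at every single space
$count = @words;                 # an array in scalar context gives its length
print "'$line' has $count words\n";
```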

