LINUX FORMAT PERL COLUMN

STRAP: Part 1: Introducing Perl

When people mention programming languages and Linux (or UNIX), the first one that springs to mind is C. C is an efficient, low-level language, and the kernel and most of the standard Linux utilities are written in it. However, C isn't the ideal language for all tasks; in particular, tasks like file manipulation (which the shell is designed for) are cumbersome in C.

The shell (or rather, shells) are high-level scripting languages designed for gluing command-line programs together; they make it easy to feed a series of data files through filters that slice and dice them, but are far less efficient than C -- in particular, just about every time you do something in the shell you end up running an external program (written in C).

The result of this problem (shells that are easy to program but inefficient, and C or C++, which are efficient but hard to program) has been a seemingly endless proliferation of "little languages" that try to bridge the gap between shell scripting and C. For example, a common task is to scan the contents of a file and look for a pattern (using a tool like grep), or to modify the contents of a file wherever a search pattern is found (using a tool like ed or sed). Because it's hard to use sed or grep to do something more complex than a find/replace operation within a file, the UNIX research team invented awk, a pretty little language with C-like syntax and sed's ability to scan through a file and do things whenever it spots a pattern. And thus was born yet another little language.

This trend towards lots of little languages continued until Perl was born.

Perl -- the Practical Extraction and Report Language, or Pathologically Eclectic Rubbish Lister, according to its author, Larry Wall -- is more than ten years old. In the late eighties, Wall had to write tools to support an NSA networking project on VAX minicomputers and Sun workstations.
He needed the pattern-matching abilities of awk and the networking power of C, but he didn't have time to write the tools in C. Instead, he invented a new, general-purpose language that had C-like syntax, the text processing power of awk, string handling routines stolen from BASIC, report generating routines taken from RPG-2, and a host of other features. Perl wasn't designed to be elegant; it was designed to be functional. He released it as freeware in 1987 and it rapidly migrated to other operating systems; today you can get it on Windows, all flavours of UNIX (including Linux), MacOS, and exotica like IBM mainframes running MVS and Psion Series 5s. Along the way it has picked up a reputation as the scripting language of choice for gluing web applications together; it integrates particularly well with Apache, the most widely used web server.

Perl is, in fact, one of the first Very High Level Languages (VHLLs) -- a category that includes Python, Tcl, and Ruby. As the VHLLs have caught on, the rate at which small languages were appearing has dropped off; these days, specific tasks tend to be integrated into an existing VHLL, rather than having a whole new language designed around them.

Today, Perl is a vital component of the Linux world. SuSE and Debian Linux won't even install or run without Perl; large chunks of their core system administration code are written in Perl. It's optional on Red Hat, but just barely (Red Hat picked Python instead); however, a fully usable Perl set-up is a standard part of Red Hat's default installation. You can locate the perl interpreter on your system by typing "which perl" -- it usually lives in /usr/bin/perl.

SUBHEADING: Run, Perl, Run

Perl is currently neither an interpreter nor a compiler; it's a weird hybrid called an interpiler.
When you type 'perl myprog.pl', you are telling Perl to open the file myprog.pl, read the contents, and parse them; Perl internally compiles the program and then interprets the resulting data structure at high speed. This way of doing things is considerably faster than a normal interpreter, which parses, tokenises and acts on each line in a program as it reads it, but is a little slower at start-up than a true compiler. On the other hand, it's easier to use at the command prompt.

(Note that this way of working is likely to change in Perl 6, due some time in 2001 -- Perl 6 is likely to share Python's way of working. Python is a compiler, but you run it like an interpreter. The first time it runs on a file it compiles it to an intermediate file, then executes that; thereafter, it runs the compiled file unless you've updated the source file in the meantime.)

Unlike most other programming languages, Perl was designed by a linguist, and Larry Wall has said (frequently) that one of his priorities was to bear in mind that computer languages are written for the convenience of human programmers. So Perl isn't quite as rigid as a more traditional language such as C or Pascal. (Of course, the old saw about donating a sufficiency of rope to facilitate a hanging applies, and Perl's flexibility can be a problem. The commonest answer to any question about accomplishing some task in Perl is "there's more than one way to do it". In Perl, there's _always_ more than one way to do it!)

Perl borrows from other languages. Many of its flow-of-control constructs resemble those of C; in other ways, including its standard variable naming scheme (variables are prefixed with a symbol like $ or %, rather than being pre-declared, as in C), it resembles the UNIX shell.
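
As a rough illustration of that mixed heritage, here is a minimal sketch. The variable names and values are invented for the example; the point is only that the loop is laid out as it would be in C, while the variables carry shell-style prefixes and are used without being declared first.

```perl
#!/usr/bin/perl

# Hypothetical example values; nothing here is declared in advance.
$total = 0;
@sizes = (3, 5, 8);

# A C-style for loop, with $ prefixes on the scalar variables
for ($i = 0; $i < scalar(@sizes); $i++) {
    $total += $sizes[$i];
}

# Variables are expanded inside double quotes, much as in the shell
print "Total: $total\n";
```

Run with 'perl', this should print "Total: 16".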

Perl provides a huge repertoire of built-in keywords, and here it is even more eclectic; just about the whole UNIX system-call library is represented (so that you can resolve an internet hostname using gethostbyname(), or read the current system time and date using time()), along with a load of other operators looted from all over the place; BASIC programmers will find familiar substr() and length() operators for use on strings, for example.

In addition, Perl offers specialised sublanguages for doing common tasks. Regular expressions provide a pattern-matching language for search and replace operations on strings; Perl's regular expression system is massively more powerful than most.

Much of Perl's power comes from CPAN, the Comprehensive Perl Archive Network (see http://www.perl.com/CPAN/), a huge library of modules for specialised tasks. Among these, probably the most frequently used is the CGI module -- which makes writing web CGI scripts easy. But Perl is much more than just a web server scripting tool. It's really a general-purpose application language that is best suited for writing back-office programs: complex data-munging tasks that involve communications, database access, text processing, and (once in a blue moon) graphical user interfaces.

END (Main text)

BOXOUT: Anatomy of a Perl program

    1 : #!/usr/bin/perl
    2 :
    3 : $in = "";
    4 : @lines = ();
    5 :
    6 : print "Enter some text >";
    7 : while ($in = <STDIN>) {
    8 :     print "\n";
    9 :     last if ($in =~ /^quit$/i);
    10:     push(@lines, $in);
    11:     print "Enter some text >";
    12: }
    13: chomp @lines;
    14: print "\n";
    15: print "You typed:\n";
    16:
    17: for ($i = 0; $i < scalar(@lines); $i++) {
    18:     print sprintf("%03d", $i), $lines[$i], "\n";
    19: }
    20:
    21: exit;

Here's a simple Perl program. When you run it, it prompts you to enter some text. It reads whatever you type, until you type in a line containing nothing except the word "quit".
It then prints "You typed:", followed by a line-numbered listing of whatever you fed it (before the "quit" line). Here's how it works, line by line.

The first line is optional but makes life easier; it's not strictly part of the program, but it tells the shell that this is a Perl program and you run it by invoking /usr/bin/perl and feeding the rest of the file to that program. (If your Perl interpreter lives somewhere else -- say in /bin/perl or /usr/local/bin/perl -- you need to change this line accordingly.)

Next, we initialise two variables: $in and @lines. These are different types of variable. Where C or Pascal have low-level data types such as integers, floating point numbers, and character types, Perl has three high-level data types: singular, plural, and dictionary.

Line 6 should be fairly clear; we prompt the user to type something.

Line 7, however, is a lot more complex! We're introducing the while loop, a flow-of-control construct that looks rather like its equivalent in C; the bracketed block is executed while the expression at the top evaluates to true. However, the expression looks like nothing in C; what we're doing is using some Perl magic to read a line from the standard input file handle STDIN (whatever you type on the console) into a variable called $in. If the standard input closes, the loop exits; otherwise we keep running continuously ... except that on line 9 we have two more bits of Perl magic.

Perl's while loop has some seriously powerful exit conditions. "last" is a keyword that means "exit this loop now"; because Perl is happy with somewhat colloquial usage, you can put the if statement controlling this command after it. Basically, line 9 means "exit this loop if the variable $in matches the pattern /^quit$/i".

The // actually acts as brackets around a regular expression. A regular expression is a pattern that is tested against the variable preceding it (which it is bound to by the =~ operator).
If the pattern matches the variable on the left, the =~ operator returns "true"; otherwise it returns "false". The pattern matches if the beginning of the variable (in the pattern, "^") is followed by the characters "quit" and then the end of the variable (in the pattern, "$"), in a case-insensitive manner (that trailing letter "i"). The "$" anchor also matches just before a trailing newline, which is why the test works even though $in still ends with the newline you typed. So if $in contains nothing but "quit" (because the pattern is anchored at either end to the beginning and end of whatever it's matching), the expression returns "true" ... and causes the "last" command to be executed, breaking out of the loop.

Line 10 introduces a new command: push. Push takes at least two parameters -- an array and one or more scalars. It appends the scalars to the end of the array, which grows to make room. We use it to stash the lines we read in the array @lines.

Line 13 shows the chomp command in action. Chomp takes one or more scalars (or an array) and strips any trailing newline from the end of each of them. (We can add them back later; when you put the escape symbol \n in a double-quoted string, it is replaced by a newline -- just like in a C printf statement.)

Finally, we get to lines 17 to 19. These show us the format of a Perl for loop; it's just like a for loop in C. Here, we use it to increment a counter called $i (a scalar containing an integer number) and, using sprintf (which works pretty much like its C equivalent), to print each line of the array @lines in turn, prefixed with the line number.

The one point to note is that to refer to the element with subscript 5 of an array called @fred (its sixth element, because subscripts start at zero), you talk about $fred[5]; the leading dollar sign means "this is a single scalar value". If we talked about @fred[5], we would actually be using list context: it would mean "the array slice consisting of that one element of @fred", not "the scalar which is that element of @fred".
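
To make the difference concrete, here is a minimal sketch. The contents of @fred are made up for the example, and the variable names $one, @slice and @pair are purely illustrative.

```perl
#!/usr/bin/perl

# @fred is a made-up array for illustration
@fred = ("a", "b", "c", "d", "e", "f", "g");

$one   = $fred[5];      # scalar: the element with subscript 5
@slice = @fred[5];      # list: a one-element slice of @fred
@pair  = @fred[5, 6];   # a slice can name several subscripts at once

print "$one\n";               # f
print scalar(@slice), "\n";   # 1
print "@pair\n";              # f g
```

The first form hands back one value; the other two hand back lists, even when the list happens to hold a single item.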

This is an important distinction, as we'll see in future articles: as I said earlier, Perl variables are singular (scalar), plural (array or list), and dictionary (hashes, aka associative arrays), and most Perl commands do different things in an array or scalar context.

END BOXOUT

BOXOUT: Variables in Perl

A scalar (singular, unitary) variable has a name beginning with a dollar symbol, like $in, and stores one entity -- a number or string or just about anything else. Scalars are elastic -- they stretch to fit whatever you put in them, so that you can cram a very large binary file into one if you want. The most important characteristic of a scalar is that it is atomic -- that is, Perl can't subdivide it. You can dink with the contents of a scalar using Perl's string or arithmetic operators, but the scalar itself is a single identifier that refers to a single object.

In contrast, when you've got more than one item you can store the data together in an array: these have names beginning with an at symbol "@", such as @lines. Arrays are dynamic; you don't need to specify how many elements there are in one, because they grow to fit. An array consists of a numbered list of scalars, starting from zero: that is, the first item in the array is a scalar with index number (subscript) of zero, the second is a scalar with subscript one, and so on.

One oddity to note: when you're referring to a scalar member of an array, you prefix it with a dollar sign to indicate it's a scalar:

    @my_array = ($foo, "blue", 1, $bar);
    print $my_array[1], "\n";

    ==> blue

We created an array here by assigning a list to it. A list is just a bunch of scalars, delimited with brackets; it's not treated as a separate variable until we explicitly turn it into an array.
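
The points above can be sketched in a few lines. The values given to $foo and $bar are assumptions made purely so that the example runs on its own, and $first and $second are illustrative names.

```perl
#!/usr/bin/perl

# Hypothetical values, chosen only for illustration
$foo = "red";
$bar = 42;

@my_array = ($foo, "blue", 1, $bar);   # assigning a list creates the array
print $my_array[0], "\n";              # the first element has subscript 0: red
print scalar(@my_array), "\n";         # 4

push(@my_array, "green");              # no size was ever declared; the array grows
print $my_array[4], "\n";              # green

($first, $second) = @my_array;         # a bracketed list can also receive values
print "$first $second\n";              # red blue
```

The last assignment shows a list being used on its own, without ever being given an array name.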

We can extract slices from an array like this:

    @small_slice = @my_array[1..2];
    print $small_slice[0], " ", $small_slice[1], "\n";

    ==> blue 1

@small_slice is a new array; we assign to it a slice of @my_array containing the elements with subscripts between 1 and 2 (inclusive).

In the first print example, we observed that you prefix a scalar member of an array with a dollar sign, to indicate it's a scalar. When referring to a slice of an array, you use an at-sign instead of a dollar: print @my_array[1] means "print the array slice containing elements with subscripts from 1 to 1" rather than "print the scalar with array subscript 1". Because the print command understands arrays, this will work and produce exactly the same output as printing the scalar -- but not all Perl commands are this forgiving. It's a good idea to get used to distinguishing array slices from scalars as soon as possible, otherwise you're liable to trip over them later.

It's important to note that lists and arrays are not the same thing in Perl. An array is a special type of variable that contains zero or more scalars, but can be manipulated as a variable. A list is just a grab-bag of scalars (or other variables). This has far-reaching and subtle consequences. For example, we can force an array to grow by referring to an element in it with a high subscript:

    @big_array = ("foo", "bar", 53, $blue);
    # @big_array now contains four elements
    $big_array[72] = "";
    # We just assigned the empty string to the 73rd element of @big_array,
    # implicitly creating elements 4..71 as undefined scalars

Just as we have singular (scalar) and plural (array/list) variables, most Perl commands operate differently in a singular (scalar) or plural (list) context. Context is a distinctively Perl idea: most languages, such as C or Pascal, insist that a function can only operate on one type of input data. Perl does things differently.
For example:

    print @my_array;
    print $my_array[0], $my_array[1], $my_array[2], $my_array[3];

These uses of the "print" function both do what you'd expect even though they're working on different data types, because the print command normally works on a list of one or more scalars; it interprets its parameters as a list (in "list context", in Perl jargon). If you want to know how many cells @my_array has, use the function scalar() to force it into a scalar context; when you refer to an array in scalar context, Perl returns its size.

    print scalar(@my_array);

    ==> 4

There's one other type of simple variable in Perl: the hash (or associative array, or dictionary). This type of variable is found in other high-level languages but not in the likes of C or Pascal or Basic. A hash is basically an array, but instead of each scalar element of the array having a numerical subscript starting with 0, 1, 2 ..., the hash subscripts are strings (or numbers, or anything else that's legal as the value of a scalar). Hashes are indicated by a percent "%" sign:

    %dictionary = (
        "my_colour"      => "blue",
        "my_side"        => $direction,
        "num_of_widgets" => 4
    );
    print $dictionary{my_colour}, "\n";

    ==> blue

Each entry in a hash consists of a key and a value; you use the key where you'd use a subscript in an array, and the value is some scalar entity. In fact, we can assign an array to a hash, as long as the array has an even number of elements so that it can be broken down into matching key/value pairs. In the example above, we create %dictionary by assigning a list of key/value pairs to it; the "=>" operator is a synonym for the comma "," in Perl which makes it easier to see that the key (my_colour) points to the value ("blue").

There are two special commands to remember with hashes: keys() and each(). keys(%dictionary) will return a list of all the keys in %dictionary, i.e. a list like ("my_colour", "my_side", "num_of_widgets").
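
One thing worth knowing before relying on that list: Perl makes no promise about the order in which keys() returns the keys. A minimal sketch, reusing the %dictionary above with the string "left" standing in for $direction (an assumption made so that the example is self-contained), sorts the list to get a predictable result:

```perl
#!/usr/bin/perl

# Same shape as %dictionary above; "left" is a stand-in for $direction
%dictionary = (
    "my_colour"      => "blue",
    "my_side"        => "left",
    "num_of_widgets" => 4
);

# keys() returns the keys in no particular order, so sort them
@names = sort(keys(%dictionary));

print join(", ", @names), "\n";   # my_colour, my_side, num_of_widgets
```

Here @names is an ordinary array, so everything said earlier about arrays applies to it.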

This makes it easy for us to loop over all the items in a hash and do something with them. In scalar context, keys() returns the number of elements in the hash.

If the hash is huge, the list of keys will be huge, too: so Perl also provides the each() command. Whenever each(%dictionary) is called, it returns a two-element list consisting of the next key, and the value associated with it -- until the end of the hash is reached, at which point it returns an empty list. So we can also walk through the elements of a hash using a while() loop:

    while (($key, $value) = each(%dictionary)) {
        # do something with $key and $value -- the loop ends when each()
        # returns an empty list because we've reached the end of %dictionary
    }

END BOXOUT (Variables in Perl)

BOXOUT: On the web

Perl culture has its roots in two places: the internet, and publishing house O'Reilly and Associates Inc (http://www.oreilly.com/). O'Reilly and Associates fund Perl's inventor, Larry Wall, to work full-time on his brainchild; they also publish the best and broadest range of books about Perl. In fact, no Perl programmer should be without a copy of the definitive book on the language, "Programming Perl" (third edition), by Larry Wall, Tom Christiansen, and Jon Orwant (ISBN 0-596-00027-8).

If you're beginning Perl, you should also consider buying "Learning Perl" by Randal Schwartz (ISBN 1-56592-042-2) -- this was the first tutorial on the language, and there's no way that a magazine column such as this one can compete with it.

You may also want to buy "The Perl CD Bookshelf" (ISBN 1-56592-462-2), a CD-ROM which comes with a paper copy of "Perl in a Nutshell" (a quick-reference guide that's only 600 pages long!) and both the above books, along with three others.

Finally, if you know Perl but can't remember every little detail of the language syntax, you need a copy of the "Perl 5 Pocket Reference" (3rd edition) by Johan Vromans -- it's a very terse aide-memoire that fits in just 90 pages, and it'll set you back a fiver or so (ISBN 0-596-00032-4).

In addition to printing books, O'Reilly maintain (and fund) the core website, http://www.perl.com/. This is effectively a portal site for Perl programmers, and has all the documentation online along with articles about Perl. It also contains a central link into CPAN, the Comprehensive Perl Archive Network, source of almost all Perl downloadables. (CPAN contains stacks of Perl scripts, the language itself in numerous versions, and a gigantic archive of modules -- reusable object classes.)

In contrast, http://www.perl.org/ is the public headquarters of Perl Mongers -- a grass-roots organisation of Perl programmers that acts as an umbrella organisation for local user groups. Check it out -- there's probably one near you, and often the easiest way to learn is to pick the brains of someone who's already been through the process. There's also an associated web portal, http://use.perl.org/, which contains news of active Perl events and module releases, and another portal for questions, answers, and support: http://perlmonks.org/.

If you're really interested you might want to subscribe to the Perl Journal. This is a quarterly magazine for Perl programmers; subscribers can also access its content via the web at http://www.tpj.com/. TPJ is authoritative and chock-full of technical articles, albeit targeting the working Perl programmer rather than the general public.

Traditionally, most Perl support is delivered via Usenet.
A number of newsgroups (conferences) exist specifically for Perl, starting with news:comp.lang.perl.announce for announcements, then news:comp.lang.perl.modules (discussion of object-oriented programming in Perl), news:comp.lang.perl.cgi (discussion of Web application programming in Perl, not for general Perl questions!), and news:comp.lang.perl.moderated (for questions to which you can't find an answer in the FAQs). Unfortunately news:comp.lang.perl.misc is so badly overrun that it's difficult to use it these days.

END BOXOUT

BOXOUT: Running a program

To get a Perl program to run on Linux, you type it into a text editor and save the file. Make sure that the first line of the file begins with the characters '#!' (omitting the quote marks) followed by the full pathname to the perl program, for example:

    #!/usr/bin/perl

Then use chmod(1) to make the file executable:

    chmod +x myprog.pl

Finally, run it like this, from the current directory:

    ./myprog.pl

(Hint: you can miss out the leading './' if you have '.' in your PATH environment variable.)

Alternatively, in the directory where you saved the file, type:

    perl myprog.pl

END BOXOUT