Linux Format 15 Perl Tutorial

QUOTE: Mailing the webmaster

TITLE: Basic CGI scripting in Perl

This time round, we're going to look at Perl and the world wide web. Perl is the commonest programming language used for automating web sites: we're going to examine a simple example CGI script and see how it uses Perl modules from CPAN to get the job done.

Perl came to prominence on the world wide web back in 1993-94. At that point, most web documents were static, unchanging HTML files. Where it was necessary to provide access to a database or other program, Perl provided a powerful tool for interfacing to those resources on the UNIX systems that made up most of the web. Today, Apache (with 60% of the web server market) still supports CGI, the Common Gateway Interface, as a mechanism for invoking external programs that produce HTML output.

END (Main text)

BOXOUT: Anatomy of a CGI script

((TO BE DECIDED: describe source code of mailform.cgi here? 94 lines ...))

01 : #!/usr/bin/perl
02 :
03 : #------------------- standard CGI skeleton ----------------
04 :
05 : use strict;
06 :
07 : use CGI qw(shortcuts font table td TR);
08 : use Mail::Send;
09 :
10 : # configuration bits
11 :
12 : my $recipient = "user\@host.org";
13 : my $recipient_name = "Your name goes here";
14 : my $sender_addr = "nobody";
15 :
16 : # end of configuration bits
17 :
18 : my ($q) = new CGI;
19 :
20 : $q->param('live') ? process_form($q, $recipient, $sender_addr) : print_form($q);
21 :
22 : exit 0;
23 :
24 : #------------------- support routines ----------------------
25 :
26 : sub print_form {
27 :   # we have no CGI parameters, so print a form
28 :   my ($q) = shift @_;
29 :   print $q->header();
30 :   print $q->start_html(-title => "Mail to $recipient_name" ,
31 :                        -BGCOLOR => '#FFFFA0');
32 :   print $q->blockquote(
33 :         $q->startform,
34 :         $q->p(""),
35 :         $q->h1("Send mail to $recipient_name"),
36 :         $q->hr(), "\n",
37 :         $q->table( {"border"  => "1",
38 :                     "bgcolor" => "#FFA0FF",
39 :                    },
40 :           $q->hidden( -name => "live", -value => "1"),
41 :           TR(
42 :             td("From:"),
43 :             td( $q->textfield("my_name"),
44 :                 $q->i("Your email address goes here") )
45 :           ),
46 :           TR(
47 :             td("Subject:"),
48 :             td( $q->textfield("subject") )
49 :           ),
50 :           TR(
51 :             td("Message:"),
52 :             td( $q->textarea(-rows => "10",
53 :                              -cols => "60",
54 :                              -name => "content")
55 :             )
56 :           ),
57 :           TR(
58 :             td("Submit:"),
59 :             td( $q->submit() )
60 :           )
61 :         ),
62 :         $q->hr(),
63 :         $q->endform,
64 :   );
65 :   print $q->end_html();
66 :   return;
67 : }
68 :
69 :
70 : sub process_form {
71 :   # we detected CGI parameters, so do something with them
72 :   my ($q, $recp, $from) = @_;
73 :   my ($subj) = "CGIMAIL: Subject: " . $q->param("subject") . "\n";
74 :   my ($body) = "Originally from: " .
75 :                $q->param("my_name") .
76 :                "\n\n" .
77 :                $q->param("content");
78 :   my ($m) = new Mail::Send ;
79 :   $m->subject($subj);
80 :   $m->to($recp);
81 :   my ($fh) = $m->open;
82 :   print $fh $body;
83 :   $fh->close ;
84 :
85 :   print $q->header();
86 :   print "<html><head><title>Mail to $recipient_name</title></head>",
87 :         "\n";
88 :   print "<body>\n";
89 :   print "<h1>Mail sent</h1>\n",
90 :         "<p>Your mail has been sent. Thank you.</p>";
91 :   print "</body>\n";
92 :   print "</html>\n";
93 :   return;
94 : }

This CGI script does two things. If you run it with no parameters passed to it via CGI, it creates and returns an HTML form. (It builds the form with CGI.pm's HTML-generating methods, which are described in the boxout "CGI.pm -- what it provides".) If you run the script and populate the parameters specified in the form, it tries to send you email. (Or rather, it sends email to user@host.org -- to change this, edit line 12.)

We could write this tool as an HTML page and a single-purpose CGI script, such that the script would merely process the results of filling in the form on the HTML page. However, if we do that, we risk having the HTML form direct its output to the wrong place, and we also risk losing track of one half of the application. As it's a single application which can exist in two states ("fill out form" or "process submitted form"), keeping them bundled together makes installing it easier.

The program starts at lines 07 and 08, which import two Perl modules: CGI and Mail::Send. We qualify the "use CGI" command with some parameters -- these specify the methods we want to explicitly import from the CGI namespace in addition to the normal CGI stuff.

CGI is a big module. If you want to learn about it you really need to read the documentation that comes with it (in POD format) -- it runs to about forty pages. (See the boxout "CGI.pm -- what it provides" for an overview.) Mail::Send, in contrast, is a simple widget designed for a single purpose -- to send email to a recipient.

At line 18, we create a CGI object. This implicitly checks our environment to see if we're running as a CGI script, imports bits of data from the web server we're running under, and creates an object we can do useful things with. We do just one thing, at line 20: we look for an HTML form parameter called "live". If it's present, we call a subroutine called process_form(), to process the form input.
If it isn't present, we call print_form(), to produce the HTML form for this script.

The HTML form contains a (hidden) form parameter called "live". So, if someone's filled out the form (because they've already visited the CGI script), the script knows it's live; if this is a first visit (and they haven't seen the form yet) it knows to print its form out.

print_form() is the simpler of the two routines, although it doesn't look it at first sight. We pass it a single parameter ($q, the CGI object), then call various CGI methods via $q and print what they return. CGI.pm supplies tools to let us programmatically generate HTML; call a method like $q->p("some text") and it returns a string consisting of "some text" formatted as an HTML paragraph. print_form() could equally well be written as a simple print statement to output HTML. (There's no reason for using CGI.pm's methods here other than to show off some of its features -- but if we were feeling daring we could do nifty Perl things like generate extra fields on the fly, or expand macros.)

One point to note is that CGI's startform() method implicitly creates a form that points back to the current CGI script. So broken links to scripts are a thing of the past.

process_form() does a bit more leg-work. First, we extract the named parameters from the form. Then, in lines 78 to 83, we create and send an email message! This looks deceptively simple; actually, Mail::Send conceals a vast mass of complexity from us. Just create a new Mail::Send object (using the new() method), use the subject() and to() methods to say what it's about and who it's addressed to, then call open() on it. This returns a file handle; print the contents of the message into this file handle and close it. When it closes, the mail is automagically sent.

Finally, we print some HTML -- the traditional way, just to demonstrate that you can mix and match hand-written HTML and CGI's programmatic HTML generation in the same script, or even the same subroutine.
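To make the two-state idea concrete, here is a minimal sketch of the dispatch at line 20, stripped of CGI.pm and Mail::Send so that it runs with nothing but core Perl. The names show_form(), handle_form() and dispatch() are hypothetical stand-ins invented for this illustration, and a plain hash reference takes the place of the CGI object; treat it as an outline of the pattern rather than a replacement for mailform.cgi.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for print_form() and process_form(); the real
# routines in mailform.cgi emit HTML and send mail.
sub show_form { return "state: fill out form" }

sub handle_form {
    my ($param) = @_;
    return "state: process submitted form (subject: $param->{subject})";
}

# Decide which state we are in from a hash of form parameters.
# The hidden "live" field is only present once the form has been submitted.
sub dispatch {
    my ($param) = @_;
    return $param->{live} ? handle_form($param) : show_form();
}

print dispatch({}), "\n";                                  # first visit
print dispatch({ live => 1, subject => 'Hello' }), "\n";   # form submitted
```

Because the flag travels inside the form itself, the script never needs to be told separately which state it is in.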
END (BOXOUT)

BOXOUT: The Common Gateway Interface and Perl

In 1992, when the web was in its infancy, people were just beginning to realise that in addition to serving up HTML documents, web servers could be called on to run programs that produced output in HTML. The Common Gateway Interface was invented by NCSA (from whose web server Apache is descended) as a mechanism to make it easy to interface external programs to the web.

When Apache (or another CGI-compatible server) receives an HTTP request for an executable object (one that it executes instead of simply copying to the user), it puts a chunk of information about the request in the environment -- a set of named values that child programs inherit. (The environment is used to store data which can be passed to child processes when the current program executes them; it's also used by the UNIX shells to stash program variables.)

When a web server receives an HTTP request for a CGI program, it runs the CGI program, passing it any data from the client by stashing it in the environment. The standard output filehandle of the CGI program is connected to the parent web server process; anything the CGI program prints is read and returned direct to the web client that sent the request.

CGI programs usually need to start their output by stating what type of data they're producing; it's common to see hand-rolled CGI programs start by printing:

  Content-type: text/html

followed by a blank line. (This is the minimal header required for an HTML document. If it's missing, the web browser simply won't know how to handle the output stream. Indeed, the web server may also complain if it doesn't recognize the output content-type.)

There are two distinct HTTP request types that are used to invoke CGI programs: GET and POST.
A GET request, like a standard HTTP request for a file, can be accompanied by some variables in the form of name/value pairs (specified in the URL after the address of the script -- for example, http://www.foo.com/cgi-bin/myscript?colour=red&option=1). A POST request is more complex: the parameters are sent in the body of the request rather than in the URL. POST parameters can be larger than those of a GET request, can include binary data, and are relatively immune to users tampering with the parameters by manually editing the URL: they're therefore recommended for any non-trivial application of CGI programming.

A minimal -- but useful -- CGI program looks like this:

01: #!/usr/bin/perl
02:
03: use Data::Dumper;
04:
05: print "Content-type: text/html\n\n";
06: print "<html><head><title>Current Environment</title></head>\n";
07: print "<body><h1>Current Environment</h1>\n" ;
08: print "\n<pre>\n";
09: print Dumper \%ENV;
10: print "</pre>\n";
11: print "</body></html>\n";

This script doesn't do anything with its inputs -- but it lets you see the environment variables the CGI script can see. To do this, it calls the subroutine Dumper (imported from the module Data::Dumper), to dump out a neatly-formatted version of the hash %ENV. %ENV is the current Perl process's view of its environment.

Line 5 of this script prints "Content-type: text/html", followed by a blank line. This is a minimal HTTP header, and is passed through the Apache server to tell the user's browser that what follows is HTML. The following lines print brief HTML headers and then dump the contents of %ENV (at line 9) in a preformatted block.

If you experiment by writing an HTML form, with the SUBMIT button pointing at this script, you'll be able to see the HTTP request showing up in the environment in the form of a variable called QUERY_STRING (if you use the HTTP GET method in your form). For example, if you saved the script (above) as cgi-bin/dumper.cgi (in your web server's file tree), you can write a form like this:

  <html><head><title>Test form</title></head>
  <body>
  <h1>Test form</h1>
  <form method="GET" action="/cgi-bin/dumper.cgi">
  <p>Enter some data:</p>
  <p>Favourite colour: <input type="text" name="colour"></p>
  <p>Preferred food: <input type="text" name="food"></p>
  <p><input type="submit"></p>
  </form>
  </body></html>

The result of hitting this form's submit button should be a dump of the environment variables visible to your program, and QUERY_STRING should contain something like:

  colour=red&food=tomato+soup

(Note that non-alphanumeric characters are encoded: in a form submission a space becomes "+", and other characters appear as "%" followed by two hexadecimal digits.)

Having to parse this mass of data by hand is a nuisance, which is why we have the excellent CGI.pm module, written by Lincoln Stein (and a host of contributors). CGI.pm takes a lot of the legwork out of writing CGI scripts by letting you treat an HTML form as an object. You can query the object for the current values in a named field, or you can tell it to print out various HTML form elements programmatically, setting various options along the way. For example:

  print CGI->textfield("fred")

prints out an HTML text entry field within a form, named "fred". And if you're processing data submitted from this form:

  my $cgi = new CGI;
  my $freds_value = $cgi->param("fred");

The new() method in CGI parses the CGI environment variables and extracts data; thereafter, we can query the value of the CGI parameter "fred" via the param() method.

END BOXOUT

BOXOUT: CGI.pm -- what it provides

CGI.pm, written by Lincoln Stein and a host of extras, is one of the most powerful Perl modules on CPAN.

CGI.pm defines a class that provides an abstract interface to a CGI session. That is: once you create a CGI object (in a script running on your web server) you can either tell it to emit some HTML in the direction of the user's browser, or process data submitted by a browser. In any event, you don't deal directly with the gory details of the Common Gateway Interface.

A script that uses CGI.pm sits somewhere in an Apache (or other) web server's file space. The web server must be configured to treat it as executable -- either by placing it in a cgi-bin directory (where everything is treated as a script) or by using the ScriptAlias directive to tell the server that it's executable.
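For a feel of what CGI.pm's parsing saves you, here is a rough sketch of decoding a GET-style query string by hand in core Perl. parse_query() is a hypothetical helper written for this illustration, not part of CGI.pm; it assumes URL-encoded name=value pairs separated by "&", and it ignores refinements that CGI.pm handles for you, such as POST data and fields with multiple values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of CGI.pm): split a URL-encoded query
# string into name/value pairs. Repeated names keep only the last value.
sub parse_query {
    my ($query) = @_;
    my %param;
    return %param unless defined $query;
    for my $pair (split /&/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        # The loop variable aliases $name and $value, so both are decoded in place.
        for my $text ($name, $value) {
            $text =~ tr/+/ /;                               # "+" encodes a space
            $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX is a hex-encoded byte
        }
        $param{$name} = $value;
    }
    return %param;
}

# In a real CGI script the string would come from $ENV{QUERY_STRING}.
my %param = parse_query('colour=red&food=tomato+soup');
print "$_ = $param{$_}\n" for sort keys %param;
```

With CGI.pm, all of this collapses into a call to new() followed by param().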
When a user's browser sends a request for an executable object, the web server doesn't simply load the file and send it back to the browser: it runs it, and passes any data the user sent it via CGI.

Your typical CGI.pm script starts by creating a single CGI object. The process of calling CGI->new() implicitly parses the CGI parameters and loads them into the object. There may not be any (if, for example, this is a straight HTTP request for the script, and it's expected to serve up its own invocation form), but if there are, they will be accessible via the param() method. For example:

  use CGI;
  my $session = CGI->new();
  my @variables_from_form = $session->param();

In this case, @variables_from_form holds the names of all the INPUT fields in the form that was submitted to this script. If you're looking for a specific variable (say, one called "colour") you can pass its name to the param() method in scalar context:

  my $requested_colour = $session->param("colour");

And you can even modify the stored value of "colour":

  $session->param(-name => "colour" , -value => "turquoise");

Note that in the above example we're using named parameters: CGI is written so that you can effectively pass through hashes of parameters to its methods, making it clear what you want to do. (This will be familiar to anyone who's looked at Python's extremely cool parameter passing model. Guess where CGI.pm stole it?) If you want to do away with the hyphens that indicate a parameter name, call use_named_parameters():

  $session->use_named_parameters();
  $session->param(name => "colour", value => "green");

or ...

  $session->param( name  => "list_of_foods",
                   value => ["spuds", "peas", "beans", "tuna" ] );

(which assigns an anonymous array of values to list_of_foods.)

You can also delete parameters:

  $session->delete("colour");

Most web applications track a user through multiple screens (HTML forms) and do different things with data they input in each screen.
You can save a user's CGI session data in a file, and restore it later, using CGI.pm's cookie-handling abilities to keep track of where the data they've already entered is saved. This lets you keep track of data entered in earlier forms, providing the illusion of interacting with a graphical application rather than a script that is run like a batch process (i.e. it's given some data, digests it, spits out an answer, and exits).

The save(FILEHANDLE) method saves the state of the current CGI object into FILEHANDLE, and you can restore the saved variables into a new CGI object like this (if, say, FILEHANDLE was associated with a file called test.out):

  open (IN,"test.out") || die;
  while (!eof(IN)) {
    my $q = new CGI(IN);
    print $q->param('counter'),"\n";
  }

In addition to providing access to CGI parameters, CGI.pm performs another valuable task: it lets you generate HTML. Rather than having to hard-code HTML pages which are sent to the user's browser, you can write code like this:

  print start_html(-title => "CGI Results",
                   -style => { "src" => "/styles/results.css" } ),
        h1("CGI Results"),
        p("Your search returned no results."),
        p("Click", a({ href => $q->self_url() }, "here"),
          "to search again" ),
        end_html;

The start_html() method is used to emit an HTML header, and to set up various characteristics of the document (such as JavaScript to embed, style sheets to use, and so on); end_html() is the corresponding end of document method to call. In between, we see h1(), p(), and a() methods, each of which is equivalent to an HTML tag (<h1> headers, <p> paragraph tags, and <a> hyperlinks).

Among the methods we can use are ones for generating form fields:

  my $q = new CGI;
  print $q->startform(),
        $q->p("What is your favourite colour?",
              $q->textfield("colour") ),
        $q->p("How old are you?",
              $q->radio_group( -name    => "age",
                               -values  => [ '0-10', '11-20', '21-40', '41-80' ],
                               -default => '21-40' ) ),
        $q->submit(),
        $q->endform();

(Note how we can mix form elements with the procedural HTML generation described earlier.)

The true power of this mechanism becomes clear if you think in terms of using Perl's eval() mechanism to take a string of Perl and execute it on the fly. Combined with a few simple HTML templates, you can use the ability to generate HTML to make your scripts much more flexible in the way they present search results or forms. For example, you can generate HTML tables to hold your results -- and vary the width of the tables to match the largest number of fields returned from a search that returns variable-length records.

END BOXOUT (CGI.pm)

BOXOUT: More information on Perl/CGI Programming

The primary source of information on CGI.pm is, of course, the manual page -- type "man CGI" or "perldoc CGI" and prepare to settle in for a long read. The man page is long and complex, and is perhaps best read in accompaniment with Lincoln Stein's lengthy notes at http://stein.cshl.org/WWW/software/CGI/ . If that doesn't satisfy, Lincoln Stein (who is the primary author of CGI.pm) has written a whole book about it: "Official Guide to Programming with CGI.pm" (pub. John Wiley & Sons, Inc., ISBN 0-471-24744-8).

This isn't the only book on CGI programming in Perl. O'Reilly and Associates publish the excellent "CGI Programming with Perl, 2nd Edition" (Scott Guelich, Shishir Gundavaram & Gunther Birznieks, ISBN 1-56592-419-3).
This covers sub-topics including querying relational databases, incorporating JavaScript to validate forms before they're sent to the CGI script, dealing with browser caches, dynamically generating graphics, and a load of other useful things. If you learn best with a book, this is probably the one to get.

While I'm on the topic of Perl and books, I'd like to give a plug for "Advanced Perl Programming" (by Sriram Srinivasan, ISBN 1-56592-220-4). Yet another of O'Reilly's excellent stable of Perl books, this one covers a number of advanced techniques. In particular, if you're a bit confused by Perl's complex data structures, name spaces, and modules, this book will make things a lot clearer; if you think of it as a tutorial book that takes up where "Learning Perl" leaves off, you won't be far wrong. The sections on networking and object orientation are well-nigh indispensable. Published in 1997, this book is likely to need an update when Perl 6 comes out -- but in the meantime, it's well-thumbed and occupies a spot on the shelf as close as possible to my desk.

END BOXOUT (Books)

NEXT MONTH: We take a look at mod_perl, techniques for loading CGI programs directly into the Apache web server and making them run faster. We also take a look at eval() and error trapping.

END (NEXT MONTH)