Linux Format 20 Perl tutorial HEADING: Using DBI and Perl to write a simple web guestbook This month, we're going to look at an example CGI guestbook script that sits on top of MySQL, and see how it we can get data from a CGI script into a database and back out again using the DBI database interface. As we saw last month, Perl can talk to a variety of relational database management systems via a module called DBI (the Database Interface). DBI presents a standard API that applications can use to connect to a database server, submit SQL commands, and execute queries. DBI provides a uniform interface -- in general, you can port a Perl database program that uses DBI from one database to another very easily. To do the low-level stuff, DBI relies on a DBD (Database Driver) module; there is a separate DBD module for each supported RDBMS. The job of the DBD module is to accept commands from DBI and translate them into a call to the underlying RDBMS. In addition, the DBD modules may expose some application-specific functionality to Perl (although using this may compromise portability). There are DBD drivers for Oracle, Informix, Sybase, Interbase, ODBC, flat CSV files, and a whole load of other databases. One advantage of DBI is that you can connect to multiple RDBMSs at the same time, and interchange data between them. Another is that it makes it very easy to build database driven web applications -- like this one, a simple guestbook. For this article, we're going to use MySQL, from TcX of Norway. MySQL is an open source database engine that is lightweight and reasonably powerful -- and it's ubiquitous in the Linux world. (You'll find a copy of it in every major Linux distribution.) A web guestbook is the poor cousin of a bulletin board or discussion forum; it's a tool that allows users to write comments and append them to a web site, and displays a list of previously added comments. It doesn't do things like threading, subjects, discussion areas, and scoring, but provides the minimal subset of posting an article and reading articles. Most guestbook CGI scripts simply provide a form, and append text uploaded via the form to a flat HTML file. This one is a bit more sophisticated; submitted comments are inserted into a database table where they are tagged with some additional information (a timestamp, a unique identification number, and a flag to indicate whether or not the comment is 'visible'). When you look at the guestbook, only the most recent comments will be displayed -- and if the administrator sees a comment they disapprove of, they can manually set the 'visible' field to zero or NULL and the comment will not be shown to visitors. (Actually, by simply adding one or more 'topic' or 'subject' fields we can turn this into the core of a conferencing system. But for now, we're going to keep things simple.) The program gb.cgi is structured like the mailform.cgi script presented in Linux Format #16; the main difference is the addition of DBI database code, and a couple of extra service subroutines. There's a user-configurable hash reference called $dbconf at the top; this is where we set up the name of the database and table we'll use to store our guestbook, along with the password and other useful items. Below it, we create a new CGI object called $q, open a database handle on the database, then hit the main logic of the CGI program, lines 47-52: if ($q->param('data') == 1) { insert_gb ($q, $dbconf, $d); print_gb ($q, $dbconf, $d); } else { print_gb ($q, $dbconf, $d); print_form($q); } The CGI field 'data' (which is hidden in the form printed by print_form()) contains a '1'. So if we see it, we're processing input from a form submission. We therefore call insert_gb() to insert a guestbook record into the database, then call print_gb() to print an HTML page showing the guestbook contents. If we don't see a field called 'data', we know that the user has just hit the guestbook script for the first time. So we call print_gb() to show them the guestbook, then call print_form() to give them the opportunity to add their comments to it. (We do things this way, with the HTML and the processing logic rolled up into a single file, to minimize the risk of the two components of the CGI program being separated accidentally, or of a different HTML template being used by mistake. There's another way round this using the module CGI::FormMagick, which we'll visit in a future tutorial.) insert_gb() is fairly simple-minded. It takes the form fields from the CGI object and builds an SQL INSERT statement that puts them in the guestbook.gb table. (See boxout, "MySQL and the guestbook database" for details of the database layout.) The only non-standard items to note are: my $buffer = sanitise($cgi->param('contents')); Instead of inserting the 'contents' textarea verbatim, we do some mild processing on it. We don't want malicious users inserting random URLS or bits of javascript into our guestbook, do we? We farm out the job of tidying up our input to a subroutine called sanitise(), which currently doesn't do much, but which can be extended quite easily. (If you feel like a learning exercise, look into using the module HTML::Parser to parse the input as HTML and return the plain text contents.) We call the MySQL function CURRENT_TIMESTAMP() to provide a suitable parameter for the creation_time field (of type datetime), and we set comment_id to NULL, which forces MySQL to insert a sequential, auto-incremented number into that field. (We don't use this here, but per-article unique reference numbers are a vital prerequisite for any full-blown conferencing system that supports threading, and it's nice to be future-proof.) print_gb() is the most complex subroutine of the bunch. First, it generates a SQL SELECT statement to retrieve records from the gb table. It calls DATE_FORMAT() to turn creation_time into something more human-friendly, and it only retrieves records where 'visible' has a value of 1, and where the difference in days between NOW() and the value of creation_time is less than $MAIN::max_age -- so we don't display really ancient comments unless you want to set $MAIN::max_age to a large number of days. MySQL does the heavy lifting of retrieving the right records and sorting them in descending date order (most recent postings go at the top). It uses the $MAIN::max_items variable (by default, the 50 most recent), to limit the number of items we print, stopping a heavily-used guestbook from spawning megabytes of HTML. Once we call prepare() to prepare an SQL query, and execute() to run it, we process each record returned from the database by calling fetchrow_hashref() on the statement handle returned by prepare(). For a given query, fetchrow_hashref() returns the next row of the results, in the form of a hashref. The keys to the hashref are the field names of the table resulting from the SQL SELECT statement. For example, we can do this: my $sth = $dbh->prepare("SELECT author, creation_time FROM gb") while (my $ref = $sth->fetchrow_hashref()) { print "Author: ", $ref->{author}, "\n", "created: ", $ref->{creation_time}, "\n"; } All we do in print_gb() is wrap CGI.pm's ability to generate HTML procedurally (using Perl subroutines) around a loop of this type, so that instead of printing text, we produce a nice HTML table. This isn't all we can do with the guestbook script. The print_gb() routine in this version uses a fairly simple SQL query that, stripped of the Perl surrounding it, looks like this: SELECT comment_id, author, contents, DATE_FORMAT(creation_time, '%W %M %Y') FROM guestbook WHERE visible=1 AND (TO_DAYS(NOW()) - TO_DAYS(creation_time) <= 14) ORDER BY comment_id DESC\n What this means is: we want to retrieve four fields (comment_id, author, contents, and a neatly formatted version of the date -- which is stored internally as a machine-readable datestamp) from the guestbook table. The only records we want are those where the "visible" flag has a value of "1" and the record is less than 14 days old. (TO_DAYS() is an internal MySQL function that takes a timestamp -- such as the one stored in the database or the one returned by the MySQL NOW() function -- and returns the integer number of days in it.) The retrieved records are then ordered in descending order of the comment_id field (an integer assigned to each record as it is created). If we have a busy guestbook, we may not want to retrieve hundreds of records. We may instead prefer to display, for example, blocks of ten records with "forward" and "back" buttons to navigate through the list. How can we modify the gb.cgi script here to do this? The simplest technique is to add a spare hidden field in the HTML produced by print_gb(), containing the current offset into the list of records in terms of the comment_id value of the first record -- CGI.pm lets us do this by saying: print $cgi->hidden(comment_id_offset, $starting_comment_id); (Where $starting_comment_id is the comment_id of the first record we've printed.) We can then add "forward" and "back" submit buttons by attaching values to the submit button: print $cgi->submit(-name => 'forward', -value => 'forward'); or: print $cgi->submit(-name => 'back', -value => 'back'); print_gb() can detect which of the submit buttons was pressed in the usual way, by checking the CGI parameter's value: if ($cgi->param("forward") eq "forward") { # a forward button was pressed } elsif ($cgi->param("back") eq "back") { # a back button was pressed } else { # this is the first invocation of print_gb() code -- neither # button was pressed } Now we need to modify the SQL executed by print_gb() accordingly. Suppose we want to view pages of $page_length records in the guestbook. If we have no previous invocation of print_gb(), we set $starting_comment_id to 0. If a forward button was pressed, we add $page_length_records to $starting_comment_id, and if a back button was pressed, we subtract $page_length_records from $starting_comment_id (but make sure that the number never goes below zero!). Now our SQL select statement in print_gb() needs to look like this: SELECT comment_id, author, contents, DATE_FORMAT(creation_time, '%W %M %Y') FROM guestbook WHERE visible=1 AND (TO_DAYS(NOW()) - TO_DAYS(creation_time) <= 14) AND comment_id >= $starting_comment_id AND comment_id <= ($starting_comment_id + $page_length_records) ORDER BY comment_id DESC\n Feel like modifying some code? Note that while this is the simplest way of making our guest book navigable, it's not necessarily the best. Each time you hit the forward or back button the CGI script executes a SQL statement. A heavily used system that doesn't absolutely require database consistency may cache the output from SQL queries in something like a DBM file, and return a cookie to the user containing the key to the DBM file, so that instead of needing a SQL SELECT statement to generate the next pages it can simply yank the frozen results of an earlier SELECT out of a cache. But that's getting beyond the scope of a simple guestbook script. END (Main text) BOXOUT: MySQL and the guestbook database MySQL, from TcX DataKonsult AB of Sweden, is an open source database. It provides a lightweight and nearly complete SQL92 implementation, while being small enough to pick up relatively fast. (Unlike, say, Oracle.) You can find it at http://sunsite.org.uk/packages/mysql/. MySQL is described in detail in the Reference Manual that comes with the package. It's online at http://www.mysql.com/Manual/manual_toc.html . There is also a book on the subject: "MySQL & mSQL", by Randy Jay Yarger, George Reese & Tim King (pub. O'Reilly and Associates, ISBN 1-56592-434-7). This column is way too short to explain how to install MySQL in detail; you will need to look at the directions in the manual. However, to run this guestbook script we need to do a couple of things: *** BEGIN NUMBERED LIST 0. Install MySQL and start the server running (of course!). 1. As root, create a new database called 'guestbook': mysqladmin create guestbook 2. Create a new MySQL user called 'gb'. This user can connect only from localhost, using a password (set up as 'gb_password' in the gb.cgi script -- you'll need to change this), but has root privileges on the 'guestbook' database. As root, run 'mysql mysql' (i.e. fire up the MySQL query monitor, running on the 'mysql' database which stores MySQL's local settings). Type: insert into mysql.user values ('localhost','gb',PASSWORD('gb_password'), 'y','y','y','y','y','y','y', 'y','y','y','y','y','y','y'); This sets up a new user called 'gb' with password 'gb_password'. (You can dink with these, but if you do you'll need to modify the contents of the hashref $dbconf in gb.cgi -- otherwise the Perl CGI program won't be able to connect to the database using this username and password.) Once you've set up the user called 'gb', you need to quit the mysql query monitor and tell the database server to reload its' user privileges: mysqladmin flush-privileges 3. Now it's time to set up the 'guestbook' database. Actually, it's not a very complex database -- one table with five fields. You can do it like this: mysql -u gb -pgb_password guestbook < gb.sql (Note that we can now do this as user 'gb' by specifying gb's password). gb.sql is simply a text file containing our database's structure: CREATE TABLE gb ( author varchar(128), contents text, creation_time datetime, visible tinyint(4), comment_id int(10) unsigned DEFAULT '0' NOT NULL auto_increment, KEY comment_id (comment_id) ); 4. We should now have a database called 'guestbook', that contains a single table called 'gb', with the structure defined above. We should also have a user called 'gb' who, when connecting from localhost, has full privileges on the 'guestbook' database (but no others). The final step is to put the gb.cgi script somewhere appropriate (your cgi-bin directory is a good start!), make it executable (if necessary editing the #! line), and point a web browser at it. *** END NUMBERED LIST END (BOXOUT) BOXOUT: The Perl DBI Interface A Perl program calls: use DBI; to load the DBI module. Before this will work, you need to install the latest version of DBI (available from CPAN -- http://www.perl.com/CPAN/) and an appropriate DBD drivers for relational databases you're going to use (also available from CPAN). A DBI session typically starts by calling the connect() method with a bunch of parameters which vary from database to database (the details are in the documentation for each DBD driver). This tells DBI to load the correct DBD driver, establish a connection to the RDBMS engine, and return a database handle. The database handle is actually a DBI object, and provides methods for talking to the database we're connected to. For example: my $dbh = DBI->connect("DBI:mysql:database=guestbook;host=localhost", "guestbookuser", "guestbookpassword", { 'RaiseError' => 1 } ); This calls the DBI->connect() method and tells it to return a handle connected to the MySQL database server running on localhost, talking to the database called 'guestbook', signed on as 'guestbookuser' using the password 'guestbookpassword'. Once we have a handle, we can feed SQL commands to the database: $dbh->do("UPDATE gb SET visible=0 WHERE comment_id=42"); (This carries out the UPDATE statement immediately.) Or we can prepare a SELECT statement, then fetch each line of the resulting table (in this case, fetching the line as an array of fields): my $sth = $dbh->prepare("SELECT author, creation_time FROM gb") || die $DBI::errstr; while (@row = $sth->fetchrow_array() ) { print "Author: $row[0]\n", "Date: $row[1]\n\n"; } (Note the "die $DBI::errstr" bit; if the prepare() call fails, it returns non-zero -- we therefore die(), emitting the value of $DBI::errstr, which is hopefully an informative error message.) At the end of a session, we call disconnect() to shut down the RDBMS connection: $dbh->disconnect(); This doesn't even scratch the surface of what you can do with DBI! For the full in-depth tour, you really need to read "Programming the Perl DBI" by Alligator Descartes & Tim Bunce (pub. O'Reilly and Associates, ISBN 1-56592-699-4). Highly recommended -- Alligator and Tim invented DBI and wrote more than half the DBD drivers, so they know what they're talking about! END BOXOUT BOXOUT: How not to design a database Many books have been written about the design of relational databases; if you're serious about it, you could do worse than start with "An Introduction to Database Systems" by C. J. Date (pub. Addison-Wesley). For the guestbook, we don't need anything fancy. Users submit comments; each of these has some associated information -- a flag to say if it is visible, a timestamp, the author's name, and an autoincremented sequence number that we assign to it automatically. We don't have a user registration system, so we don't keep user information in a separate table; we just take whatever name they give us. A single table is in fact all we need for an unstructured guestbook -- we don't have to start worrying about entity relationship diagrams unless we want to extend the application to cope with user accounts, threading, and the like. Our table looks like this: CREATE TABLE gb ( author varchar(128), contents text, creation_time datetime, visible tinyint(4), comment_id int(10) unsigned DEFAULT '0' NOT NULL auto_increment, KEY comment_id (comment_id) ); In case you haven't met all these data types, a varchar is a variable-length character string (here maxed out at 128 characters). A text field is also variable length, but able to hold much larger data items (albeit with constraints on indexing and searching). A datetime field is our timestamp, and we use a tinyint (eight-bit signed integer) as a flag to say whether or not this record is visible ('1' means it will show up in guestbook listings; any other value means it won't). The comment_id field is more complex. We specify it as an unigned long integer, defaulting to a value of '0'. It can't be a NULL, and it auto-increments -- if we try and assign a NULL to it, MySQL will replace the NULL value with the next sequential number available in ascending order. It's designated as a key field, although we don't use it as such here: this is a constraint on MySQL's auto_increment fields. A more interesting guestbook will extend this basic data structure. In particular, it'd be nice to allow users to add titles to their postings, and to refer (by message ID) to previous postings. It would also be nice to be able to print the discussions in a structured, neatly indented, manner -- providing for hierarchical, threaded communications. We could accomodate this by adding a "reference" field, of type int(10); this can contain the comment_id of a parent article, if the current one is a follow-up to an earlier posting. However, before we can print such a threaded discussion we need to know how to build tree-like data structures using Perl ... END BOXOUT BEGIN LISTING: gb.cgi 001: #!/usr/bin/perl 002: 003: #------------------- standard CGI skeleton ---------------- 004: 005: use strict; 006: use CGI qw(shortcuts font table td TR); 007: use DBI; 008: 009: # configuration bits 010: 011: $MAIN::max_age = 14; # oldest record, in days, to display 012: $MAIN::max_items = 50; # maximum number of records to display 013: 014: my $owner_name = "Charlie Stross"; 015: 016: my $dbconf = { 017: "host" => "localhost", 018: "dbname" => "guestbook", 019: "username" => "gb", 020: "password" => "gb_password", 021: "table" => "gb", 022: "fields" => [qw(comment_id author 023: contents creation_time visible)], 024: }; 025: 026: # end of configuration bits 027: 028: my ($q) = new CGI; 029: 030: my $connectstr = "DBI:mysql:database=" . 031: $dbconf->{dbname} . 032: ";host=" . 033: $dbconf->{host} ; 034: 035: my ($d) = DBI->connect($connectstr, 036: $dbconf->{username}, 037: $dbconf->{password}, 038: {'RaiseError' => 1}); 039: 040: print $q->header(-type => 'text/html', 041: -status => '200 OK'); 042: print $q->start_html(-title => "$owner_name\'s guest book" , 043: -BGCOLOR => '#FFFFA0'); 044: print $q->h1("$owner_name\'s guest book"); 045: print $q->hr(); 046: print $q->start_blockquote(); 047: if ($q->param('data') == 1) { 048: insert_gb ($q, $dbconf, $d); 049: print_gb ($q, $dbconf, $d); 050: } else { 051: print_gb ($q, $dbconf, $d); 052: print_form($q); 053: } 054: 055: print $q->end_blockquote(); 056: print $q->hr(); 057: print $q->start_font({'size' => '-2'}); 058: print $q->p("This page generated for host ", $q->remote_host(), 059: " (using ", $q->user_agent(), ") on ", scalar(localtime(time))); 060: print $q->end_font(); 061: 062: print $q->end_html; 063: $d->disconnect(); 064: exit 0; 065: 066: #------------------- support routines ---------------------- 067: 068: sub print_form { 069: # print guestbook entry submission form. This includes a hidden 070: # field called 'data' that is set to '1', which tells the next 071: # invocation of the CGI program that it is processing a 072: # submission. 073: 074: my $q = shift; # CGI query object to use for building form 075: 076: print $q->startform; 077: print $q->p(""); 078: print $q->h2("Have your say"); 079: print $q->hr(), "\n"; 080: print $q->table( {"border" => "1", 081: "bgcolor" => "#FFA0FF", 082: }, 083: $q->hidden( -name => "data", -value => "1"), 084: TR( 085: td("From:"), 086: td( $q->textfield("author"), 087: $q->i("Your name/email address goes here") ) 088: ), 089: TR( 090: td("Message:"), 091: td( $q->textarea(-rows => "6", 092: -cols => "60", 093: -wrap => "physical", 094: -name => "contents") 095: ) 096: ), 097: TR( 098: td("Send comment:"), 099: td( $q->submit(-name => "send") ) 100: ) 101: ); 102: print $q->hr(); 103: print $q->endform; 104: return; 105: } 106: 107: sub insert_gb { 108: # process a new guestbook submission. Strip anything suspicious 109: # (i.e. raw binaries) from the contents field, and insert into 110: # database. 111: my $cgi = shift; # perl CGI object 112: my $dbconf = shift; # hash of configuration info 113: my $db = shift; # perl DBI database handle 114: 115: # we want to sanitise the contents of 'author' and 'contents' here. 116: my $buffer = sanitise($cgi->param('contents')); 117: my $query = "INSERT INTO " . $dbconf->{table} . " " . 118: "(author, contents, creation_time, visible, comment_id)\n" . 119: "VALUES (" . 120: $db->quote($cgi->param('author')) . ", " . 121: $buffer . ", " . 122: "CURRENT_TIMESTAMP()" . ", " . 123: "1" . ", " . 124: "NULL" . 125: ")\n"; 126: $db->do($query) or print "\n", $db->errstr(); 127: return; 128: } 129: 130: sub print_gb { 131: # print guestbook contents We retrieve all 132: # records from the database that are not flagged with visible<>1, 133: # and which were submitted in the preceeding $MAIN::max_age 134: # days, and we order them in reverse order of date (to a maximum 135: # of $MAIN::max_items entries). 136: 137: my $cgi = shift; # CGI object 138: my $dbconf = shift; # hash of configuration info 139: my $db = shift; # perl DBI database handle 140: my $count = 0; # count of retrieved rows from database 141: 142: print $cgi->h1("Things people said"); 143: 144: print $cgi->start_table( {"border" => "1", 145: "bgcolor" => "#FFA0FF"} ); 146: 147: my $query = "SELECT comment_id, author, contents, " . 148: "DATE_FORMAT(creation_time, '%W %M %Y') " . 149: "FROM " . 150: $dbconf->{table} . 151: " WHERE " . 152: "visible=1 AND " . 153: "(TO_DAYS(NOW()) - TO_DAYS(creation_time) <= $MAIN::max_age)" . 154: " ORDER BY comment_id DESC\n"; 155: 156: my $sth = $db->prepare($query); 157: $sth->execute(); 158: while (my $ref = $sth->fetchrow_hashref() and 159: $count <= $MAIN::max_items) { 160: $count++; 161: print $cgi->TR( 162: td("From:"), 163: td( $ref->{author}), 164: ), 165: $cgi->TR( 166: td("Date:"), 167: td( $ref->{creation_time}), 168: ), 169: $cgi->TR(td("Comment:"), 170: td( 171: $cgi->pre( 172: $ref->{contents} 173: ) 174: ) 175: ); 176: } 177: if ($count == 0) { 178: print $cgi->TR( 179: td("Sorry, this guestbook is empty!") 180: ); 181: } 182: print $cgi->end_table; 183: print $cgi->hr(); 184: return; 185: } 186: 187: sub sanitise { 188: # sanitise the contents of someone's posting. 189: # 190: # A *really smart* guestbook would be smart enough to keep 191: # sane/legal HTML while ditching the dubious stuff. But this 192: # program is here to demonstrate MySQL and DBI, not whizzy CGI 193: # scripting. 194: # 195: # Currently all we do is replace carriage returns with
tags, 196: # and trust that the user will only enter ASCII text. Feel free 197: # to extend this subroutine to do something more useful! 198: # 199: 200: my $buffer = shift; 201: $buffer =~ s//>/g; 203: $buffer = join("
", grep(/^.+$/, split(/[\r\n]/, $buffer))); 204: $buffer =~ s/'/\\'/g; 205: $buffer = "'" . $buffer . "'"; 206: return $buffer; 207: 208: } END LISTING (gb.cgi)