Linux Format 20
Perl tutorial
HEADING: Using DBI and Perl to write a simple web guestbook
This month, we're going to look at an example CGI guestbook script
that sits on top of MySQL, and see how it we can get data from a
CGI script into a database and back out again using the DBI database
interface.
As we saw last month, Perl can talk to a variety of relational database
management systems via a module called DBI (the Database Interface). DBI
presents a standard API that applications can use to connect to a database
server, submit SQL commands, and execute queries. DBI provides a uniform
interface -- in general, you can port a Perl database program that uses
DBI from one database to another very easily. To do the low-level stuff,
DBI relies on a DBD (Database Driver) module; there is a separate DBD
module for each supported RDBMS. The job of the DBD module is to accept
commands from DBI and translate them into a call to the underlying
RDBMS. In addition, the DBD modules may expose some application-specific
functionality to Perl (although using this may compromise portability).
There are DBD drivers for Oracle, Informix, Sybase, Interbase, ODBC, flat
CSV files, and a whole load of other databases. One advantage of DBI is
that you can connect to multiple RDBMSs at the same time, and interchange
data between them. Another is that it makes it very easy to build database
driven web applications -- like this one, a simple guestbook.
For this article, we're going to use MySQL, from TcX of Norway. MySQL
is an open source database engine that is lightweight and reasonably
powerful -- and it's ubiquitous in the Linux world. (You'll find a copy
of it in every major Linux distribution.)
A web guestbook is the poor cousin of a bulletin board or discussion
forum; it's a tool that allows users to write comments and append them
to a web site, and displays a list of previously added comments. It
doesn't do things like threading, subjects, discussion areas, and
scoring, but provides the minimal subset of posting an article and
reading articles.
Most guestbook CGI scripts simply provide a form, and append text
uploaded via the form to a flat HTML file. This one is a bit more
sophisticated; submitted comments are inserted into a database table
where they are tagged with some additional information (a timestamp,
a unique identification number, and a flag to indicate whether or not
the comment is 'visible'). When you look at the guestbook, only the most
recent comments will be displayed -- and if the administrator sees a
comment they disapprove of, they can manually set the 'visible' field
to zero or NULL and the comment will not be shown to visitors.
(Actually, by simply adding one or more 'topic' or 'subject' fields we
can turn this into the core of a conferencing system. But for now,
we're going to keep things simple.)
The program gb.cgi is structured like the mailform.cgi script
presented in Linux Format #16; the main difference is the addition of
DBI database code, and a couple of extra service subroutines. There's
a user-configurable hash reference called $dbconf at the top; this is
where we set up the name of the database and table we'll use to store
our guestbook, along with the password and other useful items. Below
it, we create a new CGI object called $q, open a database handle on the
database, then hit the main logic of the CGI program, lines 47-52:
if ($q->param('data') == 1) {
insert_gb ($q, $dbconf, $d);
print_gb ($q, $dbconf, $d);
} else {
print_gb ($q, $dbconf, $d);
print_form($q);
}
The CGI field 'data' (which is hidden in the form printed by
print_form()) contains a '1'. So if we see it, we're processing input
from a form submission. We therefore call insert_gb() to insert a
guestbook record into the database, then call print_gb() to print an
HTML page showing the guestbook contents.
If we don't see a field called 'data', we know that the user has just
hit the guestbook script for the first time. So we call print_gb() to
show them the guestbook, then call print_form() to give them the opportunity
to add their comments to it.
(We do things this way, with the HTML and the processing logic rolled up
into a single file, to minimize the risk of the two components of the CGI
program being separated accidentally, or of a different HTML template
being used by mistake. There's another way round this using the module
CGI::FormMagick, which we'll visit in a future tutorial.)
insert_gb() is fairly simple-minded. It takes the form fields from the
CGI object and builds an SQL INSERT statement that puts them in the
guestbook.gb table. (See boxout, "MySQL and the guestbook database"
for details of the database layout.) The only non-standard items to
note are:
my $buffer = sanitise($cgi->param('contents'));
Instead of inserting the 'contents' textarea verbatim, we do some mild
processing on it. We don't want malicious users inserting random URLS
or bits of javascript into our guestbook, do we? We farm out
the job of tidying up our input to a subroutine called sanitise(), which
currently doesn't do much, but which can be extended quite easily. (If
you feel like a learning exercise, look into using the module HTML::Parser
to parse the input as HTML and return the plain text contents.)
We call the MySQL function CURRENT_TIMESTAMP() to provide a suitable
parameter for the creation_time field (of type datetime), and we set
comment_id to NULL, which forces MySQL to insert a sequential,
auto-incremented number into that field. (We don't use this here, but
per-article unique reference numbers are a vital prerequisite for any
full-blown conferencing system that supports threading, and it's nice
to be future-proof.)
print_gb() is the most complex subroutine of the bunch. First, it
generates a SQL SELECT statement to retrieve records from the gb
table. It calls DATE_FORMAT() to turn creation_time into something
more human-friendly, and it only retrieves records where 'visible'
has a value of 1, and where the difference in days between NOW() and
the value of creation_time is less than $MAIN::max_age -- so we don't
display really ancient comments unless you want to set $MAIN::max_age
to a large number of days.
MySQL does the heavy lifting of retrieving the right records and
sorting them in descending date order (most recent postings go at the
top). It uses the $MAIN::max_items variable (by default, the 50 most
recent), to limit the number of items we print, stopping a heavily-used
guestbook from spawning megabytes of HTML. Once we call prepare() to
prepare an SQL query, and execute() to run it, we process each record
returned from the database by calling fetchrow_hashref() on the
statement handle returned by prepare().
For a given query, fetchrow_hashref() returns the next row of the
results, in the form of a hashref. The keys to the hashref are the
field names of the table resulting from the SQL SELECT statement. For
example, we can do this:
my $sth = $dbh->prepare("SELECT author, creation_time FROM gb")
while (my $ref = $sth->fetchrow_hashref()) {
print "Author: ", $ref->{author}, "\n",
"created: ", $ref->{creation_time}, "\n";
}
All we do in print_gb() is wrap CGI.pm's ability to generate HTML
procedurally (using Perl subroutines) around a loop of this type, so
that instead of printing text, we produce a nice HTML table.
This isn't all we can do with the guestbook script. The print_gb() routine
in this version uses a fairly simple SQL query that, stripped of the Perl
surrounding it, looks like this:
SELECT comment_id, author, contents, DATE_FORMAT(creation_time, '%W %M %Y')
FROM guestbook
WHERE
visible=1 AND (TO_DAYS(NOW()) - TO_DAYS(creation_time) <= 14)
ORDER BY comment_id DESC\n
What this means is: we want to retrieve four fields (comment_id, author,
contents, and a neatly formatted version of the date -- which is stored
internally as a machine-readable datestamp) from the guestbook table. The
only records we want are those where the "visible" flag has a value of
"1" and the record is less than 14 days old. (TO_DAYS() is an internal
MySQL function that takes a timestamp -- such as the one stored in the
database or the one returned by the MySQL NOW() function -- and returns
the integer number of days in it.) The retrieved records are then ordered
in descending order of the comment_id field (an integer assigned to each
record as it is created).
If we have a busy guestbook, we may not want to retrieve hundreds of records.
We may instead prefer to display, for example, blocks of ten records with
"forward" and "back" buttons to navigate through the list. How can we modify
the gb.cgi script here to do this?
The simplest technique is to add a spare hidden field in the HTML produced
by print_gb(), containing the current offset into the list of records
in terms of the comment_id value of the first record -- CGI.pm lets us do
this by saying:
print $cgi->hidden(comment_id_offset, $starting_comment_id);
(Where $starting_comment_id is the comment_id of the first record
we've printed.) We can then add "forward" and "back" submit buttons by
attaching values to the submit button:
print $cgi->submit(-name => 'forward', -value => 'forward');
or:
print $cgi->submit(-name => 'back', -value => 'back');
print_gb() can detect which of the submit buttons was pressed in the
usual way, by checking the CGI parameter's value:
if ($cgi->param("forward") eq "forward") {
# a forward button was pressed
} elsif ($cgi->param("back") eq "back") {
# a back button was pressed
} else {
# this is the first invocation of print_gb() code -- neither
# button was pressed
}
Now we need to modify the SQL executed by print_gb() accordingly. Suppose
we want to view pages of $page_length records in the guestbook. If
we have no previous invocation of print_gb(), we set $starting_comment_id
to 0. If a forward button was pressed, we add $page_length_records to
$starting_comment_id, and if a back button was pressed, we subtract
$page_length_records from $starting_comment_id (but make sure that the
number never goes below zero!).
Now our SQL select statement in print_gb() needs to look like this:
SELECT comment_id, author, contents, DATE_FORMAT(creation_time, '%W %M %Y')
FROM guestbook
WHERE
visible=1 AND
(TO_DAYS(NOW()) - TO_DAYS(creation_time) <= 14) AND
comment_id >= $starting_comment_id AND
comment_id <= ($starting_comment_id + $page_length_records)
ORDER BY comment_id DESC\n
Feel like modifying some code?
Note that while this is the simplest way of making our guest book
navigable, it's not necessarily the best. Each time you hit the forward
or back button the CGI script executes a SQL statement. A heavily used
system that doesn't absolutely require database consistency may cache
the output from SQL queries in something like a DBM file, and return a
cookie to the user containing the key to the DBM file, so that instead
of needing a SQL SELECT statement to generate the next pages it can
simply yank the frozen results of an earlier SELECT out of a cache.
But that's getting beyond the scope of a simple guestbook script.
END (Main text)
BOXOUT: MySQL and the guestbook database
MySQL, from TcX DataKonsult AB of Sweden, is an open source database.
It provides a lightweight and nearly complete SQL92 implementation,
while being small enough to pick up relatively fast. (Unlike, say,
Oracle.) You can find it at http://sunsite.org.uk/packages/mysql/.
MySQL is described in detail in the Reference Manual that comes with
the package. It's online at http://www.mysql.com/Manual/manual_toc.html .
There is also a book on the subject: "MySQL & mSQL", by Randy Jay Yarger,
George Reese & Tim King (pub. O'Reilly and Associates, ISBN 1-56592-434-7).
This column is way too short to explain how to install MySQL in
detail; you will need to look at the directions in the manual.
However, to run this guestbook script we need to do a couple of
things:
*** BEGIN NUMBERED LIST
0. Install MySQL and start the server running (of course!).
1. As root, create a new database called 'guestbook':
mysqladmin create guestbook
2. Create a new MySQL user called 'gb'. This user can connect only
from localhost, using a password (set up as 'gb_password' in the
gb.cgi script -- you'll need to change this), but has root
privileges on the 'guestbook' database.
As root, run 'mysql mysql' (i.e. fire up the MySQL query monitor,
running on the 'mysql' database which stores MySQL's local
settings).
Type:
insert into mysql.user values ('localhost','gb',PASSWORD('gb_password'),
'y','y','y','y','y','y','y',
'y','y','y','y','y','y','y');
This sets up a new user called 'gb' with password 'gb_password'.
(You can dink with these, but if you do you'll need to modify the
contents of the hashref $dbconf in gb.cgi -- otherwise the Perl
CGI program won't be able to connect to the database using this
username and password.)
Once you've set up the user called 'gb', you need to quit the mysql
query monitor and tell the database server to reload its' user
privileges:
mysqladmin flush-privileges
3. Now it's time to set up the 'guestbook' database. Actually, it's
not a very complex database -- one table with five fields. You can
do it like this:
mysql -u gb -pgb_password guestbook < gb.sql
(Note that we can now do this as user 'gb' by specifying gb's
password). gb.sql is simply a text file containing our database's
structure:
CREATE TABLE gb (
author varchar(128),
contents text,
creation_time datetime,
visible tinyint(4),
comment_id int(10) unsigned DEFAULT '0' NOT NULL auto_increment,
KEY comment_id (comment_id)
);
4. We should now have a database called 'guestbook', that contains a
single table called 'gb', with the structure defined above. We
should also have a user called 'gb' who, when connecting from
localhost, has full privileges on the 'guestbook' database (but
no others).
The final step is to put the gb.cgi script somewhere appropriate
(your cgi-bin directory is a good start!), make it executable (if
necessary editing the #! line), and point a web browser at it.
*** END NUMBERED LIST
END (BOXOUT)
BOXOUT: The Perl DBI Interface
A Perl program calls:
use DBI;
to load the DBI module. Before this will work, you need to install
the latest version of DBI (available from CPAN -- http://www.perl.com/CPAN/)
and an appropriate DBD drivers for relational databases you're going
to use (also available from CPAN).
A DBI session typically starts by calling the connect() method with a
bunch of parameters which vary from database to database (the details are
in the documentation for each DBD driver). This tells DBI to load the
correct DBD driver, establish a connection to the RDBMS engine, and
return a database handle. The database handle is actually a DBI
object, and provides methods for talking to the database we're
connected to. For example:
my $dbh = DBI->connect("DBI:mysql:database=guestbook;host=localhost",
"guestbookuser", "guestbookpassword",
{ 'RaiseError' => 1 }
);
This calls the DBI->connect() method and tells it to return a handle
connected to the MySQL database server running on localhost, talking
to the database called 'guestbook', signed on as 'guestbookuser' using
the password 'guestbookpassword'.
Once we have a handle, we can feed SQL commands to the database:
$dbh->do("UPDATE gb SET visible=0 WHERE comment_id=42");
(This carries out the UPDATE statement immediately.)
Or we can prepare a SELECT statement, then fetch each line of the resulting
table (in this case, fetching the line as an array of fields):
my $sth = $dbh->prepare("SELECT author, creation_time FROM gb")
|| die $DBI::errstr;
while (@row = $sth->fetchrow_array() ) {
print "Author: $row[0]\n",
"Date: $row[1]\n\n";
}
(Note the "die $DBI::errstr" bit; if the prepare() call fails, it
returns non-zero -- we therefore die(), emitting the value of
$DBI::errstr, which is hopefully an informative error message.)
At the end of a session, we call disconnect() to shut down the RDBMS
connection:
$dbh->disconnect();
This doesn't even scratch the surface of what you can do with DBI! For
the full in-depth tour, you really need to read "Programming the Perl
DBI" by Alligator Descartes & Tim Bunce (pub. O'Reilly and Associates,
ISBN 1-56592-699-4). Highly recommended -- Alligator and Tim invented
DBI and wrote more than half the DBD drivers, so they know what
they're talking about!
END BOXOUT
BOXOUT: How not to design a database
Many books have been written about the design of relational databases;
if you're serious about it, you could do worse than start with "An
Introduction to Database Systems" by C. J. Date (pub. Addison-Wesley).
For the guestbook, we don't need anything fancy. Users submit comments;
each of these has some associated information -- a flag to say if it is
visible, a timestamp, the author's name, and an autoincremented
sequence number that we assign to it automatically. We don't have a
user registration system, so we don't keep user information in a
separate table; we just take whatever name they give us. A single
table is in fact all we need for an unstructured guestbook -- we don't
have to start worrying about entity relationship diagrams unless we
want to extend the application to cope with user accounts, threading,
and the like.
Our table looks like this:
CREATE TABLE gb (
author varchar(128),
contents text,
creation_time datetime,
visible tinyint(4),
comment_id int(10) unsigned DEFAULT '0' NOT NULL auto_increment,
KEY comment_id (comment_id)
);
In case you haven't met all these data types, a varchar is a
variable-length character string (here maxed out at 128 characters). A
text field is also variable length, but able to hold much larger data
items (albeit with constraints on indexing and searching). A datetime
field is our timestamp, and we use a tinyint (eight-bit signed
integer) as a flag to say whether or not this record is visible ('1'
means it will show up in guestbook listings; any other value means it
won't). The comment_id field is more complex. We specify it as an
unigned long integer, defaulting to a value of '0'. It can't be a
NULL, and it auto-increments -- if we try and assign a NULL to it,
MySQL will replace the NULL value with the next sequential number
available in ascending order. It's designated as a key field, although
we don't use it as such here: this is a constraint on MySQL's
auto_increment fields.
A more interesting guestbook will extend this basic data structure. In
particular, it'd be nice to allow users to add titles to their postings,
and to refer (by message ID) to previous postings. It would also be nice
to be able to print the discussions in a structured, neatly indented,
manner -- providing for hierarchical, threaded communications. We could
accomodate this by adding a "reference" field, of type int(10); this
can contain the comment_id of a parent article, if the current one is
a follow-up to an earlier posting. However, before we can print such a
threaded discussion we need to know how to build tree-like data structures
using Perl ...
END BOXOUT
BEGIN LISTING: gb.cgi
001: #!/usr/bin/perl
002:
003: #------------------- standard CGI skeleton ----------------
004:
005: use strict;
006: use CGI qw(shortcuts font table td TR);
007: use DBI;
008:
009: # configuration bits
010:
011: $MAIN::max_age = 14; # oldest record, in days, to display
012: $MAIN::max_items = 50; # maximum number of records to display
013:
014: my $owner_name = "Charlie Stross";
015:
016: my $dbconf = {
017: "host" => "localhost",
018: "dbname" => "guestbook",
019: "username" => "gb",
020: "password" => "gb_password",
021: "table" => "gb",
022: "fields" => [qw(comment_id author
023: contents creation_time visible)],
024: };
025:
026: # end of configuration bits
027:
028: my ($q) = new CGI;
029:
030: my $connectstr = "DBI:mysql:database=" .
031: $dbconf->{dbname} .
032: ";host=" .
033: $dbconf->{host} ;
034:
035: my ($d) = DBI->connect($connectstr,
036: $dbconf->{username},
037: $dbconf->{password},
038: {'RaiseError' => 1});
039:
040: print $q->header(-type => 'text/html',
041: -status => '200 OK');
042: print $q->start_html(-title => "$owner_name\'s guest book" ,
043: -BGCOLOR => '#FFFFA0');
044: print $q->h1("$owner_name\'s guest book");
045: print $q->hr();
046: print $q->start_blockquote();
047: if ($q->param('data') == 1) {
048: insert_gb ($q, $dbconf, $d);
049: print_gb ($q, $dbconf, $d);
050: } else {
051: print_gb ($q, $dbconf, $d);
052: print_form($q);
053: }
054:
055: print $q->end_blockquote();
056: print $q->hr();
057: print $q->start_font({'size' => '-2'});
058: print $q->p("This page generated for host ", $q->remote_host(),
059: " (using ", $q->user_agent(), ") on ", scalar(localtime(time)));
060: print $q->end_font();
061:
062: print $q->end_html;
063: $d->disconnect();
064: exit 0;
065:
066: #------------------- support routines ----------------------
067:
068: sub print_form {
069: # print guestbook entry submission form. This includes a hidden
070: # field called 'data' that is set to '1', which tells the next
071: # invocation of the CGI program that it is processing a
072: # submission.
073:
074: my $q = shift; # CGI query object to use for building form
075:
076: print $q->startform;
077: print $q->p("");
078: print $q->h2("Have your say");
079: print $q->hr(), "\n";
080: print $q->table( {"border" => "1",
081: "bgcolor" => "#FFA0FF",
082: },
083: $q->hidden( -name => "data", -value => "1"),
084: TR(
085: td("From:"),
086: td( $q->textfield("author"),
087: $q->i("Your name/email address goes here") )
088: ),
089: TR(
090: td("Message:"),
091: td( $q->textarea(-rows => "6",
092: -cols => "60",
093: -wrap => "physical",
094: -name => "contents")
095: )
096: ),
097: TR(
098: td("Send comment:"),
099: td( $q->submit(-name => "send") )
100: )
101: );
102: print $q->hr();
103: print $q->endform;
104: return;
105: }
106:
107: sub insert_gb {
108: # process a new guestbook submission. Strip anything suspicious
109: # (i.e. raw binaries) from the contents field, and insert into
110: # database.
111: my $cgi = shift; # perl CGI object
112: my $dbconf = shift; # hash of configuration info
113: my $db = shift; # perl DBI database handle
114:
115: # we want to sanitise the contents of 'author' and 'contents' here.
116: my $buffer = sanitise($cgi->param('contents'));
117: my $query = "INSERT INTO " . $dbconf->{table} . " " .
118: "(author, contents, creation_time, visible, comment_id)\n" .
119: "VALUES (" .
120: $db->quote($cgi->param('author')) . ", " .
121: $buffer . ", " .
122: "CURRENT_TIMESTAMP()" . ", " .
123: "1" . ", " .
124: "NULL" .
125: ")\n";
126: $db->do($query) or print "\n", $db->errstr();
127: return;
128: }
129:
130: sub print_gb {
131: # print guestbook contents We retrieve all
132: # records from the database that are not flagged with visible<>1,
133: # and which were submitted in the preceeding $MAIN::max_age
134: # days, and we order them in reverse order of date (to a maximum
135: # of $MAIN::max_items entries).
136:
137: my $cgi = shift; # CGI object
138: my $dbconf = shift; # hash of configuration info
139: my $db = shift; # perl DBI database handle
140: my $count = 0; # count of retrieved rows from database
141:
142: print $cgi->h1("Things people said");
143:
144: print $cgi->start_table( {"border" => "1",
145: "bgcolor" => "#FFA0FF"} );
146:
147: my $query = "SELECT comment_id, author, contents, " .
148: "DATE_FORMAT(creation_time, '%W %M %Y') " .
149: "FROM " .
150: $dbconf->{table} .
151: " WHERE " .
152: "visible=1 AND " .
153: "(TO_DAYS(NOW()) - TO_DAYS(creation_time) <= $MAIN::max_age)" .
154: " ORDER BY comment_id DESC\n";
155:
156: my $sth = $db->prepare($query);
157: $sth->execute();
158: while (my $ref = $sth->fetchrow_hashref() and
159: $count <= $MAIN::max_items) {
160: $count++;
161: print $cgi->TR(
162: td("From:"),
163: td( $ref->{author}),
164: ),
165: $cgi->TR(
166: td("Date:"),
167: td( $ref->{creation_time}),
168: ),
169: $cgi->TR(td("Comment:"),
170: td(
171: $cgi->pre(
172: $ref->{contents}
173: )
174: )
175: );
176: }
177: if ($count == 0) {
178: print $cgi->TR(
179: td("Sorry, this guestbook is empty!")
180: );
181: }
182: print $cgi->end_table;
183: print $cgi->hr();
184: return;
185: }
186:
187: sub sanitise {
188: # sanitise the contents of someone's posting.
189: #
190: # A *really smart* guestbook would be smart enough to keep
191: # sane/legal HTML while ditching the dubious stuff. But this
192: # program is here to demonstrate MySQL and DBI, not whizzy CGI
193: # scripting.
194: #
195: # Currently all we do is replace carriage returns with
tags,
196: # and trust that the user will only enter ASCII text. Feel free
197: # to extend this subroutine to do something more useful!
198: #
199:
200: my $buffer = shift;
201: $buffer =~ s/</g;
202: $buffer =~ s/>/>/g;
203: $buffer = join("
", grep(/^.+$/, split(/[\r\n]/, $buffer)));
204: $buffer =~ s/'/\\'/g;
205: $buffer = "'" . $buffer . "'";
206: return $buffer;
207:
208: }
END LISTING (gb.cgi)