CGI Scripting




One of the commonest tasks new Perl programmers seem to get up to is writing a search script for a website.

This is a somewhat tricky job. There are several different ways to shoot yourself in the foot: poor someone@somewhere posted on comp.lang.perl.misc asking for help with a script that was broken in several different ways. Here's a detailed examination of what went wrong, illustrating the classic mistakes, and showing how the job really needs to be handled.


From: charles@fma.com (Charlie Stross)
Newsgroups: comp.lang.perl.misc
Subject: Re: help with grep in perl
References:  <01bb6462.d8bf2a00$d0762e99@wally.us.telekurs>
Reply-To: charles@fma.com
Message-Id: <slrn4t7mdl.2nl.charles@fma10.fma.com>
X-Newsreader: slrn (0.8.8.1 UNIX)
Date: 28 Jun 96 14:17:03 GMT
Lines: 150
Path: news.fma.com!charles

"Name Withheld"<someone@somewhere> wrote
(in article <01bb6462.d8bf2a00$d0762e99@wally.us.telekurs>):
>
>I have seen similar messages but can't seem to get this to work right.
>In the following script, if $query is more than a single word, grep
>returns nothing or doesn't work.
>Also, if grep returns nothing, my $look = "not found!\n";  line doesn't
>work either.
>How can this be changed to allow a multiple word search?
>Or is there another search method which allows and/or operators?
>Much thanks in advance...


In perl, it has been said, There Is More Than One Way To Do It (whatever "It" is).

You have chosen a way of doing It which, in this case, sort of works but is far from optimal.

>#!/usr/local/bin/perl
>print "Content-type:text/html\n\n";
>$query = $ENV{'QUERY_STRING'};
Firstly, this is pretty broken. It assumes that everybody who uses it is going to query your script through the form you wrote, which presumably says something like <FORM ACTION="/cgi-bin/foo.pl" METHOD="GET">.

This is a lousy assumption. What if J. Random Webhacker decides to poke around your site and writes their own invocation of your script, using METHOD="POST"? With POST the form data arrives on the script's standard input and QUERY_STRING is left empty, so you're going to miss it.
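To make the GET/POST difference concrete, here is a minimal sketch of fetching the raw form data by hand. It assumes the standard CGI environment variables REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH; the helper name raw_form_data is made up for illustration, and the string it returns is still URL-encoded.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return the raw (still URL-encoded) form data,
# wherever the browser chose to put it.
sub raw_form_data {
    my $method = uc($ENV{'REQUEST_METHOD'} || 'GET');
    if ($method eq 'POST') {
        # POST data arrives on STDIN; CONTENT_LENGTH says how many bytes
        my $len  = $ENV{'CONTENT_LENGTH'} || 0;
        my $data = '';
        read(STDIN, $data, $len) if $len > 0;
        return $data;
    }
    # GET data arrives in the QUERY_STRING environment variable
    return defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '';
}
```

This is only meant to show where the data lives; a library that already handles both cases is the saner choice for real scripts.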

Next:

>$grep="/usr/bin/grep";
>($field_name, $key) = split (/=/, $query);

BadBadBad!

Parameters passed via the QUERY_STRING environment variable are escaped, in the form percent-sign<hexadecimal_number>, where the hex number is the ordinal position of the encoded character in the ISO8859/1 (Latin-1) codeset. For example, spaces are encoded as %20.

So suppose someone puts

"foo bar"

into your form.

You're going to end up with a query string containing

foo%20bar

and when you do this split() and pass it to grep you're going to end up searching for 'foo%20bar'. Which is not what you intended.
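A rough sketch of the unescaping step, for illustration only. The helper names url_decode and parse_query are invented here. Note that browsers submitting a form typically send a space as '+' rather than %20, so the sketch handles both; it also keeps only the last value if a field name is repeated.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: decode one URL-encoded name or value.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;                              # '+' stands for a space
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # %XX -> that character
    return $text;
}

# Hypothetical helper: split a query string into a hash of decoded fields.
sub parse_query {
    my ($query) = @_;
    my %field;
    foreach my $pair (split(/[&;]/, $query)) {
        my ($name, $value) = split(/=/, $pair, 2);
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        $field{url_decode($name)} = url_decode($value);
    }
    return %field;
}

my %field = parse_query('target=foo%20bar&mode=any+word');
print "$field{'target'}\n";   # foo bar
print "$field{'mode'}\n";     # any word
```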

>$result = `$grep -i $key /WWW/usmc81/allguests.html`;
This is an even more ghastly offense. Suppose J. Random Cracker types the following query into your search form:
foo ; cd / ; rm -rf *; #
What do you suppose is going to happen?

Actually, nothing -- because you aren't unescaping the query string elements, so what comes through will look like foo%20;%20cd%20/%20;%20rm%20-rf%20*;%20#, which is pretty meaningless. But if you do fix the query-string escaping problem, you've then got to grapple with the security implications of letting users enter text which is passed to an external program and executed on your server(!).
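If you really must call an external grep, one common way to take the shell out of the picture is to fork and use the list form of exec(), so the user's text is handed to grep as a single argument and never parsed as a command line. The sketch below assumes a grep binary can be found on the PATH; the helper name grep_file is made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: run grep on $file for $key without a shell.
# Assumes a grep binary can be found on the PATH.
sub grep_file {
    my ($key, $file) = @_;
    my $pid = open(GREP, "-|");     # fork; the child's STDOUT is piped to us
    die "Cannot fork: $!\n" unless defined $pid;
    if ($pid == 0) {
        # Child: list-form exec passes each argument to grep verbatim,
        # so ';', '*' and friends are never seen by a shell.
        # '-e' stops a key beginning with '-' being read as an option.
        exec('grep', '-i', '-e', $key, $file)
            or die "Cannot exec grep: $!\n";
    }
    my @lines = <GREP>;             # parent: collect the matching lines
    close(GREP);
    return @lines;
}
```

With this in place, the hostile query above is simply a pattern that matches nothing. The key is still interpreted as a grep regular expression, so this removes the command-injection problem but not every surprise.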

Plus, the overheads of executing an external program are non-trivial, when perl has its own built-in grep() function.
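For comparison, a tiny sketch of the built-in grep() doing the same job on an in-memory list. The sample lines and the search key are invented for the example.

```perl
#!/usr/bin/perl -w
use strict;

# Sample data standing in for the lines of the guestbook file
my @lines = ("Foo was here\n", "Nothing to see\n", "foo bar baz\n");

my $key = 'foo';

# grep() returns the elements for which the expression is true;
# \Q...\E escapes regexp metacharacters in $key, /i ignores case
my @hits = grep(/\Q$key\E/i, @lines);

print @hits;   # prints the first and third sample lines
```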

>unless ($result <0)   {
>$result = "not found!\n";
>}
>print <<EndOfHtml;
><html><head><title>Search Results</title></head>
><body>
><h2>Search Results</h2>
><hr>
>EndOfHtml
>    ;
>
>print "<pre>";
>print $result;
>print "</pre>";
Let's be sensible about this. What you want to do is to grep through a single file (/WWW/usmc81/allguests.html) for one or more strings, right?

First, don't reinvent the wheel; use Perl 5.003 and CGI.pm, the CGI module.

What you want should look a bit like this, in object-oriented perl 5. (Apologies for any trivial errors -- I wrote this on the fly, cannibalizing gratuitously from a working script, and haven't had time to test it.)

#!/usr/bin/perl5.003 -w 

use Config;
use CGI;
use CGI::Carp;

my ($file) = '/WWW/usmc81/allguests.html';

my($q) = new CGI;    #create a new CGI object
$| = 1;              # set STDOUT to unbuffered

print $q->header( -type => 'text/html',
                  -status => '200 OK',
                  ); # send the HTTP header (and the blank line that ends
                     # it) -- anything we output from here on is HTML

                     # start_html() returns the opening HTML as a string; we print it
print $q->start_html( -title=>'Search Guestbook',
                      -author=>'me@home.com',
                      -BGCOLOR=>'#F0F0F0' );

print $q->h1('Search Guestbook');
print $q->h2('This is how you insert text in an &lt;H2&gt; tag in your form');
print $q->p('This is how you insert text into  the CGI output');
print $q->hr; # This is how you insert a horizontal rule in the CGI output

# now we're ready to run. We've created a CGI object and given it some
# HTML headers to print. If we've been called with a parameter called 'target'
# we can invoke a search routine on target's contents; otherwise, we need
# to print a form

$q->param('target') ? search($file, $q) : printform();

print $q->end_html; # print the closing HTML tags returned by end_html()

exit 0;            

sub printform {
    # This routine is triggered if we've been invoked without a
    # query parameter called 'target'

    print $q->startform(-method => 'get',    # print a form
                        -action => $q->url); # action is this very CGI script
    print $q->p("Enter the text to search for:");
    print $q->textfield(-name => 'target',
                        -size => '50',
                        -maxlength => '80');
    print $q->submit();                     # add a submit button
    print $q->reset();                      # add a reset button
    print $q->endform;                      # end the form
    return;
}

sub search {
   my ($target); # this will contain our search string
   my ($f) = shift @_;
   my ($q) = shift @_;
   # CGI::Carp has replaced die(), so the message lands in the server log
   open (F, "<$f") || die "Failed to open $f for searching: $!\n";
   my (@srch_array) = (<F>);
   close F;

   # @srch_array now contains the contents of /WWW/usmc81/allguests.html
   # Now we want to get the 'target' parameter. A text field arrives as a
   # single string, so we split it into words on whitespace, escape any
   # regexp metacharacters in each word, put each word in brackets,
   # and use the '|' alternation regexp to indicate a logical-OR search

   my (@targets) = map { quotemeta($_) } split(' ', scalar($q->param('target')));
   if (@targets == 1) {        # if we've only got an array with one element
      $target = $targets[0];   # stick its content in $target
   } else {
      $target = "(" . join(")|(", @targets) . ")"; # logical-OR regexp
   }
   # now we use perl's built-in grep() on @srch_array, matching each line
   # against the regexp in $target (case-insensitively, like grep -i)

   my (@results) = grep(/$target/i, @srch_array);
   if (@results) {                   # If we found anything at all
       foreach (@results) {          # loop on the results,
           print $q->p("Found: $_"); # printing them
       }
   } else {
       print $q->p('No matches found');
   }
   return; 
}
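
The script above does a logical-OR search. Since the original question also asked about "and", here is a hedged sketch of a logical-AND variant: a line counts only if it contains every word. The helper name search_all and the sample data are invented for illustration; it is not part of the script above.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return the lines that contain every word in $query.
sub search_all {
    my ($query, @lines) = @_;
    my @words = map { quotemeta($_) } split(' ', $query);
    return () unless @words;             # nothing to search for
    foreach my $word (@words) {
        # each pass keeps only the lines that also match this word
        @lines = grep(/$word/i, @lines);
    }
    return @lines;
}

my @guests = ("Foo and Bar were here\n", "Only foo\n", "Only bar\n");
print search_all('foo bar', @guests);    # Foo and Bar were here
```

Filtering once per word keeps the regexps simple, at the cost of one pass over the surviving lines for each word.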

