Behind the scripts

Behind the scripts

As promised, I’m going to go through how the CGI scripts work. This is not a Perl how-to, or even an example of good perl, but more a description of the process of writing any CGI script.

You can download these scripts from my script archive!

I’ve also fixed the guestbook on the site!

counter.cgi

#!/usr/local/bin/perl
use CGI;
use GD;
use GDBM_File;

The first line, called the shebang, specifies the interpreter to use for the file - this is a UNIX-like feature where any file can be ‘executable’ by specifying the interpreter to launch. In my case as I’m using FreeBSD the perl interpreter is in /usr/local/bin/perl. If you’re familiar with *NIX systems you might wonder why this isn’t /usr/bin/env perl, and it’s not because FreeBSD doesn’t have env. Scripts run in the context of the web server, and the web server does not have /usr/local/bin in its PATH environment variable.

Then we import a bunch of modules we’re going to use. Modules were almost always installed by the sysadmin and you dealt with what was available. However there were a bunch commonly installed, like CGI and GDBM. GD was less common, but was often available on dedicated CGI hosts.

$datafile = '/home/thea/web_data/count.db';
$font = '/usr/local/share/fonts/oldschool-pc-fonts/Ac437_IBM_BIOS.ttf';

$cgi = CGI->new;
$page = $cgi->param('page') or '';

Set up the font and the location of the data file, which must be web server writable or in a directory that is web server writable. It has to be an absolute path otherwise it’d be relative to the web server’s home, which on most systems is created read only.

Then initialise CGI. Perl allows you to omit parenthesis in some cases so CGI->new is actually a method call to create a CGI object. Also grab the page name out of the query string using the new CGI object. Perl actually has both || and or as boolean or operators, they differ only in precedence. This can be handy sometimes, and make things more confusing for the next person.

$db = tie %hash, 'GDBM_File', $datafile, GDBM_WRCREAT, 0640
    or die "$GDBM_File::gdbm_errno";

$count = $hash{$page} or 0;
$count++;
$hash{$page} = $count;
$db->sync;
untie %hash;
$db->close;

This is where the counting happens. tie is a perl built-in that creates a hash (aka hashmap or dictionary in other languages) backed by some sort of store. In this case GDBM_File, which uses the GNU implementation of the DBM (DataBase Manager) key-value store. This is a fast store that uses file locks to synchronise access, so we pass GDBM_WRCREAT as the access mode - for Write, Read, Create if not found, in mode 0640, or user read/write, group read, everyone no access.

You may notice that hash is first declared as %hash, then later used as $hash. Unlike other languages like Javascript and PHP Perl does not just use $ to mean variable, there are five possible “sigils” to use the perl term. The ones I’ve used are $ for scalar variables, @ for array variables, and % for hash variables. The sigil indicates how to access the variable, not what type of variable it is. As seen here if you’re talking about the whole hash it’s %hash, but if you’re trying to get a single item from the hash it’s $hash. Sigils are one of the things that make perl very confusing.

Then we update the count, sync the DB, then untie and close it so another request can be processed.

$im = GD::Image->new(100,28);
$im->colorAllocate(0, 0, 0);
$white = $im->colorAllocate(255, 255, 255);
$im->stringFT($white, $font, 12, 0, 10, 20, sprintf('%06d', $count));

Next draw the count graphic using GD. GD is a graphics drawing library with bindings for multiple languages. Here we create a 100x28 pixel image, set up a black and white colour then draw the counter. sprintf is similar to the C function and here it’s used to output the text with six digits, padded with leading zeros.

The image background is the first colour allocated, so there’s no need to draw the black background.

print $cgi->header(
	-type => 'image/gif',
	-cache_control => 'no-store'
);
print $im->gif;

Finally print the CGI headers and then the GIF itself. The header call syntax is a way to provide a hash argument, it’s a bit weird but that’s Perl for you.

Discussion

I made a few choices here, firstly using DBM for storing the counts. It was common to use a comma or tab delimited file instead, or use a single file per counter. GDBM takes care of concurrency for me, so I used that.

Next in choosing GIF. GD supports many output formats, including modern ones. but I used GIF for old time’s sake. Technically back in 1999 it’d be more likely to use PNG as the patents on GIF caused support to be removed from a lot of libraries, including GD where GIF support wasn’t restored until 2004. However a lot of people used old versions of GD with GIF support remaining.

It could do more, like assign cookies and only count each person once per day, as I’m not in the EU I don’t need to worry about permission, or track data server side to try to reduce duplicate counts, but then the number would be lower! GDBM is a key-value store so it’s easy enough to add, iterate, and look up entries, but it only supports storing scalar values so we can’t store an array or hash without serialising it manually.

But there’s a web counter in under 30 lines of code!

guestbook.cgi

#!/usr/local/bin/perl
use strict;
use CGI;
use POSIX;
use Fcntl qw(:flock);
use HTML::Escape qw/escape_html/;

shebang and module imports. Here we use strict, which will throw an error if we use certain Perl constructs that are common pitfalls, and yes this is where 'use strict'; in Javascript comes from! I didn’t use this in the counter script.

HTML::Escape is a modern module but it isn’t 1999 now so I’m going to escape HTML.

my $url = '/~thea/guestbook.html';
my $file = '/home/thea/public_html/guestbook.html';
my $cgi = CGI->new;

Set up some variables, like file paths and the URL of the guestbook file. And here we see the first strict change - the my keyword. my puts the variable in lexical scope, without my the variable will be in global scope no matter where it’s declared. This is exactly like the behaviour of Javascript in non strict mode with var.

if ($cgi->request_method eq "POST") {
	my $name = escape_html $cgi->param("name");
	my $message = escape_html $cgi->param("message");
	my $website = $cgi->param('website') or '';

If the HTTP method is POST then we grab the parameters. param will fetch from either query string or post body, which is normally considered bad practice today. Additionally we run escape_html on the name and message fields, again omitting the parenthesis from the function call.

	$message =~ s/\r?\n/<br>\n/g;
	$website =~ s/[^a-zA-Z0-9\/\\:\?=& _\.-]//g;
	$website = substr($website, 0, 50);

Next we do two things, replace new lines with HTML line breaks, and remove unknown characters from the website field. I could use escape_html here, but I though I’d show a more 90s method. This demonstrates Perl’s =~ operator, which sets the var to the result of the regexp substitution on the right.

	my $time = strftime "%a %e %b %I:%M%p %Z", localtime time;

Grab the time, for the timestamp. This is the POSIX strftime function, so the formatting operators depend on your operating system.

	open(FH, '+<', $file) or die 'Cannot open guestbook file';
	flock(FH, LOCK_EX) or die 'Cannot lock file';

Open the guestbook file and take an exclusive lock on it. A wiser solution would to loop this in case of concurrency issues, but is that really going to happen?

File handles in Perl are a special type. They can be assigned to regular with my $fh, but that’s not required. Just remember that FH is in global scope.

	my @content = <FH>;
	seek $fh, 0, 0;
	truncate $fh, 0;

Read the file in to @content. This is the first time we see a Perl array, and this construct reads the whole file line-by-line in to @content. Then we seek to the start of the file and truncate it.

	for (@content) {
		if (/<!--NEXT_ENTRY-->/) {

Next loop over the file until we find the next entry marker. You may wonder where the loop variable is, and notice there’s no variable specified in the if. Perl has an implicit variable - $_ - which is automatically used in many functions if you don’t specify a variable. So in this case $_ is the loop variable and the if automatically test against this. $_ is addressable as a regular variable as well.

			print FH "<p>$message</p>\n";
			if ($website) {
				print FH "<p><a href=\"$website\">$name</a> at $time</p>\n";
			} else {
				print FH "<p>$name at $time</p>\n";
			}
			print FH "<hr>\n\n"

This just prints the new entry in to the guestbook

		}
		print FH;
	}

	close FH;

And finally print the current line with the implied $_ argument, the close the file handle. print to a filehandle doesn’t automatically use $_ if you use a variable-bound filehandle.

}

print $cgi->header( -status => 302,  -location => $url );
print "Redirecting...";

Finally redirect back to the guestbook page! Under 50 lines of code!

Discussion

This method of directly modifying the file was quite common. Given most of the time you couldn’t access the server other than via FTP this allowed you to moderate the guestbook by downloading, editing, and re-uploading.

It was also pretty normal to ask for and publish an email address, but then spam stopped this practice.

So what next?

Well this was my foray in to CGI scripts! Next up some other 90s classics are up, like a webring!

I find it interesting how Javascript and Perl both had global by default variables and use strict to prevent those footguns, but somehow Javascript ended up with multiple null types (Perl doesn’t have null, only undef for undefined) and automatic semicolon insertion. Perl has many more features to make incomprehensible code though.

Comments

Post a comment