This blog runs on CGI

I’m a month late to this internet debate because life got in the way, but I finally got around to it after everyone’s forgotten!

This blog runs using CGI, and CGI is neither slow nor insecure. It might not be “webscale”, but you can do a lot of things without being webscale.

What is CGI anyway?

CGI, the Common Gateway Interface, is a specification for how a web server can start a process to serve a request. Like all traditional internet standards it’s specified in an RFC, RFC 3875 for those interested.

When a request is handled by CGI, the web server invokes the handler with the environment variables defined in the RFC and any request body supplied via stdin. It then reads a response from the process, formatted almost like an HTTP response, which the server parses and transforms into a real response. If the process exits without returning a valid CGI response you get the dreaded 500 Internal Server Error.
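The interface is simple enough that a complete CGI program fits in a few lines. Here’s a minimal sketch in Rust (a hypothetical example, not this blog’s actual source): it reads QUERY_STRING from the environment and writes a CGI response, headers first, then a blank line, then the body, to stdout.

```rust
use std::env;

// Build a minimal CGI response: headers, a blank line, then the body.
// The web server parses these headers and turns them into a real HTTP response.
fn cgi_response(body: &str) -> String {
    format!("Status: 200 OK\r\nContent-Type: text/plain\r\n\r\n{}", body)
}

fn main() {
    // RFC 3875 environment variables describe the request;
    // QUERY_STRING is everything after the `?` in the URL.
    let query = env::var("QUERY_STRING").unwrap_or_default();
    print!("{}", cgi_response(&format!("You asked for: {}\n", query)));
}
```

Because the whole interface is just environment variables plus stdin and stdout, you can run this from a shell with something like `QUERY_STRING=hello ./minimal.cgi` and read the raw response directly.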

Why did CGI fall out of favour?

The two commonly cited reasons are that it runs a process for every request, and that there were security issues.

But isn’t starting a process for every request slow?

No, not really. On my home server, a ten-year-old 3.5 GHz Intel i5, a plain file on disk takes 3.5 ms for the response to start and a CGI program that returns plain text takes 6 ms. That 6 ms includes starting the process, it returning the response, exiting, and the web server serving that response. CGI responses are typically fully buffered, so the response does not start until the process exits, though this can be configured in some servers.

CGI got the reputation for being slow from scripting languages like Perl and PHP, where every startup also parsed the script. A basic Perl CGI script can take 200 ms to start on my server. Back in the 90s and 2000s this was often the only choice if you couldn’t afford a dedicated server: many shared hosts did not allow you to run your own binaries, so you had to use the languages provided.

However my blog is written in Rust so startup time isn’t a big problem. The slowest part is connecting to the database.

So isn’t CGI insecure?

No more than any other web application framework. There were some issues with common CGI client libraries parsing the environment variables incorrectly, but there have been numerous HTTP header parsing vulnerabilities over the years in many other web application systems.

There is a unique aspect to CGI that made it insecure back in the 1990s though. On a multi-user system you could use the execution model of CGI to get access to files you weren’t supposed to.

On a POSIX system, when one process starts another, the child runs as the same user, unless the parent is running as root and can call setuid. So either your CGI application ran as the web server’s user (often www-data), or your web server had to run as root and call setuid between fork and exec, and determining the correct user to switch to can be a difficult task. There is another option, the setuid bit on the file, but that has its own history of problems and generally won’t work with the interpreted scripts CGI was normally used with.
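Rust’s standard library happens to expose exactly this fork/setuid/exec dance: `CommandExt::uid` arranges for setuid to be called in the child after fork and before exec. A sketch of how a root-running server might launch a handler as a specific user; the handler path and the uid/gid here are made up for illustration:

```rust
use std::os::unix::process::CommandExt;
use std::process::Command;

// Launch a CGI handler as a specific user. The uid/gid are applied in the
// child after fork() and before exec(), which only works if the parent is
// root (or the target ids already match the current user).
fn run_cgi_as(path: &str, uid: u32, gid: u32) -> std::io::Result<std::process::Output> {
    Command::new(path)
        .uid(uid)
        .gid(gid)
        .env("GATEWAY_INTERFACE", "CGI/1.1")
        .output()
}

fn main() {
    // Hypothetical handler path and ids; a real server would look these
    // up from its virtual host configuration.
    match run_cgi_as("/usr/lib/cgi-bin/app.cgi", 1000, 1000) {
        Ok(out) => print!("{}", String::from_utf8_lossy(&out.stdout)),
        Err(e) => eprintln!("failed to run handler: {}", e),
    }
}
```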

So your application ran as the web server’s user, which means it could read all the files the server process could. That doesn’t seem like a big issue, since those files will be served by the web server anyway, right?

Most applications have secrets of some sort, perhaps database passwords or third party service tokens (or admin credentials… but we don’t do that right?). When the CGI application is executed by the web server these stay secret.

Let’s assume you can’t trick the web server into returning the script as plain text, but your server is also used by dozens of other people. This was common in the 90s and 2000s as servers were expensive, so we shared.

If Eve creates read_any_file.cgi in her cgi-bin, that script can read any file on the server that the web server’s user can read. Eve can use it to fetch Alice’s application, and if that application is a script, as was typically the case, all of its source would be returned. Even if Alice kept her secrets in a separate file outside the web server’s CGI path, it would still have to be readable by the web server, and therefore by Eve.
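Here’s a sketch of what Eve’s hypothetical read_any_file.cgi could look like, in Rust for consistency (the real ones were usually a couple of lines of Perl or shell): it treats the query string as a path and dumps that file back.

```rust
use std::env;
use std::fs;

// Return a CGI response containing the contents of `path`, read with the
// web server user's permissions, which covers Alice's files too.
fn read_as_cgi(path: &str) -> String {
    match fs::read_to_string(path) {
        Ok(contents) => format!(
            "Status: 200 OK\r\nContent-Type: text/plain\r\n\r\n{}",
            contents
        ),
        Err(_) => String::from(
            "Status: 404 Not Found\r\nContent-Type: text/plain\r\n\r\nnot found\n",
        ),
    }
}

fn main() {
    // e.g. /~eve/cgi-bin/read_any_file.cgi?/home/alice/cgi-bin/app.pl
    let path = env::var("QUERY_STRING").unwrap_or_default();
    print!("{}", read_as_cgi(&path));
}
```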

This is also not a problem today. We have virtual machines everywhere, and truly multi-user systems are pretty rare.

So in today’s web CGI isn’t any more insecure than any other framework.

But it isn’t webscale!

And that’s fine. We don’t all need to be webscale. CGI can still get you quite far: process creation is cheap, and you don’t have to worry about memory or resource leaks!

Should I CGI?

Probably not. Most modern web servers don’t support CGI applications; you’ll need to run something like Apache httpd or Lighttpd. Nginx won’t help you.

However, function-as-a-service systems like AWS Lambda, Azure Functions, and whatever Google Cloud is calling it this month are effectively the same thing. The processes do live longer than one request, but they’re not all that different to write.

I like CGI because it’s so simple, and with a bit of knowledge you can debug the application without even running a web server. I’m also old enough to remember doing this when it was the only option.

Is this blog really using CGI?

Wellllll, yes and no. This blog is a static site, but the admin site is a CGI application that writes the HTML to disk, like Movable Type used to do. Could it generate pages dynamically? Yes, but I don’t want to deal with the rewrite rules to make the URLs nice.

The comment form is dynamic, as is the ActivityPub handler. Yes, you can follow this blog on the fediverse at @blog@thea.hutchings.gen.nz, and that’s all handled with CGI.

And yes it is written in Rust, purely because I can. Don’t use that as an example of how to write Rust.
