hex2env / hex2char

I don't know about you, but I like prototyping ideas and layouts of CGI-generated pages using the Bourne (or Bourne Again) shell. If you do too, you will no doubt have found that decoding CGI-encoded parameters (my.cgi?name=Steve%20Parker) is a pain, and takes an excessive amount of sed. So here are a couple of utilities, which I wrote quickly and badly in C, to take away the pain.

These are hex2char.c and hex2env.c. Both convert those nasty hex-coded CGI strings (%20 or + for a space, %3A for a colon, and so on) into the characters they really should be. This is mainly useful for parsing form input, but it has other uses too.
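To make the decoding rule concrete, here is a minimal sketch in C. It is not the code from hex2char.c; the function names cgi_decode and hex_value are made up for illustration, and it assumes an ASCII character set and a destination buffer at least as large as the input.

```c
#include <ctype.h>
#include <stddef.h>

/* Convert one hex digit to its value; the caller has already
   checked it with isxdigit(). Assumes ASCII. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Hypothetical helper: decode src into dst.
   dst must hold at least strlen(src) + 1 bytes, because the decoded
   text is never longer than the encoded text.
   Returns the number of characters written (not counting the NUL). */
size_t cgi_decode(const char *src, char *dst)
{
    size_t n = 0;

    while (*src) {
        if (*src == '+') {                       /* + means space */
            dst[n++] = ' ';
            src++;
        } else if (*src == '%'
                   && isxdigit((unsigned char)src[1])
                   && isxdigit((unsigned char)src[2])) {
            /* %XX: two hex digits give the character code */
            dst[n++] = (char)(hex_value(src[1]) * 16 + hex_value(src[2]));
            src += 3;
        } else {
            dst[n++] = *src++;   /* ordinary character, or a malformed % */
        }
    }
    dst[n] = '\0';
    return n;
}
```

A malformed escape (a % that is not followed by two hex digits) is copied through unchanged here, which is only one of several reasonable choices. Note also that %00 decodes to a NUL byte, so anything treating the result as a C string will see it end early.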

Either of these could easily be adapted to simplify your C-language CGI program, but I wrote them so that shell scripts could sort out CGI encoding. They do the same job in slightly different ways. Here's all the documentation you'll get (apart from the code itself):

If my CGI script is going to be called with the URL:

http://steve-parker.org/this/is/a/demo/my.cgi?name=Steve+Parker&a=b+%26c&f=red

then it must translate that into:

name="Steve Parker" a="b &c" f="red"
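The translation above can be sketched as a single pass over the query string. This is an illustration of the idea, not the actual hex2env.c: the names query_to_assignments and put are invented here, the output goes to a caller-supplied buffer rather than stdout so that it is easy to check, and one assignment is written per line.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Append one character to out if room remains; one byte is always
   kept back for the terminating NUL. */
static void put(char *out, size_t cap, size_t *len, char c)
{
    if (*len + 1 < cap)
        out[(*len)++] = c;
}

/* Hypothetical helper: turn "a=1&b=2" into the two lines a="1" and
   b="2" in out, decoding + and %XX on the way.
   Output that does not fit in cap bytes is silently truncated. */
void query_to_assignments(const char *q, char *out, size_t cap)
{
    size_t len = 0;
    int in_value = 0;       /* have we passed the '=' of this pair? */

    if (cap == 0)
        return;
    for (; *q; q++) {
        if (*q == '&') {                        /* end of one pair */
            if (in_value)
                put(out, cap, &len, '"');
            put(out, cap, &len, '\n');
            in_value = 0;
        } else if (*q == '=' && !in_value) {    /* name ends, value starts */
            put(out, cap, &len, '=');
            put(out, cap, &len, '"');
            in_value = 1;
        } else if (*q == '+') {
            put(out, cap, &len, ' ');
        } else if (*q == '%'
                   && isxdigit((unsigned char)q[1])
                   && isxdigit((unsigned char)q[2])) {
            char hex[3] = { q[1], q[2], '\0' };
            put(out, cap, &len, (char)strtol(hex, NULL, 16));
            q += 2;         /* the loop's q++ steps past the last digit */
        } else {
            put(out, cap, &len, *q);
        }
    }
    if (in_value)
        put(out, cap, &len, '"');
    put(out, cap, &len, '\n');
    out[len] = '\0';
}
```

Decoded values are copied through untouched in this sketch, so a value containing a double quote, a dollar sign or a backtick would still be interpreted by the shell when the output is sourced. A pair with no = in it comes out as a bare word, which the shell would try to run as a command.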

We all use CGI like this every day, and take it for granted. However, whilst the C code to cope with this (as you'll see from the code here) is simple, translating this lot in a shell script is much more complicated. The reason I like to prototype (and sometimes keep) CGI in shell script format is that I find it much simpler for straightforward tasks, and for some more complex ones. For example, a simple text database using cut is much quicker to write as a shell script than it is in C. It's just getting the data that's difficult.

My answer to this problem is: cheat.

So here's an example shell script to take this data:


#!/bin/sh

TMPFILE=/tmp/hex2env-demo.$$

echo "Content-type: text/html"
echo
echo "<html>"
echo "<head>"
echo "<title>"
echo "An example use of hex2env"
echo "</title>"
echo "</head>"
echo "<body>"
echo "<h1>This is your input...</h1>"
read x
./hex2env "${x}" > "${TMPFILE}"
. "${TMPFILE}"
rm -f "${TMPFILE}"
echo "The received string is: <pre>"
echo "${x}"
echo "</pre>"
echo "Your name is: <B>${name}</B> <BR>"
echo "Your comment is: <B>${comments}</B> <BR>"
echo "<a href=\"/c/hex2X.shtml/\">"
echo "That's all folks! <BR>"
echo "</a>"
echo "</body>"
echo "</html>"

Try it out for yourself. I've added my standard headers and footers to the output, just to keep the formatting in line with the rest of the site; yours won't have the navigation bars and so on. That's easily done if the site is designed with this kind of thing in mind: just cat the header and footer files, and create the HTML in between.

hex2char.c works in a very similar way: it sorts out the main CGI encoding, but it doesn't remove the &'s between the variables, and it doesn't add a newline at the end. So the difference is:


$ ./hex2env "name=Steve+Parker&comments=Hello+There%21"
name="Steve Parker"
comments="Hello There!"
$ ./hex2char "name=Steve+Parker&comments=Hello+There%21"
name=Steve Parker&comments=Hello There!$
$

This could be useful; at least you don't have the security issues of temporary files. hex2env grew out of hex2char, so hex2char is the shorter of the two. (It doesn't compile to a much smaller binary under Linux, though: hex2char is 12116 bytes, and hex2env is 12547.)

So that's what it is, and that's how it works. It seems trivial if you try it with the comments suggested above, so try to fool it! Come up with weird characters and see if it gets them right, try HTML coding (such as "steve <i>sucks</i> at CGI"), and so on. It's not 100% foolproof, but it's pretty good. Also, when writing CGI to parse user-entered data, consider what the security implications are.
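One security implication is worth spelling out: if decoded form data is sourced into a shell, the shell will interpret anything special in it. Here is a hedged sketch of one way to guard against that. It is not what hex2env itself does; the helpers is_safe_name and shell_quote are hypothetical. The idea is to accept only names that look like plain shell variable names, and to wrap each value in single quotes, writing any embedded single quote as '\''.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: accept a name only if it looks like a plain
   shell variable name (a letter or underscore, then letters, digits
   or underscores). */
int is_safe_name(const char *name)
{
    if (!(isalpha((unsigned char)*name) || *name == '_'))
        return 0;
    for (name++; *name; name++)
        if (!(isalnum((unsigned char)*name) || *name == '_'))
            return 0;
    return 1;
}

/* Hypothetical helper: wrap value in single quotes, writing each
   embedded ' as the four characters '\'' .
   Returns 0 on success, -1 if out (cap bytes) is too small. */
int shell_quote(const char *value, char *out, size_t cap)
{
    size_t len = 0;

    if (cap < 3)                    /* need room for '' and the NUL */
        return -1;
    out[len++] = '\'';
    for (; *value; value++) {
        size_t need = (*value == '\'') ? 4 : 1;

        /* +2 leaves room for the closing quote and the NUL */
        if (len + need + 2 > cap)
            return -1;
        if (*value == '\'') {
            out[len++] = '\'';      /* close the quoted string    */
            out[len++] = '\\';      /* an escaped literal quote   */
            out[len++] = '\'';
            out[len++] = '\'';      /* reopen the quoted string   */
        } else {
            out[len++] = *value;
        }
    }
    out[len++] = '\'';
    out[len] = '\0';
    return 0;
}
```

Inside single quotes the shell leaves $, backticks and double quotes alone, so a value such as $(reboot) stays as literal text. A syntactically valid name can still do damage (a form field called PATH or IFS, say), so adding a fixed prefix to every name is another precaution to consider.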

I wrote this code on Linux 2.2 and 2.4, and it runs on steve-parker.org under Solaris 7, so I guess it should work on most things. In all cases I compiled with gcc. Your mileage may vary, but only if your stdio library differs significantly from the POSIX standard.