Friday, May 15, 2009

Rshrtnr: The private URI shortener

Over the past couple of days, I've implemented a private URI shortener service for myself, which I have named "Rshrtnr". The derivation of the name is left as an exercise for the reader.

My main motivation for writing it was a criticism of public URI shortening services that I have been seeing in blogs for a long time: if the service has some downtime or suddenly disappears, all of the links that you have created with it are useless. With my approach, I regain some control over where my shortened links point, and if the service goes down or disappears, I have more options for restoring it.

The code itself is written in Python. SLOCCount says that the core module runs at around 100 SLOC. Most of my time, however, was taken up by working around problems with my webhost's Python installation. The supported Python version is 2.4.x, which is ridiculously old (for reference, Gentoo was the last major Linux distribution to switch from Python 2.4 to 2.5, around July 2008). Additionally, for some reason, if I attempt to change the sys.path variable (i.e., the "include path") to use locally installed modules (I am on a shared host), the entire script breaks with zero logged messages anywhere. It runs fine from the command line; the strangeness occurs only in FastCGI mode.

The two third-party modules that I used were Paste and mysql-python. I store the URIs and their associated aliases in a simple SQL table, and I use Paste for various WSGI/HTTP-related utilities. I "manually" handle routing via parsing the PATH_INFO environment variable.
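For the curious, "manual" routing on PATH_INFO can be sketched roughly as follows. This is not the Rshrtnr code: the make_app helper is a hypothetical name, and a plain dict stands in for the SQL table so that the example is self-contained.

```python
def make_app(aliases):
    """Build a WSGI app that redirects /<alias> to its stored URI.

    `aliases` is a dict standing in for the SQL table (an assumption
    made for this sketch; the real service queries MySQL).
    """
    def app(environ, start_response):
        # PATH_INFO looks like "/abc123"; strip the slashes to get the alias.
        alias = environ.get("PATH_INFO", "").strip("/")
        target = aliases.get(alias)
        if target is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Unknown alias\n"]
        # Known alias: send the client on to the full URI.
        start_response("302 Found", [("Location", target)])
        return [b""]
    return app
```

Any WSGI server can host the returned callable; wsgiref.simple_server from the standard library is enough for local testing.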

There are two ways to specify an alias: either explicitly send a custom one as a query parameter with the URI, or let the app make a random one for you. In the latter case, it hashes the URI to generate an eight-character "unique" alias. Since there are (in theory) 64^8 possibilities, I don't think I'll run out of aliases any time soon, especially since custom aliases can be anywhere from 1 to 15 characters long.
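As an illustration of the hashing idea, here is a minimal sketch. It makes two assumptions, since the post names neither the hash function nor the alphabet: SHA-256 for the hash, and URL-safe base64 for the 64-character alphabet. The make_alias name is hypothetical, and the code targets a current Python rather than 2.4.

```python
import base64
import hashlib

ALIAS_LENGTH = 8  # eight characters, as described above


def make_alias(uri, length=ALIAS_LENGTH):
    """Derive a short alias from a URI by hashing it.

    Assumptions for this sketch: SHA-256 as the hash and URL-safe
    base64 (A-Z, a-z, 0-9, '-', '_') as the 64-character alphabet.
    """
    digest = hashlib.sha256(uri.encode("utf-8")).digest()
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")
    # The padding character '=' only appears at the very end of the
    # encoded string, so the first eight characters never contain it.
    return encoded[:length]
```

With a 64-character alphabet, eight characters give 64^8 = 2^48 (about 2.8 x 10^14) possible aliases. Two different URIs can still hash to the same alias, which is why "unique" is in quotes; a real service would need to check for an existing row before inserting.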

In my opinion, the most interesting feature is that adding URIs requires one to send an OpenPGP-encoded query string, which needs a public key recognized by the app for the operation to succeed. To write this, I simply parsed the output from sending the OpenPGP message to the gpg binary.
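A rough sketch of the parsing side follows. It assumes the message is signed and that gpg is run with its --status-fd option, which emits machine-readable lines prefixed with "[GNUPG:]"; the post does not say which part of gpg's output is actually parsed. The function names and the idea of a trusted fingerprint list are hypothetical.

```python
STATUS_PREFIX = "[GNUPG:] "


def signer_fingerprints(status_output):
    """Collect fingerprints from VALIDSIG lines in gpg status output.

    `status_output` is the text gpg writes to the descriptor given via
    --status-fd; capturing it is left out of this sketch.
    """
    fingerprints = []
    for line in status_output.splitlines():
        if not line.startswith(STATUS_PREFIX):
            continue
        fields = line[len(STATUS_PREFIX):].split()
        # A VALIDSIG line carries the signing key's fingerprint as its
        # first argument.
        if len(fields) >= 2 and fields[0] == "VALIDSIG":
            fingerprints.append(fields[1].upper())
    return fingerprints


def is_authorized(status_output, trusted_fingerprints):
    """Return True if any valid signature comes from a trusted key."""
    trusted = {fpr.upper() for fpr in trusted_fingerprints}
    return any(fpr in trusted for fpr in signer_fingerprints(status_output))
```

The app would accept the "add URI" request only when is_authorized returns True, and then read the query string from the verified message body.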

Finally, mod_rewrite magic is used to prettify the shortened URIs. Nothing too exciting about that part.

I had thought about hosting a version of Rshrtnr on Google App Engine, but a key component is missing: OpenPGP support.

If anyone wants me to release it, please comment below. There's currently a bunch of webhost-specific things that I would need to abstract out before I release the code to the general public, and unless someone gives me a very good reason, it will be licensed under the AGPL version 3.

Thursday, May 07, 2009

On Bindings

One of the more interesting areas in software development, to me at least, is language bindings. Being able to use a library written in one language from another language is satisfying, as it lets me develop without having to reinvent the wheel. There are two specific projects that I use and work on so that I can enhance the software that I develop: GObject Introspection (G-I) and python-spidermonkey.

GObject Introspection

As a quick overview, the goal of this project is to give C libraries the tools to provide enough metadata about their API so that bindings can be written with minimal effort. Given the time and effort that I have put into maintaining the Awn bindings, it is not very surprising that I would be willing to help get this framework working for Awn. My ultimate goal is to eliminate the bindings/python folder in the Awn source tree. That folder is basically a mixture of a Scheme definition file plus a very bizarrely formatted "override" file for custom definitions, all integrated into autotools to produce a C library that is ready to be dynamically loaded into Python via import.

To meet this goal, I am contributing to the PyBank project, which is a prototype Python module that interfaces with the GObject Introspection library to read compiled library metadata files (called "typelibs") on the fly so that classes, functions, etc. can be loaded and called at runtime. Besides me, a Google Summer of Code student and a Sugar Labs developer are also working on the module, with Johan Dahlin overseeing it all. So far, I've contributed a unit test suite, ported from the gjs project (JavaScript bindings for GLib-based libraries, built on the SpiderMonkey VM), and working type bindings for various simple types (e.g., int64 and float).

I have also put some coding effort toward G-I integration in Vala. Vala supports G-I both by reading GIR files (the XML serialization of G-I metadata) to produce VAPI files (short for Vala API files), and by writing GIR files when producing a library written in Vala (e.g., libdesktop-agnostic). My contributions mostly amount to workarounds in the GIR reading code for behavioral inconsistencies between Vala and G-I. Didier 'Ptitjes' has done much, much more solid work than I have on both fronts, which I greatly appreciate.

python-spidermonkey

This project, as the README states, lets you "[execute] arbitrary JavaScript code from Python[, and allows] you to reference arbitrary Python objects and functions in the JavaScript VM." As I've stated in an earlier blog post, I use this in my custom website build system to both validate and pack my JavaScript code, via JSLint and Packer, respectively. Since I published that post almost two years ago, the project has been revived twice: once by a Mozilla employee (and co-founder of Humanized, which is quite awesome) named Atul Varma, and most recently in an incarnation hosted on GitHub. Since the latest version is based on the original implementation in C, and not the Python-based ctypes version, the Base2 recursion problem does not exist, and so I have happily written modules and scripts which wrap the two JavaScript utilities. Recently, I have made them available in a public project on Launchpad called python-jsutils. I haven't really announced it until now because it currently relies on a change I made to python-spidermonkey which allows one to iterate over a JavaScript array directly, instead of having to write "unpythonic" code like "for x in range(0, len(foo)): # ...". While the change is in my fork, it has not been merged into the "official" repository.