Showing posts with label python. Show all posts

Monday, November 15, 2010

A Year With Mark @ DevHub

Note: verb tenses might be a little out of whack, because it's kind of strange to write this the day before my last day at DevHub.

Other Note: Obviously, this is my personal blog and in no way is indicative of the opinion of either company.

TL;DR

I've been working on the DevHub platform for just over a year, and I've decided that it's time for me to move on. I'll be working for an educational startup, Dreambox Learning, which creates web-based math software. Specifically, I'll be working on their marketing website (the one that I linked to). I might have more time to work on other projects, but we'll see.

Why now?

Honestly, I expected to be working at EVO Media Group for at least another year, building up my "professional experience" so that I wouldn't have to go through the hell that was my last job-seeking "adventure". Additionally, I loved the work that I did - I was writing challenging code in my favorite programming language, Python, and it was being used by thousands upon thousands of people every day, on hundreds of thousands of sites. It also helps that I enjoyed working with my co-workers, even during extreme crunch time (I'll get to that later).

But there are a few things which this new opportunity will give me, in no particular order (sorry, I rather like bulleted lists):

  • The chance to work at a company where one of the primary focuses is social change. This has been one of my career goals for a while now (along with, "working for a company that primarily creates open source software" - we'll see when that one is checked off). I recently tweeted that "[t]he Seattle school district averages for 10th grade math/science proficiencies (2009-10) are <50%." As a person who enjoyed both of those subjects in school, I would love to help fix that problem. Granted, the company doesn't do high school math curricula, but giving younger students a firmer grasp of basic math concepts will surely help.
  • The opportunity to work in a larger company. Dreambox is several times larger than my current place of work, and working with more (and different types of) people is always a good learning experience, and will help me, career-wise.
  • An excuse to learn Ruby. Sadly, there aren't that many Python jobs around (though strangely, I've been getting cold-called by recruiters on a much more frequent basis lately), and it doesn't hurt to be more versatile. Particularly when I refuse to work with Java servlets, and to a lesser extent, the .NET Framework. I've also been avoiding PHP work, now that I know how wonderful Django is.
  • I won't have to do customer technical support anymore. Not that I absolutely hate doing it - I voluntarily did it for the Avant Window Navigator project for years. I like helping people, I just don't like being strongly encouraged to do so, every single day. Speaking of Awn…
  • I'll (probably) have more time and energy for side projects. So much of both of those things was taken up by work, especially during the Month of Hell™, when we were working nonstop on creating the gamified version of the site editor (and I slept in the office for a week). I've been told that the likelihood that I'll be pulling an all-nighter at my new gig is low - we'll see. But I really, really want to get back to having side projects, and possibly resuming work on Awn and related technology. At my current job, I've mainly been really worried about burnout, which was a strong factor in me putting off working on other coding projects. I really love to code, and it would be terrible if I just started hating it. (On a somewhat related note, one of the metrics for whether I should start looking for a job is when my life starts sounding like the first verse of Jonathan Coulton's "Code Monkey". Not that I currently feel like that about DevHub.)

Hopefully, the reasoning above shows that I have put some thought into whether I should change jobs, unlike what certain people (whom I will not name) have insinuated.

What did I do at DevHub?

I've been relatively quiet about what I've worked on at DevHub. You can see bits and pieces of it via Twitter and LinkedIn (not to mention BitBucket and GitHub), but I wanted to give an overall view of what I did, without violating NDAs or anything like that.

My primary focus was the application layer. As the DevHub developers page says, it's a Django-based environment. Interestingly enough, when I applied for the job, I didn't know Django at all. I was aware that it existed, and I had tried learning Pylons a few months prior (that ended badly). I did, however, know WSGI fairly well, as my URL shortener uses it. So, dealing with Python and the web wasn't a completely foreign concept to me. I would say that it's a testament to how awesome Django is, that I was able to pick it up and port the simple to-do web application that I was writing in PHP (using Doctrine as the ORM) in under a day. Of course, as soon as I was hired, I was made aware that certain major components of Django (the ORM and template systems) weren't being used, but SQLAlchemy and Jinja2 were. Which is another good thing about Django - it may be heavily opinionated, but it's not necessarily "my way or the highway".

This particular aspect is important, mostly because about ten months later, I was given the task to write a "macroframework" around this particular combination of technologies, using all of the best practices that we had accumulated since I was hired. I genuinely hope that it gets open sourced, because it's a fairly complete framework - it ports many popular Django apps, and as a good Django-based package would be, it has a lot of unit tests and documentation. In the process of writing it, I've also contributed fixes to the apps that I've ported, when I've seen areas which need improvement.

There's one other library that I wrote, which I hope will be open sourced. It's essentially a domain name parser. It can tell you whether a given domain name is syntactically valid, and provide relevant and proper concatenations of the constituent parts, such as the subdomain and the domain. It also handles IDNs just fine. It's a bit domain-specific (no pun intended), but works well, mostly due to the amount of unit/doc/regression tests I've written for it.

In late January, I was assigned the task of porting the DevHub platform from PHP to Python (one of the reasons I was hired was that I knew both languages fairly well). And that began a six-month journey, along with my co-workers (who included the other, more senior developer and two recently hired designers), where we would be working incredibly long hours to get the new-and-improved DevHub launched in early July.

I had been doing some experimenting with PyPy, because of its sandboxing capabilities. Unfortunately, due to several factors, it was deemed infeasible to use. Since then, however, I have been keeping tabs on its development to see if any of said factors have been eliminated. Regardless of that setback, by March I had made a reasonable amount of progress on the port, and the unpaid overtime began. (Yay for exempt status¡)

In the process of porting the platform, I had to deal with a number of third-party APIs, because one of DevHub's features is that it supports a number of third-party services by default (as opposed to having to add HTML embed code given by the third party). The quality of these APIs ranged from half-decent to just plain terrible. Mind you, I've worked with other APIs prior to DevHub (in fact, I won a t-shirt in an API contest), but they were at least decently documented and the structure made some sense. It's amazing how little thought some of these API providers give to their users.

In May, a few things happened: The platform port was essentially complete, our hosting provider took forever to move our server instances cross-country, and it was decided that the site editor needed to be gamified. I wrote a small prototype to see how that would work. Eventually, it was decided that most of that would be scrapped and that we would be using the BigDoor API. We were already partners, so it seemed like a natural fit.

June was the aforementioned "Month of Hell™". At one point, I was at the office for 14 days straight. At the end, I began my week of sleeping at the office (AKA, "The Week of Utter Hell™"). Quite possibly, the one good thing that came out of that experience, on a personal level, was that I was given my current phone, a Motorola Droid, as recognition of how much time I spent at the office. (My boss had gone to Google I/O and had gotten one for "free", and was/is an iPhone user and thus on AT&T, so it wasn't much use to him.)

By the launch in July, I was as close to burned out as I ever wanted to be. Fortunately, I had made sure that I got a week of vacation in mid-July (where I would be going to OSCON, independent of the company, and also taking in some of the sights of Portland). By the time I was back to work, some people had noticed that I was a significantly different person (i.e., not ridiculously stressed out). I don't really want to think about what would've happened if I hadn't taken that trip at that time.

Relative to the previous couple of months, August was pretty calm. We (the company) did play a game of dodgeball with a company that we were going to partner with. For me, that just indicated that I was really out of shape. I immediately began jogging when the CEO insinuated that there may be more of these games. (To date, there hasn't been another one.)

September was pretty awesome, mostly because I was fortunate enough to go to DjangoCon. (The company paid for most of it, as part of an agreement during The Month of Hell™.) I talked to some fellow web developers, plus sat in on some pretty interesting talks. I really wish that I could have stayed for all three days, but alas. One interesting thing came out of the experience. One of the technologies that people were consistently touting as a must-use package was celery, a distributed task queue. About a week after DjangoCon, we had a big problem with a long-running task during the request process. I remembered celery, and in under a week, I experimented with it on the development server, documented the process to install the subsystem (for the benefit of our sysadmin), helped my co-worker patch the task to use celery, tested the patch, and deployed it to the live servers.

October was the month where I was both working on a client project and dealing with the decision of whether to change jobs, so I've covered most of that already. One thing that I think is worth mentioning is that I started to use code from the HTML5 Boilerplate project. I liked it so much, I'm using it in my current side project, the recently resurrected to-do app. And I plan on using it in the next job, too.

The End

And here we are, in the "present". I know it's a bit cliché, but I'd like to publicly thank the execs at EVO Media Group for hiring me 13 months ago. I really, really appreciate the amount of confidence that you have in my work, and I hope that DevHub becomes even more popular and awesome than it is now.

Friday, May 15, 2009

Rshrtnr: The private URI shortener

Over the past couple of days, I've implemented a private URI shortener service for myself, which I have named "Rshrtnr". The derivation of the name is left as an exercise for the reader.

My main motivation for writing it was a criticism of public URI shortening services that I have been seeing in blogs for a long time: if the service has some downtime or suddenly disappears, all of the links that you have created with it are useless. With my approach, I regain some control of where my shortened links point to, and if the service has downtime and/or disappears, I have more options for restoring it.

The code itself is written in Python. SLOCCount says that the core module runs at around 100 SLOC. Most of my time, however, was taken up by working around problems relating to my webhost's Python installation. The supported Python version is 2.4.x, which is ridiculously old (for reference, Gentoo was the last major Linux distribution to switch from Python 2.4 to 2.5, around July 2008). Additionally, for some reason, if I attempt to change the sys.path variable (i.e., the "include path") to use locally installed modules (I am on a shared host), the entire script breaks with zero logged messages anywhere. It runs fine via the command line, but in FastCGI mode, the strangeness occurs.

The two third-party modules that I used were Paste and mysql-python. I store the URIs and their associated aliases in a simple SQL table, and I use Paste for various WSGI/HTTP-related utilities. I "manually" handle routing via parsing the PATH_INFO environment variable.
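As a rough illustration of "manual" routing on PATH_INFO, here is a minimal sketch that uses only the standard library. It is not the Rshrtnr code: the ALIASES dictionary is a hypothetical stand-in for the SQL table, the choice of a 302 redirect is an assumption, and it is written for current Python rather than 2.4.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory stand-in for the SQL table of aliases.
ALIASES = {"blog": "http://example.com/blog"}


def application(environ, start_response):
    """Redirect /<alias> to its stored URI, or answer 404."""
    # PATH_INFO holds the part of the request path after the script name.
    alias = environ.get("PATH_INFO", "").strip("/")
    target = ALIASES.get(alias)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown alias\n"]
    start_response("302 Found", [("Location", target)])
    return [b""]


def call(path):
    """Invoke the app for one path without starting a server."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(application(environ, start_response))
    return captured["status"], captured["headers"], body
```

The call() helper exists only to exercise the app in-process; a real deployment would hand application to a FastCGI or WSGI server instead.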

There are two ways to specify an alias: either explicitly send a custom one as a query parameter with the URI, or let the app make a random one for you. With the latter behavior, it hashes the URI to generate an eight character "unique" alias. Since there are (in theory) 64^8 possibilities, I don't think I'll run out of aliases any time soon, especially since custom aliases can be anywhere from 1 to 15 characters long.
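One way to get an eight-character alias drawn from a 64-character alphabet is to hash the URI and keep the start of the URL-safe Base64 encoding of the digest. The sketch below assumes SHA-1 and Base64; the post does not say which hash or encoding Rshrtnr actually uses, and make_alias is a made-up name.

```python
import base64
import hashlib

ALIAS_LENGTH = 8  # matches the eight-character aliases described above


def make_alias(uri, length=ALIAS_LENGTH):
    """Derive a short, deterministic alias from a URI."""
    digest = hashlib.sha1(uri.encode("utf-8")).digest()
    # URL-safe Base64 draws from 64 characters (A-Z, a-z, 0-9, '-', '_'),
    # so an 8-character prefix has 64**8 possible values.
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")
    return encoded[:length]
```

Because the alias is a truncated hash, two different URIs could in principle map to the same alias, so a real service would still need to check the table before inserting.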

In my opinion, the most interesting feature is that adding URIs requires one to send an OpenPGP-encoded query string, which needs a public key recognized by the app for the operation to succeed. To write this, I simply parsed the output from sending the OpenPGP message to the gpg binary.
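To give a feel for what parsing gpg output can look like, the sketch below reads the machine-readable status lines that gpg emits when run with --status-fd, and checks whether a good signature came from a known key. Several things here are assumptions rather than details from Rshrtnr: that the message is signed, that status lines (not the human-readable output) are parsed, and the function names and the sample key ID, which are invented for illustration.

```python
STATUS_PREFIX = "[GNUPG:] "


def good_signature_key_ids(status_output):
    """Return the key IDs from every GOODSIG status line."""
    key_ids = []
    for line in status_output.splitlines():
        if not line.startswith(STATUS_PREFIX):
            continue
        # Status lines look like: "[GNUPG:] GOODSIG <key id> <user name>"
        fields = line[len(STATUS_PREFIX):].split(" ", 2)
        if len(fields) >= 2 and fields[0] == "GOODSIG":
            key_ids.append(fields[1])
    return key_ids


def is_authorized(status_output, trusted_key_ids):
    """True if any good signature was made by a trusted key."""
    return any(key_id in trusted_key_ids
               for key_id in good_signature_key_ids(status_output))


# Made-up sample of status output; the key ID is not a real key.
SAMPLE_STATUS = (
    "[GNUPG:] SIG_ID abc123 2009-05-15 1242345600\n"
    "[GNUPG:] GOODSIG 0123456789ABCDEF Example User <user@example.com>\n"
)
```

With the sample above, is_authorized(SAMPLE_STATUS, {"0123456789ABCDEF"}) is true, while an empty trusted set rejects the request.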

Finally, mod_rewrite magic is used to prettify the shortened URIs. Nothing too exciting about that part.

I had thought about hosting a version of Rshrtnr on Google App Engine, but a key component is missing - OpenPGP support.

If anyone wants me to release it, please comment below. There's currently a bunch of webhost-specific things that I would need to abstract out before I release the code to the general public, and unless someone gives me a very good reason, it will be licensed under the AGPL version 3.

Thursday, May 07, 2009

On Bindings

One of the more interesting areas in software development, to me at least, is language bindings. Being able to use a library written in one language from another language is kind of satisfying, as it allows me to develop without having to reinvent the wheel. There are two specific projects that I use and work on so that I can enhance the software that I develop: GObject Introspection (G-I) and python-spidermonkey.

GObject Introspection

As a quick overview, the goal of this project is to give C libraries the tools to provide enough metadata about their API so that bindings can be written with minimal effort. Given the time and effort that I have put into maintaining the Awn bindings, it is not very surprising that I would be willing to help get this framework working for Awn. My ultimate goal is to eliminate the bindings/python folder in the Awn source tree. It is basically a mixture of a Scheme definition file plus a very bizarrely formatted "override" file for custom definitions, all integrated into autotools to produce a C library that is ready to be dynamically loaded into Python via import. To meet this goal, I am contributing to the PyBank project, which is a prototype Python module that interfaces with the GObject Introspection library to read compiled library metadata files (called "typelibs") on the fly so that classes, functions, etc. can be loaded and called at runtime. Besides me, a Google Summer of Code student and a Sugar Labs developer are also working on the module, with Johan Dahlin overseeing it all. So far, I've contributed a unit test suite, ported from the gjs project (JavaScript bindings for GLib-based libraries based on the Spidermonkey VM), and working type bindings for various simple types (e.g., int64 and float).

I have also put some coding effort toward G-I integration in Vala. Vala supports G-I by both reading GIR files (the XML serialization of G-I metadata) to produce VAPI files (short for Vala API files), and writing GIR files when producing a library written in Vala (e.g., libdesktop-agnostic). I have contributed mostly what amounts to workarounds in the GIR reading code, with regard to Vala/G-I behavioral inconsistencies. Didier 'Ptitjes' has done much, much more solid work than I have on both fronts, which I greatly appreciate.

python-spidermonkey

This project, as the README states, lets you "[execute] arbitrary JavaScript code from Python[, and allows] you to reference arbitrary Python objects and functions in the JavaScript VM." As I've stated in an earlier blog post, I use this in my custom website build system to both validate and pack my JavaScript code, via JSLint and Packer, respectively. Since I published that post almost two years ago, that project was revived twice - once by a Mozilla employee (and co-founder of Humanized, which is quite awesome) named Atul Varma, and the latest incarnation is on GitHub. Since it is based on the original implementation in C, and not the Python-based ctypes version, the Base2 recursion problem does not exist, and so I have happily written modules and scripts which wrap the two JavaScript utilities. Recently, I have made them available in a public project on Launchpad called python-jsutils. I haven't really announced it until now because it currently relies on a change I made to python-spidermonkey which allows one to iterate over a JavaScript array, instead of having to write "unpythonic" code like "for x in range(0, len(foo)): #...". While it is in my fork, it has not been merged to the "official" repository.

Monday, February 02, 2009

Old Projects: pytiger

I mentioned on Twitter that I wanted to blog about some old projects. The first one is more than a year old, but it still may be useful to someone.

Back during college, I was implementing a file sharing client with Twisted - the GUI was first written using wxPython, and then rewritten in PyGTK (the reasons for the rewrite I can expound upon later, if anyone wants to know). It's not publicly released, mostly because it never got past being a chat client. Anyway, a part of the file sharing protocol involved tiger tree hashes to verify the data as it was downloaded. I couldn't find an implementation in Python (and attempts to write it myself failed), so I found some C source code and manually bound it to Python (via its C extension API). This was my first exposure to extending Python via C — I later used some of this knowledge to fix up the Python bindings for Awn.

The Python part is licensed under the Apache License v2; the tiger (tree) code is public domain-ish (see COPYING for details).

The code can (as of the publication date) be found in a junk bzr branch on Launchpad. If you wish to continue work on it, I can move it to a full-fledged project — just contact me about it via the comments.

Edit (2009/03/24): Due to interest in the code, I've created a project for pytiger at Launchpad.

Sunday, November 23, 2008

HOWTO Run an OpenID-authenticated WSGI Application (with AuthKit)

According to Blogger, this is going to be post #100. I have no idea if that counts the various dead drafts in my queue or not.

Anyway, if you've been following my Twitter stream, you'll know that I've been playing with Pylons, and by extension, WSGI. One of the things that I'm interested in is OpenID-only authentication, mostly because I hate having to create new account names/passwords everywhere, and I'm too lazy/paranoid to use one of those password management extensions. After several attempts, here is a short Python script which runs a sample web app that requires OpenID authentication for the /private path (via the AuthKit middleware). The OpenID URL that was used to sign in is stored in the environ['REMOTE_USER'] variable. It was tested with AuthKit 0.4.2, Beaker 1.0.3, and Paste 1.7.2.


#!/usr/bin/env python
#
# Copyright (C) 2008  Mark Lee
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# For a copy of the GNU General Public License, see
# <http://www.gnu.org/licenses/>.

import os
from beaker.middleware import SessionMiddleware
from paste.auth.auth_tkt import AuthTKTMiddleware
from authkit.authenticate import middleware, sample_app
from paste.httpserver import serve

app = middleware(sample_app,
                 enable=True,
                 setup_method='openid',
                 openid_store_type='file',
                 openid_store_config=os.getcwd(),
                 openid_path_signedin='/private')

app = AuthTKTMiddleware(SessionMiddleware(app),
                        'some auth ticket secret')
serve(app) # opens a socket at localhost:8080

Wednesday, September 20, 2006

Re: Do We Need New Software?

"I've gotten the chance to talk to a lot of people about these issues, and with the exception of those who are very close to the current software, opinion is almost unanimous: the Wikipedia software needs to be rewritten from scratch in Python. (Yes, everyone really did say Python.) Rewrites of large software projects aren't taken lightly, but from everything I've seen this is one of the rare cases that it's actually necessary."

This made me laugh. It makes me wonder how this will play in non-Python communities. Somehow, I doubt the rewrite will happen. I took a quick look at the SVN repository, and it all looks very muddled to me. I had to guess as to where the main source code was, based on the version timestamps.

With regard to the series from which this article comes, I find its articles very thought-provoking. It will be interesting to see if Mr. Swartz ends up on the board. I'd vote, but I'm just a typo finder (i.e., I don't have 400 edits).