How To Write Code
That Doesn't Suck

wry observations from the deep end of the software cesspool

2012-07-02

automatic routing for RESTful webpy controllers

I've recently been working in Python again (after a decade long hiatus), including some rework of middle-tier REST(-ish) services built on webpy. This is a lean-and-clean web application framework that lends itself well to writing simple web services without all the cruft a "full-service" framework like Django brings with it. There was one particular annoyance for me in the service I was working on, which was that URL-to-controller routing was maintained separately from the controller classes themselves. Here's a stripped down example:

import web

class BaseController:
   # some cool stuff useful to all the other controllers

class RecordSearchController(BaseController):

   def GET(self, customer_id, record_code):
       """lookup records matching given code for given customer"""
       # lookup in database
       # return JSON

class AnnotateRecordController(BaseController):

   def POST(self, customer_id, record_code):
       """attach note (given in post body) to given record into database"""
       # parse form
       # write to database
       # return OK

... a couple hundred more lines of that sort of stuff ...

urls = (
    '/search/([0-9]+)/([A-Z0-9]+)/?', RecordSearchController,
    '/record/annotate/([0-9]+)/([A-Z0-9]+)/?', AnnotateRecordController,
    ...
)

web.application(urls, ...)

This structure meant if you added a new controller class you'd have to go add its URL somewhere else, and also made it (too) easy to have inconsistent URLs or ones that had no obvious relationship to the controller (which is confusing for someone like me to come in and do maintenance). Seemed to me there had to be a better way, and in fact web.py has a built in approach called auto_application() . Unfortunately it's got a couple (for me annoying) limitations, for one it uses the exact class name as a default path (for these controllers yielding paths like /RecordSearchController and changing the names of our controllers was a non-starter). It also doesn't seem to support RESTful parameters (the bits in parens in our urls above). There is also a slightly more flexible recipe for using metaclasses to do automatic URL registration. This also didn't quite meet my needs (e.g. it doesn't even try to automatically map controller names to urls) but it did set me in the generally correct direction. Here's an example of how things would look using this recipe:

urls = []

# metaclass definition cloned from recipe
class ActionMetaClass(type):
    def __init__(klass, name, bases, attrs):
        urls.append(attrs["url"])
        urls.append("%s.%s" % (klass.__module__, name))

class RecordSearchController(BaseController):
   __metaclass__ = ActionMetaClass
   url = "/search/([0-9]+)/([A-Z0-9]+)/?"

   def GET(self, customer_id, record_code):
       # ...

class AnnotateRecordController(BaseController):
   __metaclass__ = ActionMetaClass
   url = '/record/annotate/([0-9]+)/([A-Z0-9]+)/?

   def POST(self, customer_id, record_code):
       # ... 

...

web.application(urls, ...)

This gets the URL into the controller's code so solves some of my problem, but definitely still didn't feel as clean (or DRY) as I'd like. For one thing I'd now have to add the __metaclass__ = ActionMetaClass boilerplate to every controller (I couldn't add it to the BaseController for complicated reasons). It also bugged me that this required more than one line--I really wanted to be able to write something like:

class RecordSearchController(BaseController):
   __metaclass__ = ActionMetaClass("/search/([0-9]+)/([A-Z0-9]+)/?")

My first thought was to just add a URL param to ActionMetaClass.__init__ Unfortunately I quickly discovered you can't (directly) pass parameters to the value of __metaclass__ in this way, since its calling signature is assumed to match the default type(name, bases, dict)). I also ran into some pain around the fact that BaseController already was using a __metaclass__ to do some cool tricks like adding some standard envelope fields to every response. At this point I set out some formal goals for my implementation:

  • not require changing any of the existing routes or class names
  • automatically generate canonical paths from class names which matched as many of the existing paths as possible
  • require at most one additional line of code per controller class, even those that don't use a canonical path
  • automatically add REST parameters to the URLs
  • play nicely with other metaclasses in the inheritance chain

I wound up meeting most of the goals, with a couple pragmatic tweaks. Before I break down exactly how the solution I came up with works here's the actual code (stripped of comments/doc strings for brevity, I'll try to get the full code up as a gist at some point soon).

import re
import inspect
from functools import partial

url_params = {
    'customer_id': '([0-9]+)',
    'record_code': '([A-Z0-9]+)',
}

urls = []

def autoroute(path = None):

    def default_path(name):
      components =  re.findall('([A-Z0-9]{0,1}[a-z]+|(?:[A-Z0-9](?![a-z]))+)', name)
      if "Controller" in components: components.remove("Controller")
      components.reverse()
      return '/' + '/'.join(map(lambda s:s.lower(), components))

    def _new(path, name, bases, dict):
        global urls
        if not "url" in dict:
          if not path: path = default_path(name)
          args = inspect.getargspec('GET' in dict and dict['GET'] or dict['POST']).args
          args.remove("self")
          url = '/'.join([path] + [url_params[arg] for arg in args] + ['?'])
          dict["url"] = url
        if bases and hasattr(bases[0], '__metaclass__'):
          metaclass = getattr(bases[0], '__metaclass__')
        else:
          metaclass = type
        controller = metaclass.__new__(metaclass, name, bases, dict)
        urls += [controller.url, controller]
        return controller

    return partial(_new, path)

This allowed the controller classes to look pretty much just the way I wanted them to:

class RecordSearchController(BaseController):
   __metaclass__ = autoroute("/search")

   def GET(self, customer_id, record_code):
       # ...

class AnnotateRecordController(BaseController):
   __metaclass__ = autoroute()

   def POST(self, customer_id, record_code):
       # ... 

There are three key elements to this solution: class name to URL mapping, parameter detection/injection, and metaprogramming magic.

class name to URL mapping

For the large majority of the existing URL-controller pairs the start of URL's path could be derived from the controller's class name as follows:

  1. split the camel case class name into words using a regex (e.g. AnnotateRecordController["Annotate", "Record", "Controller"]
  2. remove "Controller" from the list (→ ["Annotate", "Record"])
  3. reverse the order (classes mostly followed a VerbNoun (or AdjectiveNoun) pattern, while the URLs were mostly /noun/verb (→ ["Record", "Annotate"]
  4. lowercase and join with slashes (→ "/record/annotate")

This algorithm is implemented in the default_path(). function.

parameter detection/injection

Although there were dozens of controllers, there actually were less than a handful of distinct REST parameters that appeared in the URLs, and fortunately they had very consistent names across the controllers GET. and POST methods. Unfortunately, prior to this refactoring, the patterns were repeated many times (typically two parameters per URL) and multiple parameters shared patterns, so e.g. changing the record_code from numeric ([0-9]+) to alphanumeric required dozens of replace operations each of which had to be vetted by hand by cross-referencing with the controller's source. These I pulled out into a single short dictionary of parameter names, url_params. I then use a little bit of reflection magic to figure out which params go in which URLs by looking at the argspec of the controllers GET method (or POST if there's not one). I can then use the name the args to look up the correct patterns to tack onto the path generated from the class name. This logic can be found in the center of _new. I also broke with my strict one-liner rule for a special url level override, this is done via setting a url attribute in the controller rather than passing it to autoroute to simplify the signature, and it turns out that having the URL around in the class is useful for testing/debugging as well.

metaprogramming magic

So remember the pain I had with not being able to pass a custom path to a metaclass's __init__? For a moment I thought of generating classes dynamically to get around this, but by RTFM I discovered that the value assigned to __metaclass__ doesn't have to be a class, it can be any callable, including a function--importantly it can even be a partially evaluated function! This insight is embodied in _new(), whose signature is like type.__new__'s but with one extra parameter (the non-standard path, if any). That parameter is then partially evaluated in autoroute's return statement, the result being a functools.partial that has the correct signature.

Note that in the end there's nothing super webpy specific about this solution other than the assumption that controllers are classes and they have methods named GET/POST/etc. Most of it should be adaptable to any another framework.

2012-06-19

How to dump SQL for all views in a mysql database

I often generate a fair number of views while developing reporting SQL, and have off an on looked for a way to easily save their source code to a file. mysqldump doesn't directly support extracting only views, but with a little command line trickery and a query against INFORMATION_SCHEMA you can make it do the right thing:

mysql -u username INFORMATION_SCHEMA
      --skip-column-names --batch
      -e "select table_name from tables where table_type = 'VIEW'
          and table_schema = 'database'"
      | xargs mysqldump -u username database
      > views.sql

The skip-column-names and batch options produce output with just one view name per line, which is what xargs needs as input. Be sure to replace both occurences of username and database with appropriate values, and add -h for remote hosts and -p if the user requires a password. Here's a one-line example for user root with no password on localhost, with a database named "foo":

mysql -u root INFORMATION_SCHEMA --skip-column-names --batch -e "select table_name from tables where table_type = 'VIEW' and table_schema = 'foo'" | xargs mysqldump -u root foo > views.sql

2012-02-09

transpose 2-dimensional Javascript array in one line

There are quite a few libraries that implement a transpose function for arrays, but sometimes you don't want the hassle of installing them or the added dependencies. I'm sure I'm not the first to use this approach but most snippets found by a quick search are too verbose to comfortably drop in for a single use.
Update April 2014: It occurred to me the other day that the magic mentioned below is unneeded, as map already provides the column indexes as the second argument. I've updated the gist so the following remark is no longer relevant. The most magical bit here is to use Object.keys() on the first row of the array which yields an array of column indexes, which we can then use map() on to extract each column in turn. Love to see a terser implementation if you've found one.