How To Write Code That Doesn't Suck: 2014

2014-01-30

Stripe CTF, minimalist solution edition

I'd heard of the previous rounds of Stripe's Capture The Flag coding competition but hadn't actually looked into them in detail, but a friend of mine was giving this round a go and so I decided to play along. I had no vision of winning, so to keep the effort time-boxed I went for the minimalist solution that I could envision to each problem. I only made it to level 3 then stalled out, mostly because I'm unfamiliar with Scala and thus implementing my envisioned solution would have required a significant amount of time. I also wasted a significant chunk of time on level 1 (probably more than equal to the time spent coding all the other levels) trying to pull off what I assumed was an invited hack before trying a straightforward solution. Overall I found it a lot of fun and would encourage anyone interested in software to give it a go next time.

level 0

In this level we're given a short ruby script that "highlights" (by adding angle brackets) all of the words in an input text which aren't found in a dictionary. The script works but is slow, as it simply reads the dictionary into an array and does a naive search of that array for each word in the input. My solution was trivial, simply read the dictionary into a hash, and check for a key. Could have just been two changed lines if I bothered to try to remember a terser array to hash technique.

 path = ARGV.length > 0 ? ARGV[0] : '/usr/share/dict/words'
-entries = File.read(path).split("\n")
+entries = {}
+File.readlines(path).each do |entry|
+  entries[entry.chomp] = true
+end
 
 contents = $stdin.read
 output = contents.gsub(/[^ \n]+/) do |word|
-  if entries.include?(word.downcase)
+  if entries.has_key? word.downcase
     word
   else
     "<#{word}>"
   end
 end
 print output

Result scored 117/50 required points.

level 1

On this level I wasted a bunch of time trying a hack rather than coding a solution. The task is to submit a git commit with a hash "lexicographically less than the value contained in the repository's difficulty.txt file". The fact that difficulty.txt was a local file rather than some constant or stored on a server seemed to me to be asking for a hack like submitting a difficulty.txt file with a value like "ffffff" in it. Could not make that work, but would be interested to hear from anyone who did. After giving up on the hack it took about 5 minutes on Google and Stack Overflow to find this utility:

beautify_git_hash: Beautify the Git commit hash! This is a little useless toy inspired by BitCoin's "proof of work" concept. It enables you to modify your Git commit to enforce a certain prefix on the Git commit hash.

I cloned this into my project and tested it by hand to see that it worked. Since this beautifies un-pushed commits via an amend, I changed prepare_index() in miner to do a commit instead of an add, then simply replaced default implementations solve routine with a call to beautify_git_hash. High level loop didn't change at all, so here's the guts of my code:

prepare_index() {
     perl -i -pe 's/($ENV{public_username}: )(\d+)/$1 . ($2+1)/e' LEDGER.txt
     grep -q "$public_username" LEDGER.txt || echo "$public_username: 1" >> LEDG
 
     git commit LEDGER.txt -m 'Give me a Gitcoin'
 }
 
 solve() {
     fix=`../beautify_git_hash.py 000000`
     bash -c "$fix"
 }

I fired up the miner and in a couple minutes I'd mined a gitcoin, which gave me a result of 50/50. Didn't have to improve performance of the beautify_git_hash script at all so credit for this win goes to the original author, Volker Grabcsch.

level 2

Hey, finally a node level, how exciting! Well, not really, this turned out to be just a rehash of the level 0. In this level we have to create a proxy that can blacklist attackers based on ip. A full implementation is provided with one key function missing: currently_blacklisted(ip). The obvious implementation proved good enough, with a couple quick tests to figure out the right constant to use to detect a "bad" number of requests.

var blacklist = {};
function currently_blacklisted(ip)
{
  if (!(ip in blacklist)) blacklist[ip] = 0;
  blacklist[ip] += 1;
  return blacklist[ip] > 4;
}

This got me a result of 95/85. I actually spent a little more time on this later and was able to get higher scores during local testing, but never got a positive score on submitting them, so I suspect something was borked in the test environment.

2014-01-22

a simple canonical port numbering scheme for web services

My work in the past few years has involved producing quite a few web services. Some are public facing but many are middle-tier type things accessed by other services or REST back-ends for client apps. Almost all have leveraged some sort of framework such as rails or django. Each such service must be bound to a port to be accessed, and since binding to the standard http port (80) requires root privileges most frameworks have some higher default port they run on during development. For example rails defaults to 3000, django to 8000, dropwizard uses 8080, etc. This works well enough when throwing together a single service, but what if you need a bunch of these running at once? Each framework lets you configure alternate ports, but what values should you actually use?

The answer to that question is really a function of how the services will be used and maintained. If you're part of a team developing multiple services that need to work together you will have to agree on some sort of standard, or pick arbitrary numbers and simply record them in a central place. In practice that "standard" may amount to little more than "we've been using rails, so our first service is 3000, our next is 3001, etc.". For a variety of reasons, including preserving your sanity, I suggest a somewhat more formal approach--a canonical port numbering scheme based on the name of your service.

A service's name might not be particularly well defined in all cases, but usually there is some simple short name that a team refers to a service by, and that's the one you should start with. The name of the project, app, or source code repository are all reasonable candidates. There are a variety of ways to turn that name into a number (e.g. a hash like CRC-16), but I've come to prefer simply interpreting the first few characters of the service name as a base-32 encoded number (base-32 is like hex but goes from 0 to v instead of just 0 to f). This approach typically ensure unique ports as long as you pick service names that don't start with the same three letters. It also allows you to guess the service from the port. Here's an example--assume you have a service named api. Interpreted as a base-32 number that's 11058. With a couple tweaks to account for the valid port range, restricted ports, and characters that aren't valid base-32 digits you have a complete solution. Here's an example algorithm in javascript:

Here are a few examples of service names and the ports they yield, and what you get from converting the port number back to a string

api         => 12082 => api...
db          => 14688 => db0... right padded with 0
foo-bar     => 17176 => foo...
aardvark    => 11611 => aar...
alligator   => 11957 => all... note that lexical sort is preserved
bee         => 12750 => bee...
cat         => 13661 => cat...
caterpillar => 13661 => cat... collides with cat
dog         => 15120 => dog...
elephant    => 16046 => ele...
vole        => 33557 => vol...
zebra       => 33227 => veb... z isn't a valid digit

Turns out this scheme easily avoids most of the common "well-known" ports, as they mostly decode to strings you'd never think to start a service name with. Some examples:

                6379 => 67b... redis
               27017 => qc9... mongodb
                3000 => 2to... rails default
                8000 => 7q0... django default
                8080 => 7sg... dropwizard et al
                2195 => 24j... apple push notifications

While just having a sane and consistent way of picking port numbers may be it's own reward, this approach is actually an important component of automated service deployment tools I've been working with for the past year. I hope to explore these further in an upcoming post, but for example consider that well chosen service names be part of a domain name. You can leverage this into a "config-less" approach to inter-service communication, e.g. if the api service running on api.cluster-1.example.com it could assume the database service it should talk to lives at db.cluster-1.example.com, and will be found at port 14688. Initially the two services could be deployed on one box pointed two by both domain names. If later the load increases and the db service is moved to another box there's no need to update the config for the api service.