Monday, November 29, 2010

Programmer Competency Matrix

Ran across this excellent matrix

http://www.starling-software.com/employment/programmer-competency-matrix.html

I plan to start using this when I interview people; it'll help.

But more to the point, I suggest it is highly valuable as a sort of checklist we should all use to help ensure we build and maintain our skills. I suspect it is a rare developer who is comfortable with everything in the boxes on the far right of the grid, but if you've been in the industry for ten or more years and you can't comfortably check off most of them, it is probably time to start sharpening your skills.

Pick up a new development book, write some code in a language you've never used before. Code up a few new algorithms or data structures. If that doesn't sound like fun to you, you are probably in the wrong industry.

Saturday, November 20, 2010

Recursive Money Counting: Or why avoid recursion

Last post I asked this:

Question: Write an algorithm that, when given some number of pennies, will print out all the ways that this value can be represented in nickels, dimes, quarters, and dollars, but don't use recursion.

In this post, I'm going to describe the recursive version and then talk about some of the differences between the two.

As mentioned, I don't much like recursion because of the overhead and because it makes things ugly in a debugger.

But it does often lead to easier to develop and maintain code. It is, after all, just using another construct of a high level language, like objects or inheritance or function overloading. Those all have costs, but we happily pay them because it tends to make our software faster to develop and easier to maintain.

But here is the big difference in my mind: recursion is used most obviously in the design of the algorithms we most want to execute quickly: sorting, tree traversal, etc.

Algorists tend to love recursion because it makes for tidy algorithms (think quick sort, then try to imagine it without recursion: doable, but really ugly and no good for a text book). But algorithm design is exactly where you want to pay attention to performance issues. After all, if you just wanted easy to understand algorithms, we'd all just use insertion sort or bubble sort in all cases and be done with it and we certainly wouldn't worry about using adaptive sorting algorithms.
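For what it's worth, here is roughly what quicksort looks like without recursion: you manage an explicit stack of subarray bounds by hand. This is a sketch (using the simple Lomuto partition, not a tuned library sort), and it works, but it's clearly less tidy than the textbook recursive version.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class IterativeQuickSort {
    // Quicksort with an explicit stack of subarray bounds instead of recursion.
    public static void sort(int[] a) {
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[]{0, a.length - 1});
        while (!stack.isEmpty()) {
            int[] range = stack.pop();
            int lo = range[0], hi = range[1];
            if (lo >= hi) continue;
            int p = partition(a, lo, hi);
            // Where the recursive version would call itself twice,
            // we push the two subranges onto our own stack.
            stack.push(new int[]{lo, p - 1});
            stack.push(new int[]{p + 1, hi});
        }
    }

    // Standard Lomuto partition around the last element.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++;
            }
        }
        int tmp = a[i]; a[i] = a[hi]; a[hi] = tmp;
        return i;
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 9, 1, 5, 6};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 5, 5, 6, 9]
    }
}
```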

What is the cost of recursion? Each time we call a function, we create a stack frame holding the parameters, local variables, and return address, and push that frame onto the stack. When we return, we pop the frame off the stack, restore the caller's locals and parameters, and jump back to the instruction we were executing before the call. People who write JVMs and compilers care a great deal about making this fast, but the fastest code is the code you don't execute, so it will always take some time. How much? Let's run through the recursive method and do some comparisons to find out.

For the recursive version of this algorithm, I'm going to leave out printCurrency and getValue; they are identical to the versions in the last post if you want to see them.


public static void startRecursiveCountPennies(int pennies) {
    int[] mInts = {0, 0, 0, 0};
    recursiveConvertPennies(pennies, mInts, mInts.length - 1);
}

public static void recursiveConvertPennies(int pennies, int[] currency, int currencyTypeIndex) {
    while (true) {
        // Overshot the total: reset this denomination and back out.
        if (getValue(currency) > pennies) {
            currency[currencyTypeIndex] = 0;
            return;
        }
        if (currencyTypeIndex == 0) {
            // Lowest denomination: this combination is complete, print it.
            printCurrency(pennies, currency);
        } else {
            // Recurse to enumerate all counts of the smaller denominations.
            recursiveConvertPennies(pennies, currency, currencyTypeIndex - 1);
        }
        currency[currencyTypeIndex]++;
    }
}


So what is happening here? We're creating an array to represent the count of each type of coin (except pennies; remember, we just calculate how many pennies are left over for each combination of coins), and we're using a pointer to indicate where in the array we are working. As long as the value of our currency array doesn't exceed the total number of pennies, we call ourselves for the next denomination down, then increment the count of the currency type the pointer is pointing at, and so on.

Before we return, we reset the currency type at our pointer to zero.

When the pointer itself is zero we'll print the values rather than calling ourselves again.

As a test, I commented out the printing of the values, set some timers, and ran both the recursive and the non-recursive methods with 5000 pennies each. Run after run, the non-recursive version was about 20% faster than the recursive version.
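Timing code along these lines reproduces the comparison. This is a sketch, not the exact harness: the method bodies are the ones from these two posts with the printing stripped out, and I use a smaller penny count here to keep the run short. Note that single un-warmed runs like this are only rough; JIT warm-up will shift the absolute numbers and the ratio.

```java
public class CountPenniesBenchmark {
    static final int[] UNIT_BASES = {5, 10, 25, 100};

    static int getValue(int[] currency) {
        int total = 0;
        for (int i = 0; i < currency.length; i++) {
            total += currency[i] * UNIT_BASES[i];
        }
        return total;
    }

    // Rolling-counter version with the printing removed.
    static void convertPennies(int pennies) {
        int pointer = 0;
        int[] counter = new int[UNIT_BASES.length];
        while (pointer < counter.length) {
            pointer = 0;
            while (pointer < counter.length) {
                counter[pointer]++;
                if (getValue(counter) > pennies) {
                    counter[pointer] = 0;
                    pointer++;
                } else {
                    break;
                }
            }
        }
    }

    // Recursive version, also with the printing removed.
    static void recursiveConvert(int pennies, int[] currency, int index) {
        while (true) {
            if (getValue(currency) > pennies) {
                currency[index] = 0;
                return;
            }
            if (index > 0) {
                recursiveConvert(pennies, currency, index - 1);
            }
            currency[index]++;
        }
    }

    public static void main(String[] args) {
        int pennies = 2000; // smaller than the 5000 in the post, to keep the demo quick

        long t0 = System.nanoTime();
        convertPennies(pennies);
        long iterNanos = System.nanoTime() - t0;

        t0 = System.nanoTime();
        recursiveConvert(pennies, new int[UNIT_BASES.length], UNIT_BASES.length - 1);
        long recNanos = System.nanoTime() - t0;

        System.out.println("non-recursive: " + iterNanos / 1_000_000 + " ms");
        System.out.println("recursive:     " + recNanos / 1_000_000 + " ms");
    }
}
```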

So there you have it. Later on, I'll write a post about the recursive vs. non-recursive string mutation method. To be titled: how I didn't get a job I wanted back in 2001.

Take home lesson:

Recursion may make for easy to read algorithms, but be mindful of the costs.

Friday, November 19, 2010

The Rolling Counter Algorithm Pattern

Question: Write an algorithm that, when given some number of pennies, will print out all the ways that this value can be represented in nickels, dimes, quarters, and dollars. And don't use recursion.

Why not recursion? First, because the non-recursive solution is more elegant. Second, because I don't like recursion. For more on why, see my next post, "Why I Don't Like Recursion"... or something like that. The short answer is that it is less efficient.

Before I get into this question, I'm going to discuss the rolling counter algorithm pattern for which this blog post is named. This pattern is often useful in interviews (almost as useful as the "just model it as a graph and traverse it or find the spanning tree" pattern) and usually ends up tripping me up, especially when I fail to recognize that I need it rather than, say, a bunch of nested for loops or, worse, recursion. Maybe it trips you up too. Hopefully this post will make the pattern clear enough that it stops tripping us both up.

BTW: any time someone asks you to print the set BLA{…} where condition FOO holds true, you should think of this pattern. I'll try to make the why clear later.

You should think of a rolling counter as being analogous to the odometer on a car. As you're driving, the first roller of the odometer increases one mile at a time until you drive past nine miles. At that point, rather than increasing that roller to 10, it goes back to 0 and increments the second roller. Then you go back to incrementing the first roller for another ten miles. Eventually you have to increment the third roller, and so on, until you run out of rollers; you've usually replaced your car by then.

In code, you first initialize the counter. (Note that if you are using Java as I am, array elements are initialized to zero for you, so the explicit loop below is redundant, but it makes the intent clear.)

int[] mRollingCounter = new int[length];
for (int i = 0; i < mRollingCounter.length; i++) {
    mRollingCounter[i] = 0;
}

Now the code to roll the counter:

int mCounterPointer = 0;
while (mCounterPointer < mRollingCounter.length) {
    mRollingCounter[mCounterPointer]++;
    if (rollcondition()) {
        // This roller has rolled over: reset it and carry to the next one.
        mRollingCounter[mCounterPointer] = 0;
        mCounterPointer++;
    } else {
        break;
    }
}

That is all there is to a single increment of the counter. The critical bit is the rollcondition() function: it is what tells the odometer that we've tried to roll past 0009 miles and should increment the second roller to 1, or that we've tried to roll past 0099 miles and need to increment the third roller.

We start with the first element of the counter and increment it. (This is equivalent to the odometer going from [0][0][0][0] to [0][0][0][1].) If this change triggers the counter rolling condition, we set that element back to zero, move to the next element of the counter, and continue. (This is what happens when we're already at [0][0][0][9]: first we go to [0][0][0][10], and since any digit greater than 9 is a roll condition for an odometer, we set that digit back to zero, move the pointer one over, and increment again to [0][0][1][0], at which point we break.)

If this isn't clear, code it up in your favorite IDE and play with it for a while (a full code sample is below for the lazy) until you understand it, or you can just TRUST ME and memorize it to spit out in your next interview. Your choice, but I'd suggest understanding it.
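To make the odometer concrete, here is a minimal runnable sketch (my own toy example, with the roll condition being "a digit passed 9"):

```java
public class Odometer {
    // Roll condition for a base-10 odometer: this digit has passed 9.
    static boolean rollCondition(int digit) {
        return digit > 9;
    }

    // One increment of a rolling counter, exactly as described above.
    static void increment(int[] counter) {
        int pointer = 0;
        while (pointer < counter.length) {
            counter[pointer]++;
            if (rollCondition(counter[pointer])) {
                counter[pointer] = 0; // roll this digit back to zero...
                pointer++;            // ...and carry into the next one
            } else {
                break;
            }
        }
    }

    public static void main(String[] args) {
        int[] odometer = new int[4]; // least significant digit first
        for (int mile = 0; mile < 12; mile++) {
            increment(odometer);
        }
        // Print most significant digit first, like a real odometer.
        StringBuilder reading = new StringBuilder();
        for (int i = odometer.length - 1; i >= 0; i--) {
            reading.append(odometer[i]);
        }
        System.out.println(reading); // prints "0012"
    }
}
```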

So implementing an odometer isn't that interesting, but other things are. For example, you can write a non-recursive string permuter with this (i.e., given a string of characters, print all the possible rearrangements of its characters). Or something to spit out all the possible ways that 50 cups of beer can be represented in various imperial units. Now I'm going to get into the question that I started this post with.


public static String[] mLabels =
        {"Nickels", "Dimes", "Quarters", "Dollars"};
public static int[] mUnitBases = {5, 10, 25, 100};

public static void convertPennies(int pennies) {
    int mCounterPointer = 0;
    int[] mRollingCounter = new int[mUnitBases.length];

    while (mCounterPointer < mRollingCounter.length) {
        printCurrency(pennies, mRollingCounter);
        mCounterPointer = 0;
        while (mCounterPointer < mRollingCounter.length) {
            mRollingCounter[mCounterPointer]++;
            if (getValue(mRollingCounter) > pennies) {
                // Rolled past the total: reset this denomination and carry.
                mRollingCounter[mCounterPointer] = 0;
            } else {
                break;
            }
            mCounterPointer++;
        }
    }
}

public static int getValue(int[] currency) {
    int mTotalValue = 0;
    for (int i = 0; i < currency.length; i++) {
        mTotalValue += currency[i] * mUnitBases[i];
    }
    return mTotalValue;
}

private static void printCurrency(int pennies, int[] currency) {
    for (int i = 0; i < currency.length; i++) {
        System.out.print(String.valueOf(currency[i]));
        System.out.print(" " + mLabels[i] + " ");
    }
    System.out.println(" and " +
            (pennies - getValue(currency)) + " pennies");
}


And there you have it.



Take home lesson:

Any time you need to iterate through all the permutations of a thing, consider using a rolling counter.
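As a postscript, here is a sketch of the string permuter mentioned above, built on nothing but a rolling counter: run a base-n counter over all n^n index tuples and print only those tuples whose digits are all distinct. Wildly inefficient as permuters go (my own toy version, not a production algorithm), but it's a direct instance of the "print the set where the condition holds" idea.

```java
import java.util.HashSet;
import java.util.Set;

public class RollingPermuter {
    // Prints every rearrangement of the input string by filtering a
    // rolling counter: each counter state is a tuple of indexes into s,
    // and tuples with all-distinct digits are exactly the permutations.
    public static void permute(String s) {
        int n = s.length();
        int[] counter = new int[n];
        while (true) {
            if (allDistinct(counter)) {
                StringBuilder sb = new StringBuilder();
                for (int idx : counter) {
                    sb.append(s.charAt(idx));
                }
                System.out.println(sb);
            }
            // One increment of the rolling counter; a digit rolls at n.
            int pointer = 0;
            while (pointer < n) {
                counter[pointer]++;
                if (counter[pointer] >= n) {
                    counter[pointer] = 0;
                    pointer++;
                } else {
                    break;
                }
            }
            if (pointer == n) {
                return; // the counter rolled all the way over: we're done
            }
        }
    }

    private static boolean allDistinct(int[] counter) {
        Set<Integer> seen = new HashSet<>();
        for (int d : counter) {
            if (!seen.add(d)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        permute("abc"); // prints the 6 rearrangements of "abc"
    }
}
```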

Thursday, August 12, 2010

Distributed Sorting

So you've got a pair of servers (I'll call them servers A & B) and each of them has an array of strings in random order. You are to find a way to sort the strings such that if you lined the arrays up, server A first, server B second, this combined array of strings would be sorted. That is to say: all the strings are sorted, the lower strings are all on server A, and the higher strings are all on server B. These arrays take up most of the storage on the servers and your network connectivity between servers is mediocre.

How do you accomplish this with the lowest network traffic?

The naive solution is pretty simple given these constraints and looks like this:

First, sort all of the strings on each server locally.
Then compare the last string on A to the first string on B.
If A[last] > B[first], swap them, re-sort the strings locally on each server, and compare again.
Otherwise, you are done.

So what is the cost (worst case where n is the number of keys per server):

Your local sorts are free since we don't care about those (for now).
If you have to swap every single key, you'll move 2n keys.
And so the cost will be 2n * the size of the keys in network traffic, plus a small amount of overhead to open and close the connection if we use HTTP or similar.

Great! So let's say you have a million ten-character strings on each server. How long will this take? Twenty megabytes in a fast data center? A few seconds at most, totally reasonable.

And then things start to get interesting.

So... now we should consider the speed of the whole system, data transfer _and_ local sorting. A smart candidate will have already thought this far ahead, if they are accustomed to thinking about everything in terms of big O. But then it is easy to fall into the trap of thinking that network speed is so much slower than things that happen in memory that we can forget big O in situations like this. We'll see that we can't.

So, OK, we've got a good sorting algorithm, O(n ln n), or maybe even O(kn) if you take advantage of the shortness of the strings to be sorted. Great, so we do that. Then, in the worst case, we have to move all the strings to different servers, so that is n swaps. And we're re-sorting the strings with each string we move across, which costs O(n) per swap on a nearly-sorted array, so we're looking at O(n^2).

So what does n^2 mean here? Well, if we have fast memory, we can get each 10-character key out of memory in around 100 ns. So let's say (10^6)^2 * 100 ns. If I'm doing my math right, that comes out to over 27 hours. So we spend a few seconds streaming data across the wire because we managed to minimize network traffic, but we spend 27 hours re-sorting arrays of strings.

At this point, the fix should be pretty obvious. Move all the strings that you need to move, then sort them. But how do we know which ones to sort?

Well, here is the trick, obvious to some, not obvious to others: for every string on server B that is lower in value than the highest string on server A, we can move that string from B to A, and at the same time move the highest string from A to B.

Think of the keys as sorted arrays on both servers. Set a pointer at the end of the array on the first server (i.e., the highest key on the low server) and a pointer at the start of the array on the second server (i.e., the lowest key on the high server). As long as the key at the pointer on the first server is greater than the key at the pointer on the second server, we need to swap those keys. Here is a worst case example (with single-letter keys for simplicity):


After each swap, we move the pointers (the first server's pointer moves left, the second server's pointer moves right) and again compare the keys under the pointers.


Again, as long as the key at the pointer on the first server is greater than the key at the pointer on the second server, we need to swap the keys.



When we're done, we'll again have an unsorted array of keys on each server, but we can guarantee that all the right keys will be on the right servers. From here we can just resort the servers again, and we'll be done.
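The whole move-then-sort sequence can be sketched with two local arrays standing in for the two servers. The network transfer is elided here; this just shows the pointer-swap-then-sort order of operations:

```java
import java.util.Arrays;

public class DistributedSortSketch {
    // Two local arrays stand in for the two servers; real code would
    // ship keys over the network instead of assigning array elements.
    public static void balance(String[] a, String[] b) {
        Arrays.sort(a);
        Arrays.sort(b);
        int i = a.length - 1; // highest key on the "low" server A
        int j = 0;            // lowest key on the "high" server B
        // Swap as long as a key on A is greater than its counterpart on B.
        while (i >= 0 && j < b.length && a[i].compareTo(b[j]) > 0) {
            String tmp = a[i];
            a[i] = b[j];
            b[j] = tmp;
            i--;
            j++;
        }
        // Only now, after all the moves, do we re-sort each side once.
        Arrays.sort(a);
        Arrays.sort(b);
    }

    public static void main(String[] args) {
        String[] serverA = {"x", "m", "q"};
        String[] serverB = {"a", "c", "n"};
        balance(serverA, serverB);
        System.out.println(Arrays.toString(serverA)); // [a, c, m]
        System.out.println(Arrays.toString(serverB)); // [n, q, x]
    }
}
```

The key property is that each side is re-sorted exactly once at the end, instead of once per swap, which is what turns the O(n^2) version back into O(n ln n).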

Take home lesson:
1. Even though moving data around a datacenter takes a while (20 ms to send a couple KB around a data center), moving data around in memory still costs some time: half a nanosecond for an L1 cache hit, 100 ns to hit main memory. Yes, orders of magnitude difference, but:
2. O(n^2) is almost always worse than O(n ln n), even when the operations described by n^2 are much, much faster.