## Monday, August 26, 2013

### Nokia 1020

So I've been really excited to get my new Nokia Lumia 1020... for all you shutterbugs out there, it has a lot of great features.  But beware, the Nokia apps still have a lot of bugs, and Nokia doesn't seem to have great community support for feedback.

To date:
Creative Studio will consistently crash on some pictures when I mix color pop with radial tilt blurring.

Cinemagraph is the coolest app, but it isn't the most intuitive feature.  Unfortunately, it lost some great pictures for no apparent reason.  I'm really bummed about that more than anything.  Also, uploads to SkyDrive don't come through as animated GIFs; they are converted to still-frame JPEGs.

Sometimes, the Nokia software seems to cause the phone to hang.

When photos are "Saved" in the Nokia apps, a copy isn't saved to the camera roll, and there's no obvious way to navigate to the saved pictures directory.  It often becomes easier to pull them off SkyDrive afterwards.

## Saturday, August 17, 2013

### A Heuristic for Ultimate Tic Tac Toe

If you haven't heard of ultimate tic-tac-toe, you have been missing out on all the rage in the tic-tac-toe world.  OK, maybe there isn't that much of a tic-tac-toe world... but this variant of the game everyone played and mastered as a child adds additional levels of complexity, bringing the game back to the world of adults.  One of my coworkers has been working on a website for the game, still in beta, found here... and after playing it a few times, I decided to try to write a bot that could beat me at the game.

The rules for ultimate tic-tac-toe are similar to the original game, but with an extra dimension.  As you can see, each square of the large tic-tac-toe board has another tic-tac-toe square within it, and to take a square in the larger board, you must win the game within that square.  The real wrinkle comes next...

Any move made in a little board dictates which little board is to be played next.  In this example, the red X is in the top right corner of its own little board, which means the next move must be played in the top right corner of the big board.

Anyone who has done programming for turn-based, complete-information games knows that heuristic search using some form of alpha-beta pruning is the way to go, such as this example for regular tic-tac-toe.  For simple games like that, all you need is a move generator for each position, and some heuristic score which can be used to compare positions at the end of each search to tell you who is winning.  In game trees you can search to full depth, you can just return something like 1 for X winning, -1 for O winning, and 0 for a cat's game.  But there is a maximum depth of 81 with a branching factor of 9 in ultimate tic-tac-toe, so the heuristic becomes the interesting part.
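As a refresher, here is a minimal sketch of minimax with alpha-beta pruning, with the game supplied through hypothetical callbacks (`moves`, `apply_move`, and `score` are assumptions for illustration, not a real API):

```python
def alphabeta(pos, depth, alpha, beta, maximizing, moves, apply_move, score):
    """Minimax with alpha-beta pruning; returns the heuristic value of pos."""
    ms = moves(pos)
    if depth == 0 or not ms:
        return score(pos)             # frontier or terminal node
    if maximizing:
        best = float("-inf")
        for m in ms:
            best = max(best, alphabeta(apply_move(pos, m), depth - 1,
                                       alpha, beta, False,
                                       moves, apply_move, score))
            alpha = max(alpha, best)
            if alpha >= beta:         # beta cutoff: Min will avoid this line
                break
        return best
    best = float("inf")
    for m in ms:
        best = min(best, alphabeta(apply_move(pos, m), depth - 1,
                                   alpha, beta, True,
                                   moves, apply_move, score))
        beta = min(beta, best)
        if alpha >= beta:             # alpha cutoff: Max will avoid this line
            break
    return best
```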

We can make some interesting observations looking at one mini board in isolation.  In regular tic-tac-toe, there is always a back and forth of who plays next... but here, one player could be sent to the same board many more times than their opponent.  In fact, it is possible that only one player ever makes moves on a given board.  Also, since the macro view of the game dominates the micro view, we find that even when a board could clearly be won on the next move, that doesn't imply that the next move should be to take the win.  Good moves almost appear to be random from the perspective of the mini-board.  We can take this to the extreme, and say that from any starting position on a mini board, we can estimate the probability of a winner for that board by simulating thousands of random continuations, adding pieces from either side to empty squares.  Since each mini board has nine squares, and each square has either an X, an O, or nothing... there are 3^9 possible positions to consider, which works out to just 19683 positions.  I wrote a program to pre-populate a table mapping any possible starting position for a little board to a probability that the board would be won by either X or O, given the assumptions of randomness from above.
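A sketch of that pre-computation under the stated randomness assumption (function names are mine): fill empty squares with random pieces from either side and count how often X ends up with a line.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """board: 9 entries in {0, 1, 2} (empty, X, O); returns 1, 2, or 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def x_win_probability(board, trials=2000):
    """Estimate P(X wins this mini-board) by dropping random pieces from
    either side onto empty squares, per the randomness assumption above."""
    w = winner(board)
    if w:
        return 1.0 if w == 1 else 0.0
    wins = 0
    for _ in range(trials):
        b = list(board)
        empties = [i for i, v in enumerate(b) if v == 0]
        random.shuffle(empties)
        for i in empties:
            b[i] = random.choice((1, 2))  # either side may move here next
            if winner(b):
                break
        wins += winner(b) == 1
    return wins / trials
```

Encoding each board as a base-3 number and looping over all 3**9 = 19683 of them reproduces the pre-computed table.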

 52.3% 50.9% 49.3%
 45.2% 62.7% 59.9%
 11.1% 90.1% 50.3%

Applied at game time, we can reduce our tic-tac-toe board to a 3x3 board of probabilities (one per mini-board) that any one square is going to be won by a given player.  You could sort of use this to speculate on what your chances are for winning a given row or column by combining the probabilities... for instance, my chances of winning the center column are .509 * .627 * .901, or approximately a 29% chance.  However, the game is a race, and while I may have better chances of winning the center column, that is balanced against the chances that my opponent wins the left column.  To adjust for this, I decided to take a probability board like the one above, and again randomly simulate thousands of games from such positions, choosing squares using the probabilities to decide who would win each square, and repeating until the game had ended.  This would give me a pseudo-probability of who would win the game from such a position.  Of course, the size of such a search field is too large to cache, so I decided to use these probabilities to train a linear regression model.
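A sketch of that macro-level simulation under the assumptions above (the name and the square-claiming order are mine): squares are claimed in a random order, each going to X with its precomputed probability, until either player owns a full line of the big board.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def macro_win_probability(probs, trials=2000):
    """probs: 9 values, P(X wins each mini-board).  Squares are decided in
    random order, each going to X with its probability; the first player to
    complete a line wins the race."""
    def has_line(owned):
        return any(all(i in owned for i in line) for line in LINES)
    x_wins = 0
    for _ in range(trials):
        order = random.sample(range(9), 9)
        x_sq, o_sq = set(), set()
        for i in order:
            (x_sq if random.random() < probs[i] else o_sq).add(i)
            if has_line(x_sq):
                x_wins += 1
                break
            if has_line(o_sq):
                break
    return x_wins / trials
```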

So for those of you who don't remember, linear regression involves relating sample inputs to sample outputs in the form A*X + B = Y... where Y is the variable our model will try to predict.  In our case, I want to take some combination of the probability board as our X, and the simulated probability of winning the game described above as our Y.  Then we can solve the system of equations for some set of scalars A that can be used to combine an arbitrary set of inputs to produce a probability of winning.  Rather than make X simply the probabilities from the board above, I decided to collect the combined probabilities for winning any given row, column, or diagonal for each player, and use those instead.  So our A*X + B = Y starts to look like...

One of the reasons I like using linear models is that there are certain techniques that make training them extremely easy.  In this case, using the pseudo-inverse allows us to solve this equation for A very easily, even given a large set of sample points that don't form a square matrix.  The result is a set of weights that can be multiplied by the input values and summed to produce a predicted score for the game.
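A minimal sketch of the training step with NumPy's pseudo-inverse, assuming each row of X holds the line-probability features and Y holds the simulated win probabilities (function names are mine):

```python
import numpy as np

def fit_linear(X, Y):
    """Solve A*X + B = Y in the least-squares sense via the pseudo-inverse.
    Rows of X are feature vectors (e.g. per-line win probabilities); a
    column of ones is appended so the bias B is learned too."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.linalg.pinv(X1) @ Y          # weights [A..., B]

def predict(weights, x):
    """Combine one feature vector into a predicted score."""
    return float(np.append(x, 1.0) @ weights)
```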

There was a bit of code to this work, which I posted on github here.  Feel free to use these ideas generically to create a heuristic score without having to understand the tactics and strategies of whichever game you are creating.

## Thursday, May 23, 2013

### probabilistic alpha beta search

Any good chess player can tell you the difference in the styles of someone like Mikhail Tal and Anatoly Karpov.  Tal was known as the magician, someone who sought to guide games into the sharpest possible lines where he could induce mistakes from his opponents, even while taking great risks of his own.  Karpov, on the other hand, was a python.  He'd safely accumulate small advantages, and slowly squeeze his opponent until they cracked.  Their styles couldn't be more different, but their results were undeniable; both joined the exclusive club of World Chess Champions.  And yet, neither of these styles is well captured in computer chess, whose engines have become the cold assassins of the chess world.  I think capturing the essence of these stylistic differences is a factor that could be used to further improve the results of computer chess engines, and help tailor their response to a given candidate's strengths and weaknesses, without compromising quality.  What is the missing factor?  Quantification of risk.

 Figure 1: Game tree with minimax highlighted in red, and prospective better path in green
What does it mean to include "risk" in a turn based two player zero sum game?  Consider Figure 1, which highlights the typical minimax approach to game trees in red.  Each player has competing goals, Max is looking to choose a path that maximizes his score, thus picking the maximum nodes in the next level, while Min is trying to minimize the score on his turn.  Scores are bubbled up from the bottom.  We see from this perspective, Max's optimal choice would be to pick the left branch, because given optimal play from Min, the score of 1.2 produces a slightly superior position.  If the scores were perfect, indeed this analysis would be correct, and risk would be completely mitigated.

The reality in games where the search depth rarely allows one to reach terminal nodes is that we use a heuristic as a proxy for the true score.  When the frontier node is finally reached, we repeat the process of creating a tree to search, bubbling up a new score that serves as a better estimation of the actual value of the position at that node.  This means there is a variance in the heuristic that must be considered when choosing our path through the tree.  When we eventually reach a pre-frontier node, we will have reached the last responsible moment from which we can change our decision.  The score of pre-frontier nodes as viewed from the root must then be some combination that accounts for the likelihood that the proxy score for any frontier node has improved by the time we reach the pre-frontier level.

Consider node A vs node B.  If we were to reach node A and then discover that our heuristic evaluation overestimated the score of the child node valued 1.2, we have left ourselves without a viable alternative, as the variance in our heuristic score is unlikely to overcome the 6.2 point gap we predicted in the alternative branch.  If the true score of the best choice at node A was -1.0, we would have no recourse but to continue down that path.  Node B, on the other hand, provides two choices of similar value.  If we were to reach Node B and discover that our 1.1 option actually produced a score of -1.0, we would still have a viable alternative in the 1.0 node.  Minimax fails to capture this characteristic.

Now consider nodes C and D.  From the perspective of Max, we again recognize that Min's options are restricted at node C due to the large difference between the first and second best nodes.  It is not likely that the variance in the 4.0 node will overcome the gap in scores, which means that a misevaluation of the frontier node in Max's favor cannot be mitigated by a second choice from Min.  Node D, however, does provide that alternative, in the similar score of the 1.3 node.

 Figure 2: Normal Distribution
What we are interested in is creating a probability distribution that models the difference between the quiescence score of a frontier node (such as the 1.2 valued node in Figure 1), and the score of that same node if it were fully evaluated one ply deeper than the root (such as the position of node D in Figure 1).  That is to say, we need to understand how much our heuristic evaluation will vary between the time when we first encounter a node in the search tree, and our last chance to change our mind.  A normal distribution (see Figure 2) may be an appropriate model for a good heuristic, where the x-axis represents the score, and the y-axis represents the probability.  This model can be fitted empirically ahead of time, sampling quiescence and max depth evaluation pairs.

 Figure 3: Cumulative Distribution Function
Integration of the normal distribution leads to the cumulative distribution function, pictured in Figure 3.  This function can be used to describe the probability that a random draw from the normal distribution will be less than X.  For example, a random draw from the distribution represented by the green curve in Figure 3 will be less than -2.5 20% of the time, if we trace 0.2 on the y-axis to the intersection.  That means with 80% certainty, that distribution is greater than -2.5.
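The normal CDF can be computed directly from the error function in the standard library; a small sketch:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(draw < x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
```

Correspondingly, `1 - normal_cdf(x)` gives the probability that a draw exceeds x, which is the quantity we care about when hedging.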

 Figure 4: Three choices represented by three cumulative distribution functions
Now consider the situation in Figure 4, where our search has reached node A (a pre-frontier node), in the context of a much larger search.  Nodes B, C, and D are individually represented by the cumulative distribution functions F1, F2, and F3.  We can hedge against the risk of any single node by combining the three cumulative distribution functions to get a single distribution, one which represents the probability of the score of the best of these three alternatives as determined by a full depth search rooted at node A.
 Equation 1: Recursively defined CDF accumulating all move alternatives

Equation 1 expresses a combined cumulative probability distribution recursively, where Fi represents the CDF for the ith alternative move choice.  Here, we can say that the probability that a random draw from the combined distribution is at least as great as X is equal to the probability that Fi is greater than X, plus the probability that Fi is less than X times the probability that the combination of the remaining alternatives is greater than X.  A concrete example can again be taken from Figure 3.  Consider the blue, red, and orange distributions, each with its mean at 0 (i.e. F1(0) = F2(0) = F3(0) = 0.5), and assume those represent the probability distributions for Figure 4.  Each by itself has a 50% chance of scoring higher than 0.  Combining those into Equation 1 and expanding the probability around 0, we find that we have an 87.5 percent chance of scoring better than zero.  This completely upends our understanding of the score at node A in Figure 4.
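The recursion in Equation 1 is tiny in code; a sketch, where `ps` holds the values Fi(X) for each alternative (the function name is mine):

```python
def combined_tail(ps):
    """P(best alternative exceeds X), given ps = [F1(X), F2(X), ...]
    where Fi(X) = P(a draw from alternative i falls below X)."""
    if not ps:
        return 0.0
    # Either the first branch exceeds X, or it falls below X and some
    # remaining branch exceeds X.
    return (1.0 - ps[0]) + ps[0] * combined_tail(ps[1:])
```

`combined_tail([0.5, 0.5, 0.5])` reproduces the 87.5% figure from the example above.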

We can incorporate this probabilistic model into the existing framework of Alpha Beta Search.  Our search algorithm is initialized with a confidence parameter denoting the risk we are willing to take in selection of our next move.  I would expect that requiring a high confidence (95%) would produce strategically sound games with little risk, as in the style of Karpov.  Requiring low confidence (5%) would result in riskier search paths, but higher rewards, as in the style of Tal.

At every terminal node, the Quiesce function instead returns the cumulative probability function mapping to the score produced by our heuristic evaluation.  Every non-terminal node produces a combined CDF using the methodology described by Equation 1.  Bounds are set by evaluating a CDF at the prescribed confidence level, giving us a fixed expected value to compare against as Beta.  Then, when a child is iterating through its move choices, it updates the probability that one of its subnodes will be greater than Beta.  If the probability is greater than our prescribed confidence, we can do a beta cutoff.  One of the nice consequences of this strategy is that as the variance of our heuristic evaluation drops to 0, the search converges back to a normal alpha beta search.

References to other related approaches:
B* Search
ProbCut Pruning
Conspiracy Search

Similar approaches in the literature:
An optimal game tree propagation for imperfect players

## Sunday, May 5, 2013

### 2013 Google Code Jam Round 1B

Better than Round 1, but still missed by several hundred.  If I had been able to get a second problem complete, I think I could have advanced.  However, my attempt at the third problem turned out to be way too slow.

### Problem Osmos:

Armin is playing Osmos, a physics-based puzzle game developed by Hemisphere Games. In this game, he plays a "mote", moving around and absorbing smaller motes.

A "mote" in English is a small particle. In this game, it's a thing that absorbs (or is absorbed by) other things! The game in this problem has a similar idea to Osmos, but does not assume you have played the game.

When Armin's mote absorbs a smaller mote, his mote becomes bigger by the smaller mote's size. Now that it's bigger, it might be able to absorb even more motes. For example: suppose Armin's mote has size 10, and there are other motes of sizes 9, 13 and 19. At the start, Armin's mote can only absorb the mote of size 9. When it absorbs that, it will have size 19. Then it can only absorb the mote of size 13. When it absorbs that, it'll have size 32. Now Armin's mote can absorb the last mote.

Note that Armin's mote can absorb another mote if and only if the other mote is smaller. If the other mote is the same size as his, his mote can't absorb it.

You are responsible for the program that creates motes for Armin to absorb. The program has already created some motes, of various sizes, and has created Armin's mote. Unfortunately, given his mote's size and the list of other motes, it's possible that there's no way for Armin's mote to absorb them all.

You want to fix that. There are two kinds of operations you can perform, in any order, any number of times: you can add a mote of any positive integer size to the game, or you can remove any one of the existing motes. What is the minimum number of times you can perform those operations in order to make it possible for Armin's mote to absorb every other mote?

For example, suppose Armin's mote is of size 10 and the other motes are of sizes [9, 20, 25, 100]. This game isn't currently solvable, but by adding a mote of size 3 and removing the mote of size 100, you can make it solvable in only 2 operations. The answer here is 2.

My Solution:

This is actually a pretty easy problem, both for the small and large sets.  The strategy in Osmos should always be to absorb the smallest remaining mote, if you can.  Once you can't absorb the smallest, you can't absorb any other ones, so it is up to us to either add a new mote which you can absorb, or remove all the remaining motes.  Since we have a vested interest in seeing you absorb the motes with as little help as possible, we want to give you the biggest mote you can currently absorb.  This lends itself well to a recursive solution, where you have your current size and a sorted list of the remaining motes.

There is one edge case to concern ourselves with... if we are a mote of size 1, then it isn't possible for our mote to absorb any other mote.  In this case, we have to return that we remove all remaining motes, as our strategy of adding the largest absorbable mote would otherwise go into an infinite loop (there is no positive mote size smaller than 1).

Another concern is the runtime.  Fortunately, our recursion only needs to run down a single branch for each decision, since if we have to remove motes, we know we have to remove all of them.  With that in mind, we run in O(N) time with respect to the length of the mote list...  The other concern is when the next remaining mote is much larger than our current mote.  But by picking the largest mote available to absorb, we nearly double in size with each iteration, which means we can cover the gap in O(log M) time, where M is the maximum size of a mote.  So our algorithm is something like O(N log M).
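An iterative sketch of that greedy strategy (the recursion described above collapses to a single loop; the function name is mine):

```python
def min_operations(armin, motes):
    """Minimum adds/removes so Armin's mote can absorb everything.
    Greedy: absorb smallest-first; when stuck, either add a mote of size
    armin-1 (the biggest absorbable one) or remove all remaining motes."""
    motes = sorted(motes)
    n = len(motes)
    best = n                      # worst case: remove every mote
    adds = 0
    size = armin
    for i, m in enumerate(motes):
        while size <= m and size > 1:
            size += size - 1      # add and absorb a mote of size size-1
            adds += 1
        if size <= m:             # size == 1: can never absorb anything
            break
        size += m                 # absorb mote i
        best = min(best, adds + (n - i - 1))  # or remove the rest here
    return best
```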

### Problem Falling Diamonds:

Diamonds are falling from the sky. People are now buying up locations where the diamonds can land, just to own a diamond if one does land there. You have been offered one such place, and want to know whether it is a good deal.
Diamonds are shaped like, you guessed it, diamonds: they are squares with vertices (X-1, Y), (X, Y+1), (X+1, Y) and (X, Y-1) for some X, Y which we call the center of the diamond. All the diamonds are always in the X-Y plane. X is the horizontal direction, Y is the vertical direction. The ground is at Y=0, and positive Y coordinates are above the ground.
The diamonds fall one at a time along the Y axis. This means that they start at (0, Y) with Y very large, and fall vertically down, until they hit either the ground or another diamond.
When a diamond hits the ground, it falls until it is buried into the ground up to its center, and then stops moving. This effectively means that all diamonds stop falling or sliding if their center reaches Y=0.
When a diamond hits another diamond, vertex to vertex, it can start sliding down, without turning, in one of the two possible directions: down and left, or down and right. If there is no diamond immediately blocking either of the sides, it slides left or right with equal probability. If there is a diamond blocking one of the sides, the falling diamond will slide to the other side until it is blocked by another diamond, or becomes buried in the ground. If there are diamonds blocking the paths to the left and to the right, the diamond just stops.

Consider the example in the picture. The first diamond hits the ground and stops when halfway buried, with its center at (0, 0). The second diamond may slide either to the left or to the right with equal probability. Here, it happened to go left. It stops buried in the ground next to the first diamond, at (-2, 0). The third diamond will also hit the first one. Then it will either randomly slide to the right and stop in the ground, or slide to the left, and stop between and above the two already-placed diamonds. It again happened to go left, so it stopped at (-1, 1). The fourth diamond has no choice: it will slide right, and stop in the ground at (2, 0).

My Solution:

So I didn't reach the problem until after the contest had ended, but I wish I had.  There are a couple of insights that make this problem much easier to solve than the third problem.  First, you may notice that what we are making is a pyramid, one layer at a time.  Each layer of the pyramid (centered at 0,0) is completed before the next layer can begin, because we need the peak of the pyramid in place before a diamond can slide out to the next tier.

So how many diamonds are in a tier?  Well, we can inspect the sequence to find a pattern: layer 1 holds a single diamond, and each subsequent layer L holds 4L - 3 diamonds (1, 5, 9, 13, ...).

So if we sum all those layers together, we get D = 2L^2 - L.

Here D is the number of diamonds in a completed pyramid, and L is the layer of interest.  So given that the previous layer must be complete before we start our own layer, we can subtract away everything through the layer below our own, and are left with N-D diamonds.

Now I'd reframe the question as follows... how many of the remaining diamonds would have to fall on the opposite side from ours for the layer not to complete up to our location?  Well, our location at height Y means that there would have to be at least Y diamonds falling on our side for us to get a diamond.  That means that if only Y-1 diamonds fell to our side, N-D-(Y-1) diamonds must fall to the other side.  If this number is greater than the other side can hold, then there is a 100% chance that our location will get a diamond.  If N-D < Y, then there is a 0% chance that our location will get a diamond.

A quick google search tells me...

"
The general formula for the probability of w wins out of n trials, where the probability of each win is p, is combin(n,w) × pw × (1-p)(n-w) = [n!/(w! × (n-w)!] × pw × (1-p)(n-w)"

Since the probability p = 0.5 (it's equally likely to fall left or right), and we want to consider all cases from Y to N-D, we can reformulate as follows...

So we've arrived at a formula for calculating the probability.  I haven't checked my numbers, since I didn't actually reach this part in competition, but I'm fairly certain that it will work.  Another optimization in implementing this summation is to recognize that as w gets larger, the added probability gets smaller.  We can probably stop the search early if the probability of the last iteration times the remaining iterations is less than the threshold Google checks for... In cases where N-D is quite large relative to Y, this can be very useful.
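The core summation can be sketched with the standard-library binomial coefficient (the function name is mine; this ignores the saturation case where the far side overflows, which is handled separately above):

```python
from math import comb

def prob_at_least(n, y):
    """P(at least y of n fair coin-flip diamonds fall to our side):
    sum of C(n, w) * 0.5^n for w from y to n."""
    if y <= 0:
        return 1.0
    if y > n:
        return 0.0
    return sum(comb(n, w) for w in range(y, n + 1)) / 2 ** n
```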

### Problem Garbled Email:

Gagan just got an email from her friend Jorge. The email contains important information, but unfortunately it was corrupted when it was sent: all of the spaces are missing, and after the removal of the spaces, some of the letters have been changed to other letters! All Gagan has now is a string S of lower-case characters.

You know that the email was originally made out of words from the dictionary described below. You also know the letters were changed after the spaces were removed, and that the difference between the indices of any two letter changes is not less than 5. So for example, the string "code jam" could have become "codejam", "dodejbm", "zodejan" or "cidejab", but not "kodezam" (because the distance between the indices of the "k" change and the "z" change is only 4).

What is the minimum number of letters that could have been changed?

My Solution:

This was the problem I actually attempted... and while I produced valid results for the small example test cases, it was far too slow to actually finish even the small dataset.  Funny, I work at an email company, and have a coworker named Jorge... I couldn't resist trying this problem instead of the second.  I'll present my ideas here, but I would be interested in seeing the correct approach from someone else.  It reminded me of a problem I'd done on my blog a few months ago... a Morse decoder... so I felt like I could reuse some of those ideas.  The wildcard matching needed seemed to make this problem much harder... so I went about setting up the dictionary in a tree, where each node of the tree contains a dictionary mapping the next character to the list of possible nodes, as well as a list of all sub nodes.  When inserting the dictionary into this tree, I broke it up into N trees, where each tree held words of length n.  The tree class then had a function for returning the distance between itself and a given substring from the test case... that code looked like this:
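A hedged reconstruction of the pruned walk described, with scoreSoFar and bestSoFar as in the text; the trie layout and other names are my assumptions, and the >= 5 gap rule between changes is omitted for brevity:

```python
class TrieNode:
    """One node of a trie holding words of a single fixed length."""
    def __init__(self):
        self.children = {}   # next letter -> TrieNode
        self.is_word = False

def best_distance(node, s, i, score_so_far, best_so_far):
    """Minimum number of changed letters to turn s[i:] into some word
    under this node, pruning branches that cannot beat best_so_far."""
    if score_so_far >= best_so_far:
        return best_so_far                    # prune: cannot improve
    if i == len(s):
        return score_so_far if node.is_word else best_so_far
    for ch, child in node.children.items():
        cost = 0 if ch == s[i] else 1         # 1 per changed letter
        best_so_far = best_distance(child, s, i + 1,
                                    score_so_far + cost, best_so_far)
    return best_so_far
```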

Here, scoreSoFar is initialized to 0, and bestSoFar would be the length of the word (where we would have to change all letters to match something in the dictionary).  Then this code was used in a recursive search, where we'd map out all possible first word fits in the test case, and recursively search the remaining part, keeping the minimum score of that.  I tried caching as I went for reuse, but that didn't really help. This was just way too slow.  Interested in feedback on successful solutions.

So those are my thoughts on Round 1B of the code jam.  Last chance for me is 1C... if I can stay awake that late.

## Saturday, April 27, 2013

### 2013 Google Code Jam Round 1A

Well that was a bust.  I figured out how to do the first problem within the first 15 minutes, and spent the rest of my time screwing it up.  The problem is as follows...

### Problem

Maria has been hired by the Ghastly Chemicals Junkies (GCJ) company to help them manufacture bullseyes. A bullseye consists of a number of concentric rings (rings that are centered at the same point), and it usually represents an archery target. GCJ is interested in manufacturing black-and-white bullseyes.

Maria starts with t millilitres of black paint, which she will use to draw rings of thickness 1cm (one centimetre). A ring of thickness 1cm is the space between two concentric circles whose radii differ by 1cm.
Maria draws the first black ring around a white circle of radius r cm. Then she repeats the following process for as long as she has enough paint to do so:
1. Maria imagines a white ring of thickness 1cm around the last black ring.
2. Then she draws a new black ring of thickness 1cm around that white ring.
Note that each "white ring" is simply the space between two black rings.
The area of a disk with radius 1cm is π cm².  One millilitre of paint is required to cover area π cm².  What is the maximum number of black rings that Maria can draw?  Please note that:
• Maria only draws complete rings. If the remaining paint is not enough to draw a complete black ring, she stops painting immediately.
• There will always be enough paint to draw at least one black ring.

My Solution

The key here is writing a generalized expression relating area to number of rings...  Let's start with the area for the ith ring... this will be the area of the circle with a radius going to the outside of the black paint, minus the area of the circle just inside the black ring...

 $\begin{matrix} A_{in_{i}}= & \pi R_{i}^{2} \\ A_{out_{i}}= & \pi \left ( R_{i} + 1 \right )^{2} \\ A_{ring_{i}}= & A_{out_{i}} - A_{in_{i}} \\ A_{ring_{i}}= & \pi R_{i}^{2} + 2 \pi R_{i} + \pi - \pi R_{i}^{2}\\ A_{ring_{i}}= & \pi \left ( 2 R_{i} + 1 \right ) \end{matrix}$ Equation 1: Area of ring #i

Here, R is the radius just at the inside of the ith ring, and we know that the ring radius will be one centimeter larger.  Next, lets formalize the radius with respect to the ring number...

 $\begin{matrix} R_{i}= &R_{0}+2i \\ A_{ring_{i}}= & \pi\left ( 2R_{0}+1+4i \right ) \end{matrix}$ Equation 2: Incorporating formalized Ri
Now all we need to do is sum all N of our rings... which means summing from 0 to N-1, the way we formulated our answer...

 $\begin{matrix} A= & \sum_{i=0}^{N-1}A_{ring_{i}} \\ A= & \sum_{i=0}^{N-1}\pi\left ( 2R_{0}+1+4i \right ) \\ A= & \pi\left (2 R_{0}N +N+4\sum_{i=0}^{N-1}i \right ) \\ A= & \pi\left (2 R_{0}N +N+4\frac{\left ( N-1 \right )N}{2} \right )\\ A= & \pi\left (2 R_{0}N +N+2N^{2}-2N \right )\\ A= & \pi\left (2N^{2}+\left ( 2R_{0}-1 \right )N \right ) \end{matrix}$ Equation 3: Total Area of paint as a function of N rings
Equation 3 gives us a total area... and the problem specifies that 1 ml of paint covers an area of pi... we get a final answer as follows...
 $\begin{matrix} \pi T = & \pi\left (2N^{2}+\left ( 2R_{0}-1 \right )N \right ) \\ 0= & 2N^{2}+\left ( 2R_{0}-1 \right )N - T \end{matrix}$ Equation 4: Quadratic equation for number of rings that can be painted
So this is where I made my big mistake... I have a quadratic equation that I can stuff into the Quadratic Formula... so that is what I did.  Convert everything to double precision and solve.  Then the number of rings would be the floor of the solution to this equation.  But double precision runs into rounding errors... I'd get answers that would be correct MOST of the time, and then fail for edge cases where rounding matters, and you can bet Google checks for rounding.

However, I was convinced that I had made an algebra mistake, and so I went through this math several times, trying to figure out if I had an off-by-one error.  I was left with only a few minutes to work through the rest of this problem when it occurred to me that, left in the final form of Equation 4, I could use a binary search to finish in a way that wouldn't induce round-off errors.
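A sketch of that integer binary search on Equation 4, avoiding floating point entirely (the function name is mine):

```python
def max_rings(r, t):
    """Largest N with 2*N^2 + (2*r - 1)*N <= t, per Equation 4, found by
    integer binary search to avoid floating-point round-off."""
    def cost(n):
        return 2 * n * n + (2 * r - 1) * n
    hi = 1
    while cost(hi) <= t:          # grow an upper bound exponentially
        hi *= 2
    lo = 1
    while lo < hi:                # invariant: cost(lo) <= t < cost(hi + 1)
        mid = (lo + hi + 1) // 2
        if cost(mid) <= t:
            lo = mid
        else:
            hi = mid - 1
    return lo
```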

Unfortunately, my first attempt at the binary search for the small set didn't pass, and at this point, the round ended.  I'm pretty much convinced that my approach was correct, but I got caught in the technical details of the solution, which killed me.  I'll be back next week, trying to get through to Round 2.  But I'll have to do a lot better than this.  Nonetheless, Google Code Jam is always my favorite programming event of the year, and this is no exception.  Cheers, Google!

## Saturday, April 13, 2013

### Google Code Jam Qualification Round

Today was the qualification round for the 2013 Google Code Jam... always my favorite programming contest.  I decided to make an attempt at the first three problems, and didn't attack the last problem; given its point value, I figured a depth first search was too naive to work.

The first problem was a 4x4 tic tac toe game, with wildcard squares.  The goal was just to determine if there was a winner, or if it was a draw, or if the game is not done yet.  I've done many chess games based on bitmasks, so I decided to use that approach here.  I created the masks by hand, they are easy enough to figure out.  The problem is then solved with 10 bitwise checks per side, and an additional draw check.
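A sketch of that bitmask approach, with the win masks generated rather than written by hand, and 'T' treated as a wildcard counting for both sides (the result strings are illustrative, not the judge's exact output format):

```python
WIN_MASKS = []
for r in range(4):                                            # 4 rows
    WIN_MASKS.append(sum(1 << (r * 4 + c) for c in range(4)))
for c in range(4):                                            # 4 columns
    WIN_MASKS.append(sum(1 << (r * 4 + c) for r in range(4)))
WIN_MASKS.append(sum(1 << (i * 4 + i) for i in range(4)))     # diagonal
WIN_MASKS.append(sum(1 << (i * 4 + 3 - i) for i in range(4))) # anti-diagonal

def status(board):
    """board: 16-char string over 'X', 'O', 'T' (wildcard), '.' (empty)."""
    for side in "XO":
        mask = sum(1 << i for i, ch in enumerate(board) if ch in (side, "T"))
        if any(mask & w == w for w in WIN_MASKS):   # 10 checks per side
            return side + " won"
    return "Draw" if "." not in board else "Game has not completed"
```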

The second problem was about figuring out if a given lawnmower pattern could be achieved given certain rules... first, the lawnmower could only make straight line cuts perpendicular to one of the edges.  Second, the height of the blade could only be set at the start of an edge.  The lawnmower had to be driven all the way across the grass, at which point you could drive the lawnmower across from another edge starting point.  The key insight was that for any given patch of grass, its height had to equal either the max height in its row or the max height in its column.  Finding the max for each row and each column has to be done only once, and then one pass over each patch to do the check leads to O(n*m) time.
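That row/column check fits in a few lines (the function name is mine):

```python
def mowable(grid):
    """A pattern is cuttable iff every cell's height equals the max of its
    row or the max of its column -- the insight described above."""
    row_max = [max(row) for row in grid]
    col_max = [max(col) for col in zip(*grid)]
    return all(h == row_max[i] or h == col_max[j]
               for i, row in enumerate(grid)
               for j, h in enumerate(row))
```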

The third problem was sneaky... but I've been playing with palindromes for fun with some coworkers, so we had already figured out how to generate them efficiently as a sequence.  The problem asks us to find all numeric palindromes which are also squares of numeric palindromes.  For instance, 121 is a palindrome which is 11^2, where 11 is also a palindrome.  The hard part about this problem is the interval bounds.  The small set only covers squares less than a thousand, but the middle set could span 0 to 10^14, and the large set 0 to 10^100.  I gave up on the large data set immediately, figuring that it would take some mathy trick to get it to pass.  The key insight for the other two data sets is that instead of searching for numbers to take the square root of, you start with the smaller palindromes and check that their squares are also palindromes.  This means we really are only searching through numbers less than 10^7, which is much more tractable.  In fact, there are only 10999 palindromes less than 10^7, so I went with the approach of finding all of the requested numbers in the full range up front.  Then, for each interval, just count how many in the list fall within the bounds.
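The enumeration can be sketched like this (Python; function names are mine, and the generator builds each palindrome from its left half rather than scanning every number):

```python
# Sketch of the approach described above: build palindromes directly from
# their left halves, square each one, and keep the squares that are
# themselves palindromes; each interval query is then a binary search over
# the precomputed sorted list.
import bisect

def is_palindrome(n):
    s = str(n)
    return s == s[::-1]

def palindromes_below(limit):
    """Generate every positive palindrome < limit by mirroring a left half."""
    pals, half = [], 1
    while True:
        produced = False
        for h in range(half, half * 10):
            s = str(h)
            for pal in (int(s + s[-2::-1]), int(s + s[::-1])):  # odd, even length
                if pal < limit:
                    pals.append(pal)
                    produced = True
        if not produced:
            return sorted(pals)
        half *= 10

fair_and_square = sorted(p * p for p in palindromes_below(10**7)
                         if is_palindrome(p * p))

def count_in_interval(lo, hi):
    """Count palindromic squares of palindromes in [lo, hi]."""
    return (bisect.bisect_right(fair_and_square, hi)
            - bisect.bisect_left(fair_and_square, lo))

print(count_in_interval(1, 4))        # 2  (1 and 4)
print(count_in_interval(100, 10000))  # 2  (121 and 484)
```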

I didn't attempt the last problem, but I did notice a couple of things that could help improve the performance of a solution.  The problem described a set of locked chests, each of which may contain some keys, and each of which can only be unlocked by a certain class of key.  You start with at least one key, and need to check whether you can unlock all the chests.

If you go with a depth-first search approach, don't include the empty chests in the initial search.  They are outside the critical path, and can be added back into the sequence after the fact while maintaining the "lexicographical order".  Second, at each node of the search, you should probably check whether there are enough reachable keys left to unlock each remaining chest... this is a more directed search, and could cut off much more quickly.  Third, it may be possible to identify up front whether the graph of chests is fully connected, and immediately fail without searching for an unlock order if it isn't.  Other than that, I didn't take a crack at this one.

I hope everyone is enjoying the Code Jam so far; I'm interested in hearing other approaches.

## Monday, April 8, 2013

### A fixed capacity cache with an LRU eviction policy in C#

One of the critical strategies in creating a system that performs well at high scale is effective caching.  In distributed systems, caching can be your best defense against overloading a database, and an effective solution to temporary database failures, so you can keep servicing requests during downtime.  Our services often provide a cache timeout per endpoint, depending on how fresh the data needs to be for the system to function reliably.  However, the cache size can grow unwieldy if there are many unique calls to an endpoint in a short period of time, when only a subset of the calls are likely to benefit from the caching.

The solution is a cache of fixed size, which evicts items using a least-recently-used policy when capacity is reached.  The requirements are that the cache support O(1) inserts and reads, and manage the freshness of the data internally.  The user should be able to specify the capacity of the cache, as well as a staleness timeout.  The interface in my example is a string-to-string key-value store, with functions set(key, value) and get(key).  Expired or non-existent items return null strings.

What makes this a fun problem is understanding how to mix data structures to achieve these goals.  Constant-time lookups and inserts mean you want a hash table as your primary storage.  To maintain an order, we need to store our nodes in a list... but it is important that removing a node from the list, as well as adding to the head or tail of the list, occur in constant time.  A doubly linked list (a deque) is appropriate for this goal.  In my design, I'll store the list nodes in the hash table indexed by key, with a tuple containing the key, value, and timeout inside each node.

My solution is below:
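As a language-agnostic sketch of the moving parts (shown here in Python for brevity; the real solution is in C#, and the class and method names below are mine):

```python
# Sketch of the design described above: a dict gives O(1) lookup of list
# nodes; a doubly linked list ordered by recency gives O(1) move-to-front
# and O(1) eviction from the tail.
import time

class Node:
    __slots__ = ('key', 'value', 'expiry', 'prev', 'next')
    def __init__(self, key, value, expiry):
        self.key, self.value, self.expiry = key, value, expiry
        self.prev = self.next = None

class LruCache:
    def __init__(self, capacity, timeout_seconds):
        self.capacity = capacity
        self.timeout = timeout_seconds
        self.nodes = {}                  # key -> Node
        self.head = self.tail = None     # head = most recently used

    def _unlink(self, node):
        if node.prev: node.prev.next = node.next
        else:         self.head = node.next
        if node.next: node.next.prev = node.prev
        else:         self.tail = node.prev
        node.prev = node.next = None

    def _push_front(self, node):
        node.next = self.head
        if self.head: self.head.prev = node
        self.head = node
        if self.tail is None: self.tail = node

    def set(self, key, value):
        if key in self.nodes:
            self._unlink(self.nodes.pop(key))
        elif len(self.nodes) >= self.capacity:   # evict least recently used
            self.nodes.pop(self.tail.key)
            self._unlink(self.tail)
        node = Node(key, value, time.time() + self.timeout)
        self.nodes[key] = node
        self._push_front(node)

    def get(self, key):
        node = self.nodes.get(key)
        if node is None:
            return None
        if node.expiry < time.time():            # stale: drop it, report a miss
            self.nodes.pop(key)
            self._unlink(node)
            return None
        self._unlink(node)                       # refresh recency on read
        self._push_front(node)
        return node.value
```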

## Thursday, March 7, 2013

### Lessons from Sim City Limited

So a blast from the past was released this week: the new SimCity, descendant of one of my great childhood joys, SimCity 2000.  I pre-ordered this game and had it running the night I got it.  But fate is a cruel mistress... or more accurately, products released before they are ready are cruel mistresses.  I had trouble at just about every step of the way.  The install failed on my Microsoft Surface Pro, claiming that it was a virtual machine and thus ineligible.  After installing it on my desktop, it crashed about every 10 minutes, and did nothing to restore games from a checkpoint.  After a patch came out on day 2, the servers became impossible to log in to.  It is clear that EA had a deadline to hit come hell or high water, and their engineers were left to compromise the product's quality.  That is a dangerous game to play.

Traditional project management is guided by the notion that a project can be characterized by three axes which can be traded against one another: resources allocated, time allotted, and feature scope.  Management's balancing act is figuring out what compromises can be made on two of these axes to let the third extend out, making a quality product achievable.  Problems manifest when too many of these axes become fixed.

The first problem, I would argue, is the cost axis.  We all recognize that there are non-linear costs associated with adding people to projects.  Larger teams have more complicated communication structures, require more time integrating pieces developed independently, contain more bugs because of failed assumptions at the interfaces, and run into irreducible complexities.  Nine women can't make a baby in one month.  Brooks's law, so eloquently stated in The Mythical Man-Month, is that adding people to a late project makes it later.  New people have a spin-up time which lowers the productivity of everyone else on the project.  The only real time to trade on this axis is at the start of a project; if you ever get way ahead, it is much easier to remove people from a project than to add them.  The effect is that cost typically moves to the center of the triangle, and teams trade on quality instead, even as every fiber of their being screams that this should not be traded on.

Time is the hairy beast on the triangle.  Engineers are bad at projecting how long something is going to take.  Even over short intervals such as Agile sprints, it is hard to gauge timelines accurately.  To compound the problem, engineers usually err on the low side.  That means management is really only left with scope to trade on when timelines begin to balloon.  And the pin that pops this bubble is how stakeholders force teams to be deadline driven.  Customers have expectations as to when a feature should be released, and management needs a backbone to push back on customers when expectations have already been set.  The release schedule should be the deep dark secret that customers don't see coming, giving the time tent pole more room to grow.

Being able to trade scope requires developers to focus heavily on loosely coupling their products, so simpler solutions can be dropped in place of planned ones.  Developers often lack the foresight to adequately plan for these contingencies, and waterfall projects tied to long-term design plans can fail if dependent features are incomplete when a deadline is reached.  If any given feature is missing come deadline day, it has the potential to cascade down the line, blocking feature after feature.

Quality starts out as the least tangible aspect for management.  They have their budgets projecting costs, their requirements detailing feature requests, and their timelines that they rigidly stick to.  Code rot can quietly be swept under the rug.  Test plans can be silently neglected.  Acceptance criteria can be too incomplete to fully capture a feature.  And it isn't until release, when many hands are all touching your product, that you realize what was traded away over the course of the project.

I appreciate the fresh look Agile takes at this triangle.  By always having a stable product that can be re-evaluated every sprint, you make the quality corner of the triangle tangible, and can force it to remain high.  By reducing features to their smallest working extensions, you shrink the timescales, allowing them to become more predictable for engineers.  By allowing engineers to decide the time frame on a feature-by-feature basis, it helps deconstruct artificial deadlines that press against quality.  And finally, it forces product owners to focus on trading the one thing managers should really be trading in the first place: features.  Prioritization becomes their sword, because it is the most effective weapon a PO or PM has in their arsenal.

SimCity has obviously traded on quality, and that is going to cost them a lot of money, as well as make them an example for others to learn from.

## Saturday, March 2, 2013

I love the Code Jam... I've done it every year since I first discovered it in grad school.  As a competitive person, I was always looking for an edge that could improve my scores.  With Google Code Jam right around the corner, I thought I'd share some of the tips I've picked up over the last several years.  Feel free to fold them into your own preparation.

• Optimize your work space for efficiency.
With the exception of the first round, you'll have a really limited time frame to work with, and you want to avoid as many disturbances as possible.  I like to close all the miscellaneous windows I generally keep open, save for some music and maybe a quick window for Google searches, in addition to my IDE and a window with the problem sets.  I have some water handy, and anything that could distract me during the contest hours conveniently taken care of.
• Write your boilerplate before the contest.
All Google problems look exactly the same from an I/O standpoint.  You read some data from a file, perhaps multiple lines per problem.  You parse the data into its respective data types, do some real work on it, and then output to a file a line starting with a leading case number, followed by whatever answer you are going to give.  Typically there are three problems to work on, so I have a project set up with all three problem classes ready to do generic parsing of a simple input, and generic output of whatever you are going to solve.  With three hours available for critical thinking, these precious minutes can be the difference between a working solution and an almost finished one.
• Have a resource file easily available to paste in the free test data they give you.
Google problems all give you some example inputs and outputs as a starting point for what your program should be able to do.  In the boilerplate code, I like having an empty text file pre-wired to be parsed as the input file of choice, which I can just paste the examples in and run against my program.  I also have a pre-wired text file for the output, already open, which automatically refreshes when I run my program.  It gives me instant feedback as I am writing my algorithms.
• Read all the problems before you start.
This is more of a generic test taking strategy, but it applies well here.  Google assigns different points to different problems.  Typically in the first round, a score large enough to get you on to the next round doesn't have to be a perfect score... but it also typically isn't the score of the easiest problem either.  Also, I feel that reading all the problems before starting gives your mind a chance to process a bit before you jump into coding.
• Have a plan of attack
Unless there is a very obvious optimal answer, you'll need to think about how you are going to attack a problem before you jump into it.  The big points require optimized algorithms; O(n^2) is the quickest way to not have a working answer.  Diagram out on a piece of paper the implications of what you want to achieve.
• Know where to find your file I/O.
• Fuzz test your big data sets before submitting.
This will be a new one for me this year... but it is something that I tried in the Facebook Hacker Cup.  Before attempting the large dataset, you need to know the performance pain points of your algorithm.  The best solution I've come up with is to fuzz test your algorithm against the bounds supplied in the question.  Make sure your fuzzing module especially tries combinations at the boundaries, because that is where the tricks occur.
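The boilerplate tip above can be sketched as a generic harness (Python here; `solve` is a placeholder you fill in per problem):

```python
# Generic Code Jam scaffold.  Every problem reads T cases from an input
# file and writes lines of the form "Case #i: answer" to an output file.
import sys

def solve(case_lines):
    # Per-problem logic goes here; echoes the input for illustration.
    return case_lines[0]

def format_cases(lines, lines_per_case=1):
    """Turn raw input lines (first line holds T) into contest output."""
    t = int(lines[0])
    out = []
    for i in range(t):
        case = lines[1 + i * lines_per_case : 1 + (i + 1) * lines_per_case]
        out.append('Case #%d: %s' % (i + 1, solve(case)))
    return '\n'.join(out) + '\n'

def run(in_path, out_path):
    with open(in_path) as f:
        lines = f.read().splitlines()
    with open(out_path, 'w') as f:
        f.write(format_cases(lines))

if __name__ == '__main__' and len(sys.argv) == 3:
    run(sys.argv[1], sys.argv[2])
```

Wiring the input and output paths up before the contest means the only thing left to write on the day is `solve` and the per-problem parsing.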

I'd love to hear any additional suggestions you may have about how to improve your Google Code Jam experience.  Good luck in the 2013 Code Jam!

## Sunday, February 24, 2013

Google Code Jam is right around the corner!  Registration will open on March 12, 2013... this is always my favorite programming contest of the year.  I'll be trying to get a group of coworkers in on the action.  Last year, I reached the second round, and actually had one of the better scores in round 1C, which was the best round of my life.  Hope to see you there!

## Thursday, February 14, 2013

### Scalatron

Today, I was introduced to "Scalatron", a kind of virtual robot war designed as a platform to teach programmers Scala.  It's a fascinating concept, and appeals to my competitive nature in the programming arena.  Additionally, I think I can use it as a virtual world to validate some ideas I have about emergent intelligent behavior.

Watching some of the competitions on YouTube, there seem to be a number of ways you can set up the rules of the game.  I like this particular variant, which introduces food, both animal and vegetable, poison, and predators as part of an environment where you try to prove your bot the best.

If you are looking for an excuse to learn a new language, this may be as good a reason as any!

## Saturday, February 9, 2013

### Beginning a model for emergent intelligent behavior

 A single neuron under an electron microscope
One of the most fascinating mysteries of life is the nature of intelligence.  How is the brain able to make sense of the world around it?  How is it able to use that information to make intelligent decisions?  How is a conglomeration of individual neurons, each largely functionally identical, able to link together into networks representing complex associations, abstract ideas, and knowledge about the very patterns that make up the world around them?  I'd like to start a collection of posts describing my understanding of how these structures could be modeled to produce intelligent behavior.

 Anatomy of a Neuron

The arena of Artificial Intelligence actually has a lot to say about the mechanisms by which intelligence might work.  Early pioneers of the field examined the structures of neurons, like the one pictured above.  Dendrites, short extensions emerging from the cell body, act as receptors of chemical-electrical inputs.  The axon acts as the emitter of electrical information, with endings significantly further away from the cell body.  When enough electrical activity builds up through simultaneous pulses to the dendrites, a signal fires through the axon.  Neurons form electrical networks through overlapping regions of dendrites and axons.  But an arbitrary electrical network is not that interesting.  What is the guiding principle that gives neural networks the plasticity to mimic patterns?  One of the long-standing theories about how these adaptations occur was formed in 1949 by Donald Hebb.

Hebbian Learning
The basic idea of Hebbian learning is captured in the expression "cells that fire together, wire together".  When an input neuron's synapse fires, and that firing correlates with the firing of the output neuron, the dendrite's connection to the input axon grows stronger.  Inactive dendrites grow weaker over time.  This provides the insight for a mathematical model of how the connections in an artificial neural network can be updated so that any neuron naturally mimics the pattern of electrical activity at its output given only its inputs.  Suppose you have a neural network that looks as follows:

 Model of a single neuron

Here, the neuron's dendrites are represented as the X inputs on the left, which form a linear combination to produce the output Y on the right, representing the axon of a neuron.  The W weights represent the strength of the connection for any specific dendrite.  When some pattern of X inputs are fired, the weights are updated as follows:
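In symbols (a reconstruction, assuming bipolar activations of +1 and -1 so that the product is positive when input and output agree):

```latex
% Hebbian update for weight i, with learning rate N: the product X_i * Y is
% positive when input and output agree (both +1 or both -1), strengthening
% the weight, and negative when they disagree, weakening it.
W_i \leftarrow W_i + N \, X_i \, Y
```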

This equation says the weights get stronger when the input X and the output Y are the same, and weaker when they differ.  The N parameter is a learning rate.  From our diagram, we see that the linear combination comes together at f, a nonlinear function which acts to normalize the range of outputs.  To mimic the activity of a real neuron, usually some kind of step function is used, so that positive linear combinations produce a 1 at the output, and negative combinations produce a 0.  If you were to train such a neuron with example inputs and outputs, you would eventually get a neuron that, given only the inputs, predicts the output that most closely matches its training set.  This is called a Perceptron network.  Such a network is capable of accurately predicting any set of data that is linearly separable in the input space, but by itself is not good at identifying more complex patterns.
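A minimal single-neuron sketch of this training loop (Python; names are mine, and I use bipolar +1/-1 values so the Hebbian product behaves as described) learns a linearly separable pattern such as logical AND:

```python
# Single-neuron sketch of the update described above.  Bipolar (+1/-1)
# inputs and outputs; f is a step function; `rate` is the learning rate N.
def f(activation):
    return 1 if activation >= 0 else -1

def predict(weights, x):
    return f(sum(w * xi for w, xi in zip(weights, x)))

def train(samples, n_inputs, rate=0.1, epochs=20):
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for x, y in samples:
            for i in range(n_inputs):
                weights[i] += rate * x[i] * y   # strengthen when x_i and y agree
    return weights

# AND in bipolar form; the constant +1 input plays the role of a bias.
samples = [((1,  1,  1),  1),
           ((1,  1, -1), -1),
           ((1, -1,  1), -1),
           ((1, -1, -1), -1)]
w = train(samples, 3)
print([predict(w, x) for x, _ in samples])  # [1, -1, -1, -1]
```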

 Multi-layer Perceptron

Now suppose we take these neurons and chain them together to form a more complicated network.  Here, the network takes input impulses and maps them to several neurons, which in turn map their axons to two more output neurons.  This feed-forward multi-layer neural network is able to identify more complicated non-linear patterns.  The math to apply Hebbian updates to the weights in this network is also more complex, as the learning starts from the output layer and propagates backwards towards the input layers... leading to the term "backpropagation".  So with only a simple feed-forward neural network, we are able to create a structure capable of identifying patterns.  But the brain does so much more... it is able to store and recall memories, and learns to make decisions to its own benefit.  How does this happen?

 Hopfield Network

Hopfield Networks
The next piece of the puzzle is a special type of neural network known as a Hopfield network.  A Hopfield network is different from a feed-forward neural network because it allows outputs from a neuron to feed back to itself.  As inputs are fed into such a network, the outputs loop back to the inputs, providing a new training input for the network to adapt to.  This feedback repeats until the network converges on a stable solution.  One of the properties of such a network is that it acts as a type of memory.  If you give it an incomplete set of inputs, or if the inputs are partially garbled, it will naturally converge on the stored memory that most closely resembles the inputs it was fed.  For instance, if the inputs were the pixels of an image, and we only provided the left half of the image, it would naturally reproduce the right half (as seen below).
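This recovery behavior is easy to demonstrate on a tiny example (Python sketch; names are mine, patterns are bipolar +1/-1 vectors rather than pixels):

```python
# Sketch of Hopfield recall: store a pattern with the Hebbian rule, then
# recover it from a corrupted copy by repeatedly updating all neurons until
# the state stabilizes.
def sign(v):
    return 1 if v >= 0 else -1

def store(patterns, n):
    """Hebbian weights: w[i][j] = sum over patterns of x_i * x_j (no self-loops)."""
    w = [[0] * n for _ in range(n)]
    for x in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += x[i] * x[j]
    return w

def recall(w, state, steps=10):
    state = list(state)
    for _ in range(steps):
        nxt = [sign(sum(w[i][j] * state[j] for j in range(len(state))))
               for i in range(len(state))]
        if nxt == state:        # converged on a stable memory
            break
        state = nxt
    return state

memory = [1, 1, -1, -1, 1, -1]
w = store([memory], 6)
garbled = [1, 1, -1, -1, -1, -1]     # one neuron flipped
print(recall(w, garbled) == memory)  # True
```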

 Hopfield network memory recovery from partial input
Again, there is a Hebbian learning rule for updating this type of network, as follows:
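In symbols, a standard form of the rule (a reconstruction; here P counts the training patterns) is:

```latex
% Hopfield weights from P training patterns, with no self-connections:
w_{ij} = \sum_{p=1}^{P} X_i^{(p)} X_j^{(p)}, \qquad w_{ii} = 0
```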
which says that for a fully connected network of neurons that feed back into themselves, the weight between neurons i and j can be calculated as the sum, over the P training examples, of the product of the inputs Xi and Xj.

What Next?
So we now have a basis of understanding for how neurons form networks, and how those networks can adapt their connections through Hebbian learning to build pattern recognizers.  We have a mechanism for the network to store memories in an auto-associative fashion.  But we have a problem here... how does the brain learn good behaviors from bad ones?  Imagine you start walking towards a wall.  Eventually you reach the wall and, failing to stop, you bonk your head.  A pattern recognizer trained on the circumstance may be capable of recognizing that when you reach the wall, your head will hurt... but it wouldn't care one way or the other about the outcome it predicted.  Artificial intelligence typically prescribes a teacher that separates good behaviors from bad, and trains the neural network to avoid the bad ones.  I believe this to be an incomplete explanation, as our brains do not have the luxury of a teacher to predefine all good and bad experiences for us.  This is an emergent property of the mind.  What I would like to explore is what mechanisms could produce this emergent property, and to come to a hypothesis of how intelligent behavior is formed.

Stay tuned for more to come...

## Sunday, February 3, 2013

### Facebook Hacker Cup 2013 Round 1 Problem 1

Wow, I have to say I was at first skeptical of a 24 hour round 1 in the Facebook Hacker Cup... but these problems were quite challenging. I'm going to break this up into three posts, one for each of the problems I attempted.

The first problem was as follows...

You are given an array a with N ≤ 10 000 different integer numbers and a number, K, where 1 ≤ K ≤ N. For all possible subsets of a of size K find the sum of their maximal elements modulo 1 000 000 007.

The more intuitive explanation was from the problem statement... imagine that you have N cards in a deck, each with a different value.  You randomly select K cards to make your hand, and the strength of your hand is equal to the card with the highest value... so for instance, the hand (2, 4, 6, 8) would score as 8.

Now given this, you want to consider all possible hands, and figure out the average score.  More specifically, you just want the sum of all hands mod 1000000007.  To solve this problem, you are given N, K, and the array A containing all the values in the deck.

My approach was to solve the problem looking at each element of A, asking how many possible hands exist given that this card is the highest in the hand.  So imagine we have A sorted as follows, with K = 3 and N = 10....

(2, 5, 6, 8, 9, 12, 15, 17, 19, 20)

if we consider how many K=3 card hands could have 15 as the highest card, we recognize that we need to select two cards from the set to the left of 15.  So in this case, we want 6 choose 2 to fill out the remainder of the hand (remember that N choose K = N!/(K!*(N-K)!))... so there are 15 possible 3-card hands from this set where 15 is the highest card.

Generalizing this to sum over all possible hands, and taking the modulus into account, we get the main part of the program looking something like this...

// combinations(i, k - 1, mod) computes (i choose k - 1) % mod
public string Solve(int n, int k, List<long> a)
{
    const long mod = 1000000007;
    long answer = a.OrderBy(x => x)
        .Select((strength, i) => combinations(i, k - 1, mod) * (strength % mod) % mod)
        .Sum() % mod;
    return answer.ToString();
}


The next thing to note is the nature of big numbers... factorials combined with the sizes of the data sets mean that you must be able to take care of the modulo requirement mid-calculation.  First, we recognize that the large prime modulus provided in the problem means we can structure the answer in terms of congruences... I found this cheat sheet from U.C. San Diego to brush up on the rules of modular arithmetic, and the section-one rules tell us that if our binomial coefficient operator produces a value congruent mod the given modulus, we can just keep modding at each step of the operation to get an equivalent answer.  I had to look up an example of fast binomial coefficient generation, since my attempts either overflowed with big numbers or were inaccurate... I found my baseline on Stack Overflow here.
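A `combinations` helper along these lines (a sketch, here in Python; the name matches the one referenced in the snippet above, but this is my own reconstruction) can compute the binomial coefficient modulo a prime by using Fermat's little theorem for the modular inverse:

```python
# (n choose k) % mod for a prime mod, modding at every step so intermediate
# values never grow large.  Fermat's little theorem gives the inverse of the
# denominator: a^(mod-2) == a^(-1) (mod mod) when mod is prime.
def combinations(n, k, mod):
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num = num * ((n - i) % mod) % mod
        den = den * ((i + 1) % mod) % mod
    return num * pow(den, mod - 2, mod) % mod

print(combinations(6, 2, 1000000007))   # 15
print(combinations(10, 3, 1000000007))  # 120
```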

So that was my answer to the first problem... more to come with my approaches to the other two problems.

## Wednesday, January 30, 2013

This weekend, during our company trip to Mexico, a group of us decided to participate in the qualification round for the 2013 Facebook Hacker Cup.  I missed last year's, but love these types of competitions.  It's significantly smaller than the Google Code Jam, but the format is similar.  There were three problems in the qualification round; if you got any of them, you moved on to the next round (which I believe is this coming weekend).  You can use whatever language you'd like to answer the questions, as you download an input file which you have to provide outputs for within a given time frame (in this case, 6 minutes).

The first problem, called "beautiful strings", was to find a one-to-one mapping between letters and the values 1-26, and return the maximum possible score that a string could achieve.  I ran into problems misinterpreting the question in a variety of ways... first assuming that a single global mapping produced the scores, and then assuming that the scores were for the best word in the string.  Once that was cleared up, the solution came quickly.

The second problem, called "balanced smileys", was to determine whether a string could be interpreted as having balanced parentheses if you were allowed to read each emoticon either as a face or as a colon-paren combination...  I had some flashbacks to a recursive descent parser we wrote in school, so writing such a parser with a lookahead was pretty easy, and it only took maybe another hour to do that part.

At this point, we were getting heckled by coworkers for wasting our vacation on programming problems, and spent the night drinking and partying... the qualification round ran something like 72 hours, so we could afford to stop.  The plan was to come back and finish it the next day...

But that night, the norovirus struck the resort hard, myself included.  The next few days were spent battling that rather than Facebook problems.  Too bad; I would have loved to knock out that last problem.

Looking forward to the next round, coming up this weekend...  Hope to see you there!