Saturday, December 15, 2012

Golden Section Search

Golden Ratio seen in nature
  Last time, I demonstrated how to implement a gradient descent optimization algorithm, which is widely used in artificial intelligence to optimize neural networks.  It is versatile, able to accept any multidimensional input function, and simple to understand and implement.  However, you aren't likely to find it used where performance matters... it takes many iterations to find a local minimum.  If the function being minimized is expensive to evaluate, you rapidly discover the importance of performance in optimization, and hence the large number of approaches.  Another drawback is that gradient descent doesn't provide a simple mechanism for constraining the search.  If you know the region where a minimum must exist, you need to look for another approach.

  Enter the Golden Section Search, a one-dimensional search tool with a number of benefits over gradient descent.  First, it belongs to a class of minimization techniques that work by repeatedly shrinking a bracket known to contain a minimum.  This means we can preset the bracket from prior information, which can be a major upside.  Second, it is typically more efficient than a gradient descent search.  The downside is that it is limited to a single input dimension, although it can be combined with a conjugate gradient method, where it minimizes along the current search direction.

Golden Section Search Diagram

  So how does this approach work?  Let's start with the above graph, where we have some unknown function f(x), and three points (x1, f1), (x2, f2), (x3, f3) which we already know something about.  x4 is our next candidate x value, and f4a and f4b are possible y values for this candidate.  Because the function is continuous, a bracket is guaranteed to contain a local minimum whenever the interior point is lower than both bracketing points; in this case, f2 clearly falls below both f1 and f3.  We always pick our bracketing points so that one side is slightly bigger than the other, and our next x value goes into the bigger side.  In this case, b is bigger than a (along the x-axis in the diagram), so our next point will be to the right of x2.  Now, with your mind's eye, picture a parabola passing through f1, f2, and f4a: its minimum lies between x1 and x4, whereas a parabola through f2, f4a, and f3 would have a maximum instead.  Because of this, you would use x1, x2, and x4 as your next three bracket points.  If instead we did the same thing with f4b, the bracket would work out to x2, x4, and x3.  This process is repeated recursively until the bracket is small enough to achieve the desired precision.
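To restate that bracket-update rule in code form before the full implementation, here is a minimal sketch of just the decision described above, assuming the probe x4 landed in the larger, right-hand side of the bracket:

    // One bracket-reduction step, restated from the diagram discussion.
    // (x1, x2, x3) bracket a minimum, and x4 is the new probe point,
    // assumed here to lie in the larger, right-hand side (x2 < x4 < x3).
    static Tuple<double, double, double> ReduceBracket(
        Func<double, double> f, double x1, double x2, double x3, double x4)
    {
        if (f(x4) < f(x2))
            return Tuple.Create(x2, x4, x3);   // the f4b case: the minimum lies to the right of x2
        return Tuple.Create(x1, x2, x4);       // the f4a case: the minimum lies to the left of x4
    }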

Now some code.

    public delegate double Function1D(double x);

    public class GoldenSectionSearch
    {
        private readonly double _goldenRatio = 2.0 - (1.0 + Math.Sqrt(5.0))/2.0;
        public double Threshold { get; set; }

        public GoldenSectionSearch()
        {
            Threshold = 0.001;
        }

        public double Minimize(Function1D function, double lowerBound, double upperBound)
        {
            var f = CachedFunction(function);
            var midpoint = StartingSearchPoint(lowerBound, upperBound);
            return Minimize(f, lowerBound, midpoint, upperBound);
        }

        private double Minimize(Function1D f, double lower, double midpoint, double upper)
        {
            var x = NextSearchPoint(lower, midpoint, upper);
            if (IsSearchDone(lower, midpoint, upper, x))
                return FinalPosition(lower, upper);

            if (IsMinimumInNewSearchBracket(f, midpoint, x))
            {
                if (IsUpperBracketLarger(lower, midpoint, upper))
                    return Minimize(f, midpoint, x, upper);
                return Minimize(f, lower, x, midpoint);
            }
            else
            {
                if (IsUpperBracketLarger(lower, midpoint, upper))
                    return Minimize(f, lower, midpoint, x);
                return Minimize(f, x, midpoint, upper);
            }
        }

        private Function1D CachedFunction(Function1D function)
        {
            var cache = new Dictionary<double, double>();
            Function1D f = (x) =>
            {
                if (cache.ContainsKey(x))
                {
                    return cache[x];
                }
                var result = function(x);
                cache[x] = result;
                return result;
            };
            return f;
        }

        private double StartingSearchPoint(double lower, double upper)
        {
            return (upper - lower)*_goldenRatio + lower;
        }

        private double NextSearchPoint(double lower, double midpoint, double upper)
        {
            if(IsUpperBracketLarger(lower, midpoint, upper))
                return midpoint + _goldenRatio * (upper - midpoint);
            return midpoint - _goldenRatio * (midpoint - lower);
        }

        private bool IsSearchDone(double lower, double midpoint, double upper, double x)
        {
            return Math.Abs(upper - lower) < Threshold*(Math.Abs(midpoint) + Math.Abs(x));
        }

        private double FinalPosition(double lower, double upper)
        {
            return (upper + lower)/2.0;
        }

        private bool IsUpperBracketLarger(double lower, double midpoint, double upper)
        {
            return (upper - midpoint > midpoint - lower);
        }

        private bool IsMinimumInNewSearchBracket(Function1D f, double midpoint, double x)
        {
            return f(x) < f(midpoint);
        }
    }

Some explanation here...

First we use a function delegate to require the user to provide a one-dimensional function to minimize.  It is less flexible than the gradient descent delegates, but that restriction is inherent to this algorithm.

Inside the public Minimize function, we wrap the input with a caching decorator, so that any value we have already evaluated is reused.
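As a rough illustration of why that matters (the names below are illustrative, not part of the class above), a memoized wrapper means repeated probes at the same x cost nothing extra:

    // Illustrative sketch: memoizing a 1-D function so repeated probes are free.
    // This mirrors the CachedFunction decorator shown above.
    var rawEvaluations = 0;
    Function1D expensive = x => { rawEvaluations++; return Math.Pow(x - 4.0, 2.0); };

    var cache = new Dictionary<double, double>();
    Function1D cached = x =>
    {
        double y;
        if (!cache.TryGetValue(x, out y))
            cache[x] = y = expensive(x);
        return y;
    };

    cached(1.5); cached(1.5); cached(2.5);
    Console.WriteLine(rawEvaluations);   // 2 -- the repeated probe at 1.5 hit the cache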

The search termination criterion in IsSearchDone is somewhat confusing, but it is the formula recommended by the Numerical Recipes authors.  The gist is that we say a minimum is found when the width of the bracket is smaller, relative to the scale of x, than a precision we care about; with the default Threshold of 0.001 and a minimum near x = 4, for example, the search stops once the bracket is roughly 0.008 wide.

We use the "golden ratio" to pick the size of our bracket steps.  This gives us special properties when we are reducing the size of the bracket to make sure the ratios never change, and also minimizes the number of steps required to reach the minimum.  It is a surprising result that it is actually more efficient than bisection.

The private Minimize helper function is the workhorse, where the recursive search happens.  Each iteration, we check whether we are done, and if not, reduce the bracket, picking new lower, midpoint, and upper bounds for the next iteration based on the intuition described above the code.

So now a test to show it in action....

        [Test]
        public void GoldenSectionSearchWorks()
        {
            var gss = new GoldenSectionSearch { Threshold = 0.0001 };
            const double lowerBounds = -10.0;
            const double upperBounds = 10.0;
            Function1D f = (x) => Math.Pow(x - 4.0, 2.0);
            var solution = gss.Minimize(f, lowerBounds, upperBounds);
            Assert.AreEqual(4.0, solution, 0.001);
        }

And we are done!  A C# Golden Section Search algorithm.

Saturday, December 1, 2012

Gradient Descent



What's the right plan to achieve our goals?
  I've come across this story many times in my career... tell me if it sounds familiar to you:  I'm working on a program that has to make a decision about what to do next.  I can evaluate the outcome for whatever decision I make, but I can't predetermine which actions will lead to the best solution.  This is a situation that calls for numerical optimization techniques.  The use of numerical optimization is pervasive throughout computer science.  Artificial intelligence uses it to optimize neural networks.  Engineering tools use it to improve the design of components.  Computer games use it to have the bad guys make decisions that look intelligent.  So how does it work?

Black Hole Gravity Field
  We can start by making a conceptual leap, illustrated by an example.  Suppose you are in a space ship, which you are trying to pilot to the center of a black hole.  You can't see where the black hole is, but you can feel its gravity.  You know that its gravity will get stronger as you get close to the center.  In this example, we can think of the effect of gravity on space as our optimization function.  If we graph that against our position, we get a plot with a nice hole where our black hole sits.  If we had a ball, all we would have to do is set it down on that graph, and let it slide downhill until it reached the center.

Gradient Descent


  This simple strategy is actually the formulation for the gradient descent algorithm.  If, at each point, we know the slope of the function being optimized, we can just take small steps in the downhill direction.  Eventually, we will reach a point at the bottom of the hill, and that is where we stop.

Gradient Descent Formula
  With this understanding of what we are doing, the formulation is easy to follow: x_(n+1) = x_n - gamma * ∇F(x_n).  Here, F(x) is our function being evaluated, x is our position in the optimization space, gamma is the size of the step we are willing to take, and n is the step number.  Our position at step n+1 is our position at step n plus a small movement in the downhill (negative gradient) direction.  When an update fails to sufficiently improve our evaluation, we assume that a local minimum has been reached, and the search is terminated.
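As a one-dimensional sanity check (the numbers here are just an illustration, not from the post), minimizing F(x) = (x - 4)^2 with gamma = 0.1 starting from x = 0 gives F'(x) = 2(x - 4), so each update closes 20% of the remaining gap to the minimum:

    // Illustrative only: a few gradient descent steps on F(x) = (x - 4)^2 with gamma = 0.1.
    var gamma = 0.1;
    var x = 0.0;
    for (var n = 0; n < 3; n++)
    {
        x = x - gamma * 2.0 * (x - 4.0);   // x_(n+1) = x_n - gamma * F'(x_n)
        Console.WriteLine(x);              // 0.8, then 1.44, then 1.952, converging toward 4
    }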

Code Design

  From a usability perspective, we want a design flexible enough to be reused with any arbitrary function to be minimized.  We don't want the user to have to provide a second function describing the gradient of their optimization function.  And we want to allow the user to provide a good guess as a starting point for the optimization.

        public double Precision { get; set; }
        public double Eps { get; set; }
        public int MaxIterations { get; set; }
        public GradientDescent()
        {
            Precision = 0.001;
            Eps = 0.1;
            MaxIterations = 1000;
        }

  We want to expose a couple of parameters that affect the time it takes to reach a solution.  First, we expose the precision we are looking to achieve, typically a very small floating point number.  Next, we expose Eps, which is the step size we take each iteration.  This can be tuned to the scale of your inputs, but typically we still want fairly small steps.  Finally, we expose the maximum number of iterations.  This is a practical measure, as small errors in the gradient can lead to infinite loops near the minimum, with the optimizer bouncing back and forth between two positions.  If we get stuck, we rely on the iteration count to time out, and a satisfactory answer is returned.

public delegate double Function(List<double> x);
public delegate List<double> Derivative(List<double> x);

  To represent our optimization functions, the user must implement a function conforming to the first "Function" delegate pattern.  It is designed to map an n-dimensional input vector to a single output metric.  This makes it flexible enough to handle a wide variety of input functions.  We are going to use a second delegate to represent the local derivative, so that we can generate a functor wrapping the input function, generically providing the derivative for easy use.
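For example, the 2-D paraboloid minimized in the test at the end of the post fits this delegate directly:

    // The optimization function from the test below, written as a named delegate instance.
    Function f = data => Math.Pow(data[0] - 2, 2.0) + Math.Pow(data[1] - 4, 2.0);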

        public double Distance(List<double> a, List<double> b)
        {
            return Math.Sqrt(a.Zip(b, (i, j) => Math.Pow(i - j, 2.0)).Sum());
        }

  We can recognize an ending condition when the next step lands at approximately the same location as the current step.  Note that at a local minimum, the gradient of F(x) goes to zero, and thus so does our step length.  We can use the Euclidean distance to measure how much our position has changed, and stop when this metric is sufficiently small.

        public Derivative Gradient(Function f)
        {
            return x => Enumerable.Range(0, x.Count)
                            .Select(i => (f(x.Select((y, j) => j == i ? y + Precision : y).ToList()) -
                                          f(x.Select((y, j) => j == i ? y - Precision : y).ToList()))
                                         / (2*Precision))
                            .ToList();
        }

If you reach back to your calculus class, the definition of the derivative gives us this simple formulation: the partial derivative in dimension i is approximated by (f(x + h) - f(x - h)) / (2h), perturbing only the i-th coordinate by a small step h (Precision in the code above).  I chose to use central differencing because it is slightly more accurate than a one-sided difference, but still only requires two function evaluations per dimension.
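A rough check on that accuracy claim (illustrative values, not from the post), comparing forward and central differences for f(x) = x^3 at x = 1, where the true derivative is 3:

    // Illustrative comparison of forward vs. central differencing for f(x) = x^3 at x = 1.
    Func<double, double> g = x => x * x * x;
    var h = 0.001;
    var x0 = 1.0;

    var forward = (g(x0 + h) - g(x0)) / h;                // ~3.003     (error on the order of h)
    var central = (g(x0 + h) - g(x0 - h)) / (2.0 * h);    // ~3.000001  (error on the order of h^2)

    Console.WriteLine("{0} {1}", forward, central);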

        public List<double> Optimize(Function f, List<double> seed)
        {
            var fPrime = Gradient(f);
            var cnt = 0;
            var x_old = seed.Select(x => x + Eps).ToList();
            var x_new = seed;
            while (Distance(x_new, x_old) > Precision && cnt < MaxIterations)
            {
                x_old = x_new;
                x_new = fPrime(x_old).Zip(x_old, (i, j) => j - Eps*i).ToList();
                cnt++;
            }
            return x_new;
        }

Finally, we have our actual optimization function.  The "seed" variable represents the starting point of the search, and good guesses can dramatically impact the performance of the algorithm.  Also, since we generically defined the Function delegate to take an arbitrarily sized vector as input, we need to use the seed to infer a specific size for the inputs of our function.

        [Test]
        public void OptimizerWorks()
        {
            var gd = new GradientDescent();
            var seed = new List<double> { 0.0, 0.0 };
            var solution = gd.Optimize((data) => { return Math.Pow(data[0] - 2, 2.0) + Math.Pow(data[1] - 4, 2.0); },
                                       seed);

            Assert.AreEqual(2.0, solution[0], 0.001, "best x found");
            Assert.AreEqual(4.0, solution[1], 0.001, "best y found");
        }

Now, we have a test showing how to use the module, and voila, a simple C# Gradient Descent module!




Tuesday, November 27, 2012

A Morse Decoder



I couldn't resist tackling this puzzle at home, given that it is named for my distant relative, Samuel F.B. Morse.  My colleague brought it in as a potential onsite interview question for a candidate we were considering.  I can't help being addicted to puzzles anyway; I love the challenge.



The premise is this:  You have found a stream of dots and dashes which someone has told you is a Morse Code representation of some phrase.  The problem is that Morse Code uses a timing gap to represent spaces between letters, and the spaces were ignored here.  If you were given a list of possible words, and a hash table mapping of Morse style dots and dashes to the alpha character they represent, could you recover the original phrase?
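To see why the stream is ambiguous at all, here is a tiny slice of the standard Morse table (the full mapping isn't shown in this post): "e" is ".", "t" is "-", and "a" is ".-".  With the letter gaps removed, the same stream can have more than one letter-level reading:

    // A tiny slice of the standard Morse table, just to illustrate the ambiguity.
    var morse = new Dictionary<string, string>
    {
        { ".", "e" }, { "-", "t" }, { ".-", "a" }
    };

    // With the letter gaps removed, the stream ".-" has two valid letter readings:
    //   ".-"      -> "a"
    //   "." + "-" -> "e" followed by "t", i.e. "et"
    // Only the word list in the second decoding pass can decide between them.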



This problem really has two steps: recovering the possible encodings of Morse code into letters, and then breaking the letters up into possible words that could match them.  Both of these steps are actually the same procedure with a different mapping, so we may be able to achieve our goals with a single function reused multiple times.  Clearly the input stream can be thought of as a tree of possible mappings, and this lends itself well to a recursive approach.  Each character whose Morse encoding matches the front of the stream is a potential branch.  If we assume the phrase contains only complete words, we throw out any leaf where we couldn't find valid mappings all the way to the end of the string.  Likewise, we throw out any letter mapping that doesn't finish with a valid word.

My approach is to create a class that has properties for the Morse Code characters and the list of words making up the dictionary, so that one instance can be reused to recover phrases without having to constantly pass the same mappings.  I'm going to have a public "DecodePhrases" method that takes a string of Morse Code characters and returns an enumeration of possible valid solutions.  I have one private helper function which takes in some additional parameters about the node of the tree we are currently at, and I reuse this function for both decoding steps.  My code came out as follows:

    public class Decoder
    {
        public Dictionary<string, string> Words { get; set; }
        public Dictionary<string, string> MorseLetters { get; set; }

        public Decoder(List<string> words, Dictionary<string, string> morseLetters)
        {
            MorseLetters = morseLetters;
            Words = new Dictionary<string, string>();
            words.ForEach(w => Words[w] = " " + w);
        }

        public IEnumerable<string> DecodePhrases(String input)
        {
            return GenerateCandidates(input, "", MorseLetters)
                .SelectMany(letters => GenerateCandidates(letters, "", Words))
                .Select(phrase => new String(phrase.Skip(1).ToArray()));
        }

        private IEnumerable<string> GenerateCandidates(String input, String candidate, Dictionary<string, string> validCandidates)
        {
            if (input == "")
                return new List<string> { candidate };
            return Enumerable.Range(1, input.Count())
                .Select(i => new
                                 {
                                     inp = new String(input.Skip(i).ToArray()),
                                     can = new String(input.Take(i).ToArray())
                                 })
                .Where(c => validCandidates.ContainsKey(c.can))
                .SelectMany(c => GenerateCandidates(c.inp, candidate + validCandidates[c.can], validCandidates));
        }
    }

I followed it up with some NUnit tests to just make sure I wasn't full of crap...

        [Test]
        public void TestSolutionThisIsATest()
        {
            var words =
                "this is a test of the emergency broadcast system.  it is only a test."
                .Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).ToList();


            var testPhrase = String.Join("", "thisisatest".Select(c => MorseLetters.Where(p => p.Value == c.ToString()).First().Key));

            var decoder = new Decoder(words, MorseLetters);
            decoder.DecodePhrases(testPhrase);

            Assert.AreEqual("this is a test", decoder.DecodePhrases(testPhrase).First(), "properly decodes the message");
        }
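The test references a MorseLetters dictionary that isn't shown.  A minimal sketch of what it might look like, using the standard International Morse codes for just the letters the test phrase needs, would be:

        // Hypothetical test fixture field: Morse code -> letter, covering only the
        // characters in "thisisatest".  A real table would list all 26 letters.
        private readonly Dictionary<string, string> MorseLetters = new Dictionary<string, string>
        {
            { "-", "t" }, { "....", "h" }, { "..", "i" },
            { "...", "s" }, { ".-", "a" }, { ".", "e" }
        };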

And there you go! A simple function for decoding Morse Code when spaces and stops are omitted.

Tuesday, November 20, 2012

Interviewing Engineers


  This past week, I participated on the interview committee for three candidates at our company.  I've been on both sides of the interview process, at both big and small companies, and am convinced that the industry does a poor job of identifying stellar candidates.  Too many unqualified candidates are able to slip through our preliminary screens because people fill out their resumes with buzzwords they don't really understand.  Too many people slip through phone screens because the questions asked are too simple and don't adequately separate the good from the bad.  Too many good candidates are screened out by questions that test development in a way that doesn't match real programming.  Finally, too many bad candidates are simply given a pass because a binary evaluation made in a vacuum of information favors the candidate.  So what kind of steps could the industry take to ensure the best candidate gets the job?
Good candidates should be found by looking for the right things


  First, where do we find our candidates?  Companies pull from a wide list of sources, some of which are more reliable than others.  Recruiting services and job fairs represent the shotgun approach, delivering the full gamut of qualified and unqualified candidates.  It is far too expensive to adequately reduce that list to candidates who would be a good fit, so those sources should just be eliminated.  Niche communities of programmers represent candidates who engage with their craft at a greater depth, and they can serve as a more effective source of recruiting.  Job boards on developer sites like StackOverflow, or from communities like TopCoder, come with a built-in bread crumb trail to help validate a candidate before you ever look at their resume.  Good candidates stick out, and that should be your first screen.  Another primary source of candidates should be your dev evangelists.  Engineers who attend tech talks outside of working hours have demonstrated a level of commitment beyond the average developer.  And if they are going to a talk about your own product, they probably already like you, and are ahead on the learning curve.

Good candidates should be able to make good designs


  Second, what questions should we ask them during a screen?  I agree with Jeff Atwood on this point; the phone screen should weed out candidates with gaping holes in their knowledge.  There are just some broad areas of knowledge that all good software engineers know something about.  First, algorithms.  I'm not talking about remembering the details of an algorithm taught in a classroom, but on-the-fly problems like reversing the order of all words in a string.  Candidates should be able to analyze the efficiency of their algorithms, and a good candidate with a handle on different types of data structures will know the trade-offs that lead to more optimal solutions.  Next, candidates need to show they know how to design systems.  Ideally, a candidate has a grasp of common design patterns, so if you give them a problem that just screams for one, good candidates will lean on that past experience.  Candidates without explicit experience with design patterns can still perform well, since design patterns just give a language to ideas that all engineers should be familiar with anyway.  Third, scripting and regular expressions.  Parsing data turns out to be an important part of so many tasks, so a good candidate will have these tools in their belt.  Fourth, bits and bytes.  If you ask a candidate to find the number of bits set to 1 in an integer, they should be able to whip through a solution that picks them out.  If you have a candidate who does well with these questions, it will go a long way toward assuring that your candidate is qualified.



  Next comes the onsite interview.  I'd have three goals of things to learn about the candidate at this stage.  First, does the candidate have any specialized knowledge relevant to the position you are hiring for?  For this, you should get an interviewer who knows the technology stack in question and can dig into the candidate's answers.  Second, can the programmer work through a more complicated problem, something that is going to take a significant amount of time?  Third, are they a good culture fit with your company?  The biggest issue I have with current industry practices is that most developers are asked to solve problems at the whiteboard, which in my opinion bears little resemblance to how a developer actually writes code.  I would ask a candidate to bring in their laptop and solve the problem using any environment they want to use, and any online resources they need.  The problem asked can't be easy to solve, so I might pick a question similar to those used in a TopCoder SRM, or in a round of Google Code Jam, which are intended to be solved in about half an hour.

Good candidates should not be evaluated in a vacuum, but against other candidates

  Finally comes the hire/no-hire decision being asked of the interviewer.  In most instances, they aren't given any metrics against which to evaluate a candidate, other than their own gut feelings.  And software engineers have a track record of getting things wrong when they go with their gut.  That is a big part of why the Agile process was developed: as a way of giving engineers mechanisms for estimating the complexity of code by comparing against the complexity of past stories.  I think similar lessons could be applied to the interview process.  If interviewers were instead asked to rank candidates against others they had seen before, or to assign planning poker numbers to the candidates, we would gain a better understanding of how an interview experience went.  Due diligence means each interviewer should interview a wide range of candidates before a final hire/no-hire decision is made for a position.  It may be a time-consuming and expensive process, but I assure you hiring the wrong candidate is much more so.

  While there may not be a foolproof way to identify good teammates, there are many steps that can be taken to make the process more efficient.  Limiting your sources to places where more good candidates live, vetting candidates based on their online bread crumb trail, asking relevant questions that test their skills in a way that compares well to how real development is done, and evaluating candidates against other candidates are just a few ideas I have about how we can approach those goals.  Better developers means better teams means better products.  This is all something we should strive to provide.

Saturday, November 17, 2012

ACM Queue ICPC Challenge


Every year, ACM puts on a month-long programming contest challenging developers to pit their wits against one another in some unique type of battle royale.  Typically, the competition takes place on some kind of loosely physics-based game board, where each person controls a team of agents who must cooperate in achieving some sort of collective goal while thwarting the efforts of their opponents.  It tests many elements of a software engineer.  How well can you break down a complex problem?  How good are you at balancing tactics and strategy?  How do you coordinate your efforts?  It's an AI geek's dream.


The competition has traditionally been built off the previous ICPC collegiate challenge... so we can get a sneak peek at this year's challenge here.  This year, we get to control a fleet of bumper boats carrying a baton between way-points that form a course.  Complete the circuit more times than your opponent completes theirs, and you win.

Here is a video of some of the competition at the collegiate challenge.  There are so many good ideas for strategies that I can hardly wait to start developing my own AI.  I'll post updates about my engine as I get going.  Looking forward to another good year of competition!

Thursday, November 15, 2012

My Technical Blog

One of the core qualities of a professional software developer is to constantly strive for self-improvement.  That can mean many things, from being active in local user groups, to reading technical books, to learning a new programming language every year, to starting your own blog.  I've been blessed with a few mentors that I can look to as professional role models.  These are people who take their craft very seriously, and the industry benefits from their experience.  For them, it is always about radically stretching themselves to find new and better ways to build systems and develop software, always on the leading edge.  And for them, it also means giving back to the generous community that helped support them in becoming the professionals they are today.  Well, now it's my turn to take a crack at it.  I hope to use this blog both as a tool for myself to explore different aspects of being on the bleeding edge of technology, and as a guidepost to others who may still be early in their careers, to help them learn from my mistakes and encourage them to become guides to others.