Tuesday, November 27, 2012

A Morse Decoder

I couldn't resist tackling this puzzle at home, given that it is named for my distant relative, Samuel F.B. Morse.  My colleague brought it in as a potential onsite interview question for a candidate we were considering.  I can't help but be addicted by puzzles anyways, I love the challenge.

The premise is this:  You have found a stream of dots and dashes which someone has told you is a Morse Code representation of some phrase.  The problem is that Morse Code uses a timing gap to represent spaces between letters, and the spaces were ignored here.  If you were given a list of possible words, and a hash table mapping of Morse style dots and dashes to the alpha character they represent, could you recover the original phrase?

This problem really has two steps... recover possible encoding of Morse code to letters, and then breaking the letters up into possible words that could match the letters.  Both of these steps are actually the same procedure with a different mapping, so we may be able to achieve our goals with a single function reused multiple times.  Clearly the input stream could be thought of as a tree of possible mappings, and this lends itself well to a recursive approach.  Each character that has a Morse encoding that could match to the front of the stream is a potential branch.  If we assume the phrase is only complete words, we throw out any leaf where we couldn't completely find valid mappings all the way to the end of the string.  Likewise, we through out any letter mapping that doesn't finish with a valid word.

My approach is to create a class that has properties for the MorseCode characters, and the list of words making up the dictionary, so that one instance could be reused to recover phrases without having to constantly pass the same mappings.  I'm going to have a public "DecodePhrases" method that takes a string of Morse Code charactors, and returns an enumeration of possible valid solutions.  I have one private helper function which takes in some additional parameters about the node of the tree that we are currently at, and reuse this function for both encoding steps.  My code came out as follows:

    public class Decoder
        public Dictionary Words { get; set; }
        public Dictionary MorseLetters { get; set; }

        public Decoder(List words, Dictionary morseLetters)
            MorseLetters = morseLetters;
            Words = new Dictionary();
            words.ForEach(w => Words[w] = " " + w);

        public IEnumerable DecodePhrases(String input)
            return GenerateCandidates(input, "", MorseLetters)
                .SelectMany(letters => GenerateCandidates(letters, "", Words))
                .Select(phrase => new String(phrase.Skip(1).ToArray()));

        private IEnumerable GenerateCandidates(String input, String candidate, Dictionary validCandidates)
            if (input == "")
                return new List(){candidate};
            return Enumerable.Range(1, input.Count())
                .Select(i => new
                                     inp = new String(input.Skip(i).ToArray()),
                                     can = new String(input.Take(i).ToArray())
                .Where(c => validCandidates.ContainsKey(c.can))
                .SelectMany(c => GenerateCandidates(c.inp, candidate + validCandidates[c.can], validCandidates));

I followed it up with some NUnit tests to just make sure I wasn't full of crap...

        public void TestSolutionThisIsATest()
            var words =
                "this is a test of the emergency broadcast system.  it is only a test."
                .Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).ToList();

            var testPhrase = String.Join("", "thisisatest".Select(c => MorseLetters.Where(p => p.Value == c.ToString()).First().Key));

            var decoder = new Decoder(words, MorseLetters);

            Assert.AreEqual("this is a test", decoder.DecodePhrases(testPhrase).First(), "properly decodes the message");

And there you go! A simple function for decoding Morse Code when spaces and stops are omitted.

1 comment: