The Imitation Game: Solving Big Data Problems in the 1940s

This past weekend, I watched a movie called The Imitation Game. If you haven’t seen the trailer, you can watch it below.

There were millions of combinations to check in order to decrypt the messages, which made the task almost impossible.

In the movie, Alan Turing, a math genius, creates a machine that can check each of the permutations until it finds the correct one for that day, allowing the British to decrypt the Nazi messages. This intelligence helped position Britain and its allies to win the war.

Yep, you guessed it – the machine, which was actually called a Bombe, was an electromechanical forerunner of the modern computer.

Why am I writing about this? Great question.

Essentially, the group at Bletchley Park was dealing with a big data problem, even though their idea of “big data” was a fraction of what we process today. There was a continuous stream of messages to decode, and each one had to be decoded using the encryption key found by Turing’s machine. Hence, lots of data.

When Turing first started his machine, it ran and ran, checking each of the permutations faster than any human or team of humans could. Unfortunately, there were too many permutations, so the machine never finished in time. The code changed every 24 hours, so each day’s unfinished search was wasted and the machine had to start over.

While their biggest issue was dealing with large amounts of structured data in a 24-hour time period, today we’ve figured out how to handle that load efficiently. Our biggest issue currently is processing unstructured data, like social data, in an hour or less. The goal today is real-time analysis and response, right? Here’s an example. Let’s say that your Facebook page has over 2 million likes, and over 43,000 people are talking about you. Great! But what are they saying? What does it mean? How can you use that information to optimize your messaging?

This is why we say that big data isn’t about the data at all; it’s about how the data is analyzed and the results that it can provide. Without a way to organize and interpret the data, it will never be useful – you might as well not have any data at all!

But back to the movie… {ATTENTION: SPOILER ALERT}

There is a scene where a casual conversation turns on a light bulb in Turing’s head. He races back to his machine and explains that if there are consistencies in the messages, they can use that known piece of data to reduce the number of permutations that need to be tried in order to crack the code. He knew that each message signed off with “Heil Hitler”, so they used that predicted text to turn their big data problem into a small data problem. By eliminating combinations that could not produce the expected text, they could find the key that decoded the message in minutes, rather than days.
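To make the idea concrete, here is a minimal sketch in Python. It does not model the Enigma or the Bombe; it assumes a much simpler toy cipher (a Vigenere-style letter shift with a three-letter key), and the function names, the sample message, and the key are all made up for illustration. What it shows is the principle from the scene: once you know the message ends with a predictable phrase, a program can throw out every key that fails to produce that phrase, with no human reading required.

```python
from itertools import product
from string import ascii_uppercase as ALPHABET

CRIB = "HEILHITLER"  # the predictable sign-off described in the movie


def shift_text(text, key, sign):
    """Toy Vigenere-style cipher (an assumption, NOT the Enigma):
    shift each letter by the position of the matching key letter."""
    out = []
    for i, ch in enumerate(text):
        k = ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(ALPHABET.index(ch) + sign * k) % 26])
    return "".join(out)


def encrypt(plaintext, key):
    return shift_text(plaintext, key, +1)


def decrypt(ciphertext, key):
    return shift_text(ciphertext, key, -1)


def keys_matching_crib(ciphertext, crib=CRIB, key_length=3):
    """Try every possible key, keeping only those whose decryption
    ends with the known text. Everything else is eliminated."""
    survivors = []
    for letters in product(ALPHABET, repeat=key_length):
        key = "".join(letters)
        if decrypt(ciphertext, key).endswith(crib):
            survivors.append(key)
    return survivors


if __name__ == "__main__":
    secret = encrypt("WEATHERCLEARTODAY" + CRIB, "KEY")
    total_keys = 26 ** 3
    survivors = keys_matching_crib(secret)
    print(f"{total_keys} possible keys, {len(survivors)} left after the crib check")
    print(decrypt(secret, survivors[0]))
```

In this toy version the known sign-off narrows 17,576 possible keys down to a single candidate. The real machine faced a far larger search and used the known text in a more sophisticated way, but the payoff is the same: a known piece of data shrinks the problem.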

Getting back to the Facebook issue… If you had a tool or set of tools that excelled at collecting chatter and sorting it, then it would be a lot easier to respond. If you wanted only the Facebook comments that had negative sentiment, you wouldn’t want to look through every comment separately, just as Turing’s team didn’t want to try every combination. Filtering the unstructured social data by a frown face or specific negative words makes your data set much smaller and, therefore, easier to act on. Essentially, data tools like this turn big, time-consuming tasks into faster, more automated tasks, while still getting you information you can use.
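Here is a rough sketch of that kind of filter in Python. It is a crude keyword match, not real sentiment analysis, and the word list, emoticons, and sample comments are invented for illustration. A production tool would pull comments from the platform and use a far better model, but the shape of the idea is the same: let the machine discard what you don’t need so people only read what matters.

```python
import re

# Hypothetical markers chosen for this example; a real list would be much longer.
NEGATIVE_WORDS = {"terrible", "awful", "hate", "broken", "worst"}
NEGATIVE_EMOTICONS = (":(", ":-(")


def is_negative(comment):
    """Flag a comment if it contains a negative word or a frown face."""
    text = comment.lower()
    # Match whole words so "whatever" is not mistaken for "hate".
    words = set(re.findall(r"[a-z']+", text))
    has_negative_word = bool(words & NEGATIVE_WORDS)
    has_frown = any(emoticon in text for emoticon in NEGATIVE_EMOTICONS)
    return has_negative_word or has_frown


def negative_comments(comments):
    """Return only the flagged comments, in their original order."""
    return [comment for comment in comments if is_negative(comment)]


if __name__ == "__main__":
    sample = [
        "Love the new update!",
        "Worst customer service ever :(",
        "Whatever you changed, it works great",
        "My order arrived broken.",
    ]
    for comment in negative_comments(sample):
        print(comment)
```

A filter this simple will miss sarcasm and flag phrases like “not terrible”, so treat it as a first pass that shrinks the pile rather than a final answer.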

As you can see, there is a contrast between the big data of the 1940s and that of the present day. Today’s “big” data will become tomorrow’s average-sized data as we create the technology that handles the volume and produces the insights we need from it. No matter how big or small the data set is, you only benefit from having the data if you can decode the trends and insights. Only then can you take action in a measured and meaningful way.

Jan 13, 2015