Big Data Finance 2014 (Part One of Three)

When Jim Angel of Georgetown University praised my definition of “Big Data” on a subsequent panel, I was quietly relieved. I was the keynote, and therefore the first, speaker at Big Data Finance 2014, held at New York University. Almost all the other speakers were renowned sages of mathematical finance, including Steve Shreve of Carnegie Mellon and Lawrence Glosten of Columbia. I felt privileged to be in such company, yet wondered how the audience would react to my definition of Big Data, which I was about to present on my next slide. Professor Glosten was sitting immediately opposite the podium, smiling.

In the broader retail world, the concept of Big Data is becoming better defined. For example, the folks whom Harvard Business Review described as holding “the sexiest jobs of the 21st century” capture our shopping history, web searches and online video habits and combine them with millions of other people’s information. The result is the Big Data that they analyze to maximize their employers’ opportunities. Pinning down what qualifies as Big Data in finance is a little trickier.

My definition was simply “More data than we are used to”.

The purposes vary. Clearly, you are hoping to solve problems or seize opportunities, but what makes it Big Data is that it is more than you are accustomed to handling.

I showed the slide but could not gauge the reaction to my simple view.  An example and a picture seemed to put it in context.

A well-known Big Data problem confronted the Allies in World War II. Working at Bletchley Park, England, they were striving urgently to break the Enigma and Lorenz ciphers. These ciphers were used by the Axis Powers, particularly Nazi Germany, to encrypt their communications. The stakes were high: transatlantic convoys were being sunk by U-boats, lives were being lost and Britain badly needed basic supplies. The Nazis presumed that, even though the Allies could intercept their communications, the volume of data would be too vast for them to analyze and so break the codes. The Allies certainly had far more data than they were used to handling. However, the Nazis’ presumption was, as it turned out, very wrong. By using mechanical computers and later the electronic Colossus computer, the Allies did break the codes, shortening the war by up to two years by some accounts.

Colossus Computer Image via Wikipedia

Gartner has a more formal definition of Big Data than mine. It is useful because it includes three metrics: High Volume, High Velocity and High Complexity. By Gartner’s definition, foreign exchange data quickly becomes Big Data. Most particularly, it is complex: the market is credit screened, global in nature and subject to little regulation, so venues are free to differentiate themselves. No two venues have precisely the same structure. We also all know that the velocity of FX information has been increasing for many years, accentuated by shrinking deal sizes, as in other asset classes, and that our volume has been growing exponentially. This is particularly so because secondary venues, which send data tick by tick, have been growing, while the two leading venues, which throttle and time-slice their data, have been losing share.

If you concur that Big Data means “more data than we are used to” and that high volume, high velocity and high complexity are the metrics, then the era of Big Data in FX has arrived.  The next question is – what will we do with it?