Tuesday, May 14, 2013

From Big Data to Long Data

Stephen Few on Big Data is worth a read. Remember the SOA years? Service Oriented Architecture? My biggest problem with SOA was - and is - that there is no sane, concise consensus on what an SOA is. No yardstick that could be used to determine whether or not something claiming to be an SOA really had some agreed-upon set of attributes.

Now whatever else you might think of "structured programming" or "object oriented design" or "flow based programming", at least they have identifiable technical characteristics that are generally agreed upon. I have seen all of the following claimed as "SOA"s: Relational Databases, J2EE, SOAP/WSDL, synchronous and asynchronous method invocations, CORBA, DCOM, MQSeries...

I tried - and failed - back in the day, to promote the idea that asynchronous structured message passing is the key defining characteristic of an SOA. (I believe that synchronous invocation of functions/methods/services is the root of all evil at Internet scale, but that is another story for another day.)

Today, there is a real risk that Big Data will be as content-free as SOA turned out to be. That would be a shame. At the risk of repeating my SOA mistake by putting forth what I believe to be the defining characteristic of big data, I hereby assert that, IMO, the modelling of *time* is what makes Big Data different from other data.

Gone are the days of "the backup". Gone are the days of Relational Models that just record "now". We can and should move to a model of computing in which history (last second, last hour, last year...) is a first class member of our models so that we can query and mine it for insights.
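
To make "history as a first class member" a bit more concrete, here is a minimal sketch in Python. It is my illustration only, not a reference to any particular product: the `HistoryStore` class and its `put`, `as_of` and `history` methods are hypothetical names, and it assumes writes arrive in time order. The point is simply that nothing is ever overwritten, so "what was the value last week?" is as cheap a question as "what is the value now?".

```python
from bisect import bisect_right
from datetime import datetime


class HistoryStore:
    """Hypothetical append-only store: every write is kept, none overwritten."""

    def __init__(self):
        # key -> (timestamps, values), two parallel lists in ascending time order
        self._history = {}

    def put(self, key, value, at):
        """Record that `key` took `value` at time `at`."""
        times, values = self._history.setdefault(key, ([], []))
        if times and at < times[-1]:
            # Simplifying assumption of this sketch: no out-of-order writes.
            raise ValueError("writes must arrive in time order")
        times.append(at)
        values.append(value)

    def as_of(self, key, at):
        """Return the value that was current at time `at`, or None if unset."""
        times, values = self._history.get(key, ([], []))
        i = bisect_right(times, at)  # number of writes made at or before `at`
        return values[i - 1] if i else None

    def history(self, key):
        """Return the full (timestamp, value) trail for `key`."""
        times, values = self._history.get(key, ([], []))
        return list(zip(times, values))


store = HistoryStore()
store.put("price", 10, datetime(2013, 5, 1))
store.put("price", 12, datetime(2013, 5, 10))

print(store.as_of("price", datetime(2013, 5, 5)))   # value as of 5 May
print(store.as_of("price", datetime(2013, 5, 14)))  # value "now"
```

A relational table that just records "now" would only ever hold the second write; here the first one is still there to be queried and mined.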

Samuel Arbesman is on to something. Read this, then go read his book The Half-Life of Facts.