Sunday, April 05, 2009

Questioning the form of the question

We live in interesting times on the web-o-data front. An interesting discussion is taking place on Jon Udell's blog.

Let's imagine a Web awash with machine readable numbers and with the ability to perform all sorts of wonderful calculations for you on said numbers behind the scenes. So wonderful in fact that you, as an information consumer, do not need to know or care if the answer to your question was pre-computed or computed on the fly.

Now. How would you best like to express your questions to this web-o-data/computations? Visually? Textually? What software abstractions exist now for framing such questions? Google has a text box. Geo-systems have maps. RDBs have query-by-example and good old SQL. What else? Lots of programming languages of course, but what else can we use in an end-user, non-programmer context?

Well, I think the spreadsheet is one of the most powerful abstractions for asking questions of data/computations ever discovered. However, there is a huge disconnect between the old spreadsheet concept of offline, localized data/localized formulae and what the Web enables.

We know how to decentralize the data. All the bits are in place. We have HTTP, URIs, XML, CSV, JSON, RSS/Atom. We know how to notify people/processes when data changes: e-mail, RSS/Atom, XMPP, WebHooks, yada, yada.

Two necessary bits are late to the party. The first is well on the way: cloud computing. It is what will allow us to take *a computation* and put it on the web as a first-class, always-on, scalable resource. Once we can compute easily in the cloud, we can derive facts from existing facts and - critically - derive notification events when the inter-relationships between fact objects change. (Sidebar: Change...think about how many business functions you know of that are triggered by the changing relationships between facts. Lots, right?)
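
To make "notification events when the inter-relationships between fact objects change" a bit more concrete, here is a minimal in-process sketch in Python. Everything in it is a stand-in I made up for illustration: the FactStore class, the fact names "stock" and "reorder_level", and the plain callback that plays the role a WebHook or feed entry would play on the real Web.

```python
from typing import Callable, Dict, List, Optional


class FactStore:
    """Hypothetical store of named facts with watches on relationships between them."""

    def __init__(self) -> None:
        self._facts: Dict[str, float] = {}
        self._watches: List[dict] = []

    def watch(self,
              relation: Callable[[Dict[str, float]], bool],
              on_change: Callable[[bool], None]) -> None:
        # 'last' remembers the previous truth value of the relationship.
        self._watches.append({"relation": relation, "on_change": on_change, "last": None})

    def publish(self, name: str, value: float) -> None:
        """Record a new value for a fact and re-check every watched relationship."""
        self._facts[name] = value
        for w in self._watches:
            try:
                current = w["relation"](self._facts)
            except KeyError:
                continue  # some fact the relationship needs is not known yet
            last: Optional[bool] = w["last"]
            w["last"] = current
            # Notify only when the relationship flips, not on every new value.
            if last is not None and current != last:
                w["on_change"](current)


events: List[bool] = []
store = FactStore()
store.watch(lambda f: f["stock"] < f["reorder_level"], events.append)

store.publish("stock", 50)
store.publish("reorder_level", 20)  # relationship first evaluated: False
store.publish("stock", 15)          # flips to True, so a notification fires
store.publish("stock", 10)          # still True, nothing fires
store.publish("stock", 30)          # flips back to False, fires again
print(events)
```

The point of the sketch is that nobody subscribes to "stock" as such. The subscription is to the relationship between two facts, which is the sort of thing that triggers a business function such as re-ordering.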

The second missing bit is the paradigm for asking the questions of the web-o-data. Wolfram Research are making exciting noises in the area of natural language questions. The Geo brigade are creating ever more fantastic stuff for geo-located data. RDF etc. continues to provide a grand unifying theory of it all but...where is the end-user facing paradigm for interacting with a humongous web of smart numeric data and its concomitant legion of web-hosted computations?

I think the spreadsheet metaphor is an interesting place to start. Facts are cells. Questions (known as formulas) are also cells. New facts and new formulas can be created based on existing facts/formulas ad infinitum.
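
As a rough illustration of that metaphor, here is a minimal Python sketch in which a cell holds either a fact (a number) or a formula (a function of other cells). The Sheet class and the cell names are hypothetical, chosen only for this example, and formulas are simply recomputed on every read rather than cached.

```python
from typing import Callable, Dict, Union

Formula = Callable[["Sheet"], float]


class Sheet:
    """Hypothetical sheet: every cell is either a fact or a formula over other cells."""

    def __init__(self) -> None:
        self._cells: Dict[str, Union[float, Formula]] = {}

    def set(self, name: str, value: Union[float, Formula]) -> None:
        self._cells[name] = value

    def get(self, name: str) -> float:
        cell = self._cells[name]
        # A formula cell is evaluated on demand; a fact cell is returned as-is.
        return cell(self) if callable(cell) else cell


sheet = Sheet()
sheet.set("price", 20.0)                                          # a fact
sheet.set("quantity", 3.0)                                        # a fact
sheet.set("total", lambda s: s.get("price") * s.get("quantity"))  # a formula over facts
sheet.set("total_with_tax", lambda s: s.get("total") * 1.25)      # a formula over a formula

print(sheet.get("total_with_tax"))

sheet.set("price", 40.0)            # change one fact...
print(sheet.get("total_with_tax"))  # ...and every dependent answer follows
```

Whoever asks for "total_with_tax" cannot tell whether the answer was stored or computed on the fly.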

Now put that concept natively on the Web.

Wow.