Sunday, April 05, 2009

Questioning the form of the question

We live in interesting times on the web-o-data front. An interesting discussion is taking place on Jon Udell's blog.

Let's imagine a Web awash with machine-readable numbers and with the ability to perform all sorts of wonderful calculations for you on said numbers behind the scenes. So wonderful, in fact, that you, as an information consumer, do not need to know or care whether the answer to your question was pre-computed or computed on the fly.

Now. How would you best like to express your questions to this web-o-data/computations? Visually? Textually? What software abstractions exist now for framing such questions? Google has a text box. Geo-systems have maps. RDBs have query-by-example and good old SQL. What else? Lots of programming languages of course, but what else can we use in an end-user, non-programmer context?

Well, I think the spreadsheet is one of the most powerful abstractions for asking questions of data/computations ever discovered. However, there is a huge disconnect between the old spreadsheet concept of offline, localized data/localized formulae and what the Web enables.

We know how to decentralize the data. All the bits are in place. We have HTTP, URIs, XML, CSV, JSON, RSS/Atom. We know how to notify people/processes when data changes: e-mail, RSS/Atom, XMPP, WebHooks, yada, yada.

Two necessary bits are late to the party. The first is well on the way: cloud computing. It is what will allow us to take *a computation* and put it on the web as a first class, always-on, scalable resource. Once we can compute easily in the cloud, we can derive facts from existing facts and - critically - derive notification events when the inter-relationships between fact objects change. (Sidebar: Change...think about how many business functions you know of that are triggered by the changing relationships between facts. Lots, right?)

The second missing bit is the paradigm for asking the questions of the web-o-data. Wolfram Research are making exciting noises in the area of natural language questions. The Geo brigade are creating ever more fantastic stuff for geo-located data. RDF etc. continues to provide a grand unifying theory of it all but...where is the end-user facing paradigm for interacting with a humongous web of smart numeric data and its concomitant legion of web-hosted computations?

I think the spreadsheet metaphor is an interesting place to start. Facts are cells. Questions (known as formulas) are also cells. New facts and new formulas can be created based on existing facts/formulas ad infinitum.
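To make the metaphor concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration: the `Sheet` class, the cell names and the watcher callback are invented for this example, not an existing library or anything that lives on the Web yet. It assumes a formula only refers to cells that already exist and that formulas never refer to each other in a cycle.

```python
from typing import Callable, Dict, List, Optional, Union

# A cell is either a fact (a plain number) or a formula (a function of the sheet).
Cell = Union[float, Callable[["Sheet"], float]]
Watcher = Callable[[str, Optional[float], float], None]


class Sheet:
    """Toy sheet: facts and formulas are both just named cells (hypothetical)."""

    def __init__(self) -> None:
        self._cells: Dict[str, Cell] = {}
        self._watchers: List[Watcher] = []

    def watch(self, callback: Watcher) -> None:
        """Register a callback invoked as callback(name, old_value, new_value)."""
        self._watchers.append(callback)

    def get(self, name: str) -> float:
        """Return a fact directly, or evaluate a formula on the fly."""
        cell = self._cells[name]
        return cell(self) if callable(cell) else cell

    def _snapshot(self) -> Dict[str, float]:
        """Evaluate every cell; assumes all references resolve and no cycles."""
        return {name: self.get(name) for name in self._cells}

    def set(self, name: str, cell: Cell) -> None:
        """Store a fact or formula, then notify watchers of every changed value."""
        before = self._snapshot()
        self._cells[name] = cell
        for key, new in self._snapshot().items():
            old = before.get(key)  # None means the cell did not exist before
            if old != new:
                for callback in self._watchers:
                    callback(key, old, new)


sheet = Sheet()
events = []
sheet.watch(lambda name, old, new: events.append((name, old, new)))

sheet.set("price", 10.0)      # a fact
sheet.set("quantity", 3.0)    # another fact
sheet.set("total", lambda s: s.get("price") * s.get("quantity"))  # a question
sheet.set("price", 12.0)      # changing a fact changes the derived fact too
```

The last call produces two notifications, one for `price` and one for `total`, even though nobody touched `total` directly. That is the "derive notification events" idea in miniature. A real version would have to track dependencies rather than re-evaluate every cell on each change.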

Now put that concept natively on the Web.



C. M. Sperberg-McQueen said...

When the user asks for some calculation based on quantities x and y, and the Web turns out to offer conflicting values for x, how does the system decide which to use?

Is it cynicism that makes me fear that might be subject to abuse?

Anonymous said...

Sean, I believe there are two huge holes in this plan of yours. The spreadsheet is a nice idea in theory. The practice of it all is that most people still save their information in the binary .doc format. Most Excel spreadsheets I've seen, used and shared have (unfortunately) been in the Microsoft .xls format, a closed binary kludge which cannot be interpreted particularly well to this day by the best the Open Source community has thrown at the problem.
Yes, we could all move to Open Office and their ODF (which has been ratified, and I believe an open source plugin exists that lets Excel read and write it). However, when the market share for Microsoft Office still makes it more or less the dominant office solution, and when the default file type in Excel is still a binary kludge, there is little hope that in the real world your spreadsheet example would be practically or realistically possible.