Wednesday, September 08, 2010

The Semantic Web is not a data format

Surfing around today, I get the impression that some folk believe the Semantic Web is a data format question. It isn't, in my opinion. It is an inference algorithm question. Data is just fuel for the engine. If we get sufficient value-add from the inference algorithms (the engines), the data format questions will fall like so many skittles. Compared to the problem of creating useful inference engines, deciding on a data format is trivial.

Of course, to create an environment where clever inference algorithms can be incubated, you need a web of data, but that is the petri dish for this grand experiment, not the experiment itself.

When I characterize the effort as an "experiment" I mean that it is not yet clear (at least to me) whether the Semantic Web will usher in a new class of algorithms that provide significantly better inference value-add than the algorithmic approaches of the weak/strong AI community of the Eighties: forward chaining, backward chaining, fuzzy logic, Bayesian inference, blackboard algorithms, neural nets, probabilistic automata, etc.

If it does, then great! The Semantic Web will be a new thing in the world of computer science. If it doesn't, the absolute *worst* that can happen is that we end up with a great big Web of machine readable data because of all the data format debates :-)

Even if the algorithms end up staying much as they were in the Eighties, we will see more interesting outputs when they are applied today because of the richness and volume of data becoming available on the Web. However, that does not constitute a new leap forward in computer science. In my opinion, this is the sticking point for many who are dubious about the brouhaha surrounding the Semantic Web.

I've never met anybody who thinks a web of machine readable data is a bad idea. I have met people who think the web-o-data *is* the semantic web. I have also met people who think that the semantic web is all about the inference performed over the data.

Of course, there are many out there who characterize the Semantic Web differently, and one of the great sources of debate at the moment is that people find themselves passing each other at 30,000 feet because they do not share a conceptual model of what critical terms like "web of data", "semantics", RDF, SPARQL, deductive/inductive logic etc. mean.

Part of the problem, no doubt, is that many approaches to machine readable semantics involve the creation of declarative syntaxes for use in inference engines. These data formats are really "config files" for inference engines, as opposed to discrete facts (such as RDF triples) to be processed by inference engines. Ontologies are a classic example.
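To make the facts-versus-config distinction concrete, here is a minimal Python sketch of forward chaining over RDF-style triples. It is an illustration under stated assumptions, not any particular engine's implementation: the ex: names are hypothetical, triples are plain tuples rather than objects from a real RDF library, and the only rule applied is a simple subclass-inheritance rule in the spirit of RDFS. The ontology's subClassOf statements play the role of the engine's "config file"; the rdf:type triples are the discrete facts it processes.

```python
# Facts: discrete RDF-style (subject, predicate, object) triples.
# The "ex:" names are hypothetical, chosen only for illustration.
facts = {
    ("ex:Rex", "rdf:type", "ex:Dog"),
    ("ex:Tom", "rdf:type", "ex:Cat"),
}

# Ontology: declarative statements that configure what the engine infers.
ontology = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Cat", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
}


def forward_chain(facts, ontology):
    """Apply one subclass-inheritance rule repeatedly until no new triples appear.

    Rule (assumed, RDFS-like): if X rdf:type C and C rdfs:subClassOf D,
    then X rdf:type D.
    """
    derived = set(facts)
    # Only the subClassOf statements are used as rule "configuration".
    subclass = {(s, o) for (s, p, o) in ontology if p == "rdfs:subClassOf"}
    while True:
        new = set()
        for (s, p, o) in derived:
            if p != "rdf:type":
                continue
            for (sub, sup) in subclass:
                if o == sub and (s, "rdf:type", sup) not in derived:
                    new.add((s, "rdf:type", sup))
        if not new:
            # Fixpoint reached: a further pass would add nothing.
            return derived
        derived |= new


# Keep only the triples the engine added on top of the original facts.
inferred = forward_chain(facts, ontology) - facts
for triple in sorted(inferred):
    print(triple)
```

Running it prints four derived triples: Rex and Tom are each inferred to be an ex:Mammal and, on a second pass, an ex:Animal. Swapping in a different ontology changes what the engine concludes without touching the facts, which is the sense in which such declarations configure the engine rather than feed it.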

My personal opinion: if the Semantic Web proponents were to stand up and say "Hey, there was all this amazing computer science done in the Eighties, but there was never a rich enough set of machine readable facts for it to flourish... Let's give it another go!", I'd be shouting from the rooftops in support.

However, I tend not to hear that. Perhaps it's the circles I move in? Most of what I hear is "The Semantic Web is a brand new thing on this earth. Come join the party!"

The CompSci major in me has trouble with that characterization. It's not universal, but it does seem quite pervasive.

Yes, it is ironic that the stumbling block for the semantic web is establishing the semantics of "semantics" :-)

Yes, I derive too much pleasure from that. It goes with the territory.
