Sunday, April 17, 2005

Semantic Steganography

It is Sunday morning and phrases like "Semantic Steganography" sometimes creep into my head unheralded this time of the week. Especially before the kettle boils.

Sometimes I have moments of clarity pre-kettle-boil and I suspect this is one of them. The future of semantic markup is the tunnelling of semantics, unseen, inside harmless-looking, presentation-oriented XHTML.

I'm thinking microformats and I'm thinking things like XOXO and HMML and RDDL.

Here is the thing: with span and div elements, it is possible to encode any XML instance into a valid XHTML instance. It is a trivial matter to reverse the process to get the explicit element-oriented XML back out.
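
To make the round trip concrete, here is a minimal sketch in Python using the standard library's ElementTree. The conventions are assumptions made for illustration, not an established microformat: the original element name travels in the class attribute, elements with children become div and leaf elements become span, and each XML attribute is carried as a leading span whose class starts with "@". The names to_xhtml and from_xhtml are hypothetical, and namespaces are ignored.

```python
import xml.etree.ElementTree as ET


def to_xhtml(elem):
    """Encode an XML element as nested div/span elements (assumed convention)."""
    # Elements with child elements become div; leaf elements become span.
    host = ET.Element("div" if len(elem) else "span", {"class": elem.tag})
    host.text = elem.text
    # Each XML attribute becomes a leading span whose class starts with "@".
    for name, value in elem.attrib.items():
        attr = ET.SubElement(host, "span", {"class": "@" + name})
        attr.text = value
    for child in elem:
        encoded = to_xhtml(child)
        encoded.tail = child.tail
        host.append(encoded)
    return host


def from_xhtml(host):
    """Reverse to_xhtml: rebuild the explicit element-oriented XML."""
    elem = ET.Element(host.get("class"))
    elem.text = host.text
    for child in host:
        cls = child.get("class", "")
        if cls.startswith("@"):
            # An "@name" span carries an attribute of the enclosing element.
            elem.set(cls[1:], child.text or "")
        else:
            decoded = from_xhtml(child)
            decoded.tail = child.tail
            elem.append(decoded)
    return elem


source = '<order id="42"><item>kettle</item><qty>1</qty></order>'
encoded = to_xhtml(ET.fromstring(source))
decoded = from_xhtml(encoded)

print(ET.tostring(encoded, encoding="unicode"))
print(ET.tostring(decoded, encoding="unicode"))
```

Carrying attributes as visible spans is only one option; a real design might prefer something a browser does not render, which this sketch does not attempt.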

In my mind's eye I see RESTian web services where users dereference URIs as a matter of course and see human-oriented stuff. Yet, hiding underneath as attributes in the XHTML are the semantic parasite attributes. These encode the semantic structure that process-to-process integration software feeds on.

Gee! But what about validation? All our grammar-oriented validation technologies are element-type-oriented, not attribute-value-oriented! No problem. Do a quick XSLT transform prior to validation proper, to make the element structure explicit. Maybe, just maybe, this will be the use case that makes XML pipelining creep into common consciousness.
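
The shape of that two-stage pipeline can be sketched in Python. This is a stand-in under stated assumptions, not the real thing: the first stage does in plain Python what the XSLT transform would do, and the second stage checks a toy table of allowed child elements where a real setup would use a DTD or schema validator. The names make_explicit, validate, pipeline and GRAMMAR are all hypothetical, and every div or span is assumed to carry the original element name in its class attribute.

```python
import xml.etree.ElementTree as ET


def make_explicit(host):
    """Stage 1: turn <div class="x"> or <span class="x"> into <x>.

    Assumes every element carries its original name in the class attribute.
    """
    elem = ET.Element(host.get("class"))
    elem.text = host.text
    for child in host:
        decoded = make_explicit(child)
        decoded.tail = child.tail
        elem.append(decoded)
    return elem


# Toy grammar (an assumption for this sketch): element name -> allowed children.
GRAMMAR = {"order": {"item", "qty"}, "item": set(), "qty": set()}


def validate(elem):
    """Stage 2: element-type validation against the toy grammar."""
    if elem.tag not in GRAMMAR:
        raise ValueError(f"undeclared element: {elem.tag}")
    for child in elem:
        if child.tag not in GRAMMAR[elem.tag]:
            raise ValueError(f"{child.tag} not allowed inside {elem.tag}")
        validate(child)
    return elem


def pipeline(doc, *stages):
    """Feed the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


page = ET.fromstring(
    '<div class="order"><span class="item">kettle</span>'
    '<span class="qty">1</span></div>'
)
result = pipeline(page, make_explicit, validate)
print(ET.tostring(result, encoding="unicode"))
```

The point of the arrangement is that the validator never has to know about the class-attribute convention; it only ever sees explicit elements.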

About time too.

[P.S. for markup geeks] This post has hidden, pre-caffeine semantics. Do "View Source" to see them. At least it did when it left my feed editor. I will be interested to see what happens to the markup when it goes into the big bad world out there.

1 comment:

Orion Montoya said...

I came here through my own (post-)caffeine-related moment of clarity. I was thinking about how snippets of poetry come into my mind from time to time, and I reevaluate the semantic implications of certain syntactic combinations, then suddenly discover a previously-hidden meaning in a text that I've been familiar with for years. When a verb has multiple senses, and a noun has multiple senses, their syntactic combination is selected from a Cartesian product of the senses of both. I made the leap to think that this is a sort of semantic steganography in natural language.

I've also been an XML person for a lot of my career, so I can see the many uses of steganographic semantics in document markup. This is a very old post. Can we say that XML pipelining has made any inroads into common consciousness in the past dozen years? I occasionally see netspeak using tags, but other examples are scarce. In some ways I think that React.js's JSX syntax for defining modules on web pages has finally made semantic markup trendier than usual -- except it is for interactively-modeled documents, not for documents that decorate natural-language semantics.

I wonder if you have any recent thoughts about this!