Friday, May 28, 2004

XML tag share analysis and power law distributions

Here are four graphs, one for each of the data sets below (click on an image for a bigger version).

XML Repository of Irish Legislation (custom schema)
Linux HowTo's (DocBook)
Postgres Manuals (DocBook)
Shakespeare's Plays (custom schema)


They are all basically power law distributions. Take a bow, Messrs Zipf and Pareto. Given that
the data sets underneath are wildly different in shape, size and
subject matter, the similarity in the graphs is striking.

The graphs are produced by charting element types against frequency of occurrence.

I have been generating such graphs from SGML/XML datasets for years
and they always take the same general shape. I call it tag share analysis.

Takeaway
    Always do a tag-share analysis before writing an XML
    up/down/cross-translation in XSLT or DOM/SAX or whatever. A remarkably small number of element types make up the bulk of the markup - regardless of the size of the schema.
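
For anyone who wants to try a tag share analysis, here is a minimal sketch in Python. It is not the script used to produce the graphs above; the function name `tag_share` and the tiny sample document are made up for illustration. It counts how often each element type occurs and reports each type's share of all elements, most frequent first.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def tag_share(source):
    """Return (tag, count, share) rows for an XML document, most frequent first.

    `source` is a filename or a binary file-like object. Namespaced tags
    show up in ElementTree's "{uri}local" form.
    """
    counts = Counter()
    # Visit every element as its end tag is parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop content we no longer need
    total = sum(counts.values())
    return [(tag, n, n / total) for tag, n in counts.most_common()]


# Hypothetical sample document, for illustration only.
sample = io.BytesIO(
    b"<play><act><line/><line/><line/></act><act><line/></act></play>"
)
rows = tag_share(sample)
for tag, n, share in rows:
    print(f"{tag:10} {n:5} {share:6.1%}")
```

Plotting the counts in that order, most frequent element type first, gives the kind of chart described above.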

Thursday, May 27, 2004

Ding ding ding - come and get it!

Python 2.3.4 is now available.

Fireworks

If <insert programming language name here> was a firework then...

Here are contributions for Python, Perl and C++: http://dev.r.tucows.com/blog/_archives/2004/5/26/76434.html

[via Daily Python-URL].

Tuesday, May 25, 2004

Martin Fowler on databases in Service Oriented Architecture


    "The recent rise of Service Oriented Architecture seems to mean very different things to different people, but one plausible thread is a rise of autonomous applications with their own Application Database that communicate through service interfaces - effectively replacing shared database integration with rpc or messaging based integration. I'm very sympathetic to this view, particularly favoring integration through messaging - which is why I encouraged the development of EIP[1] In this view of the world the integration database is no longer the default assumption."

    [1] EIP: "Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions"
    by Gregor Hohpe and Bobby Woolf.

Amen.

Hierarchies and aesthetics


    "In all branches of cognitive endeaver, our highest praise is reserved for works that build the deepest hierarchies."
    Robert Jourdain.


OK. That explains the number of musicians and audiophiles I come across in the XML world, but does it justify DocBook? :-).

Event driven, temporally decoupled business processes

Forget about business process execution languages and belocolus "business process engines" that will protect you from all harm.

Your problems do not stem from the lack of a silver bullet syntax or the lack of a killer programming language feature. Your problems lie in two fundamental concepts that need to be central to how you think about business processes in a distributed world - (a) event driven execution and (b) temporal decoupling.

Without those, all the WF*, BP*, WS* and *ML's in the world cannot help you.

More on this line of thought in this week's ITWorld article: When modelling business processes, upside down is the right way up.
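
To make the two concepts a little more concrete, here is a toy sketch in Python. It is not taken from the article; the event names, the handlers and the in-memory `deque` standing in for a durable message queue are all assumptions made for illustration. Steps of a process fire in response to events (event driven execution), and the code that emits an event never waits for the code that handles it (temporal decoupling).

```python
from collections import deque

# Hypothetical names throughout; the deque stands in for a durable queue.
queue = deque()
handlers = {}  # event type -> list of handler functions
log = []


def on(event_type):
    """Register a function as a handler for one event type."""
    def register(fn):
        handlers.setdefault(event_type, []).append(fn)
        return fn
    return register


def emit(event_type, payload):
    # Returns immediately; nobody has to be listening right now.
    queue.append((event_type, payload))


@on("order_received")
def reserve_stock(payload):
    log.append(f"reserved stock for {payload}")
    emit("stock_reserved", payload)  # the next step is triggered by an event


@on("stock_reserved")
def send_invoice(payload):
    log.append(f"invoiced {payload}")


def drain():
    # Runs whenever the consumer is ready, possibly long after emit().
    while queue:
        event_type, payload = queue.popleft()
        for fn in handlers.get(event_type, []):
            fn(payload)


emit("order_received", "order-1")
pending_before_drain = list(log)  # still empty: no handler has run yet
drain()
print(log)
```

Nothing in `reserve_stock` knows that invoicing comes next; the process is simply whatever chain of events and handlers happens to be registered.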