I am seeing a
significant up-tick in interest in the concept of structured/semantic
documents in the world of law at present. My guess is that this is
a consequence of the activity surrounding machine learning/AI in law
at the moment.
It has occurred to me that some people with
law/law-tech backgrounds are coming to some of the structured/semantic document
automation concepts anew whereas people with backgrounds in, for
example, electronic publishing (Docbook etc.), financial reporting
(XBRL etc.), healthcare (HL7 etc.) have already “been around the
block”, so to speak, on the opportunities, challenges and pragmatic
realities behind the simple-sounding – and highly appealing –
concept of a “structured” document.
In this series of
posts, I am going to outline how I see structured documents, drawing from the 30 (phew!) or so
years of experience I have accumulated in working with them. My hope is that what I have to say on the subject will be of interest to those newly arriving in
the space. I suspect that at least some of the new arrivals are asking themselves
“surely this has been tried before?” and looking to learn what
they can from those who have “been there”. Hopefully, I
can save some people some time and help them avoid some of the
potential pitfalls and “gotchas” as I have had plenty of experience in finding these.
As I start out on this series of blog posts,
I notice with some concern that a chunk of this history – from the late
Eighties to the late Nineties – is getting harder and harder to find
online as the years go by. So many broken links to old conference
websites, so many defunct publications....
This was the dawn of the
electronic publishing era and coincided with a rapid transition from
mainframe green-screens to dial-up CompuServe, to CD-ROMs, to the
Internet and then to the Web, bringing us to where we are today. A
period of creative destruction in the world of the written word
without parallel in the history of civilization, actually. I cannot help feeling that we have a better record of what happened in the world from the time of Gutenberg's printing press to the glory years of paper-centric desktop publishing than we do for the period that followed, when we increasingly transitioned away from fixed-format, physical representations of knowledge. But I digress....
For me, the story
starts in June 1992 with a Byte magazine article by Jon Udell[1] with
a title that promised a way to “turn mounds of documents into
information that can boost your productivity and innovation”. It was exactly what I was looking for in 1992 for a project I was working on: an electronic education reference guide to be distributed on 3.5 inch floppy disks to every school in Ireland.
Turning mounds of documents into information. Sound familiar? Sound
like any recent pitch you have heard in the world of law? Well, it
may surprise you to hear that the technology Jon Udell's article was
about – SGML – was largely invented by a lawyer called Dr Charles
F. Goldfarb[2]. SGML set in motion a cascade of technologies that
have led to the modern Web. HTML is the way it is, in large part,
because of SGML. In other words, we have a lawyer to thank for a
large aspect of how the Web works. I suspect that I have just
surprised some folks by saying that :-)
Oh, and while I am on a
roll making surprising statements, let me also state that the cloud –
running as it does in large part on Linux servers – is, in part,
the result of a typesetting R&D project at AT&T Bell Labs
back in the Seventies.
So, in an interesting
way, modern computing can trace its feature set back to a problem in the legal department. Namely, how best to create documents in computers so that
the content of the documents can be processed automatically and re-used in different
contexts?
More on that later, but best to start at the beginning, which for me was 1985: the
year when a hirsute computer science undergraduate (me) took a class in
compiler design from Dr. David Abrahamson[3] in Trinity College
Dublin and was introduced to the wonderful world of machine readable
documents.
Yes, 1985.
Next: Part 2.