Monday, January 15, 2018

What is a document? - part 4


In the late Eighties, I had access to an IBM PC XT machine that had Wordperfect 5.1[1] installed on it. Wordperfect was both intimidating and powerful. Intimidating because when it booted, it completely cleared the PC screen, and unless you knew the function keys (or had the sought-after function key overlay [2]) you were left to your own devices to figure out how to use it.

It was also very powerful for its day. It could wrap words automatically (a big deal!). It could redline/strikeout text, which made it very popular with lawyers working with contracts. It could also split its screen in two, giving you a normal view of the document on top and a so-called “reveal codes” view on the bottom. In the “reveal codes” area you could see the tags/markers used for formatting the text. Not only that, but you could choose to modify the text/formatting from either window.

This idea that a document could have two “faces”, so to speak, and that you could move between them made a lasting impression on me. Every other DOS-based word processor I came across seemed to me to be a variation on the themes I had first seen in Wordperfect, e.g. Wordstar, Multimate and later Microsoft Word for DOS. I was aware of the existence of IBM Displaywriter but did not have access to it. (The significance of IBM in all this document technology stuff only became apparent to me later.)

The next big "aha moment" for me came with the arrival of a plug-in board for IBM PCs called the Hercules Graphics Card[3]. Using this card in conjunction with Ventura Publisher[4] on DRI's GEM graphics environment [5] dramatically expanded the extent to which documents could be formatted - both on screen and on the resultant paper. Multiple fonts, multiple columns, complex tables, equations etc. Furthermore, the on-screen representation closely mirrored the final printed output, in what is now universally known as WYSIWYG.

Shortly after that, I found myself with access to an Apple Lisa [6] and then an Apple Fat Mac 512 with Aldus (later Adobe) Pagemaker [7] and an Apple Laserwriter[8]. My personal computing world split into two. Databases, spreadsheets etc. revolved around IBM PCs and PC compatibles such as Compaq, Apricot etc. Document processing and Desktop Publishing revolved around Apple Macs and Laser Printers.

I became intoxicated/obsessed with the notion that the formatting of documents could be pushed further and further by adding more and more powerful markup into the text. I got myself a copy of The Postscript Language Tutorial and Cookbook by Adobe[9] and started to write Postscript programs by hand.

I found that the original Apple Laserwriter had a 25-pin RS-232 port. I had access to an Altos multi-terminal machine [10]. It had some text-only applications on it. A spreadsheet from Microsoft called – wait for it – Multiplan (long before Excel) – running on a variant of – again, wait for it – Unix called Microsoft Xenix [11].

Well, I soldered up a serial cable that allowed me to connect the Altos terminal directly to the Apple Laserwriter. I found I could literally type in Postscript commands at the terminal window and get pages to print out. I could make the Apple Laserwriter do things that I could not make it do via Aldus Pagemaker, by talking directly to its Postscript engine.
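
For a flavour of what that felt like, here is a minimal sketch in modern terms – not my original code, and the serial device path is purely a stand-in – of hand-written Postscript being pushed down the wire to a printer:

    # A tiny hand-written Postscript program. "%!PS" marks it as
    # Postscript; the rest selects a font, moves to a position on the
    # page (in points, from the bottom-left corner) and draws a string.
    ps_program = b"""%!PS
    /Helvetica findfont 24 scalefont setfont
    72 720 moveto
    (Hello, Laserwriter) show
    showpage
    """

    # Push it straight at the printer's serial port (path is a stand-in).
    with open("/dev/ttyS0", "wb") as printer:
        printer.write(ps_program)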

Looking back on it now, this was as far down the rabbit hole of “documents as computer programs” as I ever went. Later I would discover TeX and find it in many ways easier to work with than programming Postscript directly. My career started to take me into computer graphics rather than document publishing. For a few years I was much more concerned with Bezier Curves and Bitblits[12], using a Texas Instruments TMS 34010[13] to generate real-time displays of financial futures time-series analysis (a field known as technical analysis in the world of financial trading [14]).

It would be some years before I came back to the world of documents and when I did, my route back caused me to revisit my “documents as programs” world view from the ground up.

It all started with a database program for the PC called dBase by Ashton-Tate[15]. Starting from the perspective of a database made all the difference to my world view. More on that, next time.



Tuesday, January 02, 2018

What is a document? - Part 3


Back in 1983, I interacted with computers in three main ways. First, I had access to a cantankerous digital logic board [1] which allowed me to play around with boolean logic via physical wires and switches.

Second, I had access to a Rockwell 6502 machine with 1k of RAM (that's 1 kilobyte) which had a callus-forming keyboard and a single line (not single monitor – single line) LED display called an Aim 65[2]. Third, at home I had a Sinclair ZX80 [3] which I could hook up to a black and white TV set and get a whopping 256 x 192 pixel display.

Back then, I had a fascination with the idea of printing stuff out from a computer. An early indication – that I completely blanked on at the time – that I was genetically predisposed to an interest in typesetting/publishing. The Aim 65 printed to a cash register roll, which was not terribly exciting (another early indicator that I blanked on at the time). The ZX80 did not have a printer at all...home printing was not a thing back then. In 1984, however, the Powers That Be in TCD gave us second year computer science newbies rationed access to a Vax 11/780, with glorious ADM3a[4] terminals.

In a small basement terminal room on Pearse St. in Dublin, there was a clutch of these terminals and we would eagerly stumble down the stairs at the appointed times to get at them. Beating time in the corner of that terminal room, most days, was a huge, noisy dot matrix printer[5], endlessly chewing boxes of green/white striped continuous computer paper. I would stare at it as it worked, finding it particularly fascinating that it could create bold text by the clever trick of backing up the print head and re-doing text with a fresh layer of ink.

We had access to a basic e-mail system on the Vax. One day, I received an e-mail from a classmate (sender lost in the mists of time) in which one of the words was drawn to the screen twice in quick succession as the text scrolled on the screen (these were 300 baud terminals - the text appeared character by character, line by line, from top to bottom). Fascinated by this, I printed out the e-mail, and found that the twice-drawn word ended up in bold on paper.

"What magic is this?", I thought. By looking under the hood of the text file, I found that the highlighted word – I believe it was the word “party” – came out in bold because five control characters (Control-H characters[6]) had been placed right after the word. When displayed on screen, the ADM3a terminal drew the word, then backed up 5 spaces because of the Control-H's, then drew the word again. When printed, the printer did the same, but because ink is cumulative, the word came out in bold. Ha!

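In modern terms, the trick looks something like this (a sketch reconstructing the byte stream, not the original e-mail):

    # The word, five Control-H (0x08) backspaces, then the word again.
    # A terminal just redraws the word in place; a printer lays down a
    # second layer of ink over it, so the word comes out bold on paper.
    word = "party"
    overstruck = word + "\b" * len(word) + word
    print(overstruck)
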
Looking back on it, this was the moment when it occurred to me that text files could be more than simply text. They could also include instructions, and these instructions could do all sorts of interesting things to a document when it was printed/displayed...As luck would have it, I also had access to a wide-carriage Epson FX80[7] dot matrix printer through a part-time programming job I had while in college.

Taking the Number 51 bus to college from Clondalkin in the mornings, I read the Epson FX-80 manual from cover to cover. Armed with a photocopy of the “escape codes”[8] page, I was soon a dab hand at getting text to print out in bold, condensed, strike-through, different font sizes...
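
Working from memory, so treat the exact codes as a sketch rather than gospel: ESC E and ESC F toggled emphasized (bold) printing, while the SI and DC2 control characters switched condensed mode on and off. Something like this:

    # Driving an Epson FX-80 with raw ESC/P escape codes (from memory:
    # ESC E / ESC F = emphasized on/off, SI / DC2 = condensed on/off).
    ESC = b"\x1b"
    line = (ESC + b"E" + b"A bold heading" + ESC + b"F" + b"\r\n" +
            b"\x0f" + b"some condensed body text" + b"\x12" + b"\r\n")

    # The printer device path is a stand-in for whatever port it hangs off.
    with open("/dev/lp0", "wb") as printer:
        printer.write(line)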

After a while, my Epson FX-80 explorations ran out of steam. I basically ran out of codes to play with. There was a finite set of them to choose from. Also, it became very apparent to me that littering my text files with these codes was an ugly and error-prone way to get nice print outs. I began to search for a better way. The “better way” for me had two related parts. By day, on the Vax 11/780, I found out about a program called Runoff[9]. And by night I found out about a word-processor called Wordstar[10].

Using Runoff, I did not have to embed, say, Epson FX80 codes into my text files. I could embed more abstract commands that the program would then translate into printer-specific commands when needed. I remember using “.br” to create a line break (ring any bells, HTML people?). “.bp” began a new page, “.ad” right-aligned text, and so on.
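
A Runoff source file, as best I can reconstruct it from memory, looked something like this:

    .bp
    This text starts at the top of a fresh page.
    .br
    This sentence starts on a new line because of the break above.
    .ad
    This text gets right-aligned by the formatter, with no
    printer-specific escape codes anywhere in the file.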

Using Wordstar on an Apple II machine running CP/M (I forgot to mention I had access to one of those also...I wrote my first ever spreadsheet in Visicalc on that machine, but that is another story), I could do something similar. I could add in control codes for formatting and it would translate for the current printer as required.

So far, everything I was using to mess around with documents was based on visible coding systems, i.e. the coding added to the documents was always visible on the screen, interspersed with the text. So far also, the codes added to the documents were all control codes, i.e. imperative instructions about how a document should be formatted.

The significance of this fact only became clear to me later, but before we get there, I need to say a few words about my early time with Wordperfect on an IBM PC XT; about my first encounter with a pixel-based user interface – it was called GEM [11] and ran on top of DOS on IBM PCs; and about an early desktop publishing system called Ventura Publisher from Ventura Software, which ran on GEM. I also need to say a little about the hernia-generating Apple Lisa[12] that I once had to carry up a spiral staircase.

Oh, and the mind-blowing moment I first used Aldus Pagemaker[13] on a Fat Mac 512k[14] to produce a two-column sales brochure on an Apple Laserwriter[15] and discovered the joys of Postscript.

Next : What is a document? - Part 4.

[5] Similar to this http://bit.ly/2CFhue9

Thursday, December 14, 2017

What is a document? - Part 2


Back in 1985, when I needed to create a “document” on a computer, I had only two choices. (Yes, I am indeed avoiding trying to define “document” just yet. We will come back to it when we have more groundwork laid for a useful definition.) The first choice involved typing into what is known generically as a “text editor”. Back in those days, US ASCII was the main encoding for text and it allowed for just basic letters, numbers and a few punctuation symbols. In those days, the so-called “text files” created by these “text editors” could be viewed on screens which typically had 80 columns and 25 rows. They could also be printed onto paper, using either “dot matrix” printers or higher-resolution, computerized typewriters such as the so-called “golf ball” typewriters/printers, which mimicked a human typist using a ribbon-based impact printer.

The second choice was to wedge the text into little boxes called "fields" to be stored in a "database". Yes, my conceptual model of text in computers in those early days was a very binary one. (Some nerd humour in the last sentence.)

On one hand, I could type stuff into small “boxes” on a screen which typically resulted in the creation of some form of “structured” data file e.g. a CODASYL database [1]. On the other hand, I could type stuff into an expandable digital sheet of paper without imposing any structure on the text, other than a collection of text characters, often chunked with what we used to call CRLF separators (Carriage Return, Line Feed).

(Aside: You can see the typewriter influence in the terminology here. Return the carriage (holding the print head) to the left of the page. Feed the page upwards by one line. So Carriage Return + Line Feed = CR/LF).

(Aside: I find the origins of some of this terminology are often news to younger developers, who wonder why moving to a new line is two characters instead of one on some machines. Surely “newline” is one thing? Well, it was two originally because one command moved the carriage back (the “CR”) and another command moved the paper up a line (the “LF”), hence the common pairing: CR/LF. When I explain this I double up by explaining “uppercase/lowercase”. The origins of the latter in particular are not well known to digital natives, in my experience.)
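
In byte terms, the two-character newline is simply this:

    # CR (0x0D) returns the carriage; LF (0x0A) feeds the line.
    line = b"What is a document?\r\n"
    print([hex(b) for b in line[-2:]])  # ['0xd', '0xa']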

From my first encounters with computers, this difference in how the machines handled storing data intrigued me. On one hand, there were “databases”. These were stately, structured, orderly digital objects. Mathematicians could say all sorts of useful things about them and create all sorts of useful algorithms to process them. The “databases” were designed for automation.

On the other hand, there was the rebellious, free-wheeling world of text files. Unstructured. Disorderly. A pain in the neck for automation. Difficult to reason about and create algorithms for, but fantastically useful precisely because they were unstructured and disorderly.

I loved text files back then. I still love them today. But as I began to dig deeper into computer science, I began to see that the binary world view – database versus text, structured versus unstructured – was simple, elegant and wrong. Documents can indeed be “structured”. Document processing can indeed be automated. It is possible to reason about documents and create algorithms for them, but it took me quite a while to get to grips with how this can be done.

My journey of discovery started with an ADM 3A+ terminal connected to a VAX 11/780 mini-computer (by day) [2] and an Apple IIe personal computer running CP/M – by night[3].

For the former, a program called RUNOFF. For the latter, a program called Wordstar and one of my favorite pieces of hardware of all time: an Epson FX80 dot matrix printer.



Thursday, December 07, 2017

What is a document? Part 1.

I am seeing a significant up-tick in interest in the concept of structured/semantic documents in the world of law at present. My guess is that this is a consequence of the current activity surrounding machine learning/AI in law.

It has occurred to me that some people with law/law-tech backgrounds are coming to some of the structured/semantic document automation concepts anew whereas people with backgrounds in, for example, electronic publishing (Docbook etc.), financial reporting (XBRL etc.), healthcare (HL7 etc.) have already “been around the block” so-to-speak, on the opportunities, challenges and pragmatic realities behind the simple sounding – and highly appealing – concept of a “structured” document.

In this series of posts, I am going to outline how I see structured documents, drawing from the 30 (phew!) or so years of experience I have accumulated in working with them. My hope is that what I have to say on the subject will be of interest to those newly arriving in the space. I suspect that at least some of the new arrivals are asking themselves “surely this has been tried before?” and looking to learn what they can from those who have "been there". Hopefully, I can save some people some time and help them avoid some of the potential pitfalls and “gotchas” as I have had plenty of experience in finding these.

As I start out on this series of blog posts, I notice with some concern that a chunk of this history – from the late Eighties to the late Nineties – is getting harder and harder to find online as the years go by. So many broken links to old conference websites, so many defunct publications...

This was the dawn of the electronic publishing era and coincided with a rapid transition from mainframe green-screens to dial-up CompuServe, to CD-ROMs, to the Internet and then to the Web, bringing us to where we are today. A period of creative destruction in the world of the written word without parallel in the history of civilization, actually. I cannot help feeling that we have a better record of what happened in the world from the time of Gutenberg's printing press to the glory years of paper-centric desktop publishing than we do for the period that followed it, when we increasingly transitioned away from fixed-format, physical representations of knowledge. But I digress...

For me, the story starts in June 1992 with a Byte magazine article by Jon Udell[1] with a title that promised a way to “turn mounds of documents into information that can boost your productivity and innovation”. It was exactly what I was looking for at the time, for a project I was working on: an electronic education reference guide to be distributed on 3.5 inch floppy disks to every school in Ireland.

Turning mounds of documents into information. Sound familiar? Sound like any recent pitch you have heard in the world of law? Well, it may surprise you to hear that the technology Jon Udell's article was about – SGML – was largely invented by a lawyer called Dr Charles F. Goldfarb[2]. SGML set in motion a cascade of technologies that have led to the modern web. HTML is the way it is, in large part, because of SGML. In other words, we have a lawyer to thank for a large aspect of how the Web works. I suspect that I have just surprised some folks by saying that:-)

Oh, and while I am on a roll making surprising statements, let me also state that the cloud – running as it does in large part on Linux servers – is, in part, the result of a typesetting R&D project in AT&T Bell Labs back in the Seventies.

So, in an interesting way, modern computing can trace its feature set back to a problem in the legal department. Namely, how best to create documents in computers so that the content of the documents can be processed automatically and re-used in different contexts?

More on that later, but best to start at the beginning, which for me was 1985. The year when a hirsute computer science undergraduate (me) took a class in compiler design from Dr. David Abrahamson[3] in Trinity College Dublin and was introduced to the wonderful world of machine readable documents.

Yes, 1985.

Next: Part 2.


Tuesday, November 07, 2017

Programming Language Frameworks

Inside every programming language framework is exactly one application that fits it like a glove.

Wednesday, October 04, 2017

It is science Jim, but not as we know it.

Roger Needham once said that computing is noteworthy in that the technology often precedes the science[1]. In most sciences, it is the other way around. Scientists invent new building materials, new treatments for disease and so on. Once the scientists have moved on, the technologists move in to productize and commercialize the science.

In computing, we often do things the other way around. The technological tail seems to wag the scientific dog, so to speak. What happens is that application-oriented technologists come up with something new. If it flies in the marketplace, then more theory-oriented scientists move in to figure out how to make it work better, faster or sometimes to try to discover why the new thing works in the first place.

The Web for example, did not come out of a laboratory full of white coats and clipboards. (Well actually, yes it did but they were particle physicists and were not working on software[2]). The Web was produced by technologists in the first instance. Web scientists came later.

Needham's comments in turn reminded me of an excellent essay by Paul Graham from a Python conference. In that essay, entitled 'The Hundred-Year Language'[3], Graham pointed out that the formal study of literature - a scientific activity in its analytical nature - rarely contributes anything to the creation of literature - which is a more technological activity.

Literature is an extreme example of the phenomenon of the technology preceding, in fact trumping, the science. I am not suggesting that software can be understood in literary terms. (Although one of my college lecturers was fond of saying that programming was language with some mathematics thrown in.) Software is somewhere in the middle, the science follows the technology but the science, when it comes, makes very useful contributions. Think for example of the useful technologies that have come out of scientific analysis of the Web. I'm thinking of things like clever proxy strategies, information retrieval algorithms and so on.

As I wander around the increasingly complex “stacks” of software, I cannot help but conclude that wherever software sits in the spectrum of science versus technology, there is "way" too much technology out there and not enough science.

The plethora of stacks and frameworks and standards is clearly not a situation that can be easily explained on scientific innovation grounds alone. It requires a different kind of science. Mathematicians like John Nash, economists like Carl Shapiro and Hal Varian, and political scientists like Robert Axelrod all know what is really going on here.

These scientists, and others like them, who study competition and cooperation as phenomena in their own right, would have no trouble explaining what is going on in today's software space. It has only a little to do with computing science per se and everything to do with strategy - commercial strategy. I am guessing that if they were to comment, Nash would talk about Equilibria[4], Shapiro and Varian would talk about Standards Wars[5], and Robert Axelrod would talk about the Prisoner's Dilemma and coalition formation[6].

All good science Jim, but not really computer science.

[1] http://news.bbc.co.uk/1/hi/technology/2814517.stm

[2] http://public.web.cern.ch/public/

[3] http://www.paulgraham.com/hundred.html

[4] http://www.gametheory.net/Dictionary/NashEquilibrium.html

[5] http://marriottschool.byu.edu/emp/Nile/mba580/handouts/art_of_war.pdf

[6] http://pscs.physics.lsa.umich.edu/Software/ComplexCoop.html

Wednesday, September 20, 2017

What is Law? - Part 17

Last time, we talked about how the concept of a truly self-contained contract, nicely packaged up and running on a blockchain, is not really feasible. The primary stumbling block is that it is impossible to spell out everything you might want to say in a contract, in words.

Over centuries of human affairs, societies have created dispute resolution mechanisms to handle this reality and provide a way of “plugging the gaps” in contracts and contract interpretation. Nothing changes if we change focus towards expressing the contract in computer code rather than in natural language. The same disambiguation difficulty exists.

Could parties to an agreement have a go at it anyhow and eschew the protections of a third party dispute resolution mechanism? Well, yes they could, but all parties are then forgoing the safety net that an impartial third party provides when agreement turns to disagreement. Do you want to take that risk? Even if you are of the opinion that the existing state-supplied dispute resolution machinery – for example the commercial/chancery courts systems in common law jurisdictions - can be improved upon, perhaps with an online dispute resolution mechanism, you cannot remove the need for a neutral third party dispute resolution forum, in my opinion. The residual risks of doing so for the contracting parties are just too high. Especially when one party to a contract is significantly bigger than the other.

Another reason is that there are a certain number of things that must collectively exist for a contract to exist in the first place. Only some of these items can usefully be thought of as instructions suitable for computer-based execution. Simply put, the legally binding contract dispute resolution machinery of a state is only available to parties that actually have a contract to be in dispute over.

There are criteria that must be met, known as essentialia negotii (https://en.wikipedia.org/wiki/Essentialia_negotii). Simply put, the courts are going to look for intention to contract, evidence of an offer, evidence of acceptance of that offer, a value exchange and terms. These are the items which, collectively, societies have decided are necessary for a contract to even exist. Without these, you have some form of promise. Not a contract. Promises are not enforceable.

Now only some of these "must have" items for a contract are operational in nature. In other words, only some of these are candidates to be executed on computers. The rest are good old fashioned documents, spreadsheets, images and so on.

These items are inextricably linked to whatever subset of the contract can actually be converted into computer code. As the contract plays out over time, these materials are the overarching context that controls each transaction/event that happens under the terms of the contract.

The tricky bit is to be able to tie together this corpus of materials from within the blockchain records of transactions/events, so that each transaction/event can be tied back to the controlling documents as they were at the moment that the transaction/event happened. (Disclosure: this is the area where my company, Propylon, has a product offering.)

This may ring a bell because referencing a corpus of legal materials as they were at a particular point in time, is a concept I have returned to again and again in this series. It is a fundamental concept in legisprudence in my opinion and is also fundamental in the law of contracts.

So, being able to link from the transactions/events back to the controlling documents is necessary because the executable code can never be a self-contained contract in itself. In addition, it is not unusual for the text of a contract to change over time and this again speaks to the need to identify what everything looked like at the time a disputed contract event occurred. Changes to contract schedules/appendices are a common example. Changes to master templates, such as ISDA Master Agreements, that happen through time are another common example.

A third reason why a fully self-contained contract is problematic is that ambiguity can be both strategic and pragmatic in contracts. Contract lawyers are highly skilled in knowing when a potential ambiguity in a contract is in their client's favor – either in the sense of creating a potential advantage, or, perhaps most commonly, in allowing the deal to be done in a reasonable amount of time. As we have seen, it would be possible to spend an eternity spelling out what a phrase like “reasonable time period” or indeed, a noun like “chicken” actually means. Contract law has, over the centuries, built up a large corpus of materials that help decide what “reasonable” means and what “chicken” means in a myriad of contracting situations. At the end of the day, both parties want to contract, so both parties have an interest in getting on with it. Lawyers facilitate this “getting on with it” by being selective in which potential ambiguities they spend time removing from a draft contract and which ones they let slide.

I think of contracts like layers of an onion. At the center, we have zero or more computable contract clauses, i.e. clauses that are candidates for execution on a computer. Surrounding that, we have the rest of the contract: documents, spreadsheets etc. Surrounding that we have global context, containing things like “the current price of a barrel of oil” or the “Dollar/Yen exchange rate”. Surrounding that we have “past dealings”, which relates to how the contracting parties have dealt in the past. Surrounding that again, we have hundreds of years of contract law/precedents etc. to help disambiguate the language of the contract.

As you can see, this ever-expanding context used to resolve disputes in contracts is tantamount to taking a snapshot of the world of human affairs at time T – the time of the disputed event. This is not possible unless the world is in fact a simulation inside a universe-sized computer, but that is a topic for another time:-)

One final thing. I have been talking about the courts as an independent third party dispute resolution mechanism. There is more to it than that, in that courts often act as enforcers of public policy. For example, a contract that tries to permanently stop party A from competing with party B in the future, is likely to be seen as against the public interest and therefore invalid/unconscionable. See https://www.law.cornell.edu/ucc/2/2-302 for an example of this sort of "public good" concept.

In conclusion, IT professionals approaching the world of contracts are entering a world where semantic ambiguity will resist any and all attempts at complete removal through computer coding. In the words of Benjamin Cardozo:

"the law has outgrown its primitive stage of formalism when the precise word was the sovereign talisman...it takes a broader view today." https://en.wikipedia.org/wiki/Wood_v._Lucy,_Lady_Duff-Gordon

IT people may bristle a little at the characterization of word formalism as “primitive” but the onus is on the current wave of contract technology disruptors who claim to be reinventing contracts to show how and why the current ambiguity-laden system – with its enormous and ponderous dispute resolution dimension – can be fully replaced by “smart” contracts.

My view is that it cannot be fully replaced. Enhanced and improved, yes absolutely. Insofar as discrete contract clauses can be made executable, I see great potential value in making these clauses "smart". But this is an evolution of the current approach to contracts, not a radical replacement of it.


I think I will end this series at this point. I never thought, back in March when I started this series, that it would take me so many posts to outline my thoughts in this area. I will nod in the general direction of James Joyce by ending with an internal reference back to the beginning of the series, thus creating a hermeneutic circle structure that feels appropriate for a topic as complex and fascinating as the exegesis of law.