Thursday, September 16, 2010

Pssst...there is no such thing as an authentic/original/official/master electronic legal text

I know of no aspect of legal informatics that is plagued with more terminology problems than the question of authentic/original/official/master versions of legal material such as bills, statutes and caselaw. In this attempt at addressing some of the confusion, I run the risk of adding more confusion, but here goes...

1 - The sending fallacy
What would it mean for the Library of Congress to send their Gutenberg Bible to me? Well, they would put it in a box and ship it to me. Afterwards, I would have that instance of the Gutenberg Bible and they would not have it. The total number of instances of the Gutenberg Bible in the world would remain the same. The instance count chez McGrath would increment and the instance count chez LOC would decrement.

If they were to electronically send it to me, there would be no "sending" going on at all. Instead, a large series of replications would happen - from storage medium to RAM to Network Buffers to Routers...culminating in the persistent storage of a brand new thing in the world, namely, my local replica of the bit-stream that the Library of Congress sent (replicated) to me. The instance count chez McGrath would increment and the instance count chez LOC would remain unchanged. I would have mine but they would still have theirs.
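To make the instance counting concrete, here is a minimal Python sketch. The file names and the helper `electronic_send` are invented for illustration, and a local file copy stands in for the whole chain of network replications.

```python
import os
import shutil
import tempfile


def electronic_send(src_path, dst_path):
    """'Send' a file the only way a computer can: by writing a second copy."""
    shutil.copyfile(src_path, dst_path)


def demo():
    """Return (instances before, instances after, replicas identical?)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names standing in for the two parties' storage.
        loc_copy = os.path.join(tmp, "loc_bible.bin")
        my_copy = os.path.join(tmp, "mcgrath_bible.bin")
        with open(loc_copy, "wb") as f:
            f.write(b"In principio creavit Deus caelum et terram.")

        before = len(os.listdir(tmp))
        electronic_send(loc_copy, my_copy)
        after = len(os.listdir(tmp))

        # The sender's copy is still there, and the two are bit-for-bit equal.
        with open(loc_copy, "rb") as a, open(my_copy, "rb") as b:
            identical = a.read() == b.read()
        return before, after, identical


print(demo())  # (1, 2, True)
```

The count goes from one to two: nothing left the sender, and a new thing came into existence at the receiver.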

Sadly, the word "send" is used when we really mean "replicate", and this is the source of untold confusion as it leads us to map physical-world concepts onto electronic-world concepts where there is an imperfect fit...Have you ever sent an e-mail? I mean really "sent" an e-mail? Nope.

2 - The signing fallacy
An example of that imperfect fit is the concept of "signing". What would it mean for the Library of Congress to sign their physical copy of the Gutenberg Bible? They could put ink on a page or maybe imprint a page with an official embossing seal or some such. The nature of physical media makes it relatively easy to make the signing tamper-evident and hard to counterfeit.

What would it mean for the Library of Congress to sign their electronic replica of the Gutenberg Bible with PKI and replicate it (see point 1 above) to me? Well, it's really very, very different from a physical signing.

It is just more bits. Every replica contains a perfect replica of the original "signature". There is no "original" to compare it to. The best you can do is check for "sameness" and check the origin of the replica, but doing these checks rapidly becomes a complex web of hashes and certificates and revocations and trusted third parties and...lots of stuff that is not required for physical-world signatures.
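The "sameness" half of that check can be sketched in a few lines of Python using the standard `hashlib` module. The sample text and the name `fingerprint` are invented for illustration. Note what the sketch does not do: a matching digest says nothing about where the bits came from, which is where the certificates and trusted third parties come in.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a bit-stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample text, not taken from any real statute.
original = b"Section 1. This Act takes effect on July 1."
replica = bytes(bytearray(original))                   # a fresh copy of the same bits
altered = original.replace(b"July 1", b"July 2")       # one character changed

print(fingerprint(original) == fingerprint(replica))   # True
print(fingerprint(original) == fingerprint(altered))   # False
```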

3 - The semantics fallacy
What does it mean for me to render a page of my replica of the Gutenberg Bible on my computer screen? Am I guaranteed to be seeing the "same" thing you see when you do something similar? Does it matter if the file is a TIFF or a Microsoft Word file? Does it matter what operating system I am using or what my current printer is or my screen resolution? Do any of these differences amount to anything when it comes to the true meaning of the page?

The unfortunate fact - as discussed earlier as part of the KLISS series - is that the semantics of information is sometimes a complex mix of the bits themselves and the rendering created from those bits by software.

Sometimes - for sure - the different renderings have no impact on meaning, but it is fiendishly difficult to find consensus on where the dividing line is. Moreover, the signing fallacy (see above) adds to the problem by insisting that a document that passes the signing checks is "the same" as the replica it was replicated from. No account is taken of the fact that a perfect replica at the bit-stream level may look completely different to me, depending on what software I use to render it and the operating context of the rendering operation.

Semantics in digital information is a complex function of the data bits, the algorithms used to process the bits, and the operating context in which the algorithms act on the bits. Consequently, the question "are these replicas 'the same'?" is not simple to answer...
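A small Python sketch of the same idea, using character encodings as a stand-in for "rendering software". The four bytes and the `render` helper are invented for illustration, and real rendering differences (fonts, page layout, Word versus TIFF) are of course far richer than this.

```python
# Identical bits, two different "renderers".
bits = bytes([0xA3, 0x31, 0x30, 0x30])


def render(data: bytes, encoding: str) -> str:
    """Turn a bit-stream into text under a given interpretation."""
    return data.decode(encoding)


as_latin1 = render(bits, "latin-1")   # '£100'
as_cp437 = render(bits, "cp437")      # same bytes, a different first character

print(repr(as_latin1))
print(repr(as_cp437))
print(as_latin1 == as_cp437)          # False
```

Both readers hold bit-identical replicas that would pass any hash comparison, yet only one of them sees a pound sign in front of the amount.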

4 - The either/or fallacy
...When someone asks me, as they sometimes do - and I quote - "How do I know that you sent me the original, authentic document?", I answer that it all depends on what you mean by the words "sent", "original", "authentic" and "document" :-)

Part of the problem is that fake/real and same/different are very binary terms. In the physical world, this is not a huge problem. What are the chances that the Gutenberg Bible in the Library of Congress is a fake? I would argue that it is non-zero but extremely small. The same goes for every dollar note, every passport, every driver's licence on the planet.

In the physical world, we can reduce the residual risk of fakes very effectively. In the electronic world, it is much, much harder. How do I know that the replica of the Gutenberg Bible on my computer is not a fake? When you consider points 1, 2 and 3 above, I think you will see that it is not an easy question to answer...

What to do?

...It all looks quite complicated! Is there a sane way through this? Well, there had better be because, at least in the legal world, we seem to be heading rapidly into a situation where electronic texts of various forms are considered authentic/original/official/masters etc.

I personally believe that there are effective, pragmatic and inexpensive approaches that will work well, but we need to get out from under the terrible weight of unsuitable and downright misleading terminology we have foisted upon ourselves by stretching real world analogies way past their breaking points.

If I had my way "hashing" and "signing" would be utterly distinct. The term "non-repudiation" would be banned from all discourse. I would love to see all the technology around PKI re-factored to completely separate out encryption concerns from counterfeit detection concerns. The two right now feature some of the same tools/techniques, but the amount of confusion this overlap causes is striking. I have lost count of the number of times I have encountered encryption as a proposed solution for counterfeit detection.
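To illustrate why encryption on its own is not counterfeit detection, here is a deliberately simplified Python sketch. Everything in it is an assumption made for illustration: the XOR "cipher" is a toy and must never be used for real secrecy, and the shared-key HMAC from the standard library stands in for a detection mechanism (it is not a PKI signature).

```python
import hashlib
import hmac


def toy_encrypt(data: bytes, keystream: bytes) -> bytes:
    """XOR data with a keystream. Toy only; the same call decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream))


def tag(data: bytes, key: bytes) -> bytes:
    """Keyed digest used purely for tamper detection; it hides nothing."""
    return hmac.new(key, data, hashlib.sha256).digest()


text = b"Fine: $100"
keystream = hashlib.sha256(b"hypothetical secret").digest()  # 32 bytes
mac_key = b"hypothetical detection key"

ciphertext = toy_encrypt(text, keystream)
original_tag = tag(text, mac_key)

# Someone who cannot read the ciphertext can still alter it:
# flipping the right bits at index 7 turns the '1' into a '9'.
forged = bytearray(ciphertext)
forged[7] ^= ord("1") ^ ord("9")
decrypted_forgery = toy_encrypt(bytes(forged), keystream)

print(decrypted_forgery)  # b'Fine: $900'
print(hmac.compare_digest(tag(decrypted_forgery, mac_key), original_tag))  # False
```

Decryption succeeds without complaint on the forged text, so secrecy alone detected nothing. The separate keyed digest catches the change even though it does nothing to hide the text.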

As time permits over the next while, I will be blogging more about this area and putting forward some proposed approaches for use in electronic legal publishing. I will also be talking about approaches that are applicable to machine-readable data such as XML as well as frozen renderings such as PDF. This is a concept that is very important in the context of the data.gov/law.gov movements.

I expect pushback because I will be suggesting that we need to re-think the role of PKI and digital signatures and get past the dubious assertion that this stuff is necessarily complicated and expensive.

I truly believe that neither of these is true, but it will take more time than I currently have to explain what I have in mind. Soon, hopefully...