Friday, April 21, 2017

What is law? - Part 10

Previously: What is law? - Part 9

Earlier on in this series, we imagined an infinitely patient and efficient person who has somehow managed to acquire the entire corpus of law at time T and has read it all for us and can now "replay" it to us on demand. We mentioned previously that the corpus is not a closed world and that meaning cannot really be locked down inside the corpus itself. It is not a corpus of mathematical truths, dependent only on a handful of axioms. This is not a bug to be fixed. It is a feature to be preserved.

We know we need to add a layer of interpretation and we recognize from the outset that different people (or different software algorithms) could take this same corpus and interpret it differently. This is ok because, as we have seen, it is (a) necessary and (b) part of the way law actually works. Interpreters differ in the opinions they arrive at in reading the corpus. Opinions get weighed against each other and can be overruled by higher courts. Some courts can even overrule their own previous opinions. Strongly established opinions may then end up appearing directly in primary law or regulations, new primary legislation might be created to clarify meaning...and the whole opinion generation/adjudication/synthesis loop goes round and round forever... In law, all interpretation is contemporaneous, tentative and defeasible. There are some mathematical truths in there but not many.

It is tempting - but incorrect in my opinion - to imagine that the interpretation process works with the stream of words coming into our brains off of the pages, that then get assembled into sentences and paragraphs and sections and so on in a straightforward way.

The main reason it is not so easy may be surprising. Tables! The legal corpus is awash with complex table layouts. I included some examples in a previous post about the complexities of law[1]. The upshot of the ubiquitous use of tables is that reading law is not just about reading the words. It is about seeing the visual layout of the words and associating meaning with the layout. Tables are such a common tool in legal documents that we tend to forget just how powerful they are at encoding semantics. So powerful, that we have yet to figure out a good way of using machines to do the "reading" and extract the semantics that our brains can readily see in law.

Compared to, say, detecting the presence of headings or cross-references or definitions, correctly detecting the meaning implicit in the tables is a much bigger problem. Ironically, perhaps, much bigger than dealing with highly visual items such as maps in redistricting legislation[2] because the actual redistricting laws are generally expressed purely in words using, for example, eastings and northings[3] to encode the geography.

If I could wave a magic wand just once at the problem of digital representation of the legal corpus I would wave it at the tables. An explicit semantic representation of tables, combined with some controlled natural language forms[4] would be, I believe, as good a serialization format as we could reasonably hope for, for digital law. It would still have the Closed World of Knowledge problem of course. It would also still have the Unbounded Opinion Requirement but at least we would be in a position to remove most of the need for a visual cortex in this first layer of interpreting and reasoning about the legal corpus.
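To make the idea a little more concrete, here is a minimal sketch in Python of what an explicit semantic representation of a table might look like. Everything in it is an assumption made up for illustration: the `Table` class, its field names, the fee schedule and the sentence template are not taken from any real legal corpus or standard. The point is only that once the meaning of the rows, columns and cells is stated explicitly, each cell can be replayed as a plain sentence with no visual layout required.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Table:
    """A hypothetical two-dimensional table with its semantics spelled out."""
    row_label: str          # what the row headings mean
    col_label: str          # what the column headings mean
    value_label: str        # what each cell value means
    rows: List[str]
    cols: List[str]
    cells: List[List[str]]  # cells[i][j] belongs to rows[i] and cols[j]

    def statements(self) -> Iterator[str]:
        """Yield one controlled-language sentence per cell."""
        for i, row in enumerate(self.rows):
            for j, col in enumerate(self.cols):
                yield (
                    f"If {self.row_label} is {row} and {self.col_label} is {col} "
                    f"then {self.value_label} is {self.cells[i][j]}."
                )


# An invented fee schedule, purely for illustration.
FEE_SCHEDULE = Table(
    row_label="the vehicle class",
    col_label="the registration period",
    value_label="the fee",
    rows=["A", "B"],
    cols=["1 year", "2 years"],
    cells=[["$50", "$90"], ["$80", "$150"]],
)

if __name__ == "__main__":
    for sentence in FEE_SCHEDULE.statements():
        print(sentence)
```

Real legal tables are much messier than this (spanning cells, footnotes, nested headings), which is exactly why the problem is hard. The sketch only shows the target: words plus explicit structure, rather than words plus layout.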

The benefits to computational law would be immense. We could imagine a digital representation of the corpus of law as an enormous abstract syntax tree[5] which we could begin to traverse to get to the central question about how humans traverse this tree to reason about it, form opinions about it, and create legal arguments in support of their opinions.

Next up: What is law? - Part 11.

[1] http://seanmcgrath.blogspot.ie/2010/06/xml-in-legislatureparliament_04.html
[2] https://ballotpedia.org/Redistricting
[3] https://en.wikipedia.org/wiki/Easting_and_northing
[4] https://en.wikipedia.org/wiki/Controlled_natural_language
[5] https://en.wikipedia.org/wiki/Abstract_syntax_tree


Wednesday, April 19, 2017

What is law? - Part 9

Previously: What is law? - Part 8

For the last while, we have been thinking about the issues involved in interpreting the corpus of legal materials that is produced by the various branches of government in US/UK style environments. As we have seen, it is not a trivial exercise because of the ways the material is produced and because the corpus - by design - is open to different interpretations and open to interpretation changing with respect to time. Moreover, it is not an exaggeration to say that it is a full time job - even within highly specialized sub-topics of law - to keep track of all the changes and synthesize the effects of these changes into contemporaneous interpretations.

For quite some time now - centuries in some cases - a second legal corpus has evolved in the private sector. This secondary corpus serves to consolidate and package and interpret the primary corpus, so that lawyers can focus on the actual practice of law. Much of this secondary corpus started out as paper publications, often with so-called loose-leaf update cycles. These days most of this secondary corpus is in the form of digital subscription services. The vast majority of lawyers utilize these secondary sources from legal publishers. So much so that over the long history of law, a number of interesting side-effects have accrued.

Firstly, for most day-to-day practical purposes, the secondary corpus provides de-facto consolidations and interpretations of the primary corpus. I.e. although the secondary sources are not "the law", they effectively are. The secondary sources that are most popular with lawyers are very high quality and have earned a lot of trust over the years from the legal community.

In this respect, the digital secondary corpus of legal materials is similar to modern day digital abstractions of currency such as bank account balances and credit cards etc. I.e. we trust that there are underlying paper dollars that correspond to the numbers moving around digital bank accounts. We trust that the numbers moving around digital bank accounts could be redeemed for real paper dollars if we wished. We trust that the real paper dollars can be used in value exchanges. So much so, that we move numbers around bank accounts to achieve value exchange without ever looking to inspect the underlying paper dollars. The digital approach to money works because it is trusted. Without the trust, it cannot work. The same is true for the digital secondary corpus of law: it works because it is trusted.

A second, interesting side-effect of trust in the secondary corpus is that parts of it have become, for all intents and purposes, the primary source. If enough of the world's legal community is using secondary corpus X then even if that secondary corpus differs from the primary underlying corpus for some reason, it may not matter in practice because everybody is looking at the secondary corpus.

A third, interesting side effect of the digital secondary corpus is that it has become indispensable. The emergence of a high quality inter-mediating layer between primary legal materials and legal practitioners has made it possible for the world of law to manage greater volumes and greater change rates in the primary legal corpus.  Computer systems have greatly extended this ability to cope with volume and change. So much so, that law as it is today would collapse if it were not for the inter-mediating layer and the computers.

The classic image of a lawyer's office involves shelves upon shelves of law books. For a very long time now, those shelves have featured a mix of primary legal materials and secondary materials from third party publishers. For a very long time now, the secondary materials have been the day-to-day "go to" volumes for legal practitioners - not the primary volumes. Over the last 50 years, the usage level of these paper volumes has dwindled year on year to the point where today, the beautiful paper volumes have become primarily interior decoration in law offices. The real day-to-day corpus is digital, and most of those digital resources are from the secondary - not the primary - legal sources.

So, in a sense, the law has already been inter-mediated by a layer of interpretation. In some cases the secondary corpus has become a de-facto primary source by virtue of its ubiquity and the trust placed in it by the legal community.

This creates an interesting dilemma for our virtual legal reasoning machine. The primary legal corpus - as explained previously - is not at all an easy thing to get your hands on from the government entities that produce it. And even if you did get it, it might not be what lawyers would consider the primary authority anyway. On the other hand, the secondary corpus is not a government-produced corpus and may not be available at all outside of the private world of one of the major legal publishers.

The same applies for the relatively new phenomenon of computer systems encoding parts of the legal corpus into computational logic form. Classic examples of this include payroll/income tax and eligibility determinations. These two sub-genres of law tend to have low representational complexity[1]. Simply put, they can readily be converted into programming languages as they are mostly mathematical with low dependencies on the outside world.

Any encoding of the primary legal text into a computer program is, itself, an interpretation. Remembering back to the Unbounded Opinion Requirement, programmers working with legal text are, necessarily, encoding opinions as to what the text means. It does not matter if this encoding process is being performed by a government agency or by a third party, it is still an interpretation.
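A small sketch in Python shows how quickly opinions creep in, even for something as mathematical as a tax computation. The brackets, rates and the `tax_due` function are all invented for illustration and correspond to no real tax code. The comments mark the places where the programmer has had to decide something the (imaginary) statutory text might not say.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical brackets, invented for illustration: (upper limit, rate).
# None means "no upper limit".
BRACKETS = [
    (Decimal("10000"), Decimal("0.10")),
    (Decimal("40000"), Decimal("0.20")),
    (None, Decimal("0.30")),
]


def tax_due(income):
    """Progressive tax on `income` (pass an int or a numeric string)."""
    income = Decimal(income)
    if income < 0:
        # Interpretation: this sketch simply refuses negative income.
        raise ValueError("income must not be negative")
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in BRACKETS:
        if upper is None or income < upper:
            # The income tops out inside this bracket.
            tax += (income - lower) * rate
            break
        # The income fills this bracket completely; move to the next one.
        tax += (upper - lower) * rate
        lower = upper
    # Interpretation: round to the cent, halves rounded up.
    return tax.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(tax_due(25000))
    print(tax_due(50000))
```

The arithmetic itself is uncontroversial. The rounding rule and the treatment of negative income are choices, and a different programmer reading the same text could reasonably have chosen differently.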

These computer programs - secondary sources of interpretation - can become de-facto interpretations if enough of the legal community trusts them. Think of the personal tax software applications and e-Government forms for applying for various government services. If enough of the community use these applications, they become de-facto interpretations.

The legal concept of interpretation forbearance applies here. If a software application interprets a tax deduction in a particular way that is reasonable it may be allowed *even* if a tax inspector would have interpreted the deduction differently.

I am reminded of the concept of reference implementations as forms of specification in computer systems. Take Java Server Pages for an example. If you have a query as to what a Servlet Engine should do in some circumstance, how do you find out what the correct behavior is? It is not in the documentation. It is in the *reference implementation* which is Apache Tomcat.

I am also reminded of the SEC exploring the use of the Python programming language as the legal expression of complex logic in asset backed securities[2]. On the face of it, this would be better than English prose, right? How much more structured can you get than expressing it in a programming language? Well, what version of Python are you talking about? The Python 2 family? The Python 3 family? Jython? Is it possible the same program text can produce different answers if you interpret it with different interpreters? Yes, absolutely. Is it possible to get different answers from the same code running the same interpreter on different operating systems? Again, yes, absolutely. What about running it tomorrow rather than today? Yes, again!
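A few lines of Python are enough to illustrate the point. The function names below are hypothetical and have nothing to do with the SEC proposal; the block runs under Python 3 and the comments note where Python 2 gave a different answer for the very same expression.

```python
import datetime


def split_payment(total, parties):
    # Python 3: "/" is true division, so 7 / 2 gives 3.5.
    # Python 2 applied floor division to two ints here, giving 3.
    return total / parties


def split_payment_py2_style(total, parties):
    # "//" reproduces in Python 3 what Python 2 did for two ints.
    return total // parties


def rounded_units(amount):
    # Python 3 rounds halves to the nearest even number: round(2.5) gives 2.
    # Python 2 rounded halves away from zero and returned 3.0.
    return round(amount)


def is_past_due(due_date, today=None):
    # With no explicit date, the answer depends on the day the code is run.
    if today is None:
        today = datetime.date.today()
    return today > due_date


if __name__ == "__main__":
    print(split_payment(7, 2), split_payment_py2_style(7, 2))
    print(rounded_units(2.5))
    print(is_past_due(datetime.date(2017, 4, 21)))
```

Same text, different interpreter, different answer. And the last function gives a different answer depending on when you run it, unless the date is pinned down as an explicit input.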

Even programming languages need interpretation and correct behavior is difficult - perhaps impossible - to capture in the abstract - especially when the program cannot be expressed in a fully closed world without external dependencies. Echoing Wittgenstein again, the true meaning of a computer program manifests when you run it, not in the syntax:-) The great mathematician and computer scientist Don Knuth once warned users of a program that he had written to be careful as he had only proven it to be correct, but had not tried it out.[3]

By now, I hope I have established a reasonable defense of my belief that establishing the meaning of the legal corpus is a tricky business. The good news is that creating interpretations of the corpus is not a new idea. In fact it has been going on for centuries. Moreover, in recent decades, some of the corpus has gradually crept into the form of computer programs and even though it is still rare to find a computer program given formal status in primary law, computer programs are increasingly commonplace in the secondary corpus where they have de-facto status. I hope I have succeeded in explaining why conversions into computer program form do not magically remove the need for interpretation and in some respects just move the interpretation layer around, rather than removing it.

So where does all this leave our legal reasoning black box? I think it leaves it in good shape actually, because we have been using variations on the reasoning black box for centuries. Every time we rely on a third party aggregation or consolidation or digitization or commentary, we are relying on an interpretation. Using a computer to do it just makes it better/faster/cheaper but it is a well, well established paradigm at this point. A paradigm already so entrenched that the modern world of law could not operate without it. All the recent interest in computational law, artificial intelligence, smart contracts etc. is not about a radically new concept. It is really just a recent rapid acceleration of a trend that started in the Seventies with the massive expansion in the use of secondary sources that was ushered in by the information age.

So, we are just about ready, I think, to tackle the question of how our virtual legal reasoning box should best go about the business of interpreting the legal corpus. The starting point for this will be to take a look at how humans do it and will feature a perhaps surprising detour into cognitive psychology for some unsettling facts about how human reasoning actually works. Hint: it's not all tidy logical rules and neatly deductive logic.

This is where we will pick up in Part 10.

[1] http://web.stanford.edu/group/codex/cgi-bin/codex/wp-content/uploads/2014/01/p193-surden.pdf
[2] https://www.sec.gov/rules/proposed/2010/33-9117.pdf
[3] https://en.wikiquote.org/wiki/Donald_Knuth