Tuesday, August 08, 2017

Would the real copy of the contract please stand up?

Establishing authenticity of digital materials is a topic I have worked on for a long time now in the context of electronic laws: the UELMA act[1], the best evidence rule[2], the Federal Rules of Evidence[3], the OAIS model[4] etc.

Nearly a decade ago now, I wrote an article for ITWorld called "Would the real, authentic copy of the document please stand up?"[5].

I happened across it again today and re-reading it, I find it all still relevant, but Smart Contracts are bringing a new use case to the fore. The authenticity and tamper-evidence and judicial admissibility of digital laws is - I admit -  a very specialist area.

Contracts on the other hand....well that is a much much bigger area and one that a much larger group of people are interested in.

All the same digital authenticity challenges apply but over the next while I suspect I will be updating my own corpus of language to cater for the new Smart Contracts eco-system.

Old digital authenticity terms like content addressable stores, fixity, idempotent rendering, registrar etc. look like they will all have new lives under new names in the world of Smart Contracts.
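To make two of those older terms concrete, here is a minimal sketch of a content addressable store with a fixity check. It is an illustration only: the class name, method names and the choice of SHA-256 are my assumptions, not a description of any particular archive or Smart Contracts platform.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return the SHA-256 hex digest of the bytes, used here as the storage key."""
    return hashlib.sha256(data).hexdigest()


class ContentAddressableStore:
    """Toy in-memory store (hypothetical): items are filed under the hash of their own bytes."""

    def __init__(self):
        self._items = {}  # maps address (hex digest) -> stored bytes

    def put(self, data: bytes) -> str:
        address = content_address(data)
        self._items[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._items[address]

    def check_fixity(self, address: str) -> bool:
        """Fixity check: do the stored bytes still hash to the address they are filed under?"""
        return content_address(self._items[address]) == address


store = ContentAddressableStore()
contract_address = store.put(b"The parties agree to the following terms...")
```

Because the address is derived from the content, any change to the stored bytes makes `check_fixity` fail for that address. That is the tamper-evidence property in miniature.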

Plus ça change...

I am happy to see it happening for a number of reasons but one of them is that the challenges of digital authenticity and preservation of legal materials can only benefit from an injection of fresh interest in the problem from the world of contracts.

[1] http://www.uniformlaws.org/Act.aspx?title=Electronic%20Legal%20Material%20Act
[2] https://en.wikipedia.org/wiki/Best_evidence_rule
[3] https://www.rulesofevidence.org/
[4] https://en.wikipedia.org/wiki/Open_Archival_Information_System
[5] http://www.itworld.com/article/2781645/business/would-the-real--authentic-copy-of-the-document-please-stand-up-.html

Thursday, August 03, 2017

Wednesday, July 19, 2017

What is Law? - part 15

Previously: What is Law? - part 14.

In part one of this series, a conceptual model of legal reasoning was outlined based on a “black box” that can be asked legal type questions and give back legal type answers/opinions. I mentioned an analogy with the “Chinese Room” used in John Searle's famous Chinese Room thought experiment[1] related to Artificial Intelligence.

Simply put, Searle imagines a closed room into which symbols (Chinese language ideographs) written on cards, can be inserted via a slot. Similar symbols can also emerge from the room.

To a Chinese speaking person outside the room inserting cards and receiving cards back, whatever is inside the room appears to understand Chinese. However, inside the box is simply a mechanism that matches input symbols to output symbols, with no actual understanding of Chinese at all.

Searle's argument is that such a room can manifest “intelligence” to a degree, but that it is not understanding what it is doing in the way a Chinese speaker would.

For our purposes here, we imagine the symbols entering/leaving the room as being legal questions. We can write a legal question on a card, submit it into the room and get an opinion back. At one end of the automation spectrum, the room could be the legal research department shared by partners in a law firm. Inside the room could be lots of librarians, lawyers, paralegals etc. taking cards, doing the research, and writing the answer/opinion cards to send back out. At the other end of the spectrum, the room could be a fully virtual room that partners interact with via web browsers or chat-bots or interactive voice assistants.

Regardless of where we are on that spectrum, the law firm partners will judge the quality of such a room by its outputs. If the results meet expectations, then isn't it a moot point whether or not the innards of the room in some sense “understand” the law?

Now let us imagine that we are seeing good results come from the room and we wish to probe a little to get to a level of comfort about the good results we are seeing. What would we do to get to a level of comfort? Well, most likely, we would ask the virtual box to explain its results. In other words, we would do exactly what we would do with any person in the same position. If the room can explain its reasoning to our satisfaction, all is good, right?

Now this is where things get interesting. Imagine that each legal question submitted to the room generates two outputs rather than one. The first being the answer/opinion in a nutshell (“the parking fine is invalid : 90% confident.”). The second being the explanation (“The reasoning as to why the parking fine is invalid is as follows....”). If the explanation we get is logical i.e. it proceeds from facts through inferences to conclusions, weighing up the pros and cons of each possible line of reasoning....we feel good about the answer/opinion.

But how can we know that the explanation given is actually the reasoning that was used in arriving at the answer/opinion? Maybe the innards of the room just picked a conclusion based on its own biases/preferences and then proceeded to back-fill a plausible line of reasoning to defend the answer/opinion it had already arrived at?

Now this is where things may get a little uncomfortable. How can we know for sure that a human presenting us with a legal opinion and an explanation to back it up, is not doing exactly the same thing?

This is an old old nugget in jurisprudence, re-cast into today's world of legal tech and Artificial Intelligence. Legal scholars refer to it as the conflict between so-called rationalist and realist models of legal reasoning. It is a very tricky problem because recent advances in cognitive science have shone a somewhat uncomfortable light on what actually goes on in our mental decision making processes.

Very briefly, we are not necessarily the bastions of cold hard logic that we might think we are. This is not just true in the world of legal reasoning, by the way. The same is true for all forms of reasoning including – shock! – that of mathematicians.

Recent research[2][3] suggests that human legal reasoning is best viewed as a bi-directional process that oscillates between working forward from premises/facts and working backwards from conclusions to supporting premises/facts.

Mention was previously made of the feature of law whereby different legal minds can look at the same corpus and come up with different conclusions. In this respect, our virtual legal reasoning room is just another source of a legal opinion. Another legal “mind” if you will. The quality of the opinions produced is judged on their merits – the explanations – not on the actual means of production of the answers/opinions.

To this way of thinking, lawyers should enthusiastically embrace these new virtual research assistants that are emerging. Who wouldn't see benefit from being able to get other legal “minds” to look at a legal question and offer opinions? Who wouldn't see benefit from being able to ask such a virtual research assistant to argue for and against a given assertion to help sharpen a line of reasoning for use in a legal opinion or in a court room?

Some see problems with the modern machine learning approach to legal AI because of the inability of these systems to explain their conclusions in the form of classic forward-chaining logic. I do not see this being a problem in practice because these systems will develop ways to explain their opinions. They will most likely do it as a completely separate activity. We may know for a fact that they  are reasoning "backwards" but we can never know if the same isn't true for the opinions given by our fellow humans – including the opinions we provide to ourselves!

We have a tendency to get caught up in the notion of intelligent machines replacing humans. We look at the incredible progress machines have made in playing Chess or Go, identifying faces in photographs etc. and some wonder how long it will be before the machines replace the lawyers. I believe there is a qualitative difference between practicing law and, say, playing chess that gets glossed over in the excitement about AI in law.

In chess, there is a small number of variables and a huge, huge set of permutations/combinations of possible moves. Moreover, the key variables can all be encoded for the machine to work with. This makes this sort of game-playing a great candidate for complete mechanisation. i.e. getting to the point where the machine can play the game unaided.

Not so with law. A lawyer's reasoning processes are invariably a lot more expansive, covering variables such as the overall goals of the client, trade-offs between time and opportunity cost, reputational risk factors, budget constraints, team dynamics etc. On top of these, I have argued in previous posts that the entire legal system is not, and cannot be, reduced to a set of rules – no matter how large the set of rules might be envisaged to be.

Rather than think of machines as replacements for lawyers, better to think of machines as augmenting lawyers in my opinion. Machines are no longer confined to document management and mechanical search & retrieval. Machines are increasingly offering opinions as to what is relevant. They have been doing that for quite some time - from the dawn of search result ranking - but in recent years their role as sources of opinion has grown significantly. This trend will continue apace in my opinion. I think we will soon see the day when every lawyer in private practice has access to legal virtual assistants that can provide answers/opinions to supplement the lawyer's own research/experience and that of their colleagues.

If I were a professional chess player, I would be a lot more worried about career viability in the age of intelligent machines than I would be as a lawyer, or an accountant or a medical doctor. Yes, intelligent machines will impact these professions as more and more of the mechanizable tasks become mechanized. But the machines can only compute with what they have visibility of and it is in all the stuff that the machines cannot have visibility of that the 21st Century professionals will live.

A good example of this can be found in the world of contracts and in particular, the emerging world of “smart contracts” which is where we will turn to next.

Tuesday, June 27, 2017

Blockchain and Byzantium

Establishing authenticity - "single sources of truth" - is a really important concept in the real world and in the world of computing. From title deeds, to contracts, to laws and currencies, we have evolved ways of establishing single sources of truth over many centuries of trial and error.

Knowingly or not, many of the ways of solving the problem rely on the properties of physical objects: clay tablets (Code of Hammurabi), Bronze Plates (The Twelve Tables of Rome), Goat Skin (Celtic Brehon Laws). Typically, this physicality is mixed in with a bit of trust. Trust in institutions. Trust in tamper evidence. Trust in probabilities.

Taken together: the physical scheme aspect, plus the trust aspect, allows the establishment of consensus. It is consensus, at the end of the day, that makes all this stuff work in the world of human affairs. Simply put, if enough of us behave as though X is the authentic deed/deposition/derogation/dollar then X is, ipso facto, for all practical purposes, the real deal.

In the world of digital data, consensus is really tricky because trust becomes really tricky. Take away the physicality of objects and establishing trust in the truth/authenticity of digital objects is hard.

Some folk say that blockchain is slow and inefficient and they are right - if you are comparing it to today's consensus as to what a "database" is.

Blockchain is the way it is because it is trying to solve the trust problem. A big part of that is what is called Byzantine Consensus. Basically how to establish consensus when all sorts of things can go wrong, ranging from honest errors to sabotage attempts.
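To give a feel for why this is expensive, here is a small sketch of the counting argument behind classic Byzantine agreement: with n participants, at most f of them can be faulty where n >= 3f + 1, and a value is only accepted once at least 2f + 1 participants report it. The function names are mine and this is only the arithmetic, not a protocol. Real protocols need several rounds of messages on top of it, and blockchains in the Bitcoin style use proof of work rather than a fixed head count like this.

```python
from collections import Counter


def max_faulty(n: int) -> int:
    """Largest f satisfying n >= 3f + 1 (the classic Byzantine agreement bound)."""
    return (n - 1) // 3


def agreed_value(reports):
    """Return the value reported by at least 2f + 1 of the n participants, else None.

    `reports` is a list with one reported value per participant (an assumed, simplified input).
    """
    if not reports:
        return None
    n = len(reports)
    quorum = 2 * max_faulty(n) + 1
    value, count = Counter(reports).most_common(1)[0]
    return value if count >= quorum else None
```

With four participants one can be faulty and three matching reports are needed, so `agreed_value(["X", "X", "X", "Y"])` gives `"X"` while an even split gives no answer at all. Every extra participant adds messages to count, which is where a lot of the "inefficiency" comes from.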

The problem is hard and also very interesting and important in my opinion. Unfortunately today, many folks see the word "database" associated with blockchain and all they see is the incredible inefficiency and cost per "transaction" compared to, say, a relational database with ACID properties.

Yes, blockchain is a truly dreadful "database" - if your metric for evaluation is the same as the use cases for relational databases.

Blockchain is not designed to be one of those. Blockchain is the way it is because byzantine consensus is hard. Is it perfect? Of course not, but a proper evaluation of it requires looking at the problems it is trying to solve. Doing so requires getting past common associations most people carry around in their heads about what a "database" is and how it should behave/perform.

Given the unfortunate fact that the word "database" has become somewhat synonymous with the term "relational database", I find it amusing that Blockchain has itself become a byzantine consensus problem. Namely, establishing consensus about what words like  "database" and "transaction" and "trust" really mean.

Wednesday, June 14, 2017

What is Law? - part 14

Mention has been made earlier in this series of the presence of ambiguity in the corpus of law and the profound implications that the presence of ambiguity has on how we need to conceptualize computational law, in my opinion.

In this post, I would like to expand a little on the sources of ambiguity in law. Starting with the linguistic aspects but then moving into law as a process and an activity that plays out over time, as opposed to being a static knowledge object.

In my opinion, ambiguity is intrinsic in any linguistic formalism that is expressive enough to model the complexity of the real world. Since law is attempting to model the complexity of the real world, the ambiguity present in the model is necessary and intrinsic in my opinion. The linguistic nature of law is not something that can be pre-processed away with NLP tools, to yield a mathematically-based corpus of facts and associated inference rules.

An illustrative example of this can be found in the simple sounding concept of legal definitions. In language, definitions are often hermeneutic circles[1] which are formed whenever we define a word/phrase in terms of other words/phrases. These are themselves defined in terms of yet more words/phrases, in a way that creates definitional loops.

For example, imagine a word A that is defined in terms of words B and C. We then proceed to define both B and C to try to bottom out the definition of A. However, umpteen levels of further definition later, we create a definition which itself depends on A – the very thing we are trying to define - thus creating a definitional loop. These definitional loops are known as hermeneutic circles[1].

Traditional computer science computational methods hate hermeneutic circles. A large part of computing consists of creating a model of data that "bottoms out" to simple data types. I.e. we take the concept of customer and boil it down into a set of strings, dates and numbers. We do not define a customer in terms of some other high level concept such as Person which might, in turn, be defined as a type of customer. To make a model that classical computer science can work on, we need a model that "bottoms out" and is not self-referential in the way hermeneutic circles are.
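The difference between a model that bottoms out and one that loops back on itself can be sketched in a few lines. This is a toy illustration: the dictionaries and the function name are made up for the example, and the search is a plain depth-first walk with no attempt at efficiency.

```python
def find_definition_loop(definitions, term):
    """Depth-first walk from `term`; return a path that loops back on itself, or None.

    `definitions` maps each term to the list of terms its definition depends on.
    Terms with no entry are treated as primitives that need no further definition.
    """
    path = []

    def visit(current):
        if current in path:
            # We have arrived back at a term we are still in the middle of defining.
            return path[path.index(current):] + [current]
        path.append(current)
        for dependency in definitions.get(current, []):
            loop = visit(dependency)
            if loop:
                return loop
        path.pop()
        return None

    return visit(term)


# Hypothetical model that loops: customer -> person -> customer
circular = {
    "customer": ["person", "invoice"],
    "person": ["customer"],
    "invoice": ["string", "date"],
}

# Hypothetical model that bottoms out into simple data types
bottomed_out = {
    "customer": ["name", "signup_date"],
    "name": [],
    "signup_date": [],
}
```

Calling `find_definition_loop(circular, "customer")` finds the circle, while the second model yields nothing. Classical data modelling only works comfortably with the second kind.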

Another way to think about the definition problem is in terms of Saussure's linguistics[2] in which language (or more generically "signs") get their meaning because of how they differ from other signs - not because they "bottom out" into simpler concepts.

Yet another way to think about the definition problem is in terms of what is known as the descriptivist theory of names[3] in which nouns can be thought of as just arbitrary short codes for potentially open-ended sets of things which are defined by their descriptions. I.e. a "customer" could be defined as the set of all objects that (a) buy products from us, (b) have addresses we can send invoices to, (c) have given us their VAT number.

The same hermeneutic circle/Saussurean issue arises here however as we try to take the elements of this description and bottom out the nouns they depend on (e.g., in the above example, "products", "addresses", "invoices" etc.).

For extra fun, we can construct a definition that is inherently paradoxical and sit back as our brains melt out of our ears trying to complete a workable definition. Here is a famous example:
The 'barber' in town X is defined as the person in town X who cuts the hair of anyone in town who does not choose to cut their own hair.

This sounds like a reasonable starting point for a definition of a 'barber', right? Everything is fine until we think about who cuts the barber's hair[4].

The hard facts of the matter are that the real world is full of things we want to make legal statements about but that we cannot formally define, even though we have strong intuitions about what they are. What is a "barber"? What is the color "red"? Is tomato ketchup a vegetable[5]? What is "duty"? What is "ownership"? etc. etc. We all carry around intuitions about these things in our heads, yet we struggle mightily to define them. Even when we can find a route to "bottom out" a definition, the results often seem contrived and inflexible. For example we could define "red" as 620–750 nm on the visible spectrum but are we really ok with 619nm or 751nm being "not red"?
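The "red" example is easy to sketch. Below, the first function is the hard boundary and the second is one possible graded alternative. The linear fade and its 20 nm width are assumptions picked purely for illustration. They are not a colorimetric fact, and choosing them is of course just another boundary decision.

```python
def is_red_crisp(wavelength_nm: float) -> bool:
    """Hard boundary: red means exactly 620-750 nm, nothing else."""
    return 620.0 <= wavelength_nm <= 750.0


def redness(wavelength_nm: float, fade_nm: float = 20.0) -> float:
    """Graded membership: 1.0 inside 620-750 nm, fading linearly to 0.0 over fade_nm outside.

    fade_nm is an arbitrary illustrative parameter, not a measured quantity.
    """
    if is_red_crisp(wavelength_nm):
        return 1.0
    if wavelength_nm < 620.0:
        distance = 620.0 - wavelength_nm
    else:
        distance = wavelength_nm - 750.0
    return max(0.0, 1.0 - distance / fade_nm)
```

The crisp version says 619 nm is simply not red. The graded version says it is very nearly red, which sits better with intuition, but notice that we have only moved the problem: somebody still had to pick the fade.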

Many examples of computing blips and snafus in the real world can be traced to the tendency of classical computing to put inflexible boundaries around things in order to model them. What does it mean for a fly-by-wire aircraft to be "at low altitude"? What does it mean for an asset to be trading at "fair market value"? The more we attempt to bottom these concepts out into hard numeric ranges - things classical computing can easily work with - the more we risk breaking our own intuitions with the real world versions of these concepts.

If this is all suggesting to you that computational law sounds more like a problem that requires real numbers (continuous variables) and statistical calculations as opposed to natural numbers and linear algebraic calculations, I think that is spot on.

I particularly like the concept of law as a continuous, analog process as it allows a key concept in law to be modeled more readily - namely the impact of the passage of time.

We have touched on the temporal aspects already but here I would like to talk a little about how the temporal aspects impact the ambiguity in the corpus.

As time passes, the process of law will itself change the law. One of the common types of change is a gradual reduction in levels of ambiguity in the corpus. Consider a new law which needs to define a concept. Here is how the process plays out, in summary form.

  • A definition is created in natural language. Everybody involved in the drafting knows full well that definitions cannot be fully self-contained and that ambiguity is inevitable. In the interests of being able to actually pass a law before the heat death of the universe, a starter definition is adopted in the law.
  • As the new law finds its way into effect, regulations, professional guidance notes etc. are created that refine the definition.
  • As the new law/regulations/professional guidance impacts the real world, litigation events may happen which result in the definition being scrutinized. From this scrutiny, new caselaw is produced which further refines the definition, reducing but never completely removing, the amount of ambiguity associated with the definition.

A closely related process - and a major source of pragmatic, pre-meditated ambiguity in the process of law - is contracts. While drafting a contract, the teams of lawyers on both sides of the contract know that ambiguity is inevitable. It is simply not possible, for all the reasons mentioned above, to bottom out all the ambiguities.

The ambiguity that necessarily will remain in the signed contract is therefore used as a negotiating/bargaining item as the contract is being worked. Sometimes, ambiguity present in a draft contract gives you a contractual advantage so you seek to keep it. Other times, it creates a disadvantage so you seek to have it removed during contract negotiations. Yet other times, the competing teams of lawyers working on a contract with an ambiguity might know full well that it might cause difficulties down the road for both sides. However it might cost so much time and money to reduce the ambiguity now that both sides let it slide and hope it never becomes contentious post contract.

So to summarize, ambiguity in law is present for two main reasons. Firstly there is ambiguity present that is inevitable because of what law is trying to model - i.e. the real world. Secondly, there is ambiguity present that is tactical as lawyers seek to manipulate ambiguities so as to favor their clients.

[5] https://en.wikipedia.org/wiki/Ketchup_as_a_vegetable

Wednesday, June 07, 2017

What is law - part 12a

Previously: what is law part 12

Perhaps the biggest form of push-back I get from fellow IT people with respect to the world of law relates to the appealing-but-incorrect notion that in the text of the law, there lies a data model and a set of procedural rules for operating on that data model, hidden inside the language.

The only thing stopping us computerizing the law, according to this line of reasoning, is that we just need to get past all the historical baggage of foggy language and extract out the procedural rules (if-this-then-that) and the data model (definition of a motor controlled vehicle, definition of 'theft', etc.). All we need to do is leverage all our computer science knowledge with respect to programming languages and data modelling, combine it with some NLP (natural language processing) so that we can map the legacy linguistic form of law into our shiny new digital model of law.

In previous parts in this series I have presented a variety of technical arguments as to why this is not correct in my opinion. Here I would like to add some more but this time from a more sociological perspective.

The whole point of law, at the end of the day, is to allow society to regulate its own behavior, for the greater good of that society. Humans are not made from diamonds cut at right angles. Neither are the societal structures we make for ourselves, the cities we build, the political systems we create etc. The world and the societal structures we have created on top of it are messy, complex and ineffable. Should we be surprised that the world of law which attempts to model this, is itself, messy, complex and ineffable?

We could all live in cities where all the houses are the same and all the roads are the same and everything is at right angles and fully logical. We could speak perfectly structured languages where all sentences obey a simple set of structural rules. We could all eat the same stuff. Wear the same clothes. Believe in the same stuff...but we do not. We choose not to. We like messy and complex. It suits us. It reflects us.

In any form of digital model, we are seeking the ability to model the important stuff. We need to simplify - that is the purpose of a model after all - but we need to preserve the essence of the thing modeled. In my opinion, a lot of the messy stuff in law is there because law tries to model a messy world. Without the messy stuff, I don't see how a digital model of law can preserve the essence of what law actually is. The only outcome I can imagine from such an endeavor (in the classic formulation of data model + human readable rules) is a model that fails to model the real world.

In my opinion, this is exactly what happened in the Eighties when people got excited about how Expert Systems[1] could be applied to law. In a nutshell, it was discovered that the modelling activity lost so much of the essence of law, that the resultant digital systems were quite limited in practice.

Today, as interest in Artificial Intelligence grows again, I see evidence that the lessons learned back in the Eighties are not being taken into account. Today we have XML and Cloud Computing and better NLP algorithms and these, so the story goes, will fix the problems we had in the Eighties.

I do not believe this is the case. What we do have today, that did not exist in the Eighties, is much much better algorithms for training machines - not programming them  to act intelligently - training them to act intelligently. When I studied AI in the Eighties, we spent about a week on Neural Networks and the rest of the year on expert systems i.e. rules-based approaches. Today's AI courses are the other way around!

Rightly so, in my opinion because there has not been any great breakthrough in the expert systems/business rules space since the Eighties. We tried all the rules-based approaches in the Eighties. A lot of great computer science minds worked on it. It came up short in the real world of law.

When you combine the significant advances in Neural Network approaches with all the compute advantages of cloud and the ready availability of lots and lots of digital data, things get interesting again. This is where we are today. And it is very interesting indeed.

I numbered this blog post "12a", for a reason that is hopefully both humorous and relevant. I know of both legal texts and legal business processes that avoid the number 13. I know of a legal text with so many sub-paragraphs that the number 666 was needed, and 665a was used instead. This kind of thing drives rules-based computing mad but is exactly the kind of human footprint that is literally all over the world of law.

The human touch can be seen in all its splendor in the area of legal fictions[2]. Everything from life insurance claims to resigning from office uses forms of logic that are very foreign to the world of classic computing concepts of rules and data models.

Yet there they are... in all their messy, complex, splendidly human glory. Spend a few moments with the Chiltern Hundreds. It is worth your time [3]. Spend some time thinking about how we humans can both navigate ambiguity when we have to, or when it suits us, and - when it suits us - create new ambiguity. Then read about contra proferentem[4].

Now we can refuse to believe the messy ambiguity and complexity is intrinsic and spend our time trying to remove it with computers - as we did in the Eighties. Or we can take a deep breath, dive in and embrace it.

I recommend the latter. Next up: What is Law? - Part 14.

[1] https://en.wikipedia.org/wiki/Expert_system
[2] https://en.wikipedia.org/wiki/Legal_fiction

Wednesday, May 31, 2017

The Great Inversion in Computing

Methinks we may be witnessing a complete inversion in the computing paradigm that has dominated the world since the Sixties.

In 1968, with Algol68[1] we started treating algorithms as forms of language. Chomsky's famous hierarchy of languages[2] found a huge new audience outside of pure linguistics.

In 1970, relational algebra came along[3] and we started treating data structures as mathematical objects with formal properties and theorems and proofs etc. Set theory/operator theory found a huge new audience outside of pure mathematics.

In 1976, Niklaus Wirth published "Algorithms + Data Structures = Programs"[4] crisply asserting that programming is a combination of algorithms and data structures.

The most dominant paradigm since the Sixties maps algorithms to linguistics (Python, Java etc.) and data structures to relational algebra (relational  databases, third normal form etc.).

Today's Deep Learning/AI etc. seems to me to be inverting this mapping. Algorithms are becoming mathematics and data is becoming linguistic e.g. "unstructured" text/documents/images/video etc.

Perhaps we are seeing a move towards "Algorithms (mathematics) + data structures (language) = Programs" and away from "Algorithms (language) + data structures (mathematics) = Programs"

[1] https://en.wikipedia.org/wiki/ALGOL_68
[2] https://en.wikipedia.org/wiki/Chomsky_hierarchy
[3] https://en.wikipedia.org/wiki/Relational_algebra
[4] https://en.wikipedia.org/wiki/Algorithms_%2B_Data_Structures_%3D_Programs

Tuesday, May 16, 2017

What is law? - part 12

Previously : what is law? - part 11

There are a few odds and ends that I would like to bundle up before proceeding. These are items that have occurred to me since I wrote the first What is Law? post back in March. Items I would have written about earlier in this series, if they had occurred to me. Since I am writing this series as I go, this sort of thing is inevitable I guess. Perhaps if I revisit the material to turn it into an essay at some point, I will fold this new material in at the appropriate places.

Firstly, in the discussion about the complexity of the amendatory cycle in legislation I neglected to mention that it is also possible for a new item of primary legislation to contain amendments to itself. In other words it may be that as soon as a bill becomes an act and is in force, it is immediately necessary to modify it using modifications spelled out in the act itself. Looking at it another way, a single Act can be both a container for new law and a container for amendatory instructions, all in one legal artifact. Why does this happen? Legislation can be crafted over long periods of time and consensus building may proceed piece by piece. In a large piece of legislation, rather than continually amending the whole thing – perhaps thousands of pages – sometimes amendments are treated as additional material tacked on the end so as to avoid re-opening debate – and editorial work - on material already processed through the legislative process. It is a bit of a mind bender. Basically if an Act becomes law at time T then it may instantaneously need to be codified in itself before we can proceed to codify it into the broader corpus.

Secondly, I mentioned that there is no central authority that controls the production of law. This complicates matters for sure but it also has some significant benefits that I would like to touch on briefly. Perhaps the biggest benefit of the de-centralized nature of law making is that it does not have a single point of failure. In this respect, it is reminiscent of the distributed packet routing protocol used on the internet. Various parts of the whole system are autonomic resulting in an overall system that is very resilient as there is no easy way to interrupt the entire process.

This distribution-based resilience also extends into the semantic realm where it combines with the textual nature of law to yield a system that is resilient to the presence of errors. Mistakes happen. For example, a law might be passed that requires train passengers to be packaged in wooden crates. (Yes, this happened.) Two laws might be passed in parallel that contradict each other. (Yes, this has happened many times.) When this sort of thing happens, the law has a way of rectifying itself, leveraging the “common sense” you can get with human decision making. Humans can make logical errors but they have a wonderful ability to process contradictory information in order to fix up inconsistent logic. Also humans possess an inherent, individual interpretation of equity/fairness/justice and the system of law incorporates that, allowing all participants to evaluate the same material in different ways.

Thirdly, I would like to return briefly to the main distinction I see between legal deductive logic and the deductive logic computer science people are more familiar with. When deductive logic is being used (remembering always that it is just one form of legal reasoning and rarely used on its own) in law, the classic “if this then that” form can be identified as well as classical syllogistic logic. However, legal reasoning involves weighing up the various applicable deductive statements using the same sort of dialectic/debate-centric reasoning mentioned earlier. Put another way, deductive logic in law very rarely proceeds from facts to conclusion in some nice tidy decision tree. Given the set of relevant facts (which have themselves to be argued as “the relevant facts”) there may well be multiple applicable deductive logic forms in the corpus of law which, depending on which ones are used and the order they are used, will result in different conclusions.
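A toy sketch may help IT readers see the order-dependence point. The two rules and the facts below are invented for the example and do not come from any real statute. Each rule is a perfectly tidy "if this then that", and the conclusion still depends on which rule gets the last word.

```python
from itertools import permutations

# Hypothetical facts for the example
facts = {"vehicle_in_park": True, "is_ambulance": True}


def no_vehicles_rule(facts, conclusion):
    """If a vehicle was in the park, a fine is payable."""
    return "fine payable" if facts["vehicle_in_park"] else conclusion


def emergency_exemption_rule(facts, conclusion):
    """If the vehicle was an ambulance, no fine applies."""
    return "no fine" if facts["is_ambulance"] else conclusion


def conclude(rules, facts):
    """Apply the rules in the given order; a later rule overrides an earlier one."""
    conclusion = "undecided"
    for rule in rules:
        conclusion = rule(facts, conclusion)
    return conclusion


# Try every ordering of the same two rules over the same facts
outcomes = {
    conclude(order, facts)
    for order in permutations([no_vehicles_rule, emergency_exemption_rule])
}
```

Here `outcomes` ends up holding both "fine payable" and "no fine". Nothing in the rules themselves says which ordering is the right one. That has to be argued.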

Again, this is where the real skills of a lawyer manifest. The set of possible routes through the law at Time T that can be applied to a set of relevant facts F is often vast and grows exponentially with the complexity of the facts being considered. Lawyers develop the ability to “prune” the routes down to something manageable in the same way that, say, chess grandmasters prune the set of options in any chess game situation.

This is perhaps the biggest "oops" moment I have seen when IT people first see the “rules” expressed in legal language. They see stuff that looks like it can be turned into classical logic e.g. indicative mood statements and then proceed to the non sequitur that it can be re-expressed in classical mathematical logic forms. What computer science people tend not to see at first is the rhetorical structure that sits underneath the indicative statements. I don't think it is overstating the case to say that every legal question is essentially a debate. You can analyse the corpus to find in favour of any given proposition or against any given proposition. Each line of reasoning can feature chunks of good old fashioned mathematical logic but the final conclusions do not come from the decision trees, they come from the fuzzier process of weighing up the logic on every side of the debate, in order to arrive at a best – but necessarily tentative – answer. As Immanuel Kant said, there are no rules for the application of rules.

Nick Szabo (the man who coined the term “smart contracts” which we will be turning to soon), uses the terms “wet code” and “dry code” to describe the difference between legal reasoning and classical computer reasoning. Dry code is the stuff with low representational complexity we can convert into classical computer software. There is some of that in law for sure, but a lot less than you might think. Most of it is “wet code” because of the open textured nature of the text of the law, the unbounded opinion requirement, the extensive use of analogical reasoning and the dialectic nature of the deductive logic in law.

Thursday, May 04, 2017

What is law? - part 11

Gliding gracefully over all the challenges alluded to earlier with respect to extracting the text level meaning out of the corpus of Law at time T, we now turn to thinking about how it is actually interpreted and utilized by practitioners. To do that, we will continue with our useful invention of an infinitely patient person who has somehow found all of the primary corpus and read it all from the master sources, internalized it, and can now answer our questions about it and feed it back to us on demand.

The first order of business is where to start reading? There are two immediate issues here. Firstly, the corpus is not chronologically accretive. That is, there is no "start date" to the corpus we can work from, even if, in terms of historical events, a foundation date for a state can be identified. The reasons for this have already been discussed. Laws get modified. Laws get repealed. Caselaw gets added. Caselaw gets overturned. New laws get added. I think of it like a vast stormy ocean, constantly ebbing and flowing, constantly adding new content (rainfall, rivers) and constantly losing content (evaporation) - in an endless cycle. It has no "start point" per se.

In the absence of an obvious start point, some of you may be thinking "the index", which brings us to the second issue. There is no index! There is no master taxonomy that classifies everything into a nice tidy hierarchy. There are some excellent indexes/taxonomies in the secondary corpus produced by legal publishers, but not in the primary corpus.

Why so? Well, if you remember back to the Unbounded Opinion Requirement mentioned previously, creating an index/taxonomy is, necessarily, the creation of an opinion on the "about-ness" of a text in the corpus. This is something the corpus of law stays really quite vague about - on purpose - in order to leave room for interpretation of the circumstances and facts about any individual legal question. Just because a law was originally passed to do with electricity usage in phone lines does not mean it is not applicable to computer hacking legislation. Just because a law was passed relating to manufacturing processes does not mean it has no relevance to ripening bananas. (Two examples based on real world situations I have come across, by the way.)

So, we have a vast, constantly changing, constantly growing corpus. So big it is literally humanly impossible to read, regardless of the size of your legal team, and there are no finding aids in the primary corpus to help us navigate our way through it....

...Well actually, there is one and it is an incredibly powerful finding aid. The corpus of legal materials is woven together by an amazingly intricate web of citations. Laws invariably cite other laws. Regulations cite laws. Regulations cite regulations. Caselaw cites law and regulations and other caselaw....creating a layer that computer people would call a network graph[1]. Understanding the network graph is key to understanding how practitioners navigate the corpus of law. They don't go page-by-page, or date-by-date, they go citation-by-citation.

The usefulness of this citation network in law cannot be overstated. The citation network helps practitioners to find related materials, acting as a human-generated recommender algorithm for practitioners. The citation networks not only establish related-ness, they also establish meaning, especially in the caselaw corpus. We talked earlier about the open-textured nature of the legal corpus. It is not big on black and white definitions of things. Everything related to meaning is fluid on purpose. The closest thing in law to true meaning is arguably established in the caselaw. In a sense, the caselaw is the only source of information on meaning that really matters because at the end of the day, it does not matter what you or I or anyone else might think a part of the corpus means. What really matters is what the courts say it means. Caselaw is the place you go to find that out.

"But", I hear you say, "graphs do not necessarily have a start point either!". True. But this is where one of the real skills of a lawyer manifests itself. Legal reasoning is, for the most part (UK/US style), reasoning by analogy. For any given case, a lawyer takes the facts and the desired outcome and then seeks to make an analogy with a previously adjudicated case so that, if the analogy holds up, the desired outcome is achieved by virtue of the over-arching desire of the legal ecosystem to maintain consistency with previous decisions. There is perhaps no other field where formulating the right question is as important as it is in law.

Having constructed an analogy, initial entry points into the corpus of law can be identified and from there, the citation network works its magic to route you through the bottomless seas of content, to the most relevant stuff. The term "most relevant" here is oftentimes signaled by the presence of lots of in-bound citations. I.e. in caselaw, if your analogy brings you to case X and case X has been cited by lots of other cases with the outcome you are looking to achieve, and if case X is still good law (has not been overturned), then case X is a good one to cite in your legal argument.

If this leveraging of the citation network link topology reminds you of Google's original PageRank algorithm then you are on the right track. Lawyers, perhaps to the surprise of computer science and math folk, have been leveraging the properties of scale free network graphs[2] for centuries[3].
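To make the comparison concrete, here is a minimal Python sketch of ranking cases by their in-bound citations, first by a raw count and then with a plain power-iteration PageRank. The case names, the tiny graph and the damping value are all made up for illustration. It is a toy under those assumptions, not a description of how any real legal research tool works.

```python
# Hypothetical citation graph: each case maps to the cases it cites.
citations = {
    "Case A": ["Case X"],
    "Case B": ["Case X", "Case A"],
    "Case C": ["Case X"],
    "Case D": ["Case B"],
    "Case X": [],
}


def inbound_counts(graph):
    """Count how many times each case is cited by other cases."""
    counts = {case: 0 for case in graph}
    for cited_list in graph.values():
        for cited in cited_list:
            counts[cited] += 1
    return counts


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over the citation graph."""
    n = len(graph)
    rank = {case: 1.0 / n for case in graph}
    for _ in range(iterations):
        new_rank = {case: (1.0 - damping) / n for case in graph}
        for case, cited_list in graph.items():
            if cited_list:
                # A case passes its weight to the cases it cites.
                share = damping * rank[case] / len(cited_list)
                for cited in cited_list:
                    new_rank[cited] += share
            else:
                # A case that cites nothing spreads its weight evenly.
                share = damping * rank[case] / n
                for other in graph:
                    new_rank[other] += share
        rank = new_rank
    return rank


counts = inbound_counts(citations)
ranks = pagerank(citations)
most_cited = max(counts, key=counts.get)
top_ranked = max(ranks, key=ranks.get)
print(most_cited, top_ranked)
```

In this toy graph both measures point at the same case. In a real corpus the ranking would also have to account for whether a case is still good law and for which way each citing case was decided, neither of which this sketch attempts.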

I said "legal argument" above and this is another critical point in understanding what law actually is and how it works...The corpus of law is not a place you go to find black and white answers to black and white questions. Rather, it is a place you go with an analogy you have formed in order to find arguments for and against your desired outcome from that analogy. It is a form of rhetoric. A form of debate. It is not a form of formulaic application of crisp rules that generate crisp answers.

In short, it is not mathematics in the sense that many computer science folks might initially assume when they hear talk of "rules" and "decisions" and so on. However it arguably is mathematics in some other ways. Leveraging the citation graph is a very mathematical thing. Weighing up the pros and cons of legal argument strategies often exhibits properties familiar from optimization problems and game theory.

It is in these latter senses of "mathematical" that most of the recent surge in interest in computational law has arisen. In particular, machine learning and neural network-centric approaches to artificial intelligence are re-igniting interest in computational law after an overall disappointing outcome in the Eighties. Back then, rule-centric approaches prevailed and although there have been some noticeable successes in areas such as income tax calculation, rules-based approaches have largely run out of steam in my opinion.

The citation network - and in particular, how the citation network changes over time - is, in my opinion, the key to unlocking computational law. I do not think it is stretching things to state that the citation network is the underlying DNA that holds the world of law together. Rather than seek to replace this DNA - in all its magnificent power and complexity - with nice tidy lego-bricks of conditional logic and data objects, we need to embrace it. Of course it has its flaws. Nothing is perfect. But it is the way it is, for the most part, for good reasons. We will make progress in computational law faster if more computing folk understand the world of law for what it is - as opposed to what they might initially think it is at a high level, or perhaps wish it to be.

I hope this series of blog posts has helped, in some small way, to show what it really is. At least, from my perspective which, of course, is just one person's opinion. As we have seen in this series of posts on law - "opinion" is as good as it gets in law. Again, finally, this is not a bug. It is a feature...In my opinion:-)

Wednesday, April 26, 2017

Zen and the art of motorcycle....manuals

I heard the sad news about Robert Pirsig passing.

His book : Zen and the art of motorcycle maintenance was a big influence on me and piqued my interest in philosophy.

While writing the book his day job was writing computer manuals.

About 15 years ago, I wrote an article for ITWorld about data modelling with XML called Zen and the art of motorcycle manuals, inspired in part by Pirsig's book and his meditations on how the qualities in objects such as motorcycles are more than just the sum of the parts that make up the motorcycle.

So it is with data modelling. For any given modelling problem there are many ways to do it that are all "correct" at some level. Endlessly seeking to bottom out the search and find the "correct" model is a pointless exercise. At the end of the day "correctness" for any data model is not  a function of the data itself. It is a function of what you are planning to do with the data.

This makes some folks uncomfortable. Especially proponents of top-down software development methodologies who like to conceptualize analysis as an activity that starts and ends before any prototyping/coding begins.

Maybe somewhere out there Robert Pirsig is talking with Bill Kent - author of another big influence on my thinking : Data and Reality.

Maybe they are discussing how best to model a bishop :-)

Friday, April 21, 2017

What is law? - Part 10

Previously: What is Law? - Part 9

Earlier on in this series, we imagined an infinitely patient and efficient person who has somehow managed to acquire the entire corpus of law at time T and has read it all for us and can now "replay" it to us on demand. We mentioned previously that the corpus is not a closed world and that meaning cannot really be locked down inside the corpus itself. It is not a corpus of mathematical truths, dependent only on a handful of axioms. This is not a bug to be fixed. It is a feature to be preserved.

We know we need to add a layer of interpretation and we recognize from the outset that different people (or different software algorithms) could take this same corpus and interpret it differently. This is ok because, as we have seen, it is (a) necessary and (b) part of the way law actually works. Interpreters differ in the opinions they arrive at in reading the corpus. Opinions get weighed against each other, opinions can be over-ruled by higher courts. Some courts can even over-rule their own previous opinions. Strongly established opinions may then end up appearing directly in primary law or regulations, new primary legislation might be created to clarify meaning...and the whole opinion generation/adjudication/synthesis loop goes round and round forever... In law, all interpretation is contemporaneous, tentative and defeasible. There are some mathematical truths in there but not many.

It is tempting - but incorrect in my opinion - to imagine that the interpretation process works with the stream of words coming into our brains off of the pages, that then get assembled into sentences and paragraphs and sections and so on in a straightforward way.

The main reason it is not so easy may be surprising. Tables! The legal corpus is awash with complex table layouts. I included some examples in a previous post about the complexities of law[1]. The upshot of the ubiquitous use of tables is that reading law is not just about reading the words. It is about seeing the visual layout of the words and associating meaning with the layout. Tables are such a common tool in legal documents that we tend to forget just how powerful they are at encoding semantics. So powerful, that we have yet to figure out a good way of using machines to do the "reading" and extract back out the semantics that our brains can readily see in law.

Compared to, say, detecting the presence of headings or cross-references or definitions, correctly detecting the meaning implicit in the tables is a much bigger problem. Ironically, perhaps, much bigger than dealing with highly visual items such as maps in redistricting legislation[2] because the actual redistricting laws are generally expressed purely in words using, for example, eastings and northings[3] to encode the geography.

If I could wave a magic wand just once at the problem of digital representation of the legal corpus I would wave it at the tables. An explicit semantic representation of tables, combined with some controlled natural language forms[4] would be, I believe, as good a serialization format as we could reasonably hope for, for digital law. It would still have the Closed World of Knowledge problem of course. It would also still have the Unbounded Opinion Requirement but at least we would be in a position to remove most of the need for a visual cortex in this first layer of interpreting and reasoning about the legal corpus.

The benefits to computational law would be immense. We could imagine a digital representation of the corpus of law as an enormous abstract syntax tree[5] which we could begin to traverse to get to the central question about how humans traverse this tree to reason about it, form opinions about it, and create legal arguments in support of their opinions.

Next up: What is law? - Part 11.

[1] http://seanmcgrath.blogspot.ie/2010/06/xml-in-legislatureparliament_04.html
[2] https://ballotpedia.org/Redistricting
[3] https://en.wikipedia.org/wiki/Easting_and_northing
[4] https://en.wikipedia.org/wiki/Controlled_natural_language
[5] https://en.wikipedia.org/wiki/Abstract_syntax_tree

Wednesday, April 19, 2017

What is law? - Part 9

Previously: What is law? - Part 8

For the last while, we have been thinking about the issues involved in interpreting the corpus of legal materials that is produced by the various branches of government in US/UK style environments. As we have seen, it is not a trivial exercise because of the ways the material is produced and because the corpus - by design - is open to different interpretations and open to interpretation changing with respect to time. Moreover, it is not an exaggeration to say that it is a full time job - even within highly specialized sub-topics of law - to keep track of all the changes and synthesize the effects of these changes into contemporaneous interpretations.

For quite some time now - centuries in some cases - a second legal corpus has evolved in the private sector. This secondary corpus serves to consolidate and package and interpret the primary corpus, so that lawyers can focus on the actual practice of law. Much of this secondary corpus started out as paper publications, often with so-called loose-leaf update cycles. These days most of this secondary corpus is in the form  of digital subscription services. The vast majority of lawyers utilize these secondary sources from legal publishers. So much so that over the long history of law, a number of interesting side-effects have accrued.

Firstly, for most day-to-day practical purposes, the secondary corpus provides de-facto consolidations and interpretations of the primary corpus. I.e. although the secondary sources are not "the law", they effectively are. The secondary sources that are most popular with lawyers are very high quality and have earned a lot of trust over the years from the legal community.

In this respect, the digital secondary corpus of legal materials is similar to modern day digital abstractions of currency such as bank account balances and credit cards etc. I.e. we trust that there are underlying paper dollars that correspond to the numbers moving around digital bank accounts. We trust that the numbers moving around digital bank accounts could be redeemed for real paper dollars if we wished. We trust that the real paper dollars can be used in value exchanges. So much so, that we move numbers around bank accounts to achieve value exchange without ever looking to inspect the underlying paper dollars. The digital approach to money works because it is trusted. Without the trust, it cannot work. The same is true for the digital secondary corpus of law, it works because it is trusted.

A second, interesting side-effect of trust in the secondary corpus is that parts of it have become, for all intents and purposes, the primary source. If enough of the world's legal community is using secondary corpus X then even if that secondary corpus differs from the primary underlying corpus for some reason, it may not matter in practice because everybody is looking at the secondary corpus.

A third, interesting side effect of the digital secondary corpus is that it has become indispensable. The emergence of a high quality inter-mediating layer between primary legal materials and legal practitioners has made it possible for the world of law to manage greater volumes and greater change rates in the primary legal corpus.  Computer systems have greatly extended this ability to cope with volume and change. So much so, that law as it is today would collapse if it were not for the inter-mediating layer and the computers.

The classic image of a lawyer's office involves shelves upon shelves of law books. For a very long time now, those shelves have featured a mix of primary legal materials and secondary materials from third party publishers. For a very long time now, the secondary materials have been the day-to-day "go to" volumes for legal practitioners - not the primary volumes. Over the last 50 years, the usage level of these paper volumes has dwindled year on year to the point where today, the beautiful paper volumes have become primarily interior decoration in law offices. The real day-to-day corpus is the digital versions and most of those digital resources are from the secondary - not the primary - legal sources.

So, in a sense, the law has already been inter-mediated by a layer of interpretation. In some cases the secondary corpus has become a de-facto primary source by virtue of its ubiquity and the trust placed in it by the legal community.

This creates an interesting dilemma for our virtual legal reasoning machine. The primary legal corpus - as explained previously - is not at all an easy thing to get your hands on from the government entities that produce it. And even if you did get it, it might not be what lawyers would consider the primary authority anyway. On the other hand, the secondary corpus is not a government-produced corpus and may not be available at all outside of the private world of one of the major legal publishers.

The same applies for the relatively new phenomenon of computer systems encoding parts of the legal corpus into computational logic form. Classic examples  of this include payroll/income tax and eligibility determinations. These two  sub-genres of law tend to have low representational complexity[1]. Simply put, they can readily be converted into programming languages as they are mostly mathematical with low dependencies on the outside world.
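As a small illustration of what low representational complexity looks like in code, here is a Python sketch of a banded income tax calculation. The bands and rates are entirely hypothetical and do not correspond to any real jurisdiction. The point is only that this kind of rule is mostly arithmetic with few dependencies on the outside world.

```python
from decimal import Decimal

# Hypothetical bands: (upper limit of band, rate). None means no upper limit.
BRACKETS = [
    (Decimal("10000"), Decimal("0.00")),
    (Decimal("40000"), Decimal("0.20")),
    (None, Decimal("0.40")),
]


def income_tax(income):
    """Tax each slice of income at the rate of the band it falls in."""
    income = Decimal(income)
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in BRACKETS:
        if upper is None or income < upper:
            # Income ends inside this band: tax the remainder and stop.
            tax += (income - lower) * rate
            break
        # Income covers the whole band: tax the full band and move up.
        tax += (upper - lower) * rate
        lower = upper
    return tax


print(income_tax("50000"))
```

Even here, deciding what counts as "income" is left entirely outside the function, which is where the interpretation questions tend to live.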

Any encoding of the primary legal text into a computer program is, itself, an interpretation. Remembering back to the Unbounded Opinion Requirement, programmers working with legal text are, necessarily, encoding opinions as to what the text means. It does not matter if this encoding process is being performed by a government agency or by a third party, it is still an interpretation.

These computer programs - secondary sources of interpretation - can become de-facto interpretations if enough of the legal community trusts them. Think of the personal taxes software applications and e-Government forms for applying for various government services. If enough of the community use these applications,  they become de-facto interpretations.

The legal concept of interpretation forbearance applies here. If a software application interprets a tax deduction in a particular way that is reasonable it may be allowed *even* if a tax inspector would have interpreted the deduction differently.

I am reminded of the concept of reference implementations as forms of specification in computer systems. Take Java Server Pages for an example. If you have a query as to what a Servlet Engine should do in some circumstance, how do you find out what the correct behavior is? It is not in the documentation. It is in the *reference implementation* which is Apache Tomcat.

I am also reminded of the SEC exploring the use of the Python programming language as the legal expression of complex logic in asset backed securities[2]. On the face of it, this would be better than English prose, right? How much more structured can you get than expressing it in a programming language? Well, what version of Python are you talking about? The Python 2 family? The Python 3 family? Jython? Is it possible the same program text can produce different answers if you interpret it with different interpreters? Yes, absolutely. Is it possible to get different answers from the same code running the same interpreter on different operating systems? Again, yes, absolutely. What about running it tomorrow rather than today? Yes, again!
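A few well-known examples make this tangible. The sketch below runs under Python 3. The function names and the scenario of splitting an amount between parties are my own illustration and are not taken from the SEC proposal.

```python
import datetime


def split_py3(total, parties):
    # In Python 3, "/" between two ints is true division: 7 / 2 is 3.5.
    return total / parties


def split_py2_style(total, parties):
    # In Python 2, "/" between two ints was floor division: 7 / 2 was 3.
    # Python 3 spells that operation "//".
    return total // parties


def days_remaining(deadline, today=None):
    # Unless "today" is pinned explicitly, the answer depends on the
    # clock of the machine running the code.
    today = today or datetime.date.today()
    return (deadline - today).days


print(split_py3(7, 2), split_py2_style(7, 2))

# Python 3 rounds halves to the nearest even number, so round(2.5) is 2.
# Python 2 rounded halves away from zero and returned 3.0.
print(round(2.5))

deadline = datetime.date(2017, 4, 20)
print(days_remaining(deadline, today=datetime.date(2017, 4, 19)))
```

The same program text, `7 / 2` or `round(2.5)`, gives a different answer depending on which family of interpreter reads it, and `days_remaining(deadline)` without a pinned date gives a different answer tomorrow.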

Even programming languages need interpretation and correct behavior is difficult - perhaps impossible - to capture in the abstract - especially when the program cannot be expressed in a fully closed world without external dependencies. Echoing Wittgenstein again, the true meaning of a computer program manifests when you run it, not in the syntax:-) The great mathematician and computer scientist Don Knuth once warned users of a program that he had written to be careful as he had only proven it to be correct, but had not tried it out.[3]

By now, I hope I have established a reasonable defense of my belief that establishing the meaning of the legal corpus is a tricky business. The good news is that creating interpretations of the corpus is not a new idea. In fact it has been going on for centuries. Moreover, in recent decades, some of the corpus has gradually crept into the form of computer programs and even though it is still rare to find a computer program given formal status in primary law, computer programs are increasingly commonplace in the secondary corpus where they have de-facto status. I hope I have succeeded in explaining why conversions into computer program form do not magically remove the need for interpretation and in some respects just move the interpretation layer around, rather than removing it.

So where does all this leave our legal reasoning black box? I think it leaves it in good shape actually, because we have been using variations on the reasoning black box for centuries. Every time we rely on a third party aggregation or  consolidation or digitization or commentary, we are relying on an interpretation. Using a computer to do it, just makes it better/faster/cheaper but it is a well, well established paradigm at this point. A paradigm already so entrenched that the modern world of law could not operate without it. All the recent interest in computational law, artificial intelligence, smart contracts etc. is not a radically new concept. It is really just a recent rapid acceleration of a trend that started its acceleration in the Seventies with the massive expansion in the use of secondary sources that was ushered in by the information age.

So, we are just about ready, I think, to tackle the question of how our virtual legal reasoning box should best go about the business of interpreting the legal corpus. The starting point for this will be to take a look at how humans do it and will feature a perhaps surprising detour into cognitive psychology for some unsettling facts about how human reasoning actually works. Hint: it's not all tidy logical rules and neatly deductive logic.

This is where we will pick up in Part 10.

[1] http://web.stanford.edu/group/codex/cgi-bin/codex/wp-content/uploads/2014/01/p193-surden.pdf
[2] https://www.sec.gov/rules/proposed/2010/33-9117.pdf
[3] https://en.wikiquote.org/wiki/Donald_Knuth

Friday, April 14, 2017

What is law? - part 8

Previously:  what is law? - Part 7.

A good place to start in exploring the Closed World of Knowledge (CWoK) problem in legal knowledge representation is to consider the case of a spherical cow in a vacuum...

Say what? The spherical cow in a vacuum[1] is a well known humorous metaphor for a very important fact about the physical world. Namely, any model we make of something in the physical world, any representation of it we make inside a mathematical formula or a computer program, is necessarily based on simplifications (a "closed world") to make the representation tractable.

The statistician George Box once said that "all models are wrong, but some are useful." Although this mantra is generally applied in the context of applied math and physics, this concept is incredibly important in the world of law in my opinion. Law can usefully be thought of as an attempt at steering the future direction of the physical world in a particular direction. It does this by attempting to pick out key features of the real world (e.g. people, objects, actions, events) and making statements about how these things ought to inter-relate (e.g. if event E happens, person P must perform action A with object O).

Back to cows now. Given that the law may want to steer the behavior of the world with respect to cows, for example, tax them, regulate how they are treated, incentivize cow breeding programs etc. etc., how does law actually speak about cows? Well, we can start digging through legislative texts to find out but what we will find is not the raw material from which to craft a good definition of a cow for the purposes of a digital representation of it. Instead, we will find some or all of the following:
  • Statements about cows that do not define cows at all but proceed to make statements about them as if we all know exactly what is a cow and what is not a cow
  • Statements that "zoom in" on cow-ness without actually saying "cow" explicitly e.g. "animals kept on farms", "milk producers" etc.
  • Statements that punt on the definition of a cow by referencing the definition in some outside authority e.g. an agricultural taxonomy
  • Statements that "zoom in" on cow-ness by analogies to other animals e.g. "similar in size to horses, bison and camels."
  • Statements that define cows to be things other than cows(!) e.g. "For the purposes of this section, a cow is any four legged animal that eats grass."
What you will not find anywhere in the legislative corpus, is a nice tidy, self contained mathematical object denoting a cow, fully encapsulated in a digital form. Why? Well, the only way we could possibly do that would be to make a whole bunch of simplifications on "cow-ness" and we know where that ends up. It ends up with spherical objects in vacuums just as it does in the world of physics! There is simply no closed world model of a cow that captures everything we might want to capture about cows in laws about cows.

Sure, we could keep adding to the model of a cow, refining it, getting it closer and closer to cow-ness. However, we know from the experience of the world of physics that we reach the point where we have to stop, because it is a bottomless refinement process.

This might sound overly pessimistic or pedantic and in the case of cows for legislative purposes it clearly is, but I am doing it to make a point. Even everyday concepts in law such as aviation, interest rates and theft are too complex (in the mathematical sense of complex) to be defined inside self-contained models.

Again, fractals spring to mind. We can keep digging down into the fractal boundary that splits the world into cow and not-cow, refining our definitions until the cows come home (sorry, could not resist), and we will never reach the end of the refinement process. Moreover many of the real world phenomena law wants to talk about exhibit a phenomenon known as "sensitivity to initial conditions"[3]. It turns out that really, really small differences in the state of the world when an event kicks off can result in completely different outcomes for the same event. This is why, in the case of aviation for example, mathematical models of the behavior of an aircraft wing can only get you so far. There comes a point where the only way to find out what will happen in the real world is to try it in the real world (for example, in a wind tunnel.) So it is with law. Small changes in any definitions of people, objects, actions, events, can lead to very different outcomes. The sensitivity to initial conditions means that it is not possible to fully "steer" outcomes by refining the state of affairs to greater and greater depth. Outcomes are going to be unpredictable, no matter how hard you work on refining your model.
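For readers who have not met sensitivity to initial conditions before, the logistic map is a standard textbook example from mathematics. It has nothing to do with law or cows; I use it here only as a stand-in. The Python sketch below runs the same one-line rule from two starting values that differ by one part in ten billion. The starting value, the size of the nudge and the number of steps are arbitrary choices for illustration.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two starting states that differ by a tiny amount.
a = logistic_trajectory(0.2, 60)
b = logistic_trajectory(0.2 + 1e-10, 60)

# Gap between the two trajectories at each step.
gaps = [abs(p - q) for p, q in zip(a, b)]

print("gap at start:", gaps[0])
print("largest gap over the run:", max(gaps))
```

The rule is completely deterministic and the two starting points are, for any practical purpose, the same. The trajectories nevertheless end up bearing no resemblance to each other, which is the behaviour the paragraph above describes.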

We can come at this CWoK problem from a number of other perspectives, each of which shines extra light on the representation problem. From a linguistics perspective, in searching for a definition of "cow" we can end up in some familiar territory. For Saussure[4] for example, words have meaning as a result of their differences...from other words:-) Think of a dictionary that has all the words in the English language in it. Each word is explained....in terms of other words! Simply put, language does not appear to be a system of symbols that gets its meaning by mapping it onto the world. It gets its meaning by mapping back onto itself.

From a philosophical perspective, trying to figure out what a word like "cow" actually means has been a field of study for thousands of years. It is surprisingly tricky[5], especially when you add in the extra dimension of time as Kripke does with the concept of Rigid Designation[6].

Fusing philosophy and linguistics, Charles Sanders Peirce noted that nothing exists independently. I.e. everything we might put a name on only exists in relation to the other things we put names on[9]. We can approach the same idea from an almost mystical/religious perspective and find ourselves questioning the very existence of cows:-) Take a look at this picture from Zen Master Steve Hagen for example[8]. Do you see a cow? Some people will, some will not. How can we ever hope to produce a good enough representation of a cow unless we all share the same mental ability to split the world between cow and non-cow?

Echoing Charles Sanders Peirce, we find the ancient Eastern concept of dependent origination[10]. Everything that we think exists, only exists in relation to other things that we think exist. Not only that, but because everything is constantly changing with respect to time, the relationships between the things – and thus their very definitions – keep changing too.

This is essentially where philosopher Saul Kripke ends up in his book Naming and Necessity[11]. For Kripke, the meaning of nouns, ultimately, is a social convention and meaning *changes* as social convention changes. One final philosophical reference and then we will move on. Wittgenstein famously stated that the meaning of language can only be found in how it is used, not in dictionaries[12]. As with Peirce, the meaning can change as the usage changes.

This doesn't sound very promising, does it? How can law do its job if even the simple-sounding task of rigorously defining terms is intractable? Law does it by not getting caught up in formal definitions and formal logic at all. Instead, the world of law takes the view that it is better to leave a lot of interpretation to a layer of processing that is outside the legal corpus itself. Namely, the opinions formed by lawyers and judges. The way the system works is that two lawyers, looking at the same corpus of legal materials, can arrive at different conclusions as to what it all means and this is ok. This is not a bug. It is a feature. Perhaps the feature of law that differentiates it from classical computing.

The law does not work by creating perfect unambiguous definitions of things in the world and states of affairs in the world. It works by sketching these things out in human language and then letting the magic of human language do its thing. Namely, allowing different people to interpret the same material differently. In law, what matters is not that the legal corpus itself spells everything out in infinite detail. What matters is that humans (and increasingly, cognitively augmented humans) can form opinions as to meaning and then defend those opinions to other humans. This is the concept of legal argumentation in a nutshell. It is not just inductive reasoning[13], taking a big corpus of rules and a corpus of facts and “cranking the handle” to get an answer to a question. It is, in large part, abductive reasoning[14] in which legislation, regulations, caselaw are analyzed and used to construct an argument in favour of a particular interpretation of the corpus.

That is why parties to a legal event such as a contract or a court case have their own lawyers (at least in the US/UK common law style of legal system). It is an adversarial system[15] in which each legal team does its best to interpret the corpus of law in the way that best serves its own side. The job of the judge then is to decide which legal argument – which interpretation of the corpus presented by the legal teams – is most persuasive.

This is what I think of as the Unbounded Opinion Requirement (UoR) of law. This UoR aspect kicks in very, very quickly in the world of law because the corpus – for reasons we have talked about – doesn't feature the clear cut definitions and mathematically based rules that computer people are so fond of. The corpus of law does not spell out its own interpretation. It cannot be “structured” in the sense that computer people tend to think of structure. It has as many possible interpretations as there are humans – or computers – to read it and construct defenses for their particular interpretations.

I have been arguing that a “golden” interpretation cannot be in the legal corpus itself, but I think it is actually true that even if it could be, it should not be. The reasons for this relate to how the corpus evolves over time and how interpretation itself evolves over time. That evolution is actually a very good thing.

A classic example of a legal statement that drives computer people to distraction is a statement like “A shall communicate with B in a reasonable amount of time and make a fair market value offer for X.” What does “reasonable amount of time” mean? What does “fair market value” mean?

A statement like “reasonable amount of time” for A to communicate with B is a good example of a statement that may be better left undefined so that the larger context of the event can be taken into account in the event of any dispute. For example what would a reasonable communications delay be, say, between Europe and the USA in 1774? In 1984? In 2020? Well, it depends on communications technology and that keeps changing. By leaving it undefined in the corpus, the world of law gets to interpret “reasonable” with respect to the bigger, “open world” context of the world at the time of the incident.

In situations where the world of law feels that some ambiguity should be removed, perhaps as a result of cultural mores, scientific advances etc., it has the medium of caselaw (if the judiciary is doing the interpretation refinement), regulations/statutory instruments (if the executive branch is doing the interpretation refinement) and primary law (if the legislature/parliament is doing the interpretation refinement).

One final point: we have only just scratched the surface on the question of interpreting meaning from the corpus of law here and indeed, there are very different schools of thought on this matter within the field of jurisprudence. Good starting points for those who would like to dig deeper are textualism[16] and legislative intent[17].

In conclusion and attempting a humorous summary of this long post, the legal reasoning virtual box we imagined in part 1 of this series is unavoidably connected to its surroundings in the real world. Not just to detect, say, the price of a barrel of oil at time T, but also for concepts like “price” and “barrel” and maybe even “oil”!

On the face of it, the closed world of knowledge (CWoK) and the Unbounded Opinion Requirement (UoR) might seem like very bad news for the virtual legal reasoning box. However, I think the opposite is actually true, for reasons I will explain in the next post in this series.