Friday, April 21, 2017

What is law? - Part 10

Previously: What is Law? - Part 9

Earlier on in this series, we imagined an infinitely patient and efficient person who has somehow managed to acquire the entire corpus of law at time T and has read it all for us and can now "replay" it to us on demand. We mentioned previously that the corpus is not a closed world and that meaning cannot really be locked down inside the corpus itself. It is not a corpus of mathematical truths, dependent only on a handful of axioms. This is not a bug to be fixed. It is a feature to be preserved.

We know we need to add a layer of interpretation and we recognize from the outset that different people (or different software algorithms) could take this same corpus and interpret it differently. This is ok because, as we have seen, it is (a) necessary and (b) part of the way law actually works. Interpreters differ in the opinions they arrive at in reading the corpus. Opinions get weighed against each other, opinions can be overruled by higher courts. Some courts can even overrule their own previous opinions. Strongly established opinions may then end up appearing directly in primary law or regulations, new primary legislation might be created to clarify meaning...and the whole opinion generation/adjudication/synthesis loop goes round and round forever... In law, all interpretation is contemporaneous, tentative and defeasible. There are some mathematical truths in there but not many.

It is tempting - but incorrect in my opinion - to imagine that the interpretation process works with the stream of words coming into our brains off of the pages, that then get assembled into sentences and paragraphs and sections and so on in a straightforward way.

The main reason it is not so easy may be surprising. Tables! The legal corpus is awash with complex table layouts. I included some examples in a previous post about the complexities of law[1]. The upshot of the ubiquitous use of tables is that reading law is not just about reading the words. It is about seeing the visual layout of the words and associating meaning with the layout. Tables are such a common tool in legal documents that we tend to forget just how powerful they are at encoding semantics. So powerful, that we have yet to figure out a good way of extracting back out the semantics that our brains can readily see in law, using machines to do the "reading".

Compared to, say, detecting the presence of headings or cross-references or definitions, correctly detecting the meaning implicit in the tables is a much bigger problem. Ironically, perhaps, much bigger than dealing with highly visual items such as maps in redistricting legislation[2] because the actual redistricting laws are generally expressed purely in words using, for example, eastings and northings[3] to encode the geography.

If I could wave a magic wand just once at the problem of digital representation of the legal corpus I would wave it at the tables. An explicit semantic representation of tables, combined with some controlled natural language forms[4] would be, I believe, as good a serialization format as we could reasonably hope for, for digital law. It would still have the Closed World of Knowledge problem of course. It would also still have the Unbounded Opinion Requirement but at least we would be in a position to remove most of the need for a visual cortex in this first layer of interpreting and reasoning about the legal corpus.
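To make the idea of an explicit semantic table representation concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the bands, the rates and the names are invented for illustration, not drawn from any real statute. The point is that the information a human reads off a printed rate table can be captured as data a machine can traverse without needing a "visual cortex":

```python
# A sketch of an explicit semantic representation of a legal rate
# table. The bands and rates are invented for illustration only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RateBand:
    lower: float            # inclusive lower bound of the band
    upper: Optional[float]  # exclusive upper bound; None means unbounded
    rate: float             # rate applicable within the band

# The same information a human infers from a table's visual layout,
# captured as data rather than as rows and columns on a page.
BANDS = [
    RateBand(0, 10_000, 0.10),
    RateBand(10_000, 40_000, 0.20),
    RateBand(40_000, None, 0.40),
]

def rate_for(amount: float) -> float:
    """Look up the applicable rate for an amount, purely from data."""
    for band in BANDS:
        if band.lower <= amount and (band.upper is None or amount < band.upper):
            return band.rate
    raise ValueError("no band matched")
```

Nothing here depends on how the table happens to be laid out on a page, which is precisely what the magic wand would buy us.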

The benefits to computational law would be immense. We could imagine a digital representation of the corpus of law as an enormous abstract syntax tree[5] which we could begin to traverse to get to the central question about how humans traverse this tree to reason about it, form opinions about it, and create legal arguments in support of their opinions.
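As a toy illustration of that abstract syntax tree idea (the node kinds, the act name and the section texts are all invented), a fragment of the corpus could be represented and traversed like this:

```python
# A toy sketch of a legal corpus fragment as a tree that a program
# can traverse. All names and content here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                  # e.g. "act", "section", "definition"
    text: str = ""
    children: list = field(default_factory=list)

corpus = Node("act", "Hypothetical Act 2017", [
    Node("section", "1. Definitions", [
        Node("definition", '"cow" means any bovine animal'),
    ]),
    Node("section", "2. Obligations", [
        Node("rule", "A person keeping a cow shall register it."),
    ]),
])

def walk(node, depth=0):
    """Depth-first traversal, yielding (depth, kind, text) tuples."""
    yield depth, node.kind, node.text
    for child in node.children:
        yield from walk(child, depth + 1)

for depth, kind, text in walk(corpus):
    print("  " * depth + f"{kind}: {text}")
```

A traversal like this is the mechanical part; the hard part, as argued throughout this series, is what an interpreter does at each node.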

[1] http://seanmcgrath.blogspot.ie/2010/06/xml-in-legislatureparliament_04.html
[2] https://ballotpedia.org/Redistricting
[3] https://en.wikipedia.org/wiki/Easting_and_northing
[4] https://en.wikipedia.org/wiki/Controlled_natural_language
[5] https://en.wikipedia.org/wiki/Abstract_syntax_tree


Wednesday, April 19, 2017

What is law? - Part 9

Previously: What is law? - Part 8

For the last while, we have been thinking about the issues involved in interpreting the corpus of legal materials that is produced by the various branches of government in US/UK style environments. As we have seen, it is not a trivial exercise because of the ways the material is produced and because the corpus - by design - is open to different interpretations and open to interpretation changing with respect to time. Moreover, it is not an exaggeration to say that it is a full time job - even within highly specialized sub-topics of law - to keep track of all the changes and synthesize the effects of these changes into contemporaneous interpretations.

For quite some time now - centuries in some cases - a second legal corpus has evolved in the private sector. This secondary corpus serves to consolidate and package and interpret the primary corpus, so that lawyers can focus on the actual practice of law. Much of this secondary corpus started out as paper publications, often with so-called loose-leaf update cycles. These days most of this secondary corpus is in the form  of digital subscription services. The vast majority of lawyers utilize these secondary sources from legal publishers. So much so that over the long history of law, a number of interesting side-effects have accrued.

Firstly, for most day-to-day practical purposes, the secondary corpus provides de-facto consolidations and interpretations of the primary corpus. I.e. although the secondary sources are not "the law", they effectively are. The secondary sources that are most popular with lawyers are very high quality and have earned a lot of trust over the years from the legal community.

In this respect, the digital secondary corpus of legal materials is similar to modern day digital abstractions of currency such as bank account balances and credit cards etc. I.e. we trust that there are underlying paper dollars that correspond to the numbers moving around digital bank accounts. We trust that the numbers moving around digital bank accounts could be redeemed for real paper dollars if we wished. We trust that the real paper dollars can be used in value exchanges. So much so, that we move numbers around bank accounts to achieve value exchange without ever looking to inspect the underlying paper dollars. The digital approach to money works because it is trusted. Without the trust, it cannot work. The same is true for the digital secondary corpus of law, it works because it is trusted.

A second, interesting side-effect of trust in the secondary corpus is that parts of it have become, for all intents and purposes, the primary source. If enough of the world's legal community is using secondary corpus X then even if that secondary corpus differs from the primary underlying corpus for some reason, it may not matter in practice because everybody is looking at the secondary corpus.

A third, interesting side effect of the digital secondary corpus is that it has become indispensable. The emergence of a high quality inter-mediating layer between primary legal materials and legal practitioners has made it possible for the world of law to manage greater volumes and greater change rates in the primary legal corpus.  Computer systems have greatly extended this ability to cope with volume and change. So much so, that law as it is today would collapse if it were not for the inter-mediating layer and the computers.

The classic image of a lawyers office involves shelves upon shelves of law books. For a very long time now, those shelves have featured a mix of primary legal materials and secondary materials from third party publishers. For a very long time now, the secondary materials have been the day-to-day "go to" volumes for legal practitioners - not the primary volumes. Over the last 50 years, the usage level of these paper volumes has dwindled year on year to the point where today, the beautiful paper volumes have become primarily interior decoration in law offices. The real day-to-day corpus is the digital versions and most of those digital  resources are from the secondary - not the primary legal sources.

So, in a sense, the law has already been inter-mediated by a layer of interpretation. In some cases the secondary corpus has become a de-facto primary source by virtue of its ubiquity and the trust placed in it by the legal community.

This creates an interesting dilemma for our virtual legal reasoning machine. The primary legal corpus - as explained previously - is not at all an easy thing to get your hands on from the government entities that produce it. And even if you did get it, it might not be what lawyers would consider the primary authority anyway. On the other hand, the secondary corpus is not a government-produced corpus and may not be available at all outside of the private world of one of the major legal publishers.

The same applies for the relatively new phenomenon of computer systems encoding parts of the legal corpus into computational logic form. Classic examples  of this include payroll/income tax and eligibility determinations. These two  sub-genres of law tend to have low representational complexity[1]. Simply put, they can readily be converted into programming languages as they are mostly mathematical with low dependencies on the outside world.
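To illustrate why such sub-genres convert readily to code, here is a sketch of a progressive income tax calculation. The bands and rates are entirely hypothetical, not any real tax schedule; the point is simply that the logic is closed-world arithmetic over a small set of inputs:

```python
# A sketch of a payroll-style rule as code. The bands and rates
# below are invented for illustration, not any real tax schedule.

def income_tax(income: float) -> float:
    """Progressive tax over hypothetical bands:
    0% up to 10,000; 20% from 10,000 to 30,000; 40% above that."""
    bands = [(10_000, 0.0), (30_000, 0.20), (float("inf"), 0.40)]
    tax, lower = 0.0, 0.0
    for upper, rate in bands:
        if income > lower:
            # tax only the slice of income that falls in this band
            tax += (min(income, upper) - lower) * rate
        lower = upper
    return tax
```

Contrast this with trying to encode "fair market value" or "reasonable time": the tax rule needs no reference to the open world at all.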

Any encoding of the primary legal text into a computer program is, itself, an interpretation. Remembering back to the Unbounded Opinion Requirement, programmers working with legal text are necessarily encoding opinions as to what the text means. It does not matter if this encoding process is being performed by a government agency or by a third party, it is still an interpretation.

These computer programs - secondary sources of interpretation - can become de-facto interpretations if enough of the legal community trusts them. Think of the personal taxes software applications and e-Government forms for applying for various government services. If enough of the community use these applications,  they become de-facto interpretations.

The legal concept of interpretation forbearance applies here. If a software application interprets a tax deduction in a particular way that is reasonable, it may be allowed *even* if a tax inspector would have interpreted the deduction differently.

I am reminded of the concept of reference implementations as forms of specification in computer systems. Take Java Server Pages for an example. If you have a query as to what a Servlet Engine should do in some circumstance, how do you find out what the correct behavior is? It is not in the documentation. It is in the *reference implementation* which is Apache Tomcat.

I am also reminded of the SEC exploring the use of the Python programming language as the legal expression of complex logic in asset backed securities[2]. On the face of it, this would be better than English prose, right? How much more structured can you get than expressing it in a programming language? Well, what version of Python are you talking about? The Python 2 family? The Python 3 family? Jython? Is it possible that the same program text can produce different answers if you interpret it with different interpreters? Yes, absolutely. Is it possible to get different answers from the same code running the same interpreter on different operating systems? Again, yes, absolutely. What about running it tomorrow rather than today? Yes, again!
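A concrete, well-known instance of the version problem: the meaning of the division operator changed between Python 2 and Python 3. The exact same program text yields different answers under different interpreters:

```python
# Under Python 2, 7 / 2 evaluated to 3 (integer division).
# Under Python 3, the same text evaluates to 3.5 (true division).
print(7 / 2)    # Python 3 prints 3.5
print(7 // 2)   # explicit floor division prints 3 in both families

# Results can also vary within one interpreter across runs: for
# example, hash randomization can change set iteration order
# from one process invocation to the next.
print(sorted({"barrel", "oil", "price"}))  # sorting restores determinism
```

If a statute's "meaning" were this program text, the legally binding answer would depend on which interpreter the reader happened to run.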

Even programming languages need interpretation and correct behavior is difficult - perhaps impossible - to capture in the abstract - especially when the program cannot be expressed in a fully closed world without external dependencies. Echoing Wittgenstein again, the true meaning of a computer program manifests when you run it, not in the syntax:-) The great mathematician and computer scientist Don Knuth once warned users of a program that he had written to be careful as he had only proven it to be correct, but had not tried it out.[3]

By now, I hope I have established a reasonable defense of my belief that establishing the meaning of the legal corpus is a tricky business. The good news is that creating interpretations of the corpus is not a new idea. In fact it has been going on for centuries. Moreover, in recent decades, some of the corpus has gradually crept into the form of computer programs and even though it is still rare to find a computer program given formal status in primary law, computer programs are increasingly commonplace in the secondary corpus where they have de-facto status. I hope I have succeeded in explaining why conversions into computer program form do not magically remove the need for interpretation and in some respects just move the interpretation layer around, rather than removing it.

So where does all this leave our legal reasoning black box? I think it leaves it in good shape actually, because we have been using variations on the reasoning black box for centuries. Every time we rely on a third party aggregation or consolidation or digitization or commentary, we are relying on an interpretation. Using a computer to do it just makes it better/faster/cheaper but it is a well, well established paradigm at this point. A paradigm already so entrenched that the modern world of law could not operate without it. All the recent interest in computational law, artificial intelligence, smart contracts etc. is not a radically new concept. It is really just a rapid acceleration of a trend that began in the Seventies with the massive expansion in the use of secondary sources ushered in by the information age.

So, we are just about ready, I think, to tackle the question of how our virtual legal reasoning box should best go about the business of interpreting the legal corpus. The starting point for this will be to take a look at how humans do it and will feature a perhaps surprising detour into cognitive psychology for some unsettling facts about how human reasoning actually works. Hint: it's not all tidy logical rules and neat deductive logic.

This is where we will pick up in Part 10.

[1] http://web.stanford.edu/group/codex/cgi-bin/codex/wp-content/uploads/2014/01/p193-surden.pdf
[2] https://www.sec.gov/rules/proposed/2010/33-9117.pdf
[3] https://en.wikiquote.org/wiki/Donald_Knuth



Friday, April 14, 2017

What is law? - part 8

Previously:  what is law? - Part 7.

A good place to start in exploring the Closed World of Knowledge (CWoK) problem in legal knowledge representation is to consider the case of a spherical cow in a vacuum...

Say what? The spherical cow in a vacuum[1] is a well known humorous metaphor for a very important fact about the physical world. Namely, any model we make of something in the physical world, any representation of it we make inside a mathematical formula or a computer program, is necessarily based on simplifications (a "closed world") to make the representation tractable.

The statistician George Box once said that "all models are wrong, but some are useful." Although this mantra is generally applied in the context of applied math and physics, this concept is incredibly important in the world of law in my opinion. Law can usefully be thought of as an attempt at steering the future direction of the physical world in a particular direction. It does this by attempting to pick out key features of the real world (e.g. people, objects, actions, events) and making statements about how these things ought to inter-relate (e.g. if event E happens, person P must perform action A with object O).

Back to cows now. Given that the law may want to steer the behavior of the world with respect to cows, for example, tax them, regulate how they are treated, incentivize cow breeding programs etc. etc., how does law actually speak about cows? Well, we can start digging through legislative texts to find out but what we will find is not the raw material from which to craft a good definition of a cow for the purposes of a digital representation of it. Instead, we will find some or all of the following:
  • Statements about cows that do not define cows at all but proceed to make statements about them as if we all know exactly what is a cow and what is not a cow
  • Statements that "zoom in" on cow-ness without actually saying "cow" explicitly e.g. "animals kept on farms", "milk producers" etc.
  • Statements that punt on the definition of a cow by referencing the definition in some outside authority e.g. an agricultural taxonomy
  • Statements that "zoom in" on cow-ness by analogies to other animals e.g. "similar in size to horses, bison and camels."
  • Statements that define cows to be things other than cows(!) e.g. "For the purposes of this section, a cow is any four legged animal that eats grass."
What you will not find anywhere in the legislative corpus, is a nice, tidy, self-contained mathematical object denoting a cow, fully encapsulated in a digital form. Why? Well, the only way we could possibly do that would be to make a whole bunch of simplifications on "cow-ness" and we know where that ends up. It ends up with spherical objects in vacuums just as it does in the world of physics! There is simply no closed world model of a cow that captures everything we might want to capture about cows in laws about cows.

Sure, we could keep adding to the model of a cow, refining it, getting it closer and closer to cow-ness. However, we know from the experience of the world of physics that we reach a point where we have to stop, because it is a bottomless refinement process.

This might sound overly pessimistic or pedantic and in the case of cows for legislative purposes it clearly is, but I am doing it to make a point. Even everyday concepts in law such as aviation, interest rates and theft are too complex (in the mathematical sense of complex) to be defined inside self-contained models.

Again, fractals spring to mind. We can keep digging down into the fractal boundary that splits the world into cow and not-cow. Refining our definitions until the cows come home (sorry, could not resist) and we will never reach the end of the refinement process. Moreover many of the real world phenomena law wants to talk about exhibit a phenomenon known as "sensitivity to initial conditions"[3]. It turns out that really, really small differences in the state of the world when an event kicks off can result in completely different outcomes for the same event. This is why, in the case of aviation for example, mathematical models of the behavior of an aircraft wing can only get you so far. There comes a point where the only way to find out what will happen in the real world is to try it in the real world (for example, in a wind tunnel.) So it is with law. Small changes in any definitions of people, objects, actions, events, can lead to very different outcomes. The sensitivity to initial conditions means that it is not possible to fully "steer" outcomes by refining the state of affairs to greater and greater depth. Outcomes are going to be unpredictable, no matter how hard you work on refining your model.
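Sensitivity to initial conditions is easy to demonstrate with the logistic map, a standard textbook example of a chaotic system. Two starting points differing by one part in a billion end up in completely different places after a few dozen iterations:

```python
# Sensitivity to initial conditions via the logistic map
# x -> r * x * (1 - x), chaotic at r = 4.0.

def logistic(x: float, r: float = 4.0, steps: int = 50) -> float:
    """Iterate the logistic map `steps` times from starting value x."""
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

a = logistic(0.300000000)
b = logistic(0.300000001)  # differs by one part in a billion
print(a, b, abs(a - b))    # the trajectories typically end up far apart
```

No amount of extra precision in specifying the starting state removes this behavior; refining the model deeper and deeper does not buy predictability.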

We can come at this CWoK problem from a number of other perspectives, each of which shines extra light on the representation problem. From a linguistics perspective, in searching for a definition of "cow" we can end up in some familiar territory. For Saussure[4] for example, words have meaning as a result of their differences...from other words:-) Think of a dictionary that has all the words in the English language in it. Each word is explained....in terms of other words! Simply put, language does not appear to be a system of symbols that gets its meaning by mapping it onto the world. It gets its meaning by mapping back onto itself.

From a philosophical perspective, trying to figure out what a word like "cow" actually means has been a field of study for thousands of years. It is surprisingly tricky[5], especially when you add in the extra dimension of time as Kripke does with the concept of Rigid Designation[6].

Fusing philosophy and linguistics, Charles Sanders Peirce noted that nothing exists independently. I.e. everything we might put a name on only exists in relation to the other things we put names on[9]. We can approach the same idea from an almost mystical/religious perspective and find ourselves questioning the very existence of cows:-) Take a look at this picture from Zen Master Steve Hagen for example[8]. Do you see a cow? Some people will, some will not. How can we ever hope to produce a good enough representation of a cow unless we all share the same mental ability to split the world between cow and non-cow?

Echoing Charles Sanders Peirce, we find the ancient Eastern concept of dependent origination[10]. Everything that we think exists, only exists in relation to other things that we think exist. Not only that, but because everything is constantly changing with respect to time, the relationships between the things – and thus their very definitions - keep changing too.

This is essentially where philosopher Saul Kripke ends up in his book Naming and Necessity[11]. For Kripke, the meaning of nouns, ultimately, is a social convention and meaning *changes* as social convention changes. One final philosophical reference and then we will move on. Wittgenstein famously stated that the meaning of language can only be found in how it is used - not in dictionaries[12]. As with Peirce, the meaning can change as the usage changes.

This doesn't sound very promising does it? How can law do its job if even the simple sounding concept of rigorously defining terms is intractable? Law does it by not getting caught up in formal definitions and formal logic at all. Instead, the world of law takes the view that it is better to leave a lot of interpretation to a layer of processing that is outside the legal corpus itself. Namely, the opinions formed by lawyers and judges. The way the system works is that two lawyers, looking at the same corpus of legal materials can arrive at different conclusions as to what it all means and this is ok. This is not a bug. It is a feature. Perhaps the feature of law that differentiates it from classical computing.

The law does not work by creating perfect unambiguous definitions of things in the world and states of affairs in the world. It works by sketching these things out in human language and then letting the magic of human language do its thing. Namely, allowing different people to interpret the same material differently. In law, what matters is not that the legal corpus itself spells everything out in infinite detail. What matters is that humans (and increasingly, cognitively augmented humans) can form opinions as to meaning and then defend those opinions to other humans. This is the concept of legal argumentation in a nutshell. It is not just inductive reasoning[13], taking a big corpus of rules and a corpus of facts and “cranking the handle” to get an answer to a question. It is, in large part, abductive reasoning[14] in which legislation, regulations, caselaw are analyzed and used to construct an argument in favour of a particular interpretation of the corpus.

That is why parties to a legal event such as a contract or a court case have their own lawyers (at least in the US/UK common law style of legal system). It is an adversarial system [15] in which each legal team does its best to interpret the corpus of law in the way that best serves their team. The job of the judge then is to decide which legal argument – which interpretation of the corpus presented by the legal teams – is most persuasive.

This is what I think of as the Unbounded Opinion Requirement (UoR) of law. This UoR aspect, kicks in very, very quickly in the world of law because the corpus – for reasons we have talked about – doesn't feature the clear cut definitions and mathematically based rules that computer people are so fond of. The corpus of law, does not spell out its own interpretation. It cannot be “structured” in the sense that computer people tend to think of structure. It has as many possible interpretations as there are humans – or computers - to read it and construct defenses for their particular interpretations.

I have been arguing that a “golden” interpretation cannot be in the legal corpus itself, but I think it is actually true that even if it could, it should not be. The reasons for this relate to how the corpus evolves over time and how interpretation itself evolves over time and that this is actually a very good thing.

A classic example of a legal statement that drives computer people to distraction is a statement like “A shall communicate with B in a reasonable amount of time and make a fair market value offer for X.” What does “reasonable amount of time” mean? What does “fair market value” mean?

A statement like “reasonable amount of time” for A to communicate with B is a good example of a statement that may be better left undefined so that the larger context of the event can be taken into account in the event of any dispute. For example what would a reasonable communications delay be, say, between Europe and the USA in 1774? In 1984? In 2020? Well, it depends on communications technology and that keeps changing. By leaving it undefined in the corpus, the world of law gets to interpret “reasonable” with respect to the bigger, “open world” context of the world at the time of the incident.
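One hypothetical way to see why "reasonable" resists hard-coding: any fixed threshold bakes one era's context into the rule, whereas the law's approach effectively makes the standard a function of context. In this sketch, the function names and the day thresholds are invented purely for illustration:

```python
# A sketch of "reasonable amount of time" as a context-dependent
# standard rather than a constant. Thresholds are invented for
# illustration and have no legal basis.

def reasonable_delay_days(year: int) -> int:
    """Hypothetical view of a reasonable communication delay by era."""
    if year < 1850:
        return 60   # e.g. transatlantic mail under sail
    if year < 1990:
        return 7    # telegraph, telephone, post
    return 1        # email and instant messaging

def communicated_reasonably(delay_days: int, year: int) -> bool:
    """Was a delay of delay_days reasonable in the given year?"""
    return delay_days <= reasonable_delay_days(year)
```

A 30-day delay passes the hypothetical 1774 standard and fails the 2020 one. By leaving "reasonable" undefined, the corpus delegates exactly this context-sensitivity to its human interpreters, without having to enumerate the eras in advance.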

In situations where the world of law feels that some ambiguity should be removed, perhaps as a result of cultural mores, scientific advances etc. it has the medium of caselaw (if the judiciary is doing the interpretation refinement), regulations/statutory instruments (if the executive branch is doing the interpretation refinement) and primary law (if the legislature/parliament is doing the interpretation refinement).

One final point: we have only just scratched the surface on the question of interpreting meaning from the corpus of law here and indeed, there are very different schools of thought on this matter within the field of jurisprudence. A good starting point for those who would like to dig deeper is textualism[16] and legislative intent[17].

In conclusion, and attempting a humorous summary of this long post: the legal reasoning virtual box we imagined in part 1 of this series is unavoidably connected to its surroundings in the real world. Not just to detect, say, the price of a barrel of oil at time T, but also for concepts like “price” and “barrel” and maybe even “oil”!

On the face of it, the closed world of knowledge (CWoK) and the Unbounded Opinion Requirement (UoR) might seem like very bad news for the virtual legal reasoning box. However, I think the opposite is actually true, for reasons I will explain in the next post in this series.



Friday, April 07, 2017

What is law? - Part 7


Last time we ended with the question : “Given a corpus of law at time T, how can we determine what it all means?”

There is a real risk of disappearing down a philosophical rabbit hole about how meaning is encoded in the corpus of law. Now I really like that particular rabbit hole but I propose that we not go down it here. This whole area is best explored, in my experience, with comfy chairs, time to kill and a libation or two (semiotics, epistemology and mereotopology anyone?).

Instead, we will simply state that because the corpus of law is mostly written human language it inherits some fascinating and deep issues to do with how written text establishes shared meaning, and move on. For our purposes, we will imagine an infinitely patient person with infinite stamina, armed with a normal adult's grasp of English, who is going to read the corpus and explain it back to us, so that we computer people can turn it into something else inside a computer system. The goal of that “something else” being to capture the meaning but be easier to work with inside a computer than a big collection of “unstructured” documents.

This little conceptual trick of employing a fantastic human to read the current corpus and explain it all back to us, allows us to split the problem of meaning into two parts. The first part relates to how we could read it in its current form and extract its meaning. The second part relates to how we would encode the extracted meaning in something other than a big collection of unstructured documents. Exploring this second question, will, I believe, help us tease out the issues in determining meaning in the corpus of law in general, without getting bogged down in trying to get machines to understand the current format (lots and lots of unstructured documents!) right off the bat.

I hope that makes sense? Basically, we are going to skip over how we would parse it all out of its current myriad document-form into a human brain and instead look at how we would extract it from said brain and store it again – but into something more useful than a big collection of documents. Assuming we can find a representation that is good enough, the reading of the current corpus should be a one-off exercise because as the corpus of law gets updated, we would update our bright shiny new digital representation of the corpus and never have to re-process all the documents ever again.

So what options do we have for this digital knowledge representation? Surely there is something better than just unstructured document text? Text after all, is what you get if you use computers as typewriters. Computers do also give us search, which is a wonderful addition to typesetting, but understanding is a very different thing again. In order to have machines understand the corpus of law we need a way to represent the knowledge present in the law - not just what words are present (search) or how the words look on the page (formatting).

This is the point where some of you are likely hoping/expecting that I am about to suggest some wonderful combination of XML and Lisp or some such that will fit the bill as a legal corpus knowledge representation alternative to documents... It would be great if that were possible but in my opinion, the textual/document-centric nature of a significant part of the legal corpus is unavoidable for reasons I will hopefully explain. Note that I said “significant part”. There are absolutely components of the corpus that do not have to be documents. In fact, some of the corpus has already transitioned out of documents but, if anything, this has actually increased the interpretation complexities – of establishing meaning - not reduced them. I will hopefully explain that too:-)

I think the best way of explaining why I think some form of electronic documents is as good as we can hope for, for large parts of the legal corpus, is to look at the things that are not actually part of the corpus of documents at all, but are key to how law actually works. It turns out that these things cannot be put into a computer at all, in my opinion.

What are these mystical things? There are two of them. The first I call the closed world of knowledge (CWoK) and the second I call the Unbounded Opinion Requirement (UOR) of law.

We will look at CWoK and UOR in Part 8.


Friday, March 31, 2017

What is law? - Part 6

Previously: What is law? - Part 5.

To wrap up our high level coverage of the sources of law we need to add a few items to the “big 3” (Statutes/Acts, Regulations/Statutory Instruments and Case law) covered so far.

Many jurisdictions have a foundational written document called a constitution which essentially "bootstraps" a jurisdiction by setting out its core principles, how its government is to be organized, how law will be administered etc.

The core principles expressed in constitutions are, in many respects, the exact opposite of detailed rules/regulations. They tend to be deontic[1] in nature, that is, they express what ought to be true. They tend to be heavily open textured[2], meaning that they refer to concepts that are necessarily abstract/imprecise (e.g. concepts such as "fairness", "freedom" etc.).

Although they only make up a tiny tiny fraction of the corpus of law in terms of word count, they are immensely important, as essentially everything that follows on from the constitution in terms of Statutes/Acts, Regulations/Statutory Instruments and case law has to be compatible with the constitution. Like everything else, the constitution can be changed and thus all the usual "at time T" qualifiers apply to constitutionality.

Next up is international law such as international conventions/treaties which cover everything from aviation to cross-border criminal investigation to intellectual property to doping in sport.

Next up, at local community level residents of specific areas may have rules/ordinances/bye-laws which are essentially Acts that apply to a specific geographic area. There may be a compendium of these, often referred to as a "Municipal Code" in the case of cities.

I think that just about wraps up the sources of law. It would be possible to fill many blog posts with more variations on these (inter-state compacts, federations/unions, executive orders, private members' bills etc.). It would also be possible to fill many blog posts with how these all overlap differently in different situations (e.g. what law applies when there are different jurisdictions involved in an inter-jurisdictional contract).

I don't think it would be very helpful to do that however. Even scratching the surface as we have done here will hopefully serve to adequately illustrate the key point I would like to make, which is this: the corpus of law applicable to any event E which occurred at time T is a textually complex, organizationally distributed, vast corpus of constantly changing material. Moreover, there is no central authority that manages it. It is not necessarily available as it was at time T - even if money is no object.

To wrap up, let us summarize the potential access issues we have seen related to accessing the corpus of law at time T.

  • Textual codification at time T might not be available (lack of codification, use of amendatory language in Acts, etc.)
  • Practical access at time T may not be available (e.g. it is not practical to gather the paper versions of all court reports for all the caselaw, even if theoretically freely available.)
  • Access rights at time T may not be available (e.g. incorporated-by-reference rulebooks referenced in regulations)

All three access issues can apply up and down the scale of location specificity from municipal codes/bye-laws, regulations/statutory instruments, Acts/Statutes, case law, union/federation law to international law and, most recently, space law[3].

We are going to glide serenely over the top of these access issues as the solutions to them are not technical in nature. Next we turn to this key question:
Given a corpus of law at time T, how can we determine what it all means?


See you in Part 7.

Wednesday, March 29, 2017

What is law? - Part 5



The Judicial Branch is where the laws and regulations created by the legislative and executive branches make contact with the world at large. The most common way to think of the judiciary is as the public forum where sentences/fines for not abiding by the law are handed down and as the public forum where disputes between private parties can be adjudicated by a neutral third party. This is certainly a major part of it but it is also the place where law gets clarified with finer and finer detail over time, in USA-style and UK-style "common law" legal systems.

I like to think of the judicial branch as being a boundary determinator for legal matters. Any given incident e.g. a purported incident of illegal parking, brings with it a set of circumstances unique to that particular incident. Perhaps the circumstances in question are such that the illegal parking charge gets thrown out, perhaps not. Think of illegal parking as being – at the highest level – a straight line, splitting a two dimensional plane into two parts. Circumstances to the left of the line make the assertion of illegal parking true, circumstances to the right of the line make the assertion false.

In the vast majority of legal matters, the dividing line is not that simple. I think of the dividing line as a Koch Snowflake[1]. The separation between legal and illegal starts out as a simple Euclidean boundary but, over time, the boundary becomes more and more complex as each new "probe" of the boundary (a case before the courts) adds more detail to it. Simply put, the law is a fractal[2]. Even if a boundary starts out as a simple line segment separating true/false, it can become more complex with every new case that comes to the courts. Moreover, between any two sets of circumstances for cases A and B, there is an infinity of circumstances that are, in some sense, in between A and B. Thus an infinity of new data points can be added between A and B over time.
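To make the Koch Snowflake analogy concrete, here is a minimal Python sketch (purely illustrative, not drawn from any legal system): the perimeter of the snowflake grows without bound with each refinement, just as the legal boundary gains detail with every new case, even though the "area" it encloses stays finite.

```python
def koch_perimeter(side_length: float, iterations: int) -> float:
    """Perimeter of a Koch snowflake built on an equilateral triangle."""
    segments = 3          # the starting triangle has 3 sides
    length = side_length  # length of each segment
    for _ in range(iterations):
        segments *= 4     # each segment is replaced by 4 smaller ones
        length /= 3       # each new segment is one third as long
    return segments * length

# Each iteration multiplies the perimeter by 4/3: ever more boundary detail.
```

Running this shows the perimeter climbing from 3 to 4 to 16/3 and so on for a unit triangle: an ever-longer, ever-more-intricate boundary in a bounded space.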

Courts record their judgments in documents known collectively as “case law”. The most important thing about case law in our focus areas of USA-style and UK-style legal systems is that it is actually law. It is not just a housekeeping exercise, recording the activity of the courts. Each new piece of case law produced at time T serves as an interpretation of the legal corpus at time T. That corpus consists of the Acts/Statutes in force, Regulations/Statutory Instruments in force *and* all other caselaw in force at time T. This is the legal concept of precedent, also known as stare decisis[3].

The courts strive, first and foremost, for consistency with precedents. A lot of weight is attached to arriving at judgements in new cases that are consistent with the judgements in previous cases. The importance of this cannot be over-estimated in understanding law from a computational perspective. Where is the true meaning of law to be found in common law jurisdictions? It is found in the case law! - not the Acts or the regulations/Statutory Instruments. If you are reading an Act or a regulation and are wondering what it actually means, the place to go is the case law. The case law, in a very real sense, is the place where the actual meaning of law is spelled out.

From a linguistics perspective you can think of this in terms of the pragmatics counterpart to grammar/syntax. Wittgenstein fans can think of it as “language is use”. i.e. the true meaning of language can be found in how it is actually used in the real world. Logical Positivists might think of it as a behaviorist approach to meaning. That is, meaning comes from behavior. To understand what a law means – watch what the courts interpret it to mean.

The meaning of the law comes from how it is used in practice and that use comes from the empirically observable behavior of the courts. I could be a staunch advocate of my interpretation of the law at time T as written in the Acts/Statutes and Regulations/SIs but if the caselaw supports a different interpretation to mine, I will have an uphill battle defending my interpretation in court.

There is a useful computing analogy here too. Every programmer knows that there are times when the quickest way to get to the true meaning of a piece of code is to run it and see what happens. In the world of law, the quickest way to get an understanding of the true meaning of some legislative material is to find how it has been treated in the case law. It is also a highly efficient way of getting to "truth" because, at the end of the day, it does not matter how many possible interpretations might be valid for any given point of law. What really matters is how the courts have interpreted it in the past.

Extending the programming analogy a little: it is often easier to figure out what some code does in a particular set of circumstances by looking for a unit test that matches the circumstances of interest. Extending the analogy even further, any new tests added should not invalidate the existing tests, and any changes made to accommodate new circumstances should not invalidate any existing caselaw. In other words, caselaw behaves a little like regression testing in software development. The courts strive not to "break" previous judgements.
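To make the regression-testing analogy concrete, here is a hypothetical Python sketch. The rule, the case names and the circumstances are all invented for illustration: each decided case becomes a test that future refinements of the rule must continue to pass.

```python
def is_illegal_parking(circumstances: dict) -> bool:
    """Current formulation of the rule, refined case by case."""
    # A new case carved out an exception for emergency vehicles...
    if circumstances.get("vehicle") == "ambulance" and circumstances.get("emergency"):
        return False
    # ...without disturbing the settled core of the rule.
    return circumstances.get("double_yellow_line", False)

# Each decided case is recorded as (circumstances, outcome) - a regression test.
PRECEDENTS = [
    ({"double_yellow_line": True}, True),    # hypothetical Smith v. City: upheld
    ({"double_yellow_line": False}, False),  # hypothetical Jones v. City: dismissed
    ({"double_yellow_line": True, "vehicle": "ambulance", "emergency": True},
     False),                                 # the new case itself
]

def regression_suite_passes() -> bool:
    """True if the current rule is consistent with all recorded precedents."""
    return all(is_illegal_parking(c) == outcome for c, outcome in PRECEDENTS)
```

Any future refinement of `is_illegal_parking` that flips the outcome of a recorded precedent is the software analogue of a judgement that breaks consistency with prior decisions.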

It will come as no surprise that most lawyers place great importance on caselaw searching. It might come as a surprise that there is no central entity that publishes the official caselaw. Typically, courts act autonomously and publish their own volumes of caselaw periodically. Much of it, to this day, is still on paper, with the paper being the definitive source. I.e. if a court does produce paper plus electronic case law, the paper "wins" in the event of any discrepancy.

There is a long history, going back to at least the nineteenth century, of third parties acting as aggregators of caselaw. Most notably, West Publishing (now part of Thomson Reuters) and LexisNexis (now part of Reed Elsevier). I think it is fair to say that the modern practice of law in common law jurisdictions would not be possible if practitioners did not have the ability to rapidly search caselaw.

The sheer volume of existing caselaw and the rate of creation of new caselaw is such that without computers, the common law system would not be able to function as it does today. Most of the computational support to date has been in the form of document production and search/retrieval. There are signs that that is changing now as machines start to help practitioners interpret the caselaw. This is a topic for another day!

It is important to note that the corpus of caselaw is not purely accretive. Caselaw is, from time to time, repealed, and practitioners need to be careful when citing caselaw to ensure that the cited caselaw is still considered "good law". Again, enter the computers and their search capabilities – in particular, the ubiquitous legal term "shepardizing"[4], which refers to looking up a case to find its status, what other cases cite it and what other cases it cites.

The fact that caselaw is not purely accretive creates yet another interesting “at time T” issue. Any judgement the courts might arrive at, at time T is necessarily contingent on the full corpus at time T. The exact same issue, examined at some future time point T+1 might produce a different result if some of the caselaw that was “good law” at time T is no longer “good law” at time T+1. We will return to this later on when we talk about defeasible logic and analogical reasoning but they are best parked for now until we have finished the survey of the sources of law itself.

It is in the area of caselaw that our imagined virtual legal reasoning box runs into its biggest challenge with respect to access to the raw materials of law. In common law jurisdictions, the volume of caselaw that is considered "good law" at any time T is vast, goes back centuries and is not available anywhere as a single corpus. Plus, the best sources are in fee-based repositories.

The good news is that the caselaw corpus is not homogeneous in terms of its importance to precedent. In many interesting respects, the caselaw corpus is the granddaddy of all Social Networks. Yes, you read that right. Social Networks! Cases are linked to other cases by means of a formal referencing mechanism. A commonly used set of standards for these citations is known as the Blue Book[5]. The links between cases are not randomly distributed. They are in fact a poster child for the concept of a power law distribution[6].

Lawyers and judges working with the caselaw corpus spend their time on the subset of cases that naturally emerges from following citations. Cases that have high "rank" – where rank refers to the inbound and outbound connections to other cases – are very important cases, by virtue of the citation network around them. If this reminds you of the original Google concept of PageRank, you are exactly right. Citations serve two primary purposes. Firstly, they are indicative of the relative importance of a case (“how often has this case been cited positively in cases like mine?”) and secondly, they give a good indication of how robust a case is likely to be against repeal. The citation network can tell you a lot about the knock-on effects of repeal and remember, in common law systems, a really important "logic" at work is the logic of consistency with previous decisions.
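For the curious, here is a small, purely illustrative Python sketch of the PageRank-style idea applied to an invented citation graph (the cases and citation links are made up; real caselaw ranking systems are far more sophisticated). Edges point from a citing case to the case it cites, and rank flows along citations by power iteration.

```python
def case_rank(citations: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Simple PageRank-style power iteration over a case citation graph."""
    cases = list(citations)
    rank = {c: 1.0 / len(cases) for c in cases}
    for _ in range(iterations):
        # Base rank everyone gets regardless of citations.
        new_rank = {c: (1.0 - damping) / len(cases) for c in cases}
        for citing, cited in citations.items():
            if cited:
                share = damping * rank[citing] / len(cited)
                for c in cited:
                    new_rank[c] += share
            else:
                # A case that cites nothing spreads its rank evenly.
                for c in cases:
                    new_rank[c] += damping * rank[citing] / len(cases)
        rank = new_rank
    return rank

# Hypothetical graph: three later cases all cite the landmark case "A".
graph = {"A": [], "B": ["A"], "C": ["A"], "D": ["A", "B"]}
ranks = case_rank(graph)
```

The heavily-cited landmark case ends up with by far the highest rank, which is exactly the "important case" effect the citation network produces in practice.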

There is an old adage that it is not possible to step into the same river twice. This adage captures my mental model of caselaw. It is like a fast flowing river with the added twist that over time, it can forge new pathways and route around obstacles, while continuing to be "the same" river. In fact, the entire corpus of law is like that. It is constantly changing – every day. Every new piece of caselaw, every new regulation, every new act, adds new data points to the fractal geometry of the previously mentioned Koch Snowflake of legality.

Rivers are not easy things to capture in databases! That is why, to my mind, the key challenge in regulatory data management is actually regulatory change management. In fact, I think of the two as being the same thing. Modelling the data is hard enough - as I hope you can appreciate based on the material presented here so far. Modelling how the data changes and how any computational system can be kept up to date with the changes, is another matter completely. Regulatory change is not something you can afford to park and deal with another day in any useful conceptual model of law. I would argue it needs to be the central plank of the data model of law. After all, if you have a model that works well when things change, it will work just fine when things do not change. But the reverse is not true.

So that is it for caselaw for now.

Next up, we wrap up coverage of the sources of law with some other sources not yet mentioned. We will also take a stab at summarising and classifying the issues related to accessing the corpus of law, covering all the sources mentioned.

From there, we will turn our attention to the legal reasoning process itself. That is where the fun stuff really starts. See What is Law - Part 6.




Monday, March 27, 2017

What is law? - Part 4



Now we will turn our attention to the second part of the legal corpus, namely regulations/statutory instruments. I think of this material as fleshing out the material coming in the form of Acts from the Legislature/Parliament. Acts can be super-specific and self-contained, but they can also be very high level and delegate the finer detail to government agencies to work out and fill in. Acts that do this delegation are known as "enabling acts" and the fine detail work takes the form of regulations (USA terminology) or Statutory Instruments (UK terminology).

The powers delegated to executive branch agencies by enabling Acts can be quite extensive and the amount of review done by the Legislature/Parliament differs a lot across different jurisdictions. In some jurisdictions, there is no feedback loop back to the Legislature/Parliament at all. In others, all regulation/statutory instruments must pass a final approval phase back in the Legislature/Parliament.

As with the Acts, the regulations go through a formal promulgation process - typically being activated by public notice in a Government gazette/register publication. As with the Acts, an official compendium of regulations may or may not be produced by Government itself and if it exists, it may lag behind the production of new Regulations/Statutory Instruments by months or even years. As with Acts, third party publishers often add value by keeping a corpus of regulations/SIs up to date with each register/gazette publication (often a weekly publication).

One useful rough approximation is to conceptualize the Regulations/Statutory Instruments as appendices to Acts. Just as with any other type of publication, a full understanding of the text at time T requires a full understanding of the appendices at time T. In other words, to understand the Act at time T you need the Regulations/Statutory Instruments at time T.

This brings us to the first significant complication. The workflows and publication cycles for the Acts and the Regulations/Statutory Instruments are different, and the organizations doing the work are different, resulting in a work synchronization and tracking challenge. Tracking Acts is not enough to understand the Acts. You need to track Regulations/Statutory Instruments too and keep the two in sync with each other.

The next complication comes from the nature of the Regulations/Statutory Instruments themselves. When the need arises for very detailed knowledge about some regulated activity, there is often a separate association/guild/institute of specialists in that regulated activity. Sometimes, the rules/guidelines in use by the separate entity can become part of the law by being incorporated-by-reference into the regulations/statutory instruments[1]. Sometimes, the separate association/guild/institute is formally given powers to regulate and becomes what is known as a Self Regulatory Organization (SRO)[2]. The difficulty this presents for the legal decision-making box we are creating in our conceptual model of law is that this incorporated-by-reference material may not be freely available in the same way that the Acts and Regulations/Statutory Instruments are generally freely available (at least in unconsolidated forms).

In Part 1, reference was made to the legal concept that "ignorance of the law is no defense". Well, you can see the potential problem here with material that is incorporated-by-reference. If I can only read the incorporated-by-reference aspects of the legal corpus at time T by paying money to access them, then the corpus of law itself (however complex and difficult to interpret it might be) is not actually fully available to me.

The important distinction here is between fee-based access to convenient "packaging" and perhaps associated explanatory material, versus fee-based access to the raw materials themselves. This is an open issue in the world of law at the moment. The world continues to become a more and more complex place and the need to delegate detailed work on regulations down to the practitioners who have deep domain expertise continues to grow. However, expertise is expensive and needs to be paid for somehow. One revenue source for associations/guilds/institutes has historically been charging for access to the rulebooks/guidance they produce. If the rulebooks/guidance become free as a result of incorporation-by-reference into regulations, then the revenue stream is removed. The issue is not unique to regulations/statutory instruments. When we take a look at the judiciary, we will see that some similar issues can arise there also. For now, let us slide over the question of fee-based access to incorporated-by-reference material and proceed as if we have access to it from our conceptual model.

The next challenge we encounter is the diversity of the regulatory/statutory instrument material itself. Unlike Acts, it is not unusual for this material to contain maps, photos and other forms of multimedia. As the trend towards "born digital"[3] materials continues, more and more complex document types are in fact applications, not documents. Examples include GIS (e.g. redistricting[4]), spreadsheet models in finance, web forms in eligibility determination etc.

This is probably the biggest challenge facing the existing fixed writing-centric conceptualization of legal materials which has been with us since The Code of Hammurabi[5] of ancient Babylon, The Twelve Tables[6] of Ancient Rome and The Brehon Laws[7] of ancient Ireland. Namely, we appear to be transitioning parts of the corpus of law away from words and into software applications. The big shift here is that with a software application, behavior can be observed, but not the reasoning behind the behavior. It may be that deep down in the software there is a set of rules that governs the application's behavior but these are invisible to the consumers of the law and, in many cases, proprietary. For extra spice, consider that the present-day trend towards deep learning approaches to law presents the head-hurting possibility of a software application whose behavior can be observed but for which there is literally no human-digestible set of rules that govern its behavior.

To see why this is a big deal consider a hypothetical court case where the judge finds the defendant guilty as charged but records the reasoning behind the judgement simply as "Because I said so." We would not accept that because there is no defense of the reasoning (known more formally as an explanans[8]). Software applications behave this way all the time:-) We will return to the profound implications of this later on.

Another twist. In some jurisdictions, regulations/statutory instruments can modify Acts. This one is another "head hurter" because it means that activity happening "downstream" from a Legislature/Parliament in regulations/SIs can potentially change the texts of the Acts produced by that Legislature/Parliament. Tracking this is made more difficult by the often significant organizational boundaries between the Legislature/Parliament function and the Government Agencies function. It also significantly complicates the concept of consolidation. Each new regulation/statutory instrument issued potentially changes the text of the Acts.

One final twist and then we will move on to the judicial branch aspects of the legal corpus. Sometimes a regulation/statutory instrument takes the form of interpretive guidance rather than textual changes. For example, it might say something like "from this day forward, when you read 'X' in Act N, interpret that to mean 'X or Y'".

You might need to let that sink in for a minute. It means that not only can the full meaning of Acts not be found in the Acts themselves – you need the regulations/statutory instruments to fill in the detail - but that the meaning itself can change over time through changes happening in the regulations/SIs without any textual changes to the Acts!

Any software developers out there might want to think of it this way: consider a simple math equation in your code with no external dependencies on the outside world at all. What if I told you that the math of your formula might need to act differently tomorrow because I might redefine some of your variables without telling you? Creates an interesting challenge doesn't it?
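Here is one way a developer might picture it, as a minimal Python sketch (the rule and the definitions are invented for illustration): the "statutory text" never changes, but a definitions table playing the role of interpretive guidance does, so evaluating the same text at different times gives different answers.

```python
# The unchanged "statutory text": a vehicle may not enter the park.
def may_enter_park(thing: str, definitions: dict) -> bool:
    """Evaluate the unchanged rule against the definitions in force."""
    return thing not in definitions["vehicle"]

# Definitions as they stood at time T.
defs_at_T = {"vehicle": {"car", "truck"}}

# Interpretive guidance issued at time T+1:
# "when you read 'vehicle', interpret that to include bicycles".
defs_at_T_plus_1 = {"vehicle": {"car", "truck", "bicycle"}}
```

A bicycle may enter the park under the time-T definitions but not under the time-T+1 definitions, even though not one character of the rule itself changed. That is the redefined-variable effect in miniature.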

Fun stuff, right? As we will see, the third part of the legal corpus - the part produced by the judiciary - has a similar ability to modify semantics. In fact it can modify the semantics not just of the Acts but of the regulations/statutory instruments, and modify its own semantics to boot.

Thank you for sticking with me through all of this sometimes head-hurting stuff. I am trying to lay this out with sufficient detail so that when I say that legal content management is “complicated”, you have a sense of what I mean. I see too many well intentioned architects working in this space incorrectly concluding that “getting the texts of the law” to work on, is an easy first step in computational law. It is complicated, but it can be done as we will see.

But first, time to turn to the judicial branch and look at its contribution to the legal corpus. See you in Part 5.



Thursday, March 23, 2017

What is law? - Part 3

Previously : What is Law? - Part 2.

The corpus of law - the stuff we all, in principle, have access to and all need to comply with - is not, unfortunately, a nice tidy bundle of materials managed by a single entity. Moreover, the nature of the bundle itself differs between jurisdictions. Ireland is quite different from Idaho. Scotland is quite different from the Seychelles. Jersey is quite different from Japan, and so on.

I will focus here on US and UK (Westminster)-style legal corpora to keep the discussion manageable in terms of diversity. Even then, there are many differences in practice and terminology all the way up and down the line from street ordinances to central government to international treaties and everything in between. I will use some common terminology but bear in mind that actual terminology and practice in your particular part of the world will very likely be different in various ways, but hopefully not in ways that invalidate the conceptual model we are seeking to establish.

In general, at the level of countries/states, there are three main sources of law that make up the legal corpus. These are the judiciary, the government agencies and the legislature/parliament.

Let us start with the Legislature/Parliament. This is the source of new laws and amendments to the law in the form of Acts. These start out as draft documents that go through a consideration, amendment and voting process before they become actual law. In the USA, it is common for these Acts to be consolidated into a "compendium", typically referred to as "The Statutes" or "The Code". The Statutes are typically organized according to some thematic breakdown into separate "titles" e.g. Company Law, Environmental Law and so on.

In the UK/Westminster-type of Parliament, the government itself does not produce thematic compendia. Instead, the Acts are a cumulative corpus. So, to understand, for example, criminal law, it may be necessary to look at many different Acts, going back perhaps centuries to get the full picture of the "Act" actually in force. In UK-style systems, areas of law may get consolidated periodically through the creation of so-called "consolidations"/"re-statements". These essentially take an existing set of Acts that are in force, repeal them all and replace them with a single text that is a summation of the individual Acts that it repeals.[1]

It is common for third party publishers to step in and help practitioners of particular areas of law by doing unofficial consolidations to make the job of finding the law in a jurisdiction easier.
Depending on how volatile the area of law is in terms of change, the publisher might produce an update every month, every quarter, every year etc. In the USA, most US states do a consolidation in-house in the legislature when  they produce The Statutes/Code. In a similar manner to third party publishers, this corpus is updated according to a cycle, but it is typically a longer cycle - every year or two years.

So here we get to our first interesting complication with respect to being able to access the law emanating from Legislatures/Parliaments that is in force at any time T. It is very likely that no existing compendium produced by the government itself, is fully up to date with respect to time T. There are a number of distinct reasons for this.

Firstly, for Parliaments that do not produce "compendiums", there may not be an available consolidation/re-statement at time T. Therefore, it is necessary to find a set of Acts that were in force at time T, which then need to be read together to understand what the law was at time T.

Secondly, for Legislatures that produce compendia in the form of Statutes, these typically lag behind the Acts by anything from months to years. Typically, when a Legislature is "in session", busily working on new Acts, it is not working on consolidating them as they pass into law. Instead, they are accumulated into a publication, typically called the Session Laws, and the consolidation process happens after the session  has ended.  This is an area where third party publishers typically add value because they do consolidate "on the fly" and this is something that is very useful to many practitioners.

Thirdly, the concept of "in force" is quite tricky in practice. An Act may become law as soon as it passes through a signing process but the law itself may not take effect until some other event has happened. Typically there is some form of official government publication - register/gazette - and laws come into force when they appear in the register/gazette. Through a device called a "line item veto" it may be that a law comes into force but some parts of it are essentially elided. Trickier still is the concept of conditional legislation which comes into force, if, for example the cost of a barrel of oil hits some threshold value.

Even if it is possible to arrive at the text in force as it stood at time T, the nature of the text itself has a large role to play in its direct usefulness for practitioners. The clearest example of this is what are known as amendatory acts. An amendatory act, rather than replacing a textual unit with a replacement textual unit, expresses the required changes in terms of amendatory instructions. E.g. "After the first occurrence of the word 'dog', insert 'cat or '". Again, this is an area where third party publishers often step in.

This brings us to a very important point about law that needs to be emphasised and it is this: what the text of the law says at any time T and what the text of the law means at time T, are two totally different things on a number of levels. At a purely text management level, there is often a big difference between what the law says and what it means because the journey towards true meaning can only start once the editorial aspects of amendment consolidation have taken place and this might not be a function that the government performs at all. Even if it is, it may lag behind the creation of new Acts in a way that impacts its usefulness to practitioners as a definitive reference of the laws in force at any time T.

Once we get past the text management level of 'meaning' in the corpus, we are still only part of the way towards "the law" because the text needs to be read/parsed in order to find which parts of the text are in force and which are not, at any given target time point T. A simple example of this is a so-called "sunset clause", in which the consolidated text of an area of law as it was at time T may contain a statement which repeals part of the law - potentially somewhere else entirely in the corpus of law! - at some time later than time T.

Are we having fun yet? Complex, isn't it? I will just add a few more layers to it and then we will take a step back, I promise...

Having arrived - by whatever means - at the text of the law as it stood at time T, it might not be the case that the text has definitive status as "law", even if it is produced by the government itself. A good example of this is the United States Code[2]. In the world of law, there is the concept of "prima facie evidence of the law", which is distinct from "the law", because the corpus that is the US Code has not itself passed through Congress as a corpus.

A similar nuance comes up in US State Legislatures where the Journals - essentially the meeting minutes of the formal chambers - may be considered by the judiciary as the one true source of new and amended laws. In this way of thinking, even Statutes produced by Legislatures are, in a sense, secondary sources.

Two more wrinkles and then I will stop. I promise. Stay with me here...

The first is that the corpus of Acts in force is not necessarily self-consistent. Over the course of hundreds of years and thousands upon thousands of amendments, errors can creep in such that a statement in Act A which is "the law" might contradict another statement in Act B which is also "the law". This is another point where IT people tend to wince! Paradoxes, the law of the excluded middle[3], the entire glorious edifice of boolean logic, is dependent on the absence of logical contradictions and yet, they can and do happen in law.

When this happens, jurisdictions do not SEGFAULT or go into endless loops or refuse to boot up in the morning. Rather, the legal system exhibits an interesting property that might be referred to as autonomic resolution[4]. Texts that conflict can co-exist in law, perhaps in the form of unconsolidated statute alongside consolidated statute, or in the form of separate acts that conflict with each other. The entity that then deals with it is typically the judiciary, where that most ineffable of concepts, "human judgement", resolves the conflict.

Peter Suber[4] has argued that such contradictions cannot be fully eradicated from law. In his book The Paradox of Self-Amendment[5], he uses an argument reminiscent of Gödel's Incompleteness Theorems[6] to show that any system that can amend itself needs to be able to break out of the contradictions and dead ends it might get itself into through the process of amendment.

In a memorable piece of prose[5], he puts it this way:

"One may regret the lapse of law from abstract logic, appreciate the equitable flexibility it affords, take satisfaction in the pretensions it punctures, or decry the dangers it makes possible."

The second, and final, wrinkle I will add for now is the concept of retroactive provisions[8]. These beauties have the effect of changing the way the law as it stood at time T needs to be interpreted at some future time T+1. If your head hurts, you are not alone. It is a tough one to grasp. Basically, a full understanding of the law at some historical time point T1 is dependent not just on the corpus as it was at that time point T1 but also on the corpus as it is at some future point T2. This is because the law at T2 may contain retroactive changes to how the law at T1 needs to be interpreted.
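One way to make the T1/T2 dependence concrete is the "bitemporal" pattern from temporal databases: every query about the law carries two timestamps, the time the law is about and the time we are asking from. A minimal sketch, with invented provisions and dates:

```python
from dataclasses import dataclass

@dataclass
class Amendment:
    enacted: int     # when this text entered the corpus
    effective: int   # the earliest time it governs (may predate enactment!)
    text: str

corpus = [
    Amendment(enacted=2000, effective=2000, text="Rate is 5%."),
    # A retroactive provision: enacted in 2010, but it rewrites
    # how the law as it stood back in 2005 must be read.
    Amendment(enacted=2010, effective=2005, text="Rate is 7%."),
]

def law_at(valid_time: int, known_at: int) -> str:
    """The law governing `valid_time`, as understood at `known_at`."""
    candidates = [a for a in corpus
                  if a.enacted <= known_at and a.effective <= valid_time]
    return max(candidates, key=lambda a: a.enacted).text

print(law_at(2005, known_at=2006))  # "Rate is 5%." - nobody knows better yet
print(law_at(2005, known_at=2011))  # "Rate is 7%." - retroactively changed
```

The same question ("what governed 2005?") gets different answers depending on when it is asked, which is precisely why a single timestamp is not enough.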

By now you will have noticed that I keep saying "the law at time T". Hopefully, given the discussion so far, you are beginning to get a feel for why the concept of time is so important. Time, the passage of time, its impact on the corpus of law, references to time in the law itself... it is inextricably woven into the way law works, in my opinion. That is why I believe any computational model of law must have time as a first-class member of the model, to be able to accurately reflect what law really is.

Not convinced about the primary importance of time in the conceptual model of law? Consider this: every single litigation, every single dispute that arrives in a court of law, needs to be able to look backwards to what the law was at the time of the events at issue. The law as it is today is not the point of departure in a court case; it is the law as it was at the date or dates relevant to the case. The nature of court cases is that this can be many years after the events themselves.

The same is true for many compliance issues in regulated industries. The same is true for many tasks in forensic accounting. The same is true for many financial audit scenarios...

I could go on with numerous other interesting aspects of the legislative/parliamentary side of the corpus but I will stop there.

Next up we turn to regulations/statutory instruments which come from the executive branch i.e. government agencies.

[1] https://en.wikipedia.org/wiki/Consolidation_bill
[2] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1517999
[3] https://en.wikipedia.org/wiki/Law_of_excluded_middle
[4] https://en.wikipedia.org/wiki/Peter_Suber
[5] http://legacy.earlham.edu/~peters/writing/psa/
[6] https://en.wikipedia.org/wiki/G%C3%B6del's_incompleteness_theorems
[7] https://en.wikipedia.org/wiki/Autonomic_computing
[8] https://en.wikipedia.org/wiki/Ex_post_facto_law


Wednesday, March 22, 2017

What is law? - Part 2

Previously: What is law? - Part 1.

The virtual legal reasoning box we are imagining will clearly need to either contain the data it needs, or be able to reach outside of the box and access whatever data it needs for its legal analysis. In other words, we can imagine the box having the ability to pro-actively reach out and grab legal data from the outside world when it needs it. And/or we can also imagine the box directly storing data so that it does not need to reach out and get it.

This brings us to the first little conceptual maneuver we are going to make in order to make reasoning about this whole thing a bit easier. Namely, we are going to treat all legal data that ends up inside the box for the legal analysis as having arrived there from somewhere else. In other words, we don't have to split our thinking into stored-versus-retrieved legal data. All data leveraged by the legal reasoning box is, ultimately, retrieved from somewhere else. It may be that for convenience, some of the retrieved data is also stored inside the box but that is really just an optimization - a form of data caching - that we are not going to concern ourselves with at an architectural level as it does not impact the conceptual model.
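The "caching is just an optimization" point can be made literal in code. In this sketch the fetcher, the statute name, and the returned text are all hypothetical; the only point is that the cached and uncached lookups are conceptually the same external retrieval:

```python
from functools import lru_cache

def fetch_statute(statute_id: str) -> str:
    """Hypothetical fetcher: stands in for retrieving a statute from an
    external authoritative source (a gazette, an official register...)."""
    return f"<text of {statute_id}>"

# Caching is layered on top purely as an optimization; conceptually,
# every lookup still comes from "out there", outside the box.
cached_fetch = lru_cache(maxsize=None)(fetch_statute)

cached_fetch("Act A 1998")   # goes outside the box
cached_fetch("Act A 1998")   # served from the cache; same answer
```

Because the cache is transparent (same inputs, same answer), nothing in the conceptual model changes when it is added or removed, which is exactly why we can ignore it at the architectural level.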

A nice side effect of this all-data-is-external conceptualization is that it mirrors how the real world of legal decision making in a democracy is supposed to work. That is, the law itself does not have any private data component. The law itself is a corpus of materials available (more on this availability point later!) to all those who must obey the law. Ignorance of the law is no defense.[1]

The law is a body of knowledge that is "out there" and we all, in principle, have access to the laws we must obey. When a human being is working on a legal analysis, they do so by getting the law from "out there" into their brains for consideration. In other words, the human brain acts as a cache for legal materials during the analysis process. If the brain forgets, the material can be refreshed and nothing is lost. If my brain and your brain are both reaching out to find the law at time T, we both - in principle - are looking at exactly the same corpus of knowledge.

I am reminded of John Adams' statement that government should be "A government of laws, not of men."[2] I might have a notion of what is legal and you might have a different notion but, because the law is "out there" - external to both of us - we can both be satisfied that we are looking at the same corpus of law. We may well interpret it differently, but that is another matter, which we will be returning to later.

I am also reminded of Ronald Dworkin's Law as Integrity[3], which conceptualizes law as a corpus that is shared by, and interpreted for, the community that creates it. Again, the word "interpretation" comes up, but that is another day's work. One thing at a time...

So what actually lives purely inside the box, if the law itself does not? Well, I conceptualize it as the legal analysis apparatus itself, as opposed to any materials consumed by that apparatus. Why do I think of this as being inside and not outside the box? Primarily because it reflects how the real world of law actually works. A key point, indeed a feature, of the world of law is that it is not based on one analysis box. It is, in fact, lots and lots of boxes: one for each lawyer and each judge and each court in a jurisdiction...

Legal systems are structured so that these analysis boxes can be chained together in an escalation chain (e.g. district courts, appeal courts, supreme courts, etc.). The decision issued by one box can be appealed to a higher box in the decision-making hierarchy. Two boxes at the same level in the hierarchy might look at the facts of a case and arrive at diametrically opposing opinions. Two judges in the same court, looking at the same case, might also come to diametrically different opinions of the same set of facts presented to the court.
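The escalation chain above can be sketched in a few lines. Everything here is invented for illustration: each "box" is just a function from case facts to an opinion, and a case stops at the first court whose opinion is not appealed:

```python
def escalate(case, chain):
    """Run a case up a chain of (court, box) pairs; stop at the first
    court whose opinion the losing party does not (or cannot) appeal."""
    opinion = None
    for court, box in chain:
        opinion = box(case)
        if court not in case.get("appeals", ()):
            break  # no further appeal: this box's opinion stands
    return opinion

district = lambda case: "liable"
appeal_court = lambda case: "not liable"   # same facts, opposite opinion!
supreme = lambda case: "not liable"

chain = [("district", district), ("appeal", appeal_court),
         ("supreme", supreme)]

case = {"appeals": {"district"}}  # the losing party appeals once
print(escalate(case, chain))      # the appeal court's opinion stands
```

Note that the district box and the appeal box return opposite opinions for the same case, which is exactly the behavior described above and exactly what a single deterministic rule engine would never produce.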

This is the point at which most IT people start to furrow their brows, because it goes against the grain of most other computational systems they work on. The law is not a set of predicate calculus rules that can be combined in a classical conditional logic system. There are very few black-and-white predicate functions in law. This is not a bug. It is a feature. Nor is it a lack of logic. Rather, it is a different type of logic, known as non-monotonic logic[4] - just as valid, and just as useful and necessary, as the Boolean logic IT people are more familiar with.
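The textbook illustration of non-monotonic reasoning (not from this post, but standard in the literature) is the "Tweety" default: birds fly, unless we learn otherwise. The point is that adding a fact can retract an earlier conclusion, which Boolean logic never does:

```python
def flies(bird, known_facts):
    """Default reasoning: birds fly *unless* an exception is known.
    Learning a new fact can withdraw an earlier conclusion - the
    hallmark of non-monotonic logic."""
    if ("penguin", bird) in known_facts:
        return False  # the exception defeats the default
    return ("bird", bird) in known_facts

facts = {("bird", "tweety")}
print(flies("tweety", facts))       # True  - by default, birds fly

facts.add(("penguin", "tweety"))    # new information arrives...
print(flies("tweety", facts))       # False - conclusion withdrawn
```

Legal rules work the same way: "contracts are enforceable" holds by default, until facts about duress or incapacity arrive and defeat it.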

We will be returning to this later on. For now, suffice it to say that the logic of the analysis process is considered to be inside the box because it is private in the same way that a human brain is private. I might analyse legal data and arrive at a tentative conclusion and then write down my reasoning for others to see but the explanation may or may not reflect what my brain actually did. Moreover, nobody knows if it actually reflects what my brain actually did. Including me. That's brains for you!

So the analysis logic is inside the box and to a degree hidden from view in the same way that humans cannot look inside brains to see what is actually going on. The law itself is outside the box and not hidden from view. It is a corpus of knowledge that is "out there". The analysis process itself is always tentative in its conclusions. The outcomes of courts are called "opinions" for a reason - not "answers".

The world of law not only tolerates but is actively architected to allow differing interpretations of the same corpus of law. Society deals with the non-determinism through an escalation process (appeals), majority voting (many judges, same case, one judge one vote; majority prevails) and a repeals process (what is valid law today might not be valid law tomorrow).
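The majority-voting mechanism, at least, is simple enough to state directly. A one-liner sketch with invented opinion labels:

```python
from collections import Counter

def panel_decision(opinions):
    """Many judges, same case, one judge one vote; majority prevails."""
    return Counter(opinions).most_common(1)[0][0]

print(panel_decision(["affirm", "reverse", "affirm"]))  # affirm
```

Of course, this captures only the vote-counting step; everything before it (each judge's analysis box producing an opinion) is the non-deterministic part.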

Again, this is not a bug. It is a feature. It is a feature because, as you may have noticed, the world is a messy place, full of ambiguity and change and shifting views. Human behavior is a messy thing. Justice, morality, the common good... these are complex concepts. The legal systems of the world have evolved to deal with the messy parts. Any computer system that gets involved has to engage with the reality that a lot of what sure looks like messy aspects in the world of law - especially to most computer programmers - are there for good reasons. They are not "bugs" to be fixed by getting rid of the English and replacing it with computer code.

Having said that I was going to focus on the data side first, I appear to have drifted off to the algorithm side somewhat. Oh well, best laid plans...

Coming back to the data side now, if all the data required for the legal reasoner is outside the box, then what is it? Where do we get it? Can we actually get at all of it?

We will pick this up in What is Law? - Part 3.

[1] https://en.wikipedia.org/wiki/Ignorantia_juris_non_excusat
[2] https://en.wikiquote.org/wiki/John_Adams
[3] https://en.wikipedia.org/wiki/Law_as_integrity
[4] https://plato.stanford.edu/entries/logic-nonmonotonic/