Thursday, June 17, 2010

KLISS: Law as source code

Over the past couple of days I have received some comments - and some pushback - about my assertion that law is basically source code, so I'd like to explain what I mean. As it happens, explaining that is also a good way for me to start to explain the Legislative Enterprise Architecture that underpins KLISS, so here goes.

When I look at a corpus of law being worked by a legislature/parliament I see...
  • text, lots and lots of text, organized into units of various sizes: sections, bills, titles, chapters, volumes, codes, re-statements etc.
  • The units of text are highly stylized, idiomatized, structured forms of natural language.
  • The units of text are highly inter-linked, both explicitly and implicitly. Sections are assembled to produce statute volumes, bills are assembled to produce session laws etc. Bills cite to statutes. Journals cite bills. Bills cite bills...
  • The units of text have rigorous temporal constraints. I.e. a bill that refers to a statute is referring to a statute as it was at a point in time. An explanation of a vote on a bill is an explanation of a vote as it looked at a particular point in time.
  • The law making process consists of taking the corpus of law as it looked at some time T, making some modifications and promulgating a new corpus of law at some future time T+1. That new corpus is then the basis for the next iteration of modifications.

When I look at a corpus of source code I see...
  • text, lots and lots of text, organized into units of various sizes: modules, components, libraries, objects, services etc.
  • The units of text are highly stylized, idiomatized, structured forms of natural language.
  • The units of text are highly inter-linked, both explicitly and implicitly. Modules are assembled to produce components, components are assembled to produce libraries etc. Source files cite (import and cross-link to) other source files. Header files cite (import and cross-link to) header files. Components cite (instantiate) other components...
  • The units of text have rigorous temporal constraints. I.e. a module that refers to a library is referring to a library as it was at a point in time e.g. version 8.2. A source code comment explaining an API call is written with respect to how the API looked at a particular point in time.
  • The software making process consists of taking the corpus of source as it looked at some time T, making some modifications and promulgating a new corpus - a build - at some future time T+1. That new corpus (build) is then the basis for the next iteration of modifications to the source code.
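The T to T+1 step that closes both lists can be sketched in a few lines of Python. This is only an illustration, not anything taken from KLISS: the section text, the labels "section@T" and "section@T+1" and the helper name amendment_diff are all invented for the example. It uses the standard library's difflib to show an amendment as a programmer would see a source code change.

```python
import difflib

# Hypothetical statute section as it read at time T (invented text).
section_at_t = [
    "Sec. 1. The fee for a permit is ten dollars.",
    "Sec. 2. Permits expire after one year.",
]

# The same section at time T+1, after an invented amendment to Sec. 1.
section_at_t_plus_1 = [
    "Sec. 1. The fee for a permit is fifteen dollars.",
    "Sec. 2. Permits expire after one year.",
]


def amendment_diff(old, new):
    """Return the change from old to new as a list of unified diff lines."""
    return list(
        difflib.unified_diff(
            old, new, fromfile="section@T", tofile="section@T+1", lineterm=""
        )
    )


for line in amendment_diff(section_at_t, section_at_t_plus_1):
    print(line)
```

Lines prefixed with "-" are struck text, lines prefixed with "+" are new text, and unprefixed lines are unchanged context: much the same information an amendatory bill conveys with strikethrough and underline.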

What we have here are two communities that work with insanely large, complex corpora of text that must be rigorously managed and changed with the utmost care, precision and transparency of intent. Yet, the software community has a much greater set of tools at its disposal to help out.

How do programmers manage their corpus of text - their source code? In a database? No (at least not in the sense that the word "database" is generally used). Instead they use *source code control systems*. What do these things do? Well, the good ones (and there are many) do the following things:

  • Keep scrupulous records of who changed what and when and why
  • Allow the corpus to be viewed as it was at any previous point in time (revision control)
  • Allow the production of "builds" that take the corpus at a point in time and generate internally consistent "products" in which all dependencies are resolved
  • Allow multiple users to collaborate, folding in their work in a highly controlled way that avoids "lost updates" and textual conflicts.
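To make the list above concrete, here is a minimal sketch in Python of three of those features: the who/what/when/why record, the point-in-time view, and the guard against lost updates. It is a toy under stated assumptions, not how KLISS or any real revision control system is implemented. The names Revision, RevisionStore, commit, as_of and based_on are all hypothetical, and builds with dependency resolution are left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Revision:
    unit: str       # hypothetical unit name, e.g. "sec-1"
    text: str       # what the unit says in this revision
    author: str     # who changed it
    reason: str     # why it was changed
    when: datetime  # when the change took effect


class RevisionStore:
    """Append-only log of revisions; nothing is ever overwritten."""

    def __init__(self) -> None:
        self._log: List[Revision] = []

    def history(self, unit: str) -> List[Revision]:
        """Every revision of one unit, oldest first."""
        return [r for r in self._log if r.unit == unit]

    def latest(self, unit: str) -> Optional[Revision]:
        revisions = self.history(unit)
        return revisions[-1] if revisions else None

    def commit(self, unit, text, author, reason, when, based_on=None) -> Revision:
        """Record a change. based_on is the timestamp of the revision the
        editor started from; a mismatch means someone else got there first."""
        latest = self.latest(unit)
        if latest is not None:
            if based_on != latest.when:
                raise ValueError(f"stale edit to {unit}: changed since {based_on}")
            if when <= latest.when:
                raise ValueError("revisions must move forward in time")
        revision = Revision(unit, text, author, reason, when)
        self._log.append(revision)
        return revision

    def as_of(self, when: datetime) -> Dict[str, str]:
        """The whole corpus as it read at the given point in time."""
        snapshot: Dict[str, str] = {}
        for r in self._log:  # per unit, the log is in time order
            if r.when <= when:
                snapshot[r.unit] = r.text
        return snapshot


# Invented example data: one section, enacted and then amended.
store = RevisionStore()
enacted = datetime(2010, 1, 1)
amended = datetime(2010, 6, 1)
store.commit("sec-1", "The fee is ten dollars.", "alice", "original enactment", enacted)
store.commit("sec-1", "The fee is fifteen dollars.", "bob", "fee increase", amended,
             based_on=enacted)
print(store.as_of(datetime(2010, 3, 1)))
```

A third editor who still holds the January text and tries to commit with based_on=enacted is rejected rather than silently overwriting the June amendment, which is the "lost update" protection in miniature.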

The above could be used as an overview of everything from Darcs to Mercurial to Git to SVN - and those are just some of the open source tools. It is, to my mind, exactly the sort of feature set that the management of legal texts requires at its foundational storage and abstraction level. Right down at the bottom of the persistence model of KLISS is a storage layer that provides those features. On top of it, there is a property/value metadata store for fast retrieval based on facets, an open-ended event framework for change notification and the whole thing is packaged up as a RESTian interface so that the myriad of client applications, from bill drafting to statute publication to journal production to committee meeting management... do not have to even think about it. But I digress. More on that stuff later...

The natural language, textual level of law is my focus. I'm not attempting to make computers "understand" law by turning it into propositional calculus or some such. I'm happy to leave that quest to the strong AI community. The textual focus is why I prefer to say that "law is source code" rather than to say that "law is an operating system" because when you "execute" the law, it does not behave like most software applications. Specifically, a given set of inputs will not necessarily produce the same outputs because of the human dimensions (e.g. juries), the ongoing feedback loop of stare decisis, the scope for once "good law" to become "bad law" by being overturned in higher courts and so on.

I believe there is much that those who manage corpora of law can learn from how software developers have met the challenge of managing corpora of source code. There are many differences and complicating factors in law for sure (and I'll be addressing many of them in the KLISS posts ahead) but at a textual level - the level at which Bismarck's Sausage Machine largely works - there is a very significant degree of overlap in my opinion. An overlap that can be and should be leveraged.

It will not happen overnight but now seems to me like an excellent time to start. All the stars are aligned: the open government directive, linked data, law.gov, the semantic web initiative, cloud computing, eBooks, revision control systems, text analytics etc. etc. The pressures are building too. The law itself is in danger in my opinion: even if open access and the many paywalls were not a problem, there is a significant authenticity issue that needs to be addressed. In an electronic age, with more and more law "born digital" the old certainties about authenticity and accuracy are rapidly fading. A replacement paradigm simply must be found. More on this topic later on when I get to talking about the KEEP (Kansas Enterprise Electronic Preservation) project I am working on now along with KLISS.

Next up: the importance of names.

5 comments:

Anonymous said...

I very much hope you can get people to adopt it. Have you found lawmakers who want it?

There was a stink in my state recently because someone had inserted into a bill a provision to remove authority from an elected official and give it to industry-connected bureaucrats. It only came out because the official held a press conference. Nobody knew who had done it. The legislators like it that way.

I'm friends with two former state legislators. They said they never got time to even read the bills. One said she would check them from back to front, because the abuses were always hidden in the back, written in purposely obscure ways. Once she figured out at the last minute that someone was giving himself a million bucks to throw a party.

If you don't have a majority of lawmakers who want it, you're stuck drumming up public support. Since most non-technical people would have no clue how this would work, your task would be a difficult one.

Maybe the best approach would be to start in a small, relatively non-corrupt state. Once one state adopts it and people start seeing real benefits, the idea may spread.

neil said...

I suspect you are aware of this, but there is research in CS on legal texts and codes.

In AI there is the study of deontic logics and reasoning, e.g. http://www.springerlink.com/content/m3634r587755t84h/

In requirements research there is an annual workshop dedicated to RE and law: http://www.csc2.ncsu.edu/workshops/relaw/

Sean said...

Neil,

Yes. I'm aware of (at least some of) the research in this area.

I jokingly refer to the formalism I would love to have as "deontoral" - a combination of deontic logic and temporal logic.

At some stage in the future, I'd love to have some time to explore this. In particular, I'm interested in formalizations of meeting/assembly rules e.g. Robert's Rules of Order, Mason's Manual, Bourinot's Rules of Order, Demeter's Manual etc. The state of the art today, sadly, consists of hard-wiring business rules from the natural language form directly into imperative forms e.g. lumps of Java, Python, Visual Basic etc.

I like the idea of having a DSL so that a set of chamber/meeting system rules can be derived from a known super-set of "good" rules so that we can assert compliance and also, ideally, inspect historical action sequences to determine if they complied with the rules.

Of course, there are limits to what we can do here - especially given the Nomic realities of things. I.e. the rule that allows the suspension of the rules:-)

I know that some work has been done in this area e.g. re-stating Robert's rules in first order predicate logic e.g. http://bit.ly/9cyA8K.

regards,
Sean

Karl Fogel said...

Great post!

I keep thinking that the route to getting better text tools into legislatures is to make things that make staffers' lives easier. Convert the staffers and everything else follows... Would you agree?

Sean said...

Karl,

Violent agreement :-)

The absolute need to keep legislative staffers happy is precisely why in KLISS we hide rigorous text management tooling behind a facade designed to present staffers with the useful illusion that it is just a set of word processor files in a set of folders.

Picture a swan...graceful and serene on the surface...yet paddling hard under it:-)

Sean