Sean McGrath: KLISS: rigid designation URIs in KLISS

Monday, June 21, 2010

Last time, I talked about rigid designators and their role in the KLISS architecture.

First the bad news. Before continuing with this post, I have a confession to make. I do not believe there is such a thing as a truly rigid designator - certainly not in a form that can be implemented in technology. There are simply too many uncontrollable contextual variables that would need to be locked down and the world – specifically the Web – simply does not work that way. I won't go into the details here. (That is a topic for a bar conversation sometime. If you happen to be interested: take the causal chain that underlies network protocols, mix in sunyata with Saussure's structuralism and Wittgenstein's concept of language games, and you end up in the ballpark of my objections to the existence of truly rigid designators.)

Now the good news. The web makes it possible to get closer than ever before because names on the Web are actionable – you can click on them. That sounds trivial but it is not. What is the Web really? (I mean really?). It is an unspeakably large set of names combined with the machinery necessary to pick out units of text given those names.

The names are called URIs.
The units of text are called resources (or, strictly speaking, representations of resources.)
The machinery for picking out a unit of text (resource) given a name (URI) is the HTTP protocol.

In order to add as much rigidity to our designators as possible for law, we need to find a way to de-contextualize the referring process as much as possible, i.e. we do not want the unit of text we get back when we de-reference to be dependent on the context in which the de-referencing takes place. Whether we retrieve it today or tomorrow, whether user A or user B retrieves it, whether it is retrieved from continent A or continent B, from browser A or browser B, in format A or format B... we would like all those referring variants to return the same unit of text.
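To make "independent of context" concrete, here is a minimal Python sketch. The `is_rigid` check, the context fields and the two toy dereferencing functions are all hypothetical illustrations, not part of KLISS: a designator counts as rigid only if every retrieval context yields the same unit of text.

```python
def is_rigid(dereference, uri, contexts):
    """True if dereferencing `uri` yields the same text in every context."""
    texts = {dereference(uri, context) for context in contexts}
    return len(texts) == 1


# Hypothetical contexts: who asks, on which day, and in which format.
contexts = [
    {"user": user, "day": day, "format": fmt}
    for user in ("A", "B")
    for day in ("2010-01-01", "2010-01-02")
    for fmt in ("html", "pdf")
]


def floating(uri, context):
    # A "latest version" name: the answer drifts with the day of retrieval.
    return f"{uri} as amended up to {context['day']}"


def pinned(uri, context):
    # A name whose referent ignores the retrieval context entirely.
    return f"{uri} exactly as filed"


print(is_rigid(floating, "/bills/hb2154", contexts))  # False
print(is_rigid(pinned, "/bills/hb2154", contexts))    # True
```

A real check could only ever sample a handful of contexts, which is one way of seeing why full rigidity stays out of reach.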

This is tough. A truly wicked problem. However, we can make tremendous strides towards rigidity on the Web by just attacking one dimension of the problem. Namely, time.

Now you might be thinking "I see where this is going...we add date-stamps into our URIs so that they don't change and we have removed the time dimension from the context". That would certainly be a step in the right direction for law on the web where link rot is a major problem. However, it does not address the full problem. In a corpus of law, I am interested in being able to reference units of text locked down in time but I also want to be able to see the entire corpus as it was at that same moment in time. In other words, if I grab HB2154 as it looked at noon, 1 Jan, 2010, to fully understand what I'm looking at, I would like to be able to see what everything else in the corpus looked like at noon, 1 Jan, 2010.
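The difference between date-stamping one document and viewing the whole corpus at one moment can be sketched in a few lines of Python. Everything here is an assumption made for illustration (the paths, the sample texts, the `as_of` and `corpus_as_of` helpers); it is not how KLISS stores anything.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical corpus: path -> list of (saved_at, text), oldest first.
corpus = {
    "/bills/hb2154": [
        (datetime(2009, 12, 1, 9, 0), "HB2154 as introduced"),
        (datetime(2010, 1, 5, 14, 0), "HB2154 as amended"),
    ],
    "/statute/ch21/s101": [
        (datetime(2009, 6, 1, 9, 0), "Section 101, 2009 text"),
        (datetime(2009, 12, 20, 9, 0), "Section 101, revised text"),
    ],
    "/committees/judiciary": [
        (datetime(2010, 2, 1, 9, 0), "Judiciary committee roster"),
    ],
}


def as_of(history, moment):
    """Return the text current at `moment`, or None if not yet created."""
    times = [saved_at for saved_at, _ in history]
    index = bisect_right(times, moment)
    return history[index - 1][1] if index else None


def corpus_as_of(corpus, moment):
    """Return every document that existed at `moment`, as it looked then."""
    view = {path: as_of(history, moment) for path, history in corpus.items()}
    return {path: text for path, text in view.items() if text is not None}


noon = datetime(2010, 1, 1, 12, 0)
snapshot = corpus_as_of(corpus, noon)
for path, text in sorted(snapshot.items()):
    print(path, "->", text)
```

The first helper answers "what did HB2154 look like at noon?"; the second answers the question I actually care about, which is what everything looked like at noon.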

Why is this "point in time" snapshot important in legislatures/parliaments? A number of reasons:

Democratic transparency
Accuracy of aggregate publications
Accuracy of real-time displays
Legislative intent
History in the context of its creation

Let us take each of these in turn.

Democratic transparency

The ability to see moment-by-moment snapshots of the entire recorded democratic process and move seamlessly forward and backwards in time, watching how things changed...following hyper-links from bills to sponsors to committees to journals to statute to committee testimony...not only across documents but also through time...I can do that if I have the ability to re-create the corpus as it looked at any point in time and if I have designed my systems so that all inter-document linkages are expressed in terms of time-oriented URIs.

It is possible to have a legislative/parliamentary process by just publishing law. However, to have a truly democratic legislative/parliamentary process, you need to add participation and transparency. I mentioned earlier that a legislature/parliament can be thought of as a machine that takes the corpus of law at time T and produces a new corpus of law at time T+1. A democratic legislature/parliament does not simply announce the new corpus. It exposes how the new corpus was arrived at. I know of no more powerful way of doing that with technology than recording the moment-by-moment activity and allowing that activity to be "played back" and interpreted moment-by-moment.

Accuracy of aggregate publications

Legislatures/parliaments are awash with aggregate publications. (I discussed these in an earlier post). Bills pull in ("transclude" in the geek vernacular) statute. Journals pull in votes. Statute volumes pull in statute sections, case annotations etc. From an accuracy perspective, it is very important to be able to look at a Bill or a Journal and ask "What units of text actually got aggregated here?". It is very important to be able to ask "what votes were actually in the repository at the time that these 6 were added into the journal?" Armed with the ability to reach into the corpus as it looked at a particular point in time, this can be done accurately. In other words, time can be locked down so that the publication is correct as of the time the report was run.
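As a rough illustration of locking time down for an aggregate, the sketch below builds a journal entry against one named revision and records both what was pulled in and what else was sitting in the repository at that revision. The data layout, the vote identifiers and the `build_journal` function are assumptions for the example only.

```python
def build_journal(votes_at_revision, revision, selected_ids):
    """Assemble a journal entry and record exactly what it was built from."""
    available = votes_at_revision[revision]
    missing = [vote_id for vote_id in selected_ids if vote_id not in available]
    if missing:
        raise KeyError(f"not in repository at revision {revision}: {missing}")
    return {
        "built_from_revision": revision,
        "included": {vote_id: available[vote_id] for vote_id in selected_ids},
        "available_but_omitted": sorted(set(available) - set(selected_ids)),
    }


# Hypothetical view of the votes folder at two consecutive revisions.
votes_at_revision = {
    41: {"v1": "HB2154 passed 80-40", "v2": "SB12 failed 15-25", "v3": "HB9 passed 99-1"},
    42: {"v1": "HB2154 passed 80-40", "v2": "SB12 failed 15-25", "v3": "HB9 passed 99-1",
         "v4": "HB77 passed 61-59"},
}

journal = build_journal(votes_at_revision, 41, ["v1", "v2"])
print(journal["built_from_revision"])    # 41
print(journal["available_but_omitted"])  # ['v3']
```

Because the revision is stored with the output, the question "what votes were in the repository when these were chosen?" can be answered later, even after vote v4 has arrived.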

Accuracy of real-time displays

Most legislatures/parliaments produce formal outputs at timescales established in the early days of printing e.g. 24 hour turnaround times necessitated by the need to typeset and then print to paper – generally offsite. In the last 20-30 years, IT has made it possible to shrink these timescales and most legislatures/parliaments operate some form of "real-time" information dissemination such as live audio/video, live bill status screens and so on. Even those that do not provide real-time information find that a visitors' gallery is all it takes these days for worldwide information dissemination - at the speed of light - via twitter and blogs etc.
I think of these "real time" displays as very fast aggregate publications. In fact, from an engineering perspective, they are built the same way in KLISS. So, just as it is vital to know what votes were in the system when it is decided that 6 can go into the journal, it is vital to know what all the actions were against HB1234 in the system at the time that 12 were identified to be listed on the bill status screen. Again, armed with the ability to reach into the corpus as it looked at a particular point in time, this can be done accurately. In other words, time can be locked down so that the publication is correct as of the time the report was run.

Legislative intent

Legislative intent is really a subtopic of Democratic transparency. It is important when establishing intent, to have the full context under which decisions were made. Again, I know of no more powerful technological mechanism for supporting this than the ability to re-create a legislature/parliament as it looked at the moment a decision was taken. Obviously, the more context we can pour into the repository of content, the better the ability to re-create the context becomes. More on that point in a moment...

History in the context of its creation

Finally, there is preservation of history as it is being created. Since the dawn of written law, people have been writing laws down and others have afterwards been poring over the writings trying to "fill in the blanks" to understand the why of it all, to get inside the minds of those who participated in the making of legislative history. In this day and age, there is no need for us to leave blanks in the record. Recording Bismarck's sausage machine purely by its final outputs (laws, journals etc.) is somewhat like that. Why not record everything that is non-confidential as it happens? Come to think of it, why not facilitate reporting what is going on, as it is going on, rather than at the end of the day or the end of the session?

At first blush, all this might seem too much of a technological leap. I contend that it is not. It is a *lot* of work, but no great new technological breakthroughs are required. The reason I so contend is that we can combine Web technologies with the delta-oriented algorithms of source code control systems to provide an excellent basis for the sort of legislative corpus management system required to achieve the above. Here is what is needed:

A repository that "snapshots" the entire state of the system at every change (revision) and allows for the repository to be re-constituted as it looked at any previous revision
A RESTian interface to that repository that includes revision numbers so that point-in-time views of individual assets or the entire repository as it was at a point in time, can be retrieved.

That is, essentially, how content is persisted and then accessed in KLISS.

The entire name space of document assets is exposed as URIs.
The URIs include revision numbers, allowing all previous versions of any asset, since the dawn of time, to be retrieved.
Given an arbitrary timestamp, it is possible to extract the complete repository, as it looked at that timestamp.
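The three properties above can be sketched with a toy in-memory repository in Python. This is a minimal sketch under loud assumptions: the `/rev/<n>/<path>` URI shape, the `Repository` class and the ISO-8601 string timestamps are all invented for illustration, and the naive replay of every delta stands in for the far more efficient delta algorithms of a real source code control system.

```python
from bisect import bisect_right


class Repository:
    """Toy store that records one delta per revision."""

    def __init__(self):
        self._deltas = []      # index i holds the changes made by revision i + 1
        self._timestamps = []  # commit time of each revision, ascending

    def commit(self, timestamp, changes):
        """Record `changes` (path -> text, or None to delete); return the revision."""
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("commits must be in time order")
        self._deltas.append(dict(changes))
        self._timestamps.append(timestamp)
        return len(self._deltas)

    def snapshot(self, revision):
        """Rebuild the whole repository as it looked at `revision`."""
        state = {}
        for delta in self._deltas[:revision]:
            for path, text in delta.items():
                if text is None:
                    state.pop(path, None)
                else:
                    state[path] = text
        return state

    def revision_at(self, timestamp):
        """Latest revision committed at or before `timestamp` (0 if none)."""
        return bisect_right(self._timestamps, timestamp)

    def get(self, uri):
        """Resolve a hypothetical URI of the form /rev/<n>/<path>."""
        _, marker, number, *rest = uri.split("/")
        if marker != "rev":
            raise ValueError(f"unsupported URI: {uri}")
        return self.snapshot(int(number))["/" + "/".join(rest)]


repo = Repository()
repo.commit("2009-12-01T09:00", {"/bills/hb2154": "as introduced"})
repo.commit("2010-01-05T14:00", {"/bills/hb2154": "as amended", "/journals/j1": "day 1"})
repo.commit("2010-02-01T09:00", {"/journals/j1": None})

print(repo.get("/rev/1/bills/hb2154"))                      # as introduced
print(repo.snapshot(repo.revision_at("2010-01-01T12:00")))  # {'/bills/hb2154': 'as introduced'}
```

ISO-8601 strings of equal precision sort in time order, which is why plain string comparison is enough for the timestamp lookup here.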

Now you might be thinking that this time machine-like view into the history of a legislature/parliament can only contain some of the context required to fully understand what happened, who did what and, ultimately, help determine why things happened the way they did. That is true but the context it contains is significantly extended by the fact that in KLISS, pretty much *everything* is a document (or a folder of documents). Chambers are modeled as folders/documents. Members are folders/documents. Committees are folders/documents. Why? Because then, they are first class members of the "time machine". I can re-create the movements of bills through committees by looking back in time at how my committee folders (which are exposed as URIs) changed with respect to time. I can recreate what the entire state of play was at the time a vote was called. Moreover, if I wish, I can replay the events (which of course are all URI posts) in order to re-run history and watch it unfold in front of our eyes...
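Here is a small Python sketch of what replaying history over folders could look like. It assumes, purely for illustration, that each event is a post of one document into one folder and that filing a bill somewhere removes it from wherever it was; the folder names, bill numbers and both functions are hypothetical.

```python
def replay(events):
    """Yield (event_number, state) after applying each event in order."""
    state = {}
    for number, (folder, document) in enumerate(events, start=1):
        # Filing a document in a folder removes it from wherever it was.
        for members in state.values():
            members.discard(document)
        state.setdefault(folder, set()).add(document)
        yield number, {name: sorted(docs) for name, docs in state.items()}


def locate(events, document):
    """Trace which folder held `document` after each event that moved it."""
    trail = []
    for number, state in replay(events):
        for folder, docs in state.items():
            if document in docs and (not trail or trail[-1][1] != folder):
                trail.append((number, folder))
    return trail


# Hypothetical event log: (folder URI posted to, document filed there).
events = [
    ("/committees/judiciary", "hb1234"),
    ("/committees/judiciary", "hb2154"),
    ("/committees/appropriations", "hb1234"),
    ("/chambers/house/calendar", "hb1234"),
]

for step, folder in locate(events, "hb1234"):
    print(step, folder)
```

Run against the sample log, the trail shows HB1234 entering the judiciary committee at event 1, moving to appropriations at event 3 and reaching the house calendar at event 4, which is the movement of a bill through committees recovered from folder state alone.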

In summary, in KLISS, names are really really important. We make names as rigid as we can within the confines of what is technologically feasible. We map all non-confidential names directly into URI-space and we add the time dimension to each URI to allow not only the retrieval of an individual asset at a point in time but *all* assets as they stood at that point in time. We do this at a technical level by leveraging the delta algorithms commonplace today in good source code control systems, combined with HTTP and a time-oriented twist on a RESTian interface.

Next up: event generation in KLISS and its role in enabling real-time telemetry as well as notification. Also, how this fits with the eventual consistency model used in KLISS.

2 comments:

Unknown said...: Sean,

In my experience there are two important contextual dates associated with statute law (well at least in the land of Oz).

The statute "as-at-date" you discuss does not cover the second context which is the "law-in-force" which may be different in jurisdictions where "retrospective" amendments are allowed. A more common case is to ask what the law will be at a future date (especially common with taxation laws applying to future fiscal years).

So if I ask what the law was on 1 June 2009 I will get different answers when I ask the question on 1 June 2009, 1 June 2010 or 1 June 2008.

Of course we could then start to talk about other contextual dates like fiscal years which are not about the laws in force but the laws applicable.; 1:01 AM
Sean McGrath said...: Bill,

Determining exactly what statute law is in force at any given point is indeed a thorny problem and differs greatly from jurisdiction to jurisdiction.

Sunrise clauses, sunset clauses, conditional clauses (i.e. if X happens, then Y becomes law), automatic enactment on signature, automatic enactment on publication in a register, automatic enactment if X days pass without further action etc. etc.

All of these boil down to dates eventually but there is only so much that a machine can do to resolve all these without "executing" the law. In an earlier post, I mentioned that I'm happy to stay away from that one:-)

Automating the application of statute deltas based on dates is very doable in the simple case but approaches Turing completeness quickly for the complicated cases.

regards,
Sean; 5:31 AM
