Saturday, June 26, 2010

KLISS: Organizing legislative material in legislatures/parliaments

Last time in this KLISS series, I talked about the event model in KLISS. I also talked about how it works in concert with the "time machine" model to achieve information consistency in all the "views" of legislative information required for a functioning legislature/parliament. For example a bill statute view, a journal view, a calendar view, and amendment list view, a committee view etc...

I am using the word "view" here is a somewhat unusual way so I would like today to explain what I mean by it. Doing that will help set the scene for an explanation of how legislative/parliamentary assets are organized in the KLISS repository and how metadata-based search/retrieval over the repository works.

It goes without saying (but I need to say it in order to communicate that it need not be said (ain't language wonderful?)), that legislatures/parliaments produce and consume vast amounts of information, mostly in document form. What is the purpose of the documents? What are they for really? In my view, they serve as snapshot containers for the fundamental business process of legislatures/parliaments, which is the making of law. In other words, a document in a legislature is a business process, snapshotted, frozen at a point in time.

By now, if you have been reading along in this KLISS series, you will know that it is very much a document-centric architecture. The documents themselves, in all their presentation-entangled, semi-structured glory, are treated as the primary content. We create folders, and folders inside folders. We create documents with headings and headings inside headings and we put these into folders. We then blur the distinction between folder navigation (inter-document) and heading "outline" navigation (intra--document) so that the whole corpus can be conceptualized as a single hierarchical information store. The entire state of a legislature/parliament, is in KLISS, *itself* a document – albeit a very large one! Simply put, KLISS does not care about the distinction between a folder and a heading. They are both simply hierarchical container constructs.

In KLISS a "view" is simply a time-based snapshot generated from the enormous document that is the repository, seen at a point in time, in some required format. So, a PDF of a bill is such a snapshot view. So too is a the HTML page of a committee report, a journal, a corpus of promulgated law etc. HTML, PDF, CSV, there are all the same in the KLISS information model. They are just views, taken at a point in time, out of the corpus as a whole.

Earlier in this series I talked about how the web blurs the distinction between naming something to pick it out and performing a query to pick it out. KLISS takes advantage of that blurring in the creation of views. So much so that a consumer of a KLISS URI cannot tell if the resource being picked out is "really there" or the result of running a query against the repository.

The hierarchical information model in KLISS has been strongly influenced by Hebert Simon and his essay The Architecture of Complexity. The view/query model is a sort of mashup of ideas from Bertrand Russell (proper nouns as query expressions) and John Kripke (rigid designators) combined with the Web Architecture of Sir Tim Berners Lee.

The most trivial views over the KLISS repository are those that correspond to real bytes-on-the-disk documents. Bills are generally like that. So too are votes. So too are sections of statute. Another level of views are those generated dynamically by assembling documents into larger documents. Volumes of statute are like that. Journals are like that. Once assembled, these documents often go back into the repository as real bytes-on-the-disk documents. This creates a permanent record of the result of the assembly process but it also allows the assemblies to be, themselves part of further assemblies. Permanent journals are like that. Final calendars are like that. Chronologies of statutes are like that.

Yet another level of views are those generated from the KLISS meta-data model...In KLISS, any document in the system can have any number of property/value pairs associated with it. When transactions are stored in the repository, these property/value pairs are loaded into a relational database behind the scenes. This relational database is used by the query subsystem to provide fast, ordered views over the repository. The sort of queries enabled are things like:

Give me all the bill amendments tabled between dates X and Y
Give me all the sponsors for all bills referred to the Agriculture committee last session
Give me all bills with the word "consolidation" in their long titles
How many enrolled bills have we so far this session?

At this point I need to point out that although we use a relational database as the meta-data indexer/query engine in KLISS, we do not use it relationally. This is by design. At this core level of the persistence model, we are not modeling relationships *between* documents. Other levels provide that function (we will get to them later on.). Effectively what we do is utilize a Star schema in which (URI+Revision Number) is the key used to join together all the metadata key, value pairs. The tabular structure of the meta-data fields is achieved via a meta-modeling trick in which the syntax of the field name, indicates what table and what field and what field type should be used for the associated value. In the future, we expect that we will gravitate away from relational back-ends into more non relational stores that are thankfully, finally, beginning to become commonplace.

It is important to note that in KLISS, the meta-data database is not a normative source of information. The master copy of all data is, at all times in the documents themselves. The metadata is stored in the documents themselves (the topic of an upcoming post). The database is constructed from the documents in order to serve search and retrieval needs. That is all. In fact, the database can be blown away and simply re-created by replaying the transactions from the KLISS time machine. I sometimes explain it by saying we use a database in the same way that a music collection application might use a database. Its purpose is to facilitate rapid slicing/dicing/viewing via meta-data.

This brings me to the most important point about how information is organized in KLISS. Lets step all the way back for a moment. Why do us humans organize stuff at all? We organize in order to find it again. In other words, organization is not the point of organization. Retrieval is the point of organization. Organization is something we do now, in anticipation of facilitating retrieval in the future. For most of human history, this has meant creating an organizational structure and packing stuff physically into that structure. Shoe closets, cities, pockets, airplanes, filing cabinets, filo-faxes, bookshelves, dewey decimal classification...

As David Weinberger explains in his book "Everything is Miscellaneous", there is no need for a single organizational structure for electronic information. A digital book does not need exactly one shelf on one wall, classified under one dominant heading. It can be on many shelfs, on may walls under many headings, in many ontologies, all at the same time. In fact, it can be exploded into pieces, mashed up with other books and represented in any order, in any format, any where and any time. Not only is this possible thanks to IT, it cannot be stopped. All known attempts – and their have been numerous – since the dawn of IT have failed to put the organization genie back in the bottle...

Having said that, the tyranny of the dominant decomposition appears, per Herbert Simon to be woven into the fabric of the universe. In order to store information – even electronically - we must *pick* at least some organizational structure to get us started. At the very least, things need to have names right? Ok. What form will those names take...Ten minutes into that train of thought and you have a decomposition on your hands. So what decomposition will be pick for our legislative/parliamentary materials? Do committees contain bills or do bills contain committees? Is a joint committee part of the house data model or part of the senate data model or both? Are bill drafts stored with the sponsor or with the drafter? Are committee reports part of the committee that created them or part of the bills they modify? etc. etc...One hour later, you are in a mereotopology induced coma. You keep searching for the perfect decomposition. If you are in luck, you conclude that there is no such thing as the perfect decomposition and you get on with your life. If you are unlucky, you get drafted into a committee that has to decide on the correct decomposition.

Fact of life: If there are N people in a group tasked with deciding an information model, there are exactly N, mutually incompatible models vying for dominance and each of the N participants is convinced that the other N-1 models are less correct than their own. Legislatures/parliaments provide and excellent example of this phenomenon. Fill a room with drafting attorneys, bill status clerks, journal clerks, committee secretaries, fiscal analysts and ask each of them to white-board their model of, for example bills, you will get as many models as there are people in the room.

That is why, in KLISS, by design, the information model – how it carves up into documents versus folders, paragraphs versus meta-data fields, queries versus bytes-on-the-disk does not really matter. Just pick one! There are many, many models that can work. Given a set of models that will work, there is generally no compelling reason to pick any particular one. In legislatures/parliaments – as in many other content-centric applications the word "correct" needs a pragmatic definition. In KLISS, we consider an information model to be "correct" if it supports the efficient, secure production of the required outputs with the required speed of production. That is essentially it. Everything else is secondary and much of it is just mereotopology.

Two more quick things before I wrap up for today. You may be thinking, "how can a single folder structure hope to meet the divergent needs of all the different stakeholders who likely have different models in their head for how the information should be structured?" The way KLISS does it is that we create synthetic folder structures – known as "virtual views" – over the physical folder structure. That allows us to create the illusion – on a role by role basis – that each group's preferred structure is the one the system uses :-)

As well as helping to create familiar folder structures on a role-by-role basis, virtual views also allow us to implement role based access control. Every role in the system uses a virtual view. Moreover, all event notifications use the virtual views and all attempted access to assets in the repository are filtered through the users virtual view - that includes all search results.

To sum up...KLISS uses a virtualized hierarchical information model combined with property/value pairs arranged in a star-schema fashion. Properties are indexed for fast retrieval and based on scalar data types that we leverage for query operators e.g. date expression evaluation, comparisons of money amounts etc. The metadata model is revision based and the repository transaction semantics guarantee that the metadata view is up to date with respect to the time machine view at all times. All event notifications use the virtual view names for assets.

You may be wondering, "is it possible to have a document with no content other than metadata?". The answer is "yes". That is exactly how we reify non-document concepts like committees, members, roles etc. into document form for storage in the time machine. Yes, in KLISS, *everything* is a document:-)

Next up: Data models, data organization and why the search for the "correct" model is doomed.

No comments: