Last time in this KLISS series of posts I talked about the critical nature of fidelity with historically produced renderings of legislative artifacts. This time, I want to turn to item 4 in the list of challenges for the XML standard model in legislatures/parliaments: the fluid nature of work-in-progress legal assets.
The process of making law does not begin with a draft bill any more than the making of a song starts with a draft of the lyrics with the melody and the harmonies. The process starts with an idea. Nothing more. Sometimes a well formulated idea. Sometimes it is just a doodle, perhaps with more questions than answers...
This is normally the point at which a law maker engages the help of a specialist drafter. The job of the drafter/counsel being to work with the law maker to turn the idea into legal form – to structure it per the rules of the relevant legislature/parliament. From a textual perspective what is happening is that unstructured material is being morphed into structured material.
The process of changing law is quite similar. (Most making of law is, in fact, changing of existing law.) The existing – structured – material is taken as input and various approaches to changing it are likely to be investigated. Maybe section X gets repealed and a new section Y added? Maybe section X is updated on its own. Maybe some of the existing section X text is retained? etc. etc. The skill of the drafter/counsel is in knowing all the relevant law, knowing all the possible ways in which the intent of the law maker can be met and knowing the pros/cons of each. From a textual perspective what is happening is that structured material is being rendered temporarily unstructured while various possible implementation approaches are being analyzed. At some point, the un-structured morphs back into structured material again.
Now folks who write software will hopefully have spotted that this process is very much akin to software development. Sometimes you start with nothing but an idea. Sometimes you start with an existing program – a corpus of source code – and an idea. Either way, your act of programming consists of moving from messy unstructured content in your text editor towards very structured content – source code.
I want to tease out that analogy further because it is very much at the heart of how I think about legislative systems. Think about it...what are legal texts really? They are highly formalized, deeply interconnected documents that must obey a rigorous set of rules: rules of syntax, rules of form, rules of idiom, rules of inter-relationship. When there are changes to these documents they are “compiled” together into a new “release” to become the new laws of the land. Sound familiar?
Law is basically source code.
The implications of this – if it's true (or can be made true) – are very significant. Not only in terms of the extant Operating System meme but in terms of the topic at hand: the fluid nature of work-in-progress legal assets.
The standard XML model has it that legal documents are structured and therefore they can – and should – have grammars in the form of XML schemas. They should also have author/edit systems that know about the grammars so that users can be helped (or some would argue “forced”) to always keep their documents correctly structured per the schema.
I have two major problems with that model. Firstly, it ignores the fact that most of the time – at least in law – changes go through a transitionary author/edit phase when the content is no longer necessarily structured. The author is trying out ideas, moving stuff around. Maybe that heading will become text in a sub-clause. Maybe this para will bubble up to the long title. Maybe I don't want this (probably structurally invalid) single-item bulleted list but I'm going to keep it in the bill until I decide if I want it or not...
I have seen XML editing tools that drive people nuts because they refuse to allow users to move stuff around at will because to do so would cause it to be temporarily broken per XML's rules of structure. But breaking it is part and parcel of the workflow of fixing it back up again after change.
I have seen XML editing tools that refuse to allow users to work on the body of a bill draft until all the required preamble material and metadata are added first. Why? Well, because in the schema, that material is required and it comes before the body of the draft. Well, sorry, but at the point where I'm trying to create a new draft, I may not know what long title I will want, who the bill sponsors will be or what the chamber of introduction will be... This sort of thing drives many users of classic XML editors in legislatures/parliaments absolutely nuts in my experience.
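To make the complaint concrete, here is a minimal sketch – with entirely hypothetical element names, standing in for no real legislative schema – of the kind of required-content check a strict editor enforces. The draft's body is perfectly usable, but the checker rejects it because the front matter is not there yet:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified "schema": a bill must carry these children
# before any <body> content is considered valid.
REQUIRED_PREAMBLE = ["longTitle", "sponsor", "chamber"]

def preamble_errors(xml_text):
    """Return the list of required preamble elements missing from a draft."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    return [tag for tag in REQUIRED_PREAMBLE if tag not in present]

# An early work-in-progress draft: the drafter has body text but no long
# title or sponsor yet -- exactly the state a strict editor refuses to allow.
draft = """<bill>
  <body><section>Text the drafter actually wants to work on.</section></body>
</bill>"""

print(preamble_errors(draft))  # ['longTitle', 'sponsor', 'chamber']
```

A schema-driven editor that blocks on a list like this is, in effect, demanding answers (long title, sponsors, chamber) that the drafter simply does not have at the start of the process.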
The second major problem I have with this model is my strongly held opinion that source code and legal code are really the same thing. Let us look at what tools software developers choose for themselves in creating what are very, very structured documents...Do they pick tools that beep when the program is not valid Java syntax? Do they pick tools that refuse to save the source code documents unless they “compile”? Do they pick tools that force the programs to be created from the top down, per the rules of the programming language? Of course not. Instead, they use powerful textual tools that get out of the way: Emacs, vi, TextMate etc. When they do use tools that are aware of the syntax rules – Visual Studio, NetBeans, Eclipse etc. – those tools are all very forgiving of syntax/structure failures. They often use a soft form of syntax highlighting to inform users of possible structural problems but they never beep!
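As a sketch of the forgiving alternative, here is the kind of soft check such tools perform (plain Python, purely illustrative): report the structural problem as a warning and leave the author's text exactly as typed.

```python
import xml.etree.ElementTree as ET

def soft_check(draft_text):
    """Report structural problems as warnings; never refuse the text.

    This mirrors what forgiving programmers' editors do: highlight the
    problem and keep the buffer exactly as the author typed it.
    """
    try:
        ET.fromstring(draft_text)
        return []                                  # well-formed: nothing to flag
    except ET.ParseError as err:
        return [f"not yet well-formed: {err}"]     # flag it -- but don't beep

# Mid-edit state: the drafter has moved a clause and left a tag unclosed.
work_in_progress = "<section><heading>Fees</heading><clause>..."
warnings = soft_check(work_in_progress)
print(warnings)   # one warning; the text itself is untouched
```

The warning list is what a soft-highlighting editor would paint in the margin; nothing stops the save.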
If I'm right that source code and law code are more similar than not, why oh why is there such an enormous disparity in the toolsets used by each community? In particular, why have we allowed ourselves to be convinced that rigid inviolate structure rules will work for law (code) when everyone knows that there is no way it would work for source code?
When I am working on source code it is structurally broken. That is either precisely why I am working on it or a necessary stepping stone to me making it structured again after a change. My work-in-progress is very fluid and I use author/edit tools that support that fluidity. I simply would not be able to function if the tools did not allow me to “break” all the rules during the work-in-progress part of my workflow.
When I edit legal and regulatory texts (not as a practicing lawyer, I am not a lawyer – but as a system implementor) I break them in a very similar way. All the time. The notion that all the content needs to obey all the post-production rules during production simply does not ring true for me.
Many of the successful systems I have seen in legislative/parliamentary environments have one thing in common – a word processor. In fact, often an old word processor. We are talking DOS and even green-screens here! It is all too common, in my experience, for folks to think that legislative systems will automatically be better if those nasty, unstructured word processors are replaced by bright, shiny, structured XML editors.
Not so. Not even close. For all their flaws, the word processors worked and they worked for a reason. I'm not saying that legislatures/parliaments should stick with their green-screens and DOS PCs, but I am saying that the reasons why the word processors worked – for all their failings – need to be properly understood before they are replaced. Thankfully, and this is something I will be returning to later on in this series, we now have generally available tools that allow the benefits of XML to be leveraged without departing from the fluidity of the word-processor form. A form that is, in my opinion, critical to extracting value from law-making IT systems.
Next time, I want to turn to number 5 in my list of concerns about the XML standard model in legislatures/parliaments: the complexity of amendment cycle business rules that often pre-date computers and cannot be changed to make life easier for modern software.
Saturday, June 05, 2010
Friday, June 04, 2010
XML in legislature/parliament environments: The critical nature of fidelity with historically produced renderings
Last time, in this KLISS series of posts, I talked about amendatory actions in legislatures/parliaments and how gosh-darned complex they can be unless you properly account for modeling time, numerical data and our old friends line/page numbers in your architecture.
As discussed, none of these three critical items come off-the-shelf in your average XML-bag-o-tricks. They are also not “add-on modules” that you can buy after you have selected an off-the-shelf CMS/DMS. They run too deep in the problem space for that. To be handled properly they need to be at the very heart of eDemocracy system architecture and design.
Anyway, enough on that for now. Today I would like to turn to the fourth reason why the standard XML model is not directly applicable in legislatures/parliaments: namely, the critical nature of fidelity with historically produced renderings of legislative artifacts.
Let's start by recalling the XML mantra that information and the presentation of that information should be separated. Nice idea. In fact, great idea...but it won't work in legislatures/parliaments. (In fact, there are plenty of situations where it won't work outside of legislatures/parliaments either. See Separating Content from Presentation: Easier Said Than Done.)
Let us look at a few examples of legal-style content where presentation information is of paramount importance.
Below is a beautiful old table rendering taken from Illustrated judgements.
Note how important the layout is to understanding the content. The vertical alignments. The dot leaders. The integral signs...
Now imagine having this in a legal document. Is it even possible to separate the content from the presentation? I doubt it. Even if it was, would you want to? Imagine rendering this information through a different stylesheet or XML transformation...how much of the original rendering would be lost? Would any of it matter? Does anybody really know? Do you want to take the risk that the interpretation might change as a result of a re-rendering? In law, this is a real problem because ever since the Twelve Tables we have lived in a world where content and presentation of law are fixed at point of promulgation. Heads of state do not sign ones-and-zeros into law. They sign (almost exclusively) paper into law. Paper is the quintessential example of a rendering that doesn't change once you create it. It is locked down. Put it in a safe in the Secretary of State's office for 100 years and as long as your vellum and your non-fugitive inks are good, the rendering will be the same.
That is very important for law. Contrast what happens in a purely digital world in which renderings are created on-the-fly through complex algorithms known as word-processor rendering engines or stylesheets...Will a WordPerfect 5.1 document render in Office 11 exactly the same as it did in WordPerfect 5.1? No. Especially once you get into complex layouts like tables. In fact, even if author/edit vendors did wish to standardize how they lay out text in, say, tables, there are patents held by various parties that actively inhibit it.
Will, say, a FOSI stylesheet in XML authoring tool X produce the same layout in authoring tool Y? No. Not even when they are rendering content to the same XML schemas. Will two XSL-FO implementations generate identical PDFs? No...
I don't know about you but I like my laws locked down! I don't like the idea that the semantics of the law – sometimes terribly intertwingled with its presentation – can change depending on what tool is used to look at it...
Below is a more modern but equally beautiful piece of rendering, this time from a piece of Irish legislation.
As a thought experiment, change the rendering in your head. Change the left indents...How about that solitary paragraph with the word “or” in it. Is its position relative to the rest of the text important to the overall meaning? See the hyphens? They all look like they were added by the rendering, don't they? So they are not part of the text of the law itself. What if the split occurred between “self-” and “evident”? Would it be self-evident that the hyphen is non-discretionary? Would that hyphen be part of the law or an accident of the rendering? Look at the list ornamentations: alpha to roman to uppercase roman. Are the list ornaments part of the law? Would you get your stylesheets to generate them or wire the values directly into the text? I like my list ornaments to not move! Pretty much every one of these in a legal context is likely to be a target for a citation. It cannot change!
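The list-ornament worry can be made concrete with a small sketch (the clause text is made up, but the failure mode is real): labels generated from position shift when an amendment inserts an item, while labels wired into the text stay put.

```python
import string

# Two ways to get "(a) ... (b) ..." list ornaments. Generated-from-position
# labels move when an item is inserted; literal labels stored in the text do not.
items = ["first clause", "second clause"]

def generated_labels(items):
    """Stylesheet-style ornaments: computed from each item's position."""
    return [f"({string.ascii_lowercase[i]}) {text}" for i, text in enumerate(items)]

# The literal alternative: ornaments are part of the text itself.
literal = ["(a) first clause", "(b) second clause"]

# A later amendment inserts a new clause at the front...
items.insert(0, "new clause")
print(generated_labels(items))
# ['(a) new clause', '(b) first clause', '(c) second clause']
```

"first clause" silently became (b): every existing citation to "(a)" now points at different law. The literal labels stayed put – which is why drafters often want ornaments wired into the text rather than conjured by a stylesheet.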
Finally, an example of a kind of data that pretty much every statute book has in some form. A sentencing table for criminal offenses:
Fancy separating the content from presentation there? If the interpretation of that data is in any way related to its rendering you really, really need to know that you can re-create it exactly.
One final point on this...Imagine you are drafting a new law and it is going to pull in some existing law in order to modify it. In many jurisdictions, the pulled-in law needs to look exactly as it did when it became law, maybe decades ago, maybe hundreds of years ago. So now your nice shiny new computer system has to typeset law to look like it was produced by a daisywheel typewriter, or a Linotronic or whatever, because that is how it looked when it became law.
I have seen situations where the existing law being amended has contained mistakes in its indentation, its list ornamentation, its hyphenation etc. and yet – because this is law – my new computer systems need to be able to replicate the wrongness correctly. It's the law!
I moan when I hear people say that opening up law making and publishing richly marked-up legal material is just a simple matter of standardizing on an XML schema and knocking out a few stylesheets...There is a lot, lot more to it than that.
Next up in this whirlwind tour of fun issues with the XML standard model in legislatures/parliaments: the fluid nature of work-in-progress legal assets.
Wednesday, June 02, 2010
XML in legislature/parliament environments: the complex nature of amendatory actions
Last time in this KLISS series, I talked about the centrality of line/page number citation in amendment cycles and how important it is to treat line/page number artifacts as first class citizens of XML models in legislative/parliamentary environments. I also talked a little about how wonderful the world would be if citations and amendatory workflows were not based on line/page number citations. There are compelling alternatives that have emerged over the last 30-40 years...
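By "first class citizens" I mean something like the following sketch (the names, data shapes and bill text are mine, purely illustrative): a line/page citation is an addressable object tied to a specific published rendering, so that amendment instructions like "page 4, line 3, strike 'may'" can be resolved mechanically against the exact cut of the bill they were written for.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineCitation:
    """A citation that targets a *rendering* -- a page/line of a specific
    published cut of a bill -- not a node in an XML tree."""
    document: str      # which published cut, e.g. "HB2014 as introduced"
    page: int
    line: int

def resolve(citation, rendering):
    """Look up the cited line in a rendering: {page_number: [line1, line2, ...]}."""
    return rendering[citation.page][citation.line - 1]

# A (made-up) fragment of the official page/line rendering of a bill:
rendering = {4: ["...", "...", "... the agency may impose a fee ..."]}
cite = LineCitation("HB2014 as introduced", page=4, line=3)
print(resolve(cite, rendering))  # the exact text the amendment strikes at
```

The point is that the page/line artifacts belong in the model itself: regenerate the rendering differently and every citation like this one silently breaks.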
However, no other institutions that I know of have workflows that are so influenced by their own history, their own traditions, and most importantly the powerful force of precedent. Some examples from my own experience in this space:
Some other reasons, more subtle, more technological and sometimes contentious, include:
I will talk more about the alternatives to line/page citation in amendment workflows later in this series of posts. For now, let me just say that Carl Malamud's recent characterization of the law in America as America's Operating System is great and on the right track for sure but I would suggest an addition :
(If you re-conceptualize legal texts as source code, a set of amazing things happen...Much more on this later but for now, back to today's topic : the complex nature of amendatory actions.)
Modifications to legal materials : bills, statute etc. are fascinating in many ways. On the face of it, it is just a case of managing a set of “structured” documents, changing them in a controlled way and reporting on the changes. How hard can that be? Sounds like a perfect job for a database of XML documents, doesn't it? Just get some off the shelf XML stuff and bolt it together! Not so fast...let me pick out just three examples and explain why throwing plain old XML tools at the problem of amendatory cycles will not really help:
In the first example, plain old XML and off-the-shelf XML tools do not get to the heart of the problem because almost all of the work in making this process work is not related to hierarchical structure of the documents themselves (XML's home turf). The hard work is keeping track of, and processing with respect to, the point-in-time transclusions and modifications and feedback loops that are littered throughout this entire amendatory lifecycle problem space.
In the second example, plain old XML and off-the-shelf XML tools do not really help because most folk reading the description of a money bill will find the word “spreadsheet” popping into their heads. I do not disagree. Yet legislatures/parliaments use the same line/page-based citation mechanism to work number-centric money bills that they do to work word-centric bills.
In the third example, plain old XML and off-the-shelf XML tools does not really help because all the work is in managing the relationship between the two documents. The “base document” and the one that describes how the “base document” is to be amended. Our old friend line/page number citation is used extensively here. In some jurisdictions I have worked in, the exact language used to describe proposed amendments changes depending on the point in the workflow where the proposed change occurs, so the amendment list contents is sensitive to not only the exact cut of the bill but exactly where in the business process the amend itself is taking place...Oh, and of course, amendments can themselves be amended...(Are we having fun yet?)
...and that is just 3 examples off the top of my head. I could go on for a long time but there is hopefully no need to labor it. In conclusion:
Legislative amendatory cycles often feature problem areas that XML has no magic dissolution powers over:-
Now let me end by stressing that all is not doom-and-gloom here! I am just taking the time to lay out the problems as I see them so that my proposed approach for dealing with them in an LEA (Legislative Enterprise Architecture) can be understood with respect to the problems I'm trying to solve.
I'm biting off the part of the problem that is hardest (in my opinion) because it also where the most value can be extracted from an eDemocracy perspective. One of the mantras is KLISS is that anything you can do in the presence of government, you can do in the absence of government, without regard to walls or clocks. If you were to walk into a visitors gallery in a state house or parliament you would have great access to what is going on as it happens. You would be able to see the documents in play, watch the motions as they occur, listen to the debate and read the text of the proposed amendments, watch the votes, watch the testimony as it happens...The goal of LEA (Legislative Enterprise Architecture) is to make this degree of access available without regard to your location. Simply put, the goal is to allow legislatures/parliaments to be as “live” and as “real-time” with their publication of information as they want to be....But I'm getting ahead of myself again. More on that later.
Next up is reason number 3 why the standard XML model is not directly applicable in legislatures/parliaments. Namely, the critical nature of fidelity with historically produced renderings of legislative artifacts.
P.S. If you are unfamiliar with legislative/parliamentary procedure and would like a one-picture overview of what goes on inside one. This picture, based on the US Federal Government workflow is a good example of the genre.
However, no other institutions that I know of have workflows that are so influenced by their own history, their own traditions, and most importantly the powerful force of precedent. Some examples from my own experience in this space:
- We do it this way because that approach was established back in Justinian Law times.
- We do it this way because the founding fathers did it this way.
- We do it this way because we have a law in our statute books that says we do it this way.
- We do it this way because chamber rules dictate that we must do it this way.
- We do it this way because leadership sees no need to do it any differently. It ain't broken. Why fix it?
- We do it this way because we have over 1 thousand person years of experience in our drafting office right now that are based on doing it this way.
- We do it this way because hundreds of years of history have evolved it to its current state. It may have its flaws but we know it backwards and have long-established ways of dealing with any problems that occur.
- We do it this way because if we changed it, all sorts of things would be impacted down stream in other agencies, third party publishers etc. and it could impact other branches i.e. executive branch, judicial branch.
Some other reasons, more subtle, more technological and sometimes contentious, include:
- We do it this way because that is how the mainframe worked back in 1971. We had no ability to do redlining, or printing in color, or spreadsheet calculations back in those days...
- We do it this way because we always have. If there is a specific reason, We don't know it is but it might be statutory or in chamber rules somewhere so we don't want to think about changing it. It works. Why risk breaking it?
- We do it this way because it has been handed down - quite literally – for generations. It is simply how we do it. Period. We don't change to meet the needs of technology. We expect technology to change to meet our needs.
I will talk more about the alternatives to line/page citation in amendment workflows later in this series of posts. For now, let me just say that Carl Malamud's recent characterization of the law in America as America's Operating System is great and on the right track for sure but I would suggest an addition :
- If the law in America is America's operating system then the text of that law – is the source code for that operating system.
(If you re-conceptualize legal texts as source code, a set of amazing things happen...Much more on this later but for now, back to today's topic : the complex nature of amendatory actions.)
Modifications to legal materials : bills, statute etc. are fascinating in many ways. On the face of it, it is just a case of managing a set of “structured” documents, changing them in a controlled way and reporting on the changes. How hard can that be? Sounds like a perfect job for a database of XML documents, doesn't it? Just get some off the shelf XML stuff and bolt it together! Not so fast...let me pick out just three examples and explain why throwing plain old XML tools at the problem of amendatory cycles will not really help:
- A bill (at least in North America) is not a document in the same way that a chapter of a novel is a document. Your average bill “pulls in” other content (for example sections of statute). These pulled in documents then may be modified in their new container. Now you have a composite document that “contains” a modified version of another document. The pull in happened at a point in time and resulted in new content that was correct as of a given point in time. Moreover, said content is then itself amended under very strict change control resulting in a new “version” of the statute, at a new point in time, in this bill container.
This new document, is then itself subject to change through time, via committees, floor actions etc. By the time it gets to enrollment (if it gets that far!), it has a very rich history of time-point derivations and modifications. An n-dimensional graph of content inter-relationships, all richly annotated and all critically dependent on point-in-time views and point-in-time-based modifications.
This new document is used as raw material to derive other documents that go on to have independent lives. For example, committee reports, floor amendments, journals etc. Again, all of which are point-in-time based and all of which are themselves subject to change. Then on top of that, if the bill actually makes it into law, the updated statute sections it contains need to be pulled back out and used to update the existing corpus of law. Then on top of that, you have to deal with the fact that not all the sections will be enacted to law at the same time. Some can have sunrise clauses, sunset clauses. The coming-into-force of some might be dependent on some external event e.g. publication is a register etc. etc. - A money bill is, essentially, a large numerical calculation pretending to be a document. Depending on jurisdiction, it might be called an appropriations bill or a finance bill or a budget bill...same concept underneath. Although most money bills do not impact statute, they form an enormous part of what a legislature/parliament actually does. A tremendous amount of work goes into money bills and the amendatory processes take the form of line item modifications (e.g. add $1 million to the funding for X) and formulaic changes (e.g. drop general fund expenditure by 10% across the board.).
- An amendment list (at least in the British Isles) is not a document in the same way that a chapter of a novel is a document. An amendment list (in America think "committee report") is a document but its purpose in life, is to set out how another document – often a bill – is to be modified. A bill, and a set of amendments to a bill, are two documents yet the relationships between them need to be very carefully managed. If one gets out of synch with the other, bad things happen. Either you have an amendment that does not make sense for a particular cut of the bill or a bill that cannot make sense for a particular proposed amendment
In the first example, plain old XML and off-the-shelf XML tools do not get to the heart of the problem because almost all of the work in making this process work is not related to hierarchical structure of the documents themselves (XML's home turf). The hard work is keeping track of, and processing with respect to, the point-in-time transclusions and modifications and feedback loops that are littered throughout this entire amendatory lifecycle problem space.
In the second example, plain old XML and off-the-shelf XML tools do not really help because most folks reading the description of a money bill will find the word "spreadsheet" popping into their heads. I do not disagree. Yet legislatures/parliaments use the same line/page-based citation mechanism to work number-centric money bills that they do to work word-centric bills.
In the third example, plain old XML and off-the-shelf XML tools do not really help because all the work is in managing the relationship between the two documents: the "base document" and the one that describes how the "base document" is to be amended. Our old friend line/page number citation is used extensively here. In some jurisdictions I have worked in, the exact language used to describe proposed amendments changes depending on the point in the workflow where the proposed change occurs, so the amendment list contents are sensitive not only to the exact cut of the bill but to exactly where in the business process the amendment itself is taking place...Oh, and of course, amendments can themselves be amended...(Are we having fun yet?)
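One way to picture the "base document versus amendment document" relationship is that every amendment is implicitly drafted against one exact cut of the bill. The sketch below is my own illustration (the fingerprinting scheme and names are invented, not drawn from any real legislative system): each amendment records a fingerprint of the bill text it was drafted against, so a stale amendment can be detected before it is applied.

```python
# A sketch of keeping an amendment in sync with the exact cut of the bill
# it amends. The scheme and names are hypothetical, for illustration only.
import hashlib

def fingerprint(bill_text: str) -> str:
    """Identify one exact cut of a bill by hashing its text."""
    return hashlib.sha256(bill_text.encode("utf-8")).hexdigest()[:12]

class StaleAmendmentError(Exception):
    pass

def apply_amendment(bill_text, amendment):
    # Refuse to apply an amendment drafted against a different cut of the bill.
    if amendment["drafted_against"] != fingerprint(bill_text):
        raise StaleAmendmentError("amendment was drafted against a different cut")
    return bill_text.replace(amendment["delete"], amendment["insert"], 1)

bill_v1 = "Section 1. The commissioner shall publish an annual report."
amendment = {"drafted_against": fingerprint(bill_v1),
             "delete": "shall", "insert": "may"}

bill_v2 = apply_amendment(bill_v1, amendment)   # fine: the cuts match
```

Applying the same amendment a second time, against the already-amended text, fails the fingerprint check, which is exactly the "bad things happen" desynchronization the text describes.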
...and that is just 3 examples off the top of my head. I could go on for a long time but there is hopefully no need to labor the point. In conclusion:
Legislative amendatory cycles often feature problem areas over which XML has no magic powers of dissolution:
- The time dimension is often critically important and is the basis for many of the critical inter-relationships between legislative artifacts. When referencing a legislative document it is often insufficient to be able to point to the version of a particular document you are interested in. It is often the case that what you really need is a way of referencing the version *of the entire corpus*. (More on that important point later in this series).
- Some very important legislative documents – those related to budgets/appropriations – are treated as documents, complete with page/line numbers, but what they really are (in modern-day terminology) is spreadsheets – not XML's sweet spot.
- The pre-eminence of line/page numbers – again, not XML's sweet spot – has to be addressed as it is central to the amendment cycle, and legislatures are all about that amendment cycle. Pumping out the stuff at the end of Bismarck's sausage machine is the easy bit.
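The corpus-versioning point in the list above can be made concrete with a small, loosely git-like sketch (entirely hypothetical, my own illustration): every change to any document advances a single repository-wide version, and a citation pins the whole corpus at a point in time, not just one document.

```python
# A sketch of corpus-level versioning. The class and document names are
# invented for illustration; no real repository design is implied.

class Corpus:
    def __init__(self):
        self.version = 0
        self.docs = {}       # document name -> current text
        self.history = {}    # corpus version -> snapshot of ALL documents

    def commit(self, name, text):
        """Any change to any document advances the one corpus-wide version."""
        self.docs[name] = text
        self.version += 1
        self.history[self.version] = dict(self.docs)
        return self.version

    def at(self, version):
        """Everything as it stood at one corpus-wide point in time."""
        return self.history[version]

c = Corpus()
v1 = c.commit("statute-12", "The fee shall be $10.")
v2 = c.commit("bill-45", "In statute-12, strike '$10' and insert '$15'.")
v3 = c.commit("statute-12", "The fee shall be $15.")
```

The payoff is that `c.at(v2)` answers the question "what did *everything* look like when bill-45 was introduced?", which a per-document version number cannot.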
Now let me end by stressing that all is not doom-and-gloom here! I am just taking the time to lay out the problems as I see them so that my proposed approach for dealing with them in an LEA (Legislative Enterprise Architecture) can be understood with respect to the problems I'm trying to solve.
I'm biting off the part of the problem that is hardest (in my opinion) because it is also where the most value can be extracted from an eDemocracy perspective. One of the mantras in KLISS is that anything you can do in the presence of government, you can do in the absence of government, without regard to walls or clocks. If you were to walk into a visitors' gallery in a state house or parliament, you would have great access to what is going on as it happens. You would be able to see the documents in play, watch the motions as they occur, listen to the debate, read the text of the proposed amendments, watch the votes, watch the testimony as it happens...The goal of LEA (Legislative Enterprise Architecture) is to make this degree of access available without regard to your location. Simply put, the goal is to allow legislatures/parliaments to be as "live" and as "real-time" with their publication of information as they want to be....But I'm getting ahead of myself again. More on that later.
Next up is reason number 3 why the standard XML model is not directly applicable in legislatures/parliaments. Namely, the critical nature of fidelity with historically produced renderings of legislative artifacts.
P.S. If you are unfamiliar with legislative/parliamentary procedure and would like a one-picture overview of what goes on inside one, this picture, based on the US Federal Government workflow, is a good example of the genre.
Sunday, May 30, 2010
XML in legislature/parliament environments : The centrality of line/page number citation in amendment cycles
Last time I talked about KLISS, I listed 7 reasons why the standard analysis of how XML can/should be used in legislatures/parliaments is simple, elegant and (I would argue), wrong.
The first reason I listed in support of that strong assertion is the centrality of line/page number citation in amendment cycles. That is the topic I want to address in this post.
The standard XML model goes something like this:
1 - find out what the publications/outputs are (be they paper, CD-ROMs, websites, e-books, whatever)
2 - classify the outputs from a content perspective i.e. bills are different from journals are different from...
3 - create hierarchical schemas that capture the logical structure of the outputs. Separate out the logical structure from the "accidents" of the rendering you are looking at. I.e. separate the content from the presentation of the content. Things like font, line breaks, page breaks, list ornamentations, footnote locations etc.
4 - figure out how to programmatically layer on the presentation information using stylesheet technologies, bespoke rendering algorithms etc.
5 - Create author/edit workflows that concentrate on the logical view of the data. Leave as many aspects of presentation to be applied automatically in the final publication production systems as possible.
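The five steps above can be sketched in miniature. This is a deliberately tiny illustration of my own (the element names and rendering are invented): the stored document carries only logical structure, and presentation details such as capitalization, prefixes and line breaks are layered on by a rendering function at output time.

```python
# A minimal sketch of the standard model: purely logical markup in,
# presentation layered on at render time. Element names are invented.
import xml.etree.ElementTree as ET

logical = ET.fromstring(
    "<bill><title>An Act concerning water rights</title>"
    "<section num='1'>Definitions.</section>"
    "<section num='2'>The board may issue permits.</section></bill>"
)

def render_text(bill):
    """One of many possible renderings derivable from the same logical model."""
    lines = [bill.findtext("title").upper()]          # presentation decision
    for sec in bill.iter("section"):
        lines.append(f"Sec. {sec.get('num')}. {sec.text}")  # another one
    return "\n".join(lines)

print(render_text(logical))
```

Swapping `render_text` for a different function yields a different product from the same content, which is exactly the promise of step 4 and step 5.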
If all goes according to plan applying this standard model, you end up in a very happy place. Your content is independent of any one output format. You can create new renderings (which creates new products) from existing content cheaply. You can search through your own content at a rich semantic level. You can re-combine your content in a myriad of different ways to create even more new products from your content. If you have done a good job of making your logical models semantically rich, you can even automate the extraction of content so well that you basically get new products from your content "for free" in terms of ongoing effort...
Now that semantically rich, machine-readable data is clearly the next big thing on the internet in terms of content dissemination, you end up being able to pump out RDF triples or RDFa or microformats or good old CSV easily. You get to pop a RESTian interface in front of your repository to allow consumers to self-serve out of it if that is what you want to do.
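The "new products for free" claim is easy to demonstrate: once content is semantically tagged, other formats fall out mechanically. The sketch below uses an invented journal schema of my own (no real legislative vocabulary is implied) to pump a vote record out as good old CSV.

```python
# A sketch of deriving CSV from semantically tagged content.
# The <journal>/<vote> schema here is invented for illustration.
import csv, io
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<journal>"
    "<vote bill='HB 2001' result='passed' yes='63' no='37'/>"
    "<vote bill='SB 14' result='failed' yes='12' no='88'/>"
    "</journal>"
)

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["bill", "result", "yes", "no"])
for v in doc.iter("vote"):
    writer.writerow([v.get("bill"), v.get("result"), v.get("yes"), v.get("no")])

print(buf.getvalue())
```

The same loop pointed at an RDF or JSON serializer gives yet another product, with no change to the stored content.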
When this works, it is truly a thing of beauty. The value proposition is compelling. However, the history of the world is littered with examples - all the way back to the SGML days - of where it wasn't quite so simple in practice.
In order to leverage the benefits you really need to think matters through to another level of detail in all but the most trivial of publishing enterprises. Sadly, many historical XML initiatives (and SGML initiatives before that) in legislatures/parliaments have not gone to that extra level of analysis before breaking out XML editors and XSLT and cutting loose building systems.
Line/page numbers are a classic example of the extra level of detail I am talking about. Here are some examples:
- An example of a floor amendment in the Illinois General Assembly
- An example of an amendatory bill from the Irish Parliament.
- An example of an amendatory bill from the British Parliament.
- An example of a committee report from the US Senate
Note how important - I would argue central - line/page numbers are to what is going on here at a business level. In the standard XML model discussed above, line/page numbers are throwaway artifacts of the publishing back-end. They are not important and certainly should not be part of the logical data model.
But look at the problem from the perspective of the elected representatives, the drafting attorneys, the chamber clerks, the lobbyists,...The line/page numbers are absolutely *central* to the business process of managing how the law at time T becomes the law at time T+1 (as discussed here). It is that process - that change process - that is central to what a legislature/parliament is all about.
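A typical floor amendment reads like "on page 3, line 3, strike 'shall' and insert 'may'", and it only makes sense against one specific rendering. The sketch below is my own illustration (the bill text is invented): note that the instruction leaves the *other* occurrence of "shall", on line 1, untouched, which is precisely the disambiguation work the line citation is doing.

```python
# A sketch of applying a page/line-cited amendment to rendered bill text.
# The rendered text is invented for illustration.
rendered_page_3 = [                       # line-numbered text of "page 3"
    "1  Section 7. The department shall",
    "2  adopt rules no later than",
    "3  January 1, and shall report",
]

def amend(page, line_no, delete, insert):
    """Apply 'on line L, strike X and insert Y' to one rendered page."""
    i = line_no - 1                       # line numbers are 1-based
    page = list(page)                     # leave the input rendering intact
    page[i] = page[i].replace(delete, insert, 1)
    return page

# "On page 3, line 3, strike 'shall' and insert 'may'"
amended = amend(rendered_page_3, 3, "shall", "may")
```

The fragility is obvious: the instruction is meaningless against a purely logical model with no lines, and wrong against any rendering whose line breaks fall differently.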
I could write a book about this (and someday I probably will) but for now, I want to end with some points - each of which is deserving of its own detailed explication:
- Bismarck once said that law and sausages are similar in that you really don't want to know how they are made. If value is to be derived from computer systems in legislatures/parliaments, it is necessary to get into the sausage machine and understand what is really going on inside.
What you will find there is a structured and rigorous process (albeit a very complex one), but the bit that is truly structured is the *change regimen for documents* - not the logical structure of the documents themselves. Granted, when the law making process is finished and out pops a new statute or a new act, the document generally has an identifiable logical structure and generally no longer has line/page numbers. However, it spent most of its life up to that point "un-structured" from a classical XML perspective. If you are going to derive value from XML inside the legislature/parliament, as opposed to downstream from it, you need to embrace that fact in my opinion.
- On the face of it, it should be a simple matter to throw line/page numbers into the data model, right? It's just tags, right? Sadly, no.
- Firstly, amendment structures have a nasty habit of overlapping logical structures. I.e. the amendment starts halfway through one paragraph and extends to midway through the second bullet point...This is hard to model inside XML (or SGML) as they are both rooted (pun intended) in the notion of one dominant, directed acyclic graph inside which all the content lives. Also, most XML validation techniques are ultimately based on Chomsky-esque production grammars that, again, have the concept of a dominant hierarchical decomposition at their core.
- Secondly - this one is a doozy - line/page numbers only come into existence once a document is rendered. This creates a deep complexity because now part of what you need to manage in your data model only comes into existence when the data model is processed through a rendering algorithm.
- Thirdly - this one is the real kicker - line/page numbers are the result of applying hyphenation and justification (H+J) algorithms. Every word processor, every XML editor, every web browser, every DTP package, every typesetting system, every XSL:FO implementation, every DSSSL implementation on the planet that I know of has its own way of doing it. You are pretty much guaranteed to get different results when you switch H+J algorithms.
Moreover, the result of the H+J is influenced by the fonts you have installed, the printer your computer has as its default printer, the version of PostScript being used...
- I have heard of drafting attorneys saying "I did not spend years of my life and X dollars in law school just to type in line and page numbers". The implications of not addressing line/page number issues can be enormous for the value proposition.
- Sadly, not only are H+J algorithms very different in different systems, they are often proprietary and essentially beyond human comprehension. Take Microsoft Word, for example: how many people do you think completely understand the complex business logic it applies in deciding how to render its pages and thus where the line/page numbers fall? The variables are numerous. The state space, huge.
- Searching for hyphenation in patent databases produces a large number of hits.
- I have seen folks pat themselves on the back because they have XML in their database and XML in their authoring system and therefore they *own* their own content and destiny. I would argue that unless you also own the algorithms that produced the rendering of that content, then you don't actually own your own data - at least not in legislative/parliamentary environments. If I cannot be sure that when I re-render this document in 10 years' time I will get the same result, all the way down to the line/page numbers critical to the business process, do I really own my own destiny?
- A cherished tenet of the standard XML model is that content and presentation can and should be separated. In legislatures/parliaments it is vital that they are not separated. Many classically trained XML technologists find that deeply disturbing.
- An excellent way of locking down line and page numbers of course is to render to a paginated form such as PDF. This works fine on the output side of the legislature/parliament but fails inside it because legislative documents are iterated through amendment cycles. The line/page numbers output on iteration one are the input to the second amendment cycle which produces a new set of line/page numbers...
- I find myself frustrated at times when I hear folks talk about standardizing data formats for legislative materials, as they often fly by the centrality of line/page numbers and go straight to purely logical models. It is particularly frustrating from an eDemocracy perspective because to do that right, in my opinion, you want to provide access to the work-in-progress of a legislature/parliament: bills as they are being worked in Bismarck's sausage machine. Having deep electronic access to the laws after they are laws is great and good, but wouldn't it be better for eDemocracy if we had access to them before they were baked?
- You might be thinking that the solution to this whole problem is to do away with line/page number-based amendment cycles completely. I do not disagree. However, in many jurisdictions we are dealing with incredibly long-standing, incredibly historic, tried-and-trusted business processes inside what are the most risk-averse and painstakingly detail-oriented institutions I have ever come across, staffed with some of the sharpest and most conscientious minds you will find anywhere. The move away from line/page numbers will not be swift.
- Finally, although I hope I have convinced you that line/page numbers are important and create problems complex and subtle in legislative/parliamentary IT, I don't want to appear negative in my analysis. It is entirely possible to get significant business benefit out of IT - and XML in particular - in legislatures/parliaments. That is what we have done in KLISS and more generally, what is going on in LWB. It is just that you need to bring a lot of knowledge about how legislatures/parliaments actually work into the analysis and design. Implementing straight out of the standard XML playbook isn't going to cut the mustard.
Next time: the complex nature of amendatory actions.