Monday, August 16, 2010

More on the KLISS workflow model

Last time in this KLISS series I introduced the KLISS approach to workflow and (hopefully) explained why workflow in legislative environments can get very complex indeed. I mentioned that the complexity can be tamed by zooming in on the fundamental features that all legislative workflows share. This post will concentrate on fleshing that assertion out some more.

Somebody once said that a business document, such as a form, is a workflow snapshotted at a point in time. I really like that idea, but I do not think that a document alone can serve as a snapshot of the workflow except in the simplest of cases. To do that, in my opinion, you need an extra item: a set of pigeon holes.

The pigeon holes I am talking about are not just storage shelves with some sort of alphabetic or thematic sorting system. I am talking about the kind of pigeon holes that have labels on them that indicate what state the documents in each hole are in. Some classic states for documents to be in (in a legislative environment) include:

- Awaiting introduction in the Senate
- Pending engrossment into the Statute
- Bills currently being processed in the Agriculture committee
- etc.

The power of the incredibly simple, time-honored pigeon hole system is too often overlooked in our database-centric digital world. The electronic equivalent of these pigeon holes is, of course, nothing more complex than the concept of a file-system folder. In truth, the electronic pigeon hole is generally more powerful than its physical analog because in the electronic world, folders can trivially contain other folders to any required depth. Moreover, electronic folders can have any required capacity.

Sadly, I have rather a lot of personal experience of how this simple-yet-powerful concept of recursive, expandable folders can be "pooh-poohed" by folks who think that data cannot possibly be considered "managed" unless it is loaded into a database or otherwise constrained in terms of shape and volume. Oftentimes, said folks use the words "database" and "relational database" interchangeably. For such folks, the data model for a "record" is the center of the universe. Insofar as that record has workflow, the workflow is an attribute of the record – not a "place" where the record lives... This record-centric world view is oftentimes the beginning of a slippery slope in legislative informatics where designers find themselves tied up in knots trying to:

  • create enough state variables – fields – in the tables to capture all possible workflow states
  • capture all the business rules for workflow transitions in machine readable form
  • shred the legislative content into pieces (often-times with XML) to fit into the non-recursive, tabular slots provided by relational databases
  • re-assemble the shredded pieces to re-constitute working documents for publication


I do not subscribe to this record-centric model. It works incredibly well when record structures are simple, workflows are finite and record inter-dependencies are few. That is not the world we live in in legislative informatics. Legislative content is messy, hierarchical, time-oriented and often densely interlinked. Relational databases are just not a good fit either for the raw data or for the workflows that work on that raw data. Having said all that, I hasten to point out that ye-olde recursive folder structure on its own is not a perfect fit either. There are two main missing pieces.

Firstly, as I've said before, legislative informatics is all about how content changes over time and the audit trail that allows the passage through time to be accessed on demand. Out-of-the-box recursive file systems do not provide this today. (Aside: those with long memories may remember Digital Equipment Corporation's VMS operating system. It was the last mainstream operating system to transparently version files at the operating system level.)

Secondly, legislative informatics is heavily event-oriented, i.e. when an event happens, entire sets of subsequent events are kicked off, each of which is likely to create more events, which may in turn create more events... Out-of-the-box recursive file systems do not easily provide a way of triggering processing based on file-system transaction events. (Yes, you can do it at a very low level with device driver shenanigans and signals, but it's not for the faint of heart.)

To address these two short-comings of a classic folder structure for use as a workflow substrate, the KLISS model added two extra dimensions.

  • Imagine a system of recursive pigeon holes that starts empty and then remembers all Create/Read/Update/Delete/Lock operations of pigeon holes and of the documents that flow through them
  • Imagine a system of recursive pigeon holes in which each hole carries a complete history of everything that has ever passed through it (including other pigeon holes)
  • Imagine a system of recursive pigeon holes in which each hole can trigger any required data processing at the point where new content arrives into it.
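As a rough illustration of the first two bullets (not KLISS code), the sketch below keeps an append-only log of folder operations and rebuilds any past state by replaying it. The names `FolderLog`, `LogEntry`, `snapshot` and `history`, and the example paths, are all hypothetical, and read and lock operations are left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogEntry:
    revision: int                  # position of this operation in the log
    op: str                        # "create", "update" or "delete"
    path: str                      # e.g. "house/introduced/hb1.txt"
    content: Optional[str] = None  # None for deletes


class FolderLog:
    """Hypothetical append-only record of operations on nested folders."""

    def __init__(self) -> None:
        self.log: List[LogEntry] = []

    def _record(self, op: str, path: str, content: Optional[str] = None) -> int:
        revision = len(self.log) + 1
        self.log.append(LogEntry(revision, op, path, content))
        return revision

    def put(self, path: str, content: str) -> int:
        op = "update" if path in self.snapshot() else "create"
        return self._record(op, path, content)

    def move(self, src: str, dst: str) -> int:
        # A move is logged as a delete from one hole and a create in another.
        content = self.snapshot()[src]
        self._record("delete", src)
        return self._record("create", dst, content)

    def snapshot(self, revision: Optional[int] = None) -> Dict[str, str]:
        """Replay the log (up to `revision`, if given) to rebuild the state."""
        state: Dict[str, str] = {}
        for entry in self.log:
            if revision is not None and entry.revision > revision:
                break
            if entry.op == "delete":
                state.pop(entry.path, None)
            else:
                state[entry.path] = entry.content
        return state

    def history(self, folder: str) -> List[LogEntry]:
        """Everything that ever happened in a hole, nested holes included."""
        prefix = folder.rstrip("/") + "/"
        return [e for e in self.log if e.path.startswith(prefix)]


holes = FolderLog()
holes.put("house/introduced/hb1.txt", "draft 1")
holes.put("house/introduced/hb1.txt", "draft 2")
holes.move("house/introduced/hb1.txt", "house/committees/agriculture/hb1.txt")

print(holes.snapshot(1))  # the repository as it was after the first operation
print([e.op for e in holes.history("house/introduced")])
```

Because nothing is ever overwritten, asking what a pigeon hole looked like at an earlier point is just a replay of the log up to that revision.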


The first two items above are provided by the time-machine that I have previously talked about. The last one is what we call the Active Folder Framework in KLISS. The best way to explain it is perhaps by analogy with a workflow system realized with a good old fashioned set of physical pigeon holes. Consider this example:
    A new bill is introduced in the House. The requested bill draft is acquired from the sponsor (or perhaps legislative council) and placed in the "introduced" pigeon-hole. This event kicks off the creation of an agenda item where the initial fate of the introduced bill will be discussed. That agenda item is lodged in the "pending agenda items" pigeon hole. Later, when the order of business gets to it, items from the "introduced" pigeon hole are taken out and considered. They may go back into that pigeon hole or be moved to pigeon holes specific to particular committees.

KLISS - and more generally the Legislative Enterprise Architecture that underlies it - operates like that. Workflow items - documents - are moved around named folders. Every move is audit-trailed in the time machine. Every time something is changed, events are fired so that down-stream processes can update their internal views of what the pigeon-holes represent. In KLISS all the workflow folders are "active" in the sense that they are not just passive place-holders for work artifacts. Putting something into a folder triggers an event. Taking something out triggers an event, etc. Moreover, the event processors have access to the pigeon-hole structure so that they can create new work artifacts and move them around, thus triggering more events. The event processors can even trigger the creation of new folders and new event processors!
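To make the "active" part concrete, here is a minimal sketch of folders that fire events when items arrive or leave, using the bill-introduction example from above. It is an illustration under assumptions, not the Active Folder Framework itself: the `ActiveFolders` class, its method names, the event tuple shape and the `schedule_agenda_item` handler are all invented for this example.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Event = Tuple[str, str, str]  # (kind, folder, document name)
Handler = Callable[["ActiveFolders", Event], None]


class ActiveFolders:
    """Hypothetical named folders that emit events on every put and take."""

    def __init__(self) -> None:
        self.folders: Dict[str, Dict[str, str]] = defaultdict(dict)
        self.handlers: Dict[Tuple[str, str], List[Handler]] = defaultdict(list)
        self.pending: Deque[Event] = deque()
        self.audit: List[Event] = []  # stand-in for the audit trail

    def on(self, kind: str, folder: str, handler: Handler) -> None:
        self.handlers[(kind, folder)].append(handler)

    def put(self, folder: str, name: str, content: str) -> None:
        self.folders[folder][name] = content
        self._emit(("added", folder, name))

    def take(self, folder: str, name: str) -> str:
        content = self.folders[folder].pop(name)
        self._emit(("removed", folder, name))
        return content

    def move(self, src: str, dst: str, name: str) -> None:
        self.put(dst, name, self.take(src, name))

    def _emit(self, event: Event) -> None:
        self.audit.append(event)
        self.pending.append(event)

    def run(self) -> None:
        """Deliver queued events; handlers may put, take or register more."""
        while self.pending:
            event = self.pending.popleft()
            kind, folder, _ = event
            for handler in list(self.handlers[(kind, folder)]):
                handler(self, event)


def schedule_agenda_item(repo: ActiveFolders, event: Event) -> None:
    # An introduced bill kicks off an agenda item in another pigeon hole.
    _, _, name = event
    repo.put("pending agenda items", f"agenda-for-{name}", f"Consider {name}")


repo = ActiveFolders()
repo.on("added", "introduced", schedule_agenda_item)
repo.put("introduced", "HB-1", "bill draft text")
repo.run()
print(repo.folders["pending agenda items"])
print(repo.audit)
```

Handlers receive the whole folder structure, so a handler could equally call `repo.on(...)` to register a further processor, which mirrors the point that event processors can create new folders and new processors.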

The combination of (a) recursive named folders, (b) time machine audit trail and (c) event propagation covers a tremendous amount of ground. These are the three "pillars" on top of which most of KLISS is built. Internally in Propylon we call them the Three Pillars of Zen, or TPOZ for short.

At a business level, there are some very attractive upshots to this model.

  • The abstraction that the end-users interact with is a very familiar one: files in folders. All the time machine and event propagation machinery is transparent to end-users.
  • Ad-hoc workflows can be very easily accommodated without custom programming. Just create some folders and shunt work through them. The audit-trail will continue to be rigorous and the event-propagation will continue to function even for workflows created on the fly by staff operating under pressure (i.e. the House has just suspended the rules and is now about to do X...)
  • Automation can be added incrementally. i.e. if workflow step X is currently manual, the entire workflow can be put in place now and manual steps can be automated over time. The system as a whole operates on the basis that all active folder processing is asynchronous in nature. i.e. we assume that there is a non-deterministic delay for each workflow action. The net result of automating any given folder in KLISS is simply that its associated workflow steps get faster over time. Nothing else in the system changes.
  • Workflows have autonomic characteristics. For example, an interface to a voting board may malfunction because of a network error. The result would be that an active folder (an automated workflow step) ceases to be active. No problem, simply revert to the manual processing of the electronic voting documents, i.e. fill in the vote forms to create new vote items. Remember: the complete audit trail and event machinery are still working away under the hood. Everything else in the system will continue to function unaffected by the point-failure of one component.

Perhaps the most subtle aspect of the workflow model to grasp is the asynchronous nature of it all. I wrote earlier about naming things with rigid designators in KLISS; that is critical to workflow processing, as is the consistency model. Each active folder processor works to its own concept of time, always referring to content in the system via point-in-time URLs that lock down – snapshot – the entire repository as it was at that moment in time. Events that happen in the repository are queued up for consumption by active folder processors. If a processor is slow or goes offline for an upgrade, no problem, the event messages are queued up to be processed whenever the active folder comes back online.
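The queueing behaviour can be sketched as follows. This is an assumption-laden toy, not the KLISS consistency model: a `(revision, path)` pair stands in for a point-in-time URL, and the `Repository` and `Processor` classes, the path and the vote tallies are all made up for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class Repository:
    """Keeps every revision, so (revision, path) always names the same content."""

    def __init__(self) -> None:
        self.revisions: List[Dict[str, str]] = [{}]  # revision 0 is empty
        self.queue: Deque[Tuple[int, str]] = deque()  # events awaiting a processor

    def put(self, path: str, content: str) -> int:
        state = dict(self.revisions[-1])  # copy the latest state, then change it
        state[path] = content
        self.revisions.append(state)
        revision = len(self.revisions) - 1
        self.queue.append((revision, path))
        return revision

    def read(self, revision: int, path: str) -> str:
        return self.revisions[revision][path]


class Processor:
    """A processor that works through queued events at its own pace."""

    def __init__(self, repo: Repository) -> None:
        self.repo = repo
        self.online = True
        self.seen: List[str] = []

    def drain(self) -> int:
        if not self.online:
            return 0  # events simply stay queued
        handled = 0
        while self.repo.queue:
            revision, path = self.repo.queue.popleft()
            # Read the content as it was when the event happened,
            # not as it is now.
            self.seen.append(self.repo.read(revision, path))
            handled += 1
        return handled


repo = Repository()
worker = Processor(repo)

worker.online = False                       # e.g. down for an upgrade
repo.put("votes/hb1", "yea: 60, nay: 40")
repo.put("votes/hb1", "yea: 61, nay: 39")   # corrected while the worker is away
print(worker.drain())                       # nothing handled while offline

worker.online = True
print(worker.drain())                       # both queued events are handled
print(worker.seen)
```

Because each event carries the revision it refers to, the late-running processor still sees the first tally exactly as it stood at the time, even though the document has since changed.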

In summary, KLISS models workflows by extending the familiar pigeon hole abstraction with temporal and event-oriented dimensions. In terms of formalisms in systems theory, it is perhaps closest to Petri nets in which the "tokens" moving between states are information-carrying objects such as digital bill jackets or votes or explanatory memoranda.

So far, pretty much everything I have discussed in this KLISS series has been server-side focused. The next few posts will be client side focused. Next up: author/edit sub-systems in legislative environments.

2 comments:

tom.ryan said...

Sean, This is a fascinating approach. How would TpoZ compare to RDBMS time-wise in terms of setup and maintenance? Suppose a company has nothing at time n=zero except a choice between TpoZ and RDBMS to handle workflows. Would they require roughly the same amount of training/effort to set up? Also, would the company need to hire the equivalent to DBAs to manage the system long term? (TpoZAs?)

Sean McGrath said...

Tom,
From a setup perspective, I believe the TPOZ model is better than the straight RDBMS model for workflow (of course I'm biased!) because the folder structure that captures the workflow is dynamically created. I.e. there is no locked down set of workflow states and state transitions. There are other systems that are similarly fluid w.r.t. workflow. Lotus Notes for example, comes to mind although it takes a very different approach.

In terms of on-going care and feeding by DBAs or their equivalents, I would say the TPOZ approach and the RDBMS end up about the same.

regards,
Sean