Friday, September 10, 2010

Sustainable data.gov initiatives in 4 easy steps

(Note: this post is largely directed at government agencies and businesses working with government agencies working on data.gov projects.)

At the recent Gov 2.0 Summit Ellen Miller expressed the concern that the data transparency initiative of the Obama administration has stalled. Wandering the halls at the conference, I heard some some assenting voices. Concerns that there is more style than substance. Concerns about the number of data sets, the accuracy of the data, the freshness of the data and so on.

Having said that, I heard significantly more positives than negatives about the entire data.gov project. The enthusiasm was often palpable over the two day event. The vibe I got from most folks there was that this is a journey, not a destination. Openness and transparency are the result of an on-going process, not a one-off act.

These folks know that you cannot simply put up a data dump in some machine readable format and call your openness project "done". At least at the level of the CIOs and CTOs, my belief is that there is a widespread appreciation that there is more to it than that. It is just not that simple. It will take time. It will take work but a good start is half the work and that is what we have right now in my opinion, a good start.

I have been involved in a number of open data initiatives over the years in a variety of countries. I have seen everything from runaway successes to abject failures and everything in between. In this post, I would like to focus on 4 areas that my experiences lead me to believe are critical to convert a good start into a great success story in open data projects.

1 - Put some of your own business processes downstream of your own data feeds


The father of lateral thinking, Edward de Bono was once asked to advise on how best to ensure a factory did not pollute a river. De Bono's solution was brilliantly simple. Ensure that the factory take its clean water *downstream* from the discharge point. This simple device put the factory owners on the receiving end of whatever they were outputting into the river. The application of this concept to data.gov projects is very simple. To ensure that your organization remains focused on the quality of the data it is pushing out, make sure that the internal systems consume it.

That simple feedback loop will likely have a very positive impact on data quality and on data timeliness.

2 - Break out of the paper-oriented publishing mindset


For much of the lifetime of most government agencies, paper has been the primary means of data dissemination. Producing a paper publication is expensive. Fixing a mistake after 100,000 copies have been printed is very expensive. Distribution is time consuming and expensive...

This has resulted – quite understandably – in a deeply ingrained "get it right first time" publishing mentality. The unavoidable by-product of that mindset is latency. You check, you double check, then you check again...all the while the information itself is sliding further and further from freshness. Data that is absolutely perfect - but 6 months too late to be useful - just doesn't cut it in the Internet age of instantaneous publishing.

I am not for a minute suggesting that the solution is to push out bad data. I am however suggesting that the perfect is the enemy of the good here. Publish your data as soon as it is in reasonable shape. Timestamp your data into "builds" so that your customers know what date/time they are looking at with respect to data quality. Leave the previous builds online so that your customers can find out for themselves, what has changed from release to release. Boldly announce on your website that the data is subject to ongoing improvement and correction. Create a quality statement. When errors are found – by your or by your consumers – they can be fixed with very little cost and fixed very quickly. This is what makes the electronic medium utterly different from the paper medium. Actively welcome data fixes. Perhaps provide bug bounties in the same way that Don Knuth does for his books. Harness Linus Torsvald's maxim that "given enough eyeballs all bugs are shallow" to shake bugs out of your data. If you have implemented point 1 above and your are downstream of your own data feeds, you will benefit too!

3 - Make sure you understand the value exchange


Whenever A is doing something that will benefit B but A is taking on the costs, there must be a value exchange for the arrangement to be sustainable. Value exchange comes in many forms:
- An entity a may provide data because it has been mandated to do it. The value exchange here is that the powers-that-be will smile upon entity A.
- An entity A may provide data out of a sense of civic duty. The value exchange here is that A actively wants to do it and receives gratification – internally or from peers - from the activity.
- An entity A may provide data because entity B will return the favor.
- And so on.

One of the great challenges of the public sector all over the world is that inter-agency data exchanges tend to put costs and benefits into different silos of money. If agency A has data that agency B wants, why would agency A spend resources/money doing something that will benefit B? The private sector often has similar value exchange problems that get addressed through internal cross-billing. i.e. entity A sees value in providing data to B because B will "pay" for it, in the internal economy.

If that sort of cross-billing is not practical – and in many public sector environments, it is not – there are a number of alternatives. One is reciprocal point-to-point value exchange. i.e. A does work to provide data to B, but in return B does work to provide data that A wants. Another – and more powerful model in my opinion – is a data pool model. Instead of creating bi-lateral data exchange agreements, all agencies contribute to a "pool" of data in a sort of "give a penny, take a penny" basis. i.e. feel free to take data but be prepared to be asked by the other members of the pool, to provide data too.

In scenarios where citizens or the private sector are the consumers of data, the value exchange is more complex to compute as it involves less tangible concepts like customer satisfaction. Having said that, the Web is a wonderful medium for forming feedback loops. Unlike in the paper world, agencies can cheaply and easily get good intelligence about their data from, for example, an electronic "thumbs up/down" voting system.

The bottom line is that value exchanges come in all shapes and sizes but in order to be sustainable, I believe a data.gov project must know what the value exchange is if it is going to be sustainable.

4 - Understand the Government to Citizen dividend that comes from good agency-to-agency data exchange


In the last point, I have purposely emphasized the agency-to-agency side of data.gov projects. Some may find that odd. Surely the emphasis of data.gov projects should be openness and transparency and service to citizens?

I could not agree more, but I believe that the best way to service citizens and businesses alike, is to make sure that agency-to-agency data exchange functions effectively too.

Think of it this way: how many forms have you filled in with information you previously provided to some other agency? We all know we need an answer for the "we have a form for that" phenomenon but I believe the right answer is oftentimes not "we have an app for that" but rather "there is no app, and no form, for that data because it is no longer necessary for you to send it to us at all".

Remember: The best Government form is the form that disappears in a puff of logic caused by good agency-to-agency data integration.

In summary


1 - Put yourself downstream of your own data.gov initiative
2 - Break out of the paper-oriented "it must be perfect" mindset
3 - Make sure you understand the value exchanges. If you cannot identify one that makes sense, the initiative will most likely flounder at some point and probably sooner than you imagine
4 – When government agencies get their agency-to-agency data exchange house in order, better government-to-citizen and government-to-business data exchange, is the result.

No comments: