Friday, October 30, 2009

No PDFs?

"No PDFs!" is not the answer. The answer is to make data available in a variety of formats and to explain the relationship between them. In such a world, PDF becomes one format amongst many. That is good.

The problem with a lot of the "just give us XML!" exhortations is that it is easy to oversimplify the issues that arise with legal and regulatory materials.

In an XML legal and regulatory (L&R) world, PDF has its place. So too do Microsoft Word, OpenOffice, TIFF, custom-schema XML, industry-standard-schema XML, JSON, RDFa etc.

The whole point of using XML "upstream" is to allow a multiplicity of transforms downstream. However, care needs to be taken when the documents are critical, as legislation is. It is critical that the normative copy is made explicit. The ideal normative copy is one that can partake in author/edit cycles. However, the normative copy is (typically) the result of a printing process because paper is signed by empowered officers with an ink pen. On the way to paper, there are umpteen points of intervention in the typical paper printing workflow. Camera-ready or direct-to-plate workflows in print shops involve page imposition and all sorts of pre-flight work that can - and often does - render the upstream content suspect with respect to the final printed pages.

Legislative artifacts - especially bills - need very close attention to line/page numbers because of the time-honored way in which legislative amendment cycles work. Most knee-jerk "structured" XML approaches fall flat on their faces as a result. With legislation, line/page numbers are not throw-away artifacts. They are as important as the words themselves...
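To make the line/page point concrete, here is a minimal sketch of an amendment that addresses text by page and line. Everything in it is hypothetical and for illustration only: the `Amendment` class, the `apply_amendment` function, the dictionary-of-pages representation and the sample bill text are assumptions, not a description of any real drafting system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amendment:
    """A hypothetical 'page P, line L, strike X, insert Y' instruction."""
    page: int
    line: int
    strike: str
    insert: str


def apply_amendment(pages, amendment):
    """Return a new paginated text with the amendment applied.

    `pages` maps a page number to its list of lines (line 1 is index 0).
    The input is left unmodified.
    """
    lines = pages[amendment.page]
    index = amendment.line - 1
    target = lines[index]
    if amendment.strike not in target:
        raise ValueError(
            f"page {amendment.page}, line {amendment.line} "
            f"does not contain {amendment.strike!r}"
        )
    updated = target.replace(amendment.strike, amendment.insert, 1)
    new_lines = lines[:index] + [updated] + lines[index + 1:]
    return {**pages, amendment.page: new_lines}


# Illustrative bill text, paginated as it appears on the printed page.
bill = {
    1: ["A BILL", "To regulate widgets."],
    2: ["Section 1.", "The agency shall", "issue rules."],
}
amendment = Amendment(page=2, line=2, strike="shall", insert="may")
amended = apply_amendment(bill, amendment)

# The same words, reflowed by a different rendering of the same content.
reflowed = {
    1: ["A BILL", "To regulate widgets."],
    2: ["Section 1. The agency shall", "issue rules."],
}
try:
    apply_amendment(reflowed, amendment)
    reflow_failed = False
except ValueError:
    reflow_failed = True
```

The words in `bill` and `reflowed` are identical, yet the amendment only makes sense against the first one. That is the sense in which the line breaks are part of the content and cannot be left to whatever a downstream transform happens to produce.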

None of the problems are insurmountable but they involve a lot of care and thought. Throwing out PDF isn't the solution. Simply plugging in a structured XML editor with a custom/industry schema won't work either. The solution involves combining structured XML technology with word processor technology and DTP technology. The key is recognizing that a multiplicity of formats/techniques is required in order to serve the needs of the complete legislative workflow, and to be absolutely clear - every step of the way - what the normative copy of the digital text is.
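One way to picture "many formats, one explicit normative copy" is a manifest of renditions in which exactly one entry is flagged as normative. The sketch below is an assumption-laden illustration: the `Rendition` class, the `normative_copy` function, the placeholder byte strings and the choice of PDF as the flagged format are all hypothetical, and in a real workflow the normative copy may well be the signed paper.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One published format of the same document (hypothetical model)."""
    fmt: str
    content: bytes
    normative: bool = False

    @property
    def sha256(self):
        # A digest lets readers check that a file is the one the manifest names.
        return hashlib.sha256(self.content).hexdigest()


def normative_copy(renditions):
    """Return the single rendition flagged as normative, or raise."""
    flagged = [r for r in renditions if r.normative]
    if len(flagged) != 1:
        raise ValueError(
            f"expected exactly one normative rendition, found {len(flagged)}"
        )
    return flagged[0]


# Placeholder contents; real renditions would be the actual files.
renditions = [
    Rendition("xml", b"<bill>placeholder</bill>"),
    Rendition("pdf", b"placeholder page images", normative=True),
    Rendition("json", b'{"bill": "placeholder"}'),
]

official = normative_copy(renditions)
print(official.fmt, official.sha256)
```

The check is deliberately strict: a manifest with no normative rendition, or with two, is treated as an error rather than something to guess at.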