Sean McGrath

Friday, February 19, 2016

It's obvious really

Nothing is more deserving of questioning, than an obvious conclusion.

Thursday, February 11, 2016

Fixity, Vellums and the curious case of the rotting bits

So, vellum may be on the way out in the UK Parliament

Many deep and thorny issues here.

Thought experiment: In your hand you have a 40 page document. On your computer screen you have an electronic document open in a word processor. You have been told they they are "the same document".

How can you tell? What does it even mean to say that they are the "same"? Does it matter if there is no sure-fire way to prove it?

Let us start at the end of that list of questions and work backwards. Does it matter that there is no sure-fire way to prove it? Most of the time, it does not matter if you cannot prove they are the same. Over the years since the computerization of documents, we have devised various techniques for managing the risks of differences arising between what the computer says and what the sheets of paper say. However, when it does matter it tends to matter a whole bunch. Examples are domains such as legal documents, mission critical procedure manuals, that sort of thing.

A very common way of mitigating the risk of differences arising between paper and electronic texts is to declare the electronic version to be the real, authentic document and treat the paper as a "best efforts" copy or rendering of the authentic document. If the printing messes up and some text gets chopped off the right hand margin we think "No big deal". Annoying but not cataclysmic. The electronic copy is the real one and we can just go back to the source any time we want...

...Yes, as long as the electronic source is not, itself, an ambiguous idea. Again, we have developed practices to mitigate this risk. If I author a document in, say, FrameMaker but export RTF to send to you, the FrameMaker is considered the real, authentic electronic file. If anything happens to the RTF content - either as it is exported, transmitted or imported by you into some other application - we refer back to the original electronic file which is the FrameMaker incarnation....

...If we still have it up to date. The problem is that we do not print FrameMaker or Word or Quark Express. We tend to print "frozen" renderings of these things. Things like postscript and PDF. On the way to paper, it is not uncommon for fixes to be required just prior to the creation of very expensive printing plates. If something small needs to be fixed, it will probably get fixed at 2 a.m. in the postscript or PDF file...which is now out of sync with the original FrameMaker file...

...Which, come to think of it, might not have been as clear cut an authoritative source as I made it out to be. It is not uncommon for applications like FrameMaker, Adobe CS2, Quark etc. to be used downstream of an authoring process that utilizes Microsoft Word or Corel Wordperfect or OpenOffice or some Webb-y browser plug-in.

If (i.e. when) errors are found in document proofs the upstream documents should really be fixed and the DTP versions re-constituted. Otherwise, the source documents get out of sync with the paper copy very quickly indeed. Worse, the differences between the source documents and the paper copy may be in the form of small errors. A period missing here, a dollar sign there...Small enough to be very hard to spot with proofreading but large enough to be very serious in for example, legal documents.

What to do? Well, we need to freeze-dry "cuts" of these documents to remove all ambiguity and then institute rigorous policies and procedures to ensure that changes are properly reflected everywhere along the document production toolchain...

...Which, these days, can be quite a complicated tool chain. For example, it is quite likely that web page production is feeding off the content prior to when it goes in to the DTP program. So, when a fix is needed you need to chase down all copies made and fix them, preferably all at the same time. Oh, and tools for editing (yes, I did say "editing") PDF documents are becoming more and more commonplace. So much for simple freeze-drying of digital content...

...Wait. This is getting too complex and has too many points of human intervention which introduces costs and the potential for human error. Best to simplify the tool chain...

..Yes, that would be nice but unfortunately DTP packages do useful things - things that word processors do not do. Word processors do useful things - things that Webby-plugins cannot do. Layout formats like PDF, Postscript, SVG do things that authoring formats like ODF do not do. HTML can be both a layout format and an authoring format but only at the expense of leaving behind a lot of very useful stuff for large document publication...

...So, where does that leave us? Well, behind that beautifully produced 40 pager you hold in you hand we have, roughly speaking, umpteen different electronic variations of it. Each of which may or may not be "the same" as the paper in a variety of subtle and (to me anyway) interesting ways...

...We have a problem. Consider this, search around the Web for companies offering data capture services from paper. Lots to choose from right? Now where do you think all that paper is coming from? Old, old content that pre-dates computerization? No. Some of it falls into that category but only some. Filled in, paper based forms that do not exit in computers at all? Yes, there is a bunch of that. But a lot of it is content that came into existence purely electronically, at some stage over the last 30 years. It passed through some complicated tool chain and workflow on its way to paper. The paper then became the only reliable incarnation of the content. Any electronic versions of it that the owners could dig out were found to be potentially flawed in some way... (cannot read them, cannot trust them to be the same as the paper etc. etc.)

...Thus the need to capture the content from paper. An exercise which, even with rigorous QA to say, 99.998% accuracy, is guaranteed to introduce its own set of errors...

It is messy, right? Well, get rid of the paper/vellum's, get rid of the witnessing/signing formalisms that have evolved over centuries, and I think you create an even bigger problem.

Solution? Tamper evident digital audit trails.

Friday, February 05, 2016

The biggest IT changes in the last 5 years: The re-emergence of data flow design

My first exposure to data flow as an IT design paradigm came around 1983/4 in the form of Myers and Constantine's work on "Structured Design" which dates from 1974.

I remember at the time finding the idea really appealing but yet, the forces at work in the industry and in academic research pulled mainstream IT design towards non-flow-centric paradigms. Examples include Stepwise Decomposition/Structured Programming (e.g Dijkstra), Object Oriented Design e.g. (Booch), Relational Data modelling (e.g. Codd).

Over the years, I have seen pockets of mainstream IT design terms emerging that have data flow-like ideas in them. Some recent relevant terms would be Complex Event Processing and stream processing.

Many key dataflow ideas are built into Unix. Yet creating designs leveraging line-oriented data formats, piped through software components, local-and-remote, everything from good old 'cat' to GNU Parallels and everything in between, has never, to my knowledge, been given a design name reflective of just how incredibly powerful and commonplace it is.

Things are changing I believe, thanks to cloud computing and multi-core parallel computing in general. Amazon AWS pipeline, Google Dataflow, Google Tensorflow are good examples. Also, bubbling away under the radar are things like FBP (Flow Based Programming), buzz around Elixer and similar such as shared-nothing architectures.

A single phrase is likely to emerge soon I think. Many "grey beards" from JSD (Jackson Stuctured Design), to IBM MQSeries (asynch messaging), to Ericsson's AXE-10 Erlang engineers, to Unix pipeline fans, will do some head-scratching of the "Hey, we were doing this 30 years ago!" variety.

So it goes.

Personally, I am very excited to see dataflow re-emerge to mainstream. I naturally lean towards thinking in terms of dataflow anyway. I can only benefit from all the cool new tools/techniques that come with mainstreaming of any IT concept.

Thursday, February 04, 2016

The 'in', 'on' and 'with' questions of software development

I remember when the all important question for a software dev person looking at a software component/application was "What is it written in?"

Soon after that, a second question became very important "What does it run on?"

Nowadays, there is a third, really important question, "What is it build/deployed with?"

"In" - the programming language of the component/app itself
"On" - the run-time-integration points e.g. OS, RDB, Browser, Logging
"With" - the dev/ops tool chain eg. source code control, build, regression, deploy etc.

In general, we tend to underestimate the time, cost and complexity of all three :-) However, the "With" category is the toughest to manage as it is, by definition, scaffolding used as part of the creation process. Not part of the final creation.

Tuesday, February 02, 2016

Blockchain this, blockchain that...

It is fun watching all the digital chatter about blockchain at the moment. There is wild stuff at both ends of the spectrum. I.e. "It is rubbish. Will never fly. All hype. Nothing new here. Forget about it." on one end and "Sliced bread has finally met its match! Lets appoint a CBO (Chief Blockchain Officer)" on the other.

Here is the really important bit I think: The blockchain shines a light on an interesting part of the Noosphere. The place where trust in information is something than can be established without needing a central authority.

That's it. Everything about how consensus algorithms work, how long they take to run, how computationally expensive they are, are all secondary and the S curve will out (http://en.wikipedia.org/wiki/Innovation). I.e. that implementation stuff will get better and better.

Unless of course, the proves to be some hard limit imposed by information theory that cannot be innovated around e.g. something analogous to the CAP Theorem or Entropy Rate Theorem or some such.

To my knowledge, no such fundamental limits are on the table at this point. Thus the innovators are free to have a go and that is what they will do.

The nearest thing to a hard limit that I can see on the horizon is the extent to which the "rules" found in the world of contracts/legislation/regulation can be implemented in "rules" that machines can work with. This is not so much an issue for the trust concepts of Blockchain as it is for the follow-on concept of Smart Contracts.

Tuesday, January 26, 2016

The biggest IT changes in the last 5 years: The death-throes of backup-and-delete based designs

One of the major drivers in application design is infrastructure economics i.e. the costs - both in capex an opex terms - of things like RAM, non-volatile storage, compute power, bandwidth, fault tolerance etc.

These economic factors have changed utterly in the 35 years I have been involved in IT, but we still have a strong legacy of designs/architectures/patterns that are ill suited to the new economics of IT.

Many sacred cows of application design such as run-time efficiencies of compiled code versus interpreted code or the consistency guarantees of ACID transactions, can be traced back to the days when CPU cycles were costly. When RAM was measured in dollars per kilobyte and when storage was measured in dollars per megabyte.

My favorite example of a deeply held paradigm which I believe has little or no economic basis today is the concept of designs that only keep a certain amount of data in online form, dispatching the rest, at periodic intervals, to offline forms e.g. tape, disc that require "restore" operations to get them back into usable form.

I have no problem with the concept of backups:-) My problem is with the concept of designs that only keep, say, 1 years worth of data online. This made a lot of sense when storage was expensive because the opex costs of manual retrieval were smaller than the opex costs of keeping everything online.

I think of these designs as backup-and-delete designs. My earliest exposure to such a design was on an IBM PC with twin 5 and 1/4 inch floppy disk drives. An accounting application ran from Drive A. The accounting data file was in Drive B. At each period-end, the accounting system rolled forward ledger balances and then - after a backup floppy was created - deleted the individual transactions on Drive B. That was about 1984 or so.

As organizations identified value in their "old data" -for regulatory reporting or training or predictive analytics, designs appeared to extract the value from the "old" data. This lead to a flurry of activity around data warehousing, ETL (extract, transform, load), business intelligence dashboards etc.

My view is that these ETL-based designs are a transitionary period. Designers in their twenties working today, steeped as they are in the new economics of IT, are much more likely to create designs that eschew the concept of ever deleting anything. Why would you when online storage (local disk or remote disk, is so cheap?) and there is alway the possibility of latent residual value in the "old" data.

Rather than have one design for day-to-day business and another design for business intelligence, regulatory compliance, predictive analytics, why not have one design that addresses all of these? Apart from the feasibility and desirability of this brought about by the new economics of IT, there is another good business reason to do it this way. Simply put, it removes the need for delays in reporting cycles and predictive analytics. Ie. rather than pull all the operational data into a separate repository and crunch it once a quarter or once a month, you can be looking at reports and indicators that are in near-realtime.

I believe that the time is coming when the economic feasibility of near-realtime monitoring and reporting becomes a "must have" in regulated businesses because the regulators will take the view that well run businesses should have it. In the same way that a well run business today, is expected to have low communications latencies between its global offices (thanks to cheap availability of digital communications), they will be expected to have low latency reporting for their management and for the regulators.

We are starting to see terminology form around this space. I am hopelessly biased because we have been creating designs based on never-deleting-anything for many years now. I like the terms "time-based repository" and the term "automatic audit trail". Others like the terms "temporal database", "provenance system", "journal-based repository"...and the new kid on the block (no pun intended!) - the block chain.

The block-chain, when all is said and done, is a design based on never-throwing-away anything *combined* with providing a trust-free mechanism for observers of the audit-trail to be able to have confidence in what they see.

There is lots of hype at present around block chain and with hype comes the inevitable "silver bullet" phase where all sorts of problems not really suited to the block-chain paradigm are shoe-horned into it because it is the new thing.

When the smoke clears around block chain - which it will - I believe we will see many interesting application designs emerge which break away completely from the backup-and-delete models of a previous economic era.

Monday, January 25, 2016

The biggest IT changes in the last 5 years: Multi-gadget User Interfaces

In the early days of mobile computing, the dominant vision was to get your application suite on "any device, at any time". Single purpose devices such as SMS messengers, email-only messengers faded from popularity, largely replaced by mobile gadgets that could, at least in principle, do every thing that a good old fashioned desktop computer could do.

Operating system visions started to pop up everywhere aimed at enabling a single user experience across a suite of productivity applications, regardless of form-factor, weight etc.

Things (as ever!) have turned out a little differently. Particular form factors e.g. smart phone, tend to be used as the *primary* device for a subset of the users full application suite. Moreover, many users like to use multiple form-factors *at the same time*.

Some examples from my own experiences. I can do e-mail on my phone but I choose to do it on my desktop most of the time. I will do weird things like e-mail myself when on-the-road, using a basic e-mail sender, to essentially put myself in my own in-box. (Incidentally, my in-box is my main daily GTD focus. I can make notes to myself on my desktop but I tend to accumulate notes on my smart phone. I keep my note-taker-app open on the phone even when I am at the desktop computer and often pick it up to make notes.

I can watch Youtube videos on my desktop but tend to queue up videos instead and then pick them off one-by-one from my smart-phone, trying to fit as many of them into "down time" as I can. Ditto with podcasts. I have a TV that has all sorts of "desktop PC" aspects from web browsers to social media clients but I don't use any of it. I prefer to use my smartphone (sometimes my tablet) while in couch-potato mode and will often multi-task my attention between the smartphone/tablet and the TV. I find it increasingly annoying to have to sit through advertizing breaks on TV and automatically flick to smartphone/tablet for ad breaks.

I suspect there is a growing trend towards a suite of modalities (smartphone, tablet, smart TV, smart car) and a suite of applications that in practical use, have smaller functionality overlaps than the "any device, at any time" vision of the early days would have predicted. A second, related trend is increasingly common use-cases where users are wielding multiple devices *at the same time* to achieve tasks.

Each of us in this digital world, is becoming a mini-cloud of computing power hooked together over a common WIFI hub or a bluetooth connection or a physical wire. As we move from home to car to train to office and back again, we reconfigure our own little mini-cloud to get things done. The trend towards smartphones becoming remote controls for all sorts of other digital gadgets is accelerating this.

I suspect that the inevitable result of all of this is that application developers will increasingly have to factor in the idea that the "user interface" may be a hybrid mosaic of gadgets, rather than any one gadget. With some gadgets being the primary for certain functionality.