Friday, February 05, 2016

The biggest IT changes in the last 5 years: The re-emergence of data flow design

My first exposure to data flow as an IT design paradigm came around 1983/4, in the form of Stevens, Myers and Constantine's work on "Structured Design", which dates from 1974.


I remember finding the idea really appealing at the time, and yet the forces at work in the industry and in academic research pulled mainstream IT design towards non-flow-centric paradigms. Examples include Stepwise Decomposition/Structured Programming (e.g. Dijkstra), Object Oriented Design (e.g. Booch) and Relational Data Modelling (e.g. Codd).



Over the years, I have seen pockets of data flow-like ideas emerge in mainstream IT design terminology. Recent examples would be Complex Event Processing and stream processing.

Many key dataflow ideas are built into Unix. Yet creating designs by piping line-oriented data formats through software components, local and remote, everything from good old 'cat' to GNU Parallel and everything in between, has never, to my knowledge, been given a design name reflective of just how incredibly powerful and commonplace it is.
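To make the pipe-and-filter idea concrete, here is a minimal sketch in Python (just for illustration; the field positions and the "ERROR" filter are invented). Each stage is a generator that consumes lines from upstream and yields lines downstream, much like a process in a Unix pipeline.

```python
import sys

def read_lines(stream):
    """Source stage: yield raw lines from any file-like object."""
    for line in stream:
        yield line.rstrip("\n")

def grep(lines, needle):
    """Filter stage: pass through only lines containing `needle`."""
    for line in lines:
        if needle in line:
            yield line

def cut(lines, field, sep="\t"):
    """Transform stage: project a single delimited field."""
    for line in lines:
        parts = line.split(sep)
        if field < len(parts):
            yield parts[field]

if __name__ == "__main__":
    # Roughly equivalent to: cat - | grep ERROR | cut -f3
    pipeline = cut(grep(read_lines(sys.stdin), "ERROR"), 2)
    for value in pipeline:
        print(value)
```

Because each stage pulls one line at a time, the whole pipeline runs in constant memory regardless of input size, which is exactly the property that makes the Unix style so durable.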

Things are changing, I believe, thanks to cloud computing and multi-core parallel computing in general. AWS Data Pipeline, Google Dataflow and Google TensorFlow are good examples. Also bubbling away under the radar are things like FBP (Flow-Based Programming), the buzz around Elixir, and shared-nothing architectures in general.

A single phrase is likely to emerge soon, I think. Many "grey beards", from JSD (Jackson Structured Design), to IBM MQSeries (asynchronous messaging), to Ericsson's AXE-10 Erlang engineers, to Unix pipeline fans, will do some head-scratching of the "Hey, we were doing this 30 years ago!" variety.

So it goes.

Personally, I am very excited to see dataflow re-emerge into the mainstream. I naturally lean towards thinking in terms of dataflow anyway, and I can only benefit from all the cool new tools/techniques that come with the mainstreaming of any IT concept.

Thursday, February 04, 2016

The 'in', 'on' and 'with' questions of software development

I remember when the all-important question for a software dev person looking at a software component/application was "What is it written in?"

Soon after that, a second question became very important: "What does it run on?"

Nowadays, there is a third, really important question: "What is it built/deployed with?"

"In" - the programming language of the component/app itself
"On" - the run-time-integration points e.g. OS, RDB, Browser, Logging
"With" - the dev/ops tool chain eg. source code control, build, regression, deploy etc.

In general, we tend to underestimate the time, cost and complexity of all three :-) However, the "With" category is the toughest to manage as it is, by definition, scaffolding used as part of the creation process. Not part of the final creation.

Tuesday, February 02, 2016

Blockchain this, blockchain that...

It is fun watching all the digital chatter about blockchain at the moment. There is wild stuff at both ends of the spectrum. I.e. "It is rubbish. Will never fly. All hype. Nothing new here. Forget about it." on one end and "Sliced bread has finally met its match! Let's appoint a CBO (Chief Blockchain Officer)" on the other.

Here is the really important bit, I think: the blockchain shines a light on an interesting part of the Noosphere, the place where trust in information is something that can be established without needing a central authority.

That's it. Everything about how consensus algorithms work, how long they take to run and how computationally expensive they are is secondary, and the S-curve will out (http://en.wikipedia.org/wiki/Innovation). I.e. that implementation stuff will get better and better.

Unless, of course, there proves to be some hard limit imposed by information theory that cannot be innovated around, e.g. something analogous to the CAP Theorem or the entropy rate theorem or some such.

To my knowledge, no such fundamental limits are on the table at this point.  Thus the innovators are free to have a go and that is what they will do.

The nearest thing to a hard limit that I can see on the horizon is the extent to which the "rules" found in the world of contracts/legislation/regulation can be implemented as "rules" that machines can work with. This is not so much an issue for the trust concepts of blockchain as it is for the follow-on concept of Smart Contracts.





Tuesday, January 26, 2016

The biggest IT changes in the last 5 years: The death-throes of backup-and-delete based designs

One of the major drivers in application design is infrastructure economics, i.e. the costs - in both capex and opex terms - of things like RAM, non-volatile storage, compute power, bandwidth, fault tolerance etc.

These economic factors have changed utterly in the 35 years I have been involved in IT, but we still have a strong legacy of designs/architectures/patterns that are ill suited to the new economics of IT.

Many sacred cows of application design, such as the run-time efficiency of compiled code versus interpreted code or the consistency guarantees of ACID transactions, can be traced back to the days when CPU cycles were costly, when RAM was measured in dollars per kilobyte and when storage was measured in dollars per megabyte.

My favorite example of a deeply held paradigm which I believe has little or no economic basis today is the concept of designs that only keep a certain amount of data in online form, dispatching the rest, at periodic intervals, to offline forms, e.g. tape or disc, that require "restore" operations to get the data back into usable form.

I have no problem with the concept of backups :-) My problem is with the concept of designs that only keep, say, one year's worth of data online. This made a lot of sense when storage was expensive, because the opex costs of manual retrieval were smaller than the opex costs of keeping everything online.

I think of these designs as backup-and-delete designs. My earliest exposure to such a design was on an IBM PC with twin 5¼-inch floppy disk drives. An accounting application ran from Drive A. The accounting data file was on Drive B. At each period-end, the accounting system rolled forward the ledger balances and then - after a backup floppy was created - deleted the individual transactions on Drive B. That was about 1984 or so.


As organizations identified value in their "old" data - for regulatory reporting, training or predictive analytics - designs appeared to extract that value. This led to a flurry of activity around data warehousing, ETL (extract, transform, load), business intelligence dashboards etc.

My view is that these ETL-based designs are a transitional phase. Designers in their twenties working today, steeped as they are in the new economics of IT, are much more likely to create designs that eschew the concept of ever deleting anything. Why would you, when online storage (local disk or remote disk) is so cheap and there is always the possibility of latent residual value in the "old" data?

Rather than have one design for day-to-day business and another design for business intelligence, regulatory compliance and predictive analytics, why not have one design that addresses all of these? Apart from the feasibility and desirability of this brought about by the new economics of IT, there is another good business reason to do it this way. Simply put, it removes the need for delays in reporting cycles and predictive analytics. I.e. rather than pull all the operational data into a separate repository and crunch it once a quarter or once a month, you can be looking at reports and indicators in near-realtime.
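As a toy illustration of one design serving both the day-to-day view and the reporting view, here is a sketch in Python (all names invented) of an append-only ledger: transactions are never deleted or rolled up, and both the current balances and the per-month activity are derived on demand from the same event history.

```python
from collections import defaultdict
from datetime import datetime, timezone

events = []  # the single source of truth; append-only, never deleted

def record(account, amount):
    """Append a transaction; nothing is ever updated or removed."""
    events.append({
        "ts": datetime.now(timezone.utc),
        "account": account,
        "amount": amount,
    })

def current_balances():
    """Operational view: fold the full history into balances."""
    balances = defaultdict(float)
    for e in events:
        balances[e["account"]] += e["amount"]
    return dict(balances)

def monthly_activity(year, month):
    """Reporting view: derived from the same events, in near-realtime."""
    return [e for e in events
            if e["ts"].year == year and e["ts"].month == month]

record("cash", 1000.0)
record("cash", -250.0)
print(current_balances())              # {'cash': 750.0}
print(len(monthly_activity(2016, 2)))  # activity for a given period
```

Obviously a real system would use a durable store rather than an in-memory list, but the shape of the design is the point: one history, many derived views.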

I believe that the time is coming when the economic feasibility of near-realtime monitoring and reporting makes it a "must have" in regulated businesses, because the regulators will take the view that well-run businesses should have it. In the same way that a well-run business today is expected to have low communications latencies between its global offices (thanks to the cheap availability of digital communications), businesses will be expected to have low-latency reporting for their management and for the regulators.

We are starting to see terminology form around this space. I am hopelessly biased because we have been creating designs based on never-deleting-anything for many years now. I like the terms "time-based repository" and "automatic audit trail". Others like the terms "temporal database", "provenance system", "journal-based repository"...and the new kid on the block (no pun intended!) - the block chain.

The block-chain, when all is said and done, is a design based on never throwing anything away, *combined* with a trust-free mechanism that allows observers of the audit-trail to have confidence in what they see.
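The "confidence in what they see" part is simple enough to sketch. The toy Python below hash-chains an append-only audit trail, so that any after-the-fact edit or deletion breaks verification. It deliberately ignores the distributed consensus side of the block-chain, which is where the real engineering difficulty lives, and all the names are illustrative.

```python
import hashlib
import json

def entry_hash(data, prev):
    """Hash an entry's payload together with the previous entry's hash."""
    payload = json.dumps({"data": data, "prev": prev}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(chain, data):
    """Add an entry that commits to the hash of the entry before it."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    chain.append({"data": data, "prev": prev, "hash": entry_hash(data, prev)})

def verify(chain):
    """Re-walk the chain; any tampering shows up as a broken link."""
    prev = "0" * 64
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(entry["data"], prev):
            return False
        prev = entry["hash"]
    return True

chain = []
append(chain, {"account": "cash", "amount": 1000})
append(chain, {"account": "cash", "amount": -250})
print(verify(chain))                 # True
chain[0]["data"]["amount"] = 9       # quietly rewrite history...
print(verify(chain))                 # ...and verification fails: False
```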

There is lots of hype at present around block chain and with hype comes the inevitable "silver bullet" phase where all sorts of problems not really suited to the block-chain paradigm are shoe-horned into it because it is the new thing.

When the smoke clears around block chain - which it will - I believe we will see many interesting application designs emerge which break away completely from the backup-and-delete models of a previous economic era.

Monday, January 25, 2016

The biggest IT changes in the last 5 years: Multi-gadget User Interfaces

In the early days of mobile computing, the dominant vision was to get your application suite on "any device, at any time". Single-purpose devices such as SMS messengers and email-only messengers faded from popularity, largely replaced by mobile gadgets that could, at least in principle, do everything that a good old-fashioned desktop computer could do.

Operating system visions started to pop up everywhere aimed at enabling a single user experience across a suite of productivity applications, regardless of form-factor, weight etc.

Things (as ever!) have turned out a little differently. Particular form factors, e.g. the smartphone, tend to be used as the *primary* device for a subset of the user's full application suite. Moreover, many users like to use multiple form factors *at the same time*.

Some examples from my own experience. I can do e-mail on my phone but I choose to do it on my desktop most of the time. I will do weird things like e-mail myself when on the road, using a basic e-mail sender, to essentially put myself in my own in-box. (Incidentally, my in-box is my main daily GTD focus.) I can make notes to myself on my desktop but I tend to accumulate notes on my smartphone. I keep my note-taker app open on the phone even when I am at the desktop computer and often pick it up to make notes.

I can watch YouTube videos on my desktop but tend to queue up videos instead and then pick them off one-by-one from my smartphone, trying to fit as many of them into "down time" as I can. Ditto with podcasts. I have a TV that has all sorts of "desktop PC" aspects, from web browsers to social media clients, but I don't use any of it. I prefer to use my smartphone (sometimes my tablet) while in couch-potato mode and will often multi-task my attention between the smartphone/tablet and the TV. I find it increasingly annoying to have to sit through advertising breaks on TV and automatically flick to the smartphone/tablet during ad breaks.

I suspect there is a growing trend towards a suite of modalities (smartphone, tablet, smart TV, smart car) and a suite of applications that, in practical use, have smaller functionality overlaps than the "any device, at any time" vision of the early days would have predicted. A second, related trend is the increasingly common use-case where users wield multiple devices *at the same time* to achieve a task.

Each of us in this digital world is becoming a mini-cloud of computing power hooked together over a common WiFi hub, a Bluetooth connection or a physical wire. As we move from home to car to train to office and back again, we reconfigure our own little mini-cloud to get things done. The trend towards smartphones becoming remote controls for all sorts of other digital gadgets is accelerating this.

I suspect that the inevitable result of all of this is that application developers will increasingly have to factor in the idea that the "user interface" may be a hybrid mosaic of gadgets rather than any one gadget, with some gadgets being the primary for certain functionality.


Tuesday, January 19, 2016

The biggest IT changes in the last 5 years : The fragmentation of notifications

I remember the days of "push". I remember Microsoft CDF (Channel Definition Format). I remember RSS in its original Netscape form and all the forms that followed it. I remember Atom...I remember the feed readers. I remember thinking "This is great. I subscribe to stuff that is all over the place on the web and it comes to me when there are changes. I don't go to it."

But as social media mega-hubs started to emerge (Facebook, Twitter, LinkedIn etc.), this concept of a site/hub-independent notification infrastructure started to fragment.

Today, I find myself drawn into Facebook for Facebook updates, Twitter for Twitter updates, LinkedIn for LinkedIn updates, YouTube for YouTube updates....I am back going to stuff rather than having that stuff come to me.

At some stage here I am going to have to invest the time to find a mega-aggregator that can go and aggregate the already partially aggregated stuff from all the hubs I have a presence on.

Rather than look for updates from the old-school concept of "web-site", our modern day aggregators need to be able to pull updates from hubs like Facebook, Twitter etc. which are themselves, obviously, providing aggregation.

The image this conjures into my mind is of a hierarchical "roll up" where each level of the hierarchy aggregates the content from its children which may themselves be aggregates.
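In code terms, the roll-up is just a recursive walk over a tree of sources. A small Python sketch of the idea (the hub names and items are placeholders, not real APIs):

```python
from datetime import datetime

def leaf(name, items):
    """A leaf source that yields its own items directly."""
    return {"name": name, "items": items, "children": []}

def hub(name, children):
    """An aggregator whose content is whatever its children roll up."""
    return {"name": name, "items": [], "children": children}

def roll_up(node):
    """Recursively gather items from a node and all of its descendants."""
    gathered = list(node["items"])
    for child in node["children"]:
        gathered.extend(roll_up(child))
    return sorted(gathered, key=lambda item: item["published"], reverse=True)

feeds = hub("everything", [
    leaf("blog", [{"title": "New post", "published": datetime(2016, 2, 5)}]),
    hub("social", [
        leaf("twitter", [{"title": "A tweet", "published": datetime(2016, 2, 4)}]),
        leaf("youtube", [{"title": "A video", "published": datetime(2016, 2, 2)}]),
    ]),
])

for item in roll_up(feeds):
    print(item["published"].date(), item["title"])
```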

The hierarchical/recursive picture obviously has a lot of power, but I do wonder if it has the unfortunate side-effect of facilitating the emergence of web-gateway models for hubs. I.e. models in which the resources behind the gateway are not themselves referenceable via URLs. We end up with no option but to "walk" the main nodes to do aggregation.

I remember a quote from Tim Berners-Lee where he said something along the lines of "the great thing about hypertext is that it subverts hierarchy."

Perhaps, the mega-hubs model of the modern web subverts hypertext?


Monday, January 18, 2016

The biggest IT changes in the last 5 years : domain names ain't what they used to be

The scramble for good domain names appears to be a thing of the past. A couple of factors at work I think.

Firstly, there is obviously a limited supply of {foo}.com and {bar}.com. Out of necessity, "subdivided domain names" seem to be getting more and more popular, e.g. {foo}.{hub}.com and {bar}.{hub}.com share the same domain name. So too do {hub}.com/{foo} and {hub}.com/{bar}.
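For what it's worth, the two subdivision styles are easy to tell apart programmatically. A toy Python sketch, using a made-up hub.com, that pulls out the "tenant" name from either style:

```python
from urllib.parse import urlparse

def tenant(url):
    """Return the subdivided name, whether it lives in the host or the path."""
    parsed = urlparse(url)
    host_parts = parsed.hostname.split(".")
    if len(host_parts) > 2:                # subdomain style: foo.hub.com
        return host_parts[0]
    path_parts = [p for p in parsed.path.split("/") if p]
    return path_parts[0] if path_parts else None  # path style: hub.com/foo

print(tenant("https://foo.hub.com/"))      # foo
print(tenant("https://hub.com/bar/page"))  # bar
```

Either way, the tenant ends up with a piece of somebody else's address, not a domain of their own.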

This has worked out well for those who provide a hub-like web presence, e.g. facebook.com, github.com, bandcamp.com etc.

Secondly, browser address bars have morphed into query expressions, often powered by Google under the hood. Even if I am looking for an entity {foo} that I know owns {foo}.com, I will often just type the name into the address bar and let the search engine do the rest.

Extending DNS with new top-level domains like .club etc. only pushes the problem down the road a bit.

I am reminded very much of addresses of locations in the real world. Number 12 Example Avenue may start out as one address, but its owner may decide to sub-divide and rent/sell apartments at that address. Now you have Suite 123, Number 12, Example Avenue....

Nothing new in the world. DNS is like Manhattan. All the horizontal real estate of DNS is taken. The only way to get a piece of it now is to grab a piece of an existing address. DNS has entered its "high rise" era.