Featured Post


 These days, I mostly post my tech musings on LinkedIn: https://www.linkedin.com/in/seanmcgrath/

Wednesday, December 21, 2016

The new Cobol, the new Bash

Musing, as I do periodically, on what the Next Big Thing in programming will be, I landed on a new (to me) thought.

One of the original design goals of Cobol was English-like nontechnical readability. As access to NLP and AI continues to improve, I suspect we will see a fresh interest in "executable pseudo-code" approaches to programming languages.

In parallel with this, I think we will see a lot of interest in leveraging NLP/AI from chat-bot CUIs in command-line programming environments such as the venerable bash shell.

It is a short step from there I think, to a read-eval-print loop for an English-like programming environment that is both the programming language and the operating system shell.


Friday, November 25, 2016

Recommender algorithms R Us

Tomorrow, at congregation.ie, my topic is recommender algorithms, although, at first blush, it might look like my topic is the role of augmented reality in hamster consumption.

A Pokemon ate my hamster.

Friday, November 04, 2016

J2EE revisited

The sheer complexity of the JavaScript ecosystem at present is eerily reminiscent of the complexity that caused many folk to balk at J2EE/DCOM back in the day.

Just sayin'.

Wednesday, October 19, 2016

Nameless things within nameless things

So, I got to thinking again about one of my pet notions - names/identifiers - and the unreasonable amount of time IT people spend naming things, then mapping them to other names, then putting the names into categories that are .... named.... aliasing, mapping, binding, bundling, unbundling, currying, lambda-izing, serializing, reifying, templating, substituting, duck-typing, shimming, wrapping...

We do it for all forms of data. We do it for all forms of algorithmic expression. We name everything. We name 'em over and over again. And we keep changing the names as our ideas change, and the task to be accomplished changes, and the state of the data changes....

It gets overwhelming. And when it does, we have a tendency to make matters worse by adding another layer of names. A new data description language. A new DSL. A new pre-processor.

Adding a new layer of names often *feels* like progress. But it often is not, in my experience.

Removing the need for layers of names is one of the great skills in IT, in my opinion. It is so undervalued, the skill doesn't have, um, a name.

I am torn between thinking that this is just *perfect* and thinking it is unfortunate.

Wednesday, October 05, 2016

Semantic CODECs

It occurred to me today that the time-honored mathematical technique of taking a problem you cannot solve and re-formulating it as a problem (perhaps in a completely different domain) that you can solve, is undergoing a sort of Cambrian explosion.

For example, using big data sets and deep learning, machines are getting really good at parsing images of things like cats.

The more general capability is to use a zillion images of things-like-X to classify a new image as either like-an-X or not-like-an-X, for any X you like.

But X is not limited to things we can take pictures of. Images don't have to come from cameras. We can create images from any abstraction we like. All we need is an encoding strategy....a Semantic CODEC if you will.

We seem to be hurtling towards large infrastructure that is specifically optimized for image classification. It follows, I think, that if you can re-cast a problem into an image recognition problem - even if it has nothing to do with images - you get to piggy-back on that infrastructure.
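To make the "Semantic CODEC" idea concrete, here is a minimal sketch of encoding arbitrary bytes as a fixed-size grayscale "image" (a grid of pixel values 0-255) that an image classifier could then consume. The 16x16 grid size and zero-padding scheme are arbitrary choices for illustration, not any established encoding.

```python
def encode_to_image(data: bytes, side: int = 16) -> list[list[int]]:
    """Pack bytes row-by-row into a side x side pixel grid, zero-padded."""
    padded = data[: side * side].ljust(side * side, b"\x00")
    return [list(padded[row * side : (row + 1) * side]) for row in range(side)]

def decode_from_image(pixels: list[list[int]]) -> bytes:
    """Invert the encoding (trailing zero padding is left in place)."""
    return bytes(value for row in pixels for value in row)

img = encode_to_image(b"any abstraction you like")
print(len(img), len(img[0]))   # 16 16
```

The interesting design question is the other half of the CODEC: choosing an encoding where visual similarity of the resulting images corresponds to semantic similarity of the underlying things, so that the image-classification infrastructure has something meaningful to latch onto.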


Wednesday, September 21, 2016

The next big exponential step in AI

Assuming, for the moment, that the current machine learning bootstrap pans out, the next big multiplier is already on the horizon.

As more computing is expressed in forms that require super-fast, super-scalable linear algebra algorithms (a *lot* of machine learning techniques do this), it becomes very appealing to find ways to execute them on quantum computers. Reason being, exponential increases are possible in terms of parallel execution of certain operations.

There is a fine tradition in computing of scientists getting ahead of what today's technology can actually do. Charles Babbage, Ada Lovelace, Alan Turing, Doug Engelbart, Vannevar Bush - all worked out computing stuff that was way ahead of the reality curve, and then reality caught up with their work.

If/when quantum computing gets out of the labs, the algorithms will already be sitting in the Machine Learning libraries ready to take advantage of them, because forward looking researchers are working them out, now.

In other words, it won't be a case of "Ah, cool! We have access to a quantum computer! Let's spend a few years working out how best to use it." Instead it will be "Ah, cool! We have access to a quantum computer! Let's deploy all the stuff we have already worked out and implemented in anticipation of this day."

It reminds me of the old adage (attributable to Pólya, I think) about "Solving for N". If I write an algorithm that can leverage N compute nodes, then it does not matter that I might only be able to deploy it with N = 1 because of current limitations. As soon as new compute nodes become available, I can immediately set N = 2 or 2000 or 20000000000 and run stuff.

With the abstractions being crafted around ML libraries today, the "N" is being prepped for some very large potential values of N.
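The "Solving for N" shape can be sketched in a few lines: write the algorithm once in terms of N workers, and let N be whatever the hardware of the day allows. This is just an illustrative toy, with N=1 degrading gracefully to serial execution.

```python
from concurrent.futures import ThreadPoolExecutor

def solve_for_n(chunks, worker, n=1):
    """Apply `worker` to each chunk using up to n parallel workers."""
    if n == 1:
        # Today's deployment: current limitations force serial execution.
        return [worker(chunk) for chunk in chunks]
    # Tomorrow's deployment: same algorithm, just crank up n.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(worker, chunks))

print(solve_for_n([1, 2, 3], lambda x: x * x, n=1))   # [1, 4, 9]
print(solve_for_n([1, 2, 3], lambda x: x * x, n=4))   # [1, 4, 9]
```

The point is that the calling code is identical either way; only the value of N changes when the hardware catches up.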

Thursday, September 15, 2016

Deep learning, Doug Engelbart and Jimi Hendrix

The late great Doug Engelbart did foundational work in many areas of computing and was particularly interested in the relationship between human intelligence and machine intelligence.

Even a not-so-smart machine can augment human productivity if even simple cognitive tasks can be handled by the machine. Reason being, machines are super fast. Super fast can compensate for "not-so-smart" in many useful domains. Simple totting up of figures, printing lots of copies of a report, shunting lots of data around, whatever.

How do you move a machine from "not-so-smart" to "smarter" for any given problem? The obvious way is to get the humans to do the hard thinking and come up with a smarter way. It is hard work because the humans have to be able to oscillate between smart thinking and thinking like not-so-smart machines, because ultimately the smarts have to be fed to the not-so-smart machine as grindingly meticulous instructions written in computer-friendly (read "not-so-smart") programs. Simple language, because machines can only grok simple language.

The not-so-obvious approach is to create a feedback loop where the machine can change its behavior over time by feeding outputs back into inputs. How to do that? Well, you got to start somewhere so get the human engineers to create feedback loops and teach them to the computer. You need to do that to get the thing going - to bootstrap it....

then stand back....

Things escalate pretty fast when you create feedback loops! If the result you get is a good one, it is likely to be *a lot* better than your previous best because feedback loops are exponential.

Engelbart's insight was to recognize that the intelligent, purposeful creation of feedback loops can be a massive multiplier : both for human intellect at the species level, and at the level of machines. When it works, it can move the state of the art of any problem domain forward, not by a little bit, but by *a lot*.

A human example would be the invention of writing. All of a sudden knowledge could survive through generations and could spread exponentially better than it could by oral transmission.

The hope and expectation around Deep Learning is that it is basically a Doug Engelbart Bootstrap for machine intelligence. A smart new feedback loop in which the machines can now do a vital machine intelligence step ("feature identification") that previously required humans. This can/should/will move things forward *a lot* relative to the last big brouhaha around machine intelligence in the Eighties.

The debates about whether or not this is really "intelligence" or just "a smarter form of dumb" will rage on in parallel, perhaps forever.

Relevance to Jimi Hendrix? See https://www.youtube.com/watch?v=JMyoT3kQMTg

Thursday, August 11, 2016

The scourge of easily accessible abstraction

In software, we are swimming in abstractions. We also have amazingly abstract tools that greatly enhance our ability to create even more abstractions.

"William of Ockham admonished philosophers to avoid multiplying entities, but computers multiple them faster than his razor can shave." -- John F. Sowa, Knowledge Representation.

Remember that the next time you are de-referencing a URL to get the address of a pointer to a factory that instantiates an instance of meta-class for monad constructors...

Wednesday, August 03, 2016

Sebastian Rahtz, RIP

It has just now come to my attention that Sebastian Rahtz passed away earlier this year.
RIP. Fond memories of conversations on the xml-dev mailing list.


Wednesday, July 20, 2016

Software self analysis again

Perhaps a better example for the consciousness post would have been to allow the application on the operating system to have access to the source code for the hypervisor two levels down. That way, the app could decide to change the virtualization of CPUs or the contention algorithms on the virtualized network interfaces and bootstrap itself a new hypervisor to host its own OS.

The question naturally arises, what scope has an app - or an OS - got for detecting that it is on a hypervisor rather than real hardware? If the emulation is indistinguishable, you cannot tell - by definition. At which point the emulated thing and the thing emulated have become indistinguishable. At which point you have artificially re-created that thing.

This is all well-worn territory in the strong vs. weak AI conversations, of course.

My favorite way of thinking about it is this:

1 - we don't understand consciousness and thus we cannot be sure we won't re-create it by happenstance, as we muck about with making computers act more intelligently.

2 - If we do create it, we likely won't know how we did it (especially since it is likely to be a gradual, multi-step thing rather than a big-bang thing)

3 - because we won't know what we did to create it, we won't know how to undo it or switch it off

4 - if it improves by iteration and it iterates a lot faster in silicon than we do in carbon, we could find ourselves a distant second in the earth-based intelligence ranking table rather quickly :-)

Best if we run all the electrical generation on the planet with analog methods and vacuum tubes so that we can at least starve it of electricity if push comes to shove:-)

Saturday, July 16, 2016

English as an API

Like the last post, I am filing this one under "speculative".

Chat bots strip away visual UI elements in favor of natural language in a good old fashioned text box.

Seems kind of retro, but perhaps something deeper is afoot. For the longest time, we have used the phrase "getting applications to talk to each other" as a sort of business-level way of saying, "get applications to understand each other's APIs and/or data structures."

Perhaps, natural language - or a controlled version of natural language - will soon become a viable way of getting applications to talk to each other. I.e. chatbots chatting with other chatbots, by sending/receiving English.

One of the big practical upshots of that - if it transpires - is that non-programmers will have a new technique for wiring up disparate applications. I.e. talk to each of them via their chat interface, then gradually get them talking to each other...


The surprising role of cloud computing in the understanding of consciousness

I am filing this one under "extremely speculative".

I think it was Douglas Hofstadter's book "I am a strange loop" that first got me thinking about the possible roles of recursion and self-reference in understanding consciousness.

Today - for no good reason - it occurred to me that if the Radical Plasticity Theory is correct, to emulate/re-create consciousness[1] we need to create the conditions for consciousness to arise. Doing that requires arranging a computing system that can observe every aspect of itself in operation.

For most of the history of computing, we have had a layer of stuff that the software could only be dimly aware of, called the hardware.

With virtualization and cloud computing, more and more of that hardware layer is becoming, itself, software and thus, in principle, open to fine grained examination by the software running on....the software, if you see what I mean.

To take an extreme example, a Unix application could today be written that introspects itself, concludes that the kernel scheduler logic should be changed, writes out the modified source code for the new kernel, re-compiles it, boots a Unix OS image based on it, and transplants itself into a process on the new kernel.


[1] Emulation versus re-creation of consciousness. Not going there.

Monday, June 20, 2016

The subtle complexities of legal/contractual ambiguity

The law is not a set of simple rules, and the rule of law is not - and arguably cannot be - reduced to a Turing Machine evaluating some formal expression of said rules.

A theme of mine for some time has been how dangerous it is to jump to conclusions about the extent to which the process of law - and its expression in the laws themselves - can be looked upon purely as a deductive logic system in disguise.

Laws, contracts etc. often contain ambiguities that are there on purpose. Some are tactical. Some are there in recognition of the reality that concepts like "fairness" and "reasonable efforts" are both useful and unquantifiable.

In short, there are tactical, social and deep jurisprudence-related reasons for the presence of ambiguity in laws/contracts.

Trying to remove them can lead to unpleasant results.

Case in point: the draining of millions of dollars from the DAO. See the write-up on Bloomberg: Ethereum Smart Contracts

Friday, June 17, 2016

25 years of the Internet in Ireland - a personal recollection of the early days

So today is the Internet's 25th anniversary in Ireland.
In 1991 I was working with a financial trading company, developing technical analysis software for financial futures traders in 8086 assembly language and C using PCs equipped with TMS34010 graphics boards.

I cannot remember how exactly - possibly through the Unix Users Group - I ended up getting a 4800 bps modem connection to a Usenet feed from Trinity via the SLIP protocol.

Every day I would dial up and download comp.text.sgml from Usenet onto my Sun Roadrunner X86 "workstation".

Not long thereafter, Ireland Online happened and I was then dialling up Furbo in the Gaeltacht of Connemara because it was the first access point to the WWW in Ireland.

I ditched my CompuServe e-mail account not long after and became digitome@iol.ie on comp.text.sgml

So much has changed since those early days...and yet so much has stayed the same.

Friday, May 20, 2016

From textual authority to interpretive authority: the next big shift in legal and regulatory informatics

This paper: law and algorithms in the public domain, from a journal on applied ethics, is representative, I think, of the thought processes going on around the world at present regarding machine intelligence and what it means for law/regulation.

It seems to me that there has been a significant uptick in people from diverse science/philosophy backgrounds taking an interest in the world of law. These folks range from epistemologists to bioinformaticians to statisticians to network engineers. Many of them are looking at law/regulation through the eyes of digital computing and asking, basically, "Is law/regulation computation?" and also "If it is not currently computation, can it be? Will it be? Should it be?"

These are great, great questions. We have a long way to go yet in answering them. Much of the world of law and the world of IT is separated by a big chasm of mutual misunderstanding at present. Many law folk - with some notable exceptions - do not have a deep grasp of computing, and many computing folk - with some notable exceptions - do not have a deep grasp of law.

Computers are everywhere in the world of law, but to date, they have primarily been wielded as document management/search&retrieval tools. In this domain, they have been phenomenally successful. To the point where significant textual authority has now transferred to digital modalities from paper.

Those books of caselaw and statute and so on, on the shelves, in the offices - they rarely move from the shelves. For much practical, day-to-day activity, the digital instantiations of these legislative artifacts are normative and considered authoritative by practitioners. How often these days do legal researchers go back to the paper-normative books? Is it even possible anymore in a world where more and more paper publication is being replaced by cradle-to-grave digital media? If the practitioners and the regulators and the courts are all circling around a set of digital artifacts, does it matter any more if the digital artifact is identical to the paper one?

Authority is a funny thing. It is mostly a social construct. I wrote about this some years ago here: Would the real, authentic copy of the document please stand up? If the majority involved in the world of law/regulation use digital information resource X, even though strictly speaking X is a "best efforts facsimile" of paper information resource Y, then X has de facto authority even though it is not de jure authoritative. (The fact that de jure texts are often replaced by de facto texts in the world of jure - law! - is a self-reference that will likely appeal to anyone who has read The Paradox of Self Amendment by Peter Suber.)

We are very close to the point where digital resources in law/regulation have authority of expression, but it is a different kettle of fish completely to have expression authority compared to interpretive authority.

It is in this chasm between authority of expression and authority of interpretation that most of the mutual misunderstandings between law and computing will sit in the years ahead, I think. On one hand, law folk will be too quick to dismiss what the machines can do in the interpretive space; on the other, IT people will be too quick to think the machines can quickly take over the interpretive space.

The truth - as ever - is somewhere in between. Nobody knows yet where the dividing line is but the IT people are sure to move the line from where it currently is (in legal "expression" space) to a new location (in legal "interpretation" space).

The IT people will be asking the hard questions of the world of law going forward. Is this just computing in different clothing? If so, then let's make it a computing domain. If it is not one today, then can we make it one tomorrow? If it cannot be turned into a computing domain - or should not be - then why, exactly?

The "why" question here will cause the most discussion. "Just because!", will not cut it as an answer. "That is not what we do around here young man!" will not cut it either. "You IT people just don't understand and can't understand because you are not qualified!", will not cut it either.

Other domains - medicine for example - have gone through this already. Medical practitioners are not algorithms or machines but they have for the most part divested various activities to the machines. Not just expressive (document management/research) but also interpretive (testing, hypothesis generation, outcome simulation).

Law is clearly on this journey now and should emerge in better shape, but the road ahead is not straight, has quite a few bumps and a few dead ends too.

Strap yourself in.

Monday, May 16, 2016

From BSOD to UOD

I don't get many "Blue Screen of Death" type events these days in any of the Ubuntu, Android, iOS, Windows environments I interact with. Certainly not like the good old days when rebooting every couple of hours felt normal. (I used to keep my foot touching the side of my deskside machine. The vibrations of the hard disk used to be a good indicator of life back in the good old days. Betcha that health monitor wasn't considered in the move to SSDs. Nothing even to listen to these days, never mind touch.)

I do get Updates of Death though - and these are nasty critters!

For example, your machine auto-updates and disables the network connection leaving you unable to get at the fix you just found online....


Monday, May 09, 2016

Genetic Football

Genetic Football is a thing. Wow.

As a thing, it is part of a bigger thing.

That bigger thing seems to be this: given enough cheap compute power, the time taken to perform zillions of iterations can be made largely irrelevant.

Start stupid. Just aim to be fractionally less stupid the next time round, and iterations will do the rest.
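The "start stupid, iterate" idea can be made concrete with a toy (1+1) evolutionary loop: mutate a candidate and keep the mutant only when it is fractionally less stupid. This has nothing to do with the actual Genetic Football implementation; the target string and alphabet below are invented purely for illustration.

```python
import random

random.seed(42)
TARGET = "genetic football"
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def fitness(candidate: str) -> int:
    """Count positions that match the target: higher is less stupid."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate: str) -> str:
    """Change one randomly chosen character to a random one."""
    i = random.randrange(len(candidate))
    return candidate[:i] + random.choice(ALPHABET) + candidate[i + 1:]

candidate = " " * len(TARGET)          # maximally stupid starting point
for generation in range(100_000):
    mutant = mutate(candidate)
    if fitness(mutant) >= fitness(candidate):   # fractionally less stupid? keep it
        candidate = mutant
    if candidate == TARGET:
        break

print(candidate)
```

No individual mutation is clever; the iterations do all the work, which is exactly the point of the paragraph above.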

The weirdest thing about all of this for me is that if/when iterated algorithmic things start showing smarts, we will know the causal factors that led to the increased smartness, but not the rationale for any individual incidence of smartness.

As a thing, that is part of a bigger thing.

That bigger thing is that these useful-but-unprovable things will be put to use in areas where humankind has previously expected the presence of explanation. You know, rules, reasoning, all that stuff.

As a thing, that is part of a bigger thing.

That bigger thing is that in many areas of human endeavor it is either impossible to get explanations (i.e. experts who know what to do, but cannot explain why in terms of rules), or the explanations need to be taken with a pinch of post-hoc-ergo-propter-hoc salt, or a pinch of retroactive goal-setting salt.

As a thing, that is part of a bigger thing.

When the machines come, and start doing clever things but cannot explain why....

...they will be just like us.

Thursday, May 05, 2016

Statistics and AI

We live at a time where there is more interest in AI than ever and it is growing every day.

One of the first things that happens when a genre of computing starts to build up steam is that pre-existing concepts get subsumed into the new genre. Sometimes, the adopted concepts are presented in a way that would suggest they are new concepts, created as part of the new genre. Sometimes they are. But sometimes they are not.

For example, I recently read some material that presented linear regression as a machine learning technique.

Now of course, regression has all sorts of important contributions to make to machine learning but it was invented/discovered long long before the machines came along.
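For the record, the technique in question needs nothing from the machine learning era: the closed-form least-squares fit for a line is a couple of sums, essentially as Gauss and Legendre had it two centuries ago. A minimal sketch:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points lying exactly on y = 2x + 1:
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))   # (2.0, 1.0)
```

Call it a "machine learning technique" if you like, but the machine is doing arithmetic that long predates it.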

Thursday, April 14, 2016

Cutting the inconvenient protrusions from the jigsaw pieces

There is a school of thought that goes like this....

(1) To manage data means to put it in a database.
(2) A 'database' means a relational database. No other database approach is really any good.
(3) If the data does not fit into the relational data model, well, just compromise the data so that it does. Why? See item (1).

I have no difficulty whatsoever with recommending relational databases where there is a good fit between the data, the problem to be solved, and the relational database paradigm.

Where the fit isn't good, I recommend something else. Maybe indexed flat files, or versioned spreadsheets, documents, a temporal data store....whatever feels least like I am cutting important protrusions off the data and off the problem to be solved.

However, whenever I do that, I am sure to have to answer the "Why not just store it in [Insert RDB name]?" question.

It is an incredibly strong meme in modern computing.

Monday, March 14, 2016

Algorithms where human understanding is optional - or maybe even impossible

I think I am guilty of holding on to an AI non-sequitur for a long time. Namely the idea that AI is fundamentally limited by our ability as humans to code the rules for the computer to execute. If we humans cannot write down the rules for X, we cannot get the computer to do X.

Modern AI seems to have significantly lurched over to the "no rules" side of the field, where phrases like CBR (case-based reasoning) and neural net training sets abound...

But with an interesting twist that I have only recently become aware of. Namely, bootstrapping: using generation X of an AI system to produce generation X+1.

The technical write-ups about the recent stunning AlphaGo victory make reference to the bootstrapping of AlphaGo. As well as learning from the database of prior human games, it has learned by playing against itself....

Doug Engelbart springs to mind and his bootstrapping strategy.

Douglas Hofstadter springs to mind and his strange-loops model of consciousness.

Stephen Wolfram springs to mind and his feedback loops of simple algorithms for rapidly generating complexity.

AIs learning by using the behavior of the previous generation AI as "input" in the form of a training set sounds very like iterating a simple Wolfram algorithm or a fractal-generating function, except that the output of each "run" is the algorithm for the next run.
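The "output of each run is the algorithm for the next run" shape can be sketched in a few lines. This toy (invented for illustration, not how AlphaGo works) treats each generation as a function and builds generation X+1 out of generation X:

```python
def bootstrap(f):
    """Generation X produces generation X+1: the new algorithm is the old one applied twice."""
    return lambda x: f(f(x))

gen = lambda x: x + 1          # generation 0: barely does anything
for _ in range(3):
    gen = bootstrap(gen)       # each run's output becomes the next run's algorithm

print(gen(0))   # 8 - three bootstraps turn "+1" into "+8"
```

Even in this trivial setting the capability compounds exponentially with the number of generations, which is the unnerving part.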

The weird, weird, weird thing about all of this, is that we humans don't have to understand the AIs we are creating. We are just creating the environment in which they can create themselves.

In fact, it may even be the case that we cannot understand them because, by design, there are no rules in there to be dug out and understood. Just an unfathomably large state space of behaviors.

I need to go to a Chinese room, and think this through...

Thursday, March 10, 2016


LoRa

LoRa feels like a big deal to me. In general, hardware-led innovations tend to jumpstart software design into interesting places, more so than software-led innovations drag hardware design into interesting places.

With software driving hardware innovation, the results tend to be of the bigger, faster, cheaper variety. All good things, but not this-changes-everything type moments.

With hardware driving software innovation however, software game changers seem to come along sometimes.

Telephone exchanges -> Erlang -> Elixir.
Packet switching -> TCP/IP -> Sockets

BGP Routers -> Multihoming
VR Headsets -> Immersive 3D worlds


I have noticed that things tend to come full circle though. Sooner or later, any hardware bits that can themselves be replaced by software bits, are replaced:-)

This loopback trend is kicking into a higher gear at the moment because of 3D printing. I.e. a hardware device is conceived of. In order to build the device, the device is simulated in software to drive the 3D printer. Any such devices that *could* remain purely software, do so eventually.

A good example is audio recording. A modern DAW like ProTools or Reaper now provides pure digital emulators for pretty much any piece of audio hardware kit you can think of: EQs, pre-amps, compressors, reverbs etc.

Friday, March 04, 2016

XML and St Patrick

I am finding it a bit hard to believe that I wrote this *fourteen* years ago.

Patrick to be Named Patron Saint of Software Developers
In a dramatic development, scholars working in Newgrange, Ireland, have deciphered an Ogham stone thought to have been carved by St. Patrick himself. The text on the stone predicts, with incredible accuracy, the trials-and-tribulations of IT professionals in the early 21st century. Calls are mounting for St. Patrick to be named the patron saint of Markup Technologists.

The full transcription of the Ogham stone is presented here for the first time:


    Go placidly amid the noise and haste and remember what peace there may be in silence.

    As far as possible, without surrender, accommodate the bizarre tag names and strange attribute naming conventions of others.

    Speak your truth quietly and clearly, making liberal use of UML diagrams. Listen to others, even the dull and ignorant, they too have their story and won't shut up until you have heard it.

    Avoid loud style sheets and aggressive time scales, they are vexations to the spirit. If you compare your schemas with others, you will become vain and bitter for there will always be schemas greater and lesser than yours -- even if yours are auto-generated.

    Enjoy the systems you ship as well as your plans for new ones. Keep interested in your own career, however humble. It's a real possession in the changing fortunes of time and Cobol may yet make a comeback.

    Exercise caution in your use of namespaces for the world is full of namespace semantic trickery. Let this not blind you to what virtue there is in namespace-free markup. Many applications live quite happily without them.

    Be yourself. Especially do not feign a working knowledge of RDF where no such knowledge exists. Neither be cynical about Relax NG; for in the face of all aridity and disenchantment in the world of markup, James Clark is as perennial as the grass.

    Take kindly the counsel of the years, gracefully surrendering the things of youth such as control over the authoring subsystems and any notion that you can dictate a directory structure for use by others.

    Nurture strength of spirit to nourish you in sudden misfortune but do not distress yourself with dark imaginings of wholesale code re-writes.

    Many fears are born of fatigue and loneliness. If you cannot make that XML document parse, go get a pizza and come back to it.

    Beyond a wholesome discipline, be gentle with yourself. Loosen your content models to help your code on its way, your boss will probably never notice.

    You are a child of the universe no less than the trees and all other acyclic graphs; you have a right to be here. And whether or not it is clear to you, no doubt the universe is unfolding as it should.

    Therefore be at peace with your code, however knotted it may be. And whatever your labors and aspirations, in the noisy confusion of life, keep peace with your shelf of manuals. With all its sham, drudgery, and broken dreams, software development is a pretty cool thing to do with your head. Be cheerful. Strive to be happy.

Friday, February 26, 2016

Software complexity accelerators

It seems to me that complexity in software development, although terribly hard to measure, has steadily risen from the days of Algol 68 and continues to rise.

In response to the rise, we have developed mechanisms for managing - not removing - managing the complexity.

These management - or perhaps I should say 'containment' - mechanisms have an interesting negative externality. If a complexity level of X was hard to contain before, but thanks to paradigm Y is now contained, the immediate side-effect is an increase in the value of X:-)

It reminds me of an analysis I found somewhere about driving speed and seat belts. Apparently, seat belts can have the effect of increasing driving speed. Reason being, we all have a risk level we sub-consciously apply when driving. Putting on a seat belt can make us feel that a higher speed is now possible without increasing our risk level.

So what sort of "seat belts" have we added into software development recently? I think Google Search is a huge one. Rather than reduce the complexity of an application as evidenced by the amount of debugging/head-scratching you need to do, we have accelerated the process of finding fixes online.

Another one is open source. We can now leverage a world-wide hive-mind that collectively "wraps its head around" a code-base, so that code-base can become more complex than it could if only a finite team worked on it.

Another one is cloud. Client/server-style computing models push most of the complexity of management into the server side. Applications that would be incredibly complex to manage in today's diverse OS world if they were thick clients are easier to manage server-side, thus creating headroom for new complexity which, sure enough, gets added to the mix.

Is this phenomenon of complexity acceleration thanks to better and better complexity containment a bad thing?

I honestly don't know.

Friday, February 19, 2016

It's obvious really

Nothing is more deserving of questioning, than an obvious conclusion.

Thursday, February 11, 2016

Fixity, Vellums and the curious case of the rotting bits

So, vellum may be on the way out in the UK Parliament.

Many deep and thorny issues here.

Thought experiment: In your hand you have a 40 page document. On your computer screen you have an electronic document open in a word processor. You have been told that they are "the same document".

How can you tell? What does it even mean to say that they are the "same"? Does it matter if there is no sure-fire way to prove it?

Let us start at the end of that list of questions and work backwards. Does it matter that there is no sure-fire way to prove it? Most of the time, it does not matter if you cannot prove they are the same. Over the years since the computerization of documents, we have devised various techniques for managing the risks of differences arising between what the computer says and what the sheets of paper say. However, when it does matter it tends to matter a whole bunch. Examples are domains such as legal documents, mission critical procedure manuals, that sort of thing.

A very common way of mitigating the risk of differences arising between paper and electronic texts is to declare the electronic version to be the real, authentic document and treat the paper as a "best efforts" copy or rendering of the authentic document. If the printing messes up and some text gets chopped off the right hand margin we think "No big deal". Annoying but not cataclysmic. The electronic copy is the real one and we can just go back to the source any time we want...

...Yes, as long as the electronic source is not, itself, an ambiguous idea. Again, we have developed practices to mitigate this risk. If I author a document in, say, FrameMaker but export RTF to send to you, the FrameMaker is considered the real, authentic electronic file. If anything happens to the RTF content - either as it is exported, transmitted or imported by you into some other application - we refer back to the original electronic file which is the FrameMaker incarnation....

...If we still have it up to date. The problem is that we do not print FrameMaker or Word or QuarkXPress. We tend to print "frozen" renderings of these things. Things like PostScript and PDF. On the way to paper, it is not uncommon for fixes to be required just prior to the creation of very expensive printing plates. If something small needs to be fixed, it will probably get fixed at 2 a.m. in the PostScript or PDF file...which is now out of sync with the original FrameMaker file...

...Which, come to think of it, might not have been as clear-cut an authoritative source as I made it out to be. It is not uncommon for applications like FrameMaker, Adobe CS2, Quark etc. to be used downstream of an authoring process that utilizes Microsoft Word or Corel WordPerfect or OpenOffice or some webby browser plug-in.

If (i.e. when) errors are found in document proofs, the upstream documents should really be fixed and the DTP versions re-constituted. Otherwise, the source documents get out of sync with the paper copy very quickly indeed. Worse, the differences between the source documents and the paper copy may be in the form of small errors. A period missing here, a dollar sign there...small enough to be very hard to spot with proofreading but large enough to be very serious in, for example, legal documents.

What to do? Well, we need to freeze-dry "cuts" of these documents to remove all ambiguity and then institute rigorous policies and procedures to ensure that changes are properly reflected everywhere along the document production toolchain...

...Which, these days, can be quite a complicated tool chain. For example, it is quite likely that web page production is feeding off the content prior to when it goes in to the DTP program. So, when a fix is needed you need to chase down all copies made and fix them, preferably all at the same time. Oh, and tools for editing (yes, I did say "editing") PDF documents are becoming more and more commonplace. So much for simple freeze-drying of digital content...

...Wait. This is getting too complex and has too many points of human intervention which introduces costs and the potential for human error. Best to simplify the tool chain...

...Yes, that would be nice but unfortunately DTP packages do useful things - things that word processors do not do. Word processors do useful things - things that webby plug-ins cannot do. Layout formats like PDF, PostScript, SVG do things that authoring formats like ODF do not do. HTML can be both a layout format and an authoring format, but only at the expense of leaving behind a lot of very useful stuff for large document publication...

...So, where does that leave us? Well, behind that beautifully produced 40 pager you hold in your hand we have, roughly speaking, umpteen different electronic variations of it. Each of which may or may not be "the same" as the paper in a variety of subtle and (to me anyway) interesting ways...

...We have a problem. Consider this: search around the Web for companies offering data capture services from paper. Lots to choose from, right? Now where do you think all that paper is coming from? Old, old content that pre-dates computerization? No. Some of it falls into that category but only some. Filled-in, paper-based forms that do not exist in computers at all? Yes, there is a bunch of that. But a lot of it is content that came into existence purely electronically, at some stage over the last 30 years. It passed through some complicated tool chain and workflow on its way to paper. The paper then became the only reliable incarnation of the content. Any electronic versions of it that the owners could dig out were found to be potentially flawed in some way... (cannot read them, cannot trust them to be the same as the paper etc. etc.)

...Thus the need to capture the content from paper. An exercise which, even with rigorous QA to say, 99.998% accuracy, is guaranteed to introduce its own set of errors...

It is messy, right? Well, get rid of the paper/vellums, get rid of the witnessing/signing formalisms that have evolved over centuries, and I think you create an even bigger problem.

Solution? Tamper evident digital audit trails.
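To make "tamper evident" concrete, here is a minimal sketch in Python (all names hypothetical) of a hash-chained audit trail: each entry's hash covers both its content and the previous entry's hash, so altering any historical entry invalidates every entry after it.

```python
import hashlib
import json

def append_entry(log, entry):
    """Append an entry whose hash covers both the content and the
    previous entry's hash, chaining the whole history together."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(entry, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"entry": entry, "prev": prev_hash, "hash": digest})
    return log

def verify(log):
    """Recompute every link; tampering with an earlier entry
    breaks all hashes downstream of it."""
    prev_hash = "0" * 64
    for record in log:
        payload = json.dumps(record["entry"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        if record["prev"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True

log = []
append_entry(log, {"doc": "contract.pdf", "event": "signed"})
append_entry(log, {"doc": "contract.pdf", "event": "amended"})
assert verify(log)
log[0]["entry"]["event"] = "shredded"   # tamper with history...
assert not verify(log)                  # ...and verification fails
```

A real system would add signatures and distribution of the chain to independent observers, but the core fixity idea is just this chaining.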

Friday, February 05, 2016

The biggest IT changes in the last 5 years: The re-emergence of data flow design

My first exposure to data flow as an IT design paradigm came around 1983/4 in the form of Stevens, Myers and Constantine's work on "Structured Design", which dates from 1974.

I remember at the time finding the idea really appealing, and yet the forces at work in the industry and in academic research pulled mainstream IT design towards non-flow-centric paradigms. Examples include Stepwise Decomposition/Structured Programming (e.g. Dijkstra), Object Oriented Design (e.g. Booch) and Relational Data modelling (e.g. Codd).

Over the years, I have seen pockets of mainstream IT design terms emerging that have data flow-like ideas in them. Some recent relevant terms would be Complex Event Processing and stream processing.

Many key dataflow ideas are built into Unix. Yet creating designs leveraging line-oriented data formats, piped through software components, local and remote - everything from good old 'cat' to GNU Parallel and everything in between - has never, to my knowledge, been given a design name reflective of just how incredibly powerful and commonplace it is.
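The Unix pipeline style can be sketched in a few lines of Python generators - a hypothetical stand-in for `cat log | grep ERROR | wc -l` - where each stage lazily consumes and produces a line-oriented stream:

```python
def cat(lines):
    # identity stage: just pass the stream through
    for line in lines:
        yield line

def grep(pattern, lines):
    # filter stage: only lines containing the pattern flow onward
    for line in lines:
        if pattern in line:
            yield line

def count(lines):
    # terminal stage: consume the stream and count it (like wc -l)
    return sum(1 for _ in lines)

raw = ["INFO start", "ERROR disk full", "INFO stop", "ERROR timeout"]
# Equivalent in spirit to: cat log | grep ERROR | wc -l
assert count(grep("ERROR", cat(raw))) == 2
```

Because each stage is a generator, nothing materializes the whole stream; data flows item by item, exactly as in a shell pipeline.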

Things are changing, I believe, thanks to cloud computing and multi-core parallel computing in general. Amazon AWS Data Pipeline, Google Dataflow and Google TensorFlow are good examples. Also, bubbling away under the radar are things like FBP (Flow Based Programming), buzz around Elixir, and related ideas such as shared-nothing architectures.

A single phrase is likely to emerge soon, I think. Many "grey beards" - from JSD (Jackson Structured Design), to IBM MQSeries (asynch messaging), to Ericsson's AXE-10 Erlang engineers, to Unix pipeline fans - will do some head-scratching of the "Hey, we were doing this 30 years ago!" variety.

So it goes.

Personally, I am very excited to see dataflow re-emerge to mainstream. I naturally lean towards thinking in terms of dataflow anyway. I can only benefit from all the cool new tools/techniques that come with mainstreaming of any IT concept.

Thursday, February 04, 2016

The 'in', 'on' and 'with' questions of software development

I remember when the all important question for a software dev person looking at a software component/application was "What is it written in?"

Soon after that, a second question became very important "What does it run on?"

Nowadays, there is a third, really important question: "What is it built/deployed with?"

"In" - the programming language of the component/app itself
"On" - the run-time-integration points e.g. OS, RDB, Browser, Logging
"With" - the dev/ops tool chain e.g. source code control, build, regression, deploy etc.

In general, we tend to underestimate the time, cost and complexity of all three :-) However, the "With" category is the toughest to manage as it is, by definition, scaffolding used as part of the creation process. Not part of the final creation.

Tuesday, February 02, 2016

Blockchain this, blockchain that...

It is fun watching all the digital chatter about blockchain at the moment. There is wild stuff at both ends of the spectrum. I.e. "It is rubbish. Will never fly. All hype. Nothing new here. Forget about it." on one end and "Sliced bread has finally met its match! Lets appoint a CBO (Chief Blockchain Officer)" on the other.

Here is the really important bit, I think: the blockchain shines a light on an interesting part of the Noosphere. The place where trust in information is something that can be established without needing a central authority.

That's it. Everything about how consensus algorithms work - how long they take to run, how computationally expensive they are - is secondary, and the S curve will out (http://en.wikipedia.org/wiki/Innovation). I.e. that implementation stuff will get better and better.

Unless, of course, there proves to be some hard limit imposed by information theory that cannot be innovated around, e.g. something analogous to the CAP Theorem or Entropy Rate Theorem or some such.

To my knowledge, no such fundamental limits are on the table at this point.  Thus the innovators are free to have a go and that is what they will do.

The nearest thing to a hard limit that I can see on the horizon is the extent to which the "rules" found in the world of contracts/legislation/regulation can be implemented as "rules" that machines can work with. This is not so much an issue for the trust concepts of Blockchain as it is for the follow-on concept of Smart Contracts.

Tuesday, January 26, 2016

The biggest IT changes in the last 5 years: The death-throes of backup-and-delete based designs

One of the major drivers in application design is infrastructure economics i.e. the costs - in both capex and opex terms - of things like RAM, non-volatile storage, compute power, bandwidth, fault tolerance etc.

These economic factors have changed utterly in the 35 years I have been involved in IT, but we still have a strong legacy of designs/architectures/patterns that are ill suited to the new economics of IT.

Many sacred cows of application design such as run-time efficiencies of compiled code versus interpreted code or the consistency guarantees of ACID transactions, can be traced back to the days when CPU cycles were costly. When RAM was measured in dollars per kilobyte and when storage was measured in dollars per megabyte.

My favorite example of a deeply held paradigm which I believe has little or no economic basis today is the concept of designs that only keep a certain amount of data in online form, dispatching the rest, at periodic intervals, to offline forms (e.g. tape, disc) that require "restore" operations to get them back into usable form.

I have no problem with the concept of backups:-) My problem is with the concept of designs that only keep, say, a year's worth of data online. This made a lot of sense when storage was expensive, because the opex costs of manual retrieval were smaller than the opex costs of keeping everything online.

I think of these designs as backup-and-delete designs. My earliest exposure to such a design was on an IBM PC with twin 5 and 1/4 inch floppy disk drives. An accounting application ran from Drive A. The accounting data file was in Drive B. At each period-end, the accounting system rolled forward ledger balances and then - after a backup floppy was created - deleted the individual transactions on Drive B. That was about 1984 or so.

As organizations identified value in their "old data" - for regulatory reporting or training or predictive analytics - designs appeared to extract the value from the "old" data. This led to a flurry of activity around data warehousing, ETL (extract, transform, load), business intelligence dashboards etc.

My view is that these ETL-based designs represent a transitional period. Designers in their twenties working today, steeped as they are in the new economics of IT, are much more likely to create designs that eschew the concept of ever deleting anything. Why would you, when online storage (local disk or remote) is so cheap and there is always the possibility of latent residual value in the "old" data?

Rather than have one design for day-to-day business and another design for business intelligence, regulatory compliance and predictive analytics, why not have one design that addresses all of these? Apart from the feasibility and desirability of this brought about by the new economics of IT, there is another good business reason to do it this way. Simply put, it removes the need for delays in reporting cycles and predictive analytics. I.e. rather than pull all the operational data into a separate repository and crunch it once a quarter or once a month, you can be looking at reports and indicators that are in near-realtime.

I believe that the time is coming when the economic feasibility of near-realtime monitoring and reporting becomes a "must have" in regulated businesses, because the regulators will take the view that well run businesses should have it. In the same way that a well run business today is expected to have low communications latencies between its global offices (thanks to the cheap availability of digital communications), they will be expected to have low latency reporting for their management and for the regulators.

We are starting to see terminology form around this space. I am hopelessly biased because we have been creating designs based on never-deleting-anything for many years now. I like the terms "time-based repository" and "automatic audit trail". Others like the terms "temporal database", "provenance system", "journal-based repository"...and the new kid on the block (no pun intended!) - the block chain.

The block-chain, when all is said and done, is a design based on never-throwing-away anything *combined* with providing a trust-free mechanism for observers of the audit-trail to be able to have confidence in what they see.

There is lots of hype at present around block chain and with hype comes the inevitable "silver bullet" phase where all sorts of problems not really suited to the block-chain paradigm are shoe-horned into it because it is the new thing.

When the smoke clears around block chain - which it will - I believe we will see many interesting application designs emerge which break away completely from the backup-and-delete models of a previous economic era.

Monday, January 25, 2016

The biggest IT changes in the last 5 years: Multi-gadget User Interfaces

In the early days of mobile computing, the dominant vision was to get your application suite on "any device, at any time". Single purpose devices such as SMS messengers and email-only messengers faded from popularity, largely replaced by mobile gadgets that could, at least in principle, do everything that a good old fashioned desktop computer could do.

Operating system visions started to pop up everywhere aimed at enabling a single user experience across a suite of productivity applications, regardless of form-factor, weight etc.

Things (as ever!) have turned out a little differently. Particular form factors, e.g. smart phone, tend to be used as the *primary* device for a subset of the user's full application suite. Moreover, many users like to use multiple form-factors *at the same time*.

Some examples from my own experiences. I can do e-mail on my phone but I choose to do it on my desktop most of the time. I will do weird things like e-mail myself when on-the-road, using a basic e-mail sender, to essentially put myself in my own in-box. (Incidentally, my in-box is my main daily GTD focus.) I can make notes to myself on my desktop but I tend to accumulate notes on my smart phone. I keep my note-taker app open on the phone even when I am at the desktop computer and often pick it up to make notes.

I can watch Youtube videos on my desktop but tend to queue up videos instead and then pick them off one-by-one from my smart-phone, trying to fit as many of them into "down time" as I can. Ditto with podcasts. I have a TV that has all sorts of "desktop PC" aspects, from web browsers to social media clients, but I don't use any of it. I prefer to use my smartphone (sometimes my tablet) while in couch-potato mode and will often multi-task my attention between the smartphone/tablet and the TV. I find it increasingly annoying to have to sit through advertising breaks on TV and automatically flick to the smartphone/tablet for ad breaks.

I suspect there is a growing trend towards a suite of modalities (smartphone, tablet, smart TV, smart car) and a suite of applications that in practical use, have smaller functionality overlaps than the "any device, at any time" vision of the early days would have predicted. A second, related trend is increasingly common use-cases where users are wielding multiple devices *at the same time* to achieve tasks.

Each of us in this digital world is becoming a mini-cloud of computing power, hooked together over a common WiFi hub or a bluetooth connection or a physical wire. As we move from home to car to train to office and back again, we reconfigure our own little mini-cloud to get things done. The trend towards smartphones becoming remote controls for all sorts of other digital gadgets is accelerating this.

I suspect that the inevitable result of all of this is that application developers will increasingly have to factor in the idea that the "user interface" may be a hybrid mosaic of gadgets, rather than any one gadget. With some gadgets being the primary for certain functionality.

Tuesday, January 19, 2016

The biggest IT changes in the last 5 years : The fragmentation of notifications

I remember the days of "push". I remember Microsoft CDF (Channel Definition Format). I remember RSS in its original Netscape form and all the forms that followed it. I remember Atom...I remember the feed readers. I remember thinking "This is great. I subscribe to stuff that is all over the place on the web and it comes to me when there are changes. I don't go to it."

But as social media mega-hubs started to emerge (Facebook, Twitter, LinkedIn etc.), this concept of a site/hub-independent notification infrastructure started to fragment.

Today, I find myself drawn into Facebook for Facebook updates, Twitter for Twitter updates, LinkedIn for LinkedIn updates, Youtube for Youtube updates....I am back going to stuff rather than have that stuff come to me.

At some stage here I am going to have to invest the time to find a mega-aggregator that can go aggregate the already partially aggregated stuff from all the hubs I have a presence on.

Rather than look for updates from the old-school concept of "web-site", our modern day aggregators need to be able to pull updates from hubs like Facebook, Twitter etc. which are themselves, obviously, providing aggregation.

The image this conjures into my mind is of a hierarchical "roll up" where each level of the hierarchy aggregates the content from its children which may themselves be aggregates.
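That roll-up can be sketched as a small recursive aggregator (a toy model, not any particular product's API): hubs aggregate children that may themselves be hubs.

```python
def roll_up(node):
    """Aggregate updates from a node whose children may themselves be
    aggregators: leaves are lists of items, hubs are dicts of children."""
    if isinstance(node, list):          # a leaf feed: nothing to merge
        return node
    merged = []
    for child in node.values():         # a hub: recurse into each child
        merged.extend(roll_up(child))
    return merged

web = {
    "facebook": ["fb-post-1", "fb-post-2"],
    "twitter": ["tweet-1"],
    "mega": {                           # an aggregator of aggregators
        "youtube": ["video-1"],
        "linkedin": ["update-1"],
    },
}
assert sorted(roll_up(web)) == ["fb-post-1", "fb-post-2",
                                "tweet-1", "update-1", "video-1"]
```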

The hierarchical/recursive picture has a lot of power obviously but I do wonder if it has the unfortunate side-effect of facilitating the emergence of web-gateway models for hubs. I.e. models in which the resources behind the gateway are not themselves, referenceable via URLs. We end up with no option but to "walk" the main nodes to do aggregation.

I remember a quote from Tim Berners Lee where he said something along the lines of "the great thing about hypertext is that it subverts hierarchy."

Perhaps, the mega-hubs model of the modern web subverts hypertext?

Monday, January 18, 2016

The biggest IT changes in the last 5 years : domain names ain't what they used to be

The scramble for good domain names appears to be a thing of the past. A couple of factors at work I think.

Firstly, there is obviously a limited supply of {foo}.com and {bar}.com. Out of necessity, "subdivided domain names" seem to be getting more and more popular, e.g. {foo}.{hub}.com and {bar}.{hub}.com share the same domain name. So too do {hub}.com/{foo} and {hub}.com/{bar}.

This has worked out well for those who provide hub-like web presence e.g. facebook, github.com, bandcamp.com etc.

Secondly, browser address windows have morphed into query expressions, often powered by Google under the hood. Even if I know I am looking for an entity {foo} that I know owns {foo}.com, I will often just type into the search bar and let the search engine do the rest.

Extending DNS with new domains like .club etc. only pushes the problem down the road a bit.

I am reminded very much of addresses of locations in the real world. Number 12 Example Avenue may start out as one address, but its owner may decide to sub-divide and rent/sell apartments at that address. Now you have Suite 123, Number 12, Example Avenue....

Nothing new in the world. DNS is like Manhattan. All the horizontal real estate of DNS is taken. The only way to get a piece of it now is to grab a piece of an existing address. DNS has entered its "high rise" era.

Friday, January 15, 2016

The biggest IT changes in the last 5 years : dynamically typed data

Back in 2010, I believe it was, I started writing bits-and-pieces about NoSQL and I remember a significant amount of push-back from RDB folks at the time.

Since then, I think it is fair to say that there has been a lot more activity in tools/techniques for unstructured/semi-structured/streamed data than for relational data.

The word "unstructured" is a harsh word:-) Even if the only thing you know about a file is that it contains bytes, you *have* a data structure. Your data structure is the trivial one called "a stream of bytes". For some operations, such as data schlepping, that is all you need to know. For other operations, you will need a lot more.

The fact that different conceptualizations of the data are applicable for different operations is an old, old idea but one that is, I think, becoming more and more useful in the distributed, mashup-centric data world we increasingly live in. Jackson Structured Programming for example, is based on the idea that for any given operation you can conceptualize the input data as a "structure" using formal grammar-type concepts. Two different tasks on exactly the same data may have utterly different grammars and that is fine.

The XML world has, over time, developed the same type of idea. Some very seasoned practitioners of my acquaintance make a point of never explicitly tying an XML data set to a single schema, preferring to create schemas as part of the data processing itself. Schemas that model the data in a way that suits the task at hand.

I think there is an important pattern in there. I call it Dynamically Typed Data. Maybe there is an existing phrase for it that I don't know:-)
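As a rough illustration of the pattern (all names and data hypothetical): the same byte-stream can be given two different "schemas", one per task, with neither being the data's one true type.

```python
import json

raw = (b'{"who": "alice", "amount": "42.50", "note": "lunch"}\n'
       b'{"who": "bob", "amount": "7.00"}\n')

# Task 1: data schlepping -- the only "schema" needed is a stream of bytes.
def byte_size(data: bytes) -> int:
    return len(data)

# Task 2: analytics -- impose a task-specific schema at read time,
# keeping only the fields this task cares about.
def total_amount(data: bytes) -> float:
    total = 0.0
    for line in data.splitlines():
        record = json.loads(line)       # the schema is applied here, not at rest
        total += float(record["amount"])
    return total

assert byte_size(raw) > 0
assert total_amount(raw) == 49.5
```

Neither conceptualization is more "correct" than the other; each grammar fits its task, which is the Jackson-style point made above.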

"Yes, but", I hear the voices say, "surely it is important to have a *base* schema that describes the data 'at rest'?"

I oscillate on that one:-) In the same way that I oscillate on the idea that static type checking is just one more unit test on dynamically typed code.

More thinking required.

Wednesday, January 13, 2016

The biggest IT changes in the last 5 years - Github

Github is a big, big deal. I don't think it is a big deal just because it is underpinned by Git (and thus "better" - by sheer popularity - than Mercurial/Darcs etc.).

I think it is a big deal because the developers of Github realized that developers are a social network. Github has done the social aspects of coding so much better than the other offerings from Google, SourceForge etc. Social computing seems to gravitate towards a small number of thematic nodes: family - Facebook, business - LinkedIn, musicians - Bandcamp. In the same vein: coders - Github.

Git concepts like pull requests etc. certainly help to enable linkages between developers but it is github that gives all those dev-social interconnections a place to hang out in cyberspace.

Tuesday, January 12, 2016

The biggest IT changes in the last 5 years - Quicksand OS

Back around Windows 7 there was a big change in the concept of an Operating System version. For many years prior to Windows 7, the Windows world at large had a concept of operating system that involved periods of OS quiescence that might last for years, punctuated by "Service Packs". Periodic CD-ROM releases with big accumulations of fixes.

Many developers in the Windows ecosystem remember the days of "XP Service Pack 3" which stood out as one of those punctuation marks. "Lets start with a clean XP SP 3 and go from there...."

Roll the clock forward to today and we have a more realtime environment for updates/upgrades. Every day (or so it seems to me!) my Windows machine, my iPad, my Android Phone and my Ubuntu machine either announce the availability of updates/upgrades or announce that they have been installed for me overnight.

Although the concept of OS version numbers still exists, as any developer/tester will tell you, strange things can happen as a result of the constant "upgrade" activity. Especially given that the browser environments these days are so closely twinned to the operating system that an upgrade to a browser can be tantamount to the installation of an old-school service pack.

This has resulted in a big jump in the complexity of application testing as it is becoming increasingly impossible to "lock down" a client-side configuration to test against.

This is a big problem for internal IT teams in enterprises. One that has not got a good solution that I can see. Clearly, true upgrades are good things but it is terribly hard to be sure that an "upgrade" does not introduce collateral damage to installed applications.

Also, upgrades that are very desirable - such as security fixes - can get bundled with upgrades that you would prefer to defer, creating a catch-22.

My sense is that this problem is contained rather than tamed at present because of the wide-spread use of browser front-ends. I.e. server-side dev-ops teams lock things down as best they can and control the update cycles as best they can, while living with the reality that they cannot lock down the client side.

However, as the "thin" client-side becomes "thick", this containment strategy becomes harder to implement.

Locking down client-OS images helps to a degree but the OS vendor strategies of constant updates do not sit well with a lock-down strategy. Plus, BYOD and VPN connections etc. limit what you can lock-down anyway.

Monday, January 11, 2016

The biggest IT changes in the last 5 years - Client/Server-based Standalone Thick Clients

Yes, I know it sounds a bit oxymoronic. Either an application has a server component - distinct from the client - or it doesn't. How could an application design be both client/server and standalone thick client?

By embedding the server-side component inside the client side, you can essentially compile away the network layer. Inside, of course, all the code operates on the basis of client/server communications paradigms, but all of that happens without the classic distributed computing fallacies potentially biting you. Ultimately everything network-ish is on the loopback interface.
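A minimal sketch of the idea using Python's standard library: the "server side" runs inside the same process, and all client/server traffic stays on the loopback interface.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello from the embedded server"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):       # keep the demo quiet
        pass

# The "server side" lives inside the client process; port 0 picks a free port.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "client side" talks ordinary client/server protocols, but only
# ever over loopback - no real network, no network fallacies.
url = f"http://127.0.0.1:{server.server_port}/"
with urllib.request.urlopen(url) as resp:
    reply = resp.read()
server.shutdown()
assert reply == b"hello from the embedded server"
```

The application is written against client/server paradigms throughout, yet ships as a standalone thick client.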

I like this style much more than I like its mirror image, which is to design with thick client paradigms and then insert a network layer "transparently" by making the procedure calls turn into remote procedure calls.

The problems with the latter are that all the distributed computing fallacies hold and without designing for them, your application is ill-equipped to cope with them.

If we are swinging back towards a more client-side-centric UI model - and I believe we are - doing it with things like, say, Electron rather than going back to traditional, say, Win32 + DCOM/CORBA/J2EE, makes sense now that we are all (mostly!) well familiar with the impossibility of wishing away the network:-)

Friday, January 08, 2016

More on hash codes as identifiers

I missed something very important in the recent post about using hash codes as identifiers. The hash-coding scheme makes it possible to ask "the cloud" to return the byte-stream from wherever the most convenient replica of the byte-stream is, but it does something else too....

It allows an application to ask the cloud : "Hey. Can somebody give me the 87th 4k block of the digital object that hashes to X?"

This in turn means that an application can have a bunch of these data block requests going at any one time and data blocks can come back in any order from any number of potential sources.
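A toy sketch of what such a scheme amounts to (names and block size illustrative): the hash is the handle, blocks can be requested by index in any order from any replica, and the reassembled result is self-verifying against the content hash.

```python
import hashlib

BLOCK = 4096

def store(data: bytes):
    """Publish a byte-stream: its identity is simply its content hash."""
    digest = hashlib.sha256(data).hexdigest()
    blocks = {i: data[i * BLOCK:(i + 1) * BLOCK]
              for i in range(-(-len(data) // BLOCK))}   # ceil division
    return digest, blocks

def fetch(digest, blocks, order):
    """Request blocks by (hash, index) in any order (here, 'order' must
    cover every index), reassemble by index, and verify the result."""
    got = {i: blocks[i] for i in order}           # blocks arrive out of order
    data = b"".join(got[i] for i in sorted(got))  # reassemble by index
    # Self-verifying: the identifier *is* the checksum of the content.
    assert hashlib.sha256(data).hexdigest() == digest
    return data

payload = bytes(range(256)) * 100                 # 25,600 bytes -> 7 blocks
digest, blocks = store(payload)
assert fetch(digest, blocks, order=[6, 0, 3, 1, 5, 2, 4]) == payload
```

The key property is that no block source needs to be trusted: a corrupt or malicious replica is caught by the hash check, which is exactly what makes "ask anybody for block 87" safe.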

So, P2P is alive and well and very likely to be embedded directly in the Web OS data management layer. See for example libp2p. Way, way down the stack, of course, lie the TCP/UDP protocols which address similar things e.g. data packets that can be routed differently, may not arrive in order, may not arrive at all etc.

There does seem to be a pattern here...I have yet to find a way to put it into words but it does appear as if computing stacks have a tendency to provide overlapping feature sets.

How much of this duplication is down to accidents of history versus, say, universal truths about data communications, I do not know.

Thursday, January 07, 2016

The biggest IT changes in the last 5 years - Meta Configuration

Next on my list of big changes I have seen over the last 5 years in IT is a tremendous growth in what I call meta-configuration.

Not too long ago, configuring an IT system/application was a matter of changing settings in a small handful of configuration files.

Today, the complexity of the application deployment environments is such that there are typically way too many config items to manage them in the traditional way of having a build/install tool write out the config settings as the application is being stood up.

Instead, we now have a slew of new tools in the web-dev stack whose job in life is to manage all the configs for us. Salt is one example.

The fun part about this is that these tools themselves have configs that need to be managed:-) An infinite regress ensues that is highly reminiscent of Escher.

This is only the pointy end of it though: getting your app deployed and running. The other part of it is the often large and often complex set of bits-and-bobs you need to have in place in order to cut code these days. This has also become so complex that meta-config tools abound. E.g. tools that config your javascript dev environment by running lumps of javascript that write out javascript...that sort of thing.

As soon as you move from apps that write out declarative config files to apps that write out .... other apps ... you have crossed a major rubicon.
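A tiny (and deliberately contrived) Python illustration of that rubicon: a program that writes a program and then runs it, rather than writing out a declarative config file.

```python
# Settings that a conventional tool would dump into a declarative config file.
settings = {"host": "localhost", "port": 8080, "debug": True}

# Instead, generate *code*: constants plus a function that uses them.
generated = "\n".join(
    f"{key.upper()} = {value!r}" for key, value in settings.items()
)
generated += "\n\ndef describe():\n    return f'{HOST}:{PORT} debug={DEBUG}'\n"

namespace = {}
exec(generated, namespace)          # the meta-program runs its own output
assert namespace["describe"]() == "localhost:8080 debug=True"
```

A declarative config file can only be read; generated code can compute, branch and misbehave, which is both the power and the tangled-mess risk described above.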

It is a line that perplexes me. There are times I think it is dreadful. There are times I think it is amazing and powerful.

Dick Sites of DEC once said "I would rather write programs to write programs, than write programs.".

On one hand, this is clearly meta-programming with all the power that can come with that. But boy, can it lead to a tangled mess if not used carefully.

Be careful out there.

Wednesday, January 06, 2016

The biggest IT changes in the last 5 years - Hash-Handled-Heisenfiles

I have taken to using a portmanteau phrase "Hash-Handled-Heisenfiles" to try to capture a web-centric phenomenon that appears to be changing one of the longest-standing metaphors in computing. Namely, the desktop concept of a "file".

In the original web, objects had the concept of "location" and this concept of location was very much tied to the concept of the object's "name".

Simply put, if I write "http://tumboliawinery.ie/stock.html", I am strongly suggesting a geographic location (Ireland from the ".ie"), an enterprise in that geography ("Tumbolia Winery"), and finally a digital object that can be accessed there ("stock.html").
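That location-carrying structure is easy to see if you pull the example URL apart (a quick sketch using Python's standard library):

```python
from urllib.parse import urlparse

url = "http://tumboliawinery.ie/stock.html"
parts = urlparse(url)

print(parts.netloc)  # "tumboliawinery.ie" -- who and roughly where: the winery, in .ie
print(parts.path)    # "/stock.html"       -- what: the digital object served there
```

Every component of the name is telling you something about where to go and who to ask, which is exactly the coupling the hash-based schemes below the fold do away with.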

Along with the javascript-ification of everything, referenced in the last post, schemes for naming and locating digital objects are increasingly not based on the (RESTian) concepts underpinning the original Web.

At one end of the spectrum you have the well established concept of the UID or GUID as used in relational databases, Lotus Notes, etc. These identifiers are, by design, semantics-free. In other words, if you want to get insight into the object itself, what it means or what it is, you get the object via its opaque identifier and then look at its attributes. You can think of it as a faceted classification system of identity. Any attribute or combination of attributes from the object can serve as a form of name. Given enough attributes, the identifier gradually becomes unique - picking out a single object, as opposed to a set of objects. Another way to look at this is that in relational database paradigms, all identifiers that carry semantics are actually queries in disguise. (Naming things is one of my, um, fixations.)
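A minimal sketch of that last point in Python (the records, attributes, and GUIDs are all invented): each "meaningful name" is really a conjunction of attribute constraints, i.e. a query.

```python
# Each record has an opaque, semantics-free identifier plus attributes.
wines = [
    {"guid": "g1", "region": "Tumbolia", "grape": "Riesling", "year": 2014},
    {"guid": "g2", "region": "Tumbolia", "grape": "Syrah",    "year": 2015},
    {"guid": "g3", "region": "Mosel",    "grape": "Riesling", "year": 2015},
]

def select(**attrs):
    """A 'meaningful name' is just a set of attribute constraints in disguise."""
    return [w for w in wines if all(w[k] == v for k, v in attrs.items())]

# "Tumbolia Rieslings" reads like a name, but it is a query...
assert len(select(region="Tumbolia", grape="Riesling")) == 1
# ...and fewer attributes means a broader query picking out a *set* of objects.
assert len(select(grape="Riesling")) == 2
```

Only the GUID names exactly one object without saying anything about it; every other "name" narrows a set.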

This is an old phenomenon in Web terms on the server side. Ever since the days of cgi-gateway scripts, developers have been intercepting URLs and mapping them into queries, running behind the firewall, talking SQL-speak to the relational database.

Well, this appears to be changing in that there is an alternative, non-relational notion of identifier that appears to be gaining a lot of traction. Namely, the idea of using the hashcode of a digital object as its opaque identifier. Why? Well, because once you do that, the opaque identifier can be independent of location. It could be anywhere. In fact - and this is the key bit - it can be in many places at once. Hence Heisenfiles as a tip-o-the-hat to Heisenberg.

Your browser no longer needs to necessarily go to tumboliawinery.ie to get the stock.html object. Instead it can pick it up from wherever by basically saying "Hey. Has anybody out there got an object that hashes to X?".
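The core mechanic is simple to sketch in Python, here using SHA-256 as the hash (the hostnames and content are made up for illustration):

```python
import hashlib

def content_id(data: bytes) -> str:
    """Derive a location-independent identifier from the bytes themselves."""
    return hashlib.sha256(data).hexdigest()

# Two copies of the same document, "stored" at two different locations.
stock_html = b"<html><body>Tumbolia Winery stock list</body></html>"
mirror_a = {"host": "tumboliawinery.ie", "data": stock_html}
mirror_b = {"host": "some-peer.example", "data": stock_html}

# Both mirrors yield the same identifier: the name says nothing about *where*.
assert content_id(mirror_a["data"]) == content_id(mirror_b["data"])

# Retrieval becomes "who has an object that hashes to X?" plus verification:
wanted = content_id(stock_html)
fetched = mirror_b["data"]
assert content_id(fetched) == wanted  # the content proves its own identity
```

Note the nice side-effect: whoever hands you the bytes, you can check for yourself that they are the right bytes, which is precisely what makes location irrelevant.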

I think this is a profound change. Why now? I think it is a combination of things. HTML5 browsers and local storage. Identifiers disappearing into the Javascript and out of URL space. The bizarre-but-powerful concept of hosting a web-server inside the client-side browser. The growing interest in all-things-blockchain, in particular smart contracts and Dapps.

All these things I think hint at a future where "file" and "location" are very distinct concepts and identifiers for file-like-objects are hash-values. Interesting times.

Tuesday, January 05, 2016

The biggest IT changes in the last 5 years

The last time I sat down and seriously worked on content for this blog was, amazingly, over 5 years ago now in 2010.

It coincided with finalizing a large design project for a Legislative Informatics system and resulted in a series of blog posts to attempt to answer the question "What is a Legislature/Parliament?" from an informatics perspective.

IT has changed a lot in the intervening 5 years. Changes creep up on all of us in the industry because they are, for the most part, in the form of a steady stream, rather than a rushing torrent. We have to deal with change every day of our lives in IT. It goes with the territory.

In fact, I would argue that the biggest difference between Computer Science in theory versus Computer Science in practice, is that practitioners have to spend a lot of time and effort dealing with change. Dealing with change effectively, is itself, an interesting design problem and one I will return to here at some point.

If I had to pick out one item to focus on as the biggest change it would without a doubt be the emergence - for good or ill - of a completely different type of World Wide Web. A Web based not on documents and hyperlinks, but on software fragments that are typically routed to the browser in "raw" form and then executed when they get there.

I.e. instead of thinking about http://www.example.com/index.html as a document that can be retrieved and parsed to extract its contents, much of the Web now consists of document "wrappers" that serve as containers for payloads of JavaScript which are delivered to the browser in order to be evaluated as programs.

It can be argued that this is a generalization of the original web in that anything that can be expressed as a document in the original web can be expressed as a program. It can be argued that the modern approach loses nothing but gains a lot - especially in the area of rich interactive behavior in browser-based user interfaces.

However, it can equally be argued that we risk losing some things that were just plain good about the original Web. In particular, the idea that content can usefully live at a distance from any given presentation of that content. The idea that content can be retrieved and processed with simple tools as well as big javascript-enabled browsers.

I can see both sides of it. At the time I did the closing keynote at XTech 2008 I was firmly in the camp mourning the loss of the web-of-documents. I think I am still mostly there. Especially when I think about documents that have longevity requirements and documents that have legal status. However, I can see a role that things like single-page webapps can play. As is so often the case in IT, we have a tendency to fix what needed fixing in the old model while introducing collateral damage to what was good about the old model.

Over time, in general, the pendulum swings back. I don't think we have hit "peak Javascript" yet but I do believe that there is an increasing realization that Javascript is not a silver bullet, any more than XHTML was ever a silver bullet.

The middle-way, as ever, beckons as a resting place. Who knows when we will get there. Probably just in time to make room for some newly upcoming pendulum swinging that is gathering pace on the server side. Namely the re-emergence of content-addressable storage, which is part of the hashification of everything. I want to get to that next time.