Sean McGrath

Thursday, April 14, 2016

Cutting the inconvenient protrusions from the jigsaw pieces

There is a school of thought that goes like this....

(1) To manage data means to put it in a database
(2) A 'database' means a relational database. No other database approach is really any good.
(3) If the data does not fit into the relational data model, well just compromise the data so that it does. Why? See item (1).

I have no difficulty whatsoever with recommending relational databases where there is a good fit between the data, the problem to be solved, and the relational database paradigm.

Where the fit isn't good, I recommend something else. Maybe indexed flat files, or versioned spreadsheets, documents, a temporal data store....whatever feels least like I am cutting important protrusions off the data and off the problem to be solved.

However, whenever I do that, I am sure to have to answer the "Why not just store it in [Insert RDB name]?" question.

It is an incredibly strong meme in modern computing.

Monday, March 14, 2016

Algorithms where human understanding is optional - or maybe even impossible

I think I am guilty of holding on to an AI non-sequitur for a long time. Namely the idea that AI is fundamentally limited by our ability as humans to code the rules for the computer to execute. If we humans cannot write down the rules for X, we cannot get the computer to do X.

Modern AI seems to have significantly lurched over to the "no rules" side of the field where phrases like CBR (case based reasoning) and Neural Net Training Sets abound...

But with an interesting twist that I have only recently become aware of. Namely, using bootstrapping to use generation X of an AI system to produce generation X+1.

The technical write-ups about the recent stunning AlphaGo victory make reference to the bootstrapping of AlphaGo. As well as learning from the database of prior human games, it has learned by playing against itself....

Doug Engelbart springs to mind and his bootstrapping strategy.

Douglas Hofstadter springs to mind and his strange loops model of consciousness.

Stephen Wolfram springs to mind and his feedback loops of simple algorithms for rapidly generating complexity.

AIs learning by using the behavior of the previous generation AI as "input" in the form of a training set sounds very like iterating a simple Wolfram algorithm or a fractal generating function, except that the output of each "run" is the algorithm for the next run.

The weird, weird, weird thing about all of this, is that we humans don't have to understand the AIs we are creating. We are just creating the environment in which they can create themselves.

In fact, it may even be the case that we cannot understand them because, by design, there are no rules in there to be dug out and understood. Just an unfathomably large state space of behaviors.

I need to go to a Chinese room, and think this through...

Thursday, March 10, 2016

LoRa

LoRa feels like a big deal to me. In general, hardware-led innovations tend to jumpstart software design into interesting places, more so than software-led innovations drag hardware design into interesting places.

With software driving hardware innovation, the results tend to be of the bigger, faster, cheaper variety. All good things but not this-changes-everything type moments.

With hardware driving software innovation however, software game changers seem to come along sometimes.

Telephone exchanges -> Erlang -> Elixir.
Packet switching -> TCP/IP -> Sockets

BGP Routers -> Multihoming
VR Headsets -> Immersive 3D worlds

etc.

I have noticed that things tend to come full circle though. Sooner or later, any hardware bits that can themselves be replaced by software bits are replaced:-)

This loopback trend is kicking into a higher gear at the moment because of 3D printing. I.e. a hardware device is conceived of. In order to build the device, the device is simulated in software to drive the 3D printer. Any such devices that *could* remain purely software, do so eventually.

A good example is audio recording. A modern DAW like Pro Tools or Reaper now provides pure digital emulators for pretty much any piece of audio hardware kit you can think of: EQs, pre-amps, compressors, reverbs etc.

Friday, March 04, 2016

XML and St Patrick

I am finding it a bit hard to believe that I wrote this *fourteen* years ago.

Patrick to be Named Patron Saint of Software Developers
In a dramatic development, scholars working in Newgrange, Ireland, have deciphered an Ogham stone thought to have been carved by St. Patrick himself. The text on the stone predicts, with incredible accuracy, the trials-and-tribulations of IT professionals in the early 21st century. Calls are mounting for St. Patrick to be named the patron saint of Markup Technologists.

The full transcription of the Ogham stone is presented here for the first time:

DeXiderata

Friday, February 26, 2016

Software complexity accelerators

It seems to me that complexity in software development, although terribly hard to measure, has steadily risen from the days of Algol 68 and continues to rise.

In response to the rise, we have developed mechanisms for managing - not removing - managing the complexity.

These management mechanisms - or perhaps I should say 'containment' mechanisms - have an interesting negative externality. If a complexity level of X was hard to contain before, but thanks to paradigm Y is now contained, the immediate side-effect is an increase in the value of X:-)

It reminds me of an analysis I found somewhere about driving speed and seat belts. Apparently, seat belts can have the effect of increasing driving speed. Reason being, we all have a risk level we sub-consciously apply when driving. Putting on a seat belt can make us feel that a higher speed is now possible without increasing our risk level.

So what sort of "seat belts" have we added into software development recently? I think Google Search is a huge one. Rather than reduce the complexity of an application as evidenced by the amount of debugging/head-scratching you need to do, we have accelerated the process of finding fixes online.

Another one is open source. We can now leverage a world-wide hive-mind that collectively "wraps its head around" a code-base so that code-base can become more complex than it could if a finite team worked the code-base.

Another one is cloud. Client/Server-style computing models push most of the complexity of management into the server side. Applications that would be incredibly complex to manage in today's diverse OS world if they were thick-clients are easier to manage server-side, thus creating headroom for new complexity which, sure enough, gets added to the mix.

Is this phenomenon of complexity acceleration thanks to better and better complexity containment a bad thing?

I honestly don't know.

Friday, February 19, 2016

It's obvious really

Nothing is more deserving of questioning than an obvious conclusion.

Thursday, February 11, 2016

Fixity, Vellums and the curious case of the rotting bits

So, vellum may be on the way out in the UK Parliament

Many deep and thorny issues here.

Thought experiment: In your hand you have a 40 page document. On your computer screen you have an electronic document open in a word processor. You have been told that they are "the same document".

How can you tell? What does it even mean to say that they are the "same"? Does it matter if there is no sure-fire way to prove it?

Let us start at the end of that list of questions and work backwards. Does it matter that there is no sure-fire way to prove it? Most of the time, it does not matter if you cannot prove they are the same. Over the years since the computerization of documents, we have devised various techniques for managing the risks of differences arising between what the computer says and what the sheets of paper say. However, when it does matter it tends to matter a whole bunch. Examples are domains such as legal documents, mission critical procedure manuals, that sort of thing.

A very common way of mitigating the risk of differences arising between paper and electronic texts is to declare the electronic version to be the real, authentic document and treat the paper as a "best efforts" copy or rendering of the authentic document. If the printing messes up and some text gets chopped off the right hand margin we think "No big deal". Annoying but not cataclysmic. The electronic copy is the real one and we can just go back to the source any time we want...

...Yes, as long as the electronic source is not, itself, an ambiguous idea. Again, we have developed practices to mitigate this risk. If I author a document in, say, FrameMaker but export RTF to send to you, the FrameMaker is considered the real, authentic electronic file. If anything happens to the RTF content - either as it is exported, transmitted or imported by you into some other application - we refer back to the original electronic file which is the FrameMaker incarnation....

...If we still have it up to date. The problem is that we do not print FrameMaker or Word or QuarkXPress. We tend to print "frozen" renderings of these things. Things like PostScript and PDF. On the way to paper, it is not uncommon for fixes to be required just prior to the creation of very expensive printing plates. If something small needs to be fixed, it will probably get fixed at 2 a.m. in the PostScript or PDF file...which is now out of sync with the original FrameMaker file...

...Which, come to think of it, might not have been as clear cut an authoritative source as I made it out to be. It is not uncommon for applications like FrameMaker, Adobe CS2, Quark etc. to be used downstream of an authoring process that utilizes Microsoft Word or Corel WordPerfect or OpenOffice or some Web-y browser plug-in.

If (i.e. when) errors are found in document proofs, the upstream documents should really be fixed and the DTP versions re-constituted. Otherwise, the source documents get out of sync with the paper copy very quickly indeed. Worse, the differences between the source documents and the paper copy may be in the form of small errors. A period missing here, a dollar sign there...Small enough to be very hard to spot with proofreading but large enough to be very serious in, for example, legal documents.

What to do? Well, we need to freeze-dry "cuts" of these documents to remove all ambiguity and then institute rigorous policies and procedures to ensure that changes are properly reflected everywhere along the document production toolchain...

...Which, these days, can be quite a complicated tool chain. For example, it is quite likely that web page production is feeding off the content prior to when it goes in to the DTP program. So, when a fix is needed you need to chase down all copies made and fix them, preferably all at the same time. Oh, and tools for editing (yes, I did say "editing") PDF documents are becoming more and more commonplace. So much for simple freeze-drying of digital content...

...Wait. This is getting too complex and has too many points of human intervention which introduces costs and the potential for human error. Best to simplify the tool chain...

...Yes, that would be nice but unfortunately DTP packages do useful things - things that word processors do not do. Word processors do useful things - things that Web-y plug-ins cannot do. Layout formats like PDF, PostScript, SVG do things that authoring formats like ODF do not do. HTML can be both a layout format and an authoring format but only at the expense of leaving behind a lot of very useful stuff for large document publication...

...So, where does that leave us? Well, behind that beautifully produced 40 pager you hold in your hand we have, roughly speaking, umpteen different electronic variations of it. Each of which may or may not be "the same" as the paper in a variety of subtle and (to me anyway) interesting ways...

...We have a problem. Consider this: search around the Web for companies offering data capture services from paper. Lots to choose from, right? Now where do you think all that paper is coming from? Old, old content that pre-dates computerization? No. Some of it falls into that category but only some. Filled in, paper based forms that do not exist in computers at all? Yes, there is a bunch of that. But a lot of it is content that came into existence purely electronically, at some stage over the last 30 years. It passed through some complicated tool chain and workflow on its way to paper. The paper then became the only reliable incarnation of the content. Any electronic versions of it that the owners could dig out were found to be potentially flawed in some way... (cannot read them, cannot trust them to be the same as the paper etc. etc.)

...Thus the need to capture the content from paper. An exercise which, even with rigorous QA to, say, 99.998% accuracy, is guaranteed to introduce its own set of errors...
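To put a rough number on that, here is a back-of-envelope sketch. It assumes the 99.998% figure is per character, that errors are independent, and that a page holds about 2,000 characters. All three are my assumptions for illustration, not figures from any data capture vendor.

```python
def expected_errors(char_count, accuracy):
    # Average number of wrong characters if each one is captured
    # correctly with probability `accuracy` (assumed independent).
    return char_count * (1.0 - accuracy)


def chance_of_any_error(char_count, accuracy):
    # Probability that at least one character in the document is wrong.
    return 1.0 - accuracy ** char_count


pages = 40
chars_per_page = 2000   # assumption: a fairly ordinary page of text
accuracy = 0.99998      # 99.998%, assumed to be a per-character rate

total_chars = pages * chars_per_page
print(f"{expected_errors(total_chars, accuracy):.1f} expected errors")
print(f"{chance_of_any_error(total_chars, accuracy):.0%} chance of at least one error")
```

Under those assumptions the 40 pager comes back with about 1.6 wrong characters on average, and roughly a four in five chance of containing at least one. A missing period or a stray dollar sign, in other words.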

It is messy, right? Well, get rid of the paper/vellums, get rid of the witnessing/signing formalisms that have evolved over centuries, and I think you create an even bigger problem.

Solution? Tamper evident digital audit trails.
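One common way to get tamper evidence is a hash chain, where each log entry carries a hash of the entry before it. The sketch below is a minimal illustration of that idea and not a description of any particular product. The function names, the entry layout and the choice of SHA-256 are all my own assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # stand-in "previous hash" for the very first entry


def document_fingerprint(data: bytes) -> str:
    # Digest of one frozen "cut" of a document.
    return hashlib.sha256(data).hexdigest()


def _digest(prev_hash, event):
    # Hash the previous entry's hash together with a canonical form of the event.
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(trail, event):
    # Link the new entry to whatever entry currently ends the trail.
    prev_hash = trail[-1]["hash"] if trail else GENESIS
    trail.append({"event": event, "prev": prev_hash,
                  "hash": _digest(prev_hash, event)})
    return trail


def verify(trail):
    # Recompute every link; any edited, removed or reordered entry breaks the chain.
    prev_hash = GENESIS
    for entry in trail:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


trail = []
append_entry(trail, {"action": "created", "doc": document_fingerprint(b"Draft text")})
append_entry(trail, {"action": "proof fix", "doc": document_fingerprint(b"Draft text.")})
print(verify(trail))   # True

trail[0]["event"]["action"] = "edited"   # quietly rewrite history
print(verify(trail))   # False
```

A chain like this only shows tampering if the most recent hash is also lodged somewhere the tamperer cannot reach, since otherwise the whole chain could simply be recomputed. That is the digital cousin of the witnessing and signing formalisms.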