Sean McGrath: 01/03/2016

Friday, January 08, 2016

More on hash codes as identifiers

I missed something very important in the recent post about using hash codes as identifiers. The hash-coding scheme makes it possible to ask "the cloud" to return the byte-stream from whichever replica is most convenient, but it does something else too....

It allows an application to ask the cloud: "Hey. Can somebody give me the 87th 4k block of the digital object that hashes to X?"

This in turn means that an application can have a bunch of these data block requests going at any one time and data blocks can come back in any order from any number of potential sources.
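To make the block-request idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not how libp2p or any real system works: the Replica class, the fetch function, the choice of SHA-256 and the idea that the client already knows the block count are all hypothetical. Blocks are requested in a scrambled order from whichever replica happens to have them, and the reassembled object is accepted only if it hashes to the id that was asked for.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # the "4k block" from the post


def object_id(data):
    """Hash of the whole byte-stream (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def split_blocks(data):
    """Cut a byte-stream into fixed-size blocks; the last may be short."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


class Replica:
    """Hypothetical peer that may hold copies of objects, keyed by hash."""

    def __init__(self):
        self.objects = {}

    def put(self, data):
        self.objects[object_id(data)] = split_blocks(data)

    def get_block(self, oid, index):
        blocks = self.objects.get(oid, [])
        return blocks[index] if index < len(blocks) else None


def fetch(oid, n_blocks, replicas, rng):
    """Fetch every block of `oid` in arbitrary order from arbitrary replicas."""
    order = list(range(n_blocks))
    rng.shuffle(order)  # stand-in for replies arriving in any order
    received = {}
    for index in order:
        # Try the replicas in a random order; take the first that answers.
        for replica in rng.sample(replicas, len(replicas)):
            block = replica.get_block(oid, index)
            if block is not None:
                received[index] = block
                break
        else:
            raise LookupError(f"no replica has block {index}")
    data = b"".join(received[i] for i in range(n_blocks))
    if object_id(data) != oid:
        raise ValueError("reassembled object does not hash to the requested id")
    return data
```

The final hash check is what lets the client take blocks from sources it has no reason to trust: a wrong block from anywhere makes the whole object fail verification.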

So, P2P is alive and well and very likely to be embedded directly in the Web OS data management layer. See, for example, libp2p. Way, way down the stack, of course, lie the TCP and UDP protocols, which address similar things, e.g. data packets that can be routed differently, may not arrive in order, may not arrive at all, etc.

There does seem to be a pattern here...I have yet to find a way to put it into words but it does appear as if computing stacks have a tendency to provide overlapping feature sets.

How much of this duplication is down to accidents of history versus, say, universal truths about data communications, I do not know.

Thursday, January 07, 2016

The biggest IT changes in the last 5 years - Meta Configuration

Next on my list of big changes I have seen over the last 5 years in IT is a tremendous growth in what I call meta-configuration.

Not too long ago, configuring an IT system/application was a matter of changing settings in a small handful of configuration files.

Today, the complexity of the application deployment environments is such that there are typically way too many config items to manage them in the traditional way of having a build/install tool write out the config settings as the application is being stood up.

Instead, we now have a slew of new tools in the web-dev stack whose job in life is to manage all the configs for us. Salt is one example.

The fun part about this is that these tools themselves have configs that need to be managed :-) An infinite regress ensues that is highly reminiscent of Escher.

This is only the pointy end of it though: getting your app deployed and running. The other part of it is the often large and often complex set of bits-and-bobs you need to have in place in order to cut code these days. This has also become so complex that meta-config tools abound. E.g. tools that config your javascript dev environment by running lumps of javascript that write out javascript...That sort of thing.

As soon as you move from apps that write out declarative config files to apps that write out .... other apps ... you have crossed a major Rubicon.
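A small Python sketch of the two sides of that line. Everything in it is hypothetical (the settings, the function names, the shape of the generated code); it is only meant to show the difference in kind. The first function writes out data that any tool can parse and inspect. The second writes out source code, and the only general way to find out what it produces is to run it.

```python
import json

SETTINGS = {"port": 8080, "workers": 4}  # hypothetical settings


def write_config(settings):
    """Declarative side: emit a config file as plain data."""
    return json.dumps(settings, indent=2, sort_keys=True)


def write_program(settings):
    """Meta side: emit the source code of a program that builds the settings."""
    lines = ["def configure():", "    settings = {}"]
    for key, value in sorted(settings.items()):
        lines.append(f"    settings[{key!r}] = {value!r}")
    lines.append("    return settings")
    return "\n".join(lines) + "\n"


def run_generated(source):
    """Execute generated source and call the function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["configure"]()
```

Here both routes end up with the same settings, which is the point: the generated program carries no more information than the JSON does, but it can no longer be read back with a parser alone.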

It is a line that perplexes me. There are times I think it is dreadful. There are times I think it is amazing and powerful.

Dick Sites of DEC once said "I would rather write programs to write programs, than write programs."

On one hand, this is clearly meta-programming with all the power that can come with that. But boy, can it lead to a tangled mess if not used carefully.

Be careful out there.

Wednesday, January 06, 2016

The biggest IT changes in the last 5 years - Hash-Handled-Heisenfiles

I have taken to using a portmanteau phrase "Hash-Handled-Heisenfiles" to try to capture a web-centric phenomenon that appears to be changing one of the longest-standing metaphors in computing. Namely, the desktop concept of a "file".

In the original web, objects had the concept of "location" and this concept of location was very much tied to the concept of the object's "name".

Simply put, if I write "http://tumboliawinery.ie/stock.html", I am strongly suggesting a geographic location (Ireland from the ".ie"), an enterprise in that geography "Tumbolia Winery", and finally a digital object that can be accessed there "stock.html".

Along with the javascript-ification of everything, referenced in the last post, schemes for naming and locating digital objects are increasingly not based on the (RESTian) concepts underpinning the original Web.

On one end of the spectrum you have the well established concept of UID or GUID as used in relational databases, Lotus Notes etc. These identifiers, by design, are semantics-free. In other words, if you want to get insight into the object itself, what it means or what it is, you get the object via its opaque identifier and then look at its attributes. You can think of it as a faceted classification system of identity. Any attribute or combination of attributes from the object can serve as a form of name. Given enough attributes, the identifier gradually becomes unique - picking out a single object, as opposed to a set of objects. Another way to look at this is that in relational database paradigms, all identifiers that carry semantics are actually queries in disguise. (This area, naming things, is one of my, um, fixations.)

This is an old phenomenon in Web terms on the server side. Ever since the days of cgi-gateway scripts, developers have been intercepting URLs and mapping them into queries, running behind the firewall, talking SQL-speak to the relational database.
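As a rough sketch of that URL-to-query mapping, here is a Python example using the standard sqlite3 module. The table, its columns, the sample rows and the rule that path segments map to columns in a fixed order are all made up for illustration. Each extra path segment adds a predicate, so the "name" in the URL narrows from a set of objects down to a single opaque uid.

```python
import sqlite3
from urllib.parse import urlsplit

# Hypothetical rule: path segments map onto these columns, in this order.
COLUMNS = ("country", "producer", "vintage")


def url_to_query(url):
    """Turn the path of a URL into a parameterised SQL query."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    pairs = list(zip(COLUMNS, segments))
    # Column names come from the fixed tuple above, never from the URL.
    where = " AND ".join(f"{column} = ?" for column, _ in pairs) or "1 = 1"
    sql = f"SELECT uid FROM stock WHERE {where} ORDER BY uid"
    return sql, [value for _, value in pairs]


def make_db():
    """Build a tiny in-memory database with made-up rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE stock (uid TEXT PRIMARY KEY, country TEXT, "
        "producer TEXT, vintage TEXT)"
    )
    db.executemany(
        "INSERT INTO stock VALUES (?, ?, ?, ?)",
        [
            ("a1f3", "ie", "tumbolia", "2014"),
            ("b7c9", "ie", "tumbolia", "2015"),
            ("c2d8", "ie", "other", "2015"),
        ],
    )
    return db


def lookup(db, url):
    """Resolve a semantic URL to the opaque uids it currently picks out."""
    sql, params = url_to_query(url)
    return [row[0] for row in db.execute(sql, params)]
```

With these rows, a one-segment path such as /ie matches all three uids, while /ie/tumbolia/2015 picks out exactly one.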

Well, this appears to be changing in that there is an alternative, non-relational notion of identifier that appears to be gaining a lot of traction. Namely, the idea of using the hashcode of a digital object as its opaque identifier. Why? Well, because once you do that, the opaque identifier can be independent of location. The object could be anywhere. In fact - and this is the key bit - it can be in many places at once. Hence Heisenfiles as a tip-o-the-hat to Heisenberg.

Your browser no longer needs to necessarily go to tumboliawinery.ie to get the stock.html object. Instead it can pick it up from wherever by basically saying "Hey. Has anybody out there got an object that hashes to X?".
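A minimal sketch of that "has anybody got an object that hashes to X?" lookup, in Python. The location names, the resolve function and the use of SHA-256 are assumptions for illustration only. The thing to notice is that the identifier says nothing about where the object lives, and that any copy can be checked against the identifier itself, so a stale or tampered copy is simply skipped.

```python
import hashlib


def hash_id(data):
    """Opaque identifier: the hash of the bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def resolve(oid, locations):
    """Ask each location in turn; accept the first copy whose hash matches.

    `locations` maps a hypothetical location name to a dict of id -> bytes.
    """
    for name, store in locations.items():
        candidate = store.get(oid)
        if candidate is not None and hash_id(candidate) == oid:
            return name, candidate
    raise KeyError(oid)
```

Because the check depends only on the bytes, it does not matter which location answers first, or whether the origin server is reachable at all.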

I think this is a profound change. Why now? I think it is a combination of things. HTML5 Browsers and local storage. Identifiers disappearing into the Javascript and out of URL space. The bizarre-but-powerful concept of hosting a web-server inside the client-side browser. The growing interest in all-things-blockchain, in particular smart contracts and Dapps.

All these things I think hint at a future where "file" and "location" are very distinct concepts and identifiers for file-like-objects are hash-values. Interesting times.

Tuesday, January 05, 2016

The biggest IT changes in the last 5 years

The last time I sat down and seriously worked on content for this blog was, amazingly, over 5 years ago now in 2010.

It coincided with finalizing a large design project for a Legislative Informatics system and resulted in a series of blog posts to attempt to answer the question "What is a Legislature/Parliament?" from an informatics perspective.

IT has changed a lot in the intervening 5 years. Changes creep up on all of us in the industry because they are, for the most part, in the form of a steady stream, rather than a rushing torrent. We have to deal with change every day of our lives in IT. It goes with the territory.

In fact, I would argue that the biggest difference between Computer Science in theory and Computer Science in practice is that practitioners have to spend a lot of time and effort dealing with change. Dealing with change effectively is itself an interesting design problem and one I will return to here at some point.

If I had to pick out one item to focus on as the biggest change it would without a doubt be the emergence - for good or ill - of a completely different type of World Wide Web. A Web based not on documents and hyperlinks, but on software fragments that are typically routed to the browser in "raw" form and then executed when they get there.

I.e. instead of thinking about http://www.example.com/index.html as a document that can be retrieved and parsed to extract its contents, much of the Web now consists of document "wrappers" that serve as containers for payloads of JavaScript which are delivered to the browser in order to be evaluated as programs.

It can be argued that this is a generalization of the original web in that anything that can be expressed as a document in the original web can be expressed as a program. It can be argued that the modern approach loses nothing but gains a lot - especially in the area of rich interactive behavior in browser-based user interfaces.

However, it can equally be argued that we risk losing some things that were just plain good about the original Web. In particular, the idea that content can usefully live at a distance from any given presentation of that content. The idea that content can be retrieved and processed with simple tools as well as big javascript enabled browsers.
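To illustrate what "simple tools" means here, a small Python sketch using only the standard library's html.parser. The two sample pages are invented. A plain text extractor recovers the content of the document-style page, but gets nothing from the wrapper-style page, because there the content only comes into existence once a browser evaluates the script.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside script or style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a simple, non-JavaScript tool can see in a page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Invented sample pages: the same content as a document and as a program.
DOCUMENT_PAGE = "<html><body><h1>Stock</h1><p>12 cases of red.</p></body></html>"
WRAPPER_PAGE = (
    "<html><body><div id='app'></div><script>"
    "document.getElementById('app').textContent = 'Stock: 12 cases of red.';"
    "</script></body></html>"
)
```

Running visible_text over the first page yields the heading and paragraph text; over the second it yields an empty string.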

I can see both sides of it. At the time I did the closing keynote at XTech 2008 I was firmly in the camp mourning the loss of the web-of-documents. I think I am still mostly there. Especially when I think about documents that have longevity requirements and documents that have legal status. However, I can see a role that things like single-page webapps can play. As is so often the case in IT, we have a tendency to fix what needed fixing in the old model while introducing collateral damage to what was good about it.

Over time, in general, the pendulum swings back. I don't think we have hit "peak Javascript" yet but I do believe that there is an increasing realization that Javascript is not a silver bullet, any more than XHTML was ever a silver bullet.

The middle-way, as ever, beckons as a resting place. Who knows when we will get there. Probably just in time to make room for some newly upcoming pendulum swinging that is gathering pace on the server side. Namely the re-emergence of content addressable storage which is part of the hashification of everything. I want to get to that next time.
