Thursday, January 03, 2019

An alternative model of computer programming : Part 2


This is part two of a series of posts about an alternative model of computer programming I have been mulling for, oh decades now. The first part is here: http://seanmcgrath.blogspot.com/2018/08/an-alternative-model-of-computer.html

The dominant conceptual model of computer programming is that it is computation, which in turn of course is a branch of mathematics. This is incredibly persuasive on many levels. George Boole's book An Investigation of The Laws of Thought, sets out a powerful way of thinking about truth/falsity and conditional reasoning in ways that are purely numerical and thus mathematical and, well, really beautiful. Hence the phrase “boolean logic”. Further back in time still, we find al-Khwārizmī in the Ninth century working out sequences of mathematical steps to perform complex calculations. Hence the word “algorithm”. Further back in the time of the ancient Greeks we find Euclid and Eratosthenes with their elegant algorithms for finding greatest common divisors and prime numbers respectively.

Pretty much every programming language on the planet has a suite of examples/demos that include these classic algorithms turned into math-like language. They all feature the “three musketeers” of most computer programming. Namely, assignment (e.g. y = f(x)), conditional logic (e.g. “if y greater than 0 do THIS otherwise THAT”) and branching (e.g. “goto END”).

These three concepts get dressed up in all sorts of fine clothes in different programming languages but, as Alan Turing showed in the Nineteen Thirties, you only need to be able to assign values and to “branch on 0” in order to be able to compute anything that is computable via a classical computer – a so called Turning Machine. (This is significantly less that everything you might want to compute but that is another topic for another day. For now, we will stick to classical computers as exemplified in the so-called Von Neumann Architecture and leave quantum computing for another day.)

So what's the problem? Mathematics clearly maps very elegantly to expressing the logic and the calculations needed to get algorithms formalized for classical computers. And this mathematics maps very nicely onto todays mainstream programming languages.

Well, buried deep inside this beautiful mapping are some ugly truths that manifest themselves as soon as you go from written software to shipped software. To see these truths we will take a really small example of an algorithm. Here it is:
“Let y be the value of f(x)
If y is 0
then set x to 0
otherwise set x to 1”

The meaning of the logic here doesn't matter and it doesn't matter what f(x) actually calculates. All we need is something that has some assignments and some conditional logic such as the above snippet.

Now ask any programmer to code this in Python or C++ or Java or and they will be done expressing the algorithm in their coding environment in a matter of minutes. It is mostly a question of adding the write “boilerplate” code around the edges, and finding whatever the correct syntax is in the chosen programming language for “if” and for “then” and for demarcating statements and expressing assignments etc.

But in order to ship the code to production items such as error handling, reliability, scalability, predictability.... – sometimes referrred to as the “ilities” of programming end up taking up a lot of time and a lot of coding. So much so that the coding for “ilities” that needs to surround shipped code is often many times larger that the lines of code required for the original purely mathematical mapping into the programming language.

All of this ancilliary code – itself liberally infused with its own assignments and conditional logic – becomes part of the total code the needs to be created to ship code and most of it needs to be managed for the life time of the core code itself. So now we have code for numeric overflows, function call timeouts, exception handlers etc. We have code for builds, running test scripts, shipping to production, monitoring, tracing...the list goes on and on.

The pure world of pure math rarely needs to have any of these as concerns because in math we say “let x = f(x)” without worrying if f(x) will actually fall over and not give us an answer at all, or, perhaps worse, work fine for a year and then start getting slower and slower for some unknown reason.

This second layer of code – the code that surrounds the “pure” code is very hard to quantify. Its very hard to explain to non-programmers how important it might prove to be, how much time it might take and then to make matters worse, it is very unusual to be able to say its “done” in any formal sense. There are always loose ends. Error conditions that code doesn't handle – either because they are believed to be highly unlikely or because there are an open ended set of potential error scenarios and its simply not possible to code for every conceivable eventuality.

Pure math is a land of zero computational latency. A land where calculations are independent of each other and cannot interfere with each other. A land where all communications pathways are 100% reliable. A land where numbers have infinite precision. A land of infinite storage capacity. A land where the power never dies...etc. Etc.

All this is to make the point that in my opinion that for all the appealing mapping from pure math to pure algorithms, actual computer programming involves adding many other layers to cater for the fact that the real world of shipped code is not a pure math “machine”.

Next up. My favorite subject. Change with respect to time....