Tuesday, June 29, 2010

Data models, data organization and why the search for the "correct" model is doomed

I have received some e-mails about my assertion that there is no such thing as the "correct" way to model anything in a computer system. That is, there is no "pure" model whose correctness rests on anything other than engineering concerns such as fitness-for-purpose.

My argument boils down to this:

- to model anything in software you need a human

- that human needs to carve up reality in some way in order to create a model. I.e. name things, classify things, link things to other things, distinguish causes and effects, distinguish entities from actions, declare some aspects of reality "unimportant", create a model boundary etc.

- no two humans carve up reality in exactly the same way, because we are all unique creatures whose view of the world is influenced by our language, culture, experiences etc.

- therefore, no two models are likely to be exactly the same

- even if they appeared to be the same, there is no way to be sure because human language is lossy. I.e. there is no way to be sure that the model I have in my head is what I have communicated through language. As Wittgenstein said, some things cannot be said - they can only be shown. In Zen terms, our words are just fingers pointing at the moon.

The best book I have read on this subject - highly recommended - is Bill Kent's Data and Reality.

Kent looks at the world from a relational database perspective. A couple of articles from my scribenatorial past might be of interest. They look at the world from a - surprise - XML perspective.

Next up: KLISS, Law and eDemocracy.

5 comments:

nes said...

"Essentially, all models are wrong, but some are useful." George E. P. Box

Anonymous said...

If I have the components of a model, why would I bother describing them with language? I'll just tweak the component parameters and interconnections until they match what I think. My 'description' is the model itself, showing exactly how I've represented and simplified the real world to solve a particular problem. Sure, someone else could represent the problem differently, but if both models give good results, that's fine. The modelling problem is more general than data representation. For example, I could model the force of gravity with Newton's laws or Einstein's and get an operationally valid solution to the trajectory of a cannonball, but Newton's model would be far simpler to calculate. On the other hand, Einstein's model gives better results for galaxies and Heisenberg's quantum model works better for subatomic particles. What's 'true'? Probably none of the above is 'true'. Our view of the world will always be mediated through our senses and filtered through our models, but that's ok as long as we can manage to engineer efficient solutions to problems -- unless you're neurotically worried about how many angels can dance on the head of a pin.

Gene said...

Very interesting. Will make interviewing less stressful, when the potential employer says, "What is the correct way to model this system?"

Edward K. Ream said...

I agree completely. Leo is founded on the premise that the best organization for data depends on the task at hand. 100 different tasks require 100 different views of data. Happily, Leo makes that possible.
