Sunday, April 26, 2009

Bad Luck, or "fighting the forcefield"

In my previous post, I used the expression "fighting the forcefield". The term may be uncommon, but it describes a very familiar situation: I see people fighting the forcefield all the time.

Look at any troubled project, and you'll see people who made some wrong decision early on, and then stood by it, digging and digging. Of course, any decision may turn out to be wrong. Software development is a knowledge acquisition process. We often take decisions without knowing all the details; if we didn't, we would never get anything done (see analysis paralysis for more). Experience should reduce the number of wrong decisions, but there are going to be mistakes anyway; we should be able to recognize them quickly, backtrack, and take another path.

Experience should also bring us in closer contact with the forcefield. Experienced designers don't need to go through each and every excruciating detail before they can take a decision. As I said earlier, we can almost feel, or see the forcefield, and take decisions based on a relatively small number of prevailing forces (yes, I dare to consider myself an experienced designer :-).
This process is largely unconscious, and sometimes it's hard to rationalize all the internal reasoning; in many cases, people expect very argumentative explanations, while all we have to offer on the fly is aesthetics. Indeed, I'm often very informal when I design; I tend to use colorful expressions like "oh, that sucks", or "that brings bad luck" to indicate a flaw, and so on.

Recently, I've found myself saying that "bad luck" thing twice, while reviewing the design of two very different systems (a business system and a reactive system), for two different clients.
I noticed a pattern: in both cases, there was a single entity (a database table, an in-memory structure) storing data with very different timing/life requirements. In both cases, my clients were a little puzzled, as they thought those data belonged together (we can recognize gravity at play here).
Most naturally, they asked me why I would keep the data apart. Time to rationalize :-), once again.

Had they all been more familiar with my blog, I would have pointed to my recent post on multiplicity. After all, data with very different update frequencies (say, the calibration data for a sensor and its most recent sample) have a different fourth-dimensional multiplicity. Sure, at any given point in time, a sensor has one most recent sample and one set of calibration data; therefore, in a static view we'll have multiplicity 1 for both, suggesting we can keep the two of them together. But bring in the fourth dimension (time) and you'll see an entirely different picture: they have a completely different historical multiplicity.
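To make the difference between static and historical multiplicity concrete, here is a minimal sketch in Python. All the names (Calibration, Sample, SensorHistory) and their fields are hypothetical, invented for illustration; they do not come from either of the systems I reviewed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Calibration:
    # Hypothetical fields: these change rarely (say, when the sensor is serviced).
    offset: float
    gain: float


@dataclass(frozen=True)
class Sample:
    # Hypothetical fields: a new sample arrives at every acquisition.
    timestamp: float
    raw_value: float


@dataclass
class SensorHistory:
    # Two separate histories, each growing at its own pace.
    calibrations: List[Calibration] = field(default_factory=list)
    samples: List[Sample] = field(default_factory=list)

    def calibrate(self, offset: float, gain: float) -> None:
        self.calibrations.append(Calibration(offset, gain))

    def acquire(self, timestamp: float, raw_value: float) -> None:
        self.samples.append(Sample(timestamp, raw_value))


history = SensorHistory()
history.calibrate(offset=0.5, gain=2.0)
for tick in range(1000):
    history.acquire(timestamp=float(tick), raw_value=1.0)

# Static view: one current calibration, one most recent sample.
static_view = (1, 1)
# Historical view: how many values each piece of data has had over time.
historical_view = (len(history.calibrations), len(history.samples))
```

In the static view both multiplicities are 1; over time, one calibration sits next to a thousand samples, which is the "entirely different picture" mentioned above.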

Different update frequencies also hint at the fact that data is changing under different forces. By keeping together things that are influenced by more than one force, we expose them to both. More on this another time.

Hard-core programmers may want more than that. They may ask for more familiar reasons not to put data with different update frequencies in the same table or structure. Here are a few:

- In multi-threaded software, in-memory structures require locking. If your structure contains data that is seldom updated, that data is presumably read far more often than it is written: if it were seldom read and seldom written, why keep it around at all?
Unfortunately, the high-frequency data is written quite often. Therefore, either we accept slowing everything down with a simple mutex, or we aim for higher performance through a more complex locking mechanism (a reader/writer lock), which may or may not work, depending on the exact read/write pattern. Separate structures can adopt a simpler locking mechanism, as one is mostly read and the other mostly written; even if you go with an R/W lock, here it's almost guaranteed to perform well.

- Even on a database, high-frequency writes may stall low-frequency reads. You even risk a lock escalation from record to table. Then you either go with dirty reads (betting on your good luck) or you just move the data into another table, where it belongs.

- If you decide to cache database data to improve performance, you'll have to choose between a larger cache with the same structure as the database (low-frequency data included) or a smaller and more efficient cache with just the high-frequency data (therefore revealing once more that those data do not belong together).

- And so on: I encourage you to find more reasons!
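The locking argument in the first point can be sketched as follows, again in Python. This is only an illustration under stated assumptions: CalibrationStore, LatestSample and sampler are hypothetical names, and since Python's standard library offers no reader/writer lock, each structure gets its own plain threading.Lock. The point is simply that the high-frequency writer and the calibration readers never compete for the same lock.

```python
import threading
from typing import Optional, Tuple


class CalibrationStore:
    """Mostly read, seldom written: guarded by its own lock."""

    def __init__(self, offset: float, gain: float) -> None:
        self._lock = threading.Lock()
        self._offset = offset
        self._gain = gain

    def read(self) -> Tuple[float, float]:
        with self._lock:
            return (self._offset, self._gain)

    def update(self, offset: float, gain: float) -> None:
        with self._lock:
            self._offset = offset
            self._gain = gain


class LatestSample:
    """Written at high frequency: guarded by a separate lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value: Optional[float] = None

    def write(self, value: float) -> None:
        with self._lock:
            self._value = value

    def read(self) -> Optional[float]:
        with self._lock:
            return self._value


calibration = CalibrationStore(offset=0.5, gain=2.0)
latest = LatestSample()


def sampler(count: int) -> None:
    # Simulates the high-frequency writer (hypothetical acquisition loop).
    for i in range(count):
        latest.write(float(i))


writer = threading.Thread(target=sampler, args=(10_000,))
writer.start()
# These reads take only the calibration lock, never the sample lock.
readings = [calibration.read() for _ in range(100)]
writer.join()
```

Had both pieces of data lived in one structure behind one mutex, each of those hundred reads would have queued up behind the sampler's writes.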

In most cases, I tend to avoid this kind of problem instinctively: this is what I really call experience. Indeed, Donald Schön reminds us that good design is not for everyone, and that you have to develop your own sense of aesthetics (see "Reflective Conversation with Materials. An interview with Donald Schön by John Bennett", in Bringing Design To Software, Addison-Wesley, 1996). Aesthetics may not sound too technical, but consider it a shortcut for: you have to develop your own ability to perceive the forcefield, and instinctively know what is wrong (misaligned) and right (aligned).

Ok, next time I'll get back to the notion of multiplicity. Actually, although I initially chose "multiplicity" because of its familiarity, I'm beginning to think that the whole notion of fourth-dimensional multiplicity, which is indeed quite important, might be confusing for some. I'm therefore looking for a better term, one that clearly conveys both the traditional ("static") and the extended (fourth-dimensional, historical, etc.) meaning. Any good ideas? Say it here, or drop me an email!