Friday, March 12, 2010

Where is your Knowledge?

Software development is a process of knowledge acquisition and knowledge encoding (see Phillip Armour, copiously quoted in this blog). Where, and how, do we store that knowledge? In several places, in several ways:

In source code: that's executable knowledge
In models: that's formal knowledge
In other kind of documents: that's written knowledge
In our brain, consciously: that's explicit knowledge
In our brain, unconsciously: that's tacit knowledge

Knowledge stored in source code has the extremely useful property of being executable, but we can't store the entire development knowledge in executable statements. Design Rationale, for instance, is not present in code (and not even in most UML diagrams, for that matter), and is basically stored at the conscious/unconscious level. My forcefield diagram is much better at formally capturing rationale.

Explicit knowledge is often passed by as oral tradition, while tacit knowledge is often passed by as "a way of doing things", just by working together. Pair programming, reviews, joint design sessions (and so on) help distribute both explicit and tacit knowledge.

Knowledge has value, but that value is not constant over time. In 1962, Fritz Machlup came up with the concept of Half-life of knowledge: the amount of time that has to elapse before half of the knowledge in a particular area is superseded or shown to be untrue.

Moreover, the initial value of a particular piece of knowledge can be very high, like a new algorithm that took you years to get right, or very small, like a trivial validation rule.

Recently, I began to think about the half-life of our knowledge repositories as well. With my usual boldness, I'll go ahead and define the Half-Life of a Knowledge Repository: the amount of time that has to elapse before half of the knowledge in a repository is unrecoverable or just too costly to recover. I could even define "too costly" as "higher than the discounted value of that knowledge when lookup is attempted".

The concept of recoverable knowledge is slightly deeper than it may seem. Sure, it does cover the obvious problems of losing knowledge for lack of backup procedures, or because it's stored in a proprietary format no longer supported, and so on. But it covers also several interesting cases:

- the knowledge is in the brain of an employee, who leaves the company
- the knowledge is in source code, but it's in an obsolete language
- the knowledge is in source code, but it's extremely hard to read
- etc.

I'll leave it up to you to define the half-life of source code, models, documents, brain (conscious and unconscious). Of course, more details are needed: niche languages, for instance, tend to have a shorter half-life.

Now, here is the real boon :-). We can combine the concept of Knowledge Half-Life, Knowledge Value, and Knowledge Repository Half-Life to map the risk of storing a particular piece of knowledge in a particular repository (only). Here is my first-cut map:

Knowledge Half-Life Knowledge (initial) Value Repository Half-Life Result
Long Long Long OK
Long Long Short Risk
Long Short Long Little Waste
Long Short Short Little Risk
Short Long Long Little Waste
Short Long Short Little Risk
Short Short Long Waste
Short Short Short OK

It's interesting to review one of the values in the Agile Manifesto (Working software over comprehensive documentation) under this perspective.

Let's say we have a piece of knowledge, and that knowledge can be indeed stored in code (as I said, you can't store everything in code).

If the half-life of knowledge is short, storing it in code only is probably the best economical choice. If the half-life of knowledge is long, we have to worry a little more. If we add relevant unit tests to that piece of code, we increase the repository half-life, as they make it easier to recover knowledge from code. If we use a mainstream language, we can also increase the repository half-life.

This may still not be enough. If you had to recover the entire knowledge stored in a non-trivial piece of code (say, an mp4 codec) having only the source code, and no (comprehensive) documentation on what that piece of code is doing, why, and how, it would take you far too much. The half-life of code is shorter than the half-life of code + documents.
Actually, depending on context, given the choice to have just the code and nothing else, or just comprehensive documentation and nothing else, we better be careful about what we choose (when knowledge half-life is long, of course).

Of course, the opposite is also true: if you store knowledge with short half-life outside code, you seriously risk wasting your time.

I've often been critic about teaching and applying principles and techniques without the necessary context. I hope that somehow, the table above and the underlying concepts can move our understanding of when to use what a little further.


Romano Scuri said...

Without the final anotation it would have been easy to make the comparison with the horoscope where all mankind is divisible in only 12 categories...


Carlo Pescio said...

Just step up and propose something better : )

Oh, by the way: there are only 10 types of people; those who understand binary, and those who don't.

Romano Scuri said...

I agree: 10 is true.