Thursday, April 29, 2010

Notes on Software Design, Chapter 0: What?

As an author, I've learnt that the introduction is best written last, when the text has unfolded and you really know what your paper is about.
I'm far from that stage with my Notes on Software Design, but I see things clearly enough now that I can write a reasonable introduction, hence Chapter 0, coming after Chapter 5.

It all starts with the realization that software is just a material, and software design (at any level) is an act meant to give shape (or form) to the material. Software, however, is not a physical material, so we can't borrow from traditional disciplines to seek guidance. Well, sort of.

What do we know about physical materials?
I'm not trying to write an essay on Materials science, so I'll focus on a few central issues here.

Materials have well-defined properties. Examples of properties are electrical conductivity, coefficient of thermal expansion, hygroscopy, and so on.

Properties enrich our language and make reasoning more effective (Donald Norman would say that properties "make us smart"). Arguments based on well-defined properties are more robust and don't lead to endless debates. "This material has high hygroscopy, so keep it away from moisture". Period. No debate.

Properties allow us to focus: I need to sustain high compression. Therefore, I need high compressive strength. Etc.

Properties allow us to select the best materials. Elasticity is required if you want your material to come back to its original shape after stress. Actually, if you know the stress, you can pick the best material, based on a set of criteria defined by the target product. An important criterion, of course, would be cost.

Properties provide guidance while shaping the material. In fact, we even have manufacturing properties, like castability (see the Wikipedia page on material properties).

Now, properties are well-defined when they are based on replicable, experimental observations. Most often, they are based on both a quantitative measurement process and on a physical theory of matter.

The quantitative side is often based on forces. Compressibility is based on Pressure (and Volume). So we can link everything back to a few well-understood forces.
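
To make the link between a property and its force concrete, here is the textbook definition of isothermal compressibility. The symbols are the usual ones from physics, not something defined elsewhere in these notes: V is volume, p is pressure, and the subscript T means temperature is held constant.

```latex
\beta_T = -\frac{1}{V} \left( \frac{\partial V}{\partial p} \right)_T
```

The property is literally "how much the volume responds to the force", which is why it makes no sense without a notion of pressure.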

The theoretical side is usually based on a model of matter. Physical materials are subject to well-known laws, like Newtonian physics (at the right scale). The model of matter itself (e.g. the notion of the crystal structure of metals) is helpful when we try to understand why materials behave in a certain way under some specific force.

Moving up (as I said, I'm not trying to write a comprehensive essay on physical materials), we have construction principles and patterns. This is encoded knowledge, prescribing what we should or should not do, or describing how to do something. The bright side is that, in the modern world, we can trace back most principles and patterns to a set of well-defined forces and properties.

Finally, we have tools. Finite-Element Analysis, for instance, can be used to investigate the mechanical, thermal, electromagnetic, etc., properties of a large structure.

What do we know about software as a material?

Not much, I'm afraid. Our knowledge is basically organized into:

- Principles
- Patterns and Blueprints
- Methods
- Metrics
- Ilities

There is no shortage of principles. This interesting page, for instance, lists quite a few. Many are ill-defined and redundant, but a working knowledge of most principles is considered necessary for any good programmer / designer.

We have patterns - I know I don't have to say more about this. We also have reference architectures, which are not quite the same as patterns or pattern languages, but close enough.

We have methods. Test Driven Design, for instance, is a method, not a principle.

We have metrics. Metrics seem to be very close to properties. Halstead defined a concept of "volume" for software, based on information theory. The well-known Chidamber and Kemerer suite defined a number of property-like concepts such as Depth of Inheritance Tree, Number of Children, and so on. Metrics have the nice property of being easy to measure (most of the time). They have the dubious property of being very remotely connected with design reasoning. In fact, most literature on metrics is about proving they're not meaningless, mostly by showing some correlation with bug density or change density or something more connected with software development.
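
To show just how easy these metrics are to compute, here is a minimal Python sketch of Halstead volume, Depth of Inheritance Tree and Number of Children. The function names and the toy Shape hierarchy are made up for illustration, and the split of "a = b + c" into operators and operands is my own assumption; real tools make their own tokenization choices, and count the inheritance root in their own way.

```python
import math


def halstead_volume(operators, operands):
    """Halstead volume V = N * log2(n).

    N is the total number of operator and operand occurrences,
    n is the number of distinct operators plus distinct operands.
    """
    length = len(operators) + len(operands)                  # N
    vocabulary = len(set(operators)) + len(set(operands))    # n
    return length * math.log2(vocabulary) if vocabulary else 0.0


def depth_of_inheritance_tree(cls):
    """Longest path from cls up to the root; here `object` counts as depth 0."""
    if cls is object:
        return 0
    return 1 + max(depth_of_inheritance_tree(base) for base in cls.__bases__)


def number_of_children(cls):
    """Number of immediate subclasses of cls."""
    return len(cls.__subclasses__())


# Hypothetical hierarchy, only here to have something to measure.
class Shape: pass
class Polygon(Shape): pass
class Circle(Shape): pass
class Square(Polygon): pass


# "a = b + c", tokenized by hand (an assumption, see above).
volume = halstead_volume(operators=["=", "+"], operands=["a", "b", "c"])
dit = depth_of_inheritance_tree(Square)   # Square -> Polygon -> Shape -> object: 3
noc = number_of_children(Shape)           # Polygon and Circle: 2
```

A dozen lines, and we have numbers. What the numbers tell us about the design of Shape is a different question, and the code is silent about it.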

We have -ilities, like reliability, scalability, and so on. These are often ill-defined, hard-to-measure properties of the final product. More akin to defining a "safe car" than to defining the properties of an alloy.
Some authors have contributed more ility-like properties. Robert Martin, for instance, talks about Rigidity, Fragility, Viscosity and so on, but they are mostly based on metaphorical reasoning, not on a solid theory. They're also partially overlapping, and not precisely defined.

What don't we know about software as a material?

We don't have a theory of forces. Lacking forces, it's basically impossible to come up with meaningful properties. Compressibility can't be defined when you don't even know about compression (pressure).

We don't have true properties. Properties should extend from language design to library design to application design. Properties should encompass paradigms. Properties should be based on a sensible theory of what software design is about, not on the mere fact that we can measure something or come up with a nice formula. Properties must be perceived as useful by software designers.

We don't have a way to model forces and properties, basically because we don't know jack about them.

Some perspective
I live in an old building (and I like it :-). When you look at the walls, however, you can almost hear the architect thinking "every problem can be solved by making some wall thicker". Modern buildings are designed with completely different techniques. The next-generation green buildings will make the most out of our knowledge of construction materials.

I also live in the software world. You probably know the adage "All problems in computer science can be solved by another level of indirection" (David Wheeler). Why is that? What is, exactly, a level of indirection? What is indirection, by the way? Why is it useful? What is the underlying theory, what is a reasonable measure? What is the real difference between indirection in data and in control flow? Pick your favorite book on software design, and look up the answer if you can find it :-).
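
For the sake of discussion, here is one possible reading of the data / control flow distinction, as a small Python sketch. It is only an illustration, not a definition, and every name in it (settings, handlers, dispatch) is made up for the example.

```python
# Indirection in data: the caller holds a key, not the value.
# The value behind the key can change without touching the caller.
settings = {"greeting": "hello"}


def read_greeting():
    return settings["greeting"]   # one extra lookup between caller and data


# Indirection in control flow: the caller holds a name, not the code.
# Whatever function is bound to the name at call time is what runs.
handlers = {"greet": lambda name: f"hello, {name}"}


def dispatch(action, name):
    return handlers[action](name)   # one extra lookup between caller and code
```

In both cases a table lookup sits between the caller and the thing it wants, and in both cases we can rebind the entry later (say, settings["greeting"] = "ciao") without editing the caller. The code shows the mechanism; it doesn't tell us why that is useful, what it costs, or how to measure it, which is exactly the point of the questions above.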

So what?
Well, in the end, this is what I'm aiming for. A theory of software forces. A set of useful properties of software as a material. This is what all this stuff is about. I haven't worked out everything yet. Actually, I've changed my mind on several things I've written so far (hey, it's a blog, not a book :-). But it's slowly coming together.

Oh, by the way. I know quite a few people who won't feel good about the above. They want software to be an art. They want software to be "about humans".
Let me state this clearly: I'm not trying to pursue some sort of "deskilling", whereby any fool could put together great software by applying some sort of magic process. I don't believe in deskilling. Actually, I believe in upskilling. I also understand the idea of software development as a craft, and even as an art. I know the poetry of code, so to speak :-). I rely on intuition and tacit knowledge every single day.
Still, no amount of craftsmanship will prevent a cold, thin glass from breaking when force is applied. I want to know why, and how to shape it anyway. I want a better way to say that, and to teach that. I want to reach a deeper understanding upon which better, more ambitious systems can be built.

Too much? Most likely :-), but hey, what is life without a noble aspiration?