Wednesday, January 19, 2011

Can we really control the quality/cost trade-off?

An old rule in software project management (also known as the Project Triangle) is that you can have a system that:
- has low development cost (Cheap)
- has short development time (Fast)
- has good quality (Good)
except you can only choose two :-). That seems to suggest that if you want to go fast, and don't want to spend more, you can only compromise on quality. Actually, you can also compromise on feature set, but in many real-world cases, that happens only after cost, time and quality have already been compromised.

So, say that you want to compromise on quality. Ok, I know, most engineers don't want to compromise on quality; it's almost like compromising on ethics. Still, we actually do, more often than we'd like to, so let's face the monster. We can:

- Compromise on external quality, or quality as perceived by the user.
Ideally, this would be simply defined as MTBF (Mean Time Between Failures), although for many users, the perceived "quality" is more akin to the "user experience". In practice, it's quite hard to predict MTBF and say "ok, we won't do error checking here because that will save us X development days, and only decrease MTBF by Y". Most companies don't even measure MTBF. Those who do, use it just as a post-facto quality goal, that is, they fix bugs until they reach the target MTBF. We just don't know enough, in the software engineering community, to actually plan the MTBF of our product.

- Compromise on internal quality, or quality as perceived by developers.
Say goodbye to clarity of names, separation of concerns, little or no duplication of logic, extendibility in the right places, low coupling, etc. In some cases, poor internal quality may resurface as poor external quality (say, there is no error handling in place, so the code just crashes on unexpected input), but in many cases, the code can be rather crappy but appear to work fine.

- Compromise on process, or quality as perceived by the Software Engineering Institute Police :-).
What usually happens is that people start bypassing any activity that may seem to slow them down, like committing changes into a version control system :-) or being thorough about testing. Again, some issues may resurface as external quality problems (we don't test, so the user gets the thrill of discovering our bugs), but users won't notice if you cut corners in many process areas (well, they won't notice short-term, at least).

Of course, a "no compromises" mindset is dangerous anyway, as it may lead to an excess of ancillary activities not directly related to providing value (to the project and to users). So, ideally, we would choose to work at the best point in the quality/time and/or quality/cost continuum (the "best point" would depend on a number of factors: more on this later). Assuming, however, that there is indeed a continuum. Experience taught me the opposite: we can only move in discrete steps, and those are few and rather far apart.

Quality and Cost
Suppose I'm buying a large quantity of forks, wholesale. Choices are almost limitless, but overall the real choice is between:

- the plastic kind (throwaway)

it works, sort of, but you can't use it for all kinds of food, you can't use it while cooking, and so on.

- the "cheap steel" kind.

it's surely more resistant than plastic: you can actually use it. Still, the tines are irregular and feel uncomfortable in your mouth. Weight isn't properly balanced. And I wouldn't bet on that steel actually being stainless.

- the good stainless steel kind.

Properly shaped, properly balanced, good tasting :-), truly stainless, easy to clean up. You may or may not have decorations like that: price doesn't change so much (more on this below).

- the gold-plated kind.

it shows that you got money, or bad taste, or both :-). It's not really better than the good stainless fork, and must be handled more carefully. It's a real example of "worse is better".

Of course, there are other kinds of forks, like those with resin or wood handles (which ain't really cheaper than the equivalent steel kind, and much worse on maintenance), or the sturdy plastic kind for babies (which leverages parental anxiety to charge a premium). So, the quality/cost landscape does not get much richer than that. Which brings us to prices. I did a little research (not really exhaustive – it would take forever :-), but overall it seems like, normalizing on the price of the plastic kind, we have this price scale:

Plastic: 1
Cheap steel: 10
Stainless steel: 20
Gold-plated: 80

The range is huge (1-80). Still, we have a big jump from throwaway to cheap (10x), a relatively small jump from cheap to good (2x), and another significant jump between good and gold-plated (4x). I think software is not really different. Except that a 2x jump is usually considered huge :-).

Note that I'm focusing on building cost, which is the significant factor in market price for a commodity like forks. Once you consider operating costs, stainless steel is the obvious winner.

The Quality/Cost landscape for Software
Again, I'll consider only building costs here, and leave maintenance and operation costs aside for a moment.

If you build throwaway software, you can get really cheap. You don't care about error handling (which is a huge time sink). You copy&paste code all over. You proceed by trial and error, and sometimes patch code without understanding what is broken, until it seems to work. You keep no track of what you're doing. You use function names like foo and bar. Etc. After all, you just want to run that code once, and throw it away (well, that's the theory, anyway :-).
We may think that there is no place at all for that kind of code, but actually, end-user programming often falls into that case. I've also seen system administration scripts, a lot of scientific code, and quite a bit of legacy business code that would easily fall into this category. They were written as throwaways, or by someone who only had experience writing throwaways, and somehow they didn't get thrown away.

The next step is quite far from the first: if you want your code to be any good, you have to spend much more. It's like the plastic/"cheap steel" jump. Just put decent error handling in place, and you'll spend way more. Do a reasonable amount of systematic testing (automated or not) and you're gonna spend more. Aim for your code to be reasonably readable, and you'll have to think more, go back and refactor as you learn more, etc. At this stage, internal quality is still rather low; code is not really stainless. External quality is acceptable. It doesn't feel good, but it gets [some] work done. A lot of "industrial" code is actually at this level. Again, maintenance may not be cheap, but I'm only considering building costs here; a 5x to 10x jump compared to throwaway code is quite reasonable.

The next step is when you invest more in -ilities, as well as in external quality through more systematic and deeper testing. You're building something that you and your users like. It's truly stainless. It feels good. However, it's hard to be "just a little stainless". It's hard to be "somehow modular" or "sort of thread-safe". Either you're on one side, or you're on the other. If you want to be on the right side, you're gonna spend more. Note that just like with forks, long-term cost is lower for good, stainless code, but as far as building cost is concerned, a 2X increase compared to "cheap steel" code is not surprising.

Gold-plating is obvious: grandiose design that serves no purpose (and that would appear as gratuitous complexity to a good designer), coding standards that add no value (like requiring a comment on each and every parameter of every function), etc. I've seen systems spending a lot of energy trying to be as scalable as Amazon, even though they only had to serve a few hundred concurrent clients. Gold-plated systems cost more to build, and may also incur higher maintenance costs afterward (be careful not to scratch the gold :-). Of course, it takes experience to recognize gold-plating. To someone who has never designed a system for testability, interfaces for mocking may seem like a waste. To an experienced designer, they tell a different story.

But software is... soft!
As usual, a metaphor can be stretched only so far. Software quality can be easily altered over time. Plastic can't be transformed into gold (if you can, give me a call :-), but we can release a bug-ridden version and fix issues later. We may incur intentional technical debt and pay it back in the next release. This could be the right strategy in the right market at the right time, but that's not always the case.

Indeed, although start-ups, customer discovery, innovative products, etc. seem to get all the attention lately, if you happen to work in a mature market you may want to focus on quality early on. Some customers are known to be forgiving about quality issues in exchange for early access to innovative products; that's simply not the general case.

Modular Quality?
Theoretically, we could invest more in some "critical" components and less in others. After all, we don't need the same quality (external and internal) in every single portion. We can accept quirks here and there. Due to the fractal nature of software, this can be extended to the entire hierarchy of artifacts. It's surely possible. However, it takes a lot of maturity to develop on the edge of quality. Having to constantly reason about the quality/cost trade off can easily cost you more than you're saving: I'd rather spend that reasoning to create quality code.

Once again, however, we may have a plan to start with plastic and grow into stainless steel over time. We must be aware that it is easier said than done; I've seen it many times over my long career. Improving software quality is just another case of moving our software in the decision space. We decided to make software out of plastic early on, and now we want to move it to another place, say the cheap steel spot. Some work is required. Work is proportional to the distance (which is large – see above), but also to the opposing force. Whatever form that force takes, a significant factor is always the mass to be moved, and here lies the key. It's hard to move a monolithic mass of code into another place (in the decision space). It's easier if you can move it a module at a time.

Within a small modular unit, you can easily improve local quality. Change a switch/case into polymorphism. Add an interface to decouple two concrete classes. It's easy and it's cheap, because mass is small. Say, however, that you got your modular structure wrong: an important concept has not been identified, and is now scattered all around. You have to tweak a lot of code. While doing so, you'll find that you're not just fighting the mass you have to move: you have to fight the gravitational attraction of the surrounding code. That may reveal more scattered concepts. It's a damn lot of work, possibly more than it's worth. Software is not necessarily soft.

So, can we really move software around? Yes, in the small. Yes, if you got the modular structure right, and you're only cleaning up within that modular structure. Once again, the fractal nature of software can be our best friend or our worst enemy. Get the right partitioning between applications / services, and you only have to work inside. Get the right partitioning between components, and you only have to work inside. And so on. It's fine to have a plastic component, as long as the interface is right. Get the modular structure wrong, and sooner or later you'll need to make wide-range changes, or accept a "cheap steel" quality. If we got it wrong at a small scale, it's no big deal. At a large scale, it can be a massacre.

Conclusions
Once again, modularity is the key. Not just "any" modularity, but a modularity aligned with the underlying forcefield. If you get that right, you can build plastic components and evolve them to stainless steel quality over time. Been there, done that. Get your modular structure wrong, and you'll lack locality of action. In other words: if you plan on cutting corners inside a module, you have to invest at the interface level. That's simpler if you design top-down than if you're in the "emergent design" camp, because the module will be underengineered inside, and we can't expect a clean interface to emerge from underengineered code.

Sunday, January 02, 2011

Notes on Software Design, Chapter 13: On Change

We have a long history of manipulating physical materials. Cavemen built primitive tools out of rocks and sticks. I would speculate :-) that they didn't know much about physics, yet through observation, serendipity and good old trial and error they managed to survive pretty well, or we wouldn't be here today talking about software.

For centuries, we only had empirical observations and haphazard theories about the real nature of the physical world. The Greek classical elements were Earth, Water, Air, Fire, and Aether, which is not exactly the way we look at materials today. Yet the Greeks managed to build large and incredibly beautiful temples.

Of course, today we know quite a bit more. Take any kind of physical material, like a copper wire. In the real world, you can:

- apply thermal stress to the material, e.g. by warming it. Different materials react differently.

- apply mechanical stress to the material, e.g. by pulling it on both ends. Different materials react differently.

- apply electromagnetic forces, e.g. by moving the wire inside a magnetic field. Different materials react differently.

- etc.

To get just a little more into specifics, if we restrict ourselves to mechanical stress, and focus on so-called static loading, we end up with just a few kinds of stress: tension, compression, bending, torsion, and shear (see here and also here).

Although people have built large, beautiful structures without any formal knowledge of material science, it is hard to deny the huge progress made possible by a deeper understanding of what is actually going on with materials. Knowledge of forces and reactions allows us to choose the best material, the best shape, and even to create new, better materials.

The software world
Although we can build large, even beautiful systems, it is hard to deny that we are still in the dark ages. We have vague principles that usually can't be formally defined. We have different materials, like statically typed and dynamically typed languages, yet we don't have a sound theory of forces, so we end up with the usual evangelists trying to convert the world to their material of choice. It's kinda funny, because it's just as sensible as having a Brick Evangelist trying to convince you to use bricks instead of glass for your windows ("it's more robust!"), and a Glass Evangelist trying to convince you to build your entire house out of glass ("you'll get more light!").

Indeed, a key realization in my exploration of the nature of software was that we completely lack the notion of force. Sure, we use a bunch of terms like "flexibility", "heavy load", "fast", "fragile", etc., and they all seem to be somehow inspired by physics, but in the end it's just vernacular usage without any kind of theory behind it.

As I realized that, I began looking for a small, general yet useful set of "software forces". Nothing too fancy. Tension, compression, bending, torsion, and shear are not fancy, but are incredibly useful for physical materials (no, I wasn't looking for similar concepts in software, but for something just as useful).

Initially, I focused my attention on artifacts, because design is very much about form, and form is made visible through artifacts. I got many ideas and explored several avenues, but most seemed more like ad hoc concepts or turned out to be dead ends. Eventually, I realized we already knew the fundamental forces, though we never used the term "force" to characterize them.

Interestingly, those forces are better known in the run-time world, and are usually restricted to data. Upon closer scrutiny, it turned out they also apply to procedural knowledge, and to the artifact world as well. Also, it turned out that those forces were intimately connected with the concept of entanglement, and could actually shed some light on entanglement itself. More than that, they could be used to find entangled information, in practice, no more philosophy, thank you :-).

So, let's take the easy road and see what we can do with run-time knowledge expressed through data, and build from there.

It's almost too simple
Think about a simple data-oriented application, like a book library. What can you do with a Book record? It's rather obvious: create, read, update, delete. That's it. CRUD. (I'm still thinking about the need for an identity-preserving Move outside the database realm, but let's focus on CRUD right now). CRUD for run-time data is trivial, and is a well-known concept.

CRUD in the artifact world is also relatively simple and intuitive:

Create: we create a new artifact, be it a component, a class, or a function.

Read: this got me thinking for a while. In the end, my best shot is also the simplest: we (the contingent interpreter) read the function, as an act of understanding. More on the idea of contingent/essential interpreter in a rather whimsical :-) presentation by Richard Gabriel, which I already suggested a few years ago.

Update: we change an artifact; for instance, we change the source code of some function.

Delete: we remove an artifact; for instance, we remove a member function from a class.

Note that given the fractal nature of software, some operations are recursively identical (by deleting a component we delete all the sub-artifacts), while others change nature as we move into the fractal hierarchy (by creating a new class inside a component we're also updating the component). This is true in the run-time world of data as well.

Extending CRUD to run-time procedural knowledge is not necessarily trivial. Can we talk about CRUD for functions (as run-time, not compile-time, concepts)? We usually don't, but that's because the most common materials and tools (programming languages) we're using are largely based on the idea that procedural knowledge is created once (through artifacts), then compiled and executed.

However, interpreted languages have always had the possibility of creating, updating and deleting functions at run-time. Some compiled languages allow dynamic function creation (like C#, through Reflection.Emit) but not update/delete (there's a small sketch after the summary below). The ability to update a function through reflection would be an important step toward AOP-like possibilities, but most languages have no support for that. This is fine, though: liquids can't be compressed, but that didn't prevent us from defining the concept of compression. I'm looking for concepts that apply in both worlds (artifacts and run-time) but not necessarily to all materials (languages, etc.). So, to summarize:

- Create, Update, Delete retain the usual meaning also for the run-time concept of functions. It means we're literally creating, updating and deleting a function at run-time. The same applies to the (run-time) concept of class (not object). We can Create, Update, Delete classes at run-time as well (given the appropriate language/material).

- Read, in the artifact world, is the human act of reading, or better, is the contingent interpreter (us) giving meaning to (interpreting) the function. At run-time, the equivalent notion is execution, with the essential interpreter (the machine) giving meaning to the function. Therefore, reading a function in the run-time world simply means executing the function. Executing a function may also update some data, but this is fine: the function is read, data are updated; there is no contradiction.
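Just to make the run-time side tangible, here is a minimal C# sketch (nothing more than an illustration; the names are mine) using DynamicMethod from Reflection.Emit: we Create a function at run-time, then Read it in the run-time sense, that is, we execute it.

using System;
using System.Reflection.Emit;

static class RuntimeCrudSketch
{
    static void Main()
    {
        // Create: a brand new function, int Add(int a, int b), built at run-time.
        var add = new DynamicMethod("Add", typeof(int), new[] { typeof(int), typeof(int) });
        var il = add.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Ret);
        var f = (Func<int, int, int>)add.CreateDelegate(typeof(Func<int, int, int>));

        // Read, in the run-time sense: executing the function.
        Console.WriteLine(f(2, 3));   // prints 5

        // Delete: once unreferenced, a DynamicMethod becomes collectible;
        // Update, as noted above, is not really supported by this material.
    }
}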

Where do we go from here?
I'm trying to keep this post short, so I'll necessarily leave a lot of stuff for next time. However, I really want to anticipate something. In my previous post I said:

Two clusters of information are entangled when performing a change on one immediately requires a change on the other.

I can now state that better as:

Two clusters of information are entangled when a C/R/U/D on one immediately requires a C/R/U/D on the other.

Yes, the two definitions are not equivalent, as Read is not a Change. The second definition, however, is better, for reasons we'll explore in the next few posts. One reason is that it provides guidance on the kind of "change" you need to consider. Just restrict yourself to CRUD, and you'll be fine. As we'll see next, we can usually group CD and RU, and in fact I've defined two concepts representing those groupings.

Entanglement through CD and/or RU is not necessarily a bad thing. It means that the entangled clusters are attracting each other, and, as far as that specific force is concerned, rejecting others. In other words, they want to form a single cluster, possibly a composite cluster in the next hierarchical level (like two functions attracting each other and creating a class). Of course, it might also be the case that the two clusters should be merged into a single cluster.

We'll see more about different forms of CRUD-induced entanglement in the next few posts. We'll also see how many concepts (from references to database normalization) are actually just ways to deal with the many forms of entanglement, or to move entanglement from the artifact space to the run-time space. Right now, I'll use my remaining time to explain polymorphism through the entanglement perspective.

Polymorphism explained
Consider the usual idea of multiple shapes (circle, triangle, rectangle, etc.) that can be drawn on a screen. In the pre-object era (say, in ANSI C) we could have defined a ShapeType enum, a Shape struct with a ShapeType field and possibly a ShapeData field, where ShapeData would have probably been a union (with all fields for all different shapes). Draw would have been implemented as a switch/case over the ShapeType field.
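Here is a compressed sketch of that pre-object structure (rendered in C# rather than ANSI C, so the union degenerates into a plain bag of fields; the names are only illustrative, but the entanglement is the same):

using System;

enum ShapeType { Circle, Triangle, Rectangle }

struct ShapeData                       // in ANSI C this would be a union
{
    public double Radius;              // circle
    public double Base, Height;        // triangle
    public double Width, Length;       // rectangle
}

struct Shape
{
    public ShapeType Type;
    public ShapeData Data;
}

static class Drawing
{
    public static void Draw(Shape s)
    {
        switch (s.Type)                // every new ShapeType value lands here
        {
            case ShapeType.Circle:    Console.WriteLine("circle, r=" + s.Data.Radius); break;
            case ShapeType.Triangle:  Console.WriteLine("triangle, b=" + s.Data.Base + " h=" + s.Data.Height); break;
            case ShapeType.Rectangle: Console.WriteLine("rectangle, w=" + s.Data.Width + " l=" + s.Data.Length); break;
        }
    }
}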

We have U/U entanglement between ShapeType and ShapeData, and between ShapeType and Draw. U/U means that whenever we update ShapeType we need to update ShapeData as well. I add the Spiral type, so I need to add the relevant fields to the ShapeData union, and add logic to Draw. We also have U/U entanglement between ShapeData and Draw: if we change the way we store shape data, we have to change the Draw function, as it depends on those data. We could live with that, but it's not pretty.

What if we have another function that depends on ShapeType, like Area? Well, we need another switch-case, and Area will have the same kind of U/U entanglement with ShapeType and ShapeData as Draw. It is that entanglement that is bringing all those concepts together. ShapeType, ShapeData, Draw and Area want to form a cluster.

Note that without polymorphism (language-supported or simulated through function pointers) the code mass is organized into a potentially degenerative form: every function (Draw, Area, etc) is U/U-entangled with ShapeType, and is therefore attracting new procedural knowledge related to each and every ShapeType value. The forcefield will make those functions bigger and bigger, unless of course ShapeType is no longer extended.

Object-oriented polymorphism "works" because it changes the gravitational centers. Each concrete class (Circle, Triangle, etc) becomes a gravitational center for procedural and structural knowledge entangled with a single (ex)ShapeType value. In fact, ShapeType no longer needs to exist at all. There is still U/U-entanglement between the (ex)ShapeData portion related to Circle and the procedural knowledge in Circle.Draw and Circle.Surface (in a Java/C#ish syntax). But this is fine, because they have now been brought together into a new cluster (the Circle class) that then rejected the other artifacts (the rest of the ShapeData fields, the other functions). As I said in my previous post, entanglement between distant element is bad.

Note that we don't really change the mass of code. We just organize it differently, around different gravitational centers. If you draw functions as tall boxes in the switch-case form, you can see that classes are just "wide" boxes cutting through functions. The area of the boxes (the amount of code) does not change.

Object-oriented polymorphism in a statically-typed language also requires [interface] inheritance. This brings in a new form of U/U-entanglement, this time between the interface and each and every concrete class. Whenever we change the interface, we have to change all the concrete classes.

Interestingly, some of those functions might be entangled in the run-time world: they tend to be called together (R/R-entanglement in the run-time space). That would suggest yet another clustering (in the artifact space), like separating all the drawing operations from the geometrical operations in different interfaces.
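Purely as an illustration (the interface names are mine, not a prescription), splitting along those R/R clusters might look like this; code that only reads geometry is then no longer entangled with the drawing side:

using System.Collections.Generic;
using System.Linq;

interface IDrawable { void Draw(); }          // drawing operations, typically called together
interface IGeometry { double Surface(); }     // geometrical operations

static class Reports
{
    // Changing IDrawable (or anything on the drawing side) does not touch this code.
    public static double TotalSurface(IEnumerable<IGeometry> shapes)
    {
        return shapes.Sum(s => s.Surface());
    }
}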

Note that the polymorphic solution is not necessarily better: it's just a matter of probability. If the set of functions is unstable, plain polymorphism is not really a good answer. We may want to explore a more complex solution based on the Visitor pattern, or some other clustering of artifacts. Understanding entanglement for Visitor is indeed an interesting exercise.

Coming soon (I hope :-)
CD and RU-entanglement (with their real names :-), paving our way to the concept of isolation, or shielding, or whatever I'm gonna settle for. Oh guys, by the way: happy new year!

Friday, November 19, 2010

Notes on Software Design, Chapter 12: Entanglement

Mechanical devices require maintenance, some more than others. Maintenance is intended to keep them working, because mechanical parts tend to wear out. Maintenance is required so that they can still work "as new", that is, do the same thing they were built to do.

Software programs require maintenance, some more than others. Maintenance is intended to keep them useful, because software parts don't wear out, but they do become obsolete. They solve yesterday's problems, not today's problems. Or perhaps they didn't even solve yesterday's problems right (bugs). Maintenance is required so that software can do something different from what it was built to do.

Looking from a slightly different perspective, software is encoded knowledge, and the value of knowledge is constantly decaying (see the concept of half-life of knowledge). Maintenance is required to restore value. Energy (human work) must be supplied to keep encoded knowledge current, that is, valuable.

Either way, software developers spend a large amount of their time changing stuff.

Change
Ideally, changes would be purely additional. We would write a new software entity (e.g. a class) inside a new artifact (a source file). We would transform that artifact into something executable (if needed; interpreted languages don't need this step). We would deploy just the new executable artifact. The system would automagically discover the new artifact and start using it.

Although it's possible to create systems like that (I've done it many times through plug-in architectures), programs are not written this way from the ground up. Plug-ins, assuming they exist at all, usually appear at a relatively coarse granularity. Below that level, maintenance is rarely additional. Besides, additional maintenance is only possible when the change request is aligned with the underlying architecture.
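As a minimal sketch of what purely additional change looks like (the interface name and the "plugins" folder are obviously made up), the host below can discover and use new artifacts without changing a line of its own code:

using System;
using System.IO;
using System.Linq;
using System.Reflection;

public interface IPlugin
{
    void Run();
}

static class PluginHost
{
    static void Main()
    {
        // Deploying a new executable artifact = dropping a new DLL in the folder.
        foreach (var file in Directory.GetFiles("plugins", "*.dll"))
        {
            var assembly = Assembly.LoadFrom(file);
            var pluginTypes = assembly.GetTypes()
                .Where(t => typeof(IPlugin).IsAssignableFrom(t) && !t.IsAbstract && !t.IsInterface);
            foreach (var t in pluginTypes)
                ((IPlugin)Activator.CreateInstance(t)).Run();   // the system "automagically" starts using it
        }
    }
}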

So, the truth is, most often we can't simply add new knowledge to the system. We also have to change and remove existing parts. A single change request then results in a wave of changes, propagating through the system. Depending on some factors (that I'll discuss later on) the wave could be dampened pretty soon, or even amplified. In that case, we'll have to change more and more parts as the wave propagates through the system. If we change a modularized decision, the change wave is dampened at the module boundary.

The nature of change
Consider a system with two hardware nodes. The nodes are based on completely different hardware architectures: for instance, node 1 is a regular PC, node 2 is a DSP+ASIC embedded device. They are connected through standard transport protocols. Above transport, they don't share a single line of code, because they don't calculate the same function.
Yet, if you change the code deployed in node 1, you have to change some (different) code in node 2 as well, or the system won't work. Yikes! Aren't those two systems loosely coupled according to the traditional coupling theory? Sure. But still, changes need to happen on both sides.

The example above may seem paradoxical, or academic, but is not. Most likely, you own one of those devices. It's a decoder, like those for DVB-T, sat TV, or inside DivX players. The decoder is not computing the same function as the encoder (in a sense, it's computing the inverse function). The decoder may not share any code at all with the encoder. Still, they must be constantly aligned, or you won't see squat.

This is the real nature of change in software: you change something, and a number of things must be changed at the same time to preserve correctness. In a sense, change is affecting very distant, even physically disconnected components, and must be simultaneous: change X, and Y, Z, K must all change, at once.

At this point, we may think that all the physical analogies are breaking down, because this is not how the physical world behaves. Except it does :-). You just have to look at the right scale, and move from the familiar Newtonian physics to the slightly less comfortable (for me, at least) quantum physics.

Quantum Entanglement
Note: if you really, really, really hate physics, you can skip this part. Still, if you never came across the concept of Quantum Entanglement, you may find it interesting. Well, when I first heard of it, I thought it was amazing (though I didn't see the connection with software back then).

You probably know about the Heisenberg Uncertainty Principle. Briefly, it says that when you go down to particles like photons, you can't measure (e.g.) both position and wavelength at arbitrarily high precision. It's not a problem of measurement techniques. It's the nature of things.

Turns out, however, that we can create entangled particles. For instance, we can create two particles A and B that share the same exact wavelength. Therefore, we can try to circumvent the uncertainty principle by measuring particle A's wavelength and particle B's position. Now, and this is absolutely mind-blowing :-), it does not work. As soon as you try to measure particle B's position, particle A reacts by collapsing its wave function, immediately, even at an arbitrary distance. You cannot observe B without changing A.

Quoting wikipedia: Quantum entanglement [..] is a property of certain states of a quantum system containing two or more distinct objects, in which the information describing the objects is inextricably linked such that performing a measurement on one immediately alters properties of the other, even when separated at arbitrary distances.

Replace measurement with change, and that's exactly like software :-).

Software Entanglement
The parallel between entangled particles and entangled information is relatively simple. What is even more interesting, it works both in the artifact world and in the run-time world, at every level in the corresponding hierarchy (the run-time/artifact distinction is becoming more and more central, and was sorely missing in most previous works on related concepts). On the other hand, the concept of entanglement has far-reaching consequences, and is also a trampoline to new concepts. This post, therefore, is more like a broad introduction than an in-depth scrutiny.

Borrowing from quantum physics, we can define software entanglement as follows:

Two clusters of information are entangled when performing a change on one immediately requires a change on the other.

A simple example in the artifact space is renaming a function. All (by-name) callers must be changed, immediately. Callers are entangled with the callee. A simple example in the run-time space is caching. Caching requires cache coherence mechanisms, exactly because of entanglement.

Note 1: I said "by name", because callers by reference are not affected. This is strongly related with the idea of dampening, and we'll explore it in a future post.

Note 2: for a while, I've been leaning toward using "tangling" instead of "entanglement", as it is a more familiar word. So, in some previous posts, you'll find mentions of "tangling". In the end, I decided to go with "entanglement" because "tangling" has already been used (with a different meaning) in AOP literature, and also because entanglement, although perhaps less familiar, is simply more precise.

Not your grandpa's coupling
Coupling is a time-honored concept, born in the 70s and survived to this day. Originally, coupling was mostly concerned with data. Content coupling took place when a module was tweaking another module's internal data; common coupling was about sharing a global variable; etc. Some forms of coupling were considered stronger than others.

Most of those concepts can be applied to OO software as well, although most metrics for OO coupling take a more simplified approach and mostly consider dependencies as coupling. Still, some forms of dependency are considered stronger than others. Inheritance is considered stronger than composition; dependency on a concrete class is considered stronger than dependency on an interface; etc. Therefore, the mere presence of an interface between two classes is assumed to reduce coupling. If class A doesn't talk to class B, and doesn't even know about class B's existence, they are considered uncoupled.

Of course, lack of coupling in the traditional sense does not imply lack of entanglement. This is why so many attempts to decouple systems through layers are so ineffective. As I said many times, all those layers in business systems can't usually dampen the wave of changes coming from the (seemingly) innocent need to add a field to a database table. In every layer, we usually have some information node that is tangled with that table. Data access; business logic; user interface; they all need to change, instantly, so that this new field will be put to use. That's entanglement, and it's not going away by layering.

So, isn't entanglement just coupling? No, not really. Many systems that would be defined as "loosely coupled" are indeed "heavily entangled" (the example above with the encoder/decoder is the poster child of a loosely coupled / heavily entangled system).

To give honor to whom honor is due, the closest thing I've encountered in my research is the concept of connascence, introduced by Meilir Page-Jones in a little-known book from the mid-90s ("What Every Programmer Should Know About Object-Oriented Design"). Curiously enough, I came to know connascence after I conceived entanglement, while looking for previous literature on the subject. Still, there are some notable differences between connascence and entanglement, which I'll explore in forthcoming posts (for instance, Page-Jones didn't consider the RT/artifact separation).

Implications
I'll explore the implications of entanglement in future posts, where I'll also try to delve deeper into the nature of change, as we can't fully understand entanglement until we fully understand change.

Right now, I'd like to highlight something obvious :-), that is, having distant yet entangled information is dangerous, and having too much entangled information is either maintenance hell or performance hell (artifact vs run-time).

Indeed, it's easy to classify entanglement as an attractive force. This is an oversimplification, however, so I'll leave the real meat for my next posts.

Beyond Good and Evil
There is a strong urge inside the human being to classify things as "good" and "bad". Within each category, we further classify things by degree of goodness or badness. Unsurprisingly, we bring this habit into computer science.

Coupling has long been sub-classified by "strength", with content coupling being stronger than stamp coupling, in turn stronger than data coupling. Even connascence has been classified by strength, with e.g. connascence of position being stronger than connascence of name. The general consensus is that we should aim for the weakest possible form of coupling / connascence.

I'm going to say something huge now, so be prepared :-). This is all wrong. When two information nodes are entangled, changes will propagate, period. The safest entanglement is the one where the triggering change has the minimum likelihood of actually occurring. Of course, that must be weighted with the number of entangled nodes. It's very similar to risk exposure, that is, probability of occurrence times size of the loss (believe it or not, I couldn't find a decent link explaining the concept to the uninitiated :-).

There is more. We can dampen the effects of some forms of entanglement. That requires human work, that is, energy. It's usually upfront work (sorry guys :-), although that's not always the case. Therefore, even if I came up with some classification of good/bad entanglement, it would be rather pointless, because dampening is totally context-dependent.

I'll explore dampening in future posts, of course. But just to avoid excessive vagueness, think about polymorphism: it's about dampening the effect of a change (which change? Which other similar change is not dampened at all by polymorphism?). Think about the concept of reference. It's about dampening the effect of a change (in the run-time space). Etc.

Your system is not really aligned with the underlying forcefield unless you have strategies in place to dampen the effect of entanglement with high risk exposure. Conversely, a deep understanding of entanglement may highlight different solutions, where entangled information is kept together, as to minimize the cost of change. Examples will follow.

What's next?
Before we can talk about dampening, we really need to understand the nature of change better. As we do that, we'll learn more about entanglement as well, and how many programming and design concepts have been devised to deal with entanglement, while some concepts are still missing (but theoretically possible). It's a rather long trip, but the sky is clear :-).

Monday, November 01, 2010

Design, Structure, and Decisions

Joe is a junior programmer in the IT department of SomeLargeCompany, Inc. His company is not in the business of software; however, they have a lot of custom programs that are constantly developed, adapted, tweaked to follow the ever-changing landscape of products, sales, regulations, etc.

Joe is largely self-taught. He doesn't know much about software engineering, but in the end, he gets things done.

Right now, Joe is writing some code dealing with money. He's using lots of floating point variables ("double" in his language) to store money.
Joe does not know about the inherent rounding error of decimal-to-binary conversion (and back). He doesn't know about a decimal floating point type. He doesn't know, and didn't look for, a Money class that could also store the currency. He's just coding his way out of his problem. He didn't choose to use a double after pondering on the alternatives, or based on previous experiences. It wasn't a choice at all. He just didn't know any better.

Joe is actually in the middle of a rather long function, doing some database lookup, some calculations, some reporting. He learned, the hard way, that is better to check for errors. He will handle any error in the simplest possible way: by opening an error box in the middle of the function, where the error can be diagnosed.
Joe doesn't know about design principles. He doesn't know that it's usually better to keep business logic decoupled from the user interface. He didn't choose to put some GUI code inside the business logic after thinking about alternatives. He just did it. He didn't know any better.

That function is so long because at some point Joe realized he had to handle several different cases. He did that the usual way: with a switch/case. He never really "got" objects and couldn't even think about using polymorphism there. He uses classes, somehow, but not at that fine granularity. So, again, he didn't choose the switch/case over something else. He just didn't know any better.

Structure vs. Design
Joe is a good guy, really; and his code kinda works, most of the time. But this story is not about Joe. It's about code, structure, design, and decisions: because in the end, Joe will write a serious amount of code. Code has an inner structure: in this case, structure will reveal heavy reliance on primitive types (double), will reveal that business logic is tangled with database access and with user interface concerns, and will also reveal that no extension mechanism is in place. Code never lies.

Now, the modern (dare I say post-agile?) way of thinking is that in the end, the code is the design. That is, if you want to know the real design, you have to look at the code. You'll find a lot of literature (not to mention blog posts), some from well-known authors, promoting or just holding on to this idea. At some point, just about everybody gave in and accepted this as a truism. As you might guess, I didn't.

A process or a thing?
Software development is a young discipline, yet extremely dynamic. It's hardly surprising to find confusion and disagreement even on the fundamentals. That includes, of course, having several hundred different definitions of "software design". For some, it's a process. For some, it's the result of that process. We then have endless debates, like "is coding a form of design?", which often confuse "design" with "modeling" and waste a lot of ink over nothing.

It doesn't have to be that hard. Let's start with the high-level question: design as a process vs. design as a thing. It's a rather simple choice; we may even benefit from the digital version of some old, dusty etymology dictionary, where design is commonly defined as "mark out, devise, choose, designate, appoint". Design is an act, and therefore a process. Please don't read "process" as "a series of mechanical acts"; just consider a process as something you carry out over a period of time.
More exactly, design is a decision process. We choose between alternatives, balancing forces as we go. We make decisions through a reflective conversation with our materials, taking place all the time, from early requirements to bug fixing.

Design, however, is not the code. It is not the diagram. It is not an artifact. If we really want to transform design into "a thing", then it is best defined as the set of choices that brought us to the artifact; assuming, of course, that we actually made any choice at all.

Side note: not every decision is a design decision. Design is concerned with form, that is, with the shape of our material. Including or excluding a feature from a product, for instance, is not a design decision; it is best characterized as a marketing decision. In practice, a product often results from the interplay of marketing and design decisions.

Design as a decision process
Back in 2006 I suggested that we shouldn't use terms like "accidental architecture" because architecture (which is just another form of design) is always intentional: it's a set of choices we make. I suggested using "structure" instead.

Artifacts always have structure. Every artifact we use in software development is made of things (depending on our paradigm) that are somehow related (again, depending on our paradigm). That's the structure. Look at a diagram, and you'll see structure. Look at the code, and you'll see structure. Look at the executable, and if you can make sense of it, you'll see structure.

There is a fundamental difference between structure and design. Structure can be the result of an intentional process (design) or the result of an accidental process (coding your way out of something). Design is always intentional; if it's not intentional, it's not design.

So, if you want to know the real structure (and yes, usually you want to), you have to look at the code. You may also look at diagrams (I would). But diagrams will show an abstract, idealized, and yes, possibly obsolete high-level structure. The "real" structure is the code structure. It would be naive, however, to confuse structure with design.

Consider Joe's code. We can see some structure in that code. That structure, however, is not design. There is no rationale behind that structure, except "I didn't know any better". Joe did not design his code. He just wrote it. He didn't make a single decision. He did the only thing he knew. That's not design. Sorry.

What is design, again?
Once we agree that design is a decision process, we can see different design approaches for what they are: different ways to come to a decision.

The upfront school claims that some decisions can be made before writing code, perhaps by drawing models. Taken to the extreme (which is always naive), it would require that we make all design decisions before writing code.

The emergent design school claims that system-level decisions should emerge from the self-organization of lower-level decisions, mostly taken by writing code according to "good practices" like Don't Repeat Yourself. Taken to the extreme (which is always naive), it would require that we make no system-level choices at all, and wait for code to jell into a well-formed structure.

The pattern school claims that we can recognize forces and adopt pre-packaged decisions, embodied into a pattern.

Systematic techniques like my good old SysOOD claim that we can systematically recognize some problems and choose from a set of sound transformations.

The MDA school claims that we can organize decisions along two axes: decisions that we store in models, and decisions that we store in transformation tools. This would be a long story in itself.

Etc.

If you have a favorite design approach (say, TDD or Design by Contract) you may consider spending a little time pondering on which kind of decisions are better supported by that approach.

There is no choice
If there is no choice to be made, or if you can't see that there is a choice to be made, you are not doing design. This is so big that I'll have to say it again. If there is no choice to be made, you're not doing design. You might be drawing some fancy diagram, but it's not design anyway. I can draw a detailed diagram of the I/O virtualization layer for an embedded device in 10 minutes. I've done that many times. I'm not choosing anything, just replicating something. That's not design. It's "just" modeling.

On the other end of the spectrum, I've been writing quite a bit of code lately, based on various Facebook APIs (the graph api, the javascript sdk, fbml, whatever). Like many other APIs, things never really work as documented (or lacking documentation, as it would be reasonable to assume). Here and there, in my code, I had to do things not the way I wanted, but the only way that actually worked. In light of the above, I can honestly say that I did not design those portions. I often couldn't make a choice, although I wish I could (in this sense, it would be terrible to have someone look at that code and think that it represents "my design").

Of course, that's not my first web application ever. Over time, I've written many, and I've created a lot of small reusable components. I know them well, I trust them, I have the source code, and I'm familiar with it. When I see a problem that I can easily solve by reusing one, I tend to do it on the spot.
Now, this is a borderline case, as I'm not really making a choice. The component is just there, crying to be reused. I'm just going with the flow. It's perhaps 5% design (recognizing the component as a good candidate and choosing to use it) and 95% habits.
Reusing standard library classes is probably 1% design, 99% habits. We used to write our own containers and even string classes back in the early 90s. Today, I must have a very compelling case to do so (the 1% design is recognizing I don't have a very compelling case :-).

Trying to generalize a little, we may like to think that we're making decisions all the time, but this is far from true. In many cases, we don't, because:

- Like Joe, we don't know any better.

- Like above, we're working with an unstable, poorly documented, rather crappy third party library. It's basically trial and error, with a little clean-up in the end (so, let's say 2-5% design). The library is making the choices (95-98%).

- We've done this several times in the past, and we're just repeating ourselves again. It's a habit. Perhaps we should question our habit, but we don't. Perhaps we should see that we're writing the same code over and over and design a reusable component, but we don't.

- We (the company, the team, the community) have a standard way to do this.

- The library/framework/language we're using has already made that choice.

- We took a high-level decision before, and the rest follows naturally.

After pondering on this for a while, I came to the conclusion that in the past I've been mixing two aspects of the decision making process. The need to make a decision, and the freedom to make a decision. You can plot this as a quadrant, if you want.

Sometimes, there is just no need to make a choice. This can actually be a good thing. It could be a highly productive state where you just have to write down your logic. There might be short "design moments" where we make low-scale decisions (the fractal nature of software makes the decision process fractal as well), but overall we just go with the flow. As usual, the issue is context. Not making a decision because there is a natural solution is quite different from not making a decision because we can't even see the possibilities, or the need.

Sometimes, we need to make a choice. At that point, we need the freedom to do so. If the code is so brittle that you have to find your way by trial and error, if company standards are overly restrictive, if our language is too limited, if the architecture is too constraining, we lack freedom. That leads to sub-optimal local and global choices. Here is the challenge of architecture: make the right decisions, so that you offer a reasonable structure for growth, yet leave enough freedom for local choices, and a feedback loop from local to global.

Design, as a decision process, is better experienced in bursts. You make a set of choices, and then you let the natural consequences unravel themselves. You may unravel some consequences using models, if you like it. In some cases, it's very effective. Some consequences are best unraveled by writing code. Be smart, not religious.

Be aware when you have to make too many choices, too often. It's fine in the beginning, as the decision space is huge. But as you progress, the need for new choices should naturally decrease, in frequency and in scale. Think of coding as a conversation with your artifacts. Just like any conversation, it can turn into an argument and then into a fight. Your code just does not want to be shaped that way; it was never meant to be shaped that way; it is resisting your change. You change something here, just to find out that you really need to patch that other part, and so on. You have to think, think, think. Lot of effort, little progress. On a deeper level, you're fighting the forcefield. The design is not aligned with the forcefield. There is friction (in the decision space!) and friction energy is wasted energy.

Finally, if you draw a quadrant, you'll see an interesting spot where there is no need to make choices, but there is freedom to. I like that spot: it's where I know how to do it, because I've done it before, but it's not the only way to do it, and I might try something new this time. Some people don't take the opportunity, and always take the safe, known way. Depending on the project, I don't mind taking a risk if I see the potential for a sizeable reward.

What about good design?
Back in 1972, David Parnas wrote one of the most influential papers ever ("On the criteria to be used in decomposing systems into modules"), where two alternative modular structures were proposed for the same system. The first used the familiar functional decomposition. The second was based on the concept of Information Hiding. Later on, that paper has been heavily quoted in the OOP literature as an inspiration for encapsulation. Indeed, Parnas suggests hiding data structure details inside modules, but that was just an example. The real message was that modules should encapsulate decisions: "every module [...] is characterized by its knowledge of a design decision which it hides from all others"; and again: "we propose instead that one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others".

Think about it, and it makes a lot of sense. Design is a decision-making process. Decisions may turn out to be wrong, or may become obsolete as business and technology change. Some decisions are also meant to be temporary from the very beginning. In all cases, we would like to modularize those decisions, so that we can change them without global impact.
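A small, hedged illustration in C# (the names and the scenario are mine): the decision "settings are stored in a flat key=value file" is hidden behind a module boundary, so changing it later (database, registry, whatever) does not ripple through the callers.

using System.Collections.Generic;
using System.IO;
using System.Linq;

public interface ISettingsStore          // the module boundary: callers only see this
{
    string Get(string key);
    void Set(string key, string value);
}

// The hidden decision: today, settings live in a plain key=value text file.
// Swapping this for a database-backed implementation touches no caller.
public class FileSettingsStore : ISettingsStore
{
    private readonly Dictionary<string, string> cache = new Dictionary<string, string>();
    private readonly string path;

    public FileSettingsStore(string path)
    {
        this.path = path;
        if (File.Exists(path))
            foreach (var parts in File.ReadAllLines(path).Select(line => line.Split('=')))
                if (parts.Length == 2) cache[parts[0]] = parts[1];
    }

    public string Get(string key)
    {
        string value;
        return cache.TryGetValue(key, out value) ? value : null;
    }

    public void Set(string key, string value)
    {
        cache[key] = value;
        File.WriteAllLines(path, cache.Select(kv => kv.Key + "=" + kv.Value));
    }
}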

Unfortunately, this is not how most people design software. Most decisions are not modularized at all. Worse yet, most decisions are not made explicit, either in code, diagrams, sketchy notes, whatever. And worst of all, a lot of code is written without taking any decision at all, just like good ol' Joe does. "Seat of your pants" is not a design strategy.

Note: given current programming technology and tools, modularizing decisions is easier said than done. For instance, I decided to use AJAX in the web application I mentioned above. I would love to modularize that decision away. But it would be damn hard. Too hard to be worth it. So I let this decision influence the overall structure. There are many cases like this. Quite often, during a joint design session, I would say something like - guys, here we're making a pervasive, non-linear choice. This is like a magic door. Once we choose, we enter into a different world.

So, we're doing a good design job when we make modular decisions, which we can change later without global impact. I think this may shed a different light on delaying decisions (design decisions, not (e.g.) marketing decisions).

The folklore is that if you wait, you can make a more informed decision. Of course, if you wait too much, you reach a state of paralysis where you can't progress anymore, and you may also miss some opportunity along the road.

From the decision modularization perspective, things are slightly more precise:

- If, given our current knowledge, the decision can't be modularized, it's better to wait, because we might find a way to make it modular, or get a better picture of our problem, and hopefully make the "right" nonmodular decision. Note that when a decision is not modular, undoing/changing it will incur a substantial cost: the mass to move in the decision space will be high, and that's our measure of work.

- If the decision can be modularized, but you think that given more knowledge you could modularize it better, it makes sense to wait. Otherwise, there is little or no value in waiting. Note that postponing a decision and postponing implementation are two different things (unless your only way to make a decision is by writing code). There is a strong connection with my concept of invariant decision, but I'll leave it up to you to dig around.

As an aside, I'm under the impression that most people expect to just "learn the right answer" by waiting. And by "right" they mean that there is no need to modularize the decision anymore (because it's right :-). This is quite simplistic, as there is no guarantee whatsoever that a delayed decision will be particularly stable.
So, again, the value of delaying should be in learning new ways to modularize away the decision. Learning more about the risk/opportunity trade-off of not modularizing away the decision is second best.

Conclusions
In a sense, software development is all about decisions. Yet, the concept is completely absent from programming and modeling languages, and not particularly explicit in most design methods either.
In the Physics of Software I'm trying to give Decisions a central role. Decisions are gateways to different forcefields. Conversely, we shape the forcefield by making decisions. Decisions live in the Decision Space, and software evolution is basically a process of moving our software to another position in the Decision Space. Decisions are also a key ingredient to link software design and software economics. I have many interesting ideas here, mostly based on real option theories, but that's another story.

Next time: tangling.

Monday, October 04, 2010

Notes on Software Design, Chapter 11: Friction in the Artifacts world

When I first "got" the concept of run-time friction, I thought it made sense only in the run-time world. I was thinking of friction as "everything that gets in the way as we process the Function", and since there is no Function in the artifact world, there can be no friction as well. That disturbed me a little, because every other concept was present in both worlds.

Later on, I realized I could extend that notion to the artifact side in two distinct ways. One didn't survive scrutiny. It was too informal, although somehow I'd like to bring back some of the underlying reasoning, probably as part of a different property. The other proved more solid. Since a few people asked me (in real life) how I get these ideas, I think it might be interesting to tell the story behind the concept; after all, a blog ought to be a... log :-). The story is not really linear, but then, very few things in life are linear.

Step 1
It was an early morning back in July. I was running. At some point (within the first few kilometers) I had a flash that perhaps the notion of mass was not a primitive concept. Perhaps there was a concept of volume (and LOC would give volume, not mass) and a concept of density (after all, lines can be quite different). I spent a few minutes thinking about density (the simplest idea being that perhaps something like cyclomatic complexity could explain density), then thinking that I didn't like volume because it reminded me of Halstead's Software Science, and I didn't want my work to be so disconnected from practice. Then I started to think about a concept of surface; maybe there was an ideal volume / surface ratio too? Then the zen effect of running took over and I blissfully stopped thinking :-)). [most of those ideas were good, and at some point I'll have to reconsider a few things].

Step 2
Days later, I was thinking about giving a name to this stuff I'm writing. I came up with a few ideas, and also ran a trademark / domain name search, because you never know. Looking for "Physics of Software", I found a remotely related entry in the C2 wiki: Physical Cues In Software Development.
Now, that stuff seems more concerned with the geometry/topology of code than with the physics of software, but while reading that page I was slightly tantalized by this sentence: "Too dense to refactor easily". Interesting. I did some literature research on density vs. refactoring, but nothing substantial came up.

Step 3
Days later again. I was writing some code (yeah, I write real world code too :-). I tend to write short methods: I practice what I preach. Still, I was in the middle of a rather long function (by my standards, that is, about 100 lines). I was looking at it, trying to understand how it got to be that big. I could see the gravitational effect of having used a massive third party component; that was consistent with my current understanding of the scale-free nature of software (more on this another time).
I could also see a few small, simple improvements, but in the end it was not trivial to refactor that method into a few shorter functions. Sure, it could be done, just not by selecting a few lines and choosing "extract method" from the refactoring menu. I had to create new classes, shuffle responsibilities around. Overall, it was a large effort (given the relatively small mass). I contemplated the idea of leaving that function alone :-). Too dense to refactor easily? Hmm.

Step 4
Perhaps a couple of weeks later. This thing kept bouncing in my head. I had always assumed I could refactor every single method into a set of smaller methods, perhaps introducing new classes. Sure, run-time friction could grow as a result, but that's just something to be balanced. But was that actually true? Could I write a function that was basically impossible to refactor, that is, where extracting a few lines required either a huge effort or, even better, where moving N symbols outside basically requires adding N symbols inside (to call the extracted method)? Having too much to do, I left the question unanswered.

Step 5
Just a few days later. Late evening, but unwilling to call it a day :-), I sat down trying to write a Gordian function :-), a simple sequence of lines that couldn't easily be refactored. This is what I ended up with:

void f( int a, int b )
{
    int c = a + b;
    int d = a + c;
    int e = b + c;
    int f = a + d;
    int g = b + d;
    int h = c + d;
    int i = c + e;
    int j = b + e;
    int k = a + e;
    // ... use f, g, h, i, j, k as above
}

It may be easier to visualize the pattern through a graphical view:



The idea is pretty simple: at every level, I'm using nearby concepts and distant concepts; I'm also creating nodes for subsequent use in lower levels. Now, this function is trivial. Cyclomatic complexity is just 1. Yet it is hard to refactor. It is hard to move things (lines, symbols, concepts) around. So I thought perhaps this was "density".

Step 6
Out of nowhere, a few days later I realized that density was not the right name. When you have trouble moving things around, we call it viscosity, not density. That triggered an internal alert: viscosity has already been used in some computing literature, and I hate to redefine existing terms, so let's check the literature again.

Step 7
To my knowledge, "viscosity" has been used in:
- The Cognitive Dimensions of Notations literature, where it is defined as "the difficulty of making small changes to the information structure" or "resistance to change", both of which are rather similar to what I'm thinking. Note that although the papers above are about the notation, not the code, I discussed extending those concepts from tools to materials almost three years ago.
- The well-known "Design Principles and Design Patterns" paper by Robert Martin, where it is defined as "When the design preserving methods are harder to employ than the hacks, then the viscosity of the design is high". This is completely unrelated, and honestly I'm not sure that "viscosity" is the right term. Indeed, Martin is using several physics-lookalike properties (immobility, fragility, rigidity), but it seems like they've been adopted on the basis of some vernacular usage of the terms, not on the basis of a strong correlation between the software world and the physical world (there is certainly no notion of "preserving a design vs. hacks" in viscosity as defined in physics).

In the end, I considered viscosity as a good choice for the difficulty of moving knowledge around in the artifact world.

Step 8
Guess what. Viscosity is basically friction (in fluids). Ok, I got it :-).
Just like run-time friction kicks in when you try to move run-time knowledge around, artifact-side friction kicks in when you try to move artifact-side knowledge around. Code is one way of representing artifact-side knowledge. Diagrams are another way. They both manifest some resistance to change.
Once you get the concept (almost) right, it's time to clean things up, come up with a more precise definition, see if it's useful and what you can learn from it.

Defining viscosity
Why is the artifact in step 5 viscous, that is, what makes it difficult to move knowledge around? The main issue is that we can't find sub-centers, because every line has local interactions (with nearby knowledge) and non-local interactions (with distant knowledge). So although every single line uses just a few symbols, you can't simply find a subset of lines that is relatively isolated from the rest. Every line is also very simple on its own, and not worth moving outside by itself.

So, I could define a viscous artifact A like this:

A is viscous when given any subset B of A, the mass of knowledge inside B is not significantly higher than the amount of knowledge exchanged between B and A-B

I crafted this definition rather carefully :-). In fact, the problem with the function above is not just that every line depends on nearby and distant knowledge. It's also that every line is doing very little. Otherwise, we could move a significant portion of code (doing a "lot" of stuff) outside the function, of course by passing parameters. But then, the mass of knowledge inside the subset B would be higher than the mass of knowledge exchanged (the parameters). That is not the case in function f.
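Just to make this tangible, here is what a mechanical extraction of the second half of f might look like (a sketch, obviously): the helper moves six trivial assignments out, but needs five inputs and six outputs to do so, so the knowledge exchanged is comparable to the knowledge moved.
// A mechanical extraction from f: what goes in and out (a, b, c, d, e in;
// f, g, h, i, j, k out) weighs about as much as what the helper contains.
void lowerHalf( int a, int b, int c, int d, int e,
                int* f, int* g, int* h, int* i, int* j, int* k )
{
    *f = a + d;
    *g = b + d;
    *h = c + d;
    *i = c + e;
    *j = b + e;
    *k = a + e;
}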

Viscous artifacts resist shear/tensile stress, that is, they resist extraction of knowledge. As we try to move the knowledge in B outside A, the knowledge exchanged with A-B is opposing the movement. The effort we have to spend to reorganize viscous artifacts is the equivalent of friction energy in the run-time world. Only, this time, it's human work, not CPU work.

Note: I could come up with some kind of formula for a viscosity coefficient. I did not because I feel I'm not yet at that stage. Still, the minimum ratio between exchanged and internal knowledge (quantified over the subsets B, not over A) seems like a good candidate.

It is important to understand that viscosity is an internal property of an artifact. It has nothing to do with the artifact interface. It's about the artifact internals. Of course, given the hierarchy of artifacts, an artifact may have low [internal] viscosity, yet be part of a viscous higher-level artifact. For instance, a low-viscosity function can be part of a high-viscosity class. In that case, it would be easy to move some portions of code outside the function, but not outside the class.

Consequences
Once again, we have to resist the temptation to define some property as "bad". In the physical world, viscosity is not "bad". It can be useful, or it can be a problem: it's a matter of context.

Besides, when you look at the definition above, you may see some relationship with a vague notion of cohesion: a viscous artifact is "more cohesive". Cohesion is usually considered a good property. What's wrong here?

My current understanding is that it's fine to be viscous when the mass is small. Actually, it's good to be viscous when the mass is small. A small viscous chunk of knowledge is a good center: it stands on its own, and can't be easily broken. However, viscosity should decrease as mass increases. That gives an opportunity for extraction of knowledge, thereby forming new centers. Note that by saying so, I'm implicitly accepting the existence of high-mass artifacts. Still:

- high-mass, low-viscosity artifacts can be considered rather innocent underengineering, a form of technical debt that can be easily repaid (see also the latest comments to Chapter 0). The high gravity of the artifact will bring in more stuff: that would be the ideal time to refactor the code, which will be easy, since viscosity is low.

- high-mass, high-viscosity artifacts are a serious design weakness that should be dealt with as soon as you notice, possibly while you're still creating the artifact. Gravity will bring in more stuff, and things can only get worse.

Note: a trivial refactoring of a viscous artifact will significantly increase run-time friction, as we'll have to pass a lot of parameters around. In most cases, we'll have to rethink the artifact and perhaps a significant portion of the surroundings, as we may have chosen the wrong centers.

Conclusions
It's sort of revealing that I started with a notion of density but ended up with a notion of viscosity, and had to reject an existing definition of viscosity in the process. Labeling software phenomena after physical phenomena is simple. Doing so in a meaningful way is not so trivial.

I think that keeping the duality run-time / artifact in mind is helping me a lot in this journey. Things are much easier once you can clearly see that you're dealing with two different worlds. A recent post by Rico Mariani, for instance, raises an interesting point, which is trivially explained within my frame of reasoning, but seems unnecessarily obscure when you simply talk about "coupling" and ignore the artifact/run-time duality.

As I progress in cleaning up some ideas on the decision space, I hope to bring in even more clarity. Which reminds me that I have something really important to say about design and decisions. It's short, and I'll let it preempt :-) the next chapter on tangling.

Sunday, September 12, 2010

Notes on Software Design, Chapter 10: Run-Time Friction

So, here is the story. I keep a lot of notes. Some are text files with relevant links and organized ideas. Most are rather embarrassing scribbles on just about any piece of paper that is lying around when I need it. From time to time, I move some notes from paper to files, discard concepts that didn't prove themselves, and rearrange paragraphs to fake some kind of logical, sequential reasoning over a process that was, in fact, rather chaotic. Not surprisingly (since software is just another way to encode knowledge), David Parnas suggested long ago that we could do the same while documenting software design (see "A rational design process: How and why to fake it").
Well, it's not always easy. Sometimes, I try to approach the storytelling from an angle, see that it doesn't work out so well, and look for another. Sometimes I succeed, sometimes I don't (although, of course, the reader is the ultimate judge). This time, I have to confess, I feel like I couldn't find the right angle, the right way to start, to unfold a concept in a way that makes it look simple and natural. So I'll trust you to be smart enough to make sense of what follows :-). It's a very long post, and you may want to digest it in more than one session.

The physical world
I guess you have all had to push some furniture around at one time or another. You have probably felt a stronger resistance in the beginning, followed by a milder form of resistance as soon as you got some movement.
The mild resistance is due to kinetic friction, while the initial, stronger resistance is usually due to static friction, which you have to overcome before the object starts moving (if you're not familiar with kinetic and static friction, wikipedia will tell you more than you want to know :-).

As you move your stuff around, friction makes you waste some energy, in a way that is basically proportional to the normal force, the distance, and the coefficient of kinetic friction (see the page above for the actual equation). I'll get back to this later, but if you move a constant mass on a flat surface, the energy you waste is proportional to the mass you move, the distance you go, and the magic coefficient of friction.

The beauty of all this is that it's simple and rather unambiguous. Friction is always present in mechanical engineering, but it's a well understood concept (as far as engineering is concerned; it's still blurry at the quantum level, at least for the uninitiated like myself), and there is usually no wishy-washy talking about friction. It's not a broad concept, that is, you won't be able to design the next-generation jet engine if all you have in your conceptual toolbox is friction, yet you won't be able to design an engine at all without an appreciation of friction.

The software world
I'm first and foremost a software design practitioner: I design software, almost every day. Sometimes by myself, most often with other people; therefore, I do a lot of "design talk". In many cases, at one point or another, someone is going to bring in "performance" or "efficiency" to support (or reject) a design decision.
We use those words a lot, with different meanings depending on who's saying them and why. It seems like I'm never tired of linking wikipedia, so here is a page on computer performance. Just look at the initial list of different, context-dependent meanings. It's not surprising, then, to find out that some people have very peculiar views of performance. "I use arrays because they're more efficient". Sure, except that then you do a linear search because you need multiple indexes; say "efficient" again :-)?

One might expect Computer Science (with capital letters :-) to come to the rescue and define terms more precisely, and hopefully with some relevance for practice. However, computer science is more concerned with computational complexity theory than with the nitty-gritty details of being "fast".
Now, don't get me wrong. You won't get too far as a programmer (and definitely not as a software designer) if you don't get the concept of complexity classes, if you can't see that an algorithm is O( n^2 ) and another is O( n log n ), or if you don't even know what the Big Oh notation is all about. You have to know this stuff, period. In a sense, complexity theory is part of the math of software, and there is little point in investigating a physics of software if you don't get the math first. But math alone won't cut it. Once we get past the complexity class, we get very little assistance from computer science (and I'm purposely ignoring the fact that an algorithm in the O( n log n ) class in the average case can still be beaten by an O( n^2 ) algorithm in my practical cases).

On the "software engineering" side, the usual advice is to build the program and then use a profiler. Yeah, well, sure, beats banging your head against the wall :-), but it's not exactly like knowing what you're doing all along. Still, we make a lot of low-level design decisions while coding, and many of them will ultimately impact "performance". Lacking the basic terminology to think (and talk) about this kind of stuff is rather depressing, so why don't we try to move just a tiny step forward?

Wasting energy in software
So, here I have this piece of software (executable knowledge). For most practical applications, what I need is to get some data (interactively, from a DB, through some kind of device, whatever), transform it in a meaningful way (which could be a complex process encoded in thousands of lines), and spit out some results (which is still data, anyway). The transformation is the Function.

On the artifact side, our software may be using global variables all around, or be based on a nice polymorphic structure, yet the Function doesn't care. The structure we provide on the artifact side is the domain of Form (by now, you're probably familiar with all this stuff).

Now, transformation is a process, and no real-world process is 100% efficient; it's always going to waste something. Perhaps we should look better at that "waste" part. Something I learnt a long time ago, while pondering on principles and patterns, is that overly general concepts (like "performance") must give way to more specialized notions.

Some code, please :-)
Consider this short portion of C code. I'm using C because it's a low-level language, where the implications of any given choice are relatively easy to understand.
double max( double x, double y )
{
    if( x > y )
        return x;
    else
        return y;
}

double max3( double x, double y, double z )
{
    double d = max( x, y );
    d = max( d, z );
    return d;
}
The code is pretty obvious. In a common, stack-based CPU architecture, max3 will copy x and y on the stack and call max; then it will copy d and z and call max again. The return value might be stored in a CPU register or in RAM, depending on the compiler.

Copying those values is a waste of energy, of course. I could manually inline max inside max3 and get rid of that waste. I would sacrifice reusability and perhaps clarity for "higher performance" or "higher efficiency" or "reduced waste". Alternatively, the compiler could inline the function on my behalf (see Chapter 6 for the role of languages on balancing the two worlds).
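For instance, the manually inlined version might look like this (a sketch of the trade-off, not a recommendation):
// max3 with max manually inlined: no calls, no copies on the stack,
// but the reusable max is gone and the comparison logic is now duplicated.
double max3_inlined( double x, double y, double z )
{
    double d = ( x > y ) ? x : y;
    return ( d > z ) ? d : z;
}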

What if I'm working on some large data structure? The Fortran guy around the corner will suggest that by keeping your structures in the common area / global memory, you won't even have to pass parameters around: every function knows exactly where to get input and where to store output! Again, we'll sacrifice reusability, and perhaps duplicate large portions of code, for the sake of efficiency. As you go through most literature on High Performance Computing (see, for instance, The Ideal HPC Programming Language, recently reprinted in Communications of the ACM), you'll see that the HPC community is constantly facing the problem of wasted cycles, and is spending a lot of LOC to prevent that.

Moving data around is not the only way to waste energy. Consider this portion of real-world code, written by a (supposedly) performance-conscious "little meritocracy":
static int is_rfc2822_header(char *line)
{
    int ch;
    char *cp = line;
    if (!memcmp(line, "From ", 5) || !memcmp(line, ">From ", 6))
        return 1;
    while ((ch = *cp++)) {
        if (ch == ':')
            return cp != line;
        if ((33 <= ch && ch <= 57) ||
            (59 <= ch && ch <= 126))
            continue;
        break;
    }
    return 0;
}
Yeah, it's ugly as hell, but it's also wasteful (which is funny, for reasons that are too long to explain here). If you read it carefully, you'll find a way to optimize the "while" body quite a bit, and while you're at it, you can easily make it more readable. Also, the two memcmp calls at the beginning are wasting cycles (going through the first 5 characters twice), but just like the coefficient of friction must be measured in practice, at this level any alternative should really be measured on a real-world CPU.
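For the record, here is one possible streamlining (a sketch that preserves the current behavior, bug included, so it's just the optimization hinted at above, not a fix): since ':' is 58 in ASCII and is tested before the range check, the two ranges 33..57 and 59..126 collapse into one, and by skipping an optional '>' we only compare "From " once.
#include <string.h>

static int is_rfc2822_header_sketch(char *line)
{
    int ch;
    char *cp = line;
    /* skip an optional '>' so that "From " is compared only once */
    if (!memcmp(*line == '>' ? line + 1 : line, "From ", 5))
        return 1;
    while ((ch = *cp++)) {
        if (ch == ':')
            return cp != line;
        if (ch >= 33 && ch <= 126)   /* ':' (58) was already handled above */
            continue;
        break;
    }
    return 0;
}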

Note: as we optimize code, we have to assume that it is correct. We don't change Form unless we know that the Function is right. Before posting this, I checked for any update to the codebase (I got that code a few years ago, and it never looked right to me). The code is still the same, but now there is also a comment explaining what the function is intended to do. Unfortunately, that's not what it's doing, which is even more ironic, for the same unspoken reasons above. Anyway, we could easily fix the bug and still optimize the code.

Run-time Friction
Just like in the physical world we move objects around, in the run-time world of software we move knowledge around. More exactly, we move data around (we call it data flow) and we move the execution point around (we call it control flow). We move that stuff around to calculate some Function. In the process of calculating the Function, we usually waste some cycles. We waste cycles because we have to copy data on the stack, or from one data structure to another. We waste cycles because we do unnecessary comparisons, computations, jumps. We waste cycles because we process the same data more than once. Most often, we waste those cycles because we get something in exchange in the Form (artifact) domain. Sometimes, we waste cycles just because of bad coding.

The energy waste is not a constant: copying an integer is different from copying an array of integers (that's weight, of course). Also, if your array has been swapped out to the paging file, the copy is going to cost you more: that's the contribution of distance, and I'll get back to this later. Right now, remember that wasted energy is a consequence of friction, but is not friction.

Causes and types of software friction
We have already seen a few cases of software friction: when you copy data, you waste cycles. Max3 didn't strictly need to copy data: the Function didn't care about reusing max, only Form did. Before we try to define friction more precisely, it's interesting to see how deep the analogy with real-world friction really is. Indeed, we even have static software friction, and kinetic software friction!

Consider a Java (or .NET) virtual machine. When you hit a function for the first time, the code is compiled just in time. This has nothing to do with Function. It is a byproduct of a technological choice. It will cost you some cycles: that's friction. Also, it happens only once, to "put things in motion": that's static friction. In general, static friction will increase latency, while kinetic friction will reduce throughput. Good: we just sorted out the two main components of "performance".

Consider a web service. Before you can call the server, you go through a relatively lengthy process, from high-level stuff (marshaling your data) to low level stuff (establishing a network connection). This is all friction: the Function is happening on the other side, inside the service code. Here we see both static and kinetic friction at play: establishing a connection adds latency, exchanging data over the network reduces throughput.

Consider stored procedures. The ideal stored procedure takes little data in input, does significant CRUD inside, and returns little. This way, we have minimal waste due to kinetic friction, as we exchange little data with the database. Of course, this is not the only way to minimize energy waste: another approach would be to reduce distance, by bringing the database itself in-process. Interestingly, most real-time databases use the second approach.

So, what is causing friction in software? Friction is caused by:
  • A copy of data from one place to another (e.g. parameter passing, temporary variables, etc), as this adds no meaning to data, and therefore is useless as far as Function is concerned.

  • Syntactical transformation of data (e.g. marshaling) which adds no semantics (as above: this processing is not part of the Function). This includes any form of data transformation needed to talk over a non-native protocol.

  • Unnecessary statements (like those that could be removed in the C function above).

  • Redundant access / processing (some will be removed by the compiler, but some won't; see the small example after this list).

  • Bookkeeping (allocation, deallocation, reference counting, heap defragmentation, garbage collection, paging, etc). All this adds no semantics, and it's irrelevant for the Function: indeed, a well-written garbage collected program should behave properly under the so-called null garbage collector.

  • Unnecessary indirection. This is a long story and I'll leave it for another time, as I've yet to talk about indirection in the physics of software.

  • In general, everything that is not strictly necessary to calculate the Function, but has been added because of Form, or because of the programmer's inability to streamline the code to the mere Function, is a source of friction and will waste run-time energy.
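To make the "redundant access / processing" item concrete, here is a toy example (not taken from any real codebase): the length of the string is recomputed at every iteration, although nothing in the Function requires it, turning a linear scan into a quadratic one.
#include <string.h>

size_t count_spaces(const char *s)
{
    size_t n = 0;
    for (size_t i = 0; i < strlen(s); ++i)   /* wasteful: re-scans s on every iteration */
        if (s[i] == ' ')
            ++n;
    return n;
}
Hoisting the strlen call out of the loop removes the waste without changing the Function in any way.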


Defining friction
At this stage in my understanding of the physics of software, it's still hard to come up with numbers, coefficients, sometimes even formulas. Actually, I'm usually happy when I get some concept right. Still, let's look at a simplified formula for the energy wasted through friction (in the real world):

Normal Force * Coefficient of Friction * Distance.

That would hold pretty well in the software world as well, both at the qualitative (easier) and probably quantitative (not there yet) level. At the qualitative level, it tells us what we can control and perhaps leverage. I'll explore this in the next paragraph. At the quantitative level, it could help to evaluate low-level choices. First, however, we have to define Normal Force, Coefficient of Friction, and Distance.

I've defined distance in the run-time world in Chapter 9. Unfortunately, it's an ordinal scale, so we can't do math with distance. This sort of rules out any chance to have a quantitative definition of friction, but we can also look at it from the other side: a better understanding of friction energy (like: wasted cycles) could shed light on the right measurement scale for distance!

Assuming a flat world (I have no reason to think otherwise) the Normal Force is just weight. Weight could be easily defined as the number of bytes involved. For instance, the cost of a copy is linear with the number of bytes you copy.

The coefficient of friction is a dimensionless parameter. Interestingly, if we decide to measure energy in cycles (which makes some sense, although we usually think of cycles as time, not energy) that would imply that unit of measurement for Distance is cycles/byte. I'll have to think more about this.

Although the coefficient of friction, in the real world, cannot be predicted but only measured, we have some intuitive grasp of it being related to the materials. As the aforementioned wikipedia page explains, it's a relatively complex "system property", depending on many factors. The same applies in the software world. The cost of moving a bunch of bytes from one position to another depends on many factors. If we want to raise the abstraction level and think in terms of objects, and not bytes, things become more complex. The exact copy semantics (reference, shallow, deep) kicks in. That's fine: a software material with shallow copy semantics would have a different coefficient of friction than one with reference copy semantics.

Overall, I think we have little control over the coefficient of friction (I might be wrong), so for any practical purpose, distance and weight are the most interesting parameters.

Is it useful, anyway?
A good theory, and a good concept, must have good explanatory power, that is, we should be able to use them to explain known phenomena, explain why something works, rationalize widespread practices or beliefs, etc.

As I've already discussed, the evolution of programming languages can be largely seen as an attempt to balance the artifact / form world with the run-time / function world. In this sense, we can look for instance at the perfect forwarding problem, solved by rvalue references in the next C++ standard, as a further attempt to remove some energy waste, by avoiding unnecessary copies of data. C++ provides many ways to control friction energy, mostly in the area of generic programming and template metaprogramming. The Curiously Recurring Template Pattern, for instance, provides a form of static polymorphism exactly to avoid some friction due to unnecessary indirection (virtual dispatch).
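Here is a minimal CRTP sketch (with made-up names), just to show where the friction goes away: the call is bound at compile time, so there is no vtable lookup and the compiler is free to inline it.
// Static polymorphism via the Curiously Recurring Template Pattern:
// Shape::draw forwards to the derived class without any virtual dispatch.
template< typename Derived >
struct Shape
{
    void draw() { static_cast< Derived* >( this )->do_draw(); }
};

struct Circle : Shape< Circle >
{
    void do_draw() { /* draw a circle */ }
};

template< typename T >
void render( Shape< T >& shape )
{
    shape.draw();   // resolved statically: no indirection friction at run-time
}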

More generally, the simple equation for energy waste provides a clue on what we can actually control: weight, distance, coefficient of friction. This is it. As we shape software, this is what we can actually change if we want to reduce friction energy.

Consider HTTP compression: distance couldn't be changed, so we had to change weight.

Also, understanding the difference between static and kinetic friction explains a lot of existing practices. Think of Nagle's algorithm. It works by increasing static friction (therefore latency) in exchange for lower kinetic friction (therefore higher throughput). Once you get your concepts right, so many things unfold so easily :-).

Finally, the analogy holds to the extremes: just like excessive friction in mechanical systems can lead to a jam, excessive friction due to paging can jam a software system. This is commonly known as thrashing.

I think a caveat is in order: friction in the physical world is not necessarily evil. Were it not for friction, we couldn't even walk. Mechanical devices have to deal with friction all the time, but they also exploit friction all the time. It's harder to exploit friction in software (although Nagle's algorithm does). Most often, we must see friction as a trade-off with other properties, mostly on the artifact side. Still, an understanding of the different types of friction, and of the constituents of friction energy, can help evaluate alternatives and even generate new, better ideas in a more systematic and (dare I say it :-) scientific way.

A different angle
I chose friction as a physical analogy because it's a simple, familiar concept. Intuition and everyday experience can easily compensate for any lack of engineering knowledge. Still, I've been tempted to use different analogies, like hydraulic or electrical analogies. Indeed, there are several analogies between electrical, mechanical, hydraulic and even acoustic and optical systems (see here for a start), so it's always possible to choose a different reference system.

Anyway, my alternative would have been to model everything after resistance and current. Current would be the equivalent of throughput, or "performance", and resistance would cause thermal dissipation. In the end, I didn't go this way for a number of reasons; for instance, one-shot stuff like JIT would require something like a thermistor (think of a PTC in CRT degaussing), but I would lose a few readers that way :-).

Still, if you have followed me so far, there is an interesting result I'd like to share. Consider a trivial circuit where we apply 1V to a 1 ohm resistor, resulting in 1A of current. Now, I'll replace the resistor with a series of two, with resistances of (1-P) and P ohms. Nothing changes, same current. Resistors represent processes.

Now say that we have this concept of parallel execution, so the process carried out by P can be parallelized. By way of the analogy, to increase throughput (current) I can simply replace the P-ohm resistor with up to N of them in parallel. Now the circulating current is obviously 1 / (1-P + P/N) A. Guess what, I just rediscovered Amdahl's Law using Ohm's Law. That's cute :-).
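If you want to see the numbers come out, a few lines of C are enough (just a sanity check of the analogy, nothing more):
#include <stdio.h>

int main(void)
{
    /* current (throughput) for a serial fraction 1-P and N parallel "resistors":
       1 / ((1-P) + P/N), which is exactly Amdahl's Law */
    const double P = 0.9;
    for (int N = 1; N <= 16; N *= 2)
        printf("N = %2d  speedup = %.2f\n", N, 1.0 / ((1.0 - P) + P / N));
    return 0;
}
With P = 0.9 you get the familiar plateau: about 4.7x with N = 8 and about 6.4x with N = 16.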

Ok guys, next time I'll have a much shorter post on the artifact-side notion of friction. If we survive that, we'll be ready for tangling.

Thursday, August 19, 2010

Notes on Software Design, Chapter 9. A simple property: Distance (part 2)

What is Distance in the run-time world? As I began pondering on this, it turned out I could define distance in several meaningful ways. Some were redundant with other notions I have yet to present. Some were useless for a theory of software design. In the end, I picked up a very simple definition, with some interesting ramifications.

Just like the notion of distance in the artifact world is based on the hierarchy of artifacts, the notion of distance in the run-time world is based on a hierarchy of locations. These are the locations where executable knowledge resides at any given time. So, given two pieces of executable knowledge P1 and P2, we can define an ordinal scale:

P1 and P2 are inside the CPU registers or the CPU execution pipeline - that's minimum distance.
P1 and P2 are inside the same L1 cache line
P1 and P2 are inside the L1 cache
… etc

for the full scale, see the (updated) Summary at the Physics of Software website.

Dude, I don't care about cache lines!
Yeah, well, but the processor does, and I'm looking for a good model of the real world, not for an abstract model of computing disconnected from reality (which would be more like the math of software, not the physics).
You may be writing your code in Java or C#, or even in C++, and be blissfully unaware of what is going on under the hood. But the caches are there, and their effect is absolutely visible. For a simple, experimental, and well-written summary, see Igor Ostrovsky's Gallery of Processor Cache Effects (the source code is in C#, but results wouldn't be different in Java or C++).
Interestingly enough, most algorithms and data structures are not optimized for modern processors with N levels of cache. Still, there is an active area of research on cache-oblivious algorithms which, despite the name, are supposed to perform well with any cache line size across any number of cache levels (you can find a few links to specialized algorithms here but you'll have to work around broken links).
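If you want to see the effect on your own machine, a probe along the lines of the gallery's first example takes only a few lines (a sketch; exact timings will vary with hardware and compiler settings):
// Both loops touch every cache line of the vector (on a typical 64-byte line,
// one int every 16 still hits each line), so their running times are usually
// much closer than the 16x difference in arithmetic work would suggest:
// memory, not the CPU, dominates.
#include <vector>
#include <cstdio>
#include <ctime>

int main()
{
    std::vector< int > v( 64 * 1024 * 1024 );

    std::clock_t t0 = std::clock();
    for( std::size_t i = 0; i < v.size(); ++i )
        v[i] *= 3;
    std::clock_t t1 = std::clock();
    for( std::size_t i = 0; i < v.size(); i += 16 )
        v[i] *= 3;
    std::clock_t t2 = std::clock();

    std::printf( "step 1: %ld ticks, step 16: %ld ticks\n",
                 (long)( t1 - t0 ), (long)( t2 - t1 ) );
    return 0;
}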

What about virtual memory? Again, we can ignore the magic most of the times, but when we're looking for high-performance solutions, we have to deal with it. Unconvinced? Take a look at You're Doing It Wrong, where Poul-Henning Kamp explains (perhaps with a bit too much "I know it all and you don't" attitude :-)) why textbooks are not really talking about real-world computers [anymore].

Consequences
What happens when run-time distance grows? We're bound to see a staircase-like behavior, as in the second picture in the gallery above, just with more risers/treads. When you move outside your process, you have context switching. When you move outside your computer, you have network latency. When you move outside your LAN, you also have name lookup and packet hops. We'll understand all this stuff much better as we get to the concept of friction.

There is more. When talking about artifact distance, I said that coupling (between artifacts) should decrease as distance increases. In the run-time world, coupling to the underlying platform should decrease as distance increases. This must be partially reflected in the artifacts themselves, but it is also a language / platform / transformation concern.
Knowledge at short distance can be tightly coupled to a specific hw / sw platform. For instance, all code inside one component can be tightly bound to:
  • An internal object model, say the C++ object model of a specific compiler version, or the C# or Java object model for a specific version of the virtual machine.

  • A specific operating system, if not virtualized (native code).

  • A specific hardware, if not virtualized.

This is fine at some level. I can even accept the idea that all the components inside a single application have to share some underlying assumptions. Sure, it would be better if components were relatively immune from binary issues (a plague in the C++ world). But overall (depending on the size of the application) I can "control" things and make sure everything is aligned.
But when I'm talking to another service / application over the network, my degree of control is much smaller. If everything is platform-dependent (with a broad definition of platform, mind you: Java is a platform), we're in for major deployment / maintenance issues. Even worse, it would represent a huge platform lock-in (I can't use a different technology for a new service). Things only get worse on a global network scale. This is why XML took the world by storm, why web services have been rather successful in the real world, and so on. This is also why I like technologies / languages that take integration with other technologies / languages seriously, and not religiously.
As usual, the statement above is bi-directional. That is, it makes very little sense to pursue strong decoupling from the underlying platforms at micro-level. Having a class talking to itself in XML is not a brilliant strategy. Again, design is about balance: in this case, balance between efficiency and convenience on one side, and flexibility and evolvability on the other. Balance is obtained when you can depend on your platform locally, and be increasingly independent as you move farther.

Run-time Distance is not a constant
Not necessarily, anyway; it depends on small-scale technical choices. In C++, for instance, once two objects are allocated within the same cache line, they will stay that close for their entire life, because identity in C++ is address-based and objects can't be moved. In a garbage collected environment, this is not true anymore: objects can move freely during collection.
Moreover, once we move from a single cache line to the entire cache, things come and go; they become near and distant over time. This contributes to complex performance patterns, and indeed, modern hardware makes accurate performance prediction almost impossible. I'm pretty sure there are some interesting phenomena to be studied here - a concept of oscillating distance, perhaps the equivalent of a performance beat when two concurrent threads have slightly different oscillating frequencies, and so on - but I'm not currently investigating any of this - it's just too early.
At some point, distance becomes more and more "constant". Sure, a local service may migrate to a LAN and then to a WAN, but usually it does so because of human intervention (decisions!), and may require changes on the artifact side as well. Short-range distance is a fluid notion, changing as executable knowledge is, well, executed :-).

By the way: distance in the artifact world is not a constant, either. It is constant when the code is frozen. As soon as we change it, we change some distance relationships. In other words, when the computer is processing executable knowledge, run-time distance changes. When we process encoded knowledge (artifacts), artifact distance changes.

Distance-preserving transformations
Knowledge encoded in artifacts can be transformed several times before becoming executable knowledge. Most of those transformations are distance-preserving, that is, they map nearby knowledge to nearby knowledge (although with some jumps here and there).

For instance, pure sequences of statements (without choices, iterations, calls) are "naturally" converted into sequential machine-level instructions that not only will likely sit in the same cache line, but won't break the prefetch pipeline either. Therefore, code that is near in the artifact world will end up near in the run-time world as well.
In the C/C++ family, data that is sequentially declared in a structure (POD) is sequential in memory as well. Therefore, there is a good chance of sharing the same cache line if you keep your structures small (see the tiny sketch after this paragraph).
Conversely, data and code in different components are normally mapped to different pages (in virtual memory systems). They won't share the same cache line (they may compete for the same cache line, but won't be present in the same line at once). So distant things will be distant.
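A trivial sketch of the point about small structures:
// A small POD: the fields are laid out contiguously, so on a typical machine
// with 64-byte cache lines the whole struct is very likely to sit in a single
// line. Nearby in the artifact world, nearby in the run-time world.
struct Particle
{
    float x, y, z;      // position, 12 bytes
    float vx, vy, vz;   // velocity, 12 bytes: 24 bytes in total
};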

Even "recent" advances in processor architecture are increasing the similarity between the artifact and run-time class of distance. Consider predicated execution: it's commonly used to remove branches at machine-level for short sequence of statements in if/else conditionals (see Fig. 1 in the linked paper). In terms of distance, it allows nearby code in the artifact space to stay close in the run-time space, by eliminating branches and therefore maximizing proximity in the execution pipeline.

Some transformations, however, are not distance-preserving. Inlining of code (think of C++ inline functions, for instance) will shrink distance while moving from the artifact world to the run-time world.
Aspect Oriented Programming is particularly interesting from the point of view of distance. On the artifact side, aspects allow us to isolate cross-cutting concerns. Therefore, they increase the distance between the advice, which is factored out, and the join points. A non-distance-preserving transformation (weaving) brings the two concepts back together as we move toward execution.

Curiously enough, some non-preserving transformations work in the opposite way: they allow things to be near in the artifact space, yet distant in the run-time world. Consider the numerous technologies (dating back to remote procedure calls) that allow you to code as if you were invoking a local function, or a method of a local object, while in fact you are executing a remote function, or a method of a remote object (through some kind of local proxy). This creates an illusion of short distance (in the artifact world) while in fact maintaining high distance in the run-time world. As usual, whenever we build software over a thin layer of illusion, there is a potential for problems. When you look at The 8 fallacies of distributed computing, you can immediately recognize that dealing with remote objects as if they were local objects can be rather dangerous. Said otherwise, the illusion of short distance is a leaky abstraction. More on distributed computing when I get to the concept of friction.

A closing remark: in my previous post, I said that gravity (in the artifact world) tends to increase performance (a run-time property). We can now understand that better, and say that it is largely (not entirely) because:
- the most common transformations are distance-preserving.
- performance increases as the run-time distance decreases.
Again, friction is also at play here, but I have yet to introduce the concept.

Addenda on the artifact side
Some concepts on the artifact side are meant to give the illusion of a shorter distance, while maintaining separation. Consider extension methods in .NET or the more powerful concept of category in Objective-C. They both give the illusion of being very close to a class (when you use them) while in fact they are just as distant as any other class. (By the way: I've been playing with extension methods [in C#] as a way to get something like partial specialization in C++; it kinda works, but not inside generics, which is exactly where I would need it).

Distance in the Decision Space
While thinking about distance in the artifact and in the run-time world, I realized that the very first notion of distance I introduced was in the decision space. Still, I haven't defined that notion at the level of detail (or lack thereof :-) at which I've defined distance in the artifact and run-time world. I have a few ideas, of course, the simplest definition being "the number of decisions that must be undone + the number of [irreversible?] decisions that must be taken to move your artifacts". Those decisions would involve some mass of code. Moving that mass across that "space" would give a notion of necessary work. Anyway, it's probably too early to say more, as I have to understand the decision space better.

Coming soon, I hope, the notion of friction.