Carlo Pescio

Sunday, September 12, 2010

Notes on Software Design, Chapter 10: Run-Time Friction

So, here is the story. I keep a lot of notes. Some are text files with relevant links and organized ideas. Most are rather embarrassing scribbles on just about any piece of paper that is lying around when I need it. From time to time, I move some notes from paper to files, discard concepts that didn't prove themselves, and rearrange paragraphs to fake some kind of logical, sequential reasoning over a process that was, in fact, rather chaotic. Not surprisingly (since software is just another way to encode knowledge), David Parnas suggested long ago that we could do the same while documenting software design (see "A rational design process: How and why to fake it").
Well, it's not always easy. Sometimes, I try to approach the storytelling from an angle, see that it doesn't work out so well, and look for another. Sometimes I succeed, sometimes I don't (although, of course, the reader is the ultimate judge). This time, I have to confess, I feel like I couldn't find the right angle, the right way to start, to unfold a concept in a way that makes it look simple and natural. So I'll trust you to be smart enough to make sense of what follows :-). It's a very long post, and you may want to digest it in more than one session.

The physical world
I guess you all had to push some furniture around at one time or another. You have probably felt a stronger resistance in the beginning, followed by a milder form of resistance as soon as you got some movement.
The mild resistance is due to kinetic friction, while the initial, stronger resistance is usually due to static friction, which you have to overcome before the object starts moving (if you're not familiar with kinetic and static friction, wikipedia will tell you more than you want to know :-).

As you move your stuff around, friction makes you waste some energy, in a way that is basically proportional to the normal force, the distance, and the coefficient of kinetic friction (see the page above for the actual equation). I'll get back to this later, but if you move a constant mass on a flat surface, the energy you waste is proportional to the mass you move, the distance you go, and the magic coefficient of friction.

The beauty of all this is that it's simple and rather unambiguous. Friction is always present in mechanical engineering, but it's a well understood concept (as far as engineering is concerned; it's still blurry at the quantum level, at least for the uninitiated like myself), and there is usually no wishy-washy talking about friction. It's not a broad concept, that is, you won't be able to design the next-generation jet engine if all you have in your conceptual toolbox is friction, yet you won't be able to design an engine at all without an appreciation of friction.

The software world
I'm first and foremost a software design practitioner: I design software, almost every day. Sometimes by myself, most often with other people; therefore, I do a lot of "design talk". In many cases, at one point or another, someone is going to bring in "performance" or "efficiency" to support (or reject) a design decision.
We use those words a lot, with different meanings depending on who is saying them and why. It seems like I'm never tired of linking wikipedia, so here is a page on computer performance. Just look at the initial list of different, context-dependent meanings. It's not surprising, then, to find out some people have very peculiar views of performance. "I use arrays because they're more efficient". Sure, except that then you do a linear search because you need multiple indexes; say "efficient" again :-)?

One might expect Computer Science (with capital letters :-) to come to the rescue and define terms more precisely, and hopefully with some relevance for practice. However, computer science is more concerned with computational complexity theory than with the nitty-gritty details of being "fast".
Now, don't get me wrong. You won't get too far as a programmer (and definitely not as a software designer) if you don't get the concept of complexity classes, if you can't see that an algorithm is O( n^2 ) and another is O( n log n ), or if you don't even know what the Big Oh notation is all about. You have to know this stuff, period. In a sense, complexity theory is part of the math of software, and there is little point in investigating a physics of software if you don't get the math first. But math alone won't cut it: once we get past the complexity class, we get very little assistance from computer science (and I'm purposely ignoring the fact that just because an algorithm is in the O( n log n ) class in the average case doesn't mean I can't beat it with an O( n^2 ) algorithm in my practical cases).

On the "software engineering" side, the usual advice is to build the program and then use a profiler. Yeah, well, sure, beats banging your head against the wall :-), but it's not exactly like knowing what you're doing all along. Still, we make a lot of low-level design decisions while coding, and many of them will ultimately impact "performance". Lacking the basic terminology to think (and talk) about this kind of stuff is rather depressing, so why don't we try to move just a tiny step forward?

Wasting energy in software
So, here I have this piece of software (executable knowledge). For most practical applications, what I need is to get some data (interactively, from a DB, through some kind of device, whatever), transform it in a meaningful way (which could be a complex process encoded in thousands of lines), and spit out some results (which is still data, anyway). The transformation is the Function.

On the artifact side, our software may be using global variables all around, or be based on a nice polymorphic structure, yet the Function doesn't care. The structure we provide on the artifact side is the domain of Form (by now, you're probably familiar with all this stuff).

Now, transformation is a process, and no real-world process is 100% efficient; it's always going to waste something. Perhaps we should look better at that "waste" part. Something I learnt a long time ago, while pondering on principles and patterns, is that overly general concepts (like "performance") must give way to more specialized notions.

Some code, please :-)
Consider this short portion of C code. I'm using C because it's a low-level language, where the implications of any given choice are relatively easy to understand.

double max( double x, double y )
  {
  if( x > y )
    return x;
  else
    return y;
  }

double max3( double x, double y, double z )
  {
  double d = max( x, y );
  d = max( d, z );
  return d;
  }

The code is pretty obvious. In a common, stack-based CPU architecture, max3 will copy x and y on the stack and call max; then it will copy d and z and call max again. The return value might be stored in a CPU register or in RAM, depending on the compiler.

Copying those values is a waste of energy, of course. I could manually inline max inside max3 and get rid of that waste. I would sacrifice reusability and perhaps clarity for "higher performance" or "higher efficiency" or "reduced waste". Alternatively, the compiler could inline the function on my behalf (see Chapter 6 for the role of languages on balancing the two worlds).
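
As a rough sketch of what the manual inlining could look like (the names max3_inlined and check_max3 are mine, not part of the code above), the two calls collapse into plain comparisons, so no parameters have to be copied for a call to max. Whether this actually saves cycles depends on the compiler, which may well inline max on its own.

```c
#include <assert.h>

/* Original helper, kept only for comparison. */
double max( double x, double y )
  {
  if( x > y )
    return x;
  else
    return y;
  }

/* Original max3 from the text: two calls, two rounds of parameter passing. */
double max3( double x, double y, double z )
  {
  double d = max( x, y );
  d = max( d, z );
  return d;
  }

/* Hand-inlined variant (hypothetical name): same Function, different Form.
   The body of max is repeated twice, so reuse is sacrificed. */
double max3_inlined( double x, double y, double z )
  {
  double d = ( x > y ) ? x : y;
  return ( d > z ) ? d : z;
  }

/* Usage: both versions must agree on every input. */
void check_max3( void )
  {
  assert( max3_inlined( 1.0, 2.0, 3.0 ) == max3( 1.0, 2.0, 3.0 ) );
  assert( max3_inlined( 3.0, 2.0, 1.0 ) == 3.0 );
  assert( max3_inlined( -1.0, -5.0, -3.0 ) == -1.0 );
  }
```

The Function is unchanged; only the Form differs, which is exactly the trade described above.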

What if I'm working on some large data structure? The Fortran guy down the corner will suggest that by keeping your structures in the common area / global memory, you won't even have to pass parameters around: every function knows exactly where to get input and where to store output! Again, we'll sacrifice reusability, and perhaps duplicate large portions of code, for the sake of efficiency. As you go through most literature on High Performance Computing (see, for instance, The Ideal HPC Programming Language, recently reprinted in Communications of ACM), you'll see that the HPC community is constantly facing the problem of wasted cycles, and is wasting a lot of LOC to prevent that.

Moving data around is not the only way to waste energy. Consider this portion of real-world code, written by a (supposedly) performance-conscious "little meritocracy":

static int is_rfc2822_header(char *line)
{
int ch;
char *cp = line; 
if (!memcmp(line, "From ", 5) || !memcmp(line, ">From ", 6))
  return 1;
while ((ch = *cp++)) { 
  if (ch == ':')
    return cp != line;
  if ((33 <= ch && ch <= 57) ||
     (59 <= ch && ch <= 126))
    continue;
  break;
  }
return 0;
}

Yeah, it's ugly as hell, but it's also wasteful (which is funny, for reasons that are too long to explain here). If you read it carefully, you'll find a way to optimize the "while" body quite a bit, and while you're at it, you can easily make it more readable. Also, the two memcmp calls in the beginning can waste cycles (potentially going through the first characters of the line twice), but just like the coefficient of friction must be measured in practice, at this level any alternative should really be measured on a real-world CPU.
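
Here is one possible streamlining, offered as a sketch and not as a measured improvement. It relies on two observations about the code as quoted: the two ranges 33..57 and 59..126 only exclude the colon (58), which has already been handled, and cp has always been incremented by the time it is compared with line, so that comparison is always true. The names is_rfc2822_header_v2 and check_header are hypothetical. The variant deliberately keeps the observable behavior of the original, right or wrong, because changing the Function is a separate step.

```c
#include <assert.h>
#include <string.h>

/* Original function, reproduced unchanged for comparison. */
static int is_rfc2822_header(char *line)
{
int ch;
char *cp = line;
if (!memcmp(line, "From ", 5) || !memcmp(line, ">From ", 6))
  return 1;
while ((ch = *cp++)) {
  if (ch == ':')
    return cp != line;
  if ((33 <= ch && ch <= 57) ||
     (59 <= ch && ch <= 126))
    continue;
  break;
  }
return 0;
}

/* Hypothetical streamlined variant with the same results as the original. */
static int is_rfc2822_header_v2(const char *line)
{
const char *cp;
/* strncmp stops at the terminator, so short lines are never read past
   their end. */
if (!strncmp(line, "From ", 5) || !strncmp(line, ">From ", 6))
  return 1;
for (cp = line; *cp; cp++) {
  if (*cp == ':')
    return 1;                 /* the original's cp != line is always true */
  if (*cp < 33 || *cp > 126)  /* single range test: ':' was handled above */
    break;
  }
return 0;
}

/* Usage: the two versions must agree on a few representative lines. */
void check_header(void)
{
assert(is_rfc2822_header("Subject: hi") == is_rfc2822_header_v2("Subject: hi"));
assert(is_rfc2822_header("From alice") == is_rfc2822_header_v2("From alice"));
assert(is_rfc2822_header(">From bob x") == is_rfc2822_header_v2(">From bob x"));
assert(is_rfc2822_header("no colon here") == is_rfc2822_header_v2("no colon here"));
assert(is_rfc2822_header(":leading") == is_rfc2822_header_v2(":leading"));
}
```

Whether the single range test or the strncmp calls are actually faster is exactly the kind of thing that has to be measured on a real CPU.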

Note: as we optimize code, we have to assume that it is correct. We don't change Form unless we know that the Function is right. Before posting this, I checked for any update to the codebase (I got that code a few years ago, and it never looked right to me). The code is still the same, but now there is also a comment explaining what the function is intended to do. Unfortunately, it's not what it's doing, which is even more ironic, for the same unspoken reasons above. Anyway, we could easily fix the bug and still optimize the code.

Run-time Friction
Just like in the physical world we move objects around, in the run-time world of software we move knowledge around. More exactly, we move data around (we call it data flow) and we move the execution point around (we call it control flow). We move that stuff around to calculate some Function. In the process of calculating the Function, we usually waste some cycles. We waste cycles because we have to copy data on the stack, or from one data structure to another. We waste cycles because we do unnecessary comparisons, computations, jumps. We waste cycles because we process the same data more than once. Most often, we waste those cycles because we get something in exchange in the Form (artifact) domain. Sometimes, we waste cycles just because of bad coding.

The energy waste is not a constant: copying an integer is different from copying an array of integers (that's weight, of course). Also, if your array has been swapped out to the paging file, the copy is going to cost you more: that's the contribution of distance, and I'll get back to this later. Right now, remember that wasted energy is a consequence of friction, but is not friction.

Causes and types of software friction
We have already seen a few cases of software friction: when you copy data, you waste cycles. Max3 didn't strictly need to copy data: the Function didn't care about reusing max, only Form did. Before we try to define friction more precisely, it's interesting to see how deep the analogy with real-world friction really is. Indeed, we even have static software friction, and kinetic software friction!

Consider a Java (or .NET) virtual machine. When you hit a function for the first time, the code is compiled just in time. This has nothing to do with Function. It is a byproduct of a technological choice. It will cost you some cycles: that's friction. Also, it happens only once, to "put things in motion": that's static friction. In general, static friction will increase latency, while kinetic friction will reduce throughput. Good: we just sorted out the two main components of "performance".

Consider a web service. Before you can call the server, you go through a relatively lengthy process, from high-level stuff (marshaling your data) to low level stuff (establishing a network connection). This is all friction: the Function is happening on the other side, inside the service code. Here we see both static and kinetic friction at play: establishing a connection adds latency, exchanging data over the network reduces throughput.

Consider stored procedures. The ideal stored procedure takes little data in input, does significant CRUD inside, and returns little. This way, we have minimal waste due to kinetic friction, as we exchange little data with the database. Of course, this is not the only way to minimize energy waste: another approach would be to reduce distance, by bringing the database itself in-process. Interestingly, most real-time databases use the second approach.

So, what is causing friction in software? Friction is caused by:

A copy of data from one place to another (e.g. parameter passing, temporary variables, etc), as this adds no meaning to data, and therefore is useless as far as Function is concerned.

Syntactical transformation of data (e.g. marshaling) which adds no semantics (as above: this processing is not part of the Function). This includes any form of data transformation needed to talk over a non-native protocol.

Unnecessary statements (like those that could be removed in the C function above).

Redundant access / processing (some will be removed by the compiler, but some won't)

Bookkeeping (allocation, deallocation, reference counting, heap defragmentation, garbage collection, paging, etc). All this adds no semantics, and it's irrelevant for the Function: indeed, a well-written garbage collected program should behave properly under the so-called null garbage collector.

Unnecessary indirection. This is a long story and I'll leave for another time, as I've yet to talk about indirection in the physics of software.

In general, everything that is not strictly necessary to calculate the Function, but has been added because of Form, or because of the programmer's inability to streamline the code to the mere Function, is a source of friction and will waste run-time energy.

Defining friction
At this stage in my understanding of the physics of software, it's still hard to come up with numbers, coefficients, sometimes even formulas. Actually, I'm usually happy when I get some concept right. Still, let's look at a simplified formula for the energy wasted through friction (in the real world):

Normal Force * Coefficient of Friction * Distance.

That would hold pretty well in the software world as well, both at the qualitative (easier) and probably quantitative (not there yet) level. At the qualitative level, it tells us what we can control and perhaps leverage. I'll explore this in the next paragraph. At the quantitative level, it could help to evaluate low-level choices. First, however, we have to define Normal Force, Coefficient of Friction, and Distance.

I've defined distance in the run-time world in Chapter 9. Unfortunately, it's an ordinal scale, so we can't do math with distance. This sort of rules out any chance to have a quantitative definition of friction, but we can also look at it from the other side: a better understanding of friction energy (like: wasted cycles) could shed light on the right measurement scale for distance!

Assuming a flat world (I have no reason to think otherwise) the Normal Force is just weight. Weight could be easily defined as the number of bytes involved. For instance, the cost of a copy is linear with the number of bytes you copy.
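
To make "weight as bytes" a bit more tangible, here is a minimal C sketch. All names (Signal, SAMPLES, sum_by_value, sum_by_pointer, check_weight) are made up for illustration. Passing the structure by value nominally moves sizeof( struct Signal ) bytes, while passing a pointer moves only the size of an address. An optimizing compiler may remove the copy altogether, so take this as a picture of the nominal weight, not as a measurement.

```c
#include <assert.h>
#include <stddef.h>

enum { SAMPLES = 1024 };

/* A "heavy" piece of data: SAMPLES doubles, about 8 KB on common platforms. */
struct Signal
  {
  double samples[ SAMPLES ];
  };

/* By value: the whole structure is nominally copied for the call. */
double sum_by_value( struct Signal s )
  {
  double total = 0.0;
  for( int i = 0; i < SAMPLES; ++i )
    total += s.samples[ i ];
  return total;
  }

/* By pointer: only an address is passed, whatever the size of the data. */
double sum_by_pointer( const struct Signal* s )
  {
  double total = 0.0;
  for( int i = 0; i < SAMPLES; ++i )
    total += s->samples[ i ];
  return total;
  }

/* Nominal weight (bytes moved as parameters) for each calling style. */
size_t weight_by_value( void )   { return sizeof( struct Signal ); }
size_t weight_by_pointer( void ) { return sizeof( const struct Signal* ); }

/* Usage: same Function either way, very different weight. */
void check_weight( void )
  {
  static struct Signal sig;
  for( int i = 0; i < SAMPLES; ++i )
    sig.samples[ i ] = i;
  assert( sum_by_value( sig ) == sum_by_pointer( &sig ) );
  assert( weight_by_value() > weight_by_pointer() );
  }
```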

The coefficient of friction is a dimensionless parameter. Interestingly, if we decide to measure energy in cycles (which makes some sense, although we usually think of cycles as time, not energy) that would imply that the unit of measurement for Distance is cycles/byte. I'll have to think more about this.
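
The unit argument can be spelled out as a small dimensional check. The symbols E (wasted energy), W (weight), mu (coefficient of friction) and d (distance) are my own shorthand for the terms of the simplified formula, and the whole thing assumes that energy is measured in cycles and weight in bytes:

```latex
E = W \cdot \mu \cdot d,
\qquad
[E] = \text{cycles}, \quad [W] = \text{bytes}, \quad [\mu] = 1
\;\Longrightarrow\;
[d] = \frac{[E]}{[W]\,[\mu]} = \frac{\text{cycles}}{\text{byte}}
```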

Although the coefficient of friction, in the real world, cannot be predicted but only measured, we have some intuitive grasp of it being related to the materials. As the aforementioned wikipedia page explains, it's a relatively complex "system property", depending on many factors. The same applies in the software world. The cost to move a bunch of bytes from one position to another depends on a bunch of factors. If we want to raise the abstraction level and think in terms of objects, and not bytes, things become more complex. The exact copy semantics (reference, shallow, deep) kicks in. That's fine: a software material with shallow copy semantics would have a different coefficient of friction than one with reference copy semantics.

Overall, I think we have little control over the coefficient of friction (I might be wrong), so for any practical purpose, distance and weight are the most interesting parameters.

Is it useful, anyway?
A good theory, and a good concept, must have a good explanatory power, that is, we should be able to use them to explain known phenomena, explain why something works, rationalize widespread practice or beliefs, etc.

As I've already discussed, the evolution of programming languages can be largely seen as an attempt to balance the world of artifact / form with the run-time / function world. In this sense, we can look for instance at the perfect forwarding problem, solved by rvalue references in the next C++ standard, as a further attempt to remove some energy waste, by avoiding unnecessary copies of data. C++ provides many ways to control friction energy, mostly in the area of generic programming and also template metaprogramming. The Curiously Recurring Template Pattern, for instance, provides a form of static polymorphism exactly to avoid some friction due to unnecessary indirection (virtual dispatch).

More generally, the simple equation for energy waste provides a clue on what we can actually control: weight, distance, coefficient of friction. This is it. As we shape software, this is what we can actually change if we want to reduce friction energy.

Consider HTTP compression: distance couldn't be changed, so we had to change weight.

Also, understanding the difference between static and kinetic friction explains a lot of existing practices. Think of Nagle's algorithm. It works by increasing static friction (therefore latency) in exchange for lower kinetic friction (therefore throughput). Once you get your concepts right, so many things unfold so easily :-).

Finally, the analogy holds to the extremes: just like excessive friction in mechanical systems can lead to a jam, excessive friction due to paging can jam a software system. This is commonly known as Thrashing.

I think a caveat is in order: friction in the physical world is not necessarily evil. Were it not for friction, we couldn't even walk. Mechanical devices have to deal with friction all the time, but they also exploit friction all the time. It's harder to exploit friction in software (although Nagle's algorithm does). Most often, we must see friction as a trade-off with other properties, mostly on the artifact side. Still, an understanding of the different types of friction, and of the constituents of friction energy, can help evaluate alternatives and even generate new, better ideas in a more systematic and (dare I say it :-) scientific way.

A different angle
I chose friction as a physical analogy because it's a simple, familiar concept. Intuition and everyday experience can easily compensate for any lack of engineering knowledge. Still, I've been tempted to use different analogies, like hydraulic or electrical analogies. Indeed, there are several analogies between electrical, mechanical, hydraulic and even acoustic and optical systems (see here for a start), so it's always possible to choose a different reference system.

Anyway, my alternative would have been to model everything after resistance and current. Current would be the equivalent of throughput, or "performance", and resistance would cause thermal dissipation. In the end, I didn't go this way for a number of reasons; for instance, one-shot stuff like JIT would require something like a thermistor (think of a PTC in CRT degaussing), but I would lose a few readers that way :-).

Still, if you followed so far, there is an interesting result I'd like to share. Consider a trivial circuit where we apply 1V to a 1 ohm resistor, resulting in 1A current. Now, I'll replace the resistor with two resistors in series, with resistance (1-P) and P ohms. Nothing changes, same current. Resistors represent processes.

Now say that we have this concept of parallel execution, so the process carried out by P can be parallelized. By way of the analogy, to increase throughput (current) I can simply add up to N resistors in parallel. Now the circulating current is obviously 1 / (1-P + P/N) A. Guess what, I just rediscovered Amdahl's Law using Ohm's Law. That's cute :-).
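
For the skeptical reader, a tiny numeric sketch of the circuit above. The function names circuit_current and check_amdahl are hypothetical; the model assumes 1V applied to a serial resistor of (1-P) ohms followed by N identical resistors of P ohms in parallel, whose equivalent resistance is P/N.

```c
#include <assert.h>

/* Current (in amperes) through the circuit described in the text:
   1 V over (1 - p) ohms in series with n parallel resistors of p ohms each.
   Read as Amdahl's Law, p is the parallelizable fraction, n the number of
   processors, and the returned current is the speedup. */
double circuit_current( double p, int n )
  {
  assert( p >= 0.0 && p <= 1.0 );
  assert( n >= 1 );
  return 1.0 / ( ( 1.0 - p ) + p / n );
  }

/* Usage: a few points where the result is easy to verify by hand. */
void check_amdahl( void )
  {
  /* One resistor in the parallel branch: back to the 1 ohm circuit, 1 A. */
  assert( circuit_current( 0.5, 1 ) == 1.0 );
  /* Everything parallelizable: current scales with n. */
  assert( circuit_current( 1.0, 4 ) == 4.0 );
  /* Half parallelizable, two branches: 1 / 0.75 A. */
  assert( circuit_current( 0.5, 2 ) > 1.333 && circuit_current( 0.5, 2 ) < 1.334 );
  /* The serial part caps the current at 1 / (1 - p), here 2 A. */
  assert( circuit_current( 0.5, 1000000 ) < 2.0 );
  }
```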

Ok guys, next time I'll have a much shorter post on the artifact-side notion of friction. If we survive that, we'll be ready for tangling.

Thursday, August 19, 2010

Notes on Software Design, Chapter 9. A simple property: Distance (part 2)

What is Distance in the run-time world? As I began pondering on this, it turned out I could define distance in several meaningful ways. Some were redundant with other notions I have yet to present. Some were useless for a theory of software design. In the end, I picked up a very simple definition, with some interesting ramifications.

Just like the notion of distance in the artifact world is based on the hierarchy of artifacts, the notion of distance in the run-time world is based on a hierarchy of locations. These are the locations where executable knowledge is located at any given time. So, given two pieces of executable knowledge P1 and P2, we can define an ordinal scale:

P1 and P2 are inside the CPU registers or the CPU execution pipeline - that's minimum distance.
P1 and P2 are inside the same L1 cache line
P1 and P2 are inside the L1 cache
… etc

For the full scale, see the (updated) Summary at the Physics of Software website.

Dude, I don't care about cache lines!
Yeah, well, but the processor does, and I'm looking for a good model of the real-world, not for an abstract model of computing disconnected from reality (which would be more like the math of software, not the physics).
You may be writing your code in Java or C#, or even in C++, and be blissfully unaware of what is going on under the hood. But the caches are there, and their effect is absolutely visible. For a simple, experimental, and well-written summary, see Igor Ostrovsky's Gallery of Processor Cache Effects (the source code is in C#, but results wouldn't be different in Java or C++).
Interestingly enough, most algorithms and data structures are not optimized for modern processors with N levels of cache. Still, there is an active area of research on cache-oblivious algorithms which, despite the name, are supposed to perform well with any cache line size across any number of cache levels (you can find a few links to specialized algorithms here but you'll have to work around broken links).

What about virtual memory? Again, we can ignore the magic most of the time, but when we're looking for high-performance solutions, we have to deal with it. Unconvinced? Take a look at You're Doing It Wrong, where Poul-Henning Kamp explains (perhaps with a bit too much "I know it all and you don't" attitude :-)) why textbooks are not really talking about real-world computers [anymore].

Consequences
What happens when run-time distance grows? We're bound to see a staircase-like behavior, as in the second picture in the gallery above, just with more risers/treads. When you move outside your process, you have context switching. When you move outside your computer, you have network latency. When you move outside your LAN, you also have name lookup and packet hops. We'll understand all this stuff much better as we get to the concept of friction.

There is more. When talking about artifact distance, I said that coupling (between artifacts) should decrease as distance increases. In the run-time world, coupling to the underlying platform should decrease as distance increases. This must be partially reflected in the artifacts themselves, but it is also a language / platform / transformation concern.
Knowledge at short distance can be tightly coupled to a specific hw / sw platform. For instance, all code inside one component can be tightly bound to:

An internal object model, say the C++ object model of a specific compiler version, or the C# or Java object model for a specific version of the virtual machine.

A specific operating system, if not virtualized (native code).

A specific hardware, if not virtualized.

This is fine at some level. I can even accept the idea that all the components inside a single application have to share some underlying assumptions. Sure, it would be better if components were relatively immune from binary issues (a plague in the C++ world). But overall (depending on the size of the application) I can "control" things and make sure everything is aligned.
But when I'm talking to another service / application over the network, my degree of control is much smaller. If everything is platform-dependent (with a broad definition of platform, mind you: Java is a platform), we're in for major deployment / maintenance issues. Even worse, it would represent a huge platform lock-in (I can't use a different technology for a new service). Things get just worse on a global network scale. This is why XML took the world by storm, why web services have been rather successful in the real world, and so on. This is also why I like technologies / languages that take integration with other technologies / languages seriously, and not religiously.
As usual, the statement above is bi-directional. That is, it makes very little sense to pursue strong decoupling from the underlying platforms at micro-level. Having a class talking to itself in XML is not a brilliant strategy. Again, design is about balance: in this case, balance between efficiency and convenience on one side, and flexibility and evolvability on the other. Balance is obtained when you can depend on your platform locally, and be increasingly independent as you move farther.

Run-time Distance is not a constant
Not necessarily, anyway; it depends on small-scale technical choices. In C++, for instance, once you get two objects in the same cache line, they will stay there for their entire life, because identity in C++ is address-based. In a garbage collected environment, this is not true anymore: objects can move freely during collection.
Moreover, once we move from the cache line to the entire cache, things come and go, they become near and distant over time. This contributes to complex performance patterns, and indeed, modern hardware makes accurate performance prediction almost impossible. I'm pretty sure there are some interesting phenomena to be studied here - a concept of oscillating distance, perhaps the equivalent of a performance beat when two concurrent threads have slightly different oscillating frequency, and so on, but I'm not currently investigating any of this - it's just too early.
At some point, distance becomes more and more "constant". Sure, a local service may migrate to LAN and then to WAN, but usually it does so because of human intervention (decisions!), and may require changes on the artifact side as well. Short-range distance is a fluid notion, changing as executable knowledge is, well, executed :-).

By the way: distance in the artifact world is not a constant, either. It is constant when the code is frozen. As soon as we change it, we change some distance relationships. In other words, when the computer is processing executable knowledge, run-time distance changes. When we process encoded knowledge (artifacts), artifact distance changes.

Distance-preserving transformations
Knowledge encoded in artifacts can be transformed several times before becoming executable knowledge. Most of those transformations are distance-preserving, that is, they map nearby knowledge to nearby knowledge (although with some jumps here and there).

For instance, pure sequences of statements (without choices, iterations, calls) are "naturally" converted into sequential machine-level instructions that not only will likely sit in the same cache line, but won't break the prefetch pipeline either. Therefore, code that is near in the artifact world will end up near in the run-time world as well.
In the C/C++ family, data that is sequentially declared in a structure (POD) is sequential in memory as well. Therefore, there is a good chance of sharing the same cache line if you keep your structures small.
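
A small C sketch of this point. The names Particle, CACHE_LINE, in_one_cache_line and check_layout are invented for the example, and the 64-byte line size is an assumption (common on x86-64, but not universal). C guarantees that members appear in declaration order, possibly with padding; whether a given object actually sits inside one line also depends on its address, which is what the helper checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CACHE_LINE = 64 };  /* assumed line size, not guaranteed anywhere */

/* A small POD: members are laid out in declaration order. */
struct Particle
  {
  float x;
  float y;
  float z;
  float mass;
  };

/* Nonzero if the size bytes starting at p fall inside a single cache line.
   Treats the address as an integer, which works on mainstream platforms. */
int in_one_cache_line( const void* p, size_t size )
  {
  uintptr_t first = (uintptr_t) p / CACHE_LINE;
  uintptr_t last = ( (uintptr_t) p + size - 1 ) / CACHE_LINE;
  return first == last;
  }

/* Usage: declaration order is preserved, and a small, line-aligned object
   cannot straddle two lines. */
void check_layout( void )
  {
  static _Alignas( CACHE_LINE ) struct Particle p;  /* requires C11 */
  assert( offsetof( struct Particle, x ) == 0 );
  assert( offsetof( struct Particle, x ) < offsetof( struct Particle, y ) );
  assert( offsetof( struct Particle, y ) < offsetof( struct Particle, z ) );
  assert( offsetof( struct Particle, z ) < offsetof( struct Particle, mass ) );
  assert( sizeof( struct Particle ) <= CACHE_LINE );
  assert( in_one_cache_line( &p, sizeof p ) );
  }
```

Without the explicit alignment, a small structure may still end up across two lines, so "a good chance" is the right way to put it.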
Conversely, data and code in different components are normally mapped to different pages (in virtual memory systems). They won't share the same cache line (they may compete for the same cache line, but won't be present in the same line at once). So distant things will be distant.

Even "recent" advances in processor architecture are increasing the similarity between the artifact and run-time class of distance. Consider predicated execution: it's commonly used to remove branches at machine-level for short sequences of statements in if/else conditionals (see Fig. 1 in the linked paper). In terms of distance, it allows nearby code in the artifact space to stay close in the run-time space, by eliminating branches and therefore maximizing proximity in the execution pipeline.

Some transformations, however, are not distance-preserving. Inlining of code (think of C++ inline functions, for instance) will shrink distance while moving from the artifact world to the run-time world.
Aspect Oriented Programming is particularly interesting from the point of view of distance. On the artifact side, aspects allow us to isolate cross-cutting concerns. Therefore, they allow us to increase distance between the advice, which is factored out, and the join points. A non-distance-preserving transformation (weaving) brings the two concepts back together as we move toward execution.

Curiously enough, some non-preserving transformations work in the opposite way: they allow things to be near in the artifact space, yet be distant in the run-time world. Consider the numerous technologies (dating back to remote procedure calls) that allow you to code as if you were invoking a local function, or a method of a local object, while in fact you are executing a remote function, or a method of a remote object (through some kind of local proxy). This is creating an illusion of short distance (in the artifact world) while in fact maintaining high distance in the run-time world. As usual, whenever we create software over a thin layer of illusion, there is a potential for problems. When you look at The 8 fallacies of distributed computing , you can immediately recognize that dealing with remote objects as if they were local objects can be rather dangerous. Said otherwise, the illusion of short distance is a leaky abstraction. More on distributed computing when I'll get to the concept of friction.

A closing remark: in my previous post, I said that gravity (in the artifact world) tends to increase performance (a run-time property). We can now understand that better, and say that it is largely (not entirely) because:
- the most common transformations are distance-preserving.
- performance increases as the run-time distance decreases.
Again, friction is also at play here, but I have yet to introduce the concept.

Addenda on the artifact side
Some concepts on the artifact side are meant to give the illusion of a shorter distance, while maintaining separation. Consider extension methods in .NET or the more powerful concept of category in Objective-C. They both give the illusion of being very close to a class (when you use them) while in fact they are just as distant as any other class. (By the way: I've been playing with extension methods [in C#] as a way to get something like partial specialization in C++; it kinda works, but not inside generics, which is exactly where I would need it).

Distance in the Decision Space
While thinking about distance in the artifact and in the run-time world, I realized that the very first notion of distance I introduced was in the decision space. Still, I haven't defined that notion at the detail level (or lack thereof :-) at which I've defined distance in the artifact and run-time world. I have a few ideas, of course, the simplest definition being "the number of decisions that must be undone + the number of [irreversible?] decisions that must be taken to move your artifacts". Those decisions would involve some mass of code. Moving that mass for that "space" would give a notion of necessary work. Anyway, it's probably too early to say more, as I have to understand the decision space better.

Coming soon, I hope, the notion of friction.

Thursday, August 05, 2010

Notes on Software Design, Chapter 8. A simple property: Distance (part 1)

You're writing a function in your preferred language. At some point, you look at your code and you see that a portion just does not belong there. You move it outside, creating another function. Asked why, you may answer that by doing so you made it reusable; that you want to respect the Single Responsibility Principle; that you want to increase cohesion; that code just looks "cleaner" and "easier to read" that way; and so on. I guess it sounds rather familiar.

By moving the code outside the function, you also increased its distance from the code that is left inside. This is probably not so familiar: distance is not a textbook property of software. However, the notion of distance is extremely important. Fortunately, distance is a very simple concept. It has an immediate intuitive meaning, and although it's still rather informal, we can already use it to articulate some non-trivial reasoning.

I'm trying to keep this post reasonably short, so I'll cover only the artifact side of the story here. I'll talk about the run-time world next time.

A Concept of Distance
Consider this simple function (it's C#, but that's obviously irrelevant):


int sum( int[] a )
  {
  int s = 0;

  foreach( int v in a )
    s += v ;
  
  return s;
  }

I used two blank lines to split the function in three smaller portions: initialization - computation - return. I didn't use comments - the code is quite readable as it is, and you know the adage: if you see a comment, make a function. It would be unnatural (and with dubious benefits) to split that function into sub-functions. Still, I wanted to highlight the three tiny yet distinct procedural portions (centers) within that function, so I used empty lines. I guess most of you have done the same at one point or another, perhaps on a larger scale.

Said otherwise, I wanted to show that some statements were conceptually closer than others. They don't have to be procedural statements. I have seen people "grouping" variable declarations in the same way, to show that some variables sort of "lump together" without creating a structure or a class. I did that by increasing their physical distance in the artifact space.

A Measure of Distance
Given two pieces of information P1, P2, encoded in some artifacts A1, A2, we can define their distance D( P1, P2 ) using an ordinal scale, that is, a totally ordered set:

P1 and P2 appear in the same statement - that's minimum distance
P1 and P2 appear in the same group of consecutive statements
P1 and P2 appear in the same function
… etc

For the full scale, including the data counterpart, see my Summary at the Physics of Software website.

Note that Distance is a relative property. You cannot take a particular piece of information and identify its distance (as you could do, for instance, with mass). You need two.
Also, the ordinal scale is rather limiting: you can do no math with it. It would be nice to turn it into a meaningful interval or ratio scale, but I'm not there yet.
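
To make the "ordinal, relative, no math" points concrete, here is a small sketch in Python. It covers only the three levels listed above (the full scale is longer), and the `(function, group, statement)` triples used to locate a piece of information are my own assumption for illustration, not part of the scale:

```python
from enum import Enum
from functools import total_ordering


@total_ordering
class Distance(Enum):
    """Ordinal scale: levels can be compared, but not added or subtracted."""
    SAME_STATEMENT = 1
    SAME_STATEMENT_GROUP = 2
    SAME_FUNCTION = 3

    def __lt__(self, other):
        if not isinstance(other, Distance):
            return NotImplemented
        return self.value < other.value


def distance(loc1, loc2):
    """Distance needs two locations; each is a hypothetical
    (function, group, statement) triple."""
    f1, g1, s1 = loc1
    f2, g2, s2 = loc2
    if f1 != f2:
        raise ValueError("beyond the three levels sketched here")
    if g1 != g2:
        return Distance.SAME_FUNCTION
    if s1 != s2:
        return Distance.SAME_STATEMENT_GROUP
    return Distance.SAME_STATEMENT


# Locations inside the sum function shown earlier.
init = ("sum", 0, 0)        # int s = 0;
loop_head = ("sum", 1, 1)   # foreach( int v in a )
loop_body = ("sum", 1, 2)   # s += v;

near = distance(loop_head, loop_body)
far = distance(init, loop_body)
```

Here `near < far` holds, but `far - near` raises a `TypeError`: the scale tells you which pair is closer, not by how much.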

Is Distance useful?
As in most theories, individual concepts may seem rather moot, but once you have enough concepts you can solve interesting problems or gain better understanding of complex phenomena. Right now, I haven't introduced the concept of tangling yet, so Distance may seem rather moot in itself. Still, we can temporarily use a vague notion of coupling to explore the value of distance. It will get better in the [near] future, trust me :-).

Consider a few consecutive statements inside a function. It's ok if they share intimate knowledge. The three segments in sum are rather strongly coupled, to the point that it's ineffective to split them in subfunctions, but that doesn't bother me much. It's fine to be tightly coupled at small distance. As we'll see, it's more than fine: it's expected.

Functions within a class are still close together, but farther apart. Again, it's ok if they share some knowledge. Ideally, that knowledge is embodied in the class invariant, but private functions are commonly tied to their calling functions in a rather strong way. They often assume they are called in specific states (that could be captured in elaborate preconditions), and the caller is responsible for guaranteeing such preconditions. Sequences of calls are also expected to happen in specific orders, so that preconditions are met. Again, that doesn't bother me much. That's why the class exists in the first place: to provide a place where I can group together "closely related" functions and data.

Distinct classes are even more distant. Ideally, they won't share much. In practice, classes inside the same component often end up having some acquaintance with each other. For instance, widgets inside a widget library may work well together, but may not work at all with widgets inside a different library. Still, they're distant enough to be used individually.

We expect components / services to be lightly coupled. They can share some high-level contract, but that should be all.

Applications shouldn't be coupled at all – any coupling should appear at a lower level (components).

The logical consequence here is that coupling must decrease as distance increases. There is more to this statement than is immediately obvious. The real meaning is:
a) large distance requires low coupling
b) small distance requires high coupling

When I explain the concept, most people immediately think of (a) and ignore (b). Yet (b) is very important, because it says:
1) if coupling with the surroundings is not strong enough, you should move that portion elsewhere.
2) the code should go where the coupling is stronger (that is, if code is attracted elsewhere, consider moving it elsewhere :-)). That's basically why feature envy is considered a bad smell – the code is in the wrong place.

Cohesion as an emergent property
Cohesion has always been a more elusive concept than coupling. Looking at literature, you'll find dozens of different definitions and metrics for cohesion (early works like Myers' Composite/Structured Design used to call it "strength"). I've struggled with the concept for a while, because it didn't fit too well with other parts of my theory, but then I realized that cohesion is not a property per se.

Cohesion is a byproduct of attraction and distance: an artifact is cohesive if its constituents are at the right distance, considering the forces of attraction and rejection acting upon that artifact. If the resulting attraction is too strong or too weak, parts of that artifact want to move either down or up in the distance hierarchy, or into another site at the same level.

Attraction is too weak: the forces keeping that code together are not strong enough to warrant the short distance at which we placed the code. For instance, a long function with well-identified segments sharing little data. We can take that sequence of statements and move it up in the hierarchy - forming a new function.

Attraction is too strong: for instance, we put code in different classes, but those classes are intimately connected. The easiest thing is to demote one class to a set of functions (down in the hierarchy) and merge those functions with the other class. But perhaps the entire shape is wrong, at odds with the forcefield. Perhaps new abstractions (centers) must be found, and functions, or even statements, moved into new places.

This is closing the circle, so to speak. Good software is in a state of equilibrium: attraction and rejection are balanced with proper distance between elements.

Note: I'm talking about attraction and rejection, but I have yet to present most attractive / repulsive forces. Still, somehow I hope most of you can grasp the concepts anyway.

An Alexandrian look on the notion of distance
I've quoted Christopher Alexander several times in an early discussion on the concept of form. Now, you may know that Alexander's most recent theory is explained in 4 tomes (which I haven't deeply read yet) collectively known as "The Nature of Order". A few people have tried to relate some of his concepts with the software world, but so far the results have been rather unimpressive (I'm probably biased in my judgment :-).

On my side, I see a very strong connection between the concept of equilibrium as an interplay between distance and the artifact hierarchy and the Alexandrian concept of levels of scale: “A balanced range of sizes is pleasing and beautiful”.
Which is not to say that you should have long functions, average functions, small functions :-). I would translate that notion in the software world as: a balanced use of the artifact hierarchy is pleasing and beautiful. That is:
Don't use long functions: use multiple functions in a class instead.
Don't use long classes: use multiple classes in a component instead.
Don't create huge components: use multiple components inside an [application/service/program] instead.

This is routinely ignored (which, I think, contributes to the scale-free nature of most source code), yet it's the very reason why those concepts were introduced in the first place! Actually, we are probably still missing a few levels in the hierarchy, as required for instance to describe systems of systems.

Gravity, efficiency, and the run-time distance
Remember gravity? Gravity (in the artifact world) provides a path of least resistance for the programmer: just add stuff where there is other vaguely related stuff. Gravity works to minimize distance, but in a kind of piecemeal, local minimum way. It's easy to get trapped in a local minimum. The minimum is local when we add code that is not tightly connected with the surroundings, so that other forces at play (not yet discussed) will reject it.

When you point out incoherent, long functions, quite a few programmers bring in "efficiency" as an excuse (the other most common excuse being that it's easier to follow your code when you can just read it sequentially, which is another way to say "I don't understand abstraction" :-).
Now, efficiency is a run-time concept, and I haven't explained the corresponding concept in my theory yet. Still, using again the informal notion of efficiency we all have, we can already see that efficiency [in the run-time world] tends to decrease as distance [in the artifact world] increases. For instance, moving lines into another function requires passing parameters around. This is a first-cut, rough explanation of the well-known trade-off between run-time efficiency and artifact quality (maintainability, readability, reusability).
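
A rough illustration of the parameter-passing point, using a Python transcription of the `sum` function from earlier in this post. The `accumulate` helper is hypothetical, and splitting such a tiny function is exactly the kind of thing I said would be unnatural; it's here only to show what crosses the new boundary:

```python
def sum_inline(values):
    s = 0

    for v in values:
        s += v

    return s


def accumulate(s, v):
    # The loop body, moved one level up in the artifact hierarchy.
    # What used to be shared local state now has to travel through
    # parameters and a return value.
    return s + v


def sum_split(values):
    s = 0

    for v in values:
        s = accumulate(s, v)

    return s


inline_result = sum_inline([1, 2, 3, 4])
split_result = sum_split([1, 2, 3, 4])
```

Both versions compute the same result. Whether the extra call costs anything measurable depends on the language and the compiler (inlining may remove it altogether), which is one reason why "efficiency" is often a weak excuse.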

Coming soon:
the concept of distance in the run-time world, and distance-preserving transformations
efficiency in the physics of software
tangling
not sure yet : ), but probably isolation and density

Tuesday, July 27, 2010

On Kent Beck's Responsive Design

I try to keep an eye on software design literature. I subscribe to relevant publications from IEEE and ACM; I get a copy of conference proceedings; I read blogs and, just like everybody else, I follow links and suggestions. Being aware of the potential risk of filtering out information that is not aligned with the way I think, I'm also following a few blogs that are definitely not aligned with my belief system, just to make sure I don't miss too much. I'm bound to miss something, anyway, but I can live with that :-).

A few weeks ago I realized that I wasn't following Kent Beck's blog. I don't know why: I'm not a fan of XP, but I'm rather fond of Kent. He's an experienced designer with many good ideas. So, I took a tour of his posts and discovered that I had completely missed the concept of Responsive Design. That's weird, because I'm reading/scanning quite a few agile-oriented blogs, and I've never seen any mention of it. Oh, well; time to catch up.

I did my homework and spent some time reading all posts in the Responsive Design category and watching Kent's interesting QCon presentation. Looking at blog dates, it seems like activity peaked in April 2009, but rapidly declined during 2009 to almost nothing in 2010. So, apparently, Responsive Design hasn't taken the world by storm yet. Well, design wasn't so popular in the pre-agile era, and it's not going to be popular in the post-agile era either :-).

Anyway, the presentation inspired me with a stream of reflections that I'd like to share with you guys. I would recommend that you watch the presentation first (I know, it takes about 1 hour, but I think it's worth it). What follows is not intended to be a critique. I'm mostly trying to relate Kent's perspective on "what software design is about" with mine. It's a rather long post, so I split it into three paragraphs. The first two are about concepts in the presentation itself. The third is about an interesting difference in perspective, and how it affects our thinking.

Reflections on the introductory part
The idea of taking notes as we design, to uncover our own strategies and tactics, sounded so familiar. I guess at some point some of us feel an urge to understand what we're really doing. Although I'm looking for something different from Kent, I share some of his worries: trivial or extremely complicated insights might be just around the corner :-)

The talk about the meaning of "responsive" around 0:13 is resonating very well with the concepts of forcefield, design as "shaping a material", and backtalk. From my perspective, Kent is mostly saying: am I going to look at the forcefield, and design accordingly, or am I going to force a preconceived set of design decisions upon the problem? (see also my empty cup post).

Steady flow of features. We all love that, of course, and in a sense it's the most agile, XP-ish concept in the entire talk. The part about "the right time to design" seems rather connected with the Last Responsible Moment, so I can't really spare you a link to my own little idea of Invariant Decisions.
Again, I have a feeling that there is never enough context when talking about timing. I've certainly designed systems with a large separation (in time) between design and implementation. In many cases, it worked out quite well (of course, we always considered design as something fluid, not cast in stone). I guess it has a lot to do with how much domain knowledge you can leverage. Not every project is about doing something nobody has done before. Many are "just" about doing something in a much better way. Context, context, context...

Starting with values: it's something I have to improve in my Physics of Software work. Although I've said several times that all new methods should start with a declaration of values and beliefs, so far I haven't been thorough on this side. For instance, I value honest, unbiased, free, creative communication and thinking about software design issues, where the best ideas, and not the best talkers, get to win (just because I'm a good talker doesn't mean I want to win that way :-)). That's why I'm trying to come up with a less ambiguous, wishy-washy way to think and talk about design.

"Most of the decisions I make while designing have nothing to do with the problem domain [… but ...] are shaped by the fact that I'm instructing a computer." Weird. This is not my experience. I surely reuse solutions across domains, but the problem domain is heavily shaping the forcefield. When you go down to small-scale decisions, yeah, it gets more domain-independent, but high-level design is heavily problem-dependent. For instance, while you can usually (but not always) implement a chosen data structure in a rather domain-independent way, choosing the right data structure is usually a domain-dependent choice.
Don't get me wrong: I know I can safely ignore some portions of the problem domain when I design - I do that all the time. My view is that I can ignore the (possibly very complex) functional issues that have little impact on the forcefield, and therefore on the form. This is a complex subject that would require an in-depth study.

Principles. I don't quite like the term "principle" because it is so wide in meaning that you can use it for just about everything. So, while "don't repeat yourself" is a commonly accepted design principle, Kent's examples of principles from the insurance world or from Dynamo look much more like pre-made [meta] decisions to me.
"You should never lose a write". Fine. It's a decision, you can basically consider that a requirement.
"We're there to aid human decision making". Fine. It's a meta-decision, meaning, it's not just a design decision: it will influence everything, from your value proposition down to functional requirements and so on. It's not really a design principle, not even a project-specific design principle: it's more akin to a metaphor or to a goal.
Aside from that distinction, I agree that sometimes constraints, whatever form they take, are actually helpful during design, because they prune out a lot of the decision space (again, see my post above on the empty cup). Of course, the wrong constraints can prevent you from finding the sweet spot, so timing is an issue here as well - some constraints should be considered fluid early on, because you may learn that they're not the right constraints for your project.

The short part about patterns and principles reminded me of my early work on SysOOD, some of which got published in IEEE Software exactly as Principles Vs. Patterns. So, while Kent wants to understand principles better, I've always been trying to eliminate principles altogether (universal principles, that is). Still, I'd like to see the "non-universal" or "project-specific" principles recast under some other name and further explored, because I agree that having a shared set of values / goals / constraints can help tremendously in any significant project. Oh, I still have the Byte Smalltalk balloon issue somewhere : ).

Reflection on the 4 strategies
I see the "meat" of the talk as being about [meta] strategies to move in the decision space. Your software is at some point A in the multi-dimensional decision space; for instance, you decided early on that you would use filenames and not streams. Now you want to move your software to another point B in the decision space (where, for instance, streams are used instead of filenames). How do you go from A to B?
That's an interesting issue, where experienced designers/programmers routinely adopt different approaches compared to novices (see the empty cup post again), so it's both interesting and promising to see Kent articulate his reasoning. In a sense, it's completely orthogonal to both patterns/antipatterns (which may point you to B / away from A) and to my work (which is more concerned with understanding and articulating why B is a more desirable point than A, or to find B in the first place).
Actually, strictly speaking I'm not even sure this stuff is really about "design". In my view, design is more about finding B. The execution/implementation of design, however, is about moving to B. Therefore, Kent's strategies look more like "design execution strategies" than "design strategies".

That said, there is a subtle, yet striking difference between Kent's perspective and mine. Kent is talking about / looking at software design in a "first-person perspective", while I'm looking through the "I'm shaping a material" metaphor. He says things like: "I'm here; how do I get there?", and he's literally mimicking small steps. I'm saying things like: "my software is here; how do I move it there?"; it may seem like a small, irrelevant distinction, but I think it's shaping our thoughts rather deeply. It surely shapes the names we use: I would have never used "stepping stone", because I'm not thinking of myself going anywhere : ). More on this in the third paragraph below.

Although this is probably the most “practical” part of the talk, concepts are a little fuzzy. Leap can be seen as a strategy, actually the simplest strategy of all: you just move there. Safe Steps is more like a meta-strategy to move inside the decision space (indeed, around 42:00 Kent talks about Safe/Small Steps as a "principle"). Once you choose Safe Steps, you have many options left, Parallel, Stepping Stones and Simplifications being 3 of them. Still, the granularity of those strategies is so coarse that it's rather surprising Kent found only 3 of them.

For instance, I probably wouldn't have used Parallel to deal with the filenames/streams issues. I can't really say without looking at the code, but assuming I wanted to take small steps, I could have created a new class, which IS-A stream and HAS-A filename, moved the existing code to use that class (a very simple and safe change, because both the stream and the filename are there), gradually moved all the filename-dependent code inside that class, removed the filename from the interface (as it's no longer used), moved the filename-dependent code in a factory (so that I could make a choice on the stream type). At that point I could have killed the temporary class, using a plain stream again. This strategy (I could call it wrap/extract/unwrap) is quite simple and effective when dealing with legacy code, and doesn't really fit in any of the strategies Kent is proposing.
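
Since I can't show the real code, here is a compressed sketch of wrap/extract/unwrap in Python. Everything in it is an assumption made for illustration: the names (`NamedStream`, `legacy_describe`, `count_lines`, `open_source`, `open_plain`) are hypothetical, and file contents are passed in as text so that the sketch doesn't need a real file:

```python
import io


class NamedStream(io.StringIO):
    """Temporary class: IS-A stream, HAS-A filename."""

    def __init__(self, filename, text=""):
        super().__init__(text)
        self.filename = filename


# Wrap: existing filename-based code keeps working, because the
# filename is still there.
def legacy_describe(source):
    return "reading " + source.filename


# Extract: code migrated to the stream side never touches the filename.
def count_lines(source):
    return sum(1 for _ in source)


# The filename-dependent code ends up in a factory, which is also the
# place where the stream type gets chosen.
def open_source(filename, text):
    return NamedStream(filename, text)


# Unwrap: once nothing reads .filename anymore, the factory can hand
# out a plain stream and NamedStream can be deleted.
def open_plain(text):
    return io.StringIO(text)


wrapped = open_source("data.txt", "a\nb\n")
description = legacy_describe(wrapped)
wrapped_lines = count_lines(wrapped)
plain_lines = count_lines(open_plain("a\nb\nc\n"))
```

At every step both the old and the new callers run, which is what makes each step small and safe.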

Stepping Stone. While Leap and Parallel are more like strategies to implement a decision you have already taken, it seems to me that Stepping Stone is different. Although this might be my biased understanding of it, it's also the only interpretation I can give that makes Stepping Stone different from good old top-down decomposition (which Kent makes clear is different, at about 50:30).
I would say that Stepping Stone is about making progress where you can, and while doing so, shed some light on the surroundings as well. This is not very clear in the beginning, but as Kent moves forward, it's more and more obvious that he's thinking about novel situations, where you don't know exactly what you want to build.
Indeed, sometimes the forcefield is just too blurry. Still, we often can see a portion of it rather clearly, and we may have a sensible idea about the form we want to shape locally. Stepping Stone is a good strategy here: build (or design) what you understand, get feedback (or backtalk), get progress, etc. I usually keep exploring the dark areas while I'm building stepping stones. I'm also rather careful not to overcommit on stepping stones, as they may turn out not to fit perfectly with what's coming next. That's fine, inasmuch as we understand that design is a process, not a phase with a beginning and an end.
I can also build "stepping stones" when I know where I'm going, but as I said, at that point it's hard to tell the difference between building stepping stones and executing top-down design. Oh, by the way, for the agile guys who hated me so much for saying that TDD is a limited design strategy: move around 50:20. Play. Rewind. Play again. Repeat as necessary :-). But back to Stepping Stones: I think the key phrase here is around 55:40, when Kent says that by standing on a stepping stone he can see where to go next, which I read like: a previously blurry portion of the forcefield is now visible and clear, and I can move further.

Simplification. This is more akin to a design strategy, not to an execution strategy. I could see Simplification as a way to create small steps (perhaps even stepping stones). Indeed, the difference between Stepping Stones and Simplification is not really obvious (around 1:02:00 a participant asks about the similarity and it doesn't seem like Kent can explain the difference so well either). I guess Simplification is about solving a single problem by solving a simpler version of the same problem first, whereas Stepping Stones are about solving a complex problem by solving another problem first (one that, however, brings us closer to solving the initial problem). In this sense, Simplification can be used where you can see where you wanna go (but it's too big / risky / ambitious) but also where you can't see exactly where you want to go, but you can clearly see a smaller version of it. Or, from my "I'm shaping a material" perspective, you can't really see the form of the real thing, but you can see the form of a smaller version of it. As I said, this stuff is interesting but rather fuzzy, and honestly, I'm not really sold on those 4 strategies.

Overall, I'd like to see Responsive Design further developed. It would probably benefit from a better distinction between design, design execution, principles, values, goals, and so on, but it's interesting stuff in the almost flat landscape of contemporary software design literature.

Perspective Matters
As I said, Kent is looking at the decision space in first-person perspective, while I see myself moving other things around. There is an interesting implication: when you think about moving yourself, you mostly consider distance. You are who you are. You want to go elsewhere. Distance, and therefore Leaps, Small Steps, Stepping Stones, it's all that matters.

When you think about moving something else, you have to consider two more factors. That, in my opinion, allows better reasoning.

Sure, distance is an important factor. But, as I explained in my post about Inertia, another important factor is Mass. You cannot ignore mass, but it's unnatural to consider a varying mass from a first-person perspective.

A simple example: I use Visual Studio (yeah yeah I know, some of you don't like it : ). Sometimes, as I'm creating a new project, I click in the wrong place (yeah yeah I'm kinda dumb :-) and I get the wrong project type. I want a C# project and I get a VB.NET project (yikes!! :-)). That's quite a distance in the decision space. Yet there is a very simple, safe step (Leap): delete the project and start from scratch. The distance is not a problem, because mass is basically null. I can use Leap not because the distance is small, but because the mass is small. I wouldn't do that with an existing, large project (I may consider going through an IL -> C# transformation though :-).

There is more: when I discussed Inertia, I talked about software being in a state of rest or motion in the decision space. This is the third factor, and again, it's hard to think about it in first-person perspective. You want to go somewhere, but you're already going elsewhere. It's not truly natural. Deflecting a moving object, however, is quite ordinary (most real-life games involving a ball, for instance, involve deflecting a moving object).

Consider a large project. Maybe a team is busy moving a subsystem S1 to another place in the decision space. Now you want to move subsystem S2 from A to B. It's not enough to consider distance and mass. You have to consider whether or not S2 is already moving, perhaps by attraction from S1. So S2 may be small (in mass) and the distance A-B may be small too, but if S2 is already moving in a different direction your job is harder, more risky, and in need of a careful execution strategy.

A real-world example: you have a legacy C++/MFC application. Currently, the UI is talking with the application through a small, homebrew messaging component (MC), of which you are in charge. You just found a better messaging component (BMC) on the internet, say something that is more efficient, portable, and actively developed by a community. You want to move your subsystem to another place in the decision space, from MC to BMC. Part of BMC being "portable" is based on the reasonable assumption that messages are strictly C++ objects.

Meanwhile, another team is moving the UI toward .NET (that's a big mass - big distance issue, but it's their problem, not yours :-). Now, there is an attraction between your messaging component and the UI. A .NET UI may have some issues dealing with native C++ messages. So, like it or not, there is another force at play, trying to move your component to a different place (say, NMC). Now, the mass of the messaging component might be small. The distance between MC and BMC may be small too - perhaps something you can deal with by using an adapter. But probably BMC is in a very different direction than NMC, where the UI team is pushing your component to move. So it's not just a matter of distance. It's not even enough to consider mass. You need to consider distance + full-blown Inertia. Or, to cut it short, work.

Of course, depending on your project, some factor may dominate over the others, and a simpler model might be enough, but usually, distance alone won't cut it.

There is more...
At the end of the presentation, Kent mentions power law distributions. This is something that has got me interested too in the past few months (I mentioned that on Facebook), but it's too early to say something about it, and I want to check a few ideas with the author of an interesting paper first, anyway.

See you guys soon with a post on Distance (not in the decision space, but in the artifact space!)

Thursday, June 24, 2010

Notes on Software Design, Chapter 7: a better Forcefield Diagram

In my previous post in the NOSD series, I mentioned how an improved forcefield diagram was needed to model the kind of reasoning I'm trying to bring into software design. I also discussed the artifact/run-time dualism, and how many concepts in language design were born out of the fundamental need to balance conflicting forces between these two worlds. I also mentioned the role of some patterns in resolving the same kind of conflict.

Here I'll show you a practical example, introducing the improved forcefield notation as we go. It's quite simple, and as usual, the reasoning is more important than the drawing. I'll use new colors and shapes, but it's all very informal, the notation is not cast in stone, and it will probably evolve and change over time.

A common problem
You have two classes (Class1, Class2); as we learnt in Chapter 6, that usually means you have two artifacts, and right now I feel like blue is a good color for artifacts, so I'll color them in blue.

Now, those classes have some commonality in behavior; for instance, they both represent a geometrical object, and can provide you with a bounding box.
Behavior is a run-time concept, and I'll color that information in pink.

Commonality in behavior is an attractive force; I haven't talked about this yet, but trust me : ), or just rely on your intuition that "things that do similar things are close to each other".

Commonality in behavior also attracts a natural desire in the artifact space: polymorphic/uniform access to such behavior. While writing calling code, I'd like to ask for a bounding box in the same way, perhaps polymorphically through a base class / interface. That would make my client (partially) unaware of the specific classes.

Unfortunately, Class1 and Class2 have been written with different conventions. They don't share a base class; they don't use the same naming for functions; they may not even use the same types for parameters. Different conventions are an artifact issue, so I'll color this in blue again. Of course, different conventions are keeping Class1 and Class2 apart, and actively rejecting polymorphic / uniform access. So here is the forcefield, representing our problem:

Note that there is nothing here about a solution. At this stage, the forcefield is a representation of the problem. Still, we have a conflict between forces, and somehow we have to deal with it. What if we don't? I'll keep that as last.

Enter decisions
A missing concept in my previous attempts at modeling the forcefield was the very important notion of decision. Although I'm trying to keep concepts to a bare minimum, the Decision Space is an important piece of the puzzle, and there can't be a Decision Space without Decisions. I'm using a yellow hexagon to represent a decision.

So, how can we deal with those conflicting forces? A simple, technically sound decision is to refactor Class1 and Class2 to a common base class (or interface, or a hierarchy of both). That decision has a strong impact on "Different Conventions", effectively removing it from the forcefield, including the rejection lines. I think the diagram speaks for itself:

So, decisions can alter the forcefield, for better or worse. Still, we may choose to keep the artifacts unchanged. Perhaps Class1 and Class2 come from third-party libraries, or perhaps they're shared with a lot of existing code, and refactoring may impact that code as well (remember Mass and Inertia). Again, it's useful to represent this decision explicitly, although it's just an intermediate step. I colored "Different Conventions" in green to say that we deliberately decided to keep it inside the forcefield.

Patterns
Design patterns are now mainstream, with dozens of books and hundreds, if not thousands, of papers describing the problem / context / solution triad.
Now, the solution is usually represented using a UML diagram and/or some source code. Problem and context are described using text, or an example in UML/code. That's because we don't (didn't :-) have any proper way to describe a problem (forcefield) and context (mostly, pre-made decisions).
Still, look at the picture above once again: that's (of course :-) the problem/context setting for a well-known pattern: adapter. So let's look at the adapter in action, or how using adapter will impact the forcefield:

Adapter "simply" breaks the rejection between Different Conventions and Polymorphic / Uniform Access, therefore allowing client code to ignore the specific interface of Class1 and Class2. It is very interesting to observe what the forcefield is telling us: we didn't completely remove conflict. There is still an attraction/rejection between Class1 and Class2. In this sense, Adapter is less effective than refactoring (which, however, requires us to change the artifacts).
That conflict will emerge over time; for instance, adding common functionality will take more time, possibly some duplication of code, etc. This kind of "consequence" is not very well documented in the GoF book.
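
To make the Adapter decision concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the method names `getBoundingBox` and `extent`, the tuple layout, and the `HasBoundingBox` protocol are just my assumptions about what "different conventions" might look like. Class1 and Class2 stay untouched; the adapters absorb the difference, so the client only sees `bounding_box()`.

```python
from typing import Iterable, Protocol, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), an assumed layout


class Class1:
    """Hypothetical class: exposes its box as a 4-tuple."""

    def __init__(self, left: float, top: float, right: float, bottom: float):
        self._box = (left, top, right, bottom)

    def getBoundingBox(self) -> Box:
        return self._box


class Class2:
    """Hypothetical class with another convention: origin plus size, as a dict."""

    def __init__(self, x: float, y: float, width: float, height: float):
        self.x, self.y, self.width, self.height = x, y, width, height

    def extent(self) -> dict:
        return {"x": self.x, "y": self.y, "w": self.width, "h": self.height}


class HasBoundingBox(Protocol):
    """The uniform access the client would like to rely on."""

    def bounding_box(self) -> Box: ...


class Class1Adapter:
    def __init__(self, adaptee: Class1):
        self._adaptee = adaptee

    def bounding_box(self) -> Box:
        return self._adaptee.getBoundingBox()


class Class2Adapter:
    def __init__(self, adaptee: Class2):
        self._adaptee = adaptee

    def bounding_box(self) -> Box:
        # Translate origin + size into the (left, top, right, bottom) convention.
        e = self._adaptee.extent()
        return (e["x"], e["y"], e["x"] + e["w"], e["y"] + e["h"])


def union_of_boxes(shapes: Iterable[HasBoundingBox]) -> Box:
    """Client code: it never mentions Class1 or Class2."""
    boxes = [s.bounding_box() for s in shapes]
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
```

Note that the two adapters are still separate artifacts with duplicated structure: adding another common operation means touching both. That is the residual conflict between Class1 and Class2 that the forcefield keeps showing.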

Note: there is no redundancy with code here: we're talking about the problem space and the decision space. Contrast this with the usual redundancy between a UML diagram and code (with pros and cons, as usual). Also, isn't this diagram much better at communicating design problems, decisions, and impact than the largely ignored design rationale tools and notations?

What else?
There is also a different decision we can make; it's a particularly bad decision, therefore it's also very popular :-) wherever code quality is easily ignored. We can just forgo uniform access, and have clients deal with a non-uniform interface (through if, switch/case, whatever). This is what I meant by "not dealing with conflict" earlier. I called that decision "Tangled Clients" for reasons that will become clear in a future post.
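
Just to show what that looks like in practice, here is a small Python sketch of a tangled client. The two classes and their method names (`getBoundingBox`, `extent`) are hypothetical stand-ins for "different conventions"; the point is only that the client itself has to carry the knowledge of every concrete class.

```python
class Class1:
    """Hypothetical class: exposes its box as a (left, top, right, bottom) tuple."""

    def __init__(self, left, top, right, bottom):
        self._box = (left, top, right, bottom)

    def getBoundingBox(self):
        return self._box


class Class2:
    """Hypothetical class with another convention: origin plus size, as a dict."""

    def __init__(self, x, y, width, height):
        self.x, self.y, self.width, self.height = x, y, width, height

    def extent(self):
        return {"x": self.x, "y": self.y, "w": self.width, "h": self.height}


def bounding_box_of(shape):
    """Tangled client: one branch per concrete class and per convention."""
    if isinstance(shape, Class1):
        return shape.getBoundingBox()
    if isinstance(shape, Class2):
        e = shape.extent()
        return (e["x"], e["y"], e["x"] + e["w"], e["y"] + e["h"])
    # Every new class with yet another convention means editing this function,
    # and every other client written in the same style.
    raise TypeError(f"unsupported shape type: {type(shape).__name__}")
```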

Time out
Last time I said I would provide some examples of conflicting forces in the non-software world. I'll have to cut this post short, because the forcefield diagram I've come up with would take too long to explain. But I'll offer a few pointers for those of you with some time to spare:

- Very simple: the adapter is extremely frequent in the real world as well, as in the well-known socket adapter. The forcefield is remarkably similar, but finding the exact translation of every concept is not necessarily trivial. Try this out :-).

- Harder, some engineering knowledge is required. Rotating pumps may leak (why? hint: some empty space is needed if you want something to rotate freely :-). Old pumps just used a seal, which wasn't really good at preventing leaks. At some point (I think around 1940) the end face mechanical seal was adopted as a better seal for rotating pumps. However, the fluid can still leak, which ain't that good in a number of cases. An interesting solution here is the magnetic drive pump (look it up, guys :-). Draw the forcefield for the problem, and represent how using a magnetic field to transmit movement can compensate for otherwise conflicting forces. By the way, this is the example I was thinking about when I wrote my previous post.

- Manufacturing is actually full of great examples of conflicting forces and different approaches to compensate or overcome conflict. Think about thermal grease, just to give a starting point. The list is endless. Trying to model some forcefield in detail is a fascinating exercise.

- If you want to stay on the software side, here is an interesting one. Take some programming language feature (for instance, if you use .NET, you may consider partial classes or attributes; if you use Java, annotations) and identify which tension between the run-time world and the artifact world they're meant to solve. Partial classes are a simple, but interesting exercise: most criticism I've heard about them stems from a very partial :-) understanding of the surrounding forcefield, just like the common abuse I've seen (split a large class into two files :-)).

As usual, there is much more to say about compensating forces, the artifact/run-time dual nature of software, and so on, including an unexpected intuition on economy of scale that I got just yesterday while running. See you soon, and drop me a line if you try this stuff out :-)

Sunday, June 06, 2010

Notes on Software Design, Chapter 6: Balancing two Worlds

Back in my time, computer science students were introduced to the distinction between syntax and semantics from day one. That distinction is central in understanding programming languages from the algebraic perspective.

When you are looking for a theory of software forces, however, you'll find another distinction more useful. We have artifacts, and we have run-time instances of artifacts.

Software is encoded knowledge, and as software developers we shape that knowledge. However, we can't act on knowledge itself: we must go through a representation. Source files are a representation, UML diagrams are a representation, and so on.
Source files represent information using a language-specific syntax; for instance, if you use Java or C#, you encode some procedural and structural knowledge about a class inside a text file. The text file is an artifact; it's not a run-time thing. Along the same lines, you can use UML and draw a class diagram, encoding structural knowledge about several classes. That's another artifact. Often, when we say "class C" or "function foo" what we mean is actually its representation inside an artifact (a sequence of LOCs in a source file, where we encode some knowledge in a specific language).
Artifacts can be transformed into other artifacts: without mentioning MDA, you can just compile a bunch of source files and get an executable file. The executable file is just another artifact.

At that point, however, you can actually run the artifact. Let's say that you get a new process: the process is a run-time instance of the executable file artifact. Of course, you may start several run-time instances of the same artifact.

Once the process is started, a wonderful thing happens :-). Your structural and procedural knowledge, formerly represented through LOCs inside artifacts, is now executable as well. Functions can be called. Classes can be instantiated. A function call is a run-time instance of a function. An object is a run-time instance of a class.

We never shape the run-time instances directly. We shape artifacts (like a class described through textual Java or C++) so as to influence the run-time shape of the instances. Sometimes we shape an artifact so that its run-time instances can shape other run-time instances (meta-object protocols, dynamically generated code, etc).

Interestingly, some forces apply at the artifact level, while other forces apply at the instance level. If we don't get this straight, we end up with the usual confusion and useless debates.

A few examples are probably needed. As I haven't introduced enough software forces so far, I'll base most of my examples on -ilities, because the same reasoning applies.

Consider reusability: when we say that we want to reuse "a class" what we usually mean is that we want to reuse an artifact (a component, a source file). Reusing a run-time instance is a form of object caching, which might be useful, but is not what is meant by reusability.

Consider efficiency: when we say "this function is more efficient" what we usually mean is that its run-time instances are faster than other comparable implementations.

Consider mass and gravity (the only software property/force I've introduced so far). Both are about the artifacts.

I guess it's all bread-and-butter so far, so why is this distinction important? In Chapter 1, I wrote: a Center is (in software) a locus of highly cohesive information. In order to create highly cohesive units, we must be able to separate things. So this is what software [design] is about: keeping some things near, some things distant.

Now here is something interesting: in many cases, we want to keep the artifacts apart, but have the run-time instances connected. This need to balance asymmetrical forces in two worlds has been one of the dominant forces in the evolution of programming languages and paradigms, although nobody (as far as I know) has properly recognized its role.

Jump on my time machine, and get back to the 70s. Now consider a sequence of lines, in some programming language, encoding some procedural knowledge. We want to reuse that procedural knowledge, so we extract the lines into a "reusable" function. Reusability of procedural knowledge requires physical separation between a caller and a (reusable) callee (that's basically why subroutines/procedures/functions had to be invented). So, we can informally think of reusability as a force in the artifact world, keeping things apart.
Now consider the mirror world of run-time instances. Calling a function entails a cost: we have to save the instruction pointer, pass parameters, change the IP, execute the function's body, restore the IP and probably do some cleanup (details depend on language and CPU). Reusability on the artifact side just compromised efficiency on the run-time side. In most cases, we can ignore the side-effect in the run-time world. Sometimes, we can't, so we have two conflicting forces.
Enter the brave language designer: his job is to invent something to bring balance between these two forces, in two different worlds. His first invention is macro expansion through pre-processing (as in C or macro assemblers). C-style macros may solve the problem, but they bring in others. So the language designer gets a better idea: inline expansion of function calls. That works better. However it is done, inlining is a way to keep things apart in the artifact world, and strictly together in the run-time world.

Move forward in time, and consider concepts like function pointers, functions as first-class objects, or polymorphism: again, they're all about keeping artifacts apart, but instances connected. I don't want the artifact defining function/class f to know about function/class g, but I definitely want this instance of f to call g. Indeed (as we'll explore later on) the entire concept of indirection is born out of this basic need: keep artifacts apart and instances connected.
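
A tiny Python sketch of this, using a function passed as a parameter (the names `apply_to_all` and `double` are made up for the example). The artifact defining `apply_to_all` never mentions `double`; the connection exists only at run time, when a specific call binds the two together.

```python
from typing import Callable, Iterable, List


def apply_to_all(items: Iterable[float], g: Callable[[float], float]) -> List[float]:
    # This artifact only knows the call shape of g, not which function it is.
    return [g(x) for x in items]


def double(x: float) -> float:
    # Defined independently; it could live in a different file altogether.
    return 2 * x


# Only here, at the call site, does this run-time instance of f get connected to g.
doubled = apply_to_all([1.0, 2.0, 3.0], double)
```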

Consider AOP-like techniques: once again, they are about separating artifacts, while having the run-time behavior composed.
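
As a rough illustration (not real AOP, just the nearest thing Python offers out of the box), here is a tracing concern kept in its own artifact and composed with a function from the outside. The names `traced`, `call_log` and `area` are hypothetical.

```python
import functools

call_log = []  # where the tracing "aspect" records what happened


def traced(func):
    """Tracing concern, written once, unaware of any specific function."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append(f"enter {func.__name__}")
        try:
            return func(*args, **kwargs)
        finally:
            call_log.append(f"exit {func.__name__}")

    return wrapper


def area(width, height):
    """Business logic, unaware of tracing."""
    return width * height


# The "weaving" step: it could live in a third artifact (say, a configuration
# module), so that neither of the two definitions above knows about the other.
area = traced(area)
```

The artifacts stay apart (tracing here, computation there), while at run time every call to `area` goes through both.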

Once you understand this simple need, you can look at several programming concepts and clearly see their role in balancing forces between the two worlds. Where languages didn't help, patterns did. More on this next time.

At this point, however, we can draw another conclusion: that a better forcefield diagram would allow us to model forces (in the two worlds) and how we intend to balance those forces (which is how we intend to shape our software), and keep these two aspects apart. After all, we may have different strategies to balance forces out. More on this another time (soon enough, I hope).

Interestingly, software is not alone in this need to balance forces of attraction and rejection. I'll explore this in my next post, together with a few more examples from the software world. I also have to investigate the relationship between the act of balancing forces and the decision space I've mentioned while talking about inertia. This is something I haven't had time to explore so far, so it's gonna take a little more time.

Thursday, April 29, 2010

Notes on Software Design, Chapter 0: What?

As an author, I've learnt that the introduction is best written last, when the text has unfolded, and you really know what your paper is about.
I'm far from that stage with my Notes on Software Design, but I see things clearly enough now that I can write a reasonable introduction, hence the Chapter 0, coming after Chapter 5.

It all starts with the realization that software is just a material, and software design (at any level) is an act meant to give shape (or form) to the material. Software, however, is not a physical material, so we can't borrow from traditional disciplines to seek guidance. Well, sort of.

What do we know about physical materials?
I'm not trying to write an essay on Materials science, so I'll focus on a few central issues here.

Materials have well defined properties. Examples of properties are Electrical conductivity, Coefficient of thermal expansion, Hygroscopy, and so on.

Properties enrich our language and make reasoning more effective (Donald Norman would say that properties "make us smart"). Arguments based on well-defined properties are more robust and don't lead to endless debates. "This material has high hygroscopy, so keep it away from moisture". Period. No debate.

Properties allow us to focus: I need to sustain high compression. Therefore, I need high compressive strength. Etc.

Properties allow us to select the best materials. Elasticity is required if you want your material to come back to its original shape after stress. Actually, if you know the stress, you can pick the best material, based on a set of criteria, defined by the target product. An important criterion, of course, would be cost.

Properties provide guidance while shaping the material. In fact, we even have manufacturing properties, like castability (see the Wikipedia page above on material properties).

Now, properties are well-defined when they are based on replicable, experimental observations. Most often, they are based on both a quantitative measurement process and on a physical theory of matter.

The quantitative side is often based on forces. Compressibility is based on Pressure (and Volume). So we can link everything back to a few well-understood forces.

The theoretical side is usually based on a model of matter. Physical materials are subject to well-known laws, like Newtonian physics (at the right scale). The model of matter itself (e.g. the notion of the crystal structure of metals) is helpful when we try to understand why materials behave in a certain way under some specific force.

Moving up (as I said, I'm not trying to write a comprehensive essay on physical materials), we have construction principles and patterns. This is encoded knowledge, prescribing what we should or should not do, or describing how to do something. The bright side is that, in the modern world, we can trace back most principles and patterns to a set of well-defined forces and properties.

Finally, we have tools. Finite-Element Analysis, for instance, can be used to investigate the mechanical, thermal, electromagnetic, [etc], properties of a large structure.

What do we know about software as a material?

Not much, I'm afraid. Our knowledge is basically articulated in:

- Principles
- Patterns and Blueprints
- Methods
- Metrics
- Ilities

There is no shortage of principles. This interesting page, for instance, lists quite a few. Many are ill-defined and redundant, but a working knowledge of most principles is considered necessary for any good programmer / designer.

We have patterns - I know I don't have to say more about this. We also have reference architectures, which are not quite the same as patterns or pattern languages, but close enough.

We have methods. Test Driven Design, for instance, is a method, not a principle.

We have metrics. Metrics seem to be very close to properties. Halstead defined a concept of "volume" for software, based on information theory. The well-known Chidamber and Kemerer suite defined a number of property-like concepts like Depth of Inheritance Tree, Number of Children, and so on. Metrics have the nice property of being easy to measure (most of the time). They have the dubious property of being very remotely connected with design reasoning. In fact, most literature on metrics is about proving they're not meaningless, mostly by showing some correlation with bug density or change density or something more connected with software development.
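
To show just how easy to measure they are, here is a sketch of two of those metrics in Python, using the language's own introspection. The helper names and the toy hierarchy are mine, and counting the implicit `object` root as depth 0 is an assumption about how to map the original definitions onto Python.

```python
def depth_of_inheritance_tree(cls) -> int:
    """Length of the longest path from cls up to the root class (object)."""
    if cls is object:
        return 0
    return 1 + max(depth_of_inheritance_tree(base) for base in cls.__bases__)


def number_of_children(cls) -> int:
    """Number of immediate subclasses of cls."""
    return len(cls.__subclasses__())


# A toy hierarchy, only to have something to measure.
class Shape:
    pass


class Polygon(Shape):
    pass


class Circle(Shape):
    pass


class Triangle(Polygon):
    pass
```

A few lines of code give you the numbers; what they tell you about the design is, of course, another story.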

We have -ilities, like reliability, scalability, and so on. These are often ill-defined, hard to measure properties of the final product. More akin to defining a "safe car" than to defining the properties of an alloy.
Some authors have contributed more ility-like properties. Robert Martin, for instance, talks about Rigidity, Fragility, Viscosity and so on, but they are mostly based on metaphorical reasoning, not on a solid theory. They're also partially overlapping, and not precisely defined.

What do we ignore about software as a material?

We don't have a theory of forces. Lacking forces, it's basically impossible to come up with meaningful properties. Compressibility can't be defined when you don't even know about compression (pressure).

We don't have true properties. Properties should extend from language design to library design to application design. Properties should encompass paradigms. Properties should be based on a sensible theory of what software design is about, not on the mere fact that we can measure something or come up with a nice formula. Properties must be perceived as useful by software designers.

We don't have a way to model forces and properties, basically because we don't know jack about them.

Some perspective
I live in an old building (and I like it :-). When you look at the walls, however, you can almost hear the architect thinking "every problem can be solved by making some wall thicker". Modern buildings are designed with completely different techniques. The next-generation green buildings will make the most out of our knowledge of construction materials.

I also live in the software world. You probably know the adage "All problems in computer science can be solved by another level of indirection" (David Wheeler). Why is that? What is, exactly, a level of indirection? What is indirection, by the way? Why is it useful? What is the underlying theory, what is a reasonable measure? What is the real difference between indirection in data and in control flow? Pick your favorite book on software design, and look up the answer if you can find it :-).

So what?
Well, in the end, this is what I'm aiming for. A theory of software forces. A set of useful properties of software as a material. This is what all this stuff is about. I haven't worked out everything yet. Actually, I've changed my mind on several things I've written so far (hey, it's a blog, not a book :-). But it's slowly coming together.

Oh, by the way. I know quite a few people that won't feel good about the above. They want software to be an art. They want software to be "about humans".
Let me state this clearly: I'm not trying to pursue some sort of "deskilling", whereby any fool could put together great software by applying some sort of magic process. I don't believe in deskilling. Actually, I believe in upskilling. I also understand the idea of software development as a craft, and even as an art. I know the poetry of code, so to speak :-). I rely on intuition and tacit knowledge every single day.
Still, no amount of craftsmanship will prevent a cold, thin glass from breaking when force is applied. I want to know why, and how to shape it anyway. I want a better way to say that and to teach that. I want to reach a deeper understanding upon which better, more ambitious systems can be built.

Too much? Most likely :-), but hey, what is life without a noble aspiration?

Friday, March 12, 2010

Where is your Knowledge?

Software development is a process of knowledge acquisition and knowledge encoding (see Phillip Armour, copiously quoted in this blog). Where, and how, do we store that knowledge? In several places, in several ways:

- In source code: that's executable knowledge
- In models: that's formal knowledge
- In other kinds of documents: that's written knowledge
- In our brain, consciously: that's explicit knowledge
- In our brain, unconsciously: that's tacit knowledge

Knowledge stored in source code has the extremely useful property of being executable, but we can't store the entire development knowledge in executable statements. Design Rationale, for instance, is not present in code (and not even in most UML diagrams, for that matter), and is basically stored at the conscious/unconscious level. My forcefield diagram is much better at formally capturing rationale.

Explicit knowledge is often passed on as oral tradition, while tacit knowledge is often passed on as "a way of doing things", just by working together. Pair programming, reviews, joint design sessions (and so on) help distribute both explicit and tacit knowledge.

Knowledge has value, but that value is not constant over time. In 1962, Fritz Machlup came up with the concept of Half-life of knowledge: the amount of time that has to elapse before half of the knowledge in a particular area is superseded or shown to be untrue.

Moreover, the initial value of a particular piece of knowledge can be very high, like a new algorithm that took you years to get right, or very small, like a trivial validation rule.

Recently, I began to think about the half-life of our knowledge repositories as well. With my usual boldness, I'll go ahead and define the Half-Life of a Knowledge Repository: the amount of time that has to elapse before half of the knowledge in a repository is unrecoverable or just too costly to recover. I could even define "too costly" as "higher than the discounted value of that knowledge when lookup is attempted".
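
Here is a back-of-the-envelope sketch of those two definitions in Python. It rests on assumptions I haven't justified: that loss follows a smooth exponential decay (the "half-life" wording suggests it, but nothing guarantees it), and that the "discounted value" is simply the initial value scaled by the fraction of knowledge still valid. Function names are made up.

```python
def remaining_fraction(elapsed: float, half_life: float) -> float:
    """Fraction still valid (or recoverable) after `elapsed` time units.

    Assumes exponential decay: one half is lost every `half_life` units.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 0.5 ** (elapsed / half_life)


def too_costly_to_recover(recovery_cost: float, initial_value: float,
                          elapsed: float, knowledge_half_life: float) -> bool:
    """True when recovering costs more than the knowledge is worth at lookup time."""
    discounted_value = initial_value * remaining_fraction(elapsed, knowledge_half_life)
    return recovery_cost > discounted_value
```

For instance, with a knowledge half-life of 5 years, something worth 100 today is worth 25 after 10 years under this model, so a recovery effort costing 30 would already be "too costly".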

The concept of recoverable knowledge is slightly deeper than it may seem. Sure, it does cover the obvious problems of losing knowledge for lack of backup procedures, or because it's stored in a proprietary format no longer supported, and so on. But it also covers several interesting cases:

- the knowledge is in the brain of an employee, who leaves the company
- the knowledge is in source code, but it's in an obsolete language
- the knowledge is in source code, but it's extremely hard to read
- etc.

I'll leave it up to you to define the half-life of source code, models, documents, brain (conscious and unconscious). Of course, more details are needed: niche languages, for instance, tend to have a shorter half-life.

Now, here is the real boon :-). We can combine the concept of Knowledge Half-Life, Knowledge Value, and Knowledge Repository Half-Life to map the risk of storing a particular piece of knowledge in a particular repository (only). Here is my first-cut map:

Knowledge Half-Life	Knowledge (initial) Value	Repository Half-Life	Result
Long	High	Long	OK
Long	High	Short	Risk
Long	Low	Long	Little Waste
Long	Low	Short	Little Risk
Short	High	Long	Little Waste
Short	High	Short	Little Risk
Short	Low	Long	Waste
Short	Low	Short	OK
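
If you want to play with the map, it can be encoded as a plain lookup table. This is only a transcription of the rows above into Python; the value column is encoded here as high/low, and the function name `storage_risk` is made up.

```python
# Keys: (knowledge half-life, initial knowledge value, repository half-life)
RISK_MAP = {
    ("long", "high", "long"): "OK",
    ("long", "high", "short"): "Risk",
    ("long", "low", "long"): "Little Waste",
    ("long", "low", "short"): "Little Risk",
    ("short", "high", "long"): "Little Waste",
    ("short", "high", "short"): "Little Risk",
    ("short", "low", "long"): "Waste",
    ("short", "low", "short"): "OK",
}


def storage_risk(knowledge_half_life: str, initial_value: str,
                 repository_half_life: str) -> str:
    """Look up the result of storing a piece of knowledge in one repository only."""
    key = (knowledge_half_life.lower(), initial_value.lower(),
           repository_half_life.lower())
    return RISK_MAP[key]
```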

It's interesting to review one of the values in the Agile Manifesto (Working software over comprehensive documentation) under this perspective.

Let's say we have a piece of knowledge, and that knowledge can be indeed stored in code (as I said, you can't store everything in code).

If the half-life of knowledge is short, storing it in code only is probably the best economical choice. If the half-life of knowledge is long, we have to worry a little more. If we add relevant unit tests to that piece of code, we increase the repository half-life, as they make it easier to recover knowledge from code. If we use a mainstream language, we can also increase the repository half-life.

This may still not be enough. If you had to recover the entire knowledge stored in a non-trivial piece of code (say, an mp4 codec) having only the source code, and no (comprehensive) documentation on what that piece of code is doing, why, and how, it would take you far too long. The half-life of code is shorter than the half-life of code + documents.
Actually, depending on context, given the choice to have just the code and nothing else, or just comprehensive documentation and nothing else, we'd better be careful about what we choose (when knowledge half-life is long, of course).

Of course, the opposite is also true: if you store knowledge with short half-life outside code, you seriously risk wasting your time.

I've often been critical of teaching and applying principles and techniques without the necessary context. I hope that somehow, the table above and the underlying concepts can move our understanding of when to use what a little further.