Thursday, August 19, 2010

Notes on Software Design, Chapter 9. A simple property: Distance (part 2)

What is Distance in the run-time world? As I began pondering this, it turned out that distance could be defined in several meaningful ways. Some were redundant with other notions I have yet to present. Some were useless for a theory of software design. In the end, I picked a very simple definition, with some interesting ramifications.

Just like the notion of distance in the artifact world is based on the hierarchy of artifacts, the notion of distance in the run-time world is based on a hierarchy of locations: the places where executable knowledge resides at any given time. So, given two pieces of executable knowledge P1 and P2, we can define an ordinal scale:

P1 and P2 are inside the CPU registers or the CPU execution pipeline - that's minimum distance.
P1 and P2 are inside the same L1 cache line
P1 and P2 are inside the L1 cache
… etc

for the full scale, see the (updated) Summary at the Physics of Software website.

Dude, I don't care about cache lines!
Yeah, well, but the processor does, and I'm looking for a good model of the real world, not for an abstract model of computing disconnected from reality (which would be more like the math of software, not the physics).
You may be writing your code in Java or C#, or even in C++, and be blissfully unaware of what is going on under the hood. But the caches are there, and their effect is absolutely visible. For a simple, experimental, and well-written summary, see Igor Ostrovsky's Gallery of Processor Cache Effects (the source code is in C#, but results wouldn't be different in Java or C++).
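Just to make the effect tangible, here's a minimal C# sketch in the spirit of the first examples in that gallery (the array size and the step are arbitrary choices of mine):

using System;
using System.Diagnostics;

class CacheEffects
{
    static void Main()
    {
        int[] data = new int[64 * 1024 * 1024];

        // Touch every element...
        var sw = Stopwatch.StartNew();
        for( int i = 0; i < data.Length; i++ ) data[i] *= 3;
        Console.WriteLine( "step 1:  " + sw.ElapsedMilliseconds + " ms" );

        // ...then touch every 16th element: 1/16th of the work, but the
        // same number of cache lines, hence nowhere near a 16x speedup.
        sw.Restart();
        for( int i = 0; i < data.Length; i += 16 ) data[i] *= 3;
        Console.WriteLine( "step 16: " + sw.ElapsedMilliseconds + " ms" );
    }
}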
Interestingly enough, most algorithms and data structures are not optimized for modern processors with N levels of cache. Still, there is an active area of research on cache-oblivious algorithms which, despite the name, are supposed to perform well with any cache line size across any number of cache levels (you can find a few links to specialized algorithms here but you'll have to work around broken links).

What about virtual memory? Again, we can ignore the magic most of the time, but when we're looking for high-performance solutions, we have to deal with it. Unconvinced? Take a look at You're Doing It Wrong, where Poul-Henning Kamp explains (perhaps with a bit too much "I know it all and you don't" attitude :-)) why textbooks are not really talking about real-world computers [anymore].

Consequences
What happens when run-time distance grows? We're bound to see a staircase-like behavior, as in the second picture in the gallery above, just with more risers/treads. When you move outside your process, you have context switching. When you move outside your computer, you have network latency. When you move outside your LAN, you also have name lookup and packet hops. We'll understand all this stuff much better as we get to the concept of friction.

There is more. When talking about artifact distance, I said that coupling (between artifacts) should decrease as distance increases. In the run-time world, coupling to the underlying platform should decrease as distance increases. This must be partially reflected in the artifacts themselves, but it is also a language / platform / transformation concern.
Knowledge at short distance can be tightly coupled to a specific hw / sw platform. For instance, all code inside one component can be tightly bound to:
  • An internal object model, say the C++ object model of a specific compiler version, or the C# or Java object model for a specific version of the virtual machine.

  • A specific operating system, if not virtualized (native code).

  • A specific hardware, if not virtualized.

This is fine at some level. I can even accept the idea that all the components inside a single application have to share some underlying assumptions. Sure, it would be better if components were relatively immune to binary compatibility issues (a plague in the C++ world). But overall (depending on the size of the application) I can "control" things and make sure everything is aligned.
But when I'm talking to another service / application over the network, my degree of control is much smaller. If everything is platform-dependent (with a broad definition of platform, mind you: Java is a platform), we're in for major deployment / maintenance issues. Even worse, it would represent a huge platform lock-in (I couldn't use a different technology for a new service). Things just get worse on a global network scale. This is why XML took the world by storm, why web services have been rather successful in the real world, and so on. This is also why I like technologies / languages that take integration with other technologies / languages seriously, and not religiously.
As usual, the statement above is bi-directional. That is, it makes very little sense to pursue strong decoupling from the underlying platforms at micro-level. Having a class talking to itself in XML is not a brilliant strategy. Again, design is about balance: in this case, balance between efficiency and convenience on one side, and flexibility and evolvability on the other. Balance is obtained when you can depend on your platform locally, and be increasingly independent as you move farther.

Run-time Distance is not a constant
Not necessarily, anyway; it depends on small-scale technical choices. In C++, for instance, once you get two objects in the same cache line, they will stay there for their entire life, because identity in C++ is address-based. In a garbage collected environment, this is not true anymore: objects can move freely during collection.
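A minimal C# sketch of the consequence (pinning via GCHandle is the standard way to keep an object from moving; everything else here is just scaffolding):

using System;
using System.Runtime.InteropServices;

class MovingObjects
{
    static void Main()
    {
        byte[] buffer = new byte[64];

        // Unpinned objects may be relocated by the garbage collector;
        // pinning is the only way to obtain a stable address in .NET.
        GCHandle handle = GCHandle.Alloc( buffer, GCHandleType.Pinned );
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            Console.WriteLine( "Pinned address: " + address );
        }
        finally
        {
            handle.Free();
        }
    }
}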
Moreover, once we move from the cache line to the entire cache, things come and go, becoming near and distant over time. This contributes to complex performance patterns, and indeed, modern hardware makes accurate performance prediction almost impossible. I'm pretty sure there are some interesting phenomena to be studied here - a concept of oscillating distance, perhaps the equivalent of a performance beat when two concurrent threads have slightly different oscillation frequencies, and so on - but I'm not currently investigating any of this - it's just too early.
At some point, distance becomes more and more "constant". Sure, a local service may migrate to LAN and then to WAN, but usually it does so because of human intervention (decisions!), and may require changes on the artifact side as well. Short-range distance is a fluid notion, changing as executable knowledge is, well, executed :-).

By the way: distance in the artifact world is not a constant, either. It is constant when the code is frozen. As soon as we change it, we change some distance relationships. In other words, when the computer is processing executable knowledge, run-time distance changes. When we process encoded knowledge (artifacts), artifact distance changes.

Distance-preserving transformations
Knowledge encoded in artifacts can be transformed several times before becoming executable knowledge. Most of those transformations are distance-preserving, that is, they map nearby knowledge to nearby knowledge (although with some jumps here and there).

For instance, pure sequences of statements (without choices, iterations, calls) are "naturally" converted into sequential machine-level instructions that not only will likely sit in the same cache line, but won't break the prefetch pipeline either. Therefore, code that is near in the artifact world will end up near in the run-time world as well.
In the C/C++ family, data that is sequentially declared in a structure (POD) is sequential in memory as well. Therefore, there is a good chance of sharing the same cache line if you keep your structures small.
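The same layout can be requested explicitly in C# too; a tiny sketch, with a made-up Point3 struct:

using System.Runtime.InteropServices;

// Fields declared sequentially are laid out sequentially in memory
// (modulo padding); a small struct easily fits in one cache line.
[StructLayout( LayoutKind.Sequential )]
struct Point3
{
    public float X;   // offset 0
    public float Y;   // offset 4
    public float Z;   // offset 8
}                     // 12 bytes total, well within a 64-byte line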
Conversely, data and code in different components are normally mapped to different pages (in virtual memory systems). They won't share the same cache line (they may compete for the same cache line, but won't be present in the same line at once). So distant things will be distant.

Even "recent" advances in processor architecture are increasing the similarity between the artifact and run-time class of distance. Consider predicated execution: it's commonly used to remove branches at machine-level for short sequence of statements in if/else conditionals (see Fig. 1 in the linked paper). In terms of distance, it allows nearby code in the artifact space to stay close in the run-time space, by eliminating branches and therefore maximizing proximity in the execution pipeline.

Some transformations, however, are not distance-preserving. Inlining of code (think of C++ inline functions, for instance) will shrink distance while moving from the artifact world to the run-time world.
Aspect Oriented Programming is particularly interesting from the point of view of distance. On the artifact side, aspects make it possible to isolate cross-cutting concerns, thereby increasing the distance between the advice, which is factored out, and the join points. A non-distance-preserving transformation (weaving) brings the two back together as we move toward execution.

Curiously enough, some non-preserving transformations work in the opposite way: they allow things to be near in the artifact space, yet distant in the run-time world. Consider the numerous technologies (dating back to remote procedure calls) that allow you to code as if you were invoking a local function, or a method of a local object, while in fact you are executing a remote function, or a method of a remote object (through some kind of local proxy). This creates an illusion of short distance (in the artifact world) while in fact maintaining high distance in the run-time world. As usual, whenever we create software over a thin layer of illusion, there is a potential for problems. When you look at The 8 fallacies of distributed computing, you can immediately recognize that dealing with remote objects as if they were local objects can be rather dangerous. Said otherwise, the illusion of short distance is a leaky abstraction. More on distributed computing when I get to the concept of friction.
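Here is a minimal sketch of the illusion, with a hypothetical IGreeter interface and a hypothetical remote endpoint:

using System;
using System.Net;

interface IGreeter
{
    string Greet( string name );
}

// The caller sees an ordinary local interface, but every call actually
// crosses the network. Error handling, timeouts and retries are omitted
// here - which is exactly what the "local" illusion invites you to forget.
class RemoteGreeterProxy : IGreeter
{
    public string Greet( string name )
    {
        using( WebClient client = new WebClient() )
        {
            return client.DownloadString(
                "http://example.com/greet?name=" + Uri.EscapeDataString( name ) );
        }
    }
}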

A closing remark: in my previous post, I said that gravity (in the artifact world) tends to increase performance (a run-time property). We can now understand that better, and say that it is largely (not entirely) because:
- the most common transformations are distance-preserving.
- performance increases as the run-time distance decreases.
Again, friction is also at play here, but I have yet to introduce the concept.

Addenda on the artifact side
Some concepts on the artifact side are meant to give the illusion of a shorter distance, while maintaining separation. Consider extension methods in .NET or the more powerful concept of category in Objective-C. They both give the illusion of being very close to a class (when you use them) while in fact they are just as distant as any other class. (By the way: I've been playing with extension methods [in C#] as a way to get something like partial specialization in C++; it kinda works, but not inside generics, which is exactly where I would need it).
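A tiny sketch, with a made-up IsBlank method: at the call site it reads as if it belonged to string, yet it lives in an ordinary static class, as distant as any other:

static class StringExtensions
{
    // Reads like an instance method of string at the call site...
    public static bool IsBlank( this string s )
    {
        return s == null || s.Trim().Length == 0;
    }
}

// usage: bool b = "  ".IsBlank();
// ...but it is resolved statically, from a class as distant as any other.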

Distance in the Decision Space
While thinking about distance in the artifact and in the run-time world, I realized that the very first notion of distance I introduced was in the decision space. Still, I haven't defined that notion at the level of detail (or lack thereof :-)) at which I've defined distance in the artifact and run-time world. I have a few ideas, of course, the simplest definition being "the number of decisions that must be undone + the number of [irreversible?] decisions that must be taken to move your artifacts". Those decisions would involve some mass of code. Moving that mass through that "space" would give a notion of the necessary work. Anyway, it's probably too early to say more, as I have to understand the decision space better.

Coming soon, I hope, the notion of friction.

Thursday, August 05, 2010

Notes on Software Design, Chapter 8. A simple property: Distance (part 1)

You're writing a function in your preferred language. At some point, you look at your code and you see that a portion just does not belong there. You move it outside, creating another function. Asked why, you may answer that by doing so you made it reusable; that you want to respect the Single Responsibility Principle; that you want to increase cohesion; that code just looks "cleaner" and "easier to read" that way; and so on. I guess it sounds rather familiar.

By moving the code outside the function, you also increased its distance from the code that is left inside. This is probably not so familiar: distance is not a textbook property of software. However, the notion of distance is extremely important. Fortunately, distance is a very simple concept. It has an immediate, intuitive meaning, and although it's still rather informal, we can already use it to articulate some non-trivial reasoning.

I'm trying to keep this post reasonably short, so I'll cover only the artifact side of the story here. I'll talk about the run-time world next time.

A Concept of Distance
Consider this simple function (it's C#, but that's obviously irrelevant)

int sum( int[] a )
{
    int s = 0;

    foreach( int v in a )
        s += v;

    return s;
}

I used two blank lines to split the function in three smaller portions: initialization - computation - return. I didn't use comments - the code is quite readable as it is, and you know the adage: if you see a comment, make a function. It would be unnatural (and with dubious benefits) to split that function into sub-functions. Still, I wanted to highlight the three tiny yet distinct procedural portions (centers) within that function, so I used empty lines. I guess most of you have done the same at one point or another, perhaps on a larger scale.

Said otherwise, I wanted to show that some statements were conceptually closer than others. They don't have to be procedural statements. I have seen people "grouping" variable declarations in the same way, to show that some variables sort of "lump together" without creating a structure or a class. I did that by increasing their physical distance in the artifact space.
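For instance (a made-up Turtle class; the blank line is the only grouping device):

// The blank line hints that x and y lump together, and so do penDown
// and thickness - no new structure or class is created.
class Turtle
{
    private int x;
    private int y;

    private bool penDown;
    private int thickness;
}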

A Measure of Distance
Given two pieces of information P1, P2, encoded in some artifacts A1, A2, we can define their distance D( P1, P2 ) using an ordinal scale, that is, a totally ordered set:

P1 and P2 appear in the same statement - that's minimum distance
P1 and P2 appear in the same group of consecutive statements
P1 and P2 appear in the same function
… etc

for the full scale, including the data counterpart, see my Summary at the Physics of Software website.

Note that Distance is a relative property. You cannot take a particular piece of information and identify its distance (as you could do, for instance, with mass). You need two.
Also, the ordinal scale is rather limiting: you can do no math with it. It would be nice to turn it into a meaningful interval or ratio scale, but I'm not there yet.

Is Distance useful?
As in most theories, individual concepts may seem rather moot, but once you have enough concepts you can solve interesting problems or gain a better understanding of complex phenomena. Right now, I haven't introduced the concept of tangling yet, so Distance may seem rather moot by itself. Still, we can temporarily use a vague notion of coupling to explore the value of distance. It will get better in the [near] future, trust me :-).

Consider a few consecutive statements inside a function. It's ok if they share intimate knowledge. The three segments in sum are rather strongly coupled, to the point that it's ineffective to split them into subfunctions, but that doesn't bother me much. It's fine to be tightly coupled at small distance. As we'll see, it's more than fine: it's expected.

Functions within a class are still close together, but farther apart. Again, it's ok if they share some knowledge. Ideally, that knowledge is embodied in the class invariant, but private functions are commonly tied to their callers in a rather strong way. They often assume they are called in specific states (which could be captured in elaborate preconditions), and the caller is responsible for guaranteeing those preconditions. Sequences of calls are also expected to happen in a specific order, so that preconditions are met. Again, that doesn't bother me much. That's why the class exists in the first place: to provide a place where I can group together "closely related" functions and data.
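A small made-up sketch of that kind of intimacy (Parser and its private functions are hypothetical, of course):

class Parser
{
    private string text;
    private int pos;

    public int ReadNumber( string input )
    {
        text = input;
        pos = 0;
        SkipWhitespace();     // must run before ParseDigits
        return ParseDigits();
    }

    // Implicit precondition: pos points at the first digit.
    // Only callers inside the class can guarantee that.
    private int ParseDigits()
    {
        int n = 0;
        while( pos < text.Length && char.IsDigit( text[pos] ) )
            n = n * 10 + ( text[pos++] - '0' );
        return n;
    }

    private void SkipWhitespace()
    {
        while( pos < text.Length && char.IsWhiteSpace( text[pos] ) )
            pos++;
    }
}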

Distinct classes are even more distant. Ideally, they won't share much. In practice, classes inside the same component often end up having some acquaintance with each other. For instance, widgets inside a widget library may work well together, but may not work at all with widgets inside a different library. Still, they're distant enough to be used individually.

We expect components / services to be loosely coupled. They can share some high-level contract, but that should be all.

Applications shouldn't be coupled at all – any coupling should appear at a lower level (components).

The logical consequence here is that coupling must decrease as distance increases. There is more to this statement than is immediately obvious. The real meaning is:
a) large distance requires low coupling
b) small distance requires high coupling

When I explain the concept, most people immediately think of (a) and ignore (b). Yet (b) is very important, because it says:
1) if coupling with the surroundings is not strong enough, you should move that portion elsewhere.
2) the code should go where the coupling is stronger (that is, if code is attracted elsewhere, consider moving it elsewhere :-)). That's basically why feature envy is considered a bad smell – the code is in the wrong place, as in the sketch below.
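A classic made-up example (Order and ReportBuilder are hypothetical):

class Order
{
    public decimal Price;
    public int Quantity;
}

// Feature envy: Total is far more interested in Order's data than in
// anything of ReportBuilder's - the coupling pulls it toward Order.
class ReportBuilder
{
    public decimal Total( Order o )
    {
        return o.Price * o.Quantity;
    }
}

// Order after the move: the code now sits near the data it is attracted to.
class Order2
{
    public decimal Price;
    public int Quantity;

    public decimal Total()
    {
        return Price * Quantity;
    }
}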

Cohesion as an emergent property
Cohesion has always been a more elusive concept than coupling. Looking at the literature, you'll find dozens of different definitions and metrics for cohesion (early works like Myers' Composite/Structured Design used to call it "strength"). I've struggled with the concept for a while, because it didn't fit too well with other parts of my theory, but then I realized that cohesion is not a property per se.

Cohesion is a byproduct of attraction and distance: an artifact is cohesive if its constituents are at the right distance, considering the forces of attraction and rejection acting upon that artifact. If the resulting attraction is too strong or too weak, parts of that artifact want to move either down or up in the distance hierarchy, or into another site at the same level.

Attraction is too weak: the forces keeping that code together are not strong enough to warrant the short distance at which we placed the code. For instance, a long function with well-identified segments sharing little data. We can take that sequence of statements and move it up in the hierarchy - forming a new function.

Attraction is too strong: for instance, we put code in different classes, but those classes are intimately connected. The easiest thing is to demote one class to a set of functions (down in the hierarchy) and merge those functions with the other class. But perhaps the entire shape is wrong, at odds with the forcefield. Perhaps new abstractions (centers) must be found, and functions, or even statements, moved into new places.

This is closing the circle, so to speak. Good software is in a state of equilibrium: attraction and rejection are balanced with proper distance between elements.

Note: I'm talking about attraction and rejection, but I have yet to present most attractive / repulsive forces. Still, somehow I hope most of you can grasp the concepts anyway.

An Alexandrian look on the notion of distance
I've quoted Christopher Alexander several times in an earlier discussion on the concept of form. Now, you may know that Alexander's most recent theory is explained in 4 tomes (which I haven't deeply read yet) collectively known as "The Nature of Order". A few people have tried to relate some of his concepts to the software world, but so far the results have been rather unimpressive (I'm probably biased in my judgment :-)).

On my side, I see a very strong connection between the concept of equilibrium (as an interplay between distance and the artifact hierarchy) and the Alexandrian concept of levels of scale: "A balanced range of sizes is pleasing and beautiful".
Which is not to say that you should have long functions, average functions, and small functions :-). I would translate that notion into the software world as: a balanced use of the artifact hierarchy is pleasing and beautiful. That is:
Don't write long functions: use multiple functions in a class instead.
Don't write large classes: use multiple classes in a component instead.
Don't create huge components: use multiple components inside an [application/service/program] instead.

This is routinely ignored (which, I think, contributes to the scale-free nature of most source code), but it's also the very reason why those concepts were introduced in the first place! Actually, we are probably still missing a few levels in the hierarchy, as required for instance to describe systems of systems.

Gravity, efficiency, and the run-time distance
Remember gravity? Gravity (in the artifact world) provides a path of least resistance for the programmer: just add stuff where there is other vaguely related stuff. Gravity works to minimize distance, but in a piecemeal way, and it's easy to get trapped in a local minimum. The minimum is local when we add code that is not tightly connected with the surroundings, so that other forces at play (not yet discussed) will reject it.

When you point out incoherent, long functions, quite a few programmers bring in "efficiency" as an excuse (the other most common excuse being that it's easier to follow your code when you can just read it sequentially, which is another way to say "I don't understand abstraction" :-)).
Now, efficiency is a run-time concept, and I haven't explained the corresponding concept in my theory yet. Still, using again the informal notion of efficiency we all have, we can already see that efficiency [in the run-time world] tends to decrease as distance [in the artifact world] increases. For instance, moving lines into another function requires passing parameters around. This is a first-cut, rough explanation of the well-known trade-off between run-time efficiency and artifact quality (maintainability, readability, reusability).
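A trivial, made-up illustration of the trade-off (whether the compiler or the JIT inlines Square back is, of course, its own business):

// Everything inline: no call overhead, but a longer, denser function.
int SumOfSquaresInline( int[] a )
{
    int s = 0;
    foreach( int v in a )
        s += v * v;
    return s;
}

// The squaring logic moved out: more readable and reusable, but each
// call now pays for parameter passing and a jump, unless it gets
// inlined back at compile time or run time.
int Square( int v ) { return v * v; }

int SumOfSquares( int[] a )
{
    int s = 0;
    foreach( int v in a )
        s += Square( v );
    return s;
}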

Coming soon:
the concept of distance in the run-time world, and distance-preserving transformations
efficiency in the physics of software
tangling
not sure yet :-), but probably isolation and density