Sunday, June 06, 2010

Notes on Software Design, Chapter 6: Balancing two Worlds

Back in my time, computer science students were introduced to the distinction between syntax and semantics from day one. That distinction is central to understanding programming languages from the algebraic perspective.

When you are looking for a theory of software forces, however, you'll find another distinction more useful. We have artifacts, and we have run-time instances of artifacts.

Software is encoded knowledge, and as software developers we shape that knowledge. However, we can't act on knowledge itself: we must go through a representation. Source files are a representation, UML diagrams are a representation, and so on.
Source files represent information using a language-specific syntax; for instance, if you use Java or C#, you encode some procedural and structural knowledge about a class inside a text file. The text file is an artifact; it's not a run-time thing. Along the same lines, you can use UML and draw a class diagram, encoding structural knowledge about several classes. That's another artifact. Often, when we say "class C" or "function foo", what we actually mean is its representation inside an artifact (a sequence of LOCs in a source file, where we encode some knowledge in a specific language).
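To make this concrete, here is a trivial sketch (I'll use C++ for every snippet in this chapter; the class and its name are mine, purely for illustration):

    // Counter.h -- an artifact: structural and procedural knowledge
    // about a class, encoded as plain text in a source file
    class Counter
    {
    public:
      Counter() : value(0) {}
      void increment() { ++value; }
      int current() const { return value; }
    private:
      int value;
    };

Everything above is pure artifact: nothing is running, nothing has been instantiated.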
Artifacts can be transformed into other artifacts: MDA aside, you can just compile a bunch of source files and get an executable file. The executable file is just another artifact.

At that point, however, you can actually run the artifact. Run the executable and you get a new process: the process is a run-time instance of the executable file artifact. Of course, you may start several run-time instances of the same artifact.

Once the process is started, a wonderful thing happens :-). Your structural and procedural knowledge, formerly represented through LOCs inside artifacts, is now executable as well. Functions can be called. Classes can be instantiated. A function call is a run-time instance of a function. An object is a run-time instance of a class.
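Going back to the little Counter sketch above:

    #include <iostream>
    // assuming the Counter class sketched earlier
    int main()
    {
      Counter a;        // an object: a run-time instance of Counter
      Counter b;        // another instance of the same artifact
      a.increment();    // a function call: a run-time instance
      a.increment();    // of increment()...
      b.increment();    // ...and so on
      std::cout << a.current() << " " << b.current() << "\n"; // prints: 2 1
      return 0;
    }

One artifact, two objects; one function, three calls.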

We never shape the run-time instances directly. We shape artifacts (like a class described through textual Java or C++) so as to influence the run-time shape of the instances. Sometimes we shape an artifact so that its run-time instances can shape other run-time instances (meta-object protocols, dynamically generated code, etc.).

Interestingly, some forces apply at the artifact level, while other forces apply at the instance level. If we don't get this straight, we end up with the usual confusion and useless debates.

A few examples are probably needed. As I haven't introduced enough software forces so far, I'll base most of my examples on -ilities, because the same reasoning applies.

Consider reusability: when we say that we want to reuse "a class" what we usually mean is that we want to reuse an artifact (a component, a source file). Reusing a run-time instance is a form of object caching, which might be useful, but is not what is meant by reusability.
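Just to make the difference tangible: reusing the Counter artifact above means including it in another program and creating fresh instances there. Reusing a run-time instance looks more like this (a minimal caching sketch):

    // object caching: handing out the same run-time instance
    // over and over, instead of creating a new one each time
    Counter& sharedCounter()
    {
      static Counter instance;  // created once, reused thereafter
      return instance;
    }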

Consider efficiency: when we say "this function is more efficient" what we usually mean is that its run-time instances are faster than other comparable implementations.

Consider mass and gravity (the only software property and force I've introduced so far). Both apply at the artifact level.

I guess it's all bread-and-butter so far, so why is this distinction important? In Chapter 1, I wrote: "A Center is (in software) a locus of highly cohesive information. In order to create highly cohesive units, we must be able to separate things. So this is what software [design] is about: keeping some things near, some things distant."

Now here is something interesting: in many cases, we want to keep the artifacts apart, but have the run-time instances connected. This need to balance asymmetrical forces in two worlds has been one of the dominant drivers in the evolution of programming languages and paradigms, although nobody (as far as I know) has properly recognized its role.

Jump into my time machine, and go back to the 70s. Now consider a sequence of lines, in some programming language, encoding some procedural knowledge. We want to reuse that procedural knowledge, so we extract the lines into a "reusable" function. Reusability of procedural knowledge requires physical separation between a caller and a (reusable) callee (that's basically why subroutines/procedures/functions had to be invented). So, we can informally think of reusability as a force in the artifact world, keeping things apart.
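In code, the extraction looks like this (a deliberately trivial sketch; the names are mine):

    // before: the same procedural knowledge, duplicated at each use
    double net1 = gross1 - gross1 * taxRate;
    double net2 = gross2 - gross2 * taxRate;

    // after: the knowledge extracted into a reusable callee,
    // physically separated from its callers in the artifact world
    double netOf(double gross, double taxRate)
    {
      return gross - gross * taxRate;
    }

    // the callers now reach the shared knowledge through a call
    double net1 = netOf(gross1, taxRate);
    double net2 = netOf(gross2, taxRate);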
Now consider the mirror-image world of run-time instances. Calling a function entails a cost: we have to save the instruction pointer, pass parameters, change the IP, execute the function's body, restore the IP, and probably do some cleanup (details depend on language and CPU). Reusability on the artifact side just compromised efficiency on the run-time side. In most cases, we can ignore the side effect in the run-time world. Sometimes, we can't, so we have two conflicting forces.
Enter the brave language designer: his job is to invent something to bring balance between these two forces, in two different worlds. His first invention is macro expansion through pre-processing (as in C or macro assemblers). C-style macros may solve the problem, but they bring in problems of their own. So the language designer gets a better idea: inline expansion of function calls. That works better. However it's done, inlining is a way to keep things apart in the artifact world, and strictly together in the run-time world.
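Here are the two inventions side by side (C/C++; the SQUARE/square names are mine):

    // the preprocessor answer: textual expansion. No call overhead,
    // but no scoping, no type checking, and arguments may be
    // evaluated twice: SQUARE(i++) expands to ((i++) * (i++))
    #define SQUARE(x) ((x) * (x))

    // the language answer: a real function, expanded at the call
    // site. Apart in the artifact world, together at run-time.
    inline int square(int x)
    {
      return x * x;
    }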

Move forward in time, and consider concepts like function pointers, functions as first-class objects, or polymorphism: again, they're all about keeping artifacts apart, but instances connected. I don't want the artifact defining function/class f to know about function/class g, but I definitely want this instance of f to call g. Indeed (as we'll explore later on) the entire concept of indirection is born out of this basic need: keep artifacts apart and instances connected.
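A minimal function-pointer sketch (again, names are mine) makes the point: f calls g, yet f's artifact never mentions g.

    // f: its artifact knows nothing about g...
    int apply(int (*op)(int), int x)
    {
      return op(x);  // ...yet this instance calls whatever it was given
    }

    // g: defined elsewhere, in a separate artifact
    int twice(int x) { return 2 * x; }

    // at some composition point: artifacts apart, instances connected
    int result = apply(twice, 21);  // 42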

Consider AOP-like techniques: once again, they are about separating artifacts, while having the run-time behavior composed.
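Real AOP weaves aspects in at build or load time; still, even plain C++ can approximate the same separation with a decorator-style wrapper (my own sketch, not actual weaving): the business artifact and the "aspect" artifact stay apart, and their behaviors are composed only at run-time.

    #include <iostream>

    // the business artifact: knows nothing about logging
    class Service
    {
    public:
      virtual void run() { /* useful work here */ }
      virtual ~Service() {}
    };

    // the "aspect" artifact: knows Service only through its interface
    class LoggedService : public Service
    {
    public:
      explicit LoggedService(Service& s) : inner(s) {}
      void run()
      {
        std::cout << "before\n";  // advice before the join point
        inner.run();
        std::cout << "after\n";   // advice after
      }
    private:
      Service& inner;
    };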

Once you understand this simple need, you can look at several programming concepts and clearly see their role in balancing forces between the two worlds. Where languages didn't help, patterns did. More on this next time.

At this point, however, we can draw another conclusion: a better forcefield diagram would allow us to model the forces (in the two worlds) and, separately, how we intend to balance them (which is how we intend to shape our software). After all, we may have different strategies to balance forces out. More on this another time (soon enough, I hope).

Interestingly, software is not alone in this need to balance forces of attraction and rejection. I'll explore this in my next post, together with a few more examples from the software world. I also have to investigate the relationship between the act of balancing forces and the decision space I've mentioned while talking about inertia. This is something I haven't had time to explore so far, so it's gonna take a little more time.
