
Thursday, December 27, 2012

Ask not what an object is, but...


I can barely remember the days when objects were seen as a new, shiny, promising technology. Today, objects are often positioned somewhere between mainstream and retro, while the functional paradigm is enjoying an interesting renaissance. Still, in the last few months I stumbled upon a couple of blog posts asking the quintessential question, reminiscent of those dark old days: “what is an object?”

The most recent (September 2012) is mostly a pointer to a stripped-down definition provided by Brian Marick: “It’s a clump of name->value mappings, some functions that take such clumps as their first arguments, and a dispatch function that decides which function the programmer meant to call”. Well, honestly, this is more about a specific implementation of objects, and a rather poor fit for, say, the C++ implementation. It makes sense when you’re describing a way to implement objects (which is what Marick did) but it’s not a particularly far-reaching definition.

The slightly older one (July 2012) is much more ambitious and comprehensive. Cook aims to provide a “modern” definition of objects, unrestricted by specific languages and implementations. It’s an interesting post indeed, and I suggest that you take some time reading it, but in the end, it’s still very much about the mechanics of objects ("An object is a first-class, dynamically dispatched behavior").

Although it may seem ok from a language design perspective, defining objects through their mechanics leaves a vacuum in our collective knowledge: how do we design a proper object-oriented system?

Wednesday, May 16, 2012

Notes on Software Design, Chapter 16: Learning to see the Forcefield

When we interact with the physical world, we develop an intuitive understanding of some physical forces. It does not take a PhD in physics to guess what is going to happen (at a macroscopic level) when you apply a load at the end of a cantilever:


You can also devise a few changes (like adding a cord or a rod) to distribute forces in a different way, possibly ending up with a different structure (like a truss):


Software is not so straightforward. As I argued before, we completely lack a theory of forces (and materials). Intuitive understanding is limited by the lack of correlation between form and function (see Gabriel). Sure, many programmers can easily perceive some “technological” forces. They perceive the UI, business logic, and persistence to be somehow “kept apart” by different concerns, hence the popularity of layered architectures. Beyond that, however, there is a gaping void which is only partially filled by tradition, transmitted through principles and patterns.

Still, I believe the modern designer should develop the ability to see the force field, that is, understand the real forces pulling things together or far apart, moving responsibilities around, clustering them around new concepts (centers). Part of my work on the Physics of Software is to make those forces more visible and well-defined. Here is an example, inspired by a recurring problem. This post starts easy, but may end up with something unexpected.

Friday, September 02, 2011

Notes on Software Design, Chapter 15: Run-Time Entanglement

So, here I am, back to my unpopular :-) series on the Physics of Software. It's time to explore the run-time side of entanglement, or more exactly, begin to explore, as I'm trying to write shorter posts, hoping to increase frequency as well.
A lot of time has gone by, so here is a recap from Chapter 12: Two clusters of information are entangled when performing a change on one immediately requires a change on the other (to maintain overall consistency). If you're new to this, reading chapters 12, 13, and 14 (dealing with the artifact side of entanglement) won't hurt.

Moving to the run-time side, some forms of entanglement are obvious (like redundant data). Some are not. Some are rather narrow in scope. Some are far-reaching. I'll cover quite a few cases in this post, although I'm sure it's not an exhaustive list (yet).

  • Redundancy
This is the most obvious form of run-time entanglement: duplicated information.
Redundancy occurs even without your intervention: swap file vs. RAM contents, 3rd/2nd/1st level caches vs. RAM and vs. caches in other cores/processors, etc.; at this level, entanglement satisfaction is usually called coherence.
In other cases, redundancy is an explicit design decision: database replicas, mem-cached contents, etc. At a smaller scale, we also have redundancy whenever we store the same value in two distinct variables. Redundancy means our system is always on the verge of violating the underlying information entanglement. That's why caches are notoriously hard to "get right" in heavily concurrent environments without massive locking (which is a form of friction). This is strictly related to the CAP Theorem, which I discussed a few months ago and will come back to in the future.
Just like we often try (through hard-won experience) to minimize redundancy in artifacts (the DRY principle), we have learned to minimize redundancy in data as well (through normalization; see below), and also to keep redundancy under control through design patterns like the Observer. I'll get back to patterns in the context of entanglement in a future post.
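To make that last point a little more concrete, here is a minimal sketch in C++ (hypothetical names, not taken from any of the systems above) of redundancy kept under control through the Observer pattern: the cached value is a redundant copy of the source value, and the notification is what restores entanglement satisfaction after every change.

#include <functional>
#include <vector>

// Redundancy under control through Observer (a sketch, hypothetical names):
// Cache::cached is a redundant copy of Source::value; the notification is what
// brings the entangled copies back in sync after every change.
class Source
{
public:
    void Subscribe( std::function<void(double)> observer )
    {
        observers.push_back( std::move( observer ) );
    }
    void Set( double v )
    {
        value = v;
        for( auto& notify : observers )
            notify( value );   // restore entanglement satisfaction
    }
private:
    double value = 0;
    std::vector<std::function<void(double)>> observers;
};

class Cache
{
public:
    explicit Cache( Source& source )
    {
        source.Subscribe( [this]( double v ) { cached = v; } );
    }
    double Get() const { return cached; }
private:
    double cached = 0;   // the redundant, entangled copy
};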

  • Referential Integrity
This is perhaps less obvious, but easy to understand. Referential integrity, as known within the relational database culture, requires (quoting Wikipedia) that any field in a table that is declared a foreign key can contain only values from a parent table's primary key or a candidate key. That means you cannot remove some records unless you update others, or that you cannot create some records unless some other record is in place. That's a form of entanglement, and now we can start to grasp another concept: the idea of a transaction is born out of the need to have intermediate micro-states that do not respect the entanglement constraint, while making those micro-states invisible. As far as the external observer is concerned, time is quantized, and transactions are the ticks.
Interestingly, database normalization is all (just? :-) about imposing a specific form on data entanglement. Consider a database that is not in second normal form; I'll use the same example as the Wikipedia page, for simplicity. Several records contain the same information (work location). If we change the work location for Jones, we have to update three records. This is entanglement through redundancy, of course. Normalization does not remove entanglement. What it does is impose a particular form on it: through foreign keys. That makes it easier to deal with entanglement within the relational paradigm, but it is not without consequences for modularity (which would require an entire post to explore). In a sense, this "standardization" of entanglement form could be seen as a form of dampening. Views, too, are a practical dampening tool (in capable hands). More on this in a future post.

  • Class Invariant
This, I must confess, was not at all obvious to recognize at first, which is weird in hindsight. A class invariant is a constraint on the internal state (data) of any object of that class. That means, of course, that those data cannot change independently. Therefore, they are entangled by definition. More precisely, a class invariant can often be stated as a set of predicates, and each predicate reveals some form of entanglement between two or more data members.
Curiously enough (or perhaps not :-), the invariant must hold at the beginning/end of externally observable methods, though it can be broken during the actual execution of those methods. That is to say, public methods play the role of transactions, making time quantized once again.
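As a minimal sketch (hypothetical class, C++), consider a container that keeps a redundant element count for fast queries: the invariant ties the two data members together, and it is only guaranteed to hold at the boundaries of public methods, not in the middle of them.

#include <cassert>
#include <cstddef>
#include <vector>

class Inventory
{
public:
    void Add( int item )
    {
        assert( Invariant() );   // holds on entry
        items.push_back( item ); // invariant temporarily broken here...
        ++count;                 // ...and restored before returning
        assert( Invariant() );   // holds on exit
    }
    std::size_t Count() const { return count; }
private:
    bool Invariant() const { return count == items.size(); }
    std::vector<int> items;
    std::size_t count = 0;       // redundant by design, entangled with items.size()
};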

  • Object Lifetime
Although this can be seen as a special case of class invariant (just like a cascade delete is ascribed to referential integrity) I'd like to make this explicit, for reasons we'll understand better in a future post. Some objects cannot outlive others. Some objects cannot be created without others. Reusing the terminology of chapter 13, this is a D-D or C-C form of entanglement. The same form of entanglement can be found underneath several DB normalization concepts, and thanks to the RT/Artifact dualism, will turn out to be a useful concept in the artifact world as well.
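A minimal sketch (hypothetical classes) of lifetime entanglement expressed through composition: an OrderLine cannot be created without an Order (C-C) and cannot outlive the Order that owns it (D-D).

#include <string>
#include <vector>

class Order
{
public:
    void AddLine( const std::string& product, int quantity )
    {
        lines.push_back( { product, quantity } );   // created through the Order...
    }
private:
    struct OrderLine
    {
        std::string product;
        int quantity;
    };
    std::vector<OrderLine> lines;   // ...and destroyed together with the Order
};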

Toward a concept of tangling through procedural knowledge
All of the above, I guess, is pretty obvious in hindsight. It borrows heavily from the existing literature, which has always considered data as something that can be subject to constraints. The risk here is to fall into the "information as data" tar pit. Information is not data. Information includes procedural knowledge (see also my old posts Lost ... and Found? for more on the concept of Information and Alexandrian centers). Is there a concept of entanglement for procedural knowledge?

Consider this simple spreadsheet:

      A      B
1    10     20
2   200    240

That may look like plain data, but behind the curtain we have two formulas (procedural knowledge):

A2=A1*B1
B2=A2*1.2

In the spreadsheet above, the system is stable, balanced, or as I will call it, in a steady state. Change a value in A1 or B1, however, and procedural knowledge will kick in. After a short interval (during which the system is unbalanced, or as I'll call it, in a transient state), balance will be restored, and all data entangled through procedural knowledge will be updated.
A spreadsheet, of course, is the quintessential Dataflow system, so it's also the natural link between knowledge-as-data and procedural knowledge. It should be rather obvious, however, that a similar reasoning can be applied to any kind of event-driven or reactive system. That kind of system sits idle (steady state) until something happens (new information entering the system and moving it to a transient state). That unleashes a chain of reactions, until balance is restored.
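Here is a tiny, hand-rolled sketch of the spreadsheet above (C++, hypothetical names, no real dataflow engine): changing an input cell moves the system into a transient state, and the recalculation is the procedural knowledge that restores the steady state.

#include <iostream>

struct Sheet
{
    double a1 = 10, b1 = 20;   // input cells
    double a2 = 0, b2 = 0;     // derived cells, entangled with a1 and b1
    void Recalc()              // A2 = A1*B1, B2 = A2*1.2
    {
        a2 = a1 * b1;
        b2 = a2 * 1.2;
    }
};

int main()
{
    Sheet s;
    s.Recalc();                // steady state: 10, 20, 200, 240
    s.a1 = 15;                 // transient state: a2 and b2 are now stale
    s.Recalc();                // steady state restored: 15, 20, 300, 360
    std::cout << s.a2 << " " << s.b2 << "\n";
}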

Baby steps: now, what was the old-school view of objects (before the watering down took place)? Objects were like tiny little machines, communicating through messages. Right. So invoking a method on an object is not really different than sending the object a message. We're still upsetting its steady state, and the procedural knowledge inside the method is restoring balance (preserving the invariant, etc.)

So, what about the traditional, top-down, imperative batch system? It's actually quite straightforward now: invoking a function (even the infamous main function) means pushing some information inside the system. The system reacts by moving information around, creating new information, etc, until the output is ready (the new steady state). The glue between all that information is the procedural knowledge stored inside the system. Procedural knowledge is basically encoding the entanglement path for dynamic information.

Say that again?
This is beginning to look too much like philosophy, so let's move to code once again. In any C-like language, given this code:
int a2 = a1 * b1;
int b2 = a2 * 1.2;
int a3 = a1 + b1;
we're expecting the so-called control-flow to move forward, initializing a2, then b2, then a3. In practice, the compiler could reorder the statements and calculate a3 first, or in between, and even the processor could execute the instructions "out of order" (see the concept of "observable behavior" in the C and C++ standards). What we are actually saying is that a2 can be calculated given a1 and b1, and how; that b2 can be calculated given a2, and how; and that a3 can be calculated given a1 and b1, and how. The "can be calculated given ..." part is an abstract interpretation of the code. That abstract interpretation provides the entanglement graph for run-time entities. The "how" part, as it happens, is more germane to the function than to the form, and in this context is somehow irrelevant.

What about the control-flow primitives, like iteration, choice, call, etc? Iteration and choice are nothing special: at run-time, a specific branch will be taken, for a specific number of times, resulting in a specific run-time entanglement between run-time entities. Call (procedure call, function call) is somewhat different, because it reveals the fractal nature of software: whatever happens inside the function is hidden (except via side effects) from the caller. The function will carry out its job, that is, will restore the fine-grained input-output balance upon invocation.

This is the general idea of procedural knowledge: given an input, control-flow goes on, until the program reaches a steady state, where output has been produced. As noted above, due to the fractal nature of information, this process takes place at different levels. A component instance may still be in a transient state while some object contained in that instance is already back to a steady state.

In this sense, a procedure (as an artifact) is how we teach a machine to calculate an output given an input, through the stepwise creation and destruction of entangled run-time information. Phew :-).

Next steps
The beauty of the Run-Time/Artifact dualism is that whatever we learn on one side, we can often apply to the other side (with the added benefit that many things are easier to "get" on one side or the other).
Here, for instance, I've introduced the concept of steady and transient state for run-time information. The same reasoning can be applied to the artifact side. Whenever you create, update or delete an artifact A1, if there is any other artifact that is entangled with A1, the system goes into an unbalanced, transient state. You have to apply work (energy) to bring it back to the steady state, where all the entangled information is properly updated (balanced). A nice quote from Leonardo da Vinci could apply here: "Motion is created by the destruction of balance" (unfortunately, I can't find an authoritative source for the quote :-). Motion in the information space is created by unsettling the system (moving to a transient state) until entanglement is finally satisfied again, and a steady state is reached. This is as true in the artifact space as in the run-time space.
Overall, this has been more like a trampoline chapter, where a few notions are introduced for the benefits of forthcoming chapters. Here is a short list of topics that I'll try to explore at some point in the future:

- The truth about multiplicity, frequency, and attraction/rejection in the run-time world.
- As above, in the artifact world.
- Moving entanglement between run-time and artifacts (we do this a lot while we design).
- The Entanglement Diagram.
- A few examples from well-known Design Patterns.
- Cross-cutting concerns and Delocalized plans: relationships with entanglement.
- More on distance, entanglement, probability of failure, and the CAP theorem.
- Dampening (isolation). This will open a new macro-chapter in the Physics of Software series.

If you read this far, you should follow me on twitter, or perhaps even share this post (I dare you :-)

Saturday, May 21, 2011

The CAP Theorem, the Memristor, and the Physics of Software

Where I present seemingly unrelated facts that are, in fact, not unrelated at all :-).

The CAP Theorem
If you keep current on technology, you're probably familiar with the proliferation of NoSQL, a large family of non-relational data stores. Most NoSQL stores have been designed for internet-scale applications, where large data stores are preferably deployed using an array of loosely connected low-cost servers. That brings a set of well-known issues to the table:

- we'd like data to be consistent (that is, to respect all the underlying constraints at any time, especially replication constraints). We call this property Consistency.

- we'd like fast response time, even when some server is down or under heavy load. We call this property Availability.

- we'd like to keep working even if the system gets partitioned (a connection between servers fails). We call this property Partition tolerance.

The CAP Theorem (formerly Brewer's Conjecture) says that we can only choose two of the three properties.

There is a lot of literature on the CAP Theorem, so I don't need to go into further detail here: if you want a very readable paper covering the basics, you can go for Brewer's CAP Theorem by Julian Browne, while if you're more interested in a formal paper, you can refer to Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services by Seth Gilbert and Nancy Lynch.

There is, however, a relatively subtle point that is often brought up and seldom clarified, so I'll give it a shot. Most people realize that there is some kind of "connection" between the concept of availability and the concept of partition. If a server is partitioned away, it is by definition not available. In that sense, it may seem like there is some overlap between the two concepts, and overlapping concepts are bad (orthogonal concepts should be preferred). However, it is not so:

- Availability is an external property, something the clients of the data store can see. They either can get data when they need it, or they can't.

- Partitioning is an internal property, something clients are totally unaware of. The data store has to deal with that on the inside.

Of course, given the fractal nature of software, in a system of systems we may see lack of availability at one level, because of the lack of partition tolerance at a lower level (or vice-versa, as an unavailable node may not be distinguishable from a partitioned node).

To sum up, while the RDBMS world has usually approached the problem by choosing Consistency and Availability over Partition tolerance, the NoSQL world has often chosen Availability and Partition tolerance, leading to the now famous Eventually Consistent model.

The Memristor, or the Value of a Theory
As a kid, I spent quite a bit of time hacking electronic stuff. I used to scavenge old TV sets and the like, and recover resistors, capacitors, and inductors to use in very simple projects. Little did I know that, theoretically, there was a missing passive element: the memristor.

Leon Chua developed the theory in 1971. It took until 2008 before someone could finally create a working memristor, using nanoscale techniques that would have looked like science fiction in 1971. Still, the theory was there, long before, pointing toward the future. At the bottom of the theory was the realization that the three known passive elements could be defined by a relationship between four concepts (current, voltage, charge, and flux-linkage). By investigating all possible relationships, he found a missing element, one that was theoretically possible, but yet undiscovered. He used the term completeness to indicate this line of reasoning, although we often call this kind of property symmetry.

Software Entanglement
In Chapter 12 of my ongoing series on the nature of software, I introduced the concept of Entanglement, further discussed in Chapter 13 and Chapter 14 (for the artifact world).

Here, I'll reuse my early definition from Chapter 12: Two clusters of information are entangled when performing a change on one immediately requires a change on the other.

Entanglement is constantly at work in the artifact world: change a class name, and you'll have to fix the name everywhere; add a member to an enumeration, and you'll have to update any switch/case statement over the enumeration; etc. It's also constantly at work in the run-time world.

Although I haven't formally introduced entanglement for the run-time world (that will be the subject of the forthcoming Chapter 15), you can easily see that maintaining a database replica creates a run-time entanglement between data: update one, and the other must be immediately (atomically, as we usually say) updated. Perhaps slightly more subtle is the fact that referential integrity is strongly related to run-time entanglement as well. Consistency in the CAP theorem, therefore, is basically Entanglement satisfaction.

Once we understand that, it's perhaps easier to see that the CAP theorem applies well beyond our traditional definition of distributed system. Consider a multi-core CPU with independent L1 caches. Cache contents become entangled whenever they map to the same address. CPU designers usually go with CA, forgetting P. That makes a lot of sense, of course, because we're talking about in-chip connections.

That's sort of obvious, though. Things get more interesting when we start to consider the run-time/artifact symmetry. That's part of the value of a [good] theory.

A CAP Theorem for Artifacts
My perspective on the Physics of Software is strongly based on the run-time/artifact dualism. Most forces apply in both worlds, so it is just natural to bring ideas from one world to another, once we know how to translate concepts. Just like symmetry allowed Chua to conceive the memristor, symmetry may allow us to extend concepts from the run-time world to the artifact world (and vice-versa). Let's give the CAP theorem a shot, by moving each run-time concept to the corresponding notion in the artifact world:

Consistency: just like in the run-time world, it's about satisfying Entanglement. While a change in the run-time world is a C/U/D action on data, here it is a C/U/D action on artifacts. If you add a member to an enumerated type, your artifacts are consistent when you have updated every portion of code that was enumerating over the extension of that type, etc.

Availability: just like in the run-time world, it is the ability to access a working, up-to-date system. If you can't deploy your new artifacts, your system is not available (as far as the artifact world is concerned). The old system may be up and running, but your new system is not available to the user.

Partition tolerance: just like in the run-time world, it means you want your system to be usable even if some changes can't be propagated. That is, some artifacts have been partitioned away, and cannot be reached. Therefore, some sort of partial deployment will take place.

Well, indeed, you can only have two :-). Consider the artifacts involved in the usual distributed system:

- one or more servers
- one or more clients
- a contract in between

The clients and the server (as artifacts – think source code not processes) are U/D entangled over the contract: change a method signature (in RPC speak), or the WSDL, or whatever represents your contract, and you have to change both to maintain consistency.
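Here is a single-file sketch of that entanglement (C++, hypothetical names); in a real system the three parts live in separate artifacts, but the U/D entanglement over the contract is the same.

#include <string>

// --- the contract: the artifact both sides are entangled with ---
double Convert( double amount, const std::string& currency );

// --- the "server" side: implements the contract ---
double Convert( double amount, const std::string& currency )
{
    double rate = ( currency == "EUR" ) ? 0.9 : 1.0;   // stand-in for a real rate lookup
    return amount * rate;
}

// --- the "client" side: uses the contract ---
double ClientCode()
{
    return Convert( 100.0, "EUR" );
}

// Change the signature of Convert (say, add a date parameter for historical
// rates) and both the server and the client must change: they are entangled
// over the contract.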

What happens if you update [the source code of] a server (say, to fix a serious bug), and by doing so you need to update the contract, but cannot update [the source code of] a client [timely]? This is just like a partition. Think of a distributed development team: the team working on the client, for one reason or another, has been partitioned away.

You only have two choices:

- let go of Availability: you won't deploy your new server until you can update the client. This is the artifact side of not having your database available until all the replicas are connected (in the run-time world).

- let go of Consistency: you'll have to live with an inconsistent client, using a different contract.

Consequences
Chances are that you:
- Shiver at the idea of letting go of Consistency.
- Think versioning may solve the problem.

Of course, versioning is a coping strategy, not a solution. In fact, versioning is usually proposed in the run-time world of data stores as well. However, versioning your contract is like pretending that the server has never been updated. It may or may not work (what if your previous contract had major security flaws that cannot be plugged unless you change the contract? What if you changed your server in a way that makes the old contract impossible to keep? etc). It also puts a significant burden on the development team (maintaining more source code than strictly needed) and may significantly slow down development, which is another way to say it's impacting availability (of artifacts).

It is important, however, that we thoroughly understand context. After all, any given function call is a contract, and we surely want to maintain consistency as much as we can. So, when does the CAP Theorem apply to Artifacts? Partition tolerance and Availability must be part of the equation. That requires one of the following conditions to apply:

- Independent development teams. The Facebook ecosystem, for instance, is definitely prone to the artifact side of the CAP Theorem, just like any other system offering a public API to the world at large. You just can't control the clients.

- A large system that cannot be updated in every single part (unless you accept to slow down deployment/forgo availability). A proliferation of clients (web based, rich, mobile, etc) may also bring you into partitioning problems.

If you work on a relatively small system, and you have control of all clients, you're basically on a safe field – you can go with Consistency and Availability, because you just don't have significant partitions. Your development team is more like a multi-core CPU than a large distributed system.

Once we understand that versioning is basically a way to pretend changes never happened on the server side, is there something similar to Eventual Consistency for the artifact side?

TolerantReader may help. If you put extra effort into your clients, you can actually have a working system that will eventually be consistent (when the source code for clients is finally updated) but still be functional during transition times, without delaying the release of server updates. Of course, everything that requires discipline on the client side is rather useless in a Facebook-like ecosystem, but it might be an excellent strategy for a large system where you control the clients.
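As a minimal sketch of what TolerantReader asks of a client (C++, hypothetical names and a deliberately simplified payload format), the reader below extracts only the fields it actually needs, falls back to a default when an optional field is missing, and silently ignores anything it doesn't know about, so the server can evolve the contract without immediately breaking the client.

#include <map>
#include <string>

struct OrderSummary
{
    std::string id;
    double total = 0.0;   // default used when the field is missing
};

OrderSummary ReadOrder( const std::map<std::string, std::string>& payload )
{
    OrderSummary summary;
    if( auto it = payload.find( "id" ); it != payload.end() )
        summary.id = it->second;
    if( auto it = payload.find( "total" ); it != payload.end() )
        summary.total = std::stod( it->second );
    // any other field in the payload is silently ignored
    return summary;
}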

Interestingly, Fowler talks about the virtues of TolerantReader as if it were always warranted for a distributed system. I tend to disagree: it makes sense only when some form of development-side partitioning can be expected. In many other cases, an enforced contract through XSD and code generation is exactly what we need to guarantee Consistency and Availability (because code generation helps to quickly align the syntactical bits – you still have to work on the semantic parts on both sides). Actually, the extra time required to create a TolerantReader will impact Availability in the short term (the server is ready, the client is not). Context, context, context. This is one more reason why I think we need a Physics of Software: without a clear understanding of forces and materials, it's too easy to slip into the fallacy that just because something worked very well for me [in some cases], it should work very well for you as well.

In fact, once you recognize the real forces, it's easy to understand that the problem is not limited to service-oriented software, XML contracts, and the like. You have the same problem between a database schema and a number of different clients. Whenever you have artifact entanglement and the potential for development partitioning (see above) you have to deal with the artifact-side CAP Theorem.

Moving beyond TolerantReader (which is not helping much when your client is sending the wrong data anyway :-), when development partitioning is expected, you need to think about a flexible contract from the very beginning. Facebook, for instance, allows clients to specify which fields they want to receive, but is still pretty rigid in the fields they have to send.

This is one more interesting R&D challenge that is not easily solved with today's technology. Curiously enough, in the physical world we have long understood the need to use soft materials at the interface level to compensate for imperfections and unexpected variations of contact surfaces. Look at any recent, thermally insulated window, and most likely you're gonna find a seal between the sash and the frame. Why are seals still needed in a world of high-precision manufacturing? ;-)

Conclusions
Time for a confession: when I first thought about all the above, I had already read quite a bit on the CAP Theorem, but not the original presentation from Eric Brewer where the conjecture (it wasn't a theorem back then) was first introduced. The presentation is about a larger topic: challenges in the development of large-scale distributed systems.

As I started to write this post, I decided to do my homework and go through Brewer's slides. He identified three issues: state distribution, consistency vs. availability, and boundaries. The first two are about the run-time world, and ultimately lead to the CAP theorem. While talking about boundaries, however, Brewer begins with run-time issues (independent failure) and then moves into the artifact world, talking (guess what!) about contracts, boundary evolution, XML, etc. Interestingly, nothing in the presentation suggests any relationship between the problems. Here, I think, is the value of a good theory: as with the memristor, it provides us with leverage, moving what we know to a higher level.

In fact, another pillar of the Physics of Software is the Decision Space. I'm pretty sure the CAP theorem can be applied in the decision space as well, but that will have to wait.

Well, if you liked this, stay tuned for Chapter 15 of my NOSD series, share this post, follow me on twitter, etc.

Sunday, February 27, 2011

Notes on Software Design, Chapter 14: the Enumeration Law

In my previous post on this series, I used the well-known Shape problem to explain polymorphism from the entanglement perspective. We've seen that moving from a C-like approach (with a ShapeType enum and a union of structures) to an OO approach (with a Shape interface implemented by concrete shape classes) creates new centers, altering the entanglement between previous centers.
This time, I'll explore the landscape of entanglement a little more, using simple, well-known problems. My aim is to provide an intuitive grasp of the entanglement concept before moving to a more formal definition. In the end, I'll come up with a simple Software Law (one of many). This chapter borrows extensively from the previous one, so if you didn't read chapter 13, it's probably better to do so before reading further.

Shapes, again
Forget about the Shape interface for a while. We now have two concrete classes: Triangle and Circle. They represent geometric shapes, this time without graphical responsibilities. A triangle is defined by 3 points, a Circle by center and radius. Given a Triangle and a Circle, we want to know the area of their intersection.
As usual, we have two problems to solve: the actual mathematical problem and the software design problem. If you're a mathematician, you may rightly think that solving the first (the function!) is important, and the latter is sort of secondary, if not irrelevant. As a software designer, however, I'll allow myself the luxury of ignoring the gory mathematical details, and tinker with form instead.

If you're working in a language (like C++) where functions exist [also] outside classes, a very reasonable form would be this:
double Intersection( const Triangle& t, const Circle& c )
{
    // something here
}
Note that I'm not trying to solve a more general problem. First, I said "forget the Shape interface"; second, I may not know how to solve the general problem of shapes intersection. I just want to deal with a circle and a triangle, so this signature is simple and effective.

If you're working in a class-only language, you have several choices.

1) Make that an instance method, possibly changing an existing class.

2) Make that a static method, possibly changing an existing class.

3) If your language has open classes, or extension methods, etc, you can add that method to an existing class without changing it (as an artifact).

So, let's sort this out first. I hope you can feel the ugliness of placing that function inside Triangle or Circle, either as an instance or static method (that feeling could be made more formal by appealing to coupling, to the open/closed principle, or to entanglement itself).
So, let's say that we introduce a new class, and go for a static method. At that point, it's kinda hard to find nice names for the class and method (if you're thinking about ShapeHelper or TriangleHelper, you've been living far too long in ClassNameWasteLand, sorry :-). Having asked this question a few times in real life (while teaching basic OO thinking), I'll show you a relatively decent proposal:
class Intersection
{
    public static double Area( Triangle t, Circle c )
    {
        // something here
    }
}
It's quite readable at the call site:
double a = Intersection.Area( t, c );
Also, the class name is ok: a proper noun, not something ending with -er / -or that would make Peter Coad (and me :-) shiver. An arguably better alternative would be to pass parameters in a constructor and have a parameterless Area() function, but let's keep it like this for now.

Whatever you do, you end up with a method/function body that is deeply coupled with both Triangle and Circle. You need to get all their data to calculate the area of the intersection. Still, this doesn't feel wrong, does it? That's the method purpose. It's ok to manipulate center, radius, and vertexes. Weird, coupling is supposed to be bad, but it doesn't feel bad here.

Interlude
Let me change problem setting for a short while, before I come back to our beloved shapes. You're now working on yet another payroll system. You got (guess what :-) a Person class:
class Person
{
    Place address;
    String firstName;
    String lastName;
    TelephoneNumber homePhone;
    // …
}
(yeah, I know, some would have written "Address address" :-)

You also have a Validate() member function (I'll simplify the problem to returning a bool, not an error message or set of error messages).
bool Validate()
{
    return
        address.IsValid() &&
        ValidateName( firstName ) &&
        ValidateName( lastName ) &&
        homePhone.IsValid();
}
Yikes! That code sucks! It sucks because it's asymmetric (due to the two String-typed members) but also because it's fragile. If I get in there and add "Email personalEmail;" as a field, I'll have to update Validate as well (and more member functions too, and possibly a few callers). Hmmm. That's because I'm using my data members. Well, I'm using Triangle and Circle data members as well in the function above, which is supposed to be worse; here I'm using my own data members. Why was that right, while Validate "feels" wrong? Don't use hand-waving explanations :-).

Back to Shapes
A very reasonable request would be to calculate the intersection of two triangles or two circles as well, so why don't we add those functions too:
class Intersection
{
    public static double Area( Triangle t, Circle c )
    { /* something here */ }

    public static double Area( Triangle t1, Triangle t2 )
    { /* something here */ }

    public static double Area( Circle c1, Circle c2 )
    { /* something here */ }
}
We need 3 distinct algorithms anyway, so anything more abstract than that may look suspicious. It makes sense to keep the three functions in the same class: the first function is already coupled with Triangle and Circle. Adding the other two doesn't make it any worse from the coupling perspective. Yet, this is wrong. It's a step toward the abyss. Add a Rectangle class, and you'll have to add three more member functions to Intersection.

Interestingly, we came to this point without introducing a single switch/case, actually without even a single "if" in our code. Even more interestingly, our former Intersection.Area was structurally similar to Person.Validate, yet relatively good. Our latter Intersection is not structurally similar to Person, yet it's facing a similar problem of instability.

Let's stop and think :-)
A few possible reactions to the above:

- I'm too demanding. You can do that and nothing terrible will happen.
Probably true: a small-scale software mess rarely kills. The same kind of problem, however, is often at the core of a large-scale mess.

- I'm throwing you a curve.
True, sort of. The intersection problem is a well-known double-dispatch issue that is not easily solved in most languages, but that's only part of the problem.

- It's just a matter of Open/Closed Principle.
Yeah, well, again, sort of. The O/C is more concerned with extension. In practice, you won't create a new Person class to add email. You'll change the existing one.

- I know a pattern / solution for that!
Me too :-). Sure, some acyclic variant of Visitor (or perhaps more sophisticated solutions) may help with shape intersection. A reflection+attribute-based approach to validation may help with Person. As usual, we should look at the moon, not at the finger. It's not the problem per se; it's not about finding an ad-hoc solution. It's more about understanding the common cause of a large set of problems.

- Ok, I want to know what is really going on here :-)
Great! Keep reading :-)

C/D-U-U Entanglement
Everything that felt wrong here, including the problem from Chapter 13, that is:

- the ShapeType-based Draw function with a switch/case inside

- Person.Validate

- the latter Intersection class, dealing with more than one case

was indeed suffering from the same problem, namely an entanglement chain: C/D-U-U. In plain English: Creation or Deletion of something, leading to an Update of something else, leading to an Update of something else. I think that understanding the full chain, and not simply the U-U part, makes it easier to recognize the problem. Interestingly, the former Intersection class, with just one responsibility, was immune from C/D-U-U entanglement.

The chain is easier to follow on the Draw function in Chapter 13, so I'll start from there. ShapeType makes it easy to understand (and talk about) the problem, because it's both Nominative and Extensional. Every concept is named: individual shape types and the set of those shape types (ShapeType itself). The set is also provided through its full extension, that is, by enumerating its values (that's actually why it's called an enumerated type). Also, every concrete shape is given a name (a struct where concrete shape data are stored).

So, for the old C-like solutions, entanglement works (unsurprisingly) like this:

- There is a C/D-C/D entanglement between members of ShapeType and a corresponding struct. Creation or deletion of a new ShapeType member requires a corresponding action on a struct.

- There is a C/D-U entanglement between members of ShapeType and ShapeType (this is obvious: any enumeration is C/D-U entangled with its constituents).

- There is U-U entanglement between ShapeType and Draw, or every other function that is enumerating over ShapeType (a switch/case being just a particular case of enumerating over names).

Consider now the first Intersection.Area. That function is enumerating over two sets: the set of Circle data, and the set of Triangle data. There is no switch/case involved: we're just using those data, one after another. That amounts to enumeration of names. The function is C/D-U-U entangled with any circle and triangle data. We don't consider that to be a problem because we assume (quite reasonably) that those sets won't change. If I change my mind and represent a Circle using 3 points (I could), I'll have to change Area. It's just that I do not consider that change at all likely.

Consider now Person.Validate. Validate is enumerating over the set of Person data, using their names. It's not using a switch/case. It doesn't have to. Just using a name after another amounts to enumeration. Also, enumeration is not broken by a direct function call: moving portions of Validate into sub-functions wouldn't make any difference, which is why layering is ineffective here.
Here, Validate is U-U entangled with the (unnamed) set of Person data, or C/D-U-U entangled with any of its present or future data members. Unlike Circle or Triangle, we have every reason to suspect the Person data to be unstable, that is, for C and/or D to occur. Therefore, Validate is unstable.

Finally, consider the latter Intersection class. While the individual Area functions are ok, the Intersection class is not. Once we bring in more than one intersection algorithm, we entangle the class with the set of shape types. Now, while in the trivial C-like design we had a visible, named entity representing the set of shapes (ShapeType), in the intersection problem I didn't provide such an artifact (on purpose). The fact that we're not naming it, however, does not make it any less real. It's more like we're dealing with an Intensional definition of the set itself (this would be a long story; I wrote an entire post about it, then decided to scrap it). Intersection is C/D-U-U entangled with individual shape types, and that makes it unstable.

A Software Law
One of the slides in my Physics of Software presentation reads "software has a nature" (yeah, I need to update those slides with tons of new material). In natural sciences, we have the concept of a Physical law, which despite the name doesn't have to be about physics :-). In physics, we do have quite a few interesting laws, not necessarily expressed through complex formulas. The Laws of Thermodynamics, for instance, can be stated in plain English.

When it comes to software, you can find several "laws of software" around. However, most of them are about process (like Conway's law, Brooks' law, etc.), and just a few (like Amdahl's law) are really about software. Some are just principles (like the Law of Demeter) and not true laws. Here, however, we have an opportunity to formulate a meaningful law (one of many yet to be discovered and/or documented):

- The Enumeration Law
every artifact (a function, a class, whatever) that is enumerating over the members of a set by name is U-U entangled with that set, or, said otherwise, C/D-U-U entangled with any member of the set.

This is not a design principle. It is not something you must aim for. It's a law. It's always true. If you don't like the consequences, you have to change the shape of your software so that the law no longer applies, or that entanglement is harmless because the set is stable.

This law is not earth-shattering. Well, the laws of thermodynamics don't look earth-shattering either, but that doesn't make them any less important :-). Honestly, I don't know about you, but I wish someone taught me software design this way. This is basically what is keeping me going :-).

Consequences
In Chapter 12, I warned against the natural temptation to classify entanglement in a good/bad scale, as it has been done for coupling and even for connascence. For instance, "connascence of name" is considered the best, or least dangerous, form. Yet we have just seen that entanglement with a set of names is at the root of many problems (or not, depending on the stability of that name set).

Forces are not good or evil. They're just there. We must be able to recognize them, understand how they're influencing our material, and deal with them accordingly. For instance, there are well-known approaches to mitigate dependency on names: indirection (through function pointers, polymorphism, etc) may help in some cases; double indirection (like in double-dispatch) is required in other cases; reflection would help in others (like Validate). All those techniques share a common effect, that we'll discuss in a future post.

Conclusions
Speaking of future: recently, I've been exploring different ways to represent entanglement, as I wasn't satisfied with my forcefield diagram. I'll probably introduce an "entanglement diagram" in my next post.

A final note: visit counters, sharing counters and general feedback tell me that this stuff is not really popular. My previous rant on design literature had way more visitors than any of my posts on the Physics of Software, and was far easier to write :-). If you're not one of the usual suspects (the handful of good guys who are already sharing this stuff around), consider spreading the word a little. You don't have to be a hero and tell a thousand people. Telling the guy in front of you is good enough :-).

Sunday, January 02, 2011

Notes on Software Design, Chapter 13: On Change

We have a long history of manipulating physical materials. Cavemen built primitive tools out of rocks and sticks. I would speculate :-) that they didn't know much about physics, yet through observation, serendipity and good old trial and error they managed to survive pretty well, or we wouldn't be here today talking about software.

For centuries, we only had empirical observations and haphazard theories about the real nature of the physical world. The Greek Classical Elements were Earth, Water, Air, Fire, and Aether, which is not exactly the way we look at materials today. Yet the Greeks managed to build large and incredibly beautiful temples.

Of course, today we know quite a bit more. Take any kind of physical material, like a copper wire. In the real world, you can:

- apply thermal stress to the material, e.g. by warming it. Different materials react differently.

- apply mechanical stress to the material, e.g. by pulling it on both ends. Different materials react differently.

- apply electromagnetic forces, e.g. by moving the wire inside a magnetic field. Different materials react differently.

- etc.

To get just a little more into specifics, if we restrict ourselves to mechanical stress, and focus on the so-called static loading, we end up with just a few kinds of stress: tension, compression, bending, torsion, and shear (see here and also here).

Although people have built large, beautiful structures without any formal knowledge of material science, it is hard to deny the huge progress made possible by a deeper understanding of what is actually going on with materials. Knowledge of forces and reactions allows us to choose the best material, the best shape, and even to create new, better materials.

The software world
Although we can build large, even beautiful systems, it is hard to deny that we are still in the dark ages. We have vague principles that usually can't be formally defined. We have different materials, like statically typed and dynamically typed languages, yet we don't have a sound theory of forces, so we end up with the usual evangelists trying to convert the world to their material of choice. It's kinda funny, because it's just as sensible as having a Brick Evangelist trying to convince you to use bricks instead of glass for your windows ("it's more robust!"), and a Glass Evangelist trying to convince you to build your entire house out of glass ("you'll get more light!").

Indeed, a key realization in my exploration of the nature of software was that we completely lack the notion of force. Sure, we use a bunch of terms like "flexibility", "heavy load", "fast", "fragile", etc., and they all seem to be somehow inspired by physics, but in the end it's just vernacular usage without any kind of theory behind it.

As I realized that, I began looking for a small, general yet useful set of "software forces". Nothing too fancy. Tension, compression, bending, torsion, and shear are not fancy, but are incredibly useful for physical materials (no, I wasn't looking for similar concepts in software, but for something just as useful).

Initially, I focused my attention on artifacts, because design is very much about form, and form is made visible through artifacts. I got many ideas and explored several avenues, but most seemed more like ad hoc concepts or turned out to be dead ends. Eventually, I realized we already knew the fundamental forces, though we never used the term "force" to characterize them.

Interestingly, those forces are better known in the run-time world, and are usually restricted to data. Upon closer scrutiny, it turned out they also apply to procedural knowledge, and to the artifact world as well. Also, it turned out that those forces were intimately connected with the concept of entanglement, and could actually shed some light on entanglement itself. More than that, they could be used to find entangled information, in practice, no more philosophy, thank you :-).

So, let's take the easy road and see what we can do with run-time knowledge expressed through data, and build from there.

It's almost too simple
Think about a simple data-oriented application, like a book library. What can you do with a Book record? It's rather obvious: create, read, update, delete. That's it. CRUD. (I'm still thinking about the need for an identity-preserving Move outside the database realm, but let's focus on CRUD right now). CRUD for run-time data is trivial, and is a well-known concept.

CRUD in the artifact world is also relatively simple and intuitive:

Create: we create a new artifact, be it a component, a class, or a function.

Read: this got me thinking for a while. In the end, my best shot is also the simplest: we (the contingent interpreter) read the function, as an act of understanding. More on the idea of contingent/essential interpreter in a rather whimsical :-) presentation by Richard Gabriel, which I already suggested a few years ago.

Update: we change an artifact; for instance, we change the source code of some function.

Delete: we remove an artifact; for instance, we remove a member function from a class.

Note that given the fractal nature of software, some operations are recursively identical (by deleting a component we delete all the sub-artifacts), while others change nature as we move into the fractal hierarchy (by creating a new class inside a component we're also updating the component). This is true in the run-time world of data as well.

Extending CRUD to run-time procedural knowledge is not necessarily trivial. Can we talk about CRUD for functions (as run-time, not compile-time, concepts)? We usually don't, but that's because the most common material and tools (programming languages) we're using are largely based on the idea that procedural knowledge is created once (through artifacts), then compiled and executed.

However, interpreted languages always had the possibility of creating, updating and deleting functions at run-time. Some compiled languages allow dynamic function creation (like C#, through Reflection.Emit) but not update/delete. The ability to update a function through reflection would be an important step toward AOP-like possibilities, but most languages have no support for that. This is fine, though: liquids can't be compressed, but that didn't prevent us from defining the concept of compression. I'm looking for concepts that apply in both worlds (artifacts and run-time) but not necessarily to all materials (languages, etc.). So, to summarize:

- Create, Update, Delete retain the usual meaning also for the run-time concept of functions. It means we're literally creating, updating and deleting a function at run-time. The same applies to the (run-time) concept of class (not object). We can Create, Update, Delete classes at run-time as well (given the appropriate language/material).

- Read, in the artifact world, is the human act of reading, or better, is the contingent interpreter (us) giving meaning to (interpreting) the function. At run-time, the equivalent notion is execution, with the essential interpreter (the machine) giving meaning to the function. Therefore, reading a function in the run-time world simply means executing the function. Executing a function may also update some data, but this is fine: the function is read, data are updated; there is no contradiction.
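To make the run-time side of Create/Read/Update/Delete for functions a bit more tangible, here is a rough analogue in C++ (a sketch, not a full counterpart of what interpreted languages or Reflection.Emit can do): a behavior stored in a std::function can be created, read (executed), updated and deleted at run-time, while the artifacts stay untouched.

#include <functional>
#include <iostream>

int main()
{
    std::function<int(int)> tax;                    // no behavior yet
    tax = []( int amount ) { return amount / 10; }; // Create
    std::cout << tax( 100 ) << "\n";                // Read (execute): prints 10
    tax = []( int amount ) { return amount / 5; };  // Update
    std::cout << tax( 100 ) << "\n";                // Read (execute): prints 20
    tax = nullptr;                                  // Delete
}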

Where do we go from here?
I'm trying to keep this post short, so I'll necessarily leave a lot of stuff for next time. However, I really want to anticipate something. In my previous post I said:

Two clusters of information are entangled when performing a change on one immediately requires a change on the other.

I can now state that better as:

Two clusters of information are entangled when a C/R/U/D on one immediately requires a C/R/U/D on the other.

Yes, the two definitions are not equivalent, as Read is not a Change. The second definition, however, is better, for reasons we'll explore in the next few posts. One reason is that it provides guidance on the kind of "change" you need to consider. Just restrict yourself to CRUD, and you'll be fine. As we'll see next, we can usually group CD and RU, and in fact I've defined two concepts representing those groupings.

Entanglement through CD and/or RU is not necessarily a bad thing. It means that the entangled clusters are attracting each other, and, as far as that specific force is concerned, rejecting others. In other words, they want to form a single cluster, possibly a composite cluster in the next hierarchical level (like two functions attracting each other and creating a class). Of course, it might also be the case that the two clusters should be merged into a single cluster.

We'll see more about different form of CRUD-induced entanglement in the next few posts. We'll also see how many concepts (from references to database normalization) are actually just ways to deal with the many forms of entanglement, or to move entanglement from the artifact space to the run-time space. Right now, I'll use my remaining time to explain polymorphism through the entanglement perspective.

Polymorphism explained
Consider the usual idea of multiple shapes (circle, triangle, rectangle, etc.) that can be drawn on a screen. In the pre-object era (say, in ANSI C) we could have defined a ShapeType enum, a Shape struct with a ShapeType field and possibly a ShapeData field, where ShapeData would have probably been a union (with all fields for all different shapes). Draw would have been implemented as a switch/case over the ShapeType field.
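A minimal sketch of that pre-object design (hypothetical fields), for reference in the discussion below:

typedef enum { CIRCLE, TRIANGLE, RECTANGLE } ShapeType;

typedef union
{
    struct { double cx, cy, radius; } circle;
    struct { double x[3], y[3]; } triangle;
    struct { double x, y, width, height; } rect;
} ShapeData;

typedef struct
{
    ShapeType type;
    ShapeData data;
} Shape;

void Draw( const Shape* s )
{
    switch( s->type )
    {
    case CIRCLE:    /* draw using s->data.circle */   break;
    case TRIANGLE:  /* draw using s->data.triangle */ break;
    case RECTANGLE: /* draw using s->data.rect */     break;
    }
}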

We have U/U entanglement between ShapeType and ShapeData, and between ShapeType and Draw. U/U means that whenever we update ShapeType we need to update ShapeData as well. If I add a Spiral type, I need to add the relevant fields to the ShapeData union, and add logic to Draw. We also have U/U entanglement between ShapeData and Draw: if we change the way we store shape data, we have to change the Draw function, as it depends on those data. We could live with that, but it's not pretty.

What if we have another function that depends on ShapeType, like Area? Well, we need another switch-case, and Area will have the same kind of U/U entanglement with ShapeType and ShapeData as Draw. It is that entanglement that is bringing all those concepts together. ShapeType, ShapeData, Draw and Area want to form a cluster.

Note that without polymorphism (language-supported or simulated through function pointers) the code mass is organized into a potentially degenerative form: every function (Draw, Area, etc) is U/U-entangled with ShapeType, and is therefore attracting new procedural knowledge related to each and every ShapeType value. The forcefield will make those functions bigger and bigger, unless of course ShapeType is no longer extended.

Object-oriented polymorphism "works" because it changes the gravitational centers. Each concrete class (Circle, Triangle, etc.) becomes a gravitational center for procedural and structural knowledge entangled with a single (ex)ShapeType value. In fact, ShapeType no longer needs to exist at all. There is still U/U entanglement between the (ex)ShapeData portion related to Circle and the procedural knowledge in Circle.Draw and Circle.Area (in a Java/C#-ish syntax). But this is fine, because they have now been brought together into a new cluster (the Circle class) that then rejected the other artifacts (the rest of the ShapeData fields, the other functions). As I said in my previous post, entanglement between distant elements is bad.
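Here is a minimal sketch of the polymorphic form described above (hypothetical members): each concrete class now clusters the data and the procedural knowledge entangled with a single former ShapeType value.

class Shape
{
public:
    virtual ~Shape() = default;
    virtual void Draw() const = 0;
    virtual double Area() const = 0;
};

class Circle : public Shape
{
public:
    Circle( double x, double y, double r ) : cx( x ), cy( y ), radius( r ) {}
    void Draw() const override { /* draw using cx, cy, radius */ }
    double Area() const override { return 3.14159265358979 * radius * radius; }
private:
    double cx, cy, radius;         // the ex-ShapeData portion entangled with circles
};

class Triangle : public Shape
{
public:
    void Draw() const override { /* draw using the three vertices */ }
    double Area() const override { /* e.g. shoelace formula over x[], y[] */ return 0.0; }
private:
    double x[3] = {}, y[3] = {};   // the ex-ShapeData portion entangled with triangles
};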

Note that we don't really change the mass of code. We just organize it differently, around different gravitational centers. If you draw functions as tall boxes in the switch-case form, you can see that classes are just "wide" boxes cutting through functions. The area of the boxes (the amount of code) does not change.

Object-oriented polymorphism in a statically-typed language also requires [interface] inheritance. This brings in a new form of U/U-entanglement, this time between the interface and each and every concrete class. Whenever we change the interface, we have to change all the concrete classes.

Interestingly, some of those functions might be entangled in the run-time world: they tend to be called together (R/R-entanglement in the run-time space). That would suggest yet another clustering (in the artifact space), like separating all the drawing operations from the geometrical operations in different interfaces.

Note that the polymorphic solution is not necessarily better: it's just a matter of probability. If the set of functions is unstable, plain polymorphism is not really a good answer. We may want to explore a more complex solution based on the Visitor pattern, or some other clustering of artifacts. Understanding entanglement for Visitor is indeed an interesting exercise.

Coming soon (I hope :-)
CD and RU-entanglement (with their real names :-), paving our way to the concept of isolation, or shielding, or whatever I'm gonna settle for. Oh guys, by the way: happy new year!

Friday, November 19, 2010

Notes on Software Design, Chapter 12: Entanglement

Mechanical devices require maintenance, some more than others. Maintenance is intended to keep them working, because mechanical parts tend to wear out. Maintenance is required so that they can still work "as new", that is, do the same thing they were built to do.

Software programs require maintenance, some more than others. Maintenance is intended to keep them useful, because software parts don't wear out, but they get obsolete. They solve yesterday's problems, not today's problems. Or perhaps they didn't even solve yesterday's problems right (bugs). Maintenance is required so that software can do something different from what it was built to do.

Looking from a slightly different perspective, software is encoded knowledge, and the value of knowledge is constantly decaying (see the concept of half-life of knowledge). Maintenance is required to restore value. Energy (human work) must be supplied to keep encoded knowledge current, that is, valuable.

Either way, software developers spend a large amount of their time changing stuff.

Change
Ideally, changes would be purely additional. We would write a new software entity (e.g. a class) inside a new artifact (a source file). We would transform that artifact into something executable (if needed; interpreted languages don't need this step). We would deploy just the new executable artifact. The system would automagically discover the new artifact and start using it.

Although it's possible to create systems like that (I've done it many times through plug-in architectures), programs are not written this way from the ground up. Plug-ins, assuming they exist at all, usually appear at a relatively coarse granularity. Below that level, maintenance is rarely additional. Besides, additional maintenance is only possible when the change request is aligned with the underlying architecture.

So, the truth is, most often we can't simply add new knowledge to the system. We also have to change and remove existing parts. A single change request then results in a wave of changes, propagating through the system. Depending on some factors (that I'll discuss later on) the wave could be dampened pretty soon, or even amplified. In that case, we'll have to change more and more parts as the wave propagates through the system. If we change a modularized decision, the change wave is dampened at the module boundary.

The nature of change
Consider a system with two hardware nodes. The nodes are based on completely different hardware architectures: for instance, node 1 is a regular PC, node 2 is a DSP+ASIC embedded device. They are connected through standard transport protocols. Above transport, they don't share a single line of code, because they don't calculate the same function.
Yet, if you change the code deployed in node 1, you have to change some (different) code in node 2 as well, or the system won't work. Yikes! Aren't those two systems loosely coupled according to the traditional coupling theory? Sure. But still, changes need to happen on both sides.

The example above may seem paradoxical, or academic, but it is not. Most likely, you own one of those devices. It's a decoder, like those for DVB-T, sat TV, or inside DivX players. The decoder is not computing the same function as the encoder (in a sense, it's computing the inverse function). The decoder may not share any code at all with the encoder. Still, they must be constantly aligned, or you won't see squat.

This is the real nature of change in software: you change something, and a number of things must be changed at the same time to preserve correctness. In a sense, change is affecting very distant, even physically disconnected components, and must be simultaneous: change X, and Y, Z, K must all change, at once.

At this point, we may think that all the physical analogies are breaking down, because this is not how the physical world behaves. Except it does :-). You just have to look at the right scale, and move from the familiar Newtonian physics to the slightly less comfortable (for me, at least) quantum physics.

Quantum Entanglement
Note: if you really, really, really hate physics, you can skip this part. Still, if you never came across the concept of Quantum Entanglement, you may find it interesting. Well, when I first heard of it, I thought it was amazing (though I didn't see the connection with software back then).

You probably know about the Heisenberg Uncertainty Principle. Briefly, it says that when you go down to particles like photons, you can't measure (e.g.) both position and wavelength at arbitrarily high precision. It's not a problem of measurement techniques. It's the nature of things.

Turns out, however, that we can create entangled particles. For instance, we can create two particles A and B that share the same exact wavelength. Therefore, we can try to circumvent the uncertainty principle by measuring particle A's wavelength and particle B's position. Now, and this is absolutely mind-blowing :-), it does not work. As soon as you try to measure particle B's position, particle A reacts, by collapsing its wave function, immediately, even at an arbitrary distance. You cannot observe one without changing the other.

Quoting wikipedia: Quantum entanglement [..] is a property of certain states of a quantum system containing two or more distinct objects, in which the information describing the objects is inextricably linked such that performing a measurement on one immediately alters properties of the other, even when separated at arbitrary distances.

Replace measurement with change, and that's exactly like software :-).

Software Entanglement
The parallel between entangled particles and entangled information is relatively simple. What is even more interesting is that it works both in the artifact world and in the run-time world, at every level in the corresponding hierarchy (the run-time/artifact distinction is becoming more and more central, and was sorely missing in most previous works on related concepts). On the other hand, the concept of entanglement has far-reaching consequences, and is also a trampoline to new concepts. This post, therefore, is more like a broad introduction than an in-depth scrutiny.

Borrowing from quantum physics, we can define software entanglement as follows:

Two clusters of information are entangled when performing a change on one immediately requires a change on the other.

A simple example in the artifact space is renaming a function. All (by-name) callers must be changed, immediately. Callers are entangled with the callee. A simple example in the run-time space is caching. Caching requires cache coherence mechanisms, exactly because of entanglement.

Note 1: I said "by name", because callers by reference are not affected. This is strongly related to the idea of dampening, and we'll explore it in a future post.

Note 2: for a while, I've been leaning toward using "tangling" instead of "entanglement", as it is a more familiar word. So, in some previous posts, you'll find mentions of "tangling". In the end, I decided to go with "entanglement" because "tangling" has already been used (with a different meaning) in AOP literature, and also because entanglement, although perhaps less familiar, is simply more precise.

Not your grandpa's coupling
Coupling is a time-honored concept, born in the 70s and survived to this day. Originally, coupling was mostly concerned with data. Content coupling took place when a module was tweaking another module's internal data; common coupling was about sharing a global variable; etc. Some forms of coupling were considered stronger than others.

Most of those concepts can be applied to OO software as well, although most metrics for OO coupling take a more simplified approach and mostly consider dependencies as coupling. Still, some forms of dependency are considered stronger than others. Inheritance is considered stronger than composition; dependency on a concrete class is considered stronger than dependency on an interface; etc. Therefore, the mere presence of an interface between two classes is assumed to reduce coupling. If class A doesn't talk to class B, and doesn't even know about class B's existence, they are considered uncoupled.

Of course, lack of coupling in the traditional sense does not imply lack of entanglement. This is why so many attempts to decouple systems through layers are so ineffective. As I said many times, all those layers in business systems can't usually dampen the wave of changes coming from the (seemingly) innocent need to add a field to a database table. In every layer, we usually have some information node that is entangled with that table. Data access; business logic; user interface; they all need to change, instantly, so that this new field will be put to use. That's entanglement, and it's not going away by layering.

So, isn't entanglement just coupling? No, not really. Many systems that would be defined as "loosely coupled" are indeed "heavily entangled" (the example above with the encoder/decoder is the poster child of a loosely coupled / heavily entangled system).

To give honor to whom honor is due, the closest thing I've encountered in my research is the concept of connascence, introduced by Meilir Page-Jones in a little-known book from the mid-90s ("What Every Programmer Should Know About Object-Oriented Design"). Curiously enough, I came to know connascence after I conceived entanglement, while looking for previous literature on the subject. Still, there are some notable differences between connascence and entanglement, which I'll explore in the forthcoming posts (for instance, Page-Jones didn't consider the RT/artifact separation).

Implications
I'll explore the implications of entanglement in future posts, where I'll also try to delve deeper into the nature of change, as we can't fully understand entanglement until we fully understand change.

Right now, I'd like to highlight something obvious :-), that is, having distant yet entangled information is dangerous, and having too much entangled information is either maintenance hell or performance hell (artifact vs run-time).

Indeed, it's easy to classify entanglement as an attractive force. This is an oversimplification, however, so I'll leave the real meat for my next posts.

Beyond Good and Evil
There is a strong urge inside the human being to classify things as "good" and "bad". Within each category, we further classify things by degree of goodness or badness. Unsurprisingly, we bring this habit into computer science.

Coupling has long been sub-classified by "strength", with content coupling being stronger than stamp coupling, in turn stronger than data coupling. Even connascence has been classified by strength, with e.g. connascence of position being stronger than connascence of name. The general consensus is that we should aim for the weakest possible form of coupling / connascence.

I'm going to say something huge now, so be prepared :-). This is all wrong. When two information nodes are entangled, changes will propagate, period. The safest entanglement is the one where the triggering change is least likely to occur. Of course, that must be weighted by the number of entangled nodes. It's very similar to risk exposure, that is, the probability of an adverse event multiplied by the size of the loss (believe it or not, I couldn't find a decent link explaining the concept to the uninitiated :-).

There is more. We can dampen the effects of some forms of entanglement. That requires human work, that is, energy. It's usually upfront work (sorry guys :-), although that's not always the case. Therefore, even if I came up with some classification of good/bad entanglement, it would be rather pointless, because dampening is totally context-dependent.

I'll explore dampening in future posts, of course. But just to avoid excessive vagueness, think about polymorphism: it's about dampening the effect of a change (which change? Which other similar change is not dampened at all by polymorphism?). Think about the concept of reference. It's about dampening the effect of a change (in the run-time space). Etc.

Your system is not really aligned with the underlying forcefield unless you have strategies in place to dampen the effect of entanglement with high risk exposure. Conversely, a deep understanding of entanglement may highlight different solutions, where entangled information is kept together, as to minimize the cost of change. Examples will follow.

What's next?
Before we can talk about dampening, we really need to understand the nature of change better. As we do that, we'll learn more about entanglement as well, and how many programming and design concepts have been devised to deal with entanglement, while some concepts are still missing (but theoretically possible). It's a rather long trip, but the sky is clear :-).

Monday, October 04, 2010

Notes on Software Design, Chapter 11: Friction in the Artifacts world

When I first "got" the concept of run-time friction, I thought it made sense only in the run-time world. I was thinking of friction as "everything that gets in the way as we process the Function", and since there is no Function in the artifact world, there can be no friction either. That disturbed me a little, because every other concept was present in both worlds.

Later on, I realized I could extend that notion to the artifact side in two distinct ways. One didn't survive scrutiny. It was too informal, although somehow I'd like to bring back some of the underlying reasoning, probably as part of a different property. The other proved more solid. Since a few people asked me (in real life) how I get these ideas, I think it might be interesting to tell the story behind the concept; after all, a blog ought to be a... log :-). The story is not really linear, but then, very few things in life are linear.

Step 1
It was an early morning back in July. I was running. At some point (within the first few kilometers) I had a flash that perhaps the notion of mass was not a primitive concept. Perhaps there was a concept of volume (and LOC would give volume, not mass) and a concept of density (after all, lines can be quite different). I spent a few minutes thinking about density (the simplest idea being that perhaps something like cyclomatic complexity could explain density), then thinking that I didn't like volume because it reminded me of Halstead's Software Science, and I didn't want my work to be so disconnected from practice. Then I started to think about a concept of surface; maybe there was an ideal volume / surface ratio too? Then the zen effect of running took over and I blissfully stopped thinking :-)). [most of those ideas were good, and at some point I'll have to reconsider a few things].

Step 2
Days later, I was thinking about giving a name to this stuff I'm writing. I came up with a few ideas, and also ran a trademark / domain name search, because you never know. Looking for "Physics of Software", I found a remotely related entry in the C2 wiki: Physical Cues In Software Development.
Now, that stuff seems more concerned with the geometry/topology of code than with the physics of software, but while reading that page I was slightly tantalized by this sentence: "Too dense to refactor easily". Interesting. I did some literature research on density vs. refactoring, but nothing substantial came up.

Step 3
Days later again. I was writing some code (yeah, I write real world code too :-). I tend to write short methods: I practice what I preach. Still, I was in the middle of a rather long function (by my standards, that is, about 100 lines). I was looking at it, trying to understand how it got to be that big. I could see the gravitational effect of having used a massive third party component; that was consistent with my current understanding of the scale-free nature of software (more on this another time).
I could also see a few small, simple improvements, but in the end it was not trivial to refactor that method into a few shorter functions. Sure, it could be done, just not by selecting a few lines and choosing "extract method" from the refactoring menu. I had to create new classes, shuffle responsibilities around. Overall, it was a large effort (given the relatively small mass). I contemplated the idea of leaving that function alone :-). Too dense to refactor easily? Hmm.

Step 4
Perhaps a couple of weeks later. This thing kept bouncing in my head. I always assumed I could refactor every single method into a set of smaller methods, perhaps introducing new classes. Sure, run-time friction could grow as a result, but that's just something to be balanced. But was that actually true? Could I write a function that was basically impossible to refactor, that is, where extracting a few lines required either a huge effort or, even better, where moving N symbols outside basically requires adding N symbols inside (to call the extracted method)? Having too much to do, I left the question unanswered.

Step 5
Just a few days later. Late evening, but unwilling to call it a day :-), I sat down trying to write a gordian function :-), a simple sequence of lines that couldn't easily be refactored. This is what I ended up with:

void f( int a, int b )
{
    int c = a + b;
    int d = a + c;
    int e = b + c;
    int f = a + d;
    int g = b + d;
    int h = c + d;
    int i = c + e;
    int j = b + e;
    int k = a + e;
    // ... use f, g, h, i, j, k as above
}

It may be easier to visualize the pattern through a graphical view:



The idea is pretty simple: at every level, I'm using nearby concepts and distant concepts; I'm also creating nodes for subsequent use in lower levels. Now, this function is trivial. Cyclomatic complexity is just 1. Yet it is hard to refactor. It is hard to move things (lines, symbols, concepts) around. So I thought perhaps this was "density".

Step 6
Out of nowhere, a few days later I realized that density was not the right name. When you have trouble moving things around, we call it viscosity, not density. That triggered an internal alert: viscosity has already been used in some computing literature, and I hate to redefine existing terms, so let's check the literature again.

Step 7
To my knowledge, "viscosity" has been used in:
- The Cognitive Dimensions of Notations literature, where it is defined as "the difficulty of making small changes to the information structure" or "resistance to change", both of which are rather similar to what I'm thinking. Note that although the papers above are about the notation, not the code, I discussed extending those concepts from tools to materials almost three years ago.
- The well known Design Principles and Design Patterns paper from Robert Martin, where it is defined as "When the design preserving methods are harder to employ than the hacks, then the viscosity of the design is high". This is completely unrelated, and honestly I'm not sure that "viscosity" is the right term. Indeed, Martin is using several physics-lookalike properties (immobility, fragility, rigidity), but it seems like they've been adopted on the basis of some vernacular usage of terms, not on the basis of strong correlation between the software world and the physical world (there is certainly no notion of "preserving a design vs. hacks" in viscosity as defined in physics).

In the end, I considered viscosity as a good choice for the difficulty of moving knowledge around in the artifact world.

Step 8
Guess what. Viscosity is basically friction (in fluids). Ok, I got it :-).
Just like run-time friction kicks in when you try to move run-time knowledge around, artifact-side friction kicks in when you try to move artifact-side knowledge around. Code is one way of representing artifact-side knowledge. Diagrams are another way. They both manifest some resistance to change.
Once you get the concept (almost) right, it's time to clean things up, come up with a more precise definition, see if it's useful and what you can learn from it.

Defining viscosity
Why is the artifact in step 5 viscous, that is, what makes it difficult to move knowledge around? The main issue is that we can't find sub-centers, because every line has local interactions (with nearby knowledge) and non-local interactions (with distant knowledge). So although every single line is using just a few symbols, you can't simply find a subset of lines that is relatively isolated from the rest. Every line is also very simple on its own, and not worth moving outside by itself.

So, I could define a viscous artifact A like this:

A is viscous when given any subset B of A, the mass of knowledge inside B is not significantly higher than the amount of knowledge exchanged between B and A-B

I crafted this definition rather carefully :-). In fact, the problem with the function above is not just that every line depends on nearby and distant knowledge. It's also that it's doing very little. Otherwise, we could move a significant portion of code (doing a lot of stuff) outside the function, of course by passing parameters. But then, the mass of knowledge inside the subset B would be higher than the mass of knowledge exchanged (the parameters). That is not the case in function f.
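To see the definition at work on f, here is a hedged sketch of a mechanical extraction (the helper and struct names are hypothetical): to move three trivial additions out, four values must flow in and three must flow out, so the knowledge crossing the boundary is at least as massive as the knowledge being moved.

struct FGH { int f, g, h; };   // hypothetical bundle, just to return three values

static FGH computeFGH( int a, int b, int c, int d )
{
    // three additions inside, but four parameters in and three values out:
    // the exchanged knowledge is not significantly lower than the internal one
    FGH r;
    r.f = a + d;
    r.g = b + d;
    r.h = c + d;
    return r;
}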

Viscous artifacts resist shear/tensile stress, that is, they resist extraction of knowledge. As we try to move the knowledge in B outside A, the knowledge exchanged with A-B is opposing the movement. The effort we have to spend to reorganize viscous artifacts is the equivalent of friction energy in the run-time world. Only, this time, it's human work, not CPU work.

Note: I could come up with some kind of formula for a viscosity coefficient. I did not because I feel I'm not yet at that stage. Still, the minimum ratio between exchanged and internal knowledge (quantified over the subsets B, not over A) seems like a good candidate.

It is important to understand that viscosity is an internal property of an artifact. It has nothing to do with the artifact interface. It's about the artifact internals. Of course, given the hierarchy of artifacts, an artifact may have low [internal] viscosity, yet be part of a viscous higher-level artifact. For instance, a low-viscosity function can be part of a high-viscosity class. In that case, it would be easy to move some portions of code outside the function, but not outside the class.

Consequences
Once again, we have to resist the temptation to define some property as "bad". In the physical world, viscosity is not "bad". It can be useful, or it can be a problem: it's a matter of context.

Besides, when you look at the definition above, you may see some relationship with a vague notion of cohesion: a viscous artifact is "more cohesive". Cohesion is usually considered a good property. What's wrong here?

My current understanding is that it's fine to be viscous when the mass is small. Actually, it's good to be viscous when the mass is small. A small viscous chunk of knowledge is a good center: it stands on its own, and can't be easily broken. However, viscosity should decrease as mass increases. That gives an opportunity for extraction of knowledge, thereby forming new centers. Note that by saying so, I'm implicitly accepting the existence of high-mass artifacts. Still:

- high-mass, low-viscosity artifacts can be considered as rather innocent underengineering, a form of technical debt that can be easily repaid (see also the latest comments to Chapter 0). The high gravity of the artifact will bring in more stuff: that would be the ideal time to refactor the code, which will be easy, since viscosity is low.

- high-mass, high-viscosity artifacts are a serious design weakness that should be dealt with as soon as you notice, possibly while you're still creating the artifact. Gravity will bring in more stuff, and things can only get worse.

Note: a trivial refactoring of a viscous artifact will significantly increase run-time friction, as we'll have to pass a lot of parameters around. In most cases, we'll have to rethink the artifact and perhaps a significant portion of the surroundings, as we may have chosen the wrong centers.

Conclusions
It's sort of revealing that I started with a notion of density but ended up with a notion of viscosity, and had to reject an existing definition of viscosity in the process. Labeling software phenomena after physical phenomena is simple. Doing so in a meaningful way is not so trivial.

I think that keeping the duality run-time / artifact in mind is helping me a lot in this journey. Things are much easier once you can clearly see that you're dealing with two different worlds. A recent post by Rico Mariani, for instance, raises an interesting point, which is trivially explained within my frame of reasoning, but seems unnecessarily obscure when you simply talk about "coupling" and ignore the artifact/run-time duality.

As I progress in cleaning up some ideas on the decision space, I hope to bring in even more clarity. Which reminds me that I have something really important to say about design and decisions. It's short, and I'll let it preempt :-) the next chapter on tangling.

Sunday, September 12, 2010

Notes on Software Design, Chapter 10: Run-Time Friction

So, here is the story. I keep a lot of notes. Some are text files with relevant links and organized ideas. Most are rather embarrassing scribbles on just about any piece of paper that is lying around when I need it. From time to time, I move some notes from paper to files, discard concepts that didn't prove themselves, and rearrange paragraphs to fake some kind of logical, sequential reasoning over a process that was, in fact, rather chaotic. Not surprisingly (since software is just another way to encode knowledge), David Parnas suggested long ago that we could do the same while documenting software design (see "A rational design process: How and why to fake it").
Well, it's not always easy. Sometimes, I try to approach the storytelling from an angle, see that it doesn't work out so well, and look for another. Sometimes I succeed, sometimes I don't (although, of course, the reader is the ultimate judge). This time, I have to confess, I feel like I couldn't find the right angle, the right way to start, to unfold a concept in a way that makes it look simple and natural. So I'll trust you to be smart enough to make sense of what follows :-). It's a very long post, and you may want to digest it in more than one session.

The physical world
I guess you all had to push some furniture around at one time or another. You have probably felt a stronger resistance in the beginning, followed by a milder form of resistance as soon as you got some movement.
The mild resistance is due to kinetic friction, while the initial, stronger resistance is usually due to static friction, that you have to overcome before moving the object (if you're not familiar with kinetic and static friction, wikipedia will tell you more than you want to know :-).

As you move your stuff around, friction makes you waste some energy, in a way that is basically proportional to the normal force, the distance, and the coefficient of kinetic friction (see the page above for the actual equation). I'll get back to this later, but if you move a constant mass on a flat surface, the energy you waste is proportional to the mass you move, the distance you go, and the magic coefficient of friction.

The beauty of all this is that it's simple and rather unambiguous. Friction is always present in mechanical engineering, but it's a well understood concept (as far as engineering is concerned; it's still blurry at the quantum level, at least for the uninitiated like myself), and there is usually no wishy-washy talking about friction. It's not a broad concept, that is, you won't be able to design the next-generation jet engine if all you have in your conceptual toolbox is friction, yet you won't be able to design an engine at all without an appreciation of friction.

The software world
I'm first and foremost a software design practitioner: I design software, almost every day. Sometimes by myself, most often with other people; therefore, I do a lot of "design talk". In many cases, at one point or another, someone is going to bring in "performance" or "efficiency" to support (or reject) a design decision.
We use those words a lot, with different meaning depending on who's saying it and why he's saying it. It seems like I'm never tired of linking wikipedia, so here is a page on computer performance. Just look at the initial list of different, context-dependent meanings. It's not surprising, then, to find out some people have very peculiar views of performance. "I use arrays because they're more efficient". Sure, except that then you do a linear search because you need multiple indexes; say "efficient" again :-)?

One might expect Computer Science (with capital letters :-) to come to the rescue and define terms more precisely, and hopefully with some relevance for practice. However, computer science is more concerned with computational complexity theory than with the nitty-gritty details of being "fast".
Now, don't get me wrong. You won't get too far as a programmer (and definitely not as a software designer) if you don't get the concept of complexity classes, if you can't see that an algorithm is O( n^2 ) and another is O( n log n ), or if you don't even know what the Big Oh notation is all about. You have to know this stuff, period. In a sense, complexity theory is part of the math of software, and there is little point in investigating a physics of software if you don't get the math first. But math alone won't cut it. However, once we get past the complexity class we get very little assistance from computer science (and I'm purposely ignoring the fact that just because an algorithm is in the O( n log n ) class in the average case doesn't mean I can't beat it with an O( n^2 ) algorithm in my practical cases).

On the "software engineering" side, the usual advice is to build the program and then use a profiler. Yeah, well, sure, beats banging your head against the wall :-), but it's not exactly like knowing what you're doing all along. Still, we make a lot of low-level design decisions while coding, and many of them will ultimately impact "performance". Lacking the basic terminology to think (and talk) about this kind of stuff is rather depressing, so why don't we try to move just a tiny step forward?

Wasting energy in software
So, here I have this piece of software (executable knowledge). For most practical applications, what I need is to get some data (interactively, from a DB, through some kind of device, whatever), transform it in a meaningful way (which could be a complex process encoded in thousands of lines), and spit out some results (which is still data, anyway). The transformation is the Function.

On the artifact side, our software may be using global variables all around, or be based on a nice polymorphic structure, yet the Function doesn't care. The structure we provide on the artifact side is the domain of Form (by now, you're probably familiar with all this stuff).

Now, transformation is a process, and no real-world process is 100% efficient; it's always going to waste something. Perhaps we should look better at that "waste" part. Something I learnt a long time ago, while pondering on principles and patterns, is that overly general concepts (like "performance") must give way to more specialized notions.

Some code, please :-)
Consider this short portion of C code. I'm using C because it's a low-level language, where the implications of any given choice are relatively easy to understand.
double max( double x, double y )
{
    if( x > y )
        return x;
    else
        return y;
}

double max3( double x, double y, double z )
{
    double d = max( x, y );
    d = max( d, z );
    return d;
}
The code is pretty obvious. In a common, stack-based CPU architecture, max3 will copy x and y on the stack and call max; then it will copy d and z and call max again. The return value might be stored in a CPU register or in RAM, depending on the compiler.

Copying those values is a waste of energy, of course. I could manually inline max inside max3 and get rid of that waste. I would sacrifice reusability and perhaps clarity for "higher performance" or "higher efficiency" or "reduced waste". Alternatively, the compiler could inline the function on my behalf (see Chapter 6 for the role of languages on balancing the two worlds).
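For instance, the manually inlined version might look like the sketch below (a good optimizing compiler may well produce the same machine code from the original, so as usual this is a trade-off to measure, not a rule):

double max3( double x, double y, double z )
{
    // max inlined by hand: no parameter copies for the nested calls,
    // at the cost of duplicating the comparison logic
    double d = ( x > y ) ? x : y;
    return ( d > z ) ? d : z;
}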

What if I'm working on some large data structure? The Fortran guy down the hall will suggest that by keeping your structures in the common area / global memory, you won't even have to pass parameters around: every function knows exactly where to get input and where to store output! Again, we'll sacrifice reusability, and perhaps duplicate large portions of code, for the sake of efficiency. As you go through most literature on High Performance Computing (see, for instance, The Ideal HPC Programming Language, recently reprinted in Communications of the ACM), you'll see that the HPC community is constantly facing the problem of wasted cycles, and is wasting a lot of LOC to prevent that.

Moving data around is not the only way to waste energy. Consider this portion of real-world code, written by a (supposedly) performance-conscious "little meritocracy":
static int is_rfc2822_header(char *line)
{
    int ch;
    char *cp = line;
    if (!memcmp(line, "From ", 5) || !memcmp(line, ">From ", 6))
        return 1;
    while ((ch = *cp++)) {
        if (ch == ':')
            return cp != line;
        if ((33 <= ch && ch <= 57) ||
            (59 <= ch && ch <= 126))
            continue;
        break;
    }
    return 0;
}
Yeah, it's ugly as hell, but it's also wasteful (which is funny, for reasons that are too long to explain here). If you read it carefully, you'll find a way to optimize the "while" body quite a bit, and while you're at it, you can easily make it more readable. Also, the two memcmp in the beginning are wasting cycles (going through the first 5 characters twice), but just like the coefficient of friction must be measured in practice, at this level any alternative should really be measured on a real-world CPU.

Note: as we optimize code, we have to assume that it is correct. We don't change Form unless we know that the Function is right. Before posting this, I checked for any update to the codebase (I got that code a few years ago, and it never looked right to me). The code is still the same, but now there is also a comment explaining what the function is intended to do. Unfortunately, it's not what it's doing, which is even more ironic, for the same unspoken reasons above. Anyway, we could easily fix the bug and still optimize the code.
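For the impatient, here is a sketch of one possible streamlining of the "while" body (behavior preserved exactly as-is, suspicious return value included; whether it actually wins on a real CPU is something to measure, as said above):

#include <string.h>

static int is_rfc2822_header_streamlined(char *line)
{
    int ch;
    char *cp = line;
    if (!memcmp(line, "From ", 5) || !memcmp(line, ">From ", 6))
        return 1;
    while ((ch = *cp++)) {
        if (ch == ':')
            return cp != line;
        // ch cannot be ':' (58) here, so the two original ranges collapse into one test
        if (ch < 33 || ch > 126)
            break;
    }
    return 0;
}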

Run-time Friction
Just like in the physical world we move objects around, in the run-time world of software we move knowledge around. More exactly, we move data around (we call it data flow) and we move the execution point around (we call it control flow). We move that stuff around to calculate some Function. In the process of calculating the Function, we usually waste some cycles. We waste cycles because we have to copy data on the stack, or from one data structure to another. We waste cycles because we do unnecessary comparisons, computations, jumps. We waste cycles because we process the same data more than once. Most often, we waste those cycles because we get something in exchange in the Form (artifact) domain. Sometimes, we waste cycles just because of bad coding.

The energy waste is not a constant: copying an integer is different from copying an array of integers (that's weight, of course). Also, if your array has been swapped out to the paging file, the copy is going to cost you more: that's the contribution of distance, and I'll get back to this later. Right now, remember that wasted energy is a consequence of friction, but is not friction.

Causes and types of software friction
We have already seen a few cases of software friction: when you copy data, you waste cycles. Max3 didn't strictly need to copy data: the Function didn't care about reusing max, only Form did. Before we try to define friction more precisely, it's interesting to see how deep the analogy with real-world friction really is. Indeed, we even have static software friction, and kinetic software friction!

Consider a Java (or .NET) virtual machine. When you hit a function for the first time, the code is compiled just in time. This has nothing to do with Function. It is a byproduct of a technological choice. It will cost you some cycles: that's friction. Also, it happens only once, to "put things in motion": that's static friction. In general, static friction will increase latency, while kinetic friction will reduce throughput. Good: we just sorted out the two main components of "performance".

Consider a web service. Before you can call the server, you go through a relatively lengthy process, from high-level stuff (marshaling your data) to low level stuff (establishing a network connection). This is all friction: the Function is happening on the other side, inside the service code. Here we see both static and kinetic friction at play: establishing a connection adds latency, exchanging data over the network reduces throughput.

Consider stored procedures. The ideal stored procedure takes little data in input, does significant CRUD inside, and returns little. This way, we have minimal waste due to kinetic friction, as we exchange little data with the database. Of course, this is not the only way to minimize energy waste: another approach would be to reduce distance, by bringing the database itself in-process. Interestingly, most real-time databases use the second approach.

So, what is causing friction in software? Friction is caused by:
  • A copy of data from one place to another (e.g. parameter passing, temporary variables, etc), as this adds no meaning to data, and therefore is useless as far as Function is concerned.

  • Syntactical transformation of data (e.g. marshaling) which adds no semantics (as above: this processing is not part of the Function). This includes any form of data transformation needed to talk over a non-native protocol.

  • Unnecessary statements (like those that could be removed in the C function above).

  • Redundant access / processing (some will be removed by the compiler, but some won't)

  • Bookkeeping (allocation, deallocation, reference counting, heap defragmentation, garbage collection, paging, etc). All this adds no semantics, and it's irrelevant for the Function: indeed, a well-written garbage collected program should behave properly under the so-called null garbage collector.

  • Unnecessary indirection. This is a long story and I'll leave it for another time, as I've yet to talk about indirection in the physics of software.

  • In general, everything that is not strictly necessary to calculate the Function, but has been added because of Form, or because of the programmer's inability to streamline the code to the mere Function, is a source of friction and will waste run-time energy.


Defining friction
At this stage in my understanding of the physics of software, it's still hard to come up with numbers, coefficients, sometimes even formulas. Actually, I'm usually happy when I get some concept right. Still, let's look at a simplified formula for the energy wasted through friction (in the real world):

Normal Force * Coefficient of Friction * Distance.

That would hold pretty well in the software world as well, both at the qualitative (easier) and probably quantitative (not there yet) level. At the qualitative level, it tells us what we can control and perhaps leverage. I'll explore this in the next paragraph. At the quantitative level, it could help to evaluate low-level choices. First, however, we have to define Normal Force, Coefficient of Friction, and Distance.

I've defined distance in the run-time world in Chapter 9. Unfortunately, it's an ordinal scale, so we can't do math with distance. This sort of rules out any chance to have a quantitative definition of friction, but we can also look at it from the other side: a better understanding of friction energy (like: wasted cycles) could shed light on the right measurement scale for distance!

Assuming a flat world (I have no reason to think otherwise) the Normal Force is just weight. Weight could be easily defined as the number of bytes involved. For instance, the cost of a copy is linear with the number of bytes you copy.

The coefficient of friction is a dimensionless parameter. Interestingly, if we decide to measure energy in cycles (which makes some sense, although we usually think of cycles as time, not energy) that would imply that the unit of measurement for Distance is cycles/byte. I'll have to think more about this.

Although the coefficient of friction, in the real world, cannot be predicted but only measured, we have some intuitive grasp of it being related to the materials. As the aforementioned wikipedia page explains, it's a relatively complex "system property", depending on many factors. The same applies in the software world. The cost to move a bunch of bytes from one position to another depends on a bunch of factors. If we want to raise the abstraction level and think in terms of objects, and not bytes, things become more complex. The exact copy semantics (reference, shallow, deep) kicks in. That's fine: a software material with shallow copy semantics would have a different coefficient of friction than one with reference copy semantics.

Overall, I think we have little control over the coefficient of friction (I might be wrong), so for any practical purpose, distance and weight are the most interesting parameters.

Is it useful, anyway?
A good theory, and a good concept, must have a good explanatory power, that is, we should be able to use them to explain known phenomena, explain why something works, rationalize widespread practice or beliefs, etc.

As I've already discussed, the evolution of programming languages can be largely seen as an attempt to balance the world of artifact / form with the run-time / function world. In this sense, we can look for instance at the perfect forwarding problem, solved by rvalue references in the next C++ standard, as a further attempt to remove some energy waste, by avoiding unnecessary copies of data. C++ provides many ways to control friction energy, mostly in the area of generic programming and also template metaprogramming. The Curiously Recurring Template Pattern, for instance, provides a form of static polymorphism exactly to avoid some friction due to unnecessary indirection (virtual dispatch).
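As a reminder of how that works, here is a minimal CRTP sketch (the names are mine): the derived type is known at compile time, so the call can be resolved, and often inlined, without going through a vtable:

template< typename Derived >
class ShapeBase
{
public:
    double describeArea() const
    {
        // resolved statically: no virtual dispatch, no indirection friction
        return static_cast< const Derived* >( this )->Area();
    }
};

class Square : public ShapeBase< Square >
{
public:
    explicit Square( double side ) : side( side ) {}
    double Area() const { return side * side; }
private:
    double side;
};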

More generally, the simple equation for energy waste provides a clue on what we can actually control: weight, distance, coefficient of friction. This is it. As we shape software, this is what we can actually change if we want to reduce friction energy.

Consider HTTP compression: distance couldn't be changed, so we had to change weight.

Also, understanding the difference between static and kinetic friction explains a lot of existing practices. Think of Nagle's algorithm. It works by increasing static friction (therefore latency) in exchange for lower kinetic friction (therefore higher throughput). Once you get your concepts right, so many things unfold so easily :-).

Finally, the analogy holds to the extremes: just like excessive friction in mechanical systems can lead to jamming, excessive friction due to paging can jam a software system. This is commonly known as thrashing.

I think a caveat is in order: friction in the physical world is not necessarily evil. If it weren't for friction, we couldn't even walk. Mechanical devices have to deal with friction all the time, but they also exploit friction all the time. It's harder to exploit friction in software (although Nagle's algorithm does). Most often, we must see friction as a trade-off with other properties, mostly on the artifact side. Still, an understanding of the different types of friction, and of the constituents of friction energy, can help evaluate alternatives and even generate new, better ideas in a more systematic and (dare I say it :-) scientific way.

A different angle
I chose friction as a physical analogy because it's a simple, familiar concept. Intuition and everyday experience can easily compensate for any lack of engineering knowledge. Still, I've been tempted to use different analogies, like hydraulic or electrical analogies. Indeed, there are several analogies between electrical, mechanical, hydraulic and even acoustic and optical systems (see here for a start), so it's always possible to choose a different reference system.

Anyway, my alternative would have been to model everything after resistance and current. Current would be the equivalent of throughput, or "performance", and resistance would cause thermal dissipation. In the end, I didn't go this way for a number of reasons; for instance, one-shot stuff like JIT would require something like a thermistor (think of a PTC in CRT degaussing), but I would lose a few readers that way :-).

Still, if you followed so far, there is an interesting result I'd like to share. Consider a trivial circuit where we apply 1V to a 1 ohm resistor, resulting in 1A of current. Now, I'll replace the resistor with a series of two resistors, with resistances (1-P) and P ohms. Nothing changes, same current. Resistors represent processes.

Now say that we have this concept of parallel execution, so the process carried out by P can be parallelized. By way of the analogy, to increase throughput (current) I can simply replace the P-ohm resistor with N equal resistors in parallel (P ohms each, for a combined resistance of P/N). Now the circulating current is obviously 1 / (1-P + P/N) A. Guess what, I just rediscovered Amdahl's Law using Ohm's Law. That's cute :-).
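If you want to play with the numbers, here is a tiny sketch of the arithmetic (function names are mine):

#include <iostream>

// Current through (1-P) ohms in series with N parallel resistors of P ohms each,
// under 1V; numerically identical to Amdahl's speedup for a parallel fraction P.
double current( double P, int N )
{
    return 1.0 / ( ( 1.0 - P ) + P / N );
}

int main()
{
    std::cout << current( 0.9, 8 ) << "\n";   // about 4.7, exactly what Amdahl's Law predicts
}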

Ok guys, next time I'll have a much shorter post on the artifact-side notion of friction. If we survive that, we'll be ready for tangling.