Carlo Pescio: Notes on Software Design, Chapter 14: the Enumeration Law

In my previous post on this series, I used the well-known Shape problem to explain polymorphism from the entanglement perspective. We've seen that moving from a C-like approach (with a ShapeType enum and a union of structures) to an OO approach (with a Shape interface implemented by concrete shape classes) creates new centers, altering the entanglement between previous centers.
This time, I'll explore the landscape of entanglement a little more, using simple, well-known problems. My aim is to provide an intuitive grasp of the entanglement concept before moving to a more formal definition. In the end, I'll come up with a simple Software Law (one of many). This chapter borrows extensively from the previous one, so if you didn't read chapter 13, it's probably better to do so before reading further.

Shapes, again
Forget about the Shape interface for a while. We now have two concrete classes: Triangle and Circle. They represent geometric shapes, this time without graphical responsibilities. A triangle is defined by 3 points, a Circle by center and radius. Given a Triangle and a Circle, we want to know the area of their intersection.
As usual, we have two problems to solve: the actual mathematical problem and the software design problem. If you're a mathematician, you may rightly think that solving the first (the function!) is important, and the latter is sort of secondary, if not irrelevant. As a software designer, however, I'll allow myself the luxury of ignoring the gory mathematical details, and tinker with form instead.

If you're working in a language (like C++) where functions exist [also] outside classes, a very reasonable form would be this:

double Intersection( const Triangle& t, const Circle& c )
  {
  // something here
  }

Note that I'm not trying to solve a more general problem. First, I said "forget the Shape interface"; second, I may not know how to solve the general problem of shapes intersection. I just want to deal with a circle and a triangle, so this signature is simple and effective.

If you're working in a class-only language, you have several choices.

1) Make that an instance method, possibly changing an existing class.

2) Make that a static method, possibly changing an existing class.

3) If your language has open classes, or extension methods, etc, you can add that method to an existing class without changing it (as an artifact).

So, let's sort this out first. I hope you can feel the ugliness of placing that function inside Triangle or Circle, either as an instance or static method (that feeling could be made more formal by appealing to coupling, to the open/closed principle, or to entanglement itself).
So, let's say that we introduce a new class, and go for a static method. At that point is kinda hard to find nice names for class and method (if you're thinking about ShapeHelper or TriangleHelper, you've been living far too long in ClassNameWasteLand, sorry :-). Having asked this question a few times in real life (while teaching basic OO thinking), I'll show you a relatively decent proposal:

class Intersection
  {
  public static double Area( Triangle t, Circle c )
    {
    // something here
    }
  }

It's quite readable on the calling site:

double a = Intersection.Area( t, c ) ;

Also, the class name is ok: a proper noun, not something ending with -er / -or that would make Peter Coad (and me :-) shiver. An arguably better alternative would be to pass parameters in a constructor and have a parameterless Area() function, but let's keep it like this for now.

Whatever you do, you end up with a method/function body that is deeply coupled with both Triangle and Circle. You need to get all their data to calculate the area of the intersection. Still, this doesn't feel wrong, does it? That's the method purpose. It's ok to manipulate center, radius, and vertexes. Weird, coupling is supposed to be bad, but it doesn't feel bad here.

Interlude
Let me change problem setting for a short while, before I come back to our beloved shapes. You're now working on yet another payroll system. You got (guess what :-) a Person class:

class Person
  {
  Place address;
  String firstName;
  String lastName;
  TelephoneNumber homePhone;
  // …
  }

(yeah, I know, some would have written "Address address" :-)

You also have a Validate() member function (I'll simplify the problem to returning a bool, not an error message or set of error messages).

bool Validate()
  {
  return 
    address.IsValid() && 
    ValidateName( firstName ) && 
    ValidateName( lastName ) && 
    homePhone.IsValid() ;
  }

Yikes! That code sucks! It sucks because it's asymmetric (due to the two String-typed members) but also because it's fragile. If I get in there and add "Email personalEmail ;" as a field, I'll have to update Validate as well (and more member functions too, and possibly a few callers). Hmmm. That's because I'm using my data members. Well, I'm using Triangle and Circle data member as well in the function above, which is supposed to be worse; here I'm using my own data members. Why was that right, while Validate "feels" wrong? Don't use hand-waving explanations :-).

Back to Shapes
A very reasonable request would be to calculate the intersection of two triangles or two circles as well, so why don't we add those functions too:

class Intersection
  {
  public static double Area( Triangle t, Circle c )
    {     // something here    }
  public static double Area( Triangle t1, Triangle t2 )
    {     // something here    }
  public static double Area( Circle c1, Circle c2 )
    {     // something here    }
  }

We need 3 distinct algorithms anyway, so anything more abstract than that may look suspicious. It makes sense to keep the three functions in the same class: the first function is already coupled with Triangle and Circle. Adding the other two doesn't make it any worse from the coupling perspective. Yet, this is wrong. It's a step toward the abyss. Add a Rectangle class, and you'll have to add 4 member functions to Intersection.

Interestingly, we came to this point without introducing a single switch/case, actually without even a single "if" in our code. Even more interestingly, our former Intersection.Area was structurally similar to Person.Validate, yet relatively good. Our latter Intersection is not structurally similar to Person, yet it's facing a similar problem of instability.

Let's stop and think :-)
A few possible reactions to the above:

- I'm too demanding. You can do that and nothing terrible will happen.
Probably true, a small scale software mess rarely kills. The same kind of problem, however, is often at the core of a large-scale mess.

- I'm throwing you a curve.
True, sort of. The intersection problem is a well-known double-dispatch issue that is not easily solved in most languages, but that's only part of the problem.

- It's just a matter of Open/Closed Principle.
Yeah, well, again, sort of. The O/C is more concerned with extension. In practice, you won't create a new Person class to add email. You'll change the existing one.

- I know a pattern / solution for that!
Me too :-). Sure, some acyclic variant of Visitor (or perhaps more sophisticated solutions) may help with shape intersection. A reflection+attribute-based approach to validation may help with Person. As usual, we should look at the moon, not at the finger. It's not the problem per se; it's not about finding an ad-hoc solution. It's more about understanding the common cause of a large set of problems.

- Ok, I want to know what is really going on here :-)
Great! Keep reading :-)

C/D-U-U Entanglement
Everything that felt wrong here, including the problem from Chapter 13, that is:

- the ShapeType-based Draw function with a switch/case inside

- Person.Validate

- the latter Intersection class, dealing with more than one case

was indeed suffering from the same problem, namely an entanglement chain: C/D-U-U. In plain English: Creation or Deletion of something, leading to an Update of something else, leading to an Update of something else. I think that understanding the full chain, and not simply the U-U part, makes it easier to recognize the problem. Interestingly, the former Intersection class, with just one responsibility, was immune from C/D-U-U entanglement.

The chain is easier to follow on the Draw function in Chapter 13, so I'll start from there. ShapeType makes it easy to understand (and talk about) the problem, because it's both Nominative and Extensional. Every concept is named: individual shape types and the set of those shape types (ShapeType itself). The set is also providing through its full extension, that is, by enumerating its values (that's actually why it's called an enumerated type). Also, every concrete shape is given a name (a struct where concrete shape data are stored).

So, for the old C-like solutions, entanglement works (unsurprisingly) like this:

- There is a C/D-C/D entanglement between members of ShapeType and a corresponding struct. Creation or deletion of a new ShapeType member requires a corresponding action on a struct.

- There is a C/D-U entanglement between members of ShapeType and ShapeType (this is obvious: any enumeration is C/D-U entangled with its constituents).

- There is U-U entanglement between ShapeType and Draw, or every other function that is enumerating over ShapeType (a switch/case being just a particular case of enumerating over names).

Consider now the first Intersection.Area. That function is enumerating over two sets: the set of Circle data, and the set of Triangle data. There is no switch/case involved: we're just using those data, one after another. That amounts to enumeration of names. The function is C/D-U-U entangled with any circle and triangle data. We don't consider that to be a problem because we assume (quite reasonably) that those sets won't change. If I change my mind and represent a Circle using 3 points (I could) I'll have to change Area. It's just that I do not consider that change any likely.

Consider now Person.Validate. Validate is enumerating over the set of Person data, using their names. It's not using a switch/case. It doesn't have to. Just using a name after another amounts to enumeration. Also, enumeration is not broken by a direct function call: moving portions of Validate into sub-functions wouldn't make any difference, which is why layering is ineffective here.
Here, Validate is U-U entangled with the (unnamed) set of Person data, or C/D-U-U entangled with any of its present or future data members. Unlike Circle or Triangle, we have all reasons to suspect the Person data to be unstable, that is, for C and/or D to occur. Therefore, Validate is unstable.

Finally, consider the latter Intersection class. While the individual Area functions are ok, the Intersection class is not. Once we bring in more than one intersection algorithm, we entangle the class with the set of shape types. Now, while in the trivial C-like design we had a visible, named entity representing the set of shapes (ShapeTypes), in the intersection problem I didn't provide such an artifact (on purpose). The fact that we're not naming it, however, does not make it any less real. It's more like we're dealing with an Intensional definition of the set itself (this would be a long story; I wrote an entire post about it, then decided to scrap it). Intersection is C/D-U-U entangled with individual shape types, and that makes it unstable.

A Software Law
One of the slides in my Physics of Software reads "software has a nature" (yeah, I need to update those slides with tons of new material). In natural sciences, we have the concept of Physical law, which despite the name doesn't have to be about physics :-). In physics, we do have quite a few interesting laws, not necessarily expressed through complex formulas. The Laws of thermodynamics , for instance, can be stated in plain English.

When it comes to software, you can find several "laws of software" around. However, most of them are about process (like Conway's law, Brooks' law, etc), and just a few (like Amdhal's law) are really about software. Some are just principles (like the Law of Demeter) and not true laws. Here, however, we have an opportunity to formulate a meaningful law (one of many yet to be discovered and/or documented):

- The Enumeration Law
every artifact (a function, a class, whatever) that is enumerating over the members of set by name is U-U entangled with that set, or said otherwise, C/D-U-U entangled with any member of the set.

This is not a design principle. It is not something you must aim for. It's a law. It's always true. If you don't like the consequences, you have to change the shape of your software so that the law no longer applies, or that entanglement is harmless because the set is stable.

This law is not earth-shattering. Well, the laws of thermodynamics don't look earth-shattering either, but that doesn't make them any less important :-). Honestly, I don't know about you, but I wish someone taught me software design this way. This is basically what is keeping me going :-).

Consequences
In Chapter 12, I warned against the natural temptation to classify entanglement in a good/bad scale, as it has been done for coupling and even for connascence. For instance, "connascence of name" is considered the best, or least dangerous, form. Yet we have just seen that entanglement with a set of names is at the root of many problems (or not, depending on the stability of that name set).

Forces are not good or evil. They're just there. We must be able to recognize them, understand how they're influencing our material, and deal with them accordingly. For instance, there are well-known approaches to mitigate dependency on names: indirection (through function pointers, polymorphism, etc) may help in some cases; double indirection (like in double-dispatch) is required in other cases; reflection would help in others (like Validate). All those techniques share a common effect, that we'll discuss in a future post.

Conclusions
Speaking of future: recently, I've been exploring different ways to represent entanglement, as I wasn't satisfied with my forcefield diagram. I'll probably introduce an "entanglement diagram" in my next post.

A final note: visit counters, sharing counters and general feedback tell me that this stuff is not really popular. My previous rant on design literature had way more visitors than any of my posts on the Physics of Software, and was far easier to write :-). If you're not one of the usual suspects (the handful of good guys who are already sharing this stuff around), consider spreading the word a little. You don't have to be a hero and tell a thousand people. Telling the guy in front of you is good enough :-).

11 comments:

Unknown said...: Maybe to make NOSD series more interesting to many people you could actully give some little advice on how to solve some problems presented (a solution is always a good attractive method ;-). That's not the goal of the posts, but, hey, it's marketing! :); 1:47 PM
Romano Scuri said...: It may not be the case that the simple concepts are easier to follow than the more complicated and unusual?

I repeat myself. I think you overestimate the average reader.; 9:03 AM
Carlo Pescio said...: Romano: maybe, but in a sense, I don't want to think so. That would condemn everyone to everlasting mediocrity, whereby we only write simple stuff, and read simple stuff, because we only like and share simple stuff. I prefer to believe that the problem lies in the subject, the writing style, the timing, whatever.
That said, it's more and more obvious that I will have to cut most of the material, present a few more concepts in perhaps 3 posts, and move on to something else.

Fulvio: you mean "a reflection+attribute-based approach to validation" wasn't enough of an advice? :-)))
In the end, it would be much simpler to change the approach altogether, and write "idea papers" where I provide hints to the physics of software. That means I won't have time to develop the theory any further. Oh well, can't have it all...; 8:41 AM
Unknown said...: eheh hit and sunk! Ovbiously I read that advice, but the comment were about the series, not this specific post. Basically learning about your theory is a path to learn the way to think how to solve problems ;-) But it's not precooked food, so it could be less interesting to many people! :); 9:19 AM
Unknown said...: eheh I was referring to the series, not the chapter 14 per se ;) Eventually your path it's a good way to reason about problem and figure out good solutions, so I really appreciate your approch (I'm between the smart readers, I hope :-)); 7:58 PM
cyrille said...: I'm glad to see the concepts of extensional/intensional mentioned in this post, they are really relevant for design to hedge against change.

Also your Enumeration Law makes perfect sense, though I find the sentence "depending on the stability of that name set ... is at the root of many problems" the easiest way to share it with colleagues. Now I have clearer arguments on why I love so much the Java enums, and why I much prefer using predicates rather than fixed lists to select stuff out of sets that can grow or change.

On the form, I believe these ideas are just too fresh right now; once you're done with an overall first pass, they would deserve complete re-writing, together with multiple reviewers, on how to express the meat of the ideas best. But it is normal to start with a focus on the content, which is too new to be popular yet.; 2:05 PM
Carlo Pescio said...: Cyrille, I certainly agree: once I manage to cover most concepts informally (if ever :-), a complete rewriting will be needed. Hope you'll volounteer as a reviewer at that point :-); 4:08 PM
thvv said...: I like this article. I am sorry you aren't getting many hits.. it takes time to read, and people are impatient.

Intersection does not feel like an object to me. Perhaps Circle and Triangle are Figures, and Figure has a factory Figure.intersection(a, b)
that creates a new Figure, whose area and position we can obtain from it. Then I don't have to remember the order of arguments.

There has to be a "null" Figure in case the objects don't overlap.. it has zero area, fine, but what is its position? and are all null Figures equal? urk.; 8:03 PM
Jonah said...: Hi Carlo,

I just want to say your approach and ideas are profound and original imo. They are crystallizing many vague concepts that I've learned from other "gurus". I plan to read all of your posts now.

Please don't stop your work on the Physics of Software -- I would truly love to see a book come to fruition.; 5:49 PM
Carlo Pescio said...: Jonah, thanks a lot. Every once in a while, I get some positive feedback on this work, even though the global picture is that most people aren’t really fond of it.

I’m still pursuing this line of investigation, and I hope too that, at some point, ideas will coalesce into something coherent enough to deserve a book :-); 6:31 PM
Unknown said...: This comment has been removed by a blog administrator.; 8:24 PM

Carlo Pescio

Pages

Sunday, February 27, 2011

Notes on Software Design, Chapter 14: the Enumeration Law

11 comments:

some of my websites / projects