Carlo Pescio: July 2007

Tuesday, July 17, 2007

Get the ball rolling, part 3 (of 4, pretty sure)

In the last few months I've been repeating a mantra (also in this blog): design is always highly contextual.
So, before taking a look at what I did, it's important to consider why I did it. It's also important to conside how I did it, as to put everything in perspective.

Why
- I use models quite often, but I'm not a visual modeling zealot. I know, from experience, that visual modeling can be extremely valuable in some cases, and totally useless in others (more on this next time). I can usually tell when visual modeling can be useful, using a mixture of experience, intuition, and probably some systematic rules I never took the time to write down.
When I first read the XP episode, I've been negatively impressed by the final results, and intuitively felt that I could get something better by not rushing into coding. So a first reason was to prove that point (to myself first :-).

- I aimed to remove all the bad smells I've identified in their code, without introducing others. I also wanted to prove a point: that a careful designed solution didn't have to be a code monster, as implied elsewhere. I wanted the final result to have basically the same number of rows as they had. Maybe less. Therefore, I wrote the code in a line-conscious way.

- I wanted the comparison to be fair. Therefore, I did not aim for the super-duper, ultimate bowling framework. I just aimed for a better design and better code, through a different process. I'll talk a little about what it would take to turn my design into a true "bowling framework" in my next post.

- Actually, I found the idea of not writing the ultimate bowling framework extremely useful. There has been a lot of babbling about Real Options in software development in the last few years, and I even mention a few concepts in my project management course.
However, I lacked a simple example where I could explain a few concepts better. By (consciously :-) under-engineering some parts of the design, I got just the example I needed (more on this next time).

- I also wanted to check how many bugs I could possibly put in my code without using TDD. I would just write the code, run all their tests, and see how many bugs I had to fix before I got a clean run.

How
I must confess I procrastinated this stuff for quite a while. In some way, it didn't feel totally right to criticize other people's work like I had to do. But after all, by writing this I'll open my work to critics as well. So one evening, at about 10 PM (yeap :-), I printed out the wikipedia page on bowling, fired up a CASE tool (just for drawing), and got busy.

Unfortunately, I can't offer you a realistic transcript of what I thought all along. When you really use visual modeling, you're engaged in a reflective conversation with your material (see my paper, "Listen to your tools and materials"; I should really get a pre-publishing version online). You're in the flow, and some reasoning that will take a while to explain here can take just a few seconds in real time.

Anyway, I can offer a few distinct reflections that took place as I was reading the wikipedia page and (simultaneously) designing my solution. Note that I entirely skipped domain modeling; the domain was quite simple, and the wikipedia page precise enough. Also, note that I felt more comfortable (coming from 0-knowledge of bowling) reading wikipedia, not reverse engineering requirements from the XP episode. That might be my own personal bias.

I immediately saw quite a few candidate classes: the game itself, the player, the pins, the lane, the frame, the ball. Bertrand Meyer said that classes "are here for picking", and in this case he was quite right.
I never considered the throw as a class though. Actually, "throw" can be seen as a noun or as a verb. In this context, it looked more like a verb: you throw the ball and hit pins. Hmmm, guess what, they didn't look at it this way in the XP episode; they sketched a diagram with a Throw class and rushed into coding.

Rejecting classes is not easy without requirements, even informal ones. For instance, if I have to design a real automatic scoring system, there will be a sensor in the lane, to detect the player crossing the foul line. There will also be the concept of foul ball, which was ignored in the XP episode. So (see above, "fair comparison") I decided that my "requirements" were to create a system with the same capabilities as they did. No connection to sensors, no foul ball. But, I wanted my design to be easily extended to include that stuff. So I left out the Lane class (which could also encapsulate any additional sensor, like one in the gutter). But I wanted a Ball class; more on this later.

Game is an obvious class: we need a starting point, and also a sort of container for the Frames (see below) and for Balls. The Game has a need to know when the current frame is completed (all allowed balls thrown, or strike), so it can move to the next frame. But while advancing to the next frame is clearly its own responsibility, knowledge of frame completion is not (see below).

The Frame looked like a very promising abstraction. If you look at the throwing and scoring rules, it's pretty obvious that in 10-pin bowling you have 9 "normal" frames and a 10th "non standard" frame, where up to 3 balls can be thrown, as there is the concept of bonus ball (hence the imprecise domain model in the XP episode, where they just wrote 0..3 throws in a frame). Also in 3-6-9 bowling you have "special" frames, and so on.
Does this distinction between standard and nonstandard frames ring the bell of inheritance? Good. Now if you want the frame to be polymorphic, you gotta put some behaviour in it. Of course you can do that by reasoning upon the diagram: you don't need to write test code to discover that hey, you dunno what method to put in a Frame class.
Anyway, they may not know, but I do :-). The Frame can then tell the Game if it is completed, shielding the Game from the difference between a standard or last frame (or "strike frames" in 3-6-9 bowling, for instance). To do so, the Frame must know about any Ball that has been thrown in that frame, so it may also have a Throw method. The Frame should also calculate its own score. Wait... that's something the Frame can't do on its own.

In fact, the two authors have been smart enough to pick a problem where trivial encapsulation won't do it. The Frame needs to know about balls that might have been thrown in other frames to calculate its score. Now, there are several ways to let it know about them, and I did consider a few, but for brevity, here is what I did (the simplest thing): the Score method in class Frame takes a list of Balls as a parameter; the Game keeps a list of Balls and passes it to Frame. The Frame just keeps track of the first Ball thrown in the frame itself. From that knowledge, it's trivial to calculate the score. The LastFrame class is just slightly different, as the completion criteria changes to account for bonus balls; I also have to redefine the Throw method, again to account for bonus balls.

So what is the real need for a Ball class? Well, right now, it keeps track of how many pins were hit, and its own index in the Balls list. However, it's also a cheap option for growth. More on this next time, where I'll talk about the difference between having even a simple, non-encapsulated class like Ball, and using primitive types like integers.

What
In the end I got something like this (except I had no Status: in the beginning, I put a couple of booleans for Strike and Spare).

At that point, I decided to move to coding. In the XP episode the selected language is Java, but I don't like the letter J much :-) so I used C#. Given the similarity of the languages, the difference is irrelevant anyway. I didn't even want to wait for Visual Studio to come up, so I just started my friendly Notepad and typed the code without much help from the editor :-). Classes and methods were just a few lines long anyway. At that point it was 11 PM, and I called it a day. (So all this stuff took about 1 hour, between BUFD :->> and coding).

Next morning I was traveling, so I started Visual Studio, imported the code, corrected quite a few typos to make it compile, then imported the test cases from the XP episode, converted them to the NUnit syntax, and hit the run button. Quite a few failed. I had a stupid bug in LastFrame.Throw. I fixed the bug. All tests succeeded. While playing with the code, I saw an opportunity to make it shorter by adding a class (an enum actually) that could model the frame state. I did so in the code, as the change was simple and neat. I also brought back the change in the diagram, which took me like 10 seconds (too much for the XP guys, I guess). I run the tests, and they failed, exactly like before :-).
I could blame the editor and the autocompletion, but in LastFrame.Throw I had something like

if( status != Status.Spare && status != Status.Spare )

instead of

if( status != Status.Strike && status != Status.Spare )

Ok, I'm pretty dumb :-), but anyway, I fixed the bug and got a green bar again.

That was it. Well actually it wasn't. In my initial design I had a MaxBalls virtual method in Frame. That allowed for some more polymorphism in Game. In the end I opted for a constant to make the code a few line shorter, since the agile guys seem so concerned about LOC (again, more on this next time). I'm not really so line-conscious in my everyday programming, so I would have usually left the virtual method there (no YAGNI here; or maybe I could restate is You ARE Gonna Need It :-)).

So, here is the code. Guess what, if you remove empty lines, it's a few line shorter than their code. If you remove also brace-only lines, it's a few lines longer. So we could call it even. Except it's just much better :-).

Ok, this is like the longest post ever. Next time I'll talk a little about the readability of the code, give some examples of how this design/code could be easily changed to (e.g.) 3-6-9 bowling, digress on structural Vs. procedural complexity, tinker a little with real options and under/over engineering, and overall draw some conclusions.

internal class Ball
  {
  public Ball( int pins, int index )
    {
    hitPins = pins;
    indexInGame = index;
    }

  public int hitPins;
  public int indexInGame;
  }

internal class Frame
  {
  public const int MaxBalls = 2;

  public virtual bool IsComplete()
    {
    return status != Status.Incomplete ;
    }

  public int Score( Ball[] thrownBalls )
    {
    int score = 0;
    if( IsComplete() )
      {
      score = firstBall.hitPins + thrownBalls[ firstBall.indexInGame + 1 ].hitPins;
      if( status == Status.Strike || status == Status.Spare )
        score += thrownBalls[ firstBall.indexInGame + 2 ].hitPins;
      }
    return score;
    }

  public virtual void Throw( Ball b )
    {
    const int totalPins = 10;
    if( firstBall == null )
      firstBall = b;
    if( b == firstBall && b.hitPins == totalPins )
      status = Status.Strike;
    else if( b != firstBall && firstBall.hitPins + b.hitPins == totalPins )
      status = Status.Spare;
    else if( b != firstBall )
      status = Status.Complete;
    }

  private Ball firstBall;
  protected Status status;
  protected enum Status { Incomplete, Spare, Strike, Complete } ;
  }

internal class LastFrame : Frame
  {
  public new const int MaxBalls = Frame.MaxBalls + 1;

  public override bool IsComplete()
    {
      return ( status == Status.Strike && bonusBalls == 2 ) ||
             ( status == Status.Spare && bonusBalls == 1 ) ||
             ( status == Status.Complete );
    }

  public override void Throw( Ball b )
    {
    if( status != Status.Strike && status != Status.Spare )
      base.Throw( b );
    else
      ++bonusBalls;
    }

  private int bonusBalls;
  }

public class Game
  {
  public Game()
    {
    const int MaxFrames = 10;
    frames = new Frame[ MaxFrames ];
    for( int i = 0; i < MaxFrames - 1; ++i )
      frames[ i ] = new Frame();
    frames[ MaxFrames - 1 ] = new LastFrame();
    int maxBalls = (MaxFrames-1)*Frame.MaxBalls + LastFrame.MaxBalls;
    thrownBalls = new Ball[ maxBalls ];
    }

  public void Throw( int pins )
    {
    if( frames[ currentFrame ].IsComplete() )
      ++currentFrame;
    Ball b = new Ball( pins, currentBall );
    frames[ currentFrame ].Throw( b );
    thrownBalls[ currentBall++ ] = b;
    }

  public int Score()
    {
    int score = 0;
    foreach( Frame f in frames )
      score += f.Score( thrownBalls );
    return score;
    }

  private Ball[] thrownBalls;
  private Frame[] frames;
  private int currentFrame;
  private int currentBall;
  }

Saturday, July 14, 2007

Get the ball rolling, part 2 (of 4, most likely)

I hope you had some time to read the XP episode paper and ponder a little on what they managed to get in the end. As I've anticipated, I do I have a few issues with the results. It's not a matter of correctness: they pass all the tests. It's a matter of quality.
The design is questionable, and even the code is questionable. No big deal: after all, it's a little more than a toy example. It gets slightly worse when you consider all this stuff in perspective (at the end of the XP game, code is all you get; I'll get back to this later). Right now, here are the main flaws I see:

- overabundance of numerical literals.
This is programming 101: literals are bad (although they keep your code a few line shorter). Go over the code. You'll see numbers like 10 just about everywhere. Unfortunately, in bowling you got 10 pins and also 10 frames. You always have to read carefully to understand which is which. Now go back to the wikipedia page on bowling. See the mention of 5-pin bowling, 9-pin bowling, etc? Just go ahead and change the code to use 9 pins and 10 frames(9-pin bowling ain't that simple, but anyway). Guess what, none of the test cases will help, and it's not a one-line change as it ought to be. There is even a 21 in that code, being 10 * 2 + 1. I'll let you figure what that 10, 2, and 1 are :-).

- programming against an implementation, not against the problem
Some portion of the code are indeed quite good:

public int scoreForFrame(int theFrame)
  {
    ball = 0;
    int score=0;
    for (int currentFrame = 0;
         currentFrame < theFrame;
         currentFrame++)
    {
      if (strike())
        score += 10 + nextTwoBalls();
      else if (spare())
        score += 10 + nextBall();
      else
        score += twoBallsInFrame();
    }
    return score;
  }

You can basically read it in plain english. Unfortunately, most of the rest is not of the same quality, for instance:

public void add(int pins)
  {
    itsScorer.addThrow(pins);
    adjustCurrentFrame(pins);
  }
  private void adjustCurrentFrame(int pins)
  {
    if (firstThrowInFrame == true)
    {
      if (adjustFrameForStrike(pins) == false)
        firstThrowInFrame = false;
    }
    else
    {
      firstThrowInFrame=true;
      advanceFrame();
    }
  }
...
  private void advanceFrame()
  {
    itsCurrentFrame = Math.min(10, itsCurrentFrame + 1);
  }

"Adjust" a frame? Couldn't we get it right on the first place :-)? Even a Math.min call?? C'mon. There isn't any resemblance with the problem domain. It's just programming against the implementation, until things seems to work. Now, I've seen a lot of code like this over the years; it's the typical code people get after years of tweaking, except here it didn't take years, just hours.
Now, tweaking code till it works is already bad per se, but it's much worse when you consider that in XP, at the end of the game, the code is the design, and the test cases are the requirements. Again, everything is fine when the domain is trivial, largely known, well documented elsewhere (like in this case). But if we tackle a difficult new problem in an uncharted domain, and all we leave behind is code like that, I pity the souls that will join the team.

- Feature Envy: Game over Frame
According to Fowler, a method (or worse, a class) is exhibiting Feature Envy
when it "seems more interested in a class other than the one it actually is in". A variation of Feature Envy is when the other class does not even exist, but it is continually talked about in a method (or another class). Take Game. Excluding empty lines. Game is 45 lines long; if you exclude brace-only lines, it's 25 lines long. Of those, 16 contain the word "Frame". Is that suggesting you a missing abstraction?

- Feature Envy: Scorer over Ball
Just like above, Scorer is 57 lines long excluding empty lines (35 excluding also braces). Of those, 15 contain the word Ball (and 6, the word Frame). C'mon. We're talking about bad smells here, you agile guys are supposed to fix this stuff.

(note: the two previous issues may come from hindsight, as in my design, I do have a Frame and a Ball class).

- "Scorer"?
Good ol' Peter Coad once talked about the "er-er Principle": Challenge any class name that ends in "-er" (e.g. Manager or Controller). If it has no parts, change the name of the class to what each object is managing. If it has parts, put as much work in the parts that the parts know enough to do themselves.
That's OOP 101, because if you need a manager, it's because you got passive objects, and objects are about behaviour, not just data. You need a scorer because you haven't got the right classes.

Overall, this is not code you may want to maintain. Wanna give it a try? Ok, try to change to rules to, e.g., 3-6-9 bowling or low-ball bowling. Or make your own random rule, like "the 7th frame is only allowed one ball". See how easy it is. Hey, don't give me that YAGNI look :-), you are supposed to embrace change :-)).
It's worth repeating that in a real case, you'll be much worse off: you won't have a lenghty transcript of the session, telling you about how they come to put together that code. You'll have just the code; except that, in real-world projects, code will be 3 or 4 orders or magnitude bigger. You'll have to figure out everything else just from the code. If that's the average quality, well, good luck.

Of course, having only code to maintain means cheaper maintenance. Well, more exactly, it means that the mechanical act of maintenance (modifying stuff that needs to be changed) is cheaper. Of course, maintenance is not just about the mechanics: it is (mainly) about understanding the existing, understanding the new, understanding how to get from here to there.
Some redundancy (between wordy requirements, high-level design, and code) can help. Indeed, when I decided to design & implement my version of what above, I found it more convenient to read the wikipedia page, not to reverse engineer requirements from code. Of course, I also used the test cases they wrote to corroborate my understanding of the rules. Some redundancy can be helpful. Just ask your favorite aerospace engineer :-)).
This is why I always teach people to look for the bright and the dark side of everything, redundancy included (which of course, has well known shortcomings as well). That's why I don't like "extreme" approaches that pretend there is no downside in making "extreme" choices, only benefits.

So, on the bright side :-)), the code above is short (excluding blank lines and excluding test code, it totals 102 lines). Indeed, in a usenet post RCM reports that people who have "generated code" from the devilish overdesigned :-) UML diagram and added behavior got a whooping 400 lines.
Big deal. That was (in the authors' intention) a sketchy domain model, not a sensible design model. And code generation is usually a scam. That has nothing to do with diagrammatic reasoning (quite the opposite, I would say: it's diagrammatic nonreasoning: more on this next time).
Back to the bright side of the above, that code uses only 2 classes, which could be good if you don't like diagrams (which would help you understand the collaboration between multiple classes). Appalling enough, considering that the two authors together have something like 70 years programming experience, it looks like beginners' code. Which could be good in some environments. I'll get back to this another time.

Ok, enough bad cop for today. I'll show you my stuff in a few days (meanwhile, you can grind your teeth :-). I'll also describe my "process" and then draw some conclusions.

Wednesday, July 11, 2007

Get the ball rolling, part 1 (of 4, I guess)

A few weeks ago, I got a sad email from a reader. It was about some people around him bashing UML, and more generally, diagrammatic reasoning.
This is not big news: some people just don't like modeling (for various reasons), and in the last few years they've also found fertile ground in several "agile" precepts.
Indeed, his colleagues just sent him a link to Engineer Notebook: An Extreme Programming Episode, by Robert C. Martin and Robert S. Koss. Somewhere around the end, RCM says "We were bedevilled by the daemons of diagramatic overdesign. My God, three little boxes drawn on the back of a napkin, Game, Frame, and Throw, and it was still too complicated and just plain wrong." His colleagues sort of wanted this carved on his tombstone :-).

Now, I've been reading Robert Martin's works since the mid 90s. I've enjoyed several of his papers, and I've even published a couple of papers in C++ Report when he was the magazine editor. I must admit, however, that although I own the book from which that paper has been derived, I just skimmed through it. So, before saying anything, I did the right thing™ and read the paper carefully. Truth to be said, I found that paper laudable, because by providing a realistic example, Robert gives us an opportunity to tackle the same problem in a different way, and compare the results.

Of course, the problem is on a very small scale, so someone could complain that it's not the kind of setting where diagrammatic reasoning shines. That would be a lame excuse, so I won't say that.
I won't even question the fact that they both knew Bowling quite well, and that therefore the dynamics isn't really representative of what they can do in an unfamiliar domain (in fact, there is no user's involvement in the XP episode). Of course that's quite relevant, but what the hell, I can learn bowling rules in no time :-).
I won't even try to assess whether they were faithful to the XP principles or not; it seems to me that they did it just fine: they kept the emphasis on working code, did some refactoring, wrote tests first, applied pair programming... seems like they're doing it by the book.

Still, I do have issues with the quality of the results, and I do have issues with the conclusions. I'll save this stuff for my next posts. Meanwhile, I really urge you to read the paper. It may put you off a little if you don't know/play bowling (I don't :-), but it's a worthy exercise. Read the code, see if you like it. I'll see you in a few days :-).

Note: this is not going to be a theoretical, wishy-washy, hand-waving blurb. I can play theories, but in this case, it's much easier to play code. In fact, one year ago or so I posted a simple challenge for the TDD crowd. I basically got back theories, insults, and the like, but no one wrote any damn code (except in the end, when a good guy did). I'm an action guy :-))), so I'll just show you what I don't like, how I would model and implement the same thing, and compare the results. Actually, maybe you wanna give it a try too!

Carlo Pescio

Pages