Carlo Pescio: quality

Showing posts with label quality. Show all posts

Wednesday, January 19, 2011

Can we really control the quality/cost trade-off?

An old rule in software project management (also known as the Project triangle is that you can have a system that:
- has low development cost (Cheap)
- has short development time (Fast)
- has good quality (Good)
except you can only choose two :-). That seem to suggest that if you want to go fast, and don't want to spend more, you can only compromise on quality. Actually, you can also compromise on feature set, but in many real-world cases, that happens only after cost, time and quality have already been compromised.

So, say that you want to compromise on quality. Ok, I know, most engineers don't want to compromise on quality; it's almost like compromising on ethics. Still, we actually do, more often that we'd like to, so let's face the monster. We can:

- Compromise on external quality, or quality as perceived by the user.
Ideally, this would be simply defined as MTBF (Mean Time Between Failures), although for many users, the perceived "quality" is more akin to the "user experience". In practice, it's quite hard to predict MTBF and say "ok, we won't do error checking here because that will save us X development days, and only decrease MTBF by Y". Most companies don't even measure MTBF. Those who do, use it just as a post-facto quality goal, that is, they fix bugs until they reach the target MTBF. We just don't know enough, in the software engineering community, to actually plan the MTBF of our product.

- Compromise on internal quality, or quality as perceived by developers.
Say goodbye to clarity of names, separation of concerns, little or no duplication of logic, extendibility in the right places, low coupling, etc. In some cases, poor internal quality may resurface as poor external quality (say, there is no error handling in place, so the code just crashes on unexpected input), but in many cases, the code can be rather crappy but appear to work fine.

- Compromise on process, or quality as perceived by the Software Engineering Institute Police :-).
What usually happens is that people start bypassing any activity that may seem to slow them down, like committing changes into a version control system :-) or being thorough about testing. Again, some issues may resurface as external quality problems (we don't test, so the user gets the thrill of discovering our bugs), but users won't notice if you cut corners in many process areas (well, they won't notice short-term, at least).

Of course, a "no compromises" mindset is dangerous anyway, as it may lead to an excess of ancillary activities not directly related to providing value (to the project and to users). So, ideally, we would choose to work at the best point in the quality/time and/or quality/cost continuum (the "best point" would depend on a number of factors: more on this later). Assuming, however, that there is indeed a continuum. Experience taught me the opposite: we can only move in discrete steps, and those are few and rather far apart.

Quality and Cost
Suppose I'm buying a large quantity of forks, wholesale. Choices are almost limitless, but overall the real choice is between:

- the plastic kind (throwaway)

it works, sort of, but you can't use it for all kinds of food, you can't use it while cooking, and so on.

- the "cheap steel" kind.

it's surely more resistant than plastic: you can actually use it. Still, the tines are irregular and feel uncomfortable in your mouth. Weight isn't properly balanced. And I wouldn't bet on that steel actually being stainless.

- the good stainless steel kind.

Properly shaped, properly balanced, good tasting :-), truly stainless, easy to clean up. You may or may not have decorations like that: price doesn't change so much (more on this below).

- the gold-plated kind.

it shows that you got money, or bad taste, or both :-). Is not really better than the good stainless fork, and must be handled more carefully. It's a real example of "worse is better".

Of course, there are other kind of forks, like those with resin or wood handles (which ain't really cheaper than the equivalent steel kind, and much worse on maintenance), or the sturdy plastic kind for babies (which leverage parental anxiety and charge a premium). So, the quality/cost landscape does not get much richer than that. Which brings us to prices. I did a little research (not really exhaustive – it would take forever :-), but overall it seems like, normalizing on the price of the plastic kind, we have this price scale:

Plastic: 1
Cheap steel: 10
Stainless steel: 20
Gold-plated: 80

The range is huge (1-80). Still, we have a big jump from throwaway to cheap (10x), a relatively small jump from cheap to good (2x), and another significant jump between good and gold-plated (4x). I think software is not really different. Except that a 2x jump is usually considered huge :-).

Note that I'm focusing on building cost, which is the significant factor on market price for a commodity like forks. Once you consider operating costs, the stainless steel is the obvious winner.

The Quality/Cost landscape for Software
Again, I'll consider only building costs here, and leave maintenance and operation costs aside for a moment.

If you build throwaway software, you can get really cheap. You don't care about error handling (which is a huge time sink). You copy&paste code all over. You proceed by trial and error, and sometimes patch code without understanding what is broken, until it seems to work. You keep no track of what you're doing. You use function names like foo and bar. Etc. After all, you just want to run that code once, and throw it away (well, that's the theory, anyway :-).
We may think that there is no place at all for that kind of code, but actually, end-user programming often falls into that case. I've also seen system administration scripts, lot of scientific code, and quite a bit of legacy business code that would easily fall into this category. They were written as throwaways, or by someone who only had experience writing throwaways, and somehow they didn't get thrown away.

The next step is very far apart: if you want your code to be any good, you have to spend much more. It's like the plastic/"cheap steel" jump. Just put decent error handling in place, and you'll spend way more. Do a reasonable amount of systematic testing (automated or not) and you're gonna spend more. Aim for your code to be reasonably readable, and you'll have to think more, go back and refactor as you learn more, etc. At this stage, internal quality is still rather low; code is not really stainless. External quality is acceptable. It doesn't feel good, but it gets [some] work done. A lot of "industrial" code is actually at this level. Again, maintenance may not be cheap, but I'm only considering building costs here; a 5x to 10x jump compared to throwaway code is quite reasonable.

The next step is when you invest more in -ilities, as well as in external quality through more systematic and deeper testing. You're building something that you and your users like. It's truly stainless. It feels good. However, it's hard to be "just a little stainless". It's hard to be "somehow modular" or "sort of thread-safe". Either you're on one side, or you're on the other. If you want to be on the right side, you're gonna spend more. Note that just like with fork, long-term cost is lower for good, stainless code, but as far as building cost is concerned, a 2X increase compared to "cheap steel" code is not surprising.

Gold-plating is obvious: grandiose design that serves no purpose (and that would appear as gratuitous complexity to a good designer), coding standards that add no value (like requiring a comment on each and every parameter of every function), etc. I've seen systems spending a lot of energy trying to be as scalable as Amazon, even though they only had to serve a few hundred concurrent clients. Goldplated systems spend more on building, and may also incur higher maintenance costs afterward (be careful not to scratch the gold :-). Of course, it takes experience to recognize goldplating. To someone that has never designed a system for testability, interfaces for mocking may seem like a waste. To an experienced designer, they tell a different story.

But software is... soft!
As usual, a metaphor can be stretched only so far. Software quality can be easily altered over time. Plastic can't be transformed into gold (if you can, give me a call :-), but we can release a bug-ridden version and fix issues later. We may incur intentional technical debt and pay it back in the next release. This could be the right strategy in the right market at the right time, but that's not always the case.

Indeed, although start-ups, customer discovery, innovative products, etc seem to get all the attention lately, if you happen to work in a mature market you may want to focus on early-on quality. Some customers are known to be forgiving about quality issues, in exchange for early access to innovative products. It's simply not the general case.

Modular Quality?
Theoretically, we could invest more in some "critical" components and less in others. After all, we don't need the same quality (external and internal) in every single portion. We can accept quirks here and there. Due to the fractal nature of software, this can be extended to the entire hierarchy of artifacts. It's surely possible. However, it takes a lot of maturity to develop on the edge of quality. Having to constantly reason about the quality/cost trade off can easily cost you more than you're saving: I'd rather spend that reasoning to create quality code.

Once again, however, we may have a plan to start with plastic and grow into stainless steel over time. We must be aware that it is easier said than done. I've seen it many times over my long career. Improving software quality is just another case of moving our software in the decision space. We decided to make software out of plastic early on, and now we want to move it to another place, say the cheap steel spot. Some work is required. Work is proportional to the distance (which is large – see above), but also to the opposing force. Whatever form that force takes, a significant factor is always the mass to be moved, and here lies the key. It's hard to move a monolithic mass of code into another place (in the decision space). It's easier if you can move it a module at time.

Within a small modular unit, you can easily improve local quality. Change a switch/case into polymorphism. Add an interface to decouple two concrete classes. It's easy and it's cheap, because mass is small. Say, however, that you got your modular structure wrong: an important concept has not been identified, and is now scattered all around. You have to tweak a lot of code. While doing so, you'll find that you're not just fighting the mass you have to move: you have to fight the gravitational attraction of the surrounding code. That may reveal more scattered concepts. It's a damn lot of work, possibly more than it's worth. Software is not necessarily soft.

So, can we really move software around? Yes, in the small. Yes, if you got the modular structure right, and you're only cleaning up within that modular structure. Once again, the fractal nature of software can be our best friend or our worst enemy. Get the right partitioning between applications / services, and you only have to work inside. Get the right partitioning between components, and you only have to work inside. And so on. It's fine to have a plastic component, inasmuch as the interface is right. Get the modular structure wrong, and sooner or later you'll need to make wide-range changes, or accept a "cheap steel" quality. If we got it wrong at a small scale, it's no big deal. At a large scale, it can be a massacre.

Conclusions
Once again, modularity is the key. Not just "any" modularity, but a modularity aligned with the underlying forcefield. If you get that right, you can build plastic components and evolve them to stainless steel quality over time. Been there, done that. Get your modular structure wrong, and you'll lack locality of action. In other words: if you plan on cutting corners inside a module, you have to invest at the interface level. That's simpler if you design top-down than if you're in the "emergent design" camp, because the module will be underengineered inside, and we can't expect a clean interface to emerge from underengineered code.

Wednesday, March 19, 2008

(Simple) Metrics

I've been using metrics for a long time (certainly more than 10 years now). I've been using metrics to control project quality (including my own stuff, of course), to define acceptance criteria for outsourced code, to understand the way people work, to "smell" large projects before attempting a refactoring activity, to help making an informed refactor / rewrite decision, to pinpoint functions or classes in need of a careful review, to estimate residual bugs, an so on.

Of course, I use different metrics for different purposes. I also combine metrics to get the right picture. In fact, you can now find several tools to calculate (e.g.) code metrics. You can also find many papers discussing (often with contradictory results) the correlation between any given metric and (e.g.) bug density. In most cases, those papers are misguided, as they look for correlation between a single metric and the target (like bug density). Reality is not that simple; it can be simplified, but not to that point.

Consider good old cyclomatic complexity. You can use it as-is, and it can be useful to calculate the minimum reasonable number of test cases you need for a single function. It's also known that functions with higher cyclomatic complexity tend to have more bugs. But it's also well known that (on average) there is a strong, positive correlation between cyclomatic complexity (CC) and lines of code (LOC). That's really natural: long functions tend to have a complex control flow. Many people have therefore discounted CC, as you can just look at the highly correlated (and easier to calculate) LOC. Simple reasoning, except it's wrong :-).

The problem with that, again, is trying to use just one number to understand something that's too complex to be represented by a single number. A better way is to get both CC and LOC for any function (or method) and then use quadrants.

Here is a real-world example, albeit from a very small program: a smart client invoking a few web services and dealing with some large XML files on the client side. It has been written in C# using Visual Studio, therefore some methods are generated by the IDE. Also, the XML parser is generated from the corresponding XSD. Since I'm concerned with code which is under the programmer's control, I've excluded all the generated files, resulting in about 20 classes. For each method, I gathered the LOC and CC count (more on "how" later). I used Excel to get the following picture:

As you can see, every method is just a dot in the chart, and the chart has been split in 4 quadrants. I'll discuss the thresholds later, as it's more important to understand the meaning of each quadrant first.

The lower-left quadrant is home for low-LOC, low-CC methods. These are the best methods around: short and simple. Most code ought to be there (as it is in this case).

Moving clockwise, the next you get (top-left) is for high LOC, low CC methods. Although most coding standards tend to somehow restrict the maximum length of any given method, it's pretty obvious that a long method with a small CC is not that bad. It's "linear" code, likely doing some initialization / configuration. No big deal.

The next quadrant (top-right) is for high LOC, high CC methods. Although this might seem the worst quadrant, it is not. High LOC means an opportunity for simple refactoring (extract method, create class, stuff like that). The code would benefit from changes, but those changes may require relatively little effort (especially if you can use refactoring tools).

The lower-right quadrant is the worst: short functions with high CC (there are none in this case). These are the puzzling functions which can pack a lot of alternative paths into just a few lines. In most cases, it's better to leave them alone (if working) or rewrite them from scratch (if broken). When outsourcing, I usually ask that no code falls in this quadrant.

For the project at hand, 3 classes were in quadrant 3, so candidate for refactoring. I took a look, and guess what, it was pretty obvious that those methods where dealing with business concerns inside the GUI. There were clearly 3 domain classes crying to be born (1 shared by the three methods, 1 shared by 2, one used by the remaining). Doing so brought to better code, with little effort. This is a rather ordinary experience: quadrants pinpoint problematic code, then it's up to the programmer/designer to find the best way to fix it (or decide to leave it as it is).

A few words on the thresholds: 10 is a rather generous, but somewhat commonly accepted threshold for CC. The threshold for LOC depends heavily on the overall project quality. I've been accepting a threshold of 100 in quality-challenged projects. As the quality improves (through refactoring / rewriting) we usually lower the threshold. Being a new development, I adopted 20 LOC as a rather reasonable threshold.

As I said, I use several different metrics. Some can be used in isolation (like code clones), but in most cases I combine them (for instance, code clones vs. code stability gives a better picture of the problem). Coupling and cohesion should also be considered as pairs, never as single numbers, and so on.
Quadrants are not necessarily the only tool: sometimes I also look at the distribution function of a single metric. This is way superior to what too many people tend to do (like looking at the "average CC", which is meaningless). As usual, a tool is useless if we can't use it effectively.

Speaking of tools, the project above was in C#, so I used Source Monitor, a good free tool working directly on C# sources. Many .NET tools work on the MSIL instead, and while that may seem like a good idea, in practice it doesn't help much when you want a meaningful LOC count :-).

Source Monitor can export in CSV and XML. Unfortunately, the CSV didn't contain the detailed data I wanted, so I had to use the XML. I wrote a short XSLT file to extract the data I needed in CSV format (I suggest you use the "save as" feature, as unwanted spacing / carriage returns added by browsers may cripple the result). Use it freely: I didn't put a license statement inside, but all [my] source code in this blog can be considered under the BSD license unless otherwise stated.

Saturday, July 14, 2007

Get the ball rolling, part 2 (of 4, most likely)

I hope you had some time to read the XP episode paper and ponder a little on what they managed to get in the end. As I've anticipated, I do I have a few issues with the results. It's not a matter of correctness: they pass all the tests. It's a matter of quality.
The design is questionable, and even the code is questionable. No big deal: after all, it's a little more than a toy example. It gets slightly worse when you consider all this stuff in perspective (at the end of the XP game, code is all you get; I'll get back to this later). Right now, here are the main flaws I see:

- overabundance of numerical literals.
This is programming 101: literals are bad (although they keep your code a few line shorter). Go over the code. You'll see numbers like 10 just about everywhere. Unfortunately, in bowling you got 10 pins and also 10 frames. You always have to read carefully to understand which is which. Now go back to the wikipedia page on bowling. See the mention of 5-pin bowling, 9-pin bowling, etc? Just go ahead and change the code to use 9 pins and 10 frames(9-pin bowling ain't that simple, but anyway). Guess what, none of the test cases will help, and it's not a one-line change as it ought to be. There is even a 21 in that code, being 10 * 2 + 1. I'll let you figure what that 10, 2, and 1 are :-).

- programming against an implementation, not against the problem
Some portion of the code are indeed quite good:

public int scoreForFrame(int theFrame)
  {
    ball = 0;
    int score=0;
    for (int currentFrame = 0;
         currentFrame < theFrame;
         currentFrame++)
    {
      if (strike())
        score += 10 + nextTwoBalls();
      else if (spare())
        score += 10 + nextBall();
      else
        score += twoBallsInFrame();
    }
    return score;
  }

You can basically read it in plain english. Unfortunately, most of the rest is not of the same quality, for instance:

public void add(int pins)
  {
    itsScorer.addThrow(pins);
    adjustCurrentFrame(pins);
  }
  private void adjustCurrentFrame(int pins)
  {
    if (firstThrowInFrame == true)
    {
      if (adjustFrameForStrike(pins) == false)
        firstThrowInFrame = false;
    }
    else
    {
      firstThrowInFrame=true;
      advanceFrame();
    }
  }
...
  private void advanceFrame()
  {
    itsCurrentFrame = Math.min(10, itsCurrentFrame + 1);
  }

"Adjust" a frame? Couldn't we get it right on the first place :-)? Even a Math.min call?? C'mon. There isn't any resemblance with the problem domain. It's just programming against the implementation, until things seems to work. Now, I've seen a lot of code like this over the years; it's the typical code people get after years of tweaking, except here it didn't take years, just hours.
Now, tweaking code till it works is already bad per se, but it's much worse when you consider that in XP, at the end of the game, the code is the design, and the test cases are the requirements. Again, everything is fine when the domain is trivial, largely known, well documented elsewhere (like in this case). But if we tackle a difficult new problem in an uncharted domain, and all we leave behind is code like that, I pity the souls that will join the team.

- Feature Envy: Game over Frame
According to Fowler, a method (or worse, a class) is exhibiting Feature Envy
when it "seems more interested in a class other than the one it actually is in". A variation of Feature Envy is when the other class does not even exist, but it is continually talked about in a method (or another class). Take Game. Excluding empty lines. Game is 45 lines long; if you exclude brace-only lines, it's 25 lines long. Of those, 16 contain the word "Frame". Is that suggesting you a missing abstraction?

- Feature Envy: Scorer over Ball
Just like above, Scorer is 57 lines long excluding empty lines (35 excluding also braces). Of those, 15 contain the word Ball (and 6, the word Frame). C'mon. We're talking about bad smells here, you agile guys are supposed to fix this stuff.

(note: the two previous issues may come from hindsight, as in my design, I do have a Frame and a Ball class).

- "Scorer"?
Good ol' Peter Coad once talked about the "er-er Principle": Challenge any class name that ends in "-er" (e.g. Manager or Controller). If it has no parts, change the name of the class to what each object is managing. If it has parts, put as much work in the parts that the parts know enough to do themselves.
That's OOP 101, because if you need a manager, it's because you got passive objects, and objects are about behaviour, not just data. You need a scorer because you haven't got the right classes.

Overall, this is not code you may want to maintain. Wanna give it a try? Ok, try to change to rules to, e.g., 3-6-9 bowling or low-ball bowling. Or make your own random rule, like "the 7th frame is only allowed one ball". See how easy it is. Hey, don't give me that YAGNI look :-), you are supposed to embrace change :-)).
It's worth repeating that in a real case, you'll be much worse off: you won't have a lenghty transcript of the session, telling you about how they come to put together that code. You'll have just the code; except that, in real-world projects, code will be 3 or 4 orders or magnitude bigger. You'll have to figure out everything else just from the code. If that's the average quality, well, good luck.

Of course, having only code to maintain means cheaper maintenance. Well, more exactly, it means that the mechanical act of maintenance (modifying stuff that needs to be changed) is cheaper. Of course, maintenance is not just about the mechanics: it is (mainly) about understanding the existing, understanding the new, understanding how to get from here to there.
Some redundancy (between wordy requirements, high-level design, and code) can help. Indeed, when I decided to design & implement my version of what above, I found it more convenient to read the wikipedia page, not to reverse engineer requirements from code. Of course, I also used the test cases they wrote to corroborate my understanding of the rules. Some redundancy can be helpful. Just ask your favorite aerospace engineer :-)).
This is why I always teach people to look for the bright and the dark side of everything, redundancy included (which of course, has well known shortcomings as well). That's why I don't like "extreme" approaches that pretend there is no downside in making "extreme" choices, only benefits.

So, on the bright side :-)), the code above is short (excluding blank lines and excluding test code, it totals 102 lines). Indeed, in a usenet post RCM reports that people who have "generated code" from the devilish overdesigned :-) UML diagram and added behavior got a whooping 400 lines.
Big deal. That was (in the authors' intention) a sketchy domain model, not a sensible design model. And code generation is usually a scam. That has nothing to do with diagrammatic reasoning (quite the opposite, I would say: it's diagrammatic nonreasoning: more on this next time).
Back to the bright side of the above, that code uses only 2 classes, which could be good if you don't like diagrams (which would help you understand the collaboration between multiple classes). Appalling enough, considering that the two authors together have something like 70 years programming experience, it looks like beginners' code. Which could be good in some environments. I'll get back to this another time.

Ok, enough bad cop for today. I'll show you my stuff in a few days (meanwhile, you can grind your teeth :-). I'll also describe my "process" and then draw some conclusions.

Carlo Pescio

Pages