Monday, October 17, 2011

You're solving the wrong problem

It had to happen. After my post on Yahtzee, my agile friend (Marco) has come back with more code katas, adding to the pile we already discussed. His latest shot was Kata Nine, a.k.a. "Back to the CheckOut".
His code was fine, really. Nothing fancy, basically a straightforward application of the Strategy pattern. You can find many similar implementations on the net (one of which prompted my tweet about ruby on training wheels, but it was more about the development style than the final result). Of course, my friend would have been disappointed if I had just said "yeah, well done!", so I had to be honest (what is the point of being friends if you can't be honest?) and say "yeah, but you're solving the wrong problem".

A short conversation followed, centered on the need to question requirements (as articulated by users and customers) and come up with the real, hidden requirements, the value of domain knowledge and the skills one would need to develop to become an effective analyst or "requirement engineer" (not that I suggest you do :-), with the usual drift into agility and such.

I'll recommend that you read the problem statement for kata nine. It's not really necessary that you solve the problem, but it wouldn't hurt to think about it a little. I'm not going to present any code this time.

Kata Nine through the eye of a requirements engineer
The naive analyst would just take a requirement statement from the user/customer, clean it up a little, and pass it downstream. That's pretty much useless. The skilled analyst will form a coherent [mental] model of the world, and move toward a deeper understanding of the problem. The quintessential tool to get there is simple, and we all learn to use it around age 4: it's called questions :-). Of course, it’s all about the quality of the questions, and the quality of the mental model we're building. In the end, it has a lot to do with the way we look at the world, as explained by the concept of framing.

The problem presented in kata nine can be framed as a set of billing rules. Rules, as formulated, are based on product type (SKU), product price (per SKU) and a discount concept based on grouping a fixed quantity of items (quantity depending on SKU) and assigning a fixed price to that group (3 items 'A' for $1.30). It might be tempting for an analyst to draw a class diagram here, and for a designer to introduce an interface over the concept of rule, so that the applicability and the effects of a rule become just an implementation detail (you wish :-).

Note how quickly we're moving from the real-world issues of products, prices, discounts into the computing world, made of classes, interfaces, information hiding. However, the skilled analyst will recognize some underlying assumptions and ask for more details. Here are a few questions he may want to ask, and some of the reasoning that may follow. Some are trivial, some are not. Sometimes, two trivial questions combine to uncover a nontrivial interplay.

- rules may intersect on a single SKU (3 items 'A' for $1.30, 6 for $2.50). Is that the case? Is there a simplified/simplifying assumption that grouping by larger quantities will always lead to a more convenient price per unit?

- without discounts, billing is a so-called online process, that is, you scan one item, and you immediately get to know the total (so far). You just add the SKU price, and never have to re-process the previous items.
If you have at most one grouping-based discount rule per SKU, you still have a linear process. You add one item, and you either form a new group, or add to the spare items. Forming a new group may require recalculating the billing (just for the spares).
If you allow more than one rule for the same SKU, well, it depends. Under the assumption above about convenience for larger groups, you basically have a prioritized list of rules. You add one item, and you have to reprocess the entire set of items with the same SKU. You apply the largest possible quantity rule, get a spare set, apply the next largest quantity rule to that spare set, and so on, until only the default "quantity 1" rule applies. This is a linear, deterministic process, repeated every time you add a new item.
In practice, however, you may have a rule for 3, 4, 5 items of the same SKU. A group of 8 items could then be partitioned into 4+4 or 5+3. Maybe 5+3 (which would be chosen by a priority list) is more convenient. Maybe not. Legislation tends to favor the buyer, so you should look for the most favorable partitioning. This is no longer a linear process, and the corresponding code will increase in complexity, testing will be more difficult, etc.
Perhaps we should exclude these cases. Perhaps we just discovered a new requirement for another part of the system, where rules are interactively created (we don't expect to add rules by writing code, do we :-). We need to dig deeper.
Note: understanding when a requirement is moving the problem to an entirely new level of complexity is one of the most important moments in requirement analysis. As I said before, I often use the term "nonlinear decision" here, implying that there is a choice: to include that requirement or not, to renegotiate the requirement somehow, to learn more about the problem and maybe discover that it's our lucky day and that the real problem is not what we thought. This happens quite often, especially when requirements are gathered from legacy systems. I discussed this in an old (but still relevant) paper: When Past Solutions Cause Future Problems, IEEE Software, 1997.
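To make the 3, 4, 5 example concrete, here is a minimal Python sketch for a single SKU. It is only an illustration: the offer table and its prices (in cents) are invented, and the names greedy_price and best_price are mine. The first function is the priority list, the second looks for the most favorable partitioning.

```python
from functools import lru_cache


def greedy_price(count, offers):
    """Priority list: always apply the largest group size that still fits."""
    total = 0
    for size in sorted(offers, reverse=True):
        groups, count = divmod(count, size)
        total += groups * offers[size]
    return total


def best_price(count, offers):
    """Cheapest way to split `count` items into the offered group sizes.

    Assumes `offers` always contains the plain "quantity 1" rule.
    """
    @lru_cache(maxsize=None)
    def cheapest(n):
        if n == 0:
            return 0
        return min(offers[size] + cheapest(n - size)
                   for size in offers if size <= n)

    return cheapest(count)


# Hypothetical offers for one SKU: group size -> group price, in cents.
# The group of 5 is deliberately a worse deal per unit than the group of 4.
offers = {1: 50, 3: 130, 4: 160, 5: 230}

print(greedy_price(8, offers))  # 5+3 via the priority list
print(best_price(8, offers))    # 4+4 via the search
```

With these made-up offers, 8 items cost 360 through the priority list (5+3) and 320 through the search (4+4). The two functions agree only as long as the "larger groups are more convenient" assumption actually holds.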

- grouping by SKU is still a relatively simple policy. At worst, you recalculate the partitioning for all the items of the same SKU that you're scanning, and deal with a localized combinatorial problem. Is that the only kind of rule we need? What about the popular "buy 3 items, get the cheapest for $1", that is not based on SKU? What about larger categories, like "buy 3 items in the clothing department, get the cheapest for $1"? A naive analyst may ignore the issue, and a naive designer may think that dropping an interface on top of a rule will make this a simple extension. That's not the case.
Once you move beyond grouping by SKU, you have to recalculate the entire partitioning for all scanned products every time a new product is scanned. You add a product B priced at $5. Which combination of products and rules results in the most favorable price? Grouping with the other 3 items with the same SKU, or with 2 others in the clothing department? Finding the most favorable partitioning is now a rule-based combinatorial process. New rules can be introduced at any time, so it's hard to come up with an optimization scheme. Beware the large family with two full carts :-).
Your code is now two orders of magnitude harder than initially expected. Perhaps we should really renegotiate requirements, or plan a staged development within a larger timeframe than expected. What, you didn't investigate requirements, promised a working system in a week, and are now swamped? Well, you can always join the recurrent twitter riot against estimates :-).
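A rough Python sketch of what "recalculate the entire partitioning" means once rules cross SKU boundaries. Everything here is an assumption made for illustration: the Item type, the two rules, their prices (in cents) and the cart. The search is deliberately brute force, which is exactly the point: it is exponential in the size of the cart.

```python
from itertools import combinations
from typing import NamedTuple


class Item(NamedTuple):
    sku: str
    dept: str
    price: int  # cents


def three_b_for_1000(group):
    """Hypothetical SKU rule: 3 items 'B' for $10.00."""
    if all(item.sku == "B" for item in group):
        return 1000
    return None


def clothing_cheapest_for_100(group):
    """Hypothetical department rule: 3 clothing items, cheapest one for $1.00."""
    if all(item.dept == "clothing" for item in group):
        prices = sorted(item.price for item in group)
        return 100 + sum(prices[1:])
    return None


# Each rule is (group size, function returning a group price or None).
RULES = [(3, three_b_for_1000), (3, clothing_cheapest_for_100)]


def best_total(items):
    """Most favorable total over every partitioning of the cart."""
    if not items:
        return 0
    first, rest = items[0], items[1:]
    # Option 1: the first item is paid at its plain unit price.
    best = first.price + best_total(rest)
    # Option 2: the first item joins a discounted group with other items.
    for size, rule in RULES:
        for partners in combinations(range(len(rest)), size - 1):
            group = (first,) + tuple(rest[i] for i in partners)
            price = rule(group)
            if price is not None:
                remaining = tuple(item for i, item in enumerate(rest)
                                  if i not in partners)
                best = min(best, price + best_total(remaining))
    return best


b = Item("B", "clothing", 500)
cart = (b, b, b, Item("shirt", "clothing", 2000), Item("scarf", "clothing", 800))
print(best_total(cart))                                       # 3800
print(best_total(cart + (Item("jeans", "clothing", 3000),)))  # 6100
```

In this invented cart the three B items are best grouped by SKU, and the sixth item only pays off because it lets the shirt, scarf and jeans form a group of their own. Swap a price or add a rule and the winning partitioning changes, which is why every scan forces a full recalculation.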

- Oh, of course you also have fidelity cards, don't you. Some rules apply only if you have a card. In practice, people may hand their card over only after a few items have been scanned. So you need to go back anyway, but that wouldn't be a combinatorial process per se, just a need to start over.

It gets much worse...
There is an underlying, barely visible concept of time in the problem statement: "this week we have a special offer". So, rules apply only during a time interval.

Time-dependent rules are tricky. The skilled analyst knows that it's quite simple to verify if a specific instant (event) falls within an interval, and then apply the rules that happen to be effective at that instant. However, he also understands that a checkout is not an instantaneous process. It takes time, perhaps a few seconds, perhaps a few minutes. What if the two intervals overlap, that is, a rule is effective when checkout starts, but not when it ends (or vice versa)? The notion of "correctness", here, is relatively hard to define, and the store manager must be involved in this kind of decision. The interplay with the need to recalculate everything whenever a new item is scanned should be obvious: if you don't do anything about it, the implementation is going to choose for you.
Note: traditionally, many businesses have tried to sidestep this sort of problem by applying changes to business rules "after hours". Banks, for instance, still rely a lot on nightly jobs, but even more so on the concept of night. This safe harbor, however, is being constantly challenged by a 24/7 world. Your online promotion may end at midnight in your state, but that could be morning in your customer's state.
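The difference between "an instant falls within an interval" and "a checkout straddles the end of an offer" can be sketched in a few lines of Python. The Interval class, the half-open convention (start included, end excluded) and the dates are my assumptions, not something the kata specifies.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Interval:
    start: datetime  # inclusive (assumption)
    end: datetime    # exclusive (assumption)

    def contains(self, instant):
        """The easy case: is a single instant inside the interval?"""
        return self.start <= instant < self.end

    def covers(self, other):
        """Is `other` entirely inside this interval?"""
        return self.start <= other.start and other.end <= self.end

    def overlaps(self, other):
        """Do the two intervals share at least one instant?"""
        return self.start < other.end and other.start < self.end


def classify(offer, checkout):
    """How does an offer relate to a checkout that takes time?"""
    if offer.covers(checkout):
        return "effective throughout"
    if offer.overlaps(checkout):
        return "changes mid-checkout"
    return "never effective"


# Hypothetical "this week we have a special offer", ending at midnight.
offer = Interval(datetime(2011, 10, 10), datetime(2011, 10, 17))
checkout = Interval(datetime(2011, 10, 16, 23, 58), datetime(2011, 10, 17, 0, 3))
print(classify(offer, checkout))  # changes mid-checkout
```

The code can detect the "changes mid-checkout" case, but it cannot say what the right behavior is. That is still a question for the store manager.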

... before it gets better :-)
Of course, it's not just about finding more and more issues. The skilled analyst will quickly see that a fixed price discount policy (3 x $1.30) could easily be changed into a proportional discount ("20% off"). He may want to look into that at some point, but that's a secondary detail, because it's not going to alter the nature of the problem in any relevant way. Happy day :-).

Are we supposed to ask those questions?
Shouldn't we just start coding? It's agile 2011, man. Implement the first user story, get feedback, improve, etc. It's about individuals and interactions, tacit knowledge, breaking new ground, going into the unknown, fail fast, pivot, respond to change. Well, at least, it's just like that on Hacker News, so it must be true, right?
I'm not going to argue with that, except by pointing to an old post of mine on self-fulfilling expectations (quite a few people thought it was just about UML vs. coding; it's the old "look at the moon, not at the finger" thing). How much of the rework (change) is actually due to changing business needs, and how much is due to lack of understanding of the current business needs? We need to learn balance. Analysis paralysis means wasting time and opportunities. So does the opposite approach of rushing into coding.

We tried having an analyst, but it didn't work
My pre-canned, context-free :-) answer to this is: "no, your obsolete coder is not a skilled analyst". Here are some common traits of the unskilled analyst:
- ex coder, let himself become obsolete but knows something about the problem domain and the legacy system, so the company is trying to squeeze a little more value out of him.
- writes lengthy papers called "functional specifications" that nobody wants to read.
- got a whole :-) 3 days training on use cases, but never got the difference with a functional spec.
- kinda knows entity-relationship, and can sort-of read a class diagram. However, he would rather fill 10 pages of unfathomable prose than draw the simplest diagram.
- talks about web applications using CICS terminology.
- the only Michael Jackson he ever heard about was a singer.
- after becoming agile :-), he writes user stories like "as a customer, I want to pay my bill" (yeah, sure, whatever).
- actually believes he can use a set of examples (or test cases) as a specification (good luck with combinatorial problems).

More recurrent ineffective analysts: the marketer who couldn't market, the salesman who couldn't sell, the customer care who didn't care. Analysis is tough. Just because Moron Joe couldn't do it, however, it doesn't mean it can't be done.

How to become a skilled analyst
The answer is surprisingly simple:
- you can't. It's too late. The discipline has been killed and history has been erased.
- you shouldn't. Nobody is looking for a skilled analyst anyway. Go learn a fashionable language or technology instead.

More seriously (well, slightly more seriously), the times of functional specifications are gone for good, and nobody is mourning. Use cases stifled innovation for a long while, then faded into background because writing effective use cases was hard, writing crappy use cases was simple, and people usually went for simple. User stories are a joke, but who cares.

What if you want to learn the way of the analyst, anyway? Remember, a skilled analyst is not an expert in a specific domain (although he tends to grow into an expert in several domains). He's not the person you pay to get answers. He's the person you pay to get better questions (and so, indirectly, better answers).
The skilled analyst will ask different questions, explore different avenues, and find out different facts. So the essential skills would be observation, knowledge of a large class of problems, classification, and description. Description. Michael Jackson vehemently advocated the need for a discipline of description (see "Defining a Discipline of Description," IEEE Software, Sep./Oct. 1998; it's behind pay walls, but I managed to find a free copy here, while it lasts; please go read it). He noted, however, that we lacked such a discipline; that was 1998, and no, we ain't got one meanwhile.

Now, consider the patterns of thought I just used above to investigate potential issues:
Rules -> overlapping rules.
Incremental problem -> Priority list -> Selective combinatorial problem -> Global combinatorial problem.
Time interval -> overlapping intervals -> effective rules.
The problem is, that kind of knowledge is nowhere to be learnt. You get it from the field, when it dawns on you. Or maybe you don't. Maybe you get exposed to it, but never manage to give it enough structure, so you can't really form a proto-pattern and reuse that knowledge elsewhere. Then you won't have 20 years of experience, but 1 year of experience 20 times over.

Contrast this with more mature fields. Any half-decent mechanical engineer would immediately recognize cyclic stress and point out a potential fatigue issue. It's part of the basic training. It's not (yet) about solving the problem. It's about recognizing large families of recurrent issues.

C'mon, we got Analysis Patterns!
Sort of. I often recommend Fowler's book, and even more so the lesser known volumes from David Hay (Data Model Patterns) and a few more, similar works. But honestly, those are not analysis patterns. They are "just" models. Some better and more refined than others. But still, they are just models. A pattern is a different beast altogether, and I'm not even sure we can have analysis patterns (since a pattern includes a solution resolving forces, and analysis is not about finding a solution).

A few years ago I argued that problem frame patterns were closer to the spirit of patterns than the more widely known work from Fowler, and later used them to investigate the applicability of Coad's Domain Neutral Component (which is another useful weapon in the analyst's arsenal).

However, in a maturing discipline of requirements engineering, we would need to go at a much finer granularity, and build a true catalog of recurring real-world problems, complete with questions to ask, angles to investigate, subtleties to work out. No solutions. It's not about design. It's not about process reengineering. It's about learning the problem.

Again, I don't think we will, but here is a simplified attempt at documenting the Rules/Timing stuff, in a format somewhat inspired by patterns, but still new (and therefore, tentative).

  • Name
Time-dependent Rules and Transactions

  • Context
We have a set of Rules that must be checked during a Transaction. Rules are effective from time T1 to time T2. Time can be expressed in absolute terms (e.g. December 25, 2011, 12:00 AM), recurring terms (every Friday, 12:00 to 17:00), etc. A Transaction is a set of actions, taking place over time, with a non-negligible duration (that is, a transaction is not an instantaneous event). Rules may have to be checked several times during the lifetime of a single Transaction.

  • Problem
The interval during which any given Rule is effective may overlap with the time interval required to carry out a transaction, that is, a rule may be effective when the transaction starts, but not when it ends, or vice versa, or may be valid for an interval beginning and ending inside the transaction time span (for long-lived transactions). Still, we need to guarantee some form of coherence in the observed behavior.

  • Example
[... the checkout problem would be a perfect fit here...]

  • Issues and Questions
- Is there any kind of regulation dictating the system behavior?
- Would the effect of simply applying rules effective at different times be noticeable?
- Can the transaction be simplified into an instantaneous event?
- Can we just "freeze" the rule set when the transaction starts? (there are two facets here: the real-world implications of doing so, and the machine-side implications of doing so).
- Can we replay the entire transaction every time a new event is processed (e.g. a new item is added to the cart), basically moving everything forward in time?
- etc.
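Two of the questions above, "freeze" and "replay", lead to different totals for the very same cart. A tiny Python sketch, assuming a single item 'A' at 50 cents with the kata's "3 for $1.30" offer, and an invented offer end at midnight (the function names are mine as well):

```python
from datetime import datetime

UNIT_PRICE = 50                            # cents, item 'A'
OFFER_SIZE, OFFER_PRICE = 3, 130           # "3 for $1.30"
OFFER_END = datetime(2011, 10, 17, 0, 0)   # hypothetical end of the offer


def price(count, as_of):
    """Price `count` items with the rule set effective at instant `as_of`."""
    if as_of < OFFER_END:
        groups, spare = divmod(count, OFFER_SIZE)
        return groups * OFFER_PRICE + spare * UNIT_PRICE
    return count * UNIT_PRICE


def frozen_total(scan_times):
    """Freeze the rule set when the transaction starts."""
    return price(len(scan_times), as_of=scan_times[0])


def replayed_total(scan_times):
    """Replay the whole transaction at the time of the latest scan."""
    return price(len(scan_times), as_of=scan_times[-1])


# Three items scanned around midnight.
scans = [
    datetime(2011, 10, 16, 23, 58),
    datetime(2011, 10, 16, 23, 59),
    datetime(2011, 10, 17, 0, 1),
]
print(frozen_total(scans))    # 130
print(replayed_total(scans))  # 150
```

Both functions are trivially "correct" as code. Which one is correct for the store is a decision somebody has to make explicitly, otherwise the implementation makes it for you.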

I don't know about you, but I would love to have a place where this kind of knowledge could be stored, cleaned up, structured, and curated.

Conclusions
It would be easy to blame it all on agility. After all, a lot of people have taken agility as an excuse for "if it's hard, don't do it". Design has been trivialized to SOLID and Beck's 4 principles. Analysis is just gone. Still, I'm not going to. I've seen companies stuck in waterfall. It ain't pretty. But it's more and more obvious that we're stuck in just another tar pit, where a bunch of gurus keep repeating the same mantras, and the discipline is not moving forward.

Coding is fascinating. A passionate programmer will stay up late to work on some interesting problem and watch his code work. I did, many times, and I'm sure I'll still do for a long while. Creating models and descriptions does not seem to be on a par with that. There is some intellectual satisfaction in there too, but not nearly as much as seeing something actually run. This might be the simplest, most honest explanation of why, in the end, we always come back to code.

Still, we need a place, a format, a curating community for long-term, widely useful knowledge that transcends nitty-gritty coding issues (see also my concept of Half-Life of a Knowledge Repository). Perhaps we just need to make it fun. I like the way the guys at The Fun Theory can turn everything into something fun (take a look at the piano stairs). Perhaps we should give requirements engineering another shot :-).


If you've read this far, you should follow me on twitter.

Acknowledgement
The image on top is a copy of this picture from Johnny Jet, released under a Creative Commons license with permission to share for commercial use, with attribution.