[Wednesday, June 29, 2011]

Cut the red wire!

We're all familiar with this scene: the good guys trying to save the day (or the world) by defusing a bomb. Quickly running out of time, they have to cut a wire. Failure is not an option. The red bar won't turn into a green bar.

What if you had to write code that works the very first time? What if you had no chance to test your code, let alone fixing bugs? You got only one chance to run your code and save the world. Would that change the way you write? The way you think? How? What can you learn from that?

Lest you think I'm kinda crazy: I'm not actually suggesting that you shouldn't test your code. I'm not actually suggesting that you stop doing your test-driven thing (if you do) or that you forget your test plan (if you have one), etc etc.
Consider this like a stretching exercise. We may learn something useful just by going outside our comfort zone. Give it a try.


Yahtzee
[sorry guys, gotta go into Dr. Mallard mode for a while; you can breeze through this section, if you're the TL;DR type]

I came to write this post, and to choose Yahtzee as the underlying problem, mostly because I like coincidences :-).

A friend of mine, a relatively young programmer in love with everything agile, every once in a while shows me some "code kata" that he has completed. Not long ago, he came up with the KataYahtzee. His code was, to use his words "completely covered by tests". After all, that's what today's gurus are constantly talking about. Tests, tests, more tests (yawn). However, the code emerging from all those tests was in my opinion rather crappy (one of the benefits of friendship is that you can say this sort of things in a benign way and have them interpreted in a benign way :-). I muttered something about a better alternative, promised I would send him my code, then got caught in other stuff and never actually implemented a thing.

Still, shortly after, I was visiting the blog of an acquaintance of mine (xpmatteo), and after leaving a comment, I read a few more things including a post about Yahtzee and the removal of conditionals and loops.

That was yet another coincidence: I joked with my agile friend before, telling him that sooner or later, they would have raised the bar from anti-if to anti-while, anti-for, anti-pattern-matching, anti-list-comprehension, anti-recursion, etc :-). But I also truly liked the idea that by changing the underlying data structure you could make some rules extremely simple to check (I'm a big fan of the "smart data – stupid procedures" approach). Anyway, that reminded me of my promise, so I fired up my editor and then I closed it :-), because I'd better read the actual rules first (some would call me anti-agile for that, but I don't see why I can't leverage existing knowledge, when available).

Having never played Yahtzee before, I quickly discovered that the standard rules didn't match the rules in the kata page. However, you'll find them under "scoring variations". Reading wikipedia sorted out some of my doubts (is "13333" a double pair? No), so I was ready to actually write some code.

Before you get any further
This is where I should ask you to try it on your own. You probably won't, but if you do, remember: you want your code to run the very first time. You can write a set of test cases, but if one fails, the game is over (unless the test case is wrong).

Just to add a little juice: the modern twist on the opening scene is that shortly after the heroes manage to defuse the bomb, the countdown is restarted. So, here is your second little bomb coming: if it actually works the first time, change the rules from the "kata version" to the traditional Yahtzee rules, quickly :-), and run a new set of test cases. We're counting on you.

Picking my tools
My alma mater was completely sold on formal methods, and I used to like them too. In practice, over a number of years now, I had very few chances (and then reasons) to apply formal methods in the real world. Therefore, I won't suggest that you develop a formal proof of correctness for your code before you run it.

The problem with a formal proof is that (in many cases) the proof is at least as complex as the code, sometimes more. I won't trust my only chance on a proof that is potentially more obscure than my code. After all, Donald Knuth notoriously said "Beware of bugs in the above code; I have only proved it correct, not tried it". Sure, I may try to check my proof :-) with the help a theorem prover, but that would be cheating: in a sense, the prover would be testing my proof.

Another well-known technique to develop mission-critical code is the use of inspections and reviews. But then again, this is sort of cheating, because I would be using people to test my code (in their heads). That's a no-no.

So, having ruled out old-school stuff like Cleanroom Software Engineering, and having removed TDD from the table at very beginning, let's move on.

Is that even possible?
Of course, before we even start trying, it makes sense to ask if there is actually any hope to get a program working the very first time. Well, for simple problems, it's certainly possible. For instance, if you ask me to calculate the area of a circle with (say) 5 digits accuracy given the radius, I may just write something like:
double Area( double radius )
{
return radius * radius * 3.1415926 ;
}

I would probably use a statically checked language (which is a bit like cheating, but not too much :-), just to catch any typo. After that, I would feel rather confident.

Ok, that was easy. Indeed, it would have been even simpler/safer if my language had a PI constant. The language would have been closer to the problem I was trying to solve. Which is exactly what we need: a language where our problem can be explained in a simple, straightforward way. Here, I think, is the key to my approach to writing code that runs the first time (I actually do that in real life, just not every time – more on this later on). I could say it shortly like this: Everything is simple when it has a direct mapping to your language.
Quoting Ward Cunningham (from "Clean Code" by Bob Martin), "You know you are working on clean code when each routine you read turns out to be pretty much what you expected. You can call it beautiful code when the code also makes it look like the language was made for the problem". I won't trust my only chance on nothing less than beautiful code :-).

Now, we can read this as the frequently preached "look for the best language for the problem at hand", and in a sense it's true, but truth is, once you get past a few obvious cases, you ain't gonna find a language that was made exactly for your problem. The alternative approach, of course, is to create a language that is made exactly for your problem. No, I'm not suggesting that you design a language and write a compiler first: remember, you can only run your code once. That includes your brand-new compiler. Do it if you want, but don't blame me if you manage to annihilate our civilization.

Looking back at the trivial Area function, you can easily see that my language had a useful abstraction (a "double", meaning IEEE 754 floating-point math) together with a few basic operations. Say that I trust those abstractions and the underlying hardware. I also trust my Area function, simple and small as it is. Area, therefore, could be thought of as a language extension, a new trusted abstraction.

I could simply call this "Compositional Correctness": get your low-level abstractions right. Then build more abstractions on top of those, each so small and simple that it is trivially correct. Of course, you can do it the other way too (top-down), or you can mix the strategies. In a sense, it's just an application of C. A. R. Hoare's principle: "there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult". Well, sometimes, it doesn't have to be difficult.

Listen to Your Problem
I can almost hear the word "DSL" in the background, and indeed, a Domain Specific Language may turn a complex problem into a trivial one. For instance, HTML is a pretty good DSL to create formatted, hyperlinked documents.
DSLs are often associated with humongous graphical / modeling tools (and I'm not gonna trust my only chance with one of those) or with flexible languages like Ruby. Still, remember the old Bell Labs motto: "library design is language design". You don't really need to write your language: you can extend your language just by writing a library (like with Area).
Of course, you need a programming language with some built-in flexibility (which is why I dislike overly restrictive languages), but you don't need to get too fancy. To prove my point, I'll use (modern) C# in what follows. I could have used good ol' C++ as well. Java would make a few things a little more cumbersome than I'd like to, but it could still be a viable option. Dynamic languages would do just fine. As usual, design decisions are way more important than little language-specific details.

A DSL should speak the language of your problem. Therefore, the first step toward the creation of a DSL should be to carefully listen to your problem and understand the inner nature, the basic concepts upon which the problem structure is built. In a better world, this would be a widely researched, frequently applied process, but in this world, it's not (although you can find a few noteworthy works on DLSs). In practice, I first try to understand whether the problem "fits" with some of the categories I'm familiar with, like:

- is it a unification / pattern matching problem? For instance, I could model a double pair like (x,x,y,y,z), and leave it as an implicit convention of the language that different letters will necessarily match different faces. This looks promising, until you look at some rules (like Large straight) that don't really fit that model. Sure, you can stretch the language a little, and allow literals: by the way, this is the approach taken by the original Ruby Quiz solution, which inspired the kata, which inspired this post. However, if you look at the original Yahtzee rules for a Large straight, they won't fit with literals either (it's just five sequential dice). Pattern matching is not the real Yahtzee language. As an aside, having read the Ruby code, it just doesn't like like the kind of code I would trust to run flawlessly the first time :-).

- is it an incremental counting problem? What I really loved in Matteo's post was the idea that you could simply keep a count of faces, and that would easily provide a score for several rules. However, he already showed that such technique won't cover all rules.

- is it a map/reduce kind of problem? It doesn't really look like that, because the "map" is supposed to take a (k1,v1) pair and return a list of (k2,v2) pairs, to be grouped and reduced. I could shoehorn Yahtzee into this, but it doesn't look like a natural fit for its rules.

- is it a filter and projection problem (a-la SQL)? This may actually work. Conditions can be expressed as filters. I can start with a sequence and, through a series of trivial filters, end up with an empty sequence (if there is no match for the rule) or with the surviving dice. At that point, it's basically a matter of taking the sum. Hmm, perhaps this could work. Time to dig a little deeper.

Why not SQL, then?
At this point, it might be tempting to simply use SQL as a language. Alternatively, if you're familiar with .NET, you may want to try LINQ, which is basically a SQL-like syntax to play with collections. Well, why not. Here is a Pair scoring expressed as a LINQ statement:
2 * 
(from d in dice group d by d into g
where g.Count() >= 2 orderby g.Key
descending select g.Key).FirstOrDefault();

Part of the trick here is that the select expression returns a collection of integers, and that FirstOrDefault will return the first element (the highest face with a pair in the original sequence) or 0, which is the default value for int. It works with any kind of sequence (IEnumerable). In my tests (yes, I tested this :-), I just used an array of integers to represent dice.

The problem is, however, that this code has to be carefully read. I have seen production code full of statements like that, and to put it gently, it sucks.
On the upside, it's control-flow free, so it should please a few people out there :-). More seriously, the absence of control flow should probably by a byproduct of the "right" Yahtzee language. Control flow makes code harder to read in one pass. If I have to trust the code, I'd like to read it and "get it" in one pass. Note, however, that absence of control flow is not a guarantee of readability (or testability, or whatever else). Standard SQL, for instance, is control-flow free, but I can hardly say that a lengthy query mixing inner and outer joins can be easily read and trusted. The LINQ query above has no control flow, but I wouldn't like my code to read like that. Part of the problem lies with the LINQ syntax, but another part of the problem is that we're relying on standard data structures (collections) and standard SQL-like instructions, and they don't speak the problem language.

Sketching the Yahtzee language
Language design is difficult and requires that we pay attention to many details and facets. We want the "procedural" part to become trivial: this is the point of the entire exercise. Sketching the language requires that we focus on a few rules, while keeping the others (or at least some of them) in the back of our mind, because the "right" language must be a good fit for most of them (ideally, all of them). However, picking two similar rules to begin with allows us to focus slightly better, without being too narrow. I'll pick the Pair and Double Pair rules.

We also have to choose a language style. The "language style" inside Area was the familiar "infix operations" borrowed from math. Personally, I can't see that as a good fit for the Yahtzee rules. I'm pretty sure, however, that I'd like my rules to become one-liners, something you can understand by reading it once. I think a Fluent Interface style will fit that role better.

At this stage, I would not focus too much on [domain] classes. Sure, some classes are there just for the picking, as Bertrand Meyer said. Dice, Face, Roll, Rule, Score, etc. However, when you design a small DSL, starting with classes is not necessarily the right approach. More often than not, classes will emerge as the (Alexandrian) centers of the conversations we're building. Still, we have to start somewhere, so I'll have an imaginary "roll" object (of unknown class) representing the roll we want to score against a rule.

Let's give it a try. A trivial pair of one-liners for our rules could be:
roll.GetHighestPair().Score();
roll.GetTwoHighestPairs().Score();

however, this is wrong. Sure, it looks like good old stepwise refinement, but in fact I'm just pushing the burden of Yahtzee rules on roll's shoulders. Rules can grow and change, roll should be a stable abstraction.
Even this little step in the wrong direction, however, can teach us something: what if there is no pair? I surely don't want my rules to be choke-full of conditionals, as that would wreck my "read it once and see it's right" principle. So, we can already give some structure to our language:

we are manipulating a collection of die (roll)
the collection may go through several "stages" (the "filters" or "selections")
at every stage, the collection may get empty, but never a null reference (that will avoid conditionals)
the score of an empty collection is zero (which is consistent with Yahtzee)

Ok, let's give it another try:
roll.GetPairs().GetHighest().Score();
roll.GetPairs().GetTwoHighest().Score();

this is marginally better, because we formed a few smaller concepts. We can ask a roll for pairs; we can ask a pairs collection for the highest (by face: that should be made more explicit in our language), or the two highest. However, it's still very tailored, and it does not support any other rule. As we progress, the other rules have to get slightly more in the foreground to help us shaping our language (yeah, it's really a Gestalt process, but don't get me started on this :-).

The problem with the concept of pair is that it's not general enough to appear in our language. We need something more general. In the end, Yahtzee as a selection language is mostly about filtering by face or by occurrences of dice, grouped by faces. Perhaps a class would help here, a group of dice with the same face:
class DiceGroup
{
public DiceGroup(int f, int c)
{
Face = f;
Count = c;
}

public int Face
{
get;
private set;
}

public int Count
{
get;
private set;
}

public int Score()
{
return Face * Count;
}
}

Now, I know some of you have been brainwashed :-) to the point that you feel the need to write a load of test cases for this class, but let's face it: it's basically as complex as Area, actually even less. I made it more complex than strictly needed, by using properties instead of plain public data, but that's giving me something back: immutability. A DiceGroup cannot be changed after construction. Not having to reason about mutable objects helps a lot in writing safe code, and Yahtzee is not a performance-critical problem, so I can afford it. DiceGroup is also a stable abstraction. Either you use it as-is, or you scrap it and use something else. There is no point in tweaking DiceGroup. Hence, a real economic theory for software design (as opposed to babbling about the need to refactor every single line you ever wrote) would tell us that there is little value in drowning this specific class in trivial test cases.
Provocation: what if the best way to minimize the cost of change was to find a number of useful, small abstractions that do not change at all, or rarely do, because they're in frictionless contact with the nature of the problem?

Now that I have a DiceGroup, what is a Roll? Well, a Roll might be just a sequence of DiceGroup. I can take my 5 dice, organize them in 1 to 5 DiceGroup objects (depending on the actual faces) and call that sequence a Roll. I'll ignore the details of how to do that for now, and get back to it later. Let's say that a Roll will be immutable too, so whenever we filter a Roll, we get a new Roll.

So, say that I roll my dice and I get 4, 4, 1, 4, 3. I'll organize them into three DiceGroup objects: {4,3}, {1,1}, {3,1} (read them as {Face,Count}). To score them as Pair, I would have to remove the two DiceGroup with Count < 2, but that's not enough: {4,3} must also be turned into {4,2}, because the idea of Pair is that you max out at 2 occurrences. Hmm. That's just like trimming. I could write Pairs just like:
roll.TrimCountTo(2).TopFace().Score();

where both TrimCountTo and TopFace return a new Roll, filtered and changed as needed. The rule is pretty readable, and given the language conventions above about returning an empty sequence if there is no match for the filter, I'll just get 0 when there are no pairs.

TwoPairs would require a little extension to the language, because TopFace is again a bit too tailored to the Pair rule. The language itself is also telling us that (if we listen closely) because a TopFace method should return a DiceGroup, not a Roll. That makes the language asymmetric, so operations are harder to combine, and I don't want that. Lucky enough, this is easy to fix: why not having a TopByFace(n) method returning a roll with the n best DiceGroup (by face)? Now I can write Pair and TwoPairs like:
roll.TrimCountTo(2).TopByFace(1).Score();
roll.TrimCountTo(2).TopByFace(2).Score();

[Un]surprisingly enough, those two little functions will get us many other rules for free:
ThreeOfAKind: roll.TrimCountTo(3).Score();
FourOfAKind: roll.TrimCountTo(4).Score();
Yahtzee: r.TrimCountTo(5).Score();
Chance: roll.Score();

Yo, I like this. Rules are trivially correct. They can be created by combining a few operators. This looks like the right DSL, or part of it. You can't express all the Yahtzee rules with just two operators (TrimCountTo and TopByFace). Ones, Twos, ..., Sixes are out. Well, they can be easily brought in by mirroring TrimCountTo with TrimFaceTo. So, I will have:

Ones: roll.TrimFaceTo(1).Score();
etc...

This is almost too simple, isn't it? Well, when it gets that easy, it's because the language "was made for the problem", to quote Ward again, or because our solution is in frictionless contact with the forcefield, to bring in Alexander once more.

As usual, the thinking process is more important than code here. I am getting feedback. Backtalk, actually. I'm just not getting it the way most people are used / taught to (by executing code). I'm getting feedback through a reflective conversation with my material. I know, this is not what everyone is telling you. And please understand I'm not implying that you should stop testing your code. I'm just taking the path less traveled by to see what happens.

What's left? Small straight (1,2,3,4,5), Large straight (2,3,4,5,6), Full house (two of a kind + three of a kind, exactly: as per instructions, (4,4,4,4,4) is not a Full house). These rules cannot be expressed yet. You may see a pattern here, however. The straights are basically trimming by face, but with multiple faces. Full house is trimming by count, but with multiple counts. This may suggest an improvement to our language: turn TrimCountTo into TrimCountsTo, and TrimeFaceTo into TrimFacesTo. Care must be taken, however, not to break the simplicity of the language by creating too powerful instructions. Perhaps an orthogonal operator should be brought in. This is already a very long post, and I still have a lot of code to show, so I'll leave this extension out.

The Structure of the Yahtzee Language
DiceGroup was trivial. I basically did that bottom-up. Roll might not be just as trivial, and so far we only know that we need a TrimCountTo, a TrimFaceTo, and a Score method. Now we have two forces to balance:
We want Roll to be as trivial as possible. It has to work the first time! Honestly, this is the keystone now. DiceGroup was so trivial to be obviously correct. Rules are now so obvious that we can happily trust them. We need Roll to be so simple that we can trust it too.
We want our DSL to be extendible. We're missing a few rules, and the "standard" Yahtzee rules were different as well. I don't want to tweak Roll to change or extend the rules: remember what I said about finding stable abstractions.

I've been talking about [programming] language design in the past, and how "good" languages should be largely extendible. A few built-in core concepts supporting most of the language itself, which would then largely be in a library. It's time to apply the same principle to the Yahtzee Language.

Here is where our actual implementation language can help or hinder. C# is almost helpful here. A dynamic language, or a language with structural conformance for interfaces, would make the following code much shorter and easier to read. Anyway, the key concept here is that of an open class, something that we can extend without breaking the basic abstraction and without creating subtypes (while still not breaking encapsulation). What we need is a "simple" Roll class with the core methods and a set of Extension Methods (in C# parlance) which implements the "upper layer" of our language. Score() will end up in Roll, while TrimCountTo, TrimFaceTo, and TopByFace will land in a RollLanguage class. Therefore, adding support for more rules won't require any change to existing classes. Just implement new extension methods. Of course, that in turn requires Roll to be flexible enough to support basically all the desirable extension methods.

In practice, it's not that hard. Let's start with RollLanguage. Ideally, it would be something like this:
static class RollLanguage
{
public static Roll TopByFace(this Roll groups, int minCount)
{
//trivial code here
}

public static Roll TrimCountTo(this Roll groups, int minCount)
{
//trivial code here
}

public static Roll TrimFaceTo(this Roll groups, int face)
{
//trivial code here
}
}

Let's go ahead and implement TrimFaceTo. This would be my first shot:
public static Roll TrimFaceTo(this Roll groups, int face)
{
return groups.Where(g => g.Face == face);
}

note that I'm using LINQ (not exactly, I'm using the Linq extension method Where, not the LINQ syntax). However, this is a very simple expression. Sure, you have to know C# and .NET, but that's basically returning a sequence with all the DiceGroup which have the specified face. Unfortunately that won't compile :-), for two simple reasons:

1) Roll must be an IEnumerable<DiceGroup>, because that's what Where expects.
2) Where will then return an IEnumerable<DiceGroup>, not a Roll.

(1) Is relatively trivial to fix, and while doing so, we'll sketch Roll as well:
class Roll : IEnumerable<DiceGroup>
{
public IEnumerator<DiceGroup> GetEnumerator()
{
return diceByFace.GetEnumerator();
}

System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return diceByFace.GetEnumerator();
}

private IEnumerable<DiceGroup> diceByFace;
}

This is a boilerplate code (partially generated by Visual Studio), that I wouldn't need with dynamic languages (or with structural conformance). It's pretty trivial though: Roll has-a and is-a IEnumerable<DiceGroup>. It implements the interface by forwarding to the data member. I'm rather confident that it's correct.

(2) Requires that we tweak our code a little (again, this wouldn't be necessary in a more flexible programming language). Code speaks better than words here:
public static Roll TrimFaceTo(this Roll groups, int face)
{
return groups.Where(g => g.Face == face).AsRoll();
}

public static Roll AsRoll(this IEnumerable<DiceGroup> r)
{
return new Roll(r);
}

All methods in RollLanguage can be implemented over Linq, but have to end with AsRoll() to convert the IEnumerable into an actual Roll. Yeah, it sucks, but the alternative (getting rid of DiceGroup and Roll altogether) is even worse (I'll touch on this later on). An implicit conversion operator would be ok, but C# does not allow implicit conversion from interfaces.

So, here is TrimToCount:
public static Roll TrimCountTo(this Roll groups, int minCount)
{
return groups.Where(g => g.Count >= minCount).Select(
g => new DiceGroup(g.Face, minCount)).AsRoll();
}

Honestly, this is a borderline function. You need to be somewhat confident with Linq to get it the first time through. Basically, I'm filtering by count (that's the easy part) and then I'm capping the count by creating a new DiceGroup with the same face and minCount as the actual count. I can trust this, but if you don't, I'll understand :-). Simpler code is welcome!

So far I managed to avoid any kind of conditional. I didn't want any IF in my rules, and I didn't need any in RollLanguage thus far. However, my most readable TopByFace implementation is if-based:
public static Roll TopByFace(this Roll groups, int minCount)
{
if( groups.Count() < minCount )
return Roll.Empty();
else
return groups.OrderByDescending(g => g.Face).Take(minCount).AsRoll();
}

The idea is simple: if I don't have at least minCount groups, the filter fails and an empty roll must be returned. Otherwise, I can simply use Linq again, sort my groups by face (descending) and take the first minCount groups. Again, this might be borderline for someone uncomfortable with Linq. Looks pretty trivial to me, but simpler code would be welcome; actually, if you are an anti-if devotee, feel morally obliged to show me how to improve that code and get rid of the if (using a ternary operator is not a real alternative, of course). Pushing the IF in a custom version of Take is possible, but that would be just moving it around (with an interesting trade/off in efficiency Vs. reusability).

By writing this code, we already got a set of requirements for Roll. Must be (and have) an IEnumerable<DiceGroup>; must have a Score() method; must have an Empty() static method returning an empty sequence; must be constructable from a IEnumerable<DiceGroup> (trivial); we also need the "real" constructor, taking a sequence (or array) of integers (the 5 dice) and doing the actual grouping. We have already seen part of the code above, so I'll add the trivial parts here and leave the "hard" constructor last:


class Roll : IEnumerable<DiceGroup>
{
public Roll(IEnumerable<DiceGroup> dice)
{
diceByFace = dice;
}

public int Score()
{
return diceByFace.Sum(g => g.Score());
}

public static Roll Empty()
{
return empty;
}

private IEnumerable<DiceGroup> diceByFace;
private static int[] emptyArray = new int[0];
private static Roll empty = new Roll(emptyArray);
}

OK. Once again, this is so trivial that I can trust it. Now let's get over with the grouping; lucky me, I can simply use Linq once again:
public Roll(int[] dice)
{
diceByFace = dice.GroupBy(x => x).Select(
g => new DiceGroup(g.First(), g.Count()));
}

Once again, this is borderline if you aren't familiar with Linq. I'm grouping by faces, and for each group I'm building a DiceGroup object with face and count. I'd like a simpler syntax (g.First() is not exactly obvious) but this is what Linq provides me with. Usually, I'd also add an assertion that dice has 5 elements. I left that out to keep my code shorter.

Interestingly, although I ignored performance, concurrency, etc so far, it would be rather simple to translate all this code in high-performance C++, using mutable stack-based objects, because in Yahtzee we're always dealing with (at most) 5 dice, so everything could be pre-allocated at fixed size. I would have to re-create some of the Linq extension methods, but C++11 has got lambda, so the overall style could stay the same.

Wrap it up (and extend it)
We're basically through. At this point, everything else is cosmetic. How do I group rules? Do I need a Rule interface? Well, in C#, I could just use a delegate for that. Can I just use member functions in a Yahtzee class and get over with it? Sure, unless we're concerned about a hierarchy of Yahtzee games (standard, non-standard, etc). Also, organizing rules into an array or dictionary might be the simplest way to glue them to the UI (which I'm not going to write). Since this is all basically irrelevant to our quest, I just wrote a set of one-line member functions.

Part 2 of the problem was to change rules to the original Yahtzee scoring. For instance, adopting the original rules a Three-Of-A-Kind will get you the sum of all dice (not just the 3 of a kind). The simplest way, perhaps a bit too technical, would be to do it like this:
Math.Sign( ScoreThreeOfAKind( roll ) ) * roll.Score()

Alternatively, you can create new filters, returning the original roll (if the filter is passed) or an empty roll. It's basically trivial now.

In the end, I didn't create classes like Dice or Face. They would add some value (by forcing constraints) but I don't want to get too serious with this code :-).

The entire code is here. For simplicity, I used a single file. I added a few tests in the end, so that I could either succeed or fail when I finally ran the code (it worked!). In the end, I sort of like it. It's not perfect, but is pretty good. Feel free to improve it, in which case, I'd like to know!

Winding down
Even though I've written so much, there would be much more to say. I'll give you the short version:

Bad abstraction vs. Good abstraction
strictly speaking, I wouldn't need any "base" abstraction like DiceGroup and Roll in my Yahtzee language. I could write every single function as an extension method of standard library classes. Why should I use concrete, non-reusable terms like Roll and DiceGroup when I could just use IEnumerable<IGrouping<int, int>>? Arguably, my code would be "more abstract", "more mathematic", and therefore "better" by sticking to abstract, domain-independent interfaces. Here, I think, my programming style differs from some. I used to like the idea of abstracting the domain away from my code, perhaps 20 years ago or so. I guess it was some sort of academic imprinting. However, over time I moved into a different coding style, where abstractions are taken from the problem domain, not from abstract math. This allows me to focus on the real-world, concrete problem I want to solve; to speak the language of the problem, with people and with my code; and to leave behind me code that can be more easily understood by people who know about the problem domain, and perhaps a little less about computer science. I hinted at this issue in the past, as in my mind map about habitable software, and even before, without ever taking time to do it justice. Perhaps this post will fill part of that gap.

Code quality
As I discussed my code with my friend, he sort of liked it. He couldn't fully admit it, because hey, I had no tests in place, but he obviously felt it was much easier to read and to evolve. Of course, popular authors would say that it's bad code anyway: "Code without tests is bad code. It doesn't matter how well written it is; it doesn't matter how pretty or object-oriented or well-encapsulated it is" (Michael C. Feathers, "Working Effectively with Legacy Code"). This is pretty harsh, because if I had to choose between the code above (with this blog post as the dreaded "comprehensive documentation") and some crappy code entirely covered by tests, I would have a hard time choosing the latter.
Curiously enough, for a very long time testability has been like the stepchild of -ilities. In the past few years, the pendulum swung too far on the other side, and now it's all about tests.
Since it's way too easy to babble and pontificate, I'd like someone to write much better code than mine (and yeah, completely different, thank you) using TDD, and show me the process through which tests are driving him toward that code. If you ain't got code, please don't bother leaving a comment about the greatness of TDD and how dumb I am for even trying to write code without using tests. OTOH, if you got code to back up your claims, you're more than welcome.

Limits and confessions
I'm making a habit to conclude with a few confessions, so here I come. I don't always write code this way. I tend to: I like my code to work the first time. However, it took me more than the allotted time of 1h to complete my mission. Writing the code was easy. Thinking about the language, trying out a few different styles, rejecting options, etc, was not, and almost never is.
The "little DSL" approach requires a clarity of thought that we can't always afford, and it does not necessarily pay off. Yahtzee is also quite simple, and requirements were well-known. Again, that's a luxury we don't face so often (although I don't buy the idea that we never know what we're doing; deep inside uncertainty there is often a core of stable concepts, if you take the time to dig it out).
There are also limits to the applicability of these techniques. Not every single problem lends itself well to the DSL approach. As usual, we have to be flexible, understand the nature of the problem, and choose the best design strategy. A DSL was just particularly well-suited for Yahtzee.
Indeed, a few years ago I spent some time doing a [little] upfront design, followed by implementation, for the bowling problem. Back then, my aim was to show that you don't have to give up on OOP and OOD so easily, and that by sticking to OOD you won't end up with a LOC monster as it was suggested elsewhere. I didn't use a DSL back then, but of course, it's quite possible to combine a little upfront OOD with a little DSL-like library with a little emergent design. This is very close to what I usually do when I have to code something on my own. Group dynamics are different, and I tend to adapt my techniques to the context.

What can we learn from this stuff?
A lot, I hope :-). Software development is about acquiring and encoding knowledge into an executable format (my usual quote from Phillip Armour). Our code can be extremely simplified through a careful selection of the language we use to encode that knowledge: when the language is aligned with the problem domain, many layers of complexity simply fall down. Your programming language is usually not aligned with your problem. It's your job to make it so, by introducing the right abstractions.
So, next time you're facing some complex issue, ask yourself: what if this code had to run the very first time?

Acknowledgement
The image on top is adapted from this picture from Lauri Rantala, released under a creative common license with permission to modify the picture and for commercial use.

Labels: , , ,