
Tuesday, October 02, 2012

Don’t do it!


We used to be taught that, by spending enough time thinking about a problem, we would come up with a "perfect" model, one that embodies many interesting properties (often disguised as principles). One of those properties was stability, that is, most individual abstractions didn't need to change as requirements evolved. Put differently, change was local, or even better, change could be dealt with by adding new, small things (like new classes), not by patching old things.

That school didn't last; some would say it failed (as in "objects have failed"). At some point in time, another school prevailed, claiming that thinking too far into the future was bad, that it could lead to the wrong model anyway, and that you'd better come up with something simple that can solve today's problems, keeping the code quality high so that you can easily evolve it later, safely protected by a net of unit tests.

As is common, one school tended to mischaracterize the other (and vice-versa), usually by pushing things to the extreme through some cleverly designed argument, and then claiming generality. It's easy to do so while talking about software, as we lack sound theories and tangible forces.

Consider this picture instead:


Even if you don't know squat about potential energy, local minima and local maxima, is there any doubt the ball is going to fall easily?

Monday, July 30, 2012

No Controller, Episode 3: Inglorious Objekts


A week ago or so, Ralf Westphal published yet another critique of my post on living without a controller. He also proposed a different design method and therefore a different design. We also exchanged a couple of emails.

Now, I'm not really interested in "defending" my solution, because the spirit of the post was not to show the "perfect solution", but simply how objects could solve a realistic "control" problem without needing a centralized controller.

However, on one side Ralf is misrepresenting my work to the point where I have to say something, and on the other, it's an interesting chance to talk a bit more about software design.

So, if you haven't read my post on the controller, I would suggest you take some time and do so. There is also an episode 2, because that post has been criticized before, but you may want to postpone reading that and spend some time reading Ralf's post instead.

In the end, what I consider most interesting about Ralf's approach is the adoption of a rule-based approach, although he's omitting a lot of necessary details. So after as little fight as possible :-), I'll turn this into a chance to discuss rules and their role in OOD, because guess what, I'm using rules too when I see a good fit.

I'll switch to a more conversational structure, so in what follows "you" stands for "Ralf", and when I quote him, it's in green.

Monday, July 02, 2012

Life without Stupid Objects, Episode 1


So, strictly speaking, this is not the follow-up to the previous post. Along the way, I realized I was using a certain style in the little code I wanted to show, that it wasn't the style most people use, and that it would be distracting to explain that style while trying to communicate a much more important concept.

So this post is about persistence and a way of writing repositories. Or it is about avoiding objects with no methods and mappers between stupid objects. Or it is about layered architectures and what constitutes a good layer, and why we shouldn't pretend we have a layered architecture when we don't. Or it is about applied physics of software, understanding our material (software) and what we can really do with it. Or why we should avoid guru-driven programming. You choose; either way, I hope you'll find it interesting.

Friday, March 23, 2012

Episode 2: the Controller Strikes Back


This post should have been about power law distribution of class / method sizes, organic growth of software and living organisms, Alexandrian levels of scale, and a few more things.

Then the unthinkable happened. Somebody actually came up with a comment to Life without a controller, case 1, said my design was crappy and proposed an alternative based on a centralized, monolithic approach, claiming miraculous properties. I couldn’t just sit here and do nothing.

Besides, I wrote that post in a very busy week, leaving a few issues unexplored, and this is a good chance to get back to them.

I suggest that you go read the whole thing, as it’s quite interesting, and adds the necessary context to the following. Then please come back for the bloodshed. (Note: the original link is broken, as the file was removed; the "whole thing" link now points to a copy hosted on my website).

The short version
If you’re the TL;DR kind and don’t want to read Zibibbo’s post and my answer, here is the short version:
A caveman is talking to an architect.
Caveman: I don’t really like the architecture of your house.
Architect: why?

Monday, March 12, 2012

Life without a controller, case 1

In my most popular post ever, I argued (quoting Alan Kay and Peter Coad along the way) that classes ending in –er (like the infamous Controller, Manager, etc) are usually an indication that our software is not really OO (built from cooperating objects), but still based on the centralized style of procedural programming.

That post was very well received, with a few exceptions. One of the critics said something like “I’ll keep calling my controllers controllers, and my presenters presenters”. This is fine, of course. If you have a controller, why not call it “controller”? Or, as Shakespeare said, a controller by any other name would smell as bad :-) . The entire idea was to get rid of controllers, not to change their name!

Now, getting rid of controllers is damn hard in 2012. Almost every UI framework is based on some variation of MVC. Most server-side web-app frameworks are based on MVC as well. Proposing to drop the controller would be just as useful as proposing that we drop html and javascript and start using something serious to write web apps :-).

Still, I can’t really sit here and do nothing while innocent objects get slaughtered, so I decided to write a few posts on living without a controller. I’ll start far away from UI, to avoid rocking the boat. I’ll move closer in some future post.

Saturday, May 21, 2011

The CAP Theorem, the Memristor, and the Physics of Software

Where I present seemingly unrelated facts that are, in fact, not unrelated at all :-).

The CAP Theorem
If you keep current on technology, you're probably familiar with the proliferation of NoSQL, a large family of non-relational data stores. Most NoSQL stores have been designed for internet-scale applications, where large data stores are preferably deployed using an array of loosely connected low-cost servers. That brings a set of well-known issues to the table:

- we'd like data to be consistent (that is, to respect all the underlying constraints at any time, especially replication constraints). We call this property Consistency.

- we'd like fast response time, even when some server is down or under heavy load. We call this property Availability.

- we'd like to keep working even if the system gets partitioned (a connection between servers fails). We call this property Partition tolerance.

The CAP Theorem (formerly Brewer's Conjecture) says that we can only choose two properties out of three.

There is a lot of literature on the CAP Theorem, so I don't need to go into further detail here: if you want a very readable paper covering the basics, you can go for Brewer's CAP Theorem by Julian Browne, while if you're more interested in a formal paper, you can refer to Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services by Seth Gilbert and Nancy Lynch.

There is, however, a relatively subtle point that is often brought up and seldom clarified, so I'll give it a shot. Most people realize that there is some kind of "connection" between the concept of availability and the concept of partition. If a server is partitioned away, it is by definition not available. In that sense, it may seem like there is some kind of overlap between the two concepts, and overlapping concepts are bad (orthogonal concepts should be preferred). However, it is not so:

- Availability is an external property, something the clients of the data store can see. They either can get data when they need it, or they can't.

- Partitioning is an internal property, something clients are totally unaware of. The data store has to deal with that on the inside.

Of course, given the fractal nature of software, in a system of systems we may see lack of availability at one level, because of the lack of partition tolerance at a lower level (or vice-versa, as an unavailable node may not be distinguishable from a partitioned node).

To sum up, while the RDBMS world has usually approached the problem by choosing Consistency and Availability over Partition tolerance, the NoSQL world has often chosen Availability and Partition tolerance, leading to the now famous Eventually Consistent model.

The Memristor, or the Value of a Theory
As a kid, I spent quite a bit of time hacking electronic stuff. I used to scavenge old TV sets and the like, and recover resistors, capacitors, and inductors to use on very simple projects. Little did I know that, theoretically, there was a missing passive element: the memristor.

Leon Chua developed the theory in 1971. It took until 2008 before someone could finally create a working memristor, using nanoscale techniques that would have looked like science fiction in 1971. Still, the theory was there, long before, pointing toward the future. At the bottom of the theory was the realization that the three known passive elements could be defined by a relationship between four concepts (current, voltage, charge, and flux-linkage). By investigating all possible relationships, he found a missing element, one that was theoretically possible, but yet undiscovered. He used the term completeness to indicate this line of reasoning, although we often call this kind of property symmetry.
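If you want the symmetry spelled out, here is a compact way to write it down (my summary, not Chua's original notation):

    \begin{aligned}
    &\text{definitions:} && i = \tfrac{dq}{dt}, \qquad v = \tfrac{d\varphi}{dt} \\
    &\text{resistor:} && dv = R\,di \\
    &\text{capacitor:} && dq = C\,dv \\
    &\text{inductor:} && d\varphi = L\,di \\
    &\text{memristor (the missing relation):} && d\varphi = M\,dq
    \end{aligned}

Four quantities, six pairwise relationships: two are definitions, three were taken by the known passive elements, and the remaining one pointed straight at the memristor.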

Software Entanglement
In Chapter 12 of my ongoing series on the nature of software, I introduced the concept of Entanglement, further discussed in Chapter 13 and Chapter 14 (for the artifact world).

Here, I'll reuse my early definition from Chapter 12: Two clusters of information are entangled when performing a change on one immediately requires a change on the other.

Entanglement is constantly at work in the artifact world: change a class name, and you'll have to fix the name everywhere; add a member to an enumeration, and you'll have to update any switch/case statement over the enumeration; etc. It's also constantly at work in the run-time world.
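If you want to see the artifact-world case in code, here is a minimal sketch (hypothetical names, Java just for concreteness): the enumeration and the switch below are entangled, because adding a member to the former immediately requires a change to the latter.

    // Hypothetical example: these two artifacts are entangled.
    enum OrderStatus { NEW, PAID, SHIPPED }      // add CANCELLED here...

    class OrderReport {
        static String describe(OrderStatus s) {
            switch (s) {                          // ...and this switch has to change as well
                case NEW:     return "waiting for payment";
                case PAID:    return "waiting for shipment";
                case SHIPPED: return "done";
                default:      throw new IllegalStateException("unhandled: " + s);
            }
        }
    }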

Although I haven't formally introduced entanglement for the run-time world (that would/will be the subject of the forthcoming Chapter 15), you can easily see that maintaining a database replica creates a run-time entanglement between data: update one, and the other must be immediately (atomically, as we usually say) updated. Perhaps slightly more subtle is the fact that referential integrity is strongly related to run-time entanglement as well. Consistency in the CAP theorem, therefore, is basically Entanglement satisfaction.

Once we understand that, it's perhaps easier to see that the CAP theorem applies well beyond our traditional definition of distributed system. Consider a multi-core CPU with independent L1 caches. Cache contents become entangled whenever they map to the same address. CPU designers usually go with CA, forgetting P. That makes a lot of sense, of course, because we're talking about in-chip connections.

That's sort of obvious, though. Things get more interesting when we start to consider the run-time/artifact symmetry. That's part of the value of a [good] theory.

A CAP Theorem for Artifacts
My perspective on the Physics of Software is strongly based on the run-time/artifact dualism. Most forces apply in both worlds, so it is just natural to bring ideas from one world to another, once we know how to translate concepts. Just like symmetry allowed Chua to conceive the memristor, symmetry may allow us to extend concepts from the run-time world to the artifact world (and vice-versa). Let's give the CAP theorem a shot, by moving each run-time concept to the corresponding notion in the artifact world:

Consistency: just like in the run-time world, it's about satisfying Entanglement. While change in the run-time world is a C/U/D action on data, here it is a C/U/D action on artifacts. If you add a member to an enumerated type, your artifacts are consistent when you have updated every portion of code that was enumerating over the extension of that type, etc.

Availability: just like in the run-time world, it is the ability to access a working, up-to-date system. If you can't deploy your new artifacts, your system is not available (as far as the artifact world is concerned). The old system may be up and running, but your new system is not available to the user.

Partition tolerance: just like in the run-time world, it means you want your system to be usable even if some changes can't be propagated. That is, some artifacts have been partitioned away, and cannot be reached. Therefore, some sort of partial deployment will take place.

Well, indeed, you can only have two :-). Consider the artifacts involved in the usual distributed system:

- one or more servers
- one or more clients
- a contract in between

The clients and the server (as artifacts – think source code not processes) are U/D entangled over the contract: change a method signature (in RPC speak), or the WSDL, or whatever represents your contract, and you have to change both to maintain consistency.

What happens if you update [the source code of] a server (say, to fix a serious bug), and by doing so you need to update the contract, but cannot update [the source code of] a client [timely]? This is just like a partition. Think of a distributed development team: the team working on the client, for one reason or another, has been partitioned away.
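Here is a minimal sketch of the entanglement I'm talking about (all names are made up, Java used only for concreteness): the contract sits between the two artifacts, and touching its signature forces a change on both sides.

    // Hypothetical contract shared by client and server source code.
    interface AccountService {
        // Change this signature (say, add a currency parameter to fix a bug)
        // and both artifacts below must change to stay consistent.
        double getBalance(String accountId);
    }

    class AccountServer implements AccountService {          // server-side artifact
        public double getBalance(String accountId) { return 0.0; }
    }

    class AccountClient {                                     // client-side artifact
        double showBalance(AccountService service) {
            return service.getBalance("it-1234");
        }
    }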

You only have two choices:

- let go of Availability: you won't deploy your new server until you can update the client. This is the artifact side of not having your database available until all the replicas are connected (in the run-time world).

- let go of Consistency: you'll have to live with an inconsistent client, using a different contract.

Consequences
Chances are that you:
- Shiver at the idea of letting go of Consistency.
- Think versioning may solve the problem.

Of course, versioning is a coping strategy, not a solution. In fact, versioning is usually proposed in the run-time world of data stores as well. However, versioning your contract is like pretending that the server has never been updated. It may or may not work (what if your previous contract had major security flaws that cannot be plugged unless you change the contract? What if you changed your server in a way that makes the old contract impossible to keep? etc). It also puts a significant burden on the development team (maintaining more source code than strictly needed) and may significantly slow down development, which is another way to say it's impacting availability (of artifacts).

It is important, however, that we thoroughly understand context. After all, any given function call is a contract, and we surely want to maintain consistency as much as we can. So, when does the CAP Theorem apply to Artifacts? Partition tolerance and Availability must be part of the equation. That requires one of the following conditions to apply:

- Independent development teams. The Facebook ecosystem, for instance, is definitely prone to the artifact side of the CAP Theorem, just like any other system offering a public API to the world at large. You just can't control the clients.

- A large system that cannot be updated in every single part (unless you accept slower deployment / forgo availability). A proliferation of clients (web based, rich, mobile, etc) may also bring you into partitioning problems.

If you work on a relatively small system, and you have control of all clients, you're basically on safe ground – you can go with Consistency and Availability, because you just don't have significant partitions. Your development team is more like a multi-core CPU than a large distributed system.

Once we understand that versioning is basically a way to pretend changes never happened on the server side, is there something similar to Eventual Consistency for the artifact side?

TolerantReader may help. If you put extra effort into your clients, you can actually have a working system that will eventually be consistent (when the source code for clients is finally updated) but still be functional during transition times, without delaying the release of server updates. Of course, everything that requires discipline on the client side is rather useless in a Facebook-like ecosystem, but might be an excellent strategy for a large system where you control the clients.
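Just to give the idea a concrete shape, here is a minimal TolerantReader sketch (the field names and the Map-based payload are my own assumptions; pretend the Map came out of parsing XML or JSON): the client reads only what it needs, ignores fields it doesn't know, and survives an optional field that a newer or older contract may not send.

    import java.util.Map;

    class Customer {
        String id;
        String name;
        String loyaltyTier;
    }

    class TolerantCustomerReader {
        static Customer read(Map<String, Object> payload) {
            Customer c = new Customer();
            // Unknown fields in the payload are simply ignored.
            c.id   = (String) payload.getOrDefault("id", "");
            c.name = (String) payload.getOrDefault("name", "unknown");
            // Tolerate an optional field that may be missing in this version of the contract.
            Object tier = payload.get("loyaltyTier");
            c.loyaltyTier = (tier instanceof String) ? (String) tier : "standard";
            return c;
        }
    }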

Interestingly, Fowler talks about the virtues of TolerantReader as if it were always warranted for a distributed system. I tend to disagree: it makes sense only when some form of development-side partitioning can be expected. In many other cases, an enforced contract through XSD and code generation is exactly what we need to guarantee Consistency and Availability (because code generation helps to quickly align the syntactical bits – you still have to work on the semantic parts on both sides). Actually, the extra time required to create a TolerantReader will impact Availability in the short term (the server is ready, the client is not). Context, context, context. This is one more reason why I think we need a Physics of Software: without a clear understanding of forces and materials, it's too easy to slip into the fallacy that just because something worked very well for me [in some cases], it should work very well for you as well.

In fact, once you recognize the real forces, it's easy to understand that the problem is not limited to service-oriented software, XML contracts, and the like. You have the same problem between a database schema and a number of different clients. Whenever you have artifact entanglement and the potential for development partitioning (see above) you have to deal with the artifact-side CAP Theorem.

Moving beyond TolerantReader (which is not helping much when your client is sending the wrong data anyway :-), when development partitioning is expected, you need to think about a flexible contract from the very beginning. Facebook, for instance, allows clients to specify which fields they want to receive, but it is still pretty rigid about the fields they have to send.

This is one more interesting R&D challenge that is not easily solved with today's technology. Curiously enough, in the physical world we have long understood the need to use soft materials at the interface level to compensate for imperfections and unexpected variations of contact surfaces. Look at any recent, thermally insulated window, and most likely you're gonna find a seal between the sash and the frame. Why are seals still needed in a world of high-precision manufacturing? ;-)

Conclusions
Time for a confession: when I first thought about all the above, I had already read quite a bit on the CAP Theorem, but not the original presentation from Eric Brewer where the conjecture (it wasn't a theorem back then) was first introduced. The presentation is about a larger topic: challenges in the development of large-scale distributed systems.

As I started to write this post, I decided to do my homework and go through Brewer's slides. He identified three issues: state distribution, consistency vs. availability, and boundaries. The first two are about the run-time world, and ultimately lead to the CAP theorem. While talking about boundaries, however, Brewer begins with run-time issues (independent failure) and then moves into the artifact world, talking (guess what!) about contracts, boundary evolution, XML, etc. Interestingly, nothing in the presentation suggests any relationship between the problems. Here, I think, is the value of a good theory: as for the memristor, it provides us with leverage, moving what we know to a higher level.

In fact, another pillar of the Physics of Software is the Decision Space. I'm pretty sure the CAP theorem can be applied in the decision space as well, but that will have to wait.

Well, if you liked this, stay tuned for Chapter 15 of my NOSD series, share this post, follow me on twitter, etc.

Thursday, February 03, 2011

Is Software Design Literature Dead?

Sometimes, my clients ask me what to read about software design. More often than not, they don't want a list of books – many already have the classics covered. They would like to read papers, perhaps recent works, to further advance their understanding of software design, or perhaps something inspirational, to get new ideas and concepts. I have to confess I'm often at a loss for suggestions.

The sense of discomfort gets even worse when they ask me what happened to the kind of publications they used to read in the late '90s. Stuff like the C++ Report, the Journal of Object Oriented Programming, Java Report, Object Expert, etc. Most don't even know about the Journal of Object Technology, but honestly, the JOT has taken a strong academic slant lately, and I'm not sure they would find it all that interesting. I usually suggest they keep current on design patterns: for instance, the complete EuroPLoP 2009 proceedings are available online, for free. However, mentioning patterns sometimes prompts just another question: what happened to the pattern movement? Keep that conversation going for a while, and you get the final question: so, is software design literature dead?

Is it?
Note that the question is not about software design - it's about software design literature. Interestingly, Martin Fowler wrote about the other side of the story (Is Design Dead?) back in 2004. He argued that design wasn't dead, but its nature had changed (I could argue that if you change the nature of something, then it's perhaps inappropriate to keep using the same name :-), but ok). So perhaps software design literature isn't dead either, and has just changed nature: after all, looking for "software design" on Google yields 3,240,000 results.
Trying to classify what I usually find under the "software design" chapter, I came up with this list:

- Literature on software design philosophy, hopefully with some practical applications. Early works on Information Hiding were as much about a philosophy of software design as about practical ways to structure our software. There were a lot of philosophical papers on OOP and AOP as well. Usually, you get this kind of literature when some new idea is being proposed (like my Physics of Software :-). It's natural to see less and less philosophy in a maturing discipline, so perhaps a dearth of philosophical literature is not a bad sign.

- Literature on notations, like UML or SysML. This is not really software design literature, unless the rationale for the notation is discussed.

- Academic literature on metrics and the like. This would be interesting if those metrics addressed real design questions, but in practice, everything is always observed through the rather narrow perspective of correlation with defects or cost or something appealing for manager$. In most cases, this literature is not really about software design, and is definitely not coming from people with a strong software design background (of course, they would disagree on that :-)

- Literature on software design principles and heuristics, or refactoring techniques. We still see some of this, but it is mostly a rehashing of the same old stuff from the late '90s. The SOLID principles, the Law of Demeter, etc. In most cases, papers say nothing new, are based on toy problems, and are here just because many programmers won't read something that has been published more than a few months ago. If this is what's keeping software design literature alive, let's pull the plug.

- Literature on methods, like Design by Contract, TDD, Domain-Driven Design, etc. Here you find the occasional must-read work (usually a book from those who actually invented the approach), but just like philosophy, you see less and less of this literature in a maturing discipline. Then you find tons of advocacy on methods (or against methods), which would be more interesting if it involved real experiments (like the ones performed by Simula labs) and not just more toy projects in the capable hands of graduate students. Besides, experiments on design methods should be evaluated [mostly] by cost of change in the next few months/years, not exclusively by design principles. Advocacy may seem to keep literature alive, but it's just noise.

- Literature on platform-specific architectures. There is no dearth of that. From early EJB treatises to JBoss tutorials, you can find tons of papers on "enterprise architecture". Even Microsoft has some architectural literature on application blocks and stuff like that. Honestly, in most cases it looks more like Markitecture (marketing architecture as defined by Hohmann), promoting canned solutions and proposing that you adapt your problem to the architecture, which is sort of the opposite of real software design. The best works usually fall under the "design patterns" chapter (see below).

- Literature for architecture astronauts. This is a nice venue for both academics and vendors. You usually see heavily layered architectures using any possible standards and still proposing a few more, with all the right (and wrong) acronyms inside, and after a while you learn to turn quickly to the next page. It's not unusual to find papers appealing to money-saving managers who don't "get" software, proposing yet another combination of blueprint architectures and MDA tools so that you can "generate all your code" with a mouse click. Yeah, sure, whatever.

- Literature on design patterns. Most likely, this is what's keeping software design literature alive. A well-written pattern is about a real problem, the forces shaping the context, an effective solution, its consequences, etc. This is what software design is about, in practice. On the other hand, a couple of conferences every year can't keep design literature in perfect health.

- Literature on design in the context of agility (mostly about TDD). There is a lot of this on the net. Unfortunately, it's mostly about trivial problems, and even more unfortunately, there is rarely any discussion about the quality of the design itself (as if having tests was enough to declare the design "good"). The largest issue here is that it's basically impossible to talk about the design of anything non-trivial strictly from a code-based perspective. Note: I'm not saying that you can't design using only code as your material. I'm saying that when the problem scales slightly beyond the toy level, the amount of code that you would have to write and show to talk about design and design alternatives grows beyond the manageable. So, while code-centric practices are not killing design, they are nailing the coffin on design literature.

Geez, I am usually an optimist :-)), but this picture is bleak. I could almost paraphrase Richard Gabriel (of "Objects have failed" fame) and say that evidently, software design has failed to speak the truth, and therefore, software design narrative (literature) is dying. But that would not be me. I'm more like the opportunity guy. Perhaps we just need a different kind of literature on software design.

Something's missing
If you walk through the aisles of a large bookstore, and go to the "design & architecture" shelves, you'll find all the literary kinds above, not about software, but about real-world stuff (from chairs to buildings) – the same kinds exist for the design of physical objects too. Interestingly, you can also find a few more, like:

- Anthologies (presenting the work of several designers) and monographs on the work of famous designers. We are at a loss here, because software design is not immediately visible, is not interesting for the general public, is often kept as a trade secret, etc.
I know only one attempt to come up with something similar for software: "Beautiful Architecture" by Diomidis Spinellis. It's a nice book, but it suffers from a lack of depth. I enjoyed reading it, but didn't come back with a new perspective on something, which is what I would be looking for in this kind of literature.
It would be interesting, for instance, to read more about the architecture of successful open-source projects, not from a fanboy perspective but through the eyes of experienced software designers. Any takers?

- Idea books. These may look like anthologies, but the focus is different. While an anthology is often associated with a discussion or criticism of the designer's style, an idea book presents several different objects, often out of context, as a sort of creative stimulus. I don't know of anything similar for software design, though I've seen many similar books for "web design" (basically fancy web pages). In a sense, some literature on patterns comes close to being inspirational; at least, good domain-specific patterns sometimes do. But an idea book (or idea paper) would look different.

I guess anthologies and monographs are at odds with the software culture at large, with its focus on the whizz-bang technology of the day, little interest in the past, and a tendency to border on religious fervor about some products. But idea books (or most likely idea papers) could work, somehow.

Indeed, I would like to see more software design literature organized as follows:

- A real-world, or at least realistic problem is presented.

- Forces are discussed. Ideally, real-world forces.

- Possibly, a subset of the whole problem is selected. Real-world problems are too big to be dissected in a paper (or two, or three). You can't discuss the detailed design of a business system in a paper (lest you appeal only to architecture astronauts). You could, however, discuss a selected, challenging portion. Ideally, the problem (or the subset), or perhaps just the approach / solution, should be of some interest even outside the specific domain. Just because your problem arose in a deeply embedded environment doesn't mean I can't learn something useful for an enterprise web application (assuming I'm open minded, of course :-). Idea books / papers should trigger some lateral thinking in the reader, therefore unusual, provocative solutions would be great (unlike literature on patterns, where you are expected to report on well-known, sort of "traditional" solutions).

- Practical solutions are presented and scrutinized. I don't really care if you use code, UML, words, gestures, whatever. Still, I want some depth of scrutiny. I want to see more than one option discussed. I don't need working code. I'm probably working on a different language or platform, a different domain, with different constraints. I want fresh design ideas and new perspectives.

- In practice, a good design aims at keeping the cost of change low. This is why a real-world problem is preferable. Talking about the most likely changes and how they could be addressed in different scenarios beats babbling about tests and SOLID and the like. However, "most likely changes" is meaningless if the problem is not real.

Funny enough, there would be no natural place to publish something like that, except your own personal page. Sure, maybe JOT, maybe not. Maybe IEEE Software, maybe not. Maybe some conference, maybe not. But we don't have a software design journal with a large pool of readers. This is part of the problem, of course, but also a consequence. Wrong feedback loop :-).

Interestingly, I proposed something similar years ago, when I was part of the editorial board of a software magazine. It never made a dent. I remember that a well-known author argued against the idea, on the basis that people were not interested in all this talking about ins-and-outs, design alternatives and stuff. They would rather have a single, comprehensive design presented that they could immediately use, preferably with some source code. Well, that's entirely possible; indeed, I don't really know how many software practitioners would be interested in this kind of literature. Sure, I can bet a few of you guys would be, but overall, perhaps just a small minority is looking for inspiration, and most are just looking for canned solutions.

Or maybe something else...
On the other hand, perhaps this style is just too stuck in the '90s to be appealing in 2011. It's unidirectional (author to readers), it's not "social", it's not fun. Maybe a design challenge would be more appealing. Or perhaps the media are obsolete, and we should move away from text and graphics and toward (e.g.) videos. I actually tried to watch some code kata videos; the authors thought they were like this, but to an experienced designer they looked more like this (and not even that funny). Maybe the next generation of software designers will come up with a better narrative style or media.

Do something!
I'm usually the "let's do something" kind of guy, and I've entertained the idea of starting a Software Design Gallery, or a Software Design Idea Book, or something, but honestly, it's a damn lot of work, and I'm a bit skeptical about the market size (even for a free book/paper/website/whatever). As usual, any feedback is welcome, and any action on your part even more :-)

Wednesday, January 19, 2011

Can we really control the quality/cost trade-off?

An old rule in software project management (also known as the Project Triangle) is that you can have a system that:
- has low development cost (Cheap)
- has short development time (Fast)
- has good quality (Good)
except you can only choose two :-). That seems to suggest that if you want to go fast, and don't want to spend more, you can only compromise on quality. Actually, you can also compromise on feature set, but in many real-world cases, that happens only after cost, time and quality have already been compromised.

So, say that you want to compromise on quality. Ok, I know, most engineers don't want to compromise on quality; it's almost like compromising on ethics. Still, we actually do, more often than we'd like to, so let's face the monster. We can:

- Compromise on external quality, or quality as perceived by the user.
Ideally, this would be simply defined as MTBF (Mean Time Between Failures), although for many users, the perceived "quality" is more akin to the "user experience". In practice, it's quite hard to predict MTBF and say "ok, we won't do error checking here because that will save us X development days, and only decrease MTBF by Y". Most companies don't even measure MTBF. Those who do, use it just as a post-facto quality goal, that is, they fix bugs until they reach the target MTBF. We just don't know enough, in the software engineering community, to actually plan the MTBF of our product.

- Compromise on internal quality, or quality as perceived by developers.
Say goodbye to clarity of names, separation of concerns, little or no duplication of logic, extendibility in the right places, low coupling, etc. In some cases, poor internal quality may resurface as poor external quality (say, there is no error handling in place, so the code just crashes on unexpected input), but in many cases, the code can be rather crappy but appear to work fine.

- Compromise on process, or quality as perceived by the Software Engineering Institute Police :-).
What usually happens is that people start bypassing any activity that may seem to slow them down, like committing changes into a version control system :-) or being thorough about testing. Again, some issues may resurface as external quality problems (we don't test, so the user gets the thrill of discovering our bugs), but users won't notice if you cut corners in many process areas (well, they won't notice short-term, at least).

Of course, a "no compromises" mindset is dangerous anyway, as it may lead to an excess of ancillary activities not directly related to providing value (to the project and to users). So, ideally, we would choose to work at the best point in the quality/time and/or quality/cost continuum (the "best point" would depend on a number of factors: more on this later). Assuming, however, that there is indeed a continuum. Experience taught me the opposite: we can only move in discrete steps, and those are few and rather far apart.

Quality and Cost
Suppose I'm buying a large quantity of forks, wholesale. Choices are almost limitless, but overall the real choice is between:

- the plastic kind (throwaway)

it works, sort of, but you can't use it for all kinds of food, you can't use it while cooking, and so on.

- the "cheap steel" kind.

it's surely more resistant than plastic: you can actually use it. Still, the tines are irregular and feel uncomfortable in your mouth. Weight isn't properly balanced. And I wouldn't bet on that steel actually being stainless.

- the good stainless steel kind.

Properly shaped, properly balanced, good tasting :-), truly stainless, easy to clean up. You may or may not have decorations like that: price doesn't change so much (more on this below).

- the gold-plated kind.

it shows that you got money, or bad taste, or both :-). It's not really better than the good stainless fork, and must be handled more carefully. It's a real example of "worse is better".

Of course, there are other kinds of forks, like those with resin or wood handles (which ain't really cheaper than the equivalent steel kind, and are much worse on maintenance), or the sturdy plastic kind for babies (which leverages parental anxiety to charge a premium). So, the quality/cost landscape does not get much richer than that. Which brings us to prices. I did a little research (not really exhaustive – it would take forever :-), but overall it seems like, normalizing on the price of the plastic kind, we have this price scale:

Plastic: 1
Cheap steel: 10
Stainless steel: 20
Gold-plated: 80

The range is huge (1-80). Still, we have a big jump from throwaway to cheap (10x), a relatively small jump from cheap to good (2x), and another significant jump between good and gold-plated (4x). I think software is not really different. Except that a 2x jump is usually considered huge :-).

Note that I'm focusing on building cost, which is the significant factor on market price for a commodity like forks. Once you consider operating costs, the stainless steel is the obvious winner.

The Quality/Cost landscape for Software
Again, I'll consider only building costs here, and leave maintenance and operation costs aside for a moment.

If you build throwaway software, you can get really cheap. You don't care about error handling (which is a huge time sink). You copy&paste code all over. You proceed by trial and error, and sometimes patch code without understanding what is broken, until it seems to work. You keep no track of what you're doing. You use function names like foo and bar. Etc. After all, you just want to run that code once, and throw it away (well, that's the theory, anyway :-).
We may think that there is no place at all for that kind of code, but actually, end-user programming often falls into that case. I've also seen system administration scripts, a lot of scientific code, and quite a bit of legacy business code that would easily fall into this category. They were written as throwaways, or by someone who only had experience writing throwaways, and somehow they didn't get thrown away.

The next step is very far apart: if you want your code to be any good, you have to spend much more. It's like the plastic/"cheap steel" jump. Just put decent error handling in place, and you'll spend way more. Do a reasonable amount of systematic testing (automated or not) and you're gonna spend more. Aim for your code to be reasonably readable, and you'll have to think more, go back and refactor as you learn more, etc. At this stage, internal quality is still rather low; code is not really stainless. External quality is acceptable. It doesn't feel good, but it gets [some] work done. A lot of "industrial" code is actually at this level. Again, maintenance may not be cheap, but I'm only considering building costs here; a 5x to 10x jump compared to throwaway code is quite reasonable.

The next step is when you invest more in -ilities, as well as in external quality through more systematic and deeper testing. You're building something that you and your users like. It's truly stainless. It feels good. However, it's hard to be "just a little stainless". It's hard to be "somehow modular" or "sort of thread-safe". Either you're on one side, or you're on the other. If you want to be on the right side, you're gonna spend more. Note that just like with forks, long-term cost is lower for good, stainless code, but as far as building cost is concerned, a 2X increase compared to "cheap steel" code is not surprising.

Gold-plating is obvious: grandiose design that serves no purpose (and that would appear as gratuitous complexity to a good designer), coding standards that add no value (like requiring a comment on each and every parameter of every function), etc. I've seen systems spending a lot of energy trying to be as scalable as Amazon, even though they only had to serve a few hundred concurrent clients. Gold-plated systems cost more to build, and may also incur higher maintenance costs afterward (be careful not to scratch the gold :-). Of course, it takes experience to recognize gold-plating. To someone who has never designed a system for testability, interfaces for mocking may seem like a waste. To an experienced designer, they tell a different story.

But software is... soft!
As usual, a metaphor can be stretched only so far. Software quality can be easily altered over time. Plastic can't be transformed into gold (if you can, give me a call :-), but we can release a bug-ridden version and fix issues later. We may incur intentional technical debt and pay it back in the next release. This could be the right strategy in the right market at the right time, but that's not always the case.

Indeed, although start-ups, customer discovery, innovative products, etc. seem to get all the attention lately, if you happen to work in a mature market you may want to focus on quality early on. Some customers are known to be forgiving about quality issues in exchange for early access to innovative products. It's simply not the general case.

Modular Quality?
Theoretically, we could invest more in some "critical" components and less in others. After all, we don't need the same quality (external and internal) in every single portion. We can accept quirks here and there. Due to the fractal nature of software, this can be extended to the entire hierarchy of artifacts. It's surely possible. However, it takes a lot of maturity to develop on the edge of quality. Having to constantly reason about the quality/cost trade off can easily cost you more than you're saving: I'd rather spend that reasoning to create quality code.

Once again, however, we may have a plan to start with plastic and grow into stainless steel over time. We must be aware that it is easier said than done. I've seen it many times over my long career. Improving software quality is just another case of moving our software in the decision space. We decided to make software out of plastic early on, and now we want to move it to another place, say the cheap steel spot. Some work is required. Work is proportional to the distance (which is large – see above), but also to the opposing force. Whatever form that force takes, a significant factor is always the mass to be moved, and here lies the key. It's hard to move a monolithic mass of code into another place (in the decision space). It's easier if you can move it a module at a time.

Within a small modular unit, you can easily improve local quality. Change a switch/case into polymorphism. Add an interface to decouple two concrete classes. It's easy and it's cheap, because mass is small. Say, however, that you got your modular structure wrong: an important concept has not been identified, and is now scattered all around. You have to tweak a lot of code. While doing so, you'll find that you're not just fighting the mass you have to move: you have to fight the gravitational attraction of the surrounding code. That may reveal more scattered concepts. It's a damn lot of work, possibly more than it's worth. Software is not necessarily soft.
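As a tiny example of the kind of local, low-mass move I have in mind (hypothetical names), the classic switch-to-polymorphism refactoring stays entirely inside the module:

    // Before (inside one small module): a switch over a type code.
    //   switch (shape.kind) { case CIRCLE: ... case SQUARE: ... }
    // After: each object carries its own behavior; adding a new shape
    // no longer requires touching the calling code.
    interface Shape {
        double area();
    }

    class Circle implements Shape {
        private final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    class Square implements Shape {
        private final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }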

So, can we really move software around? Yes, in the small. Yes, if you got the modular structure right, and you're only cleaning up within that modular structure. Once again, the fractal nature of software can be our best friend or our worst enemy. Get the right partitioning between applications / services, and you only have to work inside. Get the right partitioning between components, and you only have to work inside. And so on. It's fine to have a plastic component, as long as the interface is right. Get the modular structure wrong, and sooner or later you'll need to make wide-range changes, or accept a "cheap steel" quality. If we got it wrong at a small scale, it's no big deal. At a large scale, it can be a massacre.

Conclusions
Once again, modularity is the key. Not just "any" modularity, but a modularity aligned with the underlying forcefield. If you get that right, you can build plastic components and evolve them to stainless steel quality over time. Been there, done that. Get your modular structure wrong, and you'll lack locality of action. In other words: if you plan on cutting corners inside a module, you have to invest at the interface level. That's simpler if you design top-down than if you're in the "emergent design" camp, because the module will be underengineered inside, and we can't expect a clean interface to emerge from underengineered code.

Sunday, January 10, 2010

Delaying Decisions

Since microblogging is not my thing, I decided to start 2010 by writing my longest post ever :-). It will start with a light review of a well-known principle and end up with a new design concept. Fasten your seatbelt :-).

The Last Responsible Moment
When we develop a software product, we make decisions. We decide about individual features, we make design decisions, we make coding decisions, we even decide which bugs we really want to fix before going public. Some decisions are taken on the fly; some, at least in the old school, are somewhat planned.

A key principle of Lean Development is to delay decisions, so that:
a) decisions can be based on (yet-to-discover) facts, not on speculation
b) you exercise the wait option (more on this below) and avoid early commitment

The principle is often spelled as "Delay decisions until the last responsible moment", but a quick look at Mary Poppendieck's website (Mary co-created the Lean Development approach) shows a more interesting nuance: "Schedule Irreversible Decisions at the Last Responsible Moment".

Defining "Irreversible" and "Last Responsible" is not trivial. In a sense, there is nothing in software that is truly irreversible, because you can always start over. I haven't found a good definition for "irreversible decision" in literature, but I would define it as follows: if you make an irreversible decision at time T, undoing the decision at a later time will entail a complete (or almost complete) waste of everything that has been created after time T.

There are some documented definitions for "last responsible moment". A popular one is "The point when failing to decide eliminates an important option", which I found rather unsatisfactory. I've also seen some attempts to quantify that better, as in this funny story, except that in the real world you never have a problem which is that simple (very few ramifications in the decision graph) and that detailed (you know the schedule beforehand). I would probably define the Last Responsible Moment as follows: time T is the last responsible moment to make a decision D if, by postponing D, the probability of completing on schedule/budget (even when you factor in the hypothetical learning effect of postponing) decreases below an acceptable threshold. That, of course, allows us to scrap everything and restart, if schedule and budget allow for it, and in this sense it's kinda coupled with the definition of irreversible.
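Restating that a bit more compactly (just a sketch, with an arbitrary acceptability threshold θ):

    T_{\text{LRM}} \;=\; \max \Big\{\, T \;:\; P\big(\text{on schedule/budget} \;\big|\; D \text{ is postponed until } T\big) \;\ge\; \theta \,\Big\}

which also makes it clear that the Last Responsible Moment is relative to a threshold we choose, not an intrinsic property of the decision.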

Now, irreversibility is bad. We don't want to make irreversible decisions. We certainly don't want to make them too soon. Is there anything we can do? I've got a few important things to say about modularity vs. irreversibility and passive vs. proactive option thinking, but right now, it's useful to recap the major decision areas within a software project, so that we can clearly understand what we can actually delay, and what is usually suggested that we delay.

Major Decision Areas
I'll skip a few very-high-level, strategic decisions here (scope, strategy, business model, etc). It's not that they can't be postponed, but I need to give some focus to this post :-). So I'll get down to the decisions we make more routinely.

People
Choosing the right people for the project is a well-known ingredient for success.

Approach/Process
Are we going XP, Waterfall, something in between? :-).

Feature Set
Are we going to include this feature or not?

Design
What is the internal shape (form) of our product?

Coding
Much like design, at a finer granularity level.

Now, "design" is an overly general concept. Too general to be useful. Therefore, I'll split it into a few major decisions.

Architectural Style
Is this going to be an embedded application, a rich client, a web application? This is a rather irreversible decision.

Platform
Goes somewhat hand in hand with Architectural Style. Are we going with an embedded application burnt into an FPGA? Do you want to target a PIC? Perhaps an embedded PC? Is the client a Windows machine, or do you want to support Mac/Linux? A .NET server side, or maybe Java? It's all rather irreversible, although not completely irreversible.

3rd-Party Libraries/Components/Etc
Are we going to use some existing component (of whatever scale)? Unless you plan on wrapping everything (which may not even be possible), this often ends up being an irreversible decision. For instance, once you commit yourself to using Hibernate for persistence, it's not trivial to move away.

Programming Language
This is the quintessential irreversible decision, unless you want to play with language converters. Note that this is not a coding decision: coding decisions are made after the language has been chosen.

Structure / Shape / Form
This is what we usually call "design": the shape we want to impose to our material (or, if you live in the "emergent design" side, the shape that our material will take as the final result of several incremental decisions).

So, what are we going to delay? We can't delay all decisions, or we'll be stuck. Sure, we can delay something in each and every area, but truth is, every popular method has been focusing on just a few of them. Of course, different methods tried to delay different choices.

A Little Historical Perspective
Experience brings perspective; at least, true experience does :-). Perspective allows us to look at something and see more than is usually seen. For instance, perspective allows us to look at the old, outdated, obsolete waterfall approach and see that it (too) was meant to delay decisions, just different decisions.

Waterfall was meant to delay people decisions, design decisions (which include platform, library, component decisions) and coding decisions. The people decision was delayed through specialization: you only have to pick the analyst first, everyone else can be chosen later, when you know what you gotta do (it even makes sense :-)). The design decision was delayed because platforms, including languages, OSes, etc., were way more balkanized than today. Also, architectural styles and patterns were much less understood, and it made sense to look at a larger picture before committing to an overall architecture.
Although this may seem rather ridiculous from the perspective of a 2010 programmer working on Java corporate web applications, most of this stuff is still relevant for (e.g.) mass-produced embedded systems, where choosing the right platform may radically change the total development and production cost, yet choosing the wrong platform may over-constrain the feature set.

Indeed, open systems (another legacy term from late '80s - early '90s) were born exactly to lighten up that choice. Choose the *nix world, and forget about it. Of course, the decision was still irreversible, but granted you some latitude in choosing the exact hw/sw. The entire multi-platform industry (from multi-OS libraries to Java) is basically built on the same foundations. Well, that's the bright side, of course :-).

Looking beyond platform independence, the entire concept of "standard" allows us to delay some decisions. TCP/IP, for instance, allows me to choose modularly (a concept I'll elaborate later). I can choose TCP/IP as the transport mechanism, and then delay the choice of (e.g.) the client side, and focus on the server side. Of course, a choice is still made (the client must have TCP/IP support), so let's say that widely adopted standards allow for some modularity in the decision process, and therefore let us delay some decisions, mostly design decisions, but perhaps some others as well (like people).

It's already going to be a long post, so I won't look at each and every method/principle/tool ever conceived, but if you do your homework, you'll find that a lot of what has been proposed in the last 40 years or so (from code generators to MDA, from spiral development to XP, from stepwise refinement to OOP) includes some magic ingredient that allows us to postpone some kind of decision.

It's 2010, guys
So, if you ain't agile, you are clumsy :-)) and c'mon, you don't wanna be clumsy :-). So, seriously, which kind of decisions are usually delayed in (e.g.) XP?

People? I must say I haven't seen much on this. Most literature on XP seems based on the concept that team members are mostly programmers with a wide set of skills, so there should be no particular reason to delay decision about who's gonna work on what. I may have missed some particularly relevant work, however.

Feature Set? Sure. Every incremental approach allows us to delay decisions about features. This can be very advantageous if we can play the learning game, which includes rapid/frequent delivery, or we won't learn enough to actually steer the feature set.
Of course, delaying some decisions on feature set can make some design options viable now, and totally bogus later. Here is where you really have to understand the concept of irreversible and last responsible moment. Of course, if you work on a settled platform, things get simpler, which is one more reason why people get religiously attached to a platform.

Design? Sure, but let's take a deeper look.

Architectural Style: not much. Quoting Booch, "agile projects often start out assuming a given platform and environmental context together with a set of proven design patterns for that domain, all of which represent architectural decisions in a very real sense". See my post Architecture as Tradition in the Unselfconscious Process for more.
Seriously, nobody ever expected to start with a monolithic client and end up with a three-tier web application built around a MVC pattern just by coding and refactoring. The architectural style is pretty much a given in many contemporary projects.

Platform: sorry guys, but if you want to start coding now, you gotta choose your platform now. Another irreversible decision made right at the beginning.

3rd-Party Libraries/Components/Etc: some delay is possible for modularized decisions. If you wanna use Hibernate, you gotta choose pretty soon. If you wanna use Seam, you gotta choose pretty soon. Pervasive libraries are so entangled with architectural styles that it's relatively hard to delay some decisions here. Modularized components (e.g. the choice of a PDF rendering library) are simple to delay, and can be proactively delayed (see later; there is also a small sketch right after this list).

Programming Language: no way guys, you have to choose right here, right now.

Structure / Shape / Form: of course!!! Here we are. This is it :-). You can delay a lot of detailed design choices. Of course, we always postpone some design decision, even when we design before coding. But let's say that this is where I see a lot of suggestions to delay decisions in the agile literature, often using the dreaded Big Upfront Design as a straw man argument. Of course, the emergent design (or accidental architecture) may or may not be good. If I had to compare the design and code coming out of the XP Episode with my own, I would say that a little upfront design can do wonders, but hey, you know me :-).
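About the modularized decisions mentioned above (the PDF rendering library): a proactively delayed choice can be as simple as a small interface of our own in front of the candidate component (all names here are hypothetical). Code against the interface now, and commit to a specific library at the last responsible moment:

    // Hypothetical seam: the rest of the system depends only on this interface,
    // so the choice of the actual PDF library can be delayed (and later revisited).
    interface PdfRenderer {
        byte[] render(String title, String body);
    }

    // Stand-in implementation, good enough until the real library is chosen.
    class PlainTextPdfRenderer implements PdfRenderer {
        public byte[] render(String title, String body) {
            return (title + "\n\n" + body).getBytes();
        }
    }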

Practicing
OK guys, what follows may sound a little odd, but in the end it will prove useful. Have faith :-).
You can get better at everything by doing anything :-), so why not get better at delaying decisions by playing Windows Solitaire? All you have to do is set the options in the hardest possible way:

now, play a little, until you have to make some decision, like here:

I could move the 9 of spades or the 9 of clubs over the 10 of hearts. It's an irreversible decision (well, not if you use the undo, but that's lame :-). There are some ramifications for both choices.
If I move the 9 of clubs, I can later move the king of clubs and uncover a new card. After that, it's all unknown, and no further speculation is possible. Here, learning requires an irreversible decision; this is very common in real-world projects, but seldom discussed in the literature.
If I move the 9 of spades, I uncover the 6 of clubs, which I can move over the 7 of diamonds. Then, it's kinda unknown, meaning: if you're a serious player (I'm not) you'll remember the previous cards, which would allow you to speculate a little better. Otherwise, it's just as above: you have to make an irreversible decision to learn the outcome.

But wait: what about the last responsible moment? Maybe we can delay this decision! Now, if you "delay" the decision by clicking on the deck and drawing more cards, you're not really delaying it: you're wasting a chance. In order to delay this decision, there must be something else you can do.
Well, indeed, there is something you can do. You can move the 8 of diamonds onto the 9 of clubs. This will uncover a new card (learning) without wasting any present opportunity (it could still waste a future opportunity; life is tough). Maybe you'll get a 10 of diamonds under that 8, at which point there won't be any choice to be made about the 9. Or you might get a black 7, at which point you'll have a different way to move the king of clubs, so moving the 9 of spades would be a more attractive option. So, delay the 9 and move the 8 :-). Add some luck, and it works:

and you get some money too (total at decision time vs. total at the end)


Novice solitaire players are also known to make irreversible decisions without necessity. For instance, in cases like this:

I've seen people eagerly moving the 6 of diamonds (actually, whatever they got) over the 7 of spades, because "that will free up a slot". Which is true, but irrelevant. This is a decision you can easily delay. Actually, it's a decision you must delay, because:
- if you happen to uncover a king, you can always move the 6. It's not the last responsible moment yet: if you do nothing now, nothing bad will happen.
- you may uncover a 6 of hearts before you uncover a king. And moving that 6 might be more advantageous than moving the 6 of diamonds. So, don't do it :-). If you want to look good, quote Option Theory, call this a Deferral Option and write a paper about it :-).

Proactive Option Thinking
I've recently read an interesting paper in IEEE TSE ("An Integrative Economic Optimization Approach to Systems Development Risk Management", by Michel Benaroch and James Goldstein). Although the real meat starts in chapter 4, chapters 1-3 are probably more interesting for the casual reader (including myself).
There, the authors recap some literature about Real Options in Software Engineering, including the popular argument that delaying decisions is akin to a deferral option. They also make important distinctions, like the one between passive learning through deferral of decisions and proactive learning, but also between responsiveness to change (a central theme in the agility literature) and manipulation of change (relatively less explored), and so on. There is a lot of food for thought in those 3 chapters, so if you can get a copy, I suggest you spend a little time pondering over it.
Now, I'm a strong supporter of Proactive Option Thinking. Waiting for opportunities (and then reacting quickly) is not enough. I believe that options should be "implanted" in our project, and that can be done by applying the right design techniques. How? Keep reading :-).

The Invariant Decision
If you look back at those pictures of Solitaire, you'll see that I wasn't really delaying irreversible decisions. All decisions in solitaire are irreversible (real men don't use CTRL-Z). Many decisions in software development are irreversible as well, especially when you are on a tight budget/schedule, so starting over is not an option. Therefore, irreversibility can't really be the key here. Indeed, I was trying to delay Invariant Decisions: decisions that I can make now, or later, with little or no impact on the outcome. The concept itself may seem like a minor change from "irreversible", but it allows me to do some magic:
- I can get rid of the "last responsible moment" part, which is poorly defined anyway. I can just say: delay invariant decisions. Period. You can delay them as much as you want, provided they are still invariant. No ambiguity here. That's much better.
- I can proactively make some decisions invariant. This is so important I'll have to say it again, this time in bold: I can proactively make some decisions invariant.

Invariance, Design, Modularity
If you go back to the Historical Perspective paragraph, you can now read it under a different... perspective :-). Several tools, techniques, and methods can be adopted not just to delay some decisions, but to create the option to delay a decision. How? Through careful design, of course!

Consider the strong modularity you get from service-oriented architecture, and the platform independence that comes through (well-designed) web services. This is a powerful weapon to delay a lot of decisions on one side or another (client or server).

Consider standard protocols: they are a way to make some decisions invariant, and to modularize the impact of some choices.

Consider encapsulation, abstraction and interfaces: they allow you to delay quite a few low-level decisions, and to modularize the impact of change as well. If your choice turns out to be wrong, but it's highly localized (modularized), you may be able to afford undoing your decision, therefore turning irreversible into reversible. A barebones example can be found in my old post (2005!) Builder [pattern] as an option.
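To make this a little more concrete, here is a minimal sketch in Java. All the names (PdfRenderer, ReportPrinter) are made up for this post; no specific library is implied. By routing PDF generation through a small interface, the choice of the actual rendering library (remember the 3rd-party components paragraph above) becomes an invariant, modularized decision: I can delay it, or revisit it later, without touching the calling code.

```java
// Illustrative seam; none of these names come from a real library.
interface PdfRenderer {
    void beginDocument(String title);
    void addParagraph(String text);
    byte[] endDocument();
}

// Client code depends only on the interface: the library decision is delayed,
// and changing it later means touching one adapter class, not the callers.
class ReportPrinter {
    private final PdfRenderer renderer;

    ReportPrinter(PdfRenderer renderer) {
        this.renderer = renderer;
    }

    byte[] print(Iterable<String> paragraphs) {
        renderer.beginDocument("Report");
        for (String p : paragraphs) {
            renderer.addParagraph(p);
        }
        return renderer.endDocument();
    }
}
```

Whether the renderer ends up being backed by a commercial component, an open-source library, or a quick in-house stub for testing, the cost of undoing that decision stays modularized.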

Consider a very old OOA/OOD principle, now somehow resurrected under the "ubiquitous language" umbrella. It states that you should try to reflect the real-world entities that you're dealing with in your design, and then in your code. That includes avoiding primitive types like integer, and creating meaningful classes instead. Of course, you have to understand what you're doing (that is, you gotta be a good designer) to avoid useless overengineering. See part 4 of my digression on the XP Episode for a discussion about adding a seemingly useless Ball class (that is: implanting a low cost - high premium option).
Names alter the forcefield. A named concept stands apart. My next post on the forcefield theme, by the way, will explore this issue in depth :-).
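Just to show what I mean at the smallest scale, here is a deliberately tiny, made-up example (it is not taken from the XP Episode code). Wrapping a primitive into a named concept gives the forcefield a new center; behavior related to that concept will naturally gravitate there, instead of being scattered across callers.

```java
// Instead of passing a bare double around, give the concept a name.
// The class is purely illustrative; the point is that a named center
// attracts related behavior (validation, conversion, formatting).
final class Temperature {
    private final double celsius;

    private Temperature(double celsius) {
        this.celsius = celsius;
    }

    static Temperature celsius(double value) {
        if (value < -273.15) {
            throw new IllegalArgumentException("below absolute zero");
        }
        return new Temperature(value);
    }

    double inCelsius()    { return celsius; }
    double inFahrenheit() { return celsius * 9.0 / 5.0 + 32.0; }
}
```

A low-cost move, but once the name exists, the next validation rule or conversion has an obvious home.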

And so on. I could go on forever, but the point is: you can make many (but not all, of course!) decisions invariant, if you apply the right design techniques. Most of those techniques will also modularize the cost of rework if you make the wrong decision. And sure, you can try to do this on the fly as you code. Or you may want to do some upfront design. You know what I'm thinking.

OK guys, it took quite a while, but now we have a new concept to play with, so more on this will follow, randomly as usual. Stay tuned.

Sunday, February 22, 2009

Notes on Software Design, Chapter 4: Gravity and Architecture

In my previous posts, I described gravity and inertia. At first, gravity may seem to have a negative connotation, like a force we constantly have to fight. In a sense, that's true; in a sense, it's also true for its physical counterpart: every day, we spend a lot of energy fighting earth's gravity. However, without gravity, life as we know it would never exist. There is always a bright side :-).

In the software realm, gravity can be exploited by setting up a favorable force field. Remember that gravity is a rather dumb :-) force, merely attracting things. Therefore, if we come up with the right gravitational centers early on, they will keep attracting the right things. This is the role of architecture: to provide an initial, balanced set of centers.

Consider the little thorny problem I described back in October. Introducing Stage 1, I said: "the critical choice [...] was to choose where to put the display logic: in the existing process, in a new process connected via IPC, in a new process connected to a [RT] database".
We can now review that decision within the framework of gravitational centers.

Adding the display logic into the existing process is the path of least resistance: we have only one process, and gravity is pulling new code into that process. Where is the downside? A bloated process, sure, but also the practical impossibility of sharing the display logic with other processes.
Reuse requires separation. This, however, is just the tip of the iceberg: reuse is just an instance of a much more general force, which I'll cover in the forthcoming posts.

Moving the display logic inside a separate component is a necessary step toward [independent] reusability, and also toward the rarely understood concept of a scaled-down architecture.
A frequently quoted paper by David Parnas (one of the most gifted software designers of all time) is aptly titled "Designing Software for Ease of Extension and Contraction" (IEEE Transactions on Software Engineering, Vol. 5 No. 2, March 1979). Somehow, people often forget the contraction part.
Indeed, I've often seen systems where the only chance to provide a scaled-down version to customers is to hide the portion of user interface that is exposing the "optional" functionality, often with questionable aesthetics, and always with more trouble than one could possibly want.

Note how, once we have a separate module for display, new display models are naturally attracted into that module, leaving the acquisition system alone. This is gravity working for us, not against us, because we have provided the right center. That's also the bright side of the thorny problem, exactly because (at that point, that is, stage 2) we [still] have the right centers.
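To give a rough idea of what "providing the right center" can look like in code, here is a bare-bones sketch. The names (DisplayModel, AcquisitionPublisher) are invented for this post and are not taken from the actual system. Once the display side is reachable only through a small, stable seam, new display models naturally gravitate toward implementing that seam, and the acquisition code is left alone.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative seam between acquisition and display (all names invented).
// New display models are added by implementing DisplayModel,
// leaving the acquisition process untouched.
interface DisplayModel {
    void onSample(String channel, double value, long timestampMillis);
}

class AcquisitionPublisher {
    private final List<DisplayModel> displays = new ArrayList<>();

    void register(DisplayModel display) {
        displays.add(display);
    }

    void publish(String channel, double value) {
        long now = System.currentTimeMillis();
        for (DisplayModel d : displays) {
            d.onSample(channel, value, now);
        }
    }
}
```

The seam is the gravitational center: adding a new display model means adding a class, not patching the acquisition process.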

Is the choice of using an RTDB to further decouple the data acquisition system and the display system any better than having just two layers?
I encourage you to think about it: it is not necessarily trivial to understand what is going on at the forcefield level. Sure, the RTDB becomes a new gravitational center, but is a 3-pole system any better in this case? Why? I'll get back to this in my next post.

Architecture and Gravity
Within the right architecture, features are naturally attracted to the "best" gravitational center.
The "right" architecture, therefore, must provide the right gravitational centers, so that features are naturally attracted to the right place, where (if necessary) they will be kept apart from other features at a finer granularity level, through careful design and/or careful refactoring.
Therefore, the right architeture is not just helping us cope with gravity: it's helping us exploit gravity to our own advantage.

The wrong architecture, however, will often conspire with gravity to preserve itself.
As part of my consulting activity, I’ve seen several systems where the initial partitioning of responsibility wasn’t right. The development team didn’t have enough experience (with software design and/or with the problem domain) to find out the core concepts, the core issues, the core centers.
The system was partitioned along the wrong lines, and as mass increased, gravity kicked in. The system grew with the wrong form, which was not in frictionless contact with the context.
At some point, people considered refactoring, but it was too costly, because mass brings Inertia, and inertia affects any attempt to change direction. Inertia keeps a bad system in a bad state. In a properly partitioned system, instead, we have many options for change: small subsystems won’t put up much of a fight. That’s the dream behind the SOA concept.
I already said this, but it's worth repeating: gravity is working at all granularity levels, from distributed computing down to the smallest function. That's why we have to keep both design and code constantly clean. Architecture alone is not enough. Good programmers are always essential for quality development.

What about patterns? Patterns can lower the amount of energy we have to spend to create the right architecture. Of course, they can do so because someone else spent some energy re-discovering good ideas, cleaning them up, going through shepherding and publishing, and because we spent some time learning about them. That said, patterns often provide an initial set of centers, balancing out some forces (not restricted to gravity).
Of course, we can't just throw patterns at a problem: the form must be in effortless contact with the real problem we're facing. I've seen too many well-intentioned (and not so experienced :-) software designers start with patterns. But we have to understand forces first, and adopt the right patterns later.

Enough with mass and gravity. Next time, we're gonna talk about another primordial force, pushing things apart.

See you soon, I hope!

Wednesday, January 14, 2009

Notes on Software Design, Chapter 3: Mass, Gravity and Inertia

I thought I could discuss the whole concept of Gravity and its implications in 2 or 3 (long) posts. While writing, I realized I'll need at least 4 or 5. So, this time I'll talk a little about how we can cope with gravity, and about the concept of Inertia. Next time, I'll discuss how we can exploit gravity, and why (despite the obvious cost) it is important that we do not surrender to (or ignore) gravity.

How do we cope with gravity? Needless to say, we have to spend some energy to move away from the amorphous big blob. As usual, we can also borrow some of that energy from someone (or something) else. Here are a few well-proven ideas:

- Architecture. I used to define architecture as "an overall structure, providing a natural place for features and concepts". I could now say that architecture must provide the right centers, or (from the viewpoint of mass and gravity) the right gravitational centers, so that the system can grow harmoniously. The right architecture is also the key to exploit gravity. More about this (and about the role of design patterns) next time.

- Refactoring. While architecture requires some kind of upfront investment, refactoring fights gravity in a more piecemeal, continuous fashion.
Although Refactoring and Emergent Design are often seen as the arch-enemies of Architecture, they are not. Experienced developers know that both are needed, as they work at different scales.
No amount of architecture, for instance, will ever prevent small-scale gravity from attracting more code into existing functions. When we add a new feature (maybe under a tight deadline), gravity suggests adding that feature in place, often without even breaking the smallest separation unit – the function (see the little sketch right after this list).
Conversely, gravity (and even more so Inertia) does not allow refactoring to scale economically beyond some (hard to identify) threshold.

- Measurement and Correction. While refactoring is often performed on-the-fly by programmers, fixing bad smells as they go, we can also use automatic tools to help us keep the code within some quality bounds. See Simple Metrics and More on Code Clones for a few ideas. Of course, measures provide guidance, but then the usual refactoring techniques must be applied.

- Visualization. More on this another time.

- Better Languages and Technologies. At some granularity level, technology becomes either a boon or a hindrance. Consider components: creating binary, release-to-release compatible components in C++ is a nightmare. .NET, for instance, does a much better job. Languages with a simple grammar, like Java and C#, or with strong support for reflection, also allow better tools to be built (see the next point).

- Better Tools. Consider web services. They provide a relatively painless way to create a distributed system. The lack of pain doesn't really come from SOAP (which isn't exactly a stroke of genius), but from the underlying HTTP/XML infrastructure and from the widely available, easily interoperable WSDL tools. Consider also refactoring: without good tools, it's a relatively error-prone activity. Refactoring tools make it much easier to fight gravity, moving code around with relatively little effort.
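Coming back to the small-scale gravity mentioned under Refactoring, here is the little sketch promised above: a deliberately trivial, made-up example. Under deadline pressure, the new discount rule is simply appended to the function that already exists; a small refactoring then gives the new concept its own place.

```java
import java.util.List;

class Billing {
    // Before: gravity pulls the new rule into the existing function.
    double invoiceTotalBefore(List<Double> linePrices, boolean preferredCustomer) {
        double total = 0;
        for (double price : linePrices) {
            total += price;
        }
        // new feature, added in place under deadline pressure
        if (preferredCustomer && total > 1000) {
            total *= 0.95;
        }
        return total;
    }

    // After: the discount policy becomes a (small) center of its own.
    double invoiceTotalAfter(List<Double> linePrices, boolean preferredCustomer) {
        double total = 0;
        for (double price : linePrices) {
            total += price;
        }
        return applyDiscount(total, preferredCustomer);
    }

    double applyDiscount(double total, boolean preferredCustomer) {
        return (preferredCustomer && total > 1000) ? total * 0.95 : total;
    }
}
```

Trivial as it is, the same pull shows up at every granularity level; architecture can't prevent it, only continuous care (and decent tools) can.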


On Inertia
Mass brings gravity. Gravitational attraction works to preserve the existing structure (at the fractal levels I discussed in Chapter 1). In the physical world, however, we have another interesting manifestation of mass, called Inertia. There are many formulations of the concept (see the Wikipedia page for details), but what is most interesting here is the simple F = m * a equation. We apply external forces (human work) to a system, but systems with a large mass won't easily change their state of rest or motion (including their current direction).

What is, then, the state of rest/motion for a software system? We could provide several analogies. To find the best analogy for acceleration, we need the best analogy for speed. To find the best analogy for speed, we need the best analogy for space.

The underlying idea must be that we apply some effort to move our software through space. What is the nature of that space? A few real-world examples are needed. Consider a C++/MFC application; we want to migrate the GUI layer to C#/.NET (interestingly, "migration" is commonly used to indicate motion in space). Consider a monolithic, legacy application that must be exposed as a service; or a web application that requires some performance improvement. Sure, all this may require some change in mass too (as some code will be added, some removed), but what is required is to move the software to a different place. What is that place, or, inside which kind of space do we want to move? I encourage you to think about this on your own for a while, before reading further.

My answer is rather simple: that space is the decision space. Software is built by making a number of decisions: we choose languages, technologies, architectural styles, coding styles (e.g. error handling styles, readability/efficiency trade-offs, etc.), and so on. We also choose a development process, a team, etc.
Some of those decisions are explicit and carefully worked out. Some are taken on the fly as we code. At any given time, our software is located in a specific (albeit difficult to define) place inside a huge, multi-dimensional decision space. Each decision affects some portion of code. Some are clearly separated. Some are pervasive or cross-cutting.

Software development is a learning process; therefore, some of those decisions will be wrong. Some will be right for a while, but since real-world software does not live in a vacuum, we'll have to change them anyway later.
Changing a decision requires moving our software through the decision space: every decomposition unit affected by that decision will be touched, therefore adding to the mass to be moved (hence the deadly cost of cross-cutting, pervasive concerns).

Inertia explains why some decisions are so hard to change. Any decision we change is bound to require a change in the state of rest, or motion, of our software, because we want to move it into another place.
Some of those decisions impact a large mass of software, and therefore a strong force must be applied. Experience shows that after a critical mass is reached, it becomes so hard to even understand what to do, that software becomes an immovable object (therefore requiring an irresistible force :-).

Of course, small systems won't show much inertia, which explains why the dynamics of programming in the small are different from the dynamics of programming in the large.

Also, speed and acceleration depend on time. I'll save this for a later time, as I still have to understand a few things better :-)

Enough for today. See you guys soon!

Tuesday, May 27, 2008

On the concept of Form (3): the Force Field

Warning: :-) this post is going to be somewhat conceptual. I'll soon move to some real-world, software-based examples, but I really need to introduce some concepts first.

The notion of force field might be unfamiliar to some, so I'll borrow a great example from Alexander himself. Consider your first "requirement" (for a system yet to be built) as a permanent magnet of some size and shape. If you place a flat glass over that magnet, and drop some iron filings on it, the filings will naturally arrange themselves along the magnetic field lines. That gives us an image of [a section of] the force field. Now add another magnet: the shape of the field will change, as the magnets are interacting, thereby shaping a more complex force field.
We can change the [shape of the] field in many ways: moving magnets around, changing their shape, their magnetization, or even adding some shields around magnets.

The great thing about the magnetic field is that we can somehow observe its shape. Indeed, if our goal were to create a form that can be put into effortless contact with the field, we'd just have to replicate the same form that the magnetic field is giving to the iron filings. As Alexander says (NoTSoF, page 21), "once we have a diagram of forces [...] this will in essence also describe the form as a complementary diagram of forces".
In the real world, and even more so in the software world, we are never so lucky: the force field is invisible and tends also to be highly unstable.

Usually, the force field of a software project starts with Requirements. Requirements are often categorized in some way, like "functional" and "nonfunctional", or "user requirements" and "system requirements". However, requirements of any kind are just like magnets: they contribute to shaping the overall field.

Requirements are just one kind of force, that is, they are not alone in shaping the field. Many technological choices we make, sometimes very (or too) early, are also shaping the force field.
Consider a simple business application. Once you decide that you'll build a web application, you have added quite a few powerful magnets. If you're familiar with JSP and EJB, you are naturally tempted to choose those technologies early on. That's like adding quite a few powerful magnets again. Or maybe it's like adding a magnetic shield: it really depends on context.

Sometimes, technology makes the field simpler: the right infrastructure should simplify the field, that is, it should act more like a shield than like a magnet. In this sense, infrastructure should be chosen when the dominant forces are known, unlike what happens in many projects, where infrastructure (usually a superstructure in disguise) is chosen too early, thereby making the overall field even more complex.

We also shape the field, so to speak, by choosing what to ignore and what to postpone in any given release. Anything we ignore, like anything we postpone, won't be allowed to shape the field right now.
This is fine, as long as the corresponding magnets are placed somewhat distant from the others (good modularity), possibly with some kind of magnetic shielding in between (stable interfaces). It's also fine if we can ignore it forever. Any attempt to temporarily ignore a strictly interacting force will wreak havoc later on, as our form will no longer match the resulting force field. Refactoring can accommodate minor misfits with the ideal form, but won't help much when the force field changes radically (see also my notes on refactoring here).

Here lies one of the architect's fundamental abilities: the intuitive understanding that something can be beneficially postponed, while something else must be dealt with immediately, because its influence on the force field is so strong that doing otherwise will shift us toward the wrong kind of form.

It is important to understand the role of choice in exploiting instability. Too often, software developers tend to see requirements as "fixed". They don't like to negotiate: it's much easier to fight the compiler than the marketing guys.
A good architect, however, can't miss the opportunity to simplify the field by moving some magnets around. That requires the ability to see the overall picture and the fine details at the same time. Here is Alexander again (page 18): "this ability to deal with several layers of form-context boundaries in concert is an important part of what we often refer to as the designer's sense of organization. The internal coherence of an ensemble depends on a whole net of such adaptations".
That ain't an easy feat. It requires an understanding of the business, the users, and the technology. And even more important, it requires a willingness to act on that knowledge. The power of choice extends to the infrastructure: sometimes, by willingly postponing a technological choice until the force field takes shape, we can make a better, more "natural" choice.

This can be hard for some developers: they want certainty, and they want it now. In my experience, that goes hand in hand with the willingness to adopt a sub-optimal, but repetitive and context-free, solution for a wide class of problems, instead of adopting several optimal, but reasoned and context-dependent, solutions for smaller classes of problems.

Unfortunately, choosing the "wrong" technology is very much like choosing the wrong shape or orientation for a building. To quote Alexander once again (page 29): "Instead of orienting the house carefully for sun and wind, the builder conceives its organization without concern for orientation, and light, heat, and ventilation are taken care of by fans, lamps, and other kinds of peripheral devices. Bedrooms are not separated from living rooms in plan, but are placed next to one another and the walls between them stuffed with acoustic insulation".
I think we can easily see a parallel with software here: a misfit technology is chosen early on. As a consequence, you find yourself adding more and more technology (fans, lamps, insulation) to satisfy the end-user needs. "Modern" web applications seem to have taken this path: faced with a difficult field, they're layering one technology on top of the other, desperately trying to overcome the problems of the previous layer.

Next time, in no particular order: agility, unstable requirements, early coding, TDD, "seeing" the field, internal and external representations, is UML any useful, order within chaos (dominant forces), constructive force field and systematic techniques, and whatever else will come to my mind :-).

Wednesday, November 28, 2007

Architecture as Tradition in the Unselfconscious Process

In my previous post, On the concept of Form (1), I mentioned how Architecture is providing viscosity, and therefore playing the role Alexander ascribed to tradition.

I've also proposed that the unselfconscious design process, which is very similar to the emergent design concept held so dearly by many agilists, requires some degree of tradition, and therefore, an underlying architecture. I've also gone so far as to propose the idea that many agile projects begin with a "traditional" architecture in mind:
Now, although some people in the XP/agile camp might disagree, refactoring is a viable solution only when the desired rate of change is slow, and only when the gap to fill is small. In other words, only when the overall architecture (or plain structure) is not challenged: maybe it's dictated by the J2EE way of doing things, or by the Company One True Way of doing things, or by the Model View Controller police, and so on. Truth is, without an overall architecture resisting change, a neverending sequence of small-scale refactoring may even have a negative large-scale impact.

In the past few days, I've been reading "The Economics of Architecture-First," by Grady Booch, IEEE Software, Sept/Oct, 2007. Here is an interesting excerpt:
Now, strict agilists might counter that an architecture-first approach is undesirable because we should allow a system's architecture to emerge over time. On the one hand, they're absolutely correct: a system's architecture is simply the name we give to the artifact that results from the many local design decisions made over a software-intensive system's lifetime. On the other hand, they're wrong: agile projects often start out assuming a given platform and environmental context together with a set of proven design patterns for that domain, all of which represent architectural decisions in a very real sense.

I could almost call this synchronicity :-).

For more on emergent architecture (or structure), see my now-old post Infrastructure and Superstructure.