Saturday, December 06, 2008

Notes on Software Design, Chapter 2: Mass and Gravity

Mass is a simple concept, which is better understood by comparison. For instance, a long function has bigger mass than a short one. A class with several methods and fields has bigger mass than a class with just a few methods and fields. A database with a large number of tables has bigger mass than a database with a few. A database table with many fields has bigger mass than a table with just a few. And so on.

Mass, as discussed above, is a static concept. We don't look at the number of records in a database, or at the number of instances for a class. Those numbers are not irrelevant, of course, but they do not contribute to mass as discussed here.

Although we can probably come up with a precise definition of mass, I'll not try to. I'm fine with informal concepts, at least at this time.

Mass exerts gravitational attraction, which is probably the most primitive force we (as software designers) have to deal with. Gravitational attraction makes large functions or classes to attract more LOCs, large components to attract more classes and functions, monolithic programs to keep growing as monoliths, 1-tier or 2-tiers application to fight as we try to add one more tier. Along the same lines, a single large database will get more tables; a table with many fields will attract more fields, and so on.

We achieve low mass, and therefore smaller and balanced gravity, through careful partitioning. Partitioning is an essential step in software design, yet separation always entails a cost. It should not surprise you that the cost of [fighting] gravity has the same fractal nature of separation.

A first source of cost is performance loss:
- Hardware separation requires serialization/marshaling, network transfer, synchronization, and so on.
- Process separation requires serialization/marshaling, synchronization, context switching, and so on.
- In-process component separation requires indirect function calls or load-time fix-up, and may require some degree of marshaling (depending on the component technology you choose)
- Interface – Implementation separation requires (among other things) data to be hidden (hence more function calls), prevents function inlining (or makes it more difficult), and so on.
- In-component access protection prevents, in many cases, exploitation of the global application state. This is a complex concept that I need to defer to another time.
- Function separation requires passing parameters, jumping to a different instruction, jumping back.
- Mass storage separation prevents relational algebra and query optimization.
- Different tables require a join, which can be quite costly (here the number of records resurfaces!).
- (the overhead of in-memory separation is basically subsumed by function separation).

A second source of cost is scaffolding and plumbing:
- Hardware separation requires network services, more robust error handling, protocol design and implementation, bandwidth estimation and control, more sophisticated debugging tools, and so on.
- Process separation requires most of the same.
- And so on (useful exercise!)

A third source of cost is human understanding:
Unfortunately, many people don’t have the ability to reason at different abstraction levels, yet this is exactly what we need to work effectively with a distributed, component-based, multi-database, fine-grained architecture with polymorphic behavior. The average programmer will find a monolithic architecture built around a single (albeit large) database, with a few large classes, much easier to deal with. This is only partially related to education, experience, and tools.

The ugly side of gravity is that it’s a natural, incremental, attractive, self-sustaining force.
It starts with a single line of code. The next line is attracted to the same function, and so on. It takes some work to create yet another function; yet another class; yet another component (here technology can help or hurt a lot); yet another process.
Without conscious appreciation of other forces, gravity makes sure that the minimum resistance path is followed, and that’s always to keep things together. This is why so much software is just a big ball of mud.

Enough for today. Still, there is more to say about mass, gravity and inertia, and a lot more about other (balancing) forces, so see you guys soon...

Breadcrumb trail: instance/record count cannot be ignored at design time. Remember to discuss the underlying forces.