Tuesday, October 02, 2012

Don’t do it!


We used to be taught that, by spending enough time thinking about a problem, we would come up with a "perfect" model, one that embodies many interesting properties (often disguised as principles). One of those properties was stability, that is, most individual abstractions didn't need to change as requirements evolved. Said otherwise, change was local, or even better, change could be dealt with by adding new, small things (like new classes), not by patching old things.

That school didn't last; some would say it failed (as in "objects have failed"). At some point in time, another school prevailed, claiming that thinking too far into the future was bad, that it could lead to the wrong model anyway, and that you'd better come up with something simple that can solve today's problems, keeping the code quality high so that you can easily evolve it later, safely protected by a net of unit tests.

As is common, one school tended to mischaracterize the other (and vice-versa), usually by pushing things to the extreme through some cleverly designed argument, and then claiming generality. It's easy to do so while talking about software, as we lack sound theories and tangible forces.

Consider this picture instead:


Even if you don't know squat about potential energy, local minima and local maxima, is there any doubt the ball is going to fall easily?

Is there any doubt that this ball is in stable equilibrium instead?




The nice thing about mechanics is that we have both a sound theory of forces (so that you can formally prove that some configuration is stable, if needed) and an intuitive grasp of things (so that you don't need big theories when things are reasonably simple).

This is not so in software. When the guru claims something, the "proof" rests mostly on his rhetorical ability. Theorems, like the CAP theorem, are few and far between. Lacking a theory of forces, people tend to propose oversimplified models, where (for instance) there is always a stable configuration for every problem (the old school) or there is never a stable configuration for any problem (some interpretations of the modern school).

Stability
Requirements change; actually, our own understanding of the problem changes during and after development. Technology shifts as we're writing our code. All those changes in the requirement space act as forces. Those forces materialize as changes in our artifacts (source code). Unless we have the right shape in place, some of those forces will ripple through our artifacts like waves. The problem, of course, is finding (or even defining) "the right shape", one where stability is maximized. The right shape is totally dependent on the problem (forces), so there is no “right shape” per se.

Unfortunately, we don't know much about forces and shapes. Sure, there is an echo of the "stability" property in some design principles. The "open/closed" principle is trying to keep a class stable by moving unstable parts into polymorphically-derived classes. The "dependency inversion" principle is trying to contain the wave of change by avoiding dependencies on concrete things (from the classic Design Principles and Design Patterns: "One motivation behind the DIP is to prevent you from depending upon volatile modules. The DIP makes the assumption that anything concrete is volatile" [quite an assumption anyway]). Etc. But we're far from having a sound theory. Concrete things might be stable, and interfaces unstable, for instance.

Still, sometimes we can find small, "stable" abstractions. Sometimes, apparently, we can't. It would be ok if we could at least recognize unstable configurations, and perhaps move toward more stable shapes. This, however, would require an even more ambitious step: to actually classify the most common instabilities. Not unlike classifying mechanical stress into tension, compression, bending, torsion, and shear (see Notes on Software Design, Chapter 13: On Change for a digression on this stuff), this would bring the software design conversation to an entirely new level.

This post is not the right place to do so, although I've spent quite some time tinkering with the concept of stability in my exploration of the Physics of Software. But let's at least try to classify a few common instabilities:

1) Instability of internal structure or observable run-time behavior, behind a uniform interface
For instance, you have many shapes, each with its own optimized implementation, but they can all Draw() themselves.

2) Instability in multiplicity, with uniform processing
You have an unbounded collection of objects, but you treat them all the same way.

3) Instability in structure, with a uniform externalized behavior (usually reflective behavior)
For instance, you have widely different structures, but all you want to do is serialize them.

4) Instability in structure, with a non-uniform (usually external) behavior
You have different structures or fields, and you do different things with each one.

Etc. There aren’t many more cases anyway; for instance, encapsulation, closures and currying can deal with another kind of instability (guess which :-).

Progress in programming languages brought us some helpful concepts to deal with 1, 2, and 3 (in different paradigms), but nothing to deal with 4.

We may try to coerce 4 into 1, 2, or 3; for instance, by pretending that everything is just a list or a map, so that instability in structure becomes instability in multiplicity. That does not really work well. It may work when you try to convert 3 into 2 (because your language lacks reflection), but 4 is a different beast altogether.

Open sets
In my latest post on the physics of software, I used a visual metaphor to show how some configurations are necessarily unstable. To recap, any single concept that is U/U entangled with an open set of concepts is, by necessity, unstable.
In the end, I came up with the uncomfortable idea that this class is therefore unstable by construction:

class Person
{
  string firstName;
  string lastName;
  DateTime dateOfBirth;
  string phone;
  // …
}

because, quoting myself :-), "there is an unbounded set of attributes for a Person (mobile phone; email; living address; working address; place of birth; height; weight; etc), and we have a single class which is U/U entangled with that set".

What is worse, we're facing type-4 instability, as I will use the dateOfBirth to calculate your age and email to contact you. That's not uniform behavior.

I also said that the force field is suggesting a different shape, not necessarily something you want to use, but a different shape nonetheless. So this post is not really about instability, but about what the force field is telling us to do.

Don't do this at home
What I'm going to show is basically a logical consequence. It doesn't mean you have to do things that way, or that I'm proposing you do things that way. Actually, I totally discourage you from doing things that way. It doesn't play well with the way you've been taught to structure your code. It doesn't play well with your language and tools; with your database; with your UI library; etc. Of course, it might just be the opposite (UI libraries, tools, languages, etc. are not well aligned with the real nature of things). But the net result is the same. Don't do it. That said, here is what you could do if you were insane (yes, I’ve done that a few times :-).

Note: I've removed quite a lot of details from what follows, as this post was turning into an endless exploration of possibilities. That should be ok since you're not supposed to use this post as a starting point for anything practical anyway :-).

Step 1 (simple): Decompose to small Classes with a Role
Large abstractions are usually unstable (I have a nice theory about this, based on entanglement: coming to a screen near you sooner or later). So let's do the simple thing first, and re-group fields into smaller abstractions aggregated by role:

class PhoneNumber
{
// some structure and responsibility here
}

class Address
{
// some structure and responsibility here
// possibly using fine-grained classes like
// Country, State, City, etc
}

class PersonalName
{
// some structure and responsibility here
// see http://en.wikipedia.org/wiki/Personal_name
}

class Sex
{
// ...
}

class DateOfBirth
{
// some structure and responsibility here
// (see also below)
}

At this point, we could at least represent our Person as:

class Person
{
  PersonalName who;
  Sex gender;
  DateOfBirth born;
  Address livingAddress;
  Address workingAddress;
  PhoneNumber homePhone;
  PhoneNumber mobilePhone;
  // etc
}

This does not solve the problem, of course. But small abstractions are often more stable, and given the proper overall design, partitioning might be enough to prevent a ripple of changes. For instance (in theory) any change to the structure of a phone number should be isolated into PhoneNumber and should not ripple into Person (if you can make your DB and UI play nice, that is).

This partitioning needs careful attention. For instance, DateOfBirth is now a class; so it’s not just a DateTime. The reason to do so is to have a place where we can move a responsibility formerly allocated to Person (calculate age). You can’t ask a DateTime to calculate Age; it wouldn’t be appropriate for such a general class. It’s a perfect responsibility for a DateOfBirth class, though.
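
To make the point concrete, here is a minimal sketch of such a DateOfBirth class. It assumes C# and the built-in DateTime; the method names AgeOn and IsBirthdayOn are illustrative (the post only names the responsibilities Age and IsBirthday), and passing the reference date as a parameter is an assumption made so that results do not depend on the system clock.

```csharp
using System;

// Usage: the reference date is passed in explicitly, so results are repeatable.
var born = new DateOfBirth(new DateTime(1977, 10, 2));
Console.WriteLine(born.AgeOn(new DateTime(2012, 10, 1)));        // the day before the birthday
Console.WriteLine(born.IsBirthdayOn(new DateTime(2012, 10, 2))); // the birthday itself

// Hypothetical small abstraction: a date with birth-related responsibilities,
// as opposed to a general-purpose DateTime.
public sealed class DateOfBirth
{
    private readonly DateTime date;

    public DateOfBirth(DateTime date) { this.date = date.Date; }

    // Age in completed years on the given day.
    public int AgeOn(DateTime today)
    {
        int age = today.Year - date.Year;
        // One year less if this year's birthday has not been reached yet.
        if (today.Date < date.AddYears(age)) age--;
        return age;
    }

    // True when 'today' has the same month and day as the birth date.
    // Assumption: a February 29 birthday only matches on February 29.
    public bool IsBirthdayOn(DateTime today) =>
        today.Month == date.Month && today.Day == date.Day;
}
```

Neither method needs anything from Person, which is what makes the responsibility stable once it has its own home.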

Note that I probably need many more classes than you're willing to create. Address, for instance, is a very specific abstraction, describing a place with extreme detail. Sometimes we want a vaguer concept of place (like "place of birth"). A structural hierarchy of places can be in order here, and we may even find a simple way to deal with increasingly detailed information about a place.

One may also want to group the who, gender and born fields into a separate Demographics sub-center. It's fine to do so, of course.

The shape of step 1
Apparently, Step 1 is not making much of a difference. Yet there is something here anyway. The initial shape was basically one large, unstable center (the Person). Instability will occur, by necessity, inside that center (as there is no other place).

Now we have a number of sub-centers (the Address, the PersonalName, etc), and instability can be localized. Oh, yes, PersonalName can easily be unstable; move outside your familiar culture and you'll discover new needs and new fields.

Step 1 is also suggesting a potential fractal hierarchy of concepts (see Place) that I'm not going to investigate further, but is nonetheless important in the pursuit of a truly object-oriented form. In a sense, it's moving from a big circle (Person) to something like this (represented at an intermediate step in a fractal decomposition)



Step 1 is also suggesting that sub-centers, like PhoneNumber and Address, are full-fledged classes, with (at the very minimum) validation behavior and probably more. It's also suggesting that we should have a widget to edit a PhoneNumber, and a widget to edit an Address, and that a UI technology without widgets is a rather stupid idea not fitting well with this. It's also suggesting that the UI should be dynamically composed by reflectively exploring the Person structure, looking for widget-supported concepts.

Finally, Step 1 might suggest that we may want to have a PhoneNumberRepository and so on to deal with persistence in a more modular way, but then, let's wait for Step 2.

Step 2: the reversal
Abandon your hopes. Here is where the fun begins. Just like User shouldn't know about Credential (see that “Chapter 16” post), but Credential should know about User, Person shouldn't know about PhoneNumber. PhoneNumber should know about Person. (Yes, that might require a more abstract concept than a Person, say a base class, because companies have phone numbers too; I’ll ignore that for simplicity).

That's it, nice and easy. The way to avoid instability in structure is to disband the structure. There is no spoon. There is no Person as you tend to think of it. The role of Person, as we'll see, is to provide identity and (optionally) reflective behavior.

So Step 2 brings us to this shape (as an object model)


Many PhoneNumbers and Addresses can be connected to a single Person, according to their role. If I come up with an Email class, I don't need to change the Person. The person just provides identity. [Yes, I’m stepping away from my usual “dependencies go up” standard because I want to show the gravitational pattern here].
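
The reversed dependency can be sketched in a few lines. This is only an illustration of the conceptual object model, assuming C# records; the Id type appears in the repository signatures of this post, while the field names (Identity, Owner, CountryPrefix, Number) are hypothetical.

```csharp
using System;

var someone = new Person(new Id(42));
var home = new PhoneNumber(someone.Identity, "home", "+39", "0123456789");
var mail = new Email(someone.Identity, "someone@example.com");
Console.WriteLine(home.Owner == mail.Owner); // both satellites point at the same identity

// Identity is a value of its own.
public sealed record Id(int Value);

// Person provides identity and nothing else.
public sealed record Person(Id Identity);

// Satellites know the identity they belong to; Person never references them.
public sealed record PhoneNumber(Id Owner, string Role, string CountryPrefix, string Number);

// A new concept is a new class: Person is not touched.
public sealed record Email(Id Owner, string Address);
```

Adding Email required no change to Person or PhoneNumber, which is the whole point of the reversal.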

This should really be the end of this post. That’s what the force field is telling us to do. Everything that follows is just one way to make it work, with its own consequences. There might be better ways. Actually, I hope so.

Note that the diagram above is a conceptual object model. In practice, you won’t see those associations in your classes, because they would be dealt with at the repository level. This might be a little confusing, so let's start from the bottom: the database.

The Database
A truly modular system would call for separate tables for each modular concept. Therefore, a table for PhoneNumber, a table for Address, etc. Most of those tables would have:
- the person ID
- the concept Role (like "home number"). In some cases (DateOfBirth) this is not needed.
- all the concept's fields (like country prefix and whatever)
(some of the satellite classes are unstable; guess you can figure what happens to their database counterpart)
There is also a Person table. The table provides only an ID, to act as a foreign key on the other tables. That's the persistence-level idea of providing identity.

Of course, that provides maximum modularity. Need a new role for an existing concept? Nothing to change at the persistence level. Need a new concept (email)? Just add a table.

Still, now it’s also inefficient to build the "entire" person (or a significant subset). Of course, it would be possible to come up with a database (or DBMS as they used to say) where you define a logical structure (tables) and then you specify a physical structure (like: pack all this stuff into a single row) and the database is smart enough to remove the logical joins when mapping to physical access. But our current toolset is not well aligned with this. In a sense, this is basically the opposite of views. 
Worst case: you do a lot of small queries. Alternative: a number of joins and some smart code (see below). In practice, what follows does not require separate tables; it’s just the shape most aligned with the force field. As I’ll talk about persistence code, you’ll see that the decision about database structure won’t percolate much.

The Repository
Coming from a traditional design background, it would make sense to have a Repository for every concept, like a PhoneNumberRepository (or perhaps a PhoneBook class :-). After all, this would preserve modularity as well (and avoid the hourglass shape we saw in chapter 16), again at a risk of poor performance.

Ignoring performance for a minute, single-concept repositories would usually be trivial. So, in practice, we could have classes like this (you may or may not like static methods; it’s a rather irrelevant detail in this context; also, I'll assume that you can call a repository and get back a business object. See Life without Stupid Objects, Episode 1 for a way to do that while preserving strict layering.)

class AddressRepository
{
  public static Address GetPersonAddress( Id personIdentity, Role r );
  public static IEnumerable<Address> GetPersonAddresses( Id personIdentity );
  // …
}

The Address object will not need to contain a Person object; it’s enough to have its Id inside.

However, this choice will force us to do a select for each and every fine-grained concept. That’s usually unacceptable, so let’s play with this. I’ll assume a SQL database, because it’s what I know best and it’s the environment where I’ve actually implemented most of this weird stuff.

The Statement (part 1)
A Repository is usually charged with quite a few responsibilities. Say that I introduce a new class instead, that I can call Statement. The Statement, among other things, would offer the ability to:

- specify a center / start table, providing identity
- specify that we need some fields, from some other tables (joined with the center table)
- add conditions
- transform all that into SQL
- execute the SQL
- help Repositories turn that stuff into objects (I'll get back to this in a minute)

Using a statement, I could do things like this (at a rather low level):

Statement s = new Statement( "person", "id" ) ; // the center table and identity field
s.AddFields( "role", "street" ).FromTable( "address" ).RelatedBy( "id" ) ;
s.AddFields( "firstname" ).FromTable( "personalname" ).RelatedBy( "id" ) ;

Which of course would become a 3-table join with 4 fields selected. The real syntax needs to get a bit more complicated as the concepts get fractal, but let’s keep it simple.

I could also add conditions, like:

s.AddFields( "role", "street" ).FromTable( "address" ).RelatedBy( "id" ).Where( equal, "role", "home" ) ;

At this stage, the Statement is just a way to build SQL statements compositionally, and then execute them. The trick is, of course, that the composition can now be spread among Repositories, each dealing only with its own table.
Note: In .NET land, this stuff can leverage LINQ expression trees, and improve on my syntax as well using lambdas here and there. In other languages, it might be slightly more challenging, but still doable.
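
The compositional SQL building described in this section could look roughly like the sketch below. It is not the actual Statement class from the projects mentioned in the post: the Satellite helper, the ToSql method and the WhereEqual method (a simplified stand-in for Where( equal, ... )) are assumptions, conditions are rendered as named parameters without binding values, and execution is left out entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var s = new Statement("person", "id");   // the center table and identity field
s.AddFields("role", "street").FromTable("address").RelatedBy("id").WhereEqual("role", "role");
s.AddFields("firstname").FromTable("personalname").RelatedBy("id");
Console.WriteLine(s.ToSql());

// Hypothetical sketch: collects contributions and renders them as one SELECT.
public sealed class Statement
{
    private readonly string center;
    private readonly string identity;
    private readonly List<Satellite> satellites = new();

    public Statement(string centerTable, string identityField)
    {
        center = centerTable;
        identity = identityField;
    }

    // Starts a new contribution; table and key are filled in fluently.
    public Satellite AddFields(params string[] fields)
    {
        var satellite = new Satellite(fields);
        satellites.Add(satellite);
        return satellite;
    }

    public string ToSql()
    {
        // The identity column comes first, then each satellite's fields.
        var columns = new List<string> { $"{center}.{identity}" };
        foreach (var t in satellites)
            columns.AddRange(t.Fields.Select(f => $"{t.Table}.{f}"));

        // One join per satellite table, on its key against the center identity.
        var joins = satellites.Select(
            t => $" JOIN {t.Table} ON {t.Table}.{t.Key} = {center}.{identity}");

        // Equality conditions only, rendered as named parameters.
        var conditions = satellites
            .SelectMany(t => t.Conditions.Select(c => $"{t.Table}.{c.Field} = @{c.Parameter}"))
            .ToList();

        var sql = $"SELECT {string.Join(", ", columns)} FROM {center}{string.Concat(joins)}";
        return conditions.Count == 0 ? sql : sql + " WHERE " + string.Join(" AND ", conditions);
    }
}

// Hypothetical helper holding one table's contribution to the statement.
public sealed class Satellite
{
    public Satellite(string[] fields) { Fields = fields; }

    public string[] Fields { get; }
    public string Table { get; private set; } = "";
    public string Key { get; private set; } = "";
    public List<(string Field, string Parameter)> Conditions { get; } = new();

    public Satellite FromTable(string table) { Table = table; return this; }
    public Satellite RelatedBy(string key) { Key = key; return this; }
    public Satellite WhereEqual(string field, string parameter)
    {
        Conditions.Add((field, parameter));
        return this;
    }
}
```

Because each AddFields call is independent, the three lines at the top could just as well live in three different repositories, which is what the rest of the post relies on.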

The new Repository
The role of a repository is now to:
- contribute in creating a statement
- process the result of a statement execution and build objects

So, the repository does not own the statement, does not execute the statement, does not entirely control the statement. It participates in building a statement, and in converting results back to objects.

Traditionally, a method like GetPersonAddresses does everything at the same time, so let’s split this thing in two:

class AddressRepository
{
  // mutable object syntax
  public static void PrepareGetAddresses( Statement s );
  private static List<Address> ProcessGetAddress( Record r );
  // …
}

Of course, I can still provide the former GetPersonAddress method as a shortcut when composition is not needed. The nice part, however, is that now I can:

- create a statement for a center table
- call any number of satellite repositories to have their own tables, fields and conditions merged in, by calling the Prepare... methods
- execute the statement

Executing the statement, at first, will just get a bunch of records from the DB, but I want objects. Actually, I want each repository to create its own sub-object for every record that has been retrieved. That's quite simple: during prepare, the repository will instruct the Statement with a callback function / delegate / whatever (that's why ProcessGetAddress can be made private). The statement, after execution, will iterate over the retrieved records, and call all the callback functions in different repositories for each record. Taken together, what we get back is the object-land equivalent of a record. The problem, now, is where do we store that stuff.

(Oh, Record is just a convenient class to hide the technology-specific notion of a record).
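
The callback mechanism can be sketched as follows. Everything here is a simplification under stated assumptions: SQL building and execution are left out, the records are faked with in-memory dictionaries, OnRecord and Process are hypothetical method names, ProcessGetAddress returns a single Address per record for brevity, and the per-record result is just a list of contributed objects, since where to store that stuff is still an open question at this point.

```csharp
using System;
using System.Collections.Generic;

// Fake rows standing in for the result of executing the statement.
var rows = new[]
{
    new Record(new Dictionary<string, object> { ["id"] = 1, ["role"] = "home", ["street"] = "Main St" }),
    new Record(new Dictionary<string, object> { ["id"] = 2, ["role"] = "work", ["street"] = "High St" }),
};

var s = new Statement();
AddressRepository.PrepareGetAddresses(s);   // the repository registers its callback
foreach (var parts in s.Process(rows))      // the statement calls it back for every record
    foreach (var part in parts)
        Console.WriteLine(part);

public sealed record Address(int PersonId, string Role, string Street);

// Hides the technology-specific notion of a record.
public sealed class Record
{
    private readonly IReadOnlyDictionary<string, object> values;
    public Record(IReadOnlyDictionary<string, object> values) { this.values = values; }
    public T Get<T>(string field) => (T)values[field];
}

// Only the callback part of the Statement is sketched here.
public sealed class Statement
{
    private readonly List<Func<Record, object>> builders = new();

    // Called by repositories during their Prepare... methods.
    public void OnRecord(Func<Record, object> build) => builders.Add(build);

    // For each record, asks every registered repository for its sub-object.
    public IEnumerable<IReadOnlyList<object>> Process(IEnumerable<Record> records)
    {
        foreach (var r in records)
        {
            var parts = new List<object>();
            foreach (var build in builders) parts.Add(build(r));
            yield return parts;
        }
    }
}

public static class AddressRepository
{
    public static void PrepareGetAddresses(Statement s) => s.OnRecord(ProcessGetAddress);

    // Private: only the statement ever calls it, through the registered delegate.
    private static Address ProcessGetAddress(Record r) =>
        new Address(r.Get<int>("id"), r.Get<string>("role"), r.Get<string>("street"));
}
```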

The Statement (part 2)
Ideally, I'd like to prepare my statement (by calling various repositories; an example will follow shortly), then execute my statement, and get back a sequence of objects.

If you're in the dynamic typing side of the world, that's very simple. The statement can simply invoke the repository callback functions for each and every record, take the contribution from every repository, build a dynamic object, add it to a list, and that will be it.

If you're into static typing, that's a bit of a challenge. Consider a statement that retrieves a significant subset of fields formerly belonging to Person, like DateOfBirth, Address, Email, etc. The statement is built from the contribution of several different repositories, then executed, then the result is processed by different repositories, so for every record we get back a DateOfBirth, an Address, an Email, etc. As much as you might be tempted to bring this stuff together into a Person class, that’s exactly what we’re trying not to do, so don’t :-).

Note how this is suggesting that when we deal with unstable structures, at some point we may benefit from a dynamic object. Note the context. I’m not saying that every object has to be dynamic, or that we always need them. Just that in this case, in this specific role as a result of statement execution for unstable structures, a dynamic object is very useful.

An idea I’ve had no chance to apply so far in real projects is to adopt Tuples (like C# type-safe tuples, that is) as type-safe transport objects. Would be nice if it worked, although C# generics are quite limited; for instance, something like variadic templates in C++ would be useful to deal with fractal decomposition.

So, at this stage, I will assume that the Statement will return a dynamic object, and that object will be made of typed business objects like DateOfBirth. If your language lacks dynamic objects, a HashMap of some sort would be ok too.
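
One possible way to get such an object in C# is ExpandoObject from System.Dynamic. The sketch below is an assumption about how the bag might be assembled, not the post's actual implementation: the Bag helper and the convention of naming each member after its type are hypothetical, chosen to match the res.PersonalName style of access used later in the post.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

dynamic res = Bag.From(new PersonalName("Jane", "Doe"), new Email("jane@example.com"));
PersonalName pn = res.PersonalName;   // typed business object out of a dynamic bag
Email mail = res.Email;
Console.WriteLine($"{pn.First} {pn.Last}: {mail.Address}");

public sealed record PersonalName(string First, string Last);
public sealed record Email(string Address);

public static class Bag
{
    // Builds a dynamic object with one member per contributed object,
    // named after the object's type (a hypothetical convention).
    public static ExpandoObject From(params object[] parts)
    {
        var bag = new ExpandoObject();
        var members = (IDictionary<string, object>)bag;
        foreach (var part in parts) members[part.GetType().Name] = part;
        return bag;
    }
}
```

The bag itself is untyped, but everything inside it is a regular, statically typed object; a typo in a member name is only caught at run time, which is the price paid for not having a Person class.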

We’re still missing a few things:

1) Where do I put the “coordination” logic, that is, the creation of the statement, the Prepare calls to repositories, the statement execution, the actual usage of the results?
2) How do I deal with cross-concept (business) logic?
3) What about the [unstable] UI?

Believe it or not, it’s easier to start with (2) because it will pave the way to (1) as well. I'll postpone the UI till the end, because (2) will show that it's unstable only in some use cases.

Cross-concept logic
Ok, so we can extract a Person, or any projection of a Person (say, only phone and age), as a dynamic bag of type-safe objects. What about cross-concept logic, that is, some processing involving more than one satellite class?

When I talk about this, people tend to come up with unrealistic examples because, in practice, cross-concept logic is not so common. If it were, those concepts wouldn't be real sub-centers. I don't need your phone number to know whether or not it's your birthday. Still, there are realistic examples, like:



Data extraction: I want to find all males over 35 living in a given town, say for spamming marketing purposes.

Default values: if I know where you live and I'm asking for your phone number, I may want to precompile the area prefix.

Business logic: even in this trivial Person domain, we can find something meaningful. For instance, when the user logs in, we want to check if it’s his birthday, and if so, we want to say "happy birthday, <name>". It borders on presentation and extraction logic, but it's not, and it’s cross-concept.

None of those is particularly challenging. Actually, the new shape helps reveal the nature of what we're doing much better than a monolithic Person class.

Data Extraction
The basic idea for cross-concept extraction is simple: we want to resolve a set of properties to a set of identities, then extract the individual concepts that we’re interested in, which are connected to those identities.

Well, that's what the Statement + Repositories can easily do. The only question left is where do we put the Statement composition, execution, and processing of the results. Depending on your view of layers, that question may already contain its own answer: if you only allow Application-level classes to access repositories, that logic should be in an Application (or Use-Case) layer class. A code sketch for this logic could be:

Statement spam = PersonRepository.PrepareIdentity( id );
PersonalNameRepository.PrepareGet( spam ) ;
EmailRepository.PrepareGet( spam ) ;
DateOfBirthRepository.SetCondition( above, 35, spam ) ;
AddressRepository.SetCondition( "city", "some city", spam ) ;
IEnumerable< dynamic > victims = spam.Exec();
// iterate over victims and do things with email and personal name

What does something like
PersonRepository.PrepareIdentity( id );
actually do? Quite simple:
return new Statement( "person", "id" ).Where( equal, "id", id ) ;

I guess you can figure out the rest. We're basically building a Statement from the cooperation of different repositories. PrepareGet is not the nicest name ever for a method, but it's very easy to understand (I hope), and that's my main concern at this time (as I'm leaving out a lot of details). AddTo or IncludeIn could be nicer versions. Also, a more fluent approach like:
DateOfBirthRepository.SetCondition( above, 35 ).In( spam ) ;
would improve readability quite a bit.

I understand you may not like that code at first. It "looks" like the kind of code you want to hide behind repositories. But that's mostly about habits, not substance. The persistence stuff is indeed inside repositories (+ Statement). This is a type-4 unstable selection of things, and must be in the upper layers, not in the bottom layers.

Note: in this case, the set of fields was basically hard-coded. In other cases, it's dynamically determined by various conditions. The compositional nature of the Statement tends to play rather well with that.

Default Values
The default value thing can happen at different levels. In the specific case given above, it's obviously a user interaction concern, as there is no business rule about users having phones within the same prefix area where they live. It's just a convenience when there is a human on the other side of the screen, typing data.

So let’s decompose this further, in two distinct responsibilities:

- Find the most likely prefix given an Address
exercise: where do we put that logic? Is that about the entire Address or just a smaller concept like Area?

- Propose that as a default value.
We need a way to trigger a call to the logic above and use that as a default value; as this is a user interaction concern (for instance, it would make no sense to do that when we import people from a file), the right place for the trigger is the UI itself, or if you’re stuck in MVC, the controller. Triggering from the UI would make for a much more responsive application (think web).

AOP-like interception / injection would help here. Basically, we don’t want the PhoneNumber widget and the Address widget to know each other. We need a place that can see both and inject an aspect there (a user interaction aspect). Don’t have aspects in your UI layer? Told you :-), your UI technology isn't good for this stuff.

Business Logic
In the usual BigPerson approach, it's obvious that we can put this logic inside the Person, because guess what, all the data you need is there, just add methods :-). Alternatively, some would put this inside a Controller.

With this design, there is no natural place for cross-concept logic. This is a good thing. That void is telling you something (see also the quote at the end).

Without making a big fuss out of it:

- at the UI level, we should be able to simply put a Welcome widget on the home page.
- the Welcome widget should be populated using a Welcome API / service.
- the Welcome class, given the person identity, should gather the necessary objects and create the right message. Something like:

Statement welcome = PersonRepository.PrepareIdentity( id );
PersonalNameRepository.PrepareGet( welcome ) ;
DateOfBirthRepository.PrepareGet( welcome ) ;
dynamic res = welcome.ExecSingle();
PersonalName pn = res.PersonalName;
DateOfBirth dob = res.DateOfBirth;
if( dob.IsBirthday() ) // a stable responsibility :-)
{
// say happy birthday + name
}
else
{
// say hello + name
}

(ouch, an if; call the anti-if police :-).

Yes, the static + dynamic style sucks a bit. If you got time to spare and wanna try the Tuple thing, maybe you can make it largely type-safe in many practical cases.

But it's not object oriented!
No, what you mean is that it is not coarse-grained-domain-oriented, at least not the way you want to think about your domain, that is, based on big, fat, unstable classes with a lot of logic, like Person. Nobody said it should be like that (well, somebody did; it's just not universally right).

I actually have a lot of objects flowing around. When you decompose to small enough pieces, you'll find stable abstractions, like DateOfBirth. They have clear responsibilities, like Age or IsBirthday. Actually, I am allowing, or even forcing, small abstractions like Area to emerge, instead of disappearing into the faceless blob we usually call Person. I also have a Welcome object. A Welcome widget too. I have many more (cooperating) objects than the BigPerson style tends to create.

What about the UI?
Well, the UI for some use cases is pretty stable (see the Welcome widget). In other cases, of course, it's not. The paramount case is when you want to edit or show "the Person" itself.

What we need to do, of course :-), is to dynamically discover the Person satellites, then dynamically build a UI using satellite-specific widgets (so, a PersonalName widget, a DateOfBirth widget, etc). That discovery should also tell us about role-based satellites like Address, so that we can provide a way to add / display multiple widgets of that kind (for every role).

Note that we cannot use regular reflection for this, as there is no Person class with all that stuff inside. Sure, the right class to ask would be Person. But that requires that Person implements its own reflective behavior, and that satellite concepts register with Person (so that Person can expose satellites, without knowing satellites; that's how we deal with instability here). This might easily move to a base class.
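
A minimal sketch of that registration idea, under a few assumptions: the names RegisterSatellite, Satellites and SatelliteInfo are hypothetical, the registry is a simple static list, and a real version would also need to say which widget and which repository go with each satellite.

```csharp
using System;
using System.Collections.Generic;

// Each satellite registers itself once, for instance at startup.
Person.RegisterSatellite(new SatelliteInfo(typeof(PersonalName), HasRoles: false));
Person.RegisterSatellite(new SatelliteInfo(typeof(Address), HasRoles: true));

// A UI builder can discover what to render without Person holding any field.
foreach (var info in Person.Satellites)
    Console.WriteLine($"{info.Concept.Name} widget, one per role: {info.HasRoles}");

// Placeholder satellite classes; their content is irrelevant here.
public sealed class PersonalName { }
public sealed class Address { }

// What Person exposes about a satellite: its type and whether it is role-based.
public sealed record SatelliteInfo(Type Concept, bool HasRoles);

// Person provides identity plus its own reflective behavior, and nothing else.
public sealed class Person
{
    private static readonly List<SatelliteInfo> registry = new();

    public Person(int id) { Id = id; }

    public int Id { get; }

    // Satellites announce themselves; Person never names a satellite type.
    public static void RegisterSatellite(SatelliteInfo info) => registry.Add(info);

    public static IReadOnlyList<SatelliteInfo> Satellites => registry;
}
```

Adding an Email satellite means adding one more RegisterSatellite call next to the Email class; the Person class and the UI discovery loop stay as they are.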

Things get more complicated when satellites have satellites, especially when you want to fine-tune appearance. In some cases, a trade-off between aesthetics and stability is necessary, and a custom widget might have to be created instead of relying on reflective composition of lower-level widgets. As usual, proper UI technologies might help or hinder. A technology lacking any kind of sophisticated layout management, for instance, is less than optimal (yet we're stuck with HTML and CSS :-).

The Value of Emptiness (read this part at your own risk :-)
When I first thought about this stuff, a very old story came back to my mind. There are many translations (see here for a few; pick the one you like most), but here is one I like:

Thirty spokes are joined together in a wheel,
but it is the center hole
that allows the wheel to function.

Person is the hole. By providing identity and nothing more, Person is leaving an empty space where things can't coalesce into a gravitational center. Yet, that hole acts as a virtual gravitational center for a constellation of finer-grained, more stable concepts. It is by being empty that Person allows the system to grow organically, instead of monolithically.


Don’t do it
A promise is a promise, so I had to write this stuff, but I’m glad it's over, so I can move on to some more interesting post; way more interesting, actually.

Please, don’t write me in anger. It’s ok not to like this. It’s ok to disagree. It’s probably not ok to pretend that your Person class is stable while it is not, but it’s ok to look the other way just like everyone else is doing, keep patching it as requirements change, and claim you're being agile.

If, on the other hand, that blurb above seems tempting, be smart: don’t do it anyway.

There is no invitation to follow me on twitter, because you can’t possibly like this post. Actually, I’ve downvoted it myself :-).

20 comments:

Eric Thelin said...

Fantastic!

I am one of those longtime anonymous cowards hanging around your blog in silence. I must say that I am deeply impressed by your work and that I keep looking forward to your next post. Keep it up!

Anonymous said...

Carlo, this is enlightening.

But, apart from the lack of materials that fit this architecture, I can't see why you keep saying "don't do it"!

Cheers
Daniele

Unknown said...

Great stuff, I can see why you said it could turn out to be rather unpopular :)
But it is somewhat inspiring to really see a different approach/point of view.

By the way, you mentioned performance issues, and I can easily see the problem, but you also said you've done things this way a few times. How difficult is it, and how many tricks(?) would it take, to make that architecture really feasible? I can see the exceptional degree of flexibility and configurability, to the point that it's simple to add a field to a concept (while I suspect you have something a bit more generic than the typed repository), and adding a sub-centre is a matter of adding a new class, even dynamically, and adjusting the configuration. On the other hand, a join for each sub-centre at db level can really heart performance.

I can further see how limiting current UI technologies are for this architecture, and you mentioned HTML and CSS, but I'm sure you know WPF quite extensively. Is something along the lines of WPF a better place to explore from this standpoint?

Unknown said...

If you walked into a new job and started getting familiar with a large system designed this way, you might start with a lot of WTFs and OMGWHY reactions. I'm sure there are intelligent life forms out there who would model things completely inside out from what most people would consider normal and acceptable. It's been a while since I actually modeled something out of space, but I imagine I'd start exactly like that: Person contains a set of attributes. And oh how boring would that be now..

Carlo Pescio said...

Eric: thanks :-)

Carlo Pescio said...

Daniele: I'm so happy I didn't rush writing this answer because Brian did it better than I could have.

He speaks for Most People :-), and Most People would find this to be outside norm, unacceptable, and inside out. Most People would rather ignore concepts like forces and instability and just do what's considered normal and acceptable.

Most People are happier if you do BrianDrivenDesign than if you do BrainDrivenDesign (sorry, couldn't help it, it's the nerd inside), and will react with lots of WTF and OMGWHY if you keep doing things Most People can't understand without learning a damn lot of stuff first.

It's bad when you piss Most People off, so don't.

Carlo Pescio said...

Fulvio: as I don't want to help you piss off Most People, I'm not going to give you any useful answer, other than observing that if you give up the separation of concepts at the database level, you only have to change what happens at the statement level, and that you can even get creative with that (but please don't).

I never fell in love with WPF. I think it started out well, then made a sequence of design decisions that brought it very far from something I care for (productivity). It's very binding-friendly, but not particularly widget-friendly, not to the extent I would like, anyway.

Carlo Pescio said...

Brian: curious how, even after my copious "don't do it", after I said "I totally discourage you from doing things that way. It doesn't play well with the way you've been taught to structure your code [etc]", you still felt the need to tell me how crappy this idea looks : ))

On a tangent, if you cared to look, for instance, at the history of medicine, you would be surprised by how many things Most People used to consider normal and acceptable, and how completely inside out we do things today.
Like, Most People know the right way to balance "humors" inside your body is to use leeches to remove blood when you're sick, right? Transfusions and intravenous therapy are completely inside out! WTF! OMGWHY!

Johannes said...

Some coincidence. I ended up splitting everything up like you did a few days ago but luckily only on the whiteboard and not in code.

Now that you have shown us how problematic this is.. can you contrast this with how you would actually do it? As in real life, working on the clock. Please be honest.. It's almost like you keep the perfect answer from us ;- ) I have had some similar thoughts about my work but I struggle a bit with actually designing something implementable that I am content with.

Unknown said...

First of all, let me apologise for "heart" instead of "hurt". The Italian spell checker on the phone really hurts your writing in English :D

Yep, I can see you can give up separation of concepts at db level, but in this way you give up something you advocate. Still, I think this is more an issue of technology and of how DBMSs are designed. At the end of the day, we still have to build something!

I really hope you are going to write something about UI soon. I'm very interested in the topic, and I'm actually tinkering with the idea of a new C++11 GUI toolkit...

Carlo Pescio said...

[part 1]

Johannes,
I'm not keeping anything away from you guys, except a ton of details (it would take a book to cover them; I just have a blog post). About your question, I can tell you what I am doing right now in 3 different projects I'm following, with very different scales and requirements, so that you can get some context.

Just a note, however. A significant part of the "complexity" comes from an impedance mismatch with tools and libraries. Another significant part comes from the fact that I have compressed a long explanation into a relatively short amount of text. The Statement, the cooperating repositories... they're not rocket science. They're just different, and may appear complex because you can't reuse much of what you already know deeply and can do without even looking. Of course, "innovating on the clock" is always a big challenge.

That said, here it is:

Case 1: a small Android app I'm writing in my spare time (sportablet)
Most abstractions are stable, so I chose the "traditional" approach. Abstractions are stable because they come from stable, legacy protocols, or from stable domain concepts (like that a specific recording belongs to a specific device). So far I have a dozen tables or so, because at this time it's not a database-intensive app, but you can probably guess I've still partitioned things quite a bit. I don't have dynamic objects, except for one case, which is easily handled and is not fractal in nature (inside a recording I can have temperature or not, altitude or not, heart rate or not, etc; this is an open set and I have nothing hard-coded here). This can easily be managed as a type 1 + 2 instability, which is easy.
Looking into the near future, I can easily see it becoming more database-oriented, and I can easily foresee some abstractions to be unstable. For instance, I'm not tracking biometrics yet (height; weight; lean mass; VO2Max; etc), but they certainly form an unstable set. These will NOT become explicit fields of a Person class, but can be largely brought under control using inheritance, which is what I'll probably do (so mapping this instability to type 1 + 2 again). Gear tracking will present the same kind of instability, and I'm planning to reduce it to type 1 + 2 as well.
Bottom line: there was no need for it, so I didn't do it. I had a few relevant instabilities that I treated as such, but didn't need a "sophisticated" structure because I managed to bring everything under type 1 + 2.

Carlo Pescio said...

[part 2]

Case 2: a large corporate banking application with a legacy database
The team is not prepared for this approach, so we started the "traditional" way, considering we also have a legacy db. However, other forces are pushing us toward a similar solution for a few selected cases. The issue here is not instability, but a nested object model with leaves shared between different root aggregates, and a desire not to replicate logic in different repositories. So for a few selected cases we'll use a subset of this idea. At the very minimum, some repositories will deserialize their subset of fields from a record coming from the outside. We'll probably take the easy road and put all the coordination logic into the root aggregate repositories, which will probably build the SQL statement on their own, without cooperation from the sub-repositories. This holds at least for the first sprints, so that the team can get acquainted with the idea and its consequences. Then we'll see.
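A minimal sketch of that "easy road", in Python, may help to picture it. Every name here (Account, Address, the column names, the `addr_` prefix) is hypothetical and not taken from the project: the root repository builds the whole SQL statement on its own, while the sub-repository only deserializes its own subset of fields from a record handed to it.

```python
from dataclasses import dataclass

# All names below are illustrative assumptions, not the real project's model.

@dataclass(frozen=True)
class Address:
    street: str
    city: str

class AddressRepository:
    """Sub-repository: only knows how to read its own fields from a record."""

    def from_record(self, record, prefix=""):
        return Address(record[prefix + "street"], record[prefix + "city"])

@dataclass(frozen=True)
class Account:
    account_id: int
    holder: str
    address: Address

class AccountRepository:
    """Root repository: owns the SQL and all the coordination logic."""

    def __init__(self, addresses):
        self._addresses = addresses

    def select_statement(self):
        # Built by the root alone, without cooperation from sub-repositories.
        return ("SELECT a.id AS account_id, a.holder, "
                "d.street AS addr_street, d.city AS addr_city "
                "FROM account a JOIN address d ON d.id = a.address_id "
                "WHERE a.id = ?")

    def from_record(self, record):
        # The leaf is rebuilt by the shared sub-repository, so the
        # deserialization logic is not replicated in every root repository.
        address = self._addresses.from_record(record, prefix="addr_")
        return Account(record["account_id"], record["holder"], address)
```

Another root aggregate sharing the same leaf would reuse `AddressRepository` with a different prefix; only the statement would still be duplicated knowledge, which is the part the full approach removes.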

Case 3: a brand-new, cloud-based, multi-tenant PLM as a service
Can't tell you much about this, but here things are much more complicated, because we want end users to easily customize entities by adding fields (and getting some behavior for free).
I'm doing something like this on steroids, because I'm facing an even harder problem (instability is in the hands of the end user). I also have serious performance issues here, and I cannot accept the performance hits of multiple joins, so I have a field mapping thing to flatten fields into a single table. Note that in this case the "traditional" approach would simply fail. It's no longer a matter of maintenance, but more of "can / can't do", can or cannot handle those forces.
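The comment does not say how the field mapping works, so what follows is only one possible reading, sketched in Python under a loud assumption: the single table has a pool of spare, generically named columns (the hypothetical `text_1`, `num_1`, and so on), and each user-defined field is assigned one of them. The `FieldMap` class and its methods are invented for illustration.

```python
class FieldMap:
    """Maps user-defined fields of one entity type to spare columns of a
    single wide table. Spare-column pooling is an assumption of this sketch."""

    def __init__(self, spare_columns):
        # spare_columns: e.g. {"text": ["text_1", "text_2"], "number": ["num_1"]}
        self._free = {kind: list(cols) for kind, cols in spare_columns.items()}
        self._column_of = {}  # logical field name -> physical column

    def add_field(self, name, kind):
        """Called when an end user adds a field; no schema change is needed."""
        if name in self._column_of:
            raise ValueError(f"field {name!r} is already mapped")
        free = self._free.get(kind)
        if not free:
            raise ValueError(f"no spare {kind!r} column left")
        self._column_of[name] = free.pop(0)
        return self._column_of[name]

    def to_row(self, values):
        """Translate logical field values into a flat row (no joins)."""
        return {self._column_of[name]: value for name, value in values.items()}

    def from_row(self, row):
        """Translate a flat row back into logical field values."""
        return {name: row[col] for name, col in self._column_of.items() if col in row}
```

For example, after `fields.add_field("supplier", "text")` and `fields.add_field("weight_kg", "number")`, a call to `fields.to_row({"supplier": "ACME", "weight_kg": 1.5})` yields a row addressed by physical column names, which can be read or written with a single-table statement.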

Quoting part of the post: the right shape is totally dependent on the problem (forces), so there is no “right shape” per se. That's the downside of doing brain-driven design :-))

Unknown said...

Have you ever heard of Entity-Component systems? They're widely used in game engines, and they're mostly the same idea: an entity (representing almost anything in the game) is nothing but an ID by itself. The entity might be externally associated to different components (for example TransformComponent to provide its position in the world, PhysicsComponent to give it physical behavior, GraphicsComponent to specify its graphical appearance, AnimationComponent to make it move if it's a character for example, WeaponComponent if it's a weapon to specify how it shoots or whatever, ...).

It goes even further, the behavior of a component is not in the component class itself, but rather in a set of Systems which will process all entities that have a given set of components (for example the PhysicsSystem will need a TransformComponent and a PhysicsComponent of the same entity to process that entity, and a MovementSystem might need a TransformComponent, a PhysicsComponent, an AnimationComponent and an InputComponent for example).

The idea is the same, prevent changes in one area of the game from rippling into everything else. If there's a bug in the physics of a character, there's no reason it should affect other parts of its representation.

As an additional bonus, this kind of representation is generally faster for the game to process once all entities and components have been loaded, because of the way Systems will traverse the components. Since components of the same type are generally in contiguous memory, traversing them is cache-friendly, whereas traversing a list of fat entities would involve lots of cache misses. So yes, the data representation is a bit ugly and slow, but once the data is in memory it's pretty efficient.
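For readers who have never seen one, here is a minimal Python sketch of the structure described above. The `World` class and its method names are assumptions made for illustration; only the component and system names echo the ones mentioned in the comment. Note that Python dictionaries do not give the contiguous memory layout discussed here, so this shows the shape of the design, not its cache behavior.

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class TransformComponent:
    x: float = 0.0
    y: float = 0.0

@dataclass
class PhysicsComponent:
    vx: float = 0.0
    vy: float = 0.0

class World:
    """Hypothetical container: one store per component type, keyed by entity ID."""

    def __init__(self):
        self._ids = count(1)
        self._stores = {}  # component type -> {entity id: component}

    def create_entity(self):
        return next(self._ids)  # an entity is nothing but an ID

    def attach(self, entity, component):
        self._stores.setdefault(type(component), {})[entity] = component

    def get(self, entity, kind):
        return self._stores.get(kind, {}).get(entity)

    def entities_with(self, *kinds):
        """Yield (entity, components...) for entities having all given kinds."""
        stores = [self._stores.get(kind, {}) for kind in kinds]
        if not stores:
            return
        for entity in stores[0]:
            if all(entity in store for store in stores[1:]):
                yield (entity, *(store[entity] for store in stores))

def physics_system(world, dt):
    """Behavior lives in the system, not in the component classes."""
    for _, transform, physics in world.entities_with(TransformComponent, PhysicsComponent):
        transform.x += physics.vx * dt
        transform.y += physics.vy * dt
```

An entity with only a `TransformComponent` is simply skipped by `physics_system`, which is how a change in one area stays out of the others.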

If you want to see what others have written on entity-component systems, this is a good place to start.

So it would seem that in at least one form, your "Don't do it" has been done :-)

Tartley said...

@Jean, Ha! On the "Entity-Systems" page you linked to I seem to recall having a big debate with the author, who thinks that such a system is not object oriented programming, because (I think) it doesn't have one instance per in-game entity. I thought that it was still perfectly normal object-oriented programming, just with a different choice of classes and objects, to better match the required behaviour. (page 2 of his ES posts.)

Carlo Pescio said...

Jean-Sébastien: thanks for the link, I've been through the various posts and found them quite interesting.

There are certainly similarities between what I propose and the Entity-Component architecture as discussed there. There are also some notable differences, both because the context is different and because I didn't feel the need to go through the usual OOP-bashing :-).

Actually, the "systems" part won't translate properly to business applications. Well, honestly, I don't like that part much in a game engine either. It might be ok for relatively stable "systems" like physics, but if you have for instance an AI "system", there is a risk that it would be just another unstable monolith, whereby if you introduce a new character with its own peculiarities, you have to modify the "system" (unless, of course, you adopt strategies here to make polymorphic extension possible).

In this sense, I tend to agree with Tartley that what is presented there is an architecture, not a paradigm. That architecture could actually benefit a lot from OO thinking. Unfortunately there are a lot of OO misconceptions in those posts.

Still, yes, there are definitely some similarities, although the "tough" part (cooperating repositories) is also absent from the E-C architecture as presented.

Thanks again for bringing up this parallel!

Unknown said...

There's actually something similar to your idea in Drupal. There's a module (content creation kit) that allows site admins to define new content and dynamically define how it is composed.
At the db level, each field is mapped to a table that has a predefined structure depending on the type of the field (but I think it's somewhat extensible, I'm not an expert whatsoever though).
They have a single "node" class to provide identity (everything that's a node is a piece of content), but it has a bunch of other properties (like title, creation timestamp, etc.)

The difference with your approach is that they do not have the statement and repository, but just hooks called when a node is loaded, added, deleted, etc. that allow other modules to attach more structure to the base node (dynamically building an object).
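The hook mechanism described in this comment can be sketched generically in Python. This is not Drupal's API: the `Hooks` class, the `"load"` event name, and `load_node` are all hypothetical, and they only illustrate how independent modules can attach structure to a base node that provides identity.

```python
from collections import defaultdict

class Hooks:
    """Hypothetical hook registry; not modeled on any real Drupal function."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, event, callback):
        self._callbacks[event].append(callback)

    def fire(self, event, node):
        for callback in self._callbacks[event]:
            callback(node)

def load_node(node_id, base_rows, hooks):
    # The base node carries identity plus a few fixed properties...
    node = dict(base_rows[node_id], id=node_id)
    # ...and every interested module gets a chance to add its own fields.
    hooks.fire("load", node)
    return node

# A made-up "price" module that keeps its data in its own storage.
prices = {7: 9.99}

def attach_price(node):
    node["price"] = prices.get(node["id"])
```

After `hooks.register("load", attach_price)`, loading node 7 returns the base properties plus the price. What is missing, compared to the approach in the post, is any cross-module coordination of the underlying queries: each hook fetches its own data.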

Unknown said...

Hi Carlo, yes I was talking about general similarity not necessarily all the details.

The way games generally work with systems that require wildly different behavior for different types of objects (like an AI system for example) is through scripting or a graph-based behavior editor or something similar, essentially making the differing parts data instead of code. It makes it harder to debug, but prevents the code of such a system from becoming a large mess as you pointed out.

But since we're talking about code here, I'll close just by saying that the general concept is similar, but of course the final design and implementation will differ in different problem spaces. But it's useful to know what different industries are doing to solve similar problems to see what you can apply to your own problem space. In that respect, thanks for your very insightful post which has many items I will be able to apply to my own situations even though I work in a different area of programming. (even though I shouldn't do it :-) )

Carlo Pescio said...

Fulvio: I have no idea about Drupal :-), but yes, I can see some similarity in what you are describing here.

Generally speaking, a presentation-oriented, plugin-based system needs mostly to handle "type 3 instability".

The major difference with type 4 is that in type 3 you can easily present the user with dynamic forms and then store data, but there is no server-side behavior except CRUD, and definitely no cross-concept behavior.

That's significantly easier and can be directly mapped to a plugin model, perhaps with some client-side behavior. To move beyond that, things get a bit more complicated (like with the approach I discussed).

Still, even the simplified model can benefit from the idea of pure identity + concepts...

Piotr Wyczesany said...

Great post!

I don't know if you are familiar with Domain-Driven Design, but there is a concept of Value Object (that represents value - no id), and Entity (that has the actual id and may contain Value Objects). I can see a lot of similarities.

Carlo Pescio said...

Piotr: sure, I didn't mention DDD because as you said there are similarities, but also a few notable differences, and I didn't want to add just another source of controversy :-). The post is old news by now, so I guess I can add a little comparison with the DDD approach here.

- As you observed, the satellite concepts are very similar to value objects. Here, however, they get their own independent storage. They do not get their own identity, so to speak, because they adopt the identity of the dominating center (the former entity).

- Unstable entities are disbanded, not only from the storage perspective, but also from the domain perspective, so they do not contain value objects anymore. Here is where I see the largest distance with the DDD approach, where unstable entities still materialize as (unstable) classes.

- In most cases, you also have to let go of the concept of Aggregate, because when your domain is unstable, Aggregates will be unstable as well. In part, the role of the aggregate will be covered by the cooperating repositories here.

- Interestingly, you also gain something from all those changes. Some people have noticed that entities may grow unnecessarily large in naïve DDD, and have suggested modeling different aspects of the same entity using different classes in different bounded contexts. It makes sense, of course, except that you end up with some redundancy between those aspects, because in practice there is never a perfect orthogonality (for instance, the PersonalName part is probably present in all the bounded contexts dealing with a Person). Once you disband the entity and rely on fine-grained concepts and cooperating repositories to build the subset you need, you’ll see that the redundancy tends to disappear, as you reuse the same repositories and the same fine-grained concepts in different bounded contexts (of course, some concepts will be used only in one BC, which is the point of having that BC in the first place). This can be obtained in “traditional” DDD with a heavier reliance on value objects, but it’s sort of built-in in this approach.
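The points above can be pictured with a small Python sketch. It is deliberately simplified: storage is an in-memory dictionary, there is no Statement and no real cooperation between repositories, and every name except PersonalName (`ShippingAddress`, `CreditLimit`, `ConceptRepository`, `load_view`) is an assumption made for illustration. It only shows concepts stored independently under the identity of their center, and two bounded contexts assembling different subsets from the same repositories.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PersonalName:
    first: str
    last: str

@dataclass(frozen=True)
class ShippingAddress:
    city: str

@dataclass(frozen=True)
class CreditLimit:
    amount: int

class ConceptRepository:
    """Independent storage for one concept; the key is the identity of the
    center (a person ID here), so the concept needs no identity of its own."""

    def __init__(self, concept_type):
        self.concept_type = concept_type
        self._rows = {}

    def save(self, person_id, concept):
        self._rows[person_id] = concept

    def find(self, person_id):
        return self._rows.get(person_id)

def load_view(person_id, repositories):
    """Build only the subset of concepts a bounded context asks for."""
    return {repo.concept_type: repo.find(person_id) for repo in repositories}

names = ConceptRepository(PersonalName)
addresses = ConceptRepository(ShippingAddress)
limits = ConceptRepository(CreditLimit)

# Two bounded contexts reuse the same PersonalName repository,
# so there is no redundant "name" aspect in each of them.
shipping_context = [names, addresses]
billing_context = [names, limits]
```

There is no Person class holding fields anywhere in the sketch: a person is just the key shared by the repositories.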

Overall, the differences are subtle enough to require some rather careful thinking before jumping from the DDD approach to something like this (not that I’m suggesting you do :-). The core message of DDD (mapping domain concepts to classes), of course, is still valid. However, the granularity tends to be finer, and the final architecture tends to be different.