We
used to be taught that, by spending enough time thinking about
a problem, we would come up with a "perfect" model, one
that embodies many interesting properties (often disguised as
principles). One of those properties was stability, that is, most
individual abstractions didn't need to change as requirements
evolved. Said otherwise, change was local, or even better, change
could be dealt with by adding new, small things (like new classes),
not by patching old things.
That
school didn't last; some would say it failed (as in "objects
have failed"). At some point in time, another school prevailed,
claiming that thinking too far into the future was bad, that it could
lead to the wrong model anyway, and that you'd better come up with
something simple that can solve today's problems, keeping the code
quality high so that you can easily evolve it later, safely protected
by a net of unit tests.
As
is common, one school tended to mischaracterize the other (and
vice-versa), usually by pushing things to the extreme through some
cleverly designed argument, and then claiming generality. It's easy
to do so while talking about software, as we lack sound theories and
tangible forces.
Consider
this picture instead:
Even
if you don't know squat about potential energy, local minima and
local maxima, is there any doubt the ball is going to fall easily?
Is
there any doubt that this ball is in stable equilibrium instead?
The
nice thing about mechanics is that we have both a sound theory of
forces (so that you can formally prove that some configuration is
stable, if needed) and an intuitive grasp of things (so that you
don't need big theories when things are reasonably simple).
This
is not so in software. When the guru claims something, the "proof"
stands mostly in his argumentation ability (rhetoric). Theorems, like
the CAP theorem, are few and far between. Lacking a theory of forces,
people tend to propose oversimplified models, where (for instance)
there is always a stable configuration for every problem (the old
school) or there is never a stable configuration for any problem
(some interpretations of the modern school).
Stability
Requirements
change; actually, our own understanding of the problem changes during
and after development. Technology shifts as we're writing our code.
All those changes in the requirement space act as forces. Those
forces materialize as changes in our artifacts (source code). Unless
we have the right shape in place, some of those forces will ripple
through our artifacts like waves. The problem, of course, is finding
(or even defining) "the right shape", one where stability
is maximized. The right shape is totally dependent on the problem
(forces), so there is no “right shape” per
se.
Unfortunately,
we don't know much about forces and shapes. Sure, there is an echo of
the "stability" property in some design principles. The
"open/closed" principle is trying to keep a class stable by
moving unstable parts into polymorphically-derived classes. The
"dependency inversion" principle is trying to contain the
wave of change by avoiding dependencies on concrete things (from the
classic Design Principles and Design Patterns:
"One
motivation behind the DIP is to prevent you from depending upon
volatile modules. The DIP makes the assumption that anything concrete
is volatile"
[quite an assumption anyway]). Etc. But we're far from having a sound
theory. Concrete things might be stable, and interfaces unstable, for
instance.
Still,
sometimes we can find small, "stable" abstractions.
Sometimes, apparently, we can't. It would be ok if we could at least
recognize unstable configurations, and perhaps move toward more
stable shapes. This, however, would require an even more ambitious
step: to actually classify
the most common instabilities. Not unlike classifying mechanical
stress into tension, compression, bending, torsion, and shear (see
Notes on Software Design, Chapter 13: On Change for a digression on
this stuff), this would bring the software design conversation to an
entirely new level.
This
post is not the right place to do so, although I've spent quite some
time tinkering with the concept of stability in my exploration of the
Physics of Software. But let's at least try to classify a few common
instabilities:
1) Instability of internal structure or observable run-time behavior,
behind a uniform interface
For
instance, you have many shapes, each with its own optimized
implementation, but they can all Draw() themselves.
2) Instability in multiplicity, with uniform processing
You
have an unbounded collection of objects, but you treat them all the
same way.
3) Instability in structure, with a uniform externalized behavior
(usually reflective behavior)
For
instance, you have widely different structures, but all you want to
do is serialize them
4) Instability in structure, with a non-uniform (usually external)
behavior
You
have different structures or fields, and you do different things with
each one
Etc.
There aren’t many more cases anyway; for instance, encapsulation,
closures and currying can deal with another kind of instability
(guess which :-).
Progress
in programming languages brought us some helpful concepts to deal with
1, 2, and 3 (in different paradigms), but nothing to deal with 4.
We
may try to coerce 4 into 1, 2, or 3; for instance, by pretending that
everything is just a list or a map, so that instability in structure
becomes instability in multiplicity. That does not really work well.
It may work when you try to convert 3 into 2 (because your language
lacks reflection), but 4 is a different beast altogether.
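To make the four cases a little more concrete, here is a minimal sketch. It is written in Python rather than the C#-like pseudocode used in the rest of this post, only because it is compact and runnable; every class and function name below is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, fields


# Type 1: different internals behind a uniform interface (polymorphism).
class Circle:
    def __init__(self, r):
        self.r = r

    def draw(self):
        return f"circle r={self.r}"


class Square:
    def __init__(self, side):
        self.side = side

    def draw(self):
        return f"square side={self.side}"


# Type 2: unbounded multiplicity, uniform processing (collections + iteration).
def draw_all(shapes):
    return [shape.draw() for shape in shapes]


# Type 3: widely different structures, one uniform externalized behavior
# (here: reflection over dataclass fields).
@dataclass
class Point:
    x: int
    y: int


@dataclass
class Label:
    text: str


def serialize(obj):
    return {f.name: getattr(obj, f.name) for f in fields(obj)}


# Type 4: different fields, and a different behavior attached to each one.
@dataclass
class Person:
    date_of_birth: str  # used to calculate an age
    email: str          # used to contact the person
    # every new attribute brings its own, non-uniform behavior with it
```

Polymorphism, iteration and reflection each absorb one kind of change without touching existing code. Nothing comparable absorbs a new field in `Person`: the class itself has to change.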
Open sets
In
my latest post on the physics of software, I used a visual metaphor to
show how some configurations are necessarily unstable. To recap, any
single concept that is U/U entangled with an open
set of concepts is, by necessity, unstable.
In
the end, I came up with the uncomfortable idea that this class is
therefore unstable by construction:
class Person
{
    string firstName;
    string lastName;
    DateTime dateOfBirth;
    string phone;
    // …
}
because, quoting myself :-), "there
is an unbounded set of attributes for a Person (mobile phone; email;
living address; working address; place of birth; height; weight;
etc), and we have a single class which is U/U entangled with that
set".
What
is worse, we're facing type-4 instability, as I will use the
dateOfBirth to calculate your age and email to contact you. That's
not uniform behavior.
I
also said that the force field is suggesting a different shape, not
necessarily something you want to use, but a different shape
nonetheless. So this post is not really about instability, but about
what the force field is telling us to do.
Don't do this at home
What
I'm going to show is basically a logical consequence. It doesn't mean
you have to do things that way, or that I'm proposing you do things
that way. Actually, I
totally discourage you from doing things that way.
It doesn't play well with the way you've been taught to structure
your code. It doesn't play well with your language and tools; with
your database; with your UI library; etc. Of course, it might just be
the opposite (UI libraries, tools, languages, etc. are not well
aligned with the real nature of things). But the net result is the
same. Don't
do it.
That said, here is what you could do if you were insane (yes, I’ve
done that a few times :-).
Note:
I've removed quite a lot of details from what follows, as this post
was turning into an endless exploration of possibilities. That should
be ok since you're not supposed to use this post as a starting point
for anything practical anyway :-).
Step 1 (simple): Decompose to small Classes with a Role
Large
abstractions are usually unstable (I have a nice theory about this,
based on entanglement: coming to a screen near you sooner or later). So
let's do the simple thing first, and re-group fields into smaller
abstractions aggregated by role:
class PhoneNumber
{
    // some structure and responsibility here
}
class Address
{
    // some structure and responsibility here
    // possibly using fine-grained classes like
    // Country, State, City, etc
}
class PersonalName
{
    // some structure and responsibility here
    // see http://en.wikipedia.org/wiki/Personal_name
}
class Sex
{
    // ...
}
class DateOfBirth
{
    // some structure and responsibility here
    // (see also below)
}
At
this point, we could at least represent our Person as:
class Person
{
    PersonalName who;
    Sex gender;
    DateOfBirth born;
    Address livingAddress;
    Address workingAddress;
    PhoneNumber homePhone;
    PhoneNumber mobilePhone;
    // etc
}
This
does not
solve the problem, of course. But small abstractions are often more
stable, and given the proper overall design, partitioning might be
enough to prevent a ripple of changes. For instance (in theory) any
change to the structure of a phone number should be isolated into
PhoneNumber and should not ripple into Person (if you can make your
DB and UI play nice, that is).
This
partitioning needs careful attention. For instance, DateOfBirth is
now a class; so it’s not just a DateTime. The reason to do so is to
have a place
where we can move a responsibility formerly allocated to Person
(calculate age). You can’t ask a DateTime to calculate Age; it
wouldn’t be appropriate for such a general class. It’s a perfect
responsibility for a DateOfBirth class, though.
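As a rough illustration of that move, here is a small Python sketch of a DateOfBirth that owns the age and birthday logic. The method names mirror the Age and IsBirthday responsibilities mentioned in this post; passing `today` explicitly (instead of reading the system clock) is an assumption made here so that the behavior is easy to check.

```python
from datetime import date


class DateOfBirth:
    """A small, stable concept: wraps a date and owns the logic tied to it."""

    def __init__(self, value: date):
        self.value = value

    def age(self, today: date) -> int:
        # Completed years: subtract one if this year's birthday is still ahead.
        years = today.year - self.value.year
        if (today.month, today.day) < (self.value.month, self.value.day):
            years -= 1
        return years

    def is_birthday(self, today: date) -> bool:
        # Simplification: a 29 February birthday only matches in leap years.
        return (today.month, today.day) == (self.value.month, self.value.day)


born = DateOfBirth(date(1980, 5, 20))
print(born.age(date(2020, 5, 19)), born.is_birthday(date(2021, 5, 20)))
```

Neither method needs anything from the rest of Person, which is a hint that this responsibility was never really about the Person as a whole.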
Note
that I probably need many more classes than you're willing to create.
Address, for instance, is a very specific abstraction, describing a
place with extreme detail. Sometimes we want a vaguer concept of
place (like "place of birth"). A structural hierarchy of
places can be in order here, and we may even find a simple way to
deal with increasingly detailed information about a place.
One
may also want to group the who,
gender
and born
fields into a separate Demographics sub-center. It's fine to do so,
of course.
The shape of step 1
Apparently,
Step 1 is not making much of a difference. Yet there is something
here anyway. The initial shape was basically one large, unstable
center (the Person). Instability will occur, by necessity, inside
that center (as there is no other place).
Now
we have a number of sub-centers (the Address, the PersonalName, etc),
and instability can be localized. Oh, yes, PersonalName can easily be
unstable; move outside your familiar culture and you'll discover new
needs and new fields.
Step
1 is also suggesting a potential fractal hierarchy of concepts (see
Place) that I'm not going to investigate further, but is nonetheless
important in the pursuit of a truly object-oriented form. In a sense,
it's moving from a big circle (Person) to something like this
(represented at an intermediate step in a fractal decomposition)
Step
1 is also suggesting that sub-centers, like PhoneNumber and Address,
are full-fledged classes, with (at the very minimum) validation
behavior and probably more. It's also suggesting that we should have
a widget to edit a PhoneNumber, and a widget to edit an Address, and
that a UI technology without widgets is a rather stupid idea, er,
not fitting well with this. It's also suggesting that the UI should
be dynamically composed by reflectively exploring the Person
structure, looking for widget-supported concepts.
Finally,
Step 1 might suggest that we may want to have a PhoneNumberRepository
and so on to deal with persistence in a more modular way, but then,
let's wait for Step 2.
Step 2: the reversal
Abandon your hopes. Here is where the fun begins. Just like User shouldn't
know about Credential (see that “Chapter 16” post), but
Credential should know about User, Person shouldn't know about
PhoneNumber. PhoneNumber should know about Person. (Yes, that might
require a more abstract concept than a Person, say a base class,
because companies have phone numbers too; I’ll ignore that for
simplicity).
That's
it, nice and easy. The way to avoid instability in structure is to
disband
the structure.
There is no spoon. There is no Person as you tend to think of it. The
role of Person, as we'll see, is to provide identity
and (optionally) reflective behavior.
So Step 2 brings us to this shape (as an object model):
Many
PhoneNumbers and Addresses can be connected to a single Person,
according to their role. If I come up with an Email class, I don't
need to change the Person. The person just provides identity. [Yes,
I’m stepping away from my usual “dependencies go up” standard
because I want to show the gravitational pattern here].
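A minimal Python sketch of the reversed dependency follows. The names `PersonId`, `owner` and `satellites_of` are hypothetical; the only point is that each satellite refers to the person's identity, while nothing on the person's side refers to any satellite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonId:
    """All that is left of Person in this sketch: identity."""
    value: int


@dataclass
class PhoneNumber:
    owner: PersonId  # the satellite knows the person, not the other way round
    role: str        # e.g. "home", "mobile"
    number: str


@dataclass
class Email:
    # Added later: neither PersonId nor PhoneNumber had to change.
    owner: PersonId
    role: str
    address: str


def satellites_of(person, items):
    """Collect whatever is attached to a given identity."""
    return [item for item in items if item.owner == person]
```

In a real system that last lookup would live at the repository level rather than in a free function over an in-memory list.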
This
should really be the end of this post. That’s what the force field
is telling us to do. Everything that follows is just one
way to make it work, with its own consequences. There might be better
ways. Actually, I hope so.
Note
that the diagram above is a conceptual
object model. In practice, you won’t see those associations in your
classes, because they would be dealt with at the repository level.
This might be a little confusing, so let's start from the bottom: the
database.
The Database
A
truly modular system would call for separate tables for each modular
concept. Therefore, a table for PhoneNumber, a table for Address,
etc. Most of those tables would have:
-
the person ID
-
the concept Role (like "home number"). In some cases
(DateOfBirth) this is not needed.
-
all the concept's fields (like country prefix and whatever)
(some
of the satellite classes are unstable; guess you can figure what
happens to their database counterpart)
There
is also a Person table. The table provides only an ID, to act as a
foreign key on the other tables. That's the persistence-level idea of
providing identity.
Of
course, that provides maximum modularity. Need a new role for an
existing concept? Nothing to change at the persistence level. Need a
new concept (email)? Just add a table.
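The table layout can be tried out with an in-memory SQLite database from Python's standard library. The table and column names below are assumptions made for this sketch; only the overall shape (an identity-only person table plus one table per concept, each carrying the person ID and an optional role) comes from the description above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Person provides identity and nothing else.
    CREATE TABLE person (id INTEGER PRIMARY KEY);

    -- One table per concept: person ID, role, then the concept's own fields.
    CREATE TABLE phone_number (
        person_id INTEGER REFERENCES person(id),
        role TEXT,
        country_prefix TEXT,
        number TEXT
    );

    -- No role needed here.
    CREATE TABLE date_of_birth (
        person_id INTEGER REFERENCES person(id),
        born TEXT
    );
""")

db.execute("INSERT INTO person (id) VALUES (1)")
db.execute("INSERT INTO phone_number VALUES (1, 'home', '+39', '555-0100')")

# A new role for an existing concept: just another row, no schema change.
db.execute("INSERT INTO phone_number VALUES (1, 'mobile', '+39', '555-0199')")

# A new concept: just another table; the person table is untouched.
db.execute(
    "CREATE TABLE email ("
    "person_id INTEGER REFERENCES person(id), role TEXT, address TEXT)"
)
db.execute("INSERT INTO email VALUES (1, 'work', 'someone@example.com')")

roles = [
    row[0]
    for row in db.execute(
        "SELECT role FROM phone_number WHERE person_id = 1 ORDER BY role"
    )
]
print(roles)
```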
Still,
now it’s also inefficient to build the "entire" person
(or a significant subset). Of course, it would be possible to come up
with a database (or DBMS as they used to say) where you define a
logical structure (tables) and then you specify a physical structure
(like: pack all this stuff into a single row) and the database is
smart enough to remove the logical joins when mapping to physical access. But our current toolset is
not well aligned with this. In a sense, this is basically the
opposite of views.
Worst
case: you do a lot of small queries. Alternative: a number of joins
and some smart code (see below). In practice, what
follows does not require separate tables; it’s just the shape most
aligned with the force field. As I’ll talk about persistence code,
you’ll see that the decision about database structure won’t
percolate much.
The Repository
Coming
from a traditional design background, it would make sense to have a
Repository for every concept, like a PhoneNumberRepository (or
perhaps a PhoneBook class :-). After all, this would preserve
modularity as well (and avoid the hourglass shape we saw in chapter
16), again at a risk of poor performance.
Ignoring
performance for a minute, single-concept repositories would usually
be trivial. So, in practice, we could have classes like this (you may
or may not like static methods; it’s a rather irrelevant detail in
this context; also, I'll assume that you can call a repository and
get back a business object. See Life without Stupid Objects, Episode 1 for a way to do that while
preserving strict layering.)
class AddressRepository
{
    public static Address GetPersonAddress( Id personIdentity, Role r );
    public static IEnumerable<Address> GetPersonAddresses( Id personIdentity );
    // …
}
The
Address object will not
need to contain a Person object; it’s enough to have its Id inside.
However,
this choice will force us to do a select for each and every
fine-grained concept. That’s usually unacceptable, so let’s play
with this. I’ll assume a SQL database, because it’s what I know
best and it’s the environment where I’ve actually implemented
most of this weird stuff.
The Statement (part 1)
A
Repository is usually charged with quite a few responsibilities. Say
that I introduce a new class instead, that I can call Statement. The
Statement, among other things, would offer the ability to:
- specify a center / start table, providing identity
- specify that we need some fields, from some other tables (joined with
the center table)
- add conditions
- transform all that into SQL
- execute the SQL
- help Repositories turn that stuff into objects (I'll get back to this
in a minute)
Using a statement, I could do things like this (at a rather low level):
Statement s = new Statement( "person", "id" ); // the center table and identity field
s.AddFields( "role", "street" ).FromTable( "address" ).RelatedBy( "id" );
s.AddFields( "firstname" ).FromTable( "personalname" ).RelatedBy( "id" );
Which
of course would become a 3-table join with 4 fields selected. The
real syntax needs to get a bit more complicated as the concepts get
fractal, but let’s keep it simple.
I
could also add conditions, like:
s.AddFields( "role", "street" ).FromTable( "address" ).RelatedBy( "id" ).Where( equal, "role", "home" );
At
this stage, the Statement is just a way to build SQL statements
compositionally, and then execute them. The trick is, of course, that
the
composition can now be spread among Repositories,
each dealing only with its own table.
Note:
In .NET land, this stuff can leverage LINQ expression trees, and
improve on my syntax as well using lambdas here and there. In other
languages, it might be slightly more challenging, but still doable.
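For readers who want to see the SQL-building part in isolation, here is a minimal Python sketch. It is not the implementation behind this post: the fluent `AddFields(...).FromTable(...).RelatedBy(...)` chain is flattened into a single hypothetical `add_fields(table, related_by, *fields)` call, `person_id` is an assumed name for the foreign key column, and table and field names are interpolated directly into the SQL text, so they must come from trusted code and never from user input.

```python
class Statement:
    """Builds a SELECT over a center table plus joined satellite tables."""

    def __init__(self, center_table, identity_field):
        self.center = center_table
        self.identity = identity_field
        self.columns = [f"{center_table}.{identity_field}"]
        self.joins = []
        self.conditions = []
        self.params = []

    def add_fields(self, table, related_by, *fields):
        # Join the satellite table to the center once, then select its fields.
        join = (
            f"JOIN {table} ON {table}.{related_by} = "
            f"{self.center}.{self.identity}"
        )
        if join not in self.joins:
            self.joins.append(join)
        self.columns += [f"{table}.{field}" for field in fields]
        return self

    def where(self, table, field, op, value):
        # Values travel as bound parameters, never as SQL text.
        self.conditions.append(f"{table}.{field} {op} ?")
        self.params.append(value)
        return self

    def to_sql(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.center}"
        if self.joins:
            sql += " " + " ".join(self.joins)
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql, self.params


s = Statement("person", "id")
s.add_fields("address", "person_id", "role", "street")
s.add_fields("personalname", "person_id", "firstname")
s.where("address", "role", "=", "home")
sql, params = s.to_sql()
print(sql)
```

Because each call only touches its own table, the two `add_fields` calls could just as well be made by two different repositories that know nothing about each other.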
The new Repository
The
role of a repository is now to:
- contribute in creating a statement
- process the result of a statement execution and build objects
So,
the repository does not own the statement, does not execute the
statement, does not entirely control the statement. It participates
in building a statement, and in converting results back to objects.
Traditionally,
a method like GetPersonAddresses does everything at the same time, so
let’s split this thing in two:
class AddressRepository
{
    // mutable object syntax
    public static void PrepareGetAddresses( Statement s );
    private static List<Address> ProcessGetAddress( Record r );
    // …
}
Of
course, I can still provide the former GetPersonAddress method as a
shortcut when composition is not needed. The nice part, however, is
that now I can:
- create a statement for a center table
- call any number of satellite repositories to have their own tables,
fields and conditions merged in, by calling the Prepare... methods
- execute the statement
Executing
the statement, at first, will just get a bunch of records from the
DB, but I want objects. Actually, I want each repository to create
its own sub-object for every record that has been retrieved. That's
quite simple: during prepare, the repository will instruct the
Statement with a callback function / delegate / whatever (that's why
ProcessGetAddress can be made private). The statement, after
execution, will iterate over the retrieved records, and call all
the callback functions in different repositories for each record.
Taken together, what we get back is the object-land equivalent of a
record. The problem, now, is where do we store that stuff.
(Oh, Record is just a convenient class to hide the technology-specific notion of a record).
The Statement (part 2)
Ideally,
I'd like to prepare my statement (by calling various repositories; an
example will follow shortly), then execute my statement, and get back
a sequence of objects.
If
you're in the dynamic typing side of the world, that's very simple.
The statement can simply invoke the repository callback functions for
each and every record, take the contribution from every repository,
build a dynamic object, add it to a list, and that will be it.
If
you're into static typing, that's a bit of a challenge. Consider a
statement that retrieves a significant subset of fields formerly
belonging to Person, like DateOfBirth, Address, Email, etc. The
statement is built from the contribution of several different
repositories, then executed, then the result is processed by
different repositories, so for every record we get back a
DateOfBirth, an Address, an Email, etc. As
much as you might be tempted to bring this stuff together into a
Person class, that’s exactly what we’re trying not
to do, so don’t :-).
Note
how this is suggesting that when we deal with unstable structures, at
some point we may benefit from a dynamic object. Note the context.
I’m not saying that every object has to be dynamic, or that we
always need them. Just that in this case, in this specific role as a
result of statement execution for unstable structures, a dynamic
object is very useful.
An
idea I’ve had no chance to apply so far in real projects is to
adopt Tuples (like C# type-safe tuples, that is) as type-safe transport
objects. Would be nice if it worked, although C# generics are quite
limited; for instance, something like variadic templates in C++ would
be useful to deal with fractal decomposition.
So,
at this stage, I will assume that the Statement will return a dynamic
object, and that object will be made of typed business objects like
DateOfBirth. If your language lacks dynamic objects, a HashMap of
some sort would be ok too.
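Putting the pieces together, here is a small runnable Python sketch of the prepare / execute / process cycle against an in-memory SQLite database. It is only one possible reading of the design: the `join` method, the `prepare_get` and `_process` names, the column aliasing scheme and the table layout are all assumptions made for this example, and `types.SimpleNamespace` stands in for the dynamic object.

```python
import sqlite3
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class Address:
    role: str
    street: str


@dataclass
class PersonalName:
    first_name: str


class Statement:
    def __init__(self, center, identity):
        self.center, self.identity = center, identity
        self.columns = [f"{center}.{identity} AS id"]
        self.joins = []
        self.callbacks = {}  # contribution name -> function(record) -> object

    def join(self, table, related_by, columns, name, callback):
        self.joins.append(
            f"JOIN {table} ON {table}.{related_by} = {self.center}.{self.identity}"
        )
        self.columns += [f"{table}.{c} AS {table}_{c}" for c in columns]
        self.callbacks[name] = callback

    def execute(self, db):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.center} "
        sql += " ".join(self.joins)
        cursor = db.execute(sql)
        names = [d[0] for d in cursor.description]
        for row in cursor:
            record = dict(zip(names, row))
            # Each repository turns its own slice of the record into an object.
            parts = {n: cb(record) for n, cb in self.callbacks.items()}
            yield SimpleNamespace(id=record["id"], **parts)


class AddressRepository:
    @staticmethod
    def prepare_get(statement):
        statement.join("address", "person_id", ["role", "street"],
                       "address", AddressRepository._process)

    @staticmethod
    def _process(record):
        return Address(record["address_role"], record["address_street"])


class PersonalNameRepository:
    @staticmethod
    def prepare_get(statement):
        statement.join("personalname", "person_id", ["firstname"],
                       "personal_name", PersonalNameRepository._process)

    @staticmethod
    def _process(record):
        return PersonalName(record["personalname_firstname"])


db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY);
    CREATE TABLE address (person_id INTEGER, role TEXT, street TEXT);
    CREATE TABLE personalname (person_id INTEGER, firstname TEXT);
    INSERT INTO person VALUES (1);
    INSERT INTO address VALUES (1, 'home', '1 Main Street');
    INSERT INTO personalname VALUES (1, 'Ada');
""")

s = Statement("person", "id")
AddressRepository.prepare_get(s)
PersonalNameRepository.prepare_get(s)
results = list(s.execute(db))
print(results[0].address, results[0].personal_name)
```

The Statement never learns what an Address is, and no repository sees another repository's columns; the dynamic result is the only place where the pieces meet.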
We’re
still missing a few things:
2) How do I deal with cross-concept (business) logic.
3) What about the [unstable] UI?
Believe it or not, it’s easier to start with (2) because it will pave the way to (1) as well. I'll postpone the UI till the end, because (2) will show that it's unstable only in some use cases.
Cross-concept logic
Ok, so we can extract a Person, or any projection of a Person (say, only phone and age), as a dynamic bag of type-safe objects. What about cross-concept logic, that is, some processing involving more than one satellite class?
When I talk about this, people tend to come up with unrealistic examples because, in practice, cross-concept logic is not so common. If it were, those concepts wouldn't be real sub-centers. I don't need your phone number to know whether or not it's your birthday. Still, there are realistic examples, like:
Data extraction: I want to find all males over 35 living in a given town, say for
spamming, er, marketing purposes.
Default values: if I know where you live and I'm asking for your phone number, I may
want to pre-fill the area prefix.
Business logic: even in this trivial Person domain, we can find something meaningful.
For instance, when the user logs in, we want to check if it’s his
birthday, and if so, we want to say "happy birthday, <name>".
It borders on presentation and extraction logic, but it's not, and
it’s cross-concept.
None
of those is particularly challenging. Actually, the new shape helps
reveal the nature of what we're doing much better than a
monolithic Person class.
Data Extraction
The
basic idea for cross-concept extraction is simple: we want to resolve
a set of properties
to a set of identities,
then extract the individual concepts that we’re interested in, which are connected to those identities.
Well,
that's what the Statement + Repositories can easily do. The only
question left is where do we put the Statement composition,
execution, and processing of the results. Depending on your view of
layers, that question may already contain its own answer: if you
only allow Application-level classes to access repositories, that
logic should be in an Application (or Use-Case) layer class. A
code sketch for this logic could be:
Statement spam = PersonRepository.PrepareIdentity( id );
PersonalNameRepository.PrepareGet( spam );
EmailRepository.PrepareGet( spam );
DateOfBirthRepository.SetCondition( above, 35, spam );
AddressRepository.SetCondition( "city", "some city", spam );
IEnumerable<dynamic> victims = spam.Exec();
// iterate over victims and do things with email and personal name
What does something like
PersonRepository.PrepareIdentity( id );
actually do? Quite simple:
return new Statement( "person", "id" ).Where( equal, "id", id );
I
guess you can figure out the rest. We're basically building a
Statement from the cooperation of different repositories. PrepareGet
is not the nicest name ever for a method, but it's very easy to
understand (I hope), and that's my main concern at this time (as I'm
leaving out a lot of details). AddTo or IncludeIn could be nicer
versions. Also, a more fluent approach like:
DateOfBirthRepository.SetCondition( above, 35 ).In( spam );
would
improve readability quite a bit.
I
understand you may not like that code at first. It "looks"
like the kind of code you want to hide behind repositories. But
that's mostly about habits, not substance. The persistence stuff is
indeed inside repositories (+ Statement). This is a type-4 unstable
selection of things, and must be in the upper layers, not in the
bottom layers.
Note: in this case, the set of fields was basically hard-coded. In other cases, it's dynamically determined by various conditions. The compositional nature of the Statement tends to play rather well with that.
Default Values
The
default value thing can happen at different levels. In the specific
case given above, it's obviously a user interaction concern, as there is no
business rule about users having phones within the same prefix area
where they live. It's just a convenience when there is a human on the
other side of the screen, typing data.
So
let’s decompose this further, in two distinct responsibilities:
- Find the most likely prefix given an Address
(exercise: where do we put that logic? Is that about the entire Address or just
a smaller concept like Area?)
- Propose that as a default value.
We
need a way to trigger a call to the logic above and use that as a default value; as this is a
user interaction concern (for instance, it would make no sense to do
that when we import people from a file), the right place for the
trigger is the UI itself, or if you’re stuck in MVC, the
controller. Triggering from the UI would make for a much more responsive application (think web).
AOP-like
interception / injection would help here. Basically, we don’t want the
PhoneNumber widget and the Address widget to know each other. We need
a place that can see both and inject an aspect
there (a user interaction aspect). Don’t have aspects in your UI
layer? Told you :-), your UI technology isn't good for this stuff.
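As a rough sketch of that separation, the Python below keeps the two widgets unaware of each other and puts the wiring in a third place that can see both. A plain change listener stands in for AOP-like interception here, which is a simplification; the widget classes, the `wire_default_prefix` function and the city-to-prefix table are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    city: str


# Hypothetical lookup data, for illustration only.
AREA_PREFIX_BY_CITY = {"Springfield": "0123", "Shelbyville": "0456"}


def most_likely_prefix(address: Address) -> Optional[str]:
    """Responsibility 1: find the most likely prefix given an Address."""
    return AREA_PREFIX_BY_CITY.get(address.city)


class PhoneNumberWidget:
    def __init__(self):
        self.text = ""

    def propose_default(self, prefix):
        # Responsibility 2: propose a default, never overwrite existing input.
        if not self.text:
            self.text = prefix


class AddressWidget:
    def __init__(self):
        self._listeners = []

    def on_changed(self, listener):
        self._listeners.append(listener)

    def set_address(self, address):
        for listener in self._listeners:
            listener(address)


def wire_default_prefix(address_widget, phone_widget):
    """The place that can see both widgets; neither widget knows the other."""
    def suggest(address):
        prefix = most_likely_prefix(address)
        if prefix is not None:
            phone_widget.propose_default(prefix)

    address_widget.on_changed(suggest)
```

An import-from-file path would simply never call `wire_default_prefix`, which matches the idea that this is a user interaction concern and not a business rule.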
Business Logic
In
the usual BigPerson approach, it's obvious that we can put this logic
inside the Person, because guess what, all the data you need is
there, just add methods :-). Alternatively, some would put this
inside a Controller.
With
this design, there is no natural
place for cross-concept logic. This
is a good thing.
That void is telling you something (see also the quote at the end).
Without
making a big fuss out of it:
- at the UI level, we should be able to simply put a Welcome widget on
the home page.
- the Welcome widget should be populated using a Welcome API / service.
- the Welcome class, given the person identity, should gather the
necessary objects and create the right message. Something
like:
Statement welcome = PersonRepository.PrepareIdentity( id );
PersonalNameRepository.PrepareGet( welcome );
DateOfBirthRepository.PrepareGet( welcome );
dynamic res = welcome.ExecSingle();
PersonalName pn = res.PersonalName;
DateOfBirth dob = res.DateOfBirth;
if( dob.IsBirthday() ) // a stable responsibility :-)
{
    // say happy birthday + name
}
else
{
    // say hello + name
}
(ouch,
an if;
call the anti-if police :-).
Yes,
the static + dynamic style sucks a bit. If you've got time to spare and
wanna try the Tuple thing, maybe you can make it largely type-safe in
many practical cases.
But it's not object oriented!
No,
what you mean is that it is not coarse-grained-domain-oriented, at
least not the way you want to think about your domain, that is, based
on big, fat, unstable classes with a lot of logic, like Person.
Nobody said it should be like that (well, somebody did; it's just
not universally right).
I actually have a lot of objects flowing around. When you decompose to small
enough pieces, you'll
find stable abstractions, like DateOfBirth. They have clear
responsibilities, like Age or IsBirthday. Actually, I am allowing,
or even forcing, small abstractions like Area to emerge, instead of
disappearing into the faceless blob we usually call Person. I also have a
Welcome object. A Welcome widget too. I have many more (cooperating)
objects than the BigPerson style tends to create.
What about the UI?
Well,
the UI for some use cases is pretty stable (see the Welcome widget).
In other cases, of course, it's not. The paramount case is when you
want to edit or show "the Person" itself.
What
we need to do, of course :-), is to dynamically discover the Person
satellites, then dynamically build a UI using satellite-specific
widgets (so, a PersonalName widget, a DateOfBirth widget, etc). That
discovery should also tell us about role-based satellites like
Address, so that we can provide a way to add / display multiple
widgets of that kind (for every role).
Note
that we cannot use regular reflection for this, as there is no Person
class with all that stuff inside. Sure, the right class to ask would
be Person. But that requires that Person implements its own
reflective behavior, and that satellite concepts register
with Person (so that Person can expose satellites, without knowing
satellites; that's how we deal with instability here). This might easily move to a base class.
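A minimal Python sketch of that registration idea follows. The `register_satellite` and `satellites` methods, the `role_based` flag and the `build_form` helper are hypothetical names; widgets are represented by plain strings so the example stays self-contained.

```python
class Person:
    """Identity plus home-made reflection; knows no satellite by name."""

    _satellites = {}  # concept name -> descriptor, filled in by the satellites

    @classmethod
    def register_satellite(cls, name, satellite_type, role_based=False):
        cls._satellites[name] = {"type": satellite_type, "role_based": role_based}

    @classmethod
    def satellites(cls):
        # Return a copy so callers cannot alter the registry by accident.
        return dict(cls._satellites)


class DateOfBirth:
    pass


class Address:
    pass


# Each satellite concept registers itself; Person's code never changes.
Person.register_satellite("DateOfBirth", DateOfBirth)
Person.register_satellite("Address", Address, role_based=True)


def build_form(widget_for):
    """Compose a UI description from whatever satellites are registered."""
    form = []
    for name, info in Person.satellites().items():
        widget = widget_for[info["type"]]
        multiplicity = "one per role" if info["role_based"] else "single"
        form.append((name, widget, multiplicity))
    return form


form = build_form({DateOfBirth: "DateOfBirthWidget", Address: "AddressWidget"})
print(form)
```

Adding an Email concept would mean one more `register_satellite` call and one more entry in the widget map, with no edit to `Person` or `build_form`.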
Things
get more complicated when satellites have satellites, especially when
you want to fine-tune appearance. In some cases, a trade-off between
aesthetics and stability is necessary, and a custom widget might have
to be created instead of relying on reflective composition of
lower-level widgets. As usual, proper UI technologies might help or
hinder. A technology lacking any kind of sophisticated layout
management, for instance, is less than optimal (yet we're stuck with
HTML and CSS :-).
The Value of Emptiness (read this part at your own risk :-)
When
I first thought about this stuff, a very old story came back to my
mind. There are many translations (see here for
a few; pick the one you like most), but here is one I like:
Thirty spokes are joined together in a wheel,
but it is the center hole
that allows the wheel to function.
Person
is the hole. By providing identity and nothing more, Person is
leaving an empty space where things can't coalesce into a
gravitational center. Yet, that hole acts as a virtual
gravitational center for a constellation of finer-grained, more
stable concepts. It is by being empty that Person allows the system
to grow organically, instead of monolithically.
Don’t do it
A
promise is a promise, so I had to write this stuff, but I’m glad it's over, so I can move on to some more interesting post; way
more interesting, actually.
Please,
don’t write me in anger. It’s ok not to like this. It’s ok to
disagree. It’s probably not ok to pretend that your Person class is
stable while it is not, but it’s ok to look the other way just like
everyone else is doing, keep patching it as requirements change, and
claim you're being agile.
If,
on the other hand, that blurb above seems tempting, be smart: don’t
do it anyway.
There
is no invitation to follow me on twitter, because you can’t
possibly like this post. Actually, I’ve downvoted it myself :-).
20 comments:
Fantastic!
I am one of those longtime anonymous cowards hanging around your blog in silence. I must say that am deeply impressed by your work and that I keep looking forward to your next post. Keep it up!
Carlo, this is enlightening.
But, apart for the lacking of materials that fit this architecture, I can't see why you keep saying "don't do it"!
Cheers
Daniele
Great stuff, I can see why you said it could turn out to be rather unpopular :)
But it is somewhat inspiring to really see a different approach/point of view.
By the way, you mentioned performance issues, and I can easily see the problem, but you also said you've done things this way a few times. How difficult is it, and how many tricks(?) would it take, to make that architecture really feasible? I can see the exceptional degree of flexibility and configurability, to the point that it's simple to add a field to a concept (while I suspect you have something a bit more generic than the typed repository), and adding a sub-centre is a matter of adding a new class, even dynamically, and adjusting the configuration. On the other hand a join for each sub-centre at db level can really heart performance.
I can further see how limiting current UI technologies are for this architecture, and you mentioned HTML and CSS, but I'm sure you know WPF quite extensively. Is something along the lines of WPF a better place to explore from this standpoint?
If you walked into a new job and started getting familiar with a large system designed this way, you might start with a lot of WTFs and OMGWHY reactions. I'm sure there are intelligent life forms out there who would model things completely inside out from what most people would consider normal and acceptable. It's been a while since I actually modeled something out of space, but I imagine I'd start exactly like that: Person contains a set of attributes. And oh how boring would that be now..
Erich: thanks :-)
Daniele: I'm so happy I didn't rush writing this answer because Brian did it better than I could have.
He speaks for Most People :-), and Most People would find this to be outside the norm, unacceptable, and inside out. Most People would rather ignore concepts like forces and instability and just do what's considered normal and acceptable.
Most People are happier if you do BrianDrivenDesign than if you do BrainDrivenDesign (sorry, couldn't help it, it's the nerd inside), and will react with lots of WTF and OMGWHY if you keep doing things Most People can't understand without learning a damn lot of stuff first.
It's bad when you piss Most People off, so don't.
Fulvio: as I don't want to help you piss off Most People, I'm not going to give you any useful answer, if not by observing that if you give up the separation of concepts at the database level, you only have to change what happens at the statement level, and that you can even get creative with that (but please don't).
I never fell in love with WPF. I see it started good, then made a sequence of design decisions that brought it very far from something I care for (productivity). It's very binding-friendly, but not particularly widget-friendly, not to the extent I would like, anyway.
Brian: curious how, even after my copious "don't do it", after I said "I totally discourage you from doing things that way. It doesn't play well with the way you've been taught to structure your code [etc]", you still felt the need to tell me how crappy this idea looks : ))
On a tangent, if you cared to look, for instance, at the history of medicine, you would be surprised by how many things Most People used to consider normal and acceptable, and how completely inside out we do things today.
Like, Most People know the right way to balance "humors" inside your body is to use leeches to remove blood when you're sick, right? Transfusions and intravenous therapy are completely inside out! WTF! OMGWHY!
Some coincidence. I ended up splitting everything up like you did a few days ago but luckily only on the whiteboard and not in code.
Now that you have shown us how problematic this is... can you contrast this with how you would actually do it? As in real life, working on the clock. Please be honest. It's almost like you keep the perfect answer from us ;-) I have had some similar thoughts about my work but I struggle a bit with actually designing something implementable that I am content with.
First of all let me apologise for "heart" instead of "hurt". Spell checker in Italian on the phone really hurts your writing in English :D
Yep I can see you can give up separation of concepts at db level, but in this way you give up something you advocate. But I think this is more a technology issue and how DBMS are designed. At the end of the day, we still have to build something!
I really hope you are going to write something about UI soon, I'm very interested in the topic, I'm actually tinkering with the idea of a new C++11 GUI toolkit...
[part 1]
Johannes,
I'm not keeping anything away from you guys, except a ton of details (it would take a book to cover them; I just have a blog post). About your question, I can tell you what I am doing right now in 3 different projects I'm following, with very different scales and requirements, so that you can get some context.
Just a note, however. A significant part of the "complexity" comes from an impedance mismatch with tools and libraries. Another significant part comes from the fact that I have compressed a long explanation into a relatively short amount of text. The Statement, the cooperating repositories... they're not rocket science. They're just different, and may appear complex because you can't reuse much of what you already know deeply and can do without even looking. Of course, "innovating on the clock" is always a big challenge.
That said, here it is:
Case 1: a small Android app I'm writing in my spare time (sportablet)
Most abstractions are stable, so I choose the "traditional" approach. Abstractions are stable because they come from stable, legacy protocols, or from stable domain concepts (like that a specific recording belongs to a specific device). So far I have a dozen tables or so, because at this time it's not a database-intensive app, but you can probably guess I've still partitioned things quite a bit. I don't have dynamic objects, except for one case, which is easily handled and is not fractal in nature (inside a recording I can have temperature or not, altitude or not, heart rate or not, etc; this is an open set and I have nothing hard-coded here). This can easily be managed as a type 1 + 2 instability, which is easy.
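To make the "open set" idea a bit more tangible, here is a minimal sketch of one way a recording could carry optional measurements without hard-coding any of them. It is not the app's actual code: the names `Sample`, `Recording`, `channels` and the channel names are assumptions made up for illustration, and how this maps to the instability types discussed in the post is left aside.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One point in a recording; channels is an open set keyed by name (hypothetical)."""
    seconds: float
    channels: dict = field(default_factory=dict)


@dataclass
class Recording:
    device_id: str  # stable concept: a recording belongs to one device
    samples: list = field(default_factory=list)

    def channel_names(self):
        """Every channel that appears in at least one sample."""
        names = set()
        for sample in self.samples:
            names.update(sample.channels)
        return names

    def series(self, name):
        """(seconds, value) pairs for one channel, skipping samples without it."""
        return [(s.seconds, s.channels[name]) for s in self.samples if name in s.channels]


rec = Recording("watch-1")
rec.samples.append(Sample(0.0, {"heart_rate": 92.0}))
rec.samples.append(Sample(1.0, {"heart_rate": 95.0, "altitude": 210.5}))
```

Adding a new kind of measurement here means storing a new key; neither `Sample` nor `Recording` has to change.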
Looking into the near future, I can easily see it becoming more database-oriented, and I can easily foresee some abstractions to be unstable. For instance, I'm not tracking biometrics yet (height; weight; lean mass; VO2Max; etc), but they certainly form an unstable set. These will NOT become explicit fields of a Person class, but can be largely brought under control using inheritance, which is what I'll probably do (so mapping this instability to type 1 + 2 again). Gear tracking will present the same kind of instability, and I'm planning to reduce it to type 1 + 2 as well.
Bottom line: there was no need for it, so I didn't do it. I had a few relevant instabilities that I treated as such, but didn't need a "sophisticated" structure because I managed to bring everything under type 1 + 2.
[part 2]
Case 2: a large corporate banking application with a legacy database
The team is not prepared for this approach, so we started the "traditional" way, considering we also have a legacy db. However, other forces are pushing us toward a similar solution for a few selected cases. The issue here is not instability, but a nested object model with leaves shared between different root aggregates, and a desire not to replicate logic in different repositories. So for a few selected cases we'll use a subset of this idea. At the very minimum, some repositories will deserialize their subset of fields from a record coming from the outside. We'll probably take the easy road and put all the coordination logic into the root aggregate repositories, which will probably build the SQL statement on their own, without cooperation from the sub-repositories. This will hold at least for the first sprints, so that the team can get acquainted with the idea and its consequences. Then we'll see.
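The "easy road" described above could look roughly like the sketch below: a leaf repository that only knows how to read its own fields from a record handed to it, and a root aggregate repository that builds the SQL and does all the coordination. The table, the `Customer` and `Address` classes, and the repository names are all hypothetical; the only thing the sub-repository contributes here is its column list, which is an assumption of this sketch.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    street: str
    city: str


class AddressRepository:
    """Leaf repository: knows only its own columns and never queries the db itself."""
    columns = ("street", "city")

    def from_row(self, row):
        return Address(row["street"], row["city"])


@dataclass
class Customer:
    customer_id: int
    name: str
    address: Address


class CustomerRepository:
    """Root aggregate repository: owns the SQL statement and the coordination logic."""

    def __init__(self, conn, addresses):
        self.conn = conn
        self.addresses = addresses

    def find(self, customer_id) -> Optional[Customer]:
        cols = ", ".join(("id", "name") + self.addresses.columns)
        row = self.conn.execute(
            f"SELECT {cols} FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        # The leaf repository deserializes its subset of fields from the shared record.
        return Customer(row["id"], row["name"], self.addresses.from_row(row))


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets repositories read columns by name
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, street TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ada', '1 Main St', 'Turin')")
repo = CustomerRepository(conn, AddressRepository())
customer = repo.find(1)
```

Another root aggregate that also embeds an address could reuse `AddressRepository.from_row` unchanged, which is where the "no replicated logic" benefit would come from.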
Case 3: a brand-new, cloud-based, multi-tenant PLM as a service
Can't tell you much about this, but here things are much more complicated, because we want end users to easily customize entities by adding fields (and getting some behavior for free).
I'm doing something like this on steroids, because I'm facing an even harder problem (instability is in the hands of the end user). I also have serious performance issues here, and I cannot accept the performance hits of multiple joins, so I have a field mapping thing to flatten fields into a single table. Note that in this case the "traditional" approach would simply fail. It's no longer a matter of maintenance, but more of "can / can't do", can or cannot handle those forces.
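No details are given about the field mapping, so the following is only a guess at the general technique: a per-tenant map from user-defined field names to spare physical columns of a single wide table, so that reading an entity needs no join. `FieldMap`, the column names `c1`, `c2`, `c3` and the field names are all hypothetical.

```python
class FieldMap:
    """Per-tenant mapping from user-defined field names to spare physical columns."""

    def __init__(self, spare_columns):
        self._free = list(spare_columns)
        self._by_name = {}

    def add_field(self, name):
        """Assign the next spare column to a new field; idempotent for known fields."""
        if name in self._by_name:
            return self._by_name[name]
        if not self._free:
            raise RuntimeError("no spare column left for " + name)
        column = self._free.pop(0)
        self._by_name[name] = column
        return column

    def to_row(self, values):
        """Translate {field name: value} into {physical column: value}."""
        return {self._by_name[name]: value for name, value in values.items()}

    def from_row(self, row):
        """Translate a physical row back; columns without a mapping are ignored."""
        return {name: row[col] for name, col in self._by_name.items() if col in row}


fields = FieldMap(["c1", "c2", "c3"])
fields.add_field("supplier")
fields.add_field("weight_kg")
row = fields.to_row({"supplier": "ACME", "weight_kg": 2.5})
```

The price of this kind of flattening is that the physical schema no longer says anything about the domain; all the meaning moves into the mapping.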
Quoting part of the post: the right shape is totally dependent on the problem (forces), so there is no “right shape” per se. That's the downside of doing brain-driven design :-))
Have you ever heard of Entity-Component systems? They're widely used in game engines, and they're mostly the same idea: an entity (representing almost anything in the game) is nothing but an ID by itself. The entity might be externally associated to different components (for example TransformComponent to provide its position in the world, PhysicsComponent to give it physical behavior, GraphicsComponent to specify its graphical appearance, AnimationComponent to make it move if it's a character for example, WeaponComponent if it's a weapon to specify how it shoots or whatever, ...).
It goes even further, the behavior of a component is not in the component class itself, but rather in a set of Systems which will process all entities that have a given set of components (for example the PhysicsSystem will need a TransformComponent and a PhysicsComponent of the same entity to process that entity, and a MovementSystem might need a TransformComponent, a PhysicsComponent, an AnimationComponent and an InputComponent for example).
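For readers who have never seen one, a toy version of the entity/component/system split described above might look like this. It is a deliberately naive sketch, not any particular engine's API: `World`, `query` and the two component classes are made up, and plain dictionaries are used instead of the contiguous arrays a real engine would rely on.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Transform:
    x: float = 0.0
    y: float = 0.0


@dataclass
class Physics:
    vx: float = 0.0
    vy: float = 0.0


class World:
    def __init__(self):
        self._ids = count(1)
        self._stores = {}  # component type -> {entity id: component}

    def create_entity(self, *components):
        entity = next(self._ids)  # the entity is nothing but this ID
        for component in components:
            self._stores.setdefault(type(component), {})[entity] = component
        return entity

    def query(self, *types):
        """Yield (entity, components...) for entities that have all the given types."""
        stores = [self._stores.get(t, {}) for t in types]
        if not stores:
            return
        for entity in stores[0]:
            if all(entity in s for s in stores[1:]):
                yield (entity, *(s[entity] for s in stores))


def physics_system(world, dt):
    """Behavior lives in the system, not in the components."""
    for _, transform, physics in world.query(Transform, Physics):
        transform.x += physics.vx * dt
        transform.y += physics.vy * dt


world = World()
ball_pos = Transform(0.0, 0.0)
ball = world.create_entity(ball_pos, Physics(2.0, 1.0))
scenery_pos = Transform(5.0, 5.0)
scenery = world.create_entity(scenery_pos)  # no Physics: the system skips it
physics_system(world, 0.5)
```

After the call, only the ball has moved; the scenery entity is untouched because it lacks a `Physics` component.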
The idea is the same, prevent changes in one area of the game from rippling into everything else. If there's a bug in the physics of a character, there's no reason it should affect other parts of its representation.
As an additional bonus, this kind of representation is generally faster for the game to process once all entities and components have been loaded, because of the way Systems will traverse the components. Since components of the same type are generally in contiguous memory, traversing them is cache-friendly, whereas traversing a list of fat entities would involve lots of cache misses. So yes, the data representation is a bit ugly and slow, but once the data is in memory it's pretty efficient.
If you want to see what others have written on entity-component systems, this is a good place to start.
So it would seem that in at least one form, your "Don't do it" has been done :-)
@Jean, Ha! On the "Entity-Systems" page you linked to I seem to recall having a big debate with the author, who thinks that such a system is not object oriented programming, because (I think) it doesn't have one instance per in-game entity. I thought that it was still perfectly normal object-oriented programming, just with a different choice of classes and objects, to better match the required behaviour. (page 2 of his ES posts.)
Jean-Sébastien: thanks for the link, I've been through the various posts and found them quite interesting.
There are certainly similarities between what I propose and the Entity-Component architecture as discussed there. There are also some notable differences, both because the context is different and because I didn't feel the need to go through the usual OOP-bashing :-).
Actually, the "systems" part won't translate properly to business applications. Well, honestly, I don't like that part much in a game engine either. It might be ok for relatively stable "systems" like physics, but if you have for instance an AI "system", there is a risk that it would be just another unstable monolith, whereby if you introduce a new character with its own peculiarities, you have to modify the "system" (unless, of course, you adopt strategies here to make polymorphic extension possible).
In this sense, I tend to agree with Tartle that what is presented there is an architecture, not a paradigm. That architecture could actually benefit a lot from OO thinking. Unfortunately there are a lot of OO misconceptions in those posts.
Still, yes, there are definitely some similarities, although the "tough" part (cooperating repositories) is also absent from the E-C architecture as presented.
Thanks again for bringing up this parallel!
There's actually something similar to your idea in Drupal. There's a module (Content Construction Kit, CCK) that allows site admins to define new content and dynamically define how they are composed.
At the db level, each field is mapped to a table that has a predefined structure depending on the type of the field (but I think it's somewhat extensible, I'm not an expert whatsoever though).
They have a single "node" class to provide identity (everything that's a node is a piece of content), but it has a bunch of other properties (like title, creation timestamp, etc.)
The difference with your approach is that they do not have the statement and repository, but just hooks called when a node is loaded, added, deleted, etc., which allow other modules to attach more structure to the base node (dynamically building an object).
Hi Carlo, yes I was talking about general similarity not necessarily all the details.
The way games generally work with systems that require wildly different behavior for different types of objects (like an AI system for example) is through scripting or a graph-based behavior editor or something similar, essentially making the differing parts data instead of code. It makes it harder to debug, but prevents the code of such a system from becoming a large mess as you pointed out.
But since we're talking about code here, I'll close just by saying that the general concept is similar, but of course the final design and implementation will differ in different problem spaces. But it's useful to know what different industries are doing to solve similar problems to see what you can apply to your own problem space. In that respect, thanks for your very insightful post which has many items I will be able to apply to my own situations even though I work in a different area of programming. (even though I shouldn't do it :-) )
Fulvio: I have no idea about Drupal :-), but yes, I can see some similarity in what you are describing here.
Generally speaking, a presentation-oriented, plugin-based system needs mostly to handle "type 3 instability".
The major difference with type 4 is that in type 3 you can easily present the user with dynamic forms and then store data, but there is no server-side behavior except CRUD, and definitely no cross-concept behavior.
That's significantly easier and can be directly mapped to a plugin model, perhaps with some client-side behavior. To move beyond that, things get a bit more complicated (like with the approach I discussed).
Still, even the simplified model can benefit from the idea of pure identity + concepts...
Great post!
I don't know if you are familiar with Domain-Driven Design, but there is a concept of Value Object (that represents value - no id), and Entity (that has the actual id and may contain Value Objects). I can see a lot of similarities.
Piotr: sure, I didn't mention DDD because as you said there are similarities, but also a few notable differences, and I didn't want to add just another source of controversy :-). The post is old news by now, so I guess I can add a little comparison with the DDD approach here.
- As you observed, the satellite concepts are very similar to value objects. Here, however, they get their own independent storage. They do not get their own identity, so to speak, because they adopt the identity of the dominating center (the former entity).
- Unstable entities are disbanded, not only from the storage perspective, but also from the domain perspective, so they do not contain value objects anymore. Here is where I see the largest distance with the DDD approach, where unstable entities still materialize as (unstable) classes.
- In most cases, you also have to let go of the concept of Aggregate, because when your domain is unstable, Aggregates will be unstable as well. In part, the role of the aggregate will be covered by the cooperating repositories here.
- Interestingly, you also gain something from all those changes. Some people have noticed that entities may grow unnecessarily large in naïve DDD, and have suggested modeling different aspects of the same entity using different classes in different bounded contexts. It makes sense, of course, except that you end up with some redundancy between those aspects, because in practice there is never a perfect orthogonality (for instance, the PersonalName part is probably present in all the bounded contexts dealing with a Person). Once you disband the entity and rely on fine-grained concepts and cooperating repositories to build the subset you need, you’ll see that the redundancy tends to disappear, as you reuse the same repositories and the same fine-grained concepts in different bounded contexts (of course, some concepts will be used only in one BC, which is the point of having that BC in the first place). This can be obtained in “traditional” DDD with a heavier reliance on value objects, but it’s sort of built-in in this approach.
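The points above can be illustrated with a very small sketch, under heavy assumptions: in-memory dictionaries stand in for real storage, the cooperation on a shared statement described in the post is left out, and `ConceptRepository`, `BillingAddress`, `billing_view` and `badge_view` are names invented here. What it does show is a satellite concept borrowing the identity of the center, getting its own storage, and being reused by two bounded contexts without a Person class anywhere.

```python
from dataclasses import dataclass
from uuid import uuid4


@dataclass(frozen=True)
class PersonalName:
    first: str
    last: str


@dataclass(frozen=True)
class BillingAddress:
    city: str


class ConceptRepository:
    """Independent storage for one concept, keyed by the identity of the center."""

    def __init__(self):
        self._rows = {}

    def save(self, identity, concept):
        self._rows[identity] = concept

    def find(self, identity):
        return self._rows.get(identity)


names = ConceptRepository()
addresses = ConceptRepository()


def billing_view(identity):
    """Hypothetical billing context: needs the name and the billing address."""
    return {"name": names.find(identity), "address": addresses.find(identity)}


def badge_view(identity):
    """Hypothetical access-control context: needs the name only."""
    return {"name": names.find(identity)}


person = uuid4()  # pure identity: no fields, no Person class
names.save(person, PersonalName("Ada", "Lovelace"))
addresses.save(person, BillingAddress("London"))
```

Both views read `PersonalName` from the same repository, so there is no duplicated name aspect to keep in sync between the two contexts.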
Overall, the differences are subtle enough to require some rather careful thinking before jumping from the DDD approach to something like this (not that I’m suggesting you do :-). The core message of DDD (mapping domain concepts to classes), of course, is still valid. However, the granularity tends to be finer, and the final architecture tends to be different.