When we build large systems, abstraction is our best friend. We can't constantly worry about all the tiny details of each and every component, class, library, API, etc. Still, abstraction is not a substitute for understanding. We can safely abstract away what we know, not what we don't know.
Real-world example: I know how to do asynchronous I/O using completion ports in Windows. I know how it works (right down into the kernel, although this depth is not really necessary) and when to use it. I've also built a small framework of classes to abstract away some details and crystallize some sensible design decisions. It is easier to build systems over the abstract view of my miniframework; however, I can always look under the hood if needed, because I'm abstracting away something that I know.
We can also try to abstract away things we don't really know. We routinely do that with communication protocols: we don't need to know the gory details of TCP/IP to open a socket and send/receive a little data. Well, maybe we do need to know :-) if we want to squeeze good performance out of it. Abstracting the unknown is the primary purpose of many libraries, components and technologies.
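To make the socket case concrete, here is a minimal sketch in Python (my choice of language for illustration; the helper names `echo_once` and `round_trip` and the loopback setup are assumptions, not taken from any particular system). A few calls move bytes over TCP without any knowledge of the protocol, yet one protocol detail still shows through: the `TCP_NODELAY` option, which disables Nagle's algorithm. Whether setting it helps depends on the traffic pattern, and that is exactly the kind of thing the abstraction cannot decide for us.

```python
import socket
import threading


def echo_once(server: socket.socket) -> None:
    """Accept a single connection and echo everything back until EOF."""
    conn, _ = server.accept()
    with conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            conn.sendall(chunk)


def round_trip(payload: bytes, no_delay: bool = False) -> bytes:
    """Send a small payload to a local echo server and return the reply.

    Hypothetical helper for illustration; intended for small payloads only.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5.0)           # do not wait forever if no client shows up
    server.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    server.listen(1)
    worker = threading.Thread(target=echo_once, args=(server,))
    worker.start()
    try:
        with socket.create_connection(server.getsockname()) as client:
            if no_delay:
                # The detail leaking through the abstraction: turn off
                # Nagle's algorithm, which otherwise may coalesce small writes.
                client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)  # tell the server we are done sending
            received = bytearray()
            while True:
                chunk = client.recv(4096)
                if not chunk:
                    break
                received.extend(chunk)
    finally:
        worker.join()
        server.close()
    return bytes(received)
```

Both `round_trip(b"hello")` and `round_trip(b"hello", no_delay=True)` return the same bytes; the difference, if any, is in latency and packet count, which we can only reason about by looking under the hood.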
Whenever we use abstraction to shield ourselves from missing knowledge, however, we are taking a chance. Two things can go wrong with abstracting the unknown:
- the abstraction is not working the way it should. In this case, people often resort to exploratory programming, trying out things (parameters, flags, execution order) and looking for a combination that works. This is just a way to avoid looking under the hood, and we end up with a system that seems to work, without knowing why.
- much more dangerous: we build our system on fragile foundations, because we are misinterpreting the abstraction.
Real-world example again (this is actually what inspired me to write this post). I was reading an article on the Caching Framework Application Block in .NET. The author says that you can host the CacheService in-process, out of process, or on another (shared) computer, which is right. He then goes on to say that you can choose your storage to be Singleton, Memory Mapped File, or Sql Server. He says that Singleton is a hash table, therefore not thread-safe, therefore to be used only in process. That Memory Mapped File is thread-safe. That Sql Server is persistent and can handle large data volumes. The reasoning behind those three claims is basically wrong. It seems he didn't know what lies under the abstractions, and ended up with the wrong reasons to choose the storage.
Singleton is a hash table, and a hash table can be easily turned into a thread-safe container simply by using a synchronization primitive. Indeed, even in an in-process scenario, you may have multiple threads, so it would be ridiculous if the Singleton storage wasn't thread-safe. However, since the hash table is stored in the address space of one single process, it won't be easily shared among processes.
A Memory Mapped File is not thread-safe by itself. We still need to build synchronization around it to avoid conflicts. Still, an MMF can be easily shared between processes, and is the ideal candidate for an out-of-process scenario. Again, MMFs are easily shared between processes but not between computers. Opening your MMF on a shared server does not reliably work.
If you want to share your cache between multiple computers, you need a standalone process communicating over the network and keeping your stuff cached. A database is a ready-made implementation of that, which incidentally is also persistent and can handle large volumes (hardly useful for a cache anyway!).
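The point about the hash table is easy to sketch. What follows is not the Application Block's code; it is a minimal illustration in Python, with made-up names (`SingletonCache`, `get_or_add`, `hammer`), of how a plain hash table becomes a thread-safe in-process cache once a single lock guards it. The dictionary still lives in one process's address space, which is the real reason this kind of storage stays in-process.

```python
import threading


class SingletonCache:
    """In-process cache: a plain dict guarded by one lock (hypothetical name)."""

    def __init__(self) -> None:
        self._items = {}
        self._lock = threading.Lock()

    def get_or_add(self, key, factory):
        """Return the cached value, calling factory at most once per key."""
        with self._lock:
            if key not in self._items:
                self._items[key] = factory()
            return self._items[key]

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)


def hammer(cache: SingletonCache, keys, workers: int = 8) -> int:
    """Request every key from several threads; return how often factories ran."""
    calls = []

    def factory_for(key):
        def factory():
            calls.append(key)  # runs while the cache lock is held
            return key * 2
        return factory

    def work():
        for key in keys:
            cache.get_or_add(key, factory_for(key))

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(calls)
```

With eight threads asking for the same 100 keys, `hammer(SingletonCache(), range(100))` runs each factory exactly once, because the check and the insert happen under the same lock. Holding the lock while the factory runs is the simplest choice, not the fastest; a slow factory blocks every other caller, and knowing that is part of knowing what sits under the abstraction.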
In this specific case, the misunderstanding wasn't really that dangerous, but I see this happening all the time: people commit to (e.g.) COM+ or J2EE not because they need those services, understand what is going on under the abstraction, and want the details abstracted away, but because they don't want to understand what is going on, and they hope the abstractions will take care of everything.
Conclusion: there is no free lunch - we need to understand the potential gain and potential risk of our decisions, including abstracting away the unknown, weigh the two, and choose responsibly.