Misapplied abstractions and unnecessary complexity
2015-12-16

Let me start off with a few quotes. Here are several from Dijkstra:

“How do we convince people that in programming simplicity and clarity — in short: what mathematicians call “elegance” — are not a dispensable luxury, but a crucial matter that decides between success and failure?”

“It may sound paradoxical, but a reliable (and therefore simple) program is much cheaper to develop and use than a (complicated and therefore) unreliable one.”

“Much of what has been done in programming (and in computing science in general) can be simplified drastically”.

And here is one more from C.A.R. Hoare:

“…the threshold for my tolerance of complexity is much lower than it used to be”.

My tolerance for complexity is similarly decreasing as I gain experience writing software, and I keep discovering unnecessary complexity in the way software systems are built.

There is a lot to discuss about the sources of complexity, whether it’s accidental or essential, and whether abstractions ultimately increase or reduce complexity.

In this post, I’d like to make just one point: it seems to me that one reason for avoidable complexity is that programmers have a tendency to get carried away with abstractions and endless layers of software.

Let me point out a few examples I have seen.

ORMs

There’s a huge variety of ORMs for many languages. They all try to hide the SQL, but they always fail, and worse, they often create performance problems (e.g. the notorious N+1 queries problem in ActiveRecord).

While the end result of a query may be a set of objects, SQL simply doesn’t map cleanly onto an object-oriented query interface.
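The N+1 query problem mentioned above is easy to show without any ORM at all. The sketch below uses Python's built-in `sqlite3` module. The `authors`/`posts` schema, the `CountingDb` wrapper and the function names are all hypothetical.

The first function imitates what a naive ORM loop does under the hood. It runs one query for the list of authors, then one more query per author for the association. The second function gets the same data with a single JOIN written by hand.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema: each author has many posts.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (id INTEGER PRIMARY KEY,
                            author_id INTEGER NOT NULL REFERENCES authors(id),
                            title TEXT NOT NULL);
        INSERT INTO authors (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO posts (author_id, title) VALUES
            (1, 'First'), (1, 'Second'), (2, 'Hello'), (3, 'Notes');
    """)
    return conn


class CountingDb:
    """Wraps a connection and counts how many queries are sent to it."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.queries = 0

    def query(self, sql: str, params: tuple = ()) -> list:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def titles_n_plus_one(db: CountingDb) -> dict:
    # Roughly what a loop like "for each author, read author.posts" does in a
    # naive ORM: 1 query for the authors, then 1 query per author.
    result = {}
    for author_id, name in db.query("SELECT id, name FROM authors ORDER BY id"):
        rows = db.query(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in rows]
    return result


def titles_single_join(db: CountingDb) -> dict:
    # The same data in one query, written directly in SQL.
    result = {}
    rows = db.query("""
        SELECT a.name, p.title
        FROM authors a LEFT JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """)
    for name, title in rows:
        titles = result.setdefault(name, [])
        if title is not None:  # LEFT JOIN yields NULL for authors with no posts
            titles.append(title)
    return result


if __name__ == "__main__":
    slow, fast = CountingDb(make_db()), CountingDb(make_db())
    assert titles_n_plus_one(slow) == titles_single_join(fast)
    print(slow.queries, fast.queries)  # prints: 4 1
```

With three authors, the loop issues 4 queries and the JOIN issues 1. The gap grows with every row.

ORMs do provide eager loading to avoid this (ActiveRecord's `includes`, for example). The catch is that you have to know the problem exists before you reach for it.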

Further, why abstract the database at all? A common argument is that you’ll be able to switch to a different database easily, but this is a fallacy. Either you are left with lowest-common-denominator functionality, or you end up with database-specific bits of SQL stuffed into your object-oriented query builder anyway.

And how often do you need to replace the database? Even if it becomes necessary, the effort is most likely going to be less than the time you spent struggling with ORM limitations over the preceding years.

As a result of this line of reasoning, I don’t use ORMs in my projects anymore.

Complicated JavaScript build systems like Grunt or Gulp

The problem with these systems is that you need to learn a large number of configuration directives. You also need all sorts of plugins to integrate other tools (Babel, webpack, JSLint, etc.).

All of these plugins consist of non-trivial amounts of code, which:

(a) will inevitably have bugs,
(b) is likely to constrain how you use the underlying tool, and
(c) has to be updated whenever that tool changes.

In many cases, simple shell scripts, perhaps combined with npm scripts, are all you need. You get to use the tools directly and avoid all that complexity.

Wrapping everything in an Angular directive

When I worked on an Angular.js project, someone suggested that I create a directive for a chart that I set up exactly once in JavaScript. I thought it would add unnecessary complexity.

As with build system plugins, writing directives for everything under the sun adds the overhead of mapping custom tags to JavaScript, but to what end? The same thing can often be written more concisely in plain JavaScript.

Directives only make sense when they add meaningful semantics to your markup and are used in several places (not just once).

Most aspects of Angular.js

Actually, my experience with Angular.js led me to conclude that most of it consists of bad abstractions, from its horribly inefficient data binding system to its convoluted constructs (scopes? services? controllers?).

I guess that could be expected of a framework that started as a hack for page mockups, but its ongoing popularity supports my point that programmers easily get carried away with bad abstractions.

BDD with Cucumber-like test specs

This technique requires writing “human-readable” specs of this sort:

Scenario: Jeff returns a faulty microwave
    Given Jeff has bought a microwave for $100
    And he has a receipt
    When he returns the microwave
    Then Jeff should be refunded $100

You then have to write step definitions that use regexes (possibly quite complex ones) to extract the bits of data from these specs.
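To show what those step definitions involve, here is a toy matcher in Python that runs the microwave scenario above. It is a hypothetical imitation for illustration, not the API of Cucumber or any other real tool.

Each step pairs a regex with a handler function that receives the captured groups. The handlers pass data to each other through a shared `state` dictionary.

```python
import re

# Hypothetical step registry: a list of (compiled regex, handler) pairs.
STEPS = []


def step(pattern: str):
    """Decorator that registers a handler for spec lines matching `pattern`."""
    def register(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register


# Steps cannot return values to each other, so they share mutable state.
state: dict = {}


@step(r"^(?:Given|And) (\w+) has bought a (\w+) for \$(\d+)$")
def bought(customer: str, item: str, price: str) -> None:
    state.update(customer=customer, item=item, paid=int(price), refund=0)


@step(r"^(?:Given|And) he has a receipt$")
def has_receipt() -> None:
    state["receipt"] = True


@step(r"^When he returns the (\w+)$")
def returns(item: str) -> None:
    if state.get("receipt") and item == state["item"]:
        state["refund"] = state["paid"]


@step(r"^Then (\w+) should be refunded \$(\d+)$")
def refunded(customer: str, amount: str) -> None:
    assert customer == state["customer"] and state["refund"] == int(amount)


def run(scenario: str) -> None:
    # Skip the "Scenario:" title line; every other line must match some step.
    for line in scenario.strip().splitlines()[1:]:
        text = line.strip()
        for pattern, fn in STEPS:
            match = pattern.match(text)
            if match:
                fn(*match.groups())  # captured groups arrive as strings
                break
        else:
            raise LookupError(f"no step definition matches: {text!r}")


run("""
Scenario: Jeff returns a faulty microwave
    Given Jeff has bought a microwave for $100
    And he has a receipt
    When he returns the microwave
    Then Jeff should be refunded $100
""")
```

That is four regexes and a shared mutable dictionary to run four lines of English. The wording of the spec and the patterns must be kept in sync by hand: rephrase a line, and it no longer matches any step and fails at run time.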

The stated benefit is that such specs can be read by non-technical people and serve as useful documentation. However, I haven’t seen non-technical people actually read them in practice. In fact, you just end up writing extremely verbose tests with a rickety collection of regexes strapped on top: all overhead and no value.

In my view, it’s much better to use a test library where you can just write tests as code.
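For comparison, here is a minimal sketch of the same microwave scenario written directly as test code. `Purchase` and `refund_for_return` are hypothetical stand-ins for the code under test.

The test functions follow pytest's conventions: pytest collects `test_*` functions and reports plain `assert` failures. The file also runs as an ordinary script.

```python
from dataclasses import dataclass


# Hypothetical code under test.
@dataclass
class Purchase:
    customer: str
    item: str
    price: int  # in dollars
    has_receipt: bool = False


def refund_for_return(purchase: Purchase, returned_item: str) -> int:
    """Refund the full price if the returned item matches and there is a receipt."""
    if purchase.has_receipt and returned_item == purchase.item:
        return purchase.price
    return 0


# The tests: the scenario's intent lives in the function names and values.
def test_jeff_returns_a_faulty_microwave():
    purchase = Purchase("Jeff", "microwave", 100, has_receipt=True)
    assert refund_for_return(purchase, "microwave") == 100


def test_no_refund_without_receipt():
    purchase = Purchase("Jeff", "microwave", 100)
    assert refund_for_return(purchase, "microwave") == 0


if __name__ == "__main__":
    test_jeff_returns_a_faulty_microwave()
    test_no_refund_without_receipt()
    print("ok")
```

The test names carry the same intent as the scenario's title, and the data are ordinary values. There are no regexes to keep in sync with the prose.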

Wrapping everything in a Ruby gem

Back when I was using Ruby on Rails, it was fashionable to wrap JavaScript libraries such as jQuery or Google Maps in Ruby gems. The gem would end up implementing a subset of the JavaScript functionality, and your project would end up with a dependency which could be buggy, incomplete or out of date relative to the underlying JavaScript library.

The same thing happened with gems that emulated bits of SQL not supported by default in migrations, or that added support for database extensions (such as PostGIS for PostgreSQL). The same problems resulted.

My solution was to use JavaScript libraries directly, and to simply write migrations as SQL.
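Writing migrations as SQL needs very little machinery. Here is a minimal sketch, assuming migrations are plain SQL strings keyed by version number. In practice they would more likely live in files, e.g. a hypothetical `002_add_coordinates.sql`. The `schema_migrations` table that records applied versions is also an assumption of this sketch.

Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without dependencies. With PostgreSQL, enabling PostGIS would just be one more migration containing `CREATE EXTENSION postgis;`.

```python
import sqlite3

# Hypothetical migrations: plain SQL, applied in version order.
MIGRATIONS = {
    1: """
        CREATE TABLE places (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
    """,
    2: """
        ALTER TABLE places ADD COLUMN lat REAL;
        ALTER TABLE places ADD COLUMN lon REAL;
        CREATE INDEX places_name_idx ON places (name);
    """,
}


def migrate(conn: sqlite3.Connection) -> list[int]:
    """Apply any migrations not yet recorded; return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    done = {v for (v,) in conn.execute("SELECT version FROM schema_migrations")}
    applied = []
    for version in sorted(MIGRATIONS):
        if version in done:
            continue
        # Not atomic: the script and its bookkeeping row are committed separately.
        conn.executescript(MIGRATIONS[version])
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(migrate(conn))  # prints: [1, 2]
    print(migrate(conn))  # prints: [] (already up to date)
```

Each migration is exactly the SQL the database will run, so database-specific features need no wrapper.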

REST

REST as it is practised is, essentially, an attempt to abstract the notion of an API by saying that everything is represented as a resource with a fixed set of operations which can be applied to it.

You end up trying to represent all of your data as a set of objects, and mapping all the possible operations on these objects to a handful of verbs (Create, Retrieve, Update, Delete - CRUD).

In theory, this makes APIs uniform, easy to understand and discoverable, but in practice, not everything is a CRUD resource! It results in awkward mappings, and the amount of bikeshedding it produces over the choice of verbs and URLs is unbelievable (see, e.g., the discussions on how to implement versioning in a REST API).

Data hiding and constness

In languages like C++, there is an obsession with limiting access to data and methods via public/private/protected and const. Many other languages have similar mechanisms. For example, JavaScript ES6 now also has let and const to distinguish reassignable from non-reassignable variables, although an object bound with const can still be mutated.

Doing this properly and consistently takes quite a bit of effort. For example, const in C++ is “infectious”: once something is const, every function or method that receives it by reference or pointer must be const-qualified too, so const spreads through the interfaces.

It also causes problems, for example when you want to use a class or object in a way its author didn’t anticipate. Should that really be forbidden? I’ve been stymied by things being private or const quite a few times.

I see value in controlling the mutability of data and providing clear, modularised interfaces, but in my experience, the mechanisms above are too cumbersome. Rather than marking up the type of access to every little bit of code, I just lean towards immutable data and simple, clear interfaces.
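Python has no `const` or `private`, which makes it a convenient place to sketch the alternative described here: values that cannot change after construction, with plain functions as the interface. This is a minimal sketch using a frozen dataclass. The `Order` type and the `add_item` function are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Order:
    """Hypothetical example value: immutable once created."""
    customer: str
    items: tuple[str, ...] = ()  # a tuple, so the contents can't be mutated in place


def add_item(order: Order, item: str) -> Order:
    # Returns a new Order; the original is left untouched.
    return replace(order, items=order.items + (item,))


if __name__ == "__main__":
    empty = Order("Jeff")
    one = add_item(empty, "microwave")
    print(empty.items, one.items)  # prints: () ('microwave',)
    try:
        one.customer = "Bob"  # assignment to a frozen field raises
    except FrozenInstanceError:
        print("cannot modify a frozen Order")
```

Note that `frozen=True` is shallow. It blocks attribute assignment, but a mutable field such as a list could still be changed in place, which is why `items` is a tuple.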

Conclusion

I think this quote makes an excellent point:

“Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it.” (attributed to Alan Perlis)

Badly chosen abstractions don’t help to reduce complexity.

Because software consists entirely of decisions, there is so much complexity in it, accidental or otherwise, that adding any more through misapplied or incorrect abstractions is a bad idea.

Having realised this, I’ve become more cautious about choosing frameworks, tools and languages. I consider the complexity they add along with the productivity benefits they promise. I look for more fundamental abstractions. I strive for simplicity and minimalism in the systems I build.