Test code smells catalogue
2024-08-05

Recently, the topic of writing good tests came up at work, and I had the idea that it would be great to have a catalogue of code smells for test code, just like the lists we have for regular code. It so happens that most of the regular code smells don’t really apply to test code.

I looked around, thinking that surely somebody’s catalogued these somewhere. I found this list of unit test smells and another similar list, as well as a smattering of papers like this one, and that was it!

None of these covered all of the things I consider important in a good test suite, so I decided to have a go at writing my own list.

Test code smells

Coupling to implementation

Tests shouldn’t have any knowledge of the implementation details of the code under test. They should check its return values or effects, not that particular functions were called while the code executed.
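
As a minimal sketch (the order/tax code here is made up): the first test pins which internal function gets called, the second only looks at the result.

```python
from unittest.mock import patch

TAX = {"DE": 19}  # tax rate in percent

def _calculate_tax(cents, country):  # internal helper
    return cents * TAX[country] // 100

def place_order(cents, country):  # hypothetical code under test
    return cents + _calculate_tax(cents, country)

# Smell: pins the fact that _calculate_tax was called - renaming or
# inlining the helper breaks the test even though behaviour is unchanged
def test_place_order_coupled():
    with patch(__name__ + "._calculate_tax", return_value=190) as mock_tax:
        place_order(1000, "DE")
        mock_tax.assert_called_once_with(1000, "DE")

# Better: assert on the observable outcome only
def test_place_order_outcome():
    assert place_order(1000, "DE") == 1190  # 19% tax added
```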

Excessive mocking

Mocking should be viewed as a necessary evil rather than the go-to tool. Your tests should ideally be checking the outcomes of the code under test (most likely things like return values or rows written to DB). This should be done without relying on mocks whenever possible.
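
For example (a sketch - sqlite’s in-memory database stands in for a real one):

```python
import sqlite3
from unittest.mock import Mock

def save_user(conn, name):  # hypothetical code under test
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))

# Smell: the mock pins the exact SQL text, so rewriting the query
# fails the test even when the resulting rows are identical
def test_save_user_mocked():
    conn = Mock()
    save_user(conn, "Ada")
    conn.execute.assert_called_once_with(
        "INSERT INTO users (name) VALUES (?)", ("Ada",)
    )

# Better: run against a real (in-memory) database and check the rows
def test_save_user_outcome():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    save_user(conn, "Ada")
    assert conn.execute("SELECT name FROM users").fetchall() == [("Ada",)]
```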

Complex, abstracted setup code

This includes things like highly abstracted factories. When setup is hidden behind layers of indirection, it’s hard to tell what state a test actually depends on.
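
A sketch (the User/discount code is invented for illustration): to understand the first test you have to go read the factory; the second spells out the state it depends on.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    plan: str

def discount_for(user):  # hypothetical code under test
    return 0.2 if user.plan == "premium" else 0.0

def make_user(traits=(), **overrides):
    # the kind of "clever" shared factory this smell is about
    defaults = {"name": "Ada", "plan": "basic"}
    if "premium" in traits:
        defaults["plan"] = "premium"
    defaults.update(overrides)
    return User(**defaults)

# Smell: what does this user look like? Go read the factory to find out
def test_discount_via_factory():
    assert discount_for(make_user(traits=["premium"])) == 0.2

# Better: the state the assertion depends on is visible in the test
def test_discount_inline():
    assert discount_for(User(name="Ada", plan="premium")) == 0.2
```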

Flaky test

The test suite should pass or fail consistently. It shouldn’t be dependent on the execution order of tests, on whether you’re running locally or on CI, on timing issues, on external service availability etc.
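
A classic timing example (sketch): the first test’s outcome depends on the wall clock, the second pins the input.

```python
import datetime

def greeting(now):  # hypothetical code under test
    return "Good morning" if now.hour < 12 else "Good afternoon"

# Smell: passes before noon, fails after - a coin flip on CI
def test_greeting_flaky():
    assert greeting(datetime.datetime.now()) == "Good morning"

# Better: fix the input so the result is deterministic
def test_greeting_deterministic():
    nine_am = datetime.datetime(2024, 8, 5, 9, 0)
    assert greeting(nine_am) == "Good morning"
```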

Overlapping test

A test that verifies functionality covered by other tests (a typical example would be testing library code which already has its own test suite).
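
For instance, this test exercises only the standard library, which is already thoroughly tested upstream:

```python
import json

# Smell: verifies json, not our code
def test_json_roundtrip():
    assert json.loads(json.dumps({"a": 1})) == {"a": 1}
```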

Conditional test

Tests should always verify the same behaviour in the same way, so they shouldn’t have conditional logic which alters whether and what is asserted.
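
A sketch (a thin configparser wrapper stands in for the code under test):

```python
import configparser

def parse_config(path):  # hypothetical code under test
    config = configparser.ConfigParser()
    config.read(path)
    return config

# Smell: the assertion only runs sometimes - if the section is
# missing, the test silently passes without checking anything
def test_parse_conditional(tmp_path):
    path = tmp_path / "settings.ini"
    path.write_text("[db]\nhost = localhost\n")
    result = parse_config(path)
    if result.has_section("db"):
        assert result.get("db", "host") == "localhost"

# Better: one deterministic scenario, always asserted
def test_parse_reads_db_host(tmp_path):
    path = tmp_path / "settings.ini"
    path.write_text("[db]\nhost = localhost\n")
    assert parse_config(path).get("db", "host") == "localhost"
```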

Unfocused test

Tests more than one piece of functionality. Note that having multiple asserts in a test isn’t an issue in itself - eg. you could do that to check different aspects of a return value, or to check both a return value and a side effect of executing the code.
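
To illustrate (trivial functions stand in for real code):

```python
def add(a, b):
    return a + b

def slugify(title):
    return title.lower().replace(" ", "-")

# Smell: two unrelated behaviours - when it fails, which one broke?
def test_add_and_slugify():
    assert add(2, 2) == 4
    assert slugify("Hello World") == "hello-world"

# Fine: several asserts, all about one return value
def test_slugify():
    slug = slugify("Hello World")
    assert slug == "hello-world"
    assert " " not in slug
```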

Empty test

No assertions at all! Test runners don’t always warn about this, so it’s something to watch out for. Empty tests can give a false impression that a piece of functionality is being tested when it is not (eg. when reading through the test descriptions).
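
For example:

```python
# Smell: shows up as a green "passed" in the report, but since
# nothing is asserted it can never fail
def test_discount_is_applied():
    price = 100 * 0.8  # some work happens...
    # ...and no assert ever runs
```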

Inactive or commented out test

Such tests are either obsolete and unnecessary or broken (or flaky) but actually necessary. Either way, it’s best to figure out what happened and either fix them or remove them entirely.
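
If a test genuinely has to be disabled for a while, at least leave a trail (a sketch - the ticket reference is made up):

```python
import pytest

# Smell: skipped at some point, for reasons nobody remembers
@pytest.mark.skip
def test_legacy_export():
    ...

# Less bad: the skip explains itself and points at a tracking issue
@pytest.mark.skip(reason="flaky on CI, tracked in ISSUE-123")
def test_report_generation():
    ...
```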

Hourglass test

Introduces a delay in order to test something that doesn’t happen synchronously. Problematic both because the fixed delay is likely to be unreliable and because it makes the test slower.
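
A sketch of the problem and a common fix - polling with a deadline instead of sleeping for a fixed period:

```python
import threading
import time

# Smell: always waits 2 seconds for work that takes ~0.1s, and
# still fails if the machine happens to be slower than expected
def test_background_work_hourglass():
    results = []
    threading.Timer(0.1, lambda: results.append("done")).start()
    time.sleep(2)
    assert results == ["done"]

def wait_for(condition, timeout=5.0, interval=0.01):
    # poll until the condition holds or the deadline passes
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Better: returns as soon as the condition holds
def test_background_work_polling():
    results = []
    threading.Timer(0.1, lambda: results.append("done")).start()
    assert wait_for(lambda: results == ["done"])
```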

Handbrake test

A slow test acts as a handbrake on the developer’s feedback loop. We want tests to be executed often and to provide fast feedback. Test runners often provide tools to identify such tests (eg. pytest’s --durations=N flag lists the N slowest tests).

Non-isolated test

It’s bad when a test relies on something else happening first, such as global state being set in a particular way (often by an earlier test).
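
A sketch with module-level state:

```python
CACHE = {}  # module-level state shared by every test in the file

def get_price(item):  # hypothetical code under test
    return CACHE.setdefault(item, 100)

# Smell: only passes if an earlier test happened to leave a
# discounted price in the cache
def test_reads_discounted_price():
    assert get_price("book") == 80

# Better: the test arranges its own state (a fixture could also
# reset it afterwards)
def test_reads_discounted_price_isolated():
    CACHE.clear()
    CACHE["book"] = 80
    assert get_price("book") == 80
```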

Brittle test

A test that fails when refactoring the code under test (in a way that shouldn’t affect the test).
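
For example (a toy shopping cart):

```python
class ShoppingCart:
    def __init__(self):
        self._items = []  # internal representation

    def add(self, name, price):
        self._items.append((name, price))

    def total(self):
        return sum(price for _, price in self._items)

# Brittle: pins the internal list, so switching _items to a dict
# (a pure refactor) breaks the test
def test_cart_internals():
    cart = ShoppingCart()
    cart.add("book", 10)
    assert cart._items == [("book", 10)]

# Robust: survives any refactor that preserves behaviour
def test_cart_total():
    cart = ShoppingCart()
    cart.add("book", 10)
    assert cart.total() == 10
```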

Cryptic test

This kind of test makes it hard to figure out what’s being tested. It should be clear from the description and test code what exactly is under test and what is being asserted.
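
For example (a made-up shipping calculator):

```python
def shipping_cost(weight_kg):  # hypothetical code under test
    return 5.0 if weight_kg <= 2 else 5.0 + (weight_kg - 2) * 1.5

# Smell: opaque name, magic numbers, no hint of intent
def test_2():
    assert shipping_cost(4) == 8.0

# Better: the name states the rule, the comment shows the arithmetic
def test_shipping_charges_per_kg_above_two_kg():
    assert shipping_cost(4) == 8.0  # 5.0 base + 2 extra kg * 1.5
```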

Tautological test

A test that doesn’t really test anything (but pretends to). For example, it might be setting up a mock and then asserting that the result is equal to the value returned by the mock.
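
A sketch:

```python
from unittest.mock import Mock

def get_user(client, user_id):  # hypothetical code under test
    return client.fetch_user(user_id)

# Smell: the code under test just forwards the mock's canned value,
# so the assertion compares the stub with itself and can't
# meaningfully fail
def test_get_user_tautology():
    client = Mock()
    client.fetch_user.return_value = {"name": "Ada"}
    assert get_user(client, 1) == {"name": "Ada"}
```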

Banal test

A test that shouldn’t exist because it doesn’t add any value. For example, there is no point writing a test that asserts the value of a constant (and such a test would necessarily be coupled to the implementation - in this case, the value of the constant - which is another test code smell).
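
For example:

```python
MAX_RETRIES = 3

# Smell: restates the constant's value; it documents nothing and
# only fails when someone edits exactly one of the two lines
def test_max_retries():
    assert MAX_RETRIES == 3
```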

Test-aware code

This one relates to code outside the test suite; doing something different when the code is executed in the context of tests is probably not a good idea.
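
A sketch (smtp_send stands in for a real mail call); dependency injection is one way to avoid the environment check:

```python
import os

def smtp_send(recipient, body):  # stand-in for a real SMTP call
    raise RuntimeError("would send a real email")

# Smell: behaves differently when it detects it's under test, so
# the path you test is not the path you ship
def send_email(recipient, body):
    if os.environ.get("TESTING"):
        return  # silently do nothing in tests
    smtp_send(recipient, body)

# Better: inject the dependency - tests pass a fake, production
# uses the default
def send_email_injected(recipient, body, send=smtp_send):
    send(recipient, body)
```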

NOT smells

Some people consider the following to be smells but they are not (at least not always!):

  • Multiple asserts in a test. It’s perfectly fine to have multiple asserts, as long as they’re related (otherwise you might have an Unfocused Test on your hands).
  • Testing private functions/methods. It is sometimes useful to test private functions. The idea that needing to test a private function is a signal to refactor it into a separate module is misguided and likely to lead to worse software design.
  • Non-DRY test code. There is benefit in de-duplicating some test code (eg. the construction of fixtures) but in general I would advise a lot more tolerance for repetition. The key is for test code to be straightforward, easy to modify, and contain minimal logic. DRYing it up often works against these goals (see the sketch after this list).
  • <100% code coverage. Code coverage is a fraught metric. It’s far too easy for it to turn into a checkbox exercise, it encourages banal tests, and it can give a false sense of confidence in the test suite. There are also different ways of measuring coverage (eg. branch coverage).
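
On the non-DRY point, here’s the kind of repetition I’d happily keep (sketch):

```python
def full_name(first, last):
    return f"{first} {last}".strip()

# Deliberately repetitive: each test is a self-contained, readable
# statement; a parametrised helper would add indirection for
# little gain at this size
def test_full_name_joins_parts():
    assert full_name("Ada", "Lovelace") == "Ada Lovelace"

def test_full_name_handles_missing_last_name():
    assert full_name("Ada", "") == "Ada"
```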

The examples above are deliberately minimal sketches - maybe I’ll flesh them out with real-world examples at some point.