

I write contract tests in order to help with the drifting test doubles problem: how do I know that my stubs and expectations and spies and mocks behave “like the real thing”? I do this because contract tests run in less time than integrated tests, I feel more confident with passing contract tests than I do with integrated tests, and they help me document my understanding of the contracts of the code that I use.

Unfortunately, it remains possible to write a contract test that contradicts a collaboration test. It remains possible to change a stub or an expectation or a spy or a mock and not notice that the new behavior doesn’t match the contract tests. It seems that this doesn’t fix the drifting test doubles problem after all. So what do I do? Who tests the contract tests?

The Two Parts of a Contract

I don’t know of any automated system for verifying that collaboration tests and contract tests correspond to each other correctly. At least not the truly interesting parts. Let me describe the two parts of a contract.

Remember that the contract of an interface is just the union of the contracts of all its methods. I can replace “interface” with “module” and “method” with “function” and nothing changes, so I’ll continue to use the object-oriented terms without losing any information. It also means that I’ll refer to contracts without specifying that I mean a single method, a group of methods, or an entire interface, because the difference almost never matters.

A contract has two parts: its syntax and its semantics.

The syntax of a contract refers to the method signatures: the names, parameter types, return value type, and any exceptions the method might throw. I think of the syntax as the “shape” of the interface. We check the syntax of a contract in order to check that the pieces will fit together. In a language that checks types at compile time (Java, C#, C, C++), this means that type checking passes and the code compiles. In a language that checks types at run time (Ruby, Python, Smalltalk, JavaScript), this means that I do not experience “method missing” kinds of problems. When I feel confident that components agree on the syntax of a contract, I feel confident that those components will talk to each other correctly, even if the conversation they have might not make sense and might do the wrong thing.

The semantics of a contract refers to the rules of behavior: how inputs map to outputs, which side-effects are expected or permitted, and when the method throws which type of exception. I think of the semantics as the “working agreements” between client and supplier. We check the semantics of a contract in order to check that the pieces will work together. When I feel confident that components agree on the semantics of a contract, I feel confident that those components will behave sensibly together. In this situation, clients can freely choose how to use suppliers to solve a specific domain problem without worrying about whether the suppliers might do something unexpected, even if we can’t yet conclude that the clients are trying to solve the right domain problem.
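
To make the distinction concrete, here is a minimal Python sketch. The `Catalog` interface, its `find_price` method, and both implementations are hypothetical names invented for illustration; they do not come from any real library. Both implementations have the same syntax, so a client can call either one, but only the first honors the semantics that this sketch assumes for an unknown barcode.

```python
from typing import Dict, Optional, Protocol


class Catalog(Protocol):
    """Hypothetical supplier interface. The signature is the contract's syntax."""

    def find_price(self, barcode: str) -> Optional[int]:
        """Assumed semantics: return the price in cents for a known barcode
        and None for an unknown barcode. Never raise for an unknown barcode."""
        ...


class InMemoryCatalog:
    """Matches both the syntax and the assumed semantics."""

    def __init__(self, prices_by_barcode: Dict[str, int]) -> None:
        self._prices = dict(prices_by_barcode)

    def find_price(self, barcode: str) -> Optional[int]:
        return self._prices.get(barcode)


class RaisingCatalog:
    """Matches the syntax, but raises KeyError where the contract says None."""

    def __init__(self, prices_by_barcode: Dict[str, int]) -> None:
        self._prices = dict(prices_by_barcode)

    def find_price(self, barcode: str) -> Optional[int]:
        return self._prices[barcode]


def price_text(catalog: Catalog, barcode: str) -> str:
    """A client that relies on the assumed semantics of find_price."""
    price = catalog.find_price(barcode)
    if price is None:
        return "Product not found"
    return f"${price / 100:.2f}"
```

With `RaisingCatalog`, the pieces fit together, yet `price_text` fails on an unknown barcode because client and supplier disagree about the working agreement, not about the shape.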

Collaboration Tests Rely On Clear Contracts

When I write collaboration tests, I make assumptions about the contracts of the suppliers that the subject under test uses. I need to check these assumptions somehow. I write contract tests that document the contracts of methods so that I can confidently write collaboration tests for code that uses those methods. The drifting test doubles problem happens exactly when programmers write collaboration tests without a clear understanding of the contracts of the subject under test’s collaborators. This lack of clarity leaves them searching for a way to bridge the gap. Many of them turn to integrated tests to check the assumptions in their collaboration tests. I recommend against using integrated tests this way. This way lies the integrated tests scam.
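
One common way to write contract tests that document a supplier's contract is to put the tests in a reusable mixin and run them against each implementation. The sketch below uses Python's `unittest`; the `InMemoryCatalog` class, the `find_price` method, and the barcode and price values are hypothetical, chosen only for illustration.

```python
import unittest


class InMemoryCatalog:
    """Hypothetical supplier implementation used only for this sketch."""

    def __init__(self, prices_by_barcode):
        self._prices = dict(prices_by_barcode)

    def find_price(self, barcode):
        return self._prices.get(barcode)


class CatalogContract:
    """Contract tests that every catalog implementation must pass.

    This is a mixin, not a TestCase, so the test runner does not
    collect it on its own.
    """

    def catalog_with(self, prices_by_barcode):
        raise NotImplementedError("each implementation supplies its own factory")

    def test_known_barcode_returns_its_price(self):
        catalog = self.catalog_with({"12345": 795})
        self.assertEqual(795, catalog.find_price("12345"))

    def test_unknown_barcode_returns_none(self):
        catalog = self.catalog_with({})
        self.assertIsNone(catalog.find_price("99999"))


class InMemoryCatalogContractTest(CatalogContract, unittest.TestCase):
    """Runs the shared contract tests against one implementation."""

    def catalog_with(self, prices_by_barcode):
        return InMemoryCatalog(prices_by_barcode)
```

Each contract test states one assumption that a collaboration test may then rely on: here, that a known barcode yields its price and that an unknown barcode yields `None`.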

How Not To Paint A Wall

Imagine that you need to paint a wall. Not a small wall: a few meters tall and several meters wide. Now imagine how you might paint that wall, even if you don’t consider yourself a painter. You probably use some combination of paint rollers for the main part of the wall and paint brushes for the corners. You probably put masking tape along the edges of the wall in order to help yourself paint only the parts of the wall that you want to paint. You probably do the same around the electrical outlets, any doorways, or any other part of the wall that you don’t want to paint. In other words, you use precision tools to paint the wall when you need precision and you use less-precise tools to paint the larger portions of the wall where you don’t have to worry as much about precision. All this probably sounds quite sensible to you.

Now imagine your friend who has a different approach. They line up a bunch of cans of paint three meters away from the wall. They stand there, staring at the wall a moment. Next, they pick up a can and throw paint at the wall. Some of the paint sticks to the wall. Maybe even a lot of the paint sticks to the wall. Maybe your friend is the world paint-throwing champion and manages to get most of the paint to stick to mostly the right parts of the wall. So far, so good, if a little unconventional. But now what about the corners? What about avoiding the electrical outlets and the windows? How do they get paint in the very top-left corner of the wall? From where they stand, that corner is about six meters away. That’s a long way to throw paint so accurately and precisely. They keep picking up cans of paint and throwing them at the wall. The central parts of the wall end up with many coats of paint (how many coats? nobody knows) and the corners with very little. No matter how long they keep throwing paint at the wall, the corners never seem to get any paint, and they might as well stop. The whole thing seems very haphazard. At some point, you probably want to yell at your friend to pick up a brush to paint the corners!

I feel exactly this way about using integrated tests to check your understanding of the contracts of the collaborators in your system. By putting all the components together and running them in a single test, you’re throwing tests at the system, hoping to cover the whole thing. You’re also covering certain parts of the system much more than you need to and missing other parts entirely. Even if you manage to cover 30% to 70% of the system relatively quickly this way, I can’t tell what you’ve covered and what you haven’t. I also see the risk of significant duplication in your tests. As long as you insist on throwing tests at the system, you’ll miss significant parts of the system and you won’t really know which parts you’ve missed until a customer reports a problem. Pick up a brush and paint the corners already!

I use contract tests to paint the wall. They help me cover the wall evenly and completely.

How Collaboration Tests And Contract Tests Check Each Other

This leaves the original question of this article: if we don’t use integrated tests to check the assumptions in our collaboration tests, and if contract tests don’t solve the drifting problem, then what else do we need?

In short, we can use software to address the drifting problem for the syntax of contracts, but not for the semantics. You can already find libraries that help detect interface syntax changes in languages without compile-time type checking. These include chado for JavaScript and Pact for Ruby. These libraries effectively provide a form of type checking to help alert the programmer to potential incompatibilities between collaboration tests and the syntax of the contract of the collaborators. We can also add type checking to languages that don’t have it built in, which explains why we have TypeScript. The semantics of a contract, however, require more complicated and varied checks, and I don’t know of any software that can help write those checks. A human needs to do that.
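
As an illustration of software catching syntax drift, Python's standard library offers `unittest.mock.create_autospec`, which builds a test double that accepts only the methods and argument lists of the real class. This is a different tool from the libraries named above, shown here only as a sketch; the `Catalog` class and its `find_price` method are hypothetical.

```python
from unittest.mock import Mock, create_autospec


class Catalog:
    """Hypothetical supplier. Its method signature is the contract's syntax."""

    def find_price(self, barcode):
        raise NotImplementedError


def detects_wrong_arity(catalog_double) -> bool:
    """Call find_price with an extra argument that the real method rejects."""
    try:
        catalog_double.find_price("12345", "USD")
    except TypeError:
        return True
    return False


def detects_unknown_method(catalog_double) -> bool:
    """Call a method that the real class does not define."""
    try:
        catalog_double.lookup_price("12345")
    except AttributeError:
        return True
    return False


# A plain Mock accepts any method with any arguments, so syntax drift goes unnoticed.
plain_double = Mock()

# An autospecced double checks method names and signatures against Catalog.
checked_double = create_autospec(Catalog, instance=True)

# The semantics remain unchecked: nothing stops a stub from returning a value
# that the real supplier would never return.
checked_double.find_price.return_value = "not a price"
```

The autospecced double rejects the misspelled method and the extra argument, but it happily returns `"not a price"`. Checking that a stubbed value makes sense still falls to a human.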

I maintain the correspondence between collaboration tests and contract semantics tests by hand. I don’t know of any automated way to do this. Building software to help with this task sounds like a suitable Ph.D. project. Indeed, a few people have told me that they intended to do exactly that, but I haven’t seen any results yet.

The Rules I Follow

Rather than throw paint at the wall, let me describe what I do, which corresponds to painting the corners of the wall with a brush.

First, let me share the key properties of collaboration and contract tests:

  • A stub in a collaboration test corresponds to an expected result in a contract test.
  • An expectation in a collaboration test corresponds to an action in a contract test.
  • These correspondences apply in both directions.

From this, we can extract some rules:

  • When I write a stub in a collaboration test, then I remind myself to also write a contract test where the stubbed return value becomes the expected result of the contract test.
  • When I write a contract test with assert_equals() in it, then I can safely stub that method under test to return that value in a collaboration test.
  • When I write an expectation (a “mock”) for certain parameters in a collaboration test, then I remind myself to also write a contract test that describes what happens when we invoke that method with those parameters.
  • When I write a contract test with the action foo(a, b, c) in it, then I can safely write a collaboration test that expects (“mocks”) foo(a, b, c).
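
The rules above can be sketched as a set of paired tests. This is a minimal illustration in Python's `unittest`, not a prescribed implementation; `Sale`, `InMemoryCatalog`, `TextDisplay`, and all of their methods are hypothetical names. The stubbed value `795` in the collaboration test reappears as the expected result of a catalog contract test, and the expected call `display_price(795)` reappears as the action of a display contract test.

```python
import unittest
from unittest.mock import Mock


class InMemoryCatalog:
    """Hypothetical supplier: looks up prices (in cents) by barcode."""

    def __init__(self, prices_by_barcode):
        self._prices = dict(prices_by_barcode)

    def find_price(self, barcode):
        return self._prices.get(barcode)


class TextDisplay:
    """Hypothetical supplier: remembers the last text it was asked to show."""

    def __init__(self):
        self.text = ""

    def display_price(self, price_in_cents):
        self.text = f"${price_in_cents / 100:.2f}"


class Sale:
    """Hypothetical client: the subject of the collaboration test."""

    def __init__(self, catalog, display):
        self._catalog = catalog
        self._display = display

    def on_barcode(self, barcode):
        self._display.display_price(self._catalog.find_price(barcode))


class SaleCollaborationTest(unittest.TestCase):
    def test_displays_price_of_scanned_product(self):
        catalog = Mock()
        catalog.find_price.return_value = 795  # stub: find_price answers 795
        display = Mock()

        Sale(catalog, display).on_barcode("12345")

        catalog.find_price.assert_called_once_with("12345")
        display.display_price.assert_called_once_with(795)  # expectation


class CatalogContractTest(unittest.TestCase):
    def test_known_barcode_returns_its_price(self):
        # The stubbed return value (795) becomes the expected result.
        catalog = InMemoryCatalog({"12345": 795})
        self.assertEqual(795, catalog.find_price("12345"))


class DisplayContractTest(unittest.TestCase):
    def test_display_price_shows_formatted_amount(self):
        # The expected call display_price(795) becomes the action.
        display = TextDisplay()
        display.display_price(795)
        self.assertEqual("$7.95", display.text)
```

Reading in the other direction, the catalog contract test justifies the stub and the display contract test justifies the expectation. If someone changes either contract test, the matching line in the collaboration test needs a second look.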

I should mention that the method parameters don’t need to match exactly, but they do need to come from the same equivalence class: values that the contract treats the same way. If you feel unsure how to start, then start by making the method parameters match the contract test values exactly.

I follow these rules rather than write a bunch of integrated tests, hoping that those tests collectively check these conditions. Although following these rules requires more care and attention, their precision better assures me that my collaboration tests and contract tests correspond correctly. This correct correspondence better assures me that I have agreement between clients and suppliers regarding the interface between them. It helps with both the syntax and semantics of my interfaces. When I finally put things together, they just work.

What About The Domain Problem?

This approach tells me that my code respects the contracts of the interfaces, which means that when I put things together, they will fit together (syntax) and they will make sense together (semantics). This approach does not guarantee that the resulting system solves the intended domain problem. For that, I need customer tests, and I will probably write most of those customer tests as end-to-end tests.

Wait. I know what you’re thinking.

I often use a smaller number of customer-focused integrated tests to help the customer feel confident that we programmers have understood what they need, but I definitely do not rely on an exhaustive set of integrated tests to help the programmers feel confident that the system “hangs together” correctly. On the contrary, because I write collaboration and contract tests, my customer tests do not fail due to bad code, but rather due to differences of understanding about the business problem that we need to solve. This makes those customer tests much more effective as tools for guiding the development of a system. It also helps the customer see those tests as providing evidence of progress. It helps the customer feel confident in those tests. It helps them believe and trust those tests. That makes the project go more smoothly.

My Fellow Programmers Won’t Do This

I understand. Not every programmer works with the same level of discipline in the same facets of their work as I do. Even if they did, not every programmer believes that this approach works better than writing integrated tests. Some programmers insist on painting a wall from three meters back. I understand why they might prefer it (it seems easier), but I’ll never understand why they consider it more effective than picking up a brush. If you decide that you prefer to paint the wall with rollers and brushes and your fellow programmers try to stop you, then you have an entirely different problem. Maybe I can help with that, too.