I wish that I could take credit for this article, but I can’t. Instead, I have to credit two people: Lasse Koskela and Miško Hevery. I credit Lasse for writing the manuscript that led me to Miško’s article. Look for Lasse’s upcoming book Unit Testing in Java at Manning Publications’ site. Let me summarise Miško’s article here, then you can read the details at his blog.
Why would I simply summarise someone else’s article on this site? Isn’t that dishonest? Not in this case: Miško wrote essentially the same thing that I say over and over again in my own articles and in my training classes. I consider it unfortunate that he and I have never met nor worked together. We really should. You should read his stuff.
[We might choose to] write tests so that a failure in a predecessor test causes dependent tests not to execute. In object tests, we do this by writing a test method with multiple assertions. – Dale Emery
When multiple assertions check very tightly related things, I don’t mind them, but when they check relatively loosely related things, they act as integrated tests for multiple behaviors that we should consider separating. This is even subtler than the simpler idea of “one action per test”.
If you’d like a Novice algorithm to follow:
Look for any test with multiple assertions.
Move those assertions to the bottom of the test. (If they aren’t already at the bottom, then you might have more than one action per test; this refactoring will help you discover that.)
Extract all the assertions together into a single method.
Now look at the new method. How many different objects does the method use?
If it’s more than one, then you almost certainly have unrelated assertions in the same place, so consider splitting the unrelated assertions into separate methods, then split the test into two so that each test invokes one of the two new separated assertion methods.
If your new assertion method uses only one object, then it might not be so clear whether those assertions are related. You can try this simple test: put all the values you’re checking in your assertions into a single object. Can you think of a good name for it? If yes, then perhaps you’ve just identified a missing abstraction in your system; and if not, then perhaps the assertions have too little to do with each other, in which case, try the trick in the preceding paragraph.
I apologise for not having a good example of this right now. If you point me to one, I’ll analyse it in this space.
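Until a real example turns up, a hypothetical sketch in Ruby may help show the shape of the refactoring. Everything in it (`checkout`, `Receipt`, `Inventory`, `PRICES` and the tiny `assert_equal` helper) is invented for illustration and assumes nothing about your test framework. The original single test asserted on both the receipt and the inventory; because the extracted assertions touched two objects, they end up in two assertion methods and two tests.

```ruby
PRICES = { "tea" => 100, "mug" => 250 }.freeze

Receipt = Struct.new(:total, :line_count)

class Inventory
  def initialize(stock)
    @stock = stock
  end

  def remove(sku)
    @stock[sku] -= 1
  end

  def count(sku)
    @stock.fetch(sku)
  end
end

# The single action under test.
def checkout(skus, inventory)
  skus.each { |sku| inventory.remove(sku) }
  Receipt.new(skus.sum { |sku| PRICES.fetch(sku) }, skus.size)
end

def assert_equal(expected, actual)
  raise "expected #{expected.inspect}, got #{actual.inspect}" unless expected == actual
end

# Extracted assertions, already split by the object they talk about.
def assert_receipt_summarises_sale(receipt)
  assert_equal 350, receipt.total
  assert_equal 2, receipt.line_count
end

def assert_stock_reduced(inventory)
  assert_equal 4, inventory.count("tea")
  assert_equal 9, inventory.count("mug")
end

# One test per assertion method, each with the same single action.
def test_checkout_summarises_the_sale
  receipt = checkout(%w[tea mug], Inventory.new({ "tea" => 5, "mug" => 10 }))
  assert_receipt_summarises_sale(receipt)
end

def test_checkout_reduces_stock
  inventory = Inventory.new({ "tea" => 5, "mug" => 10 })
  checkout(%w[tea mug], inventory)
  assert_stock_reduced(inventory)
end

test_checkout_summarises_the_sale
test_checkout_reduces_stock
```

The two receipt assertions stay together because they pass the naming test: "receipt summary" describes both values.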
When you try to learn a new library at the same time as you explore the behavior and design of your application, you slow down more than you think.
When you can’t figure out how to make the new library work for this thing you want to build, you might spend hours fighting, debugging, swearing.
Stop. Write a Learning Test.
Start a new test suite, test class, spec file, whatever you want to call it.
Write a test that checks the things you tried to check earlier with debug statements.
Write a test that has nothing to do with your application and its domain.
Remove unnecessary details from your test.
When this test passes, then you understand what that part of the library does. If it behaves strangely, then you have the perfect test to send to the maintainers of the library.[1]
The Details
I just did this on a project using the context-free grammar parser treetop. Of course, I hadn’t used treetop before, so I had to learn it while designing the grammar for the language I wanted to parse. I reached the point where I couldn’t write a grammar rule correctly, and spent probably an hour trying to figure out how to get it to work.[2] Fortunately, at that moment, my laptop ran out of power, so I left the coffee shop[3] and did the usual thing: I explained the problem to my wife so that I could hear myself doing that. After about 15 minutes away from the problem, I decided to write some Learning Tests.
Summary of what I did
I wrote a Learning Test for a simple case that I thought I already understood well.
I wrote a Learning Test similar to the problem I had to deal with, to make sure I understood that well.
I wrote a Learning Test for the exact case that behaved unexpectedly.
The whole thing took an hour, and I understood the problem well enough to explain it to my wife. She understood it and agreed that it sounded like a mistake in the library.[4] I used this Learning Test to open an issue at GitHub. Now I can proceed without pulling my own hair out.
Do you want to see the Learning Tests?
The case I already understood well (single_simple_rule_spec.rb)

```ruby
require 'treetop'

describe "Grammar with a simple rule" do
  let(:subject) { Treetop.load_from_string(<<GRAMMAR
grammar SimpleRule
  rule word
    [A-Za-z]+
  end
end
GRAMMAR
  )}

  let(:parser) { subject.new }

  it "doesn't match empty string" do
    parser.parse("").should be_false
  end

  context "matching single letter, the match result" do
    let(:result) { parser.parse("a") }

    it { result.should be_true }
    it { result.text_value.should == "a" }
    it { result.to_s.should_not == "a" }
    it { result.should_not respond_to(:word) }
  end

  context "matching many letters, the match result" do
    let(:result) { parser.parse("aBcDeF") }

    it { result.should be_true }
    it { result.text_value.should == "aBcDeF" }
    it { result.to_s.should_not == "aBcDeF" }
    it { result.should_not respond_to(:word) }
  end
end
```
The cases I wasn’t sure I understood (single_rule_using_labels_spec.rb)

```ruby
require 'treetop'

describe "Grammar with a simple rule that uses a label" do
  context "Labeled subexpression followed by another expression" do
    let(:subject) { Treetop.load_from_string(<<GRAMMAR
grammar SimpleRuleWithLabel
  rule word
    letters:[A-Za-z]+ [A-Za-z]*
  end
end
GRAMMAR
    )}

    let(:parser) { subject.new }

    context "matching many letters, the match result" do
      let(:result) { parser.parse("aBcDeF") }

      it { result.should respond_to(:letters) }
      it { result.letters.text_value.should == "aBcDeF" }
    end
  end

  context "Labeled subexpression without another expression" do
    it "does not represent a valid grammar, even though I think it should" do
      lambda { Treetop.load_from_string(<<GRAMMAR
grammar SimpleRuleWithLabel
  rule word
    letters:[A-Za-z]+
  end
end
GRAMMAR
      )}.should raise_error(RuntimeError, /Expected \#/)
    end

    it "really should let me refer to the expression as #letters" do
      pending "https://github.com/nathansobo/treetop/issues/21"
    end
  end
end
```
Remember, we don’t call them bugs anymore: we call them “mistakes”. In this case, we can’t call it a “mistake” yet, because we might simply have a difference of opinion or mindset.
Note the wording: I’m already assuming that I have it right and they have it wrong. Bad programmer.
They have a Second Cup in Romania. Canadians get why I’d find that weird.
Invert the dependency on a Service, moving the new statement up one level in the call stack.
Repeat for all dependencies on Services until the corresponding new statements arrive at your application’s entry point. The entry point now creates a large object graph of all your Services in its initialise function.
Remove duplication in EntryPoint.initialise():
Instantiate common objects only once, passing them into the necessary constructors, replacing any Singletons with plain objects.
Extract the choice of implementation for each Service interface into a lookup table mapping interface type to implementation type.
Externalise the lookup table to a file, if you like.
Now you have a customised Dependency Injection Container for your application. To go a little farther:
Remove duplication in EntryPoint.initialise() among three applications.
Now you have a generic Dependency Injection Container that probably provides 80% or more of the features you’ll ever need.
I recommend trying this incrementally. Think of the new statements flowing up the call stack, into the entry point, then changing from code into data. Nice, no?
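The steps above might end up somewhere like the following minimal sketch. The service names (`InMemoryCatalog`, `RecordingDisplay`, `Sale`) and the `EntryPoint` API are assumptions made up for illustration. Ruby has no interface types, so the lookup table maps a role name (a symbol) to an implementation class instead.

```ruby
# Hypothetical services, invented for this sketch.
class InMemoryCatalog
  def price_of(sku)
    { "tea" => 100 }.fetch(sku)
  end
end

class RecordingDisplay
  attr_reader :lines

  def initialize
    @lines = []
  end

  def show(text)
    @lines << text
  end
end

# Sale no longer calls `new` on its collaborators; it only receives them.
class Sale
  def initialize(catalog, display)
    @catalog = catalog
    @display = display
  end

  def scan(sku)
    @display.show("#{sku}: #{@catalog.price_of(sku)}")
  end
end

# Every `new` has flowed up to here.
class EntryPoint
  # The lookup table: role => chosen implementation.
  IMPLEMENTATIONS = { catalog: InMemoryCatalog, display: RecordingDisplay }.freeze

  def initialize
    @instances = {}
  end

  # Instantiate each service only once and share it, instead of a Singleton.
  def service(role)
    @instances[role] ||= IMPLEMENTATIONS.fetch(role).new
  end

  def sale
    Sale.new(service(:catalog), service(:display))
  end
end

app = EntryPoint.new
app.sale.scan("tea")
puts app.service(:display).lines.inspect
```

Swapping an implementation now means changing one entry in `IMPLEMENTATIONS`, which is the part that can later move into a file.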
The Details
I don’t have much to add.
I hope this helps to demystify dependency injection containers. To read about the technique of injecting dependencies, I refer you to one of my articles and then a trusty web search.
This technique applies the Dependency Inversion Principle repeatedly to move the choice of implementation for an interface up the call stack. This way, concrete things depend on abstract things.
Removing duplication in the entry point respects the principle Abstractions in Code, Details in Data, but it does rely on reflection, which can cause some problems. All the better not to scatter this reflection throughout the code causing a serious cohesion problem. Using reflection like this, all in one place, helps balance using a powerful technique with a design that everyone can understand.
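As a rough illustration of keeping the reflection in one place, the sketch below reads the lookup table from YAML text and resolves class names with `Object.const_get` inside a single function. The shipping classes, the `WIRING` text and `build_services` are hypothetical names; a real application would more likely read the text from a file.

```ruby
require 'yaml'

# Hypothetical implementations of one service role.
class FlatRateShipping
  def cost_cents
    500
  end
end

class FreeShipping
  def cost_cents
    0
  end
end

# Details in data: role name => implementation class name.
WIRING = <<~CONFIG
  shipping: FreeShipping
CONFIG

# The only place that uses reflection: turn each class name into an instance.
def build_services(yaml_text)
  YAML.safe_load(yaml_text).each_with_object({}) do |(role, class_name), services|
    services[role.to_sym] = Object.const_get(class_name).new
  end
end

services = build_services(WIRING)
puts services[:shipping].cost_cents
```

A misspelt class name fails with a `NameError` raised from `build_services`, so the failure points at the wiring rather than at some distant caller.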
After running a handful of Legacy Code Retreats I’ve had the chance to try a number of exercises related to rescuing legacy code. I’d like to share one that has met with very positive reactions from people: extracting pure functions. I use the term pure function to describe a function that only operates on local variables and parameters, but does not touch any state outside the function. No object fields, no global data. I have very little experience in functional programming, so I hope I use the term in a standard way.
No doubt you already know about the Composed Method design pattern, which we commonly use to help understand code as we read it. Most commonly you encounter a block of code with a comment that describes what that code does. You then extract that block of code into a method whose name matches the comment. I’ve used this technique for well on a decade to raise the level of abstraction in code, help code document itself, and eliminate misleading comments. While introducing these methods helps me read the code, it sometimes hides the tangle of dependencies that makes separating responsibilities so difficult. For this reason, I recommend trying to extract pure functions instead.
To introduce a pure function, start with a block of code and extract a method for it. Now look at all the fields and global variables that the method reads and introduce each one as a parameter to the method. Now look at all the fields and global variables that the method writes to and turn these into return values. Where the old code invokes the new function, assign each new return value to the corresponding field or global variable. You know you’ve done this correctly if, of course, the overall behavior of the program hasn’t changed and you can mark the new function as static or whatever your language calls a class-level function.
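A minimal sketch of where these steps can lead, in Ruby. The `Invoice` class and its fields are hypothetical. The block that used to read `@line_items_cents` and `@discount_percent` and write `@total_cents` and `@discount_applied` now takes the first two as parameters and returns the other two.

```ruby
class Invoice
  attr_reader :total_cents, :discount_applied

  def initialize(line_items_cents, discount_percent)
    @line_items_cents = line_items_cents
    @discount_percent = discount_percent
    @total_cents = 0
    @discount_applied = false
  end

  # The caller assigns each return value to the field the old code wrote.
  def recalculate
    @total_cents, @discount_applied =
      Invoice.discounted_total(@line_items_cents, @discount_percent)
  end

  # Pure: reads only its parameters and returns what the old block wrote.
  # As a class-level method it cannot reach an instance's fields.
  def self.discounted_total(line_items_cents, discount_percent)
    subtotal = line_items_cents.sum
    discount = subtotal * discount_percent / 100
    [subtotal - discount, discount > 0]
  end
end

invoice = Invoice.new([1000, 500], 10)
invoice.recalculate
puts invoice.total_cents
```

Ruby can return several values as an array, so no extra class is needed yet.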
In some languages, like Java, you’ll have to introduce a new little class to allow you to return multiple values from the new function. If you don’t want to do that right away, then return a Map of return values. Once you see that you need to return similar Maps from different functions, consider replacing those Maps with a new class. Perhaps that new class will attract some code!
When I’ve used this technique, two key things have happened: either I’ve noticed duplication in the parameter lists of the new functions or introducing a parameter has changed the behavior of the system. In the first case, I introduce Parameter Objects for the duplicated parameters, which then probably attract code and become useful domain objects. In the second case, I’ve detected temporal coupling, which requires me to separate the function into two smaller ones so that some output from the first becomes input to the second. This helps me uncover cohesion problems, usually of the type of different things written too close together.
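The second case can be sketched as follows, using made-up names (`compute_subtotal`, `compute_tax`, `Totals`) and integer cents. The before-state appears only as a comment, and it is an assumption about what such legacy code might look like.

```ruby
# Hypothetical legacy block, kept as a comment for comparison:
#
#   def recalculate
#     @subtotal = @line_items.sum          # writes @subtotal...
#     @tax = (@subtotal * 5 + 50) / 100    # ...then reads it back
#   end
#
# Passing @subtotal in as a parameter of one extracted function would hand it
# the stale value, because the block both writes and reads that field.
# Splitting at the write makes the ordering explicit.

# First function: its output...
def compute_subtotal(line_items)
  line_items.sum
end

# ...becomes the second function's input.
def compute_tax(subtotal, tax_percent)
  (subtotal * tax_percent + 50) / 100 # integer cents, rounding half up
end

# A small Return Value Object for the pair of results.
Totals = Struct.new(:subtotal, :tax)

def compute_totals(line_items, tax_percent)
  subtotal = compute_subtotal(line_items)
  Totals.new(subtotal, compute_tax(subtotal, tax_percent))
end

totals = compute_totals([1000, 250], 5)
puts "#{totals.subtotal} + #{totals.tax}"
```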
I realise that an example would help right about now, but I would rather create some screencasts than write out examples in code, but I don’t know when exactly I intend to do that. I wanted to share this idea with you without waiting for the energy to put together a suitable screencast.
I invite you to try introducing pure functions into some legacy code and practising the technique as a kata. Get used to the various maneuvers, like introducing Parameter Objects or Return Value Objects or solving temporal coupling by splitting the function in two. It sounds crazy, but I’d like to try a Legacy Code Retreat where we practise only this technique all day. I don’t know whether anyone else would find it valuable enough to try it together for an entire day, over and over and over.
One of the people who watched the 2009 version of Integrated Tests Are A Scam recently asked me: I wonder how you deal with updates of third-party libraries. How do you detect subtle API or behaviour changes? At the moment, I write state-based integration tests for these cases and I wonder whether this isn’t a sensible use of integration tests.
I write Learning Tests to discover how a third-party library works. I isolate myself from the third-party library through a layer of interfaces and adapter classes that evolve from the common ways I use the third-party library. I call this the “Pattern of Usage API”, as it represents the way my application uses that third-party library. Now my application uses the third-party library through a layer of interfaces, which means that I can introduce Contract Tests on those interfaces. These Contract Tests effectively describe the subset of the third-party library’s behavior on which I depend.
Now when I upgrade the third-party library, I run the Contract Tests against my adapters to that library. Test failures usually indicate a backwards incompatible change in the third-party library. (Sometimes they indicate a trivial difference in the API which requires a trivial fix, such as an API call having been renamed or something.)
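To make the idea concrete, here is a small sketch in which Ruby's bundled `json` library stands in for the third-party library. `JsonSettingsReader` and `settings_reader_contract_failures` are hypothetical names, and the two checks are only examples of behavior an application might depend on.

```ruby
require 'json'

# Pattern of Usage API: the application only needs "turn text into settings".
class JsonSettingsReader
  def read(text)
    JSON.parse(text, symbolize_names: true)
  rescue JSON::ParserError
    {} # this application treats malformed text as "no settings"
  end
end

# Contract checks: the subset of the library's behavior the application
# relies on. Returns the descriptions of the checks that fail.
def settings_reader_contract_failures(reader)
  checks = {
    "reads keys as symbols" =>
      -> { reader.read('{"retries": 3}') == { retries: 3 } },
    "treats malformed text as no settings" =>
      -> { reader.read("{oops") == {} }
  }
  checks.reject { |_description, check| check.call }.keys
end

puts settings_reader_contract_failures(JsonSettingsReader.new).inspect
```

After upgrading the library, running the same checks against the adapter shows which relied-upon behavior, if any, has changed.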
Of course, this only helps me detect behavior changes related to computing answers, and not related to responsiveness, reliability, scalability, and so on. For that, I’ll always need system tests.
The Contract Tests are almost always state-based integration tests. I simply limit these to the implementation of Pattern of Usage API and don’t let it leak farther up the call stack. At some point you have to integrate with the Outside World. I simply teach people to look to make that integration thinner.
As I learn to become a better programmer, I continue to follow the four elements of simple design. Of these, I have observed that “remove duplication” helps me discover an appropriate structure for the thing I want to build. In my classes, we practise removing duplication a lot, in part because most people understand the rule “remove duplication” well enough to find it useful. After the first few weeks of practice, however, programmers following this rule observe varying results: sometimes removing the duplication makes the design much clearer, and sometimes it muddies the water. At this stage, a programmer usually looks for more detailed rules to help decide when to remove duplication and when to leave it alone. I offer some simple rules and guidelines for this situation.
When I can’t decide whether to remove a certain bit of duplication I’ve found, I fall back on two rules:
Remove duplication only after you see three copies.
If you don’t know how to remove this particular kind of duplication, then write more tests. Either you need more examples to see the pattern, or more examples will show a different, better pattern.
I also remember two guidelines:
Don’t be afraid to remove duplication by introducing a method or class or interface with a stupid name. Remember that you can always improve the name later.
If you sense duplication, but don’t really see it, or can’t explain it to others, then make the surrounding names more precise; then maybe you will see the duplication.
I just wanted to know whether anyone has tried nesting the concrete tests inside the abstract test like this, and if you have, then how do/did you like it?
I found an example of contract tests in Arlo Belshee’s series of articles about mock-free testing. I must strongly, strongly point out that Arlo uses the term “mock” narrowly to refer to runtime- or bytecode-generated proxies that intercept interface method invocations and provide the ability to set method expectations, in the way that JMock and NMock do. He does not mean the generic term “mock”, where he uses the term “test double” instead. I thank him for that.
If you click here you’ll see an almost textbook example of a contract test: that is, a test class that can run the same set of tests for two different implementations of the same interface. I would change only one thing: I’d extract the tests into an abstract superclass—something I otherwise hate to do—and pull the declaration of the method MakeTestSubject() up there, leaving two subclasses, one for the real file system and one for the simulated one. “YAGNI,” you say, and I agree, but I prefer the symmetry of the abstract superclass design to the asymmetry of having one class inherit from the other. I find it easier to grok quickly.
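The symmetric shape described above could look roughly like this in Ruby. This is not Arlo's code: the two file stores, the contract class and the lower-case `make_test_subject` are assumptions for illustration, and the tests are plain methods that return true or false so that the sketch needs no test framework.

```ruby
require 'tmpdir'

# Two hypothetical implementations of the same file-store role.
class RealFileStore
  def initialize(dir)
    @dir = dir
  end

  def write(name, text)
    File.write(File.join(@dir, name), text)
  end

  def read(name)
    File.read(File.join(@dir, name))
  end

  def exist?(name)
    File.exist?(File.join(@dir, name))
  end
end

class SimulatedFileStore
  def initialize
    @files = {}
  end

  def write(name, text)
    @files[name] = text
  end

  def read(name)
    @files.fetch(name)
  end

  def exist?(name)
    @files.key?(name)
  end
end

# The abstract superclass holds the tests; each subclass only chooses
# which implementation to test.
class FileStoreContract
  def make_test_subject
    raise NotImplementedError, "subclasses choose the implementation"
  end

  def test_reads_back_what_it_wrote
    store = make_test_subject
    store.write("a.txt", "hello")
    store.read("a.txt") == "hello"
  end

  def test_reports_missing_files
    !make_test_subject.exist?("missing.txt")
  end

  # Runs every test_ method and returns the names of those that fail.
  def failures
    self.class.public_instance_methods.grep(/\Atest_/).reject { |name| public_send(name) }
  end
end

class RealFileStoreContract < FileStoreContract
  def make_test_subject
    RealFileStore.new(Dir.mktmpdir) # temp directory is not cleaned up in this sketch
  end
end

class SimulatedFileStoreContract < FileStoreContract
  def make_test_subject
    SimulatedFileStore.new
  end
end

puts RealFileStoreContract.new.failures.inspect
puts SimulatedFileStoreContract.new.failures.inspect
```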
Either way, I feel good seeing contract tests out in the wild. I’m not so crazy after all.
Adding behavior confidently involves having fewer parts to change (low duplication), knowing which ones to change (high cohesion), ease of changing just the part you want to change (low coupling), and understanding how you’ve changed it (strong tests).
Adding behavior requires breaking an existing assumption.
In a well-factored design, we can easily find the one place we have made that assumption. (Otherwise, why bother refactoring?)
First, make room for the new code, then add it.
To make room for the new code, extract the existing code into a method whose name describes the generalisation we want to make, or the idea we want to introduce.
By making room for the new code, we make that code easier to reuse by reducing its dependence on its surrounding context.
We all want to add behavior to a system confidently, and I have observed that my confidence in adding behavior depends on two factors:
I know where to add code.
I understand the behavior of the code I am adding.
I use test-driven development as the main technique for handling the second of these two factors, but what about the first? I have uncovered a technique that I both use and teach, and I’d like to share that with you. I call this “making room for the new code”, naming it for a phrase I vaguely remember reading in one of the Grand Old XP books. (Did Kent Beck or Ron Jeffries write it? I can’t remember.) This technique helps me quickly find a reasonable first-draft place in the code base to put new code. After I have put the new code in place, and I feel confident that it does what I expect, I then use the Four Elements of Simple Design to guide me in refactoring to improve the design.
A premise
I start with the premise that adding behavior means breaking an assumption. By this I mean that whenever we add code to a system in order to extend its behavior, we have to falsify at least one assumption we’ve previously made. For example:
In a payroll system, in order to support a second cheque printer, we likely have to break the assumption that there is only one cheque format.
In a point of sale system, in order to support separate cash and card payment reports, we likely have to break the assumption that all “we made a sale” events look the same.
In a mobile phone monitoring system, in order to support billing by the second, we likely have to break the assumption that we only have to count the number of minutes a call lasted.
Some of these seem obvious, and others less so, and it bears emphasising that the specific assumption or assumptions we break depends heavily on what we’ve built so far and the way we articulate the soon-to-be-added behavior. Even so, I conjecture that for every behavior we want to add to a system, we can identify a non-empty list of assumptions that we need to break.
The technique
Identify an assumption that the new behavior needs to break.
Find the code that implements that assumption.
Extract that code into a method whose name represents the generalisation you’re about to make.
Enhance the extracted method to include the generalisation.
The less duplication you have in the system, the better this works, because duplicate code makes it difficult to find all the code that implements the assumption in question. Similarly, the more appropriate the names in your system, the better this works, because unsuitable names make it difficult to know which code implements the assumption in question, as opposed to something unrelated. You’ll notice that these points relate both to the Four Elements of Simple Design and to the core concepts of Coupling and Cohesion.
Yes, yes…
An example
We are building a point of sale system, and we’ve just decided to implement sales tax. I live in PEI, Canada, where not only do we exclude sales tax from the price, we have two sales taxes, and we charge the second one on top of the first one. For example:
A $125 item that attracts both GST (the “Goods and Services” tax) and PST (provincial sales tax) costs a total of $144.38. GST at 5% costs $6.25, then PST at 10% of $131.25 (= $125 + $6.25) costs $13.13. The total is $125 + $6.25 + $13.13 = $144.38.
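The same arithmetic as a short Ruby sketch. It works in integer cents and rounds each tax half up, which is an assumption about the rounding rule that happens to reproduce the figures above; `percent_of` and `total_with_taxes` are made-up names.

```ruby
# Amounts are in cents. Rounds half up; valid for non-negative amounts.
def percent_of(amount_cents, rate_percent)
  (amount_cents * rate_percent + 50) / 100
end

def total_with_taxes(net_cents)
  gst = percent_of(net_cents, 5)        # GST on the net price
  pst = percent_of(net_cents + gst, 10) # PST on top of price plus GST
  net_cents + gst + pst
end

puts total_with_taxes(125_00) # prints 14438, that is, $144.38
```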
Notice that this example implies that GST or PST might or might not apply to a given product, so even before we identify the old assumption to break, we need to note the new assumption we’ll make: we assume that all products attract both GST and PST. Our customers won’t like it, but the tax authorities will love it, and only they have the power to treat us as guilty until proven innocent.
Our code has a class like this in it.
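A minimal, hypothetical sketch of such a class. The names `Sale`, `scan` and `total` are assumptions for illustration rather than the original code, and prices are kept in cents.

```ruby
# Hypothetical names throughout; price is in cents.
Product = Struct.new(:name, :price)

class Sale
  attr_reader :total

  def initialize
    @total = 0
  end

  def scan(product)
    @total += product.price # the assumption: a sale grows by the net price
  end
end
```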
What do we assume now that we can’t allow ourselves to assume any longer? The sale total should increment by the (net) price of the item we sell. By net price here, I refer to the pre-tax price, because of course, until now, our system has no notion of “tax”. Fortunately, because we’ve ruthlessly removed duplication so far, computing the running total of the sale requires only this line of code. We can pretty safely apply the technique of this article right here. To do this, we extract the assumption into a new method whose name represents the generalisation we’re about to make. In this case, we don’t want to increase the sale total by the product’s price, but rather by its cost, which includes any additional charges beyond net price. So, we introduce a method for accumulating the scanned product’s cost.
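As a sketch of that extraction, again with assumed names for everything except `accumulate_cost` and `product`: the behavior is unchanged, and the one line that encodes the assumption now sits in a method named for the idea we are about to introduce.

```ruby
# Hypothetical names throughout; price is in cents.
Product = Struct.new(:name, :price)

class Sale
  attr_reader :total

  def initialize
    @total = 0
  end

  def scan(product)
    accumulate_cost(product)
  end

  private

  # Named for the generalisation to come: cost, not merely price.
  def accumulate_cost(product)
    @total += product.price
  end
end
```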
This creates space for the new code. We test-drive the new code, and end up with this delightful monstrosity.
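Something along these lines, as a hypothetical sketch rather than the original listing. It assumes prices in cents, both taxes applied to every product, and half-up rounding of each tax.

```ruby
# Hypothetical names throughout; price is in cents.
Product = Struct.new(:name, :price)

class Sale
  attr_reader :total

  def initialize
    @total = 0
  end

  def scan(product)
    accumulate_cost(product)
  end

  private

  # Calculates the taxes and accumulates the total, all in one place.
  def accumulate_cost(product)
    gst = (product.price * 5 + 50) / 100
    pst = ((product.price + gst) * 10 + 50) / 100
    @total += product.price + gst + pst
  end
end
```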
It looks ugly, but it works. In addition to looking ugly, this method has Feature Envy. Specifically, the calculation part of #accumulate_cost only talks to product, and so it can move onto the class Product, leaving only the accumulating left behind. You could also say that this method had two responsibilities, so I separated them, then noticed the feature envy in one of them, then moved it. I can almost always take smaller steps.
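After that move, the code might look like this sketch. `Product#cost` and `#accumulate_cost` are the names used in the text; the rest (`Sale`, `scan`, the cents representation and the rounding) is assumed for illustration.

```ruby
# Hypothetical sketch; price is in cents.
Product = Struct.new(:name, :price) do
  # The calculation only ever talked to the product, so it lives here now.
  def cost
    gst = (price * 5 + 50) / 100
    pst = ((price + gst) * 10 + 50) / 100
    price + gst + pst
  end
end

class Sale
  attr_reader :total

  def initialize
    @total = 0
  end

  def scan(product)
    accumulate_cost(product)
  end

  private

  # Only the accumulating is left behind.
  def accumulate_cost(product)
    @total += product.cost
  end
end
```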
Notice the context independence we’ve achieved with the method Product#cost. We can now refine the notion of “cost” freely without worrying about how someone will use that information. We have an easy-to-find, easy-to-extend part of the system where we can add behavior for determining which products attract which taxes, supporting multiple tax calculation policies, and even including shipping, handling and restocking fees. Now we can really add future behavior with confidence.