The Code Whisperer

Your code is trying to tell you something.

What Your Tests Don’t Need to Know Will Hurt You


I just finished reading Brian Marick’s article, “Mocks, the removal of test detail, and dynamically-typed languages”, which focused me on a design technique I use heavily: awareness of irrelevant details in tests. Referring back to the four elements of simple design, I focus on irrelevant details in tests in order to help me maximize clarity in production code. Please allow me to sketch the ideas here.

I use the term irrelevant detail to refer to any detail that does not contribute directly to the correctness of the behavior I’ve chosen to check. I know when I’ve bumped into an irrelevant detail: while writing the code that allows me to check something, I start typing something, then my shoulders slump and I exhale with annoyance. I think, I shouldn’t have to write this, because it has nothing to do with what I want to check. Brian’s example illustrates this perfectly:

The random methods save a good deal of setup by defaulting unmentioned parameters and by hiding the fact that Reservations have_many Groups, Groups have_many Uses, and each Use has an Animal and a Procedure. But they still distract the eye with irrelevant information. For example, the controller method we’re writing really cares nothing for the existence of Reservations or Procedures–but the test has to mention them.

I maintain one Ruby code base, which runs my weblog at jbrains.ca, and I often find myself creating a Posting object in my tests when the content of that posting doesn’t matter.

I use a simple rule to help me identify irrelevant data in tests.

If I can change a value without changing the result of the behavior I want to check, then I call that **irrelevant data** for this test.

I first identify data as irrelevant and mark it that way. For string values, I include the word “irrelevant” in the string. Some people use words like “dummy” for this purpose, but I prefer “irrelevant”, because I don’t want others to mistake an irrelevant detail for a kind of stub. I used to simply choose random values for irrelevant data, because I had read that good testing principles included varying data from test to test. Now I feel that choosing especially meaningful-looking values for irrelevant data obscures the purpose of a test.
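The labeling convention and the rule above fit in a few lines. This is a minimal sketch in plain Ruby; the Posting class and its queued? method are hypothetical stand-ins, not the weblog’s real code. Only queued_at and published_at affect the result, so the title and content carry the word “irrelevant”, and swapping one of them for another value leaves the result unchanged.

```ruby
# Hypothetical Posting, loosely modeled on the one mentioned above.
class Posting
  attr_reader :title, :content, :queued_at, :published_at

  def initialize(title:, content:, queued_at:, published_at:)
    @title = title
    @content = content
    @queued_at = queued_at
    @published_at = published_at
  end

  # Queued: scheduled at some time in the past, but not yet published.
  def queued?(now)
    !queued_at.nil? && queued_at < now && published_at.nil?
  end
end

# These values cannot change the result of queued?, so they say so.
irrelevant_title = "irrelevant title"
irrelevant_content = "irrelevant content"

now = Time.at(1_000)
posting = Posting.new(
  title: irrelevant_title,
  content: irrelevant_content,
  queued_at: now - 60, # relevant: one minute in the past
  published_at: nil    # relevant: not yet published
)

# Applying the rule: swapping an irrelevant value leaves the result unchanged.
other = Posting.new(
  title: "another irrelevant title",
  content: irrelevant_content,
  queued_at: now - 60,
  published_at: nil
)

puts posting.queued?(now)                       # => true
puts posting.queued?(now) == other.queued?(now) # => true
```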

Irrelevant data can hurt in a number of ways.

  • You might “play it safe” and duplicate the irrelevant data in more tests, leading to excess investment in maintaining your tests.
  • You might “play it safe” and check the irrelevant data in your test, leading to misleading failures as you change unrelated behavior in the system.
  • Changing the data might affect the result (pass/fail) of your test even though the data does not relate conceptually to the behavior you want to check.

I estimate that I have experienced more pain from this last effect than from all other effects combined.

Once I have identified and labeled irrelevant data, I look for ways to eliminate it. Moving irrelevant data into fixtures or test data builders hides the symptoms without solving the problem. Ironically, it merely makes that data easier to spread to more tests to which it does not relate, creating more, not less, potential for unhealthy dependency. It masks the very kind of observation that Brian made in his code base.

Sometimes I write a test and notice that neither the value nor the type of a piece of data matters. I find this happens a lot when I test-drive controller behavior that takes data from a model and hands it directly to a view.

I notice here that [1,2,3] represents “any non-empty array”. While writing this sentence, I wondered if I could change this to Object.new, so I tried that, and the test passed. In this case, while the actual data and type don’t matter, I need the model to return anything but nil to ensure that the view has something and that that something came from the model. With this realization, I rewrote the test.

Instead of should == I use should equal() here, which translates to assertSame() in other languages, to emphasize that I expect the controller to take whatever it receives from the model and hand it to the view. When I want to check what the view does with it, I’ll send valid data to the view and check it in isolation. When I want to check what the model returns, I’ll look at what the view expects and check that the model can provide it. The controller need not bother itself with the details.
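A minimal sketch of that kind of controller test, in plain Ruby with hand-rolled doubles rather than RSpec. PublicationQueueController, queued_postings, and render are hypothetical names. Ruby’s equal? checks object identity, which is what should equal() relies on, while == checks value equality.

```ruby
# Hypothetical controller: takes whatever the model returns and hands it to the view.
class PublicationQueueController
  def initialize(model, view)
    @model = model
    @view = view
  end

  def show_queue
    @view.render(@model.queued_postings)
  end
end

# Hand-rolled stub for the model: returns whatever it was given.
class StubModel
  def initialize(result)
    @result = result
  end

  def queued_postings
    @result
  end
end

# Hand-rolled spy for the view: remembers what it received.
class SpyView
  attr_reader :rendered

  def render(postings)
    @rendered = postings
  end
end

anything = Object.new # neither value nor type matters, only identity
view = SpyView.new
PublicationQueueController.new(StubModel.new(anything), view).show_queue

# equal? checks identity (like should equal() or assertSame), not ==.
puts view.rendered.equal?(anything) # => true
```

Because the model returns a bare Object.new, the test cannot accidentally depend on what a posting contains; it only shows that the controller passes along whatever the model gives it.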

Compare this with the corresponding integration test for checking the controller, which would require knowing all these otherwise irrelevant things and setting all this otherwise irrelevant data…

  • How to create a Posting in the database in the “queued for publication” state, which means setting the queued_at attribute to a time in the past, but the published_at attribute to nil.
  • How to instantiate a valid Posting, which requires title and content.
  • Which view the controller renders when it displays the publication queue.
  • Which attributes of the Posting the view expects, to ensure that I populate them with valid, if meaningless, data.

I suppose I could think of more irrelevant behavior and data, but this will do. How does this irrelevant data and behavior hurt specifically in this situation?

  • I have this note in my problems list: “A Posting can be both queued and published. I want to remove this possibility, rather than forcing Posting to maintain the invariant.” When I fix this problem, my controller test will change, even though I won’t have changed the controller.
  • If I added a mandatory attribute to Posting, then my controller test would change, even though I wouldn’t have changed the controller.
  • If I removed a mandatory attribute from Posting, then my controller test would have even more irrelevant data, meaning more accidental ways to go wrong.
  • If I changed the view that the :new action renders, then my controller test would change, even though I wouldn’t have changed the controller behavior that my test checks.
  • If I changed which attributes of Posting the view processed, then my controller test would either need to change or contain even more irrelevant details than it did before.

When I write that integrated tests are a scam, I include as a reason that writing integration tests encourages including irrelevant details in tests, and by now I hope I’ve shown some ways that that hurts.

All this leaves me with a few guiding principles to use when writing tests.

A test should make it clear how its expected output relates to its input.

I can use this guiding principle to develop some Novice rules:

  • Mark all irrelevant data in a test by extracting the values to variables whose name includes the word “irrelevant”.
  • Hide all irrelevant data using techniques like the [Test Data Builder pattern](http://bit.ly/2oJaTU).
  • Remember to remove duplication after hiding irrelevant data.
  • Call attention to all input values that have a direct bearing on computing the expected result in a test.
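A minimal Test Data Builder sketch in plain Ruby, assuming a hypothetical PostingBuilder whose defaults all announce themselves as irrelevant. A test then mentions only the attributes that bear on its expected result.

```ruby
# Hypothetical value object for the builder to produce.
Posting = Struct.new(:title, :content, :queued_at, :published_at)

# Every default is marked irrelevant, so any value a test overrides
# stands out as one that matters.
class PostingBuilder
  def initialize
    @title = "irrelevant title"
    @content = "irrelevant content"
    @queued_at = nil
    @published_at = nil
  end

  def queued_at(time)
    @queued_at = time
    self
  end

  def published_at(time)
    @published_at = time
    self
  end

  def build
    Posting.new(@title, @content, @queued_at, @published_at)
  end
end

now = Time.at(1_000)
# Only the attribute that matters to this test appears in it.
posting = PostingBuilder.new.queued_at(now - 60).build

puts posting.title # => irrelevant title
```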

Beyond these novice rules, this guiding principle helps in two key ways. Certainly, it encourages me to write shorter, more focused tests, which tend to have all the properties I want in a test, but more importantly, it leads me to a higher guiding principle when writing tests.

When you find it difficult to write a concise, focused, isolated test, the production code has unhealthy dependencies to loosen, invert, or break.

Here, “concise” means having no irrelevant details; “focused” means failing for only one reason; “isolated” means executing without side effects on other tests.

I loosen dependencies by applying the Generalize Declared Type refactoring. I most commonly invert dependencies by extracting an interface and using constructor injection. I most commonly break dependencies by extracting either a memento or the return value of a method and depending on that, rather than on the original object.
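A minimal sketch of breaking a dependency by depending on a method’s return value instead of the object that produced it. minutes_in_queue and minutes_in_queue_for are hypothetical; the point is that a test for the extracted function no longer needs to build a Posting at all.

```ruby
# Hypothetical Posting; only queued_at matters to the calculation below.
Posting = Struct.new(:title, :content, :queued_at)

# After breaking the dependency: depends only on the value it uses.
def minutes_in_queue(queued_at, now)
  ((now - queued_at) / 60).floor
end

# Callers that hold a Posting extract the value and pass it along.
def minutes_in_queue_for(posting, now)
  minutes_in_queue(posting.queued_at, now)
end

now = Time.at(1_000)
posting = Posting.new("irrelevant title", "irrelevant content", now - 300)

puts minutes_in_queue_for(posting, now) # => 5
puts minutes_in_queue(now - 300, now)   # => 5 (no Posting needed)
```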

This guiding principle leads me in the direction of The Fundamental Theory of Test-Driven Development.

**Assuming we know what we want to check and understand the mechanics of how to check it, difficulty checking behavior means unhealthy dependencies in the design.**

Who Tests the Contract Tests?


I’ve already written about contract tests here and here, but I haven’t written specifically about the question that contract tests raise. I’ll do that here.

How do I match the collaboration tests to the contract tests? In other words, what stops me from stubbing foo() a certain way, but writing tests for foo() that expect different behavior? What if I change the tests for foo(), but forget to change the corresponding stubs for foo()? Haven’t I just introduced an integration defect without any failing tests?

Yes, you have, but you can do something about it. I have vehemently insisted that collaboration tests and contract tests don’t replace thinking, but rather help make my thinking about tests more systematic. I’ll describe what I mean by that in the next few paragraphs.

Before I started writing contract tests, I would typically write decent collaboration tests, but frequently encounter the case where I found a mistake even though all my tests passed. Either I had missed an important collaboration test, or I had stubbed a method in a way that no test checked, or I had mocked a method to expect parameters that no test ever tried. Unfortunately, I didn’t know how to categorize those mistakes at the time, so I had two basic recourses: write collaboration tests with more care or write integration tests. I tried both, and neither alone helped. Both together helped a little, but not much. When I wondered how to solve this problem once and for all, I hit on contract tests. They helped, and considerably, but I still encountered problems.

Good news, though: I could see the problems more clearly. When I made an integration mistake, I could categorize it in one of two ways:

  1. I stubbed foo() to return 23 even though foo() would never return 23, or foo() would never return 23 in that situation.
  2. I mocked foo() to expect parameters a and b even though I never checked to see what happens when I invoke foo(a, b).

I could limit these mistakes by checking for each kind explicitly. I created a simple system. Whenever I stubbed a method to return 23, I’d write a contract test for that method that expected the result 23. If I couldn’t do that, then I didn’t understand the method’s contract well enough yet, and I stopped to figure it out. Whenever I mocked a method to accept parameters a and b, I’d write a contract test for that method taking a and b as parameters. If I couldn’t do that, then I didn’t understand the method’s contract well enough yet, and I stopped to figure it out. This system didn’t solve the problem of mismatched tests, but it gave me a repeatable method for reducing the risk of mismatched tests.
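A minimal sketch of that bookkeeping in plain Ruby, with hypothetical names (Client, RealSource, StubSource, foo). The stub returns 23 and records its arguments, so it plays both the stub and the mock role; each assumption it encodes then gets a matching check against the real implementation.

```ruby
# Hypothetical real implementation of the collaborator role.
class RealSource
  def foo(a, b)
    a + b
  end
end

# Hypothetical client that depends on the role.
class Client
  def initialize(source)
    @source = source
  end

  def describe
    @source.foo(20, 3) > 20 ? "big" : "small"
  end
end

# Test double: stubs foo to return 23 and records the arguments it received.
class StubSource
  attr_reader :received

  def foo(a, b)
    @received = [a, b]
    23
  end
end

# Collaboration test: the client's behavior, given the stub and mock assumptions.
stub = StubSource.new
collaboration_ok = Client.new(stub).describe == "big" && stub.received == [20, 3]

# Contract test owed to those assumptions: the stub says foo can return 23,
# and the mock says foo gets invoked with (20, 3), so check that real call.
contract_ok = RealSource.new.foo(20, 3) == 23

puts collaboration_ok && contract_ok # => true
```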

Even better, when I made an integration mistake, I knew what kinds of mistakes to look for:

  1. I missed a collaboration test.
  2. I missed a contract test.
  3. I missed a contract test corresponding to the way I stubbed a method.
  4. I missed a contract test corresponding to the way I mocked a method.

I imagine one could automate these checks, and I think that would make a splendid Ph.D. project for some eager young mind. I don’t believe I’ll do it.
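For a sense of what automating even part of this might involve, here is a toy ledger in plain Ruby. ContractLedger is hypothetical and not an existing tool: collaboration tests would record each stubbed (method, return value) pair, contract tests would record each pair they check, and any stub left over points at a missing contract test. Mocked (method, arguments) pairs could be tracked the same way.

```ruby
require "set"

# Toy ledger comparing what collaboration tests assume with what contract tests check.
class ContractLedger
  def initialize
    @stubbed = Set.new
    @checked = Set.new
  end

  # Some collaboration test assumes method_name can return value.
  def stubbed(method_name, value)
    @stubbed << [method_name, value]
  end

  # Some contract test shows method_name can return value.
  def checked(method_name, value)
    @checked << [method_name, value]
  end

  # Stubbed pairs that no contract test backs up.
  def unmatched_stubs
    (@stubbed - @checked).to_a
  end
end

ledger = ContractLedger.new
ledger.stubbed(:has_landed, true)
ledger.stubbed(:has_landed, false)
ledger.checked(:has_landed, true)

p ledger.unmatched_stubs # => [[:has_landed, false]]
```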

So let me at least answer the question: who tests the contract tests? The collaboration tests and the contract tests check each other, if only you stop to listen to them.

In closing, I refer you to this article, in which I describe how this technique could have prevented the Mars rover from prematurely detaching its parachute.


Surely the Mars Rover Needed Integrated Tests! (Maybe Not?)


The 30-second version

  • I can find integration problems of basic correctness without integrated tests using a simple double-entry book-keeping approach.
  • I use test doubles to articulate the assumptions I make in my design; these become tests for the next layer.
  • You don’t need to do TDD to use this technique: you can use it when adding tests to existing or legacy code.
  • This technique makes rescuing legacy code easier, since you can write more tests without having to deploy the system in its usual runtime environment.
  • Most programmers write good collaboration tests, but forget to write contract tests, and that creates integration problems, so write contract tests!

The Details

“Guest” commented about my Agile 2009 tutorial, Integration Tests Are A Scam. “Guest” wrote this:

A Mars rover mission failed because of a lack of integrated tests. The parachute system was successfully tested. The system that detaches the parachute after the landing was successfully – but independently – tested. On Mars when the parachute successfully opened the deceleration “jerked” the lander, then the detachment system interpreted the jerking as a landing and successfully detached the parachute. Oops. Integration tests may be costly but they are absolutely necessary.

I don’t doubt the necessity of integrated tests. I depend on them to solve difficult system-level problems. By contrast, I routinely see teams using them to detect unexpected consequences, and I don’t think we need them for that purpose. I prefer to use them to confirm an uneasy feeling that an unintended consequence lurks.

Let’s consider a clean implementation of the situation my commenter describes. I see this design, comprising the lander, the parachute, the detachment system, an accelerometer and an altimeter. A controller connects all these things together. Let’s look at the “code”, which I’ve written in a fantasy language that looks a little like Java/C# and a little like Ruby.

Ashley Moran has posted a working Ruby version of this example. If you speak Ruby, then I highly recommend looking at that example after you’ve read this.

Controller.initialize() {
  // create each object before anything that needs it
  accelerometer = Accelerometer.new()
  lander = Lander.new(accelerometer, Altimeter.new())
  parachute = Parachute.new(lander)
  detachment_system = DetachmentSystem.new(parachute)
  accelerometer.add_observer(detachment_system)
}
          
Parachute {
  needs a lander
  
  open() {
    lander.decelerate()
  }
  
  detach() {
    if (lander.has_landed() == false)
      raise "You broke the lander, idiot."
  }
}
                        
AccelerationObserver is a role {
  handle_acceleration_report(acceleration) {
    raise "Subclass responsibility"
  }
}
                        
DetachmentSystem acts as AccelerationObserver {
  needs a parachute
  
  handle_acceleration_report(acceleration) {
    if (acceleration <= -50.ms2) {
      parachute.detach()
    }
  }
}
 
Accelerometer acts as Observable {
  manages many acceleration_observers
                                    
  report_acceleration(acceleration) {
    acceleration_observers.each() {
      each.handle_acceleration_report(acceleration)
    }
  }
}
 
Lander {
  needs an accelerometer
  needs an altimeter
  
  decelerate() {
    // I know how much to decelerate by
    accelerometer.report_acceleration(how_much)
  }
}
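For readers who want to run the design, here is a minimal transcription into plain Ruby. It is an independent sketch, not Ashley Moran’s version; the names mirror the pseudocode, while the altitude-based landed? check and the fixed −50 m/s² jolt in decelerate are assumptions added to make it executable. Wiring it together and opening the parachute reproduces the commenter’s failure.

```ruby
# Units are plain floats: m/s² for acceleration, meters for altitude.
class Parachute
  def initialize(lander)
    @lander = lander
  end

  def open
    @lander.decelerate
  end

  def detach
    raise "You broke the lander, idiot." unless @lander.landed?
  end
end

class DetachmentSystem
  def initialize(parachute)
    @parachute = parachute
  end

  def handle_acceleration_report(acceleration)
    @parachute.detach if acceleration <= -50.0
  end
end

class Accelerometer
  def initialize
    @observers = []
  end

  def add_observer(observer)
    @observers << observer
  end

  def report_acceleration(acceleration)
    @observers.each { |observer| observer.handle_acceleration_report(acceleration) }
  end
end

class Altimeter
  attr_reader :altitude

  def initialize(altitude)
    @altitude = altitude
  end
end

class Lander
  def initialize(accelerometer, altimeter)
    @accelerometer = accelerometer
    @altimeter = altimeter
  end

  # Assumption: "landed" means the altimeter reads zero or less.
  def landed?
    @altimeter.altitude <= 0.0
  end

  # Assumption: opening the parachute produces a -50 m/s² jolt.
  def decelerate
    @accelerometer.report_acceleration(-50.0)
  end
end

# Wiring, in dependency order (the lander must exist before the parachute).
accelerometer = Accelerometer.new
lander = Lander.new(accelerometer, Altimeter.new(2_000.0))
parachute = Parachute.new(lander)
accelerometer.add_observer(DetachmentSystem.new(parachute))

begin
  parachute.open
  puts "landed safely"
rescue RuntimeError => e
  puts e.message # => You broke the lander, idiot.
end
```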
 

I need to test what happens when I open the parachute. The lander should decelerate.

testOpenParachute() {
  parachute = Parachute.new(lander = mock(Lander))
  lander.expects().decelerate()
  
  parachute.open()
}
 

Since this test expects the lander to decelerate, I have to test that. When the lander decelerates, the accelerometer should report its deceleration.

testLanderDecelerates() {
  accelerometer = mock(Accelerometer)
  lander = Lander.new(accelerometer)
  accelerometer.expects().report_acceleration(-50.ms2)
  
  lander.decelerate()
}
 

Since this test shows that the accelerometer can report acceleration of −50 m/s², I have to test that.

testAccelerometerCanReportRapidAcceleration() {
  accelerometer = Accelerometer.new()
  accelerometer.add_observer(observer = mock(AccelerationObserver))
  observer.expects().handle_acceleration_report(-50.ms2)
  
  accelerometer.report_acceleration(-50.ms2)
}
 

Since this test shows that any acceleration observer must be prepared to handle an acceleration report of −50 m/s², I have to test that.

First, the general test for the contract of the interface:

AccelerationObserverTest {
  testAccelerationObserverCanHandleRapidAcceleration() {
    observer = create_acceleration_observer() // subclass responsibility
    this_block {
      observer.handle_acceleration_report(-50.ms2)
    }.should execute_without_incident
  }
}
 

Now the test for DetachmentSystem, which acts as an AccelerationObserver. What should it do if it detects such sudden deceleration? It should detach the parachute.

DetachmentSystemTest extends AccelerationObserverTest {
  // I inherit testAccelerationObserverCanHandleRapidAcceleration()
  
  create_acceleration_observer() {
    detachment_system = DetachmentSystem.new(parachute = mock(Parachute))
    parachute.expects().detach()
    return detachment_system // the observer under test, not the expectation
  }
}
 

You might find that easier to read this way, by inlining the method create_acceleration_observer():

DetachmentSystemTest {
  testRespondsToRapidAcceleration() {
    detachment_system = DetachmentSystem.new(parachute = mock(Parachute))
    parachute.expects().detach()
    this_block {
      detachment_system.handle_acceleration_report(-50.ms2)
    }.should execute_without_incident
  }
}
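The same contract test can be sketched in plain Ruby by putting the contract in a module that each implementation’s test includes. AccelerationObserverContract, MockParachute, and the test method names are hypothetical; as in the pseudocode, create_acceleration_observer must return the observer itself.

```ruby
# Contract: every AccelerationObserver must handle a -50 m/s² report.
# Including classes supply create_acceleration_observer.
module AccelerationObserverContract
  def test_handles_rapid_acceleration
    observer = create_acceleration_observer
    observer.handle_acceleration_report(-50.0) # must not raise
    true
  end
end

class DetachmentSystem
  def initialize(parachute)
    @parachute = parachute
  end

  def handle_acceleration_report(acceleration)
    @parachute.detach if acceleration <= -50.0
  end
end

# Hand-rolled mock parachute: records whether detach was called.
class MockParachute
  def detach
    @detached = true
  end

  def detached?
    @detached == true
  end
end

class DetachmentSystemTest
  include AccelerationObserverContract # inherits the contract test

  attr_reader :parachute

  def create_acceleration_observer
    @parachute = MockParachute.new
    DetachmentSystem.new(@parachute) # return the observer itself
  end

  # Implementation-specific expectation: rapid deceleration detaches the parachute.
  def test_detaches_on_rapid_acceleration
    test_handles_rapid_acceleration && parachute.detached?
  end
end

puts DetachmentSystemTest.new.test_detaches_on_rapid_acceleration # => true
```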
 

Since this test expects the parachute to be able to detach, I have to test that. Now, detaching only works if we’ve landed. (I’ve simplified on purpose. Suppose the parachute can’t survive a drop from any height. It’s easy to add that detail in later.)

ParachuteTest {
  testDetachingWhileLanded() {
    parachute = Parachute.new(lander = mock(Lander))
    lander.stubs().has_landed().to_return(true)
    this_block {
      parachute.detach()
    }.should execute_without_incident
  }
  
  testDetachingWhileNotLanded() {
    parachute = Parachute.new(lander = mock(Lander))
    lander.stubs().has_landed().to_return(false)
    this_block {
      parachute.detach()
    }.should raise("You broke the lander, idiot.")
  }
}
 

Hm. I notice that parachute.detach() might fail. But I just wrote a test that uses parachute.detach() and doesn’t yet show how it handles that method failing. I have to test that.

DetachmentSystemTest {
  testRespondsToDetachFailing() {
    detachment_system = DetachmentSystem.new(parachute = mock(Parachute))
    parachute.stubs().detach().to_raise(AnyException)
 
    this_block {
      detachment_system.handle_acceleration_report(-50.ms2)
    }.should raise(AnyException)
  }
}
 

Hm. So handling an acceleration report of −50 m/s² can fail. Who might issue such a report? The accelerometer. Since the detachment system doesn’t handle this failure, I have to test what the accelerometer does when issuing an acceleration report might fail.

testAccelerometerCanRespondToFailureWhenReportingAcceleration() {
  accelerometer = Accelerometer.new()
  accelerometer.add_observer(observer = mock(AccelerationObserver))
  observer.stubs().handle_acceleration_report().to_raise(AnyException)
 
  this_block {
    accelerometer.report_acceleration(-50.ms2)
  }.should raise(AnyException)
}
 

It turns out that the accelerometer might fail when reporting acceleration of −50 m/s². When might it do that? When the lander decelerates. What happens then?

testLanderDeceleratesRespondsToFailure() {
  accelerometer = mock(Accelerometer)
  lander = Lander.new(accelerometer)
  accelerometer.stubs().report_acceleration().to_raise(AnyException)
 
  this_block {
    lander.decelerate()
  }.should raise(AnyException)
}
 

Hm. So decelerating could fail! All right, who causes the lander to decelerate? That code might fail. Oh yes… the parachute opening!

testOpenParachuteRespondsToFailure() {
  parachute = Parachute.new(lander = mock(Lander))
  lander.stubs().decelerate().to_raise(AnyException)
  
  this_block {
    parachute.open()
  }.should raise(AnyException)
}
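Each of these failure tests has the same shape: stub a collaborator to raise, then check that the failure propagates. Here is a minimal plain-Ruby sketch of the parachute case; AnyException and FailingLander are hypothetical stand-ins for the pseudocode’s AnyException and stubbed lander, and raises? is a tiny helper written here, not a library call.

```ruby
class Parachute
  def initialize(lander)
    @lander = lander
  end

  def open
    @lander.decelerate # any failure here propagates to the caller
  end
end

# Hypothetical exception class standing in for AnyException.
class AnyException < StandardError; end

# Stub lander whose decelerate always fails.
class FailingLander
  def decelerate
    raise AnyException, "deceleration failed"
  end
end

# Returns true if the block raises error_class, false if it completes.
def raises?(error_class)
  yield
  false
rescue error_class
  true
end

puts raises?(AnyException) { Parachute.new(FailingLander.new).open } # => true
```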
 

So opening the parachute could fail! We probably want to nail down when that happens. We have a test that shows us when:

testDetachingWhileNotLanded() {
  parachute = Parachute.new(lander = mock(Lander))
  lander.stubs().has_landed().to_return(false)
  this_block {
    parachute.detach()
  }.should raise("You broke the lander, idiot.")
}
 

So the parachute opening could cause it to detach because the lander hasn’t landed yet. I don’t know about you, but I think the parachute provides the most value when it helps the lander land, and not once it has landed. That tells me that someone, somewhere needs to handle the exception that detach() would raise, or at least prevent detach() from happening while the altimeter reads above a few meters off the ground.

testDoNotDetachWhenTheLanderIsTooHighUp() {
  altimeter = mock(Altimeter)
  altimeter.stubs().altitude().to_return(5.m)
  
  detachment_system = DetachmentSystem.new(parachute = mock(Parachute))
  parachute.expects(no_invocations_of).detach()
  
  detachment_system.handle_acceleration_report(-50.ms2)
  
  // ???
}
 

In writing this test, I see that in order to stop the detachment system from telling the parachute to detach, it needs access to the altimeter.

Integration problem detected.

When I wire the detachment system up to the altimeter, even the collaboration test shows how to ensure that the parachute doesn’t detach in this kind of dangerous situation.

testDoNotDetachWhenTheLanderIsTooHighUp() {
  detachment_system = DetachmentSystem.new(parachute = mock(Parachute), altimeter = mock(Altimeter))
  altimeter.stubs().altitude().to_return(5.m)
  parachute.expects(no_invocations_of).detach()
  
  detachment_system.handle_acceleration_report(-50.ms2)
}
 

This means I have to add the following production behavior.

DetachmentSystem acts as AccelerationObserver {
  needs a parachute
  needs an altimeter // NEW!
  
  handle_acceleration_report(acceleration) {
    if (acceleration <= -50.ms2 and altimeter.altitude() < 5.m) {
      parachute.detach()
    }
  }
}
 

Integration problem solved with no integrated tests. Instead, I have a bunch of collaboration tests, one important contract test, and what I first lazily called “the ability to notice things”, but which I should call a systematic approach to choosing the next test, which I describe below. Any questions?

Dan Fabulich rightly jumped on me for using the phrase “an ability to notice things” just a little earlier in this article. I chose that phrase lazily because I didn’t want to patronize you by writing “an ability to perform basic reasoning”. Oops. I thought about how I choose the next test, and I decided to take the time to include that here. Enjoy.

In this example, I used no magic to choose the next test, but rather some fundamental reasoning.

Every time I say “I need a thing to do X”, I introduce an interface. In my current test, I end up stubbing or mocking one of that interface’s methods.

Every time I stub a method, I make an assumption about what values that method can return. To check that assumption, I have to write a test that expects the return value I’ve just stubbed. I use only basic logic there: if A depends on B returning x, then I have to know that B can return x, so I have to write a test for that.

Every time I mock a method, I make an assumption about a service the interface provides. To check that assumption, I have to write a test that tries to invoke that method with the parameters I just expected. Again, I use only basic logic there: if A causes B to invoke c(d, e, f) then I have to know that I’ve tested what happens when B invokes c(d, e, f), so I have to write a test for that.

Every time I introduce a method on an interface, I make a decision about its behavior, which forms the contract of that method. To justify that decision, I have to write tests that help me implement that behavior correctly whenever I implement that interface. I write contract tests for that. Once again, I use only basic logic there: if A claims to be able to do c(d, e, f) with outcomes x, y, and z, then when B implements A, it must be able to do c(d, e, f) with outcomes x, y, and z (and possibly other non-destructive outcomes).

I simply kept applying these points over and over again until I stopped needing tests. Along the way, I found a problem and fixed it before it left my hands.

If I can describe the steps well enough for others to follow (and I posit I’ve just done that here), then I don’t accept labeling it “magic”.

In Brief: Contract Tests


Contract Tests explain how a class should extend a superclass or implement an interface, so that I don’t have to read a bunch of prose to figure out how to do that. Typically, a contract test case class is abstract; I extend it and implement a creation method or two to return instances of my own implementation of the given interface. That gives me a standard battery of tests I can run to drive my implementation. It might not be perfect (I’ll have n failing tests to start), but I prefer it to documentation written in prose.
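A minimal sketch of that arrangement in plain Ruby, without a test framework: StackContract plays the abstract contract test case, and each implementation’s test class supplies the creation method. The Stack role and both implementations are hypothetical.

```ruby
# Abstract contract test for a hypothetical Stack role.
# Subclasses implement create_stack; every test_ method must return true.
class StackContract
  def create_stack
    raise NotImplementedError, "subclass responsibility"
  end

  def test_new_stack_is_empty
    create_stack.empty?
  end

  def test_pop_returns_last_pushed
    stack = create_stack
    stack.push(1)
    stack.push(2)
    stack.pop == 2
  end

  # Runs every test_ method and returns the names of those that failed.
  def failures
    methods.grep(/\Atest_/).sort.reject { |name| public_send(name) }
  end
end

class ArrayStack
  def initialize
    @items = []
  end

  def push(item)
    @items.push(item)
  end

  def pop
    @items.pop
  end

  def empty?
    @items.empty?
  end
end

class LinkedStack
  Node = Struct.new(:value, :below)

  def initialize
    @top = nil
  end

  def push(item)
    @top = Node.new(item, @top)
  end

  def pop
    node = @top
    @top = node.below
    node.value
  end

  def empty?
    @top.nil?
  end
end

# Each implementation's test extends the contract and supplies a creation method.
class ArrayStackTest < StackContract
  def create_stack
    ArrayStack.new
  end
end

class LinkedStackTest < StackContract
  def create_stack
    LinkedStack.new
  end
end

p ArrayStackTest.new.failures  # => []
p LinkedStackTest.new.failures # => []
```

An implementation that breaks the contract shows up as a non-empty list of failing test names, which then drives the work on that implementation.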

So if you’re delivering something you want me to extend and I need to follow more than three rules, please deliver me some contract tests.

JUnit: A Starter Guide


Not long after the Christmas break between 2001 and 2002 I wrote “JUnit: A Starter Guide” in response to a flurry of complaints over the lack of a decent JUnit tutorial. I didn’t know at the time that I would turn this simple tutorial, written somewhat in haste, into JUnit Recipes: Practical Methods for Programmer Testing, which I completed in 2004. It has surprised me to note that several hundred people continue to read the Starter Guide, so I thought I would include a link to it from here. In order not to deal with differences in HTML formatting, I have decided to link to the document as a PDF and keep it mostly for historical purposes. Much has changed since 2002, but I hope the Starter Guide remains a solid introduction to JUnit, test-driven development and software design.