Hit 'em High, Hit 'em Low:

Regression Testing and the Saff Squeeze

Kent Beck, Three Rivers Institute

Abstract: To effectively isolate a defect, start with a system-level test and progressively inline and prune until you have the smallest possible test that demonstrates the defect.

Introduction

In American football there is a play called "The Sandwich" in which two defenders hit the ball carrier at the same time, one high near his shoulders and the other low near his waist. Sandwiching is definitely not done out of concern for the ball carrier's health.

A recent bug in JUnit put me in mind of The Sandwich. We had just gotten JUnitMax (about which more later) to the dogfooding stage, but it ran some tests more than once. What was going on?

High...

The system-level test was straightforward to write:

    private MaxCore fMax;

    @Before public void createMax() {
        fMax= MaxCore.createFresh();
    }

    public static class TwoOldTests extends TestCase {
        public void testOne() {}
        public void testTwo() {}
    }
   
    @Test public void junit3TestsAreRunOnce() throws Exception {
        Result result= fMax.run(Request.aClass(TwoOldTests.class), new JUnitCore());
        assertEquals(2, result.getRunCount());       
    }

Running this test showed that four tests were being run, not two. (The first version of the test ran JUnit 4-style tests and it passed, leading to a minute or two of head scratching.) Here's where the Sandwich Play came in. The test above hits the defect high, from the point of view of a user.

And Low...

Just because I have a failing test, though, doesn't mean I know how to fix the defect. If I can write the narrowest possible test that still fails, I will have isolated the code that needs to change. Finding the problem logic will help me prepare to fix it. Finally, the resulting test will help ensure that the defect is fixed and stays fixed.

The usual way I would isolate a problem like this is by single-stepping in a debugger. By watching data values as I step through the code, I have a chance to quickly catch one that looks wrong. In fact, my first approach to this defect was to single-step. It took me quite a while to find the offending method, so I decided to try a new technique, introduced to me by David Saff, to isolate the problem method again.

The Saff Squeeze, as I call it, works by taking a failing test and progressively inlining parts of it until you can't inline further without losing sight of the defect. Here's the cycle:
  1. Inline a non-working method in the test.
  2. Place a (failing) assertion earlier in the test than the existing assertions.
  3. Prune away parts of the test that are no longer relevant.
  4. Repeat.
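
To make the cycle concrete, here is a minimal, dependency-free sketch of one squeeze cycle on a made-up defect. Everything in it is hypothetical: the Invoice class, its methods, and the tiny check helper, which stands in for JUnit's assertEquals so the example runs without JUnit on the classpath. highTest() hits the defect high. squeezedTest() is what is left after inlining total(), placing a failing assertion right after the call to sum(), and pruning the now-irrelevant applyDiscount() call.

```java
import java.util.List;

public class SqueezeSketch {
    // Hypothetical production code with a seeded defect.
    static class Invoice {
        private final List<Integer> cents;

        Invoice(List<Integer> cents) {
            this.cents = cents;
        }

        // High-level entry point a user would call.
        int total() {
            return applyDiscount(sum());
        }

        int sum() {
            int s = 0;
            // Defect: the loop starts at index 1, so the first item is skipped.
            for (int i = 1; i < cents.size(); i++) {
                s += cents.get(i);
            }
            return s;
        }

        int applyDiscount(int amount) {
            return amount; // no discount in this toy example
        }
    }

    // Stand-in for assertEquals: prints the outcome, returns true on a match.
    static boolean check(int expected, int actual) {
        boolean ok = expected == actual;
        System.out.println((ok ? "pass" : "FAIL") + ": expected " + expected + ", got " + actual);
        return ok;
    }

    // System-level test: hits the defect high.
    static boolean highTest() {
        Invoice invoice = new Invoice(List.of(100, 200));
        return check(300, invoice.total());
    }

    // After one cycle: total() inlined, an earlier (failing) assertion placed
    // right after sum(), and the applyDiscount() call pruned away.
    static boolean squeezedTest() {
        Invoice invoice = new Invoice(List.of(100, 200));
        int s = invoice.sum();
        return check(300, s);
    }

    public static void main(String[] args) {
        highTest();     // FAIL: expected 300, got 200
        squeezedTest(); // still fails, but now points straight at sum()
    }
}
```

The next cycle would inline sum() itself, leaving the off-by-one loop as the only code in the test.
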
Here's how the Squeeze worked on the JUnitMax defect. First I duplicated the system-level test. (I want the original left around to communicate the user's experience of the defect.) Next I inlined the call to fMax.run().

    @Test public void saffSqueezeExample() throws Exception {
        Request request= Request.aClass(TwoOldTests.class);
        JUnitCore core= new JUnitCore();
        // fMax.run(request, core); -- inlined
        core.addListener(fMax.new RememberingListener());
        Result result;
        try {
            result= core.run(fMax.sortRequest(request).getRunner()); // We can assert right here
        } finally {
            try {
                fMax.save();
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        assertEquals(2, result.getRunCount());        
    }

This made a big mess, but only temporarily. I noticed that I could move the assertion to immediately after the call to core.run(). Once I did that, all the code to save fMax was irrelevant, as was the listener. After pruning, here is the test that was left:

    @Test public void saffSqueezeExample() throws Exception {
        Request request= Request.aClass(TwoOldTests.class);
        JUnitCore core= new JUnitCore();
        Result result= core.run(fMax.sortRequest(request).getRunner());
        assertEquals(2, result.getRunCount());       
    }

Now the test is one step closer to isolating the problematic logic. Next I inlined the call to core.run(). And moved the assertion. And pruned. And inlined... Eventually (after ~10 cycles) I had isolated the method that was causing the problem:

    @Test public void saffSqueezeExample() throws Exception {
        final Description method= Description.createTestDescription(TwoOldTests.class, "testOne");
        Filter filter= Filter.matchDescription(method);
        JUnit38ClassRunner child= new JUnit38ClassRunner(TwoOldTests.class);
        child.filter(filter);
        assertEquals(1, child.testCount());
    }

JUnit38ClassRunner can't filter its tests if they are plain JUnit 3.8 tests. That's the problem. Now, if this were a fairy tale, I'd be able to tell you the simple fix that made everything work. Instead, I'm still struggling with how in the world to fix that method. The Saff Squeeze worked well enough, though, that I wanted to get it written up right away.
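
The article doesn't spell out how a runner that can't filter turns two tests into four. One plausible mechanism, offered purely as an assumption about JUnitMax's internals, is that the sorted request builds one child runner per test method and relies on filtering to narrow each child down to its own method. If the child ignores the filter, every child runs every method: two children times two methods gives four runs. The sketch below models that with hypothetical classes (ToyClassRunner, runCount), not JUnit's real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class FilterSketch {
    // Hypothetical stand-in for a class runner holding a list of test names.
    static class ToyClassRunner {
        private final List<String> tests;
        private final boolean honorsFilter;

        ToyClassRunner(List<String> tests, boolean honorsFilter) {
            this.tests = new ArrayList<>(tests);
            this.honorsFilter = honorsFilter;
        }

        void filter(Predicate<String> keep) {
            if (honorsFilter) {
                tests.removeIf(keep.negate());
            }
            // A runner with the defect silently ignores the filter.
        }

        int testCount() {
            return tests.size();
        }
    }

    // Models a sorted run: one child per test method, each filtered down to
    // that single method, with the children's test counts added up.
    static int runCount(List<String> methods, boolean honorsFilter) {
        int total = 0;
        for (String method : methods) {
            ToyClassRunner child = new ToyClassRunner(methods, honorsFilter);
            child.filter(method::equals);
            total += child.testCount();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> methods = List.of("testOne", "testTwo");
        System.out.println(runCount(methods, true));  // 2
        System.out.println(runCount(methods, false)); // 4

        // The squeezed test's view: filter a single child to "testOne".
        ToyClassRunner child = new ToyClassRunner(methods, false);
        child.filter("testOne"::equals);
        System.out.println(child.testCount()); // 2, where the test expects 1
    }
}
```

Under this assumption, the high test (four runs instead of two) and the low test (two tests instead of one) are two views of the same missing filter.
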

Conclusion

Isolating the defect with the debugger took me about twenty minutes (the code is twistier than I thought and I was having a Bad Brain Day). The Saff Squeeze took me about an hour, but much of that time was spent manually inlining code that Eclipse should have been able to handle automatically but couldn't (example in the appendix). One key difference between the two processes was that after debugging I knew where the defect was, but after squeezing I had a minimal unit test for the defect as well. That concise test is a handy by-product of the process.

I learned a few things during this exercise (the first time I've used the technique without pairing with Saff). Here are some points I learned to pay attention to:
The Saff Squeeze seems particularly suited to regression testing. At least I can't yet see how to fit it into a development cycle for new code. It would work as the heart of a disciplined approach to identifying and fixing defects:
  1. Reproduce the defect with a system-level test.
  2. Squeeze.
  3. Make both tests work.
  4. Analyze and eliminate the root cause of the defect.
For now, though, I'm stuck on step 3. At least I was able to hit 'em high and hit 'em low.

Appendix: Method Eclipse Should Be Able To Inline

In case you're curious, Eclipse 3.5M2 can't inline a method like this:

    public void caller() {
        boolean thrown= foo.callee();
        // ...
    }

    // In class Foo (not private, so that caller() can reach it):
    boolean callee() {
        try {
            return true;
        } catch (Exception e) {
            return false;
        }
    }

You can inline this safely by hand by replacing the return statements with assignments:

    public void caller() {
        boolean thrown;
        try {
            thrown= true;
        } catch (Exception e) {
            thrown= false;
        }
        // ...
    }
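
Here is the same hand inlining as a self-contained program. Two assumptions keep it meaningful: the empty try body from the appendix gets a hypothetical risky() call so that both branches can actually run, and the local variable is called result. Replacing each return with an assignment is safe here because every return is the last statement on its path, so falling out of the try/catch lands exactly where caller() would have resumed after the call.

```java
public class InlineByHand {
    // Hypothetical stand-in for whatever the real try block did.
    static void risky(boolean fail) {
        if (fail) {
            throw new IllegalStateException("simulated failure");
        }
    }

    // Before: a return in each branch of a try/catch, the shape
    // the appendix says Eclipse 3.5M2 could not inline.
    static boolean callee(boolean fail) {
        try {
            risky(fail);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Original caller, delegating to callee().
    static boolean caller(boolean fail) {
        boolean result = callee(fail);
        return result;
    }

    // After inlining by hand: each return became an assignment, and
    // control falls through to whatever followed the original call.
    static boolean callerInlined(boolean fail) {
        boolean result;
        try {
            risky(fail);
            result = true;
        } catch (Exception e) {
            result = false;
        }
        return result;
    }

    public static void main(String[] args) {
        for (boolean fail : new boolean[] {false, true}) {
            System.out.println(fail + " -> " + caller(fail) + ", " + callerInlined(fail));
        }
    }
}
```

For both inputs the original and the inlined caller agree, which is the property worth confirming before trusting a manual inline in the middle of a squeeze.
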

Where automated inlining worked, it sped up the squeezing considerably. David Saff points out that you can at least automatically move the code into your test class by extracting the method call into its own private method in the test class and inlining that. For the example above, you would first extract the call to callee():

    public void caller() {
        boolean thrown= callee2();
        // ...
    }

    private boolean callee2() {
        return foo.callee();
    }

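Because the call to foo.callee() is now the expression of a return statement, each of callee()'s returns can become a return from callee2(), and the call can be inlined automatically: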

    private boolean callee2() {
        try {
            return true;
        } catch (Exception e) {
            return false;
        }
    }

Now, because you are dealing with particular data values, you can likely eliminate one side or the other of the try/catch block. This lets you automatically inline callee2() and continue squeezing.
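
Here is a sketch of that last step, assuming the squeezed test only ever exercises callee2() with data for which the try body cannot throw (again modeled with a hypothetical risky() call). The catch branch is then dead for this test, so callee2() can be pruned to a single return, a shape that automated inlining handles. The pruned helper is only equivalent for the test's own data, which main() checks.

```java
public class PruneTryCatch {
    // Hypothetical stand-in for the try body; throws only when fail is true.
    static void risky(boolean fail) {
        if (fail) {
            throw new IllegalStateException("simulated failure");
        }
    }

    // callee2() after foo.callee() has been inlined into it.
    static boolean callee2(boolean fail) {
        try {
            risky(fail);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // The same helper pruned for a test that always passes fail == false:
    // the catch branch can never run, so a single return remains.
    static boolean callee2Pruned() {
        risky(false);
        return true;
    }

    public static void main(String[] args) {
        // Equivalent for the test's data...
        System.out.println(callee2(false) == callee2Pruned()); // true
        // ...but not in general, so the pruning is specific to this test.
        System.out.println(callee2(true)); // false
    }
}
```
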