SmallTestingGuide

What are Small Tests?

Small Tests are the tests that most developers read, write, and run with the greatest frequency. Small Tests are bundled with the source code, can be executed in any environment, and run extremely fast. Small Tests cover the codebase at as fine a granularity as possible, making it easy to locate the problem when a test fails.

A Basic Example

Consider the following example, adapted from Martin Fowler's article Mocks Aren't Stubs.


import unittest

# Warehouse and Order are the production classes under test;
# their definitions are not shown in this guide.
TALISKER = "Talisker"
HIGHLAND_PARK = "Highland Park"

class OrderTests(unittest.TestCase):

    def setUp(self):
        self.warehouse = Warehouse()
        self.warehouse.add(TALISKER, 50) 
        self.warehouse.add(HIGHLAND_PARK, 25) 

    def test_order_is_filled_if_enough_in_warehouse(self):
        order = Order(TALISKER, 50) 
        order.fill(self.warehouse)
        self.assertTrue(order.is_filled())
        self.assertEqual(self.warehouse.get_inventory(TALISKER), 0)

    def test_order_does_not_remove_if_not_enough(self):
        order = Order(TALISKER, 51) 
        order.fill(self.warehouse)
        self.assertFalse(order.is_filled())
        self.assertEqual(self.warehouse.get_inventory(TALISKER), 50)


This example shows the basic structure of a small test in Python, and it should look familiar to anyone who has experience with xUnit-style tests in any language. The test case [[OrderTests]] inherits from unittest.TestCase so that it fits into the standard testing frameworks and runners, such as nose. The setUp method prepares the initial state common to all of the tests in this test case. Each test method's name begins with "test" so that unittest knows which methods to run. Assertions are made using self.assert*()--more thorough documentation of the available assertion methods can be found at python.org. Not shown above is the optional tearDown method, which takes care of any cleanup actions that must take place after each test method; a brief sketch follows.
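For illustration, here is a minimal, self-contained sketch (not part of the warehouse example) of how tearDown pairs with setUp to clean up after every test method:


import os
import tempfile
import unittest

class TearDownExample(unittest.TestCase):

    def setUp(self):
        # Create a temporary file for each test to use.
        handle, self.path = tempfile.mkstemp()
        os.close(handle)

    def tearDown(self):
        # Runs after every test method, whether it passed or failed.
        os.remove(self.path)

    def test_temporary_file_exists(self):
        self.assertTrue(os.path.exists(self.path))
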

Isolation

The basic example above may have a small problem. The system under test (SUT), Order, is tested alongside its collaborator, Warehouse. This means that if there is a problem with the code in Warehouse, it is likely that some of the [[OrderTests]] will fail. This may create a confusing situation when another developer encounters this failure and has to figure out where the problem is.

First, we should acknowledge that this may not be a real problem. This particular example is very simple and easy to understand. As such, it is probably sufficient to add truly isolated tests for Warehouse and not worry about the lack of isolation in the [[OrderTests]].


class WarehouseTests(unittest.TestCase):

    def setUp(self):
        self.warehouse = Warehouse()
        self.warehouse.add('Glenlivet', 10)

    def test_warehouse_shows_new_inventory(self):
        self.assertEqual(self.warehouse.get_inventory('Glenlivet'), 10)

    def test_warehouse_shows_added_inventory(self):
        self.warehouse.add('Glenlivet', 15)
        self.assertEqual(self.warehouse.get_inventory('Glenlivet'), 25)

    def test_warehouse_shows_removed_inventory(self):
        self.warehouse.remove('Glenlivet', 10)
        self.assertEqual(self.warehouse.get_inventory('Glenlivet'), 0)


With this added coverage, a developer will at least be able to infer a problem with Warehouse if both [[WarehouseTests]] and [[OrderTests]] are failing. This is bending the rule of maximum isolation for small tests, but it is a practical approach.

Test Doubles

In our example, a Warehouse is very easy to create, doesn't introduce any other dependencies, and presumably runs very fast. However, if this were not the case, it would be imperative to test Order in isolation. But how could we accomplish this isolation?

The principal approach to isolating the SUT from its dependencies is to introduce a Test Double (as in a stunt double) that fills in for each dependency for the purposes of that test. There are a variety of more specific types of Test Doubles--and an even wider range of vocabulary used to describe those types. For the purposes of internal consistency, this document will use the vocabulary defined by Gerard Meszaros in his book xUnit Test Patterns; more detail about this terminology can be found at xunitpatterns.com. Not all of the types Meszaros describes are relevant here, so we won't cover them all, but it is useful to be familiar with all of them.

Test Stub

A Test Stub is a Test Double that provides a canned response to method calls, irrespective of the inputs provided to the method call. It helps when you are trying to set up a particular situation in which to exercise the SUT. After the exercise, the state of the SUT is verified in the normal way with assertions.


class OrderTestsWithStub(unittest.TestCase):

    def test_order_is_filled_if_enough_in_warehouse(self):

        class StubWarehouse(object):
            # Canned responses only: every item appears to have 50
            # units on hand, and removals are silently ignored.

            def get_inventory(self, item):
                return 50

            def remove(self, item, qty):
                pass

        warehouse = StubWarehouse()
        order = Order(TALISKER, 50)
        order.fill(warehouse)
        self.assertTrue(order.is_filled())


In this example, the [[StubWarehouse]] pays no attention to the item in question--it always returns 50 for how much is available. In addition, only the Warehouse methods that are required for this test to run are defined. Because of the limited applicability of this stub, it is defined directly in the test. A more configurable stub might make more sense defined at module scope, where other tests could reuse it; one possible sketch follows.
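For instance, a configurable variant (our own illustration, not part of the original example) could be shared by any test that needs a warehouse stub with a particular inventory level:


class ConfigurableStubWarehouse(object):
    # Stub: returns whatever inventory level it was configured with,
    # regardless of the item asked about, and ignores removals.

    def __init__(self, inventory):
        self._inventory = inventory

    def get_inventory(self, item):
        return self._inventory

    def remove(self, item, qty):
        pass
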

Mock Object

Mock Objects take a different approach from classical unit testing in order to verify the correctness of code. Where the classical approach verifies the end state after the SUT is exercised, Mock Objects verify the behavior of the SUT during the exercise.

Mocks can be coded directly, but most often they are created with the help of a separate library. One such library for Python is Mox. With Mox, when you first create a mock object, it is in "record" mode. The test then goes through the motions with the mock object, teaching it what to expect from the SUT and what to return. When this is complete, the mock object is placed in "replay" mode and the SUT is exercised. During the verification phase, the mock object confirms that its methods were called in order with the appropriate arguments.


# The comments in this example are included only for the sake of
# showing the subtleties of mox--they should not be included in a
# real test.
import mox

class OrderTestsWithMox(unittest.TestCase):

    def test_order_is_filled_if_enough_in_warehouse(self):
        # Create the Order as usual
        order = Order(TALISKER, 50) 

        # Create the mock warehouse object in record mode
        mocker = mox.Mox()
        warehouse = mocker.CreateMockAnything()

        # Record the sequence of actions expected from the Order object
        warehouse.get_inventory(TALISKER).AndReturn(50)
        warehouse.remove(TALISKER, 50) 

        # Put all mock objects in replay mode
        mocker.ReplayAll()

        # Exercise the Order object
        order.fill(warehouse)

        # Verify that the order is filled and that the warehouse saw
        # the correct behavior
        self.assertTrue(order.is_filled())
        mocker.VerifyAll()


Behavior verification, however, is risky business. Often we are not concerned with the particular behavior of the SUT; we just want to make sure that it correctly implements its interface.

For example, suppose we changed the implementation of Order to the following.


class Order(object):
    # ...
    def fill(self, warehouse):
        try:
            warehouse.remove(self._item, self._quantity)
            self._filled = True
        except Exception:
            # Not enough inventory; leave the order unfilled.
            pass
    # ...


This is a perfectly valid approach, yet it would break the mock object test above: the mock expects a call to get_inventory() that never happens, so the test fails even though Order still behaves correctly through its interface. This might create confusion, and it would certainly require modifying the test. Because of this tendency to overspecify the requirements of the software, it is recommended that developers avoid mock objects and behavior verification unless it is truly necessary.

A good example of a case where behavior verification is preferred is testing a cache. A cache is specifically intended to show no observable difference in interface behavior from its underlying backend, so the only way to check that it actually avoids redundant calls to that backend is to observe its behavior. In this case, behavior verification is the simplest and best way to verify the intended functionality.
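For instance, given a minimal (hypothetical) Cache class that memoizes lookups against a backend, a mox test can verify that the backend is consulted exactly once:


class Cache(object):
    # Hypothetical SUT: memoizes lookups against a backend.

    def __init__(self, backend):
        self._backend = backend
        self._results = {}

    def get(self, key):
        if key not in self._results:
            self._results[key] = self._backend.get(key)
        return self._results[key]


class CacheBehaviorTests(unittest.TestCase):

    def test_cache_hits_backend_only_once(self):
        mocker = mox.Mox()
        backend = mocker.CreateMockAnything()

        # Record exactly one backend lookup; an unexpected second call
        # in replay mode raises an error, and a missing call makes
        # VerifyAll() fail.
        backend.get('answer').AndReturn(42)
        mocker.ReplayAll()

        cache = Cache(backend)
        self.assertEqual(cache.get('answer'), 42)
        self.assertEqual(cache.get('answer'), 42)  # served from the cache
        mocker.VerifyAll()
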

Fake Object

A Fake Object provides a working implementation of the object it is standing in for, but usually with simplifications that make it suitable for testing and unsuitable for production. It is often desirable to create Fake Objects to stand in for databases that would normally cause a test to run too slowly to qualify as a small test.


class FakePersonGateway(object):
    # In-memory stand-in for the real, database-backed PersonGateway.

    def __init__(self):
        self._person_data = {}

    def insert(self, person):
        person.id = len(self._person_data)
        self._person_data[person.id] = person

    def find_by_name(self, name):
        for person in self._person_data.values():
            if person.name == name:
                return person

    def find_by_parent(self, parent_id):
        people = []
        for person in self._person_data.values():
            if person.mother_id == parent_id or person.father_id == parent_id:
                people.append(person)
        return people


class FamilyTreeTests(unittest.TestCase):

    def setUp(self):
        self.gateway = FakePersonGateway()
        bob = Person('Bob Smith')
        self.gateway.insert(bob)
        alice = Person('Alice Smith')
        self.gateway.insert(alice)
        james = Person('James Smith', father_id=bob.id, mother_id=alice.id)
        self.gateway.insert(james)

    def test_child_descends_from_mother(self):
        tree = FamilyTree(self.gateway)
        self.assertTrue(tree.descends_from('Alice Smith', 'James Smith'))

    def test_father_does_not_descend_from_mother(self):
        tree = FamilyTree(self.gateway)
        self.assertFalse(tree.descends_from('Alice Smith', 'Bob Smith'))


In this example, the SUT is [[FamilyTree]], which depends on a [[PersonGateway]] to look up a given person's children. Because small tests are not allowed to talk to a database (to prevent slow tests), we substitute in a FakePersonGateway which operates out of a local in-memory dictionary.

Using Test Doubles

How do we persuade the SUT to interact with the Test Double instead of the real dependency? In some situations the solution is obvious, as in the Warehouse and Order examples above. In more difficult cases, there are two general approaches: Dependency Injection and Monkey Patching.

Dependency Injection

When using Dependency Injection, you write (or refactor) the SUT in such a way that it depends on abstractions or interfaces rather than on concrete instances. This is probably best shown through additional examples. Suppose when we first went to test the [[FamilyTree]] we found its constructor to be the following.


class FamilyTree(object):

    def __init__(self):
        self._person_gateway = mylibrary.dataaccess.PersonGateway()


This implementation of [[FamilyTree]] depends directly on the concrete class mylibrary.dataaccess.[[PersonGateway]]. This dependency makes [[FamilyTree]] hard to test because creating a [[PersonGateway]] probably requires certain config files and some sort of database to be present. Even if we do the extra work to test [[FamilyTree]] in this way, it would probably run a lot slower than our other tests because it must interact with a database.

Applying Dependency Injection, we might refactor [[FamilyTree]].


class FamilyTree(object):

    def __init__(self, person_gateway):
        self._person_gateway = person_gateway


Now, [[FamilyTree]] depends on the person_gateway that is passed in to its initializer. This variable is an abstraction in the sense that it could be anything the client code wants it to be, so long as it implements the methods [[FamilyTree]] needs to use. When the client code is the small test suite, we are free to create the [[FakePersonGateway]] as above and simply inject it into the [[FamilyTree]].


        person_gateway = FakePersonGateway()
        # ...
        tree = FamilyTree(person_gateway)


Giving the client of [[FamilyTree]] more power in this way also requires that it take on more responsibility--it now has to know which [[PersonGateway]] to use when it creates a [[FamilyTree]].
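A common way to soften this drawback is to give the initializer a default, so that production clients keep working while tests inject a double. This variant is our own sketch, not part of the original example.


class FamilyTree(object):

    def __init__(self, person_gateway=None):
        # Tests pass in a Test Double; production code can still call
        # FamilyTree() and get the real gateway by default.
        if person_gateway is None:
            person_gateway = mylibrary.dataaccess.PersonGateway()
        self._person_gateway = person_gateway
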

Refactoring a system to use Dependency Injection for testing purposes has another drawback. Since we are adding tests to increase the coverage of the SUT, presumably we do not already have small tests that would ensure that the refactoring does not break anything. Any refactoring attempted without prior test coverage incurs a bigger risk of causing bugs. In this case, it may be better to ensure the SUT is thoroughly covered by Medium and Large tests before refactoring to improve Small test coverage.

Monkey Patching

Monkey Patching takes advantage of the dynamic nature of Python to alter the global namespace. When testing, monkey patching can be used to replace a hard-coded dependency with a Test Double. For instance, we could have left the [[FamilyTree]] initializer looking like the following.


class FamilyTree(object):

    def __init__(self):
        self._person_gateway = mylibrary.dataaccess.PersonGateway()


Then, in our tests, we would simply overwrite mylibrary.dataaccess.[[PersonGateway]].


        mylibrary.dataaccess.PersonGateway = FakePersonGateway
        # ...
        tree = FamilyTree()


There is a third-party library, stubout, which is distributed with mox. Stubout is helpful when you need to reverse the patches you have set; these reversals are done in the tearDown method. It also provides convenience methods for monkey patching that preserve inheritance hierarchies, which is necessary if the SUT checks object types.
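A sketch of how such a test case might look follows, assuming the [[FakePersonGateway]] from earlier and stubout's StubOutForTesting class with its Set and UnsetAll methods:


import stubout

class FamilyTreeMonkeyPatchTests(unittest.TestCase):

    def setUp(self):
        self.stubs = stubout.StubOutForTesting()
        # Replace the real gateway class for the duration of each test.
        self.stubs.Set(mylibrary.dataaccess, 'PersonGateway',
                       FakePersonGateway)

    def tearDown(self):
        # Restore the original PersonGateway so other tests see the
        # unpatched module.
        self.stubs.UnsetAll()

    def test_tree_uses_patched_gateway(self):
        tree = FamilyTree()
        # Demonstrates only that the patch took effect.
        self.assertTrue(isinstance(tree._person_gateway, FakePersonGateway))
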

When adding tests to a legacy system, monkey patching offers the very attractive benefit of not requiring any refactoring to inject the Test Double. This approach is also especially handy when the SUT depends on procedural code, which is often awkward to inject.

However, tests that rely on monkey patching are more fragile because they tend to depend on the implementation details of the SUT. Relying on monkey patches also encourages bad design because it reinforces the idea that the SUT is free to depend on anything and everything that is available through the global namespace. Encouraging dependencies to range so far and wide tightens the coupling of the system, making it harder to change or reuse.