TestGuide

Why Write Automated Tests?

Often when a developer is testing code, the goal in mind is to verify that a recently written piece of code works correctly. But good automated tests accomplish much more. Good automated tests primarily serve other people--as living documentation and as a continuous troubleshooting aid as the system evolves and grows. The goal of making tests easy for others to run and understand is just as important as the goal of testing the production code.

Types of Tests

There are many different types of tests. You will very often hear developers use terminology like unit test, integration test, system test, black box test, functional test, characterization test, and more. Often, the same term means something different when used by different people. To avoid semantic arguments, this guide only distinguishes three types of tests: small, medium, and large. The hope is that these terms are general enough to mean something similar to everyone. This guide focuses mainly on small tests.

All Tests

Every test should be stable, independent, and readable.

Stable

Stable tests need to be changed less frequently than unstable tests. Change isn't always avoidable, but stability can be maximized by writing tests that only depend on the public interfaces of the code under test. In this way only interface changes require tests to be modified. For small tests this requirement means only calling public functions and methods. In large tests, this may mean different things depending on the context. In a web application, for example, stability might require only depending on the public HTTP API.
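
As a minimal sketch in Python's unittest of a test that depends only on a public interface (the Counter class and its attributes are hypothetical, used only for illustration):

<pre>
import unittest


class Counter(object):
    """Tiny example class; only increment() and value are public."""

    def __init__(self):
        self._count = 0  # private detail, free to change

    def increment(self):
        self._count += 1

    @property
    def value(self):
        return self._count


class TestCounter(unittest.TestCase):
    def test_increment_raises_value_by_one(self):
        counter = Counter()
        counter.increment()
        # Assert through the public property, not the private _count
        # attribute, so renaming that internal field does not break
        # this test.
        self.assertEqual(1, counter.value)


if __name__ == '__main__':
    unittest.main()
</pre>

Because the test never reads the private attribute, internal refactoring does not force the test to change.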

Independent

A test should stand on its own two feet and be self-contained. No test should depend on another test to run.
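
As a sketch of what independence looks like in practice, each test below builds its own fixture in setUp rather than relying on state left behind by another test or on execution order (the TaskList class is a hypothetical stand-in):

<pre>
import unittest


class TaskList(object):
    """Hypothetical class used only to illustrate independent tests."""

    def __init__(self):
        self.tasks = []

    def add(self, name):
        self.tasks.append(name)


class TestTaskList(unittest.TestCase):
    def setUp(self):
        # Each test gets a fresh fixture; no test depends on what a
        # previous test may have added.
        self.task_list = TaskList()

    def test_new_list_is_empty(self):
        self.assertEqual([], self.task_list.tasks)

    def test_add_stores_task(self):
        self.task_list.add('backup')
        self.assertEqual(['backup'], self.task_list.tasks)
</pre>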

Readable

Readability is as important in test code as it is in production code. A test is most likely the first thing a developer will read when they make a change that causes the test to fail. The test itself is then the first and best opportunity to correct a misunderstanding about how the production code works. This observation argues not just for cleanliness of the test code but also for maximum clarity and significance in the test name and its assertion messages.
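
For instance, a sketch of how a descriptive test name and assertion message pay off when a test fails (the parse_port helper is invented for illustration):

<pre>
import unittest


def parse_port(value):
    """Hypothetical helper: convert a string to a valid port number."""
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError('port out of range: %d' % port)
    return port


class TestParsePort(unittest.TestCase):
    # A vague name like test_1 tells the reader nothing when it fails;
    # these names describe the expected behaviour directly.
    def test_parse_port_rejects_values_above_65535(self):
        self.assertRaises(ValueError, parse_port, '70000')

    def test_parse_port_converts_numeric_string(self):
        self.assertEqual(
            8080, parse_port('8080'),
            'parse_port should return the integer form of a valid '
            'numeric string')
</pre>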

Small Tests

AKA: Unit tests

Most developers in their day-to-day work will be reading, writing, and running small tests. The small test workflow has a critical impact on a developer's productivity. In order to have the best impact possible, all small tests should be extremely fast, maximally isolated, and portable. The goal is that any developer can run tests after each small change and that any failing test will be as useful as possible in helping to diagnose the problem.

Example: Unit tests that exercise the application logic of each class fall into this category.

Speed

Small tests should not take longer than 1/10th of a second to run--and on average they should run much faster than that [Feathers 2005]. The idea is that a developer can get feedback on the effects of a code change within a few seconds. How do we make sure small tests run this fast? There are a few rules to follow [Feathers 2005]:

  1. Small tests do not talk to a database.
  2. Small tests do not touch the file system.
  3. Small tests do not communicate across a network.

It's not that these are bad things to do in tests. It's just that tests that cover these areas take too long to provide feedback. Hence, they are the province of medium and large tests and are run as part of a larger development cycle.
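
One common way to keep a small test inside these rules is to hand the code under test an in-memory fake instead of its real, slow dependency. A sketch, with made-up QuotaChecker and FakeQuotaClient names standing in for code that would normally call a remote service:

<pre>
import unittest


class QuotaChecker(object):
    """Hypothetical class that normally asks a remote service for quotas."""

    def __init__(self, client):
        self.client = client

    def can_create_instance(self, used):
        return used < self.client.get_quota('instances')


class FakeQuotaClient(object):
    """In-memory stand-in for the real network client."""

    def __init__(self, quotas):
        self.quotas = quotas

    def get_quota(self, resource):
        return self.quotas[resource]


class TestQuotaChecker(unittest.TestCase):
    def test_can_create_instance_below_quota(self):
        # No sockets, files, or databases are touched, so this test
        # finishes in well under a tenth of a second.
        checker = QuotaChecker(FakeQuotaClient({'instances': 10}))
        self.assertTrue(checker.can_create_instance(used=3))

    def test_cannot_create_instance_at_quota(self):
        checker = QuotaChecker(FakeQuotaClient({'instances': 10}))
        self.assertFalse(checker.can_create_instance(used=10))
</pre>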

Isolation

Small tests should test at the finest granularity of all sizes of tests. This ensures that it is very easy to find the exact source of a problem when tests fail--perhaps just from the name of the test itself. Isolation can be achieved by having only one concept per test. Often, this means only one function in only one class should be tested at a time. In practice, if a small test restricts itself to one feature of a few collaborating classes from the same package, it may be sufficiently isolated. Again, sufficient isolation means that it is very easy for a developer to find the source of the problem when a test fails.
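
A sketch of "one concept per test", using a hypothetical normalize_name helper; if either test fails, its name alone says which behaviour broke:

<pre>
import unittest


def normalize_name(name):
    """Hypothetical helper: trim whitespace and lowercase a name."""
    return name.strip().lower()


class TestNormalizeName(unittest.TestCase):
    # Instead of one test asserting everything at once, each test
    # covers a single behaviour.
    def test_normalize_name_strips_surrounding_whitespace(self):
        self.assertEqual('web01', normalize_name('  web01  '))

    def test_normalize_name_lowercases_letters(self):
        self.assertEqual('web01', normalize_name('WEB01'))
</pre>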

Portability

Small tests should be runnable by any developer without requiring more hardware resources than ordinary development does.

Medium Tests

AKA: Functional Tests

Medium-sized tests are less isolated and slower to run than small tests, but just as portable. The medium testing level is where we can bring in real dependencies like database, file system, or hypervisor interaction. It is also acceptable to employ fake dependencies to isolate a subsystem for testing. Medium tests should not require a full deployment of the system under test. So, for example, medium tests might require a local SQLite database, but must not require a MySQL server installation. In this way, any developer can run medium tests locally, but not as frequently as small tests because they are not expected to run as fast.
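
As an illustrative sketch of that idea, a medium test can run real SQL against a local in-memory SQLite database using only the Python standard library (the schema and queries here are invented for this example):

<pre>
import sqlite3
import unittest


class TestInstanceTable(unittest.TestCase):
    def setUp(self):
        # A real database, but a local in-memory one: no MySQL server
        # or other deployment is needed to run this test.
        self.conn = sqlite3.connect(':memory:')
        self.conn.execute(
            'CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT)')

    def tearDown(self):
        self.conn.close()

    def test_inserted_instance_can_be_read_back(self):
        self.conn.execute(
            "INSERT INTO instances (name) VALUES ('test-vm')")
        row = self.conn.execute('SELECT name FROM instances').fetchone()
        self.assertEqual('test-vm', row[0])
</pre>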

There are a lot of different approaches that fit under the category of medium testing. But too much testing at this level risks making the system hard to change and duplicating effort. Therefore, a project-specific strategy is required to make medium tests most effective.

Example: Functional tests in which the app is broken down into functional components and each component is tested for its behavior and interactions are good candidates for medium tests.

Large Tests

AKA: Integration Tests

Large tests focus on end-to-end functionality of the entire system in a realistic deployment. In general, no test fakes are allowed in large tests. Large tests are the costliest tests to run, both in terms of time and hardware resources. As such, it is not expected that every developer can run large tests directly. However, large tests should be run against trunk as frequently as possible in an automated fashion, and, where possible, developers should be given appropriate tools to trigger large test runs on development branches. Large tests are highly dependent upon the installation against which they run. Large tests are expected to be capable of testing multiple systems functioning together (e.g. Dashboard, Keystone, Glance, and Nova) under a predefined configuration.

Unlike small and medium tests, it is not required that large tests live in the same source repository as the system under test. However, all developers should have access to the large tests so that they can be added and modified along with changes to the system. Indeed all developers share the responsibility of keeping large tests up-to-date.

Example: Scenario tests and complete-system tests that exercise user interaction with the system under test are examples of large tests.

References

Feathers, Michael C. (2005). Working Effectively with Legacy Code. Prentice Hall. ISBN 0-13-117705-2.