Proponents of test-driven development often push code coverage as a useful metric for gauging how well tested an application is. 100% code coverage has long been the ultimate goal of testing fanatics. But is code coverage really all that useful? If I told you that my application has 100% code coverage, should that mean anything to you?
What does code coverage tell us?
Code coverage tells us which lines in our application are executed by our unit tests. For example, the code below has 50% code coverage if the unit tests only call Foo with condition = true:
string Foo(bool condition)
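The body of Foo is not shown above, so here is a minimal sketch of the idea in Python rather than the original language. It assumes Foo is a simple two-branch function, and the lines_run helper is a hypothetical stand-in for a real coverage tool, built on the standard sys.settrace hook. Real tools count lines and branches in their own ways, so treat the percentage as illustrative only.

```python
import sys


def foo(condition):
    if condition:
        result = "condition was true"
    else:
        result = "condition was false"
    return result


def lines_run(func, *args):
    """Run func(*args) and return the source line numbers executed inside it."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None          # ignore every other function
        if event == "line":
            seen.add(frame.f_lineno)
        return tracer            # keep tracing this frame line by line

    previous = sys.gettrace()    # do not clobber a debugger or coverage tool
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return seen


first = foo.__code__.co_firstlineno           # line of "def foo"
true_line, false_line = first + 2, first + 4  # the two branch bodies
branch_lines = {true_line, false_line}

ran = lines_run(foo, True)                    # the only "unit test" we run
covered = len(branch_lines & ran) / len(branch_lines)
print(f"branch bodies covered: {covered:.0%}")  # 50%
```

Calling lines_run(foo, False) as well would bring the second branch body into the executed set, which is all that "raising coverage" means here.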
What does code coverage not tell us?
Code coverage does not tell us which code is working and which is not. Again, code coverage only tells us what was executed by our unit tests, not what executed correctly. This is an important distinction to make. Just because a line of code is executed by a unit test does not mean that the line is working as intended.
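For example, the following code could have 100% code coverage and pass all unit tests if it is never called with b = 0. However, once this code is released into the wild, it could very well fail the first time b is 0, either with a divide-by-zero exception or with a silent Infinity or NaN result, depending on the language and the numeric types involved: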
double Foo(double a, double b)
{
    return a / b;
}
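To make this concrete, here is a rough Python sketch of the same situation. The names divide and test_divide are mine, not from the original code. I picked Python partly because its float division raises ZeroDivisionError, which matches the "exception" flavor of the failure; in other languages the same input may quietly produce infinity instead.

```python
def divide(a, b):
    return a / b


def test_divide():
    # Every line of divide() runs here, so line coverage is 100%.
    assert divide(6.0, 3.0) == 2.0
    assert divide(1.0, 4.0) == 0.25


test_divide()  # passes

# An input the tests never tried still breaks the fully covered function.
try:
    divide(1.0, 0.0)
except ZeroDivisionError as error:
    print("untested input failed:", error)
```

The coverage report for this code would look identical before and after you discover the bug, which is exactly the problem.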
So what is code coverage good for then?
To borrow an analogy from Scott Hanselman's interview with Quetzal Bradley, imagine you are a civil engineer responsible for testing a newly constructed series of roads. To test the roads, your first thought might be to drive over them in your car, making sure that there are no potholes, missing bridges, etc. After driving over all of the roads a few times, you might conclude that they have been tested and are ready for public use. But once you open the roads to the public, you discover that the bridge overhangs are too low for big rigs, the turns are too sharp for sports cars, and that certain areas of the roads flood when it rains.
In the above scenario, you had the equivalent of 100% code coverage since you had driven over all the roads, but you only superficially tested their behavior. Specifically, you didn't test the roads in different vehicles and under different weather conditions. So although you went through each possible execution path, you failed to account for different states while doing so.
In light of this, the only solid conclusion you can draw from code coverage seems to be which lines of your code have definitely not been tested. The lines that have been executed are still open to question unless you are willing to go through each and every possible state the application can be in when executing them. This makes code coverage far less useful as a metric: it tells you what still needs testing, but it offers no help in determining when you are done testing.
What *is* a good metric then?
Unfortunately, there doesn't seem to be a good metric for determining whether a line of code has been thoroughly tested or when a developer is done testing. Perhaps this is a good thing, as it keeps us from falling into a false sense of security. It simply isn't feasible in even a moderately complex application to test each and every line of code under every possible circumstance. The best approach seems to be to test the most common scenarios and reasonable edge cases, then add tests as functionality inevitably breaks on the scenarios you didn't account for. It's an admittedly clumsy system, but it's a realistic one compared to depending on 100% code coverage to weed out all possible bugs. That's not to say that there is no use in achieving 100% code coverage. Executing the code in one particular state still has value, just not as much as developers seem to give it.
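One way that workflow could look in practice, sketched in Python: keep a small table of common and edge cases, and add a dedicated regression test each time something breaks in the wild. Everything here is hypothetical, including the choice to have divide reject a zero divisor with ValueError; your fix for a real bug may well look different.

```python
def divide(a, b):
    # Hypothetical fix made after the zero-divisor failure was reported.
    if b == 0:
        raise ValueError("b must be non-zero")
    return a / b


# Each row is (a, b, expected): common scenarios plus reasonable edge cases.
CASES = [
    (6.0, 3.0, 2.0),     # common case
    (-9.0, 3.0, -3.0),   # edge case: negative operand
    (0.0, 5.0, 0.0),     # edge case: zero numerator
]


def test_known_cases():
    for a, b, expected in CASES:
        assert divide(a, b) == expected, (a, b)


def test_zero_divisor_regression():
    # Added only after the bug showed up; coverage never asked for it.
    try:
        divide(1.0, 0.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for b == 0")


test_known_cases()
test_zero_divisor_regression()
```

The table grows with experience rather than with the coverage number, which is the point: the regression test adds almost nothing to coverage but a lot to confidence.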
As always, I'm very interested to hear your thoughts and observations on this. Please leave them in the comments below.