The Fallibility of Tests



Recently I posted about Test Flow and Method Contracts. The key takeaway was that we can use tests to prove the contracts our system lives up to.

We all create a logical representation of our software in our minds in order to reason about it. Tests (either manual, or automatic) allow us to prove the rules of that representation to be true.

Today I’ll talk about what it means when there are mistakes in those tests. For the purposes of this discussion, “tests” may be manual or automatic, but examples are given in code for clarity.
 
 

Test Fallacy vs Test Absence

The first distinction I want to make is between incorrect tests, and missing tests.

Test Fallacy is when a test exists, but creates a logical fallacy within our reasoning system.

For example:

// {new Cache} r = cache.has(x) {r == false}
@Test
public void has_shouldReturnFalse_whenItemHasNotBeenAddedToCache() {
    //setup
    Cache c = new Cache();
 
    //execute
    boolean b = c.has("random");
 
    //assert
    assertTrue(b);
}

Clearly the code disagrees with the test name. Since the name of the test is more likely to correspond to the rule I have in my head that I believe the system follows, this test could be extremely destructive to my system.
 
 
Test Absence is when no test exists, but we assume the behaviour anyway. This is a logical mistake on our part, and it leads to bugs in the system.

I gave an example of this in my post Test Flow and Method Contracts where we “forgot” to define the behaviour of “has” when the cache is empty.

The problem with test absence is that we often assume the behaviour exists, and is “reasonable” by our own subjective measure. That creates a problem – what is “reasonable”?  
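To make that question concrete, here is a minimal sketch in plain Java (no test framework, so it runs on its own). The interface and both class names are hypothetical, invented for illustration. Both implementations satisfy the one rule we did test, that `get` returns what was `put`, yet they disagree about a key that was never added:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical interface, assumed for illustration only.
interface SimpleCache {
    void put(String key, String value);
    String get(String key);
}

// One "reasonable" choice: a missing key yields null.
class NullReturningCache implements SimpleCache {
    private final Map<String, String> items = new HashMap<>();
    public void put(String key, String value) { items.put(key, value); }
    public String get(String key) { return items.get(key); }
}

// Another "reasonable" choice: a missing key is an error.
class ThrowingCache implements SimpleCache {
    private final Map<String, String> items = new HashMap<>();
    public void put(String key, String value) { items.put(key, value); }
    public String get(String key) {
        if (!items.containsKey(key)) {
            throw new NoSuchElementException("no entry for " + key);
        }
        return items.get(key);
    }
}

class AbsenceDemo {
    // Describes what get() does for a key that was never added.
    static String describeMissingKey(SimpleCache cache) {
        try {
            return String.valueOf(cache.get("never added"));
        } catch (NoSuchElementException e) {
            return "exception";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeMissingKey(new NullReturningCache())); // null
        System.out.println(describeMissingKey(new ThrowingCache()));      // exception
    }
}
```

Without a test that pins down the missing-key case, a caller who assumes either behaviour is guessing.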
 

The Danger of Test Fallibility

While tests are fallible, either through a mistake or through absence, it’s useful to keep in mind why that is dangerous.

A professor in my logic class once said something that stuck with me:

Logical fallacies don’t just create problems in logical systems, they destroy them. If you consider any false statement to be true, anything can be proven.

Let’s say, for example, that 2 = 3. I’ll prove I’m the Queen of England.

How? Well 3 – 2 = 1, but 3 – 3 = 0. Since 2 and 3 are equal, that implies 1 and 0 are equal.

Furthermore, 2 – 1 = 1, but 2 – 0 = 2. Since 1 and 0 are equal, so too are 1 and 2.

Now the Queen and I are, in fact, two distinct people. But 2 = 1, which means the Queen and I are one. So you see, I am in fact the Queen of England.
(Paraphrased from memory)

Honestly, I probably would have chosen to prove I was Metallica (the entire band), but if one admits that 2 = 3, the rest of the logic is difficult to argue against. With one false “fact”, we destroyed our entire reasoning system.
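Logicians call this the principle of explosion. As a small sketch in Lean 4 (a formalisation added here for illustration, not something from the lecture), a single false hypothesis is enough to prove any proposition `P` at all:

```lean
-- From the false hypothesis 2 = 3, an arbitrary proposition P follows.
example (P : Prop) (h : (2 : Nat) = 3) : P :=
  absurd h (by decide)
```

Here `by decide` establishes that 2 and 3 are in fact different, and `absurd` turns the contradiction into a proof of whatever we like, the Queen of England included.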

I’m sure many of us can relate this back to software. Try to recall a time when you received a bug complaint from a user, and after tracking down the bug you stared in awe at your computer screen. “That shouldn’t even be possible…”

The real problem? These “glitches” in our reasoning systems are inevitable. We’re all human, and we will make mistakes. One way or another, you will eventually have a bug in a test, or assume behaviour exists that really doesn’t.
 
 

How do we Address Test Fallibility?

This brings me back around to the original topic – how do we avoid this fallibility?

We automate our test suite.

It seems strange to say automation will “cure” all of our problems reasoning about systems, and that’s because it won’t. It’s simply the best way to combat the problem.

Automation addresses Test Absence by:

  • being cheaper (time) than manual testing
  • eliminating any reasons to skip tests
  • growing as a regression suite over time as we think of new test cases

If you forget a test, simply add it and it will never be forgotten again.

Automation addresses Test Fallacies by:

  • being repeatable
  • being correctable (you can’t prevent humans from re-making mistakes)
  • being easy to read, examine, and reason about

If you make a mistake in a test, simply fix it and the mistake will not reoccur.
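As a sketch of what that fix looks like for the fallacious test from earlier, here it is with the assertion corrected. It is written as plain Java rather than JUnit so it runs without any libraries, and `KeyCache` is a hypothetical stand-in because the real `Cache` class is never shown:

```java
import java.util.HashSet;
import java.util.Set;

// Minimal stand-in for the Cache under test (an assumption, not the real class).
class KeyCache {
    private final Set<String> keys = new HashSet<>();
    void add(String key) { keys.add(key); }
    boolean has(String key) { return keys.contains(key); }
}

class CorrectedHasTest {
    // {new Cache} r = cache.has(x) {r == false}
    static void has_shouldReturnFalse_whenItemHasNotBeenAddedToCache() {
        // setup
        KeyCache c = new KeyCache();

        // execute
        boolean b = c.has("random");

        // assert: now agrees with the test name (the original asserted true)
        if (b) {
            throw new AssertionError("expected has(\"random\") to be false");
        }
    }

    public static void main(String[] args) {
        has_shouldReturnFalse_whenItemHasNotBeenAddedToCache();
        System.out.println("ok");
    }
}
```

Once corrected, the test states the same rule as its name every time the suite runs.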

In summary, automating your tests doesn’t guarantee that your tests will be good, but it allows you to confidently improve your tests over time. Good tests allow you to reason correctly about your system.



Test Flow and Method Contracts



 
Today’s (long overdue) blog entry is inspired by a recent Twitter discussion I’ve been following. Uncle Bob (aka Robert Martin) made the bold statement that 100% test coverage should simply be a matter of conscience.
 
Now, I’m not going to delve into my thoughts on the discussion. However, I find it distressing that one of the biggest arguments I see against high test coverage appears to be “but tests don’t guarantee the code works…”.
 
As Bob said, “Tests cannot prove the absence of bugs. But tests can prove that code behaves as expected.”
 
 

What are automated tests?

Proponents of automated testing list a great many reasons why they believe in it. To name a few:

  • Prevent code/bug regression
  • Ease of refactoring
  • Provide confidence in code behaviour
  • Reduce (or eliminate?) time spent testing manually

 
All of these points actually come back to one thing:
Tests mean you know what the code does
 
Note that I didn’t claim the code works, just that you know what it does. That distinction is important. I’ll talk more about the fallibility of tests later.
 
 

Function Contracts

Those familiar with design by contract or the Liskov Substitution Principle will recognise the idea of preconditions and postconditions:
 
Precondition – a condition or predicate that must always be true just prior to the execution of some section of code or before an operation
Postcondition – a condition or predicate that must always be true just after the execution of some section of code or after an operation
 
A more formal encoding of preconditions and postconditions takes the form of Hoare Triples. A Hoare Triple is a statement which essentially says “given some precondition, the execution of my code will produce some postcondition”.
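As a rough sketch of the idea in Java, a triple {P} C {Q} can be modelled as a precondition, a command, and a postcondition checked against one concrete state. The `HoareTriple` class is a hypothetical helper invented for this illustration, and checking a single state is of course weaker than proving the triple for every state:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical helper: checks {P} C {Q} against one concrete state.
class HoareTriple<S> {
    private final Predicate<S> precondition;   // P
    private final Consumer<S> command;         // C
    private final Predicate<S> postcondition;  // Q

    HoareTriple(Predicate<S> precondition, Consumer<S> command, Predicate<S> postcondition) {
        this.precondition = precondition;
        this.command = command;
        this.postcondition = postcondition;
    }

    // Returns true when the triple holds for this state. If P is false the
    // triple promises nothing, so it holds vacuously and C is not run.
    boolean holdsFor(S state) {
        if (!precondition.test(state)) {
            return true;
        }
        command.accept(state);
        return postcondition.test(state);
    }
}

class HoareTripleDemo {
    public static void main(String[] args) {
        // {list is empty} list.add("x") {list.size() == 1}
        HoareTriple<List<String>> triple = new HoareTriple<List<String>>(
            List::isEmpty,
            list -> list.add("x"),
            list -> list.size() == 1);
        System.out.println(triple.holdsFor(new ArrayList<>())); // true
    }
}
```

Note the vacuous case: when the precondition does not hold, the triple says nothing at all about the outcome.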
 
 

Tie it Together

While most of us don’t think about it, automated tests are Hoare Triples.
 
Demonstration:

@Test
public void get_shouldReturnCachedValue_givenValuePutInCache() {
  //setup
  String key = "key";
  String expectedValue = "value";
  Cache cache = new Cache();
  cache.put(key, expectedValue);
 
  //execute
  String result = cache.get(key);
 
  //assert
  assertThat(result, equalTo(expectedValue));
}
 
@Test
public void get_shouldHaveValue_givenValuePutInCache() {
  //setup
  String key = "key";
  String expectedValue = "value";
  Cache cache = new Cache();
  cache.put(key, expectedValue);
 
  //execute
  boolean result = cache.has(key);
 
  //assert
  assertTrue(result);
}

 
If you put this in terms of a Hoare Triple ({P} C {Q}):
{ cache.put(x, y); } r = cache.get(x) { r == y }
{ cache.put(x, y); } r = cache.has(x) { r == true }
 
You might also note that the test method names I chose are simply the Hoare Triple written as C {Q} {P}: the method under test, then the expected result, then the given condition. This is just the current practice of the team I work on, but keeping all three clauses of the triple distinguishable in a test name has been extremely valuable to us.
 
Note: The test could be more terse, but splitting setup/execute/assert allows us to think in terms of {P} C {Q}
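The `Cache` class itself is never shown, so here is a guess at the smallest implementation that would satisfy both triples, with each check labelled by the clause it plays. `MapBackedCache` and the check methods are hypothetical names, and the checks are plain Java rather than JUnit so the sketch runs without libraries:

```java
import java.util.HashMap;
import java.util.Map;

// A guess at a minimal Cache; the real class is an unknown.
class MapBackedCache {
    private final Map<String, String> entries = new HashMap<>();
    void put(String key, String value) { entries.put(key, value); }
    String get(String key) { return entries.get(key); }
    boolean has(String key) { return entries.containsKey(key); }
}

class CacheTripleChecks {
    // { cache.put(x, y) } r = cache.get(x) { r == y }
    static boolean getReturnsWhatWasPut(String x, String y) {
        MapBackedCache cache = new MapBackedCache();
        cache.put(x, y);           // P: setup
        String r = cache.get(x);   // C: execute
        return y.equals(r);        // Q: assert
    }

    // { cache.put(x, y) } r = cache.has(x) { r == true }
    static boolean hasIsTrueAfterPut(String x, String y) {
        MapBackedCache cache = new MapBackedCache();
        cache.put(x, y);           // P: setup
        boolean r = cache.has(x);  // C: execute
        return r;                  // Q: assert
    }

    public static void main(String[] args) {
        System.out.println(getReturnsWhatWasPut("key", "value")); // true
        System.out.println(hasIsTrueAfterPut("key", "value"));    // true
    }
}
```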
 
 

Taking it a Step Further: Mocking

Some of you may be thinking “ok yeah, that test was easy”. It’s true. In the example we didn’t have to interact with external dependencies. But how can we ensure the system works correctly with external dependencies?
 
Note: The concept of “from repository” is somewhat simplified to keep things short. Imagine a more complex world where a couple of DAOs had to be combined to create a Member.

// Mocks stand in for the external dependencies of the service under test
Cache mockCache = mock(Cache.class);
Repo mockRepo = mock(Repo.class);
MemberLookupService service = new MemberLookupService(mockCache, mockRepo);

@Test
public void getMember_shouldReturnMemberFromCache_whenCachedValuePresent() {
  //setup
  String memberId = "member id";
  Member member = new Member(memberId);
  when(mockCache.has(memberId)).thenReturn(true);
  when(mockCache.get(memberId)).thenReturn(member);

  //execute
  Member result = service.getMember(memberId);

  //assert
  assertSame(member, result);
}

@Test
public void getMember_shouldReturnMemberFromRepository_whenCachedValueNotPresent() {
  //setup
  String memberId = "member id";
  Member member = new Member(memberId);
  when(mockCache.has(memberId)).thenReturn(false);
  when(mockRepo.find(memberId)).thenReturn(member);

  //execute
  Member result = service.getMember(memberId);

  //assert
  assertSame(member, result);
}

@Test
public void getMember_shouldPlaceMemberInCache_whenValueLookedUpFromRepository() {
  //setup
  String memberId = "member id";
  Member member = new Member(memberId);
  when(mockCache.has(memberId)).thenReturn(false);
  when(mockRepo.find(memberId)).thenReturn(member);

  //execute
  Member result = service.getMember(memberId);

  //assert: the looked-up member was stored under its id
  verify(mockCache).put(memberId, result);
}

 
Here we’ve created three Hoare Triples.
{Cache.has(x) == true; Cache.get(x) == y} r = getMember(x) {r == y}
{Cache.has(x) == false; Repo.find(x) == y} r = getMember(x) {r == y}
{Cache.has(x) == false; Repo.find(x) == y} r = getMember(x) {Cache.has(x) == true}
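For reference, here is a sketch of a service that would satisfy all three triples. Every type in it is an assumption inferred from the tests, since the post shows none of them: the cache lookup method is called `has` to match the triples, and a hand-written in-memory fake replaces the Mockito mock so the example runs without libraries.

```java
import java.util.HashMap;
import java.util.Map;

// Every type below is an assumption inferred from the tests.
class Member {
    final String id;
    Member(String id) { this.id = id; }
}

interface MemberCache {
    boolean has(String key);
    Member get(String key);
    void put(String key, Member member);
}

interface MemberRepo {
    Member find(String id);
}

class MemberLookupService {
    private final MemberCache cache;
    private final MemberRepo repo;

    MemberLookupService(MemberCache cache, MemberRepo repo) {
        this.cache = cache;
        this.repo = repo;
    }

    Member getMember(String memberId) {
        // Triple 1: a cached value is returned as-is.
        if (cache.has(memberId)) {
            return cache.get(memberId);
        }
        // Triples 2 and 3: fall back to the repository, then remember the result.
        Member member = repo.find(memberId);
        cache.put(memberId, member);
        return member;
    }
}

// Hand-written fake standing in for the mock.
class InMemoryMemberCache implements MemberCache {
    private final Map<String, Member> entries = new HashMap<>();
    public boolean has(String key) { return entries.containsKey(key); }
    public Member get(String key) { return entries.get(key); }
    public void put(String key, Member member) { entries.put(key, member); }
}

class MemberLookupDemo {
    public static void main(String[] args) {
        InMemoryMemberCache cache = new InMemoryMemberCache();
        MemberLookupService service = new MemberLookupService(cache, id -> new Member(id));
        Member first = service.getMember("member id");   // comes from the repository
        Member second = service.getMember("member id");  // now comes from the cache
        System.out.println(first == second); // true
    }
}
```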
 
 

Reasoning About our Code

 
Now that we’ve built up a set of Hoare Triples, let’s attempt to reason about our code. We have established a system with the following rules:
 
{ cache.put(x, y); } r = cache.get(x) { r == y }
{ cache.put(x, y); } r = cache.has(x) { r == true }
{Cache.has(x) == true; Cache.get(x) == y} r = getMember(x) {r == y}
{Cache.has(x) == false; Repo.find(x) == y} r = getMember(x) {r == y}
{Cache.has(x) == false; Repo.find(x) == y} r = getMember(x) {Cache.has(x) == true}
 

Based on this, let’s create a scenario and pose a question. Here’s the scenario:

  • Cache.put has not been called with key “Travis”
  • “Travis” exists in the Repository, and the stored value is not null

 
The question:
Is it possible for MemberLookupService.getMember(“Travis”) to return null?
 
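For the answer, I’ll refer you to modus ponens. Given the rule “P => Q” and the fact “P”, we may conclude “Q”. But if all we know is “not P”, the rule tells us nothing about “Q” (concluding otherwise is the fallacy of denying the antecedent). All potential values for “Q” are possible.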
 
So can “getMember” return null? Yes. We’ve not established any rules about what “has” does when there’s nothing in the cache.
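To see that this is more than a theoretical worry, here is a sketch of a cache that obeys both tested cache rules and still produces the null. `PermissiveCache` is a hypothetical class, and the repository is reduced to a plain map of strings to keep the example short:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache that obeys both tested rules:
//   { cache.put(x, y) } r = cache.get(x) { r == y }
//   { cache.put(x, y) } r = cache.has(x) { r == true }
// but does something surprising where no rule applies.
class PermissiveCache {
    private final Map<String, String> entries = new HashMap<>();
    void put(String key, String value) { entries.put(key, value); }
    String get(String key) { return entries.get(key); }
    boolean has(String key) { return true; } // untested case: claims to have everything
}

class NullLookupDemo {
    // Same cache-then-repository logic the service tests describe,
    // with the repository reduced to a map for brevity.
    static String getMember(PermissiveCache cache, Map<String, String> repo, String id) {
        if (cache.has(id)) {
            return cache.get(id);
        }
        String member = repo.get(id);
        cache.put(id, member);
        return member;
    }

    public static void main(String[] args) {
        Map<String, String> repo = new HashMap<>();
        repo.put("Travis", "Travis's record");
        System.out.println(getMember(new PermissiveCache(), repo, "Travis")); // null
    }
}
```

Nothing in this cache contradicts a rule we wrote down. The null comes entirely from behaviour we assumed but never tested.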
 
 

Fix the bug

To fix the bug, we need to add a couple more tests, as well as whatever code makes our entire test base pass:
 

@Test
public void get_shouldReturnNull_givenEmptyCache() {
  //setup
  String key = "key";
  Cache cache = new Cache();
 
  //execute
  String result = cache.get(key);
 
  //assert
  assertThat(result, is(nullValue()));
}
 
@Test
public void has_shouldReturnFalse_givenEmptyCache() {
  //setup
  String key = "key";
  Cache cache = new Cache();
 
  //execute
  boolean result = cache.has(key);
 
  //assert
  assertFalse(result);
}


Creating the following rules:
{new Cache} r = Cache.has(x) {r == false}
{new Cache} r = Cache.get(x) {r == null}
 
Based on our earlier scenario, we will no longer receive null:
{new Cache} r = Cache.has(Travis) {r == false}
{Cache.has(Travis) == false; Repo.find(Travis) == y} r = getMember(Travis) {r == y}
 
 

The Code Behaves as Expected

While I don’t spend every day thinking about Hoare Triples and the predicate calculus behind my system, it’s all still there. Whether reasoning formally about our system, or informally, we do it based on what we believe the rules of our system to be.
 
Tests prove that these logical rules exist. Correct tests prove that they are the rules that we think they are. Whether the test is manual or automatic, as long as it is correct it can prove that we are correct about what rules govern our software.
 
Of course, this requires the tests to be correct. Tests are fallible as well. Automating our tests is how we address the fallibility of the tester, but I’ll go into that next time.

