ScaLearning 7 – Distributing Concurrent Tests



Like many developers who make the journey from Java to Scala, I often find myself amazed at how much easier it is to do some things, and how much more naturally I can express myself, in Scala.

“ScaLearning” will be a series of short blog posts documenting little tidbits I find interesting, confusing, amusing, or otherwise worth talking about.
 
 

Motivation

Recently, in order to gain confidence in our web application, our team decided it would be pragmatic to run a series of tests against a deployed version of the application, complete with a production-like database. We quickly put together a simple suite of non-destructive tests we could run in any environment.
 
Unfortunately, one of our simplest tests quickly began causing us trouble. The test emulated a search spider, crawling every link it found on the entire site in an exhaustive graph traversal, complete with cycle detection. It ran for over 24 hours without completing.
 
While we’ve made other optimizations to improve the test’s performance (such as excluding sufficiently similar pages), the topic of today’s post is the concurrency we introduced to take the edge off the test’s running time.
 
 

Distributing Concurrent Tests

Our goal was straightforward. We wished to run a very simple test across many thousands of URLs:

def testSingleUrl(nextUrl: String) = {
  // client is an HtmlUnitRunner (created as shown in testUrl below)
  client.open(nextUrl)
  client.statusCode should equal(200)
}

 
 
If either of the calls inside this test fails, an exception is thrown, which was an acceptable way of detecting test failure. However, the test needed to do a bit more – traverse the graph:

def testUrl(nextUrl: String): Unit = {
  if (wasVisited(nextUrl)) return   // cycle detection

  val client = new HtmlUnitRunner
  client.open(nextUrl)
  client.statusCode should equal(200)

  registerVisitedUrl(nextUrl)

  // JListWrapper converts the java.util.List of anchors into a Scala collection;
  // every valid link found on the page is queued up for testing
  JListWrapper(client.currentPage.getAnchors())
    .map(_.getAttribute("href"))
    .filter(isLinkValid)
    .foreach(markForTesting)
}

Of course, testUrl was called by a driver method which pushed and popped from a stack, and “markForTesting” pushed each newly discovered link onto that stack. A minimal sketch of that driver might look like this:
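import scala.collection.mutable

// Hypothetical sketch of the sequential driver; of these names,
// only markForTesting and testUrl appear in the real code.
val pending = mutable.Stack[String]()

def markForTesting(nextUrl: String): Unit = pending.push(nextUrl)

def crawlFrom(baseUrl: String): Unit = {
  markForTesting(baseUrl)
  while (pending.nonEmpty)
    testUrl(pending.pop()) // testUrl pushes newly found links via markForTesting
}

This code worked great sequentially, but we wanted it to operate concurrently in order to minimize testing time. For this, we employed Akka’s actors: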

class ConcurrentTest extends Actor {
  def receive = {
    // Type erasure means this pattern really matches any Function0,
    // but it documents our intent: we only ever send test closures
    case test: (() => Unit) =>
      try {
        test()              // run the test; a failed assertion throws
        self reply(true)
      } catch {
        case t: Throwable =>
          t.printStackTrace()
          self reply(false)
      }
    case _ =>
      println("Actor received an unrecognized message")
      self reply(false)
  }
}

ConcurrentTest, as you can see, is the driver behind an individual run of the test method:

val testRunner = actorOf[ConcurrentTest].start()
// !!! sends the closure and immediately returns a Future for the reply
val result = testRunner !!! (() => testUrl("/index.html"))
// Other code can go here
result.get // block until the test has finished

 
 
Assuming the methods called within testUrl are thread-safe (which we also did using Actors), this will run a single test using a second Thread, and allow us to continue on with our business. However, since there’s only a single Actor here, we only have one Thread with which to process URLs. This means that we’re still effectively opening each link sequentially.
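As an aside on that thread-safety: we made helpers like wasVisited and registerVisitedUrl safe to call from many threads by funnelling all access to the shared visited set through an actor of its own. The sketch below is a hypothetical reconstruction of the idea (the message classes are mine), not our exact code:

case class Register(url: String)
case class WasVisited(url: String)

// One actor owns the visited set; because an actor processes
// a single message at a time, the set is never accessed concurrently.
class VisitedUrls extends Actor {
  private val visited = scala.collection.mutable.Set[String]()

  def receive = {
    case Register(url)   => visited += url
    case WasVisited(url) => self reply visited.contains(url)
  }
}

val visitedUrls = actorOf[VisitedUrls].start()

def registerVisitedUrl(url: String): Unit = visitedUrls ! Register(url)

def wasVisited(url: String): Boolean =
  (visitedUrls !! WasVisited(url)).getOrElse(false).asInstanceOf[Boolean]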

Back to the real problem – we need a pool of workers, and Akka is glad to provide one:

class Master(nofWorkers: Int) extends Actor {
  // A fixed pool of test-running workers behind a round-robin load balancer
  val workers = List.fill(nofWorkers)(actorOf[ConcurrentTest].start())
  val router = Routing.loadBalancerActor(new CyclicIterator(workers)).start()
  val answers = ListBuffer[Future[Boolean]]()

  def receive = {
    // Forward each test closure to the pool, keeping the Future for later
    case test: (() => Unit) => answers += router !!! test
    case "result" => {
      Futures.awaitAll(answers.toList)
      // Reply true only if every single test replied true
      self reply(answers.foldLeft(true)(_ && _.result.get))
      workers.foreach(_.stop())
      router.stop()
      self.stop()
    }
  }
}

 
 
So far, everything I’ve presented came very naturally. In fact, minor modifications for the purpose of blogging notwithstanding, we used the code above to test links across our site very successfully, in a fashion very similar to the following:

val masterRunner = actorOf(new Master(concurrentActors)).start()
val productList = fetchAllProductsFromDatabase

// Queue one test per product URL; ! is fire-and-forget
productList.map(productToUrl)
        .foreach(url => masterRunner ! (() => testSingleUrl(url)))

// !! blocks until Master replies, so this waits for every queued test
((masterRunner !! "result").getOrElse(false).asInstanceOf[Boolean]) should equal(true)

However, when we attempted to apply the same methodology to our crawl test, things didn’t work as well as we’d hoped:

val masterRunner = actorOf(new Master(concurrentActors)).start()

// def testUrl as seen above

def markForTesting(nextUrl: String) {
  masterRunner ! (() => testUrl(nextUrl))
}

def fullTest() = {
  val baseUrl = "/index.html"
  markForTesting(baseUrl)

  ((masterRunner !! "result").getOrElse(false).asInstanceOf[Boolean]) should equal(true)
}

 
 
The theory was that “result” would wait for all of the answers to come back before returning. Unfortunately, that’s not quite the sequence of events the Actor sees. After some digging, we figured out the order in which the actor received its messages:

  1. (() => testUrl(“/index.html”)) arrives, and is quickly dispatched to a ConcurrentTest runner
  2. “result” comes next, as it takes a second or two for the runner to open the first page
  3. (() => testUrl(_)) is received for several other URLs as links are scraped off the first page

“result” doesn’t actually wait for all the answers to come back, as it has no way of knowing how many answers are required. For that matter, neither do we, as the test is meant to be dynamic. Instead, “result” simply compiles the answers it has so far, and then shuts down all of the actors. This means we get a “yes” or “no” about “/index.html”, but all of the other URLs are still sitting in Master’s mailbox when it’s shut down. Uh-oh!

So how do we know when we’re done? Mailbox sizes. We added a new match case to Master which would calculate if it believed the tests to be done yet:

class Master(nofWorkers: Int) extends Actor {
  ...
  def receive = {
    ...
    case "done" => { // Added this case; the rest of "Master" remained unchanged
        // First, run every outstanding test to completion
        Futures.awaitAll(answers.toList)
        // Then count anything still queued, anywhere
        val overallSize = workers.map(_.mailboxSize).sum + router.mailboxSize
        self.reply(overallSize == 0 && self.mailboxSize == 0)
    }
  }
}

 
 
This code is different from “result” in that it actually attempts to detect whether the tests are done, by:

  1. Waiting for all currently outstanding test methods to complete
  2. Counting any pending messages in the router and worker mailboxes
    (These should always be zero, as we’ve waited for all answers to return, but it’s safer to be sure)
  3. Counting any pending messages on the master
  4. Returning “true” if the total of pending messages is 0, and “false” otherwise, as more tests have to run

This algorithm works for us because Futures.awaitAll runs every outstanding test to completion. Any URLs found on the tested pages are checked against previously-visited URLs, and added to Master’s queue if they are new. Since Master is still busy processing “done”, those tests stay on the queue, and “mailboxSize” returns a non-zero count. If, however, no new links are encountered, then there are no tests waiting on the Master queue, and our “done” operation detects 0 pending tests.

In use:

val masterRunner = actorOf(new Master(concurrentActors)).start()
def markForTesting(nextUrl: String) = masterRunner ! (() => testUrl(nextUrl))
// def testUrl as seen above

def fullTest() = {
    val baseUrl = "/index.html"
    markForTesting(baseUrl)

    // Poll until the master believes every queued test has been run
    while (!(masterRunner !! "done").getOrElse(false).asInstanceOf[Boolean]) {
        Thread.sleep(1000)
    }

    ((masterRunner !! "result").getOrElse(false).asInstanceOf[Boolean]) should equal(true)
}

Now we sleep our thread, asking the master once a second whether it has completed its job, until it claims that all of its workers have finished their work and no new work is pending for the master to distribute.

Feedback on other potential approaches is very welcome – I find the entire topic of concurrency and job distribution very interesting.



The Fallibility of Tests



Recently I posted about Test Flow and Method Contracts. The key takeaway was that we can use tests to prove the contracts our system lives up to.

We all create a logical representation of our software in our minds in order to reason about it. Tests (either manual, or automatic) allow us to prove the rules of that representation to be true.

Today I’ll talk about what it means when there are mistakes in those tests. For the purposes of this discussion, “tests” may be manual or automatic, but examples are given in code for clarity.
 
 

Test Fallacy vs Test Absence

The first distinction I want to make is between incorrect tests, and missing tests.

Test Fallacy is when a test exists, but creates a logical fallacy within our reasoning system.

For example:

// {new Cache} r = cache.has(x) {r == false}
@Test
public void has_shouldReturnFalse_whenItemHasNotBeenAddedToCache() {
    //setup
    Cache c = new Cache();
 
    //execute
    boolean b = c.has("random");
 
    //assert
    assertTrue(b);
}

Clearly the code disagrees with the test name. Since the name of the test is more likely to correspond to the rule I have in my head that I believe the system follows, this test could be extremely destructive to my system.
 
 
Test Absence is when a test is absent, but we assume behaviour anyway. This is a logical mistake on our part, and it leads to bugs in the system.

I gave an example of this in my post Test Flow and Method Contracts where we “forgot” to define the behaviour of “has” when the cache is empty.

The problem with test absence is that we often assume the behaviour exists, and is “reasonable” by our own subjective measure. That creates a problem – what is “reasonable”?  
 

The Danger of Test Fallibility

Tests are fallible, whether through a mistake or through absence, so it’s worth keeping in mind why that fallibility is dangerous.

A professor in my logic class said something one day that stuck with me:

Logical fallacies don’t just create problems in logical systems, they destroy them. If you consider any false statement to be true, anything can be proven.

Let’s say, for example, that 2 = 3. I’ll prove I’m the Queen of England.

How? Well 3 – 2 = 1, but 3 – 3 = 0. Since 2 and 3 are equal, that implies 1 and 0 are equal.

Furthermore, 2 – 1 = 1, but 2 – 0 = 2. Since 1 and 0 are equal, so too are 1 and 2.

Now the Queen and I are, in fact, two distinct people. But 2 = 1, which means the Queen and I are one. So you see, I am in fact the Queen of England.
(Paraphrased from memory)
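Written out, the chain is nothing but substitution of equals for equals:

\[
2 = 3 \;\Rightarrow\; 3 - 2 = 3 - 3 \;\Rightarrow\; 1 = 0 \;\Rightarrow\; 2 - 1 = 2 - 0 \;\Rightarrow\; 1 = 2
\]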

Honestly, I probably would have chosen to prove I was Metallica (the entire band), but if one admits that 2 = 3, the rest of the logic is difficult to argue against. With one false “fact”, we destroyed our entire reasoning system.

I’m sure many of us can relate this back to software. Try to recall a time when you received a bug complaint from a user, and after tracking down the bug you stared in awe at your computer screen. “That shouldn’t even be possible…”

The real problem? These “glitches” in our reasoning systems are inevitable. We’re all human; we will make mistakes. One way or another, you will eventually have a bug in a test, or assume behaviour exists that really doesn’t.
 
 

How do we Address Test Fallibility?

This brings me back around to the original topic – how do we avoid this fallibility?

We automate our test suite.

It seems strange to say automation will “cure” all of our problems reasoning about systems, and that’s because it won’t. It’s simply the best way to combat the problem.

Automation addresses Test Absence by:

  • being cheaper (in time) than manual testing
  • eliminating any reasons to skip tests
  • growing as a regression suite over time as we think of new test cases

If you forget a test, simply add it and it will never be forgotten again.

Automation addresses Test Fallacies by:

  • being repeatable
  • being correctable (you can’t prevent humans from re-making mistakes)
  • being easy to read, examine, and reason about

If you make a mistake in a test, simply fix it and the mistake will not reoccur.

In summary, automating your tests doesn’t guarantee that your tests will be good, but it allows you to confidently improve your tests over time. Good tests allow you to reason correctly about your system.



Test Flow and Method Contracts



 
Today’s (long overdue) blog entry is inspired by a recent twitter discussion I’ve been following. Uncle Bob (aka Robert Martin) made the bold statement that 100% test coverage should simply be a matter of conscience.
 
Now, I’m not going to delve into my thoughts on the discussion. However, I find it distressing that one of the biggest arguments I see against high test coverage appears to be “but tests don’t guarantee the code works…”.
 
As Bob said, “Tests cannot prove the absence of bugs. But tests can prove that code behaves as expected.”
 
 

What are automated tests?

Proponents of automated testing list a great many reasons why they believe in it. To name a few:

  • Prevent code/bug regression
  • Ease of refactoring
  • Provide confidence in code behaviour
  • Reduce (or eliminate?) time spent testing manually

 
All of these points actually come back to one thing:
Tests mean you know what the code does
 
Note that I didn’t claim the code works, just that you know what it does. That differentiation is important. I’ll talk more about the fallibility of tests in a later post.
 
 

Function Contracts

Those familiar with design by contract or Liskov’s Substitution Principle will recognize the ideas of preconditions and postconditions:
 
Precondition – a condition or predicate that must always be true just prior to the execution of some section of code or before an operation
Postcondition – a condition or predicate that must always be true just after the execution of some section of code or after an operation
 
A more formal encoding of pre- and postconditions takes the form of Hoare Triples. A Hoare Triple is a statement which essentially says “given some precondition, the execution of my code will produce some postcondition”.
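In standard notation, a Hoare Triple is written:

\[
\{P\}\; C\; \{Q\}
\]

read as: if the precondition P holds when we start, and the code C executes, then the postcondition Q holds afterward.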
 
 

Tie it Together

While most of us don’t think about it, automated tests are Hoare Triples.
 
Demonstration:

@Test
public void get_shouldReturnCachedValue_givenValuePutInCache() {
  //setup
  String key = "key";
  String expectedValue = "value";
  Cache cache = new Cache();
  cache.put(key, expectedValue);
 
  //execute
  String result = cache.get(key);
 
  //assert
  assertThat(result, equalTo(expectedValue));
}
 
@Test
public void has_shouldReturnTrue_givenValuePutInCache() {
  //setup
  String key = "key";
  String expectedValue = "value";
  Cache cache = new Cache();
  cache.put(key, expectedValue);
 
  //execute
  boolean result = cache.has(key);
 
  //assert
  assertTrue(result);
}

 
If you put this in terms of a Hoare Triple ({P} C {Q}):
{ cache.put(x, y); } r = cache.get(x) { r == y }
{ cache.put(x, y); } r = cache.has(x) { r == true }
 
You might also note that the test name I chose is simply the Hoare Triple written as C {Q} {P} – for the first test, C = get, {Q} = shouldReturnCachedValue, {P} = givenValuePutInCache. This is the current practice of the team I work on, and ensuring all three clauses of the triple are distinguishable in a test name has been extremely valuable to us.
 
Note: The test could be more terse, but splitting setup/execute/assert allows us to think in terms of {P} C {Q}
 
 

Taking it a Step Further: Mocking

Some of you may be thinking “ok yeah, that test was easy”. It’s true. In the example we didn’t have to interact with external dependencies. But how can we ensure the system works correctly with external dependencies?
 
Note: The concept of “from repository” is somewhat simplified to keep things short. Imagine a more complex world where a couple of DAOs had to be combined to create a Member.

Cache mockCache = mock(Cache.class);
Repo mockRepo = mock(Repo.class);
MemberLookupService service = new MemberLookupService(mockCache, mockRepo);

@Test
public void getMember_shouldReturnMemberFromCache_whenCachedValuePresent() {
  //setup
  String memberId = "member id";
  Member member = new Member(memberId);
  when(mockCache.has(memberId)).thenReturn(true);
  when(mockCache.get(memberId)).thenReturn(member);

  //execute
  Member result = service.getMember(memberId);

  //assert
  assertSame(member, result);
}

@Test
public void getMember_shouldReturnMemberFromRepository_whenCachedValueNotPresent() {
  //setup
  String memberId = "member id";
  Member member = new Member(memberId);
  when(mockCache.has(memberId)).thenReturn(false);
  when(mockRepo.find(memberId)).thenReturn(member);

  //execute
  Member result = service.getMember(memberId);

  //assert
  assertSame(member, result);
}

@Test
public void getMember_shouldPlaceMemberInCache_whenValueLookedUpFromRepository() {
  //setup
  String memberId = "member id";
  Member member = new Member(memberId);
  when(mockCache.has(memberId)).thenReturn(false);
  when(mockRepo.find(memberId)).thenReturn(member);

  //execute
  Member result = service.getMember(memberId);

  //assert
  verify(mockCache).put(memberId, result);
}

 
Here we’ve created three Hoare Triples:
{ Cache.has(x) == true; Cache.get(x) == y } r = getMember(x) { r == y }
{ Cache.has(x) == false; Repo.find(x) == y } r = getMember(x) { r == y }
{ Cache.has(x) == false; Repo.find(x) == y } r = getMember(x) { Cache.has(x) == true }
 
 

Reasoning About our Code

 
Now that we’ve built up a set of Hoare Triples, let’s attempt to reason about our code. We have established a system with the following rules:
 
{ cache.put(x, y); } r = cache.get(x) { r == y }
{ cache.put(x, y); } r = cache.has(x) { r == true }
{ Cache.has(x) == true; Cache.get(x) == y } r = getMember(x) { r == y }
{ Cache.has(x) == false; Repo.find(x) == y } r = getMember(x) { r == y }
{ Cache.has(x) == false; Repo.find(x) == y } r = getMember(x) { Cache.has(x) == true }
 

Based on this, let’s create a scenario and pose a question. Here’s the scenario:

  • Cache.put has not been called with key “Travis”
  • “Travis” exists in the Repository, it is not null

 
The question:
Is it possible for MemberLookupService.getMember(“Travis”) to return null?
 
For the answer, I’ll refer you to Modus Ponens. Specifically, given the rule “P => Q”, knowing “not P” lets you conclude nothing about “Q” – all potential values for “Q” remain possible.
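In symbols: modus ponens lets us conclude Q from P ⇒ Q together with P, but given P ⇒ Q and not-P, nothing at all follows about Q:

\[
(P \Rightarrow Q) \wedge \neg P \;\not\vdash\; Q
\qquad\qquad
(P \Rightarrow Q) \wedge \neg P \;\not\vdash\; \neg Q
\]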
 
So can “getMember” return null? Yes. We’ve not established any rules about what “has” does when there’s nothing in the cache.
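To make that concrete, here is a deliberately perverse Cache – a hypothetical class of mine, not anything from the real codebase – which satisfies every rule we have proven so far, yet lets getMember return null:

import java.util.HashMap;
import java.util.Map;

// Passes both of our cache tests, because no test constrains
// what has() does for a key that was never put().
public class PerverseCache extends Cache {
    private final Map<String, String> store = new HashMap<String, String>();

    @Override
    public void put(String key, String value) { store.put(key, value); }

    @Override
    public String get(String key) { return store.get(key); } // null when absent

    @Override
    public boolean has(String key) { return true; } // always true – no rule forbids it
}

After a put, get returns the value and has returns true, so both established rules hold. But on an empty PerverseCache, has(“Travis”) is still true, so getMember trusts the cache, calls get(“Travis”), and hands back null.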
 
 

Fix the bug

To fix the bug, we need to add a couple more tests, as well as whatever code makes our entire test base pass:
 

@Test
public void get_shouldReturnNull_givenEmptyCache() {
  //setup
  String key = "key";
  Cache cache = new Cache();
 
  //execute
  String result = cache.get(key);
 
  //assert
  assertThat(result, is(nullValue()));
}
 
@Test
public void has_shouldReturnFalse_givenEmptyCache() {
  //setup
  String key = "key";
  Cache cache = new Cache();
 
  //execute
  boolean result = cache.has(key);
 
  //assert
  assertFalse(result);
}


Creating the following rules:
{ new Cache } r = cache.has(x) { r == false }
{ new Cache } r = cache.get(x) { r == null }
 
Based on our earlier scenario, we will no longer receive null:
{ new Cache } r = cache.has("Travis") { r == false }
{ Cache.has("Travis") == false; Repo.find("Travis") == y } r = getMember("Travis") { r == y }
 
 

The Code Behaves as Expected

While I don’t spend every day thinking about Hoare Triples and the predicate calculus behind my system, it’s all still there. Whether reasoning formally about our system, or informally, we do it based on what we believe the rules of our system to be.
 
Tests prove that these logical rules exist. Correct tests prove that they are the rules that we think they are. Whether the test is manual or automatic, as long as it is correct it can prove that we are correct about what rules govern our software.
 
Of course, this requires the tests to be correct. Tests are fallible as well. Automating our tests is how we address the fallibility of the tester, but I’ll go into that next time.



Why Failing Tests Make Refactoring Safe



I recently read an interesting blog post by cashto about why it’s ok not to write unit tests.

I disagree.

However, I believe the topic does deserve a proper rebuttal of some form, and I would like to try my hand.

What are Unit Tests for?
I would like to start by clearing the clutter. Much of cashto’s blog post discusses jobs that unit tests just aren’t really well-suited for.

Do unit tests catch bugs?
Not very well, no. Unit tests can catch some very small, insignificant bugs, but I’m sure most people reading this will agree that the real bugs come from interactions between pieces of code. That just isn’t feasible to cover in a unit test – we need functional tests, integration tests, and acceptance tests to do that.

Do unit tests improve design?
No, but I’ll admit they make poor design painful.

I recently worked with an object that evolved to require 10 constructor parameters. Mocking 9 parameters just so you can test interactions with the 1 you do care about is painful. The poor design hurt, and we refactored to a simpler design.

Do unit tests help you to refactor?
Yes.

Why failing tests create safety.
Until recently I didn’t fully get the meaning of this idea.

The big problem I saw was that failing tests just got altered or deleted. How on earth does deleting or changing a failing test make the refactor easy? Couldn’t you have just refactored without the test?

The answer hit me like an untested jar-file.

When I change a piece of code and 3 tests fail, I can read those tests and know exactly what contracts I had violated. That doesn’t mean my change was wrong, but it does make it very clear what assumptions had previously been made about the system.

This empowers me to go into any areas that used those components, and modify them based on the new contract I am trying to put in place. I now know exactly how my alteration affects other areas of the system.

What about bugs and junk?
Yes, Unit Tests can help uncover bugs, but usually these are very low-level, simple algorithmic bugs.

When Unit Tests drive my code, the unit tests act as living documentation for the assumptions I actually care about, but they do absolutely no work to ensure other areas of the system use the component properly.

So how do you really catch bugs? By testing the system in its assembled state. This is accomplished using Functional and Acceptance Tests. Then you test the interactions your system may have with other systems using Integration Tests. These catch real bugs, not unit tests.

So that’s why it’s the corporate standard?
Maybe.

As cashto makes very clear, many people are well aware of what unit tests are, but don’t fully understand what they provide or what a test failure actually means.

It’s entirely possible that someone at your organization knows exactly what Unit Testing is meant to do. If that’s the case, great! Encourage them to share that knowledge with the rest of the team(s), because it’s very valuable.

Then again, it’s entirely possible that nobody knows why Unit Testing is actually done. Maybe someone just heard it was a good thing. If that’s the case, I encourage you to go out, learn about the reasons behind TDD and Unit Testing, and help educate your team and corporation.

Summary
I believe one of the primary problems with testing, TDD, and Unit Tests is a lack of understanding.

I walked out of University appropriately dogmatic about the usefulness of Unit Tests and testing. I criticized how many tests my team had (on their already slow 45-minute build cycle). I ran around adding tests whenever I could. Up until recently I didn’t know why, and I honestly believe the quality of my tests suffered for it.

As with any practice, we have to know why what we’re doing is good before we can reap the full benefits.

