Fast tests for slow services: why you should use verified fakes

by Itamar Turner-Trauring
Last updated 01 Oct 2021, originally created 11 Jan 2019

Let’s say your code talks to some slow or expensive external service—the Twitter API, say. When it comes time to write tests you face a dilemma:

On the one hand, talking to the real API would make your tests slow, hard to run, and flaky.
On the other hand, if you use a fake or mock test double, how do you know that your code will actually work in the real world? After all, the fake is—fake. Your code isn’t talking to the real thing.

You want correctness and speed: the confidence that if your tests pass, your code will run in production, as well as the ability to have a fast and robust test suite.

The solution is a special kind of test double: a verified fake. Unlike regular test doubles, where you swap out the real thing for some random object, with verified fakes you actually test that the double has the same behavior as the real thing.

In this article I’ll cover:

A quick intro to test doubles in general.
How to write a verified fake, and why it’s different.
The limitations of verified fakes, and when you should use them.

Testing with test doubles

Before looking at verified fakes, let’s take a quick look at why test doubles are useful. Let’s say you have a MessageService class you want to test, and it uses TwitterClient.

MessageService → TwitterClient

You could write tests like this:

Tests → MessageService → TwitterClient

But then your tests will end up talking to the real Twitter API. So instead, you create a FakeTwitterClient and use it in your tests:

Tests → MessageService → FakeTwitterClient

Now your tests can prove MessageService works without having to talk to the Twitter API.

There’s only one problem: you’re assuming that TwitterClient and FakeTwitterClient behave the same, without any evidence that they do. Yes, there are tools like Python’s unittest.mock library that make this a little easier, but at best those make sure you’re matching the function signature.

What they don’t do is validate anything about behavior.
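
To make the gap concrete, here is a minimal sketch using create_autospec from the standard library’s unittest.mock. The TwitterClient stub is a stand-in assumed for illustration. The autospecced mock rejects a call with the wrong arguments, but it knows nothing about the rule that a tweeted message should later be listed.

```python
from unittest import mock


class TwitterClient:
    """Stand-in for the real client; only the signatures matter here."""

    def tweet(self, message):
        raise NotImplementedError("would talk to the real API")

    def list_tweets(self):
        raise NotImplementedError("would talk to the real API")


def signature_is_checked():
    """Return True if the mock rejects a call with a missing argument."""
    client = mock.create_autospec(TwitterClient, instance=True)
    try:
        client.tweet()  # missing the required `message` argument
    except TypeError:
        return True
    return False


def behavior_is_checked():
    """Return True if a tweeted message shows up in the mock's list."""
    client = mock.create_autospec(TwitterClient, instance=True)
    client.tweet("hello")
    # list_tweets() returns a generic mock object, not the tweets sent so far.
    return "hello" in client.list_tweets()
```

Here signature_is_checked() returns True while behavior_is_checked() returns False: the mock matches the shape of the API, not what it does.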

From fakes to verified fakes

In order to make FakeTwitterClient into a verified fake, a fake you can trust, you need to write an additional set of tests that run against both TwitterClient and FakeTwitterClient. These tests validate some sort of contract or interface that you expect both implementations to adhere to.

Running the same tests against both implementations ensures both versions behave the same way:

TwitterContractTests → TwitterClient

TwitterContractTests → FakeTwitterClient

A worked-out example

Let’s say TwitterClient looks like this:

class TwitterClient(object):
    """A client for the Twitter API."""
    def tweet(self, message):
        """Tweet a message for the user."""
        # ... implementation ...
        
    def list_tweets(self):
        """Return a list of the user's tweets."""
        # ... implementation ...

This client provides a behavioral guarantee, a contract of sorts: if a message is tweeted it will show up in the list of tweets. You can encode this contract into a test:

def test_tweet_listed(client):
    "A tweeted message shows up in the list of messages."
    message = generate_random_message()
    client.tweet(message)
    assert message in client.list_tweets()
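
The test relies on a generate_random_message() helper that isn’t shown. A minimal sketch, assuming any unique string will do, could use the standard library’s uuid module. A fresh message per run means tweets left over from earlier runs against the real account can’t make the assertion pass by accident.

```python
import uuid


def generate_random_message():
    """Return a short message that is very unlikely to repeat across runs.

    Hypothetical helper: the prefix and format are assumptions for illustration.
    """
    return "contract-test " + uuid.uuid4().hex
```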

You want FakeTwitterClient to provide the same guarantee, so that you can confidently use it as a drop-in replacement for TwitterClient. So you write a FakeTwitterClient that implements this contract:

class FakeTwitterClient(object):
    """A fake client."""
    def __init__(self):
        self.messages = []

    def tweet(self, message):
        """Tweet a message for the user."""
        self.messages.append(message)

    def list_tweets(self):
        """Return a list of the user's tweets."""
        # Return a copy so callers can't mutate the fake's internal state.
        return list(self.messages)

And here’s the important bit: you want to run test_tweet_listed twice, once against the real client and once against the fake client, to ensure they both provide the same behavior.

The version of test_tweet_listed that runs against FakeTwitterClient is just another fast in-memory test, so it can be run anywhere by anyone.

The version that will run against the real client will need to use a real Twitter login (presumably a test account of some sort). This means it will be slow and not something you want developers running regularly. The contract verification test for TwitterClient could therefore be configured to only run on the CI server once a night, or when the relevant code changes.
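
One way to wire this up, sketched here with the standard library’s unittest rather than any particular test runner, is to put the contract tests in a mixin and subclass it once per implementation. The environment variable name RUN_TWITTER_CONTRACT_TESTS and the myapp.twitter module path are hypothetical placeholders, not part of any real setup.

```python
import os
import unittest
import uuid


class FakeTwitterClient:
    """In-memory fake, mirroring the one in this article."""

    def __init__(self):
        self.messages = []

    def tweet(self, message):
        self.messages.append(message)

    def list_tweets(self):
        return list(self.messages)


class TwitterContractTests:
    """Contract tests shared by every implementation.

    Subclasses provide make_client(). This mixin is not a TestCase itself,
    so the test runner does not try to run it on its own.
    """

    def make_client(self):
        raise NotImplementedError

    def test_tweet_listed(self):
        """A tweeted message shows up in the list of messages."""
        client = self.make_client()
        message = "contract-test " + uuid.uuid4().hex
        client.tweet(message)
        self.assertIn(message, client.list_tweets())


class FakeTwitterClientContractTests(TwitterContractTests, unittest.TestCase):
    """Fast and in-memory: runs everywhere."""

    def make_client(self):
        return FakeTwitterClient()


# RUN_TWITTER_CONTRACT_TESTS is a hypothetical variable name; set it only in
# the CI job that has credentials for a test account.
@unittest.skipUnless(
    os.environ.get("RUN_TWITTER_CONTRACT_TESTS"),
    "slow: talks to the real Twitter API",
)
class RealTwitterClientContractTests(TwitterContractTests, unittest.TestCase):
    """Slow and needs a real login: runs only when explicitly enabled."""

    def make_client(self):
        # Assumption: the real TwitterClient lives in your own code base;
        # the module path below is a placeholder.
        from myapp.twitter import TwitterClient
        return TwitterClient()
```

Developers get the fake’s contract tests on every run, and the nightly CI job sets the variable to run the same tests against the real client.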

Once you have test_tweet_listed running against both classes you have some guarantee that TwitterClient and FakeTwitterClient behave the same way. And that means you can trust that the tests for MessageService are valid tests even though they rely on FakeTwitterClient and not the real thing.
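
As a sketch of what such a MessageService test might look like: the article doesn’t define MessageService’s interface, so the constructor argument, the announce() method, and the message prefix below are all made up for illustration. The only point is that the service receives its client as a parameter, so the test can hand it the verified fake.

```python
class FakeTwitterClient:
    """In-memory fake, mirroring the one in this article."""

    def __init__(self):
        self.messages = []

    def tweet(self, message):
        self.messages.append(message)

    def list_tweets(self):
        return list(self.messages)


class MessageService:
    """Hypothetical service; its interface is assumed, not from the article."""

    def __init__(self, twitter_client):
        # The client is injected so tests can pass in the fake.
        self._twitter = twitter_client

    def announce(self, text):
        """Tweet an announcement with a fixed prefix (made-up behavior)."""
        self._twitter.tweet("[announcement] " + text)


def test_announce_tweets_prefixed_message():
    client = FakeTwitterClient()
    service = MessageService(client)
    service.announce("v2.0 released")
    # Safe to rely on list_tweets() here because the contract tests cover it.
    assert client.list_tweets() == ["[announcement] v2.0 released"]
```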

The limits of verified fakes

TwitterClient may produce errors in some cases, e.g. if a message is too long it might throw an InvalidMessageException. This sort of error can easily be implemented in FakeTwitterClient and verified by the contract tests.

Some errors cannot be verified by a contract test, however. For example, if the Twitter API server has a bug it might return an error that results in an exception being raised by TwitterClient. Or, a network error may result in a socket.error exception.

The problem with these errors is that they are difficult or impossible to trigger reliably in your contract verification tests. If you can’t trigger an edge case in your contract verification tests, then you don’t have a verified fake for that edge case.

The best you can do, if you want to trigger these errors in MessageService tests, is to just go the regular test double route and have an unverified fake or mock.
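
For instance, a network failure could be simulated with unittest.mock’s side_effect. This is a sketch under assumptions: MessageService’s try_announce() method and its return-False-on-failure behavior are hypothetical, and nothing here verifies that the real client fails in this particular way.

```python
import socket
from unittest import mock


class MessageService:
    """Hypothetical service that reports failure instead of crashing."""

    def __init__(self, twitter_client):
        self._twitter = twitter_client

    def try_announce(self, text):
        """Return True if the tweet was sent, False on a network error."""
        try:
            self._twitter.tweet(text)
        except socket.error:
            return False
        return True


def test_network_error_is_reported():
    # Unverified test double: nothing proves the real client fails this way.
    client = mock.Mock(spec=["tweet", "list_tweets"])
    client.tweet.side_effect = socket.error("connection reset")
    service = MessageService(client)
    assert service.try_announce("hello") is False
    client.tweet.assert_called_once_with("hello")
```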

When should you use a verified fake?

A verified fake gives you more assurance that your tests are testing what you think you’re testing, since the test double has been verified to act like the real thing. On the other hand, this requires more work, since you have to create an additional set of contract verification tests.

That means verified fakes make sense when the following conditions apply:

The API you want to fake is slow and/or expensive to set up. Otherwise, why not use the real thing?
The API you want to fake is frequently used by test code. If it’s only used once, maybe it’s better to just use the real thing.
The cost of uncaught bugs is high. If bugs aren’t expensive, it may not be worth the extra effort to write the extra contract tests.

Next time you’re about to write a fake, consider whether this is a good place for verified fakes. With a little bit of work you’ll end up with tests that are fast and correct.
