Using mocks in Python

Test your code safely with mocks.

Image by:

kris krüg

April 1st is all about fake stories and pretending. This makes it the perfect day to talk about mocking.

Sometimes, using real objects is hard, ill-advised, or complicated. For example, a requests.Session connects to real websites. Using it in your unittests invites a…lot…of problems.

Basic mocking in Python

"Mocks" are a unittest concept. They produce objects that are substitutes for the real ones.

from unittest import mock

There's a whole cottage industry that will explain that "mock", "fake", and "stub" are all subtly different. In this article, I use the terms interchangeably.

regular = mock.MagicMock()

def do_something(o):
    return o.something(5)

do_something(regular)

This code produces:

<MagicMock name='mock.something()' id='140228824621520'>

Mocks have all the methods. The methods usually return another Mock. This can be changed by assigning it to return_value.

For example, suppose you want to call the following function:

def do_something(o):
    return o.something() + 1

It requires something which has the .something() method. Luckily, mock objects have it:

obj = mock.MagicMock(name="an object")
obj.something.return_value = 2
print(do_something(obj))

The answer:

It is also possible to override the "magic methods":

a = mock.MagicMock()
a.__str__.return_value = "an a"
print(str(a))

The answer:

an a

The spec

Make sure that a mock does not have "extra" methods or attributes by using a spec. For example, here's some code that should fail:

import pathlib

def bad_pathlib_usage(path):
    ## TYPO: missing underscore
    path.writetext("hello")

dummy_path = mock.MagicMock(spec=pathlib.Path)

try:
    bad_pathlib_usage(dummy_path)
except Exception as exc:
    print("Failed!", repr(exc))

The result:

Failed! AttributeError("Mock object has no attribute 'writetext'")

Mock side effect

Sometimes, having a MagicMock that returns the same thing every time isn't quite everything you need it to be. For example, sys.stdin.readline() usually returns different values, not the same value throughout the test.

The property side_effect allows controlling what a magic mock returns on a more detailed level than using return_value.

Iterable

One of the things that can be assigned to side_effect is an iterable, such as a sequence or a generator.

This is a powerful feature. It allows controlling each call's return value, with little code.

different_things = mock.MagicMock()
different_things.side_effect = [1, 2, 3]
print(different_things())
print(different_things())
print(different_things())

The output:

1
2
3

A more realistic example is when simulating file input. In this case, I want to be able to control what readline returns each time to pretend it's file input:

def parse_three_lines(fpin):
    line = fpin.readline()
    name, value = line.split()
    modifier = fpin.readline().strip()
    extra = fpin.readline().strip()
    return {name: f"{value}/{modifier}+{extra}"}

from io import TextIOBase
    
filelike = mock.MagicMock(spec=TextIOBase)
filelike.readline.side_effect = [
    "thing important\n",
    "a-little\n",
    "to-some-people\n"
]
value = parse_three_lines(filelike)
print(value)

The result:

{'thing': 'important/a-little+to-some-people'}

Exception

Another thing that's possible is assigning an exception to the side_effect attribute. This causes the call to raise the exception you assigned. Using this feature allows simulating edge conditions in the environment, usually precisely the ones that:

You care about
Are hard to simulate realistically

One popular case is network issues. As per Murphy's law, they always happen at 4 AM, causing a pager to go off, and never at 10 AM when you're sitting at your desk. The following is based on real code I wrote to test a network service.

In this simplified example, the code returns the length of the response line, or a negative number if a timeout has been reached. The number is different based on when in the protocol negotiation this has been reached. This allows the code to distinguish "connection timeout" from "response timeout", for example.

Testing this code against a real server is hard. Servers try hard to avoid outages! You could fork the server's C code and add some chaos or you can just use side_effect and mock:

import socket

def careful_reader(sock):
    sock.settimeout(5)
    try:
        sock.connect(("some.host", 8451))
    except socket.timeout:
        return -1
    try:
        sock.sendall(b"DO THING\n")
    except socket.timeout:
        return -2
    fpin = sock.makefile()
    try:
        line = fpin.readline()
    except socket.timeout:
        return -3
    return len(line.strip())

from io import TextIOBase
from unittest import mock

sock = mock.MagicMock(spec=socket.socket)
sock.connect.side_effect = socket.timeout("too long")
print(careful_reader(sock))

The result is a failure, which in this case means a successful test:

-1

With careful side effects, you can get to each of the return values. For example:

sock = mock.MagicMock(spec=socket.socket)
sock.sendall.side_effect = socket.timeout("too long")
print(careful_reader(sock))

The result:

-2

Callable

The previous example is simplified. Real network service test code must verify that the results it got were correct to validate that the server works correctly. This means doing a synthetic request and looking for a correct result. The mock object has to emulate that. It has to perform some computation on the inputs.

Trying to test such code without performing any computation is difficult. The tests tend to be too insensitive or too "flakey".

An insensitive test is one that does not fail in the presence of bugs.
A flakey test is one that sometimes fails, even when the code is correct.

Here, my code is incorrect. The insensitive test does not catch it, while the flakey test would fail even if it was fixed!

import socket
import random

def yolo_reader(sock):
    sock.settimeout(5)
    sock.connect(("some.host", 8451))
    fpin = sock.makefile()
    order = [0, 1]
    random.shuffle(order)
    while order:
        if order.pop() == 0:
            sock.sendall(b"GET KEY\n")
            key = fpin.readline().strip()
        else:
            sock.sendall(b"GET VALUE\n")
            value = fpin.readline().strip()
    return {value: key} ## Woops bug, should be {key: value}

The following would be too "insensitive", not detecting the bug:

sock = mock.MagicMock(spec=socket.socket)
sock.makefile.return_value.readline.return_value = "interesting\n"
assert yolo_reader(sock) == {"interesting": "interesting"}

The following would be too "flakey," detecting the bug even if it's not there, sometimes:

for i in range(10):
    sock = mock.MagicMock(spec=socket.socket)
    sock.makefile.return_value.readline.side_effect = ["key\n", "value\n"]
    if yolo_reader(sock) != {"key": "value"}:
        print(i, end=" ")

For example:

3 6 7 9

The final option of getting results from a mock object is to assign a callable object to side_effect. This calls side_effect to simply call it. Why not just assign a callable object directly to the attribute? Have patience, I'll get to that in the next part!

In this example, my callable object (just a function) assigns a return_value to the attribute of another object. This isn't that uncommon. I'm simulating the environment, and in a real environment, poking one thing often affects other things.

sock = mock.MagicMock(spec=socket.socket)
def sendall(data):
    cmd, name = data.decode("ascii").split()
    if name == "KEY":
        sock.makefile.return_value.readline.return_value = "key\n"
    elif name == "VALUE":
        sock.makefile.return_value.readline.return_value = "value\n"
    else:
        raise ValueError("got bad command", name)
sock.sendall.side_effect = sendall
print(yolo_reader(sock), dict(key="value"))

The result:

{'value': 'key'} {'key': 'value'}

Mock call arguments: x-ray for code

When writing a unit test, you are "away" from the code but trying to peer into its guts to see how it behaves. The Mock object is your sneaky spy. After it gets into the production code, it records everything faithfully. This is how you can find what your code does and whether it's the right thing.

Call counts

The simplest thing is to just make sure that the code is called the expected number of times. The .call_count attribute is exactly what counts that.

def get_values(names, client):
    ret_value = []
    cache = {}
    for name in names:
        # name = name.lower()
        if name not in cache:
            value = client.get(f"https://httpbin.org/anything/grab?name={name}").json()['args']['name']
            cache[name] = value
        ret_value.append(cache[name])
    return ret_value

client = mock.MagicMock()
client.get.return_value.json.return_value = dict(args=dict(name="something"))
result = get_values(['one', 'One'], client)
print(result)
print("call count", client.get.call_count)

The results:

['something', 'something']
call count 2

One benefit of checking .call_count >= 1 as opposed to checking .called is that it is more resistant to silly typos.

def call_function(func):
    print("I'm going to call the function, really!")
    if False:
        func()
    print("I just called the function")

func = mock.MagicMock()
call_function(func)
print(func.callled) # TYPO -- Extra "l"

I'm going to call the function, really!
I just called the function
<MagicMock name='mock.callled' id='140228249343504'>

Using spec diligently can prevent that. However, spec is not recursive. Even if the original mock object has a spec, rare is the test that makes sure that every single attribute it has also has a spec. However, using .call_count instead of .called is a simple hack that completely eliminates the chance to make this error.

Call arguments

In the next example, I ensure the code calls the method with the correct arguments. When automating data center manipulations, it's important to get things right. As they say, "To err is human, but to destroy an entire data center requires a robot with a bug."

We want to make sure our Paramiko-based automation correctly gets the sizes of files, even when the file names have spaces in them.

def get_remote_file_size(client, fname):
    client.connect('ssh.example.com')
    stdin, stdout, stderr = client.exec_command(f"ls -l {fname}")
    stdin.close()
    results = stdout.read()
    errors = stderr.read()
    stdout.close()
    stderr.close()
    if errors != '':
        raise ValueError("problem with command", errors)
    return int(results.split()[4])

fname = "a file"
client = mock.MagicMock()
client.exec_command.return_value = [mock.MagicMock(name=str(i)) for i in range(3)]
client.exec_command.return_value[1].read.return_value = f"""\
-rw-rw-r--  1 user user    123 Jul 18 20:25 {fname}
"""
client.exec_command.return_value[2].read.return_value = ""
result = get_remote_file_size(client, fname)
assert result == 123
[args], kwargs = client.exec_command.call_args
import shlex
print(shlex.split(args))

The results:

['ls', '-l', 'a', 'file']

Woops! That's not the right command. Good thing you checked the arguments.

More Python resources

What is an IDE?

Cheat sheet: Python 3.7 for beginners

Top Python GUI frameworks

Download: 7 essential PyPI libraries

Red Hat Developers

Latest Python articles

Deep dive into mocks

Mocks have a lot of power. Like any powerful tool, using it improperly is a fast way to get into a big mess. But properly using the .return_value, .side_effect, and the various .call* properties, it's possible to write the best sort of unit tests.

A good unit test is one that:

Fails in the presence of incorrect code
Passes in the presence of correct code

"Quality" is not binary. It exists on a spectrum. The badness of a unit test is determined by:

How many errors it lets pass. That's a "missing alarm" or "false negative". If you're a statistician, it's a "type 2 error".
How many correct code changes it fails. That's a "false alarm" or "false positive". If you're a statistician, it's a "type 1 errors".

When using a mock, take the time and think about both metrics to evaluate whether this mock and this unit test, will help or hinder you.