An introduction to mutation testing in Python |

An introduction to mutation testing in Python

Turn invisible bugs into visible fixes with mutation testing.

Searching for code
Image by :


Subscribe now

Get the highlights in your inbox every week.

You have tests for everything; maybe you even have a badge in your project repository stating 100% test coverage. But what are these tests helping you do? How do you know?

The costs of unit tests are clear to developers. Tests have to be written. Occasionally they don't work as intended: there are false alarms or flapping tests that alternate success and failure without any code changes. The small bugs you can find through unit tests are valuable, but often they happen quietly on a developer machine and are fixed before a commit ever goes into version control. But the truly alarming bugs are mostly invisible. And the worst of all, the missing alarms are completely invisible: you don't see the bugs you failed to catch until the code gets into a user's handsโ€”and sometimes not even then.

There is one type of test that makes it possible to make the invisible visible: mutation testing.

Mutation testing algorithmically modifies source code and checks if any "mutants" survived each test. Any mutant that survives the unit test is a problem: it means that a modification to the code, likely introducing a bug, was not caught by the standard test suite.

One framework for mutation testing in Python is mutmut.

Imagine you need to write code that calculates the angle between the hour hand and the minute hand in an analog clock to the nearest degree. The code might look something like:

def hours_hand(hour, minutes):
    base = (hour % 12 ) * (360 // 12)
    correction = int((minutes / 60) * (360 // 12))
    return base + correction

def minutes_hand(hour, minutes):
    return minutes * (360 // 60)

def between(hour, minutes):
    return abs(hours_hand(hour, minutes) - minutes_hand(hour, minutes))

To begin, write a simple unit test:

import angle

def test_twelve():
    assert angle.between(12, 00) == 0

Is this enough? The code has no if statement, so if you check coverage:

$ coverage run `which pytest`
============================= test session starts ==============================
platform linux -- Python 3.8.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
rootdir: /home/moshez/src/mut-mut-test
collected 1 item                                                              

tests/ .                                                    [100%]

============================== 1 passed in 0.01s ===============================

Perfect! The tests pass, and coverage is at 100%. Testing expert, you are. But what does that coverage look like when you use mutation testing?

$ mutmut run --paths-to-mutate
Legend for output:
๐ŸŽ‰ Killed mutants.   The goal is for everything to end up in this bucket.
โฐ Timeout.          Test suite took 10 times as long as the baseline so were killed.
๐Ÿค” Suspicious.       Tests took a long time, but not long enough to be fatal.
๐Ÿ™ Survived.         This means your tests needs to be expanded.
๐Ÿ”‡ Skipped.          Skipped.
โ ‹ 21/21  ๐ŸŽ‰ 5  โฐ 0  ๐Ÿค” 0  ๐Ÿ™ 16  ๐Ÿ”‡ 0

Oh no. Out of 21 mutants, 16 survived. Only five cases passed the mutation testing. But what does that mean?

For each mutation test, mutmut modified portions of your source code that simulates potential bugs. An example of a modification is changing a > comparison to >= to see what happens. If there is no unit test for this boundary condition, this mutation will "survive": this is a potential bug that none of the tests will detect.

It's time to write better unit tests. It is easy to check which changes were made with results:

$ mutmut results
Survived ๐Ÿ™ (16)

---- (16) ----

4-7, 9-14, 16-21
$ mutmut apply 4
$ git diff
diff --git a/ b/
index b5dca41..3939353 100644
--- a/
+++ b/
@@ -1,6 +1,6 @@
 def hours_hand(hour, minutes):
     hour = hour % 12
-    base = hour * (360 // 12)
+    base = hour / (360 // 12)
     correction = int((minutes / 60) * (360 // 12))
     return base + correction

This is a typical example of a mutation mutmut performs; it analyzes the source code and changes operators to different operators: addition to subtraction or, in this case, multiplication to division. Unit tests, generally speaking, should catch bugs when an operator changes; otherwise, they are not testing the behavior effectively. Following this logic, mutmut twiddles through source code to double-check your tests.

You can use mutmut apply to apply a failed mutant. Wow, it turns out you barely checked whether the hour parameter was used correctly. Fix that:

$ git diff
diff --git a/tests/ b/tests/
index f51d43a..1a2e4df 100644
--- a/tests/
+++ b/tests/
@@ -2,3 +2,6 @@ import angle
 def test_twelve():
     assert angle.between(12, 00) == 0
+def test_three():
+    assert angle.between(3, 00) == 90

Previously, you tested only for 12. Will adding a test for three be enough to improve?

$ mutmut run --paths-to-mutate
โ ‹ 21/21  ๐ŸŽ‰ 7  โฐ 0  ๐Ÿค” 0  ๐Ÿ™ 14  ๐Ÿ”‡ 0

This new test managed to kill two of the mutantsโ€”better than before, but still a long way to go. I won't work through each of the 14 cases left to fix, because I think the pattern is clear. (Can you get them down to zero?)

Mutation testing is another tool, along with coverage measurement, that allows you to see how comprehensive your test suite is. Using it makes the case that tests need to be improved: any one of those surviving mutants is a mistake a human being can make while fat-fingering code, as well as a potential bug creeping into your program. Keep testing and happy hunting.

failure sign at a party, celebrating failure

Use planned failure to ensure your code meets expected outcomes and follow along with the .NET testing framework.
Ants and a leaf making the word "open"

Since test-driven development is modeled on how nature works, mutation testing is the natural next step in the evolution of DevOps.
Snake charmer cartoon with a yellow snake and a blue snake

These are a few of my favorite things about Python.


About the author

Moshe sitting down, head slightly to the side. His t-shirt has Guardians of the Galaxy silhoutes against a background of sound visualization bars.
Moshe Zadka - Moshe has been involved in the Linux community since 1998, helping in Linux "installation parties". He has been programming Python since 1999, and has contributed to the core Python interpreter. Moshe has been a DevOps/SRE since before those terms existed, caring deeply about software reliability, build reproducibility and other such things. He has worked in companies as small as three people and as big as tens of thousands -- usually some place around where software meets system administration...