Mutation testing is the evolution of TDD

Since test-driven development is modeled on how nature works, mutation testing is the natural next step in the evolution of DevOps.
129 readers like this.
Open ant trail

Opensource.com

In "Failure is a feature in blameless DevOps," I discussed the central role of failure in delivering quality by soliciting feedback. This is the failure agile DevOps teams rely on to guide them and drive development. Test-driven development (TDD) is the conditio sine qua non of any agile DevOps value stream delivery. Failure-centric TDD methodology only works if it is paired with measurable tests.

TDD methodology is modeled on how nature works and how nature produces winners and losers in the evolutionary game.

Natural selection

Charles Darwin

In 1859, Charles Darwin proposed the theory of evolution in his book On the Origin of Species. Darwin's thesis was that natural variability is caused by the combination of spontaneous mutations in individual organisms and environmental pressures. These pressures eliminate less-adapted organisms while favoring other, more fit organisms. Each and every living being mutates its chromosomes, and those spontaneous mutations are carried to the next generation (the offspring). The newly emerged variability is then tested under natural selection—the environmental pressures that exist due to the variability of environmental conditions.

This simplified diagram illustrates the process of adjusting to environmental conditions.

Environmental pressures on fish

Fig. 1. Different environmental pressures result in different outcomes governed by natural selection. Image screenshot from a video by Richard Dawkins.

This illustration shows a school of fish in their natural habitat. The habitat varies (darker or lighter gravel at the bottom of the sea or riverbed), as does each fish (darker or lighter body patterns and colors).

It also shows two situations (i.e., two variations of the environmental pressure):

  1. The predator is present
  2. The predator is absent

In the first situation, fish that are easier to spot against the gravel shade are at higher risk of being picked off by predators. When the gravel is darker, the lighter portion of the fish population is thinned out. And vice versa—when the gravel is a lighter shade, the darker portion of the fish population suffers the thinning out scenario.

In the second situation, fish are sufficiently relaxed to engage in mating. In the absence of predators and in the presence of the mating ritual, the opposite results can be expected: The fish that stand out against the background have a better chance of being picked for mating and transferring their characteristics to offspring.

Selection criteria

When selecting among variability, the process is never arbitrary, capricious, whimsical, nor random. The decisive factor is always measurable. That decisive factor is usually called a test or a goal.

A simple mathematical example can illustrate this process of decision making. (Only in this case it won't be governed by natural selection, but by artificial selection.) Suppose someone asks you to build a little function that will take a positive number and calculate that number's square root. How would you go about doing that?

The agile DevOps way is to fail fast. Start with humility, admitting upfront that you don't really know how to develop that function. All you know, at this point, is how to describe what you'd like to do. In technical parlance, you are ready to engage in crafting a unit test.

"Unit test" describes your specific expectation. It could simply be formulated as "given the number 16, I expect the square root function to return number 4." You probably know that the square root of 16 is 4. However, you don't know the square root for some larger numbers (such as 533).

At the very least, you have formulated your selection criteria, your test or goal.

Implement the failing test

The .NET Core platform can illustrate the implementation. .NET typically uses xUnit.net as a unit-testing framework. (To follow the coding examples, please install .NET Core and xUnit.net.)

Open the command line and create a folder where your square root solution will be implemented. For example, type:

mkdir square_root

Then type:

cd square_root

Create a separate folder for unit tests:

mkdir unit_tests

Move into the unit_tests folder (cd unit_tests) and initiate the xUnit framework:

dotnet new xunit

Now, move one folder up to the square_root folder, and create the app folder:

mkdir app
cd app

Create the scaffold necessary for the C# code:

dotnet new classlib

Now open your favorite editor and start cracking!

In your code editor, navigate to the unit_tests folder and open UnitTest1.cs.

Replace auto-generated code in UnitTest1.cs with:

using System;
using Xunit;
using app;

namespace unit_tests{

   public class UnitTest1{
       Calculator calculator = new Calculator();

       [Fact]
       public void GivenPositiveNumberCalculateSquareRoot(){
           var expected = 4;
           var actual = calculator.CalculateSquareRoot(16);
           Assert.Equal(expected, actual);
       }
   }
}

This unit test describes the expectation that the variable expected should be 4. The next line describes the actual value. It proposes to calculate the actual value by sending a message to the component called calculator. This component is described as capable of handling the CalculateSquareRoot message by accepting a numeric value. That component hasn't been developed yet. But it doesn't really matter, because this merely describes the expectations.

Finally, it describes what happens when the message is triggered to be sent. At that point, it asserts whether the expected value is equal to the actual value. If it is, the test passed and the goal is reached. If the expected value isn't equal to the actual value, the test fails.

Next, to implement the component called calculator, create a new file in the app folder and call it Calculator.cs. To implement a function that calculates the square root of a number, add the following code to this new file:

namespace app {
   public class Calculator {
       public double CalculateSquareRoot(double number) {
           double bestGuess = number;
           return bestGuess;
       }
   }
}

Before you can test this implementation, you need to instruct the unit test how to find this new component (Calculator). Navigate to the unit_tests folder and open the unit_tests.csproj file. Add the following line in the <ItemGroup> code block:

<ProjectReference Include="../app/app.csproj" />

Save the unit_test.csproj file. Now you are ready for your first test run.

Go to the command line and cd into the unit_tests folder. Run the following command:

dotnet test

Running the unit test will produce the following output:

xUnit output after the unit test run fails

Fig. 2. xUnit output after the unit test run fails.

As you can see, the unit test failed. It expected that sending number 16 to the calculator component would result in the number 4 as the output, but the output (the actual value) was the number 16.

Congratulations! You have created your first failure. Your unit test provided strong, immediate feedback urging you to fix the failure.

Fix the failure

To fix the failure, you must improve bestGuess. Right now, bestGuess merely takes the number the function receives and returns it. Not good enough.

But how do you figure out a way to calculate the square root value? I have an idea—how about looking at how Mother Nature solves problems.

Emulate Mother Nature by iterating

It is extremely hard (pretty much impossible) to guess the correct value from the first (and only) attempt. You must allow for several attempts at guessing to increase your chances of solving the problem. And one way to allow for multiple attempts is to iterate.

To iterate, store the bestGuess value in the previousGuess variable, transform the bestGuess value, and compare the difference between the two values. If the difference is 0, you solved the problem. Otherwise, keep iterating.

Here is the body of the function that produces the correct value for the square root of any positive number:

double bestGuess = number;
double previousGuess;

do {
   previousGuess = bestGuess;
   bestGuess = (previousGuess + (number/previousGuess))/2;
} while((bestGuess - previousGuess) != 0);

return bestGuess;

This loop (iteration) converges bestGuess values to the desired solution. Now your carefully crafted unit test passes!

Unit test successful

Fig. 3. Unit test successful, 0 tests failed.

The iteration solves the problem

Just like Mother Nature's approach, in this exercise, iteration solves the problem. An incremental approach combined with stepwise refinement is the guaranteed way to arrive at a satisfactory solution. The decisive factor in this game is having a measurable goal and test. Once you have that, you can keep iterating until you hit the mark.

Now the punchline!

OK, this was an amusing experiment, but the more interesting discovery comes from playing with this newly minted solution. Until now, your starting bestGuess was always equal to the number the function receives as the input parameter. What happens if you change the initial bestGuess?

To test that, you can run a few scenarios. First, observe the stepwise refinement as the iteration loops through a series of guesses as it tries to calculate the square root of 25:

Code iterating for the square root of 25

Fig. 4. Iterating to calculate the square root of 25.

Starting with 25 as the bestGuess, it takes eight iterations for the function to calculate the square root of 25. But what would happen if you made a comical, ridiculously wrong stab at the bestGuess? What if you started with a clueless second guess, that 1 million might be the square root of 25? What would happen in such an obviously erroneous situation? Would your function be able to deal with such idiocy?

Take a look at the horse's mouth. Rerun the scenario, this time starting from 1 million as the bestGuess:

Stepwise refinement

Fig. 5. Stepwise refinement when calculating the square root of 25 by starting with 1,000,000 as the initial bestGuess.

Oh wow! Starting with a ludicrously large number, the number of iterations only tripled (from eight iterations to 23). Not nearly as dramatic an increase as you might intuitively expect.

The moral of the story

The Aha! moment arrives when you realize that, not only is iteration guaranteed to solve the problem, but it doesn't matter whether your search for the solution begins with a good or a terribly botched initial guess. However erroneous your initial understanding, the process of iteration, coupled with a measurable test/goal, puts you on the right track and delivers the solution. Guaranteed.

Figures 4 and 5 show a steep and dramatic burndown. From a wildly incorrect starting point, the iteration quickly burns down to an absolutely correct solution.

This amazing methodology, in a nutshell, is the essence of agile DevOps.

Back to some high-level observations

Agile DevOps practice stems from the recognition that we live in a world that is fundamentally based on uncertainty, ambiguity, incompleteness, and a healthy dose of confusion. From the scientific/philosophical point of view, these traits are well documented and supported by Heisenberg's Uncertainty Principle (covering the uncertainty part), Wittgenstein's Tractatus Logico-Philosophicus (the ambiguity part), Gödel's incompleteness theorems (the incompleteness aspect), and the Second Law of Thermodynamics (the confusion caused by relentless entropy).

In a nutshell, no matter how hard you try, you can never get complete information when trying to solve any problem. It is, therefore, more profitable to abandon an arrogant stance and adopt a more humble approach to solving problems. Humility pays big dividends in rewarding you—not only with the hoped-for solution but also with the byproduct of a well-structured solution.

Conclusion

Nature works incessantly—it's a continuous flow. Nature has no master plan; everything happens as a response to what happened earlier. The feedback loops are very tight, and apparent progress/regress is piecemeal. Everywhere you look in nature, you see stepwise refinement, in one shape or form or another.

Agile DevOps is a very interesting outcome of the engineering model's gradual maturation. DevOps is based on the recognition that the information you have available is always incomplete, so you'd better proceed cautiously. Obtain a measurable test (e.g., a hypothesis, a measurable expectation), make a humble attempt at satisfying it, most likely fail, then collect the feedback, fix the failure, and continue. There is no plan other than agreeing that, with each step of the way, there must be a measurable hypothesis/test.

In the next article in this series, I'll take a closer look at how mutation testing provides much-needed feedback that drives value.

Tags
User profile image.
Alex has been doing software development since 1990. His current passion is how to bring soft back into software. He firmly believes that our industry has reached the level of sophistication where this lofty goal (i.e. bringing soft back into software) is fully achievable.

9 Comments

Interesting read Alex! The concept of mutation is well explained.

Thanks for sharing

Thanks for your comment. If you stay tuned, in the next article I'll delve deeper into the mutation testing, which is the final frontier when it comes to ensuring the highest possible quality of precision engineering in software.

In reply to by Armstrong Foundjem

Very interesting comparison.

"When selecting among variability, the process is never arbitrary, capricious, whimsical, nor random. The decisive factor is always measurable. That decisive factor is usually called a test or a goal."

That is of course not entirely accurate. The mutation process, which in itself is rarely giving a viable and reproducible alternative, is entirely random by definition, and thus arbitrary and very capricious. But there is a 2nd pass that, with a lot of time and provided there is an appropriate threat level, and progressively gives an advantage to any form that is more appropriate. Is that sieve so tight?

Since there are usually several different threats, and not necessarily at the same time, the 2nd pass is also quite arbitrary, and may give different outcome depending on the timing alone, or on the presence of other species, environmental changes, and so on. The "selected solutions" we can see are a multitude, in order to escape a predator, some animals run faster, other climb higher, and yet others are good at hiding. If predator2 chases away predator1 just because it's heavier and more powerful, the "solutions" may still save those mutations, but if predator2 has a better sight and outdated predator1 thanks to his ability at feeding (again due to a random mutation), this will invalidate the hiding ability of some preys.

In development projets, clearly the test selects viable solutions, which themselves can vary depending on whoever wrote the code. The order of the tests, their depth and their coverage also plays a role in shaping the outcome, then it also depends on who is writing them.

All that is also deeply impacted by the environment, the culture, the available tools and libraries, the available time and resources.

In conclusion, the predictability and mathematical nature of both processes - natural selection and development, are not that obvious. They're evolutionary and by that, opportunistic, a little difference in the timing or environment may give entirely different outcomes. That's why, facing the same problem, two companies will come up with different solutions and compete.

TDD or "traditional" testing will both make sure the solution behaves as expected. In either case tests can be forgotten, or rushed. The solution will perhaps be driven differently, but I doubt the choice would have any significant impact. If that were the case, methodology selection would have eradicated "traditional" validation approaches in favour of TDD long ago, since it's not a new concept after all.

Interesting take on those challenges. You may have misunderstood the intent of the article. Unlike in the nature, iterations in software engineering are not random. They obey certain algorithm. The only thing that is random is the initial Best Guess; everything else following that initial guess is deterministic.

In reply to by NoahFebak (not verified)

Good read Alex. One question: How did you derived the "guess" bestGuess = (previousGuess + (number/previousGuess))/2; ?
For me it look like you still need a pretty clear idea how to square root is calculated?

If you want to play the engineering game, you have some qualifications. In the case of software engineering, you must join the game by having an arsenal of algorithms in your back pocket.

In this case, I reached out for the oldest algorithm in recorded history -- the Babylonian algorithm.

But hoping to be able to blindly tackle engineering problems without any prior education is the fool's errand.

In reply to by MartinNotRegistered (not verified)

Correction -- in the above reply, the first sentence should read:

If you want to play the engineering game, you have to have some qualifications.

In reply to by Alex Bunardzic

Then I'm not getting it. Of course easy examples make sense to explain complex topics. But how does a proven-in-use, iterating algorithm compare to nature and evolution? In my understanding evolution is not an iterating algorithm but instead iterating THE algorithm (e.g. mutation). Changing over time the answer to a problem.

In reply to by Alex Bunardzic

There is a huge difference between nature and human invented engineering. Nature has all the time in the world, while engineering is severely constrained by the budget (time and money concerns).

We could theoretically devise a system where the Best Guess for the square root of the supplied positive number is just some random number, then check to see if it passes the test, and if not iterate but taking another wild, random guess. Eventually, given unlimited time in the Universe, the random guess will be correct.

You know the saying "trillion monkeys, each sitting at a typewriter, will eventually produce complete works of Shakespeare".

Given enough time, anything is possible. But in the world of engineering, we win by winning the race with time. The first mover advantage and all that jazz.

In reply to by MartinNotRegistered (not verified)

Creative Commons LicenseThis work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.