Visualize multi-threaded Python programs with an open source tool

Image by:

Pixabay. Modified by Opensource.com. CC BY-SA 4.0.

Concurrency is an essential part of modern programming, as we have multiple cores and many tasks that need to cooperate. However, it's harder to understand concurrent programs when they are not running sequentially. It's not as easy for engineers to identify bugs and performance issues in these programs as it is in a single-thread, single-task program.

With Python, you have multiple options for concurrency. The most common ones are probably multi-threaded with the threading module, multiprocess with the subprocess and multiprocessing modules, and the more recent async syntax with the asyncio module. Before VizTracer, there was a lack of tools to analyze programs using these techniques.

VizTracer is a tool for tracing and visualizing Python programs, which is helpful for logging, debugging, and profiling. Even though it works well for single-thread, single-task programs, its utility in concurrent programs is what makes it unique.

Try a simple task

Start with a simple practice task: Figure out whether the integers in an array are prime numbers and return a Boolean array. Here is a simple solution:

def is_prime(n):
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

def get_prime_arr(arr):
    return [is_prime(elem) for elem in arr]

Try to run it normally, in a single thread, with VizTracer:

if __name__ == "__main__":
    num_arr = [random.randint(100, 10000) for _ in range(6000)]
    get_prime_arr(num_arr)

viztracer my_program.py

Image by:

^{(Tian Gao, CC BY-SA 4.0)}

The call-stack report indicates it took about 140ms, with most of the time spent in get_prime_arr.

Image by:

^{(Tian Gao, CC BY-SA 4.0)}

It's just doing the is_prime function over and over again on the elements in the array.

This is what you would expect, and it's not that interesting (if you know VizTracer).

Try a multi-thread program

Try doing it with a multi-thread program:

if __name__ == "__main__":
    num_arr = [random.randint(100, 10000) for i in range(2000)]
    thread1 = Thread(target=get_prime_arr, args=(num_arr,))
    thread2 = Thread(target=get_prime_arr, args=(num_arr,))
    thread3 = Thread(target=get_prime_arr, args=(num_arr,))

    thread1.start()
    thread2.start()
    thread3.start()

    thread1.join()
    thread2.join()
    thread3.join()

To match the single-thread program's workload, this uses a 2,000-element array for three threads, simulating a situation where three threads are sharing the task.

Image by:

^{(Tian Gao, CC BY-SA 4.0)}

As you would expect if you are familiar with Python's Global Interpreter Lock (GIL), it won't get any faster. It took a little bit more than 140ms due to the overhead. However, you can observe the concurrency of multiple threads:

Image by:

^{(Tian Gao, CC BY-SA 4.0)}

When one thread was working (executing multiple is_prime functions), the other one was frozen (one is_prime function); later, they switched. This is due to GIL, and it is the reason Python does not have true multi-threading. It can achieve concurrency but not parallelism.

Try it with multiprocessing

To achieve parallelism, the way to go is the multiprocessing library. Here is another version with multiprocessing:

if __name__ == "__main__":
    num_arr = [random.randint(100, 10000) for _ in range(2000)]
    
    p1 = Process(target=get_prime_arr, args=(num_arr,))
    p2 = Process(target=get_prime_arr, args=(num_arr,))
    p3 = Process(target=get_prime_arr, args=(num_arr,))

    p1.start()
    p2.start()
    p3.start()

    p1.join()
    p2.join()
    p3.join()

To run it with VizTracer, you need an extra argument:

viztracer --log_multiprocess my_program.py

Image by:

^{(Tian Gao, CC BY-SA 4.0)}

The whole program finished in a little more than 50ms, with the actual task finishing before the 50ms mark. The program's speed roughly tripled.

To compare it with the multi-thread version, here is the multiprocess version:

Image by:

^{(Tian Gao, CC BY-SA 4.0)}

Without GIL, multiple processes can achieve parallelism, which means multiple is_prime functions can execute in parallel.

However, Python's multi-thread is not useless. For example, for computation-intensive and I/O-intensive programs, you can fake an I/O-bound task with sleep:

def io_task():
    time.sleep(0.01)

Try it in a single-thread, single-task program:

if __name__ == "__main__":
    for _ in range(3):
        io_task()

Image by:

^{(Tian Gao, CC BY-SA 4.0)}

The full program took about 30ms; nothing special.

Now use multi-thread:

if __name__ == "__main__":
    thread1 = Thread(target=io_task)
    thread2 = Thread(target=io_task)
    thread3 = Thread(target=io_task)

    thread1.start()
    thread2.start()
    thread3.start()

    thread1.join()
    thread2.join()
    thread3.join()

Image by:

^{(Tian Gao, CC BY-SA 4.0)}

The program took 10ms, and it's clear how the three threads worked concurrently and improved the overall performance.

Try it with asyncio

Python is trying to introduce another interesting feature called async programming. You can make an async version of this task:

import asyncio

async def io_task():
    await asyncio.sleep(0.01)

async def main():
    t1 = asyncio.create_task(io_task())
    t2 = asyncio.create_task(io_task())
    t3 = asyncio.create_task(io_task())

    await t1
    await t2
    await t3


if __name__ == "__main__":
    asyncio.run(main())

As asyncio is literally a single-thread scheduler with tasks, you can use VizTracer directly on it:

Image by:

<p class="rtecenter"><sup>(Tian Gao, <a href="https://opensource.com/%3Ca%20href%3D"https://creativecommons.org/licenses/by-sa/4.0/" rel="ugc">https://creativecommons.org/licenses/by-sa/4.0/" target="_blank">CC BY-SA 4.0</a>)</sup></p>

It still took 10ms, but most of the functions displayed are the underlying structure, which is probably not what users are interested in. To solve this, you can use --log_async to separate the real task:

viztracer --log_async my_program.py

Image by:

^{(Tian Gao, CC BY-SA 4.0)}

Now the user tasks are much clearer. For most of the time, no tasks are running (because the only thing it does is sleep). Here's the interesting part:

Image by:

^{(Tian Gao, CC BY-SA 4.0)}

This shows when the tasks were created and executed. Task-1 was the main() co-routine and created other tasks. Tasks 2, 3, and 4 executed io_task and sleep then waited for the wake-up. As the graph shows, there is no overlap between tasks because it's a single-thread program, and VizTracer visualized it this way to make it more understandable.

To make it more interesting, add a time.sleep call in the task to block the async loop:

async def io_task():
    time.sleep(0.01)
    await asyncio.sleep(0.01)

Image by:

^{(Tian Gao, CC BY-SA 4.0)}

The program took much longer (40ms), and the tasks filled the blanks in the async scheduler.

This feature is very helpful for diagnosing behavior and performance issues in async programs.

See what's happening with VizTracer

With VizTracer, you can see what's going on with your program on a timeline, rather than imaging it from complicated logs. This helps you understand your concurrent programs better.

VizTracer is open source, released under the Apache 2.0 license, and supports all common operating systems (Linux, macOS, and Windows). You can learn more about its features and access its source code in VizTracer's GitHub repository.