Packaging Python modules with wheels

By using a CI/CD build system, providing Python packages in the advantageous wheel format becomes a breeze.
[Image: Hands on a keyboard with a Python book. WOCinTech Chat, modified by Opensource.com, CC BY-SA 4.0]

Anyone who has worked with Python for a while has probably come across packages. In Python terminology, packages (or distribution packages) are collections of one or more Python modules that provide specific functionality. The general concept is comparable to libraries in other languages. Some peculiarities of Python packages, however, make dealing with them different.

Pip and PyPI

The most common way to install a third-party Python package is to use the package installer pip, which ships with Python by default. The Python Package Index (PyPI) is the central server for packages of all kinds and the default source for pip. Python packages contain files that specify the package name, version, and other meta information. Based on those files, PyPI knows how to classify and index a package. In addition, those files may include installation instructions that pip processes.
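The meta information mentioned above stays available after installation. The following is a minimal sketch, using only the standard library's importlib.metadata, of how to read it back; the helper name describe and the use of pip as the example distribution are illustrative assumptions, not part of the demo repository.

```python
from importlib import metadata


def describe(dist_name):
    """Return name, version, and summary of an installed distribution, or None."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        # Nothing with that name is installed in this environment.
        return None
    return {
        "name": meta.get("Name"),
        "version": meta.get("Version"),
        "summary": meta.get("Summary"),
    }


if __name__ == "__main__":
    # "pip" is only an example; any installed distribution name works.
    print(describe("pip"))
```

The function returns None instead of raising, so it can be used to probe for optional dependencies.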

Source and binary distribution

Python modules are distributed in several formats, each with pros and cons. In general, the formats can be divided into two groups.

Source distribution (sdist)

Source distributions are defined in PEP 517 and are gzipped tar archives with the file ending .tar.gz. The archive contains all package-related source files and installation instructions. A source distribution often depends on a build system like distutils or setuptools, which causes code to be executed during installation. The execution of (arbitrary) code upon installation may raise security concerns.

In the case of a Python C/C++ extension, a source distribution contains plain C/C++ files. These must be compiled upon installation, so an appropriate C/C++ toolchain must be present.
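Because a source distribution is an ordinary gzipped tar archive, you can inspect it before installing it. The sketch below uses the standard tarfile module; the helper names list_sdist and has_c_sources are hypothetical, and the list of file endings treated as C/C++ sources is an assumption.

```python
import tarfile


def list_sdist(path):
    """Return the member names of a gzipped tar source distribution."""
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()


def has_c_sources(names):
    """True if any member looks like a C or C++ source file."""
    return any(n.endswith((".c", ".cc", ".cpp", ".cxx")) for n in names)
```

If has_c_sources(list_sdist(path)) is true, installing from that archive most likely needs a C/C++ toolchain.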

Built distributions (bdist)

In contrast, you can often use a built distribution as is. The idea behind built distributions is to provide a package format without introducing additional dependencies. For a Python C/C++ extension, a built distribution provides binaries ready for the user's platform.

The most widely used built distribution format is the Python wheel, specified in PEP 427.

Python wheels

Wheels are ZIP archives with the file ending .whl. A wheel may contain binaries, scripts, or plain Python files. If a wheel contains binaries of a C/C++ extension module, it indicates that by including its target platform in its filename. Pure Python files (.py) are compiled into Python byte code (.pyc) during the installation of the wheel.
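Since a wheel is a ZIP archive, the standard zipfile module is enough to look inside one. This is a minimal sketch: the helper name wheel_summary is hypothetical, and it relies on the WHEEL file that PEP 427 places in the .dist-info directory of every wheel.

```python
import zipfile


def wheel_summary(path):
    """Return the member names and the parsed WHEEL metadata of a wheel."""
    with zipfile.ZipFile(path) as wheel:
        names = wheel.namelist()
        # Every wheel carries a <name>-<version>.dist-info/WHEEL file.
        wheel_file = next(n for n in names if n.endswith(".dist-info/WHEEL"))
        text = wheel.read(wheel_file).decode("utf-8")
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            # Keys such as Tag may appear more than once, so collect lists.
            fields.setdefault(key, []).append(value)
    return names, fields
```

The Tag entries in the returned fields repeat the platform information that is also encoded in the file name.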

If you install a package from PyPI using pip, pip prefers a compatible Python wheel over a source distribution. Only when it cannot find a compatible wheel does it fall back to the source distribution. As a package maintainer, it's good practice to provide both formats on PyPI. For a package user, wheels are advantageous over source distributions because of the safer installation process, their smaller size, and, as a result, faster installation time.
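The preference described here can be illustrated with a deliberately simplified sketch. It is not pip's actual algorithm: pip also handles compressed tag sets, build tags, version ordering, and tag priorities. The function name pick_distribution and the shape of supported_tags (a set of python, ABI, platform triples) are assumptions made for this example.

```python
def pick_distribution(filenames, supported_tags):
    """Prefer the first wheel whose tag triple is supported, else an sdist."""
    for name in filenames:
        if name.endswith(".whl"):
            # The last three dash-separated fields are python, abi, platform.
            tag = tuple(name[: -len(".whl")].split("-")[-3:])
            if tag in supported_tags:
                return name
    # No compatible wheel: fall back to the source distribution, if any.
    for name in filenames:
        if name.endswith(".tar.gz"):
            return name
    return None


files = [
    "MyModule-0.0.1.tar.gz",
    "MyModule-0.0.1-cp39-cp39-linux_x86_64.whl",
]
print(pick_distribution(files, {("cp39", "cp39", "linux_x86_64")}))
print(pick_distribution(files, {("cp311", "cp311", "win_amd64")}))
```

The first call selects the wheel; the second finds no matching wheel and falls back to the .tar.gz file, which is why offering both formats keeps a package installable everywhere.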

To address a wide range of users, the package maintainer must offer wheels for various platforms and Python versions.

In one of my previous articles, Write a C++ extension module for Python, I demonstrated how to create a Python C++ extension for the CPython interpreter. You can re-use the article's example code to build your first wheel.

Defining the build configuration with setuptools

The demo repository contains the following files, which contain meta information and a description of the build process:

pyproject.toml

[build-system]
requires = [
    "setuptools>=58"
]

build-backend = "setuptools.build_meta"

This file has been the successor of setup.py since PEP 517 and PEP 518 and is the entry point for the packaging process. The build-backend key tells pip to use setuptools as the build system.

setup.cfg

This file contains the static, never changing metadata of the package:

[metadata]
name = MyModule
version = 0.0.1

description = Example C/C++ extension module
long_description = Does nothing except incrementing a number
license = GPLv3
classifiers =
    Operating System :: Microsoft
    Operating System :: POSIX :: Linux
    Programming Language :: C++
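The setup.cfg file uses INI syntax, so the standard configparser module can read it. The sketch below parses an abbreviated copy of the metadata as a string; the helper name read_metadata is hypothetical, and the classifiers are written with spaces around the double colons, which is the spelling used by the official classifier list.

```python
import configparser

SETUP_CFG = """\
[metadata]
name = MyModule
version = 0.0.1
classifiers =
    Operating System :: POSIX :: Linux
    Programming Language :: C++
"""


def read_metadata(text):
    """Parse the [metadata] section of a setup.cfg-style string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["metadata"]
    # Multi-line values arrive as one string; split them into a list.
    raw = section.get("classifiers", "")
    classifiers = [line.strip() for line in raw.splitlines() if line.strip()]
    return section["name"], section["version"], classifiers


print(read_metadata(SETUP_CFG))
```

Indented continuation lines belong to the preceding key, which is how the classifiers list spans several lines.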

setup.py

This file defines the generic build process for the Python module. Every action which must be performed at installation time goes here.

Due to security concerns, this file should only be present if absolutely necessary.

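from setuptools import setup, Extension

# Describe the C++ extension: import name, source files, and compiler flags
MyModule = Extension(
    'MyModule',
    sources=['my_py_module.cpp', 'my_class_py_type.cpp'],
    extra_compile_args=['-std=c++17'],
)

# Name, version, and the other metadata are read from setup.cfg
setup(ext_modules=[MyModule])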

This example package is actually a Python C/C++ extension, so it requires a C/C++ toolchain on the user's system to compile. In the previous article, I used CMake to generate the build configuration. This time, I'm using setuptools for the build process. I faced challenges when running CMake inside a build container (I'll come back to that point later). The setup.py file contains all the information required to build the extension module.

In this example, setup.py lists the involved source files and some (optional) compile arguments. You can find a reference to the setuptools build in the documentation.
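The compile argument -std=c++17 is written in the style of GCC and Clang. If the same setup.py is later built on Windows with MSVC, the flag is spelled differently. The following sketch shows one possible way to choose the flag per platform; the helper name cxx17_compile_args is hypothetical and not part of the demo repository.

```python
import sys


def cxx17_compile_args(platform=sys.platform):
    """Return a C++17 flag for the compiler typically used on that platform."""
    if platform == "win32":
        # MSVC spells the language-standard option differently.
        return ["/std:c++17"]
    # GCC and Clang on Linux and macOS accept the GNU-style flag.
    return ["-std=c++17"]
```

In setup.py, the result could be passed as extra_compile_args=cxx17_compile_args() when constructing the Extension.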

Build process

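To start the build process, open a terminal in the root folder of the repository and run the following command (it requires the build package, which you can install with python3 -m pip install build):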

$ python3 -m build --wheel

Afterward, find the subfolder dist containing a .whl file. For example:

MyModule-0.0.1-cp39-cp39-linux_x86_64.whl

The file name carries a lot of information. After the module name (MyModule) and version (0.0.1), it specifies the Python interpreter and ABI the wheel was built for (cp39, that is, CPython 3.9) and the target platform (linux_x86_64).
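The naming scheme can be taken apart programmatically. Below is a minimal sketch that splits a wheel file name into the fields defined by PEP 427; the function is written for this article, and for production use the packaging library offers packaging.utils.parse_wheel_filename.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of fields: " + filename)
    # An optional build tag sits between the version and the Python tag.
    build = parts[2] if len(parts) == 6 else None
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


print(parse_wheel_filename("MyModule-0.0.1-cp39-cp39-linux_x86_64.whl"))
```

Splitting on dashes works because the specification requires dashes inside the name and version to be replaced by underscores.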

At this point, you can install and test the newly created wheel:

$ python3 -m venv venv_test_wheel/

$ source venv_test_wheel/bin/activate

$ python3 -m pip install dist/MyModule-0.0.1-cp39-cp39-linux_x86_64.whl

[Image: Wheel package (Stephan Avenwedde, CC BY-SA 4.0)]
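After installing the wheel into the virtual environment, a quick check confirms that the interpreter can find the extension. This sketch only verifies that the module can be located; it does not call any function of MyModule, because its API is described in the previous article rather than here. The helper name is_importable is hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the current interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "MyModule" is the extension name declared in setup.py.
    print("MyModule importable:", is_importable("MyModule"))
```

Run it with the virtual environment activated; outside of it, the check is expected to report that the module is missing.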

Now you have one wheel, which you can forward to someone using the same interpreter on the same architecture. This is the bare minimum, so I'll go one step further and show you how to create wheels for other platforms.

Build configuration

As a package maintainer, you should provide a suitable wheel for as many platforms as possible. Luckily, there are tools to make this easy for you.

Maintaining Linux compatibility

When building Python C/C++ extensions, the resulting binaries are linked against the standard libraries of the machine they are built on. This can cause incompatibilities on Linux, with its various versions of glibc. A Python C/C++ extension module built on one Linux system may not work on another, otherwise comparable Linux system due to, for example, a missing shared library. To avert such scenarios, PEP 513 proposed a tag for wheels that work on many Linux platforms: manylinux.

Building for the manylinux platform causes linking against a defined kernel and userspace ABI. Modules that conform to this standard are expected to work on many Linux systems. The manylinux tag evolved over time, and in its latest standard (PEP 600), it directly names the glibc version the module was linked against (manylinux_2_17_x86_64, for example, stands for glibc 2.17 on x86_64).
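The PEP 600 tag can be read mechanically. The following sketch parses such a tag and compares it with a given glibc version; it covers only the glibc version and architecture check, not the full compatibility rules of PEP 600, and the function names are hypothetical. On a Linux system, platform.libc_ver() from the standard library reports the local glibc version.

```python
import re

_MANYLINUX = re.compile(r"^manylinux_(\d+)_(\d+)_(.+)$")


def parse_manylinux(tag):
    """Return (glibc_major, glibc_minor, arch) for a PEP 600 platform tag."""
    match = _MANYLINUX.match(tag)
    if match is None:
        raise ValueError("not a PEP 600 manylinux tag: " + tag)
    return int(match.group(1)), int(match.group(2)), match.group(3)


def glibc_satisfies(tag, system_glibc, system_arch):
    """True if the system's glibc is at least as new as the tag requires."""
    major, minor, arch = parse_manylinux(tag)
    return arch == system_arch and tuple(system_glibc) >= (major, minor)


# A system with glibc 2.31 can load a wheel linked against glibc 2.17.
print(glibc_satisfies("manylinux_2_17_x86_64", (2, 31), "x86_64"))
```

Linking against an older glibc therefore widens the range of systems a wheel can run on.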

In addition to manylinux, there is the musllinux platform (PEP 656), which defines a build configuration for distributions utilizing musl libc like Alpine Linux.

CI build wheel

The cibuildwheel project provides CI build configurations for many platforms and the most widely used CI/CD systems.

Many Git hosting platforms have CI/CD features built in. The demo project is hosted on GitHub, so you can use GitHub Actions as the CI server. Just follow the cibuildwheel instructions for GitHub Actions and provide a workflow file in your repository: .github/workflows/build_wheels.yml.

[Image: CI integration (Stephan Avenwedde, CC BY-SA 4.0)]

A push to GitHub triggers the workflow. After the workflow has finished (note that it took over 15 minutes to complete), you can download an archive containing a wheel for various platforms:

[Image: Archive for various platforms (Stephan Avenwedde, CC BY-SA 4.0)]
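To get an overview of what the workflow produced, you can group the wheels in the downloaded archive by platform. This is a sketch under the assumption that the archive is a ZIP file containing .whl files; the function name wheels_by_platform is hypothetical.

```python
import zipfile
from collections import defaultdict


def wheels_by_platform(archive_path):
    """Group the wheel file names inside a ZIP archive by platform tag."""
    groups = defaultdict(list)
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.namelist():
            if member.endswith(".whl"):
                # Drop any directory prefix inside the archive.
                filename = member.rsplit("/", 1)[-1]
                # The platform tag is the last dash-separated field.
                platform_tag = filename[: -len(".whl")].rsplit("-", 1)[-1]
                groups[platform_tag].append(filename)
    return dict(groups)
```

The returned dictionary maps each platform tag to the wheels built for it, which makes gaps in the platform coverage easy to spot.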

You still have to package those wheels manually if you want to publish them on PyPI. Using CI/CD, it's possible to automate the delivery process to PyPI. You can find further instructions in the cibuildwheel documentation.

Wrap up

The various formats can make packaging Python modules a confusing process for beginners. Knowledge about the different package formats, their purpose, and the tools involved in the packaging process is necessary for package maintainers. I hope this article sheds light on the world of Python packaging. In the end, by using a CI/CD build system, providing packages in the advantageous wheel format becomes a breeze.

Stephan is a technology enthusiast who appreciates open source for the deep insight into how things work. Stephan works as a full-time support engineer in the mostly proprietary area of industrial automation software. If possible, he works on his Python-based open source projects, writes articles, or rides his motorbike.


This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.