Use Python to parse configuration files | Opensource.com
The first step is choosing a configuration format: INI, JSON, YAML, or TOML.
Sometimes, a program needs enough parameters that passing them all as command-line arguments or environment variables is neither pleasant nor feasible. In those cases, you will want to use a configuration file.
There are several popular formats for configuration files. Among them are the venerable (although occasionally under-defined) INI format, the popular but sometimes hard-to-write-by-hand JSON format, the extensive yet occasionally surprising YAML format, and the newest addition, TOML, which many people have not heard of yet.
Your first task is to choose a format and then to document that choice. With this easy part out of the way, it is time to parse the configuration.
It is sometimes a good idea to have a class that corresponds to the "abstract" data in the configuration. Because this code will do nothing with the configuration, this is the simplest way to show parsing logic.
Imagine the configuration for a file processor: it includes an input directory, an output directory, and which files to pick up.
The abstract definition for the configuration class might look something like:
from __future__ import annotations
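Only the first line of the definition appears above. As a minimal sketch of what such a class could look like, the following assumes the standard library's dataclasses module and field names (input_dir, output_dir, patterns) taken from the description of the file processor; the actual definition may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Configuration:
    """Abstract configuration for a hypothetical file processor."""

    @dataclass(frozen=True)
    class Files:
        input_dir: str   # assumed field name
        output_dir: str  # assumed field name

    @dataclass(frozen=True)
    class Parameters:
        patterns: List[str]  # assumed field name

    files: Files
    parameters: Parameters


# Building an instance by hand, independent of any file format.
config = Configuration(
    files=Configuration.Files(input_dir="inputs", output_dir="outputs"),
    parameters=Configuration.Parameters(patterns=["*.txt", "*.md"]),
)
```

Freezing the classes keeps the configuration read-only after it is parsed, which is usually what you want from configuration data.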
To make the format-specific code simpler, you will also write a function that builds this class from dictionaries. Note that the function assumes the configuration keys use dashes (for example, input-dir), while the Python attribute names use underscores. This kind of discrepancy is not uncommon.
files = Configuration.Files(
parameters = Configuration.Parameters(
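The function body is only partly shown above. One possible complete version is sketched below. The top-level keys "files" and "parameters" and the stand-in dataclass are assumptions made for illustration; the dashed option names follow the convention described in the text.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping


@dataclass(frozen=True)
class Configuration:
    # Stand-in for the configuration class; field names are assumed.
    @dataclass(frozen=True)
    class Files:
        input_dir: str
        output_dir: str

    @dataclass(frozen=True)
    class Parameters:
        patterns: List[str]

    files: Files
    parameters: Parameters


def configuration_from_dict(details: Mapping[str, Any]) -> Configuration:
    # Map dashed configuration keys onto underscored attribute names.
    files = Configuration.Files(
        input_dir=details["files"]["input-dir"],
        output_dir=details["files"]["output-dir"],
    )
    parameters = Configuration.Parameters(
        patterns=details["parameters"]["patterns"],
    )
    return Configuration(files=files, parameters=parameters)


example = configuration_from_dict(
    {
        "files": {"input-dir": "inputs", "output-dir": "outputs"},
        "parameters": {"patterns": ["*.txt", "*.md"]},
    }
)
```

Because the function only uses item access, it works with any mapping-like object, not just a plain dict.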
Here is an example configuration in JSON format:
json_config = """
The parsing logic uses the json module to turn the JSON into Python's built-in data structures (dictionaries, lists, strings) and then creates the class from the dictionary:
parsed = json.loads(data)
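To make the JSON step concrete, here is a small self-contained sketch. The document layout (top-level "files" and "parameters" objects) is an assumption based on the configuration described earlier; the result is the kind of dictionary that would then be handed to configuration_from_dict.

```python
import json

# Assumed layout: two top-level objects with dashed key names.
json_config = """
{
  "files": {"input-dir": "inputs", "output-dir": "outputs"},
  "parameters": {"patterns": ["*.txt", "*.md"]}
}
"""

# json.loads returns plain dicts, lists, and strings.
parsed = json.loads(json_config)

print(parsed["files"]["input-dir"])      # inputs
print(parsed["parameters"]["patterns"])  # ['*.txt', '*.md']
```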
The INI format, originally popular on Windows, became a de facto configuration standard.
Here is the same configuration as an INI:
[files]
input-dir = inputs
output-dir = outputs

[parameters]
patterns = ['*.txt', '*.md']
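Python can parse it using the built-in configparser module. Note that configparser returns every value as a string, so patterns arrives as text rather than as a list. The parser behaves as a dict-like object, so it can be passed directly to configuration_from_dict: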
parser = configparser.ConfigParser()
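The following sketch shows the configparser step end to end. The section names [files] and [parameters] are assumptions, and using ast.literal_eval to turn the patterns value back into a list is one possible approach for illustration, not something the INI format itself defines.

```python
import ast
import configparser

# Assumed section names; option names use dashes as described earlier.
ini_config = """
[files]
input-dir = inputs
output-dir = outputs

[parameters]
patterns = ['*.txt', '*.md']
"""

parser = configparser.ConfigParser()
parser.read_string(ini_config)

# Sections behave like dicts, but every value is a string.
raw_patterns = parser["parameters"]["patterns"]

# The value happens to be written as a Python literal, so it can be
# evaluated safely into a real list.
patterns = ast.literal_eval(raw_patterns)

# A plain dict with proper Python types.
parsed = {
    "files": dict(parser["files"]),
    "parameters": {"patterns": patterns},
}
```

If the list were written in another style (for example, comma-separated), a simple split would be the more natural conversion.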
YAML (originally "Yet Another Markup Language," now the recursive "YAML Ain't Markup Language") is a superset of JSON that is designed to be easier to write by hand. It accomplishes this, in part, by having a long specification.
Here is the same configuration in YAML:
yaml_config = """
For Python to parse this, you will need to install a third-party module. The most popular is PyYAML (pip install pyyaml). The YAML parser also returns built-in Python data types that can be passed to configuration_from_dict. The parser accepts either a string or a stream; the example here wraps the string in a stream.
fp = io.StringIO(data)
parsed = yaml.safe_load(fp)
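As a hedged illustration of the YAML step, the sketch below assumes PyYAML is installed and uses an assumed document layout mirroring the other formats. It loads the same text once as a string and once as a stream.

```python
import io

import yaml  # third-party: pip install pyyaml

# Assumed layout. The patterns are quoted because a bare * starts a
# YAML alias.
yaml_config = """
files:
  input-dir: inputs
  output-dir: outputs
parameters:
  patterns:
    - '*.txt'
    - '*.md'
"""

# safe_load accepts a string directly...
from_string = yaml.safe_load(yaml_config)

# ...or a file-like object, such as an open file or a StringIO.
from_stream = yaml.safe_load(io.StringIO(yaml_config))
```

Using safe_load rather than load avoids constructing arbitrary Python objects from untrusted input.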
TOML (Tom's Obvious, Minimal Language) is designed to be a lightweight alternative to YAML. The specification is shorter, and it is already popular in some places (for example, Rust's package manager, Cargo, uses it for package configuration).
Here is the same configuration as a TOML:
toml_config = """
[files]
input-dir = "inputs"
output-dir = "outputs"

[parameters]
patterns = [ "*.txt", "*.md",]
"""
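In order to parse TOML on Python versions before 3.11, you need to install a third-party package. The most popular one is called, simply, toml. (Python 3.11 and later include a TOML parser, tomllib, in the standard library.) Like the YAML and JSON parsers, it returns basic Python data types.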
parsed = toml.loads(data)
Choosing a configuration format is a subtle tradeoff. However, once you make the decision, Python can parse most of the popular formats using a handful of lines of code.