Use Python to parse configuration files

The first step is choosing a configuration format: INI, JSON, YAML, or TOML.
Opensource.com

Sometimes, a program needs so many parameters that passing them all as command-line arguments or environment variables is neither pleasant nor feasible. In those cases, you will want to use a configuration file.

There are several popular formats for configuration files. Among them are the venerable (although occasionally under-defined) INI format; JSON, which is popular but sometimes hard to write by hand; YAML, which is extensive yet occasionally surprising in its details; and the newest addition, TOML, which many people have not heard of yet.

Your first task is to choose a format and then to document that choice. With this easy part out of the way, it is time to parse the configuration.

It is sometimes a good idea to have a class that corresponds to the "abstract" data in the configuration. Because the code in this article does nothing with the configuration beyond parsing it, such a class is the simplest way to show the parsing logic.

Imagine the configuration for a file processor: it includes an input directory, an output directory, and which files to pick up.

The abstract definition for the configuration class might look something like:

from __future__ import annotations
from typing import List
import attr

@attr.frozen
class Configuration:
    @attr.frozen
    class Files:
        input_dir: str
        output_dir: str
    files: Files
    @attr.frozen
    class Parameters:
        patterns: List[str]
    parameters: Parameters

To make the format-specific code simpler, you will also write a function to parse this class out of dictionaries. Note that this assumes the configuration keys use dashes (input-dir), while the Python attributes use underscores (input_dir). This kind of discrepancy is not uncommon.

def configuration_from_dict(details):
    files = Configuration.Files(
        input_dir=details["files"]["input-dir"],
        output_dir=details["files"]["output-dir"],
    )
    parameters = Configuration.Parameters(
        patterns=details["parameters"]["patterns"]
    )
    return Configuration(
        files=files,
        parameters=parameters,
    )
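
A missing key in the dictionary surfaces as a bare KeyError, which is not very helpful to whoever wrote the configuration file. The sketch below shows one way to report it more clearly. To stay runnable without third-party packages, it swaps in flat dataclasses for the attrs-based classes above. ConfigurationError and configuration_from_dict_checked are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import List


# Stand-ins for the attrs classes above (an assumption made so that
# this sketch runs with the standard library only).
@dataclass(frozen=True)
class Files:
    input_dir: str
    output_dir: str


@dataclass(frozen=True)
class Parameters:
    patterns: List[str]


@dataclass(frozen=True)
class Configuration:
    files: Files
    parameters: Parameters


class ConfigurationError(ValueError):
    """Hypothetical error raised when a required key is absent."""


def configuration_from_dict_checked(details):
    # Same lookups as configuration_from_dict, with KeyError translated
    # into a message that names the missing key.
    try:
        files = Files(
            input_dir=details["files"]["input-dir"],
            output_dir=details["files"]["output-dir"],
        )
        parameters = Parameters(patterns=details["parameters"]["patterns"])
    except KeyError as exc:
        raise ConfigurationError(
            f"missing configuration key: {exc.args[0]!r}"
        ) from exc
    return Configuration(files=files, parameters=parameters)


config = configuration_from_dict_checked(
    {
        "files": {"input-dir": "inputs", "output-dir": "outputs"},
        "parameters": {"patterns": ["*.txt", "*.md"]},
    }
)
print(config.files.input_dir)
```

The dashed keys appear only inside the parsing function, so the rest of the program can use ordinary attribute names.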

JSON

JSON (JavaScript Object Notation) is a JavaScript-like format.

Here is an example configuration in JSON format:

json_config = """
{
    "files": {
        "input-dir": "inputs",
        "output-dir": "outputs"
    },
    "parameters": {
        "patterns": [
            "*.txt",
            "*.md"
        ]
    }
}
"""

The parsing logic parses the JSON into Python's built-in data structures (dictionaries, lists, strings) using the json module and then creates the class from the dictionary:

import json
def configuration_from_json(data):
    parsed = json.loads(data)
    return configuration_from_dict(parsed)
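
Because hand-written JSON often contains small mistakes such as a trailing comma, it can help to see what the parser returns and how it reports errors. The following is a minimal sketch using only the json module. load_json_or_explain is a hypothetical helper name, not part of the code above.

```python
import json

json_config = """
{
    "files": {
        "input-dir": "inputs",
        "output-dir": "outputs"
    },
    "parameters": {
        "patterns": ["*.txt", "*.md"]
    }
}
"""

# json.loads returns plain dictionaries, lists, and strings.
parsed = json.loads(json_config)
print(type(parsed).__name__)
print(parsed["parameters"]["patterns"])


def load_json_or_explain(data):
    # Hypothetical helper: turn a decode error into a message that
    # points at the offending line and column.
    try:
        return json.loads(data)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
        ) from exc
```

For example, calling load_json_or_explain('{"a": 1,}') raises a ValueError, because JSON does not allow a trailing comma.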

INI

The INI format, originally popular on Windows, became a de facto configuration standard.

Here is the same configuration as an INI:

ini_config = """
[files]
input-dir = inputs
output-dir = outputs

[parameters]
patterns = ['*.txt', '*.md']
"""

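Python can parse it using the built-in configparser module. The parser behaves as a dict-like object, so it can be passed directly to configuration_from_dict. Keep in mind that configparser returns every value as a string, so patterns arrives as the text ['*.txt', '*.md'] rather than as a Python list: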

import configparser

def configuration_from_ini(data):
    parser = configparser.ConfigParser()
    parser.read_string(data)
    return configuration_from_dict(parser)
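
Since configparser hands back strings only, the patterns value needs one more step before it is a real list. The sketch below is one possible approach, assuming the value is written as a Python list literal, as in the example above. It uses ast.literal_eval, which evaluates literals without executing arbitrary code.

```python
import ast
import configparser

ini_config = """
[files]
input-dir = inputs
output-dir = outputs

[parameters]
patterns = ['*.txt', '*.md']
"""

parser = configparser.ConfigParser()
parser.read_string(ini_config)

# Every value read by configparser is a string, including this one.
raw_patterns = parser["parameters"]["patterns"]
print(type(raw_patterns).__name__)

# Convert the list literal into an actual list of strings.
patterns = ast.literal_eval(raw_patterns)
print(patterns)
```

A different convention, such as a comma-separated value split with str.split, would work equally well. Whichever you choose belongs in the documentation of your configuration format.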

YAML

YAML (originally "Yet Another Markup Language," now officially "YAML Ain't Markup Language") is an extension of JSON that is designed to be easier to write by hand. It accomplishes this, in part, by having a long specification.

Here is the same configuration in YAML:

yaml_config = """
files:
  input-dir: inputs
  output-dir: outputs
parameters:
  patterns:
  - '*.txt'
  - '*.md'
"""

For Python to parse this, you will need to install a third-party module. The most popular is PyYAML (pip install pyyaml). The YAML parser also returns built-in Python data types that can be passed to configuration_from_dict. The yaml.safe_load function accepts either a string or an open stream, so the string can be passed in directly.

import yaml
def configuration_from_yaml(data):
    # safe_load builds only plain Python types, never arbitrary objects
    parsed = yaml.safe_load(data)
    return configuration_from_dict(parsed)

TOML

TOML (Tom's Obvious, Minimal Language) is designed to be a lightweight alternative to YAML. The specification is shorter, and it is already popular in some places (for example, Rust's package manager, Cargo, uses it for package configuration).

Here is the same configuration as a TOML:

toml_config = """
[files]
input-dir = "inputs"
output-dir = "outputs"

[parameters]
patterns = [ "*.txt", "*.md",]
"""

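On Python versions before 3.11, parsing TOML requires a third-party package. The most popular one is called, simply, toml. (Python 3.11 and later also include a read-only TOML parser, tomllib, in the standard library.) Like YAML and JSON, it returns basic Python data types.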

import toml
def configuration_from_toml(data):
    parsed = toml.loads(data)
    return configuration_from_dict(parsed)
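
On Python 3.11 or later, the same parsing can be done without installing anything, using the standard library's tomllib module. The following sketch assumes such an interpreter. On older versions it falls back to the tomli backport, which would need to be installed separately.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:
    # Assumption: the tomli backport (same API) is installed on older Pythons.
    import tomli as tomllib

toml_config = """
[files]
input-dir = "inputs"
output-dir = "outputs"

[parameters]
patterns = [ "*.txt", "*.md",]
"""

# tomllib.loads takes a string and returns nested dicts, lists, and strings.
parsed = tomllib.loads(toml_config)
print(parsed["files"]["input-dir"])
print(parsed["parameters"]["patterns"])
```

The resulting dictionary has the same shape as the one produced from JSON, so it can be passed to configuration_from_dict unchanged. Note that tomllib only reads TOML. It does not write it.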

Summary

Choosing a configuration format is a subtle tradeoff. However, once you make the decision, Python can parse most of the popular formats using a handful of lines of code.

Moshe has been involved in the Linux community since 1998, helping in Linux "installation parties". He has been programming Python since 1999, and has contributed to the core Python interpreter. Moshe has been a DevOps/SRE since before those terms existed, caring deeply about software reliability, build reproducibility and other such things.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.