Bring PDFtk back to life in a container

This PDF merging and splitting tool hasn't been packaged in Fedora for a while, but that doesn't mean it's off limits to you.

A colleague recently told me about one of his favorite utilities, PDFtk. Among other things, it lets you merge, split, and burst PDF documents, with or without encryption. You can learn more about it in this article.

Unfortunately, PDFtk has not been packaged since Fedora 20 because of its build requirements. While various alternatives are available, you may still want to use PDFtk. Fortunately, there’s a simple solution: just package it in a container and run it on a more recent Fedora version.

Rather than reinventing the wheel, I did some quick research and found a GitHub repository with a README and a Dockerfile to build such a container.

First, make sure your Docker environment is configured by installing, enabling, and starting the Docker service:

$ sudo dnf install -y docker
$ sudo systemctl enable docker
$ sudo systemctl start docker
$ sudo systemctl status docker

The README file shows how to define an alias to run the container:

# alias pdftk='docker run -it --privileged -v $PWD:/workdir -w /workdir/ /pdftk'

Note that the --privileged option gives the container extended privileges on the host, and the process inside it runs as root by default. While this will allow the container to access the files we are working with (using the -v option), it will also cause new files to be owned by root, and running a container this way is not a good security practice. Before I get into that, revise the Docker build configuration as follows:

$ cat Dockerfile 
# Container build for pdftk (last packaged in Fedora 20)

FROM       fedora:20

# Update and install pdftk

RUN yum update -y &&		\
    yum install -y pdftk && 	\
    yum clean all

# Working directory

WORKDIR /workdir

# Set pdftk as our entry point

ENTRYPOINT ["/usr/bin/pdftk"]

CMD ["--help"]

This starts by pulling down the official Fedora 20 image. If your copy of the Dockerfile includes a MAINTAINER line, change the email address to your own if you like. Next, it updates all packages, installs PDFtk, and removes cached files. Chaining these three commands in a single RUN instruction creates only one additional layer in the image instead of three.

The WORKDIR instruction sets the working directory inside the container, where PDFtk runs and where new files will be created.

Finally, it sets the PDFtk binary as the entry point to the container, with CMD providing the --help option should no arguments be passed to the container.
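To make the ENTRYPOINT and CMD interaction concrete, here is a toy shell model of it. This is not Docker itself: run_container is a hypothetical function name, and it only prints the command line that would be executed, assuming the exec-form ENTRYPOINT and CMD from the Dockerfile above.

```shell
# Toy model only: mimics how Docker combines ENTRYPOINT with CMD or with
# the arguments given after the image name. Nothing is actually executed.
run_container() {
    entrypoint="/usr/bin/pdftk"     # fixed by ENTRYPOINT
    if [ "$#" -eq 0 ]; then
        set -- --help               # CMD supplies the default arguments
    fi
    echo "$entrypoint $*"
}

run_container                       # prints: /usr/bin/pdftk --help
run_container A=pdf1.pdf B=pdf2.pdf cat A B output pdf12.pdf
```

In other words, any arguments you pass after the image name replace the CMD default and are appended to the entry point.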

You can now build the container with your revised Dockerfile as follows:

$ sudo docker build -t fedora/pdftk .

and examine the new image:

$ sudo docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
fedora/pdftk        latest              f2eaa35d31c8        3 seconds ago       595 MB
                    20                  ba74bddb630e        20 months ago       291 MB

To run the PDFtk container in the same manner as the standalone binary, use the following wrapper script:

$ cat ~/bin/pdftk
#!/bin/bash

# Run the pdftk container and pass all arguments to this script to the container:
#	--rm remove instantiated container after execution
#	 -u  run with current UID/GID to give new files correct ownership
#	 -v  attach current working directory to /workdir inside the container
#	     and modify SELinux security context ("z") to allow container access to files

# Quote the expansions so paths containing spaces survive word splitting.
sudo docker run 		\
	--rm			\
	 -u "$(id -u):$(id -g)" 	\
	 -v "$PWD:/workdir:z" 	\
	fedora/pdftk "$@"

# Files will now have SELinux type container_file_t so we need to restore context:

restorecon "$PWD"/*.pdf


Instead of running the container as root, this script passes your user identifier (UID) and group identifier (GID), allowing new files to have the correct owner and group. Your current working directory is mapped to /workdir inside the container, and appending :z to the volume mapping tells Docker to relabel the directory's SELinux context so that the container can access it. This, of course, assumes you have SELinux enabled. If you disable SELinux, you will make Dan Walsh sad; Dan is a nice guy, so please don’t do it.
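If you want to see what the wrapper script actually hands to Docker, a dry run like the following may help. It is only a sketch: the variable names uid_gid, volume, and dry_run are made up for illustration, and the command is printed rather than executed.

```shell
# Dry run: show the values the wrapper passes to docker without starting a container.
uid_gid="$(id -u):$(id -g)"      # value for -u: your own UID and GID
volume="$PWD:/workdir:z"         # value for -v: ":z" requests SELinux relabeling
dry_run="docker run --rm -u $uid_gid -v $volume fedora/pdftk"
echo "$dry_run"
```

The numbers printed after -u should match the owner and group you see on files that PDFtk creates through the wrapper.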

After the container terminates, the script uses restorecon to restore the SELinux context from container_file_t to user_home_t (assuming you’re working under your home directory). While you can still access the files with the container context, running restorecon tidies things up.
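If the wrapper might also be used on a machine without SELinux, one possible variation is to guard the restorecon step. This is a sketch, not part of the original script: restore_pdf_contexts is a hypothetical helper name, and it assumes the selinuxenabled utility is available wherever SELinux is installed.

```shell
# Hypothetical helper: restore contexts only when SELinux is actually enabled.
restore_pdf_contexts() {
    # selinuxenabled exits 0 only when SELinux is enabled on this host.
    if command -v selinuxenabled >/dev/null 2>&1 && selinuxenabled; then
        for f in "$PWD"/*.pdf; do
            [ -e "$f" ] || continue     # skip when no PDF matched the glob
            restorecon -v "$f"          # -v reports each context change
        done
    else
        echo "SELinux is not enabled; nothing to restore"
    fi
}

restore_pdf_contexts
```

Running ls -Z on a PDF before and after is a quick way to confirm the type actually changed.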

With the container built and the wrapper script in place, you can now run PDFtk as you did before.

For example:

$ pdftk A=pdf1.pdf B=pdf2.pdf cat A B output pdf12.pdf

Bringing older applications back to life is a great use case for containers. What other problems have you solved using containers? Let me know in the comment section below.

Curtis Rempel, RHCA, is a system design engineer with the Red Hat Certification Team. His Linux journey started in 1994 with a Red Hat Linux 2.1 CD from Jon “maddog” Hall.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.