Bring PDFtk back to life in a container
This PDF merging and splitting tool hasn't been packaged in Fedora for a while, but that doesn't mean it's off-limits to you.
A colleague recently told me about one of his favorite utilities, PDFtk. Among other things, it lets you merge, split, and burst PDF documents, with or without encryption. You can learn more about it in this Opensource.com article.
Unfortunately, because of its build requirements, PDFtk was last packaged in Fedora 20. Various alternatives are available, but you may still want to use PDFtk. Fortunately, there’s a simple solution: package it in a container and run it on a more recent Fedora version.
Rather than reinventing the wheel, I did some quick research and found a GitHub repository with a README and a Dockerfile to build such a container.
First, make sure your Docker environment is configured by installing, enabling, and starting the Docker service:
$ sudo dnf install -y docker
$ sudo systemctl enable docker
$ sudo systemctl start docker
$ sudo systemctl status docker
The README file shows how to define an alias to run the container:
# alias pdftk='docker run -it --privileged -v $PWD:/workdir -w /workdir/ /pdftk'
Note that the --privileged option lifts most of the container's isolation from the host. Because no user is specified, the container also runs as root. This lets the container access the files you are working with (mounted with the -v option), but it also means new files will be owned by root. Running a container as root is also not a security best practice. Before I get into that, revise the Docker build configuration as follows:
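If you have already used the alias, you can find any files it left behind that you don't own. Below is a minimal sketch of a hypothetical helper; the name list_foreign_files is mine and is not part of PDFtk or Docker. It uses find to list regular files in one directory whose owner UID differs from yours.

```shell
#!/bin/bash
# Hypothetical helper: list regular files in a directory (default: the
# current one) whose owner UID differs from the UID running this function.
list_foreign_files() {
  local dir="${1:-.}"
  # -maxdepth 1     : do not descend into subdirectories
  # ! -uid "$(id -u)": keep only files owned by someone else (e.g. root)
  find "$dir" -maxdepth 1 -type f ! -uid "$(id -u)" -print
}

# Usage, after running the aliased container in the current directory:
#   list_foreign_files .
# A root-owned result can be handed back to you with:
#   sudo chown "$(id -u):$(id -g)" FILE
```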
$ cat Dockerfile
# Container build for pdftk (last packaged in Fedora 20)
FROM fedora:20
# Update and install pdftk
RUN yum update -y && \
    yum install -y pdftk && \
    yum clean all
# Working directory
WORKDIR /workdir
# Set pdftk as our entry point
ENTRYPOINT ["pdftk"]
CMD ["--help"]
This starts by pulling down the official Fedora 20 image. Change the MAINTAINER email address to your own if you like. Next, it updates all packages, installs PDFtk, and removes cached files. Because these three commands share a single RUN instruction, they add only one layer to the image instead of three.
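The && operators matter as much as merging the commands. Docker runs a shell-form RUN line with /bin/sh -c, and the build fails if the shell exits with a non-zero status. With &&, the chain stops at the first failing command and returns that failure. The sketch below imitates this with a hypothetical step function, a name I made up for illustration, that prints a label and returns a chosen exit status. The ; variant shows how a successful cleanup could mask a failed install.

```shell
#!/bin/bash
# Hypothetical stand-in for one build command: print a label, then return
# the requested status (0 = success, anything else = failure).
step() {
  echo "$1"
  return "$2"
}

# Chained with &&, as in the RUN line: stops at the first failure, and the
# chain's exit status is that failure. $1 is the install step's status.
chain_with_and() {
  step update 0 && step install "$1" && step clean 0
}

# Chained with ;: every step runs and only the last status is reported,
# so a failed install would be hidden by a successful cleanup.
chain_with_semicolon() {
  step update 0; step install "$1"; step clean 0
}
```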
The WORKDIR instruction sets the working directory inside the container, where new files will be created.
Finally, it sets the PDFtk binary as the container's entry point, with CMD supplying the --help option if no arguments are passed to the container.
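Docker combines the two instructions at run time. The entry point always runs, and any arguments you give docker run after the image name replace the CMD default. The sketch below models that rule in plain shell. It assumes the exec-form ENTRYPOINT ["pdftk"] and CMD ["--help"], and the function name effective_command is hypothetical. It only prints the resulting command line and runs nothing.

```shell
#!/bin/bash
# Model (illustration only) of how Docker combines exec-form ENTRYPOINT and
# CMD, assuming ENTRYPOINT ["pdftk"] and CMD ["--help"].
ENTRYPOINT_ARGS=(pdftk)
CMD_ARGS=(--help)

effective_command() {
  if [ "$#" -eq 0 ]; then
    # Nothing after the image name: CMD supplies the default arguments.
    printf '%s\n' "${ENTRYPOINT_ARGS[*]} ${CMD_ARGS[*]}"
  else
    # Arguments replace CMD entirely and are appended to the entry point.
    printf '%s\n' "${ENTRYPOINT_ARGS[*]} $*"
  fi
}
```

Under this model, sudo docker run --rm fedora/pdftk prints PDFtk's help text, while a full PDFtk command line after the image name goes straight to pdftk.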
You can now build the image from your revised Dockerfile:
$ sudo docker build -t fedora/pdftk .
Then examine the new image:
$ sudo docker images
REPOSITORY          TAG       IMAGE ID       CREATED          SIZE
fedora/pdftk        latest    f2eaa35d31c8   3 seconds ago    595 MB
docker.io/fedora    20        ba74bddb630e   20 months ago    291 MB
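The listing also shows roughly what the single RUN layer costs: 595 MB for fedora/pdftk minus 291 MB for the Fedora 20 base is about 304 MB. The sketch below is a hypothetical helper (size_diff_mb is my name) that computes this difference from docker images output on standard input. It assumes the format shown above, where the number and its unit are separate columns and both sizes are in MB. Other Docker versions may print sizes differently, which would break this simple parsing.

```shell
#!/bin/bash
# Hypothetical helper: read `docker images` output on stdin and print the
# size difference in MB between two repositories (image first, base second).
# Assumes SIZE is printed as "<number> MB", so the number is column NF-1.
size_diff_mb() {
  awk -v img="$1" -v base="$2" '
    $1 == img  { a = $(NF-1) }
    $1 == base { b = $(NF-1) }
    END { print a - b }
  '
}

# Usage:
#   sudo docker images | size_diff_mb fedora/pdftk docker.io/fedora
```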
To run the PDFtk container in the same manner as the standalone binary, use the following wrapper script:
$ cat ~/bin/pdftk
#!/bin/bash
# Run the pdftk container and pass all arguments to this script to the container:
# --rm remove instantiated container after execution
# -u run with current UID/GID to give new files correct ownership
# -v attach current working directory to /workdir inside the container
#    and modify SELinux security context ("z") to allow container access to files
sudo docker run \
    --rm \
    -u "$(id -u):$(id -g)" \
    -v "$PWD":/workdir:z \
    fedora/pdftk "$@"
# Files will now have SELinux type container_file_t so we need to restore context:
restorecon -R "$PWD"
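To pass all of its arguments through unchanged, as the script's comment promises, a wrapper should forward them with a quoted "$@". That expansion keeps each argument as a separate word, so file names containing spaces reach the container intact. The sketch below compares "$@" with an unquoted $*. The function show_args is a hypothetical stand-in for the container that prints one bracketed argument per line.

```shell
#!/bin/bash
# Hypothetical stand-in for the container: print each argument in brackets.
show_args() {
  printf '[%s]\n' "$@"
}

# Quoted "$@": every original argument stays a single word.
forward_quoted() {
  show_args "$@"
}

# Unquoted $*: arguments are re-split on whitespace (and glob-expanded).
forward_unquoted() {
  show_args $*
}
```

For example, forward_quoted 'my file.pdf' cat passes two arguments, while forward_unquoted splits the file name and passes three.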
Instead of using the --privileged option to run the container as root, this script passes your user identifier (UID) and group identifier (GID), so new files get the correct owner and group. Your current working directory is mapped to /workdir inside the container. Appending :z to the volume specification lets Docker relabel the directory's SELinux context so the container can access it. This, of course, assumes you have SELinux enabled. If you disable SELinux, you will make Dan Walsh sad; Dan is a nice guy, so please don’t do it.
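To see exactly what the wrapper hands to Docker without starting a container, you can print the command line instead of running it. This is a hypothetical dry-run variant; pdftk_dry_run is my name and is not part of the script above. It assumes the image tag fedora/pdftk from the build step and shows your numeric UID and GID and the :z volume suffix after the shell has expanded them.

```shell
#!/bin/bash
# Hypothetical dry run of the pdftk wrapper: print the docker command line,
# one argument per line, instead of executing it.
pdftk_dry_run() {
  # -u: your numeric UID:GID, so files created in /workdir belong to you
  # -v: current directory mounted at /workdir; ":z" asks Docker to relabel it
  #     for SELinux so the container may read and write it
  # fedora/pdftk: the image tag built earlier; "$@" forwards your arguments
  local -a cmd=(docker run --rm -u "$(id -u):$(id -g)" -v "$PWD:/workdir:z" fedora/pdftk "$@")
  printf '%s\n' "${cmd[@]}"
}

# Usage:
#   pdftk_dry_run A=pdf1.pdf B=pdf2.pdf cat A B output pdf12.pdf
```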
After the container terminates, the script restores the SELinux context from container_file_t to user_home_t (assuming you’re in your home folder structure). You can still access the files with the new context, but running restorecon tidies things up.
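An SELinux context has the form user:role:type:level. The field that changes here is the type (container_file_t versus user_home_t). The sketch below is a hypothetical helper that extracts that field from a context string. On a system with SELinux enabled, you could pass it the output of GNU stat -c %C FILE before and after running the wrapper.

```shell
#!/bin/bash
# Hypothetical helper: print the type field of an SELinux context string.
# Contexts look like user:role:type:level; the level may itself contain
# colons (for example s0:c1,c2), so everything after the type is ignored.
selinux_type() {
  local user role type rest
  IFS=: read -r user role type rest <<< "$1"
  printf '%s\n' "$type"
}

# Usage on an SELinux-enabled system (GNU coreutils stat):
#   selinux_type "$(stat -c %C pdf12.pdf)"
```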
With the container built and the wrapper script in place, you can now run PDFtk as you did before.
$ pdftk A=pdf1.pdf B=pdf2.pdf cat A B output pdf12.pdf
Bringing older applications back to life is a great use case for containers. What other problems have you solved using containers? Let me know in the comment section below.