4 ways you can edit a PDF with the pdftk-java command | Opensource.com

Combine PDFs, remove pages, split PDFs, and fill in forms with this handy Linux command.

Between technology whitepapers, manuscripts, and RPG books, I deal with lots of PDFs every day. The PDF format is popular in part because it grew out of PostScript, a page description language that many printers understand natively. Publishers often release a digital version of a book as a PDF because they've already invested the time and effort to produce a file for print. But a PDF isn't intended to be an editable format. While some reverse processing is possible, it's meant to be the last stop for digital data before it's sent to the printer. Even so, sometimes you need to make adjustments to a PDF, and one of my favorite tools for that job is the pdftk-java command.

Install pdftk-java on Linux

As its name suggests, pdftk-java is written in Java, so it works on all major operating systems as long as you have Java installed.

Linux and macOS users can install Java from AdoptOpenJDK.net. Windows users can install Red Hat's Windows build of OpenJDK.

To install pdftk-java:

1. Download the pdftk-all.jar release from its GitLab repository, and save it to ~/.local/bin/ or some other location in your path.

2. Open ~/.bashrc in your favorite text editor and add this line to it:

alias pdftk='java -jar $HOME/.local/bin/pdftk-all.jar'

3. Load your new Bash settings:

$ source ~/.bashrc
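
One caveat if you plan to call pdftk from scripts: Bash doesn't expand aliases in non-interactive shells by default, so the alias above is only available at an interactive prompt. A minimal sketch of a workaround is a shell function defined in the script itself. The function name run_pdftk and the PDFTK_JAR variable are hypothetical names chosen for this example, and the default JAR path assumes the location used in step 1.

```shell
# Hypothetical wrapper for use inside scripts, where the alias is unavailable.
# PDFTK_JAR is an assumed variable; it defaults to the path from step 1.
run_pdftk() {
    java -jar "${PDFTK_JAR:-$HOME/.local/bin/pdftk-all.jar}" "$@"
}
```

With the function defined, a call such as run_pdftk --version should print version information, provided Java and the JAR are in place.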

Command syntax

The structure of a valid pdftk-java command follows a pattern, but there's a lot of flexibility in what's in the pattern. The syntax is a little unusual because it doesn't use traditional-style terminal options, but with practice, it's not too difficult to remember.

  • pdftk: The alias to call the command
  • input file: The PDF you want to modify
  • action: What you want to do to the input file
  • output: Where you want to save your modified PDF file

It's the action part that's most complex, so I'll start with simple tasks.
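
As a rough sketch of how the parts listed above fit together, the snippet below holds each part in a variable and prints the assembled command rather than running it. The file names are placeholders. Note that the word output is a literal keyword that precedes the destination file name.

```shell
# Each part of the pattern, held in a variable (file names are placeholders).
input="input.pdf"      # the PDF you want to modify
action="cat"           # what you want to do to the input file
output="output.pdf"    # where the modified PDF is saved

# Print the assembled command instead of running it.
echo pdftk "$input" "$action" output "$output"
```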

Combine two PDF files into one

It's not uncommon for the front cover of a book to be created in a different application, such as Inkscape or GIMP, than the rest of the book, which is usually done in a layout application like Scribus or an office suite like LibreOffice. You could combine the two in your layout application. A good desktop publisher like Scribus makes it easy to reference an image so that when the cover changes, it's automatically updated in the layout. However, it's also possible to prepend the cover to a PDF with pdftk-java:

$ pdftk cover.pdf body.pdf \
cat \
output book.pdf

In this example, the action is cat, short for concatenate. Like the Linux cat command, it concatenates one or more PDF files into a single data stream, which is written to the file that the output argument specifies.
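
pdftk also lets you label each input file with a handle, an uppercase letter assigned with =, so that the cat action can refer to specific files and pages. The sketch below is one way to use that. The function name combine_with_cover and the PDFTK variable are assumptions made for this example, not part of pdftk-java. The variable only makes the command overridable, for instance with java -jar and the JAR path when the alias isn't available.

```shell
# Hypothetical helper: take only page 1 of a cover file, then the whole body.
combine_with_cover() {
    # $1 = cover PDF, $2 = body PDF, $3 = output PDF
    # A1 selects page 1 of the cover; B alone selects every page of the body.
    ${PDFTK:-pdftk} A="$1" B="$2" cat A1 B output "$3"
}
```

A call would look like combine_with_cover cover.pdf body.pdf book.pdf. This is handy when the cover file contains extra pages, such as a back cover, that don't belong at the front of the book.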

Remove pages from a PDF

You can't exactly remove a page from a PDF, but you can create a new PDF containing only the pages you want to keep.

$ pdftk book.pdf \
cat 1 3-end \
output shorter-book.pdf

In this example, page 1 of my book file, and all pages from 3 to the end, are saved to a new file. The page I've removed, therefore, is page 2.
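
If you remove pages from a script, the ranges can be computed rather than typed. The helper below is a minimal sketch with a hypothetical name, keep_all_but. It prints the ranges that keep everything except one page, and it assumes the page you drop isn't the last one, because the range after it would then point past the end of the document.

```shell
# Print the page ranges that keep every page except page $1.
# Assumes $1 is not the last page of the document.
keep_all_but() {
    page=$1
    if [ "$page" -le 1 ]; then
        echo "2-end"
    elif [ "$page" -eq 2 ]; then
        echo "1 3-end"
    else
        echo "1-$((page - 1)) $((page + 1))-end"
    fi
}
```

You could then run pdftk book.pdf cat $(keep_all_but 2) output shorter-book.pdf, leaving the command substitution unquoted so that the shell passes each range as a separate argument.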

Split a PDF into separate files

Splitting a PDF file into many different files also uses the cat action, and it's similar in principle to removing pages. You can split a PDF by sending the pages you want to a new file:

$ pdftk book.pdf \
cat 1-15 \
output part-1.pdf

$ pdftk book.pdf \
cat 16-42 \
output part-2.pdf
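
For long documents, the same idea can be looped. The following sketch reads the page count from the report produced by the dump_data action, then splits the file into equal-sized parts. The function names and the PDFTK variable (an overridable command name) are hypothetical, and the chunk size is assumed to be at least 1.

```shell
# Print the page count of PDF $1, parsed from pdftk's dump_data report.
page_count() {
    ${PDFTK:-pdftk} "$1" dump_data | awk '/^NumberOfPages:/ { print $2 }'
}

# Print one "first-last" range per chunk for a document of $1 pages,
# split into chunks of at most $2 pages ($2 must be 1 or more).
chunk_ranges() {
    total=$1
    size=$2
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + size - 1))
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        echo "$start-$end"
        start=$((end + 1))
    done
}

# Split PDF $1 into part-1.pdf, part-2.pdf, ... of at most $2 pages each.
split_in_chunks() {
    n=1
    for range in $(chunk_ranges "$(page_count "$1")" "$2"); do
        ${PDFTK:-pdftk} "$1" cat "$range" output "part-$n.pdf"
        n=$((n + 1))
    done
}
```

For a 42-page book, split_in_chunks book.pdf 15 would request the ranges 1-15, 16-30, and 31-42.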

If you need to split a PDF into single-page files, there's a special action for that, called burst:

$ pdftk book.pdf burst

$ ls
book.pdf pg_0001.pdf pg_0002.pdf
pg_0003.pdf pg_0004.pdf pg_0005.pdf
[...]
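
The pg_0001.pdf names are a default. pdftk's documentation describes passing a printf-style pattern to output to name the pages yourself, and notes that burst also writes a report named doc_data.txt. Here is a minimal sketch. The pattern, the burst_to_dir function name, and the PDFTK variable are assumptions for illustration, and writing into a subdirectory should work but is worth testing on your own system.

```shell
# Hypothetical naming scheme: page_001.pdf, page_002.pdf, ...
pattern='page_%03d.pdf'

# Preview the name the first page would get under this pattern.
printf "$pattern\n" 1

# Burst PDF $1 into directory $2 using the pattern above.
burst_to_dir() {
    mkdir -p "$2" && ${PDFTK:-pdftk} "$1" burst output "$2/$pattern"
}
```

For example, burst_to_dir book.pdf pages keeps the single-page files out of your working directory.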

Fill in forms

Few would argue that the PDF format hasn't become bloated over the years, and one feature you sometimes find in a PDF file is a fillable form. You see this in US tax documents, RPG character sheets, online school workbooks, and other PDF files that are intended to be interactive. While most modern PDF viewers, such as GNOME's Evince and KDE's Okular, can fill out PDF forms, you can also fill out a PDF form with the help of pdftk-java.

First, you must extract the form data using the generate_fdf action. This extracts the IDs of the form elements and places them into a text file.

$ pdftk character-sheet.pdf \
generate_fdf \
output chsheet-form.txt

Your destination file (in this example, chsheet-form.txt) contains the form's data from the PDF, but just the text parts. You can edit it in any standard text editor, like Atom or Gedit.

In a sometimes admirable and sometimes awkward glimpse into the workflow of the organization producing the PDF, you'll find that some forms are clearly labeled, while others have default names like "Checkbox_001" and "Textfield-021". You might have to cross-reference your text file with your PDF, but that can be worthwhile if you're writing a script to fill out forms automatically. Each label is marked as a /T item, and the following line provides space (marked as /V) for text entry. Here's a snippet from a form with meaningful labels and some data filled in:

/T (CharacterName 2)
/V (Abaddon)
>>
<<
/T (SlotsTotal 24)
/V ()
>>
<<
/T (Hair)
/V (Brown)
>>
<<
/T (AC)
/V (15)
>>
<<
/T (Background)
/V ()
>>
<<
/T (DEXmod )
/V ()
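
If you want to script the editing step, plain text tools can do it. The two functions below are a rough sketch rather than a robust FDF parser, and their names are hypothetical. They assume the layout shown in the snippet above, with each /V line directly after its /T line, and they assume field names and values that contain no characters special to sed or to FDF, such as slashes, pipes, ampersands, or parentheses.

```shell
# List the field labels (the text inside each /T (...) line) in FDF file $1.
list_fields() {
    sed -n 's|^/T (\(.*\))$|\1|p' "$1"
}

# Print FDF file $1 with the /V value of field $2 set to $3.
# Assumes /V sits on the line directly after the matching /T line.
set_field() {
    sed "/^\/T ($2)\$/{n;s|^/V (.*)\$|/V ($3)|;}" "$1"
}
```

For example, set_field chsheet-form.txt Background Soldier > filled.txt writes an updated copy. Redirect to a new file rather than to the file being read.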

Once you've got the form data entered, you can merge your text input into the PDF using the fill_form action:

$ pdftk character-sheet.pdf \
fill_form chsheet-form.txt \
output completed.pdf
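
If the completed form shouldn't remain editable, pdftk has a flatten output option that merges the field values into the page content. The following is a minimal sketch of how that could be wrapped for scripting. The function name fill_and_flatten and the PDFTK variable (an overridable command name) are assumptions for this example.

```shell
# Fill form PDF $1 with FDF data $2 and write a non-editable result to $3.
fill_and_flatten() {
    ${PDFTK:-pdftk} "$1" fill_form "$2" output "$3" flatten
}
```

A call would look like fill_and_flatten character-sheet.pdf chsheet-form.txt completed.pdf. Keep the original form and the text file around, because a flattened PDF can't be filled in again.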

Here's a sample of the result:

[Image: A form filled by pdftk-java (Seth Kenlon, CC BY-SA 4.0)]

PDF modification made easy

When you deal with lots of PDF files, or process PDF files through shell scripts, a tool like pdftk-java is invaluable because it frees you from having to do everything manually. When I build a PDF from DocBook output, a Makefile calls pdftk-java for any number of tasks, so there's no chance of me forgetting a step or mistyping a command, and there's no need for me to spend my time on it. There are lots of other reasons you might use pdftk-java in your own workflow, and lots of other things pdftk-java can do, including actions like shuffle, rotate, dump_data, update_info, and attach_files. If you find yourself dealing with PDF files often, give pdftk-java a try.

About the author

Seth Kenlon is a UNIX geek, free culture advocate, independent multimedia artist, and D&D nerd. He has worked in the film and computing industry, often at the same time. He is one of the maintainers of the Slackware-based multimedia production project Slackermedia.