3 command-line file conversion tools for Linux

Opensource.com

Recently, a friend innocently asked me how many file formats there are. My semi-serious response was, "Think of a soup bowl filled with beach sand."

OK, there aren't quite that many file formats. That said, you've probably never heard of many of the formats that are used commonly enough to warrant a listing on Wikipedia. Chances are, you'll never see or use most of them. If, however, you want or need to convert between file formats, there are quite a few applications for the job.

Let's take a look at three solid file conversion tools for the Linux command line.

Pandoc

Everyone I know who works with markup languages says Pandoc is the go-to utility for converting between those languages. And for good reason: Pandoc not only does some pretty nifty conversions, it's fast, too.

Have a file formatted with Markdown that you want to convert to a LibreOffice Writer document? How about a LaTeX document that you want to turn into an EPUB? Or maybe you have an HTML file that you want to turn into a slide deck. Pandoc is up to all of those tasks. And more.

Here's how to use Pandoc for a simple conversion (in this case, from HTML to reStructuredText):

pandoc -t rst myFile.html -o myFile.rst
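The other conversions mentioned above follow the same pattern, because Pandoc infers the input and output formats from the file extensions when you don't pass -f or -t. Here's a minimal sketch of a wrapper for that; the function name convert_to and the file names in the usage note are made up for illustration and are not part of Pandoc.

```shell
# convert_to EXT FILE...  (hypothetical helper, not part of Pandoc)
# Converts each FILE to the format implied by EXT and writes the result
# next to the original, with the new extension.
convert_to() {
    ext=$1
    shift
    for src in "$@"; do
        # Strip the old extension and append the new one.
        pandoc "$src" -o "${src%.*}.$ext" || return 1
    done
}
```

With that defined, convert_to odt notes.md would produce notes.odt, and convert_to epub paper.tex would produce paper.epub.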

You're not just limited to straight conversions. You can, for example, add a table of contents, typographic quotes, custom headers, and syntax highlighting to the resulting file. Take a peek at Pandoc's documentation for details.
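As a rough sketch of a couple of those extras, the function below turns a Markdown file into a standalone HTML page with a table of contents and typographic quotes. It assumes Pandoc 2.0 or later, where smart punctuation is the smart extension (older releases used a separate flag); the function name md_to_page is hypothetical.

```shell
# md_to_page FILE.md  (hypothetical helper, not part of Pandoc)
md_to_page() {
    src=$1
    # -s       write a complete document rather than a fragment
    # --toc    add a table of contents built from the headings
    # +smart   turn straight quotes into typographic ones (already on by
    #          default for Pandoc's Markdown in 2.x; spelled out for clarity)
    pandoc -s --toc -f markdown+smart "$src" -o "${src%.md}.html"
}
```

Running md_to_page notes.md would write notes.html alongside the original file.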

Pandoc, however, is designed mainly for text-based markup files. What happens if you have a binary file, such as a word processor document? Help at the command line comes from an unexpected source.

LibreOffice

You're probably thinking, "Hold on! LibreOffice is a GUI application." Yes, it is. But what many people don't know is that you can run LibreOffice from the command line to quickly convert one or more files.

How? To convert a LibreOffice Impress slide deck to PDF, for example, you'd type the following:

soffice --headless --convert-to pdf mySlides.odp

You'd just replace pdf with the extension of whatever file format you want to convert to. The --headless option, in case you're wondering, stops an empty LibreOffice window from opening on your desktop.

Using LibreOffice at the command line to convert a single file is overkill. However, turning to the command line is a great way to convert several files at once. If, say, you want to convert all of the Microsoft Word documents in a folder to LibreOffice Writer format, you'd type:

soffice --headless --convert-to odt *.docx

The conversion takes far less time than opening all of those files in LibreOffice Writer and doing the conversion manually.
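If you'd rather keep the converted files apart from the originals, LibreOffice also accepts an --outdir option. The sketch below wraps the batch conversion in a small function; the name convert_docs and the directory name in the usage note are assumptions for illustration.

```shell
# convert_docs FORMAT OUTDIR FILE...  (hypothetical helper)
convert_docs() {
    fmt=$1
    outdir=$2
    shift 2
    # Create the target directory if it doesn't exist yet.
    mkdir -p "$outdir" || return 1
    # --outdir tells LibreOffice where to write the converted files;
    # without it they land in the current directory.
    soffice --headless --convert-to "$fmt" --outdir "$outdir" "$@"
}
```

For example, convert_docs odt converted *.docx would leave the .docx files where they are and put the .odt versions in a folder named converted.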

FFmpeg

Whereas Pandoc is the Swiss Army Knife for converting between markup languages, FFmpeg is Pandoc's opposite number for audio and video formats.

FFmpeg is a set of libraries and executables that give you the ability to convert seamlessly between nearly any audio or video format.

Here's an example of a simple conversion of a video from AVI to Ogg Theora:

ffmpeg -i myVideo.avi myVideo.ogg

FFmpeg can do a lot more than that. You can set the frame rate of videos and add subtitles to them, change the aspect ratio, change the quality of audio, and more.
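Here's a hedged sketch of what a few of those options look like together. The values (24 frames per second, a 16:9 aspect ratio, 128k audio) are arbitrary examples rather than recommendations, and the function name avi_to_ogg is hypothetical.

```shell
# avi_to_ogg FILE.avi  (hypothetical helper, not part of FFmpeg)
avi_to_ogg() {
    src=$1
    # -r 24          output frame rate, in frames per second
    # -aspect 16:9   display aspect ratio stored in the output
    # -b:a 128k      target audio bitrate
    ffmpeg -i "$src" -r 24 -aspect 16:9 -b:a 128k "${src%.avi}.ogg"
}
```

Calling avi_to_ogg myVideo.avi would write myVideo.ogg with those settings applied.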

The command line can get quite crowded with those options, should you choose to use more than a couple of them. It's easy to forget the options, especially if you only use FFmpeg every so often. Take it from an old technical writer: There's no shame in reading the documentation.

Do you have a favorite command-line file conversion tool? Feel free to share it by leaving a comment below.

That idiot Scott Nesbitt ...
I'm a long-time user of free/open source software, and write various things for both fun and profit. I don't take myself all that seriously and I do all of my own stunts.

21 Comments

Great LibreOffice tip Scott! I usually start the GUI, but this is easier, for converting a few files to .pdf.

Thanks, Robin. This is one of my favourite LibreOffice/OpenOffice tricks, and works really nicely with a more complex conversion workflow I was forced to come up with a few years ago.

In reply to robinmuilwijk

Ditto. I think I'd heard about this feature at some point, but I had forgotten about it. I tend to do a lot of my work over SSH connections, so being able to do more without X forwarding makes me happy.

In reply to robinmuilwijk

When discussing conversion tools, it would probably be worth it to include the `convert` command that comes with the ImageMagick suite of tools. :) I find myself using that one pretty regularly.

I just looked this up recently, and it was suggested that mogrify might be a better command than convert. Example: mogrify -format png somedoc.pdf
You can also do batch processing with this command.

In reply to ScottNesbitt

Great article, Scott. I knew about the "ffmpeg" conversion but not the other two. There is also "pdftotext" on the command line.

Related to Libre/OpenOffice is unoconv.

The functionality is, I believe, similar, but maybe without the overhead of a full "soffice" instance?

I did have you in mind when I wrote the intro to the pandoc section of the article ...

In reply to bbehrens

Great article! I'm choosing Pandoc! I was looking for something like this article!

I guess it's because I'm not in the proper field of work? But I don't know when I would have to rename a ton of files from one format to another? The only way I can see me needing to do this is if I work in a field such as audio / video production/engineering. There you might have a slew of audio files that have the wrong format for compression and the like, or maybe there's a ton of artwork that needs to be renamed in bulk for a show. But as a home user, who basically collects a lot of PDF's from the web and likes to listen to streaming jazz whilst reading, I don't know that I'd use any of these.

It's certainly more common to use tools like this when you're on the content creation side of the fence (as opposed to consuming content). However, when you do have large libraries of files (ebooks, music, videos, etc.), a lot of people like to homogenize their files so they're all the same format. For instance, if you have a variety of ebooks and they're a mix of PDF, EPUB, MOBI, and plain text, you might run into problems reading those ebooks on some devices. But if you convert all of those books to EPUB, you then at least have a solid and consistent baseline from which to work.

In reply to Eddie G. (not verified)

Great tool for data migrations! I have wished for such a tool several times. Thanks for the tip. I installed LibreOffice on a server at work and tested it out, and it will be perfect for some of our workflows, which include a ton of data format conversions. This will make several people very happy.

In reply to Eddie G. (not verified)

Excellent timing! I'm preparing to start a project that needs to scrape data from the web,
and pandoc will make it much easier.

Another file conversion tool I use a lot is openssl. It is invaluable when working with
certificates that need to be in a variety of formats.

I use a utility called sox to convert audio file formats. From its man page:

SoX reads and writes audio files in most popular formats and can optionally apply
effects to them. It can combine multiple input sources, synthesise audio, and, on
many systems, act as a general purpose audio player or a multi-track audio recorder.
It also has limited ability to split the input into multiple output files.

Nice. I'd never looked at Pandoc before so thanks for this

This is a great piece, thank you Scott! Perhaps unrelated, but made me think of command line PDF manipulation tools like PDFtk and QPDF.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.