How to validate your EPUB and convert it
Open source tools to prepare your ebooks for publication
Having published three ebooks, and being in the process of putting together another one, I’ve learned that finishing the writing isn't the last step: there are a few more things you need to do before sharing your book with the world.
If you're publishing your book in the EPUB format, you need to check that it's properly formed. And if you're selling your book through Amazon.com's Kindle Store, you need to convert your EPUB file to a format that the Kindle supports.
Let’s take a look at two tools that enable you to do both.
EPUB files are highly structured. That structure covers not just the chapters (each of which is an individual XHTML file) but also how those chapters, the supporting files (such as images and Cascading Style Sheets), and the directories inside the EPUB file are arranged.
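Under the hood, an EPUB file is a ZIP archive, and the EPUB Open Container Format (OCF) rules pin down a few things at its root: a `mimetype` file that comes first in the archive, is stored uncompressed, and contains exactly `application/epub+zip`, plus a `META-INF/container.xml` file that tells reading systems where the package (OPF) file lives. The following Python sketch checks just those points. The function name `check_container` is hypothetical, and the sketch is no substitute for a real validator.

```python
import zipfile


def check_container(epub_path):
    """Return a list of container problems; an empty list means these checks passed.

    Hypothetical helper covering only a few OCF rules; it is not a full validator.
    """
    problems = []
    with zipfile.ZipFile(epub_path) as zf:
        names = zf.namelist()  # in the order the entries were written
        if "mimetype" not in names:
            problems.append("'mimetype' is missing")
        else:
            # The mimetype entry must be the first file in the archive...
            if names[0] != "mimetype":
                problems.append("'mimetype' is not the first file in the archive")
            # ...stored without compression...
            if zf.getinfo("mimetype").compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' is compressed; it must be stored")
            # ...and contain exactly this string, with no trailing newline.
            if zf.read("mimetype") != b"application/epub+zip":
                problems.append("'mimetype' does not contain 'application/epub+zip'")
        # container.xml tells reading systems where the package (OPF) file lives.
        if "META-INF/container.xml" not in names:
            problems.append("'META-INF/container.xml' is missing")
    return problems
```

For example, `check_container("ebook_file.epub")` returns an empty list when these particular checks pass.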
No matter how careful you are or what tool you’re using to create or assemble a book in EPUB format, a few mistakes can creep in. That’s why you need to validate your book before setting it loose. Validation is the process of making sure that your EPUB books contain all the elements that ebook readers expect. Like what? Here’s a partial list:
- Complete metadata
- The proper directory structure in the EPUB file
- Valid XHTML
- Working links and references to files in the EPUB file
- A table of contents
An ebook reader can usually open an EPUB file that doesn't validate. However, the contents might not render properly and the navigation might not work.
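Metadata is the first item on that list. Both EPUB 2 and EPUB 3 require the package (OPF) file to carry at least a `dc:identifier`, a `dc:title`, and a `dc:language` element. Here is a minimal sketch, assuming a well-formed `container.xml` that names a single rootfile, which reports which of those three are missing or empty. `missing_metadata` is a hypothetical name, and the check ignores every other metadata rule.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"
REQUIRED_DC = ("identifier", "title", "language")


def missing_metadata(epub_path):
    """Return the required Dublin Core elements that are absent or empty in the OPF."""
    with zipfile.ZipFile(epub_path) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        # container.xml points to the OPF file through <rootfile full-path="...">.
        rootfile = container.find(f"{CONTAINER_NS}rootfiles/{CONTAINER_NS}rootfile")
        if rootfile is None:
            raise ValueError("container.xml does not name a rootfile")
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))
    missing = []
    for name in REQUIRED_DC:
        # iter() searches the whole OPF tree for dc:<name> elements with real text.
        values = [el.text for el in opf.iter(f"{DC_NS}{name}") if el.text and el.text.strip()]
        if not values:
            missing.append(name)
    return missing
```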
A command line tool called epubcheck makes validation fairly easy. epubcheck is a Java application, so make sure you have a recent Java Runtime Environment (JRE) installed on your computer before you try to use it. If you're averse to the command line, don't worry: there's also an online validator.
To use epubcheck, run this command:
java -jar epubcheck-3.0b5.jar ebook_file.epub
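If you have more than one book to check, you can script the same command. This Python sketch simply wraps the `java -jar` invocation above for every `.epub` file in a folder. The JAR file name mirrors the version in that command, so adjust it to whichever release you downloaded; the pass/fail result assumes that epubcheck exits with a non-zero status when it finds errors. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: the epubcheck JAR you downloaded; change this to match your version.
EPUBCHECK_JAR = "epubcheck-3.0b5.jar"


def epubcheck_command(epub_path, jar=EPUBCHECK_JAR):
    """Build the same command line shown above for one book."""
    return ["java", "-jar", str(jar), str(epub_path)]


def validate_folder(folder, jar=EPUBCHECK_JAR):
    """Run epubcheck on every .epub in a folder; map each file name to True if it passed.

    Assumes epubcheck exits with status 0 only when it finds no errors.
    epubcheck's own report is printed to the terminal as usual.
    """
    results = {}
    for epub in sorted(Path(folder).glob("*.epub")):
        completed = subprocess.run(epubcheck_command(epub, jar))
        results[epub.name] = completed.returncode == 0
    return results
```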
Here are the results when epubcheck is run against a partially completed ebook:
In the example above, the file names of the individual chapters contain spaces. That's not a massive sin, but it's one that can cause problems.
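Spotting such names is easy to automate. The sketch below lists every entry in an EPUB whose path contains a space and proposes a hyphenated replacement. The helper names and the hyphen convention are illustrative assumptions, not anything epubcheck requires.

```python
import zipfile


def names_with_spaces(epub_path):
    """List the entries in the EPUB whose path contains a space."""
    with zipfile.ZipFile(epub_path) as zf:
        return [name for name in zf.namelist() if " " in name]


def suggested_name(name):
    """Propose a replacement name: spaces become hyphens (a hypothetical convention)."""
    return name.replace(" ", "-")
```

Renaming a file inside a finished EPUB also means updating every reference to it, including the manifest entries in the OPF file and the links in your chapters and table of contents, so it's usually easier to fix the names in your source files and rebuild the book.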
epubcheck is great at finding problems, but in many cases it falls short when it comes to explaining what those problems are. It assumes that you already know enough to understand each problem and to fix it. That’s not always the case.
While validating an ebook, a writer I know got an error message telling him that a particular file contained invalid HTML syntax. He has a decent knowledge of HTML, so he went to the line number that epubcheck pointed to in the file, but he didn’t see anything wrong. He enlisted my help, and we discovered that epubcheck expected the text inside the blockquote element to be wrapped in paragraph tags.
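The rule behind that error most likely comes from XHTML 1.1, on which EPUB 2 content documents are based: a `blockquote` may contain only block-level elements such as `p`, not bare text. (HTML5, which EPUB 3 uses, is more permissive.) The sketch below finds text sitting directly inside a `blockquote`. It assumes the chapter file is well-formed XML without named entities such as `&nbsp;`, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def bare_blockquote_text(xhtml):
    """Return text snippets that sit directly inside a <blockquote> element.

    Only direct text is reported: text before the first child (.text) and text
    between or after child elements (each child's .tail). Inline child elements
    themselves are not flagged.
    """
    root = ET.fromstring(xhtml)
    snippets = []
    for bq in root.iter(f"{XHTML_NS}blockquote"):
        pieces = [bq.text] + [child.tail for child in bq]
        snippets.extend(p.strip() for p in pieces if p and p.strip())
    return snippets
```

Wrapping each reported snippet in `<p>...</p>` is the fix that worked in the case described above.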
Love it or hate it (and there are many people across that spectrum), Amazon is the biggest and arguably the most popular marketplace for publishing, selling, and buying books. Amazon, however, uses a proprietary format with the extension .mobi. Apart from a web-based authoring and publishing tool called Booktype, there aren’t any open source tools for creating books directly in the .mobi format.
You can convert your ebooks to .mobi using a command line tool called ebook-convert. It's part of the suite of utilities that comes with the calibre ebook management software.
To convert an EPUB file to .mobi, run the command:
ebook-convert ebook_file.epub ebook_file.mobi
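ebook-convert works out the input and output formats from the file extensions, so the command above is easy to script. This Python sketch assumes calibre's `ebook-convert` is on your `PATH`; the helper names are hypothetical, and it passes no conversion options beyond the two file names.

```python
import subprocess
from pathlib import Path


def mobi_path_for(epub_path):
    """Derive the output name: same folder and stem, .mobi extension."""
    return Path(epub_path).with_suffix(".mobi")


def convert_to_mobi(epub_path):
    """Run calibre's ebook-convert on one book and return the .mobi path."""
    output = mobi_path_for(epub_path)
    # check=True raises CalledProcessError if ebook-convert exits with an error.
    subprocess.run(["ebook-convert", str(epub_path), str(output)], check=True)
    return output
```

To convert a whole folder, call `convert_to_mobi` on each path returned by `Path("books").glob("*.epub")`, where `books` stands in for your own directory.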
Depending on the size of your file, the conversion will take anywhere from 5 to 20 seconds. Here’s what the tool returns at the command line when you run it:
Here’s what an EPUB converted to .mobi looks like when viewed in calibre:
The only problem I've encountered with the conversion is that ebook-convert duplicates the cover page. Other than that, it does as good a job of converting EPUB files to .mobi as Amazon's proprietary KindleGen conversion tool.
With all that out of the way, you're ready to share your book with the world.