DAISY: A Linux-compatible text format for the visually impaired


Image by Kate Ter Haar. Modified by Opensource.com. CC BY-SA 2.0.

If you're blind or visually impaired like I am, you usually need various kinds of hardware or software to do things that sighted people take for granted. One of these is reading print books, which requires specialized formats: Braille (if you know how to read it) or specialized text formats such as DAISY.

What is DAISY?

DAISY stands for Digital Accessible Information System. It's an open standard used almost exclusively by the blind to read textbooks, periodicals, newspapers, fiction, you name it. It was developed starting in the mid-'90s by the DAISY Consortium, a group of organizations dedicated to producing a set of standards that would allow text to be marked up so that it is easy to read, skip around in, annotate, and otherwise manipulate, in much the same way a sighted reader would with print.

The current version, DAISY 3.0, was released in mid-2005 and is a complete rewrite of the standard, designed to make it much easier to produce books that comply with it. DAISY books can contain text only, audio recordings only (in PCM WAV or MPEG Layer III, better known as MP3, format), or a combination of text and audio. Specialized software can read these books and lets users set bookmarks and navigate a book as easily as a sighted person would with a print book.
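As a rough illustration of those three kinds of books, the Python sketch below guesses which kind a book is from the file names in its folder. The extension lists and the function name `classify_contents` are assumptions made for this example, not rules from the DAISY specifications; real reading software would rely on the book's own metadata instead.

```python
from pathlib import PurePath

# Illustrative assumption: these extensions stand for audio and text content.
AUDIO_EXTENSIONS = {".wav", ".mp3"}
TEXT_EXTENSIONS = {".html", ".htm", ".xml"}


def classify_contents(filenames):
    """Guess whether a book is text only, audio only, or text and audio."""
    has_audio = False
    has_text = False
    for name in filenames:
        path = PurePath(name)
        suffix = path.suffix.lower()
        if suffix in AUDIO_EXTENSIONS:
            has_audio = True
        elif suffix in TEXT_EXTENSIONS and path.name.lower() != "ncc.html":
            # ncc.html is the DAISY 2.02 navigation file, not the book's text.
            has_text = True
    if has_audio and has_text:
        return "text and audio"
    if has_audio:
        return "audio only"
    if has_text:
        return "text only"
    return "unknown"


print(classify_contents(["ncc.html", "chapter1.smil", "chapter1.mp3"]))  # audio only
```

Note that `.opf`, `.ncx`, and `.smil` files are deliberately ignored here, since they describe structure rather than content.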

How does DAISY work?

Regardless of the specific version, DAISY works roughly like this: each book has a main navigation file (ncc.html in DAISY 2.02) that contains metadata about the book, such as the author's name, the copyright date, and how many pages the book has. In DAISY 3.0 the equivalent file (with an .ncx extension) is a valid XML document, and including DTD (document type definition) files with each book is highly recommended.
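To make the idea of the navigation file concrete, here is a minimal Python sketch that pulls the `<meta>` name/content pairs out of a DAISY 2.02 `ncc.html` using only the standard library's `html.parser`. The sample file is abbreviated and its values are made up, and `read_ncc_metadata` is a hypothetical helper. The `dc:` and `ncc:` metadata names follow the style DAISY 2.02 uses, but check the specification for the exact set a given book carries.

```python
from html.parser import HTMLParser


class NccMetadataParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an NCC file."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; self-closing tags
        # such as <meta ... /> reach this method through the default
        # handle_startendtag implementation.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.metadata[name] = content


def read_ncc_metadata(ncc_text):
    parser = NccMetadataParser()
    parser.feed(ncc_text)
    parser.close()
    return parser.metadata


# Abbreviated, made-up sample of a DAISY 2.02 navigation file.
SAMPLE_NCC = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Example Book</title>
  <meta name="dc:title" content="Example Book" />
  <meta name="dc:creator" content="Jane Doe" />
  <meta name="ncc:pageNormal" content="212" />
</head>
<body>
  <h1 id="h1"><a href="chapter1.smil#t1">Chapter 1</a></h1>
</body>
</html>"""

print(read_ncc_metadata(SAMPLE_NCC)["dc:creator"])  # Jane Doe
```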

The navigation control file contains markup describing precise positions (text caret offsets in the case of text navigation, or times down to the millisecond in the case of audio recordings) that lets the software skip to that exact point in the book, much as a sighted person would turn to the first page of a chapter. It's worth noting that this navigation control file only contains positions for the main, and largest, elements of a book.
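In DAISY 3, each entry in the navigation control file (the `.ncx`) is a `navPoint` whose `content` element usually points into a SMIL file by fragment, and that SMIL file holds the exact time or text position. The sketch below lists those entries with Python's `xml.etree.ElementTree`. The sample NCX is heavily abbreviated (a real one also has a `head`, a `docTitle`, and more), and the helper `list_nav_points` is hypothetical.

```python
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def list_nav_points(ncx_text):
    """Return (depth, label, target) for every navPoint, in document order."""
    root = ET.fromstring(ncx_text)
    results = []

    def walk(parent, depth):
        for point in parent.findall("ncx:navPoint", NCX_NS):
            label = point.findtext("ncx:navLabel/ncx:text", default="",
                                   namespaces=NCX_NS)
            content = point.find("ncx:content", NCX_NS)
            target = content.get("src") if content is not None else None
            results.append((depth, label.strip(), target))
            walk(point, depth + 1)  # nested navPoints are sub-sections

    nav_map = root.find("ncx:navMap", NCX_NS)
    if nav_map is not None:
        walk(nav_map, 0)
    return results


# Abbreviated sample: one chapter containing one section.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="nav1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.smil#par1"/>
      <navPoint id="nav2" playOrder="2">
        <navLabel><text>Section 1.1</text></navLabel>
        <content src="chapter1.smil#par7"/>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>"""

for depth, label, target in list_nav_points(SAMPLE_NCX):
    print("  " * depth + f"{label} -> {target}")
```

A reading program would build its table of contents from this list and, when a user picks an entry, open the referenced SMIL file at the given fragment.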

The smaller elements are handled by SMIL (Synchronized Multimedia Integration Language) files. These files contain position points for each chapter in the book. How finely you can navigate depends heavily on how well the book was marked up. Think of it like this: if a print book has no chapter headings, you will have a hard time figuring out which chapter you're in. If a DAISY book is badly marked up, you might only be able to navigate to the start of the book, or possibly only to the table of contents. If a book is marked up badly enough (or is missing markup entirely), your DAISY reading software is likely to simply ignore it.
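The following sketch shows how reading software might turn SMIL `par` elements, which pair a piece of text with an audio clip, into millisecond positions. It assumes DAISY 3-style markup where each `par` has one `text` and one `audio` child with `clipBegin`/`clipEnd` attributes; it also accepts DAISY 2.02's `clip-begin`/`clip-end` spelling with `npt=` values, but only when the `audio` element is a direct child of the `par`. Only a simplified subset of SMIL's clock-value syntax is handled, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET


def clock_to_ms(value):
    """Convert a (simplified) SMIL clock value to whole milliseconds.

    Accepts "h:mm:ss.fff" / "mm:ss.fff" clock values, plain seconds,
    timecounts ending in "ms" or "s", and DAISY 2.02's "npt=" prefix.
    """
    value = value.strip()
    if value.startswith("npt="):
        value = value[len("npt="):]
    if value.endswith("ms"):
        return round(float(value[:-2]))
    if value.endswith("s"):
        return round(float(value[:-1]) * 1000)
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return round(seconds * 1000)


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def audio_clips(smil_text):
    """Return (text_src, audio_src, begin_ms, end_ms) for each <par>."""
    root = ET.fromstring(smil_text)
    clips = []
    for par in root.iter():
        if local_name(par.tag) != "par":
            continue
        text_src = audio_src = begin = end = None
        for child in par:
            name = local_name(child.tag)
            if name == "text":
                text_src = child.get("src")
            elif name == "audio":
                audio_src = child.get("src")
                begin = child.get("clipBegin") or child.get("clip-begin")
                end = child.get("clipEnd") or child.get("clip-end")
        if audio_src and begin and end:
            clips.append((text_src, audio_src, clock_to_ms(begin), clock_to_ms(end)))
    return clips


# Abbreviated DAISY 3-style sample: two sentences narrated in one MP3 file.
SAMPLE_SMIL = """<smil xmlns="http://www.w3.org/2001/SMIL20/">
  <body>
    <seq id="mainseq">
      <par id="par1">
        <text src="book.xml#p1"/>
        <audio src="chapter1.mp3" clipBegin="0:00:00.000" clipEnd="0:00:04.250"/>
      </par>
      <par id="par2">
        <text src="book.xml#p2"/>
        <audio src="chapter1.mp3" clipBegin="0:00:04.250" clipEnd="0:00:09.800"/>
      </par>
    </seq>
  </body>
</smil>"""

for text_src, audio_src, begin, end in audio_clips(SAMPLE_SMIL):
    print(f"{text_src}: {audio_src} {begin}-{end} ms")
```

Matching on local tag names rather than a fixed namespace keeps the sketch usable for both SMIL versions, at the cost of not validating the document.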

Why the need for specialized software?

You may be wondering why, if DAISY is little more than HTML, XML, and audio files, you would need specialized software to read and manipulate it. Technically speaking, you don't; the specialized software is mostly for convenience. In Linux, for example, a simple web browser can be used to open the books and read them, such as by opening the XML file in a DAISY 3 book. Beyond that, all the reading software will generally do is read the spines of the books you give it access to and create a list of them that you can click to open. If a book is badly marked up, it won't show up in this list.
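Here is a sketch of the two jobs just described: finding book folders and reading a book's spine. It treats any folder containing `ncc.html` (DAISY 2.02) or an `.opf` package file (DAISY 3) as a book, which is a heuristic assumed for this example, and it reads the reading order from the package file's `spine`, resolving each `itemref` against the `manifest`. The function names are hypothetical, and the sample package file is abbreviated (a real one also has metadata and media types). A folder missing those files simply never makes the list.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def spine_order(opf_text):
    """Return manifest hrefs in the reading order given by the OPF <spine>."""
    root = ET.fromstring(opf_text)
    hrefs_by_id = {}
    spine_refs = []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "item":        # <manifest> entry: id -> file
            hrefs_by_id[element.get("id")] = element.get("href")
        elif name == "itemref":   # <spine> entry: reading order by id
            spine_refs.append(element.get("idref"))
    # Skip spine entries that do not match any manifest item.
    return [hrefs_by_id[ref] for ref in spine_refs if ref in hrefs_by_id]


def find_books(library_dir):
    """Return folders under library_dir that look like DAISY books (heuristic)."""
    root = Path(library_dir)
    books = []
    for folder in [root, *sorted(root.rglob("*"))]:
        if not folder.is_dir():
            continue
        names = {p.name.lower() for p in folder.iterdir() if p.is_file()}
        if "ncc.html" in names or any(n.endswith(".opf") for n in names):
            books.append(folder)
    return books


SAMPLE_OPF = """<package xmlns="http://openebook.org/namespaces/oeb-package/1.0/"
         unique-identifier="uid">
  <manifest>
    <item id="ch2" href="chapter2.smil"/>
    <item id="ch1" href="chapter1.smil"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""

print(spine_order(SAMPLE_OPF))  # ['chapter1.smil', 'chapter2.smil']
```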

Producing DAISY books is another matter entirely, and usually requires either specialized software or enough knowledge of the specifications to adapt general-purpose software to generate them.

Conclusion

Fortunately, DAISY is a dying standard. While it is very good at what it does, the need for specialized software to produce it has set us apart from the sighted world, where readers use a variety of formats to read their books electronically. This is why the DAISY Consortium has adopted EPUB 3 as DAISY's successor. EPUB 3 supports what are called media overlays, which basically make it an EPUB book with optional, synchronized audio. Since EPUB shares a lot of DAISY's XML markup, some software that can read DAISY can see EPUB books but usually cannot read them. Because EPUB is a mainstream format, once the websites that provide books for us switch over to it, we will have a much larger selection of software to read our books.
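In an EPUB 3 package file, a content document announces its media overlay through a `media-overlay` attribute on its manifest `item`, naming the id of a SMIL document (media type `application/smil+xml`) that synchronizes the text with audio. The minimal Python sketch below maps each content document to its overlay; the sample package is abbreviated and the function name `media_overlays` is hypothetical.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"


def media_overlays(opf_text):
    """Map each content document's href to the href of its SMIL media overlay."""
    root = ET.fromstring(opf_text)
    items = {item.get("id"): item for item in root.iter(OPF + "item")}
    overlays = {}
    for item in items.values():
        overlay_id = item.get("media-overlay")
        if overlay_id and overlay_id in items:
            overlays[item.get("href")] = items[overlay_id].get("href")
    return overlays


# Abbreviated EPUB 3 package: chapter 1 has narration, chapter 2 does not.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="uid">
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"
          media-overlay="ch1_overlay"/>
    <item id="ch1_overlay" href="chapter1.smil" media-type="application/smil+xml"/>
    <item id="ch1_audio" href="audio/chapter1.mp3" media-type="audio/mpeg"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""

print(media_overlays(SAMPLE_OPF))  # {'chapter1.xhtml': 'chapter1.smil'}
```

Because the overlay is optional per document, a reader can fall back to plain text wherever this mapping has no entry.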

Kendell Clark is an open source advocate and Fedora user who has been using GNU/Linux since August 2011. I love my wife Melisa, my dog Tigger, and GNU/Linux, especially if it has anything to do with accessibility.

Comments

Very interesting article, Kendell! I had seen DAISY when I was helping special education teachers and students with the Bookshare program, https://www.bookshare.org/.

Interesting to see DAISY actually still in use. I was part of the technical teams in the earliest stages of the consortium and actually recorded and produced the very first DAISY book while working for the Swedish association for the visually impaired. It's one achievement I'm still very proud of, and at the time it was marvelous that we managed to get a global standard. Sadly not open source, though. The acronym was constructed afterwards; early on it was called Daisy, referring to another famous computer and movie ...

Thanks for all the positive comments. The copyeditors here cut out some of the technical detail that really goes into how DAISY works. If anyone's interested, I can email them the original OpenDocument file, which has it. In particular, that file has details on the hybrid format the NLS (National Library Service) here in the US uses for its talking books. It's a sort of mix of DAISY 3 and a proprietary encryption scheme that uses the 256-bit AES algorithm to encrypt the audio, which is in AMR-WB format, so that only "authorised" players can decode it. I've always wanted an app for Linux that could play these books, but what I want most is for Linux to be able to properly identify a DAISY book when you run across it in your file manager. A DAISY book is a folder full of files, so it shouldn't be too hard. Is anyone interested in helping me do that?

I couldn't agree more about formats and standards being open. DAISY is semi-open, I suppose, although they actively encourage patented audio formats. The only audio formats they support are AMR-WB for the NLS's digital books, and MP3 or WAV for DAISY 2 and DAISY 3 audio books. I remember reading some of the standards documents for DAISY 2 and DAISY 3, and in both cases they had people from both the RIAA and MPAA helping, so what else can you expect? I can't wait for blind sites such as Bookshare, RFB&D (now Learning Ally), etc. to switch over to EPUB. The organization behind DAISY has switched over to supporting it, but the sites will probably take their time because of all the Windows and Mac users: there is more software to read DAISY books on those platforms than there is for EPUB and other open formats. Calibre isn't accessible enough on anything but Linux to be usable. Despite this, DAISY has outlived itself. It's a convoluted format that only Windows and Mac, and now Android, have ever supported well, although this isn't really the DAISY Consortium's fault. There have been programs started in Linux to handle the format, but they all got abandoned at some point or another.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.