Emacspeak, an audible interface for Linux

Screen readers such as Orca work by describing the graphical environment to the user. They deconstruct an arbitrary visual environment that's built on top of an inherently text-based system. On some systems, this is necessary because there's no access—at least pragmatically—to the OS by any other means than the graphical interface. As most Linux users know, however, a graphical interface on a good Unix system is entirely optional in the first place, so there's no need to generate one, deconstruct it, and describe it; the computer can just spit out text.

I am aware of two efforts forging this path: Emacspeak and ADRIANE (on Knoppix). In this article, we'll take an in-depth look at the former.

Emacspeak is an audible interface that allows non-sighted users to interact independently and efficiently with a computer, entirely by way of text input and output. Emacspeak uses "audio formatting" and W3C's Aural CSS to produce a full audio representation of input and output.

Another advantage of using the Emacspeak system is that it inherits Emacs' extensive set of useful packages. As any serious Emacs user knows, it's quite possible to sit in front of a computer, launch Emacs, and never leave it 'til shutdown. Emacs has an application for nearly anything you want to do at a computer on a daily basis, including browsing the Web with w3m, sending and receiving email with rmail, chatting via IRC with erc or circe, listening to audiobooks and music with emms, managing files with dired, running a Unix shell with shell-mode, installing more Emacs packages with a built-in package manager, and scripting pretty much anything and building your own custom modules with elisp.

And yes, it can also edit text.

There's a learning curve, of course, but learning any new desktop (and in this model, Emacs is the desktop) entails learning some new concepts and developing new muscle memory. If you do give it a chance as an operating environment, Emacs quickly proves itself a robust and practical user interface.

Installing the OS

Since Emacs works on practically everything, there are—at least technically—a dozen different ways to install, configure, and use Emacspeak. There are many posts online about it, but each covers a different configuration and yet none of them appear to do so to completion. This, however, is a definitive round-trip tutorial. It aims to be indifferent to distribution, although some of the fancy startup processes could be implemented one way or another depending on whether you use Systemd startup scripts, OpenRC, or BSD-style scripts. Conceptually, the ideas will be the same no matter what; the important bits are how different components fit together.

This guide configures a computer such that emacspeak will be the only screen reader on the system and will require only software-based components (it will use the computer speakers to speak the text, and will not require any external "speech synthesizer" or braille output device).

Please note that these install instructions were written and performed by a sighted user. They are not optimized for non-sighted users and might require sighted assistance to complete. Everyday use from that point requires nothing but a blind user (or a sighted user with a blindfold) and the computer.

Install Linux

The first step is to install the Linux distribution of your choice. I prefer Slackware for its stability, its lack of auto-updates that could potentially break the mission-critical environment of Emacspeak, and the inclusion of the Emacspeak package on its install DVD and package servers. Emacspeak can, however, be installed on any Linux distribution.

Install Emacspeak

After installing Linux, log in and install the emacspeak package and, if available, the emacspeak-ss package. Different distributions package it differently, but it's usually safe to just install everything related to emacspeak.

To confirm that emacspeak is installed, launch it from a terminal by typing:

emacspeak

You'll probably get errors in response (unless your distribution has configured it for you), but as long as you get a response from the command other than command not found, then you'll know that emacspeak is installed and ready to be configured.

Time to grab a speech synthesizer.

Configuring speech synthesis

There's a lot of confusion online about what speech synthesizers are. The first misunderstanding tends to be music-related (speech synthesizers and vocoders are not the same thing). Then there's the old speech recognition confusion (we don't want to talk to our computer, we want our computer to talk to us). And then there are even questions about software speech synthesizers and hardware speech synthesizers. The system we are building uses a software speech synthesizer that will cost $0.

If the term "speech synthesizer" is confusing to you, just think of it as your screen reader's voice. But in technical terms, it is a software-based speech synthesizer. Think Stephen Hawking.

The speech synthesizer most actively maintained is Flite, and it works well with Emacs. It's not the prettiest sounding speech a computer's ever rendered, and it does tend to be quite fast, but it's the open source option we have. Companies designing closed source synthesizers could do the world a big favor by open sourcing their synthesizers (or at the very least making them free to use non-commercially). Next time you speak with your government representative, you might even go so far as to ask why none of the tax dollars you spend on purchasing speech synthesis for blind employees also goes toward developing an open source, communal solution. (Although the same could be asked of the OS it's running on in the first place.)

Install flite from your distribution's software repository, or, if you're on Slackware, from SlackBuilds.org.

Link the sound server and Emacspeak

The next step is to configure something called a "sound server" (Emacspeak's own documentation calls it a speech server), which is basically the intermediary link between emacspeak and the software synthesizer flite.

Without a sound server, Emacs will sit in one corner and Flite in the other, with no way for them to ever communicate with one another, much less read text back to you.

The sound server we can use is eflite. Install it from your distribution's repository, or on Slackware from SlackBuilds.org.

Once eflite has been built and installed, you should be able to test flite with a command like this:

$ flite -t foobar

You should hear a voice say "foobar".

If this test does not work, the most likely problem is that the sound on your computer isn't working or is turned all the way down. Configure sound, play some tests in a multimedia application like VLC just to confirm that your sound is working, and then try the flite test again.
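If the speakers stay silent, it can help to separate speech synthesis from audio playback. The following is only a sketch and not part of the original setup: it relies on flite's -o option, which writes the speech to a WAV file instead of the audio device. The helper name flite_selftest and the default output path are made up for illustration.

```shell
# Hypothetical helper: check that flite can synthesize speech at all,
# independent of whether the sound card is working.
flite_selftest() {
    out=${1:-/tmp/flite-test.wav}   # arbitrary output path for the test file

    if ! command -v flite >/dev/null 2>&1; then
        echo "flite is not on the PATH"
        return 1
    fi

    # -t gives the text to speak, -o names the WAV file to write.
    if flite -t "foobar" -o "$out" && [ -s "$out" ]; then
        echo "flite wrote $out"
    else
        echo "flite ran but produced no audio file"
        return 1
    fi
}
```

If flite_selftest reports that it wrote a file, flite itself is fine and the problem most likely lies in the sound configuration.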

Now you have installed the emacspeak audio desktop, the flite speech synthesizer, and the eflite speech server. The next step is to configure it all to work together.
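Before wiring the pieces together, a quick check that all three commands are on the PATH can save some guesswork. This is a minimal sketch; check_installed is a hypothetical helper name, and the command names assume your distribution installs them as emacspeak, flite, and eflite.

```shell
# Report whether a command can be found on the PATH.
check_installed() {
    if cmd_path=$(command -v "$1" 2>/dev/null); then
        printf 'found: %s at %s\n' "$1" "$cmd_path"
    else
        printf 'missing: %s\n' "$1"
        return 1
    fi
}

# Check the three components installed so far.
for cmd in emacspeak flite eflite; do
    check_installed "$cmd" || true
done
```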

Wiring it all together

Emacspeak discovers what sound server to use from environment variables.

These don't set themselves, so you need to set them. Before setting them permanently, do a test. At a BASH prompt, set the appropriate environment variable to point to eflite:

$ DTK_PROGRAM=`which eflite`

Make it permanent for this session:

$ export DTK_PROGRAM

And launch:

$ emacspeak

When Emacs launches, you should hear Emacs being narrated to you.

If this is not working for you but all the tests up to this point have been successful, then emacspeak is probably not using the correct sound server. After all, if the individual components have proven themselves to work, then the problem can't be individual parts.

Things to review and troubleshoot:

It's important that the DTK_PROGRAM environment variable is set so that Emacspeak knows which sound server to send information to. To set this variable properly, you must, within the same BASH shell as the shell you use to launch emacspeak, do exactly the steps above (set the variable, export the variable, launch). If you use a shell other than BASH, then the process may be different (depending on the shell).

If you change shells or close that shell between setting DTK_PROGRAM and launching Emacspeak, you lose the variable setting and you are launching Emacspeak without it knowing what sound server to use. You can test to see that this variable is set by running echo $DTK_PROGRAM just before launching Emacspeak. If it returns /usr/bin/eflite (or something like that), then the variable is set correctly. If it does not return a path to eflite, double-check that eflite installed correctly and then locate the path to the executable (it should be located somewhere in a bin directory).
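The checks described above can be bundled into one small function to run just before launching. This is a sketch under one assumption: that DTK_PROGRAM holds a full path to the server, as it does in this guide. The name check_dtk is hypothetical.

```shell
# Hypothetical helper: verify DTK_PROGRAM before launching Emacspeak.
# Assumes DTK_PROGRAM holds a full path, as it does in this guide.
check_dtk() {
    if [ -z "${DTK_PROGRAM:-}" ]; then
        echo "DTK_PROGRAM is not set"
        return 1
    fi
    # A child process only sees exported variables, so ask one.
    if ! sh -c '[ -n "${DTK_PROGRAM:-}" ]'; then
        echo "DTK_PROGRAM is set but not exported"
        return 1
    fi
    if [ ! -f "$DTK_PROGRAM" ] || [ ! -x "$DTK_PROGRAM" ]; then
        echo "DTK_PROGRAM is $DTK_PROGRAM, which is not an executable file"
        return 1
    fi
    echo "DTK_PROGRAM is $DTK_PROGRAM"
}
```

Running check_dtk && emacspeak in the same shell then launches Emacspeak only when the variable is set, exported, and pointing at a real executable.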

Assuming you get it to work as expected, it's time to make setting the DTK_PROGRAM variable and launching Emacspeak more transparent. To do this, add the commands you ran to test Emacspeak to the user's .bash_profile:

echo "DTK_PROGRAM=`which eflite`" >> $HOME/.bash_profile
echo "export DTK_PROGRAM" >> $HOME/.bash_profile

Optionally, if this really is the only way you'll ever use Emacs, then you might want to create an alias so that all calls to emacs actually open up emacspeak:

echo 'alias emacs="emacspeak"' >> $HOME/.bash_profile

Log out and log back in and type echo $DTK_PROGRAM. If it returns /usr/bin/eflite (or whatever the path to eflite is on your system), then the .bash_profile is working. Launch Emacspeak by typing in emacs, and emacs with Emacspeak functionality should launch and you should hear it narrate to you as expected.
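One caveat with the echo commands used to edit .bash_profile: each run appends another copy of the line. A possible way around that, sketched here with a hypothetical helper called append_once, is to append a line only when the file does not already contain it.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so repeated runs do not pile up duplicates.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x matches whole lines, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example calls, left commented out so nothing is modified by accident:
# append_once "DTK_PROGRAM=$(command -v eflite)" "$HOME/.bash_profile"
# append_once 'export DTK_PROGRAM' "$HOME/.bash_profile"
```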

Now go learn emacs and all of its extensions. Emacs has everything from shells to media players available for it, and all of it will be narrated by Emacspeak. As long as you never leave emacs (and with so many extensions, you'll find that there may well never be a need to do so), your computing environment will be spoken aloud to you. A good resource for learning about GNU Emacs is emacswiki.org, and a very good introductory mini-series in podcast form can be found at Hacker Public Radio; listen to part 1, part 2, and part 3. For bonus points, listen to them using emms!

Refining the system for non-visual use

If the Emacspeak system you are setting up is for a blind user, there are two useful modifications that can help usability. As it is now, when the system boots, it runs through the usual launch sequence and then sits silently at a login prompt, waiting for the user to enter their login information. There may be no need to launch a GUI at all; on Slackware, booting to a text console is the default behavior, and on systems that use Systemd it can be made the default:

$ sudo systemctl set-default multi-user.target

Alternatively, you can allow a graphical boot and configure the desktop environment to launch Emacspeak automatically right after login. How that is done varies by desktop (generally it's an option in System Settings or in a dedicated control panel like Startup Applications), or you can add the command to .xinitrc.

If you're booting to no GUI, then you should provide notification for your user that the boot sequence has ended and a login prompt is waiting for input.

To create an audible login prompt, we modify a startup script so that an audio command runs at the end of the boot sequence. This means that there will be an audible prompt, and then the boot sequence will officially end, and then the login prompt will appear. Traditionally, the init sequence consists of a series of shell scripts that are kept in /etc, with the final shell script executed being rc.local. That's still the case with Slackware and some other systems, and on Systemd, compatibility with rc.local is usually built in.

Therefore, a command added to rc.local will always execute at the end of the boot sequence. Any audible cue should work; you could play a sound file with ogg123 or sox, or just a phrase with flite:

# echo 'exec /usr/bin/flite -t "please log in."' >> /etc/rc.d/rc.local
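Since any audible cue will do, the choice between flite, ogg123, and sox can be made at boot time. The function below is a sketch of that idea rather than a tested recipe: login_cue is a hypothetical name, the default sound file path is made up, and the terminal bell used as a last resort is an addition that may be silent on some consoles.

```shell
# Hypothetical helper for rc.local: announce the login prompt with
# whichever audio tool is available.
login_cue() {
    sound=${1:-/usr/share/sounds/login.ogg}   # hypothetical sound file

    if command -v flite >/dev/null 2>&1; then
        flite -t "please log in."
    elif command -v ogg123 >/dev/null 2>&1 && [ -r "$sound" ]; then
        ogg123 -q "$sound"
    elif command -v play >/dev/null 2>&1 && [ -r "$sound" ]; then
        play -q "$sound"          # play is the playback command from sox
    else
        printf '\a'               # last resort: terminal bell
    fi
}

# Call it as the last line of rc.local:
# login_cue
```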

After a successful login, the user is dumped to a non-audible bash prompt. That's a problem, especially since the same happens after a failed login. The easiest solution is to launch Emacspeak automatically after a successful login, simultaneously confirming success and eliminating the redundancy of manually launching Emacspeak every time the user logs into the system.

To implement this:

$ echo 'exec /usr/bin/emacspeak' >> $HOME/.bash_profile

With this in place, Emacspeak is automatically spawned any time the user starts a login shell (which is when Bash reads .bash_profile), serving as the de facto shell environment as well as the de facto desktop.
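Because exec replaces the login shell, a broken Emacspeak install would end the session immediately and leave no prompt to fix it from. One possible safeguard, sketched here with a hypothetical function name, is to hand over to Emacspeak only when the launcher exists and DTK_PROGRAM is set.

```shell
# Hypothetical guard for .bash_profile: succeed only when the Emacspeak
# launcher is executable and DTK_PROGRAM has been set, so that a broken
# install still leaves an ordinary shell prompt to fall back on.
should_launch_emacspeak() {
    launcher=${1:-/usr/bin/emacspeak}
    [ -x "$launcher" ] && [ -n "${DTK_PROGRAM:-}" ]
}

# In .bash_profile, in place of the bare exec line:
# should_launch_emacspeak && exec /usr/bin/emacspeak
```

The fallback prompt is still silent, so this helps whoever maintains the system more than it helps the blind user directly.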

How to help

I hope this guide has helped. Feel free to ask questions via the contact info you find on either slackermedia.ml or here on Opensource.com.

If you want to help blind Linux usage advance, then there are a few things you can do! Broadly, you can always help by implementing shell scripts and shell-based or elisp-based applications that respect the standard input and output expectations that Unix has established since its very beginnings. It's not just "graybeard" pride that encourages the classic Unix model; it's the preservation of data that can be parsed by both humans and machines in a predictable and usable manner. If you're writing a GUI application, consider breaking its code into shell usage as well as GUI usage; sometimes it takes a little more thought, but generally it results in cleaner code, improved usage of system resources, and better logic.

If you're a web developer, always check your websites in a text browser like w3m-emacs, lynx, or elinks. It's not just a fun "retro" gimmick; these browsers are how computers (and speech synthesizers like flite) "see" your sites. If you find your site difficult to use in a text interface, so do blind users!

And finally, if you're an educator, learn, promote, and teach Emacs. It offers so many applications that make a computer accessible to blind users (and sighted users over remote shells), so the more information out there to de-mystify its interface, the better!