Replace smart quotes with the Linux sed command | Opensource.com

Banish "smart" quotes with your favorite version of sed.

In typography, the two marks in a pair of quotation marks are traditionally oriented toward one another. They look like this:

“smart quotes”

As computers became popular in the mid-twentieth century, the orientation was often abandoned. Early character sets didn't have much room to spare, so it makes sense that two double quotes and two single quotes were reduced to just one of each in the ASCII specification. These days the common character set is Unicode, with plenty of space for lots of fancy quotation marks and apostrophes, but many people have become used to the minimalism of just one character for both opening and closing quotes. Besides that, computers see the different kinds of quotation marks and apostrophes as distinct characters. In other words, to a computer the right double quote is different from the left double quote or a straight quote.
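
One way to see that these really are different characters is to dump the bytes behind each one. The following is a minimal sketch, assuming UTF-8 encoding and the standard od and tr utilities; the helper name show_bytes is made up for this illustration.

```shell
#!/bin/sh
# show_bytes is a hypothetical helper: it prints the bytes of its
# argument as one run of hexadecimal digits.
show_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

show_bytes '"'                          # straight double quote: 22
show_bytes "$(printf '\342\200\234')"   # left double quote:     e2809c
show_bytes "$(printf '\342\200\235')"   # right double quote:    e2809d
```

The straight quote is a single ASCII byte, while each curly quote is a three-byte UTF-8 sequence, so a plain text search for one never matches the other.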

Replacing smart quotes with sed

Computers aren't typewriters. When you press a key on your keyboard, you're not pressing a lever with an ink stamp attached to it. You're just pressing a button that sends a signal to your computer, which the computer interprets as a request to display a specific predefined character. The request depends on your keyboard map. As a Dvorak typist, I've witnessed the confusion on people's faces when they discover "asdf" on my keyboard produces "aoeu" on the screen. You may also have pressed special combinations of keys to produce characters, such as ™ or ß or ≠, that aren't even printed on your keyboard.

Each letter or character, whether it's printed on your keyboard or not, has a code. Character encoding can be expressed in different ways, but to a computer the Unicode sequences u2018 and u2019 produce ‘ and ’, while the codes u201c and u201d produce the “ and ” characters. Knowing these "secret" codes means you can replace them programmatically using a command like sed. Any reasonably modern sed can do the substitution itself, whether it's GNU sed, BSD sed, or Busybox sed.

Here's the simple shell script I use:

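#!/bin/sh
# GNU All-Permissive License

# Use the sed found in PATH unless SED is already set in the environment.
SED=${SED:-sed}

# UTF-8 bytes in octal, because echo -ne '\u2018' is not portable across shells:
# U+2018 and U+2019 (single quotes), U+201C and U+201D (double quotes).
SDQUO=$(printf '\342\200\230\342\200\231')
RDQUO=$(printf '\342\200\234\342\200\235')
# The bracket expressions assume a UTF-8 locale.
$SED -i -e "s/[$SDQUO]/'/g" -e "s/[$RDQUO]/\"/g" "${1}"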

Save this script as fixquotes.sh and then create a separate test file containing smart quotes:

‘Single quote’
“Double quote”

Run the script, and then use the cat command to see the results:

$ sh ./fixquotes.sh test.txt
$ cat test.txt
'Single quote'
"Double quote"

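To confirm that nothing was missed, you can count the lines that still contain a curly quote. This is only a sketch: the function name count_smart_quotes is made up for this example, and it checks just the four characters discussed in this article.

```shell
#!/bin/sh
# count_smart_quotes is a hypothetical helper: it prints how many lines
# of the given file still contain one of the four curly quotes.
count_smart_quotes() {
    lsq=$(printf '\342\200\230')   # U+2018 left single quote
    rsq=$(printf '\342\200\231')   # U+2019 right single quote
    ldq=$(printf '\342\200\234')   # U+201C left double quote
    rdq=$(printf '\342\200\235')   # U+201D right double quote
    # -F treats each pattern as a fixed string, -c prints a line count.
    grep -c -F -e "$lsq" -e "$rsq" -e "$ldq" -e "$rdq" "$1"
}
```

After a successful run of the script, count_smart_quotes test.txt should print 0. Note that grep exits with a non-zero status when the count is 0, which matters if you call the function from a script that uses set -e.
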
Install sed

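If you're using Linux, BSD, or macOS, then you already have GNU or BSD sed installed. These are two independent reimplementations of the original sed command, and for the substitution in this article they behave the same (that's not true for all scripts, though). One difference does matter here: the sed shipped with macOS and FreeBSD expects a backup suffix after the -i option, so on those systems you would write -i '' instead of a bare -i.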
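
If you want to sidestep differences in how sed implementations handle -i, one option is to skip in-place editing entirely. The sketch below is a variation on the script above, not the author's version: it writes the result to a temporary file and copies it back, and it replaces each character with its own expression so it doesn't depend on the locale. The function name fix_quotes_portable is made up, and mktemp is assumed to be available (it is on most Linux, BSD, and macOS systems).

```shell
#!/bin/sh
# fix_quotes_portable is a hypothetical name. It straightens the curly
# quotes in the file given as $1 without relying on sed -i.
fix_quotes_portable() {
    tmp=$(mktemp) || return 1
    lsq=$(printf '\342\200\230')   # U+2018 left single quote
    rsq=$(printf '\342\200\231')   # U+2019 right single quote
    ldq=$(printf '\342\200\234')   # U+201C left double quote
    rdq=$(printf '\342\200\235')   # U+201D right double quote
    if sed -e "s/$lsq/'/g" -e "s/$rsq/'/g" \
           -e "s/$ldq/\"/g" -e "s/$rdq/\"/g" "$1" > "$tmp"; then
        # Copy the contents back so the original file keeps its permissions.
        cat "$tmp" > "$1"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Calling fix_quotes_portable test.txt should leave the file with the same contents as the sed -i version.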

On Windows, you can install GNU sed with Chocolatey.

About the author

Seth Kenlon
Seth Kenlon is a UNIX geek, free culture advocate, independent multimedia artist, and D&D nerd. He has worked in the film and computing industry, often at the same time. He is one of the maintainers of the Slackware-based multimedia production project Slackermedia.