A Python script for fixing smart quotes in text

A script for replacing straight quotes with curly quotes in Scribus.

Image by:

Opensource.com

About six years ago, we had a question on the Scribus mail list from someone who wanted to know whether there was an automated way to convert typewriter quotation marks to typographic quotes. In case you don't know what this means, typographic quotes (for example, “ and ”) are sometimes referred to as curly quotes, rather than the incorrect versions on your typewriter (i.e., "). The ones on the keyboard are fine for using as abbreviations for feet or inches, or minutes and seconds, but most of the time you really want curly quotes in a piece of text. (Editor's note: Opensource.com's style is to use straight quotes when possible.)

Although most word processors will automatically replace typewriter quotes with typographic ones, Scribus doesn't have that built-in function. It seemed like an interesting scripting challenge to me, so I took it up. This led to a script, a quotation marks conversion wiki page, and finally the script—called Autoquote—was deemed useful enough to include with the Scribus package. (Find the current version of Autoquote at the bottom of this article.) Since then, we've added enhancements to a version called Autoquote2, also included with Scribus. This one has an option for dialogs in French and effort toward putting spaces between quotes and text (as is done in French).

The idea is simple enough that, generally speaking, if one of these marks follows a space or is at the beginning of a paragraph and is followed by a character of some sort, it should be a left-hand quote, and mostly, right-hand quotes are followed by a space or are at the end of a paragraph. However, you then have words like contractions, such as the word “aren’t” and what about “’nested’ quotes”? (For example, LibreOffice gets nested quotes wrong, but Autoquote does them correctly.) Finally, typographic quotes are not the same for all languages. French and Russian use guillemets («»), and other languages place the quote marks differently—these are different glyphs in some cases (˛’ or „“ for example). It's actually amazing how many variations there are.

What you see early on in the script is that you need to tell the script which language you are using:


    lang = scribus.valueDialog("Choose by language or country", 'Language: af, be, ch, cs, de, en, es, et, fi, fr,\n hu, is, lt, mk, nl, pl, ru, se, sk, sl, sq and uk\n are current choices','en')

After which there is a long list of assignments, such as:

if (lang == 'en'):
   lead_double = u"\u201c"
   follow_double = u"\u201d"
   lead_single = u"\u2018"
   follow_single = u"\u2019"
elif (lang == 'de'):
   lead_double = u"\u201e"
   follow_double = u"\u201c"
   lead_single = u"\u2019"
   follow_single = u"\u201a"
elif (lang == 'fr'):      
   lead_double = u"\u00ab"
   follow_double = u"\u00bb"
   lead_single = u"\u2018"
   follow_single = u"\u2019"

This assigns the correct unicode character for the replacement for leading or following quotation marks. After this, the script can get down to work, and it does so by parsing the entire text of the selected text frame, with parsing meaning analyzing the text, character by character. A further complication is that you can't see a number of characters when you look at a Scribus document. You don't really see a carriage return character, but it's there. Additionally, there are control characters used to change the character or paragraph style, and you don't want to mess these up in the process. You must ignore these as far as any scheme of quotation marks assignment, but you need to keep them just the same. If you check these characters with the Python command len, you will find that their length is 0, so this is how we find them.

In case it's not clear, what the script does then is tear apart the contents of a text frame, character by character, then changes typewriter single and double quotes as desired, rebuilding the content of the frame. On the wiki page you can see how this logic plays out.

There is one flaw in the script for which I haven't figured out the logic—the situation in which you have a contraction at the beginning of a word (for example, ‘twas). Even in LibreOffice, this comes out as a left-hand single quote, but should be right-handed, like ’twas.

This is the story of how the Autoquote script came to be. What I realized later is that this text frame parsing process could also be used for other situations, which I will talk about in my next Python and Scribus article.

Autoquote.py script

#!/usr/bin/env python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Autoquote.py - changes typewriter quotes to typographic quotes
# © 2010.08.28 Gregory Pittman
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
"""
USAGE

You must have a document open, and a text frame selected.
There will be a valueDialog asking for your language for the quotes, 
the default is 'en', but change the default to suit your needs.
Detected errors shut down the script with an appropriate message.

"""
import scribus

if scribus.haveDoc() > 0:
    c = 0
    lang = scribus.valueDialog("Choose by language or country", 'Language: af, be, ch, cs, de, en, es, et, fi, fr,\n hu, is, lt, mk, nl, pl, ru, se, sk, sl, sq and uk\n are current choices','en')
    if (lang == 'en'):
        lead_double = u"\u201c"
        follow_double = u"\u201d"
        lead_single = u"\u2018"
        follow_single = u"\u2019"
    elif (lang == 'de'):
        lead_double = u"\u201e"
        follow_double = u"\u201c"
        lead_single = u"\u2019"
        follow_single = u"\u201a"
    elif (lang == 'fr'):      
        lead_double = u"\u00ab"
        follow_double = u"\u00bb"
        lead_single = u"\u2018"
        follow_single = u"\u2019"  # am hoping this will cover contractions like je t'aime
    elif (lang == 'pl'):
        lead_double = u"\u201e"
        follow_double = u"\u201d"
        lead_single = u"\u201a"
        follow_single = u"\u2019"
    elif ((lang == 'se') or (lang == 'fi')):
        lead_double = u"\u201d"
        follow_double = u"\u201d"
        lead_single = u"\u2019"
        follow_single = u"\u2019"
    elif (lang == 'af'):
        lead_double = u"\u201c"
        follow_double = u"\u201d"
        lead_single = u"\u2018"
        follow_single = u"\u2019"
    elif (lang == 'sq'):
        lead_double = u"\u201e"
        follow_double = u"\u201c"
        lead_single = u"\u2018"
        follow_single = u"\u2019"
    elif ((lang == 'be') or (lang == 'ch') or (lang == 'uk') or (lang == 'ru')):
        lead_double = u"\u00ab"
        follow_double = u"\u00bb"
        lead_single = u"\u2039"
        follow_single = u"\u203a"
    elif (lang == 'uk'):
        lead_double = u"\u00ab"
        follow_double = u"\u00bb"
        lead_single = u"\u2039"
        follow_single = u"\u203a"
    elif (lang == 'es'):
        lead_double = u"\u00ab"
        follow_double = u"\u00bb"
        lead_single = u"\u2018"
        follow_single = u"\u2019"
    elif ((lang == 'lt') or (lang == 'is') or (lang == 'sk') or (lang == 'sl') or (lang == 'cs') or (lang == 'et')):
        lead_double = u"\u201e"
        follow_double = u"\u201c"
        lead_single = u"\u201a"
        follow_single = u"\u2018"
    elif (lang == 'mk'):
        lead_double = u"\u201e"
        follow_double = u"\u201c"
        lead_single = u"\u2019"
        follow_single = u"\u2018"
    elif ((lang == 'hu') or (lang == 'nl')):
        lead_double = u"\u201e"
        follow_double = u"\u201d"
        lead_single = u"\u00bb"
        follow_single = u"\u00ab"
    else:
        scribus.messageBox('Language Error', 'You need to choose an available language', scribus.ICON_WARNING, scribus.BUTTON_OK)
        sys.exit(2)
        
else:
    scribus.messageBox('Usage Error', 'You need a Document open', scribus.ICON_WARNING, scribus.BUTTON_OK)
    sys.exit(2)

if scribus.selectionCount() == 0:
    scribus.messageBox('Scribus - Usage Error',
        "There is no object selected.\nPlease select a text frame and try again.",
        scribus.ICON_WARNING, scribus.BUTTON_OK)
    sys.exit(2)
if scribus.selectionCount() > 1:
    scribus.messageBox('Scribus - Usage Error',
        "You have more than one object selected.\nPlease select one text frame and try again.", scribus.ICON_WARNING, scribus.BUTTON_OK)
    sys.exit(2)
textbox = scribus.getSelectedObject()
pageitems = scribus.getPageItems()
boxcount = 1
for item in pageitems:
    if (item[0] == textbox):
        if (item[1] != 4):
            scribus.messageBox('Scribus - Usage Error', "This is not a textframe. Try again.", scribus.ICON_WARNING, scribus.BUTTON_OK)
            sys.exit(2)
contents = scribus.getTextLength(textbox)
while c <= (contents -1):
    if ((c + 1) > contents - 1):
        nextchar = ' '
    else:
        scribus.selectText(c+1, 1, textbox)
        nextchar = scribus.getText(textbox)
    scribus.selectText(c, 1, textbox)
    char = scribus.getText(textbox)
    if (len(char) != 1):
        c += 1
        continue
    if ((ord(char) == 34) and (c == 0)):
        scribus.deleteText(textbox)
        scribus.insertText(lead_double, c, textbox)
    elif (ord(char) == 34):
        if ((prevchar == '.') or (prevchar == ',') or (prevchar == '?') or (prevchar == '!')):
            scribus.deleteText(textbox)
            scribus.insertText(follow_double, c, textbox)
        elif ((ord(prevchar) == 39) and ((nextchar != ' ') and (nextchar != ',') and (nextchar != '.'))):
            scribus.deleteText(textbox)
            scribus.insertText(lead_double, c, textbox)
        elif ((nextchar == '.') or (nextchar == ',')):
            scribus.deleteText(textbox)
            scribus.insertText(follow_double, c, textbox)

        elif ((prevchar == ' ') or ((nextchar != ' ') and (ord(nextchar) != 39))):
            scribus.deleteText(textbox)
            scribus.insertText(lead_double, c, textbox)
        else:
            scribus.deleteText(textbox)
            scribus.insertText(follow_double, c, textbox)
            
    if ((ord(char) == 39) and (c == 0)):
        scribus.deleteText(textbox)
        scribus.insertText(lead_single, c, textbox)
    elif (ord(char) == 39):
        if ((prevchar == '.') or (prevchar == ',') or (prevchar == '?') or (prevchar == '!')):
            scribus.deleteText(textbox)
            scribus.insertText(follow_single, c, textbox)
        elif ((ord(prevchar) == 34) and ((nextchar != ' ') and (nextchar != ',') and (nextchar != '.'))):
            scribus.deleteText(textbox)
            scribus.insertText(lead_single, c, textbox)
        elif ((prevchar != ' ') and (ord(prevchar) != 34) and (nextchar != ' ')):
            scribus.deleteText(textbox)
            scribus.insertText(follow_single, c, textbox)
        elif ((prevchar == ' ') or ((nextchar != ' ') and (ord(nextchar) != 34))):
            scribus.deleteText(textbox)
            scribus.insertText(lead_single, c, textbox)
        else:
            scribus.deleteText(textbox)
            scribus.insertText(follow_single, c, textbox)
            
    c += 1
    prevchar = char

scribus.setRedraw(1)
scribus.docChanged(1)
endmessage = 'Successfully ran script\n Last character read was '+str(char) # Change this message to your liking
scribus.messageBox("Finished", endmessage, icon=scribus.ICON_NONE, button1=scribus.BUTTON_OK)