In my last article, I described Autoquote, a script that converts typewriter (or "straight") quotes to typographic (or "curly") quotes, which was prompted by a question on the Scribus open source desktop publishing software's mail list. Most publications adhere to certain style conventions, including the type of quotation marks they use, and a script that automatically corrects deviations from house style is a big time-saver.
At the heart of the script was a method for parsing the contents of a text frame character-by-character, which works like this: First, various kinds of control characters, e.g., for carriage returns and those that denote styles, are left intact. Second, there is a progressive march along the text that keeps track of the current character being analyzed and also the preceding and following characters to determine whether a typewriter quote should be replaced with a left or right quotation mark.
After I created this Autoquote script, someone asked me if I could write one that scrambles the text in a document. The person wanted to post a Scribus document to show a layout, yet hide the text contents of the frames. The idea was not to encrypt the document, but simply turn the text into gibberish. It seemed to me that the basic parsing section from Autoquote would serve this purpose well.
I called the end result replacetext.py, and I eventually ended up with four different versions, as you can see on the wiki page. The original one operated only on a selected text frame, but then there came a version that converted all text frames on regular pages, another for all text frames including those on master pages, and another version that worked only on the current page of the document.
I chose to do this scrambling as follows:
alpha = random.randint(1,26)
letter = chr(alpha + 96)
LETTER = chr(alpha + 64)
scribus.insertText(letter, c, textbox)
scribus.insertText(LETTER, c, textbox)
For each newly parsed character, a random integer between 1 and 26 is generated. This random integer simultaneously creates a random lower case and upper case letter. Then the script tests the original text character to determine whether it's a lower or upper case letter so it can make the appropriate substitution (i.e., to preserve the original character's case). Only a-z and A-Z characters are affected, not numbers and not those outside of ASCII territory (although it wouldn't be hard to extend this).
Because of the randomization function, there is no way to reverse the process, but I still wanted to retain the rough appearance of text with capitalizations and word spacing intact. In practical use, one side effect is that text tends to take up more space, which I presume relates to an increase in the number of wider glyphs than is usual in English. In practice, a user could delete characters when necessary for the layout appearance.
Here is an example of the results. The top paragraph is the original text, just some sample English text from Scribus, and below is the result after running replacetext.py. As you can see, only the alphabetic characters are converted, and occasionally randomization results in the same character, as expected. This also shows how the replaced text tends to take up more space.
Many publications specify the use of en (–) and em (—) dashes in their style guide. These are different from hyphens (-), but are sometimes denoted by two or three hyphens typed together. Many Scribus users compose their text in a text editor outside of Scribus, then import it into a text frame in the Scribus document. Like with typographic quotes, a utility to automatically convert hyphens to en and em dashes would be useful.
You can find the en+emdash.py script on its wiki page. Here is the pertinent assignment strategy:
if (char == '-'):
if (prevchar == '-'):
if (nextchar == '-'):
scribus.selectText(c-1, 3, textbox)
scribus.insertText(mdash, c-1, textbox)
char = mdash
scribus.selectText(c-1, 2, textbox)
scribus.insertText(ndash, c-1, textbox)
char = ndash
In this case, the variables mdash and ndash have been previously assigned the appropriate Unicode characters. If the en+emdash.py script encounters a hyphen, it checks to see if a previous character was also a hyphen. If that is true, then it checks the following character, and if this is also a hyphen (i.e., ---), it assigns an em dash, but if not (i.e., --), it assigns an en dash. Single hyphens are left as single hyphens.
This isn't a powerful or frequently used script, but it functions as a simple utility to accomplish one task, much like a number of Unix/Linux command line functions.
This also shows that, once you have taken the time to work through the logic of a complex basic operation like text parsing, you can go on to adapt it to a variety of uses.