Open source alternatives to Grammarly for word processing

Grammarly is popular among many teachers, students, business people, and others who need to write or process a lot of words on a regular basis. It's a useful tool, but you're required to register and log in to use it, and I rarely keep website login data in my cache.

I process words pretty often for writing technical and creative pieces, and ducking out of my text editor to open a web browser, let alone to visit a site that requires me to log in, is usually too much of a bother for me. Fortunately, with a few open source utilities, I can avoid this distraction.

Grammarly's main benefits are checking for:

Spelling errors
English grammar errors
Plagiarism
Style

Following are the open source alternatives I use for each of these functions.

Spelling

Spell checking is common in most word processors and even text editors. I use Flyspell in Emacs. Flyspell-mode is a minor mode that provides on-the-fly spell checks. Should I spell a word incorrectly, it's underlined with a red line that prompts me to review it. It also has an option to autocorrect words, and if I didn't deal in technology and fantasy and science fiction so much, I'd probably use it.

Flyspell ships with GNU Emacs, although it relies on an external spell checker such as Aspell or Hunspell being installed on your system. To make it an active mode upon launch, add this to your .emacs file:

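(require 'flyspell)
;; flyspell-mode is buffer-local, so enable it through a hook
;; rather than once at startup
(add-hook 'text-mode-hook #'flyspell-mode)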

Grammar

For grammatical issues, I use the LanguageTool API. It's an open source website and library funded by the European Union and developed by coders around the world.

You can use LanguageTool as a plugin for LibreOffice or Firefox, Chromium, Brave, Chrome, and other browsers; as a terminal command; or as a graphical application. It even has plugins for proprietary editors like Google Docs and Microsoft Word. If you download it for local use, you must have Java installed.

There's also an Emacs plugin, which is essentially an Elisp connector between Emacs and the LanguageTool Java library. With the langtool package installed in Emacs, LanguageTool checks my grammar without my ever having to launch it myself.
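If none of those integrations fit, LanguageTool can also be queried over HTTP. The following is a minimal sketch rather than an official client: it assumes the public endpoint at https://api.languagetool.org/v2/check (a locally run server usually listens on http://localhost:8081/v2/check instead), and the function names build_request, summarize, and check are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public endpoint. Point this at your own server
# (typically http://localhost:8081/v2/check) if you run one locally.
API_URL = "https://api.languagetool.org/v2/check"

def build_request(text, language="en-US", url=API_URL):
    """Build a POST request carrying the text to check."""
    data = urllib.parse.urlencode({"text": text, "language": language})
    return urllib.request.Request(url, data=data.encode("utf-8"))

def summarize(response, text):
    """Turn the JSON reply into one readable line per reported issue."""
    report = []
    for match in response.get("matches", []):
        start = match["offset"]
        end = start + match["length"]
        # keep at most three suggested replacements per issue
        fixes = [r["value"] for r in match.get("replacements", [])[:3]]
        report.append(f"{text[start:end]!r}: {match['message']} {fixes}")
    return report

def check(text, language="en-US", url=API_URL):
    """Send text to LanguageTool and return the summarized issues."""
    with urllib.request.urlopen(build_request(text, language, url)) as reply:
        return summarize(json.load(reply), text)
```

Calling check("he go home") sends the sentence to the server and returns a list of strings, one per issue. The public endpoint is rate limited, so a local server is the better choice for long documents.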

Plagiarism checks

The line between research, reporting, and reuse is often a little blurry, and with so much content available on the internet, it gets less clear every day. Typically, I try to limit myself to Creative Commons and open source resources, but even then, it's important to credit those resources either out of legal obligation or as common courtesy (depending on the license). One way to keep influences in check is to verify your final work against what already exists on the internet.

I use a Python script to do my plagiarism checks. It's by no means a good script. I hacked it together as a quick and easy way to guard against obvious copy-paste mistakes or misjudgments. So, while it's not an elegant script (the option parsing is over-complex and inefficient, and there's no adjustable tolerance level to exclude extremely short searches) and there are sure to be lots of false positives, it's an example of how a quick Python script can replace a service that doesn't otherwise fit into your workflow.

Before using it, you must install the Python google module to enable easy Google searches:

$ python3 -m pip install google --user

I specifically use Google and not an open source search engine like YaCy because I want a big pool of data to draw from.

Here's the script:

#!/usr/bin/env python3
# stollen plagiarism checker
# by Seth Kenlon <skenlon@redhat.com>
# GPLv3

# This program is free software: you can redistribute it 
# and/or modify it under the terms of the GNU General 
# Public License as published by the Free Software 
# Foundation, either version 3 of the License, or (at 
# your option) any later version.

# This program is distributed in the hope that it will be
# useful, but WITHOUT ANY WARRANTY; without even the 
# implied warranty of MERCHANTABILITY or FITNESS FOR A 
# PARTICULAR PURPOSE.  See the GNU General Public License 
# for more details.

# You should have received a copy of the GNU General 
# Public License along with this program. 
# If not, see <https://www.gnu.org/licenses/>.

import sys
import random
from pathlib import Path
from googlesearch import search 

def Scrub(ARG):
    """
    Read lines of file.
    """

    # a context manager closes the file once the lines are read
    with open(ARG, 'r') as f:
        LINES = f.readlines()
    Search(LINES)

def Search(LINES):
    """
    Search Internet for exact match of LINE.
    """

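    COUNT=0
    
    for LINE in LINES:
        COUNT += 1
        # drop the trailing newline and skip blank lines,
        # which give the search engine nothing to match
        LINE = LINE.strip()
        if not LINE:
            continue

        # random pause so repeated requests are less likely to be blocked
        PAUSE = random.randrange(1,4)

        if VERBOSE:
            print("Searching...")
            
        for ITEM in search(LINE, tld="com", num=1, stop=1, pause=PAUSE):
            if VERBOSE:
                print("WARNING: " + LINE + " → " + ITEM)
            else:
                print("WARNING: line " + str(COUNT) + " → " + ITEM)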

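if __name__ == "__main__":
    random.seed()
    n=1

    # check the argument count first so a bare invocation
    # doesn't raise IndexError
    if len(sys.argv) > 1 and (sys.argv[1] == "--verbose" or sys.argv[1] == "-v"):
        VERBOSE = True
        # shift 1
        n += 1
    else:
        VERBOSE = False

    # a missing argument and a non-file argument get the same message
    if len(sys.argv) <= n or not Path(sys.argv[n]).is_file():
        print("Provide a text file to check.")
        sys.exit(1)
    else:
        Scrub(sys.argv[n])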

Here's a simple test file containing a few lines from the public domain work Alice in Wonderland and a line from a copyrighted song, both of which the script caught, and a line of nonsense text that correctly is not flagged by the script:

Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, “and what is the use of a book,” thought Alice “without pictures or conversations?”

So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her.

acrutchen macpie.

Just when you think you've got more than enough, that's when it all up and flies away

You can test this by saving the Python script into a file called stollen.py (named after the delicious Christmas cake, not the idea that anyone would ever use stolen content), and the contents of the test file into test.txt. The expected results are hits on every line of text except line 5, the nonsense line.

$ chmod +x ./stollen.py 
$ ./stollen.py test.txt
WARNING: line 1 → https://www.ego4u.com/en/read-on/literature/alice-wonderland?part1
WARNING: line 3 → https://www.goodreads.com/quotes/733845-so-she-was-considering-in-her-own-mind-as-well
WARNING: line 7 → https://genius.com/Prince-and-the-new-power-generation-money-dont-matter-2-night-lyrics

To safeguard against being blocked by Google, I use a random number of seconds to pause between calls, so the script isn't very fast by design. Then again, if you've ever used Grammarly, you know that its plagiarism checker isn't very fast, either.
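The missing tolerance level mentioned earlier could be approximated with a word-count threshold. The following is a minimal sketch and not part of the script above: the function name searchable and the min_words parameter are hypothetical, and the default of six words is an arbitrary starting point.

```python
def searchable(lines, min_words=6):
    """
    Yield (line number, text) for each line long enough to search.
    """
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        # very short lines match too much of the internet to be meaningful
        if len(text.split()) >= min_words:
            yield number, text

sample = [
    "Alice was beginning to get very tired\n",
    "\n",
    "acrutchen macpie.\n",
]

for number, text in searchable(sample):
    print(f"line {number}: {text}")
```

Because the original line numbers are preserved, the warnings still point at the right place in the file. Raising min_words reduces false positives at the cost of missing short borrowed phrases.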

Style review

Of all the features provided by automated editors, a style review is the least important to me. Even with Grammarly's adjustable tolerance settings for writing styles spanning from formal to casual, I almost never agree with its suggestions, and it rarely catches things I dislike.

Defining an appropriate style, I think, is subjective for both the author and the reader, and in the context of automated editing, I believe it's actually shorthand for how strictly rules are applied. Therefore, what's actually important are breaches of rules, and it's up to the author or reviewer to decide whether the rule ought to be applied or ignored.

The strictest languages of all are constructed languages intended for computers, such as C, Java, Python, and so on. Because these languages are strictly defined, it's possible to check them, stringently and without exception, against the rules that define them. This process is called linting in computer science, and the aim of the proselint project is to bring that process to natural languages.

You can install proselint as a Python module:

$ python3 -m pip install proselint --user

Once it's installed, run it against a text file:

$ proselint myfile.txt

It provides grammar advice and performs some style checks to catch clichés and slang. It's a useful and objective look at prose, and you're free to ignore or follow its advice. Try it out if you're uncertain about the clarity or vibrancy of your writing.
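To make the idea of linting prose concrete, here is a toy linter in a few lines of Python. It is only an illustration of the technique, not proselint's code or API: the two rules, their messages, and the lint function are invented for this example.

```python
import re

# Hypothetical rules for illustration; proselint's real checks are far
# more extensive and are not reproduced here.
RULES = [
    ("cliche", re.compile(r"\bat the end of the day\b", re.IGNORECASE),
     "Overused phrase; say what you mean."),
    ("redundancy", re.compile(r"\bvery unique\b", re.IGNORECASE),
     "'Unique' needs no intensifier."),
]

def lint(text):
    """Return (line, column, rule, message) for every rule a text breaks."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern, message in RULES:
            for match in pattern.finditer(line):
                # columns are reported 1-based, as most linters do
                problems.append((lineno, match.start() + 1, name, message))
    return problems

draft = "This tool is very unique.\nAt the end of the day, it works."
for lineno, column, name, message in lint(draft):
    print(f"{lineno}:{column}: {name}: {message}")
```

Each rule is checked mechanically and reported with its position, just as a compiler reports a syntax error. Whether to act on the report is still up to the author.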

Open source means choice

There are lots of websites out there that don't publish their source code, and we all use them every day. Finding a good open source alternative isn't always about licensing or source code availability. Sometimes, it's about finding a tool that works better for you than what you were using previously.

With open source, you can survey your options and test them out until you find the one closest to your personal preference. If you want style checking, you have several linters and style checkers to choose from. If you want spelling and grammar checkers, you have many applications that let you integrate different dictionaries and interfaces. Non-open applications don't tend to allow that kind of flexibility. If you limit yourself, even if it's only for a few tasks, to software that isn't open, the diversity of possibility can be difficult to see.

Challenge yourself today, whether it's for spell checking or automated style critique or something else entirely: Find an open source alternative, and see if you can turn something routine into something compelling, fun, and effective.