I've recently become a little obsessed with an online word puzzle game in which you have six attempts to guess a random five-letter word. The word changes every day, and you can only play once per day. After each guess, each of the letters in your guess is highlighted: gray means that letter does not appear in the mystery word, yellow means that letter appears in the word but not at that position, and green means the letter appears in the word at that correct position.
Here's how you can use the Linux command line to help you play guessing games like Wordle. I used this method to help me solve the January 6 puzzle.
Linux systems keep a dictionary of words in the /usr/share/dict/words file. This is a very long plain text file. My system's words file has over 479,800 entries in it. The file contains both plain words and proper nouns (names, places, and so on).
To start my first guess, I just want a list of plain words that are exactly five letters long. To do that, I use this grep command:
$ grep '^[a-z][a-z][a-z][a-z][a-z]$' /usr/share/dict/words > myguess
The grep command uses regular expressions to perform searches. You can do a lot with regular expressions, but to help me solve Wordle, I only need the basics: The ^ means the start of a line, and the $ means the end of a line. In between, I've specified five instances of [a-z], which matches any single lowercase letter from a to z.
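To see what the anchors do, here is a minimal sketch that runs the same pattern against a tiny made-up word list instead of the system dictionary. The sample words and the variable names are only for illustration, and setting LC_ALL=C is my own assumption to keep [a-z] strictly lowercase ASCII regardless of locale. The second search uses grep -E with a repeat count, which should be an equivalent, shorter way to write the same pattern.

```shell
# A tiny made-up word list standing in for /usr/share/dict/words
words=$(printf '%s\n' Alice acres cat banal balmy banana "don't")

# Keep only lines that are exactly five lowercase letters
five=$(printf '%s\n' "$words" | LC_ALL=C grep '^[a-z][a-z][a-z][a-z][a-z]$')

# Same idea with an extended regular expression and a repeat count
five_short=$(printf '%s\n' "$words" | LC_ALL=C grep -E '^[a-z]{5}$')

printf '%s\n' "$five"
```

The capitalized name, the word with an apostrophe, and the words that are too short or too long are all filtered out, because every character between ^ and $ has to be a lowercase letter and there must be exactly five of them.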
I can also use the wc command to see that my list of possible words is "only" about 15,000 words:
$ wc -l myguess
15034 myguess
From that list, I picked a random five-letter word: acres. The a was set to yellow, meaning that letter exists somewhere in the mystery word but not in the first position. The other letters are gray, so I know they don't exist in the word of the day.
For my next guess, I want to get a list of all words that contain an a, but not in the first position. My list should also not include the letters c, r, e, or s. Let's break this down into steps:
To get a list of all words with an a, I use the fgrep (fixed strings grep) command. The fgrep command also searches for text like grep, but without using regular expressions:
$ fgrep a myguess > myguess2
That brings my possible list of next guesses down from 15,000 words to 6,600 words:
$ wc -l myguess myguess2
 15034 myguess
  6634 myguess2
 21668 total
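A quick way to see what "fixed strings" means is to search for text that contains a character with special meaning in regular expressions. This is a small sketch on made-up input, not part of the Wordle session. It uses grep -F, which does the same job as fgrep; recent GNU grep releases print a warning that fgrep is obsolescent, so grep -F may be the safer spelling on newer systems.

```shell
# Made-up input: one line contains a literal dot, the other does not
lines=$(printf '%s\n' 'a.b' 'axb')

# As a regular expression, the dot matches any single character
regex_hits=$(printf '%s\n' "$lines" | grep 'a.b')

# As a fixed string, the dot only matches a literal dot
fixed_hits=$(printf '%s\n' "$lines" | grep -F 'a.b')

echo "regex search found:"
printf '%s\n' "$regex_hits"
echo "fixed-string search found:"
printf '%s\n' "$fixed_hits"
```

For a single letter like a, both kinds of search give the same result, so the choice between them is mostly a matter of habit here.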
But that list of words also includes the letter a in the first position, which I don't want. The game already indicated the letter a exists in some other position. I can modify my command with grep to look for words containing some other letter in the first position. That narrows my possible guesses to about 5,500 words:
$ fgrep a myguess | grep '^[b-z]' > myguess2
$ wc -l myguess myguess2
 15034 myguess
  5566 myguess2
 20600 total
But I know the mystery word also does not include the letters c, r, e, or s. I can use another grep command to omit those letters from the search:
$ fgrep a myguess | grep '^[b-z]' | grep -v '[cres]' > myguess2
$ wc -l myguess myguess2
 15034 myguess
  1257 myguess2
 16291 total
The -v option means to invert the search, so grep will only return the lines that do not match the regular expression [cres], which matches any one of the letters c, r, e, or s. With this extra grep command, I've narrowed my next guess considerably, to about 1,200 possible words that have an a somewhere other than the first position and that do not contain c, r, e, or s.
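The three filters can be read as one rule per clue. Here is a minimal sketch of the same pipeline on a handful of made-up candidates, so the effect of each stage is easy to follow by eye. The candidate words and variable names are illustrative only; grep -F stands in for fgrep, and LC_ALL=C is my assumption to keep the [b-z] range to plain ASCII.

```shell
# Made-up five-letter candidates standing in for the myguess file
candidates=$(printf '%s\n' acres adult banal balmy plant light moral)

# Clues from the guess "acres": yellow a in position 1; gray c, r, e, s
#   grep -F a          keep words that contain an a
#   grep '^[b-z]'      drop words that start with a
#   grep -v '[cres]'   drop words containing any gray letter
round1=$(printf '%s\n' "$candidates" \
  | grep -F a \
  | LC_ALL=C grep '^[b-z]' \
  | grep -v '[cres]')

printf '%s\n' "$round1"
```

In this sample, acres and adult fall out because they start with a, light has no a at all, and moral contains the gray letter r, which leaves banal, balmy, and plant.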
After viewing the list, I decided to try the word balmy.
This time, the letters b and a were highlighted in green, meaning I have those letters in the correct position. The letter l was yellow, so that letter exists somewhere else in the word, but not in that position. The letters m and y are gray, so I can eliminate those from my next guess.
To identify my next list of possible words, I can use another set of grep commands. I know the word starts with ba, so I can begin my search there:
$ grep '^ba' myguess2 > myguess3
$ wc -l myguess3
77 myguess3
That's only 77 words! I can narrow that further by excluding words that have the letter l in the third position:
$ grep '^ba[^l]' myguess2 > myguess3
$ wc -l myguess3
61 myguess3
The ^ inside the square brackets [^l] means not this list of letters, so not the letter l. That brings my list of possible words to 61, but not all of them contain the letter l. I can eliminate the ones that don't using another fgrep search:
$ grep '^ba[^l]' myguess2 | fgrep l > myguess3
$ wc -l myguess3
10 myguess3
Some of those words might contain the letters m and y, which are not in today's mystery word. I can remove those from my list of guesses with one more inverted grep search:
$ grep '^ba[^l]' myguess2 | fgrep l | grep -v '[my]' > myguess3
$ wc -l myguess3
7 myguess3
My list of possible words is very short now, only seven words!
$ cat myguess3
babul
bailo
bakal
bakli
banal
bauld
baulk
I'll pick banal as a likely word for my next guess, which happened to be correct.
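Looking back, every clue from both guesses can be written as one filter in a single pipeline. The sketch below does that against a short made-up candidate list rather than the real dictionary; the words and variable names are hypothetical, and grep -F is used in place of fgrep.

```shell
# Made-up candidates standing in for the five-letter word list
candidates=$(printf '%s\n' banal balmy baulk basil bagel label bally)

# Clues after two guesses (acres, then balmy):
#   green:  b in position 1, a in position 2
#   yellow: l is in the word, but not in position 3
#   gray:   c, r, e, s, m, y
final=$(printf '%s\n' "$candidates" \
  | grep '^ba[^l]' \
  | grep -F l \
  | grep -v '[cresmy]')

printf '%s\n' "$final"
```

Only banal and baulk survive in this sample: balmy and bally have l in the third position, basil and bagel contain gray letters, and label does not start with ba.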
The power of regular expressions
The Linux command line provides powerful tools to help you do real work. The grep and fgrep commands offer great flexibility in scanning lists of words. For a word-based guessing game, grep helped identify a list of 15,000 possible words of the day. After each guess showed which letters did and did not appear in the mystery word, grep and fgrep helped narrow the options to 1,200 words and then only seven words. That's the power of the command line.