Finding and fixing typos on Linux


If you want to check a text file for typos, Linux can help.

It has a couple of tools that can point out errors, including aspell and enchant. I’ll also share a script that I put together recently that looks for typos using the system’s words file.

Using aspell

aspell is a very clever tool that points out typos and makes it surprisingly easy to fix them. When checking a single file, it reverses the text and background colors to highlight misspelled words. You would start it with a command like this:

$ aspell check myfile

If aspell detects no typos, it simply exits. Otherwise, it will open with a display that contains the file text (or just the top lines depending on the length of the file) followed by a list of suggested replacement words and, below that, a list of the commands that you can run. The first typo (or suspected typo) will be displayed with the text and background colors reversed as shown below.

I wish that I could type with my eyes closed and never make a mistake. I don't
like typoze and I think I run into them far more often than I want.

1) depose                               6) typo              <===== suggested words
2) typos                                7) topaz
3) typo's                               8) topees
4) types                                9) type's
5) type                                 0) typed
i) Ignore                               I) Ignore all        <===== commands
r) Replace                              R) Replace all
a) Add                                  l) Add Lower
b) Abort                                x) Exit

?

If you want to replace the typo with one of the words listed, just use your keyboard to type the digit to the left of the word you want to select. If it’s the only typo in the file, aspell will make the change and exit. Otherwise, it will move on to the next misspelled word.

You can also replace a typo by typing “r” and then typing the word you want to use to replace it. If it’s a word that is likely to be repeated, you can press “R” instead and replace all instances of the word in the file. You can also decide to ignore what aspell deems a typo. After all, it might be a term that aspell simply doesn’t recognize or an acronym. You can do this one instance at a time by typing “i” or as a group by typing “I”.

As a precaution, aspell creates a backup (e.g., myfile.bak) of the file you are checking so that you can recover the original text if necessary and repair words you might have changed in error.
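
To review what changed during a session, you can compare the backup against the edited file. The sketch below simulates the two files in a temporary directory instead of running aspell, so the file contents are made up for illustration; only the myfile and myfile.bak names come from the example above.

```shell
# Simulate the state after an aspell session: myfile.bak holds the
# original text and myfile holds the corrected text (contents are made up).
tmpdir=$(mktemp -d)
printf 'I do not like typoze.\n' > "$tmpdir/myfile.bak"
printf 'I do not like typos.\n' > "$tmpdir/myfile"

# diff exits with status 1 when the files differ, so capture its output
# rather than treating that status as a failure.
changes=$(diff "$tmpdir/myfile.bak" "$tmpdir/myfile" || true)
echo "$changes"

# To undo every change, copy the backup over the edited file.
cp "$tmpdir/myfile.bak" "$tmpdir/myfile"
restored=$(cat "$tmpdir/myfile")
rm -r "$tmpdir"
```

Lines starting with "<" in the diff output come from the backup and lines starting with ">" come from the edited file.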

You can also use aspell to check the spelling of a group of words. Type “aspell -a” as shown below and you can type a word and see the list of suggested replacements. If aspell responds with an asterisk (*), the word was spelled correctly.

$ aspell -a
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.8)
typoze
& typoze 17 0: depose, typos, typo's, types, type, typo, topaz, topees, type's, typed, tapes, topee, Topsy, doze, pose, tape, tape's

typos
*
^C

Type ^C to exit as shown above.
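
The lines that aspell -a prints follow the older ispell format: an ampersand, the misspelled word, the number of suggestions, the position of the word in the input line and then, after a colon, the suggestions themselves. Here is a minimal sketch of pulling out the first few suggestions. It parses a stored copy of the sample line above rather than calling aspell, and the variable names are only illustrative.

```shell
# A copy of the "&" line shown above, stored so the example runs without aspell.
line="& typoze 17 0: depose, typos, typo's, types, type, typo, topaz, topees, type's, typed, tapes, topee, Topsy, doze, pose, tape, tape's"

# Field 2 is the misspelled word; everything after the first ": " is the suggestion list.
word=$(echo "$line" | awk '{ print $2 }')
suggestions=$(echo "$line" | sed 's/^[^:]*: //')

# Keep only the first three suggestions, which aspell lists first.
top3=$(echo "$suggestions" | awk -F', ' '{ print $1 ", " $2 ", " $3 }')
echo "$word -> $top3"
```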

Using enchant

A tool named “enchant” will list the words it considers typos with a command like this:

$ enchant -l myfile
typoze
typoze

If you expect a lot of typos, you can use a command like the one below to tell you how many times each typo appears in the file (the list is sorted first so that repeats of the same typo are grouped together):

$ enchant -l myfile | sort | uniq -c
      2 typoze
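
To see what sorting before counting changes, here is a small sketch that uses made-up output in place of a real enchant run. The second typo, "mistaek", is invented for the example.

```shell
# Made-up enchant -l output in which the same typo is not on adjacent lines.
typos=$(printf 'typoze\nmistaek\ntypoze\n')

# uniq -c alone only merges neighboring duplicates, so typoze is listed twice.
unsorted=$(echo "$typos" | uniq -c)

# Sorting first groups the repeats; sort -rn then puts the most frequent typo on top.
sorted=$(echo "$typos" | sort | uniq -c | sort -rn)

echo "$unsorted"
echo "$sorted"
```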

To view the suggested replacements, run a command like this:

$ enchant -a myfile | grep :
& typoze 1 5: typo
& typoze 1 6: typo

Building a spell-checking script

I put a bash script together to see how well I could check the words in a file against the Linux words file (/usr/share/dict/words on my system). The task turned out to be a little trickier than I expected.

I run the script like this:

$ findTypos myfile
typoze
typoze

The script contains a series of commands to find and display a list of the misspelled words. The first group of commands checks that a filename has been provided as an argument. If not, it prompts for one.

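#!/bin/bash

# word list to check against; adjust the path if your system keeps it elsewhere
dict=/usr/share/dict/words

# use the filename given as an argument or prompt for one
if [ $# -eq 0 ]; then
    echo -n "file: "
    read -r file
else
    file=$1
fi

while read -ra line
do
    for word in "${line[@]}"
    do
        # change the word to all lowercase
        word=$(echo "$word" | tr '[:upper:]' '[:lower:]')
        # remove most punctuation ([:punct:] would also strip apostrophes)
        word=$(echo "$word" | tr -d '[.,?:!"]')
        # remove a possessive 's from the end of the word
        word=$(echo "$word" | sed "s/'s\$//")
        # display the word if no line in the words file matches it exactly
        grep -q "^$word$" "$dict" || echo "$word"
    done
done < "$file"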

The script then runs through each word in the file and runs a tr command to change it to all lowercase to avoid issues with capitalized words. It then uses a second tr command to remove most punctuation marks so that periods, question marks, etc. don’t cling to the words that need to be checked. I didn’t use [:punct:] because it would have removed the apostrophe in words like “isn’t”, but I separately removed the possessive “’s” at the ends of words. The last step was looking for the word in the words file. The ^ and $ characters tell grep to find only the word specified, not words that might include that word.
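
For comparison, the same steps can be written as a single pipeline that splits the text into one word per line and filters the whole list with one grep call instead of one call per word. This is a sketch of an alternative and not the findTypos script itself. It builds a tiny made-up word list and sample file in a temporary directory so that it runs anywhere; for real use you would point dict at /usr/share/dict/words.

```shell
tmpdir=$(mktemp -d)
dict="$tmpdir/words"    # stand-in for /usr/share/dict/words
printf '%s\n' a i like do not "don't" the typos boss > "$dict"
cat > "$tmpdir/myfile" <<'EOF'
I don't like typoze. Do not like the Boss's typoze!
EOF

# Steps, in order: lowercase the text, put one word on each line, strip
# most punctuation but keep apostrophes, drop a trailing possessive 's,
# skip empty lines, keep only words with no exact match in the word
# list (-x whole line, -F fixed strings, -f patterns from a file, -v
# invert), and list each remaining word once.
typos=$(tr '[:upper:]' '[:lower:]' < "$tmpdir/myfile" |
    tr -s '[:space:]' '\n' |
    tr -d '.,?:!"' |
    sed "s/'s\$//" |
    grep -v '^$' |
    grep -vxFf "$dict" |
    sort -u)

echo "$typos"
rm -r "$tmpdir"
```

Because of the final sort -u, each typo is reported only once, whereas the loop version prints a typo every time it appears.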

The script, which I call “findTypos”, finds typos, but makes no attempt to fix them or suggest replacement words.

Wrap-Up

Detecting misspelled words in your text files can be helpful, especially if you’re preparing your weekly report and your boss is a stickler for grammar. Fortunately, Linux provides a number of ways to help with this.

Copyright © 2022 IDG Communications, Inc.


