Factorials and unscrambling words with bash on Linux


In this post, we examine a bash script that takes a string of letters, rearranges them in every possible way and checks each permutation to identify those that are English words. In the process, we’ll take a close look at the script and calculate how hard it might have to work.

Note that, in the algorithm used, each letter arrangement must use all of the letters in the string provided. Words formed by substrings are not considered.

First, the script expects the scrambled string to be provided as an argument and prompts for it if none is provided. It then checks out each arrangement of letters to find those that exist in the system’s words file – in this case, that’s /usr/share/dict/words. Here are the first lines in the script:

#!/bin/bash

if [ $# == 0 ]; then
    echo -n "scrambled string> "
    read string
else
    string=$1
fi

The next part of the script defines a function that will both rearrange the letters and look for matches in the words file. Matches will be added to an array if the array does not already contain them.

function mixup {
    if [ "${#1}" == 1 ]; then
        word="${2}${1}"
        grep ^$word$ /usr/share/dict/words &> /dev/null
        if [ $? == 0 ]; then
            if [[ ! " ${words[@]} " =~ "$word" ]]; then  # add word if new
                    words[$n]=$word
                ((++n))
            fi
        fi
    else
        for i in $(seq 0 $((${#1}-1)) ); do
            pre="${2}${1:$i:1}"
            pc1="${1:0:$i}"
            pc2="${1:$((i+1))}"
            pc="${pc1}${pc2}"
            mixup "$pc" "$pre"
        done
    fi
}

In the final lines of the script, the mixup function is called for the first time. After it has run the required number of times, the script displays the number of words that were saved in the array and then lists the words.

mixup $string
echo ${#words[@]} "word(s) found"
for n in ${words[@]}; do echo $n; done

if we ran the script for the letters “olwf”, we would get a list containing three words.

$ unscramble olwf
3 words found
wolf
fowl
flow

The responses would be delivered in a second or so.

The worst thing about this script, however, is that if you give it a fairly long string to unscramble into words, it’s going to take a long time to get back to you. You might as well go on a coffee break or take an early vacation. Working with a 12-character string might not sound like a big deal, but it would mean that you’d be working with 12! (12 factorial) different arrangements of the letters. If you haven’t worked with factorials in a while, let me remind you that 12! is 12 x 11 x 10 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 or 479,001,600 different letter arrangements (counting any duplicates for letters that are in the word more than once)! The script will take a couple minutes for an eight-character string, but a 12-character string would require considerably more time.

To calculate a factorial, you could use a script like this one:

#!/bin/bash

if [ $# == 0 ]; then
    echo -n "enter number> "
    read num
else
    num=$1
fi

fac=1

while [ $num -gt 1 ]
do
    fac=$((fac * num))
    num=$((num - 1))
done

echo $fac

Presented with the number 8, the script would tell us that there would be 40,320 different arrangements of an 8-character string. By adding 4 more characters, we’re looking at nearly half a billion.

$ factorial 8
40320

$ factorial 12
479001600

This difference greatly impacts the time required for all the strings to be generated and checked against the words file. An 8-character string should only take a couple minutes to run.

$ time unscramble bthpaale
1 word(s) found
alphabet

real    1m49.693s
user    1m20.559s
sys     0m26.921s

Would a nine-character string take twice as long to process? No, it ought to take nine times as long. A 12-character string? Nearly 12,000 (11,800) times as long. Here’s an example of running with a nine-character string:

$ time unscramble fialactor
1 word(s) found
factorial

real    16m27.318s
user    12m1.492s
sys     4m5.169s

A 12-character string could take weeks unless, maybe, you just happen to be using a supercomputer.

Wrap-Up

Unscrambling words using the methodology described is thorough but, for long words, can be incredibly slow. I suspect that sites like wordunscrambler are using quite a different approach – probably making use of a pre-generated lists of scrambled and unscrambled words.

While most of the bash scripts that I write and use are both straightforward and fairly efficient, looking into the issue of unscrambling strings has made it clear that sometimes the most logical approach to solving a problem will be neither.

Join the Network World communities on Facebook and LinkedIn to comment on topics that are top of mind.

Copyright © 2021 IDG Communications, Inc.



Source link