Factorials and unscrambling words with bash on Linux
In this post, we examine a bash script that takes a string of letters, rearranges them in every possible way and checks each permutation to identify those that are English words. In the process, we’ll take a close look at the script and calculate how hard it might have to work.
Note that, in the algorithm used, each letter arrangement must use all of the letters in the string provided. Words formed by substrings are not considered.
First, the script expects the scrambled string to be provided as an argument and prompts for it if none is provided. It then checks out each arrangement of letters to find those that exist in the system’s words file – in this case, that’s /usr/share/dict/words. Here are the first lines in the script:
#!/bin/bash if [ $# == 0 ]; then echo -n "scrambled string> " read string else string=$1 fi
The next part of the script defines a function that will both rearrange the letters and look for matches in the words file. Matches will be added to an array if the array does not already contain them.
function mixup { if [ "${#1}" == 1 ]; then word="${2}${1}" grep ^$word$ /usr/share/dict/words &> /dev/null if [ $? == 0 ]; then if [[ ! " ${words[@]} " =~ "$word" ]]; then # add word if new words[$n]=$word ((++n)) fi fi else for i in $(seq 0 $((${#1}-1)) ); do pre="${2}${1:$i:1}" pc1="${1:0:$i}" pc2="${1:$((i+1))}" pc="${pc1}${pc2}" mixup "$pc" "$pre" done fi }
In the final lines of the script, the mixup function is called for the first time. After it has run the required number of times, the script displays the number of words that were saved in the array and then lists the words.
mixup $string echo ${#words[@]} "word(s) found" for n in ${words[@]}; do echo $n; done
if we ran the script for the letters “olwf”, we would get a list containing three words.
$ unscramble olwf 3 words found wolf fowl flow
The responses would be delivered in a second or so.
The worst thing about this script, however, is that if you give it a fairly long string to unscramble into words, it’s going to take a long time to get back to you. You might as well go on a coffee break or take an early vacation. Working with a 12-character string might not sound like a big deal, but it would mean that you’d be working with 12! (12 factorial) different arrangements of the letters. If you haven’t worked with factorials in a while, let me remind you that 12! is 12 x 11 x 10 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 or 479,001,600 different letter arrangements (counting any duplicates for letters that are in the word more than once)! The script will take a couple minutes for an eight-character string, but a 12-character string would require considerably more time.
To calculate a factorial, you could use a script like this one:
#!/bin/bash if [ $# == 0 ]; then echo -n "enter number> " read num else num=$1 fi fac=1 while [ $num -gt 1 ] do fac=$((fac * num)) num=$((num - 1)) done echo $fac
Presented with the number 8, the script would tell us that there would be 40,320 different arrangements of an 8-character string. By adding 4 more characters, we’re looking at nearly half a billion.
$ factorial 8 40320 $ factorial 12 479001600
This difference greatly impacts the time required for all the strings to be generated and checked against the words file. An 8-character string should only take a couple minutes to run.
$ time unscramble bthpaale 1 word(s) found alphabet real 1m49.693s user 1m20.559s sys 0m26.921s
Would a nine-character string take twice as long to process? No, it ought to take nine times as long. A 12-character string? Nearly 12,000 (11,800) times as long. Here’s an example of running with a nine-character string:
$ time unscramble fialactor 1 word(s) found factorial real 16m27.318s user 12m1.492s sys 4m5.169s
A 12-character string could take weeks unless, maybe, you just happen to be using a supercomputer.
Wrap-Up
Unscrambling words using the methodology described is thorough but, for long words, can be incredibly slow. I suspect that sites like wordunscrambler are using quite a different approach – probably making use of a pre-generated lists of scrambled and unscrambled words.
While most of the bash scripts that I write and use are both straightforward and fairly efficient, looking into the issue of unscrambling strings has made it clear that sometimes the most logical approach to solving a problem will be neither.
Copyright © 2021 IDG Communications, Inc.