Open-sourced tool speeds up Linux scripts via parallelization
MIT has open-sourced pa.sh (also called pash), a tool that can dramatically speed up Linux scripts through parallelization, saving time without the risk of introducing errors.
Parallelization starts with an analysis of the script to find code that can run separately and independently, so not every script can benefit from the tool. But when pa.sh does find portions that can run independently, it runs them in parallel on separate CPUs. It also uses other techniques to get the code to run faster.
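To make the idea concrete, here is a hand-written sketch of the general pattern: split the input, process the pieces at the same time, then combine the results. It is not code generated by pa.sh, and the function names (count_sequential, count_parallel) are made up for illustration. It assumes bash and GNU coreutils (for split -n).

```shell
#!/usr/bin/env bash
# Hand-written illustration of data parallelism; NOT output produced by pa.sh.
# count_sequential and count_parallel are hypothetical names.

# Count the lines matching a pattern in one pass.
count_sequential() {
    local pattern=$1 file=$2
    grep -c -- "$pattern" "$file" || true   # grep exits 1 when the count is 0
}

# Same count, but the input is split into chunks that are searched concurrently.
count_parallel() {
    local pattern=$1 file=$2 jobs=${3:-4}
    local tmp chunk total=0
    tmp=$(mktemp -d)
    # split -n l/N (GNU coreutils) cuts the file into N pieces without breaking lines
    split -n "l/$jobs" "$file" "$tmp/chunk."
    for chunk in "$tmp"/chunk.*; do
        # each grep runs as a background job, so the chunks are processed in parallel
        { grep -c -- "$pattern" "$chunk" || true; } > "$chunk.count" &
    done
    wait   # block until every background job has finished
    for chunk in "$tmp"/chunk.*.count; do
        total=$(( total + $(cat "$chunk") ))
    done
    rm -rf "$tmp"
    echo "$total"
}

# Demo: count the numbers from 1 to 100000 that end in 7, both ways.
data=$(mktemp)
seq 1 100000 > "$data"
count_sequential '7$' "$data"
count_parallel '7$' "$data" 4
rm -f "$data"
```

Both calls print the same count. The point of a tool like pa.sh is that the script author does not have to write this kind of plumbing by hand.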
Below is a demonstration I ran on my home Fedora box, first running a script on its own and then again using pa.sh. Note that this script was provided with the pa.sh tool and lends itself to parallelization. It’s not nearly as demanding as scripts that might process gigabytes of data in a scientific or artificial-intelligence lab, so the results are not dramatic.
Running the script on the command line
I used the time command to gauge the performance of the hello-world.sh script.
$ time ./evaluation/intro/hello-world.sh
2176

real    0m55.077s
user    0m54.815s
sys     0m0.062s
NOTE: The “2176” on the second line is the script’s output.
Running the script using pa.sh
In the next command, I ran the same script through pa.sh.
$ time ./pa.sh ./evaluation/intro/hello-world.sh
2176

real    0m19.216s
user    0m37.509s
sys     0m0.255s
Notice that when run using pa.sh, the script took little more than a third of the time (real time) that it took when run directly. On the other hand, a script that simply loops from 1 to 1,000 and displays the count at every 100th step takes significantly longer to run using pa.sh. That’s because the script gains nothing from parallelization but still has to be analyzed:
$ time ./count_to_10000      $ time pa.sh ./count_to_10000
100                          100
200                          200
300                          300
400                          400
500                          500
600                          600
700                          700
800                          800
900                          900
1000                         1000

real    0m0.010s             real    0m59.121s
user    0m0.007s             user    0m41.386s
sys     0m0.003s             sys     0m19.263s
The script consists of a single loop, shown here, which provides no opportunity for parallelization:
for num in {1..1000}
do
    if [[ "$num" == *"00" ]]; then
        echo $num
    fi
done
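To put the two experiments side by side, the elapsed times that time reports can be converted to seconds and compared. The sketch below assumes bash's default time format (for example, 0m55.077s); to_seconds and ratio are hypothetical helper names, not part of pa.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing values printed by bash's `time`.

# Convert a value such as "0m55.077s" into seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, p, "m")          # p[1] = minutes, p[2] = seconds plus trailing "s"
        sub(/s$/, "", p[2])
        printf "%.3f\n", p[1] * 60 + p[2]
    }'
}

# Ratio of two time values, rounded to two decimal places.
ratio() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

ratio 0m55.077s 0m19.216s   # hello-world.sh: direct run vs. pa.sh (real time)
ratio 0m37.509s 0m19.216s   # pa.sh run of hello-world.sh: user time vs. real time
ratio 0m59.121s 0m0.010s    # counting loop: pa.sh vs. direct run (real time)
```

With the timings shown above, the first ratio is about 2.87 (the speedup for hello-world.sh), and the second is about 1.95, which suggests that roughly two CPUs were busy on average during the pa.sh run. The third shows the opposite case: the trivial loop took several thousand times longer under pa.sh than it did on its own.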
For complex scripts that can benefit from parallelization, however, pa.sh can make a tremendous difference in how long they take to run. All you have to do is invoke your scripts using pa.sh. And, as already noted, pa.sh does this without introducing errors, so you can be confident that you will get the expected results, just a whole lot faster. If your scripts need to process a large amount of data, this can save a lot of time.
Installing and using pa.sh
You will need to have tools like sudo, wget, and curl, but these tools are likely already available on your Linux system.
Once pa.sh is installed, you will need to export the PASH_TOP variable so that it points to the top-level directory where pa.sh is installed. For example:
$ export PASH_TOP=/opt/pash
$ echo $PASH_TOP
/opt/pash
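A quick sanity check can catch a missing or mistyped PASH_TOP before you run a long job. The function below is a minimal sketch, not part of pa.sh: check_pash_top is a hypothetical name, and it assumes that the pa.sh launcher sits directly in the top-level directory, as the ./pa.sh invocation earlier suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if PASH_TOP is set and points to a directory
# that contains an executable pa.sh (the launcher's location is an assumption).
check_pash_top() {
    if [ -z "${PASH_TOP:-}" ]; then
        echo "PASH_TOP is not set" >&2
        return 1
    fi
    if [ ! -x "$PASH_TOP/pa.sh" ]; then
        echo "no executable pa.sh found in $PASH_TOP" >&2
        return 1
    fi
    echo "pa.sh found in $PASH_TOP"
}
```

You could call check_pash_top at the top of a wrapper script and bail out early if it fails, for example with check_pash_top || exit 1.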
Wrap-Up
From everything I’ve seen and read, pa.sh can provide a dramatic performance improvement to complex and data-hungry scripts. If you or your organization might benefit from this kind of tool, it is well worth looking into.
The tool and the example code are open source, and pa.sh is available on GitHub. There is no man page, but help is available when you use the pa.sh --help command. A technical paper explaining pa.sh has been posted by Nikos Vasilakis, a research scientist at MIT’s Computer Science & Artificial Intelligence Laboratory (CSAIL) who chairs the international committee of researchers who have worked on the tool for nearly two years. MIT announced pa.sh earlier this month.
Copyright © 2022 IDG Communications, Inc.