The Problem: Slow Loops
You have 1000 images. You want to compress them. A simple loop:
```
for img in *.jpg; do
  convert "$img" -quality 80 "compressed_$img"
done
```
Runs one image at a time. On a 4-core system, 3 cores sit idle the whole time.
With GNU parallel, all cores work. Processing time drops from 60 minutes to 15.
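The speedup is easy to demo with a toy workload. A minimal sketch, with `sleep` standing in for real per-item work (the 4-worker figure assumes at least 4 cores):

```shell
# 8 one-second "jobs": the serial loop takes ~8s,
# four parallel workers take ~2s.
time (for i in $(seq 8); do sleep 1; done)
time (seq 8 | parallel -j 4 sleep 1)
```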
Installation
On Linux:
```
sudo apt install parallel   # Debian/Ubuntu
sudo yum install parallel   # CentOS/RHEL
brew install parallel       # macOS
```
Or compile from source:
```
wget https://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2
tar xjf parallel-latest.tar.bz2
cd parallel-*
./configure && make && sudo make install
```
Basic Usage
Compress those images:
```
ls *.jpg | parallel convert {} -quality 80 "compressed_{}"
```
{} is a placeholder for each input item. Parallel distributes the work across all CPU cores.
By default, it uses as many jobs as you have cores. Force a specific number:
```
ls *.jpg | parallel -j 2 convert {} -quality 80 "compressed_{}"
```
-j 2 runs only 2 jobs at a time (useful if each job is memory-hungry).
Real-World Examples
Download Files in Parallel
```
cat urls.txt | parallel -j 4 wget -q {}
```
Downloads 4 files at once instead of one by one. Dramatically faster.
Compress Files
```
find . -name "*.log" | parallel gzip {}
```
Compresses all logs in parallel. Way faster than a for loop.
Process CSV Rows
```
# Each row is a user; run a script on each in parallel
tail -n +2 users.csv | parallel -j 8 process_user.sh {}
```
tail -n +2 skips the header. Process 8 rows at a time.
Run a Function on Each Item
```
#!/bin/bash

process_item() {
  local id=$1
  echo "Processing $id..."
  sleep 1  # Fake work
  echo "Done $id"
}

export -f process_item

seq 1 10 | parallel process_item {}
```
export -f makes the function available to parallel. Now each worker runs it.
Controlling Workers
Run as Many Jobs as Possible
```
parallel -j 0 command {}  # As many jobs as the system allows
```
-j 0 starts as many jobs as parallel can, limited by file handles and process limits rather than core count, so it can overload the machine. The default (no -j) is one job per core.
Run on N Cores
```
parallel -j 4 command {}
```
Lock to 4 parallel jobs.
Use a Fraction of Cores
```
parallel -j 50% command {}
```
Run on 50% of available cores. Leaves room for other work.
Run on a Remote Machine
```
parallel -S user@remote.example.com command {}
```
Parallel can distribute work across multiple machines. Useful for clusters.
Quoting and Escaping
Tricky input? GNU parallel quotes replacement strings for you, so filenames with spaces usually work as-is:
```
echo 'file with spaces.txt' | parallel echo {}
```
If the command itself contains shell special characters you don't want interpreted, add --quote:
```
echo 'file with spaces.txt' | parallel --quote echo {}
```
--quote protects the whole command from an extra round of shell interpretation.
Numbered Placeholders
Instead of a single {}, you can refer to multiple inputs by position:
For CSV-style input, --colsep splits each line into {1}, {2}, and so on:
```
cat list.txt | parallel --colsep ',' echo "ID: {1}, Name: {2}"
```
For quick tests, feed arguments directly:
```
parallel -N2 echo "User: {2} (ID: {1})" ::: 123 alice 456 bob
```
::: supplies arguments on the command line; -N2 hands each job two of them, so {1} is the first and {2} is the second.
Handling Errors
By default, parallel continues even if a job fails. See errors:
```
parallel --joblog /tmp/job.log command {} < input.txt
```
--joblog writes one result line per job to a file. Check it:
```
cat /tmp/job.log
```
Output:
```
Seq  Host  Starttime   JobRuntime  Send  Receive  Exitval  Signal  Command
1    :     1618590123  1.234       0     100      0        0       command input1
2    :     1618590124  1.567       0     50       1        0       command input2
```
Exitval is the exit code. 1 means it failed.
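Since the joblog is tab-separated with Exitval in column 7 and the command in column 9, failed jobs can be pulled out with awk. A sketch using jobs that deliberately exit with their own argument:

```shell
log=$(mktemp)
# 'exit {}' makes each job exit with its argument: 0 succeeds, 1 fails.
# parallel itself exits nonzero when a job fails, hence '|| true'.
parallel --joblog "$log" 'exit {}' ::: 0 1 || true
# Print the command of every job with a nonzero exit code (skip the header).
awk -F '\t' 'NR > 1 && $7 != 0 { print $9 }' "$log"
rm -f "$log"
```

This prints only the failing command (`exit 1` here), which is handy for building a retry list.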
Stop on first failure:
```
parallel --halt now,fail=1 command {} < input.txt
```
--halt now,fail=1 kills running jobs and exits as soon as any job fails (use soon,fail=1 to let already-running jobs finish first).
Progress and Monitoring
Show progress bar:
```
parallel --eta command {} < input.txt
```
--eta shows estimated time remaining.
Or use --progress:
```
parallel --progress command {} < input.txt
```
Shows how many jobs are done.
Real Example: Batch Image Processing
You have 10,000 photos. Resize and compress them:
```
#!/bin/bash

WORKERS=4
QUALITY=80
SIZE="1920x1080"

find ./photos -type f \( -name "*.jpg" -o -name "*.png" \) | \
  parallel -j "$WORKERS" \
    convert {} \
      -resize "$SIZE^" \
      -quality "$QUALITY" \
      -gravity center \
      -extent "$SIZE" \
      "./output/{/}"
```
- -j $WORKERS: use 4 cores
- {}: input filename
- {/}: just the basename (strip directory path)
- ./output/{/}: output to a different directory
On a 4-core system, this is 4x faster than a loop.
Real Example: Database Backups
Backup multiple databases in parallel:
```
databases=("db1" "db2" "db3" "db4")

printf '%s\n' "${databases[@]}" | \
  parallel -j 2 'mysqldump {} | gzip > "backup_{}.sql.gz"'
```
Quoting the whole command matters here: without the single quotes, the shell attaches the pipe and redirect to parallel itself instead of to each job. This backs up 2 databases at a time. Faster than serial backups.
Alternatives
GNU parallel is powerful but has a learning curve.
xargs is simpler for basic cases:
```
find . -name "*.log" -print0 | xargs -0 -n 1 -P 4 gzip
```
-P 4 runs up to 4 processes in parallel; -n 1 hands each process one file (without it, xargs batches arguments into a single invocation and may start fewer processes). -print0/-0 handles filenames with spaces. Works for simple tasks.
Rust’s rayon (if you’re writing Rust):
```
use rayon::prelude::*;

files.par_iter().for_each(|file| {
    process(file);
});
```
Python’s multiprocessing:
```
from multiprocessing import Pool

with Pool(4) as p:
    p.map(process, files)
```
For shell scripts though, GNU parallel is the go-to.
Bottom Line
GNU parallel turns a slow for-loop into a blazing-fast parallel job distributor. If you’re processing hundreds or thousands of items on a multi-core system, 10 minutes learning parallel buys you hours of compute time back.
Just remember: parallel isn’t magic. It can’t make one job faster. It can only run multiple jobs at once. Pick it when you have embarrassingly parallel work—tasks that don’t depend on each other.