gzip or bzip2
Using bzip2 instead of gzip will sometimes save you valuable storage capacity.
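Whether bzip2 actually wins depends on the data, so it can be worth checking a sample file first. The following is a minimal sketch, assuming gzip, bzip2 and wc are installed; compare_sizes is a hypothetical helper name used only for illustration:

```shell
#!/bin/bash
# compare_sizes is a hypothetical helper, used here only for illustration.
# It compresses one file with both tools (to stdout, the input file is left
# untouched) and prints the resulting sizes in bytes.
compare_sizes() {
    local file="$1"
    local size_gz size_bz2
    size_gz="$(gzip -c -- "$file" | wc -c)"
    size_bz2="$(bzip2 -c -- "$file" | wc -c)"
    # $(( ... )) strips the padding some wc implementations add.
    echo "gzip: $((size_gz)) bytes, bzip2: $((size_bz2)) bytes"
}
```

Calling compare_sizes on a typical file from the directory in question gives a rough idea of whether a conversion is likely to pay off.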
Quite some time ago I wrote a simple script that replaces a gzip archive with a bzip2 archive whenever bzip2 compresses the content better.
With verbose output and an additional md5 check of the result, this is the current script:
#!/bin/bash
################################################################################
# Convert gzip'ped files to bzip2 format, if that saves space.
################################################################################
# Let a pipeline fail if any command in it fails, not only the last one.
set -o pipefail
# Set to an empty string to silence the progress messages.
VERBOSE="1"
# Take the files from the command line; default to all *.gz files in the
# current directory. Arrays keep file names containing spaces intact.
files=("$@")
[ "${#files[@]}" -eq 0 ] && files=(*.gz)
for file_gz in "${files[@]}"; do
    # Skip anything that is not a readable .gz file in a writable directory.
    [[ "$file_gz" == *.gz ]] || continue
    [ -r "$file_gz" ] || continue
    [ -w "$(dirname "$file_gz")" ] || continue
    file_bz2="$(dirname "$file_gz")/$(basename "$file_gz" .gz).bz2"
    if [ -e "$file_bz2" ]; then
        echo "Cowardly refusing to overwrite $file_bz2."
        continue
    fi
    # bzip2 compression:
    [ -n "$VERBOSE" ] && echo "Running bzip2 on $file_gz..."
    if ! zcat "$file_gz" | nice bzip2 >"$file_bz2"; then
        # Do not leave a partial archive behind.
        rm -f "$file_bz2"
        continue
    fi
    # Check size (keep the bz2 copy only if it is smaller).
    # Note: "stat -c" is the GNU coreutils syntax.
    size_gz="$(stat -c "%s" "$file_gz")"
    size_bz2="$(stat -c "%s" "$file_bz2")"
    if [ -z "$size_gz" ] || [ -z "$size_bz2" ] || [ "$size_bz2" = 0 ] \
        || [ "$size_gz" -le "$size_bz2" ]; then
        [ -n "$VERBOSE" ] && echo "Result is not smaller."
        rm -f "$file_bz2"
        continue
    fi
    [ -n "$VERBOSE" ] && echo "bzip2 compression wins benchmark: $size_gz > $size_bz2"
    # Additional md5 check: both archives must decompress to the same data.
    md5_gz="$(zcat "$file_gz" | md5sum)"
    md5_bz2="$(bzcat "$file_bz2" | md5sum)"
    if [ "$md5_gz" != "$md5_bz2" ]; then
        [ -n "$VERBOSE" ] && echo "MD5 check failed."
        rm -f "$file_bz2"
        continue
    fi
    [ -n "$VERBOSE" ] && echo "MD5 check passed."
    # Size is better, md5 is ok, then drop the original file.
    [ -n "$VERBOSE" ] && echo "Drop original file: $file_gz"
    rm -f "$file_gz"
done
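The md5 step only needs to establish that both archives decompress to the same bytes. As an alternative to comparing checksums, the two streams can be compared directly with cmp. This is a sketch of that swapped-in technique, not what the script above does; same_payload is a hypothetical helper name, and bash is assumed because of the process substitution:

```shell
#!/bin/bash
# same_payload is a hypothetical helper, not part of the script above.
# Returns 0 only if both archives are intact and decompress to identical bytes.
same_payload() {
    local file_gz="$1" file_bz2="$2"
    # Reject corrupt archives first; otherwise two failing decompressors
    # would both produce empty output and compare as equal.
    if ! gzip -t -- "$file_gz" 2>/dev/null; then
        return 1
    fi
    if ! bzip2 -t -- "$file_bz2" 2>/dev/null; then
        return 1
    fi
    # cmp -s is silent and reports the result through its exit status.
    cmp -s <(gzip -dc -- "$file_gz") <(bzip2 -dc -- "$file_bz2")
}
```

In the loop, a call such as same_payload "$file_gz" "$file_bz2" || continue could stand in for the two md5sum lines and the string comparison. cmp can stop at the first differing byte, whereas the checksum approach always reads both streams to the end.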