bzip2 Vs pbzip2 File Compression Software: Efficiency Test
By Partho, Gaea News Network, Monday, June 7, 2010
bzip2 is an open source data compression algorithm and program. It is preferred because it compresses most files more effectively than the older LZW and Deflate (.zip and .gz) compression algorithms, although it is considerably slower. bzip2 compresses data in blocks of between 100 and 900 kB. It uses the Burrows-Wheeler transform to convert frequently recurring character sequences into strings of identical letters. The major shortcoming of bzip2 is the large amount of CPU time it needs for compression. This is largely because bzip2 runs on a single thread and cannot take advantage of multiple processor cores.
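The Python sketch below makes the Burrows-Wheeler step concrete. It is a naive illustration, not bzip2's actual implementation. It sorts every rotation of the input and keeps the last character of each one. It assumes a "$" end-of-string sentinel that does not occur in the input. bzip2 itself uses a much faster sorting algorithm and records the position of the original string instead of using a sentinel. Note how a repetitive input comes out grouped into runs of identical letters, which bzip2's later stages can then encode cheaply.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    if sentinel in text:
        raise ValueError("sentinel must not appear in the input")
    # The sentinel (an assumption of this sketch) marks the end so the transform is invertible.
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Rebuild the input by repeatedly prepending the last column and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # After len(last) rounds the table holds every sorted rotation; the original ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


print(bwt("abcabcabc"))  # ccc$aaabbb  (repeated "abc" becomes runs of c, a and b)
print(bwt("banana"))     # annb$aa
```

Sorting full rotations makes this sketch far too slow for real data. It is only meant to show the shape of the transform.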
bzip2 expects a list of file names to accompany the command-line flags. Each file is replaced by a compressed version of itself, with the name original_name.bz2. Each compressed file has the same modification date, permissions and, when possible, ownership as the corresponding original, so that these properties can be correctly restored at decompression time. File name handling is naive in the sense that there is no mechanism for preserving original file names, permissions, ownerships or dates in filesystems that lack these concepts or have serious file name length restrictions.
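The hypothetical helper `compress_in_place` below imitates this replace-and-preserve behaviour using Python's standard `bz2` and `shutil` modules. It is a sketch of the idea, not bzip2's own code. It writes `name.bz2`, copies the original's permission bits and timestamps, attempts to copy ownership (which usually needs elevated privileges, hence "when possible"), and then deletes the original. The `compresslevel` argument of `bz2.open` (1 to 9) selects the same 100 kB to 900 kB block size mentioned above.

```python
import bz2
import os
import shutil


def compress_in_place(path: str, compresslevel: int = 9) -> str:
    """Hypothetical helper: replace `path` with `path + '.bz2'`, keeping its metadata."""
    target = path + ".bz2"
    # compresslevel 1..9 chooses a block size of 100 kB..900 kB.
    with open(path, "rb") as src, bz2.open(target, "wb", compresslevel=compresslevel) as dst:
        shutil.copyfileobj(src, dst)

    st = os.stat(path)
    # Ownership can normally only be changed by a privileged user, so failure is ignored.
    if hasattr(os, "chown"):
        try:
            os.chown(target, st.st_uid, st.st_gid)
        except PermissionError:
            pass
    # Copy permission bits plus access and modification times from the original.
    shutil.copystat(path, target)

    os.remove(path)  # like bzip2, the original file is replaced
    return target
```

The metadata is copied after the compressed data is written. Writing data would otherwise reset the modification time.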
This limitation inspired pbzip2, a modified version created in 2003 that supports multi-threading. pbzip2 is a parallel implementation of bzip2 for shared-memory machines. It provides near-linear speedup when used on true multi-processor machines and a 5 to 10% speedup on Hyper-Threaded machines. Files compressed with pbzip2 are fully compatible with regular bzip2, so any file created with pbzip2 can be decompressed by bzip2 and vice versa.
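The sketch below shows, under stated assumptions, how the parallel approach and the compatibility can coexist. It assumes pbzip2's basic idea: split the input into chunks, compress each chunk on its own thread as an independent bzip2 stream, and concatenate the streams in order. A multi-stream-aware decoder, such as bzip2 or Python's `bz2.decompress` (Python 3.3 and later), reads the result as a single file. The function name `parallel_compress` and the 900 kB `CHUNK_SIZE` are illustrative choices, not pbzip2's actual internals. CPython's `bz2` module releases the GIL while compressing, so the threads should be able to run on separate cores.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 900_000  # hypothetical: roughly one bzip2 block (-9) per chunk


def parallel_compress(data: bytes, workers: int = 2, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Compress independent chunks concurrently and concatenate the bzip2 streams."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the streams are joined correctly.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


data = b"The quick brown fox jumps over the lazy dog. " * 50_000
packed = parallel_compress(data)
# An ordinary multi-stream bzip2 decoder restores the original bytes.
print(bz2.decompress(packed) == data)  # True
```

Each chunk is compressed independently, so the work divides cleanly across cores. This is consistent with the near-linear speedup described above. The cost is a slightly larger output, because every stream carries its own header and trailer.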
We ran a test to compare the timings of the two compressors, bzip2 and pbzip2.
System configuration: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz, 4GB RAM
Timings for compressing a 1GB .txt file using bzip2
time bzip2 1040955758.txt
real 2m19.142s
user 2m10.659s
sys 0m2.195s
Timings for extracting the .txt file using bzip2
real 0m55.514s
user 0m52.729s
sys 0m2.461s
Timings for compressing a 1GB .txt file using pbzip2
time pbzip2 1040955758.txt
real 1m18.978s
user 2m29.859s
sys 0m5.853s
Timings for extracting the .txt file using pbzip2
real 0m37.159s
user 1m9.508s
sys 0m2.817s
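The short Python script below summarises the measurements above. It uses only the reported `time` values and converts them to seconds. It then computes two figures: the wall-clock speedup of pbzip2 over bzip2, and the ratio (user + sys) / real. That ratio roughly indicates how many cores were kept busy. The helper names are illustrative.

```python
import re


def to_seconds(value: str) -> float:
    """Convert a bash `time` value such as '2m19.142s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (real, user, sys) exactly as reported above on the Core 2 Duo E7500.
results = {
    ("bzip2", "compress"): ("2m19.142s", "2m10.659s", "0m2.195s"),
    ("bzip2", "extract"): ("0m55.514s", "0m52.729s", "0m2.461s"),
    ("pbzip2", "compress"): ("1m18.978s", "2m29.859s", "0m5.853s"),
    ("pbzip2", "extract"): ("0m37.159s", "1m9.508s", "0m2.817s"),
}


def speedup(op: str) -> float:
    """Wall-clock time of bzip2 divided by that of pbzip2."""
    return to_seconds(results[("bzip2", op)][0]) / to_seconds(results[("pbzip2", op)][0])


def cores_busy(tool: str, op: str) -> float:
    """(user + sys) / real: about 1.0 for one busy core, about 2.0 for two."""
    real, user, sys_time = (to_seconds(t) for t in results[(tool, op)])
    return (user + sys_time) / real


for op in ("compress", "extract"):
    print(f"{op}: pbzip2 is {speedup(op):.2f}x faster in wall-clock time")
```

This prints a 1.76x speedup for compression and a 1.49x speedup for extraction. The (user + sys) / real ratio is about 0.95 for bzip2 compression and about 1.97 for pbzip2 compression. In other words, pbzip2 kept both cores of the dual-core machine busy. Its total CPU time is slightly higher than bzip2's (2m29.859s vs 2m10.659s user), which is the overhead of splitting the work.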