File Archiver Smackdown
By Angsuman Chakraborty, Gaea News Network
Thursday, July 5, 2007
I tested several file compression programs, including zip, gzip, arj, bzip2 and jar, for compressing big files. The corpus consisted of 5 POI-generated Microsoft Excel documents totaling 298.8 MB. And there is a clear winner!
About the data
The Excel documents contained standard corporate data from a very big corporation (read: Fortune 500). There is nothing special about the data as such; it is regular text data in Excel files. For obvious reasons I cannot share the data for independent verification.
Archive formats not tested
I haven’t tested two popular archive formats, rar and 7zip, as they aren’t easily available on Linux.
Results
Compression Algorithm | Compressed Size | % Compression |
---|---|---|
tar.bz2 | 10.9 MB | 96.35 |
tar.gz | 52.5 MB | 82.43 |
zip | 52.5 MB | 82.43 |
arj | 52.5 MB | 82.43 |
jar | 52.5 MB | 82.43 |
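The "% Compression" column is the fraction of the original 298.8 MB that each format removed. A minimal sketch of that arithmetic follows; the function name and the dictionary layout are my own illustration, and the sizes are simply copied from the table.

```python
def percent_compression(original_mb, compressed_mb):
    """Return the percentage of the original size removed by compression."""
    return (1 - compressed_mb / original_mb) * 100

ORIGINAL_MB = 298.8  # total size of the 5 Excel documents

# Compressed sizes in MB, as reported in the results table.
compressed_sizes = {
    "tar.bz2": 10.9,
    "tar.gz": 52.5,
    "zip": 52.5,
    "arj": 52.5,
    "jar": 52.5,
}

for fmt, size_mb in compressed_sizes.items():
    print(f"{fmt}: {percent_compression(ORIGINAL_MB, size_mb):.2f}")
```

The rounded values match the table: 96.35 for tar.bz2 and 82.43 for the others.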
Test Notes
The File Roller archive manager, which ships with the GNOME desktop on Linux, provides even better bzip2 compression than bzip2 -9!
bzip2 -9 compressed to 12 MB.
I also tried .tar.zip, which was the worst.
All the file formats took comparable times, but then I tested them on a Core 2 Duo 6600 with 2 GB RAM and RAID 1 SATA drives.
As such, the results say nothing conclusive about performance.
All the compressed files were tested for accuracy of data.
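Since I cannot share the data, readers may want to repeat the comparison on their own files. The sketch below is not the procedure I used (I ran the command-line tools and File Roller); it assumes Python's standard `gzip` and `bz2` modules as stand-ins, and the function name `compare_sizes` is hypothetical. It compresses a byte string with each codec at level 9, checks that decompression returns the original bytes, and reports the compressed sizes.

```python
import bz2
import gzip

def compare_sizes(data: bytes) -> dict:
    """Compress data with gzip and bzip2, verify the round trip, return sizes in bytes."""
    codecs = [
        ("gzip", lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
        ("bzip2", lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    ]
    sizes = {}
    for name, compress, decompress in codecs:
        packed = compress(data)
        # Accuracy check: decompressed bytes must equal the input exactly.
        if decompress(packed) != data:
            raise ValueError(f"{name} round trip failed")
        sizes[name] = len(packed)
    return sizes
```

To try it on a real file, read it with `open(path, "rb").read()` and pass the bytes to `compare_sizes`. Results will depend heavily on the data, so treat any single run as one data point only.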
Winner
This shows that all the popular compression formats perform at the same level, with the sole exception of bzip2, which stands leagues ahead of the rest. The clear winner among the compression algorithms is bzip2.
Linux and Windows users can use bzip2 by directly running the bzip2 executable (downloadable from bzip2.org). The latest versions of 7-Zip and WinZip both support the bzip2 format.
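For those who would rather script it than run the executable, here is a rough sketch of building a .tar.bz2 archive with Python's standard `tarfile` module. This is an alternative I am assuming for illustration, not what I used in the test, and the helper names `make_tar_bz2` and `list_tar_bz2` are made up.

```python
import os
import tarfile

def make_tar_bz2(archive_path, paths, level=9):
    """Pack the given files into a bzip2-compressed tar archive."""
    with tarfile.open(archive_path, "w:bz2", compresslevel=level) as tar:
        for path in paths:
            # Store each file under its base name rather than its full path.
            tar.add(path, arcname=os.path.basename(path))

def list_tar_bz2(archive_path):
    """Return the member names stored in a .tar.bz2 archive."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        return tar.getnames()
```

A call such as `make_tar_bz2("reports.tar.bz2", ["a.xls", "b.xls"])` (file names hypothetical) would produce one archive, and `list_tar_bz2` can confirm what went in.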
Linux users have a winner in File Roller.