pbzip2: Free Parallel (Multithreaded) File Compression Tool

If you compress files regularly (I do, for instance when taking backups of this website), whether to save disk space or to make files easier to store and manage (because archiving and compressing lets you keep a large number of individual files in a single file), and you use GNU/Linux, then I’m pretty sure you’re familiar with “bzip2”.

“bzip2” gives better compression ratios (at the expense of somewhat longer compression times, though decompression is faster than compression), which can shave anywhere from a few megabytes to hundreds of megabytes off the file size (depending on the source file and its content).

But one of the disadvantages of “bzip2” shows up when you have a multi-core processor (including ones with multiple hardware threads). If “bzip2” could use all the available CPU cores while compressing a file, it would finish the job much sooner. Instead, “bzip2” uses only a single CPU core, while the other cores just sit idle, doing nothing ;-).

For example, let’s say that you have a CPU with 4 individual cores. If it takes “bzip2” 1 minute to compress a file using a single core, then in the ideal case the same job spread across all 4 cores should take only about 15 seconds!
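The arithmetic behind that estimate can be sketched in a couple of lines of shell. The numbers are simply the assumed values from the example above (60 seconds, 4 cores), and this is a best-case figure only, since real runs also spend time on disk I/O and thread overhead.

```shell
# Best-case estimate only: real runs are slower because of disk I/O and thread overhead.
single_core_seconds=60   # assumed: the 1-minute single-core run from the example
cores=4                  # assumed core count; "nproc" reports the real number on GNU/Linux
estimate=$(( single_core_seconds / cores ))
echo "Ideal time on ${cores} cores: about ${estimate} seconds"
```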

Here, “bzip2” is using only a single CPU core although I have 4, boo! 😀 (actually I have two cores with two threads each, which makes it look like 4 CPU cores) …
Here’s “pbzip2” using all the CPU cores. Although “parallel” computing differs slightly from “multithreading” … for end users like you and me, it still means speeeeeeeed! 😉 …

“bzip2” isn’t designed to be used that way, but luckily there’s a utility called “pbzip2”, originally based on “bzip2”, which adds this functionality.

So, basically, if you have a multi-core CPU and are looking for a file compression utility that uses all the available CPU cores (again, significantly improving performance and reducing the time it takes), then “pbzip2” is a tool that you should try.

I’m not going to write about its features here, because I assume that you’re quite familiar with “bzip2”, and since “pbzip2” is based on it, it offers essentially the same functionality (including the familiar command-line arguments). Please remember that there isn’t a GUI for it, but it’s really easy to use nonetheless.

You can install “pbzip2” in Ubuntu 12.04 Precise Pangolin, 11.10 Oneiric Ocelot, 11.04 Natty Narwhal and 10.10 Maverick Meerkat by using the command below in your Terminal window.

sudo apt-get install pbzip2

“pbzip2” is a cross-platform tool which supports GNU/Linux, MS Windows and Mac OS X. So if you use one of those platforms, please visit the “pbzip2” home page and get the appropriate package.

How to use it?

Again, if you already use “bzip2”, then using “pbzip2” is exactly the same. But, just for the sake of completeness, I’ll give a simple example anyway ;-).

Assuming that I have a file called “new.tar” and want to compress it with “pbzip2” (in Ubuntu) at the best possible compression level, while keeping the source file (because, by default, both “bzip2” and “pbzip2” delete the original file after the compression is done!), I’ll use the command below.

pbzip2 --best -k new.tar

Please remember to replace “new.tar” with the name and the path of your source file.

The “--best” option tells “pbzip2” to use the maximum compression level, and the “-k” argument makes sure the source file is not deleted. The compressed output is saved as “new.tar.bz2”.
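As a small sketch of the same steps, the helper below compresses a file at the highest level, keeps the source, and then checks the result. “compress_keep” is a name I made up for illustration, and the fallback to plain “bzip2” when “pbzip2” isn’t installed is my own assumption, not something either tool does for you.

```shell
# Hypothetical helper (not part of pbzip2): compress, keep the source, verify.
compress_keep() {
    src=$1
    # Assumption: fall back to bzip2 when pbzip2 is not installed
    if command -v pbzip2 >/dev/null 2>&1; then tool=pbzip2; else tool=bzip2; fi
    "$tool" --best -k "$src" &&   # writes "$src.bz2" and keeps "$src"
    "$tool" -t "$src.bz2"         # -t tests the integrity of the compressed file
}

# Example (assumes new.tar exists in the current folder):
#   compress_keep new.tar
```

To get the original file back later, “pbzip2 -d new.tar.bz2” (or “bzip2 -d new.tar.bz2”) should do it, since both tools use the same file format.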

Also remember that you can’t compress a folder with either the original “bzip2” or “pbzip2”. So first you will have to “put” the folder (including its content) into a single file (a “container”).

The easiest way to do this is to use the built-in “tar” utility (if you use GNU/Linux), as it can take a folder (with hundreds or thousands of other files inside it), or individual files, and store them in a single output file (with the “.tar” extension), which is a bit like what these compression tools do.

However, in truth, “tar” does not actually compress the content; it only copies those files (the ones inside the folder, in this example) into a single file. Since it doesn’t compress anything, it won’t take very long (it’s almost like copying that folder somewhere else).
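If you want to see this for yourself, here is a small sketch that bundles a throwaway folder and prints the size of the resulting “.tar” file. The scratch folder and the two tiny files are made up purely for the demonstration.

```shell
workdir=$(mktemp -d)                      # scratch folder, made up for this demo
mkdir "$workdir/22"
printf 'hello\n' > "$workdir/22/a.txt"    # 6 bytes
printf 'world\n' > "$workdir/22/b.txt"    # 6 bytes
# -C switches into the scratch folder first, so the archive stores "22/..." paths
tar -cf "$workdir/new.tar" -C "$workdir" 22
listing=$(tar -tf "$workdir/new.tar")     # -t lists the archive contents
tar_bytes=$(( $(wc -c < "$workdir/new.tar") ))
echo "$listing"
echo "new.tar is ${tar_bytes} bytes for 12 bytes of content"
```

The archive comes out larger than the files it holds, because “tar” adds its own headers and padding and compresses nothing.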

So, let’s say that I have a folder called “22” with a lot of individual files inside it. Then I’ll use the command below in the Terminal to get a single file called “new.tar” in my “Home” folder.

tar -cf new.tar 22

Simply replace “new.tar” with your desired output file’s name and “22” with the source folder and its path. Then you can use the “new.tar” file with “pbzip2” (as shown above) to actually compress it. That’s it, good luck.
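One more thing: if you’d rather skip the intermediate “.tar” file, the two steps can be chained with a pipe. This is only a sketch; “archive_folder” is a helper name I made up, and the fallback to plain “bzip2” when “pbzip2” is missing is my own assumption.

```shell
# Sketch: archive and compress a folder in one step, without an intermediate .tar.
# "archive_folder" is a hypothetical helper name, not part of pbzip2 or tar.
archive_folder() {
    folder=$1      # source folder, e.g. 22
    output=$2      # output file, e.g. new.tar.bz2
    # Assumption: use pbzip2 when available, otherwise plain bzip2 (same file format)
    if command -v pbzip2 >/dev/null 2>&1; then tool=pbzip2; else tool=bzip2; fi
    # "-f -" makes tar write the archive to standard output,
    # and "-c" makes the compressor write its result to standard output
    tar -cf - "$folder" | "$tool" -c > "$output"
}

# Example (assumes a folder called 22 in the current folder):
#   archive_folder 22 new.tar.bz2
```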

2 thoughts on “pbzip2: Free Parallel (Multithreaded) File Compression Tool”

    • Hi Bohdan,

      Interesting point.

      Now I haven't actually tried 7zip (in Ubuntu) but yes, I too agree that if you want the maximum compatibility, then "pbzip2" is pretty useful since it uses the "bzip2"'s file formats :).
