We recently optimized the restore performance of the Windows Restore Tool. As you may have seen in the latest press release, we managed to improve throughput by 300%.
In most cases, a single bottleneck in the process limits throughput. It’s almost as though the process and its bottlenecks are a calculus limit problem: as the bottlenecks approach zero, throughput approaches infinity. One of the larger bottlenecks in the restore process turned out to be the decompression of the data.
On a leapserv, data is broken into pieces for de-duplication and then compressed with bzip2. Bzip2 generally compresses better than the average LZ* algorithm, with the tradeoff being speed. Given that the leapserv provides a platform with dedicated processing power, sacrificing CPU cycles to save bandwidth and disk space is a good tradeoff.
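To get a feel for that tradeoff on your own data, here is a minimal sketch using Python's standard bz2 and zlib modules. It is not the leapserv code; zlib (DEFLATE) simply stands in for "an LZ* algorithm", and compare_codecs is a name made up for this illustration. Results depend heavily on the input, so try it on data that resembles your own backups.

```python
import bz2
import time
import zlib


def compare_codecs(data: bytes) -> dict:
    """Compress `data` with bzip2 and zlib and report sizes and decompression times."""
    results = {"original": len(data)}
    codecs = {
        "bzip2": (lambda d: bz2.compress(d, 9), bz2.decompress),
        "zlib": (lambda d: zlib.compress(d, 9), zlib.decompress),
    }
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        start = time.perf_counter()
        restored = decompress(packed)
        elapsed = time.perf_counter() - start
        # Both codecs are lossless, so the round trip must match exactly.
        if restored != data:
            raise RuntimeError(f"{name} round trip failed")
        results[name] = len(packed)
        results[name + "_decompress_seconds"] = elapsed
    return results


if __name__ == "__main__":
    # Hypothetical sample input; replace with a real file's contents.
    sample = b"restore block payload " * 50_000
    for key, value in compare_codecs(sample).items():
        print(key, value)
```

The sizes show how much bandwidth and disk each codec saves, and the timings show what that saving costs in CPU at restore time.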
However, as mentioned, when optimizing the restore performance of a leapserv we looked for bottlenecks one by one. As each was solved, throughput increased until it ran up against the next bottleneck. The decompression of the data quickly became our biggest bottleneck. Given that one of our developers had already ported bzip2 to assembly, we determined there wasn’t much more we could do with the bzip2 algorithm itself.
The solution to our decompression problem was distributed decompression. During a restore, a second computer is already involved in getting the data off the leapserv, so why not use the CPU resources of that target computer to help decompress the data? Doing so accounted for nearly 200 of the 300 percentage points of throughput gain we achieved through the whole optimization process.
The distributed decompression engine is rather slick; it’ll adjust the share of decompression work between the host computer and the leapserv to compensate for different processing speeds. In other words, the engine distributes decompression work based on the CPU resources available on each machine.
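As a rough illustration of that kind of balancing (not our actual engine, whose internals aren't covered here), the sketch below splits a batch of compressed blocks in proportion to each machine's measured decompression rate and smooths those measurements over time. The function names, the blocks-per-second unit, and the smoothing factor are all assumptions made for the example.

```python
from typing import Tuple


def split_blocks(total_blocks: int, server_rate: float, client_rate: float) -> Tuple[int, int]:
    """Return (server_blocks, client_blocks), proportional to each machine's rate."""
    if total_blocks < 0 or server_rate < 0 or client_rate < 0:
        raise ValueError("block count and rates must be non-negative")
    combined = server_rate + client_rate
    if combined == 0:
        # No measurements yet: fall back to an even split.
        server_blocks = total_blocks // 2
    else:
        server_blocks = round(total_blocks * server_rate / combined)
    return server_blocks, total_blocks - server_blocks


def update_rate(previous: float, blocks_done: int, seconds: float, smoothing: float = 0.5) -> float:
    """Blend the latest observed blocks-per-second into the running estimate."""
    if seconds <= 0:
        return previous  # Nothing measurable happened; keep the old estimate.
    observed = blocks_done / seconds
    return smoothing * observed + (1 - smoothing) * previous


if __name__ == "__main__":
    # Hypothetical rates: the leapserv decompresses 30 blocks/s, the host 10 blocks/s.
    print(split_blocks(100, 30.0, 10.0))
```

Re-measuring after every batch and feeding the result through update_rate lets the split drift toward whichever machine currently has more spare CPU.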
After adding distributed decompression to the restore process, we quickly found decompression to be the bottleneck once more. Throughput was now nearly twice as high, but the bottleneck remained the same.
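A simple way to picture why the bottleneck didn't move is to model the restore as a pipeline whose throughput is set by its slowest stage. The sketch below is a simplification, and every number in it is hypothetical rather than a measurement; it treats all rates as MB/s of restored data and assumes the decompression rates of the two machines simply add.

```python
def restore_bottleneck(network, server_decompress, client_decompress, disk_write):
    """Return (stage_name, throughput) for the slowest stage of the restore pipeline."""
    stages = {
        "network": network,
        # Simplifying assumption: the two machines' decompression rates add up.
        "decompression": server_decompress + client_decompress,
        "disk write": disk_write,
    }
    slowest = min(stages, key=stages.get)
    return slowest, stages[slowest]


if __name__ == "__main__":
    # Hypothetical rates in MB/s, before and after the host computer helps out.
    print(restore_bottleneck(100, 20, 0, 80))
    print(restore_bottleneck(100, 20, 20, 80))
```

With these made-up figures, adding the host doubles throughput, yet decompression is still the slowest stage because the network and the disk could both go faster.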
Our goal is to max out the write speed of a customer’s hard drives or the bandwidth limit of a customer’s network. At that point, the bottleneck is out of our hands, and the customer is getting the best possible service for their environment. Are we there yet? For some customers, I’m pretty sure we are. For our enterprise customers, though, I believe there is more we can do.
The next step would be to get more machines involved. The host computer’s processor was a logical first choice, since the restoring user has already taken the time to install the Windows Restore Tool software on that computer. The question, though, is whether it’s worth the user’s time to install distributed decompression software on other machines on their network. And if it is, how long would it be before the decompression bottleneck was overcome, only to be replaced with a bandwidth bottleneck to the decompression machines? We’ll find out…
- Given the amount of bzip2 compression and decompression BitLeap performs, one of our engineers translated the algorithm into assembly. I noticed the other day that the source code had not yet been posted on the open source page, where it should in fact be. If you are interested in the assembly version of bzip2 before we get it posted, shoot us an email at support@bitleap.com.