Friday, July 9, 2010

BitTorrent and zsync FTW

My current favourite Linux distro (and, it seems, almost everyone else's) is Ubuntu. I use it on my home server and on virtual machines at work, and I even carry it on my USB stick. I find that the easiest, and probably the fastest, way to download Ubuntu is via BitTorrent. Using BitTorrent to download open source software is also a nice way to show your appreciation: you reduce the project's bandwidth costs and help its creators distribute their product. This is especially true around a stable release, when everyone rushes to get the latest version and direct downloads slow to a crawl. That is the perfect time to use P2P, since hundreds of people already have the file you want and are willing to share it with you.

BitTorrent is not a silver bullet, however. It only works well when the file you want is a popular one. For example, here are some tracker stats for two Ubuntu releases: 10.04 is a stable release from a couple of months back, and Maverick is the latest alpha:

File                          Seeders  Leechers
ubuntu-10.04-dvd-i386.iso        1391       123
ubuntu-10.04-server-i386.iso      968        20
maverick-dvd-i386.iso               8         5
maverick-server-i386.iso           10         0

Not many people care about the alpha release, so there are very few seeders and leechers. With so few peers, a P2P download will be slow, and you might be better off downloading the file directly. In Ubuntu's case there is a better solution: zsync.

zsync is a file transfer program. It allows you to download a file from a remote server when you already have a copy of an older version of that file on your computer: zsync downloads only the new parts of the file. It uses the same algorithm as rsync. However, where rsync is designed for synchronising data from one computer to another within an organisation, zsync is designed for file distribution, with one file on a server being distributed to thousands of downloaders. The server only has to publish a small .zsync control file (checksums for each block of the target) next to the full file; the client does the block matching itself and fetches the missing pieces with ordinary HTTP range requests.

To put it simply, if I have a large file such as an Ubuntu release ISO and a new incremental release has just come out, then instead of downloading all of its roughly 700 MB, with zsync I only have to download the portion that actually changed.
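To make the "only the changed part" idea concrete, here is a rough shell sketch that estimates how much of a new file is already present in an old one. It is a deliberate simplification, not what zsync does: it hashes fixed, block-aligned chunks of both files with md5sum, whereas zsync (like rsync) slides a rolling checksum over every byte offset of the local file and can therefore also find data that has shifted position. The name block_overlap and the 2048-byte default block size are my own assumptions.

```shell
# block_overlap OLD NEW [BLOCK_SIZE]   (hypothetical helper, not part of zsync)
# Estimates what share of NEW already exists in OLD by comparing hashes of
# fixed, aligned blocks. zsync also checks every offset in the seed file with
# a rolling checksum, so it will usually find more matches than this does.
block_overlap() (
  set -eu
  old=$1 new=$2 bs=${3:-2048}   # 2048-byte default is an assumption

  tmp=$(mktemp -d)
  trap 'rm -rf "$tmp"' EXIT     # clean up when this subshell exits
  mkdir "$tmp/old" "$tmp/new"

  # Cut both files into fixed-size blocks; 6-letter suffixes leave room
  # for hundreds of millions of blocks.
  split -a 6 -b "$bs" "$old" "$tmp/old/b"
  split -a 6 -b "$bs" "$new" "$tmp/new/b"

  # One MD5 per block; "find -exec ... +" avoids "argument list too long".
  find "$tmp/old" -type f -exec md5sum {} + | cut -d' ' -f1 > "$tmp/old.sums"
  find "$tmp/new" -type f -exec md5sum {} + | cut -d' ' -f1 > "$tmp/new.sums"

  # Count NEW blocks whose hash appears anywhere in OLD (position ignored).
  # grep exits 1 when nothing matches; keep its "0" instead of aborting.
  matched=$(grep -cxFf "$tmp/old.sums" "$tmp/new.sums" || true)
  total=$(wc -l < "$tmp/new.sums" | tr -d ' ')
  pct=0
  [ "$total" -eq 0 ] || pct=$((matched * 100 / total))
  printf '%s of %s blocks (%s%%) already present in %s\n' \
    "$matched" "$total" "$pct" "$old"
)
```

For example, `block_overlap ubuntu-10.04-server-i386.iso maverick-server-i386.iso`. Because it writes every block of both files to a temporary directory, it needs free space roughly equal to the two files combined and will be slow on DVD-sized images.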

Here's an actual example. I previously downloaded the Ubuntu 10.04 Server and Ubuntu 10.04 DVD ISOs and have been seeding them via BitTorrent since the release date. Let's take a look at these files:

File                          Size        MD5 checksum
ubuntu-10.04-server-i386.iso   700.41 MB  15342636441181f7a19c65984b44e24c
ubuntu-10.04-dvd-i386.iso     4355.24 MB  4bc6827198b3b3825e1db5cb256eeece
maverick-server-i386.iso       716.85 MB  4c51c750d3c936f3977628ec8a9a593d
maverick-dvd-i386.iso         4406.47 MB  b9b16b248c4be07c3f542128a50e8558
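zsync verifies the file it builds against a checksum stored in the .zsync control file, but the MD5 sums listed above can also be re-checked independently. A minimal sketch using GNU coreutils' `md5sum -c`; check_md5 is a hypothetical helper name, not an existing command.

```shell
# check_md5 EXPECTED_MD5 FILE   (hypothetical helper around GNU md5sum)
# Feeds md5sum -c a one-line "HASH  FILE" checklist on stdin.
# Prints "FILE: OK" and returns 0 on a match; returns non-zero otherwise.
check_md5() {
  printf '%s  %s\n' "$1" "$2" | md5sum -c -
}
```

For example, `check_md5 4c51c750d3c936f3977628ec8a9a593d maverick-server-i386.iso`.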

Ubuntu has been providing .zsync files for all its releases for a while now, so let's see how much of each ISO has changed in Ubuntu 10.10 (Maverick Meerkat) Alpha-2. First I make a copy of my 10.04 files, since I don't want to lose the originals yet:

$ mkdir new
$ cp ubuntu-10.04-server-i386.iso ubuntu-10.04-dvd-i386.iso new/
$ cd new/

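Now let's run zsync on the server ISO and compare it with the new alpha. The 10.04 image is passed as the local seed file with -i, and zsync's main argument is the URL of the alpha's .zsync control file: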

$ zsync -i ubuntu-10.04-server-i386.iso
reading seed file ubuntu-10.04-server-i386.iso:
Read ubuntu-10.04-server-i386.iso. Target 26.0% complete.
downloading from
verifying download...checksum matches OK 
used 186382376 local, fetched 530472920

After fetching the 1.3 MB .zsync control file, zsync reports that only 26% of the alpha ISO could be assembled from blocks already in my 10.04 copy, so the two images are quite different. That still leaves about 506 MiB (530 MB) to download, but it's better than fetching the entire file.

Let's see if the DVD fares any better:

$ zsync -i ubuntu-10.04-dvd-i386.iso
reading seed file ubuntu-10.04-dvd-i386.iso:
Read ubuntu-10.04-dvd-i386.iso. Target 51.0% complete.
downloading from
verifying download...checksum matches OK 
used 2247304212 local, fetched 2159174636

The DVD is 51% similar: zsync reused about 2.2 GB from my local copy, which saved me more than 2 GB of download. Not bad.
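Both runs end with a "used N local, fetched M" line, where N is the number of bytes zsync took from the local seed file and M the number it downloaded. The percentages above can be recomputed from those two numbers alone. The awk sketch below does that and converts to MiB; zsync_savings is a hypothetical name, and it assumes the summary line keeps exactly the format shown in these transcripts.

```shell
# zsync_savings   (hypothetical helper)
# Reads zsync output on stdin, finds the "used N local, fetched M" summary
# line and prints the reused share plus both byte counts in MiB.
zsync_savings() {
  awk '/^used [0-9]+ local, fetched [0-9]+/ && $2 + $5 > 0 {
    used = $2; fetched = $5; total = used + fetched
    printf "reused %.1f%% (%.1f MiB), fetched %.1f MiB\n",
      100 * used / total, used / 1048576, fetched / 1048576
  }'
}

# echo 'used 186382376 local, fetched 530472920' | zsync_savings
#   -> reused 26.0% (177.7 MiB), fetched 505.9 MiB
# echo 'used 2247304212 local, fetched 2159174636' | zsync_savings
#   -> reused 51.0% (2143.2 MiB), fetched 2059.1 MiB
```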

Now that I have the files I want, it's time to give back by seeding them. To do that with my favourite BitTorrent client, Transmission, I move the ISO files into my Transmission download directory and open the .torrent files available on the central server. Transmission automatically detects the existing files, verifies them against the piece hashes in the .torrent and starts seeding. Legal P2P file sharing at its finest.

When the next release comes out, I just zsync it again (usually only around 10-15% differs between incremental releases) and start seeding it with BitTorrent. I repeat these steps for every incremental release: alpha-3, beta-1, beta-2, RC and so on. By the time the stable release is out, I only need to download a small amount of data from the central server, and I can start seeding the entire file when it's needed most.
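One round of this cycle can be wrapped in a small function. This is a sketch, not the author's script: update_release is a hypothetical name, the .zsync URL and the download directory are placeholders you supply, and it relies only on zsync's -i (seed file) and -o (output file) options.

```shell
# update_release SEED_ISO TARGET_ISO ZSYNC_URL DOWNLOAD_DIR   (hypothetical)
# Builds TARGET_ISO from the previous release with zsync, then moves it into
# the BitTorrent client's download directory so the matching .torrent can
# find the data in place and start seeding after its hash check.
update_release() (
  seed=$1 target=$2 url=$3 dir=$4
  # -i SEED: local file to reuse matching blocks from
  # -o TARGET: name of the file zsync assembles
  # Final argument: URL of the release's .zsync control file (placeholder)
  zsync -i "$seed" -o "$target" "$url" || exit 1
  mv -- "$target" "$dir/" || exit 1
  echo "$dir/$target ready to seed"
)
```

For example, `update_release old.iso new.iso "$ZSYNC_URL" "$HOME/Downloads"`, where old.iso is the previous alpha and ZSYNC_URL points at the new release's .zsync file; after that, opening the new .torrent in Transmission should pick up the file.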

Here are a few more links on the same topic: