Thursday, August 6, 2009

Setting up apt-mirror behind an HTTP proxy

Had a bit of a brain teaser at work today. I was setting up a mirror of the Ubuntu package repository to reduce bandwidth usage and package download time for our hundred-plus virtual machines. We mostly use Ubuntu 8.04 (Hardy), almost exclusively 32-bit.

This seemed like a pretty simple exercise:
  1. Install apt-mirror package
  2. Specify which repositories should be mirrored
  3. Download content from repositories (the long step)
  4. Serve mirrored directory via Apache or Lighttpd
  5. Tell all VMs to use local repository


Step one is the easiest, just run:
apt-get install apt-mirror
Step two is to edit /etc/apt/mirror.list to select the repositories to mirror, perhaps choosing ones that are geographically closer. Here's mine:
set nthreads     20
set _tilde 0

deb http://ca.archive.ubuntu.com/ubuntu hardy main restricted universe
deb http://ca.archive.ubuntu.com/ubuntu hardy-updates main restricted universe
deb http://ca.archive.ubuntu.com/ubuntu hardy-security main restricted universe

clean http://ca.archive.ubuntu.com/ubuntu
Based on some online recommendations, I'm not mirroring deb-src packages here, nor the separate security.ubuntu.com repository.

In step three I'm supposed to run su - apt-mirror -c apt-mirror as root to clone the repositories listed in mirror.list. However, this is where I hit a bit of a snag...

My work's LAN is behind a firewall and we use an HTTP proxy (Squid) for Internet access. There's already a line in /etc/apt/apt.conf that tells apt to use this proxy server:
Acquire::http::Proxy "http://my-proxy.lan:3128/";
It works fine with apt-get, but apt-mirror doesn't seem to know or care about it.

Exporting http_proxy as root doesn't do any good because the command is actually run as the apt-mirror user (that's what "su - apt-mirror -c" is for), and "su -" starts a fresh login environment. So to force apt-mirror to use the proxy, I added this line to /etc/environment:
http_proxy="http://my-proxy.lan:3128/"
Probably not the best way to solve this, but it works.
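
A narrower alternative would be to set the variable only for the mirror run instead of system-wide. This is a minimal sketch under assumptions: it relies on apt-mirror downloading with wget, which honors http_proxy, and on apt.conf holding a single proxy line in the form shown above. The helper name proxy_from_apt_conf is hypothetical, not part of apt or apt-mirror.

```shell
#!/bin/sh
# Hypothetical helper: print the proxy URL found in an apt.conf-style file.
# Assumes a line of the form: Acquire::http::Proxy "http://host:port/";
proxy_from_apt_conf() {
    sed -n 's/^[[:space:]]*Acquire::http::Proxy[[:space:]]*"\([^"]*\)";.*$/\1/p' "$1" | head -n 1
}

# Example usage as root (shown as comments, not executed here):
#   proxy=$(proxy_from_apt_conf /etc/apt/apt.conf)
#   su - apt-mirror -c "http_proxy='$proxy' apt-mirror"
```

Setting the variable inside the quoted command means it is defined in the apt-mirror user's shell, so it survives the environment reset that "su -" performs.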

Once the first sync is done, we'll have a copy of the repository stored locally on this machine. It seems this step requires at least 25 GiB of free disk space per architecture.
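
Since the first sync is the long step, it may be worth checking free space before starting it. A minimal sketch, assuming the mirror lives under /var/spool/apt-mirror (the path used for the symlink in step four); has_free_gib is a hypothetical helper name, and 25 is just the rough figure mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# NEED_GIB gibibytes available. Uses POSIX "df -Pk" (sizes in KiB).
has_free_gib() {
    dir=$1
    need_gib=$2
    # Second output line, fourth column: available space in KiB.
    avail_kib=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kib" -ge $((need_gib * 1024 * 1024)) ]
}

# Example usage (shown as comments, not executed here):
#   has_free_gib /var/spool/apt-mirror 25 || echo "not enough space for one architecture"
```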

Step four is to let other machines access our local repository. I just need to install Apache or lighttpd and make the local mirror directory accessible over HTTP:
apt-get install lighttpd
cd /var/www
ln -s /var/spool/apt-mirror/mirror/ca.archive.ubuntu.com/ubuntu ubuntu
Now the local repository is visible at http://my-apt-mirror-host.lan/ubuntu/

Step five will take the most time, since I have to edit /etc/apt/sources.list on every VM and add:
deb http://my-apt-mirror-host.lan/ubuntu hardy main restricted universe
deb http://my-apt-mirror-host.lan/ubuntu hardy-updates main restricted universe
deb http://my-apt-mirror-host.lan/ubuntu hardy-security main restricted universe
I'm going to keep all lines with "deb http://security.ubuntu.com/ubuntu" unchanged, since I'm not mirroring that host. All the "deb http://ca.archive.ubuntu.com/ubuntu" lines will be commented out.
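
With a hundred-plus VMs, the edit is worth scripting. The following is a minimal sketch rather than a tested rollout tool: use_local_mirror is a hypothetical helper, and it assumes each VM's sources.list uses the ca.archive.ubuntu.com lines exactly as written above and that the mirror host name matches this post.

```shell
#!/bin/sh
# Hypothetical helper: point a sources.list-style file at the local mirror.
# $1 = file to edit, $2 = mirror base URL (defaults to the host from this post).
use_local_mirror() {
    file=$1
    mirror=${2:-http://my-apt-mirror-host.lan/ubuntu}

    # Do nothing if the mirror entries are already present (safe to re-run).
    if grep -qF "deb $mirror " "$file"; then
        return 0
    fi

    # Comment out active "deb" lines for the upstream archive; keep a .bak copy.
    # Lines for security.ubuntu.com do not match and are left unchanged.
    sed -i.bak 's|^deb http://ca\.archive\.ubuntu\.com/ubuntu|# &|' "$file"

    # Append local mirror entries for the three mirrored suites.
    for suite in hardy hardy-updates hardy-security; do
        echo "deb $mirror $suite main restricted universe" >> "$file"
    done
}

# Example usage on a VM as root (shown as comments, not executed here):
#   use_local_mirror /etc/apt/sources.list && apt-get update
```

The original file is kept as sources.list.bak, so a VM can be rolled back by copying that file into place.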