Rsync Backups: Conserve Disk Space with Hard Links

Posted in System Administration

Personally, my data isn't so important that I need to keep years of it backed up to tape. Obviously, that isn't always the case in a corporate environment, but luckily that's not a problem for me.

That being said, I do like to have at least a week's worth of backups just in case I accidentally delete the wrong thing. (As if that's ever happened!) Like a lot of network and system administrators, I use rsync to do quick and dirty backups from remote systems to a central server.

To keep a week's worth of backups, I need to maintain at least one backup directory per day for each remote directory I'm backing up. Considering my home directory is about 5 GB, that's 35 GB of backups every week just for my home directory. Add in all the websites and other users on my systems and you're looking at 20+ GB of daily backups†. That's a lot of space devoted to backups.

One of the often overlooked features of rsync is the --link-dest command line argument. This argument names an existing directory (typically an earlier backup) that rsync compares against while building the new backup. If a file being backed up is unchanged from the copy in the --link-dest directory, rsync creates a hard link to that copy in the new backup directory instead of writing a new file. Hard links conserve disk space because both "files" are just names that refer to the same data on disk.
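As a rough sketch of what such an invocation might look like, the Python helper below only assembles the rsync argument list; it doesn't run anything. The function name build_rsync_command, the host name, and the paths are all made up for illustration, and -a (archive mode) is my assumption about the other options, since nothing above says which ones are used.

```python
def build_rsync_command(source, dest, link_dest):
    """Return the argv for an rsync run that hard-links unchanged files.

    source    -- what to back up (hypothetical, e.g. "user@host:/home/me/")
    dest      -- directory that will hold the new backup
    link_dest -- an earlier backup to compare against
    """
    return [
        "rsync",
        "-a",                        # archive mode: an assumption, not from the post
        "--link-dest=" + link_dest,  # unchanged files become hard links into here
        source,
        dest,
    ]


if __name__ == "__main__":
    cmd = build_rsync_command(
        "user@example.com:/home/me/",   # hypothetical remote source
        "/backups/home/2024-01-02/",    # hypothetical new backup directory
        "/backups/home/2024-01-01/",    # hypothetical previous backup
    )
    print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run(cmd, check=True). One detail worth knowing: rsync treats a relative --link-dest path as relative to the destination directory, so absolute paths are the less surprising choice.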

Unlike the target of a symbolic link, the data behind a hard link is not removed until every reference to it is removed. This means you can safely delete old backup directories without fear of removing data that newer backups still reference.
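To see that behavior without touching any real backups, here is a small self-contained Python sketch. It assumes the temporary directory lives on a filesystem that supports hard links; the file names and the function name hard_link_demo are just placeholders.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file, hard-link it, delete the original, then read the link."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "monday.txt")   # placeholder names
        linked = os.path.join(tmp, "tuesday.txt")

        with open(original, "w") as f:
            f.write("important data")

        os.link(original, linked)                 # second name for the same data
        links_before = os.stat(linked).st_nlink   # number of names for the data
        same_inode = os.stat(original).st_ino == os.stat(linked).st_ino

        os.remove(original)                       # drop one of the two names
        links_after = os.stat(linked).st_nlink

        with open(linked) as f:                   # data is still reachable
            survived = f.read()

    return links_before, same_inode, links_after, survived


if __name__ == "__main__":
    print(hard_link_demo())
```

Deleting "monday.txt" only drops one name; the contents stay readable through "tuesday.txt" until that name is removed as well.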

Although I have a script for this (perhaps more on that in the future), I basically specify the previous day's backup directory as the --link-dest. That ensures I am always hard linking against the most current version of a file.
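This isn't my script, but the core idea can be sketched in a few lines of Python. The sketch assumes one backup directory per day named by ISO date under a common parent; that layout, the /backups/home path, and the function name plan_daily_backup are all hypothetical.

```python
import datetime
import os


def plan_daily_backup(backup_root, today):
    """Return (new_dir, link_dest) for a date-named backup layout.

    backup_root -- hypothetical parent directory, one subdirectory per day
    today       -- a datetime.date for the backup being made
    """
    yesterday = today - datetime.timedelta(days=1)
    new_dir = os.path.join(backup_root, today.isoformat())
    link_dest = os.path.join(backup_root, yesterday.isoformat())
    return new_dir, link_dest


if __name__ == "__main__":
    new_dir, link_dest = plan_daily_backup("/backups/home", datetime.date.today())
    # SOURCE stands in for whatever remote directory is being backed up.
    print("rsync -a --link-dest=" + link_dest + " SOURCE " + new_dir + "/")
```

A real script would also want to handle the case where the previous day's directory is missing (for example, after a skipped run) and to prune directories older than a week.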

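† This is a rough estimate that also doesn't include MySQL backups. Unfortunately, MySQL backups do not benefit from this disk conservation method because of their ever-changing nature: hard linking works on whole files, so a backup file that differs at all from the previous day's is stored again in full.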

Slaptijack's Koding Kraken