Time Machine Forensics

» 01 July 2010 » In Other, PHP »

About a year ago, I finally wisened up to the obvious: 1) losing data sucks, and 2) doing daily (or frequent) backups manually are a pain in the ass, especially for portables. 2) is why most people forgo backups in the first place, resulting in 1). I needed something that I could set up once and pretty much forget about. Thankfully, there was just the thing: Time Capsule  —  a single device that encapsulates both Airport Extreme base station and a huge hard drive, and can be used for automatic, transparent, wireless backups from any of your Mac devices via the Time Machine.
I purchased the 1 TB version, configured it, and have been using it ever since. It has already saved my butt a couple of times, when I accidentally deleted (rm -rf) a directory and had that familiar sinking feeling, but then remembered that the Time Machine was there ready to lend a hand.
However, I noticed recently that the hourly backups were taking longer and growing in size, sometimes up to 1GB or more. It didn’t seem like there was that much data changing on an hourly basis, so I set out to investigate. A quick look around revealed that the Time Machine itself will not reveal the backup details, but someone wrote a script called timedog that displays the files that the Time Machine backed up during its most recent run (or any of your choosing). The output of the script is something like this (abbreviated):

# cd /Volumes/Backup\ of\ foo/Backups.backupdb/foo
# timedog -d 5 -l
==> Comparing TM backup 2010-06-29-160846 to 2010-06-29-101853
     399B->     399B    [1] /Mac HD/Library/Application Support/CrashReporter/
   52.6KB->   53.0KB        /Mac HD/Users/andrei/.dbshell
     863B->     866B        /Mac HD/Users/andrei/.lesshst
   11.0KB->   15.1KB        /Mac HD/Users/andrei/.viminfo
   10.4KB->   30.7KB        /Mac HD/Users/andrei/.zsh_hist
    6.7MB->    6.7MB        /Mac HD/Users/andrei/.dropbox/dropbox.db
    3.9MB->    3.9MB        /Mac HD/Users/andrei/.dropbox/migrate.db
   25.2MB->   50.3MB   [10] /Mac HD/Users/andrei/.dropbox/cache/
   21.0KB->   21.0KB    [1] /Mac HD/Users/andrei/Desktop/
  120.0MB->  120.4MB    [1] /Mac HD/Users/andrei/Documents/Linkinus 2 Logs/
  142.8MB->  146.2MB  [156] /Mac HD/Users/andrei/Library/Application Support/
  608.0MB->  608.0MB    [5] /Mac HD/private/var/data/mongodb/
==> Total Backup: 967 changed files/directories, 1.88GB

Looking at this, a couple of offenders are immediately obvious. Linkinus (an IRC client) keeps all the conversation logs in one single file, and since I’m frequently on IRC, that file grows and gets backed up every hour. The data files for MongoDB that I use for development are also fairly large and change often. Finally, there is something in Library/Application Support, but it wasn’t shown because of the depth limit I set. After increasing the depth to 7, I discovered that it was the Google Chrome cache and history.
Conversation logs and Chrome stuff are not important enough for me to back up hourly, and MongoDB data I can copy periodically to an off-site server. By excluding these 3 items from the backup process via Time Machine Preferences I was able to reduce the size of the hourlies to 50 MB or less.
This brings up an important point though: while Time Machine is great for doing automated backups over your local network, you should have a separate copy of the data off-site, for redundancy. How and when to do the off-site backups varies according to everyone’s needs, but I would suggest something like CrashPlan, which does unlimited online backups for about $5/month. Once again, it’s automatic and hands-off, which is how you want it.
Tools I found useful while investigating this Time Machine issue:

Trackback URL