Backing up those Gmail messages in All Mail

I recently read an article on wired.com that mentions in passing backing up all of that information you have lurking in the cloud, like emails.

It got me thinking about a Perl script I had for grabbing IMAP data. The basic idea is that a Perl module lets you fetch, delete, and copy IMAP-based email. This allows you, for instance, to grab all of the emails lurking in Gmail’s “All Mail” folder. You know – the place where emails go when you say “archive this one, I don’t want to see it no more.” With the advent of the “Priority Inbox”, using the archive feature, especially with keyboard shortcuts, makes reducing the size of your inbox a much more realizable task.

Anyway, Perl has this module, Net::IMAP::Simple::SSL, that lets you access IMAP from Perl. Here is a simple script to show how it can do its thing. Script comes with no warranty, etc.

Don’t forget to put your username and password into the script at the top (see $username and $password), or pass them on the command line with --user and --password.


#!/usr/bin/perl
#
# example-imap-ssl.pl
#
# Get mail from google via imap, and dump it to console.
#

use strict;
use Net::IMAP::Simple::SSL;
use Getopt::Long;

my $host = "imap.googlemail.com";
my $username = '<insert your google username>';
my $password = '<insert your google password>';

my $boxes = 0;
my $brief = 0;
my $dump = 0;
my $help = 0;
my $verbose = 0;
my $mailbox = "%";
my $remove = 0;
my $regexp = "";

#----------------------------------------
# process any command line arguments
# GetOptions needs a reference to each variable it should fill in
my $result = GetOptions('help|?+'       => \$help,
			'brief|b+'      => \$brief,
			'dump+'         => \$dump,
			'dump-folders+' => \$boxes,
			'host|h=s'      => \$host,
			'mailbox|m=s'   => \$mailbox,
			'password|p=s'  => \$password,
			'regexp=s'      => \$regexp,
			'user|u=s'      => \$username,
			'verbose|v+'    => \$verbose
    );


# the @ must be escaped, or perl tries to interpolate an array here
print "IMAPS client: $username\@$host\n" if ($verbose || !$dump);

if ($help) {
    print "
  --brief, -b              synopsis of every message (to,from,subject)
  --dump                   dump the mailbox selected
  --dump-folders           dump all folders contained in selected mailbox
  --host, -h <host>        set the imap server contacted to be <host>
  --mailbox, -m <mailbox>  set the set of folder(s) to consider
  --password, -p <passwd>  the password to use
  --regexp <regexp>        display messages matching regex <regexp>
  --user, -u <user>        the username to use
  --verbose, -v            print more detail (repeat for even more)

";
    exit 0;
}


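my $imap;
if ($verbose > 1) {
    print "Connect to $host\n";
    $imap = Net::IMAP::Simple::SSL->new($host, Debug=>1, use_ssl=>1);
}
else {
    $imap = Net::IMAP::Simple::SSL->new($host, use_ssl=>1);
}
# new() returns nothing useful if the connection could not be made
die "Unable to connect to $host\n" unless ($imap);

# stop here if the server rejects the username or password
$imap->login($username, $password)
    or die "Login failed: ", $imap->errstr(), "\n";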

if ($boxes) {
    print "Enumerating folders for '$mailbox':\n";
    my @folders = $imap->mailboxes($mailbox);
    for my $f (@folders) {
        print "  '$f'";
    }
    print "\ne='", $imap->errstr(), "'\n";
    $imap->quit();
    exit(0);
}


if ($dump) {
    my @folders = $imap->mailboxes($mailbox);

    for my $f (@folders) {
	my $msg_count = $imap->select($f);
	# IMAP message numbers start at 1, not 0
	for (my $i=1; $i <= $msg_count; $i++) {
	    my $msg = $imap->get($i);

	    if (!defined($msg)) {
		print " undef\n";
	    }
	    else {
		for my $m (@{$msg}) {
		    print "$m";
		}
	    }
	} #for (my $i...
    } #for @folders
}
else {
    my @folders = $imap->mailboxes($mailbox);
    for my $f (@folders) {

	my $spam = 0;
	my $type = " ";
	my $msg_count = $imap->select($f);

	if ($f =~ /.*spam.*/i) {
	    $type = "S";
	    $spam = 1;
	}
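	# pass the folder name as an argument: it may contain a '%'
	printf "%4i messages in %s (%s)\n", $msg_count, $f, $type;

	# IMAP message numbers start at 1, not 0
	for (my $i=1; $i <= $msg_count; $i++) {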

	    my $matched = 0;

	    if ($verbose > 1 || $brief || length($regexp)) {
		#print "$i $type" if ($verbose > 1);

		my $msg = $imap->get($i);

		if (!defined($msg)) {
		    print " undef\n";
		}
		else {
		    if ($brief) {
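			# initialise all four, not just the first
			my ($sub, $from, $to, $date) = ("") x 4;
			for my $m (@{$msg}) {
			    $sub = $m if ($m =~ /^Subject/);
			    $from = $m if ($m =~ /^From/);
			    $to = $m if ($m =~ /^To/);
			    $date = $m if ($m =~ /^Date/);
			}
			# drop line endings so the synopsis stays on one line
			s/[\r\n]+$// for ($sub, $from, $to, $date);
			print "$from $to $date [$sub] \n";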
		    }
		    elsif (length($regexp)) {
			#print "regexp: searchn" if ($verbose);
			for my $m (@{$msg}) {
			    if ($m =~ /$regexp/) {

				$matched = 1;

				if ($verbose > 0) {
				    for my $n (@{$msg}) {
					print $n;
				    }
				}
				else {
				    print $m;
				}
			    }
			}
		    }
		    elsif ($verbose > 1) {
			for my $m (@{$msg}) {
			    print "$m";
			}
		    }
		}
	    }
	}
    }
}

$imap->quit();


The new thing I struggled with was the name of the folder where archived emails live. It is

[Google Mail]/All Mail

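But spaces are bad in all kinds of place names, and here is another. IMAP has a wildcard character, ‘%’, which matches any run of characters within one level of the folder hierarchy (it does not cross the ‘/’ separator), so replacing each space with ‘%’ works fine. So with the name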

[Google%Mail]/All%Mail

all of your mail can be downloaded in one go. When researching this, I also found references to [Gmail] as the IMAP folder name. YMMV. List the IMAP folders (with --dump-folders) if you get stuck.
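
If you would rather not swap the spaces by hand, a small shell sketch can build the pattern for you. The function name folder_pattern is made up for illustration and is not part of the script above; it only uses tr to turn every space into ‘%’.

```shell
#!/bin/sh
# folder_pattern is a hypothetical helper, not part of example-imap-ssl.pl.
# It replaces each space in an IMAP folder name with the % wildcard.
folder_pattern() {
    printf '%s\n' "$1" | tr ' ' '%'
}

folder_pattern '[Google Mail]/All Mail'
```

The result can then be handed to the -m option, quoted so that the shell leaves the brackets alone.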

In the same way, sent emails can be downloaded from “[Google Mail]/Sent Items”.

Note that the format of the downloaded emails isn’t pretty. There is just one great file with all of them concatenated together. But this is a quick and simple backup. And grep is your friend, too.
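
As a rough illustration of the grep idea, the sketch below pulls the subject lines out of a dump. The helper names list_subjects and count_subjects are hypothetical, and the approach assumes each message keeps its “Subject:” header at the start of a line; a body line that happens to start the same way would be counted too.

```shell
#!/bin/sh
# list_subjects and count_subjects are hypothetical helpers for poking
# at a dump file made with --dump. Both take the file name as $1.

# Print every line that starts with "Subject:".
list_subjects() {
    grep '^Subject:' "$1"
}

# Count those lines, as a rough estimate of how many messages there are.
count_subjects() {
    grep -c '^Subject:' "$1"
}
```

Call them as list_subjects 2011-01-31.all-mail.backup. Once the file has been compressed, piping bzcat into grep does the same job without unpacking it to disk.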

I use the Perl script above like this:

./example-imap-ssl.pl --user me --password my_passwd -m '[Google%Mail]/All%Mail' --dump > 2011-01-31.all-mail.backup
bzip2 2011-01-31.all-mail.backup
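
To avoid typing the date each time, the two commands above can be wrapped in a small shell sketch. Everything new here is an assumption for illustration: the helpers backup_name and run_backup are made up, and the credentials are read from the environment variables IMAP_USER and IMAP_PASS, which you would have to set yourself.

```shell
#!/bin/sh
# Sketch only: backup_name and run_backup are hypothetical helpers, and
# IMAP_USER / IMAP_PASS are assumed environment variables.

# Build a file name such as 2011-01-31.all-mail.backup from today's date.
backup_name() {
    printf '%s.all-mail.backup\n' "$(date +%Y-%m-%d)"
}

# Dump All Mail into the dated file, then compress it if the dump worked.
run_backup() {
    out=$(backup_name)
    ./example-imap-ssl.pl --user "$IMAP_USER" --password "$IMAP_PASS" \
        -m '[Google%Mail]/All%Mail' --dump > "$out" && bzip2 "$out"
}

# Only run when the script is present in the current directory.
if [ -x ./example-imap-ssl.pl ]; then
    run_backup
fi
```

Chaining bzip2 with && means a failed dump is left uncompressed, so it is easy to spot.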

Further information
Turning on IMAP in Gmail.
Turning on SSL in Gmail (if you haven’t done this, you really should).

jeremy