Archiving a huge collection of Outlook Emails

I've been saving every email for more than 15 years. It is a great historical resource to be able to go back to. I've been using Outlook the entire time because I think it is an amazingly productive tool. Unfortunately, Outlook stores emails in PST files. These files are fragile (they get corrupted easily) and proprietary, so I want to find a better way to store and search my emails going into the distant future.

Solutions

#1 Import Everything into an email archiving program

MailStore Home 5.0

For almost everyone, this free program will be good enough. You can import millions of emails from your PST files and easily do simple searches.

It can also do a fairly complete export into standard RFC822 EML files, so you are no longer dependent on your PST files and will always have the option to move your emails to a different platform in the future.

Unfortunately, it is missing two key features that I need...

  1. You cannot search for text inside the email header. Most people don't care about the headers, but I do - mostly because I like to see where my spam comes from.
  2. You cannot search in groups of folders, only in a single specific folder or the entire database. Again, this will not bother most people, but I keep my spam in a folder called "spam" in every PST and when I search, I usually (but not always) do not want to see the spam results.

Sadly, both of these flaws would be trivial to fix, but there is not much I can do to get around them.
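Once the mail has been exported to EML files, both of these gaps can be worked around outside MailStore with a short script. The following is only a minimal sketch in Python, assuming one EML file per message and one directory per mail folder; the function name `search_headers` and the `skip_folders` parameter are made up for illustration and are not part of any product mentioned here. It scans the raw header text of every message and prunes any directory named "spam" unless told otherwise.

```python
import os
from email.parser import BytesParser


def search_headers(root, needle, skip_folders=("spam",)):
    """Return (path, header_name, header_value) for every header containing needle."""
    needle = needle.lower()
    skip = {name.lower() for name in skip_folders}
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded folders in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d.lower() not in skip]
        for filename in sorted(filenames):
            if not filename.lower().endswith(".eml"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as handle:
                # headersonly=True stops parsing at the blank line before the body.
                message = BytesParser().parse(handle, headersonly=True)
            for name, value in message.items():
                # Collapse folded (multi-line) headers into a single line of text.
                text = " ".join(str(value).split())
                if needle in text.lower():
                    hits.append((path, name, text))
    return hits
```

Calling `search_headers("export", "mail.example.com")` would list every matching header line outside the spam folders, and passing `skip_folders=()` includes the spam folders again. This rescans every file on each search, so it is slow on a large archive, but it needs nothing beyond the Python standard library.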

SCAN 1.3

Open source indexing engine. Looks promising, but I just could not get this to work at all. I'd import a bunch of EML files, but nothing would show up in the index.

DtSearch Desktop

This will load vast quantities of email from Outlook and index them, but the interface took a long time to figure out and is incomplete.

I was excited about its ability to make custom fields, but this is also not well implemented. For example, I could not figure out how to see what fields were defined for a given message. I also could not figure out how to have a custom field show up while browsing through search results.

This has a rich API and I bet I could eventually make it do what I need, but it seems like there must be an easier way.

Thunderbird

Thunderbird Screenshot

Surprisingly, Thunderbird holds all the messages for a given folder in a single, giant flat file. That is not much better than a giant PST file. You can never have a folder bigger than 4GB. Deal breaker for me.

I also noticed that Thunderbird loses much of the email header information when importing from a PST file.

GFI Archiver

It would not install into a virtual machine. No reply from tech support.

X1 Version 7 Pro

X1 screenshot

X1 doesn't store emails itself; it just indexes them, so you are still dependent on the PST files. I also tried exporting my PST files to EML files and then using X1 to index those, but then it doesn't show stuff like SUBJECT and SENDER in the search results, so it is not useful.

X1 with EML files screenshot

The interface is also sluggish and it also does not let you search in email headers.

Copernic Desktop Professional 3.5.1

Copernic screenshot

Like X1, Copernic only indexes your PST files, so you need to keep the PST files and are still dependent on them. It also does not let you search on info in the internet headers of emails, and it does not let you arbitrarily pick which folders to include in a search result. It also crashed with a "Server Error in /Copern-1.0.0.52 Application" when I tried searching for a long number in my emails. Not a good sign for something I am going to need to work for a long time.

I also tried using Copernic to index a directory of exported EML files and, like X1, it did index the text but saw the files as just files rather than emails, so I could not search on metadata like send date or recipient. It also listed the filename and file date in the search results rather than the subject and send date of the actual email.
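The difference between treating an EML file as a file and treating it as an email comes down to reading a few headers. As a rough illustration only, here is a sketch in Python of pulling out the fields a result list should show; `summarize_eml` is a hypothetical helper name, and the sketch assumes the exported files are ordinary RFC822 messages with standard Subject, From, To and Date headers.

```python
from email import policy
from email.parser import BytesParser
from email.utils import parsedate_to_datetime


def summarize_eml(path):
    """Return the subject, sender, recipient and send date of one EML file."""
    with open(path, "rb") as handle:
        # policy.default decodes RFC 2047 encoded words in Subject and From.
        message = BytesParser(policy=policy.default).parse(handle, headersonly=True)
    date_header = message["Date"]
    try:
        sent = parsedate_to_datetime(str(date_header)) if date_header else None
    except (TypeError, ValueError):
        # A missing or malformed Date header is reported as None.
        sent = None
    return {
        "subject": str(message["Subject"] or ""),
        "sender": str(message["From"] or ""),
        "recipient": str(message["To"] or ""),
        "sent": sent,
    }
```

The send date comes from the message itself rather than from the file's timestamp, which is the piece the file-oriented indexers get wrong.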

Google Desktop

Google Desktop is fast and displays the right metadata about emails in the results. The user interface is clean and easy. I really like it.

A great thing about Google Desktop is that you can write plugins for it that basically can make it index whatever you want, and I was just about to write a simple plugin that would expose all the fields I wanted for my exported EML files, but then Google announced that they were killing Desktop. I don't want to invest all that effort into a system that I hope to use for decades when it is based on a product I know is already dead.

Open Email Pro 4.2.3

OpenEmailPro Screenshot

OpenEmailPro can open PST files or access directories of EML or MSG files.

It has all the right fields in its browse view.

It does not index the emails so searches are slow, but that would be ok for me except that you can only search for emails in a single folder, and you can't search for text in the email header.

Discovery Attender

Discovery Attender Screenshot

This tool does not index or database your messages; you start each search from scratch by telling it what you want to find and which PST files to look in. Then it scans the files and displays the results. This is slow, but thorough. It has the option to search inside of the internet message headers, which is very nice.

The evaluation version is free and can search up to 5 PST files at a time, so if you rarely need to do a search and have no more than 5 PSTs, this might be a good solution. I was unable to figure out how much a license costs from the website - you probably have to talk to a salesperson, which is a bad sign.

#2 Export everything from Outlook into a standard format and then find a way to search
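
One possible shape for the "find a way to search" half is a small database built from the exported EML files. What follows is only a sketch under stated assumptions, not a finished tool: it assumes one directory per mail folder, uses Python's built-in `sqlite3` module, and the names `build_index`, `search` and the `mail` table are invented for this example. It stores the raw header text alongside the plain-text body so that header searches and folder exclusions are both possible.

```python
import os
import sqlite3
from email import policy
from email.parser import BytesParser


def build_index(root, db_path):
    """Store one row per EML file: folder, subject, header text and plain-text body."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS mail "
        "(path TEXT PRIMARY KEY, folder TEXT, subject TEXT, headers TEXT, body TEXT)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.lower().endswith(".eml"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as handle:
                message = BytesParser(policy=policy.default).parse(handle)
            headers = "\n".join(f"{name}: {value}" for name, value in message.items())
            part = message.get_body(preferencelist=("plain",))
            try:
                body = part.get_content() if part is not None else ""
            except (LookupError, KeyError):
                # Unknown charset or content type: index the headers only.
                body = ""
            db.execute(
                "INSERT OR REPLACE INTO mail VALUES (?, ?, ?, ?, ?)",
                (path, os.path.basename(dirpath), str(message["Subject"] or ""), headers, body),
            )
    db.commit()
    return db


def search(db, text, exclude_folders=()):
    """Return (path, subject) rows whose headers or body contain text."""
    pattern = f"%{text}%"  # note: % and _ inside text act as LIKE wildcards
    sql = "SELECT path, subject FROM mail WHERE (headers LIKE ? OR body LIKE ?)"
    params = [pattern, pattern]
    if exclude_folders:
        marks = ", ".join("?" for _ in exclude_folders)
        sql += f" AND lower(folder) NOT IN ({marks})"
        params += [name.lower() for name in exclude_folders]
    return db.execute(sql + " ORDER BY path", params).fetchall()
```

A call such as `search(build_index("export", "mail.db"), "example.com", exclude_folders=("spam",))` would return matches from every folder except spam. A plain `LIKE` scan is not a real full-text index, so an archive with millions of messages would likely need something faster, but the EML files remain the source of truth and the database can be rebuilt at any time.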

FAQ:

Q: But wait, what about all the messaging and sequencing and indexing?
A: This code is only the matching engine, which sits on top of plenty of other support software. All the networking and message sequencing code is written in hard code C and ASM. Any place you see INT97, that is a call down to this code. But that is just plumbing. This also depends on FoxPro's ISAM indexing engine which uses a B*tree and is still impressive in its performance and reliability even 20 years later. Any place you see a SEEK, that is a call down to the FoxPro indexer.

UPDATES

1/2/2012: First published
