Marking up email on the Web is not a pretty picture in 2008.
- Weaver daemon used by Gmane (seems complicated to set up)
- Hypermail (51k SLOC) used at suckless
- Hypermess (56k SLOC C) used by the W3C
- Mailman’s Pipermail used for Python mail archives (not a drop-in replacement for hypermail, and not maintained according to upstream)
- MHonArc (280k SLOC of perl) used by Debian lists
- lurker (10k+ CPP) used by the free network group
- zest (1k of Python), which looks like an interesting research project that I have not managed to get working yet…
Notable mail archiving services:
- Google Groups for Usenet archives and much more, with a killer search feature
- The Mail Archive which uses MHonArc
Random thoughts:
- I think the thread view is the most important feature. Google’s approach to threading mail as a bunched up conversation works quite well, though I still prefer the tree threaded structure that mutt does so well.
- Mailman implements RFC 2369. The W3C doesn’t seem to support it, though it does use Archived-At:, which is quite useful. Sadly, Google doesn’t seem to provide archive links to the HTML.
- I am not fond of framed views that I’ve seen some mail archiving services provide.
- Has anyone really studied how an RFC 2822 mail message should be marked up in HTML? 3.1. Formats of Archived Message does not address the problem.
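To make the tree-threaded view from the first point concrete, here is a minimal sketch in Python. It assumes every message carries a Message-ID and that replies point at their parent through In-Reply-To or, failing that, References. The function names and the sample messages are made up for illustration; real threaders (mutt included) handle many more edge cases, such as missing parents and subject-based grouping.

```python
from collections import defaultdict
from email import message_from_string


def build_thread_tree(messages):
    """Return (by_id, children): children maps a parent Message-ID to its replies.

    Thread roots are stored under the key None.
    """
    by_id = {m["Message-ID"].strip(): m for m in messages if m["Message-ID"]}
    children = defaultdict(list)
    for msg_id, msg in by_id.items():
        parent = (msg.get("In-Reply-To") or "").strip()
        if parent not in by_id or parent == msg_id:
            # Fall back to the most recent known ancestor listed in References.
            refs = (msg.get("References") or "").split()
            parent = next(
                (r for r in reversed(refs) if r in by_id and r != msg_id), None
            )
        children[parent].append(msg_id)
    return by_id, children


def render(by_id, children, parent=None, depth=0):
    """Flatten the tree into indented subject lines, depth first."""
    lines = []
    for msg_id in children.get(parent, []):
        lines.append("  " * depth + by_id[msg_id]["Subject"])
        lines.extend(render(by_id, children, msg_id, depth + 1))
    return lines


# Hypothetical sample messages, not taken from any real archive.
RAW = [
    "Message-ID: <1@example.org>\nSubject: Archiving mail\n\nbody",
    "Message-ID: <2@example.org>\nIn-Reply-To: <1@example.org>\n"
    "Subject: Re: Archiving mail\n\nbody",
    "Message-ID: <3@example.org>\nReferences: <1@example.org> <2@example.org>\n"
    "Subject: Re: Re: Archiving mail\n\nbody",
    "Message-ID: <4@example.org>\nSubject: Unrelated\n\nbody",
]

by_id, children = build_thread_tree([message_from_string(r) for r in RAW])
thread_lines = render(by_id, children)
print("\n".join(thread_lines))
```

The same `children` mapping could feed a flat, Google-style conversation view by ignoring `depth`, so the choice between the two views is mostly a rendering decision.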
Any other relevant RFCs, tools or tips I might have missed?
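To make the markup question a bit more concrete, here is one possible shape for a single message, sketched in Python. The choice of a definition list for headers and a `<pre>` for the body is purely my assumption, not something any RFC prescribes, and `SHOWN_HEADERS`, `message_to_html` and the sample message are hypothetical names and data. It only handles single-part plain-text mail.

```python
import html
from email import message_from_string

# Assumption: which headers are worth showing is a per-archive design choice.
SHOWN_HEADERS = ("From", "Date", "Subject", "Archived-At")


def message_to_html(raw):
    """Render a single-part plain-text message as an HTML fragment."""
    msg = message_from_string(raw)
    rows = []
    for name in SHOWN_HEADERS:
        value = msg.get(name)
        if value:
            # Escape everything: header values often contain < > and &.
            rows.append(
                "<dt>%s</dt><dd>%s</dd>" % (html.escape(name), html.escape(value))
            )
    header_html = "".join(rows)
    body_html = html.escape(msg.get_payload())
    return (
        '<div class="message">\n'
        "<dl>" + header_html + "</dl>\n"
        "<pre>" + body_html + "</pre>\n"
        "</div>"
    )


# Hypothetical sample message.
RAW = (
    "From: Alice <alice@example.org>\n"
    "Subject: Hello & welcome\n"
    "Archived-At: <http://example.org/archive/1>\n"
    "\n"
    "1 < 2\n"
)

page = message_to_html(RAW)
print(page)
```

A fuller version would presumably turn the Archived-At value into a real link and deal with MIME parts and quoted text, which is where most of the hard decisions live.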

RFC 2822 has been superseded:
http://www.rfc-editor.org/rfc/rfc5322.txt
Since you apparently use ikiwiki, you might be interested in the mailbox plugin that I wrote:
http://pivot.cs.unb.ca/git/?p=ikimailbox.git;a=summary
I’ll be the first to admit that it is immature. But it is small (1.1 millimhonarc) and deals with threading. It could be somewhere to start.
There’s also mod_mbox from apache:
http://httpd.apache.org/mod_mbox/
…which sucks (there’s no search, no way to link to a thread rather than a message, browsing from message view to its thread loses context, etc). For a while some of the projects there used eyebrowse:
http://eyebrowse.tigris.org/
…which sucks harder – mails kept going missing from its index before apache abandoned it.
Another issue, once mail has been archived on the Web: a search engine for the mail data.
Maybe a good way to think about that issue is not so much as a Web archive, but as dynamic views of the mail. For example: show me this mailing list, show me the thread of this discussion, etc. (à la Spotlight in Mail.app). Then we run into performance and caching issues.
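A rough sketch of what such a cached dynamic view could look like, in Python. Everything here is hypothetical: `STORE` stands in for whatever mail index sits behind the views, and `thread_view` is an invented name. The only real library piece is `functools.lru_cache`, which memoises the result per (list, thread) pair.

```python
from functools import lru_cache

# Hypothetical in-memory index: list name -> {message id: (parent id, subject)}.
STORE = {
    "dev": {
        "<1@example.org>": (None, "Archiving mail"),
        "<2@example.org>": ("<1@example.org>", "Re: Archiving mail"),
        "<3@example.org>": (None, "Unrelated"),
    }
}


@lru_cache(maxsize=128)
def thread_view(list_name, root_id):
    """Compute the subjects of one thread on demand, depth first."""
    messages = STORE[list_name]
    view, pending = [], [root_id]
    while pending:
        current = pending.pop(0)
        view.append(messages[current][1])
        replies = [
            mid for mid, (parent, _) in messages.items() if parent == current
        ]
        # Visit replies before any remaining siblings.
        pending[:0] = replies
    # Return a tuple so callers cannot mutate the cached value.
    return tuple(view)


print(thread_view("dev", "<1@example.org>"))
```

The caching issue then becomes one of invalidation: when new mail arrives for a list, the cached views have to be dropped, for example with `thread_view.cache_clear()`.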
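#!/bin/bash
# Feed every message of each Maildir under ~/Mail to hypermail,
# producing one HTML archive per folder.

# hypermail settings, passed through the environment
export HM_SHOWHTML=1
export HM_LINKQUOTES=1
export HM_SHOWHEADERS=1
export HM_DEFAULTINDEX="thread"

for i in ~/Mail/*/
do
    LABEL=$(basename "$i")
    WWWDIR=/srv/www/archive.dabase.com/$LABEL
    for mail in "$i"/cur/*
    do
        echo "$mail"
        # -i reads one message from stdin, -u updates the existing archive
        hypermail -L en -l "$LABEL" -i -u -d "$WWWDIR" < "$mail"
    done
    # Let the web server group read this folder's archive.
    sudo chown -R hendry:www-data "$WWWDIR"
done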