Re: [libreplanet-discuss] free email
Wed, 22 May 2013 13:46:14 -0400
On Tue, May 21, 2013 at 12:55:13PM +0100, Ramana Kumar wrote:
> What is your email setup specifically? Like, what software do you use?
Postfix for SMTP, SpamAssassin, Pyzor, Razor, DSPAM, procmail, mutt,
and some custom code that looks up the AS number, GeoIP, and RBL
memberships of incoming IP addresses. I use Postfix, procmail, and
mutt more or less unmodified. By themselves, they'll easily handle
hundreds of thousands of emails per day on a modern server machine,
so they'll serve the needs of a few individual users many times over.
Spam filtering is a whole different matter.
If you're not intentionally running a spam trap and you get fewer than
1000 spam messages per day, SpamAssassin and DSPAM on an entry-level
dedicated server are fine, as long as you pay attention to what happens
under worst-case load conditions and plan/configure for them (e.g. if
your VPS has 1024MB of RAM it can reliably filter no more than two or
three emails simultaneously, so the MTA default limit of 100 local
delivery processes at once is probably too high).
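
A rough sketch of that sizing arithmetic, in Python. The 256 MB reserve
and the 300 MB per-filter footprint are assumed figures chosen for
illustration, not measurements; substitute what you observe on your own
machine under load.

```python
def max_concurrent_filters(total_ram_mb, reserved_mb, per_filter_mb):
    """Return how many filter processes fit in RAM at the same time."""
    usable = total_ram_mb - reserved_mb
    if usable <= 0 or per_filter_mb <= 0:
        return 0
    # Integer division: a partially fitting process would push into swap.
    return usable // per_filter_mb


# Assumed figures: 256 MB kept back for the OS and MTA, 300 MB per filter.
limit = max_concurrent_filters(1024, 256, 300)
print(limit)
```

In Postfix, the number of processes for a delivery service is capped by
the maxproc column in master.cf, which falls back to
default_process_limit when left at "-", so that is the place to apply
whatever number you arrive at.
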
My custom AS/GeoIP/RBL lookup code isn't really important. Only the
RBL lookups are useful for spam filtering, and SpamAssassin already does
those. The GeoIP and AS number lookups had to be debugged and working
so I could get the data that shows they're not as relevant as RBL data,
but I've had no reason to take them out since (and it's kind of nice
to have a header in every message that says what ISP and what part of
the world it came from). I also publish some graphs on how often the
RBL memberships match the final message's spam status, and those graphs
need data different from what SpamAssassin provides.
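
For readers who have not seen one, an RBL lookup is an ordinary DNS
query: reverse the octets of the client IP, append the list's zone, and
see whether an A record comes back. A minimal sketch in Python, not the
custom code described above; the zone name "rbl.example.org" is a
placeholder, not a real list.

```python
import ipaddress
import socket


def rbl_query_name(ip, zone):
    """Build the DNS name used to ask `zone` about an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """True if the zone returns an A record for the address.

    Any resolver failure (including NXDOMAIN and timeouts) is treated as
    "not listed", which is a simplification.
    """
    try:
        socket.gethostbyname(rbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Hypothetical zone, for illustration only.
print(rbl_query_name("192.0.2.1", "rbl.example.org"))
```
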
I run SpamAssassin configured to run only its static rules engine and
network checks, and to annotate messages with lists of matching rules in
the headers. It is DSPAM that ultimately decides whether a message is
spam or not. SpamAssassin's false positive rate is far too high for me
(well over 0.1%, which would make lost mail a weekly event), its algorithm
for combining rule results (adding scores) makes no mathematical sense,
and its machine learning storage backends are too slow to keep up with my
mail load. However, SA's static rules can identify pattern-based message
features like "fake Outlook headers" or "looks like Nigerian 419 scam",
and the network checks can add information like "listed in Pyzor"
or "contains blacklisted URI". Both help DSPAM make better filtering
decisions for specific flavors of spam.
DSPAM is a Bayes classifier, so it automatically learns what my non-spam
email looks like (and which SA rules and RBL listings are reliable
indicators of non-spam or spam and which are noise). The price for
configuration simplicity is that I have to provide timely feedback to the
filter every single time spam gets through (or non-spam gets trapped),
or DSPAM learns the wrong things. I have macros in mutt to retrain
the filter on misclassified messages, so in practice this is a single
keypress per message.
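
To make the train/retrain loop concrete, here is a toy naive Bayes
filter in Python. It is a sketch of the general technique, not DSPAM's
actual algorithm or storage format. The class name, the token strings,
and the "rule:" prefix for SpamAssassin rule hits are all made up for
the example.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes filter over sets of tokens (illustrative only)."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, tokens, label):
        # Count each token once per message.
        self.tokens[label].update(set(tokens))
        self.messages[label] += 1

    def untrain(self, tokens, label):
        self.tokens[label].subtract(set(tokens))
        self.messages[label] -= 1

    def retrain(self, tokens, wrong_label):
        """Move a misclassified message to the other class."""
        right_label = "ham" if wrong_label == "spam" else "spam"
        self.untrain(tokens, wrong_label)
        self.train(tokens, right_label)

    def spam_probability(self, tokens):
        # Sum per-token log odds with add-one smoothing, equal priors.
        log_odds = 0.0
        for tok in set(tokens):
            p_spam = (self.tokens["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


f = TinyBayes()
f.train(["rule:FAKE_OUTLOOK", "cheap"], "spam")   # hypothetical tokens
f.train(["rule:FAKE_OUTLOOK", "meeting"], "ham")
f.train(["newsletter"], "spam")                   # filter got this one wrong
f.retrain(["newsletter"], "spam")                 # the single-keypress fix
```

Note that "rule:FAKE_OUTLOOK" appears equally in both classes here, so
it contributes nothing to the verdict. That is the sense in which the
classifier learns which rule hits are reliable and which are noise.
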
Any Bayes classifier will work as long as it looks at mail headers
(beware, some do not!). What I like in particular about DSPAM is the
API for remote storage, so I can split filtering between one machine
that has access to the readable message text and another that has hashed
token statistics. A well-trained spam filter can end up knowing who
your friends are, where you shop, where you live, and what work you do,
so the token stats DB has some privacy implications. DSPAM's architecture
helps since the DSPAM data store is not as easy to read as a folder full
of plaintext email messages (though it's still possible to extract some
private information with a dictionary attack).
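
The dictionary attack works because hashing is deterministic: anyone
who can guess a token can hash the guess and look for it in the store.
A small Python sketch follows. Truncated SHA-256 is a stand-in chosen
for the example, not necessarily the hash DSPAM uses, and the token
strings are invented.

```python
import hashlib


def token_hash(token):
    """Stand-in token hash: first 16 hex digits of SHA-256 (assumption)."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]


def dictionary_attack(stored_hashes, wordlist):
    """Return the guesses from `wordlist` whose hashes are in the store."""
    lookup = {token_hash(word): word for word in wordlist}
    return {lookup[h] for h in stored_hashes if h in lookup}


# What the statistics machine holds: hashes only, no readable text.
store = {token_hash("alice@example.org"), token_hash("some-rare-token")}

# The attacker recovers only what they were able to guess.
guesses = ["alice@example.org", "bob@example.org"]
print(dictionary_attack(store, guesses))
```
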
Presumably you are not visiting your VPS mail server in its data center,
so if you have a significantly more capable machine to run your MUA you
can do the spam filtering on that, and let your VPS handle only
simple MTA and storage tasks.
Having used both, if I had to build my mail server again, I'd use Exim
instead of Postfix. Postfix has a lot of configuration restrictions
designed to protect unsophisticated administrators from their own
ignorance, like not being able to use back-references in regexp rules,
or having to jump through obscure hoops to match multiple SMTP envelope
fields in a single message (e.g. you want to redirect a message when
client IP = x AND recipient address = y). In practice I want to do these
things fairly often. Exim can easily express these kinds of rules in
its configuration language, but Postfix requires external mail filter
daemons, internal code changes, or non-default build options.
> And does anyone on list have any experience with migrating out of Gmail to
> a self-hosted setup for email?
If it's a personal Gmail account you can set it up on the Gmail side to
do SMTP forwarding (Google pushes to your server) or enable IMAP access
(your server pulls from Google, e.g. with fetchmail). I use IMAP
because IMAP lets me scrape my Gmail Spam folder (Google's filter
produces false positives almost as often as SpamAssassin's, and the SMTP
forwarding will not forward anything it believes is spam). This lets you transition gracefully to a
new email address. In the future, when the only people sending mail to
your Gmail address are spammers, you can turn the Gmail account off.
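
fetchmail is the usual tool for the IMAP pull. If you would rather
script it, the same idea looks roughly like this with Python's standard
imaplib. This is a sketch under assumptions: the folder name
"[Gmail]/Spam" is what English-language accounts typically expose and
may differ on yours, and the host and credentials are yours to supply.

```python
import imaplib


def connect(host, user, password):
    """Open an authenticated IMAP-over-TLS connection."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    return conn


def fetch_folder(conn, folder):
    """Return the raw RFC 822 bytes of every message in `folder`."""
    # readonly=True avoids changing flags on the server side.
    status, _ = conn.select(folder, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot select {folder}")
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")
    messages = []
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status == "OK":
            messages.append(parts[0][1])
    return messages


# Example use (not run here; needs real credentials):
#   conn = connect("imap.gmail.com", "you@gmail.com", "app-password")
#   spam = fetch_folder(conn, '"[Gmail]/Spam"')
#   conn.logout()
```

Each returned message can then be handed to local delivery (procmail in
the setup above) so it passes through your own filter.
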
> I have a VPS, and I'm using exim to just forward messages sent to my
> domain to my Gmail account, but I would really like to be just keeping it
> all on my server and dropping the Gmail.
Pick a MUA (or set up an IMAP server) and a spam filter, set up local
delivery, and you're mostly done already.
> If anyone else is in a similar situation, maybe we can work it out