Re: ANN: CRM114 backend for GNUS spam filtering

From: Daniel Franke
Subject: Re: ANN: CRM114 backend for GNUS spam filtering
Date: Sun, 25 May 2008 15:46:30 -0700
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)

Adam Sjøgren writes:

> Interesting, I haven't heard of mailreaver.crm (haven't been following
> crm114 either, it is just working).
> What are the differences, and should we update spam.el to use
> mailreaver.crm instead?

Quoth the source:

#     This is MailReaver, the 3rd Generation mail filter "standard script" 
#     for CRM114.  The goal is to make a more maintainable, understandable,
#     and easier-to-customize mail filter.
#    1) we use the consolidated library "maillib.crm" for most shareable
#       things like parsing the .cf file, munging text, cacheing text, etc.
#    2) we always use the CacheIDs and the Reaver (maildir-like) format 
#       for storing incoming email in unaltered form if there is 
#       any possibility of training it. 
#    3) We always train using mailtrainer.crm rather than training 
#       internally.  Thus, if you want to change the way things are
#       trained, you need to look at mailtrainer.crm as well.

Number 3 is an important advantage.  Training a Bayesian filter on every
single email is generally considered a bad idea.  You'll get better accuracy
if you train it only on the messages it gets wrong or is unsure about.
Presently, handling this from Gnus is awkward.  With mailtrainer.crm you
don't have to worry about it, because it's smart enough to skip messages
that it already classifies correctly.
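
To make the train-on-error idea concrete, here is a minimal sketch in Python.  It is not how mailtrainer.crm is implemented: ToyFilter, train_on_error and the thick margin are names and values I made up for illustration, and the word-count scoring is only a stand-in for a real classifier.

```python
import math
from collections import Counter


class ToyFilter:
    """Tiny word-count classifier; a stand-in for CRM114, not its algorithm."""

    def __init__(self):
        self.counts = {"spam": Counter(), "good": Counter()}

    def score(self, text):
        # Positive score leans spam, negative leans good, 0.0 means no evidence.
        total = 0.0
        for word in text.lower().split():
            spam_n = self.counts["spam"][word] + 1
            good_n = self.counts["good"][word] + 1
            total += math.log(spam_n / good_n)
        return total

    def learn(self, text, label):
        self.counts[label].update(text.lower().split())


def train_on_error(flt, text, label, thick=1.0):
    """Learn only if the filter is wrong or unsure; return True if it trained.

    `thick` is a hypothetical confidence margin chosen for this sketch.
    """
    s = flt.score(text)
    guess = "spam" if s > 0 else "good"
    if guess == label and abs(s) >= thick:
        return False  # already right and confident: skip training
    flt.learn(text, label)
    return True
```

Feeding the same message twice shows the point: the first call trains because the filter has no evidence yet, and the second call is skipped because the filter now gets that message right with a comfortable margin.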

So the answer is yes, you should definitely switch to MailReaver.
However, be careful about the reaver cache.  When MailReaver checks a
message, it stores a copy in the cache and inserts an X-CRM114-CacheID
header.  It later looks for this header when you pass the message back
for training.  The advice that I gave with my implementation is to have
your MTA/MDA run messages through CRM114 before Gnus sees them, so
that the CacheID will already be there; or if you can't do this, then
at least make certain that you enable `spam-log-to-registry'.
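
If you want to check whether a given message already carries the header before handing it back for training, something like the following Python sketch would do.  The cache_id helper is a hypothetical name, and the header value in the usage example is a made-up placeholder, not a real CacheID.

```python
import email


def cache_id(raw_message):
    """Return the X-CRM114-CacheID header value, or None if it is absent."""
    msg = email.message_from_string(raw_message)
    value = msg.get("X-CRM114-CacheID")
    return value.strip() if value is not None else None


if __name__ == "__main__":
    # The header value below is a placeholder invented for this example.
    tagged = (
        "From: someone@example.org\n"
        "Subject: hello\n"
        "X-CRM114-CacheID: placeholder-id-0001\n"
        "\n"
        "body text\n"
    )
    print(cache_id(tagged))
```

A result of None means the message never went through MailReaver before it reached you, which is exactly the case where the registry setting matters.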

 Daniel Franke
 |----| =|\     \\\\    
 || * | -|-\---------   Man is free at the instant he wants to be. 
 -----| =|  \   ///     --Voltaire
