From: Steven D'Aprano
Subject: Re: [Pan-users] Re: Big XML files... (was Re: Re: Better processing of very large groups?)
Date: Sun, 5 Jul 2009 17:58:14 +1000
User-agent: KMail/1.9.9

On Sun, 5 Jul 2009 04:18:07 pm Matej Cepl wrote:
> Steven D'Aprano, Sun, 05 Jul 2009 11:58:28 +1000:
> > The missus uses Thunderbird, and as near as we can tell, its spam
> > filtering is crap. She found false negative rates approaching 50%
> > (half the actual spam was flagged as good) and false positive rates
> > approaching 10% (one out of ten good emails was flagged as spam).
> No, it isn't, but the problem is that, like every Bayesian filter (and
> I am a big fan of them), it needs a lot of training. Thunderbird,
> trying to be easy to use, hides this ugly fact[1] from its users and
> ships some kind of generalized set of trained data for some
> generalized user, thereby eliminating the biggest strength of Bayesian
> filters, which is that they are unpredictable to spammers and
> individualized to one's mailing patterns. If properly trained (on
> several THOUSAND messages of BOTH spam and ham), it can work pretty well.

Sounds like crap to me *grins*. SpamAssassin gives very good
results "out of the box", without training, and performs even better
when trained against a hundred or so spams and hams. I know people who
run SA with Bayesian filtering essentially disabled (they give it a
score of zero and never do any learning) and still get excellent
results from it. I guess Thunderbird's real problem is that it relies
entirely on Bayesian filtering.

My wife's biggest problem was Thunderbird's general mail filter, which
seemed to be just out-and-out buggy.
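
To make the training point concrete, here is a rough sketch of a toy
word-count naive Bayes scorer. It assumes whitespace tokens and add-one
smoothing; the class and method names (TinyBayes, train,
spam_probability) are made up for illustration, and this is not how
Thunderbird or SpamAssassin actually implement their filters. It only
shows why a Bayesian filter with no training data cannot tell spam
from ham.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes spam scorer (illustrative only, hypothetical names)."""

    def __init__(self):
        # Per-label token counts and number of training messages seen.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the tokens of one message under 'spam' or 'ham'."""
        self.counts[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_probability(self, text):
        """Return the estimated probability (0..1) that text is spam."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
        total_docs = self.docs["spam"] + self.docs["ham"]

        scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior: equals 0.5 when nothing has been trained.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            total_tokens = sum(self.counts[label].values())
            for token in text.lower().split():
                # Add-one (Laplace) smoothing avoids log(0).
                count = self.counts[label][token]
                score += math.log((count + 1) / (total_tokens + vocab_size))
            scores[label] = score

        # Convert the log-odds to a probability, clamped to avoid overflow.
        diff = max(-700.0, min(700.0, scores["ham"] - scores["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    f = TinyBayes()
    print(f.spam_probability("cheap pills"))  # untrained: no opinion
    f.train("cheap pills buy now", "spam")
    f.train("meeting notes attached", "ham")
    print(f.spam_probability("cheap pills"))
    print(f.spam_probability("meeting notes"))
```

An untrained instance rates every message the same, and the scores
only become useful as spam and ham examples accumulate.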

Steven D'Aprano