Re: [Pan-users] Re: Big XML files... (was Re: Re: Better processing of v

From:	Steven D'Aprano
Subject:	Re: [Pan-users] Re: Big XML files... (was Re: Re: Better processing of very large groups?)
Date:	Sun, 5 Jul 2009 17:58:14 +1000
User-agent:	KMail/1.9.9

On Sun, 5 Jul 2009 04:18:07 pm Matej Cepl wrote:
> Steven D'Aprano, Sun, 05 Jul 2009 11:58:28 +1000:
> > The missus uses Thunderbird, and as near as we can tell, its spam
> > filtering is crap. She found false negative rates approaching 50%
> > (half the actual spam was flagged as good) and false positive rates
> > approaching 10% (one out of ten good emails was flagged as spam).
>
> No, it isn't, but the problem is that, like every Bayesian filter (and
> I am a big fan of them), it needs a lot of training. Thunderbird,
> trying to be easy to use, hides this ugly fact from its users[1] and
> ships some kind of generic set of training data for some generic
> user, thereby eliminating the biggest strength of Bayesian filters,
> which is that they are unpredictable to spammers and individualized to
> one's mailing patterns. If properly trained (with several THOUSAND of
> BOTH spam and ham messages), it can work pretty well.

Sounds like crap to me *grins*. SpamAssassin gives very good 
results "out of the box", without training, and performs even better 
when trained against a hundred or so spams and hams. I know people who 
run SA with Bayesian filtering essentially disabled (they give it a 
score of zero and never do any learning) and still get excellent 
results from it. I guess Thunderbird's real problem is that it relies
entirely on Bayesian filtering.

My wife's biggest problem was Thunderbird's general mail filter, which 
seemed to be just out-and-out buggy.

-- 
Steven D'Aprano
