[Ifile-discuss] Re: Large spam-only .idata file available


From: clemens fischer
Subject: [Ifile-discuss] Re: Large spam-only .idata file available
Date: 6 May 2003 14:09:45 +0200
User-agent: Gnus/5.090023 (Oort Gnus v0.23) Emacs/21.3 (berkeley-unix)

"Jonadab the Unsightly One" <address@hidden>:

> Is 45 thousand enough to give solid results, or would it be helpful to
> have an additional twenty-six-thousand-message spam collection?

it might even be too much!  note that with ten times as many spams
as hams, ifile will take many legit messages for spam, just
because some of the words both categories have in common have high
counts in `spam'.
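
a toy sketch of that effect, assuming a plain multinomial naive bayes
with add-one smoothing and priors taken from the message counts.  this
is not ifile's actual code, and the corpora and the names `train' and
`classify' are made up for illustration.  the ham corpus is the same in
both runs; only the amount of spam changes.

```python
import math
from collections import Counter

def train(messages):
    """Return (word counts, number of messages) for one category."""
    counts = Counter()
    for msg in messages:
        counts.update(msg.split())
    return counts, len(messages)

def classify(text, models):
    """Pick the category with the highest log(prior * word likelihoods).

    models maps a label to the (counts, n_msgs) pair from train().
    Assumption: prior = share of training messages, add-one smoothing.
    """
    vocab = set()
    for counts, _ in models.values():
        vocab.update(counts)
    n_total = sum(n for _, n in models.values())

    best_label, best_score = None, -math.inf
    for label, (counts, n_msgs) in models.items():
        total_words = sum(counts.values())
        score = math.log(n_msgs / n_total)
        for tok in text.split():
            score += math.log((counts[tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# hypothetical corpora: "meeting" is typical for ham, rare in spam
ham = ["meeting notes attached"] * 10
spam_unit = ["cheap pills online"] * 9 + ["meeting cheap pills"]

balanced = {"ham": train(ham), "spam": train(spam_unit)}      # 10 vs 10
skewed = {"ham": train(ham), "spam": train(spam_unit * 10)}   # 10 vs 100

print(classify("meeting", balanced))  # ham
print(classify("meeting", skewed))    # spam
```

in the skewed run "meeting" occurs 10 times in each category, so the
raw count in `spam' has caught up with `ham', and the 10:1 prior tips
the decision.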

for example:  i'm currently testing bogofilter-0.11.2 (it only does
spam/ham detection, but it also does base64 decoding and it tags IP
addresses), and i have around 16,000 good messages vs. 999 spams in a
database built afresh during the past weeks.  this setup gives me
false negatives (quite a few spams categorized as ham).  with each new
spam (re-)classified the precision improves.

so with all the spam corpora available on the net recently, we might
want to start collecting ham for technically oriented people  :)

  clemens