Re: How to feed Bayes on relay-only server?

From: Thomas Cameron
Subject: Re: How to feed Bayes on relay-only server?
Date: Mon, 14 Jun 2004 21:24:13 -0500

----- Original Message ----- 
From: "Dan Nelson" <address@hidden>
To: "Thomas Cameron" <address@hidden>
Cc: <address@hidden>
Sent: Monday, June 14, 2004 10:17 AM
Subject: Re: How to feed Bayes on relay-only server?

> It primarily depends on the end users' setups.  One relatively easy way
> would be to set up "spam@" and "notspam@" email accounts, and have
> something processing those mailboxes and training on any
> attached/forwarded messages.  This only works if your end users' mail
> agents can forward the entire original message (including headers) as
> an attachment or inline.  If they can't, then your options are limited,
> since without the message you can't retrain.
> What I do here with Lotus Notes clients is save all incoming messages
> under 32k (spam is rarely bigger than that) to a MySQL database with
> another milter.

Which milter is that?  Sounds intriguing...
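
For the spam@/notspam@ idea quoted above, here is a minimal sketch of the
"something processing those mailboxes" step.  It assumes the user forwarded
the original as a message/rfc822 attachment; inline forwards are not
handled.  The function names and the example.com addresses are hypothetical,
and only Python's standard email package is used.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_forwarded(raw: bytes) -> list:
    """Return each message/rfc822 attachment in a report mail as bytes."""
    report = message_from_bytes(raw, policy=policy.default)
    originals = []
    for part in report.walk():
        if part.get_content_type() == "message/rfc822":
            # A message/rfc822 part carries the embedded message as the
            # single element of its payload list.
            originals.append(part.get_payload(0).as_bytes())
    return originals


def make_sample_report() -> bytes:
    """Build a hypothetical report: a note plus one attached original."""
    original = EmailMessage()
    original["Message-ID"] = "<abc@example.com>"
    original["Subject"] = "Cheap pills"
    original.set_content("Buy now.\n")

    report = EmailMessage()
    report["To"] = "spam@example.com"
    report["Subject"] = "Fwd: Cheap pills"
    report.set_content("Please train on the attached message.\n")
    report.add_attachment(original)  # stored as a message/rfc822 part
    return report.as_bytes()
```

Each returned byte string could then be handed to the training step, with
the mailbox it arrived in (spam@ or notspam@) deciding the label.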

> I then wrote an agent that grabs just the Message-ID
> out of tagged messages and submits them via xmlrpc to a daemon on the
> mailserver, which extracts the full messages from the database and runs
> sa-learn and razor-report on them.  I age messages out of the database
> after a week.

But how do you tell SA which is spam and which is ham?  I'm looking at my
inbox and most of my (non-spam) messages are under 32k.
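
To make the moving parts concrete, here is a rough sketch of the store
described above: keep messages under 32k, look them up by Message-ID, and
drop them after a week.  It is not Dan's code.  SQLite stands in for MySQL
so the example runs on its own, and the table layout and function names are
assumptions.  In this reading the database only holds candidates; the
spam/ham label would come from whatever the user tagged, which is passed in
as is_spam.

```python
import sqlite3
import time

MAX_SIZE = 32 * 1024       # size cutoff mentioned in the thread
MAX_AGE = 7 * 24 * 3600    # one week, in seconds


def open_store(path=":memory:"):
    """Open the message store (hypothetical schema)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "message_id TEXT PRIMARY KEY, received REAL, raw BLOB)"
    )
    return db


def save_message(db, message_id, raw, now=None):
    """Store a message under the size cutoff; return whether it was kept."""
    if len(raw) >= MAX_SIZE:
        return False
    received = time.time() if now is None else now
    db.execute(
        "INSERT OR REPLACE INTO messages VALUES (?, ?, ?)",
        (message_id, received, raw),
    )
    return True


def fetch_message(db, message_id):
    """Return the raw message for a Message-ID, or None if unknown."""
    row = db.execute(
        "SELECT raw FROM messages WHERE message_id = ?", (message_id,)
    ).fetchone()
    return row[0] if row else None


def expire_old(db, now=None):
    """Delete messages older than MAX_AGE; return how many were removed."""
    cutoff = (time.time() if now is None else now) - MAX_AGE
    return db.execute(
        "DELETE FROM messages WHERE received < ?", (cutoff,)
    ).rowcount


def sa_learn_command(is_spam):
    """Command line for training; the message itself would go to stdin."""
    return ["sa-learn", "--spam" if is_spam else "--ham"]
```

The command list could be passed to subprocess.run() with the fetched
message as input; that call is left out so the sketch has no side effects.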

> If you have no control over your end users' clients, maybe a
> combination of both approaches would work.  Save all incoming mail, and
> scour messages sent to spam/ham@ for enough information to pull the
> original out of the database.  If there's a message-id in the forwarded
> mail, you're home free.  Otherwise, filtering on subject and date
> (maybe recipient) should get you close enough.
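
The subject-and-date fallback could look something like the sketch below.
The record layout (a list of dicts with message_id, subject and date keys),
the five-minute window and the prefix stripping are all assumptions made
for illustration, not anything from the thread.

```python
import re
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Leading "Fwd:", "Fw:" or "Re:" markers that a forward usually adds.
_PREFIX = re.compile(r"^\s*((fwd?|re)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip forward/reply prefixes and compare case-insensitively."""
    return _PREFIX.sub("", subject).strip().lower()


def find_candidates(stored, subject, date_header, window=timedelta(minutes=5)):
    """Return Message-IDs of stored mail matching subject and rough date."""
    wanted = normalize_subject(subject)
    when = parsedate_to_datetime(date_header)
    return [
        m["message_id"]
        for m in stored
        if normalize_subject(m["subject"]) == wanted
        and abs(m["date"] - when) <= window
    ]
```

If more than one candidate comes back, adding the recipient to the filter,
as suggested above, would be the next thing to try.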

What about auto-learning?  If I understand correctly, SA auto-learns from
messages that score very high or very low.  Would it make sense to change
those thresholds so that SA auto-learns from more messages?  Or am I
misunderstanding auto-learn?

Thanks for all the feedback.  Keep it coming!
