Re: How to feed Bayes on relay-only server?

From: Dan Nelson
Subject: Re: How to feed Bayes on relay-only server?
Date: Mon, 14 Jun 2004 10:17:35 -0500
User-agent: Mutt/1.5.6i

In the last episode (Jun 14), Thomas Cameron said:
> I know this has been discussed before, but I wanted to see if anyone
> had any recent experiences/advice.
> I am going to set up a relay-only server with SA and spamass-milter
> in front of my customer's real mail server (big commercial ISP, I
> can't do anything with the messages once they are delivered).  We
> will be changing the MX record to point to the server I am building
> then it will pass messages along to the big ISP.  The big ISP will
> not be published in DNS as being an MX.
> Is there any reasonably easy way to capture ham and spam so as to
> feed the Bayesian filter?

It depends primarily on your end users' setups.  One relatively easy way
would be to set up "spam@" and "notspam@" email accounts, and have a
script process those mailboxes and train on any attached or forwarded
messages.  This only works if your end users' mail agents can forward
the entire original message (including headers) as an attachment or
inline.  If they can't, your options are limited, since without the
original message you have nothing to train on.

What I do here with Lotus Notes clients is save all incoming messages
under 32k (spam is rarely bigger than that) to a MySQL database with
another milter.  I then wrote an agent that grabs just the Message-ID
out of tagged messages and submits them via xmlrpc to a daemon on the
mailserver, which extracts the full messages from the database and runs
sa-learn and razor-report on them.  I age messages out of the database
after a week.
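
A rough sketch of the storage side, with two loud assumptions: SQLite
(Python's stdlib `sqlite3`) stands in for MySQL so the example is
self-contained, and the milter hookup, the Notes agent and the XML-RPC
daemon are left out.  The 32k cutoff and one-week expiry come from the
paragraph above; the table layout and the names `open_store`, `save`,
`fetch` and `expire` are hypothetical.

```python
import sqlite3
import time
from email import policy
from email.parser import BytesParser
from typing import Optional

MAX_SIZE = 32 * 1024          # "under 32k": spam is rarely bigger than this
MAX_AGE = 7 * 24 * 60 * 60    # age messages out after a week (seconds)


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical message table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "message_id TEXT PRIMARY KEY, received REAL NOT NULL, raw BLOB NOT NULL)"
    )
    return conn


def save(conn: sqlite3.Connection, raw: bytes,
         now: Optional[float] = None) -> bool:
    """Store one delivered message if it is small and has a Message-ID."""
    if len(raw) >= MAX_SIZE:
        return False
    # Only the headers are needed to find the Message-ID.
    headers = BytesParser(policy=policy.default).parsebytes(raw, headersonly=True)
    message_id = headers.get("Message-ID")
    if not message_id:
        return False
    conn.execute(
        "INSERT OR REPLACE INTO messages VALUES (?, ?, ?)",
        (str(message_id).strip(), time.time() if now is None else now, raw),
    )
    conn.commit()
    return True


def fetch(conn: sqlite3.Connection, message_id: str) -> Optional[bytes]:
    """Return the stored raw message for a Message-ID, or None."""
    row = conn.execute("SELECT raw FROM messages WHERE message_id = ?",
                       (message_id.strip(),)).fetchone()
    return row[0] if row else None


def expire(conn: sqlite3.Connection, now: Optional[float] = None) -> int:
    """Delete messages older than MAX_AGE; return how many were removed."""
    cutoff = (time.time() if now is None else now) - MAX_AGE
    cur = conn.execute("DELETE FROM messages WHERE received < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

In this sketch the daemon would call `fetch()` with the Message-ID the
client submits, write the bytes to a file for `sa-learn` and
`razor-report`, and `expire()` would run from cron.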

If you have no control over your end users' clients, maybe a
combination of both approaches would work: save all incoming mail, and
scour messages sent to spam@ and ham@ for enough information to pull
the original out of the database.  If there's a Message-ID in the
forwarded mail, you're home free.  Otherwise, filtering on subject and
date (and maybe recipient) should get you close enough.
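
The combined approach might look roughly like this.  It assumes a
hypothetical `messages` table holding each stored message's Message-ID,
subject, send time (as a Unix timestamp) and raw bytes, and tries three
things in order: the Message-ID of an attached original, a
`Message-ID:` line copied into an inline forward, and finally the
subject plus a quoted `Date:` line within an assumed five-minute slack.
Recipient matching is omitted, and the regexes only cover common
"Fwd:"-style forwards.

```python
import email
import re
import sqlite3
from email import policy
from email.utils import parsedate_to_datetime
from typing import Optional

# Hypothetical schema, filled in at delivery time: one row per stored message.
SCHEMA = ("CREATE TABLE IF NOT EXISTS messages "
          "(message_id TEXT, subject TEXT, sent REAL, raw BLOB)")

# Header lines that mail clients commonly copy into an inline forward.
MSGID_LINE = re.compile(r"^\s*Message-ID:\s*(<[^>\s]+>)", re.I | re.M)
DATE_LINE = re.compile(r"^\s*Date:\s*(.+?)\s*$", re.I | re.M)
FWD_PREFIX = re.compile(r"^\s*(?:fwd?\s*:\s*)+", re.I)


def lookup_by_id(conn: sqlite3.Connection, message_id: str) -> Optional[bytes]:
    row = conn.execute("SELECT raw FROM messages WHERE message_id = ?",
                       (message_id.strip(),)).fetchone()
    return row[0] if row else None


def find_original(conn: sqlite3.Connection, report_raw: bytes,
                  slack: float = 300.0) -> Optional[bytes]:
    """Find the stored original behind a mail sent to spam@ or ham@."""
    report = email.message_from_bytes(report_raw, policy=policy.default)

    # 1. Forwarded as an attachment: use the original's own Message-ID.
    for part in report.walk():
        if part.get_content_type() == "message/rfc822":
            message_id = part.get_payload()[0].get("Message-ID")
            if message_id:
                return lookup_by_id(conn, str(message_id))

    # 2. Forwarded inline: look for a copied "Message-ID:" line in the text.
    body = report.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    match = MSGID_LINE.search(text)
    if match:
        return lookup_by_id(conn, match.group(1))

    # 3. Fallback: subject without "Fwd:"/"Fw:" prefixes plus the quoted Date.
    subject = FWD_PREFIX.sub("", str(report.get("Subject", ""))).strip()
    date_match = DATE_LINE.search(text)
    if not subject or date_match is None:
        return None
    try:
        sent = parsedate_to_datetime(date_match.group(1)).timestamp()
    except (TypeError, ValueError):
        return None
    rows = conn.execute(
        "SELECT raw FROM messages WHERE subject = ? AND ABS(sent - ?) <= ?",
        (subject, sent, slack)).fetchall()
    # Only accept an unambiguous match.
    return rows[0][0] if len(rows) == 1 else None
```

Returning None on anything ambiguous is a deliberate choice: training
Bayes on the wrong message does more harm than skipping one.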

> I am considering feeding Bayes my personal account's ham and spam but
> I can't imagine that will be as helpful as if I could capture all the
> users' ham and spam.

        Dan Nelson
