savannah-hackers-public

From: Bernie Innocenti via RT
Subject: Re: [Savannah-hackers-public] [Savannah-help-public] [gnu.org #705563] many messages missing from mail archives
Date: Mon, 22 Aug 2011 17:57:01 -0400

[cc += ward]

On Mon, 2011-08-22 at 21:18 +0000, Karl Berry wrote:

> Well, here are some ideas (actually, strong suggestions :) for the
> spamassassin settings:
> 
>       *  3.3 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
>       *      [XXX.XXX.XXX.XXX listed in zen.spamhaus.org]
>       *  0.8 RCVD_IN_SORBS_WEB RBL: SORBS: sender is an abusable web server
>       *      [XXX.XXX.XXX.XXX listed in dnsbl.sorbs.net]
>       *  1.4 RCVD_IN_BRBL_LASTEXT RBL: RCVD_IN_BRBL_LASTEXT
>       *      [XXX.XXX.XXX.XXX listed in bb.barracudacentral.org]
> 
> By considering multiple RBL's, they have a disproportionate effect.
> Also, the 3.3 for PBL seems especially exaggerated.  Savannah, my own
> servers, and many others have been the victim of incorrect blacklisting
> many times.  I strongly think their contribution to the overall score
> should be reduced.

Sounds reasonable to me, but the real expert in spam countermeasures is
Ward, so I'll let him comment on this.
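
If we did go that way, it would only take a few score lines in local.cf,
which take precedence over the scores shipped with the stock rules. A
minimal sketch, assuming placeholder values that are illustrative only and
not tested or recommended:

```
# Hypothetical overrides for the three RBL rules quoted above.
# The numbers are placeholders, not tuned scores.
score RCVD_IN_PBL           2.0
score RCVD_IN_SORBS_WEB     0.5
score RCVD_IN_BRBL_LASTEXT  1.0
```

The trade-off is that hand-set scores no longer follow whatever the
nightly updates would otherwise assign to these rules.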


> Furthermore, individuals (Shailesh being a case in point) can in general
> not control whether their server is blacklisted and probably don't even
> know it.  Until mail is lost, wasting everyone's time.
> 
>       * -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>       *      [score: 0.0000]
> 
> This by itself should have been enough to overcome the blacklisting,
> seems to me, since it means the msg is almost certainly not spam.  Can
> this be given greater weight?  The next one down (BAYES_05?) is likely
> worth increasing too.

In recent times, Bayesian filters have become less and less effective,
because spammers have learned to employ several techniques to defeat them.
I've personally seen plenty of obvious spam in my inbox that SpamAssassin
flagged with BAYES_00.

Over time, I've learned to trust SpamAssassin's default scores, because
nowadays they're automatically optimized daily by the mass-check system
using a very large quantity of spam: http://ruleqa.spamassassin.org/

By the way, in case anyone wishes to contribute their corpus,
instructions are here: http://wiki.apache.org/spamassassin/MassCheck




>       *  0.6 HS_INDEX_PARAM URI: Link contains a common tracker pattern.
> 
> Every reply to a tracker is weighted toward spam?  That does not seem a
> reason to add to the spamicity.  Nearly all tracker comments are real,
> because the trackers themselves (like savannah's) already try hard to
> avoid spam.

This is what the rule matches:

 ##{ HS_INDEX_PARAM
 uri HS_INDEX_PARAM m'^https?:/*([^/]*/)+(?:index.(?:cgi|html?|php)|default.(?:asp|jsp))?\?(?!(?-i:[A-Z][a-z]{2,}){2,}$)\w+={0,2}$'i
 describe HS_INDEX_PARAM Link contains a common tracker pattern.
 ##} HS_INDEX_PARAM

This is the justification for it:

 http://wiki.apache.org/spamassassin/Rules/HS_INDEX_PARAM


Admittedly, it seems bogus, but the scores are computed automatically by
the mass-check procedure mentioned above, which means that it probably
matches a lot of spam with few false positives.

Instead of disabling this rule on eggs, we could whitelist
savannah.gnu.org to give all incoming mail from the bugtracker a +1
bonus.
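
A one-point bonus here means a score of -1, and that is easier to get
from a small custom rule than from a real whitelist entry, which carries a
much larger negative score by default. A rough sketch, assuming a
hypothetical rule name and assuming that tracker mail can be recognized by
its From: header (I have not checked what headers the tracker actually
sets):

```
# Hypothetical rule: name and pattern are assumptions, not existing config.
# A From: match alone is easy to forge, so a header that only our own
# machines can add would be a better thing to match on.
header   LOCAL_SAVANNAH_TRACKER  From =~ /\bsavannah\.gnu\.org\b/i
describe LOCAL_SAVANNAH_TRACKER  Mail claiming to come from the Savannah tracker
score    LOCAL_SAVANNAH_TRACKER  -1.0
```

This leaves HS_INDEX_PARAM enabled for everything else and only offsets
it for mail that matches the pattern.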


>     by SpamAssassin currently go to quarantine maildirs that nobody ever
>     looks at. (I'm not suggesting that someone should, it would require a
> 
> Well, I and others would look at them.  Where is the quarantine?  Can we
> have access, if we don't already?

It's in /spam on lists.gnu.org; you should already have access. The
problem is that it contains 300K messages :-)


> That is, we certainly would not look at all of them -- just the ones where
> the score was on the edge.  Then there would be a chance to revive
> messages wrongly considered spam.

I don't know how to re-inject miscategorized posts from the quarantine.
Ward probably knows the answer.


> If we saw the full SA configuration, we could compare against what we do
> for listhelper and perhaps improve both.

The configuration is on eggs, but it's pretty much the default for
Debian/Ubuntu package spamassassin_3.3.1-1, with the following tweaks in
local.cf:

----------8<-----------8<-----------8<-----------8<-----------8<----------
# SpamHaus local lookups
# SBL is the Spamhaus Block List: http://www.spamhaus.org/sbl/
header RCVD_IN_SBL              eval:check_rbl('sbl', 'sbl.fsfblacklists.', '127.0.0.2')
describe RCVD_IN_SBL            Received via a relay in Spamhaus SBL
tflags RCVD_IN_SBL              net

# XBL is the Exploits Block List: http://www.spamhaus.org/xbl/
header RCVD_IN_XBL              eval:check_rbl('xbl', 'xbl.fsfblacklists.', '127.0.0.[456]')
describe RCVD_IN_XBL            Received via a relay in Spamhaus XBL
tflags RCVD_IN_XBL              net

header RECEIVED_FROM_WINDOWS_HOST X-Detected-Operating-System =~ /Windows/
score RECEIVED_FROM_WINDOWS_HOST 2.5

header RCVD_IN_MSPIKE_BL eval:check_rbl('mspike-lastexternal', 'bl.mailspike.net.')
tflags RCVD_IN_MSPIKE_BL net
score RCVD_IN_MSPIKE_BL 3.5

# All our machines but *not* RT which tends to forward some spam to our people,
# and we don't want that mail to get tagged with the trusted source SA score.
# See RT #617486. Ward, 2010-10-04
internal_networks 140.186.70.0/24 !199.232.76.167 199.232.76.160/28 66.92.78.210 74.94.156.208/28
----------8<-----------8<-----------8<-----------8<-----------8<----------

Plus, of course, all the nightly updates from sa-update.

-- 
Bernie Innocenti
Systems Administrator, Free Software Foundation






