Re: antispam-0.1 patch
From: boud
Subject: Re: antispam-0.1 patch
Date: Tue, 12 Jun 2007 01:16:24 +0200 (CEST)

hi
On Mon, 11 Jun 2007, Dmitry Borodaenko wrote:
> On 6/11/07, boud <address@hidden> wrote:
>> Here's a basic antispam feature.
> Very nice! I'm starting to merge it into my branch, expect a new snapshot
> soon!
Glad you like it. :)
Another TODO: check that the timing is reasonably optimised and will not
scale exponentially. My latest copy of
http://arch.thinkmo.de/cgi-bin/spam-merge
has 7286 expressions, and since 85% or so of us still do not have
any access to the internet, there's still lots of potential for more wiki
spammers :P. A fraction of a second's delay in publishing an article or
comment should be OK, but we should be prepared for the reasonably
predictable future evolution of the internet ecosystem.
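One rough way to do that timing check, as a sketch only: the 7286 patterns below are synthetic stand-ins (the names spam-site-N.example are made up, not taken from spam-merge), and joining everything into one big alternation is just one possible strategy, not necessarily what the patch does.

```ruby
require 'benchmark'

# Synthetic stand-ins for the merged list: 7286 made-up expressions.
# Real entries would be read from the downloaded spam-merge file instead.
patterns = (1..7286).map { |i| "spam-site-#{i}\\.example" }

# Join all expressions into one alternation and compile it once.
combined = Regexp.new(patterns.join('|'))

# A harmless comment of about 1 kB with an ordinary link in it.
text = 'An ordinary comment linking to http://example.org/page. ' * 20

# Time a single check of the text against the whole list.
elapsed = Benchmark.realtime { combined.match?(text) }
puts format('%d expressions, %d bytes: %.4f s',
            patterns.size, text.bytesize, elapsed)
```

Repeating this with longer texts and longer lists would show whether the delay grows roughly linearly or worse.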
The Ruby documentation on (?>re) type expressions [that's > not <], at
http://rubycentral.com/book/language.html
discusses how they avoid exponentially scaled searches.
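A tiny illustration of what (?>re) changes, using toy patterns rather than anything from the spam list: once the group has matched, the regexp engine will not go back inside it to try a shorter match, which is what cuts off the backtracking.

```ruby
# Ordinary group: a+ can give back one "a" so that "ab" can still match.
backtracking = /a+ab/

# Atomic group: (?>a+) keeps every "a" it consumed and never gives any back.
atomic = /(?>a+)ab/

puts backtracking.match?('aaab')   # matches, thanks to backtracking
puts atomic.match?('aaab')         # no match: the group refuses to backtrack

# When no backtracking is needed, the atomic version matches as usual.
puts(/(?>a+)b/.match?('aaab'))
```

The price is that an atomic pattern can reject strings the ordinary one accepts, so list entries could not be converted blindly.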
And a comment.
AFAIK there's a guy called Daniel Brandt who has collected a lot of
info on US elites and intelligence agencies and is critical of
wikipedia, google and yahoo:
http://en.wikipedia.org/wiki/Daniel_Brandt
It seems that because of this, he and his websites are directly targeted
in the spam-merge list - if the exclude_list option is not used, then
someone trying to link to e.g. one of
(google|wikipedia|yahoo)-watch\.org
will be considered a spammer and be unable to publish his/her article/reply.
My guess (i haven't had time to look deeply) is that the wiki community
is angry at brandt (especially for his attacks against wikipedia) and
decided that if he hates wikis so much, then there's no need for anyone
to make any links on wiki pages to his -watch.org pages.
This sounds rather authoritarian to me - his attacks against wikipedia
sound unreasonable to me (though i haven't looked at them in detail),
but that doesn't mean we should try to stop people from looking at his
-watch.org pages.
Also, namebase.org is a potentially extremely interesting page - it
aims to objectively map out the social/political structure of a lot of
the US governing elite. Considering this as spam is quite ridiculous IMHO.
So what do we do for the samizdat "out-of-the-box" default?
If we don't put any super-list in by default, then the antispam won't work
until a sysadmin configures it by hand.
If we do put this particular list in (maybe there are others - i
haven't made a general search), then (given the present version
anyway) we effectively encourage the blocking of some non-spam
dissident websites.
So if we include http://arch.thinkmo.de/cgi-bin/spam-merge by default, then
IMHO we should also include the default exclude_list:

  exclude_list: [ google-watch\\.org, yahoo-watch\\.org,
                  wikipedia-watch\\.org, namebase\\.org ]

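To make the intended effect concrete, here is a minimal sketch of how an exclude list could override a spam list. It is an assumption about the behaviour, not the code in the patch: the method name spam?, the constants, and the cheap-pills.example entry are all made up for illustration.

```ruby
# Hypothetical entries standing in for a few lines of a merged spam list.
SPAM_PATTERNS = [
  '(google|wikipedia|yahoo)-watch\.org',
  'namebase\.org',
  'cheap-pills\.example'
].freeze

# Hypothetical exclude list, mirroring the default proposed in this mail.
EXCLUDE_PATTERNS = [
  'google-watch\.org',
  'yahoo-watch\.org',
  'wikipedia-watch\.org',
  'namebase\.org'
].freeze

# Compile a list of regexp source strings into one combined regexp.
def compile(patterns)
  Regexp.union(patterns.map { |p| Regexp.new(p) })
end

SPAM_RE = compile(SPAM_PATTERNS)
EXCLUDE_RE = compile(EXCLUDE_PATTERNS)

# A text counts as spam if any URL in it matches the spam list
# and is not rescued by the exclude list.
def spam?(text)
  text.scan(%r{https?://\S+}).any? do |url|
    url.match?(SPAM_RE) && !url.match?(EXCLUDE_RE)
  end
end

puts spam?('buy at http://cheap-pills.example/x')
puts spam?('see http://www.google-watch.org/ for details')
```

With this approach the upstream list can be updated untouched, and local policy lives entirely in the exclude list.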
i'm not sure about whether or not we should include ln-s\\.net.
The spam-merge list also lists a few variants on tinyurl.* , and it's
true that spammers can use any of these to get through any URL regexp
tests. On the other hand, sometimes it's useful to make tiny URLs,
and ln-s.net is cool, linux-ish :).
My feeling is it's not as important as the daniel brandt sites, which
are clearly not spam sites. (Well, maybe some stupid people have tried
to spam brandt's sites on wikis - but i expect that's not his fault.)
In any case, the general topic is similar to email antispam approaches -
the internet ecology is constantly evolving and both email and wiki antispam
tools should be regularly maintained/updated once the email address or
website address has got onto spammers' target lists. In both cases,
use of one or more community "superlists" created by some sort of
consensus mechanism is one of the more powerful approaches, since that
way the antispam evolves together with the spammers. So some initial
"hardwiring" while advising/warning sysadmins to update is probably difficult
to avoid if we want samizdat to work "out-of-the-box".
cheers
boud