samizdat-devel
Re: antispam-0.1 patch


From: boud
Subject: Re: antispam-0.1 patch
Date: Tue, 12 Jun 2007 01:16:24 +0200 (CEST)

hi

On Mon, 11 Jun 2007, Dmitry Borodaenko wrote:

> On 6/11/07, boud <address@hidden> wrote:
>> Here's a basic antispam feature.
>
> Very nice! I'm starting to merge it into my branch, expect a new
> snapshot soon!

Glad you like it. :)

Another TODO:

TODO: Maybe check that the matching time is reasonably optimised and
will not scale exponentially. In my latest copy of

  http://arch.thinkmo.de/cgi-bin/spam-merge

there are 7286 expressions, and since 85% or so of us still do not have
any access to the internet, there's still lots of potential for more
wiki spammers :P.  A fraction of a second delay in publishing an article
or comment should be OK, but we should be prepared for the reasonably
predictable future evolution of the internet ecosystem.
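
A rough sketch of how the timing could be checked, using only the standard Benchmark module. The patterns and the article text here are made up (7286 synthetic entries standing in for the real spam-merge list), and joining everything into one alternation is just one possible way of applying the list, not necessarily what the patch does:

```ruby
require 'benchmark'

# Hypothetical stand-in for the spam-merge list: 7286 synthetic patterns.
# The real list would be fetched from the URL above; these are invented.
patterns = (1..7286).map { |i| "spam-site-#{i}\\.example\\.com" }

# Join all patterns into one alternation so that a text can be checked
# with a single match call.
combined = Regexp.new(patterns.join('|'))

# Made-up article body without any spam link, so no early match can
# shortcut the search.
text = 'An ordinary article linking to http://samizdat.example.org/ ' * 50

elapsed = Benchmark.realtime { combined.match(text) }
puts format('one check against %d patterns: %.4f s', patterns.size, elapsed)
```

Running this with the real list and a few long real articles would show whether the delay stays within a fraction of a second as the list grows.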

The ruby documentation on  (?>re)  type expressions [that's > not <] at
 http://rubycentral.com/book/language.html
explains how this kind of grouping avoids exponentially scaled searches
by not backtracking into the group once it has matched.
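
A minimal illustration of what (?>re) changes, on a toy pattern rather than anything from the spam list:

```ruby
# Ordinary repetition: a+ first takes all three a's, then backtracks and
# gives one back so that the trailing "ab" can match.
backtracking = /a+ab/

# Atomic group: once (?>a+) has taken all the a's it never gives any
# back, so the trailing "ab" can never match in this string.
atomic = /(?>a+)ab/

p backtracking.match?('aaab')  # true
p atomic.match?('aaab')        # false
```

The atomic version is not equivalent to the ordinary one, so it only helps for expressions where giving characters back could never lead to a match anyway.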



And a comment.

AFAIK there's a guy called Daniel Brandt who has collected a lot of
info on US elites and intelligence agencies and is critical of
wikipedia, google and yahoo: http://en.wikipedia.org/wiki/Daniel_Brandt

It seems that because of this, he and his websites are directly targeted
in the spam-merge list - if the exclude_list option is not used, then
someone trying to link to e.g. one of (google|wikipedia|yahoo)-watch\.org
will be considered a spammer and will be unable to publish his/her
article/reply.

My guess (i haven't had time to look deeply) is that the wiki community
is angry at brandt (especially for his attacks against wikipedia) and
decided that if he hates wikis so much, then there's no need for anyone
to make any links on wiki pages to his -watch.org pages.

This sounds rather authoritarian to me - his attacks against wikipedia
sound unreasonable to me (though i haven't looked at them in detail),
but that doesn't mean we should try to stop people from looking at his
-watch.org pages.

Also, namebase.org is a potentially extremely interesting page - it
aims to objectively map out the social/political structure of a lot of
the US governing elite. Considering this as spam is quite ridiculous IMHO.


So what do we do for the samizdat "out-of-the-box" default?

If we don't put any super-list in by default, then the antispam won't work
until a sysadmin configures it by hand.

If we do put this particular list in (maybe there are others - i
haven't made a general search), then (given the present version
anyway) we effectively encourage the blocking of some non-spam
dissident websites.

So if we include http://arch.thinkmo.de/cgi-bin/spam-merge by default,
then IMHO we should also include the default exclude_list

   exclude_list: [ google-watch\\.org, yahoo-watch\\.org,
                   wikipedia-watch\\.org,
                   namebase\\.org ]
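
A sketch of how such an exclude_list could interact with the super-list. Everything here is an assumption for illustration: the spam_list excerpt is invented, the spam? helper is hypothetical, and the rule "a link is exempt if it matches an exclude_list entry" is one possible reading of the option, not necessarily what the patch does:

```ruby
# Hypothetical excerpt of the super-list, one regexp source per entry.
spam_list = ['(google|wikipedia|yahoo)-watch\.org', 'namebase\.org',
             'cheap-pills\.example\.com']

# The exclude_list proposed above, assuming each entry reaches the code
# as a regexp source with a single backslash after config parsing.
exclude_list = ['google-watch\.org', 'yahoo-watch\.org',
                'wikipedia-watch\.org', 'namebase\.org']

spam    = Regexp.union(spam_list.map { |s| Regexp.new(s) })
exclude = Regexp.union(exclude_list.map { |s| Regexp.new(s) })

# Hypothetical helper: a text counts as spam if any of its URLs matches
# the super-list without also matching the exclude_list.
def spam?(text, spam, exclude)
  text.scan(%r{https?://\S+}).any? do |url|
    url.match?(spam) && !url.match?(exclude)
  end
end

p spam?('see http://www.namebase.org/ for details', spam, exclude)    # false
p spam?('buy at http://cheap-pills.example.com/ now', spam, exclude)  # true
```

With an empty exclude_list the first call would be flagged as spam too, which is the out-of-the-box behaviour being questioned here.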

i'm not sure whether we should include ln-s\\.net. The spam-merge list
also lists a few variants on tinyurl.*, and it's true that spammers can
use any of these to get through any URL regexp tests. On the other hand,
sometimes it's useful to make tiny URLs, and ln-s.net is cool and
linux-ish :).

My feeling is that it's not as important as the daniel brandt sites,
which are clearly not spam sites. (Well, maybe some stupid people have
tried to spam brandt's sites on wikis - but i expect that's not his
fault.)


In any case, the general topic is similar to email antispam approaches -
the internet ecology is constantly evolving, and both email and wiki
antispam tools should be regularly maintained/updated once the email
address or website address has got onto spammers' target lists. In both
cases, use of one or more community "superlists" created by some sort of
consensus mechanism is one of the more powerful approaches, since that
way the antispam evolves together with the spammers.  So some initial
"hardwiring", while advising/warning sysadmins to update, is probably
difficult to avoid if we want samizdat to work "out-of-the-box".


cheers
boud



