
From: boud
Subject: squish queries - great idea + could they overload?
Date: Wed, 31 Jan 2007 19:36:48 +0100 (CET)

hi samizdat-devel,


SQUISH QUERIES - GREAT IDEA + COULD THEY OVERLOAD?

i just realised something that we don't seem to have "advertised" much
when comparing samizdat with other candidate indymedia cms'es, and
which i had more or less forgotten about.

If i understand correctly, using e.g. the "All Replies" link in Links,
or the Squish search queries, any anonymous user can run an enquiry
directly against the postgresql database with the privileges of the
apache user.
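
For concreteness, a Squish enquiry of the kind the search page accepts
looks roughly like this (i'm quoting the syntax from memory, so the
details may be off):

SELECT ?msg
WHERE (dc::date ?msg ?date)
      (dc::creator ?msg ?author)
ORDER BY ?date DESC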

i didn't think much about it, because it's clearly not a priority
for "non-techies".

However, IIUC, this is definitely an extremely good thing in terms of
scalability, non-hierarchy and distribution of information. It means,
for example, that each local imc could have its own version of the imc
contact database (all the public parts) and then a web of trust could
grow more organically, with the help of robots doing the administrative
side while people do the fun stuff. :)

Naive question: on the other hand, couldn't this be a problem in terms
of attacks on the system?  While trying to get mir to do something
slightly creative, in order to provide parallel solutions for people
with different views on publishing priorities, our "obvious" hack to a
"template" (mir terminology) at some point created a job that took
10-20 minutes to run every time somebody published a new article.  As
long as no more than a few articles were published per hour (and 48
articles a day would already be a lot), this was annoying but not
critically bad.  The problem turned out to be an O(N^2) search through
the database.  This was not an attempt to attack the system (on the
contrary!), but it generated a lot of excess CPU load.

People have probably already thought this through, and i have started
reading query.rb, where maybe the answers lie, but my question remains
(at least for the moment): are there mechanisms in place to make sure
that robots (with good intentions) or spambots (with bad intentions)
cannot overload the cpu+database?

i can see one mechanism already in place that limits this:

# Size Limits
#
limit:
  pattern: 7   # maximum size of search query pattern
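
For illustration, here is roughly how such a limit might be enforced
(a sketch only - i haven't checked how query.rb actually measures the
pattern size, so counting the clauses of the WHERE pattern is my
assumption):

MAX_PATTERN = 7   # "limit: pattern:" from the config above

# hypothetical guard, called before the Squish pattern is translated
# to SQL; `clauses` stands for the parsed list of pattern clauses
def check_pattern_size!(clauses)
  return if clauses.size <= MAX_PATTERN
  raise "enquiry pattern has #{clauses.size} clauses " +
        "(at most #{MAX_PATTERN} allowed)"
end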


i seem to remember that postgresql can estimate the "cost" of an
enquiry before running it (EXPLAIN reports the planner's cost
estimate).  At the risk of adding extra work (a time delay) to every
ordinary enquiry, it should presumably be possible to check the
estimated "cost" first, and if it would be too high (i.e. the enquiry
would take too long), refuse to run it and take some sensible "rescue"
action - e.g. inform the user and suggest that s/he design a more
efficient or less demanding enquiry.
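
As a concrete illustration: EXPLAIN (without ANALYZE) makes postgresql
plan an enquiry and report the planner's estimated cost without
executing it.  Here is a minimal ruby sketch of such a guard, using
today's pg gem - the connection parameters, the MAX_COST ceiling, the
regexp over EXPLAIN's text output and the example table name are all
my assumptions, not samizdat code, and the "cost" is in abstract
planner units, not seconds:

require 'pg'

MAX_COST = 100_000.0  # hypothetical ceiling, in planner cost units

# Ask the planner for its estimate without executing the enquiry.
# The first line of EXPLAIN output looks like:
#   Seq Scan on message  (cost=0.00..155.00 rows=10000 width=4)
# and the total estimate is the number after the "..".
def estimated_cost(db, sql)
  plan = db.exec("EXPLAIN #{sql}").getvalue(0, 0)
  cost = plan[/\.\.([0-9.]+) rows=/, 1]
  raise "could not parse EXPLAIN output" unless cost
  cost.to_f
end

def select_guarded(db, sql)
  cost = estimated_cost(db, sql)
  if cost > MAX_COST
    raise "estimated cost #{cost} is over #{MAX_COST}: " +
          "please try a more specific enquiry"
  end
  db.exec(sql)
end

db = PG.connect(dbname: 'samizdat')
select_guarded(db, "SELECT id FROM message ORDER BY id")

A blunter complement is postgresql's statement_timeout setting, which
simply aborts any statement that runs past a fixed deadline (in
milliseconds), whatever the planner estimated:

db.exec("SET statement_timeout = 10000")  # hard 10-second cap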


Any thoughts?

cheers
boud



