help-debbugs

Re: High system load due to bug mbox downloads


From: Ricardo Wurmus
Subject: Re: High system load due to bug mbox downloads
Date: Tue, 07 Apr 2020 22:31:16 +0200
User-agent: mu4e 1.2.0; emacs 26.3

Hi Glenn,

> I found the right log file. The problem was:
>
> #requests    host
> 47180        141.80.181.40   ci.guix.info
> 23877        84.173.77.179   someone at t-ipconnect.de
>
> These numbers refer to the past 8 hours!
>
> If someone needs to mirror the bug database, they should ask about eg
> rsync access instead of doing that.

I’m terribly sorry about this!  This was issues.guix.gnu.org.  The other
one was my laptop during development :-/

Up to today issues.guix.gnu.org used the SOAP service via Guile-Debbugs
to check for changes to bugs regularly.  This involved using the
get_bugs operation to get bug numbers, and then fetching the list of
messages with get_bug_log to extract the message numbers and compare
them with our cache.  The missing ones would be downloaded individually.

Thanks to the magic of abstractions I failed to see that the procedure I
used to get the message numbers is very wasteful, as get_bug_log returns
the contents of *all* the messages — even though I’m throwing them all
away just to get the message numbers.[1]
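
For illustration only, the comparison step described above could be
sketched as follows.  This is Python rather than the Guile code actually
used, and everything in it is an assumption: the get_bug_log callable
stands in for the SOAP operation, the "msg_num" and "body" keys are a
guess at the shape of each returned message, and the cache layout is
hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Set

# Assumed shape of one get_bug_log entry: the message number plus the
# full message contents, e.g. {"msg_num": 5, "body": "..."}.
Message = Dict[str, Any]


def missing_message_numbers(
    bug_numbers: Iterable[int],
    get_bug_log: Callable[[int], List[Message]],
    cached: Dict[int, Set[int]],
) -> Dict[int, List[int]]:
    """Return, per bug, the message numbers that are not yet cached."""
    missing: Dict[int, List[int]] = {}
    for bug in bug_numbers:
        # This call transfers every message body for the bug ...
        log = get_bug_log(bug)
        # ... yet only the numbers are kept; the bodies are discarded.
        remote = {int(message["msg_num"]) for message in log}
        new = sorted(remote - cached.get(bug, set()))
        if new:
            missing[bug] = new
    return missing
```

The sketch makes the cost visible: the size of each response grows with
the whole bug log, even when the cache is already complete and the
result is empty.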

So today I thought: we can do better and avoid get_bug_log!  Not only
can we speed this all up but we also get to reduce load on
debbugs.gnu.org by using the SOAP service less often!  I thought it
would be better to make only *one* request per bug (to get one mbox with
all the messages for a bug), but to use the response headers to
determine whether to download or not.  If the modification time header
indicated that the mbox file in our cache was out of date, we would read
the response body.  (I wanted to use Range headers to only ask for the
difference, but always got the full mbox back.)
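
As a rough sketch of the staleness check described above (Python, not
the code that actually ran): it assumes the server reports the
modification time in a standard Last-Modified header and that the cache
records a timezone-aware timestamp per mbox.  The function name and
parameters are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def cache_is_stale(last_modified: Optional[str],
                   cached_mtime: Optional[datetime]) -> bool:
    """Decide whether the cached mbox should be refreshed.

    last_modified is the raw Last-Modified header value (or None);
    cached_mtime must be timezone-aware (or None if nothing is cached).
    """
    if last_modified is None or cached_mtime is None:
        return True  # no timestamp to compare against: refresh
    try:
        remote = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return True  # unparseable header: refresh to be safe
    if remote.tzinfo is None:
        # A "-0000" offset yields a naive datetime; treat it as UTC.
        remote = remote.replace(tzinfo=timezone.utc)
    return remote > cached_mtime
```

Note that this check alone saves nothing on the server side if the
headers arrive as part of an ordinary GET response.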

The first big mistake here is that using GET gets the full mbox, even if
I’m only looking at the headers.  I don’t know if HEAD requests are
supported, but they would have been a much cheaper way to request the
modification times.  Since I made GET requests the server had to send
the full mbox even though I mostly didn’t even read it.  Embarrassing!
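
A minimal sketch of the HEAD variant, in Python with only the standard
library.  Whether debbugs.gnu.org answers HEAD requests for mbox
downloads is not known here, and the URL passed in is a placeholder,
not the real mbox address.

```python
import urllib.request
from typing import Optional


def head_request(url: str) -> urllib.request.Request:
    """Build a request that asks for the headers only, not the body."""
    return urllib.request.Request(url, method="HEAD")


def fetch_last_modified(url: str, timeout: float = 30.0) -> Optional[str]:
    """Return the Last-Modified header for url, or None if it is absent.

    Assumes the server supports HEAD for this resource; if it does not,
    urlopen raises urllib.error.HTTPError.
    """
    with urllib.request.urlopen(head_request(url), timeout=timeout) as response:
        return response.headers.get("Last-Modified")
```

With a HEAD request the server still has to handle one request per bug,
but it does not have to transmit the mbox just so that the client can
look at a timestamp.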

But I must wonder why requesting an mbox is more expensive than using
get_bug_log via SOAP (which we had been doing more often than I’d like
before today).  Increasing the load on debbugs.gnu.org was clearly not
intended, and I didn’t even think of asking for rsync access before that
because “guix” and “guix-patches” only have about 1700 unarchived bugs,
which didn’t seem like a very large number of requests to make
consecutively at regular intervals.

Turns out I was wrong.  My apologies again!

I have since disabled the updater on issues.guix.gnu.org.  Would you be
willing to allow us to rsync the message database instead?  This would
also be of great help to me for the implementation of more Debbugs
features in Mumi (the Guile application behind issues.guix.gnu.org).

Thank you for keeping debbugs.gnu.org running and for rescuing it from
my inadvertent abuse!

--
Ricardo

[1]: https://git.savannah.gnu.org/cgit/guile-debbugs.git/tree/debbugs/operations.scm#n139


