bug#67988: [Cuirass] ‘request-work’ responses received by several workers
From: Ludovic Courtès
Subject: bug#67988: [Cuirass] ‘request-work’ responses received by several workers
Date: Fri, 31 May 2024 21:55:16 +0200
User-agent: Gnus/5.13 (Gnus v5.13)
Ludovic Courtès <ludovic.courtes@inria.fr> skribis:
> I’m under the impression that sometimes, when the server replies to
> ‘worker-request-work’ messages, its reply is received by more than just
> the target worker, leading to builds being performed twice:
On closer inspection, the theory of the message being received by two
different peers doesn’t hold.
Instead, I believe ‘db-get-pending-build’ would return the same build at
two different points in time, typically while the first one is still
running.
That’s normally not possible because the build’s status is changed to
‘submitted’ once it’s been picked up. Turns out that, due to slowness
of the query in ‘db-get-pending-build’ (fixed in
17338588d4862b04e9e405c1244a2ea703b50d98), ‘remote-server’ would
sometimes fail to see worker pings in a timely fashion. Thus, it would
call ‘db-remove-unresponsive-workers’, which would reschedule builds
that were being carried out by said worker(s). And that’s how we would
end up with multiple concurrent builds of the same derivation.
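
The sequence described above can be illustrated with a small, self-contained simulation. This is only a sketch of the failure mode, not Cuirass code: the `Server` class, the `WORKER_TIMEOUT` value, the `scheduled` status name, and the single-threaded timing model are all assumptions made for illustration. Only the `submitted` status and the general roles of ‘db-get-pending-build’ and ‘db-remove-unresponsive-workers’ come from the message.

```python
from dataclasses import dataclass, field

# Hypothetical value: seconds without a processed ping before a worker
# is considered unresponsive.
WORKER_TIMEOUT = 10


@dataclass
class Server:
    builds: dict = field(default_factory=dict)     # derivation -> status
    assigned: dict = field(default_factory=dict)   # derivation -> worker
    last_seen: dict = field(default_factory=dict)  # worker -> time of last processed ping
    started: list = field(default_factory=list)    # log of (derivation, worker)

    def process_ping(self, worker, now):
        """Record that a ping from WORKER was processed at time NOW."""
        self.last_seen[worker] = now

    def get_pending_build(self, worker):
        """Stand-in for 'db-get-pending-build': hand out one pending build
        and mark it 'submitted' so it is not handed out again."""
        for drv, status in self.builds.items():
            if status == "scheduled":  # assumed name for the pending status
                self.builds[drv] = "submitted"
                self.assigned[drv] = worker
                self.started.append((drv, worker))
                return drv
        return None

    def remove_unresponsive_workers(self, now):
        """Stand-in for 'db-remove-unresponsive-workers': drop workers whose
        last processed ping is too old and reschedule their builds."""
        for worker, seen in list(self.last_seen.items()):
            if now - seen > WORKER_TIMEOUT:
                del self.last_seen[worker]
                for drv, owner in list(self.assigned.items()):
                    if owner == worker and self.builds[drv] == "submitted":
                        self.builds[drv] = "scheduled"
                        del self.assigned[drv]


def simulate(query_duration):
    """Return the list of (derivation, worker) builds that were started."""
    server = Server(builds={"foo.drv": "scheduled"})
    now = 0
    server.process_ping("worker-a", now)
    server.process_ping("worker-b", now)
    server.get_pending_build("worker-a")  # worker-a starts building foo.drv

    # The server is busy for QUERY_DURATION seconds in a slow query; pings
    # sent by the workers in the meantime are not processed.
    now += query_duration
    server.remove_unresponsive_workers(now)

    # The delayed pings are processed only now, then worker-b asks for work.
    server.process_ping("worker-a", now)
    server.process_ping("worker-b", now)
    server.get_pending_build("worker-b")
    return server.started
```

In this model, `simulate(1)` starts the build once, on `worker-a`, whereas `simulate(30)` exceeds the assumed timeout, so the build is rescheduled while `worker-a` is still working on it and is then also handed to `worker-b`.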
I added logging in c2061ca845d05694ebeb88935a6ff2254711beb2, which
should give a hint, should that happen again.
Ludo’.