Re: [libmicrohttpd] "Failed to join a thread": Race condition when closing the connection?
Fri, 2 Feb 2018 20:53:35 +0100
On Fri, 2018-02-02 09:27:59 +0100, Christian Grothoff <address@hidden> wrote:
> This is very strange, the loop should not fix this, as pthread_join
> should simply block (not race!) until the thread is done. In fact, I
> generally think the right answer to ESRCH would be to die, as to me this
> indicates some kind of memory corruption or other severe invariant
Your assumption though doesn't match my experience. Just calling
pthread_join() again after a delay of 10msec did the job. (I placed
the loop to give it a few more tries if needed.)
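A minimal sketch of that kind of retry loop, for illustration only. It is
not the actual change; the helper name join_with_retry and the number of
tries are made up here, and only the 10 msec delay comes from the
description above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper, not part of libmicrohttpd: call pthread_join()
 * up to max_tries times, sleeping 10 ms after each failed attempt.
 * Returns the last pthread_join() result (0 on success). */
static int join_with_retry(pthread_t tid, int max_tries)
{
    const struct timespec delay = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    int rc = ESRCH;

    for (int i = 0; i < max_tries; i++) {
        rc = pthread_join(tid, NULL);   /* thread result is ignored */
        if (rc == 0)
            break;
        nanosleep(&delay, NULL);
    }
    return rc;
}

static void *short_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Start a trivial thread and join it through the helper. */
static int demo_join(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, short_worker, NULL) != 0)
        return -1;
    return join_with_retry(tid, 5);
}
```

On a healthy thread the first pthread_join() call already succeeds, so the
loop body runs once. Retrying after a failed join only papers over
whatever made the first call fail.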
> Now, given that you mentioned changes related to popen()-logic in your
> own code, I wonder if the change in your _application_ logic related to
> fork() may be interoperating badly with threads. In particular, after
> you fork(), all of the "other" threads will be gone, so if you fork()
> and then continue any MHD-interaction related to the threads spawned by
> MHD, that is likely to be, eh, problematic --- and may show up with an
> ESRCH. However, that doesn't quite explain to me why putting this in a
> loop with sleeps might fix it. (But I don't know enough about your code.)
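To illustrate the fork() point in isolation: the sketch below starts a
"ticker" thread, forks, and lets the child check whether the counter still
advances. All names are invented for this example, and it uses plain
fork() rather than anything from MHD or popen-noshell.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static atomic_int ticks;
static atomic_int stop;

static void sleep_ms(long ms)
{
    const struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Background thread: bumps the counter roughly once per millisecond. */
static void *ticker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop)) {
        atomic_fetch_add(&ticks, 1);
        sleep_ms(1);
    }
    return NULL;
}

/* Returns 1 if the counter stayed frozen in the forked child (the ticker
 * thread was not duplicated), 0 if it advanced, -1 on error. */
static int ticker_missing_in_child(void)
{
    pthread_t tid;
    int status = 0;
    int result = -1;

    atomic_store(&ticks, 0);
    atomic_store(&stop, 0);
    if (pthread_create(&tid, NULL, ticker, NULL) != 0)
        return -1;
    while (atomic_load(&ticks) == 0)    /* wait until the ticker runs */
        sleep_ms(1);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the thread that called fork() exists here. */
        int before = atomic_load(&ticks);
        sleep_ms(50);
        _exit(atomic_load(&ticks) == before ? 0 : 1);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = (WEXITSTATUS(status) == 0);

    atomic_store(&stop, 1);             /* parent: ticker is still alive */
    pthread_join(tid, NULL);
    return result;
}
```

The parent can still join its ticker thread after the fork; it is only
the child that has lost it.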
The code in use is https://github.com/famzah/popen-noshell ,
with a small wrapper to really make it look like popen()/pclose(),
which simply puts the needed clean-up struct into a hash table with the
fp as its key. pclose() then uses the fp to recover the clean-up
struct pointer to be supplied to the simplified pclose() variant.
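A rough sketch of that bookkeeping, assuming a small chained hash table
keyed by the FILE pointer. The struct cleanup_info is a hypothetical
stand-in for the library's clean-up struct, and the function names are
invented for this example; the real wrapper may look different.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the clean-up struct the library fills in. */
struct cleanup_info {
    int pid;
};

#define N_BUCKETS 64

struct entry {
    FILE *fp;                   /* key */
    struct cleanup_info *info;  /* value */
    struct entry *next;         /* chaining within one bucket */
};

static struct entry *buckets[N_BUCKETS];

static size_t bucket_of(const FILE *fp)
{
    return (size_t)(((uintptr_t)fp >> 4) % N_BUCKETS);
}

/* popen() side: remember the clean-up struct under its fp. */
static int cleanup_register(FILE *fp, struct cleanup_info *info)
{
    struct entry *e = malloc(sizeof *e);
    size_t b = bucket_of(fp);

    if (e == NULL)
        return -1;
    e->fp = fp;
    e->info = info;
    e->next = buckets[b];
    buckets[b] = e;
    return 0;
}

/* pclose() side: look up the struct for fp and drop the table entry.
 * Returns NULL if fp was never registered (or was already taken). */
static struct cleanup_info *cleanup_take(FILE *fp)
{
    struct entry **pp = &buckets[bucket_of(fp)];

    for (; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->fp == fp) {
            struct entry *e = *pp;
            struct cleanup_info *info = e->info;

            *pp = e->next;
            free(e);
            return info;
        }
    }
    return NULL;
}
```

As written, the table is not thread-safe; in a threaded server, both
functions would need to run under a mutex.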
> Regardless, the loop/sleep is a very, very wrong fix, and I strongly
> suspect the problem is in your code (or how you use the MHD API, in
> conjunction with fork()).
You're completely right here. I wrote some more small test programs,
and I observe two things:
* pthread_join() indeed waits as promised in my tests; and
* I cannot reproduce that non-waiting / failing behavior with any of
my test attempts so far.
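One such check can be sketched as follows. This is an illustrative
reconstruction, not one of the actual test programs; the names and the
50 ms delay are arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

static atomic_int worker_done;

/* Worker that takes a noticeable amount of time before finishing. */
static void *slow_worker(void *arg)
{
    const struct timespec work = { 0, 50 * 1000 * 1000 }; /* 50 ms */

    (void)arg;
    nanosleep(&work, NULL);
    atomic_store(&worker_done, 1);
    return NULL;
}

/* Returns 1 if the worker had finished by the time pthread_join()
 * returned, 0 if it had not, and -1 if create or join failed. */
static int join_waits_for_worker(void)
{
    pthread_t tid;

    atomic_store(&worker_done, 0);
    if (pthread_create(&tid, NULL, slow_worker, NULL) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return atomic_load(&worker_done);
}
```

A return value of 0 or -1 here would correspond to the non-waiting or
failing behavior described above.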
I did, however, find _one_ similar bug report, where pthread_join()
failed in a similar way:
Unfortunately, the provided testcase is incorrect (see my comment
there) and this bug report was never finished, so I don't know
if the bad testcase exists similarly in their original application.
However, since libmicrohttpd just ignores the thread's result (passing
a NULL pointer to pthread_join()), and given my observation that the
call works on a second try, I'm quite confident that something
different is happening here.
Looping until success (limited to a small, justifiable timespan)
isn't a correct fix, of course. And indeed, pthread_join() probably
should wait, so I'm off again trying to find out in which situations
it might fail to do so.
Getslash GmbH, Hermann-Johenning-Platz 2, 59302 Oelde
Tel: +49-2522-834349-5 Fax: +49-2522-834349-1
http://www.getslash.de Mobil: +49-152-33822499
Sitz der Gesellschaft: Oelde
Handelsregister: Amtsgericht Münster, HRB 11911
Ust-Id-Nr.: DE 815060326
Geschäftsführung: Andre Peitz, Tobias Hanisch