[libmicrohttpd] Recursive mutexes (was: Re: shutdown of listen socket do

libmicrohttpd

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[libmicrohttpd] Recursive mutexes (was: Re: shutdown of listen socket do

From:	Will Bryant
Subject:	[libmicrohttpd] Recursive mutexes (was: Re: shutdown of listen socket does not work on solaris 10)
Date:	Mon, 19 Sep 2011 13:00:28 +1200

I have a patch for this almost working on OS X, but I'm getting hangs running 
"make check" in daemontest_timeout.  My patch is causing an error, which I'll 
look into, but it's hanging because the current error handling code tries to 
recursively lock the response mutex.

The call stack is this:

#0  0x00007fff8ec3cbf2 in __psynch_mutexwait ()
#1  0x00007fff8db251a1 in pthread_mutex_lock ()
#2  0x00000001013a6ec5 in MHD_destroy_response (response=0x7ffe92000000) at 
response.c:436
#3  0x00000001013a363d in MHD_connection_close (connection=Cannot access memory 
at address 0x0
) at connection.c:316
#4  0x00000001013a375a in try_ready_normal_body (connection=0x7ffe92000028) at 
connection.c:337
#5  0x00000001013a15da in MHD_connection_handle_write (connection=0x101430d70) 
at connection.c:1871
#6  0x00000001013a69e9 in MHD_select () at daemon.c:1458
#7  0x00000001013a6a41 in MHD_run (daemon=0x7ffe92000028) at daemon.c:1592
#8  0x000000010139db2e in main (argc=-1845493720, argv=0x603) at 
test_callback.c:185

It's occurring because the code does not request a recursive mutex, and since 
try_ready_normal_body is called with the mutex locked:

        case MHD_CONNECTION_NORMAL_BODY_READY:
          response = connection->response;
          if (response->crc != NULL)
            pthread_mutex_lock (&response->mutex);
          if (MHD_YES != try_ready_normal_body (connection))
            {
              if (response->crc != NULL)
                pthread_mutex_unlock (&response->mutex);
              break;
            }

when connection_close_error calls MHD_connection_close and it calls 
MHD_destroy_response, the second attempt to lock the mutex hangs.

So should we start requesting recursive mutexes so we can be sure that error 
handling will work?  On some platforms POSIX mutexes are by default recursive, 
but POSIX says you have to ask for them to be sure.




On 19/09/2011, at 00:18 , Christian Grothoff wrote:

> I see.  But given that on Linux it works without a pipe and that a pipe also 
> "consumes" two extra FDs, I think the answer should be slightly different: 
> revert 14584 *conditionally* for non-Linux, non-FreeRTOS OSes.  So we'd have 
> pipes on Solaris/*BSD/OS X, and no-pipes on Linux/FreeRTOS.  That should make 
> everyone happy (at the expense of a few additional #ifdef's in the code, a 
> price we can probably pay, right?).
> 
> Now, I won't have time to do this today, but I do take patches along these 
> lines ;-).
> 
> Happy hacking,
> 
> Christian
> 
> On 09/18/2011 01:13 AM, Will Bryant wrote:
>> Coincidentally I discovered the same problem on Mac OS X last night, while 
>> trying to debug issue #1760.  There we had noticed that the shutdown call 
>> was failing and I had also noticed that my app does not shut down ever since 
>> I upgraded past 0.9.7, and the library's tests were not running properly.
>> 
>> It turns out that libmicrohttpd has been failing tests (well, they hang 
>> rather than failing) on OS X ever since 0.9.8.  I did a bisect to find the 
>> offending commit and found it was r14584, "also eliminating use of pipe, 
>> thereby addressing #1662".
>> 
>> So basically because the original requestor on FreeRTOS couldn't use pipe(), 
>> a solution has been adopted that I believe doesn't work on many unixes other 
>> than Linux - it's definitely broken on OS X and Solaris, and my 
>> understanding from #1674 is that it was also broken on Cygwin - though the 
>> #1674 workaround which has been committed presumably works on cygwin but 
>> doesn't seem to work on OS X when I try it.
>> 
>> Can anyone confirm if shutdown still works on the BSDs?  I would have 
>> guessed not since OS X tends to be closer to the BSD behavior.
>> 
>> I agree with Christian that using close() on the socket is not an acceptable 
>> solution due to the race condition.
>> 
>> I personally think that the best thing to do would be to revert the r14584 
>> commit to start with, because the pipe solution was a good one for most 
>> supported platforms, and if the FreeRTOS support is still required, let the 
>> requestor work on another solution for platforms missing most of the posix 
>> spec (it could use a localhost socket pair, for example) or patch the 
>> library to use the shutdown technique on that platform only.
>> 
>> Adding complicated workarounds for platforms that don't have many basic OS 
>> features seems like an unfair burden on libmicrohttpd.
>> 
>> 
>> On 16/09/2011, at 11:53 PM, Frank Meier wrote:
>> 
>>> hi
>>> 
>>> I tried to use libmicrohttpd on solaris 10, and my simple programm only 
>>> starting and stopping the daemon (see below) did hang in.
>>> after some investigation (see debug code) I found, that the shutdown call 
>>> in daemon.c fails, since the daemon socket is not a connected fullduplex 
>>> but a listening socket. this means the daemon thread keeps hanging in 
>>> select() and main hangs in pthread_join() (see pstack output). problem is 
>>> since the socket doesn't get closed, select does not interrupt. After 
>>> adding CLOSE (fd) after the point of SHUTDOWN () the programm did work. It 
>>> looks like solaris and linux behave differently when shutdown is called on 
>>> a listening socket. But on the other hand I'm not sure if it is correct to 
>>> call shutdown in the first place, since the man page states "The  
>>> shutdown() call causes all or part of a full-duplex connection on the 
>>> socket associated with sockfd to be shut down" and the listening socket 
>>> does not have a connection.
>>> 
>>> I think the socket should be closed with CLOSE before calling 
>>> pthread_join() and use SHUTDOWN only on connection sockets.
>>> 
>>> regards, Frank
>>> 
>>> 
>>> /*test programm*/
>>> struct MHD_Daemon* mhdDaemon = 0;
>>> mhdDaemon = MHD_start_daemon (MHD_USE_THREAD_PER_CONNECTION, mhdPort, NULL, 
>>> NULL,&answer_to_connection, NULL, MHD_OPTION_END);
>>> if (mhdDaemon)
>>> {
>>>    MHD_stop_daemon (mhdDaemon);  //<-here it hangs
>>>    mhdDaemon = 0;
>>> }
>>> 
>>> /*debug code in daemon.c*/
>>> --- daemon/daemon.c     (revision 16873)
>>> +++ daemon/daemon.c     (working copy)
>>> @@ -2477,7 +2477,11 @@
>>>       daemon->worker_pool[i].shutdown = MHD_YES;
>>>       daemon->worker_pool[i].socket_fd = -1;
>>>     }
>>> -  SHUTDOWN (fd, SHUT_RDWR);
>>> +       int res = SHUTDOWN (fd, SHUT_RDWR);
>>> +       int tmpErrno = errno;
>>> +       printf("shutdown res:%d  %s\n", res, strerror(tmpErrno));
>>> +
>>> +       CLOSE (fd);
>>> 
>>> ->output with debug code:
>>> shutdown res:-1  Transport endpoint is not connected
>>> 
>>> /*pstack output ->  damon thread is in select, main hangs in pthread join*/
>>> $ pstack 16802
>>> 16802:  ./test.bin
>>> -----------------  lwp# 1 / thread# 1  --------------------
>>> feee9365 lwp_wait (2, 80478d8)
>>> feee2c1f _thrp_join (2, 0, 8047918, 1) + 5a
>>> feee2d9e pthread_join (2, 8047918, feeec7e0, fef9665d) + 2b
>>> fef96777 MHD_stop_daemon (8060d40, 0, 0, fee63301) + 12b
>>> 0805080a main     (1, 8047988, 8047990) + 6f
>>> 080506ac _start   (1, 8047a88, 0, 8047a93, 8047abe, 8047b78) + 60
>>> -----------------  lwp# 2 / thread# 2  --------------------
>>> feee8da5 pollsys  (fed5ed50, 1, 0, 0)
>>> fee93afa pselect  (4, fed5ef30, fed5eeb0, fed5ee30, 0, 0) + 18e
>>> fee93df0 select   (4, fed5ef30, fed5eeb0, fed5ee30, 0, 0) + 82
>>> fef95ba4 MHD_select (fec50200, fec50200, fed5efec, feee59b9, 8060d40, 0) + 
>>> 130
>>> fef96248 MHD_select_thread (8060d40) + 18
>>> feee59b9 _thr_setup (fec50200) + 4e
>>> feee5ca0 _lwp_start (fec50200, 0, 0, fed5eff8, feee5ca0, fec50200)
>> 
>> 
> 
>

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [libmicrohttpd] shutdown of listen socket does not work on solaris 10, (continued)
- Re: [libmicrohttpd] shutdown of listen socket does not work on solaris 10, Will Bryant, 2011/09/18
  - Re: [libmicrohttpd] shutdown of listen socket does not work on solaris 10, Christian Grothoff, 2011/09/18
    - Re: [libmicrohttpd] shutdown of listen socket does not work on solaris 10, Will Bryant, 2011/09/19
    - Re: [libmicrohttpd] shutdown of listen socket does not work on solaris 10, Christian Grothoff, 2011/09/19
    - Re: [libmicrohttpd] shutdown of listen socket does not work on solaris 10, Will Bryant, 2011/09/20
    - Re: [libmicrohttpd] shutdown of listen socket does not work on solaris 10, Christian Grothoff, 2011/09/20
    - Re: [libmicrohttpd] shutdown of listen socket does not work on solaris 10, Will Bryant, 2011/09/20
    - Re: [libmicrohttpd] shutdown of listen socket does not work on solaris 10, Christian Grothoff, 2011/09/21
    - Re: [libmicrohttpd] shutdown of listen socket does not work on solaris 10, Frank Meier, 2011/09/22
    - Re: [libmicrohttpd] shutdown of listen socket does not work on solaris 10, Peter Ross, 2011/09/19
    - [libmicrohttpd] Recursive mutexes (was: Re: shutdown of listen socket does not work on solaris 10), Will Bryant <=
    - Re: [libmicrohttpd] Recursive mutexes (was: Re: shutdown of listen socket does not work on solaris 10), Christian Grothoff, 2011/09/19
- [libmicrohttpd] shutdown of listen socket does not work on solaris 10, Frank Meier, 2011/09/17

Prev by Date: Re: [libmicrohttpd] shutdown of listen socket does not work on solaris 10
Next by Date: Re: [libmicrohttpd] MHD_create_response_from_callback
Previous by thread: Re: [libmicrohttpd] shutdown of listen socket does not work on solaris 10
Next by thread: Re: [libmicrohttpd] Recursive mutexes (was: Re: shutdown of listen socket does not work on solaris 10)
Index(es):
- Date
- Thread