Re: [libmicrohttpd] Problems with latency and getting "stuck"
From: maurice barnum
Subject: Re: [libmicrohttpd] Problems with latency and getting "stuck"
Date: Mon, 07 Apr 2014 14:05:44 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0
On 04/05/2014 11:21 PM, Christian Grothoff wrote:
On 04/05/14 02:03, maurice barnum wrote:
Hi.
I'm working on a project where I want to use libmicrohttpd to handle
incoming HTTP connections that are then forwarded to a ZeroMQ server.
It would be helpful if you would mention which MHD version you are using.
0.9.34
Three problems I'm encountering:
* latency is bad due to a significant delay between when I wake up
a connection with MHD_resume_connection and when MHD_run calls the
corresponding handler callback
What do you consider a 'significant delay' here? From your data plot, it
seems you're concerned with individual milliseconds or even a few
microseconds, but maybe I missed something.
* each run of MHD_run will accept several incoming connections but
only "retire" one of the connections I've resumed.
"retire" or "resume"? I am pretty sure MHD_run is perfectly happy to
tear down multiple connections during one invocation. And yes, as
MHD_run may accept fresh connections before processing 'resume' events,
this may increase latency, especially given that you're using a single
thread for all processing. If you are concerned with
milli/micro-seconds, I wonder why you are not using a thread pool.
I have separate experiments using a thread pool that makes blocking
calls to my backend vs. queuing the request and suspending the
connection. I may return to that approach when I understand the
issues with my current one.
I was surprised to see that MHD_run(), in my traces, never called my
callback for more than one resumed connection. For example, following
one zmq_poll, I resume 16 connections ("5" in the output is logged right
after MHD_resume_connection), but the next call to MHD_run results in
only a single callback ("6" in the output is logged after calling
MHD_queue_response) for a resumed connection. That is the pattern I've
seen:
1396654686212937 -> ZMQ_POLL -1 20 | | | | | | | | | | | | | | | | | | | |
* 1396654686215761 <- 1 | | | | | | | | | | | | | | | | | | | |
1396654686215766 0x2470f00 | | | | | | | | 5 | | | | | | | | | | |
1396654686215768 0x2470e60 | | | | | | | | | 5 | | | | | | | | | |
1396654686215769 0x2470d20 | | | | 5 | | | | | | | | | | | | | | |
1396654686215769 0x2553280 | | | | | | | | | | | | 5 | | | | | | |
1396654686215770 0x2470b40 | | | | | | | 5 | | | | | | | | | | | |
1396654686215771 0x2553140 | | | | | | | | | | | | | | 5 | | | | |
1396654686215771 0x2470aa0 | 5 | | | | | | | | | | | | | | | | | |
1396654686215772 0x25530a0 | | | | | | | | | | | | | | | 5 | | | |
1396654686215773 0x25531e0 | | | | | | | | | | | | | 5 | | | | | |
1396654686215773 0x2470be0 | | | | | | 5 | | | | | | | | | | | | |
1396654686215774 0x2470a00 | | 5 | | | | | | | | | | | | | | | | |
1396654686215775 0x2470dc0 | | | | | | | | | | 5 | | | | | | | | |
1396654686215776 0x2553320 | | | | | | | | | | | 5 | | | | | | | |
1396654686215776 0x23f5140 5 | | | | | | | | | | | | | | | | | | |
1396654686215777 0x2470c80 | | | | | 5 | | | | | | | | | | | | | |
1396654686215778 0x2470960 | | | 5 | | | | | | | | | | | | | | | |
1396654686215779 -> MHD_RUN 20 | | | | | | | | | | | | | | | | | | | |
1396654686215893 0x25530a0 | | | | | | | | | | | | | | | 6 | | | |
1396654686215961 -> ZMQ_POLL 6 20 | | | | | | | | | | | | | | | | | | | |
1396654686215981 <- 1 | | | | | | | | | | | | | | | | | | | |
1396654686215982 -> MHD_RUN 20 | | | | | | | | | | | | | | | | | | | |
* 1396654686216009 0x25530a0 | | | | | | | | | | | | | | | X | | | |
1396654686216050 0x2553140 | | | | | | | | | | | | | | 6 | | | |
1396654686216060 -> ZMQ_POLL 6 19 | | | | | | | | | | | | | | | | | | |
note: the trace doesn't show the return from MHD_RUN, which is
immediately before each "-> ZMQ_POLL" line. I'll debug this to try and
understand what is happening.
* eventually, everything stops: I call zmq_poll with a timeout of -1
(MHD_get_timeout() returned 0), but the epoll fd never signals readable,
even when new connections come in.
You have set a connection limit of 25. If you suspended 25 connections,
the accept FD (and all 25 suspended connections) will be out of the
epoll set, and your server grinds to a halt until you resume a connection.
That makes sense, but I observe the server grinds to a halt after all
connections have been resumed and deleted (the "X" in the trace is from
the MHD_OPTION_NOTIFY_COMPLETED callback).
My event loop listens on a ZeroMQ socket and the epoll fd returned from
MHD. The loop looks basically like (pseudo-code):
while true:
MHD_run(md)
timeout = MHD_get_timeout() / 1000
if not timeout:
timeout = -1
zmq_poll(items, len(items), timeout)
I hope you're somehow having the MHD epoll socket in the zmq_poll set here.
Yes, using MHD_get_daemon_info().
...
Any ideas on where to look/debug is appreciated. I've attached a
digested trace of my debug output that shows the behavior.
I'd try with a connection limit of 1 first, that should simplify what
happens, and you should encounter certain problems immediately instead
of only after 25 'suspended' connections.
Happy hacking!
That sounds like a good idea. Thanks!