[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [gpsd-dev] [PATCH] Use pselect() to reduce wakeups and avoid race co

From: Dave Platt
Subject: Re: [gpsd-dev] [PATCH] Use pselect() to reduce wakeups and avoid race conditions
Date: Fri, 13 Jan 2012 14:41:17 -0800
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20111110 Iceowl/1.0b1 Icedove/3.0.11

On 01/13/2012 02:24 PM, Mike Frysinger wrote:
> On Friday 13 January 2012 16:06:52 Eckhart Wörner wrote:
>> attached patch replaces select() by pselect() in the gpsd main loop. This
>> should make gpsd behaving nice when idle and avoids a race condition when
>> multiple signals arrive shortly one after another.
>> if COMPAT_SELECT is defined, the code should behave exactly as it did
>> before, for systems where pselect() isn't available. COMPAT_SELECT needs a
>> configuration variable.
> the idle issue doesn't require pselect() to address.  you can do that already 
> by replacing "tv" with NULL in the select() call.  but the current main logic 
> seems to imply that this behavior is somewhat desirable (at least when 
> debugging).  perhaps the tv argument should be modified that way ?  if people 
> are running in debug mode, then poll every second, otherwise wait in 
> definitely.  note: i'm certainly not a gpsd expert, so my quick analysis 
> might 
> be off ...
> as for the race condition, a little explanation might be in order for people 
> not versed on esoteric signal behavior.  signal() has the property that when 
> the user specified handler is called, the handler is automatically reset to 
> SIG_DFL.  thus there is a race condition from the time someone does `killall -
> HUP gpsd` to the time where gpsd's main loop notices and actually processes 
> the request.  this window cannot be fully closed even if the user signal 
> handler itself calls signal() to setup the handler for future signals.  if, 
> in 
> that window of time where the handler has been reset to SIG_DFL, the user 
> resends with `killall -HUP gpsd`, then the kernel will go ahead and terminate 
> gpsd.  there is no race issue between different signals (e.g. `killall -HUP 
> gpsd` && `killall -INT gpsd`) since each signal has its own state.
> but fixing this issue doesn't require pselect() either.  simply use 
> sigaction() 
> and don't use the SA_RESETHAND flag when registering the handlers.  this 
> should 
> be portable enough for the code to not worry about compat defines.

There's another tricky select/pselect race condition issue... near and dear
to my heart, as it caused me no end of heartburn recently.

The problem is this:  if you want to select() on a bunch of descriptors,
and also have some signal handlers installed, and you want to be sure
that you'll wake up from the select() call *either* if one of those signals
fires *or* if I/O is possible according to the select()... well, you can't
do this reliably.

The reason:  if you want to be awakened from the select() by a signal, you
have to make sure that this signal cannot fire prior to the select() starting.
The only way you can do this is to mask off the signal at other times, and
un-mask it just before starting the select().

However, the moment you call one of the functions to change your process's
signal mask (so that the previously-masked-off signals can fire), you
open the window... and if one of these signals is already pending, it will
fire *immediately*... your signal handler will catch it, do whatever work
needs to be done, return, and your code will fall into the signal()... and
go to sleep, possibly for a very long amount of time.  The signal won't
result in the select() call exiting immediately (or quickly) because it has
already fired and been handled.

So, if you've got a simple signal handler which simply records the fact that
something needs to be done (e.g. sets a flag), and you expect to handle
the flag upon coming out of the select() call... then you may end up
waiting a *long* time.  Possibly your process will hang (e.g. if your
select doesn't have a timeout).

pselect() was created to close this timing window... it unmasks signals
and starts the select() in an atomic fashion.  If it unmasks a signal
which is already pending, the signal is allowed to fire and then
pselect() exits immediately - so, no race condition.

Hence, in many places where you want to wake up on a signal *or* on
an "I/O is ready", and don't want to use an indefinite timeout,
using pselect() is the right way to do this.

[This particular problem bit me badly when I was debugging software
 which was running on an old embedded Linux kernel, which did not
 implement pselect() in the kernel.  Instead, glibc has a "convenience
 implementation" of pselect() - which simply changes the signal mask
 and calls select().  In other words, the glibc code re-introduces
 the very timing race condition that pselect() was intended to cure...
 and it resulted in our system hanging at boot in an unpredictable

Although I haven't looked at that particular bit of gpsd code, I'd
bet that the select() call has a timeout *precisely* to work around
this select() race condition.  Changing to pselect() would be a fix
rather than a workaround... and by eliminating the need for the
timeout it could reduce overhead slightly by eliminating unneeded

reply via email to

[Prev in Thread] Current Thread [Next in Thread]