Re: [PATCH] sort: fix two race conditions reported by valgrind.

coreutils

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] sort: fix two race conditions reported by valgrind.

From:	Pádraig Brady
Subject:	Re: [PATCH] sort: fix two race conditions reported by valgrind.
Date:	Fri, 11 Jul 2014 19:07:52 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 01/14/2014 02:39 AM, Pádraig Brady wrote:
> On 01/14/2014 02:08 AM, Pádraig Brady wrote:
>> On 01/14/2014 12:49 AM, Shayan Pooya wrote:
>>>
>>>
>>>
>>> On Mon, Jan 13, 2014 at 7:31 PM, Pádraig Brady <address@hidden 
>>> <mailto:address@hidden>> wrote:
>>>
>>>     On 01/14/2014 12:22 AM, Shayan Pooya wrote:
>>>     > Valgrind reported two race conditions when I ran sort on a small file.
>>>     > Both of them seem to be legitimate.
>>>     > ---
>>>     >  src/sort.c | 4 ++--
>>>     >  1 file changed, 2 insertions(+), 2 deletions(-)
>>>     >
>>>     > diff --git a/src/sort.c b/src/sort.c
>>>     > index 3380be6..e6658e6 100644
>>>     > --- a/src/sort.c
>>>     > +++ b/src/sort.c
>>>     > @@ -3354,8 +3354,8 @@ queue_insert (struct merge_node_queue *queue, 
>>> struct merge_node *node)
>>>     >    pthread_mutex_lock (&queue->mutex);
>>>     >    heap_insert (queue->priority_queue, node);
>>>     >    node->queued = true;
>>>     > -  pthread_mutex_unlock (&queue->mutex);
>>>     >    pthread_cond_signal (&queue->cond);
>>>     > +  pthread_mutex_unlock (&queue->mutex);
>>
>> valgrind is not flagging the above for me?
> 
> Though according to http://valgrind.org/docs/manual/drd-manual.html
> valgrind should be flagging this as it mentions this is a usually a mistake 
> because...
> 
> "Sending a signal to a condition variable while no lock is held on the mutex 
> associated with the condition variable.
> This is a common programming error which can cause subtle race conditions and 
> unpredictable behavior.
> There exist some uncommon synchronization patterns however where it is safe 
> to send a signal without holding a lock on the associated mutex"
> 
> Ah, if I bump up the size of the file to `seq 1000000` I see the error:
> "Probably a race condition: condition variable 0xffeffffb0 has been signaled 
> but
> the associated mutex 0xffeffff88 is not locked by the signalling thread."
> 
> With your full patch applied but with the larger input I'm now getting lots 
> of:
> 
> ==15443== Thread 3:
> ==15443== Conflicting store by thread 3 at 0x0541a7d8 size 8
> ==15443==    at 0x4084EB: sortlines (sort.c:3432)
> ==15443==    by 0x408907: sortlines_thread (sort.c:3571)
> ==15443==    by 0x4C2BDDB: ??? (in 
> /usr/lib64/valgrind/vgpreload_drd-amd64-linux.so)
> ==15443==    by 0x4E44F32: start_thread (in /usr/lib64/libpthread-2.18.so)
> ==15443==    by 0x514EEAC: clone (in /usr/lib64/libc-2.18.so)
> ==15443== Address 0x541a7d8 is at offset 280 from 0x541a6c0. Allocation 
> context:
> ==15443==    at 0x4C2938D: malloc (in 
> /usr/lib64/valgrind/vgpreload_drd-amd64-linux.so)
> ==15443==    by 0x40F9F8: xmalloc (xmalloc.c:41)
> ==15443==    by 0x40422F: main (sort.c:3223)
> ==15443== Other segment start (thread 2)
> ==15443==    at 0x4C2E843: pthread_mutex_lock (in 
> /usr/lib64/valgrind/vgpreload_drd-amd64-linux.so)
> ==15443==    by 0x408323: sortlines (sort.c:3313)
> ==15443==    by 0x4088BF: sortlines (sort.c:3618)
> ==15443==    by 0x408907: sortlines_thread (sort.c:3571)
> ==15443==    by 0x4C2BDDB: ??? (in 
> /usr/lib64/valgrind/vgpreload_drd-amd64-linux.so)
> ==15443==    by 0x4E44F32: start_thread (in /usr/lib64/libpthread-2.18.so)
> ==15443==    by 0x514EEAC: clone (in /usr/lib64/libc-2.18.so)
> ==15443== Other segment end (thread 2)
> ==15443==    at 0x4E4E600: __errno_location (in /usr/lib64/libpthread-2.18.so)
> ==15443==    by 0x4117B2: memcoll0 (memcoll.c:39)
> ==15443==    by 0x40FE08: xmemcoll0 (xmemcoll.c:71)
> ==15443==    by 0x407CBB: compare (sort.c:2731)
> ==15443==    by 0x4083EC: sortlines (sort.c:3418)
> ==15443==    by 0x4088BF: sortlines (sort.c:3618)
> ==15443==    by 0x408907: sortlines_thread (sort.c:3571)
> ==15443==    by 0x4C2BDDB: ??? (in 
> /usr/lib64/valgrind/vgpreload_drd-amd64-linux.so)
> ==15443==    by 0x4E44F32: start_thread (in /usr/lib64/libpthread-2.18.so)
> ==15443==    by 0x514EEAC: clone (in /usr/lib64/libc-2.18.so)
> 
> Hopefully that's a false positive.
> I'll do some more investigation tomorrow.

Well it's not tomorrow, but at least I didn't forget :/
The above warning is due to valgrind warning about two threads
accessing the large shared malloced buffer.
Arbitration to that is done at a higher level unknown to valgrind,
so I think the above is a false positive.

Now in addition to the warnings you pointed out,
I noticed another more problematic intermittent issue that
could very well explain deadlock issues seen here:
http://lists.gnu.org/archive/html/coreutils/2013-03/msg00048.html

I've attached your patch and another follow up that
hopefully addresses that issue.

thanks,
Pádraig.

sort-mutex-destroy.patch
Description: Text Data

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [PATCH] sort: fix two race conditions reported by valgrind., Pádraig Brady <=

Prev by Date: Re: [PATCH] gettext: update macros to version 0.19
Next by Date: Re: [PATCH] gettext: update macros to version 0.19
Previous by thread: [PATCH] build: update gnulib submodule to latest
Next by thread: [PATCH] maint.mk: give projects more flexibilty in set_prog_name arguments
Index(es):
- Date
- Thread