[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] sort: fix two race conditions reported by valgrind.
From: |
Pádraig Brady |
Subject: |
Re: [PATCH] sort: fix two race conditions reported by valgrind. |
Date: |
Fri, 11 Jul 2014 19:07:52 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 |
On 01/14/2014 02:39 AM, Pádraig Brady wrote:
> On 01/14/2014 02:08 AM, Pádraig Brady wrote:
>> On 01/14/2014 12:49 AM, Shayan Pooya wrote:
>>>
>>>
>>>
>>> On Mon, Jan 13, 2014 at 7:31 PM, Pádraig Brady <address@hidden
>>> <mailto:address@hidden>> wrote:
>>>
>>> On 01/14/2014 12:22 AM, Shayan Pooya wrote:
>>> > Valgrind reported two race conditions when I ran sort on a small file.
>>> > Both of them seem to be legitimate.
>>> > ---
>>> > src/sort.c | 4 ++--
>>> > 1 file changed, 2 insertions(+), 2 deletions(-)
>>> >
>>> > diff --git a/src/sort.c b/src/sort.c
>>> > index 3380be6..e6658e6 100644
>>> > --- a/src/sort.c
>>> > +++ b/src/sort.c
>>> > @@ -3354,8 +3354,8 @@ queue_insert (struct merge_node_queue *queue,
>>> struct merge_node *node)
>>> > pthread_mutex_lock (&queue->mutex);
>>> > heap_insert (queue->priority_queue, node);
>>> > node->queued = true;
>>> > - pthread_mutex_unlock (&queue->mutex);
>>> > pthread_cond_signal (&queue->cond);
>>> > + pthread_mutex_unlock (&queue->mutex);
>>
>> valgrind is not flagging the above for me?
>
> Though according to http://valgrind.org/docs/manual/drd-manual.html
> valgrind should be flagging this as it mentions this is a usually a mistake
> because...
>
> "Sending a signal to a condition variable while no lock is held on the mutex
> associated with the condition variable.
> This is a common programming error which can cause subtle race conditions and
> unpredictable behavior.
> There exist some uncommon synchronization patterns however where it is safe
> to send a signal without holding a lock on the associated mutex"
>
> Ah, if I bump up the size of the file to `seq 1000000` I see the error:
> "Probably a race condition: condition variable 0xffeffffb0 has been signaled
> but
> the associated mutex 0xffeffff88 is not locked by the signalling thread."
>
> With your full patch applied but with the larger input I'm now getting lots
> of:
>
> ==15443== Thread 3:
> ==15443== Conflicting store by thread 3 at 0x0541a7d8 size 8
> ==15443== at 0x4084EB: sortlines (sort.c:3432)
> ==15443== by 0x408907: sortlines_thread (sort.c:3571)
> ==15443== by 0x4C2BDDB: ??? (in
> /usr/lib64/valgrind/vgpreload_drd-amd64-linux.so)
> ==15443== by 0x4E44F32: start_thread (in /usr/lib64/libpthread-2.18.so)
> ==15443== by 0x514EEAC: clone (in /usr/lib64/libc-2.18.so)
> ==15443== Address 0x541a7d8 is at offset 280 from 0x541a6c0. Allocation
> context:
> ==15443== at 0x4C2938D: malloc (in
> /usr/lib64/valgrind/vgpreload_drd-amd64-linux.so)
> ==15443== by 0x40F9F8: xmalloc (xmalloc.c:41)
> ==15443== by 0x40422F: main (sort.c:3223)
> ==15443== Other segment start (thread 2)
> ==15443== at 0x4C2E843: pthread_mutex_lock (in
> /usr/lib64/valgrind/vgpreload_drd-amd64-linux.so)
> ==15443== by 0x408323: sortlines (sort.c:3313)
> ==15443== by 0x4088BF: sortlines (sort.c:3618)
> ==15443== by 0x408907: sortlines_thread (sort.c:3571)
> ==15443== by 0x4C2BDDB: ??? (in
> /usr/lib64/valgrind/vgpreload_drd-amd64-linux.so)
> ==15443== by 0x4E44F32: start_thread (in /usr/lib64/libpthread-2.18.so)
> ==15443== by 0x514EEAC: clone (in /usr/lib64/libc-2.18.so)
> ==15443== Other segment end (thread 2)
> ==15443== at 0x4E4E600: __errno_location (in /usr/lib64/libpthread-2.18.so)
> ==15443== by 0x4117B2: memcoll0 (memcoll.c:39)
> ==15443== by 0x40FE08: xmemcoll0 (xmemcoll.c:71)
> ==15443== by 0x407CBB: compare (sort.c:2731)
> ==15443== by 0x4083EC: sortlines (sort.c:3418)
> ==15443== by 0x4088BF: sortlines (sort.c:3618)
> ==15443== by 0x408907: sortlines_thread (sort.c:3571)
> ==15443== by 0x4C2BDDB: ??? (in
> /usr/lib64/valgrind/vgpreload_drd-amd64-linux.so)
> ==15443== by 0x4E44F32: start_thread (in /usr/lib64/libpthread-2.18.so)
> ==15443== by 0x514EEAC: clone (in /usr/lib64/libc-2.18.so)
>
> Hopefully that's a false positive.
> I'll do some more investigation tomorrow.
Well it's not tomorrow, but at least I didn't forget :/
The above warning is due to valgrind warning about two threads
accessing the large shared malloced buffer.
Arbitration to that is done at a higher level unknown to valgrind,
so I think the above is a false positive.
Now in addition to the warnings you pointed out,
I noticed another more problematic intermittent issue that
could very well explain deadlock issues seen here:
http://lists.gnu.org/archive/html/coreutils/2013-03/msg00048.html
I've attached your patch and another follow up that
hopefully addresses that issue.
thanks,
Pádraig.
sort-mutex-destroy.patch
Description: Text Data
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Re: [PATCH] sort: fix two race conditions reported by valgrind.,
Pádraig Brady <=