Re: possible race condition in linuxthreads-0.9 (i386, LDT)

From: David Madore
Subject: Re: possible race condition in linuxthreads-0.9 (i386, LDT)
Date: Tue, 17 Jul 2001 01:52:40 +0200
User-agent: Mutt/1.2.5i

By now we have seen quite a number of different behaviors.  Let me try
to put some order in this madness.  Behaviors 4 and 5 below were not
reported before, and the backtrace I have for __clone() might shed
some light on the problem (though I think it makes things even more
mysterious).  See below for details.

* Behavior 1: (observed by Ed Connell, see <URL:
http://sources.redhat.com/ml/bug-glibc/2001-06/msg00079.html >).
Program hangs in __pthread_create_2_1() (within __sigsuspend())
because __clone() somehow fails to create manager thread: it is
immediately defunct.  Consequently, initial thread waits forever.
Only observed on SMP.  Happens when program is exec'ed from a shell script.

* Behavior 2: noticed by alvin: call to suspend(self) within
__pthread_create_2_1() should probably check condition in case it was
spuriously woken up.  This is practically the "opposite" problem
(spurious wakeup instead of hanging forever).  Suggested workaround:
loop on sched_yield() until the return code p_retcode has been filled.  But
this kinda defeats the purpose of suspend() - rather, the latter
should be fixed if possible: signals are not supposed to get lost.
(Is this Ulrich's objection?)

* Behavior 3: (observed by myself, see <URL:
http://sources.redhat.com/ml/bug-glibc/2001-07/msg00055.html >).
Program segfaults in pthread_create(), even earlier than the creation
of the manager thread, simply in the thread_self() call, because the
%gs register has not been initialized to an LDT selector.  Core dump
is available, and clearly indicates the wrong value for %gs.  Only
observed on SMP.  Happens when program is exec'ed from a shell script.
Similar to behavior 1 above, but earlier in __pthread_create_2_1().
Compiling libc without LDT support works around the problem.

* Behavior 4: observed by myself, not reported until now.  Program is
the same as the one causing behavior 3 (when run without a launcher
script, or on a UP machine, behavior 4 replaces behavior 3 with
probability around 1/2, and in other cases program runs correctly -
i.e. does nothing).  Program (i.e. initial thread) hangs forever in
__sigsuspend() after main() has exited.  This is probably from the
suspend() call in pthread_onexit_process(), although gdb doesn't show
the latter.  Other threads (i.e. manager thread and one child thread)
are still running.  It seems that the exit request was lost on its way
from the initial thread to the manager thread...

* Behavior 5: observed by myself, not reported until now.  Program is
the chess program "crafty" (tested with version 17.14 and 18.9; to
produce, compile for SMP, type "smpmt=2" to set up threading, and type
a move which is not in the opening book, e.g. "a2a4").  Behavior is
the same as number 1 above (i.e. manager thread dies immediately after
creation), except that I've observed it on UP, and it happens
completely reliably for me, not needing a shell launcher script.  When
I LD_PRELOAD the libSegFault.so, I get the following output:

*** Segmentation fault
Register dump:

 EAX: 00000402   EBX: 400446d8   ECX: 00000000   EDX: fffffc00
 ESI: 00000020   EDI: 00000002   EBP: 083ec99c   ESP: 083ec964

 EIP: 40036ef5   EFLAGS: 00010202

 CS: 0023   DS: 002b   ES: 002b   FS: 0000   GS: 000f   SS: 002b

 Trap: 0000000e   Error: 00000006   OldMask: fffbfaef
 ESP/signal: 083ec964   CR2: fffffc50

 FPUCW: ffff037f   FPUSW: ffff4020   TAG: ffffffff
 IPOFF: 080872e9   CSSEL: 0023   DATAOFF: 080d0acc   DATASEL: 002b

 ST(0) 0000 99b60e0000000000   ST(1) 0000 f32d1edf3a6b56eb
 ST(2) 0000 ff0cd2e120c594a9   ST(3) 0000 f3831f0000000000
 ST(4) 0000 83d6000000000000   ST(5) 0000 c000000000000000
 ST(6) 0000 0000000000000000   ST(7) 0000 0000000000000000


Now this register dump doesn't tell much except for the backtrace.  I
don't know why the addresses are given relative to pthread_detach, but:
pthread_detach+0x395 is
actually __pthread_manager+0x231, and it is the call to
pthread_handle_create() in __pthread_manager().  As for
pthread_detach+0x8c5, it is in fact pthread_handle_create()+0xd2 (all
these addresses are relative to RedHat's version of the libpthread
which ships with RedHat-7.1, viz. glibc-2.2.2-10).

Here's the assembler, with the corresponding source lines (as far as I
can tell: the library doesn't have debug info, so this is hand work):

* This is the "__pthread_handles_num++" on manager.c:498
    6ece:       8b 93 04 02 00 00       mov    0x204(%ebx),%edx
    6ed4:       8b 02                   mov    (%edx),%eax
    6ed6:       40                      inc    %eax
    6ed7:       89 02                   mov    %eax,(%edx)
* This is the "pthread_threads_counter += PTHREAD_THREADS_MAX" on manager.c:500
    6ed9:       8b 83 28 03 00 00       mov    0x328(%ebx),%eax
(omitting one instruction here at 6edf because it was scheduled out of sequence by the optimizer)
    6ee2:       05 00 04 00 00          add    $0x400,%eax
    6ee7:       89 83 28 03 00 00       mov    %eax,0x328(%ebx)
* This is the "new_thread_id = sseg + pthread_threads_counter" on manager.c:501
(note that %eax contains pthread_threads_counter)
    6eed:       8b 7d e0                mov    0xffffffe0(%ebp),%edi
    6ef0:       01 f8                   add    %edi,%eax
    6ef2:       89 45 dc                mov    %eax,0xffffffdc(%ebp)
* This is the "new_thread->p_tid = new_thread_id" on manager.c:504
(the following instruction was moved from above for clarity)
    6edf:       8b 55 e4                mov    0xffffffe4(%ebp),%edx
+ Segfault is on following instruction ->
    6ef5:       89 42 50                mov    %eax,0x50(%edx)
* This is "new_thread->p_lock = &(__pthread_handles[sseg].h_lock)" on 
    6ef8:       89 f0                   mov    %esi,%eax
    6efa:       8b b3 f4 01 00 00       mov    0x1f4(%ebx),%esi
    6f00:       c6 82 80 00 00 00 00    movb   $0x0,0x80(%edx)
    6f07:       01 f0                   add    %esi,%eax
    6f09:       89 42 5c                mov    %eax,0x5c(%edx)
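
As a cross-check of this reading, the statements at manager.c:500-501
can be replayed on the values from the register dump.  This is only an
illustration: next_thread_id() is a hypothetical helper, and the value
0x400 for PTHREAD_THREADS_MAX is taken from the immediate operand of
the add at 6ee2.

```c
#include <stdint.h>

#define PTHREAD_THREADS_MAX 0x400 /* immediate operand of the add at 6ee2 */

/* Replays manager.c:500-501: bump the counter, then add the handle index. */
static uint32_t next_thread_id(uint32_t *threads_counter, uint32_t sseg)
{
    *threads_counter += PTHREAD_THREADS_MAX;
    return sseg + *threads_counter;
}
```

With sseg = 2 (the value of %edi in the dump) and a counter of 0 before
the increment, this gives 0x402, which is what %eax holds at the
faulting instruction.  So the counter was presumably still at its
initial value, i.e. this would be the first thread the manager was
asked to create.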

Looking at the value of %edx, viz. 0xfffffc00, given in the dump,
which is supposed to be the new_thread pointer of the C source, it is
not surprising there is a problem.  It would seem that somehow
pthread_allocate_stack() did not put a valid value in new_thread.
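
The other fields of the dump agree with this.  Below is a small sketch
that decodes them; the struct and function names are hypothetical, the
constants fed to it come from the dump above, and the bit layouts are
the standard i386 ones for segment selectors and page-fault error
codes.

```c
#include <stdint.h>

/* Fields of an i386 segment selector. */
struct selector {
    unsigned index;  /* descriptor table index (bits 3..15) */
    unsigned in_ldt; /* table indicator, bit 2: 1 = LDT, 0 = GDT */
    unsigned rpl;    /* requested privilege level (bits 0..1) */
};

static struct selector decode_selector(uint16_t sel)
{
    struct selector s;
    s.index = sel >> 3;
    s.in_ldt = (sel >> 2) & 1;
    s.rpl = sel & 3;
    return s;
}

/* Page-fault error code bits (trap 0x0e). */
#define PF_PRESENT 0x1 /* 0 = page not present, 1 = protection violation */
#define PF_WRITE   0x2 /* 0 = read access, 1 = write access */
#define PF_USER    0x4 /* 0 = kernel mode, 1 = user mode */

/* Address written by "mov %eax,0x50(%edx)", with 32-bit wraparound. */
static uint32_t faulting_address(uint32_t edx)
{
    return edx + 0x50;
}
```

faulting_address(0xfffffc00) is 0xfffffc50, which is exactly the CR2
value.  Error code 6 has PF_WRITE and PF_USER set and PF_PRESENT clear,
i.e. a user-mode write to a page that is not mapped.  GS = 0x000f
decodes to entry 1 of the LDT at RPL 3, so in this run %gs does look
like an LDT selector, unlike in behavior 3.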

This demands further investigation, but I'll stop here for now.  Maybe
all these data will bring a bright idea to somebody's mind.

The number one question for now: are these behaviors all
manifestations of one bug, or are they as many different bugs?

     David A. Madore
     http://www.eleves.ens.fr:8080/home/madore/
