Subject: Problem due to hanging threads
Date: Sat, 27 Jul 2002 12:09:40 +0530
Our problem of servers hanging has returned. It looks a little different from the previous hang, and is less frequent, but it has many of the same markings at the lowest level. We moved to gcc 3.1 and GNUstep 1.1.0 to avoid that earlier problem (based on the suggestion given in this discussion forum), and we no longer see hangs when doing class lookups. As Adam Fedor noted in response to our previous problem, there is no longer a lock used when doing class lookups. However, we do still end up in the low-level __objc_mutex_lock() call for other locks.
The most disturbing thing in the stack trace is that most of our Objective-C threads are blocked at this point, several of them on different locks, and almost all of them on locks that we do not control, so we don't think this problem has anything to do with our code. The other weird thing is that all of the threads hung in this fashion appear to be hung in the system call umask().
Whereas our previous hung traces all ended in:
#0 0xfea9b3e0 in _lwp_sema_wait ()
#1 0xfecc9820 in _park ()
#2 0xfecc94f8 in _swtch ()
#3 0xfeccade8 in _mutex_adaptive_lock ()
#4 0xfeccb7e8 in pthread_mutex_lock ()
#5 0xfee60504 in __objc_mutex_lock ()
#6 0xfee609dc in objc_mutex_lock ()
all these new traces end in:
#0 0xfea1b6d4 in umask () from /usr/lib/libc.so.1
#1 0xfec4977c in _park () from /usr/lib/libthread.so.1
#2 0xfec49454 in _swtch () from /usr/lib/libthread.so.1
#3 0xfec4ad44 in _mutex_adaptive_lock () from /usr/lib/libthread.so.1
#4 0xfec4b744 in pthread_mutex_lock () from /usr/lib/libthread.so.1
#5 0x1f08c in __objc_mutex_lock (mutex=0x346e8)
#6 0x1e9a4 in objc_mutex_lock (mutex=0x346e8)
The sequence of calls is exactly the same except for the innermost frame (#0). However, since we examined this core on a different machine from the one where it was generated, it is possible that the innermost call is bogus and should actually be _lwp_sema_wait(), because the libc images on the two machines differ. That would make more sense given our past experience; it does not make sense for _park() to call umask().
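To illustrate why a mismatched libc could produce a wrong name for frame #0, here is a minimal sketch in C of how a raw address is usually turned into a symbol name: pick the last symbol whose start address is not greater than the address. The two symbol tables, their addresses, and the resolve() helper are all made up for illustration and are not taken from our machines or from gdb.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a (hypothetical) symbol table: start address and name. */
struct sym {
    uintptr_t start;
    const char *name;
};

/* Return the name of the last symbol whose start address is <= addr.
 * The table must be sorted by ascending start address. */
static const char *resolve(const struct sym *tab, size_t n, uintptr_t addr)
{
    const char *best = "??";
    for (size_t i = 0; i < n; i++) {
        if (tab[i].start <= addr)
            best = tab[i].name;
        else
            break;
    }
    return best;
}

/* Made-up layout of libc on the machine that produced the core. */
static const struct sym libc_on_server[] = {
    { 0x1000, "umask" },
    { 0x1040, "_lwp_sema_wait" },
    { 0x1080, "write" },
};

/* Made-up layout of a different libc build on the analysis machine. */
static const struct sym libc_on_debug_host[] = {
    { 0x1000, "umask" },
    { 0x1060, "_lwp_sema_wait" },
    { 0x10a0, "write" },
};
```

With these invented tables, resolve(libc_on_server, 3, 0x1050) gives "_lwp_sema_wait", while resolve(libc_on_debug_host, 3, 0x1050) gives "umask" for the very same address. If something like this is happening to us, loading the core against the libc from the original machine should settle it.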
I am attaching the complete stack trace that we got from the core file. The Java threads don't give us much information about their state, but they appear to be OK. The threads that appear to be completely hung are 4, 8, 25, 26, 27, 34, and 35. All of them end in the frames shown above. Threads 4, 25, and 26 are hung on a single internal Objective-C runtime lock, reached from the sel_get_any_uid() call (threads 4 and 26) or the objc_thread_detach call (thread 25). Threads 8, 34, and 35 are hung on a recursive lock taken on the way to [NSUserDefaults standardUserDefaults]. Thread 27 is hung attempting to lock the logging lock.
Any ideas on how to resolve this problem?
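For the one lock in that list that belongs to our own code (the logging lock), a bounded wait could at least tell us at run time which lock is stuck instead of blocking forever. The following is only a sketch in plain C with pthreads, assuming the lock were an ordinary pthread mutex; lock_or_report(), its label argument, and the 10 ms polling interval are hypothetical and not part of GNUstep or the Objective-C runtime. It does nothing for the runtime-internal locks, which we cannot instrument this way.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: try to take `m` for up to roughly `max_ms`
 * milliseconds. Returns 0 with the mutex held, ETIMEDOUT if it stayed
 * busy the whole time, or any other error from pthread_mutex_trylock(). */
static int lock_or_report(pthread_mutex_t *m, const char *label, long max_ms)
{
    const struct timespec pause = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    long waited = 0;

    for (;;) {
        int rc = pthread_mutex_trylock(m);
        if (rc != EBUSY)
            return rc;              /* 0 on success, or a real error */
        if (waited >= max_ms) {
            fprintf(stderr, "lock '%s' still busy after %ld ms\n",
                    label, waited);
            return ETIMEDOUT;
        }
        nanosleep(&pause, NULL);    /* back off before polling again */
        waited += 10;
    }
}
```

A caller would treat ETIMEDOUT as a signal to log the thread's state (or abort to get a fresh core) rather than as a normal failure. Polling with pthread_mutex_trylock() is used here only because it is widely available; it is a diagnostic aid, not a fix for whatever is holding the lock.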