Jamming up with mutex_lock
From: Andrew Daviel
Subject: Jamming up with mutex_lock
Date: Tue, 19 Jun 2007 02:10:42 -0700 (PDT)

I have been running a modified version of spamass-milter-0.3.1
(match_gecos, per-user rejection threshold). It worked fine in testing,
but in production it jams up after a day or so. The milter continues to
run, but sendmail cannot connect to it, logging
"error connecting to filter". Sometimes there are a few messages
"Milter (spamassassin): to error state"
"milter_read(spamassassin): cmd read returned 0"
earlier, though the milter continues to operate for a while - maybe a
couple of hours.
When I look at the processes, I see two or more copies of spamass-milter
in sleep (S) state as well as the parent in sleep (Ssl) state.
If I connect to one of the processes with gdb and do a backtrace, I
typically see something like
in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
in _L_mutex_lock_29 () from /lib/tls/libc.so.6
in strdup () from /lib/tls/libc.so.6
in SpamAssassin::Connect (this=0x8bb01f8) at spamass-milter.cpp:1506
in mlfi_header ... at spamass-milter.cpp:1148
from which I assume that two threads have got into a deadlocked state.
Sometimes I see "debug" instead of "strdup".
I have tried replacing localtime() and strerror(), which are not
threadsafe on Linux, with localtime_r and strerror_r(), but
that does not help.
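
For reference, a minimal sketch of what such a replacement can look like. The helper names safe_strerror and safe_timestamp are hypothetical and are not taken from spamass-milter; the overloaded strerror_result helper is only there because strerror_r returns char* in the GNU variant and int in the XSI variant.

```cpp
#include <cerrno>
#include <cstring>
#include <ctime>
#include <string>

// Hypothetical helpers, not part of spamass-milter.
// strerror_r comes in two flavours: XSI returns int and fills buf,
// GNU returns a char* that may or may not point into buf.
static const char* strerror_result(int rc, const char* buf) {
    return rc == 0 ? buf : "unknown error";
}
static const char* strerror_result(const char* msg, const char*) {
    return msg;
}

// Thread-safe replacement for strerror(): uses a caller-owned buffer.
std::string safe_strerror(int errnum) {
    char buf[256];
    buf[0] = '\0';
    return strerror_result(strerror_r(errnum, buf, sizeof buf), buf);
}

// Thread-safe replacement for localtime(): localtime_r writes into a
// struct tm owned by this call instead of a shared static one.
std::string safe_timestamp(time_t t) {
    struct tm tmbuf;
    char out[32];
    if (localtime_r(&t, &tmbuf) == NULL) return "";
    if (strftime(out, sizeof out, "%Y-%m-%d %H:%M:%S", &tmbuf) == 0) return "";
    return out;
}
```

Both helpers keep their scratch space on the calling thread's stack, so they avoid the shared static buffers, but they do still allocate when building the std::string.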
Elsewhere on the Web I see a comment that mutex lock may be caused by
calling malloc or printf inside a signal handler. I don't think
spamass-milter runs this code inside a signal handler, though strdup and
vsyslog would call malloc and printf, so it's a not-impossible explanation.
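
To illustrate the pattern that comment warns about, here is a sketch of a handler that stays clear of malloc and printf by only setting a flag, which ordinary code checks later. The names on_signal, install_handler and consume_signal_flag are made up for the example; nothing here is taken from spamass-milter or libmilter.

```cpp
#include <csignal>
#include <cstring>
#include <signal.h>

// Hypothetical example, not spamass-milter code.
// The handler touches only a sig_atomic_t flag: no malloc, no printf,
// no syslog, so it cannot block on a libc-internal mutex.
static volatile sig_atomic_t g_got_signal = 0;

extern "C" void on_signal(int) {
    g_got_signal = 1;
}

// Install the flag-only handler for signo; returns true on success.
bool install_handler(int signo) {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL) == 0;
}

// Called from normal (non-handler) context, where logging and
// allocation are safe. Returns true once per observed signal.
bool consume_signal_flag() {
    if (!g_got_signal) return false;
    g_got_signal = 0;
    return true;
}
```

Any logging or string copying would then happen in the code that calls consume_signal_flag(), outside the handler.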
I had earlier seen mutex_lock called from strlwr, but have now replaced
the complex tolower() call with a much simpler 7-bit ASCII routine.
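
A routine of that kind might look like the sketch below. The name ascii_lower_inplace is hypothetical and this is not the code from the modified milter; it assumes a NUL-terminated, writable buffer and leaves every byte outside A-Z untouched.

```cpp
#include <string>

// Hypothetical 7-bit ASCII lowercasing, not the poster's actual code.
// It avoids tolower() and therefore any locale lookup; bytes outside
// 'A'..'Z' (including 8-bit characters) are left unchanged.
void ascii_lower_inplace(char* s) {
    if (s == NULL) return;
    for (; *s != '\0'; ++s) {
        if (*s >= 'A' && *s <= 'Z') {
            *s = static_cast<char>(*s + ('a' - 'A'));
        }
    }
}
```

For example, a buffer holding "JUser@Example.COM" becomes "juser@example.com".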
The somewhat similar smf-clamd milter runs with no problems (similar in
that it uses the same libraries and also passes mail to a daemon
for processing).
RHEL 4.3
sendmail-8.13.1-3.2.el4.i386
glibc-2.3.4-2.25.i686
kernel 2.6.9-34.0.1.ELsmp
(I doubt that my changes are directly responsible, because I've been
playing with them without affecting the lock-up. Trying the stock milter
on the production machine is an issue because the users expect their
whitelists to work based on match_gecos - address@hidden
-> user "juser")
--
Andrew Daviel, TRIUMF, Canada
Tel. +1 (604) 222-7376 (Pacific Time)
Network Security Manager