spamass-milt-list

Re: Jamming up with mutex_lock


From: Joe Maimon
Subject: Re: Jamming up with mutex_lock
Date: Tue, 19 Jun 2007 11:50:59 -0400
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.11) Gecko/20050728 MultiZilla/1.7.0.1j



Andrew Daviel wrote:


I have been running a modified version of spamass-milter-0.3.1
(match_gecos, per-user rejection threshold). It worked fine in testing, but in production it jams up after a day or so. The milter continues to run, but sendmail cannot connect to it, logging
"error connecting to filter". Sometimes there are a few messages
"Milter (spamassassin): to error state"
"milter_read(spamassassin): cmd read returned 0"

This means that the read call timed out.

earlier, though the milter continues to operate for a while - maybe a couple of hours.

The other threads continue to operate.

You then probably run into a ulimit condition.


When I look at the processes, I see two or more copies of spamass-milter
in sleep (S) state as well as the parent in sleep (Ssl) state.

Are you displaying all the threads?

If those are all the threads, then apparently the deadlock extends to the engine thread so that no new connections can be accepted.


If I connect to one of the processes with gdb and do a backtrace, I typically see something like
 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
 in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
 in _L_mutex_lock_29 () from /lib/tls/libc.so.6
 in strdup () from /lib/tls/libc.so.6
 in SpamAssassin::Connect (this=0x8bb01f8) at spamass-milter.cpp:1506
 in mlfi_header ... at spamass-milter.cpp:1148
from which I assume that two threads have got in a deadlocked state.
Sometimes I see "debug" instead of "strdup".

I have tried replacing localtime() and strerror(), which are not threadsafe on Linux, with localtime_r and strerror_r(), but
that does not help.
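
For readers unfamiliar with the reentrant variants mentioned above, here is a minimal sketch of what such a replacement can look like. The helper names format_timestamp and safe_strerror are hypothetical and are not taken from spamass-milter or from the patch under discussion. The overload pair is only there because strerror_r returns char* on glibc with GNU extensions and int on strictly POSIX systems.

```cpp
#include <cerrno>
#include <string>
#include <string.h>   // strerror_r
#include <time.h>     // localtime_r, strftime

// Hypothetical helper: format a timestamp without touching the static
// buffer that plain localtime() shares between all threads.
std::string format_timestamp(time_t t) {
    struct tm tmbuf;                       // caller-owned, one per call
    if (localtime_r(&t, &tmbuf) == nullptr) {
        return std::string();
    }
    char out[32];
    if (strftime(out, sizeof out, "%Y-%m-%d %H:%M:%S", &tmbuf) == 0) {
        return std::string();
    }
    return std::string(out);
}

// strerror_r has two signatures: the XSI one returns int and fills buf,
// the GNU one returns a char* that may or may not point at buf.
// These overloads accept either result.
static const char* strerror_result(int rc, const char* buf) {
    return rc == 0 ? buf : "unknown error";
}
static const char* strerror_result(const char* msg, const char* /*buf*/) {
    return msg;
}

// Hypothetical helper: thread-safe replacement for strerror().
std::string safe_strerror(int errnum) {
    char buf[256];
    buf[0] = '\0';
    return std::string(strerror_result(strerror_r(errnum, buf, sizeof buf), buf));
}
```

As the report says, this change alone did not cure the hang, so it should be read as hygiene rather than as a fix.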

Elsewhere on the Web I see a comment that a hang in mutex_lock may be caused by calling malloc or printf inside a signal handler. I don't think spamass-milter runs this code in a signal handler, though strdup and vsyslog would call malloc and printf, so it's a not-impossible explanation. I had earlier seen mutex_lock called from strlwr, but have now replaced the complex tolower() call with a much simpler 7-bit ASCII routine.
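
To illustrate the signal-handler concern raised above: if a handler interrupts a thread that already holds the allocator lock and then calls malloc itself, that thread waits on a lock it can never release. The usual way to avoid this is to have the handler do nothing except set a flag. The sketch below assumes nothing about what handlers spamass-milter or libmilter actually install; install_flag_handler and consume_signal_flag are hypothetical names.

```cpp
#include <signal.h>
#include <string.h>

// The only state the handler touches. sig_atomic_t is the type the
// C and C++ standards guarantee can be written safely from a handler.
static volatile sig_atomic_t g_got_signal = 0;

// Async-signal-safe handler: no malloc, no printf, no syslog, no locks.
extern "C" void flag_only_handler(int /*signo*/) {
    g_got_signal = 1;
}

// Hypothetical helper: install the handler for one signal.
bool install_flag_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = flag_only_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, nullptr) == 0;
}

// Hypothetical helper: called from ordinary (non-handler) code, where
// logging and allocation are safe. Returns true once per delivered flag.
bool consume_signal_flag() {
    if (g_got_signal) {
        g_got_signal = 0;
        return true;
    }
    return false;
}
```

Any logging or cleanup then happens in the normal code path after consume_signal_flag() returns true, never in the handler itself.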


If you suspect the milter calls unsafe functions, surround them with mutexes.

Carefully.
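
A minimal sketch of what "carefully" can mean in practice, assuming pthreads as used by libmilter. The names ScopedLock, g_time_mutex and locked_ctime are hypothetical, and ctime() merely stands in for whichever non-reentrant call is under suspicion. The guard releases the mutex on every exit path, and the result is copied out of the shared static buffer before the lock is dropped.

```cpp
#include <pthread.h>
#include <string>
#include <time.h>

// RAII guard: the destructor unlocks on every return path, including
// early returns and exceptions, so the mutex cannot be left held.
class ScopedLock {
public:
    explicit ScopedLock(pthread_mutex_t& m) : m_(m) { pthread_mutex_lock(&m_); }
    ~ScopedLock() { pthread_mutex_unlock(&m_); }
private:
    ScopedLock(const ScopedLock&);             // non-copyable
    ScopedLock& operator=(const ScopedLock&);
    pthread_mutex_t& m_;
};

// One mutex per unsafe resource; every caller must go through it.
static pthread_mutex_t g_time_mutex = PTHREAD_MUTEX_INITIALIZER;

// Hypothetical wrapper around ctime(), which returns a pointer into a
// static buffer shared by all threads.
std::string locked_ctime(time_t t) {
    ScopedLock guard(g_time_mutex);
    const char* s = ctime(&t);
    // Copy while still holding the lock; the buffer may be overwritten
    // by another thread as soon as the guard is destroyed.
    return s ? std::string(s) : std::string();
}
```

Two cautions apply. The lock only helps if no code path calls the unsafe function directly, and code holding the lock should not call anything that takes another lock, or the wrapper can introduce a deadlock of its own.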

The somewhat similar smf-clamd milter runs OK with no problem (similar in that it uses the same libraries and also passes mail to a daemon
for processing).

RHEL 4.3
sendmail-8.13.1-3.2.el4.i386
glibc-2.3.4-2.25.i686
kernel 2.6.9-34.0.1.ELsmp

Try running this on a recent Debian instead.


(I doubt that my changes are directly responsible, because I've been playing with them without affecting the lock-up. Trying the stock milter on the production machine is an issue because the users expect their
whitelists to work based on match_gecos - address@hidden
-> user "juser")

Perhaps you could show the patch?






