spamass-milt-list

Re: Status of CVS for 0.3.0 branch?


From: Dan Nelson
Subject: Re: Status of CVS for 0.3.0 branch?
Date: Thu, 9 Feb 2006 14:22:03 -0600
User-agent: Mutt/1.5.11

In the last episode (Feb 09), Chris Crowley said:
> My question, "Is CVS for the 0.3.0 branch improved from the distro,
> and stable for production use?" If not, I'll drill down into the
> problems with the 0.3.0 tar file that I've got, otherwise, I'll
> install the CVS version and see if the problems persist.

Only minor changes have been made since 0.3.0; none should affect
stability one way or the other.  My milters never seem to crash or
hang, but I only process 1 message every 5-10 seconds.  Each milter
thread is independent, so (barring OS bugs) hangs/crashes due to race
conditions should not be possible.

> ...details...
> I've been running 0.2.0, and plan to upgrade soon.  I've built 0.3.0,
> and have noticed in some high load testing that it fails differently
> than the 0.2.0 spamass-milter.  By failure I mean that I see error
> messages in the log. For example:
> <log>
> Milter (spamassassin): local socket name /var/run/sendmail/spamass.sock unsafe
> sendmail[10000]: ###ID: Milter (spamassassin): to error state
> spamass-milter[13360]: SpamAssassin, mi_rd_cmd: read returned -1: Connection reset by peer
> spamass-milter[19980]: SpamAssassin: thread_create() failed: 12, try again
> </log>
> 
> and a strace on the process shows that it is "hung":
> <strace>
> strace -p 13360
> Process 13360 attached - interrupt to quit
> futex(0xc9e20c, FUTEX_WAIT, 2, NULL <unfinished ...>
> </strace>

If you can get a stack trace out of the process (attach gdb and run "thread
apply all bt"), that would help narrow down what's hanging.  Also try
upgrading to a less-buggy glibc, or set the environment variable
LD_ASSUME_KERNEL=2.4.1.  Any time I see a process hang on a futex,
setting that has fixed it (it makes glibc fall back to the older
LinuxThreads implementation, which does not use futexes).
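
For a restart script or cron job, the two suggestions above can be
sketched roughly as follows.  This is only an illustration: the helper
names (backtrace_command, dump_backtraces, no_futex_env) are made up
here, and it assumes gdb is installed and that you have permission to
attach to the milter process.

```python
import os
import subprocess


def backtrace_command(pid):
    """Build a gdb command line that prints every thread's stack and exits."""
    # -p attaches to the running process, -batch exits once the commands
    # have run, and -ex supplies the command to execute.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtraces(pid):
    """Run gdb against pid and return whatever it printed on stdout."""
    result = subprocess.run(backtrace_command(pid),
                            capture_output=True, text=True)
    return result.stdout


def no_futex_env(base=None):
    """Return a copy of the environment with LD_ASSUME_KERNEL=2.4.1 set."""
    env = dict(os.environ if base is None else base)
    env["LD_ASSUME_KERNEL"] = "2.4.1"
    return env
```

Calling dump_backtraces(13360) on the hung process from the strace
above would give the output worth posting here, and passing
env=no_futex_env() to whatever launches spamass-milter applies the
workaround to the milter only, not to the whole system.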

> From the logs, and a quick non-scientific assessment, I don't think
> that 0.3.0 is failing any less frequently than 0.2.0 was.  It's just
> that the 0.3.0 process actually persists after it fails, so my
> restart script (which checks whether the socket exists) doesn't work to
> repair things.
> 
> Thanks for any insight you can provide.  Of course, I'm able to
> provide more details if they would be beneficial.
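
Since a hung process leaves its socket in place, a restart script could
probe the milter instead of testing for the socket file.  A plain
connect() is probably not enough, because the kernel keeps queueing
connections for a listening socket even when nobody is accepting them,
so the sketch below also sends a milter option-negotiation packet and
waits for a reply.  The function name is made up, and the version and
flag values (2, 0x3F, 0x7F) are assumptions about what an MTA of this
era would offer; check them against libmilter's mfdef.h before relying
on this.

```python
import socket
import struct

SMFIC_OPTNEG = b"O"  # option-negotiation command byte in the milter protocol


def _recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


def milter_responds(sock_path, timeout=5.0):
    """Return True if a milter on sock_path answers option negotiation."""
    # Payload: command byte, then version, action flags and protocol flags
    # as three 32-bit network-order integers (the values are assumptions).
    payload = SMFIC_OPTNEG + struct.pack("!III", 2, 0x3F, 0x7F)
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sock_path)
            # Each packet is a 32-bit length followed by that many bytes.
            s.sendall(struct.pack("!I", len(payload)) + payload)
            header = _recv_exact(s, 5)  # 4-byte length + 1-byte command
    except OSError:
        # Missing socket, refused connection, or timeout waiting for a reply.
        return False
    return header is not None and header[4:5] == SMFIC_OPTNEG
```

A restart script would then kill and relaunch the milter when
milter_responds("/var/run/sendmail/spamass.sock") returns False, which
covers both the missing-socket case and the hung-process case.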

-- 
        Dan Nelson
        address@hidden



