[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Milter hangs, revisited

From: Dan Nelson
Subject: Re: Milter hangs, revisited
Date: Wed, 13 Nov 2002 00:13:30 -0600
User-agent: Mutt/1.5.1i

In the last episode (Nov 12), Terry Kennedy said:
>   Pardon the long report, but there is a solution at the end which
> might be useful to other folks...
> > According to this, the spamass-milter sent the entire message to
> > spamc, closed the send pipe, and is waiting for the results.  After
> > 3 minutes sendmail gives up.  According to your ps output, there
> > isn't even a spamc running.  All read_pipe does is read from the
> > recv pipe connected to spamc, so if spamc isn't there the read()
> > should return 0.  Try changing the debug line on line 922 to read
> >
> >     debug(1, "::read_pipe(%d) enter", pipe_io[1][0]);
>   Line 984 in my 0.1.2 sources...?

Uuh, yeah.  I have no idea where that 922 came from.
> > , run it again, make a note of the file number in the log, then run
> > lsof -p <spamass-milter-pid>, and see what that filedescriptor is
> > connected to, if anything.
> Nov 12 18:28:43 test1 spamass-milter[8474]: ::read_pipe(8) enter
> (0:327) test1:/usr/local/src/spamass-milter-0.1.2# lsof | grep 8474
> spamass-m 8474   root  cwd   VDIR      3,0,3       1024 238112 
> /usr/local/src/spamass-milter-0.1.2
> spamass-m 8474   root  txt   VREG      3,0,0      21422    177 
> /shlib/
> spamass-m 8474   root  txt   VREG      3,0,3     449212 286875 
> /usr/local/bin/spamass-milter
> spamass-m 8474   root  txt   VREG      3,0,3     322505 420812 
> /usr/lib/
> spamass-m 8474   root  txt   VREG      3,0,0      87797    190 
> /shlib/
> spamass-m 8474   root  txt   VREG      3,0,0      56508    224 
> /shlib/
> spamass-m 8474   root  txt   VREG      3,0,0     982614    271 
> /shlib/
> spamass-m 8474   root    0u  VCHR      5,0,0    0t48097    867 /dev/ttyp0
> spamass-m 8474   root    1u  VCHR      5,0,0    0t48097    867 /dev/ttyp0
> spamass-m 8474   root    2u  VCHR      5,0,0    0t48097    867 /dev/ttyp0
> spamass-m 8474   root    3u  unix 0xe3a3ca00        0t0        ->0xe3b03b00
> spamass-m 8474   root    4u  unix 0xe43ae700        0t0        
> /var/run/spamass.sock
> spamass-m 8474   root    5u  unix 0xe4469900        0t0        
> /var/run/spamass.sock
> spamass-m 8474   root    8r  unix 0xf0380300        0t0        ->0xe438cc00
> I think this means that lsof can't find a useful structure to fish a
> file-name out of for that descriptor.

It's actually printing all it knows, since pipes don't have filenames,
just endpoints.  That output means that 1) fd 8 exists, and 2) it's
actually attached to something (i.e. the other end hasn't exited).  You
could run "lsof | grep 0xe438cc00" to find the process at the other
end, but it's just going to be spamc.
> > I find it interesting that there are no spamd logs.  In my maillog
> > I get lines like this (my syslogd is set to log and
> > higher to maillog).
> >
> > Nov 12 15:45:59 <2.6> dan spamd[942]: connection from localhost [] 
> > at port 52771
> > Nov 12 15:45:59 <2.6> dan spamd[7877]: processing message <address@hidden> 
> > for root:26, expecting 4527 bytes.
> > Nov 12 15:46:04 <2.6> dan spamd[7877]: clean message (-1.4/5.0) for root:26 
> > in 5.4 seconds, 4527 bytes.
> I get that sort of logging if I do "spamc < test", but not with the
> milter.
> > Maybe spamc is not getting called at all?  If you have process
> > accounting enabled, run "lastcomm | less" and verify that spamc got
> > run.
>   Yup, no invocation of spamc.

Ok, maybe the other end isn't attached to spamc :)  Do that "lsof |
grep" bit and see what process is on the other end of the pipe.
>   I started sprinkling debug() logging throughout the code
> immediately following the fork() call and before the execvp() call. I
> lost logging after the closeall() call. The milter apparently never
> came back from a syscall to close(5) which is the socket. I commented
> out the closeall() and managed to successfully get mail through the
> milter and properly tagged.

close() might hang if there is unread data in a pipe and closing the fd
would destroy the pipe.  That should not be a problem here, though,
since at least in the lsof output you pasted, fd 5 is a milter socket,
so the parent process still has a reference to it.

>   I'm a bit nervous running with the closeall() commented out, though.
>   Any ideas on why this is happening and a better workaround?

The closeall is there to ensure that the child process that will become
spamc after the exec() does not have references to any other pipes
hanging about.  Pipes with multiple processes at each endpoint usually
end up hanging processes on read or write eventually, as the kernel has
no idea that (for example) spamc will never write to that pipe attached
to fd 37.  An alternative is to set the close-on-exec flag via
fcntl(fd, F_SETFD, FD_CLOEXEC), but you can't set that for the internal
milter unix sockets connected to /var/run/spamass.sock, because you
don't know their fd numbers.  That's why I dup2 the pipe descriptors
onto stdin/out/err and close the rest. Not running closeall should not
hurt as long as your spamc processes exit after a reasonable time.  If
one does sit around longer than necessary, though, other milter threads
could start to hang until it exits.

Actually, try this.  Replace the close(fd++) call in closeall() with
fcntl(fd++, F_SETFD, FD_CLOEXEC), which should end up doing the same
thing in a different way.  Hopefully the process won't just end up
hanging in execvp() :)

Makes me want to build a non-threaded milter API, it does.

        Dan Nelson

reply via email to

[Prev in Thread] Current Thread [Next in Thread]