[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Milter hangs, revisited
Re: Milter hangs, revisited
Wed, 13 Nov 2002 00:13:30 -0600
In the last episode (Nov 12), Terry Kennedy said:
> Pardon the long report, but there is a solution at the end which
> might be useful to other folks...
> > According to this, the spamass-milter sent the entire message to
> > spamc, closed the send pipe, and is waiting for the results. After
> > 3 minutes sendmail gives up. According to your ps output, there
> > isn't even a spamc running. All read_pipe does is read from the
> > recv pipe connected to spamc, so if spamc isn't there the read()
> > should return 0. Try changing the debug line on line 922 to read
> > debug(1, "::read_pipe(%d) enter", pipe_io);
> Line 984 in my 0.1.2 sources...?
Uuh, yeah. I have no idea where that 922 came from.
> > , run it again, make a note of the file number in the log, then run
> > lsof -p <spamass-milter-pid>, and see what that filedescriptor is
> > connected to, if anything.
> Nov 12 18:28:43 test1 spamass-milter: ::read_pipe(8) enter
> (0:327) test1:/usr/local/src/spamass-milter-0.1.2# lsof | grep 8474
> spamass-m 8474 root cwd VDIR 3,0,3 1024 238112
> spamass-m 8474 root txt VREG 3,0,0 21422 177
> spamass-m 8474 root txt VREG 3,0,3 449212 286875
> spamass-m 8474 root txt VREG 3,0,3 322505 420812
> spamass-m 8474 root txt VREG 3,0,0 87797 190
> spamass-m 8474 root txt VREG 3,0,0 56508 224
> spamass-m 8474 root txt VREG 3,0,0 982614 271
> spamass-m 8474 root 0u VCHR 5,0,0 0t48097 867 /dev/ttyp0
> spamass-m 8474 root 1u VCHR 5,0,0 0t48097 867 /dev/ttyp0
> spamass-m 8474 root 2u VCHR 5,0,0 0t48097 867 /dev/ttyp0
> spamass-m 8474 root 3u unix 0xe3a3ca00 0t0 ->0xe3b03b00
> spamass-m 8474 root 4u unix 0xe43ae700 0t0
> spamass-m 8474 root 5u unix 0xe4469900 0t0
> spamass-m 8474 root 8r unix 0xf0380300 0t0 ->0xe438cc00
> I think this means that lsof can't find a useful structure to fish a
> file-name out of for that descriptor.
It's actually printing all it knows, since pipes don't have filenames,
just endpoints. That output means that 1) fd 8 exists, and 2) it's
actually attached to something (i.e. the other end hasn't exited). You
could run "lsof | grep 0xe438cc00" to find the process at the other
end, but it's just going to be spamc.
> > I find it interesting that there are no spamd logs. In my maillog
> > I get lines like this (my syslogd is set to log mail.info and
> > higher to maillog).
> > Nov 12 15:45:59 <2.6> dan spamd: connection from localhost [127.0.0.1]
> > at port 52771
> > Nov 12 15:45:59 <2.6> dan spamd: processing message <address@hidden>
> > for root:26, expecting 4527 bytes.
> > Nov 12 15:46:04 <2.6> dan spamd: clean message (-1.4/5.0) for root:26
> > in 5.4 seconds, 4527 bytes.
> I get that sort of logging if I do "spamc < test", but not with the
> > Maybe spamc is not getting called at all? If you have process
> > accounting enabled, run "lastcomm | less" and verify that spamc got
> > run.
> Yup, no invocation of spamc.
Ok, maybe the other end isn't attached to spamc :) Do that "lsof |
grep" bit and see what process is on the other end of the pipe.
> I started sprinkling debug() logging throughout the code
> immediately following the fork() call and before the execvp() call. I
> lost logging after the closeall() call. The milter apparently never
> came back from a syscall to close(5) which is the socket. I commented
> out the closeall() and managed to successfully get mail through the
> milter and properly tagged.
close() might hang if there is unread data in a pipe and closing the fd
would destroy the pipe. That should not be a problem here, though,
since at least in the lsof output you pasted, fd 5 is a milter socket,
so the parent process still has a reference to it.
> I'm a bit nervous running with the closeall() commented out, though.
> Any ideas on why this is happening and a better workaround?
The closeall is there to ensure that the child process that will become
spamc after the exec() does not have references to any other pipes
hanging about. Pipes with multiple processes at each endpoint usually
end up hanging processes on read or write eventually, as the kernel has
no idea that (for example) spamc will never write to that pipe attached
to fd 37. An alternative is to set the close-on-exec flag via
fcntl(fd, F_SETFD, FD_CLOEXEC), but you can't set that for the internal
milter unix sockets connected to /var/run/spamass.sock, because you
don't know their fd numbers. That's why I dup2 the pipe descriptors
onto stdin/out/err and close the rest. Not running closeall should not
hurt as long as your spamc processes exit after a reasonable time. If
one does sit around longer than necessary, though, other milter threads
could start to hang until it exits.
Actually, try this. Replace the close(fd++) call in closeall() with
fcntl(fd++, F_SETFD, FD_CLOEXEC), which should end up doing the same
thing in a different way. Hopefully the process won't just end up
hanging in execvp() :)
Makes me want to build a non-threaded milter API, it does.