Re: autoconf hangs due to autom4te.cache and NFS problem on AIX

From: Bob Proulx
Subject: Re: autoconf hangs due to autom4te.cache and NFS problem on AIX
Date: Sat, 23 Feb 2008 03:27:55 -0700
User-agent: Mutt/1.5.13 (2006-08-11)

Chris Pickett wrote:
> [[Please reply-all, I'm not subscribed.]]
> I noticed that I couldn't delete my autom4te.cache directory:
> conftest $ ll autom4te.cache/
> total 0
> -rw-r--r-- 1 pickett xxx 0 Feb 23 00:27 .nfsCC131
> because of that .nfs file.  I don't really know how nfs works 
> internally, but if I delete that file manually it just comes back.

This is caused by what is known as the "NFS last close" problem.
In a Unix / POSIX standard filesystem a process can open a file, and
then, while it still holds the file open, the file can be removed from
the directory.  The file doesn't actually go away yet because files
are reference counted and the open file handle keeps the reference
count non-zero.  Eventually, when the last reference to the file is
gone, the file is actually removed.
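The local (non-NFS) behavior can be seen with a short Python sketch.  It
assumes Python 3 on a POSIX system whose temporary directory is on a
local filesystem, and `read_after_unlink` is a hypothetical helper name.
It removes the directory entry while a descriptor is still open and shows
that the data stays readable until the last close.

```python
from __future__ import annotations

import os
import tempfile


def read_after_unlink(data: bytes) -> tuple[bool, bytes]:
    """Unlink a file while it is still open, then read it through the open fd.

    Returns (name_still_exists, contents_read_back).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)
        os.unlink(path)                    # remove the directory entry only
        name_still_exists = os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)       # rewind; the inode is still alive
        contents = os.read(fd, len(data))  # the open descriptor keeps the data reachable
        return name_still_exists, contents
    finally:
        os.close(fd)                       # last close: the kernel can now free the inode


if __name__ == "__main__":
    print(read_after_unlink(b"still here"))  # (False, b'still here')
```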

Because NFS (at least versions 2 and 3) is stateless, the nfs server
can't know about reference counts and open filehandles held by client
processes on the nfs client, and therefore can't implement this "last
close" behavior.  All the nfs clients can do is try to fake it, and
they do this by renaming the file.  If a process on the nfs client
still has an open file descriptor to the file, then instead of
removing it the nfs client renames it to a unique hidden filename
(such as the .nfsCC131 above) in the hope that it will appear to be
gone.  This fakes removing the file but leaves it around so that the
process that has it open can continue to access it.  When that process
finally closes the file, the nfs client removes the renamed file.  (If
the file is removed directly on the nfs server, without going through
the nfs client, then it is actually removed.  Any nfs clients that
access the file subsequently will get a "stale NFS file handle" error.)
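A toy Python model of this client-side rename may make the bookkeeping
clearer.  Everything in it is hypothetical: `ToyNfsClient` is not how any
real nfs client is implemented, a dict stands in for the server's
directory, and the `.nfs0001`-style names only imitate the real ones (the
actual naming scheme varies between systems).  It shows removal of an open
file turning into a rename, removal of the hidden name merely renaming it
again, and the last close finally deleting it.

```python
import itertools


class ToyNfsClient:
    """Hypothetical toy model of an nfs client's rename-instead-of-remove trick."""

    def __init__(self, server: dict):
        self.server = server                  # filename -> contents, stands in for the server
        self.handles = {}                     # handle id -> current filename
        self.hidden = set()                   # names this client created by renaming
        self._next_handle = itertools.count(1)
        self._next_hidden = itertools.count(1)

    def _open_count(self, name):
        return sum(1 for n in self.handles.values() if n == name)

    def open(self, name):
        if name not in self.server:
            raise FileNotFoundError(name)
        handle = next(self._next_handle)
        self.handles[handle] = name
        return handle

    def read(self, handle):
        return self.server[self.handles[handle]]

    def remove(self, name):
        """Return the hidden name if the file was renamed, None if really removed."""
        if self._open_count(name) == 0:
            del self.server[name]             # nobody here has it open: really remove it
            self.hidden.discard(name)
            return None
        new_name = f".nfs{next(self._next_hidden):04X}"
        self.server[new_name] = self.server.pop(name)   # rename on the "server"
        for handle, n in self.handles.items():
            if n == name:
                self.handles[handle] = new_name          # open handles follow the rename
        self.hidden.discard(name)
        self.hidden.add(new_name)
        return new_name

    def close(self, handle):
        name = self.handles.pop(handle)
        if name in self.hidden and self._open_count(name) == 0:
            del self.server[name]             # "last close": now really remove it
            self.hidden.discard(name)


if __name__ == "__main__":
    server = {"conftest.out": b"data"}
    client = ToyNfsClient(server)
    h = client.open("conftest.out")
    print(client.remove("conftest.out"))  # .nfs0001
    print(client.remove(".nfs0001"))      # .nfs0002 (renamed again, not removed)
    print(client.read(h))                 # b'data'
    client.close(h)
    print(server)                         # {}
```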

When you attempt to remove it manually, the nfs client simply renames
it again, because it knows that a process still has an open file
descriptor to it.  The file has a non-zero reference count, so the old
filename does go away, but only because the file has been renamed to a
new unique hidden filename.  You can move the file, however.  Sometimes
it is useful to move the file up into the parent directory.  Then the
original directory will be empty and can be removed.  Finding the root
cause of why a process still has the file busy is better, though.
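Moving the hidden files up a level and then removing the directory could
be scripted along these lines.  This is a sketch assuming Python 3;
`evacuate_nfs_files` is a hypothetical name, and it treats any entry
whose name starts with `.nfs` as one of the client's hidden files.  It
refuses to overwrite an existing file in the parent, and the final
`rmdir` fails if anything other than those files remains.  The moved
files stay open in whatever process holds them, so they still need
cleaning up once that process is dealt with.

```python
import os
from pathlib import Path


def evacuate_nfs_files(directory):
    """Move .nfs* files from `directory` into its parent, then remove `directory`.

    Returns the new paths of the moved files (hypothetical helper).
    """
    d = Path(directory).resolve()
    parent = d.parent
    moved = []
    for entry in list(d.iterdir()):           # snapshot the listing before renaming
        if not entry.name.startswith(".nfs"):
            continue
        target = parent / entry.name
        if target.exists():
            raise FileExistsError(target)     # don't clobber a file in the parent
        os.rename(entry, target)
        moved.append(target)
    d.rmdir()                                 # raises OSError if anything else is left
    return moved


if __name__ == "__main__":
    import sys
    print(evacuate_nfs_files(sys.argv[1] if len(sys.argv) > 1 else "autom4te.cache"))
```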

You would need to find the process that has the file open and deal
with it (probably by killing it) first.  Using tools such as 'fuser'
and 'lsof' or their equivalents on your system can be useful here.
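On Linux, the search that `fuser` and `lsof` perform can be approximated
by scanning `/proc/<pid>/fd`.  The sketch below assumes Linux's /proc
layout (AIX's /proc differs, so there `fuser` or `lsof` remain the tools
to use), and `pids_holding` is a hypothetical name.  Pass it the hidden
`.nfs...` path itself, since after the rename the open descriptor should
resolve to that name.  Processes owned by other users are silently
skipped unless it is run as root.

```python
import os


def pids_holding(path):
    """Return PIDs with an open descriptor on `path` (Linux /proc only).

    Hypothetical helper; processes we may not inspect are skipped.
    """
    target = os.path.realpath(path)
    pids = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue                          # skip /proc/self, /proc/meminfo, ...
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:                       # process exited or permission denied
            continue
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(pid))
                    break
            except OSError:                   # descriptor closed in the meantime
                continue
    return pids


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(p, pids_holding(p))
```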

> By the way, I had similar problems with not being able to delete files 
> with Subversion, as you can see here:

That is the same problem.

> so I don't really think it's Autoconf's "fault", but if you could find a 
> way to make it fail gracefully that would be neat, I suppose.  (I'll 
> test a tarball if you like.)  That's not the point of this message 
> though, as stated...

Dealing with NFS problems has been a long-running issue.

As a first pass I would try using a local directory instead of one
mounted over NFS.  That removes NFS from the environment entirely.  Try
building on the local disk, perhaps in /tmp or /var/tmp, and it is
likely that you will avoid these problems.  If it works on the local
disk but fails over nfs then it is likely to be an nfs bug.

