Re: Emacs Hangs on Filesystem Operations on Stale NFS

From: Stefan Monnier
Subject: Re: Emacs Hangs on Filesystem Operations on Stale NFS
Date: Mon, 11 Jun 2018 11:04:47 -0400
> this discussion.  I still find this issue very disruptive.  Yet another
> example would be `recentf-cleanup' which is in my case triggered on Emacs
> start up, when the file comes from stale NFS, the corresponding
> `file-readable-p' down the stack will hang indefinitely, and there would be
> no way to unfreeze it apart from issuing 'kill -9' to that Emacs instance.

Indeed stale NFS mounts can be problematic.  As you can see from
Andreas's reaction the obvious first answer is that it's a general
problem, so I think we first need to understand what makes it different
in the context of Emacs.

I don't use NFS much these days, but IIRC there are basically two
different ways to do NFS mounts: "hard" and "soft".  Back when I used
it, "hard" was used with "intr" so you could interrupt frozen processes,
but from what I read, the Linux kernel's NFS client nowadays doesn't
support this any more, so a process waiting for a hard-mounted NFS
server can only be interrupted with a SIGKILL.

So some questions, to better understand what our options are:

- It seems your unreliable NFS server is mounted "hard" rather than "soft".
  Why is that?  "man mount" on my Debian machine doesn't find any "hard"
  or "soft" options, so has the soft-mount option disappeared?  What are
  applications usually expected to do when accessing a stale NFS server?

- You say "kill -9" is the only option, yet you also seem to say that
  SIGALRM does work.  The two statements seem contradictory.
  What is the set of signals which work, really?
  E.g. does `kill -USR1` work (with debug-on-event)?
  Maybe the issue here is that Emacs handles C-g via polling rather than
  via interrupts, and we should refine that polling such that it handles
  such "C-g while in the middle of a long-running file access syscall"?
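
To make the "interrupt vs. polling" distinction concrete, here is a
minimal sketch (in Python rather than Emacs's C, and on a pipe rather
than on NFS) of a signal breaking a process out of a blocking read.
The names `Interrupted` and `read_with_alarm` are made up for this
illustration.  Whether a stuck access on a hard-mounted NFS server can be
interrupted the same way is exactly the open question above; this only
shows the mechanism for an ordinary interruptible syscall.

```python
import os
import signal


class Interrupted(Exception):
    """Raised from the signal handler to abandon the blocked call."""


def _on_alarm(signum, frame):
    # Raising here makes Python give up on the interrupted syscall
    # instead of silently restarting it.
    raise Interrupted()


def read_with_alarm(fd, seconds):
    """Block in os.read(), but let SIGALRM break out after `seconds`.

    Returns the byte read, or None if the alarm fired first.
    Must be called from the main thread (a Python restriction on
    installing signal handlers).
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return os.read(fd, 1)
    except Interrupted:
        return None
    finally:
        # Cancel the timer and restore whatever handler was there before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)


if __name__ == "__main__":
    r, w = os.pipe()  # nothing is written, so the read would block forever
    print(read_with_alarm(r, 0.2))
```

The point of the sketch is that no polling is involved: the handler runs
while the process is still inside the read, which is what a "C-g during a
long-running syscall" would need.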

> Well, enough rant.  I think I have a proposal how to fix the issue, even
> given the blocking nature of Emacs.  How about introducing a variable
> `file-access-timeout' defaulting to `nil', which would

If at all possible, I think I'd prefer to let the user interrupt with
C-g rather than rely on some kind of timeout.
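
For comparison, the timeout idea can be sketched outside Emacs as
follows.  This is only an illustration under assumptions: it is Python,
not Elisp, the function name `readable_within` is hypothetical, and it
relies on the probing child process being killable (which matches the
report that "kill -9" works).  The caller never touches the file itself,
so only the child can get stuck.

```python
import subprocess
import sys

# Tiny probe run in a child process: exit status 0 if readable, 1 if not.
_PROBE = "import os, sys; sys.exit(0 if os.access(sys.argv[1], os.R_OK) else 1)"


def readable_within(path, timeout):
    """Check readability of `path` without blocking longer than `timeout`.

    Returns True or False if the check finished in time, or None if it
    timed out (e.g. because the filesystem did not answer).
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", _PROBE, path],
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child at this point.
        return None
    return done.returncode == 0
```

The three-way result (True, False, None) is a design choice of the
sketch: it keeps "not readable" distinct from "could not find out in
time", which a plain boolean would conflate.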

Reading the original thread, you seem to say that this mostly affects
"dired" operation, and that not only can it hang, but it can also
be slow.

So a few more questions:

- In my experience dired-like operations over NFS servers should either
  run at normal speed or hang (if the server is unavailable), but "slow"
  is not something I'd expect.  Do you know why it's sometimes slow?
  Is your NFS server itself "far/slow"?  Is the slowness due to some
  automount (i.e. it's slow because of the time taken to perform the
  mount itself)?

- Does this slow/hanging behavior appear only in dired?  Does it only
  affect Emacs when using dired on a directory that's indeed on an NFS
  server, or does it affect accesses which don't obviously require
  NFS access?


