
Re: Emacs Hangs on Filesystem Operations on Stale NFS

From: Alexander Shukaev
Subject: Re: Emacs Hangs on Filesystem Operations on Stale NFS
Date: Wed, 13 Jun 2018 12:45:35 +0200

On 06/11/2018 12:27 PM, Alexander Shukaev wrote:
Hi Everyone,

I initiated a discussion back in 2015 [1] about the fragility of Emacs with respect to filesystem operations on stale NFS.  No solution actually came out of that discussion, and I still find this issue very disruptive.  Yet another example is `recentf-cleanup', which in my case is triggered on Emacs startup: when a file comes from stale NFS, the corresponding `file-readable-p' down the stack hangs indefinitely, and there is no way to unfreeze it apart from issuing 'kill -9' to that Emacs instance.  Don't you people find this unacceptable for daily use? Well, I do.  Such hangs always disrupt daily work and take quite some time to track down, as they are not Lisp-debuggable with e.g. <C-g> in a straightforward way (these are dead hangs in C code, where even attaching GDB does not work).

Well, enough ranting.  I think I have a proposal for how to fix the issue, even given the blocking nature of Emacs.  How about introducing a variable `file-access-timeout', defaulting to `nil', which would provide a configurable timeout for all access operations (such as `file-readable-p')?  This would be achieved via `SIGALRM' in the C code, which would protect every such operation.  For example,

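#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Do nothing: the handler exists only so that SIGALRM interrupts
   the blocking call instead of terminating the process.  */
static void alarm_handler(int sig)
{
     (void) sig;
}

int emacs_stat(const char* path, struct stat* s, unsigned int seconds)
{
     struct sigaction newact;
     struct sigaction oldact;

     memset(&newact, 0, sizeof(newact));
     memset(&oldact, 0, sizeof(oldact));

     sigemptyset(&newact.sa_mask);

     /* No SA_RESTART, so an interrupted stat fails with EINTR.  */
     newact.sa_flags   = 0;
     newact.sa_handler = alarm_handler;
     sigaction(SIGALRM, &newact, &oldact);

     /* Arm the timer for the duration of the call.  */
     alarm(seconds);

     errno                 = 0;
     const int rc          = stat(path, s);
     const int saved_errno = errno;

     /* Disarm the timer and restore the previous handler.  */
     alarm(0);
     sigaction(SIGALRM, &oldact, NULL);

     errno = saved_errno;
     return rc;
}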

where `seconds' should be initialized with the value of `file-access-timeout'.  The cool advantage I see in this is that one can then also selectively `let'-bind different values for `file-access-timeout', thus having total control over the use cases in which one wants to protect oneself from indefinite hangs.
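To illustrate the scoping idea, here is a rough sketch in Python rather than in Emacs Lisp or C.  The names `file_access_timeout', `AccessTimeout' and `read_with_timeout' are made up for this example and are not an existing API.  The context manager plays the role of a `let'-binding: the alarm is armed only inside the block, and the previous handler is restored afterwards.  It is demonstrated on a read from an empty pipe, which is an interruptible wait; it makes no claim about calls that the kernel does not allow to be interrupted.

```python
import contextlib
import os
import signal


class AccessTimeout(Exception):
    """Raised when the guarded operation exceeds its time budget."""


@contextlib.contextmanager
def file_access_timeout(seconds):
    """Arm SIGALRM for the duration of the block (hypothetical helper)."""
    def handler(signum, frame):
        raise AccessTimeout

    # Install our handler and remember the previous one.
    old_handler = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        # Disarm the timer first, then restore the previous handler.
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)


def read_with_timeout(fd, seconds):
    """Read one byte from fd, or return None if nothing arrives in time."""
    try:
        with file_access_timeout(seconds):
            return os.read(fd, 1)
    except AccessTimeout:
        return None
```

Different call sites can pass different values, e.g. `read_with_timeout(fd, 1)' in one place and `read_with_timeout(fd, 30)' in another.  Note that Python only allows installing signal handlers from the main thread.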

Kind regards,

[1] https://lists.gnu.org/archive/html/help-gnu-emacs/2015-11/msg00251.html

Today I realized that the following code

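import os
import signal

class Alarm(Exception):
  pass

def alarm_handler(signum, frame):
  raise Alarm

path = '/mnt/<nfs>'

# Arm a 3-second alarm around the blocking call.
signal.signal(signal.SIGALRM, alarm_handler)
signal.alarm(3)
try:
  os.stat(path)
except Alarm:
  print("Timed out after 3 seconds...")
finally:
  signal.alarm(0)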

does not time out when the filesystem is mounted 'hard', which means that the only way to time out reliably in this case is to spawn a child process which attempts the 'stat' and is then killed by the parent on timeout (similar to how 'lsof' does it).  I'm sure such a complex technique would be unacceptable for Emacs.
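For the record, a minimal sketch of that child-process technique, again in Python and not as a proposal for actual Emacs code.  The helper name `stat_in_child' and the one-line child program are made up for this example.  The assumption is that the stuck child can be killed with SIGKILL; whether it really goes away promptly on a hard mount depends on the kernel, so the parent never waits for it without a limit.

```python
import subprocess
import sys

# Hypothetical child program: stat the path given as the first argument.
# A failing os.stat raises, which makes the child exit with a non-zero status.
_CHILD_CODE = "import os, sys; os.stat(sys.argv[1])"


def stat_in_child(path, seconds):
    """Return True/False if stat finished in time, or None on timeout."""
    child = subprocess.Popen(
        [sys.executable, "-c", _CHILD_CODE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Only the child blocks inside stat; the parent waits with a limit.
        return child.wait(timeout=seconds) == 0
    except subprocess.TimeoutExpired:
        child.kill()  # SIGKILL on POSIX
        try:
            # Reap the child if it dies promptly, but never block forever.
            child.wait(timeout=1)
        except subprocess.TimeoutExpired:
            pass
        return None
```

Here `stat_in_child('/mnt/<nfs>', 3)' would return None instead of hanging the caller, at the cost of one process spawn per check, which is exactly the overhead that makes it unattractive.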

Although I don't like the current behavior, as it looks like an editor vulnerability to me, one which could be used (either intentionally or unintentionally) to hang the editor merely by pulling a network plug, I have to admit that we can probably wrap up this discussion here, since there is not much that Emacs itself could do to defend against it.

