emacs-devel
Re: Emacs Hangs on Filesystem Operations on Stale NFS


From: Alexander Shukaev
Subject: Re: Emacs Hangs on Filesystem Operations on Stale NFS
Date: Wed, 13 Jun 2018 12:45:35 +0200

On 06/11/2018 12:27 PM, Alexander Shukaev wrote:
Hi Everyone,


I initiated a discussion back in 2015 [1] about how fragile Emacs is when it performs filesystem operations on a stale NFS mount.  No solution came out of that discussion, and I still find this issue very disruptive.  Yet another example is `recentf-cleanup', which in my case is triggered on Emacs startup: when a file in the list lives on a stale NFS mount, the corresponding `file-readable-p' call down the stack hangs indefinitely, and there is no way to unfreeze Emacs apart from issuing 'kill -9' to that instance.  Don't you people find this unacceptable for daily usage?  Well, I do.  Such hangs always disrupt daily work and take quite some time to track down, because they cannot be debugged from Lisp in a straightforward way, e.g. with <C-g> (these are dead hangs in C code, where even attaching GDB does not work).

Well, enough ranting.  I think I have a proposal for how to fix the issue, even given the blocking nature of Emacs.  How about introducing a variable `file-access-timeout', defaulting to `nil', which would hold a configurable timeout for all file access operations (such as `file-readable-p')?  This would be implemented via `SIGALRM' in the C code, which would guard every such operation.  For example,

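#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* No-op handler: it exists only so that SIGALRM interrupts the
   blocked system call (EINTR) instead of terminating the process.  */
static void alarm_handler(int sig)
{
     (void) sig;
}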

int emacs_stat(const char* path, struct stat* s, unsigned int seconds)
{
     struct sigaction newact;
     struct sigaction oldact;

     memset(&newact, 0, sizeof(newact));
     memset(&oldact, 0, sizeof(oldact));

     sigemptyset(&newact.sa_mask);

     newact.sa_flags   = 0;
     newact.sa_handler = alarm_handler;
     sigaction(SIGALRM, &newact, &oldact);

     alarm(seconds);

     errno                 = 0;
     const int rc          = stat(path, s);
     const int saved_errno = errno;

     alarm(0);
     sigaction(SIGALRM, &oldact, NULL);

     errno = saved_errno;
     return rc;
}

where `seconds' should be initialized with the value of `file-access-timeout'.  The nice advantage I see here is that one can also selectively `let'-bind different values of `file-access-timeout', and thus have full control over the cases in which one wants protection from indefinite hangs.

Kind regards,
Alexander

[1] https://lists.gnu.org/archive/html/help-gnu-emacs/2015-11/msg00251.html


Today I realized that the following code

import os
import signal

class Alarm(Exception):
  pass

def alarm_handler(signum, frame):
  raise Alarm

path = '/mnt/<nfs>'

signal.signal(signal.SIGALRM, alarm_handler)
signal.alarm(3)
try:
  os.stat(path)
  signal.alarm(0)
except Alarm:
  print("Timed out after 3 seconds...")

does not time out when the NFS share is mounted with the 'hard' option.  This means that the only way to time this case out reliably is to spawn a child process which attempts the 'stat' and is then killed by the parent on timeout (similar to how 'lsof' does it).  I'm sure such a complex technique would be unacceptable for Emacs.
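
For illustration, the child-process approach could look roughly like the sketch below.  It is only a sketch under assumptions: the helper name `stat_with_timeout' and its three-valued return convention are made up for this example, and it runs the 'stat' in a second Python interpreter rather than in anything Emacs-specific.  Whether a child stuck on a 'hard' mount can actually be killed depends on the kernel, so the parent deliberately does not wait for it after sending the kill.

```python
import subprocess
import sys


def stat_with_timeout(path, seconds):
    """Hypothetical helper: stat `path` in a child process.

    Returns True if the stat succeeded, False if it failed,
    and None if it did not finish within `seconds`.
    """
    # Run os.stat in a separate interpreter so that a blocked system
    # call can only freeze the child, never the calling process.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit status 0 means os.stat returned without raising.
        return child.wait(timeout=seconds) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: send SIGKILL but do not wait for the child,
        # since waiting could block the parent again.
        child.kill()
        return None


if __name__ == "__main__":
    print(stat_with_timeout(".", 3))
```

The price is one process spawn per check, which is exactly the kind of overhead and complexity I mean.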

I don't like the current behavior: it looks like an editor vulnerability to me, one that could be used (intentionally or not) to hang the editor merely by pulling out a network plug.  Still, I have to admit that we can probably wrap up this discussion here, since there is not much that Emacs itself could do to defend against it.

Regards,
Alexander


