emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: When should ralloc.c be used?


From: Eli Zaretskii
Subject: Re: When should ralloc.c be used?
Date: Mon, 24 Oct 2016 10:44:54 +0300

> Cc: address@hidden
> From: Paul Eggert <address@hidden>
> Date: Sun, 23 Oct 2016 21:59:52 -0700
> 
> Eli Zaretskii wrote:
> > Then what are our choices to solve this for Emacs 25.2?
> 
> Sorry, I've lost context. What's "this"?

The fact that Emacs 25.2 builds with ralloc.c on recent GNU/Linux
systems, which triggers random crashes, inability to build Emacs in
some cases, and other atrocities, such as corruption of buffer text.

> Doesn't draft Emacs 25.2 work adequately on bleeding-edge glibc
> without our doing anything special?

Not even close.

> If not, what are the problems and how can we reproduce them?

One problem is that relocation of buffer text and Lisp string data can
happen during regex searches, due to reallocation of the failure stack
to a size that exceeds MAX_ALLOCA.  Noam fixed that by some
non-trivial code in regex.c, but those changes seem to have uncovered
a problem that precludes bootstrapping Emacs 25.2, reported here:

  https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24358#123

which was independently reported for a different machine here:

  https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24772#11

I predict that more people will start hitting this as they upgrade to
newer glibc and start building Emacs with ralloc.c.

So we can't even build Emacs 25.2 reliably on "bleeding-edge"
GNU/Linux systems -- how's that for "working adequately"?

Then there's bug#24764, which sounds a lot like it's also related to
ralloc.c.  Because Michael Heerdegen talked about problems with
browsing Web pages, I looked at xml.c, and sure thing, it passes a C
pointer to buffer text to libxml2 functions which call malloc
internally.  I installed a ralloc-specific workaround for that, but
didn't yet hear from Michael if that was sufficient to solve his
frequent crashes in GC and corruptions of buffer text.

> I attempted to reproduce the problems, whatever they are, by building draft 
> Emacs 25.2 with './configure emacs_cv_var_doug_lea_malloc=no' on x86-64 
> Ubuntu 
> 16.04.1 (I don't have easy access to 16.10 yet, but this does build and link 
> gmalloc.o and ralloc.o). --enable-gcc-warnings did generate some warnings, 
> which 
> I just now fixed, but I didn't observe any runtime misbehaviors.

The problems don't happen immediately, and the problem with bootstrap
(bug#24358) seems to be dependent on some factor we don't yet
understand: Noam cannot reproduce it, although his system is very
similar to the one where it does happen.  If you want an almost
immediate manifestation of the problem in a build with ralloc, remove
the calls to r_alloc_inhibit_buffer_relocation in xml.c, and browse
some Web pages, you will sooner or later hit an assertion violation in
parse_region.

More generally, I found a few more places in the sources where we hold
C pointers to buffer text around calls to functions that can call
malloc.  I'm not sure I found all of them, because the only way to
look for them I know of is not perfect.  As for the same situation
where we hold C pointers to Lisp string data where malloc can be
called, there are virtually dozens of them, the most frequent paradigm
is something like

  char *beg = SSDATA (lisp_string);
  char *end = beg + something;
  Lisp_Object new_string = make_unibyte_string (beg, end - beg);

(The catch here is that make_unibyte_string calls malloc internally,
which could relocate the data of the original lisp_string, and thus
invalidate the pointers 'beg' and 'end'.)

I don't remember well enough the internals of ralloc.c: perhaps it
doesn't relocate Lisp string data unless the string is long enough? or
at all?  So the problems with Lisp string might not be as grave as I
fear, but this should certainly be looked into before we dismiss all
those cases.

Bottom line: when GNU/Linux systems started using ralloc.c, they've
potentially exposed Emacs 25.2 to very serious instability, on the
platform that we consider by far the most important one.  We need to
move fast and thoroughly to investigate the possible solutions and
decide which one to use.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]