bug-gnu-emacs
bug#43389: 28.0.50; Emacs memory leaks


From: Eli Zaretskii
Subject: bug#43389: 28.0.50; Emacs memory leaks
Date: Fri, 30 Oct 2020 10:00:29 +0200

> From: Trevor Bentley <trevor@trevorbentley.com>
> Date: Thu, 29 Oct 2020 21:17:20 +0100
> 
> It doesn't start leaking until it has been active for 2-3 days. 
> It might depend on other factors, such as suspending or losing
> network connectivity.  Once the leak triggers, it grows at a rate 
> of about 1MB every few seconds. My machine has 32GB, so it gets 
> pretty far before I notice and kill it. I'm not sure if there is a 
> limit.
> 
> I built emacs with debug symbols and dumped some strace logs last 
> time it happened.  This is from the "native-comp" branch, since 
> it's the only one I had built with debug symbols:  GNU Emacs 
> 28.0.50, commit feed53f8b5da0e58cce412cd41a52883dba6c1be.  I see 
> the same with the version installed from my package manager (Arch, 
> GNU Emacs 27.1), and the strace log looks about the same, though 
> without symbols.
> 
> I waited until it was actively leaking, and then ran the following 
> command to print a stack trace whenever the heap is extended with 
> brk():
> 
> $ sudo strace -p $PID -k -r --trace="?brk" --signal="SIGTERM"
> 
> The findings: this particular leak is triggered in libgnutls.  I 
> get large batches of the following (truncated) stack trace

Thanks.  This trace doesn't show how many bytes were allocated, does
it?  Without that, it is hard to judge whether these GnuTLS calls could
be the culprit, because the full trace also shows other calls to
malloc, for example this:

   > /usr/lib/libc-2.32.so(brk+0xb) [0xf6e7b]
   > /usr/lib/libc-2.32.so(__sbrk+0x84) [0xf6f54]
   > /usr/lib/libc-2.32.so(__default_morecore+0xd) [0x8d80d]
   > /usr/lib/libc-2.32.so(sysmalloc+0x372) [0x890e2]
   > /usr/lib/libc-2.32.so(_int_malloc+0xd9e) [0x8ad6e]
   > /usr/lib/libc-2.32.so(_int_memalign+0x3f) [0x8b01f]
   > /usr/lib/libc-2.32.so(_mid_memalign+0x13c) [0x8c12c]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(lisp_align_malloc+0x2e) [0x2364ee]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(Fcons+0x65) [0x237f74]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(store_in_alist+0x5f) [0x5c9a3]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(gui_report_frame_params+0x46a) [0x607f1]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(Fframe_parameters+0x499) [0x5d88b]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(Fframe_parameter+0x381) [0x5dc9c]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(eval_sub+0x7a7) [0x26f964]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(Fif+0x1f) [0x26b590]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(eval_sub+0x38b) [0x26f548]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(Feval+0x7a) [0x26ef45]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(funcall_subr+0x257) [0x271463]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(Ffuncall+0x192) [0x270fe9]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(internal_condition_case_n+0xa1) [0x26d81a]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(safe__call+0x211) [0x73943]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(safe__call1+0xba) [0x73b47]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(safe__eval+0x35) [0x73bd7]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(display_mode_element+0xe32) [0xb5515]

This seems to indicate some mode-line element that uses :eval, but
without knowing what it does it is hard to say anything more specific.

I also see this:

   > /home/trevor/applications/opt/bin/emacs-28.0.50(_start+0x2e) [0x4598e]
       2.870962 brk(0x55f5ed9a4000)       = 0x55f5ed9a4000
   > /usr/lib/libc-2.32.so(brk+0xb) [0xf6e7b]
   > /usr/lib/libc-2.32.so(__sbrk+0x84) [0xf6f54]
   > /usr/lib/libc-2.32.so(__default_morecore+0xd) [0x8d80d]
   > /usr/lib/libc-2.32.so(sysmalloc+0x372) [0x890e2]
   > /usr/lib/libc-2.32.so(_int_malloc+0xd9e) [0x8ad6e]
   > /usr/lib/libc-2.32.so(_int_memalign+0x3f) [0x8b01f]
   > /usr/lib/libc-2.32.so(_mid_memalign+0x13c) [0x8c12c]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(lisp_align_malloc+0x2e) [0x2364ee]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(Fcons+0x65) [0x237f74]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(Fmake_list+0x4f) [0x238544]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(concat+0x5c3) [0x2792f6]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(Fcopy_sequence+0x16a) [0x278d2a]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(timer_check+0x33) [0x1b79dd]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(readable_events+0x1a) [0x1b5d00]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(get_input_pending+0x2f) [0x1bcf3a]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(detect_input_pending_run_timers+0x2e) [0x1c4eb1]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(wait_reading_process_output+0x14ec) [0x2de0c0]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(sit_for+0x211) [0x53e78]
   > /home/trevor/applications/opt/bin/emacs-28.0.50(read_char+0x1019) [0x1b3f62]

This indicates some timer that runs; again, without knowing which
timer and what it does, it is hard to proceed.

Etc. etc. -- the bottom line is that I think we need to know how many
bytes are allocated in each call to make some progress.  It would be
even more useful if we could somehow know which of the allocated
buffers are free'd soon and which aren't.  That's because Emacs calls
memory allocation functions _a_lot_, and it is completely normal to
see a lot of these calls.  What we need is to find allocations that
don't get free'd, and whose byte counts come close to explaining the
rate of 1MB every few seconds.  So these calls need to be filtered
somehow, otherwise we will not see the forest for the gazillion trees.
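
One possible way to start such filtering is sketched below.  It is only
an illustration, not something from this thread: it assumes the raw,
unwrapped output of the strace command quoted above (each brk line
followed by its " > " stack frames, no "[pid N]" prefix), and it
assumes that the -r timestamps are deltas between consecutive printed
calls.  The function name summarize_brk_growth and its depth parameter
are hypothetical.  Note that this measures only how far each brk call
extended the heap and which call path triggered the extension; it
cannot tell which individual allocations are later free'd, and it does
not see memory obtained via mmap.

```python
import os
import re
from collections import defaultdict

# A brk syscall line as printed by "strace -r", e.g.
#   "     2.870962 brk(0x55f5ed9a4000)       = 0x55f5ed9a4000"
BRK_RE = re.compile(
    r"^\s*(?P<dt>\d+\.\d+)\s+brk\((?:NULL|0x[0-9a-f]+)\)\s*=\s*(?P<ret>0x[0-9a-f]+)"
)
# A stack frame line as printed by "strace -k", e.g.
#   " > /usr/lib/libc-2.32.so(brk+0xb) [0xf6e7b]"
# Frames without a symbol name do not match and are skipped.
FRAME_RE = re.compile(r"^\s*>\s*(?P<obj>\S+?)\((?P<sym>[^()+]+)\+0x[0-9a-f]+\)")


def is_libc(obj_path):
    """True for the C library itself (libc-2.32.so, libc.so.6, ...)."""
    return os.path.basename(obj_path).startswith(("libc-", "libc.so"))


def summarize_brk_growth(lines, depth=3):
    """Attribute heap growth to the innermost `depth` non-libc frames.

    Returns (growth_by_stack, total_bytes_grown, elapsed_seconds).
    The first brk call only establishes the baseline break address.
    """
    growth = defaultdict(int)
    prev_break = None
    total = 0
    elapsed = 0.0
    delta, frames = 0, []

    def record(delta, frames):
        # Only count calls that extended the heap; shrinks are ignored.
        if delta > 0:
            growth[tuple(frames[:depth])] += delta

    for line in lines:
        m = BRK_RE.match(line)
        if m:
            record(delta, frames)          # close the previous brk call
            new_break = int(m.group("ret"), 16)
            if prev_break is None:
                delta = 0
            else:
                delta = new_break - prev_break
                elapsed += float(m.group("dt"))
                total += max(delta, 0)
            prev_break = new_break
            frames = []
            continue
        f = FRAME_RE.match(line)
        if f and not is_libc(f.group("obj")):
            frames.append(f.group("sym"))
    record(delta, frames)                  # close the last brk call
    return dict(growth), total, elapsed
```

Fed the saved strace output line by line (for example from a log file
opened with open()), the result can be sorted by byte count to see
whether any one call path comes close to accounting for about 1MB
every few seconds, and total divided by elapsed gives the overall
growth rate to compare against.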

> I'm not sure if gnutls is giving back buffers that emacs is 
> supposed to free, or if the leak is entirely contained within 
> gnutls, but something in that path is hanging on to a lot of 
> allocations indefinitely.

The GnuTLS functions we call in emacs_gnutls_read are:

  gnutls_record_recv
  emacs_gnutls_handle_error

The latter is only called if there's an error, so I'm guessing it is
not part of your trace.  And the former doesn't say in its
documentation that Emacs should free any buffers after calling it, so
I'm not sure how Emacs could be the culprit here.  If GnuTLS is the
culprit (and as explained above, this is not certain at this point),
perhaps upgrading to a newer GnuTLS version or reporting this to
GnuTLS developers would allow some progress.




