bug-guix

bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27


From: Mark H Weaver
Subject: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27
Date: Wed, 04 Jul 2018 23:33:52 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Hi Ludovic,

address@hidden (Ludovic Courtès) writes:

> (+Cc: Andy as the ultimate authority for all these things.  :-))
>
> address@hidden (Ludovic Courtès) skribis:
>
>> (let loop ((files files)
>>            (n 0))
>>   (match files
>>     ((file . tail)
>>      (call-with-input-file file
>>        (lambda (port)
>>          (call-with-decompressed-port 'gzip port
>>            (lambda (port)
>>              (let loop ()
>>                (unless (eof-object? (get-bytevector-n port 777))
>>                  (loop)))))))
>>      ;; (pk 'loop n file)
>>      (display ".")
>>      (loop tail (+ n 1)))))
>
> One problem I’ve noticed is that the child process that
> ‘call-with-decompressed-port’ spawns would be stuck trying to get the
> allocation lock:
>
> (gdb) bt
> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x00007f9fd8d5cb25 in __GI___pthread_mutex_lock (mutex=0x7f9fd91b3240 <GC_allocate_ml>) at ../nptl/pthread_mutex_lock.c:78
> #2  0x00007f9fd8f8ef8f in GC_call_with_alloc_lock (address@hidden <do_copy_weak_entry>, address@hidden) at misc.c:1929
> #3  0x00007f9fd92b1270 in copy_weak_entry (dst=0x7ffe4b9a0d70, src=0x759ed0) at weak-set.c:124
> #4  weak_set_remove_x (closure=0x8850c0, pred=0x7f9fd92b0440 <eq_predicate>, hash=3944337866184184181, set=0x70cf00) at weak-set.c:615
> #5  scm_c_weak_set_remove_x (address@hidden<weak-set 756df0>, raw_hash=<optimized out>, address@hidden <eq_predicate>, address@hidden) at weak-set.c:791
> #6  0x00007f9fd92b13b0 in scm_weak_set_remove_x (set=#<weak-set 756df0>, address@hidden<port 2 8850c0>) at weak-set.c:812
> #7  0x00007f9fd926f72f in close_port (port=#<port 2 8850c0>, explicit=<optimized out>) at ports.c:884
> #8  0x00007f9fd92ad307 in vm_regular_engine (thread=0x7f9fd91b3240 <GC_allocate_ml>, vp=0x7adf30, registers=0x0, resume=-657049556) at vm-engine.c:786
> #9  0x00007f9fd92afb37 in scm_call_n (proc=<error reading variable: ERROR: Cannot access memory at address 0xd959b030>0x7f9fd959b030, address@hidden, address@hidden) at vm.c:1257
> #10 0x00007f9fd9233017 in scm_primitive_eval (exp=<optimized out>, address@hidden<error reading variable: ERROR: Cannot access memory at address 0xd5677cf8>0x855280) at eval.c:662
> #11 0x00007f9fd9233073 in scm_eval (exp=<error reading variable: ERROR: Cannot access memory at address 0xd5677cf8>0x855280, address@hidden<error reading variable: ERROR: Cannot access memory at address 0xd95580d8>0x83d140) at eval.c:696
> #12 0x00007f9fd927e8d0 in scm_shell (argc=2, argv=0x7ffe4b9a1668) at script.c:454
> #13 0x00007f9fd9249a9d in invoke_main_func (body_data=0x7ffe4b9a1510) at init.c:340
> #14 0x00007f9fd922c28a in c_body (d=0x7ffe4b9a1450) at continuations.c:422
> #15 0x00007f9fd92ad307 in vm_regular_engine (thread=0x7f9fd91b3240 <GC_allocate_ml>, vp=0x7adf30, registers=0x0, resume=-657049556) at vm-engine.c:786
> #16 0x00007f9fd92afb37 in scm_call_n (address@hidden<smob catch-closure 795120>, address@hidden, address@hidden) at vm.c:1257
> #17 0x00007f9fd9231e69 in scm_call_0 (address@hidden<smob catch-closure 795120>) at eval.c:481
> #18 0x00007f9fd929e7b2 in catch (address@hidden, thunk=#<smob catch-closure 795120>, handler=<error reading variable: ERROR: Cannot access memory at address 0x400000000>0x7950c0, pre_unwind_handler=<error reading variable: ERROR: Cannot access memory at address 0x400000000>0x7950a0) at throw.c:137
> #19 0x00007f9fd929ea95 in scm_catch_with_pre_unwind_handler (address@hidden, thunk=<optimized out>, handler=<optimized out>, pre_unwind_handler=<optimized out>) at throw.c:254
> #20 0x00007f9fd929ec5f in scm_c_catch (address@hidden, address@hidden <c_body>, address@hidden, address@hidden <c_handler>, address@hidden, address@hidden <pre_unwind_handler>, pre_unwind_handler_data=0x7a9bc0) at throw.c:377
> #21 0x00007f9fd922c870 in scm_i_with_continuation_barrier (address@hidden <c_body>, address@hidden, address@hidden <c_handler>, address@hidden, address@hidden <pre_unwind_handler>, pre_unwind_handler_data=0x7a9bc0) at continuations.c:360
> #22 0x00007f9fd922c905 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>) at continuations.c:456
> #23 0x00007f9fd929d3ec in with_guile (address@hidden, address@hidden) at threads.c:661
> #24 0x00007f9fd8f8efb8 in GC_call_with_stack_base (address@hidden <with_guile>, address@hidden) at misc.c:1949
> #25 0x00007f9fd929d708 in scm_i_with_guile (dynamic_state=<optimized out>, address@hidden, address@hidden <invoke_main_func>) at threads.c:704
> #26 scm_with_guile (address@hidden <invoke_main_func>, address@hidden) at threads.c:710
> #27 0x00007f9fd9249c32 in scm_boot_guile (address@hidden, address@hidden, address@hidden <inner_main>, address@hidden) at init.c:323
> #28 0x0000000000400b70 in main (argc=2, argv=0x7ffe4b9a1668) at guile.c:101
> (gdb) info threads
>   Id   Target Id         Frame
> * 1    Thread 0x7f9fd972eb80 (LWP 15573) "guile" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>
> So it seems quite clear that the thing has the alloc lock taken.  I
> suppose this can happen if one of the libgc threads runs right when we
> call fork and takes the alloc lock, right?

Does libgc spawn threads that run concurrently with user threads?  If
so, that would be news to me.  My understanding was that incremental
marking occurs within GC allocation calls, and marking threads are only
spawned after all user threads have been stopped, but I could be wrong.

The first idea that comes to my mind is that perhaps the finalization
thread is holding the GC allocation lock when 'fork' is called.  The
finalization thread grabs the GC allocation lock every time it calls
'GC_invoke_finalizers'.  All ports backed by POSIX file descriptors
(including pipes) register finalizers and therefore spawn the
finalization thread and make work for it to do.

Another possibility: both the finalization thread and the signal
delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
which also temporarily grabs the GC allocation lock before calling the
specified function.  See 'GC_do_blocking_inner' in pthread_support.c in
libgc.  You spawn the signal delivery thread by calling 'sigaction', and
you make work for it to do every second, when SIGALRM is delivered.

> If that is correct, the fix would be to call fork within
> ‘GC_call_with_alloc_lock’.
>
> How does that sound?

Sure, sounds good to me.

> As a workaround on the Guix side, we might achieve the same effect by
> calling ‘gc-disable’ right before ‘primitive-fork’.

I don't think this would help.

     Thanks,
       Mark




