guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Now crashing [was Re: guile-2.9.2 and threading


From: Linas Vepstas
Subject: Now crashing [was Re: guile-2.9.2 and threading
Date: Sun, 14 Jul 2019 16:59:43 -0500


So, here's my next installment on using guile-2.9.2. The first installment said that I'd piled up CPU-months of guile 2.9.2 experience without any crashes. Well, now, a different workload crashes in minutes.  Below is a highly simplified, edited gdb session -- it crashes because it unexpectedly aborts, during an abort(!) because `get_callee_vcode()` failed. Harrumpf. 

Background: there are 140 threads, half in guile, the other half waiting for guile to finish. Yes, that's too many, but anyways ... 70 threads in guile and one crashed:

#2  0x00007f85d3f6ecdb in capture_delimited_continuation (
    current_registers=<optimized out>, dynstack=<optimized out>,
    saved_registers=<optimized out>, saved_mra=<optimized out>,
    saved_fp=<optimized out>, vp=<optimized out>) at ../../libguile/vm.c:1327
#3  abort_to_prompt (thread=0x15f692dc0, saved_mra=<optimized out>)
    at ../../libguile/vm.c:1454

Both frames are interesting, because libguile/vm.c:1327 shows
  if (SCM_FRAME_DYNAMIC_LINK (base_fp) != saved_fp)
     abort();
hey!? who called this? line 1454 is in the middle of abort_to_prompt ()  Yow! an unexpected abort during an abort...

How did we get here?
#15 0x00007f85d3eedeb5 in scm_error_scm (key=key@entry=0xdc5420,
    subr=<optimized out>, message=message@entry=0x1607c9380,
    args=args@entry=0x15af130e0, data="">    at ../../libguile/error.c:90
#16 0x00007f85d3eedf4f in scm_error (key=0xdc5420, subr=subr@entry=0x0,
    message=message@entry=0x7f85d3fa228c "Wrong type to apply: ~S", args=0x15af130e0,
    rest=rest@entry=0x15af130f0) at ../../libguile/error.c:62
#17 0x00007f85d3f6f913 in get_callee_vcode (thread=0x15f692dc0)
    at ../../libguile/vm.c:1527

and libguile/vm.c:1527 tells me that get_callee_vcode () is very unhappy. But why? I cannot tell .. after that, things peter out in boring stack frames that started with my call scm_c_catch() ... the same seemingly harmless call that is pending in 70 other threads. (the same call that has survived several CPU month of pounding with a different collection of scheme code) 

My best guess is that the current workload, by unintentionally launching gobs of threads is exposing a race condition that has been hithertho hidden.  I don't know how to debug any further.  I will try a slightly newer guile shortly, to see if I get lucky.

-- Linas

p.s. here's the whole stack trace. But really, its boring, except for the above highlights.



(gdb) bt
#0  0x00007f85d38ef428 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f85d38f102a in __GI_abort () at abort.c:89
#2  0x00007f85d3f6ecdb in capture_delimited_continuation (
    current_registers=<optimized out>, dynstack=<optimized out>,
    saved_registers=<optimized out>, saved_mra=<optimized out>,
    saved_fp=<optimized out>, vp=<optimized out>) at ../../libguile/vm.c:1327
#3  abort_to_prompt (thread=0x15f692dc0, saved_mra=<optimized out>)
    at ../../libguile/vm.c:1454
#4  0x00007f85ac539041 in ?? ()
#5  0x00007f85ac37f040 in ?? ()
#6  0x00007f85d41e10c0 in jump_table_ () from /usr/local/lib/libguile-3.0.so.0
#7  0x000000015f692dc0 in ?? ()
#8  0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0,
    mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796
#9  0x00007f85d3f70600 in vm_debug_engine (thread=0x7f85ac539000)
    at ../../libguile/vm-engine.c:370
#10 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0xe45a20, argv=<optimized out>,
    nargs=5) at ../../libguile/vm.c:1605
#11 0x00007f85d3eefdcb in scm_apply_0 (proc=0xe45a20, args=0x304)
    at ../../libguile/eval.c:603
#12 0x00007f85d3ef0a0d in scm_apply_1 (proc=<optimized out>,
    arg1=arg1@entry=0xdc5420, args=args@entry=0x15af130a0)
    at ../../libguile/eval.c:609
#13 0x00007f85d3f6c546 in scm_throw (key=key@entry=0xdc5420, args=0x15af130a0)
    at ../../libguile/throw.c:272
#14 0x00007f85d3f6caf9 in scm_ithrow (key=key@entry=0xdc5420, args=<optimized out>,
    no_return=no_return@entry=1) at ../../libguile/throw.c:619
#15 0x00007f85d3eedeb5 in scm_error_scm (key=key@entry=0xdc5420,
    subr=<optimized out>, message=message@entry=0x1607c9380,
    args=args@entry=0x15af130e0, data="">    at ../../libguile/error.c:90
---Type <return> to continue, or q <return> to quit---
#16 0x00007f85d3eedf4f in scm_error (key=0xdc5420, subr=subr@entry=0x0,
    message=message@entry=0x7f85d3fa228c "Wrong type to apply: ~S", args=0x15af130e0,
    rest=rest@entry=0x15af130f0) at ../../libguile/error.c:62
#17 0x00007f85d3f6f913 in get_callee_vcode (thread=0x15f692dc0)
    at ../../libguile/vm.c:1527
#18 0x00007f85b4314805 in ?? ()
#19 0x00007f85b428a000 in ?? ()
#20 0x00007f85d41e10c0 in jump_table_ () from /usr/local/lib/libguile-3.0.so.0
#21 0x000000015f692dc0 in ?? ()
#22 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0,
    mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796
#23 0x00007f85d3f70600 in vm_debug_engine (thread=0x2)
    at ../../libguile/vm-engine.c:370
#24 0x00007f85d3f76db2 in scm_call_n (proc=<optimized out>,
    argv=argv@entry=0x7f7e85fe2600, nargs=nargs@entry=3) at ../../libguile/vm.c:1605
#25 0x00007f85d3eef97f in scm_call_3 (proc=<optimized out>, arg1=<optimized out>,
    arg2=<optimized out>, arg3=<optimized out>) at ../../libguile/eval.c:510
#26 0x00007f85d4262b6f in ?? ()
#27 0x00007f85d4262a80 in ?? ()
#28 0x00007f85d41e10c0 in jump_table_ () from /usr/local/lib/libguile-3.0.so.0
#29 0x000000015f692dc0 in ?? ()
#30 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0,
    mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796
#31 0x00007f85d3f70600 in vm_debug_engine (thread=0x304)
    at ../../libguile/vm-engine.c:370
#32 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0x15b341ee0,
    argv=argv@entry=0x0, nargs=nargs@entry=0) at ../../libguile/vm.c:1605
#33 0x00007f85d3eef8d9 in scm_call_0 (proc=proc@entry=0x15b341ee0)
    at ../../libguile/eval.c:490
#34 0x00007f85d3f6c1aa in catch (tag=tag@entry=0x404, thunk=0x15b341ee0,
    handler=0x15b341ec0, pre_unwind_handler=0x15b341ea0) at ../../libguile/throw.c:146
#35 0x00007f85d3f6c505 in scm_catch_with_pre_unwind_handler (key=key@entry=0x404,
---Type <return> to continue, or q <return> to quit---
    thunk=<optimized out>, handler=<optimized out>,
    pre_unwind_handler=<optimized out>) at ../../libguile/throw.c:260
#36 0x00007f85d3f6c6bf in scm_c_catch (tag=tag@entry=0x404, body=<optimized out>,
    body_data=<optimized out>,
    handler=handler@entry=0x7f85c95d0f00 <opencog::SchemeEval::catch_handler_wrapper(void*, scm_unused_struct*, scm_unused_struct*)>,
    handler_data=handler_data@entry=0x7f7e60000980,
    pre_unwind_handler=pre_unwind_handler@entry=0x7f85c95d0c40 <opencog::SchemeEval::preunwind_handler_wrapper(void*, scm_unused_struct*, scm_unused_struct*)>,
    pre_unwind_handler_data=0x7f7e60000980) at ../../libguile/throw.c:385
#37 0x00007f85c95d122a in opencog::SchemeEval::do_eval (this=0x7f7e60000980,
    expr="(observe-mpg \"The countess, with her loving heart, felt that her children were being ruined, that it was not the count's fault for he could not help being what he was -- that (though he tried to hide "...)
    at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:590
#38 0x00007f85c95d12aa in opencog::SchemeEval::c_wrap_eval (p=0x7f7e60000980)
    at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:507
#39 0x00007f85d3eeb47a in c_body (d=0x7f7e85fe2d40)
    at ../../libguile/continuations.c:430
#40 0x00007f85d4262b6f in ?? ()
#41 0x00007f85d4262a80 in ?? ()
#42 0x00007f85d41e10c0 in jump_table_ () from /usr/local/lib/libguile-3.0.so.0
#43 0x000000015f692dc0 in ?? ()
#44 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0,
    mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796
#45 0x00007f85d3f70600 in vm_debug_engine (thread=0x304)
    at ../../libguile/vm-engine.c:370
#46 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0x15b341fe0,
    argv=argv@entry=0x0, nargs=nargs@entry=0) at ../../libguile/vm.c:1605
#47 0x00007f85d3eef8d9 in scm_call_0 (proc=proc@entry=0x15b341fe0)
    at ../../libguile/eval.c:490
#48 0x00007f85d3f6c1aa in catch (tag=tag@entry=0x404, thunk=0x15b341fe0,
---Type <return> to continue, or q <return> to quit---
    handler=0x15b341fc0, pre_unwind_handler=0x15b341fa0) at ../../libguile/throw.c:146
#49 0x00007f85d3f6c505 in scm_catch_with_pre_unwind_handler (key=key@entry=0x404,
    thunk=<optimized out>, handler=<optimized out>,
    pre_unwind_handler=<optimized out>) at ../../libguile/throw.c:260
#50 0x00007f85d3f6c6bf in scm_c_catch (tag=tag@entry=0x404,
    body=body@entry=0x7f85d3eeb470 <c_body>,
    body_data=body_data@entry=0x7f7e85fe2d40,
    handler=handler@entry=0x7f85d3eeb720 <c_handler>,
    handler_data=handler_data@entry=0x7f7e85fe2d40,
    pre_unwind_handler=pre_unwind_handler@entry=0x7f85d3eeb580 <pre_unwind_handler>,
    pre_unwind_handler_data=0xe174a0) at ../../libguile/throw.c:385
#51 0x00007f85d3eeb9e3 in scm_i_with_continuation_barrier (
    body=body@entry=0x7f85d3eeb470 <c_body>,
    body_data=body_data@entry=0x7f7e85fe2d40,
    handler=handler@entry=0x7f85d3eeb720 <c_handler>,
    handler_data=handler_data@entry=0x7f7e85fe2d40,
    pre_unwind_handler=pre_unwind_handler@entry=0x7f85d3eeb580 <pre_unwind_handler>,
    pre_unwind_handler_data=0xe174a0) at ../../libguile/continuations.c:368
#52 0x00007f85d3eebac5 in scm_c_with_continuation_barrier (func=<optimized out>,
    data="" out>) at ../../libguile/continuations.c:464
#53 0x00007f85d3575127 in GC_call_with_gc_active (
    fn=fn@entry=0x7f85d3f6a070 <with_guile_trampoline>,
    client_data=client_data@entry=0x7f7e85fe2e20) at ../pthread_support.c:1343
#54 0x00007f85d3f6ac4f in with_guile (base=base@entry=0x7f7e85fe2df0,
    data="" at ../../libguile/threads.c:683
#55 0x00007f85d356f132 in GC_call_with_stack_base (
    fn=fn@entry=0x7f85d3f6abb0 <with_guile>, arg=arg@entry=0x7f7e85fe2e20)
    at ../misc.c:1941
#56 0x00007f85d3f6aff8 in scm_i_with_guile (dynamic_state=<optimized out>,
    data=""
    func=0x7f85c95d1290 <opencog::SchemeEval::c_wrap_eval(void*)>)
    at ../../libguile/threads.c:698
---Type <return> to continue, or q <return> to quit---
#57 scm_with_guile (
    func=func@entry=0x7f85c95d1290 <opencog::SchemeEval::c_wrap_eval(void*)>,
    data="" at ../../libguile/threads.c:704
#58 0x00007f85c95d126e in opencog::SchemeEval::eval_expr (this=0x7f7e60000980,
    expr=...) at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:479
#59 0x00007f85bc783439 in opencog::GenericShell::eval_loop (this=0x7f7ef0001e90)
    at /home/ubuntu/src/opencog/opencog/cogserver/shell/GenericShell.cc:588
#60 0x00007f85c6e5ac80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#61 0x00007f85d3c916ba in start_thread (arg=0x7f7e85fe3700) at pthread_create.c:333
#62 0x00007f85d39c141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb)



--
cassette tapes - analog TV - film cameras - you

reply via email to

[Prev in Thread] Current Thread [Next in Thread]