System getting stuck by GCC invocation

From: Thomas Schwinge
Subject: System getting stuck by GCC invocation
Date: Thu, 25 Sep 2014 08:13:57 +0200
User-agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/24.3.1 (x86_64-pc-linux-gnu)


Here is a test case that gets a Hurd system "stuck".  (But I don't know
yet what "stuck" exactly means.)  Unfortunately, the test case is a
biggie: GCC.  The steps to reproduce it, reduced as much as I could:

Fetch GCC trunk, for example retrieve the following snapshot:
or a Subversion or Git checkout: r215404 or commit
20a52496a54ae8916e2bfc4d38aec95bb3592242, respectively, is what I've been
using.

    $ mkdir trunk.build && cd trunk.build/
    $ ../trunk/configure --prefix=$PWD.install --enable-languages=java \
        --disable-bootstrap --with-native-system-header-dir=/usr/include \
        --enable-multiarch --enable-link-mutex --disable-lto --disable-libcilkrts
    $ make -j2 configure-target-libjava
    [Takes some time, produces more than 1 GiB of "stuff".]
    $ make -C i686-unknown-gnu0.5/libjava/classpath/lib/ classes
    $ \time make -C i686-unknown-gnu0.5/libjava/ gnu/java/nio/charset.lo
    /bin/sh ./libtool --tag=GCJ   --mode=compile [...]
    libtool: compile:  /home/thomas/tmp/gcc/trunk.build/./gcc/gcj [...] -o 

The last one, the gcj invocation, is the command to bring down the
system.  You can further decompose that one: cd to
i686-unknown-gnu0.5/libjava/, and in there run the »[...]/gcj [...]«
command with »-v -save-temps« added, then you'll see:

     /home/thomas/tmp/gcc/trunk.build/./gcc/jc1 /tmp/cc[...] -o cc[...].s
    GNU Java (GCC) version 5.0.0 20140919 [...]

That jc1 invocation (which does not fork further, so can be rpctraced and
all that) basically takes a number of Java class files, and writes those
into cc[...].s, 160 MiB of "assembler" source code.  (When running under
rpctrace, this can be observed, and the process terminates normally.)  If
the system doesn't get stuck before, this assembler source file will then
be further processed into an object file of 24 MiB.  Running that last
»\time make [...]« command line on x86 Debian GNU/Linux:

    74.77user 2.58system 1:22.38elapsed 93%CPU (0avgtext+0avgdata 
    152inputs+375640outputs (0major+206659minor)pagefaults 0swaps

So, that had an RSS of about 500 MiB as well as considerable CPU time
usage.  On the Hurd we additionally have to deal with the memory
management for the many out-of-line RPC buffers for the data that is
being sent between the processes.
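
As a rough cross-check of the figures above: assuming the "outputs" field
printed by »\time« counts 512-byte blocks (that is how Linux reports
ru_oublock; it is an assumption about the measurement, not something the
output itself states), 375640 outputs come to roughly 183 MiB written,
which fits the 160 MiB assembler file plus the 24 MiB object file.  The
helper name blocks_to_mib is made up for illustration:

```shell
#!/bin/sh
# Convert a count of 512-byte blocks to whole MiB (rounded down).
# Assumption: the "outputs" number from \time is in 512-byte units.
blocks_to_mib() {
    echo $(( $1 * 512 / 1048576 ))
}

# The "outputs" value from the GNU/Linux run above.
blocks_to_mib 375640
```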

Running »vmstat 1« in parallel, one can see the free memory decrease and
the inactive memory increase.  When the system gets stuck, there is still
plenty of free memory available (for example, 780 MiB), and vmstat keeps
updating, though the deltas between the values it prints drop to nearly
zero.  By that point the gcj process is already stuck and can't be
SIGINTed, no further shells can be spawned, and so on.
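
One way to make the shrinking deltas easier to spot is to print the
differences between successive samples instead of the raw values.  A
minimal sketch, assuming only that each data row of the »vmstat 1« output
is a whitespace-separated list of plain integers; header lines and rows
containing values with unit suffixes are skipped, so the latter would need
extra handling.  The function name row_deltas is made up:

```shell
#!/bin/sh
# Print per-column differences between consecutive all-numeric rows on stdin.
row_deltas() {
    awk '
        # Skip any row that contains a field that is not a plain integer.
        {
            for (i = 1; i <= NF; i++)
                if ($i !~ /^[0-9]+$/) next
        }
        # Skip empty rows.
        NF == 0 { next }
        # From the second numeric row on, print current minus previous.
        have_prev {
            for (i = 1; i <= NF; i++)
                printf "%s%d", (i > 1 ? " " : ""), $i - prev[i]
            printf "\n"
        }
        # Remember this row for the next comparison.
        {
            for (i = 1; i <= NF; i++) prev[i] = $i
            have_prev = 1
        }
    '
}
```

This would be used as »vmstat 1 | row_deltas«; rows of all zeros then
correspond to the "next to zero" state described above.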

This testing was done in a QEMU/KVM Hurd system, configured with 1536 MiB
of RAM.  Some time after beginning to execute the jc1 command, the system
stops responding, but still consumes 100 % host CPU time.  And, if QEMU's
built-in gdbserver is to be believed, it is still doing "something":

    (gdb) target remote :1234
    Remote debugging using :1234
    0x801094d7 in spl0 () at ../i386/i386/spl.S:106
    106             SETMASK()                       /* program PICs with new mask */
    (gdb) bt
    #0  0x801094d7 in spl0 () at ../i386/i386/spl.S:106
    #1  0x00000007 in ?? ()
    #2  0x80127880 in thread_depress_abort (thread=0xe19eabe0) at 
    #3  0x801278c7 in thread_depress_abort (thread=<optimized out>) at 
    #4  swtch_pri_continue () at ../kern/syscall_subr.c:103
    #5  0x00000000 in ?? ()
    (gdb) c
    Program received signal SIGINT, Interrupt.
    spl0 () at ../i386/i386/spl.S:108
    108             sti                             /* enable interrupts */
    (gdb) bt
    #0  spl0 () at ../i386/i386/spl.S:108
    #1  0x00000007 in ?? ()
    #2  0x801279ef in thread_depress_priority (thread=0xe19eabe0, depress_time=10) at ../kern/syscall_subr.c:321
    #3  0x80127a39 in swtch_pri (pri=0) at ../kern/syscall_subr.c:131
    #4  0x801083a8 in mach_call_call () at ../i386/i386/locore.S:1101
    #5  0x00000000 in ?? ()
    (gdb) c
    Program received signal SIGINT, Interrupt.
    db_load_context (pcb=0xe19b83c0) at ../i386/i386/db_interface.c:80
    80              set_dr3(pcb->ims.ids.dr[3]);
    (gdb) bt
    #0  db_load_context (pcb=0xe19b83c0) at ../i386/i386/db_interface.c:80
    #1  0x8010881d in switch_ktss (pcb=0xe19b83c0) at ../i386/i386/pcb.c:219
    #2  0x80108921 in stack_handoff (old=0xe19eabe0, new=0xe19b7000) at 
    #3  0x80126826 in thread_invoke (old_thread=0xe19eabe0, continuation=0x801278b0 <swtch_pri_continue>, new_thread=0xe19b7000) at 
    #4  0x80126e64 in thread_block (continuation=0x801278b0 <swtch_pri_continue>) at ../kern/sched_prim.c:890
    #5  0x80127a45 in swtch_pri (pri=0) at ../kern/syscall_subr.c:134
    #6  0x801083a8 in mach_call_call () at ../i386/i386/locore.S:1101
    #7  0x00000000 in ?? ()

..., and so on.
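
To get a rough picture of where the kernel keeps ending up, the
interrupt-and-backtrace cycle above can be repeated a number of times and
the innermost frames tallied.  A minimal sketch, assuming the gdb output
has been saved to a log file; it only understands the two shapes of »#0«
lines seen above (with and without a leading address), and the function
name top_frames is made up:

```shell
#!/bin/sh
# Count how often each function appears as frame #0 in saved gdb backtraces.
# Reads the files given as arguments, or stdin if there are none.
top_frames() {
    awk '
        $1 == "#0" {
            # "#0  0xADDR in func (...)" versus "#0  func (...)"
            name = ($3 == "in") ? $4 : $2
            count[name]++
        }
        END {
            for (name in count) print count[name], name
        }
    ' "$@" | sort -k1,1nr -k2,2
}
```

Fed the three backtraces above, this reports spl0 twice and
db_load_context once, which is far too small a sample to conclude much
from.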

Any recommendations about how to continue debugging this issue?

