qemu-devel

Re: [Qemu-devel] qemu-system-ppc hangs


From: Richard Purdie
Subject: Re: [Qemu-devel] qemu-system-ppc hangs
Date: Tue, 21 Nov 2017 09:02:25 +0000

On Tue, 2017-11-21 at 07:50 +0000, Mark Cave-Ayland wrote:
> On 21/11/17 00:00, Richard Purdie wrote:
> > I work on the Yocto Project and we use qemu to test boot our Linux
> > images and run tests against them. We've been noticing some
> > instability
> > for ppc where the images sometimes hang, usually around udevd bring
> > up
> > time so just after booting into userspace.
> > 
> > To cut a long story short, I've tracked down what I think is the
> > problem. I believe the decrementer timer stops receiving interrupts
> > so
> > tasks in our images hang indefinitely as the timer has stopped. 
> > 
> > It can be summed up with this line of debug:
> > 
> > ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100req
> > 00000004
> > 
> > It should normally read:
> > 
> > ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100req
> > 00000002
> > 
> > The question is why CPU_INTERRUPT_EXITTB ends up being set when the
> > lines above this log message clearly sets CPU_INTERRUPT_HARD (via 
> > cpu_interrupt() ).
> > 
> > I note in cpu.h:
> > 
> >     /* updates protected by BQL */
> >     uint32_t interrupt_request;
> > 
> > (for struct CPUState)
> > 
> > The ppc code does "cs->interrupt_request |= CPU_INTERRUPT_EXITTB"
> > in 5
> > places, 3 in excp_helper.c and 2 in helper_regs.h. In all cases,  
> > g_assert(qemu_mutex_iothread_locked()); fails. If I do something
> > like:
> > 
> > if (!qemu_mutex_iothread_locked()) {
> >     qemu_mutex_lock_iothread();
> >     cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
> >     qemu_mutex_unlock_iothread();
> > } else {
> >     cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
> > }
> > 
> > in these call sites then I can no longer lock qemu up with my test
> > case.
> > 
> > I suspect the _HARD setting gets overwritten which stops the 
> > decrementer interrupts being delivered.
> > 
> > I don't know if taking this lock in these situations is going to be
> > bad
> > for performance and whether such a patch would be right/wrong.
> > 
> > At this point I therefore wanted to seek advice on what the real
> > issue
> > is here and how to fix it!
>
> Thanks for the report - given that a lot of work has been done on
> MTTCG and iothread over the past few releases, it wouldn't be a
> complete surprise if something had crept in here.
> 
> Firstly let's start off with some basics: what is your host
> architecture, QEMU version and full command line being used to launch
> QEMU?

I'm running this on x86_64, using qemu 2.10.1, and the command line
being used for qemu is:

qemu-system-ppc -device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:02 
   -netdev tap,id=net0,ifname=tap0,script=no,downscript=no 
   -drive file=./core-image-sato-qemuppc.rootfs.ext4,if=virtio,format=raw 
   -show-cursor -device virtio-rng-pci  -nographic -pidfile /tmp/zzqemu.1.pid 
   -d unimp,guest_errors,int -D /tmp/qemu.1 -monitor pty -machine mac99 
   -cpu G4 -m 256 -snapshot -serial mon:stdio -serial null 
   -kernel /tmp/repro/vmlinux-qemuppc.bin 
   -append 'root=/dev/vda rw highres=off console=ttyS0 mem=256M 
    ip=192.168.7.2::192.168.7.1:255.255.255.0 console=tty console=ttyS0 
    udev.log-priority=debug powersave=off'

> Would it also be possible for you to make your test image available
> for other people to see if they can recreate the same issue?

I've shared the image, kernel and my "reproducer" script: 

http://www.rpsys.net/wp/rp/qemuppc-hang-reproducer.tgz

This doesn't seem to happen often, so we needed a way to reproduce it
at will. The scripts in there are partly extracted from our test setup
and partly ugly brute forcing. To run them, you'd do something like:

cc tunctl.c -o tunctl
sudo ./runqemu-gen-tapdevs 1000 1000 50
(This sets up tap0-tap49 accessible by user/group 1000/1000; you only
need to do this once - it's how we enable easy networking without
needing sudo on our test infrastructure)

vi core-image-sato-qemuppc.qemuboot.conf
[set the last three lines to point at where qemu-system-ppc lives]
vi ./runqemu-parallel.py
[set mydir to wherever you extracted it to]
python3 ./runqemu-parallel.py

This will launch 50 copies of qemu, dumping logging and output into
/tmp/qemu.X and /tmp/*runqemu* files, and then monitor the logs to see
which, if any, "stall". It's normal for the image to stall for a few
seconds towards the end of boot, but if any are printing stalled
messages for a minute, they've properly hung. You'll see a:

ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100req
00000004

in the logs of a hung qemu. The image output usually stops with an
ep_poll4. The kernel I've provided there is a very verbose debug
kernel, which makes it easy to tell when it's hung, if a bit slower to
boot.

I didn't promise it was neat, sorry :)

Cheers,

Richard



