Re: [Qemu-devel] State of ARM FIQ in Qemu

On 12 November 2014 07:56, Tim Sander <address@hidden> wrote:

Hi Greg

> > Bad mode in data abort handler detected
> > Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> > Modules linked in: firq(O) ipv6
> > CPU: 0 PID: 103 Comm: systemd-udevd Tainted: G O 3.14.0 #1
> > task: bf2b9300 ti: bf362000 task.ti: bf362000
> > PC is at 0xffff1240
> > LR is at handle_fasteoi_irq+0x9c/0x13c
> > pc : [<ffff1240>] lr : [<8005cda0>] psr: 600f01d1
> > sp : bf363e70 ip : 07a7e79d fp : 00000000
> > r10: 76f92008 r9 : 80590080 r8 : 76e8e4d0
> > r7 : f8200100 r6 : bf363fb0 r5 : bf008414 r4 : bf0083c0
> > r3 : 80230d04 r2 : 0000002f r1 : 00000000 r0 : bf0083c0
> > Flags: nZCv IRQs off FIQs off Mode FIQ_32 ISA ARM Segment user
>
> It looks like we are in FIQ mode and interrupts have been masked.
Indeed.
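For reference, the psr value in the dump (600f01d1) can be decoded by hand. The mode field (bits 4:0) is 0x11, which is FIQ mode. The I (bit 7) and F (bit 6) mask bits are both set, and the top nibble 0x6 gives the nZCv flags. On ARMv7, taking an FIQ sets both CPSR.I and CPSR.F, so "IRQs off FIQs off" is what you would expect inside an FIQ handler. Below is a minimal C sketch of that decoding. It covers only the ARMv7-A AArch32 mode encodings, and the helper names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Name of the CPSR.M[4:0] mode field (ARMv7-A AArch32 encodings). */
static const char *arm_mode_name(uint32_t psr)
{
    switch (psr & 0x1f) {
    case 0x10: return "USR_32";
    case 0x11: return "FIQ_32";
    case 0x12: return "IRQ_32";
    case 0x13: return "SVC_32";
    case 0x16: return "MON_32";
    case 0x17: return "ABT_32";
    case 0x1a: return "HYP_32";
    case 0x1b: return "UND_32";
    case 0x1f: return "SYS_32";
    default:   return "unknown";
    }
}

/* CPSR.I (bit 7) masks IRQs, CPSR.F (bit 6) masks FIQs. */
static int irqs_masked(uint32_t psr) { return (psr >> 7) & 1; }
static int fiqs_masked(uint32_t psr) { return (psr >> 6) & 1; }

/* N, Z, C, V are bits 31..28; upper case means "set", as in the
 * kernel's "nZCv" notation. */
static void arm_flags_string(uint32_t psr, char out[5])
{
    out[0] = (psr & (1u << 31)) ? 'N' : 'n';
    out[1] = (psr & (1u << 30)) ? 'Z' : 'z';
    out[2] = (psr & (1u << 29)) ? 'C' : 'c';
    out[3] = (psr & (1u << 28)) ? 'V' : 'v';
    out[4] = '\0';
}
```

Fed 0x600f01d1, this yields "nZCv", FIQ_32, and both mask bits set, matching the Flags line in the oops.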

> > Control: 10c53c7d Table: 60004059 DAC: 00000015
> > Process systemd-udevd (pid: 103, stack limit = 0xbf362240)
> > Stack: (0xbf363e70 to 0xbf364000)
> > 3e60: bf0083c0 00000000 0000002f 80230d04
> > 3e80: bf0083c0 bf008414 bf363fb0 f8200100 76e8e4d0 80590080 76f92008 00000000
> > 3ea0: 07a7e79d bf363e70 8005cda0 ffff1240 600f01d1 ffffffff 8005cd04 0000002f
> > 3ec0: 0000002f 800598bc 8058cc70 8000ed00 f820010c 8059684c bf363ef8 80008528
> > 3ee0: 80023730 80023744 200f0113 ffffffff bf363f2c 80012180 00000000 805baa00
> > 3f00: 00000000 00000100 00000002 00000022 00000000 bf362000 76e8e4d0 80590080
> > 3f20: 76f92008 00000000 0000000a bf363f40 80023730 80023744 200f0113 ffffffff
> > 3f40: bf007a14 8059ac00 00000000 0000000a ffff8dd7 00400140 bf0079c0 8058cc70
> > 3f60: 00000022 00000000 f8200100 76e8e4d0 76f9201c 76f92008 00000000 80023af0
> > 3f80: 8058cc70 8000ed04 f820010c 8059684c bf363fb0 80008528 00000000 76dd3b44
> > 3fa0: 600f0010 ffffffff 0000000c 8001233c 00000000 00000000 76f93428 76f93428
> > 3fc0: 76f93438 00000000 76f93448 0000000c 76e8e4d0 76f9201c 76f92008 00000000
> > 3fe0: 00000000 7ec115c0 76f60914 76dd3b44 600f0010 ffffffff 9fffd821 9fffdc21
> > [<8005cda0>] (handle_fasteoi_irq) from [<80230d04>] (gic_eoi_irq+0x0/0x4c)
>
> It certainly looks like we are going down the standard IRQ path as you
> suggested. I'm not a Linux driver guy, but do you see any kind of activity
> (break points, printfs, ...) through your FIQ handler?

I am reaching 0xffff1224, which I believe is the FIQ vector address on the vexpress?

Hmm... not sure. As you mentioned previously (and as seen in the above register dump), I would expect offset 0x1240 (pc=0xffff1240) for an FIQ. I'm not sure what is at offset 0x1224, but on my Linux kernel offset 0x1220 appears to be vector_addrexcptn (not pabort), which happens to occupy the HYP trap vector slot.

> > [<80230d04>] (gic_eoi_irq) from [<f8200100>] (0xf8200100)
> > Code: ee02af10 f57ff06f e59d8000 e59d9004 (e599b00c)
> > ---[ end trace 3dc3571209a017e1 ]---
> > Kernel panic - not syncing: Fatal exception in interrupt
>
> It is hard to determine entirely what is happening here based on this
> info. I do have code of my own that routes KGDB interrupts as FIQs and
> with the workaround I see the FIQs handled as expected. Some things we can
> try to get more info in hopes of pinpointing where to look:
>
> 1. At the top of hw/intc/arm_gic.c there is the following commented out
> line:
> //#define DEBUG_GIC
> Uncomment the line, rebuild and rerun. This will give us some trace on
> what is going through the GIC code.
I have commented out some debug lines, but I see:
Breakpoint 1, gic_update_with_grouping (s=0x5555564dba80) at hw/intc/arm_gic.c:120
120 DPRINTF("Raised pending FIQ %d (cpu %d)\n", best_irq, cpu);

With the expected IRQ number 49 (32 + 17).

> 2. Run qemu with the "-d int" option which will print a message on each
> interrupt. We should see an FIQ at some point if they are occurring. The
> only issue is that there will be numerous IRQs, so you'll have to parse
> through them to find an "exception 6 [FIQ]".
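Picking the FIQ out of a long "-d int" log can be done with a small filter along these lines. This is a sketch that assumes the log has already been split into lines. It relies only on the "Taking exception N [NAME]" format visible in the output quoted in this thread (2 = SVC, 3 = prefetch abort, 4 = data abort, 6 = FIQ). The function names and the table size are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed upper bound on exception numbers; covers the values seen
 * in this log (2, 3, 4, 6) with room to spare. */
enum { MAX_EXCP = 16 };

/* Return N from a "Taking exception N [NAME]" line, or -1 otherwise. */
static int parse_exception(const char *line)
{
    int n;

    if (sscanf(line, "Taking exception %d", &n) == 1) {
        return n;
    }
    return -1;
}

/* Tally exceptions by number over a NULL-terminated array of lines. */
static void count_exceptions(const char *const *lines, int counts[MAX_EXCP])
{
    assert(lines != NULL);
    memset(counts, 0, sizeof(int) * MAX_EXCP);
    for (; *lines != NULL; lines++) {
        int n = parse_exception(*lines);

        if (n >= 0 && n < MAX_EXCP) {
            counts[n]++;
        }
    }
}
```

A nonzero counts[6] confirms that the CPU took at least one FIQ. The lines right after it show what happened next.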
Here is the relevant output when the FIQ hits:
Taking exception 2 [SVC]
Taking exception 2 [SVC]
pml: pml_timer_tick: raise_irq
arm_gic: Raised pending FIQ 49 (cpu 0)
Taking exception 6 [FIQ]

This looks to me like the GIC has caught the interrupt and communicated it to the CPU causing it to take the FIQ exception.

pml: pml_write: update control flags: 1
pml: pml_update: start timer
pml: pml_update: lower irq
pml: pml_read: read magic
pml: pml_write: update control flags: 3
pml: pml_update: start timer

Is pml your test driver? It looks like it initiates the interrupt and possibly does some handling afterwards.

Taking exception 3 [Prefetch Abort]
...with IFSR 0x5 IFAR 0x80221d70
Taking exception 4 [Data Abort]
...with DFSR 0x805 DFAR 0x805c604c
Taking exception 4 [Data Abort]
...with DFSR 0x805 DFAR 0x805c604c
Taking exception 4 [Data Abort]

So the FIQ is hitting, but unfortunately I have no idea where the data aborts are coming from.

The data aborts are likely a side effect of the preceding prefetch abort, which is the more interesting one.
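Decoding the two fault status values may help narrow this down. In the ARMv7-A short-descriptor format, IFSR 0x5 and DFSR 0x805 both have FS = 0b00101, which is a translation fault on a section mapping. DFSR bit 11 (WnR) marks the data abort as a write. The partial decoder below covers only the common status codes, and its function names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* FS[4] is FSR bit 10 and FS[3:0] are bits 3..0 (short-descriptor). */
static unsigned fsr_status(uint32_t fsr)
{
    return ((fsr >> 6) & 0x10) | (fsr & 0xf);
}

/* DFSR bit 11 (WnR): 1 if the faulting access was a write. */
static int dfsr_is_write(uint32_t dfsr)
{
    return (dfsr >> 11) & 1;
}

static const char *fsr_describe(uint32_t fsr)
{
    /* Bit 9 set means the long-descriptor (LPAE) format, not handled here. */
    if (fsr & (1u << 9)) {
        return "long-descriptor format (not decoded)";
    }
    switch (fsr_status(fsr)) {
    case 0x01: return "alignment fault";
    case 0x02: return "debug event";
    case 0x03: return "access flag fault (section)";
    case 0x05: return "translation fault (section)";
    case 0x06: return "access flag fault (page)";
    case 0x07: return "translation fault (page)";
    case 0x08: return "synchronous external abort";
    case 0x09: return "domain fault (section)";
    case 0x0b: return "domain fault (page)";
    case 0x0d: return "permission fault (section)";
    case 0x0f: return "permission fault (page)";
    default:   return "other (see the ARM ARM)";
    }
}
```

Both aborts therefore report a missing translation rather than a permission or domain problem. That is worth keeping in mind for a fault on a kernel text address such as IFAR 0x80221d70.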

I have moved all IRQs other than 49 to Group 1 so that only IRQ 49 is an FIQ.
Might I be seeing some security violations?
The IFAR address lies in __idr_pre_get, which lives in lib/idr.c in the Linux kernel and
seems to implement integer ID management.
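When checking the grouping setup from the guest side, you can work out which register and bit hold IRQ 49's group from the GICv2 distributor layout. Each GICD_IGROUPRn register sits at offset 0x080 + 4n and holds one bit per interrupt ID, where 1 means Group 1. IDs 0-31 live in GICD_IGROUPR0, which is banked per CPU. The sketch below is based on the GICv2 architecture specification, and its helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define GICD_IGROUPR_BASE 0x080u  /* distributor offset of GICD_IGROUPR0 */

/* Distributor offset of the GICD_IGROUPRn word that holds `irq`. */
static uint32_t igroupr_offset(unsigned irq)
{
    assert(irq < 1020);  /* IDs 1020-1023 are special, not real interrupts */
    return GICD_IGROUPR_BASE + 4u * (irq / 32u);
}

/* Bit for `irq` within its GICD_IGROUPRn word. */
static uint32_t igroupr_bit(unsigned irq)
{
    return 1u << (irq % 32u);
}

/* Word value that leaves `fiq_irq` in Group 0 (FIQ when GICC_CTLR.FIQEn
 * is set) and puts every other ID in the same word into Group 1. */
static uint32_t igroupr_only_group0(unsigned fiq_irq)
{
    return ~igroupr_bit(fiq_irq);
}
```

For IRQ 49 this gives offset 0x084 and bit 17. Writing 0xfffdffff to that word leaves only 49 in Group 0 among IDs 32-63.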

> 3. If you set a breakpoint in your driver, can you see from the kernel
> debugger that FIQs are on? Clearly you have to try this from
> a path where interrupts are masked. I see the following on my system
> mentioned above:
> ...
> Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
> ...
Do you mean debugging via the QEMU debug port? I have not enabled KGDB.
As stated above, I was not able to catch the FIQ there. But it might be that I get

I have debugged QEMU to see whether the IRQ is routed correctly. The deepest call I could find is this (gdb bt):
#0 tcg_handle_interrupt (cpu=0x555556450790, mask=16) at /home/sander/speedy/soc/qemu/translate-all.c:1503
#1 0x0000555555755323 in cpu_interrupt (cpu=0x555556450790, mask=16)
at /home/sander/speedy/soc/qemu/include/qom/cpu.h:556
#2 0x00005555557561b7 in arm_cpu_set_irq (opaque=0x555556450790, irq=1, level=1)
at /home/sander/speedy/soc/qemu/target-arm/cpu.c:261
#3 0x00005555558193ec in qemu_set_irq (irq=0x55555642c840, level=1) at hw/core/irq.c:43
#4 0x0000555555879073 in gic_update_with_grouping (s=0x5555564dba80) at hw/intc/arm_gic.c:132
#5 0x000055555587936d in gic_update (s=0x5555564dba80) at hw/intc/arm_gic.c:180
#6 0x00005555558798a7 in gic_set_irq (opaque=0x5555564dba80, irq=49, level=1) at hw/intc/arm_gic.c:264
#7 0x00005555558193ec in qemu_set_irq (irq=0x555556432b00, level=1) at hw/core/irq.c:43
#8 0x0000555555661d4d in a9mp_priv_set_irq (opaque=0x5555564d7260, irq=17, level=1)
at /home/sander/speedy/soc/qemu/hw/cpu/a9mpcore.c:17
#9 0x00005555558193ec in qemu_set_irq (irq=0x5555564f3c00, level=1) at hw/core/irq.c:43
#10 0x00005555558f6fed in qemu_irq_raise (irq=0x5555564f3c00) at /home/sander/speedy/soc/qemu/include/hw/irq.h:16
#11 0x00005555558f7363 in pml_timer_tick (opaque=0x555556595020) at hw/timer/pml.c:95
#12 0x000055555599be6e in aio_bh_poll (ctx=0x5555563fdad0) at async.c:82
#13 0x00005555559b2d9f in aio_dispatch (ctx=0x5555563fdad0) at aio-posix.c:137
#14 0x000055555599c2cb in aio_ctx_dispatch (source=0x5555563fdad0, callback=0x0, user_data=0x0) at async.c:221
#15 0x00007ffff7901e04 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#16 0x00005555559b0a79 in glib_pollfds_poll () at main-loop.c:200
#17 0x00005555559b0b7a in os_host_main_loop_wait (timeout=0) at main-loop.c:245
#18 0x00005555559b0c52 in main_loop_wait (nonblocking=1) at main-loop.c:494
#19 0x0000555555791d8b in main_loop () at vl.c:1872
#20 0x00005555557998d5 in main (argc=22, argv=0x7fffffffda38, envp=0x7fffffffdaf0) at vl.c:4348

I am not sure whether arm_cpu_set_irq(opaque=0x555556450790, irq=1, level=1) represents an FIQ
and whether mask 16 is the correct mask for the FIQ request.

Yes, this routine handles both IRQs and FIQs. I don't see anything above that stands out as suspicious. It may be interesting to try the same test driver on an A15 emulation if it is not too much trouble. That would rule out the possibility that the A9 workaround is insufficient, since the A15 has a true GICv2.
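The irq=1 / mask=16 pair in frames #2 and #1 can be checked against a simplified model of arm_cpu_set_irq(). The constants are what I recall from QEMU headers of this era, which is an assumption worth verifying against your tree. Under that assumption, target-arm/cpu.h has ARM_CPU_IRQ 0 and ARM_CPU_FIQ 1, CPU_INTERRUPT_FIQ is CPU_INTERRUPT_TGT_EXT_1 (0x0010), and ordinary IRQs use CPU_INTERRUPT_HARD (0x0002). The helper functions are hypothetical:

```c
#include <assert.h>

/* Assumed values; check target-arm/cpu.h and include/qom/cpu.h. */
#define ARM_CPU_IRQ        0
#define ARM_CPU_FIQ        1
#define CPU_INTERRUPT_HARD 0x0002
#define CPU_INTERRUPT_FIQ  0x0010

/* Which interrupt_request bit a given CPU input line drives;
 * 0 for lines this model does not know about. */
static int arm_line_to_mask(int irq)
{
    switch (irq) {
    case ARM_CPU_IRQ: return CPU_INTERRUPT_HARD;
    case ARM_CPU_FIQ: return CPU_INTERRUPT_FIQ;
    default:          return 0;
    }
}

/* Raising a line sets its bit (as cpu_interrupt() does); lowering it
 * clears the bit (as cpu_reset_interrupt() does). */
static int set_line(int pending, int irq, int level)
{
    int mask = arm_line_to_mask(irq);

    return level ? (pending | mask) : (pending & ~mask);
}
```

In QEMU itself, cpu_interrupt() also kicks the vCPU, which is the tcg_handle_interrupt() call in frame #0.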

Frame #6 clearly shows that IRQ 49, configured as Group 0, is triggered. All other interrupts are configured as Group 1
by my Linux kernel. Frame #4 (gic_update_with_grouping) shows that grouping is enabled within the GIC
and that the IRQ is raised as an FIQ within QEMU. All of this looks good as far as I understand, so I am fairly confident
that QEMU is working correctly (apart from the prefetch and data aborts).
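The numbering and routing in frames #8 to #4 can be modelled in a few lines. This sketch assumes GICv2 CPU-interface semantics: with GICC_CTLR.FIQEn set, Group 0 is signalled as FIQ and Group 1 as IRQ, and with FIQEn clear everything goes out as IRQ. It does not reproduce the actual gic_update_with_grouping() code, and the helper names are made up:

```c
#include <assert.h>
#include <stdbool.h>

/* IDs 0-31 are the per-CPU SGIs and PPIs (GIC_INTERNAL in QEMU). */
#define GIC_INTERNAL 32

/* External input n of the A9 MPCore becomes GIC interrupt ID n + 32. */
static int a9_input_to_gic_id(int input)
{
    assert(input >= 0);
    return input + GIC_INTERNAL;
}

enum cpu_line { LINE_IRQ, LINE_FIQ };

/* CPU output line used for a pending interrupt of the given group. */
static enum cpu_line gic_signal_line(int group, bool fiq_en)
{
    return (group == 0 && fiq_en) ? LINE_FIQ : LINE_IRQ;
}
```

Under this model, external input 17 (frame #8) becomes GIC ID 49 (frame #6). Because it is in Group 0 with FIQEn set, it goes out on the FIQ line, consistent with arm_cpu_set_irq(irq=1) in frame #2.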

I agree that QEMU appears to be handling the FIQ properly and that the CPU is trying to dispatch it. I understand that FIQ handling in Linux is a little trickier than IRQ handling, so I suspect that something in the Linux kernel's handling or in your driver is going awry, either during the FIQ or as a result of it.

Let me know if you need any additional help or you discover any misbehavior.

Best regards
Tim

Regards,

Greg

From:	Greg Bellows
Subject:	Re: [Qemu-devel] State of ARM FIQ in Qemu
Date:	Wed, 12 Nov 2014 10:00:03 -0600