From: Mark Rutland
Subject: Re: [Qemu-devel] Kernel boot regression with PAuth and aarch64-softmmu -cpu max and el2 enabled
Date: Tue, 29 Jan 2019 11:46:09 +0000
User-agent: Mutt/1.11.1+11 (2f07cb52) (2018-12-01)

Hi,
[adding Kristina, who is in charge of Linux pointer authentication]
On Tue, Jan 29, 2019 at 11:08:19AM +0000, Alex Bennée wrote:
> Hi,
>
> Following up on yesterday's discussion on IRC I thought I'd better
> report on my findings in the permanent record so things don't get lost.
>
> As I tend to periodically rebuild my test kernels from the current
> state of linux.git I occasionally run into these things. My test
> invocation is:
>
> qemu-system-aarch64 -machine type=virt,virtualization=on \
> -display none -m 4096 -serial mon:stdio \
> -kernel ../../kernel-v8-plain.build/arch/arm64/boot/Image \
> -append 'console=ttyAMA0 panic=-1' -no-reboot -cpu max
>
> The kernel is essentially a defconfig kernel with a bunch of the VIRTIO
> device drivers built-in for when I actually boot a more complex setup
> with disks and drives. However this is a boot test so doesn't really
> matter.
>
> The -machine type=virt,virtualization=on enables our virt machine model
> with EL2 turned on. As there is no BIOS involved the kernel is invoked
> directly at EL2.
>
> The -cpu max enables a cortex-a57 + whatever extra features we've
> enabled in QEMU so far. It won't match any "real" CPU but it should be
> architecturally correct insofar as we implement prerequisite features for
> any given feature. The cpuid feature bits should also be correct as we
> test them internally in QEMU to enable features.
Just to check, does this enable VHE?
> The breakage is the kernel never boots (no output on serial port) and on
> attaching with gdb I found it stuck in:
>
> (gdb) bt
> #0 0xffffff8010a9e480 in overflow_stack ()
> Backtrace stopped: not enough registers or memory available to unwind further
>
> If I turn on exception tracing it looks like we go into an exception
> loop.
As mentioned on IRC, this looks very odd, since overflow_stack is a data
pointer, not code. I can't presently see how we could branch here.
If you pass the kernel 'earlycon keep_bootcon', do you get any output?
> On the QEMU side this breakage comes in at:
>
> commit 1ce32e47db52e3511132c7104770eae65d412144 (HEAD, refs/bisect/bad)
> Author: Richard Henderson <address@hidden>
> Date: Mon Jan 21 10:23:13 2019 +0000
>
> target/arm: Enable PAuth for -cpu max
>
> Reviewed-by: Peter Maydell <address@hidden>
> Signed-off-by: Richard Henderson <address@hidden>
> Message-id: address@hidden
> Signed-off-by: Peter Maydell <address@hidden>
>
> and as you would expect the system boots fine with -cpu cortex-a57
>
> On the kernel side it breaks at:
>
> commit 04ca3204fa09f5f55c8f113b0072004a7b364ff4
> Author: Mark Rutland <address@hidden>
> Date: Fri Dec 7 18:39:30 2018 +0000
>
> arm64: enable pointer authentication
>
> Now that all the necessary bits are in place for userspace, add the
> necessary Kconfig logic to allow this to be enabled.
>
> Signed-off-by: Mark Rutland <address@hidden>
> Signed-off-by: Kristina Martsenko <address@hidden>
> Cc: Catalin Marinas <address@hidden>
> Cc: Will Deacon <address@hidden>
> Signed-off-by: Will Deacon <address@hidden>
>
> So predictably we failed at enabling PAuth somewhere between the kernel
> and QEMU.
>
> I'm guessing the kernel so far has been tested on the fast model with a
> full chain of TF, UEFI and kernel?
The kernel has been tested on a fast model with the Linux bootwrapper:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/
Kristina, could you confirm whether or not it's been tested with
ATF+UEFI?
> I think Richard's tests were without EL2 enabled.
>
> So in the case that the kernel boots in EL2 is it expecting anyone else
> to deal with PAuth exceptions or should it be able to cope with an
> enabled PAuth but no firmware underneath it?
So long as the highest implemented exception level is EL2, the kernel
should handle that itself. During boot we'll configure HCR_EL2.{API,APK}
in el2_setup().
From that point onwards, there should be no traps for pointer
authentication functionality from EL1, AFAICT.
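
To make the two controls above concrete, here is a minimal sketch that checks
whether a given HCR_EL2 value has API and APK set. The bit positions (API at
bit 41, APK at bit 40, RW at bit 31) are taken from the Arm architecture; the
helper name and the sample values are hypothetical and only for illustration.
This is not the kernel's el2_setup() code.

```python
# Bit positions of the pointer-authentication trap controls in HCR_EL2.
HCR_RW = 1 << 31   # EL1 executes in AArch64 state
HCR_APK = 1 << 40  # when set, PAuth key register accesses are not trapped to EL2
HCR_API = 1 << 41  # when set, PAuth instructions are not trapped to EL2


def pauth_traps_disabled(hcr_el2: int) -> bool:
    """Return True if both HCR_EL2.API and HCR_EL2.APK are set.

    Hypothetical helper: it only inspects these two bits and ignores
    every other field of the register.
    """
    wanted = HCR_API | HCR_APK
    return (hcr_el2 & wanted) == wanted


if __name__ == "__main__":
    # Illustrative values, not read from any real system.
    print(pauth_traps_disabled(HCR_RW))
    print(pauth_traps_disabled(HCR_RW | HCR_API | HCR_APK))
```

If a register dump taken after early boot shows either bit clear, that would
point at traps to EL2 rather than at the PAuth instructions themselves.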
> Either we've got something wrong or we'll need to rethink what features
> the user can have enabled by -cpu max on a direct kernel boot.
It's not immediately clear to me when precisely things are going wrong,
so I think we need to narrow that down first. For example, it's not
clear whether a trap is being taken, or something is unexpectedly
behaving as UNDEF.
Is it possible to watch the exception vectors to see if/when an
exception is taken, and from where?
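
One way to tell a trap from an UNDEF once an exception has been caught is to
look at the Exception Class (EC) field of the reported ESR value. The sketch
below decodes ESR_ELx bits [31:26] against a small, deliberately incomplete
table of EC encodings from the Arm architecture. The function names and the
sample ESR values are hypothetical; it assumes the raw ESR is available from
whatever exception tracing is in use (for instance QEMU's -d int logging).

```python
# The Exception Class (EC) occupies ESR_ELx bits [31:26].
EC_SHIFT = 26
EC_MASK = 0x3F

# Incomplete subset of EC encodings, enough to separate the cases
# discussed in this thread.
EC_NAMES = {
    0x00: "unknown reason (includes UNDEF instructions)",
    0x09: "trapped pointer authentication instruction",
    0x16: "HVC from AArch64",
    0x18: "trapped MSR/MRS/system instruction",
    0x21: "instruction abort, same EL",
    0x25: "data abort, same EL",
}


def exception_class(esr: int) -> int:
    """Extract the EC field from a raw ESR_ELx value."""
    return (esr >> EC_SHIFT) & EC_MASK


def describe_esr(esr: int) -> str:
    """Return a short description of the EC, or a note if it is not listed."""
    ec = exception_class(esr)
    return EC_NAMES.get(ec, f"EC {ec:#04x} (not in this table)")


if __name__ == "__main__":
    # Illustrative ESR values, not taken from the failing boot.
    for esr in (0x02000000, 0x26000000):
        print(f"{esr:#010x}: {describe_esr(esr)}")
```

An EC of 0x09 would suggest HCR_EL2.API was still clear when a PAuth
instruction ran, whereas an EC of 0x00 would suggest the instruction was
treated as UNDEF.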
Thanks,
Mark.