Re: [PATCH 00/41] arm: Implement GICv4

From: Peter Maydell
Subject: Re: [PATCH 00/41] arm: Implement GICv4
Date: Fri, 8 Apr 2022 15:29:56 +0100

On Fri, 8 Apr 2022 at 15:15, Peter Maydell <peter.maydell@linaro.org> wrote:
> This patchset implements emulation of GICv4 in our TCG GIC and ITS
> models, and makes the virt board use it where appropriate.

> Tested with a Linux kernel passing through a virtio-blk device
> to an inner Linux VM with KVM/QEMU. (NB that to get the outer
> Linux kernel to actually use the new GICv4 functionality you
> need to pass it "kvm-arm.vgic_v4_enable=1", as the kernel
> will not use it by default.)
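
One way to confirm from inside the L1 guest that the option really reached the kernel is to look at the command line it booted with. The following is only a sketch: has_kernel_param is a hypothetical helper name, and its optional second argument exists purely so it can be pointed at a file other than /proc/cmdline.

```shell
# has_kernel_param PARAM [FILE]
# Succeeds if PARAM appears as a whole word in FILE (default /proc/cmdline).
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Split the command line into one word per line, then match exactly.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}
```

For example, "has_kernel_param kvm-arm.vgic_v4_enable=1 && echo enabled" in the L1 guest.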

I guess I might as well post my notes here about how I set up
that test environment. These are a bit too scrappy (and rather
specific about a niche thing) to be proper documentation, but
having them in the list archives might be helpful in future...

How to set up an environment to test QEMU's emulation of virtualization,
with PCI passthrough of a virtio-blk-pci device to the L2 guest

(1) Set up a Debian aarch64 guest (the instructions in the old
blog post still work; I used Debian bullseye for my testing).

(2) Copy the hda.qcow2 to hda-for-inner.qcow2; run the L1 guest
using the 'runme' script.

Caution: the virtio devices need to be in this order (hda.qcow2,
network,hda-for-inner.qcow2), because systemd in the guest names
the ethernet interface based on which PCI slot it goes in.
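
To see why the order matters, it may help to spell out the name that would be derived from a PCI address. The sketch below assumes systemd's slot-based naming scheme (enp<bus>s<slot>, with decimal numbers, PCI domain 0 and function 0); pci_to_ifname is a hypothetical helper for illustration, not something the scripts use.

```shell
# pci_to_ifname DDDD:BB:SS.F
# Print the slot-based interface name for an ethernet device at that address.
pci_to_ifname() {
    addr=${1#*:}          # drop the domain, e.g. "00:02.0"
    bus=${addr%%:*}       # e.g. "00"
    slot=${addr#*:}       # e.g. "02.0"
    slot=${slot%%.*}      # e.g. "02"
    # Bus and slot are hexadecimal in the address, decimal in the name.
    echo "enp$((0x$bus))s$((0x$slot))"
}
```

If the network device ends up at 0000:00:02.0, this gives enp0s2; moving it to a different position on the command line would change the slot and hence the name the guest's network configuration has to match.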

(3) In the L1 guest, first we need to fix up the hda-for-inner.qcow2
so that it has different UUIDs and partition UUIDs from hda.qcow2.
You'll need to make sure you have the blkid, gdisk, tune2fs, swaplabel
utilities installed in the guest.

 swapoff -a   # L1 guest might have swapped onto /dev/vdb3 by accident
 # print current partition IDs; you'll see that vda and vdb currently
 # share IDs for their partitions, and we must change those for vdb
 blkid
 # first change the PARTUUIDs with gdisk; this is the answer from
 # https://askubuntu.com/questions/1250224/how-to-change-partuuid
 gdisk /dev/vdb
 x   # change to experts menu
 c   # change partition ID
 1   # for partition 1
 R   # pick a random ID
 c   # ditto for partitions 2, 3
 m   # back to main menu
 w   # write partition table
 q   # quit
 # now change the filesystem and swap UUIDs
 tune2fs -U random /dev/vdb1
 tune2fs -U random /dev/vdb2
 swaplabel -U $(uuidgen) /dev/vdb3
 # Check the UUIDs and PARTUUIDs are now all changed:
 blkid
 # Now update the fstab in the L2 filesystem:
 mount /dev/vdb2 /mnt
 # Finally, edit /mnt/etc/fstab to set the UUID values for /, /boot and swap to
 # the new ones for /dev/vdb's partitions
 vi /mnt/etc/fstab # or editor of your choice
 umount /mnt
 # shut down the L1 guest now, to ensure that all the changes to that
 # qcow2 file are committed
 shutdown -h now
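
Comparing blkid output by eye is easy to get wrong, so a small check for leftover duplicates could be run before the shutdown. This is a sketch rather than part of the procedure above: dup_ids is a hypothetical helper that reads one ID per line and prints any value that occurs more than once.

```shell
# dup_ids: read IDs on stdin (one per line), print each duplicated value once.
dup_ids() {
    sort | uniq -d
}

# Example use inside the L1 guest (as root); no output means no duplicates:
#   blkid -s UUID -o value | dup_ids
#   blkid -s PARTUUID -o value | dup_ids
```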

(4) Copy necessary files into the L1 guest's filesystem;
you can run the L1 guest and run scp there to copy from your host machine,
or any other method you like. You'll need:
 - the vmlinuz (same one being used for L1)
 - the initrd
 - some scripts [runme-inner, runme-inner-nopassthru, reassign-vdb]
 - a copy of hda-for-inner.qcow2 (probably best to copy it to a temporary
   file while the L1 guest is not running, then copy that into the guest)
 - the qemu-system-aarch64 you want to use as the L2 QEMU
   (I cross-compiled this on my x86-64 host. The packaged Debian bullseye
   qemu-system-aarch64 will also work if you don't need to use a custom
   QEMU for L2.)
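
Since a missing file only shows up later as a confusing QEMU error, it can be worth checking the list before going further. A minimal sketch, assuming everything was copied into a single directory; missing_files is a hypothetical helper name.

```shell
# missing_files DIR NAME...
# Print each NAME that is not a regular file in DIR; fail if any is missing.
missing_files() {
    dir=$1
    shift
    rc=0
    for name in "$@"; do
        if [ ! -f "$dir/$name" ]; then
            echo "$name"
            rc=1
        fi
    done
    return "$rc"
}
```

With the file names the scripts below expect, that would be something like:
missing_files . vmlinuz-5.10.0-9-arm64 initrd.img-5.10.0-9-arm64 runme-inner runme-inner-nopassthru reassign-vdb hda-for-inner.qcow2 qemu-system-aarch64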

(5) Now you can run the L2 guest without using PCI passthrough like this:
 ./runme-inner-nopassthru ./qemu-system-aarch64

(6) And you can run the L2 guest with PCI passthrough like this:
 # you only need to run reassign-vdb once for any given run of the
 # L1 guest, to give the PCI device to vfio-pci rather than to the
 # L1 virtio driver. After that you can run the L2 QEMU multiple times.
 ./reassign-vdb
 ./runme-inner ./qemu-system-aarch64
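
If you lose track of whether the device has already been handed over in this run of the L1 guest, sysfs can tell you which driver currently owns it. The sketch below relies on the 'driver' symlink that sysfs provides for a bound PCI device; pci_driver is a hypothetical helper, and the optional second argument only exists so the function can be tried against a fake sysfs tree.

```shell
# pci_driver PCIDEV [SYSFS_ROOT]
# Print the name of the driver bound to PCIDEV, or nothing if it is unbound.
pci_driver() {
    link=${2:-/sys}/bus/pci/devices/$1/driver
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    fi
}
```

For the address that runme-inner passes to vfio-pci, "pci_driver 0000:00:03.0" should print vfio-pci once reassign-vdb has been run.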


I have set up the various 'runme' scripts so that L1 has a mux of
stdio and the monitor, which means that you can kill it with ^A-x,
and ^C will be delivered to the L1 guest. The L2 guest has plain
'-serial stdio', which means that ^C will kill the L2 guest.

The 'runme' scripts expect their first argument to be the path to
the QEMU you want to run; any further arguments are extra arguments
to that QEMU. So you can do things like:

   # pass more arguments to QEMU, here disabling the ITS
   ./runme ~/qemu-system-aarch64 -machine its=off
   # run gdb, and run QEMU under gdb
   ./runme gdb --args ~/qemu-system-aarch64 -machine its=off
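
The reason both forms work is that the whole argument list is placed in front of the script's own options. A stripped-down sketch of that pattern follows; run_qemu and the DRYRUN variable are hypothetical names used only for illustration, and the two fixed options stand in for the longer lists in the real scripts.

```shell
# run_qemu QEMU [EXTRA-ARGS...]
# Run QEMU with the caller's extra arguments first, then the fixed options.
run_qemu() {
    if [ "$#" -lt 1 ]; then
        echo "usage: run_qemu QEMU [extra QEMU args...]" >&2
        return 1
    fi
    # ${DRYRUN:+echo} expands to 'echo' only when DRYRUN is set and non-empty,
    # so the command line is printed instead of executed.
    ${DRYRUN:+echo} "$@" -display none -serial stdio
}
```

With DRYRUN=1 set, "run_qemu gdb --args ~/qemu-system-aarch64 -machine its=off" just prints the command it would have run, which makes it easy to see where the extra arguments end up.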

The 'runme' scripts should be in the same directory as the
kernel etc files they go with; but you don't need to be
in that directory to run them.
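
This works because each script resolves its own location before looking for the kernel and disk images. The same idiom as a standalone function, as a sketch; script_dir is a hypothetical name.

```shell
# script_dir PATH
# Print the absolute directory containing PATH (what TESTDIR is for "$0").
script_dir() {
    # The subshell keeps the cd from changing the caller's directory.
    (cd "$(dirname "$1")" && pwd)
}
```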

The 'runme' script (runs the L1 guest):

#!/bin/sh -e
TESTDIR="$(cd "$(dirname "$0")"; pwd)"

# Run with GICv3 and the disk image with a nested copy in it
# (for testing EL2/GICv3-virt emulation)

: ${KERNEL:=$TESTDIR/vmlinuz-5.10.0-9-arm64}
: ${INITRD:=$TESTDIR/initrd.img-5.10.0-9-arm64}
: ${DISK:=$TESTDIR/hda.qcow2}
: ${INNERDISK:=$TESTDIR/hda-for-inner.qcow2}

# Note that the virtio-net-pci must be the 2nd PCI device,
# because otherwise the network interface name it gets will
# not match /etc/network/interfaces.

# set up with -serial mon:stdio so we can ^C the inner QEMU


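# "$@" is the QEMU binary (first argument) followed by any extra
# arguments, which therefore come before the options below
"$@" \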
  -cpu cortex-a57 \
  -machine type=virt \
  -machine gic-version=max \
  -machine virtualization=true \
  -machine iommu=smmuv3 \
  -m 1024M \
  -kernel "${KERNEL}" -initrd "${INITRD}" \
  -drive if=none,id=mydrive,file="${DISK}",format=qcow2 \
  -device virtio-blk-pci,drive=mydrive \
  -netdev user,id=mynet \
  -device virtio-net-pci,netdev=mynet \
  -drive if=none,id=innerdrive,file="${INNERDISK}",format=qcow2 \
  -device virtio-blk-pci,drive=innerdrive"$IOMMU_ADDON" \
  -append 'console=ttyAMA0,38400 keep_bootcon root=/dev/vda2 kvm-arm.vgic_v4_enable=1' \
  -mon chardev=monitor,mode=readline \
  -display none -serial mon:stdio

The 'reassign-vdb' script (run inside the L1 guest):

#!/bin/sh -e
# Script to detach the /dev/vdb PCI device from the virtio-blk driver
# and hand it to vfio-pci


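# PCI address of the /dev/vdb device in the L1 guest; this must match
# the host= value passed to vfio-pci in runme-inner
PCIDEV=0000:00:03.0

echo -n "$PCIDEV" > /sys/bus/pci/drivers/virtio-pci/unbind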
modprobe vfio-pci

echo vfio-pci > /sys/bus/pci/devices/"$PCIDEV"/driver_override

echo -n "$PCIDEV" > /sys/bus/pci/drivers/vfio-pci/bind

The 'runme-inner' script (runs the L2 guest with PCI passthrough):

#!/bin/sh -e
TESTDIR="$(cd "$(dirname "$0")"; pwd)"

# run the inner guest, passing it the passthrough PCI device
: ${KERNEL:=$TESTDIR/vmlinuz-5.10.0-9-arm64}
: ${INITRD:=$TESTDIR/initrd.img-5.10.0-9-arm64}

# set up with -serial stdio so we can ^C the inner QEMU
# use -net none to work around the default virtio-net-pci
# network device wanting to load efi-virtio.rom, which the
# L1 guest's debian package puts somewhere other than where
# our locally compiled qemu-system-aarch64 wants to find it.

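# "$@" is the QEMU binary (first argument) followed by any extra
# arguments, which therefore come before the options below
"$@" \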
  -cpu cortex-a57 \
  -enable-kvm \
  -machine type=virt \
  -machine gic-version=3 \
  -m 256M \
  -kernel "${KERNEL}" -initrd "${INITRD}" \
  -append 'console=ttyAMA0,38400 keep_bootcon root=/dev/vda2' \
  -display none -serial stdio \
  -device vfio-pci,host=0000:00:03.0,id=pci0 \
  -net none

The 'runme-inner-nopassthru' script (runs the L2 guest from a disk image):

#!/bin/sh -e
TESTDIR="$(cd "$(dirname "$0")"; pwd)"

# run the inner guest, passing it a disk image
: ${KERNEL:=$TESTDIR/vmlinuz-5.10.0-9-arm64}
: ${INITRD:=$TESTDIR/initrd.img-5.10.0-9-arm64}
: ${DISK:=$TESTDIR/hda-for-inner.qcow2}

# set up with -serial stdio so we can ^C the inner QEMU
# use -net none to work around the default virtio-net-pci
# network device wanting to load efi-virtio.rom, which the
# L1 guest's debian package puts somewhere other than where
# our locally compiled qemu-system-aarch64 wants to find it.

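# "$@" is the QEMU binary (first argument) followed by any extra
# arguments, which therefore come before the options below
"$@" \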
  -cpu cortex-a57 \
  -enable-kvm \
  -machine type=virt \
  -machine gic-version=3 \
  -m 256M \
  -kernel "${KERNEL}" -initrd "${INITRD}" \
  -drive if=none,id=mydrive,file="${DISK}",format=qcow2 \
  -device virtio-blk-pci,drive=mydrive \
  -append 'console=ttyAMA0,38400 keep_bootcon root=/dev/vda2' \
  -display none -serial stdio \
  -net none

-- PMM

