Re: [PATCH 0/6] Add debug interface to kick/call on purpose

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

From:	Jason Wang
Subject:	Re: [PATCH 0/6] Add debug interface to kick/call on purpose
Date:	Mon, 29 Mar 2021 11:56:57 +0800
User-agent:	Mozilla/5.0 (Macintosh; Intel Mac OS X 10.16; rv:78.0) Gecko/20100101 Thunderbird/78.8.1


在 2021/3/27 上午5:16, Dongli Zhang 写道:

Hi Jason,

On 3/26/21 12:24 AM, Jason Wang wrote:

在 2021/3/26 下午1:44, Dongli Zhang 写道:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$

... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx ffff8f67f6bbe700
struct eventfd_ctx {
    kref = {
      refcount = {
        refs = {
          counter = 4
        }
      }
    },
    wqh = {
      lock = {
        {
          rlock = {
            raw_lock = {
              val = {
                counter = 0
              }
            }
          }
        }
      },
      head = {
        next = 0xffff8f841dc08e18,
        prev = 0xffff8f841dc08e18
      }
    },
    count = 15, ---> eventfd is 15 !!!
    flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
    "arguments": { "dev": "/machine/peripheral/vscsi0",
                   "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved, it
indicates something bad is in software that the 'kick' is lost.

What do you mean by "software" here? And it looks to me you're testing whether
event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not
sure how much value could we gain from a dedicated debug interface like this
consider there're a lot of exisinting general purpose debugging method like
tracing or gdb. I'd say the path from virtio_queue_notify() to
event_notifier_set() is only a very small fraction of the process of virtqueue
kick which is unlikey to be buggy. Consider usually the ioeventfd will be
offloaded to KVM, it's more a chance that something is wrong in setuping
ioeventfd instead of here. Irq is even more complicated.

Thank you very much!

I am not testing whether event_notifier_set() is called by 
virtio_queue_notify().

The 'software' indicates the data processing and event notification mechanism
involved with virtio/vhost PV driver frontend. E.g., while VM is waiting for an
extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event()
erroneously returns false due to corrupted ring buffer status.



So there could be several factors that may block the notification:

1) eventfd bug (ioeventfd vs irqfd)
2) wrong virtqueue state (either driver or device)
3) missing barriers (either driver or device)
4) Qemu bug (irqchip and routing)
...

Consider we want to debug virtio issue, only 2) or 3) is what we reallycared.


So for kick you did (assume vhost is on):

virtio_device_event_kick()
    virtio_queue_notify()
        event_notifier_set()

It looks to me you're actaully testing if ioeventfd is correctly set byQemu.


For call you did:

virtio_device_event_call()
    event_notifier_set()

A test of irqfd is correctly set by Qemu. So all of those are not virtiospecific stuffs but you introduce virtio specific command to do debugnon virtio functions.

In the case of what you mentioned for vring_need_event(), what we reallywant is to dump the virtqueue state from the guest. This might requriessome work of extending virtio spec (e.g to dump device status likeindices, event, wrap counters).

This was initially proposed for vhost only and I was going to export
ioeventfd/irqfd from vhost to admin via sysfs. Finally, I realized I would
better implement this at QEMU.

The QEMU inits the eventfd (ioeventfd and irqfd), and offloads them to
KVM/vhost. The VM side sends requests to ring buffer and kicks the doorbell (via
ioeventfd), while the backend vhost side sends responses back and calls the IRQ
(via ioeventfd).

Unfortunately, sometimes there is issue with virtio/vhost so that kick/call was
missed/ignored, or even never triggered. The example mentioned in the patchset
cover letter is with virtio-net (I assume vhost=on), where a kick to ioventfd
was ignored, due to pci-bridge/hotplug issue.

So this is not a good example since it was a chipset bug. You need touse other tool to debug chipset code isn't it?

The hotplug is with a very small window but the IO hangs permanently. I did test
that kicking the doorbell again will help recover the IO, so that I would be
able to conclude this was due to lost of kick.

The loss of irq/doorbell is painful especially in production environment where
we are not able to attach to QEMU via gdb. While the patchset is only for QEMU,
Xen PV driver used to experience loss of IRQ issue as well, e.g., linux kernel
commit 4704fe4f03a5 ("xen/events: mask events when changing their VCPU 
binding").

So looking at the git history we can see it has little possibility thatthe missing is due to virtio/vhost itself. So the commit you mentionhere is not good as well since it's not a netfront/netbackend bug.


So for the case of event call, what you did is:

satic void virtio_device_event_call(VirtQueue *vq, bool eventfd,
                                     Error **errp)
{
#ifdef DEBUG_VIRTIO_EVENT

printf("The 'call' event is triggered for path=%s, queue=%d,irqfd=%d.\n",

           object_get_canonical_path(OBJECT(vq->vdev)),
           vq->queue_index, eventfd);
#endif

    if (eventfd) {
        virtio_set_isr(vq->vdev, 0x1);
        event_notifier_set(&vq->guest_notifier);
    } else {
        virtio_irq(vq);
    }
}

This means, when eventfd is set, you bypasses the MSI mask which is verydangerous to make it used in the case of production environment. And ifyou check masking, it won't help a lot if the MSI is masked wrongly.

This can help "narrow down" if the IO/networking hang is due to loss of
IRQ/doorbell issue (or VM MSI-x is erroneously masked), especially in production
env. This can also be used as a workaround so that VM owner will not need to
reboot VM.

So having such extra workaround is pretty dangerous in productionenvironemnt where I think we need to be conservative which means we needto collect information instead of generating artificial event.

And it doesn't help if the wrokaround can be triggered throughmanagement API.


In addition, the VFIO will benefit from it. We will be able to test if to inject
IRQ on purpose helps when the driver (e.g., PCI, NVMe, Ethernet) developers
blame the hang is caused by loss of IRQ with KVM/VFIO.(It seems there is more
chance to loose IRQ during CPU hotplug or changing IRQ affinity).

I think we could not gain much for introducing an dedicated mechanism for such a
corner case.

As replied by Dave for prior RFC, the QEMU already supports hmp_ioport_write to
trigger an ioport write on purpose.

If that applies. I would rather have a hmp_mem_write then we can testboth MSI and doorbell. But again, they are very dangerous to be used inproduction envronment.


The linux block layer also supports the below to kick the IO queue on purpose.

echo "kick" > /sys/kernel/debug/block/sda/state

This might be fine for hardware device but not virtio. The device canchoose to poll the virtqueue instead of depending of the doorbell towork. And for networking subsystem, we don't have such stuffs, insteadethtool support to dump ring and vendor specific stuffs which could beused for dumping virtqueue state in this case.


Thanks


Dongli Zhang

Thanks

crash> eventfd_ctx ffff8f67f6bbe700
struct eventfd_ctx {
    kref = {
      refcount = {
        refs = {
          counter = 4
        }
      }
    },
    wqh = {
      lock = {
        {
          rlock = {
            raw_lock = {
              val = {
                counter = 0
              }
            }
          }
        }
      },
      head = {
        next = 0xffff8f841dc08e18,
        prev = 0xffff8f841dc08e18
      }
    },
    count = 16, ---> eventfd incremented to 16 !!!
    flags = 526336
}


Original RFC link:

https://urldefense.com/v3/__https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg03441.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5UvUJ86I$

Changed since RFC:
    - add support for more virtio/vhost pci devices
    - add log (toggled by DEBUG_VIRTIO_EVENT) to virtio.c to say that this
      mischeivous command had been used
    - fix grammer error (s/lost/loss/)
    - change version to 6.1
    - fix incorrect example in qapi/qdev.json
    - manage event types with enum/array, instead of hard coding


Dongli Zhang (6):
     qdev: introduce qapi/hmp command for kick/call event
     virtio: introduce helper function for kick/call device event
     virtio-blk-pci: implement device event interface for kick/call
     virtio-scsi-pci: implement device event interface for kick/call
     vhost-scsi-pci: implement device event interface for kick/call
     virtio-net-pci: implement device event interface for kick/call

   hmp-commands.hx                 | 14 ++++++++
   hw/block/virtio-blk.c           |  9 +++++
   hw/net/virtio-net.c             |  9 +++++
   hw/scsi/vhost-scsi.c            |  6 ++++
   hw/scsi/virtio-scsi.c           |  9 +++++
   hw/virtio/vhost-scsi-pci.c      | 10 ++++++
   hw/virtio/virtio-blk-pci.c      | 10 ++++++
   hw/virtio/virtio-net-pci.c      | 10 ++++++
   hw/virtio/virtio-scsi-pci.c     | 10 ++++++
   hw/virtio/virtio.c              | 64 ++++++++++++++++++++++++++++++++++++
   include/hw/qdev-core.h          |  9 +++++
   include/hw/virtio/vhost-scsi.h  |  3 ++
   include/hw/virtio/virtio-blk.h  |  2 ++
   include/hw/virtio/virtio-net.h  |  3 ++
   include/hw/virtio/virtio-scsi.h |  3 ++
   include/hw/virtio/virtio.h      |  3 ++
   include/monitor/hmp.h           |  1 +
   qapi/qdev.json                  | 30 +++++++++++++++++
   softmmu/qdev-monitor.c          | 56 +++++++++++++++++++++++++++++++
   19 files changed, 261 insertions(+)


I did tests with below cases.

- virtio-blk-pci (ioeventfd on/off, iothread, live migration)
- virtio-scsi-pci (ioeventfd on/off)
- vhost-scsi-pci
- virtio-net-pci (ioeventfd on/off, vhost)

Thank you very much!

Dongli Zhang

[Prev in Thread]

Current Thread

[Next in Thread]

[PATCH 0/6] Add debug interface to kick/call on purpose, Dongli Zhang, 2021/03/26
- [PATCH 6/6] virtio-net-pci: implement device event interface for kick/call, Dongli Zhang, 2021/03/26
- [PATCH 2/6] virtio: introduce helper function for kick/call device event, Dongli Zhang, 2021/03/26
- [PATCH 4/6] virtio-scsi-pci: implement device event interface for kick/call, Dongli Zhang, 2021/03/26
- [PATCH 3/6] virtio-blk-pci: implement device event interface for kick/call, Dongli Zhang, 2021/03/26
- [PATCH 1/6] qdev: introduce qapi/hmp command for kick/call event, Dongli Zhang, 2021/03/26
- [PATCH 5/6] vhost-scsi-pci: implement device event interface for kick/call, Dongli Zhang, 2021/03/26
- Re: [PATCH 0/6] Add debug interface to kick/call on purpose, Jason Wang, 2021/03/26
  - Re: [PATCH 0/6] Add debug interface to kick/call on purpose, Dongli Zhang, 2021/03/26
    - Re: [PATCH 0/6] Add debug interface to kick/call on purpose, Jason Wang <=
    - Re: [PATCH 0/6] Add debug interface to kick/call on purpose, Dongli Zhang, 2021/03/30

Prev by Date: [PATCH V5] target/riscv: Align the data type of reset vector address
Next by Date: Re: [PATCH] tap-bsd: Remove special casing for older OpenBSD releases
Previous by thread: Re: [PATCH 0/6] Add debug interface to kick/call on purpose
Next by thread: Re: [PATCH 0/6] Add debug interface to kick/call on purpose
Index(es):
- Date
- Thread