[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device s
From: |
Ilya Maximets |
Subject: |
Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop |
Date: |
Thu, 14 Dec 2017 10:06:43 +0300 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 |
On 13.12.2017 22:48, Michael S. Tsirkin wrote:
> On Wed, Dec 13, 2017 at 04:45:20PM +0300, Ilya Maximets wrote:
>>>> That
>>>> looks very strange. Some of the functions gets 'old_status', others
>>>> the 'new_status'. I'm a bit confused.
>>>
>>> OK, fair enough. Fixed - let's pass old status everywhere,
>>> users that need the new one can get it from the vdev.
>>>
>>>> And it's not functional in current state:
>>>>
>>>> hw/net/virtio-net.c:264:28: error: ‘status’ undeclared
>>>
>>> Fixed too. new version below.
>>
>> This doesn't fix the segmentation fault.
>
> Hmm you are right. Looking into it.
>
>> I have exactly same crash stacktrace:
>>
>> #0 vhost_memory_unmap hw/virtio/vhost.c:446
>> #1 vhost_virtqueue_stop hw/virtio/vhost.c:1155
>> #2 vhost_dev_stop hw/virtio/vhost.c:1594
>> #3 vhost_net_stop_one hw/net/vhost_net.c:289
>> #4 vhost_net_stop hw/net/vhost_net.c:368
>> #5 virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100) at
>> hw/net/virtio-net.c:180
>> #6 virtio_net_set_status (vdev=0x5625f3901100, old_status=<optimized out>)
>> at hw/net/virtio-net.c:254
>> #7 virtio_set_status (address@hidden, val=<optimized out>) at
>> hw/virtio/virtio.c:1152
>> #8 virtio_error (vdev=0x5625f3901100, address@hidden "Guest says index %u
>> is available") at hw/virtio/virtio.c:2460
>
> BTW what is causing this? Why is guest avail index corrupted?
My testing environment for the issue:
* QEMU 2.10.1
* testpmd from a bit outdated DPDK 16.07.0 in guest with uio_pci_generic
* OVS 2.8 with DPDK 17.05.2 on the host.
* 2 vhost-user ports in VM in server mode.
* Testpmd just forwards packets from one port to another with --forward-mode=mac
testpmd with virtio-net driver sometimes crashes after killing the OVS if some
heavy traffic flows. After restarting of the OVS and stopping it again QEMU
crashes on vhost disconnect and virtqueue_get_head() failure.
So, the sequence is follows:
1. Start OVS, Start QEMU, start testpmd, start external packet generator.
2. pkill -11 ovs-vswitchd
3. If testpmd not crashed in guest goto step #1.
4. Start OVS (testpmd with virtio driver still in down state)
5. pkill -11 ovs-vswitchd
6. Observe QEMU crash
I suspect that virtio-net driver from DPDK 16.07.0 corrupts vrings
just before crash at step #2.
I didn't tried actually to investigate virtio driver crash because it's a bit
out of my scope and I have no enough time and a driver slightly outdated.
But the stability of QEMU itself is really important thing.
One interesting thing is that I can not reproduce virtio driver crash with
"virtio: rework set_status callbacks" applied. I had to break vrings manually
to reproduce original nested call of vhost_net_stop().
>
>> #9 virtqueue_get_head at hw/virtio/virtio.c:543
>> #10 virtqueue_drop_all hw/virtio/virtio.c:984
>> #11 virtio_net_drop_tx_queue_data hw/net/virtio-net.c:240
>> #12 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
>> #13 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431
>> #14 vhost_net_stop_one at hw/net/vhost_net.c:290
>> #15 vhost_net_stop at hw/net/vhost_net.c:368
>> #16 virtio_net_vhost_status (old_status=15 '\017', n=0x5625f3901100) at
>> hw/net/virtio-net.c:180
>> #17 virtio_net_set_status (vdev=0x5625f3901100, old_status=<optimized out>)
>> at hw/net/virtio-net.c:254
>> #18 qmp_set_link at net/net.c:1430
>> #19 chr_closed_bh at net/vhost-user.c:214
>> #20 aio_bh_call at util/async.c:90
>> #21 aio_bh_poll at util/async.c:118
>> #22 aio_dispatch at util/aio-posix.c:429
>> #23 aio_ctx_dispatch at util/async.c:261
>> #24 g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>> #25 glib_pollfds_poll () at util/main-loop.c:213
>> #26 os_host_main_loop_wait at util/main-loop.c:261
>> #27 main_loop_wait at util/main-loop.c:515
>> #28 main_loop () at vl.c:1917
>> #29 main
>>
>>
>> Actually the logic doesn't change. In function virtio_net_vhost_status():
>>
>> - if ((virtio_net_started(n, status) && !nc->peer->link_down) ==
>> + if ((virtio_net_started(n, vdev->status) && !nc->peer->link_down) ==
>> !!n->vhost_started) {
>> return;
>> }
>>
>> previously new 'status' was checked and the new 'vdev->status' checked now.
>> It's the same condition that doesn't work because vhost_started flag is still
>> set to 1.
>> Anyway, nc->peer->link_down is true in our case, so there is no difference if
>> we'll change the vdev->status.
>>
>>>>>
>>>>> Signed-off-by: Michael S. Tsirkin <address@hidden>
>>>
>>> Still completely untested, sorry about that - hope you can help here.
>>>
>>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>>> index 098bdaa..f5d0ee1 100644
>>> --- a/include/hw/virtio/virtio.h
>>> +++ b/include/hw/virtio/virtio.h
>>> @@ -115,7 +115,7 @@ typedef struct VirtioDeviceClass {
>>> void (*get_config)(VirtIODevice *vdev, uint8_t *config);
>>> void (*set_config)(VirtIODevice *vdev, const uint8_t *config);
>>> void (*reset)(VirtIODevice *vdev);
>>> - void (*set_status)(VirtIODevice *vdev, uint8_t val);
>>> + void (*set_status)(VirtIODevice *vdev, uint8_t old_status);
>>> /* For transitional devices, this is a bitmap of features
>>> * that are only exposed on the legacy interface but not
>>> * the modern one.
>>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>>> index 05d1440..b8b07ba 100644
>>> --- a/hw/block/virtio-blk.c
>>> +++ b/hw/block/virtio-blk.c
>>> @@ -814,15 +814,15 @@ static uint64_t virtio_blk_get_features(VirtIODevice
>>> *vdev, uint64_t features,
>>> return features;
>>> }
>>>
>>> -static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status)
>>> +static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t old_status)
>>> {
>>> VirtIOBlock *s = VIRTIO_BLK(vdev);
>>>
>>> - if (!(status & (VIRTIO_CONFIG_S_DRIVER | VIRTIO_CONFIG_S_DRIVER_OK))) {
>>> + if (!(vdev->status & (VIRTIO_CONFIG_S_DRIVER |
>>> VIRTIO_CONFIG_S_DRIVER_OK))) {
>>> assert(!s->dataplane_started);
>>> }
>>>
>>> - if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
>>> + if (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
>>> return;
>>> }
>>>
>>> diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
>>> index 9470bd7..881b1ff 100644
>>> --- a/hw/char/virtio-serial-bus.c
>>> +++ b/hw/char/virtio-serial-bus.c
>>> @@ -616,7 +616,7 @@ static void guest_reset(VirtIOSerial *vser)
>>> }
>>> }
>>>
>>> -static void set_status(VirtIODevice *vdev, uint8_t status)
>>> +static void set_status(VirtIODevice *vdev, uint8_t old_status)
>>> {
>>> VirtIOSerial *vser;
>>> VirtIOSerialPort *port;
>>> @@ -625,7 +625,7 @@ static void set_status(VirtIODevice *vdev, uint8_t
>>> status)
>>> port = find_port_by_id(vser, 0);
>>>
>>> if (port && !use_multiport(port->vser)
>>> - && (status & VIRTIO_CONFIG_S_DRIVER_OK)) {
>>> + && (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
>>> /*
>>> * Non-multiport guests won't be able to tell us guest
>>> * open/close status. Such guests can only have a port at id
>>> @@ -634,7 +634,7 @@ static void set_status(VirtIODevice *vdev, uint8_t
>>> status)
>>> */
>>> port->guest_connected = true;
>>> }
>>> - if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
>>> + if (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
>>> guest_reset(vser);
>>> }
>>>
>>> diff --git a/hw/input/virtio-input.c b/hw/input/virtio-input.c
>>> index 0e42f0d..abb817b 100644
>>> --- a/hw/input/virtio-input.c
>>> +++ b/hw/input/virtio-input.c
>>> @@ -188,12 +188,12 @@ static uint64_t
>>> virtio_input_get_features(VirtIODevice *vdev, uint64_t f,
>>> return f;
>>> }
>>>
>>> -static void virtio_input_set_status(VirtIODevice *vdev, uint8_t val)
>>> +static void virtio_input_set_status(VirtIODevice *vdev, uint8_t old_val)
>>> {
>>> VirtIOInputClass *vic = VIRTIO_INPUT_GET_CLASS(vdev);
>>> VirtIOInput *vinput = VIRTIO_INPUT(vdev);
>>>
>>> - if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
>>> + if (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
>>> if (!vinput->active) {
>>> vinput->active = true;
>>> if (vic->change_active) {
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index 38674b0..7851968 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -124,7 +124,7 @@ static void virtio_net_announce_timer(void *opaque)
>>> virtio_notify_config(vdev);
>>> }
>>>
>>> -static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>> +static void virtio_net_vhost_status(VirtIONet *n, uint8_t old_status)
>>> {
>>> VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>> NetClientState *nc = qemu_get_queue(n->nic);
>>> @@ -134,7 +134,7 @@ static void virtio_net_vhost_status(VirtIONet *n,
>>> uint8_t status)
>>> return;
>>> }
>>>
>>> - if ((virtio_net_started(n, status) && !nc->peer->link_down) ==
>>> + if ((virtio_net_started(n, vdev->status) && !nc->peer->link_down) ==
>>> !!n->vhost_started) {
>>> return;
>>> }
>>> @@ -212,12 +212,12 @@ static bool virtio_net_set_vnet_endian(VirtIODevice
>>> *vdev, NetClientState *ncs,
>>> return false;
>>> }
>>>
>>> -static void virtio_net_vnet_endian_status(VirtIONet *n, uint8_t status)
>>> +static void virtio_net_vnet_endian_status(VirtIONet *n, uint8_t old_status)
>>> {
>>> VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>> int queues = n->multiqueue ? n->max_queues : 1;
>>>
>>> - if (virtio_net_started(n, status)) {
>>> + if (virtio_net_started(n, vdev->status)) {
>>> /* Before using the device, we tell the network backend about the
>>> * endianness to use when parsing vnet headers. If the backend
>>> * can't do it, we fallback onto fixing the headers in the core
>>> @@ -225,7 +225,7 @@ static void virtio_net_vnet_endian_status(VirtIONet *n,
>>> uint8_t status)
>>> */
>>> n->needs_vnet_hdr_swap = virtio_net_set_vnet_endian(vdev,
>>> n->nic->ncs,
>>> queues, true);
>>> - } else if (virtio_net_started(n, vdev->status)) {
>>> + } else if (virtio_net_started(n, old_status)) {
>>> /* After using the device, we need to reset the network backend to
>>> * the default (guest native endianness), otherwise the guest may
>>> * lose network connectivity if it is rebooted into a different
>>> @@ -243,15 +243,15 @@ static void
>>> virtio_net_drop_tx_queue_data(VirtIODevice *vdev, VirtQueue *vq)
>>> }
>>> }
>>>
>>> -static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t
>>> status)
>>> +static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t
>>> old_status)
>>> {
>>> VirtIONet *n = VIRTIO_NET(vdev);
>>> VirtIONetQueue *q;
>>> int i;
>>> uint8_t queue_status;
>>>
>>> - virtio_net_vnet_endian_status(n, status);
>>> - virtio_net_vhost_status(n, status);
>>> + virtio_net_vnet_endian_status(n, old_status);
>>> + virtio_net_vhost_status(n, old_status);
>>>
>>> for (i = 0; i < n->max_queues; i++) {
>>> NetClientState *ncs = qemu_get_subqueue(n->nic, i);
>>> @@ -261,7 +261,7 @@ static void virtio_net_set_status(struct VirtIODevice
>>> *vdev, uint8_t status)
>>> if ((!n->multiqueue && i != 0) || i >= n->curr_queues) {
>>> queue_status = 0;
>>> } else {
>>> - queue_status = status;
>>> + queue_status = vdev->status;
>>> }
>>> queue_started =
>>> virtio_net_started(n, queue_status) && !n->vhost_started;
>>> @@ -2048,9 +2048,11 @@ static void virtio_net_device_unrealize(DeviceState
>>> *dev, Error **errp)
>>> VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>>> VirtIONet *n = VIRTIO_NET(dev);
>>> int i, max_queues;
>>> + uint8_t old_status = vdev->status;
>>>
>>> /* This will stop vhost backend if appropriate. */
>>> - virtio_net_set_status(vdev, 0);
>>> + vdev->status = 0;
>>> + virtio_net_set_status(vdev, old_status);
>>>
>>> g_free(n->netclient_name);
>>> n->netclient_name = NULL;
>>> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
>>> index 9c1bea8..5a561e4 100644
>>> --- a/hw/scsi/vhost-scsi.c
>>> +++ b/hw/scsi/vhost-scsi.c
>>> @@ -108,11 +108,11 @@ static void vhost_scsi_stop(VHostSCSI *s)
>>> vhost_scsi_common_stop(vsc);
>>> }
>>>
>>> -static void vhost_scsi_set_status(VirtIODevice *vdev, uint8_t val)
>>> +static void vhost_scsi_set_status(VirtIODevice *vdev, uint8_t old_val)
>>> {
>>> VHostSCSI *s = VHOST_SCSI(vdev);
>>> VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
>>> - bool start = (val & VIRTIO_CONFIG_S_DRIVER_OK);
>>> + bool start = vdev->status & VIRTIO_CONFIG_S_DRIVER_OK;
>>>
>>> if (vsc->dev.started == start) {
>>> return;
>>> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
>>> index f7561e2..289a27e 100644
>>> --- a/hw/scsi/vhost-user-scsi.c
>>> +++ b/hw/scsi/vhost-user-scsi.c
>>> @@ -38,11 +38,11 @@ static const int user_feature_bits[] = {
>>> VHOST_INVALID_FEATURE_BIT
>>> };
>>>
>>> -static void vhost_user_scsi_set_status(VirtIODevice *vdev, uint8_t status)
>>> +static void vhost_user_scsi_set_status(VirtIODevice *vdev, uint8_t
>>> old_status)
>>> {
>>> VHostUserSCSI *s = (VHostUserSCSI *)vdev;
>>> VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
>>> - bool start = (status & VIRTIO_CONFIG_S_DRIVER_OK) && vdev->vm_running;
>>> + bool start = (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
>>> vdev->vm_running;
>>>
>>> if (vsc->dev.started == start) {
>>> return;
>>> diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c
>>> index 5ec1c6a..d3f445b 100644
>>> --- a/hw/virtio/vhost-vsock.c
>>> +++ b/hw/virtio/vhost-vsock.c
>>> @@ -154,10 +154,10 @@ static void vhost_vsock_stop(VirtIODevice *vdev)
>>> vhost_dev_disable_notifiers(&vsock->vhost_dev, vdev);
>>> }
>>>
>>> -static void vhost_vsock_set_status(VirtIODevice *vdev, uint8_t status)
>>> +static void vhost_vsock_set_status(VirtIODevice *vdev, uint8_t old_status)
>>> {
>>> VHostVSock *vsock = VHOST_VSOCK(vdev);
>>> - bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
>>> + bool should_start = vdev->status & VIRTIO_CONFIG_S_DRIVER_OK;
>>>
>>> if (!vdev->vm_running) {
>>> should_start = false;
>>> @@ -370,11 +370,14 @@ static void vhost_vsock_device_unrealize(DeviceState
>>> *dev, Error **errp)
>>> {
>>> VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>>> VHostVSock *vsock = VHOST_VSOCK(dev);
>>> + uint8_t old_status;
>>>
>>> vhost_vsock_post_load_timer_cleanup(vsock);
>>>
>>> /* This will stop vhost backend if appropriate. */
>>> - vhost_vsock_set_status(vdev, 0);
>>> + old_status = vdev->status;
>>> + vdev->status = 0;
>>> + vhost_vsock_set_status(vdev, old_status);
>>>
>>> vhost_dev_cleanup(&vsock->vhost_dev);
>>> virtio_cleanup(vdev);
>>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
>>> index 37cde38..84e897a 100644
>>> --- a/hw/virtio/virtio-balloon.c
>>> +++ b/hw/virtio/virtio-balloon.c
>>> @@ -468,12 +468,12 @@ static void virtio_balloon_device_reset(VirtIODevice
>>> *vdev)
>>> }
>>> }
>>>
>>> -static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
>>> +static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t
>>> old_status)
>>> {
>>> VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
>>>
>>> if (!s->stats_vq_elem && vdev->vm_running &&
>>> - (status & VIRTIO_CONFIG_S_DRIVER_OK) && virtqueue_rewind(s->svq,
>>> 1)) {
>>> + (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
>>> virtqueue_rewind(s->svq, 1)) {
>>> /* poll stats queue for the element we have discarded when the VM
>>> * was stopped */
>>> virtio_balloon_receive_stats(vdev, s->svq);
>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>>> index ad564b0..4981858 100644
>>> --- a/hw/virtio/virtio.c
>>> +++ b/hw/virtio/virtio.c
>>> @@ -1128,6 +1128,8 @@ static int virtio_validate_features(VirtIODevice
>>> *vdev)
>>> int virtio_set_status(VirtIODevice *vdev, uint8_t val)
>>> {
>>> VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>>> + uint8_t old_status;
>>> +
>>> trace_virtio_set_status(vdev, val);
>>>
>>> if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>>> @@ -1140,10 +1142,13 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t
>>> val)
>>> }
>>> }
>>> }
>>> +
>>> + old_status = vdev->status;
>>> + vdev->status = val;
>>> +
>>> if (k->set_status) {
>>> - k->set_status(vdev, val);
>>> + k->set_status(vdev, old_status);
>>> }
>>> - vdev->status = val;
>>> return 0;
>>> }
>>>
>>>
>>>
>>>
>
>
>
- [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop, Ilya Maximets, 2017/12/06
- Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop, Michael S. Tsirkin, 2017/12/06
- Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop, Ilya Maximets, 2017/12/07
- Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop, Michael S. Tsirkin, 2017/12/07
- Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop, Michael S. Tsirkin, 2017/12/07
- Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop, Ilya Maximets, 2017/12/08
- Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop, Michael S. Tsirkin, 2017/12/10
- Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop, Ilya Maximets, 2017/12/13
- Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop, Michael S. Tsirkin, 2017/12/13
- Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop,
Ilya Maximets <=
- Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop, Maxime Coquelin, 2017/12/14
- Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop, Ilya Maximets, 2017/12/14
- Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop, Ilya Maximets, 2017/12/14