Re: [Qemu-devel] [PATCH] libvhost-user: Start VQs on SET_VRING_CALL


From: Felipe Franciosi
Subject: Re: [Qemu-devel] [PATCH] libvhost-user: Start VQs on SET_VRING_CALL
Date: Tue, 17 Jan 2017 18:53:17 +0000

> On 17 Jan 2017, at 10:41, Michael S. Tsirkin <address@hidden> wrote:
> 
> On Fri, Jan 13, 2017 at 10:29:46PM +0000, Felipe Franciosi wrote:
>> 
>>> On 13 Jan 2017, at 10:18, Michael S. Tsirkin <address@hidden> wrote:
>>> 
>>> On Fri, Jan 13, 2017 at 05:15:22PM +0000, Felipe Franciosi wrote:
>>>> 
>>>>> On 13 Jan 2017, at 09:04, Michael S. Tsirkin <address@hidden> wrote:
>>>>> 
>>>>> On Fri, Jan 13, 2017 at 03:09:46PM +0000, Felipe Franciosi wrote:
>>>>>> Hi Marc-Andre,
>>>>>> 
>>>>>>> On 13 Jan 2017, at 07:03, Marc-André Lureau <address@hidden> wrote:
>>>>>>> 
>>>>>>> Hi
>>>>>>> 
>>>>>>> ----- Original Message -----
>>>>>>>> Currently, VQs are started as soon as a SET_VRING_KICK is received. That
>>>>>>>> is too early in the VQ setup process, as the backend might not yet have
>>>>>>> 
>>>>>>> I think we may want to reconsider queue_set_started(), move it 
>>>>>>> elsewhere, since kick/call fds aren't mandatory to process the rings.
>>>>>> 
>>>>>> Hmm. The fds aren't mandatory, but I imagine in that case we should 
>>>>>> still receive SET_VRING_KICK/CALL messages without an fd (i.e. with the 
>>>>>> VHOST_MSG_VQ_NOFD_MASK flag set). Wouldn't that be the case?
>>>>> 
>>>>> Please look at docs/specs/vhost-user.txt, section "Starting and stopping rings".
>>>>> 
>>>>> The spec says:
>>>>>   Client must start ring upon receiving a kick (that is, detecting that
>>>>>   file descriptor is readable) on the descriptor specified by
>>>>>   VHOST_USER_SET_VRING_KICK, and stop ring upon receiving
>>>>>   VHOST_USER_GET_VRING_BASE.
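For illustration, a minimal sketch of the behaviour that passage describes, assuming the kick fd is an eventfd and a simple blocking poll loop (this is not libvhost-user's actual dispatch code):

/* Hypothetical sketch: start processing the ring only once the kick fd
 * becomes readable, as required by "Starting and stopping rings". */
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

static void ring_loop(int kick_fd)            /* fd from SET_VRING_KICK */
{
    struct pollfd pfd = { .fd = kick_fd, .events = POLLIN };

    for (;;) {
        if (poll(&pfd, 1, -1) < 0) {
            break;                            /* error handling elided */
        }
        if (pfd.revents & POLLIN) {
            uint64_t cnt;
            read(kick_fd, &cnt, sizeof(cnt)); /* drain the eventfd counter */
            /* process_avail_ring();  -- hypothetical helper */
        }
    }
}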
>>>> 
>>>> Yes, I have seen the spec, but there is a race in the current 
>>>> libvhost-user code which needs attention. My initial proposal (which got 
>>>> turned down) was to send a spurious notification upon seeing a callfd. 
>>>> Then I came up with this proposal. See below.
>>>> 
>>>>> 
>>>>> 
>>>>>>> 
>>>>>>>> a callfd to notify in case it received a kick and fully processed the
>>>>>>>> request/command. This patch only starts a VQ when a SET_VRING_CALL is
>>>>>>>> received.
>>>>>>> 
>>>>>>> I don't like that much; as soon as the kick fd is received, the backend 
>>>>>>> should start polling it, IMHO. The callfd is optional: a queue may have 
>>>>>>> one and not the other.
>>>>>> 
>>>>>> So the question is whether we should be receiving a SET_VRING_CALL 
>>>>>> anyway or not, regardless of an fd being sent. (I think we do, but I 
>>>>>> haven't done extensive testing with other device types.)
>>>>> 
>>>>> I would say not; only KICK is mandatory, and even that is not enough
>>>>> to process the ring. You must wait for it to become readable.
>>>> 
>>>> The problem is that Qemu takes time between sending the kickfd and the 
>>>> callfd. Hence the race. Consider this scenario:
>>>> 
>>>> 1) Guest configures the device
>>>> 2) Guest puts a request on a virtq
>>>> 3) Guest kicks
>>>> 4) Qemu starts configuring the backend
>>>> 4.a) Qemu sends the masked callfds
>>>> 4.b) Qemu sends the virtq sizes and addresses
>>>> 4.c) Qemu sends the kickfds
>>>> 
>>>> (When using MQ, Qemu will only send the callfd once all VQs are configured)
>>>> 
>>>> 5) The backend starts listening on the kickfd upon receiving it
>>>> 6) The backend picks up the guest's request
>>>> 7) The backend processes the request
>>>> 8) The backend puts the response on the used ring
>>>> 9) The backend notifies the masked callfd
>>>> 
>>>> 4.d) Qemu sends the callfds
>>>> 
>>>> At this point the guest has missed the notification and gets stuck.
>>>> 
>>>> Perhaps you prefer my initial proposal of sending a spurious notification 
>>>> when the backend sees a callfd?
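For concreteness, a rough sketch of that earlier proposal, assuming the fd just received via SET_VRING_CALL is an eventfd (the helper name is made up):

/* Hypothetical sketch: signal the freshly received call fd once, so that a
 * notification raised against the old (masked) fd cannot be lost. Spurious
 * notifications are harmless, so this closes the race window. */
#include <stdint.h>
#include <unistd.h>

static void spurious_notify(int call_fd)      /* fd from SET_VRING_CALL */
{
    uint64_t one = 1;

    if (call_fd != -1) {
        write(call_fd, &one, sizeof(one));
    }
}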
>>>> 
>>>> Felipe
>>> 
>>> I thought we read the masked callfd when we unmask it,
>>> and forward the interrupt. See kvm_irqfd_assign:
>>> 
>>>       /*
>>>        * Check if there was an event already pending on the eventfd
>>>        * before we registered, and trigger it as if we didn't miss it.
>>>        */
>>>       events = f.file->f_op->poll(f.file, &irqfd->pt);
>>> 
>>>       if (events & POLLIN)
>>>               schedule_work(&irqfd->inject);
>>> 
>>> 
>>> 
>>> Is this a problem you observe in practice?
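The behaviour kvm_irqfd_assign relies on above is ordinary eventfd semantics: a value written before anyone polls the fd is still reported as readable afterwards. A small self-contained sketch (user-space only, not KVM code) that demonstrates it:

/* Hypothetical demo: a "kick" written before registration still shows up
 * as POLLIN when the fd is polled later. */
#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void)
{
    int efd = eventfd(0, 0);
    uint64_t one = 1;

    write(efd, &one, sizeof(one));            /* signal before anyone listens */

    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    int n = poll(&pfd, 1, 0);                 /* "register" afterwards */

    printf("pending event seen: %s\n",
           (n > 0 && (pfd.revents & POLLIN)) ? "yes" : "no");
    close(efd);
    return 0;
}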
>> 
>> Thanks for pointing out this code; I wasn't aware of it.
>> 
>> Indeed I'm encountering it in practice. And I've checked that my kernel has 
>> the code above.
>> 
>> Starts to sound like a race:
>> Qemu registers the new notifier with kvm
>> Backend kicks the (now no longer registered) maskfd
> 
> vhost user is not supposed to use maskfd at all.
> 
> We have this code:
>        if (net->nc->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
>            dev->use_guest_notifier_mask = false;
>        }
> 
> isn't it effective?

I'm observing this problem when using vhost-user-scsi, not -net, so the code 
above is not in effect. Anyway, I'd expect the race I described to also happen 
with vhost-scsi.

The problem is aggravated on storage for the following reason:
SeaBIOS configures the vhost-(user)-scsi device, finds the boot drive and 
reads the boot data.
Then the guest kernel boots, and the virtio-scsi driver loads and reconfigures 
the device.
Qemu sends the new virtq information to the backend, but as soon as the device 
status is OK the guest issues reads to the root disk.
If the irq is lost, the guest waits forever for a response and never makes 
progress.

Unlike with networking (which must cope with packet drops), the guest hangs 
waiting for the device to answer.

So even if you had this race with networking, the guest would eventually 
retransmit, which would hide the issue.

Thoughts?
Felipe

> 
> 
> 
>> Qemu sends the new callfd to the application
>> 
>> It's not hard to repro. How could this situation be avoided?
>> 
>> Cheers,
>> Felipe
>> 
>> 
>>> 
>>>> 
>>>>> 
>>>>>>> 
>>>>>>> Perhaps it's best for now to delay the callfd notification with a flag 
>>>>>>> until it is received?
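A rough sketch of that flag-based deferral, purely for illustration (the struct and helper names are invented; libvhost-user's VuVirtq does not necessarily carry such a flag):

/* Hypothetical sketch: remember that a notification is owed while no call fd
 * is available yet, and deliver it once SET_VRING_CALL provides one. */
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

typedef struct {
    int call_fd;              /* -1 until SET_VRING_CALL has been handled */
    bool notify_pending;      /* a notification was deferred              */
} SketchVq;

static void vq_notify(SketchVq *vq)
{
    uint64_t one = 1;

    if (vq->call_fd == -1) {
        vq->notify_pending = true;            /* defer: no usable fd yet */
        return;
    }
    write(vq->call_fd, &one, sizeof(one));
}

static void vq_set_call_fd(SketchVq *vq, int fd)
{
    vq->call_fd = fd;
    if (vq->notify_pending) {                 /* flush the deferred one */
        vq->notify_pending = false;
        vq_notify(vq);
    }
}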
>>>>>> 
>>>>>> The other idea is to always notify when we receive the callfd. I remember 
>>>>>> discussing that alternative with you before libvhost-user went in. The 
>>>>>> protocol says both the driver and the backend must handle spurious 
>>>>>> kicks/notifications. This approach also fixes the bug.
>>>>>> 
>>>>>> I'm happy with whatever alternative you want, as long it makes 
>>>>>> libvhost-user usable for storage devices.
>>>>>> 
>>>>>> Thanks,
>>>>>> Felipe
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> Signed-off-by: Felipe Franciosi <address@hidden>
>>>>>>>> ---
>>>>>>>> contrib/libvhost-user/libvhost-user.c | 26 +++++++++++++-------------
>>>>>>>> 1 file changed, 13 insertions(+), 13 deletions(-)
>>>>>>>> 
>>>>>>>> diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
>>>>>>>> index af4faad..a46ef90 100644
>>>>>>>> --- a/contrib/libvhost-user/libvhost-user.c
>>>>>>>> +++ b/contrib/libvhost-user/libvhost-user.c
>>>>>>>> @@ -607,19 +607,6 @@ vu_set_vring_kick_exec(VuDev *dev, VhostUserMsg *vmsg)
>>>>>>>>      DPRINT("Got kick_fd: %d for vq: %d\n", vmsg->fds[0], index);
>>>>>>>>  }
>>>>>>>> 
>>>>>>>> -    dev->vq[index].started = true;
>>>>>>>> -    if (dev->iface->queue_set_started) {
>>>>>>>> -        dev->iface->queue_set_started(dev, index, true);
>>>>>>>> -    }
>>>>>>>> -
>>>>>>>> -    if (dev->vq[index].kick_fd != -1 && dev->vq[index].handler) {
>>>>>>>> -        dev->set_watch(dev, dev->vq[index].kick_fd, VU_WATCH_IN,
>>>>>>>> -                       vu_kick_cb, (void *)(long)index);
>>>>>>>> -
>>>>>>>> -        DPRINT("Waiting for kicks on fd: %d for vq: %d\n",
>>>>>>>> -               dev->vq[index].kick_fd, index);
>>>>>>>> -    }
>>>>>>>> -
>>>>>>>>  return false;
>>>>>>>> }
>>>>>>>> 
>>>>>>>> @@ -661,6 +648,19 @@ vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
>>>>>>>> 
>>>>>>>>  DPRINT("Got call_fd: %d for vq: %d\n", vmsg->fds[0], index);
>>>>>>>> 
>>>>>>>> +    dev->vq[index].started = true;
>>>>>>>> +    if (dev->iface->queue_set_started) {
>>>>>>>> +        dev->iface->queue_set_started(dev, index, true);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (dev->vq[index].kick_fd != -1 && dev->vq[index].handler) {
>>>>>>>> +        dev->set_watch(dev, dev->vq[index].kick_fd, VU_WATCH_IN,
>>>>>>>> +                       vu_kick_cb, (void *)(long)index);
>>>>>>>> +
>>>>>>>> +        DPRINT("Waiting for kicks on fd: %d for vq: %d\n",
>>>>>>>> +               dev->vq[index].kick_fd, index);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>  return false;
>>>>>>>> }
>>>>>>>> 
>>>>>>>> --
>>>>>>>> 1.9.4
>>>>>>>> 
>>>>>>>> 
>>>>>> 



