Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
Date: Wed, 8 Oct 2014 13:55:15 +0300

On Wed, Oct 08, 2014 at 01:37:25PM +0300, Avi Kivity wrote:
> 
> On 10/08/2014 01:14 PM, Michael S. Tsirkin wrote:
> >On Wed, Oct 08, 2014 at 12:51:21PM +0300, Avi Kivity wrote:
> >>On 10/08/2014 12:15 PM, Michael S. Tsirkin wrote:
> >>>On Wed, Oct 08, 2014 at 10:43:07AM +0300, Avi Kivity wrote:
> >>>>On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:
> >>>>>a single descriptor might use all of
> >>>>>the virtqueue. In this case we won't be able to pass the
> >>>>>descriptor directly to linux as a single iov, since
> >>>>>
> >>>>You could separate maximum request scatter/gather list size from the
> >>>>virtqueue size.  They are totally unrelated - even now you can have a larger
> >>>>request by using indirect descriptors.
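(For context: an indirect descriptor is an ordinary ring entry with
VRING_DESC_F_INDIRECT set whose buffer holds its own table of descriptors, so
one ring slot can carry a chain longer than the ring itself.  A rough sketch of
building one, with allocation, error handling and endianness conversions
omitted:

#include <stdint.h>
#include <linux/virtio_ring.h>   /* struct vring_desc, VRING_DESC_F_* */

/* 'table' is a guest buffer holding 'n' chained descriptors that describe
 * the actual request; 'slot' is the single ring descriptor that points at
 * it, so the whole request consumes one ring entry. */
static void post_indirect(struct vring_desc *slot, struct vring_desc *table,
                          uint64_t table_gpa, unsigned n)
{
    for (unsigned i = 0; i + 1 < n; i++) {
        table[i].flags |= VRING_DESC_F_NEXT;
        table[i].next = i + 1;
    }
    slot->addr  = table_gpa;                    /* guest-physical address of table */
    slot->len   = n * sizeof(struct vring_desc);
    slot->flags = VRING_DESC_F_INDIRECT;
}
)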
> >>>We could add a feature to have a smaller or larger S/G length limit.
> >>>Is this something useful?
> >>>
> >>Having a larger ring size is useful, esp. with zero-copy transmit, and you
> >>would need the sglist length limit in order to not require linearization on
> >>linux hosts.  So the limit is not useful in itself, only indirectly.
> >>
> >>Google cloud engine exposes virtio ring sizes of 16384.
> >OK this sounds useful, I'll queue this up for consideration.
> >Thanks!
> 
> Thanks.
> 
> >>Even more useful is getting rid of the desc array and instead passing descs
> >>inline in avail and used.
> >You expect this to improve performance?
> >Quite possibly, but this will have to be demonstrated.
> >
> 
> The top vhost function in small packet workloads is vhost_get_vq_desc, and
> the top instruction within that (50%) is the one that reads the first 8
> bytes of desc.  It's a guaranteed cache line miss (and again on the guest
> side when it's time to reuse).

OK so basically what you are pointing out is that we get 5 accesses:
read of available head, read of available ring, read of descriptor,
write of used ring, write of used ring head.
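
For reference, a minimal host-side sketch of the current split layout with
those five accesses marked (structures simplified from the spec - fields are
really little-endian - and no barriers, chains or error handling):

#include <stdint.h>

struct vring_desc      { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_avail     { uint16_t flags; uint16_t idx; uint16_t ring[]; };
struct vring_used_elem { uint32_t id; uint32_t len; };
struct vring_used      { uint16_t flags; uint16_t idx; struct vring_used_elem ring[]; };

static void process_one(struct vring_desc *desc, struct vring_avail *avail,
                        struct vring_used *used, uint16_t num,
                        uint16_t *last_avail, uint16_t *used_idx)
{
    if (avail->idx == *last_avail)                      /* 1: read available head  */
        return;                                         /* nothing new             */
    uint16_t head = avail->ring[*last_avail % num];     /* 2: read available ring  */
    struct vring_desc d = desc[head];                   /* 3: read descriptor      */
    /* ... hand d.addr / d.len to the backend here ... */
    used->ring[*used_idx % num].id  = head;             /* 4: write used ring      */
    used->ring[*used_idx % num].len = d.len;            /*    (bytes written, simplified) */
    (*last_avail)++;
    (*used_idx)++;
    used->idx = *used_idx;                              /* 5: write used ring head */
}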

If processing is in-order, we could build a much simpler design, with a
valid bit in the descriptor, cleared by host as descriptors are
consumed.

Basically get rid of both used and available ring.

Sounds good in theory.
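
Roughly something like this - illustrative only, names made up, not a
worked-out layout:

/* One ring of descriptors processed strictly in order: the guest sets VALID
 * when it publishes an entry, the host clears it (filling in 'len' for
 * device writes) when done, so there is no separate avail or used ring to
 * touch. */
#define DESC_F_VALID 0x8000            /* made-up flag bit */

struct inline_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;                    /* VALID, read/write direction, ... */
    uint16_t id;                       /* lets completions be matched if needed */
};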

> Inline descriptors will amortize the cache miss over 4 descriptors, and will
> allow the hardware to prefetch, since the descriptors are linear in memory.
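
(For the factor of 4: a vring_desc is 8 + 4 + 2 + 2 = 16 bytes, so a 64-byte
cache line holds exactly four of them.)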

If descriptors are used in order (as they are with current qemu)
then aren't they amortized already?

-- 
MST


