qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of use


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
Date: Wed, 19 Oct 2011 02:56:31 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Oct 19, 2011 at 12:13:49AM +0200, Jan Kiszka wrote:
> On 2011-10-18 23:40, Michael S. Tsirkin wrote:
> > On Tue, Oct 18, 2011 at 09:37:14PM +0200, Jan Kiszka wrote:
> >> On 2011-10-18 20:40, Michael S. Tsirkin wrote:
> >>> On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
> >>>> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
> >>>>> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
> >>>>>> On 2011-10-18 17:22, Jan Kiszka wrote:
> >>>>>>> What KVM has to do is just mapping an arbitrary MSI message
> >>>>>>> (theoretically 64+32 bits, in practice it's much of course much less) 
> >>>>>>> to
> >>>>>>
> >>>>>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
> >>>>>> only a current interpretation of one specific arch. )
> >>>>>
> >>>>> Confused. vector mask is 8 bits. the rest is destination id etc.
> >>>>
> >>>> Right, but those additional bits like the destination make different
> >>>> messages. We have to encode those 24 bits into a unique GSI number and
> >>>> restore them (by table lookup) on APIC injection inside the kernel. If
> >>>> we only had to encode 256 different vectors, we would be done already.
> >>>
> >>> Right. But in practice guests always use distinct vectors (from the
> >>> 256 available) for distinct messages. This is because
> >>> the vector seems to be the only thing that gets communicated by the APIC
> >>> to the software.
> >>>
> >>> So e.g. a table with 256 entries, with extra 1024-256
> >>> used for spill-over for guests that do something unexpected,
> >>> would work really well.
> >>
> >> Already Linux manages vectors on a pre-CPU basis. For efficiency
> >> reasons, it does not exploit the full range of 256 vectors but actually
> >> allocates them in - IIRC - steps of 16. So I would not be surprised to
> >> find lots of vector number "collisions" when looking over a full set of
> >> CPUs in a system.
> >>
> >> Really, these considerations do not help us. We must store all 96 bits,
> >> already for the sake of other KVM architectures that want MSI routing.
> >>>
> >>>
> >>>>>
> >>>>>>> a single GSI and vice versa. As there are less GSIs than possible MSI
> >>>>>>> messages, we could run out of them when creating routes, statically or
> >>>>>>> lazily.
> >>>>>>>
> >>>>>>> What would probably help us long-term out of your concerns regarding
> >>>>>>> lazy routing is to bypass that redundant GSI translation for dynamic
> >>>>>>> messages, i.e. those that are not associated with an irqfd number or 
> >>>>>>> an
> >>>>>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that 
> >>>>>>> accepts
> >>>>>>> address and data directly.
> >>>>>>
> >>>>>> This would be a trivial extension in fact. Given its beneficial impact
> >>>>>> on our GSI limitation issue, I think I will hack up something like 
> >>>>>> that.
> >>>>>>
> >>>>>> And maybe this makes a transparent cache more reasonable. Then only old
> >>>>>> host kernels would force us to do searches for already cached messages.
> >>>>>>
> >>>>>> Jan
> >>>>>
> >>>>> Hmm, I'm not all that sure. Existing design really allows
> >>>>> caching the route in various smart ways. We currently do
> >>>>> this for irqfd but this can be extended to ioctls.
> >>>>> If we just let the guest inject arbitrary messages,
> >>>>> that becomes much more complex.
> >>>>
> >>>> irqfd and kvm device assignment do not allow us to inject arbitrary
> >>>> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
> >>>> kvm_device_msix_set_vector (etc.) for those scenarios to set static
> >>>> routes from an MSI message to a GSI number (+they configure the related
> >>>> backends).
> >>>
> >>> Yes, it's a very flexible API but it would be very hard to optimize.
> >>> GSIs let us do the slow path setup, but they make it easy
> >>> to optimize target lookup in kernel.
> >>
> >> Users of the API above have no need to know anything about GSIs. They
> >> are an artifact of the KVM-internal interface between user space and
> >> kernel now - thanks to the MSIRoutingCache encapsulation.
> > 
> > Yes but I am saying that the API above can't be implemented
> > more efficiently than now: you will have to scan all apics on each MSI.
> > The GSI implementation can be optimized: decode the vector once,
> > if it matches a single vcpu, store that vcpu and use when sending
> > interrupts.
> 
> Sorry, missed that you switched to kernel.
> 
> What information do you want to cache there that cannot be easily
> obtained by looking at a concrete message? I do not see any. Once you
> checked that the delivery mode targets a specific cpu, you could address
> it directly.

I thought we need to match APIC ID. That needs a table lookup, no?

> Or are you thinking about some cluster mode?

That too.

> > 
> > 
> >>>
> >>> An analogy would be if read/write operated on file paths.
> >>> fd makes it easy to do permission checks and slow lookups
> >>> in one place. GSI happens to work like this (maybe, by accident).
> >>
> >> Think of an opaque file handle as a MSIRoutingCache object. And it
> >> encodes not only the routing handle but also other useful associated
> >> information we need from time to time - internally, not in the device
> >> models.
> > 
> > Forget qemu abstractions, I am talking about data path
> > optimizations in kernel in kvm. From that POV the point of an fd is not
> > that it is opaque. It is that it's an index in an array that
> > can be used for fast lookups.
> > 
> >>>>>
> >>>>> Another concern is mask bit emulation. We currently
> >>>>> handle mask bit in userspace but patches
> >>>>> to do them in kernel for assigned devices where seen
> >>>>> and IMO we might want to do that for virtio as well.
> >>>>>
> >>>>> For that to work the mask bit needs to be tied to
> >>>>> a specific gsi or specific device, which does not
> >>>>> work if we just inject arbitrary writes.
> >>>>
> >>>> Yes, but I do not see those valuable plans being negatively affected.
> >>>>
> >>>> Jan
> >>>>
> >>>
> >>> I do.
> >>> How would we maintain a mask/pending bit in kernel if we are not
> >>> supplied info on all available vectors even?
> >>
> >> It's tricky to discuss an undefined interface (there only exists an
> >> outdated proposal for kvm device assignment). But I suppose that user
> >> space will have to define the maximum number of vectors when creating an
> >> in-kernel MSI-X MMIO area. The device already has to tell this to 
> >> msix_init.
> >>
> >> The number of used vectors will correlate with the number of registered
> >> irqfds (in the case of vhost or vfio, device assignment still has
> >> SET_MSIX_NR). As kernel space would then be responsible for mask
> >> processing, user space would keep vectors registered with irqfds, even
> >> if they are masked. It could just continue to play the trick and drop
> >> data=0 vectors.
> > 
> > Which trick?  We don't play any tricks except for device assignment.
> > 
> >> The point here is: All those steps have _nothing_ to do with the generic
> >> MSI-X core. They are KVM-specific "side channels" for which KVM provides
> >> an API. In contrast, msix_vector_use/unuse were generic services that
> >> were actually created to please KVM requirements. But if we split that
> >> up, we can address the generic MSI-X requirements in a way that makes
> >> more sense for emulated devices (and particularly msix_vector_use makes
> >> no sense for emulation).
> >>
> >> Jan
> >>
> > 
> > We need at least msix_vector_unuse
> 
> Not at all. We rather need some qemu_irq_set(level) for MSI.
> The spec
> requires that the device clears pending when the reason for that is
> removed. And any removal that is device model-originated should simply
> be signaled like an irq de-assert.

OK, this is a good argument.
In particular virtio ISR read could clear msix pending bit
(note: it would also need to clear irqfd as that is where
 we get the pending bit).

I would prefer not to use qemu_irq_set for this though.
We can add a level flag to msix_notify.

> Vector "unusage" is just one reason here.

I don't see removing the use/unuse functions as a priority though,
but if we add an API that also lets devices say
'reason for interrupt is removed', that would be nice.

Removing extra code can then be done separately, and on qemu.git
not on qemu-kvm.

> > - IMO it makes more sense than "clear
> > pending vector". msix_vector_use is good to keep around for symmetry:
> > who knows whether we'll need to allocate resources per vector
> > in the future.
> 
> For MSI[-X], the spec is already there, and we know that there no need
> for further resources when emulating it.
> Only KVM has special needs.
> 
> Jan
> 

It's not hard to speculate.  Imagine an out of process device that
shares guest memory and sends interrupts to qemu using eventfd. Suddenly
we need an fd per vector, and this without KVM.

-- 
MST



reply via email to

[Prev in Thread] Current Thread [Next in Thread]