qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 4/4] vl: Prioritize realizations of devices


From: Markus Armbruster
Subject: Re: [PATCH 4/4] vl: Prioritize realizations of devices
Date: Thu, 02 Sep 2021 17:26:57 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Peter Xu <peterx@redhat.com> writes:

> On Thu, Sep 02, 2021 at 10:26:16AM +0200, Igor Mammedov wrote:
>> On Mon, 30 Aug 2021 15:02:53 -0400
>> Peter Xu <peterx@redhat.com> wrote:
>> 
>> > On Thu, Aug 26, 2021 at 09:43:59AM -0400, Peter Xu wrote:
>> > > > > A simple state machine can track "has IOMMU" state.  It has three 
>> > > > > states
>> > > > > "no so far", "yes", and "no", and two events "add IOMMU" and "add 
>> > > > > device
>> > > > > that needs to know".  State diagram:
>> > > > > 
>> > > > >                           no so far
>> > > > >                    +--- (start state) ---+
>> > > > >                    |                     |
>> > > > >          add IOMMU |                     | add device that
>> > > > >                    |                     |  needs to know
>> > > > >                    v                     v
>> > > > >              +--> yes                    no <--+
>> > > > >              |     |   add device that   |     |
>> > > > >              +-----+    needs to know    +-----+
>> > > > > 
>> > > > > "Add IOMMU" in state "no" is an error.  
>> > > > 
>> > > > question is how we distinguish "device that needs to know"
>> > > > from device that doesn't need to know, and then recently
>> > > > added feature 'bypass IOMMU' adds more fun to this.  
>> > > 
>> > > Maybe we can start from "no device needs to know"? Then add more into it 
>> > > when
>> > > the list expands.
>> > > 
>> > > vfio-pci should be a natural fit because we're sure it won't break any 
>> > > valid
>> > > old configurations.  However we may need to be careful on adding more 
>> > > devices,
>> > > e.g. when some old configuration might work on old binaries, but I'm not 
>> > > sure.  
>> > 
>> > Btw, I think this state machine is indeed bringing some complexity on even
>> > understanding it - it is definitely working but it's not obvious to anyone 
>> > at
>> > the first glance, and it's only solving problem for vIOMMU.  E.g., do we 
>> > need
>> > yet another state machine for some other ordering constraints?
>> >
>> > From that POV, I don't like this solution more than the simple "assign 
>> > priority
>> > for device realization" idea..
>> It seems simple but implicit dependencies are fragile (good or
>> I'd rather say bad example implicit dependencies is vl.c magic code,
>> where changing order of initialization can easily break QEMU,
>> which happens almost every time it's refactored),
>
> True.  I was never able of telling whether that comes from natural complexity
> of the piece of software or there's indeed some smarter moves.
>
>> and as Markus already mentioned it won't work in QMP case.
>
> I didn't ask before, but I do have a question on whether that's a real 
> problem.
>
> When QMP interface is ready, we should have a last phase of "preconfig done"
> command, right?  I thought it was "x-exit-preconfig" now, but I'm not sure so
> am guessing.  If so, can we do the reorder there and postpone all device
> creations, maybe?  Then the ordering properties can be applied there too.

This would regress error reporting.

A significant advantage of doing configuration in basic steps is the
relative ease of pinpointing what exactly fails.

If you change device_add to merely record the request for later, you
delay the non-trivial failures until later.

Compare:

    -> {'execute':'device_add','arguments':{'driver':'virtio-blk',"drive": 
"foo"}}
    <- {"error": {"class": "GenericError", "desc": "Device needs media, but 
drive is empty"}}

to getting the same error in reply to x-exit-preconfig, with a dozen or
two device_add queued up.

The error comes from virtio_blk_device_realize(), which you propose to
delay until x-exit-preconfig.

In theory, we can rework all the errors in question to provide
sufficient context, and rework libvirt to pick the context apart, so it
can pinpoint what's wrong in the user's configuration.  In practice,
thanks, but no thanks.

> The other thing is I think only libvirt will use the QMP case even if it'll be
> ready, and libvirt will need to know the ordering already like vIOMMU and vfio
> and pci buses - even if a new qemu supported the ordering of all these, 
> libvirt
> still need to support old qemu binaries (and the code is already there anyway)
> and it's fairly natural they initiate the QMP commands using the same 
> ordering.

This is actually a pretty strong argument for not having QEMU reorder at
all.

> IMHO it means ordering for QMP is not as important as cmdline if that's always
> initiated by another software.
>
>> 
>> What could work for both cases is explicit dependencies,
>> however it would be hard to describe such dependency in this case,
>> where device can work both with and without IOMMU depending
>> on the bus settings it's attached to.
>> 
>> Have you considered another approach, i.e. instead of reordering,
>> change vfio-pci device model to reconfigure DMA address space
>> after realize time (ex: at reset time)?
>
> Yes, I agree this seems to be the cleanest appproach.  It may just need more
> changes and they'll be on vfio, and I'm not aware of the rest (Jason should
> know better on virtio/vdpa).
>
> The other thing is some dependency cannot be resolved by this, e.g., the pci
> bus issue where we put the pci bus to be after the device that it owns.  But
> indeed we're already early-fail that now so it seems ok.

Yes, that's a complete non-problem :)

> Side note: from my gut feeling I still think at some point we shouldn't do the
> early-failure for cases in the case of pci bus and device - the cmdline
> obviously contains enough information on the dependency on "this pci device
> needs that pci bus", and early-fail does seem to be an work-around to me just
> because they specified in the wrong order.

Again, there are two sane command line evaluation orders: (1) strictly
left to right, and (2) order doesn't matter.  QEMU of course does (3)
unpredicable for humans without referring back to the source code.

The difficulty with getting to (1) is mostly compatibility and putting
in the work.  Reporting when things don't get created in time nicely and
reliably is yet more work.  Not particularly challenging, just work.

Getting to a reliable version of (2) feels like a pipe dream to me.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]