qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: virtio-vsock requires 'disable-legacy=on' in QEMU 5.1


From: Cornelia Huck
Subject: Re: virtio-vsock requires 'disable-legacy=on' in QEMU 5.1
Date: Tue, 18 Aug 2020 14:44:50 +0200

On Mon, 17 Aug 2020 15:11:28 +0200
Stefano Garzarella <sgarzare@redhat.com> wrote:

> On Mon, Aug 17, 2020 at 12:27:46PM +0200, Cornelia Huck wrote:
> > On Thu, 13 Aug 2020 14:04:15 +0200
> > Stefano Garzarella <sgarzare@redhat.com> wrote:
> >   
> > > On Thu, Aug 13, 2020 at 12:37:37PM +0200, Cornelia Huck wrote:  
> > > > On Thu, 13 Aug 2020 12:24:30 +0200
> > > > Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > >     
> > > > > On Thu, Aug 13, 2020 at 11:28:20AM +0200, Cornelia Huck wrote:    
> > > > > > We basically have three possible ways to deal with this:
> > > > > > 
> > > > > > - Force it to modern (i.e., what you have been doing; would need the
> > > > > >   equivalent changes in ccw as well.)      
> > > > > 
> > > > > Oo, thanks for pointing out ccw!
> > > > > I don't know ccw well, in this case should we set dev->max_rev to 1 
> > > > > or 2
> > > > > to force to modern?    
> > > > 
> > > > No, ->max_rev is the wrong side of the limit :) You want    
> > > 
> > > Well :-) Thanks!
> > >   
> > > > 
> > > >     ccw_dev->force_revision_1 = true;
> > > > 
> > > > in _instance_init() (see e.g. virtio-ccw-gpu.c).
> > > >     
> > > > >     
> > > > > >   Pro: looks like the cleanest approach.
> > > > > >   Con: not sure if we would need backwards compatibility support,
> > > > > >   which looks hairy.      
> > > > > 
> > > > > Not sure too.    
> > > > 
> > > > Yes, I'm not sure at all how to handle user-specified values for
> > > > legacy/modern.  
> > 
> > Thinking a bit more about it, I'm not sure whether we even *can*
> > provide backwards compatibility: we have different autoconfigurations
> > for PCI based upon where it is plugged, and ccw does not have a way to
> > turn legacy on/off, except from within the code.  
> 
> Yes, I discovered today for example that the PCIe bus set auto-legacy
> mode to off.

And vhost-vsock actually really seems to be modern-only, see below.

> 
> >   
> > > >     
> > > > >     
> > > > > > - Add vsock to the list of devices with legacy support.
> > > > > >   Pro: Existing setups continue to work.
> > > > > >   Con: If vsock is really virtio-1-only, we still carry around
> > > > > >   possibly broken legacy support.      
> > > > > 
> > > > > I'm not sure it is virtio-1-only, but virtio-vsock was introduced in
> > > > > 2016, so I supposed it is modern-only.    
> > > > 
> > > > Yes, I would guess so as well.
> > > >     
> > > > > 
> > > > > How can I verify that? Maybe forcing legacy mode and run some tests.  
> > > > >   
> > > > 
> > > > Probably yes. The likeliest area with issues is probably endianness, so
> > > > maybe with something big endian in the mix?
> > > >     
> > > 
> > > Yeah, I'll try this setup!

Ok, I tried this now with an x86 host and an s390x guest. Reverted the
checking commit, tried both with a -ccw and a -pci device and your ncat
example.
- When using virtio-1, both devices work fine.
- When using the -pci device with disable-modern=yes, I get "reset by
  peer".
- When using the -ccw device with max_revision=0, I get an instant
  timeout.

Smells like endianness problems (aka weird things are happening).

Also noticed that vhost-vsock-ccw does not have an immediate problem,
even with the commit: The code only checks whether the device has been
forced to legacy, not whether legacy is allowed (which cannot be
controlled by the user anyway). Probably best to address after we've
dealt with the vhost-vsock issue and made sure that there are no other
problems.

> > >   
> > > > >     
> > > > > > - Do nothing, have users force legacy off. Bad idea, as ccw has no 
> > > > > > way
> > > > > >   to do that on the command line.
> > > > > > 
> > > > > > The first option is probably best.  
> > 
> > The first option is now "force modern, but with no backwards
> > compatibility", which is not that great; but "allow legacy, even though
> > it should not exist" is not particularly appealing, either... what a
> > mess :(  
> 
> Yeah, it's a mess :-( anyway I still prefer option 1, it seems a little
> bit more correct to me.

It seems to me that the status before this was "works by accident, but
only if we're not negotiating to legacy, or the guest/host are both
little endian". IOW, no visible breakage for most people (or we'd
probably have heard of it already). Now we have a setup that's correct,
but forces users to adapt their QEMU command lines. Option 1 would
eliminate the need to do that, but would cause possibly
not-really-fixable migration issues (you can probably deal with that
manually, detaching and re-attaching the device as a last resort.)

So, force modern, probably also remove the -transitional device type,
and put a prominent explanation into the change log?




reply via email to

[Prev in Thread] Current Thread [Next in Thread]