|
From: | Jason Wang |
Subject: | Re: [PATCH 0/3] virtio-net: graceful drop of vhost for TAP |
Date: | Thu, 18 Feb 2021 11:02:17 +0800 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 |
On 2021/2/10 下午4:38, Michael S. Tsirkin wrote:
On Wed, Feb 10, 2021 at 02:19:59PM +0800, Jason Wang wrote:On 2021/2/9 下午11:04, Michael S. Tsirkin wrote:On Tue, Feb 09, 2021 at 02:51:05PM +0000, Daniel P. Berrangé wrote:On Tue, Feb 09, 2021 at 09:34:20AM -0500, Michael S. Tsirkin wrote:On Thu, Feb 04, 2021 at 10:29:12PM +0200, Yuri Benditovich wrote:This set of patches introduces graceful switch from tap-vhost to tap-no-vhost depending on guest features. Before that the features that vhost does not support were silently cleared in get_features. This creates potential problem of migration from the machine where some of virtio-net features are supported by the vhost kernel to the machine where they are not supported (packed ring as an example).I still worry that adding new features will silently disable vhost for people. Can we limit the change to when a VM is migrated in?Some management applications expect bi-directional live migration to work, so taking specific actions on incoming migration only feels dangerous.Could you be more specific? Bi-directional migration is currently broken when migrating new kernel->old kernel. This seems to be the motivation for this patch, though I wish it was spelled out more explicitly. People don't complain much, but I'm fine with fixing that with a userspace fallback. I'd rather not force the fallback on others though: vhost is generally specified explicitly by user while features are generally set automatically, so this patch will make us override what user specified, not nice.IMHO if the features we're adding cannot be expected to exist in host kernels in general, then the feature should defualt to off and require explicit user config to enable. Downstream distros which can guarantee newer kernels can flip the default in their custom machine types if they desire. Regards, DanielUnfortunately that will basically mean we are stuck with no new features for years. We did what this patch is trying to change for years now, in particular KVM also seems to happily disable CPU features not supported by kernel so I wonder why we can't keep doing it, with tweaks for some corner cases.It's probably not the corner case. So my understanding is when a feature is turned on via command line,Most people do not play with these flags on command line, the main path to turn features on if when they default to on.
The problem is qemu may choose to clear feature whose default are on.
it should not be cleared silently otherwise we may break migration for sure.Not if we are careful. Setting flags is more dangerous.
The problem is that user is expected to enable the feature for the device. If Qemu fail to do that, shouldn't we fail the device initialization?
Clearing is generally ok.E.g when packed=on is specified, we should disable vhost instead of clear it from the device. ThanksFrom usability POV, consider packed as an example, people only enable it to get better performance. libvirt says: The attribute packed controls if QEMU should try to use packed virtqueues. Compared to regular split queues, packed queues consist of only a single descriptor ring replacing available and used ring, index and descriptor buffer. This can result in better cache utilization and performance. If packed virtqueues are actually used depends on the feature negotiation between QEMU, vhost backends and guest drivers. Possible values are on or off. Switching to a completely different backend clearly isn't what user intended.
If we don't do migration, all should be fine. But the problem is the migration compatibility. My understanding is that libvirt will assume the compatibility if the same user-visible feature were set through cli. So if any feature were cleared the migration compatibility will break. This will also be a burden to support cross vendor live migration in the future for vDPA.
Thanks
userspace and kernel not being in 100% sync wrt features is not a corner case though, and switching backends seems like too big a hammer.-- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
[Prev in Thread] | Current Thread | [Next in Thread] |