From: Hanna Czenczek
Subject: Re: [PATCH 0/4] vhost-user-fs: Internal migration
Date: Fri, 5 May 2023 14:51:55 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0
On 05.05.23 11:53, Eugenio Perez Martin wrote:
> On Fri, May 5, 2023 at 11:03 AM Hanna Czenczek <hreitz@redhat.com> wrote:
>> On 04.05.23 23:14, Stefan Hajnoczi wrote:
>>> On Thu, 4 May 2023 at 13:39, Hanna Czenczek <hreitz@redhat.com> wrote:
[...]
>>> All state is lost and the Device Initialization process must be followed to make the device operational again. Existing vhost-user backends don't implement SET_STATUS 0 (it's new). It's messy and not your fault. I think QEMU should solve this by treating stateful devices differently from non-stateful devices. That way existing vhost-user backends continue to work and new stateful devices can also be supported.

>> It’s my understanding that SET_STATUS 0/RESET_DEVICE is problematic for stateful devices. In a previous email, you wrote that these should implement SUSPEND+RESUME so qemu can use those instead. But those are separate things, so I assume we just use SET_STATUS 0 when stopping the VM because this happens to also stop processing vrings as a side effect?

>> I.e. I understand “treating stateful devices differently” to mean that qemu should use SUSPEND+RESUME instead of SET_STATUS 0 when the back-end supports it, and stateful back-ends should support it.

> Honestly I cannot think of any use case where the vhost-user backend did not ignore set_status(0) and had to retrieve vq states. So maybe we can totally remove that call from qemu?
I don’t know, so I can’t really say; but I don’t quite understand why qemu would reset a device at any point other than a VM reset (and even then, I’d expect the post-reset guest to just reset the device on boot by itself anyway).
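Purely for concreteness, here is the kind of stop path I understand “treating stateful devices differently” to mean. This is a standalone C sketch with stubs, not QEMU code: only VHOST_USER_PROTOCOL_F_STATUS is a real bit from the vhost-user spec; the SUSPEND feature bit, its value, and all helper names are made up for illustration.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VHOST_USER_PROTOCOL_F_STATUS   16  /* real bit, per the vhost-user spec */
#define VHOST_USER_PROTOCOL_F_SUSPEND  63  /* hypothetical; value is arbitrary  */

static bool has_protocol_feature(uint64_t features, unsigned bit)
{
    return features & (1ULL << bit);
}

/* Stubs standing in for the actual vhost-user message transport. */
static int send_suspend(void)
{
    puts("-> SUSPEND (hypothetical message)");
    return 0;
}

static int send_set_status(uint8_t status)
{
    printf("-> SET_STATUS %u\n", status);
    return 0;
}

/* Stopping a back-end: suspend stateful back-ends so their state
 * survives; only reset (SET_STATUS 0) back-ends that carry no state.
 * Back-ends that negotiated neither bit are simply left alone. */
static int stop_backend(uint64_t negotiated_features)
{
    if (has_protocol_feature(negotiated_features, VHOST_USER_PROTOCOL_F_SUSPEND)) {
        return send_suspend();
    }
    if (has_protocol_feature(negotiated_features, VHOST_USER_PROTOCOL_F_STATUS)) {
        return send_set_status(0);
    }
    return 0;
}

int main(void)
{
    stop_backend(1ULL << VHOST_USER_PROTOCOL_F_SUSPEND);  /* stateful back-end  */
    stop_backend(1ULL << VHOST_USER_PROTOCOL_F_STATUS);   /* stateless back-end */
    stop_backend(0);                                      /* e.g. virtiofsd     */
    return 0;
}

The point of the sketch is just the ordering of the checks: a back-end that can be suspended is never reset, and one that negotiated neither bit (like today’s virtiofsd) is left untouched.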
[...]
>>>> Naturally, what I want to know most of all is whether you believe I can get away without SUSPEND/RESUME for now. To me, it seems like honestly not really, only when turning two blind eyes, because otherwise we can’t ensure that virtiofsd isn’t still processing pending virtqueue requests when the state transfer is begun, even when the guest CPUs are already stopped. Of course, virtiofsd could stop queue processing right there and then, but… That feels like a hack that in the grand scheme of things just isn’t necessary when we could “just” introduce SUSPEND/RESUME into vhost-user for exactly this.

>>>> Beyond the SUSPEND/RESUME question, I understand everything can stay as-is for now, as the design doesn’t seem to conflict too badly with possible future extensions for other migration phases or more finely grained migration phase control between front-end and back-end. Did I at least roughly get the gist?

>>> One part we haven't discussed much: I'm not sure how much trouble you'll face due to the fact that QEMU assumes vhost devices can be reset across vhost_dev_stop() -> vhost_dev_start(). I don't think we should keep a copy of the state in-memory just so it can be restored in vhost_dev_start().

>> All I can report is that virtiofsd continues to work fine after a cancelled/failed migration.

> Isn't the device reset after a failed migration? At least net devices are reset before sending VMState. If it cannot be applied at the destination, the device is already reset...
It doesn’t look like the Rust crate virtiofsd uses for vhost-user supports either F_STATUS or F_RESET_DEVICE, so I think this just doesn’t affect virtiofsd.
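To illustrate why that shields virtiofsd, here is a minimal standalone C sketch (not the actual crate or QEMU code; negotiate() and the variable names are invented, though the two bit values are the real ones from the vhost-user spec): the front-end may only send SET_STATUS or RESET_DEVICE if the corresponding protocol feature bit survived negotiation, so a back-end that never advertises those bits never sees either message.

#include <stdint.h>
#include <stdio.h>

/* These two bit values are real, from the vhost-user specification. */
#define VHOST_USER_PROTOCOL_F_RESET_DEVICE  13
#define VHOST_USER_PROTOCOL_F_STATUS        16

/* The negotiated set is the intersection of what the front-end asks
 * for and what the back-end advertised via GET_PROTOCOL_FEATURES. */
static uint64_t negotiate(uint64_t frontend_mask, uint64_t backend_advertised)
{
    return frontend_mask & backend_advertised;
}

int main(void)
{
    uint64_t frontend_mask = (1ULL << VHOST_USER_PROTOCOL_F_STATUS) |
                             (1ULL << VHOST_USER_PROTOCOL_F_RESET_DEVICE);
    uint64_t backend_advertised = 0;  /* the crate advertises neither bit */
    uint64_t negotiated = negotiate(frontend_mask, backend_advertised);

    /* With both bits clear after negotiation, the front-end must not
     * send SET_STATUS 0 or RESET_DEVICE, so the back-end is never
     * reset this way at all. */
    printf("SET_STATUS allowed:   %s\n",
           (negotiated & (1ULL << VHOST_USER_PROTOCOL_F_STATUS)) ? "yes" : "no");
    printf("RESET_DEVICE allowed: %s\n",
           (negotiated & (1ULL << VHOST_USER_PROTOCOL_F_RESET_DEVICE)) ? "yes" : "no");
    return 0;
}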
Hanna