Re: TR: Openstack NOVA - Improve the time of file system freeze during l
From: Kashyap Chamarthy
Subject: Re: TR: Openstack NOVA - Improve the time of file system freeze during live-snapshot
Date: Mon, 24 Jan 2022 18:24:38 +0100
Hi,
(Sorry for the slowness here.)
On Thu, Jan 20, 2022 at 12:45:02PM +0100, Kevin Wolf wrote:
> On 20.01.2022 at 09:02, Pierre Libeau wrote:
[...]
> > Hello,
> >
> > I'm working on a patch in nova to improve the time of file system
> > freeze during live-snapshot on an instance with a local disk and I
> > need your opinion about the solution I would propose.
> >
> > My issue during the live migration is the duration of file system
> > freeze on an instance with a big local disk. [1]
> >
> > In my case instance have locally a disk (400Go) and the
> > qemu-guest-agent is installed.
> >
> > Nova process like that: [2]
> > dev = guest.get_block_device(disk_path)
> >
> > 1. guest.freeze_filesystems()
> > 2. dev.rebase(disk_delta, copy=True, reuse_ext=True, shallow=True)
> > 3. while not dev.is_job_complete() #wait for the end of mirroring (the
> > issue is here, the waiting time depend on the size of the disk and
> > the IOPS)
> > 4. dev.abort_job()
> > 5. guest.thaw_filesystems()
>
> So first of all, I have to do some translation of terminology which
> seems to be different from what I am used to.
First, here's the API mapping from Nova to QEMU:
- rebase() refers to a Nova helper[b] method
- ... which maps to libvirt's blockRebase() API
- ... which in turn maps to QMP 'block-stream'
And here's the broader QEMU and libvirt block API mapping (Eric/Peter
correct me if I missed something):
- QEMU 'block-commit' == blockCommit() API in libvirt
- QEMU 'block-stream' == blockRebase() API in libvirt
- QEMU 'drive-mirror' / 'blockdev-mirror' == blockCopy() API in libvirt
- QEMU 'blockdev-backup' == backupBegin() API in libvirt
> dev.rebase with copy=True seems to result in a mirror block job in QEMU?
A detail: if you _only_ have copy=True, then you're right, it makes a
full copy. But there's also the "shallow=True" and "reuse_ext=True".
It's worth quoting (at least for me :-)) the official libvirt API
docs[c] of virDomainBlockRebase():
- "When flags includes VIR_DOMAIN_BLOCK_REBASE_COPY, this starts a
copy, where base must be the name of a new file to copy the chain
to. By default, the copy will pull the entire source chain into the
destination file"
- ... "but if flags also contains VIR_DOMAIN_BLOCK_REBASE_SHALLOW, then
only the top of the source chain will be copied (the source and
destination have a common backing file)"
- ... VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT means, "reuse an existing file
which was pre-created with the correct format and metadata and
sufficient size to hold the copy"
- ... "In case the VIR_DOMAIN_BLOCK_REBASE_SHALLOW flag is used the
pre-created file has to exhibit the same guest visible contents as
the backing file of the original image. This allows a management app
to pre-create files with relative backing file names, rather than
the default of absolute backing file names; as a security
precaution, you should generally only use reuse_ext with the shallow
flag and a non-raw destination file"
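To make the flag combination concrete, here is a minimal sketch of how
Nova-style keyword arguments could be folded into one bitmask. The
RebaseFlag enum and rebase_flags() helper are hypothetical, and the
numeric values are placeholders for illustration only; real code should
use the VIR_DOMAIN_BLOCK_REBASE_* constants exported by the libvirt
Python module.

```python
from enum import IntFlag


class RebaseFlag(IntFlag):
    """Hypothetical stand-ins for libvirt's VIR_DOMAIN_BLOCK_REBASE_* flags.

    The numeric values are placeholders, not taken from libvirt.
    """
    SHALLOW = 1 << 0
    REUSE_EXT = 1 << 1
    COPY = 1 << 3


def rebase_flags(copy=False, shallow=False, reuse_ext=False):
    """Translate Nova-style keyword arguments into a single flag bitmask."""
    flags = RebaseFlag(0)
    if copy:
        flags |= RebaseFlag.COPY        # start a copy job instead of a pull
    if shallow:
        flags |= RebaseFlag.SHALLOW     # copy only the top of the chain
    if reuse_ext:
        flags |= RebaseFlag.REUSE_EXT   # destination file was pre-created
    return flags


# The combination used in the Nova call quoted earlier in this thread.
flags = rebase_flags(copy=True, shallow=True, reuse_ext=True)
```

With all three set, the destination must be a pre-created file that
shares the backing file of the active image, as the quoted docs
describe.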
> So what you're calling a snapshot here doesn't seem to be a differential
> snapshot (e.g. by adding a COW overlay), but a full copy that results in
> two fully independent, standalone images. Is this right?
Correct. The "live snapshot" in Nova has always been a full copy, I'm
afraid.
> Adding a bit more context, the whole process seems to be:
>
> 1. Create a qcow2 for the copy of the top layer that shares the backing
> file with the active image.
>
> 2. Freeze guest filesystems
>
> 3. Create a full copy of the active layer (into the new qcow2 file)
> a. Start a mirror job
As noted above, it's a stream job. (Assuming libvirt's blockRebase() is
still calling stream under the hood.)
> b. Wait for the mirror job to move to the READY state
> c. Cancel the mirror job with force=false, i.e. complete the mirror
> job without changing the active image of the VM
Yeah, the "full copy of the active layer" is what libvirt calls a
"shallow copy" -- shallow=True in the rebase() call above.
> 4. Thaw the guest filesystems
>
> 5. qemu-img convert the copied top layer with its full backing chain
> to a standalone raw image
>
> 6. Delete the temporary qcow2 copy
>
> > My proposition is to move the freeze after the end of mirroring and
> > before the stop of mirroring. [3] I have tried on an instance and the
> > last written file on the fs corresponds to the end of the mirror.
>
> Yes, you only need the freeze around the mirror job completion, that is,
> step 3c above.
Thanks for confirming; I always forget these freeze semantics.
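For reference, the reordering Pierre proposes and Kevin confirms could
look roughly like the sketch below. FakeGuest and FakeBlockDevice are
hypothetical stubs that only record the call order; the method names
are the ones from Pierre's outline, and the sketch assumes the copy job
keeps the destination in sync once it has converged.

```python
import time


class FakeGuest:
    """Hypothetical stub that records freeze/thaw calls."""

    def __init__(self, log):
        self.log = log

    def freeze_filesystems(self):
        self.log.append("freeze")

    def thaw_filesystems(self):
        self.log.append("thaw")


class FakeBlockDevice:
    """Hypothetical stub whose copy job converges after a few polls."""

    def __init__(self, log, polls_until_ready=3):
        self.log = log
        self._remaining = polls_until_ready

    def rebase(self, base, copy=False, reuse_ext=False, shallow=False):
        self.log.append("rebase")

    def is_job_complete(self):
        self._remaining -= 1
        return self._remaining <= 0

    def abort_job(self):
        self.log.append("abort_job")


def live_snapshot(guest, dev, disk_delta, poll_interval=0.0):
    """Freeze only around job completion, not for the whole copy."""
    dev.rebase(disk_delta, copy=True, reuse_ext=True, shallow=True)
    # The guest keeps running (and writing) while the bulk copy happens.
    while not dev.is_job_complete():
        time.sleep(poll_interval)
    guest.freeze_filesystems()
    try:
        dev.abort_job()
    finally:
        # Always thaw, even if ending the job fails.
        guest.thaw_filesystems()


log = []
live_snapshot(FakeGuest(log), FakeBlockDevice(log), "/tmp/disk.delta")
```

The freeze window then covers only the job completion rather than the
whole copy, so it should no longer scale with the disk size.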
> However, the whole process seems very complicated for a rather simple
> operation. A comment mentions that the dance with the temporary qcow2
> file is because of a (not further specified) bug in QEMU 1.3. I believe,
> libvirt hasn't supported a QEMU version that old for a while, so is this
> really still a valid reason?
You're right -- you spotted code-rot in Nova here; the QEMU 1.3
code-comment gives it away (although it doesn't say what the bug was).
That part[a] of the Nova code in the _live_snapshot() method can be
refactored to use newer libvirt/QEMU APIs.
That said, some of the "undefine a guest XML and then redefine it later"
dance is because blockRebase() doesn't have a way to restart a copy job
on guest restart while mirroring is still intact. So the trick when
using libvirt's blockRebase() for a copy-job is to temporarily make the
domain "transient" (the guest.delete_configuration() ...
host.write_instance_config() calls in Nova).
However, the blockCopy() API has a _TRANSIENT_JOB flag that works around
this limitation of blockRebase().
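A rough sketch of the "transient domain" dance that blockRebase()
requires, using hypothetical stubs. The delete_configuration() and
write_instance_config() names come from the Nova calls mentioned above;
has_persistent_configuration() and the wrapper function are assumptions
made for illustration.

```python
class StubGuest:
    """Hypothetical stub for a guest with a persistent definition."""

    def __init__(self, log, persistent=True):
        self.log = log
        self.persistent = persistent

    def has_persistent_configuration(self):
        return self.persistent

    def delete_configuration(self):
        # Undefine the domain so that it becomes transient.
        self.log.append("undefine")
        self.persistent = False


class StubHost:
    """Hypothetical stub for the host-side helper that redefines a guest."""

    def __init__(self, log):
        self.log = log

    def write_instance_config(self, xml):
        self.log.append("define")


def run_copy_on_transient_domain(guest, host, xml, copy_job):
    """Undefine the guest, run the copy job, then always redefine it."""
    try:
        if guest.has_persistent_configuration():
            guest.delete_configuration()
        copy_job()
    finally:
        # Restore the persistent definition even if the copy job failed.
        host.write_instance_config(xml)


calls = []
run_copy_on_transient_domain(
    StubGuest(calls), StubHost(calls), "<domain/>", lambda: calls.append("copy")
)
```

With a transient-job flag on blockCopy(), this undefine/redefine
wrapper would presumably no longer be needed.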
Overall, wherever Nova can, it should completely replace
blockRebase() usage with one of the following APIs:
- virDomainBlockCopy() -- blockCopy() -- this is already used by
Nova today; but not consistently
- virDomainBackupBegin() -- backupBegin()
- virDomainBackupGetXMLDesc() -- backupGetXMLDesc()
- virDomainCheckpointCreateXML() -- checkpointCreateXML()
- virDomainCheckpointDelete()
> But what I would actually have used is a backup block job, which makes
> sure that the copy will contain the disk content at the point of time
> when the block job was started rather than when it happened to complete.
I agree, I'd prefer that too for the long term -- using the backup APIs
above. I _think_ Pierre can get his problem solved with libvirt's
blockCopy() API. Pierre, Nova has a wrapper for it, look at the usage
of the copy() wrapper method[d] in Nova.
[...]
[a]
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3166,L3190
[b]
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L745,L767
[c] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockRebase
[d]
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L729,#L743
--
/kashyap