Our group would like to write block device backups directly to an object store, using an
interface such as s3fs or rclone-mount. We've run into problems with both
interfaces, and in both cases the problems revolve around fdatasync system
calls. With s3fs, fdatasync calls are painfully slow. With rclone-mount,
the calls are very fast but don't do anything.

Syncing files to an object store is inherently problematic, as a proper sync requires
finalizing the object that holds the file. After finalization, additional
writes to the file require a new object to be created and the old object
to be copied and destroyed. This process results in an N-squared performance
problem for files that are synced periodically as they are written, as
is the case for qemu backups.
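
To make the N-squared claim concrete, here is a rough cost model. It assumes a deliberately simplified store in which every sync finalizes the object and the next write has to copy everything written so far into a new object. The function name and parameters are made up for illustration, and real implementations (multipart uploads, server-side copies) may behave differently.

```python
def bytes_rewritten(file_size: int, sync_interval: int) -> int:
    """Total bytes copied from finalized objects under the simplified model.

    Assumption: each sync finalizes the object, so continuing to write
    means copying all previously written bytes into a new object.
    """
    total = 0
    written = 0
    while written < file_size:
        # Everything finalized so far must be copied before appending.
        total += written
        written += min(sync_interval, file_size - written)
    return total


if __name__ == "__main__":
    mib = 1 << 20
    for size_mib in (256, 512, 1024):
        copied = bytes_rewritten(size_mib * mib, 64 * mib)
        print(f"{size_mib} MiB file, sync every 64 MiB: {copied // mib} MiB copied")
```

With n syncs at a fixed interval k, the copied total is k * n * (n - 1) / 2, so doubling the file size roughly quadruples the copy traffic.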

Empirically, s3fs honors fdatasync, and as a result backups written to s3fs
take an untenably long time. I can provide data and straces if needed.

Backups written to rclone-mount are much faster, but there are obvious semantic problems.
The backup job reports success before the file is actually stable
in the object store. In fact, much of the work of finalizing the file
happens during the "close" system call that is invoked as part
of the qmp_blockdev_del operation. That call makes the operation
take so long that other commands time out waiting to "acquire state
change lock (held by monitor qemuProcessEventHandler)".
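
The two behaviors can be compared without involving qemu by using a small probe like the one below. It is only a sketch: the function name, chunk size, and chunk count are arbitrary choices for illustration, and the mount path passed in is whatever the local setup uses. It writes a few blocks, times each fdatasync, and then times the final close.

```python
import os
import time


def write_all(fd: int, data: bytes) -> None:
    """Write the whole buffer, retrying on short writes."""
    view = memoryview(data)
    while len(view) > 0:
        n = os.write(fd, view)
        view = view[n:]


def probe_sync_cost(path, chunk_size=1 << 20, chunks=8):
    """Return (list of per-sync durations, close duration) in seconds."""
    # os.fdatasync is not available on every platform; fall back to fsync.
    sync = getattr(os, "fdatasync", os.fsync)
    block = b"\0" * chunk_size
    sync_times = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(chunks):
            write_all(fd, block)
            start = time.monotonic()
            sync(fd)
            sync_times.append(time.monotonic() - start)
    finally:
        # Time close() separately, since finalization may happen here.
        start = time.monotonic()
        os.close(fd)
        close_time = time.monotonic() - start
    return sync_times, close_time
```

If the behavior described above holds, running this against a file on the s3fs mount should show sync times that grow as the file grows, while on rclone-mount the sync times should be near zero and most of the time should land in the close.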

My questions for the group are: Has anyone else tried writing backups to file systems that
don't have good support for fdatasync, and do you have any advice other
than "Don't do that"?