Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
Date: Wed, 14 Mar 2012 10:34:08 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120209 Thunderbird/10.0.1

On 14.03.2012 01:14, Federico Simoncelli wrote:
> ----- Original Message -----
>> From: "Eric Blake" <address@hidden>
>> To: "Paolo Bonzini" <address@hidden>
>> Cc: address@hidden, address@hidden, address@hidden, address@hidden, 
>> address@hidden
>> Sent: Tuesday, March 13, 2012 9:48:10 PM
>> Subject: Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
>>
>> On 03/06/2012 10:56 AM, Paolo Bonzini wrote:
>>> From: Federico Simoncelli <address@hidden>
>>>
>>> Signed-off-by: Federico Simoncelli <address@hidden>
>>> Signed-off-by: Paolo Bonzini <address@hidden>
>>
>>>  ##
>>> +# @drive-reopen
>>> +#
>>> +# Assigns a new image file to a device.
>>> +#
>>> +# @device: the name of the device for which we are changing the
>>> +#          image file.
>>> +#
>>> +# @new-image-file: the target of the new image. If the file doesn't
>>> +#                  exist the command will fail.
>>> +#
>>> +# @format: #optional the format of the new image, default is 'qcow2'.
>>> +#
>>> +# Returns: nothing on success
>>> +#          If @device is not a valid block device, DeviceNotFound
>>> +#          If @new-image-file can't be opened, OpenFileFailed
>>> +#          If @format is invalid, InvalidBlockFormat
>>> +#
>>> +# Since 1.1
>>> +##
>>> +{ 'command': 'drive-reopen',
>>> +  'data': { 'device': 'str', 'new-image-file': 'str', '*format': 'str' } }
>>
>> I still think we need a 'drive-reopen' action included in
>> 'transaction', as an 11/10 on this series.  For disk migration, it is
>> true that you can migrate one disk at a time, and therefore only need
>> to reopen one disk at a time, to get the guarantee that for a single
>> disk image, the current state of that image is consistent using only
>> one storage domain.
> 
> I'm not sure if this was already addressed on this mailing list, but
> the main problem is that, as a general rule, a qcow file cannot be
> opened in r/w mode twice. I believe the only exception to that was
> live migration, and it generated several issues.

In fact the same is true for any image. There are just some special
cases that happen to work anyway, but using them is no more than a hack.

> That said, reopen is really hard to implement as a transaction
> without breaking that rule. For example, in the blkmirror case you'd
> need to open the destination image in r/w mode while the mirroring is
> still in progress (i.e. with the same image already open r/w).
> 
> There are several solutions here, but they are either really hard to
> implement or not definitive. For example:
> 
> * We could try to implement the reopen command for each special case,
>   e.g. blkmirror, reopening the same image, etc., and in such cases
>   reuse the same bs that we already have. The downside is that this
>   command would be coupled with all these special cases.

The problem we're trying to solve is that we have a graph of open
BlockDriverStates (connected by bs->file, backing file relations etc.)
and we want to transform it into a different graph of open BDSs
atomically, reusing zero, one or many of the existing BDSs, and possibly
with changed properties (cache mode, read-only, etc.).
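
(Just for illustration, a very much simplified view of such a graph node;
this is not the real BlockDriverState from block.c, only the parts that
matter for this discussion:)

    /* Illustration only, not the real struct from block.c. */
    typedef struct BlockDriverState BlockDriverState;
    struct BlockDriverState {
        BlockDriverState *file;        /* underlying protocol/file BDS, if any */
        BlockDriverState *backing_hd;  /* backing image BDS, if any */
        int open_flags;                /* cache mode, read-only, ... */
    };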

What we can have reasonably easily (there are patches floating around;
they just need to be completed) is a bdrv_reopen() that changes flags
on one given BDS, without changing the file it's backed by. This is
already broken up into prepare/commit/abort stages, as we need that to
reopen VMDK's split images safely.

In theory this should be enough to build the new graph: open the not
yet used BDSs, prepare the reopen of the reused ones, and only if all
of that was successful, commit the bdrv_reopen and change the relations
between the nodes. I hope it's also clear that this isn't exactly
trivial to implement.
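
To make the intended shape a bit more concrete, here is a rough pseudo-C
sketch of the overall prepare/commit/abort dance. All names are invented
for this mail and don't come from the existing patches; the hard part is
of course what prepare and commit have to do per driver, not this outline:

    /* Rough sketch only; names are invented for this mail. */
    typedef struct BlockDriverState BlockDriverState;

    typedef struct ReopenState {
        BlockDriverState *bs;   /* the BDS being reopened */
        int flags;              /* the new open flags for it */
    } ReopenState;

    static int reopen_prepare(ReopenState *rs);  /* may fail, changes nothing visible */
    static void reopen_commit(ReopenState *rs);  /* makes it effective, must not fail */
    static void reopen_abort(ReopenState *rs);   /* undoes the preparation */

    /* Reopen a whole set of BDSs atomically: prepare everything first,
     * then either commit all of them or abort all of them. */
    static int reopen_multiple(ReopenState *states, int n)
    {
        int i, ret;

        for (i = 0; i < n; i++) {
            ret = reopen_prepare(&states[i]);
            if (ret < 0) {
                while (i-- > 0) {
                    reopen_abort(&states[i]);   /* roll back what was prepared */
                }
                return ret;
            }
        }

        for (i = 0; i < n; i++) {
            reopen_commit(&states[i]);
        }

        /* Only at this point would the bs->file/backing_hd pointers be
         * changed to form the new graph. */
        return 0;
    }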

> * We could use the transaction APIs without actually making it a
>   transaction (if we fail in the middle we can't roll back). The only
>   advantage of this is that we'd provide a consistent API to libvirt
>   and postpone the problem to the future. Anyway, I strongly
>   discourage this as it's completely unsafe and it's going to break
>   the transaction semantics. Moreover, it's a solution that relies too
>   much on the hope of finding something appropriate in the future.

This is not an option. Advertising transactional behaviour and not
implementing it is just plain wrong.

> * We could leave it as it is: a distinct command that is not part of
>   the transaction and that closes the old image before opening the
>   new one.

Yes, this would be the short-term preliminary solution. I would tend to
leave it to downstreams to implement it as an extension, though.

> This is not completely correct; the main intent was to avoid spreading
> one image chain across two storage domains (which would make it
> incomplete if one of them was missing). In the next oVirt release a VM
> can have different disks on different storage domains, so this wouldn't
> be a special case but just a normal situation.

The problem with this kind of argument is that we're not developing only
for oVirt, but need to look at what makes sense for any management tool
(or even just direct users of qemu).

Kevin


