[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] block: Dirty bitmaps and COR in bdrv_move_feature_field
Re: [Qemu-block] block: Dirty bitmaps and COR in bdrv_move_feature_fields()
Tue, 1 Mar 2016 14:24:48 -0500
Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0
On 03/01/2016 09:19 AM, Kevin Wolf wrote:
> Am 01.03.2016 um 11:43 hat Markus Armbruster geschrieben:
>> qemu-block@ without qemu-devel@, intentional?
>> Kevin Wolf <address@hidden> writes:
>>> Hi all,
>>> I'm currently trying to get rid of bdrv_move_feature_fields(), so we can
>>> finally have more than one BB per BDS. Generally the way to do this is
>>> to move features from BDS and block.c to BB and block-backend.c.
>>> However, for two of the features I'm not sure about this:
>>> * Copy on Read:
>>> When Jeff introduced bdrv_append() in commit 8802d1fd, the CoR flag
>>> was already moved to the new top level when taking a snapshot. Does
>>> anyone remember why it works like that? It doesn't seem to make a lot
>>> of sense to me.
>>> The use case for manually enabled CoR is to avoid reading data twice
>>> from a slow remote image, so we want to save it to a local overlay,
>>> say an ISO image accessed via HTTP to a local qcow2 overlay.
>> Ignorant / forgetful question: we do that by adding a QCOW2 on top with
>> COR enabled, and that makes QCOW2 copy up on read?
> It's not qcow2, but block.c that implements it, but otherwise yes.
> $ qemu-img create -f qcow2 -b http://something.slow disk.qcow2
> $ qemu-system-x86_64 -drive file=disk.qcow2,copy-on-read=on ...
>>> taking a snapshot, we end up with a backing chain like this:
>>> http <- local.qcow2 <- snap_overlay.qcow2
>> Now COR is enabled where? Just in snap_overlay.qcow2?
> Yes, that's the current behaviour.
> Of course it still copies everything that is read from the remote host
> because it goes through both qcow2 layer, it just copies a bit more than
> that and duplicates local data in both layers.
>>> There is no point in performing copy on read from local.qcow2 into
>>> snap_overlay.qcow2, we just want to keep copying data from the remote
>>> source into local.qcow2.
>> Makes sense.
>>> Possible caveat: We would be writing to a backing file, but that's
>>> similar to what some block jobs do, so if we design our op blockers to
>>> cover this case, it should be fine.
>> COR would write to backing file local.qcow2. Doesn't change contents of
>> the http <- local.qcow2 substack, though.
> Right, that's why the operation can be done in the first place.
>>> I'm actually pretty sure that simply removing COR from the list, and
>>> therefore changing the behaviour to not move it to the top any more,
>>> is the right thing to do and could be considered a bug fix.
>> I'm not sure I got the relation to BBs. Perhaps its about the rule "if
>> $feature sticks to the top when we put a BDS on top, it should probably
>> live in the BB instead." Your point seems to be that COR shouldn't
>> stick to the top. Is that roughly right?
> Not only roughly. :-)
> In order to allow multiple BBs per BDS I need to complete the split now,
> so all features that currently stick to the top by using the remnants of
> bdrv_swap() need to be properly converted to BB instead. The reason for
> that is that if the feature is supposed to be logically part of the BB
> level, different BBs on the same BDS can have different setting.
> If we decide that it shouldn't be in the BB level in the first place,
> instead of converting the feature, I can simply drop it from the
> bdrv_swap remnants.
>> You gave an example where COR should stay put. Do we know of any use of
>> COR where sticking to the top makes sense?
> To be honest, I'm not sure if any COR users exist out there. The main
> motivation for it was that the streaming block job uses it internally.
> The example I gave is what Anthony used to give when the feature was
>> In general, having the block layer move things around implicitly when
>> the user adds a BDS or BB is prone to create awkward questions like "is
>> this the right move for all possible user intents?" I hope that the
>> ongoing rework will lead to less implicit magic and more explicit
> Implicit magic is becoming harder to implement as I remove bs->blk, so I
> think we're on the right way there.
>>> * Dirty bitmaps:
>>> We're currently trying, and if I'm not mistaken failing, to move dirty
>>> bitmaps to the top. The (back then one) bitmap was first added to the
>>> list in Paolo's commit a9fc4408, with the following commit message:
>>> While these should not be in use at the time a transaction is
>>> started, a command in the prepare phase of a transaction might have
>>> added them, so they need to be brought over.
>>> At that point, there was no transactionable command that did this in
>>> the prepare phase. Today we have mirror and backup, but op blockers
>>> should prevent them from being mixed with snapshots in a single
>>> transaction, so I can't see how this change had any effect.
>>> The reason why I think we're failing to move dirty bitmaps to the top
>>> today is that we're moving the head of the list to a different object
>>> without updating the prev link in the first element, so in any case
>>> it's buggy today.
>>> I really would like to keep bitmaps on the BDS where they are, but
>>> unfortunately, we also have user-defined bitmaps by now, and if we
>>> change whether they stick with the top level, that's a change that is
>>> visible on the QMP interface.
>>> On the other hand, the QMP interface clearly describes bitmaps as
>>> belonging to a node rather than a BB (you can use node-name, even with
>>> no BB attached), so moving them could be considered a bug, even if
>>> it is the existing behaviour.
>> You just told us moving doesn't work. Did it ever work in any release
>> that also provides the QMP interface in question?
> The feature was introduced in 2.4 (commit 341ebc2f) and I think I broke
> it in time for 2.5 (the bdrv_swap() removal in dd62f1ca and the
> following two patches).
>> If no, existing behavior doesn't matter :)
>> If yes, the interface might be new enough to permit incompatible design
>> flaw fixes. Paolo thinks bitmaps haven't been used widely. Discuss
>> with their known users?
> Who knows the known users? John? (CCed)
Libvirt doesn't use it, I do not know of any scripts or programs that
attempt to use this interface yet. I have even discouraged people from
using it in production setups because of the lack of migration and
If any users do exist, I think we are within our trial period to say
"Sorry, necessary bugfix."
See the other tail on this thread for my thoughts on solutions.
A thought: If users DID use it on 2.4, they have no way to migrate to
2.5+ anyway! We consider the existence of migration and persistence
necessary to call the feature "supported."
I should amend the wiki to make this very, very, very clear.
>>> I can imagine use cases for both ways, so the interface that would
>>> make the most sense to me is to generally keep BDSes at their node,
>>> and to provide a QMP command to move them to a different one.
>> Explicit control instead of implicit magic --- yes, please.
>>> With compatibility in mind, this seems to be a reall tough one,
>>> Any comments or ideas how to proceed with those two?
>> Hope I could help a little.
> Yes, thanks. I hope my answers to your questions give you a clearer
> picture, too.