qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 0/3] HMP/snapshot changes - do not use ID any


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v2 0/3] HMP/snapshot changes - do not use ID anymore
Date: Fri, 11 Jan 2019 14:22:14 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0

On 10.01.19 12:41, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (address@hidden) wrote:
>> Am 09.01.2019 um 20:02 hat Eric Blake geschrieben:
>>> On 1/9/19 12:51 PM, Kevin Wolf wrote:
>>>
>>>>> Indeed, and libvirt IS using 'savevm' via HMP via QMP's
>>>>> human-monitor-command, since there is no QMP counterpart for internal
>>>>> snapshot.  Even though lately we consistently tell people that internal
>>>>> snapshots are underdeveloped and you should use external snapshots, it
>>>>> does not get away from the fact that libvirt has been using 'savevm' to
>>>>> drive internal snapshots for years now, and that we MUST consider
>>>>> back-compat and/or add an introspectible QMP interface before making
>>>>> changes that would break libvirt.
>>>>
>>>> Okay, so what does libvirt do when you request a snapshot with a
>>>> numerical name? Without having looked at the code, the best case I would
>>>> expect that it forbids them, and more realistically I suspect that we
>>>> may actually fix a bug for libvirt by changing the semantics.
>>>>
>>>> Or does libvirt really use snapshot IDs rather than names?
>>>
>>> At the moment, libvirt does not place any restriction on internal
>>> snapshot names, but passes the user's name through without thought of
>>> whether it is an id or a name.
>>>
>>> So yes, arguably tightening the semantics fixes a libvirt bug for
>>> libvirt having allowed internal snapshots to get into an inconsistent
>>> state.
>>
>> So there are two scenarios to consider with respect to breaking
>> backwards compatibility:
>>
>> 1. There may be code out there that relies on numeric arguments being
>>    interpreted as IDs. This code will break if we make this change and
>>    numeric snapshot names exist. That such code exists is speculation
>>    (even though plausible) and we don't know how widespread it is.
>>
>> 2. There may be code out there that blindly assumes that whatever it
>>    passes is interpreted as a name. Nobody considered that with a
>>    numeric snapshot name, it maybe misinterpreted as an ID. We know that
>>    this is true for libvirt, the single most used management tool for
>>    QEMU. More code like this may (and probably does) exist.
>>
>> Essentially the same two categories exist for human users: those who
>> somehow found out that QEMU accepts IDs as monitor command arguments and
>> are using those (mitigated by not displaying IDs any more), and those
>> who are trapped because they wanted to access a numeric name, but
>> surprisingly got it interpreted as an ID. Both are speculation to some
>> degree, but my guess is that the latter group is larger.
>>
>> Given all this, this is a no-brainer for me. We simplify the interface,
>> we simplify the code, we make things less confusing for manual users and
>> we fix the management tool that everybody uses. How could we not make
>> this change?

So you're trying to make a bug fix out of this now to get around
deprecation?  To me, changing behavior in qemu to fix a bug in libvirt
doesn't equate to fixing a bug in qemu.  So let's try to find a real bug
in qemu.

>>> But again, it falls in the category of "properly fixing this
>>> requires a lot of auditing, time, mental power, and unit tests", which
>>> makes it a rather daunting task to state for certain.
>>
>> Fix the world before we can fix anything?
> 
> Can't we break this loop for the savevm command fairly easily; it's
> currently documnted as:
> 
> savevm [tag|id]

The bug starts here.  The code doesn't match the interface description,
because this to me as a user suggests that "tag" has precedence over
"id" (same for loadvm and delvm).  But let's go over the strange
behavior for each:

savevm deletes all snapshots where either tag or ID match.[1]  I think
that's buggy because it should replace only one snapshot, not
accidentally remove something totally different.  It then...  Tries to
find that snapshot? and overwrite it with the new one.  Which makes no
sense because it has just deleted that snapshot, if it existed.[2]
So savevm will then just create a new snapshot with the given tag.

So it seems clear that savevm does not accept IDs, but really just tags.
 It follows that it should not try to delete all snapshots where the ID
matches the given tag.


loadvm uses bdrv_all_find_snapshot() -> bdrv_snapshot_find() where the
ID takes precedence over the tag.  This does not seem to match the
interface description, as I've said above.  So the first thing to do
clearly is to switch the comparison order in bdrv_snapshot_find().  Or,
well, apply this series because it removes the ID comparison altogether.
 But that wouldn't be needed for a bug fix.


delvm directly invokes bdrv_all_delete_snapshot() and thus shares the
same woes as savevm (see [1]).


[1] Actually, this is not quite true.  bdrv_all_delete_snapshot()
invokes bdrv_snapshot_delete_by_id_or_name() if bdrv_snapshot_find()
returned true.  So far, so good.  But
bdrv_snapshot_delete_by_id_or_name() will first try to delete a snapshot
with matching ID, and only if that failed it will delete a snapshot with
matching tag.  So if you have two snapshots { "0": "foo", "1": "0" } and
then bdrv_all_delete_snapshot("0"), only "foo" will be deleted.  So this
function doesn't really delete all snapshots that match...

[2] It follows from [1] that this is not quite true as well.  Take the
above example and savevm "0".  It will then delete snapshot "foo" and
indeed overwrite snapshot "0" (ID "1").  But that's just stupid.  It
looks like we have specialized code for after we have deleted an
unrelated snapshot for no reason whatsoever.


So, yes, there are bugs in qemu, but to fix them, we would need to
switch the comparison order in bdrv_snapshot_find() and fix
bdrv_snapshot_delete_by_id_or_name() to always invoke
bdrv_snapshot_delete() for both cases (ID and tag).  And we should
probably remove the special code path in savevm for overwriting an
existing snapshot.

This would also alleviate the bug in libvirt (because the tag then
always takes precedence, like the interface description suggests).


Sure, this series would fix the bugs as well, but it does more than
that.  It really isn't just a bug fix.

> If we made that:
> 
> savevm [-t] [-i] [tag|id]
> 
> then:
>   a) with neither -t or -i  it would behave in the same roulette way
>      as it does in the moment, and it might be a tag or id
> 
>   b) with -t we'd explicitly treat the parameter as a tag and it
>      would error if it wasn't found
> 
>   c) With -i we'd explicitly treat the parameter as an id and
>      it would error if it wasn't found
> 
> Since we still allow (a) it doesn't break any existing code.

I agree with the others that while this might solve issues on the
libvirt side of things, I think we also want to confuse end users less.

Max

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]