Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy
From: |
Avi Kivity |
Subject: |
Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy |
Date: |
Tue, 01 Mar 2011 11:39:27 +0200 |
User-agent: |
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Thunderbird/3.1.7 |
On 02/28/2011 08:12 PM, Anthony Liguori wrote:
On Feb 28, 2011 11:47 AM, "Avi Kivity" <address@hidden> wrote:
>
> On 02/28/2011 07:33 PM, Anthony Liguori wrote:
>>
>>
>> >
>> > You're just ignoring what I've written.
>>
>> No, you're just impervious to my subtle attempt to refocus the
>> discussion on solving a practical problem.
>>
>> There are a lot of good, reasonably straightforward changes we can
>> make that have a high return on investment.
>>
>
> Is making qemu the authoritative source of configuration information
> a straightforward change? Is the return on it high? Is the
> investment low?
I think this is where we fundamentally disagree. My position is that
QEMU is already the authoritative source. Having a state file doesn't
change anything.
Do a hot unplug of a network device with upstream libvirt with acpiphp
unloaded, consult libvirt and then consult the monitor to see who has
the right view of the guests config.
libvirt is right and the monitor is wrong.
On real hardware, calling _EJ0 doesn't affect the configuration one
little bit (if I understand it correctly). It just turns off power to
the slot. If you power-cycle, the card will be there.
In the real world, the authoritative source of configuration is a human
with a screwdriver. The virtualized equivalent is the management tool.
To me, that's the definition of authoritative.
> "No" to all three (ignoring for the moment whether it is good or
> not, which we were debating).
>
>
>> The only suggestion I'm making beyond Marcelo's original patch is
>> that we use a structured format and that we make it possible to use
>> the same file to solve this problem in multiple places.
>>
>
> No, you're suggesting a lot more than that.
That's exactly what I'm suggesting from a technical perspective.
Unless I'm hallucinating, you're suggesting quite a bit more. A
revolution in how qemu is to be managed.
>> I don't think this creates a fundamental break in how management
>> tools interact with QEMU. I don't think introducing RAID support in
>> the block layer is a reasonable alternative.
>>
>>
>
> Why not?
Because it's a lot of complexity and code that can go wrong while only
solving the race for one specific case. Not to mention that we double
the iop rate.
IMO it's of similar complexity. The number of I/Os doesn't change (reads
stay the same, and any write that has already been mirrored needs to be
re-mirrored in both cases). We do gain lower latency switchover and we
package the code as a block format driver instead of core block code.
We decouple the dependencies from live migration.
> Something that avoids the whole state thing altogether:
>
> - instead of atomically switching when live copy is done, keep on
>   issuing writes to both the origin and the live copy
> - issue a notification to management
> - management receives the notification, and issues an atomic
>   blockdev switch command
> this is really the RAID-1 solution but without the state file
> (credit Dor). An advantage is that there is no additional latency
> when trying to catch up to the dirty bitmap.
It still suffers from the two generals problem. You cannot solve this
without making one node reliable and that takes us back to it being
either QEMU (posted event and state file) or the management tool (sync
event).
It works without either. If qemu fails, you simply re-mirror
everything. If the management tool fails, it re-subscribes to the
mirror-complete event, queries whether it already happened in its
absence, and if it did, requests the switchover.
--
error compiling committee.c: too many arguments to function