From: Kevin Wolf
Subject: Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Date: Mon, 21 May 2012 12:32:55 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1

On 21.05.2012 12:02, Paolo Bonzini wrote:
> On 21/05/2012 11:29, Kevin Wolf wrote:
>>> * block-stream: I propose adding two options to the existing
>>> block-stream command.  If this is rejected, only mirroring will be able
>>> to use rerror/werror.
>>>
>>> The new options are of course rerror/werror.  They are enum options,
>>> with the following possible values:
>>
>> Do we really need separate werror/rerror? For guest operations they
>> really exist only for historical reasons: werror was there first, and
>> when we wanted the same functionality, it seemed odd to overload werror
>> to include reads as well.
>>
>> For block jobs, where there is no such option yet, we could go with a
>> single error option, unless there is a use case for separate
>> werror/rerror options.
> 
> For mirroring rerror=source and werror=target.  I'm not sure there is an
> actual usecase, but at least it is more interesting than for devices...

Hm. What if we add an active mirror? Then we can get some kind of COW,
and rerror can happen on the target as well.

If source/target is really the distinction we want to have, should the
available options be specific to the job type, so that you could have
src_error and dst_error for mirroring?
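For mirroring that could look something like this on the QMP level
(command and option names made up just for illustration):

{ "execute": "block-mirror",
  "arguments": { "device": "virtio0", "target": "mirror.img",
                 "src_error": "stop", "dst_error": "report" } }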

>> Just like with guest operations, it's a mostly useless mode. Do we
>> really need this option?
> 
> Perhaps we should remove it for guest operations as well; certainly it
> makes more sense (if any) for jobs than for guest operations.

The guest operation one I used once for debugging/reproducing a specific
error condition, and I'm not aware of other users. Maybe it can be
useful for jobs in similar contexts.

>>> 'stop': The VM *and* the job will be paused---the VM is stopped even if
>>> the block device has neither rerror=stop nor werror={stop,enospc}.  The
>>> error is recorded in the block device's iostatus (which can be examined
>>> with query-block).  However, a BLOCK_IO_ERROR event will _never_ pause a
>>> job.
>>>
>>>   Rationale: stopping all I/O seems to be the best choice in order
>>>   to limit the number of errors received.  However, due to backwards-
>>>   compatibility with QEMU 1.1 we cannot pause the job when guest-
>>>   initiated I/O causes an error.  We could do that if the block
>>>   device has rerror=stop/werror={stop,enospc}, but it seems simpler
>>>   to just never do it.
>>
>> I don't agree with stopping the VM. Consider a case where the target is
>> somewhere on the network and you lose the connection, but the primary
>> image is local on the hard disk. You don't want to stop the VM just
>> because continuing with the copy isn't possible for the moment.
> 
> I think this is something that management should resolve.

Management doesn't necessarily exist.

> For an error
> on the source, stopping the VM makes sense.  I don't think management
> cares about what caused an I/O error on a device.  Does it matter
> whether streaming was active or the guest was executing "dd if=/dev/sda
> of=/dev/null"?

Yes, there's a big difference: If it was a job, the guest can keep
running without any problems. If it was a guest operation, we would have
to return an error to the guest, which may offline the disk in response.

As long as the VM can keep running, we should let it run.

> Management may want to keep the VM stopped even for an error on the
> target, at least once mirroring has finished the initial synchronization
> step.  The VM can perform large amounts of I/O while the job is paused,
> and completing the job can then take a long time.

If management wants to limit the impact of this, it can decide to
explicitly stop the VM when it receives the error event.
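On the wire that would be nothing more than this (assuming we get a job
error event along the lines of BLOCK_IO_ERROR; names are placeholders):

<- { "event": "BLOCK_JOB_ERROR",
     "data": { "device": "virtio0", "operation": "write",
               "action": "report" } }
-> { "execute": "stop" }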

>> Of course, this means that you can't reuse the block device's iostatus,
>> but you need a separate job_iostatus.
> 
> For mirroring, source and target are separate devices and have separate
> iostatuses anyway.
> 
>> If the VM is stopped (including a stop caused by BLOCK_IO_ERROR), no
>> I/O should be going on at all. Do we really keep running the jobs in
>> 1.1? If so, this is a bug and should be fixed before the release.
> 
> Yes, we do.  Do you think it's a problem for migration (thinking more
> about it: ouch, yes, it should be)?

I'm pretty sure that it is a problem for migration. And it's likely a
problem in more cases.

> We have no pause/resume infrastructure, so we could simply force
> synchronous cancellation at the end (before vm_stop_force_state).
> Stefan, do you have any free cycles for this?

Not very nice, but considering that we're past -rc2 this might be the
only thing we can do now.

>>> * query-block-jobs: The returned JSON object will grow an additional
>>> member, "target".  The target field is a dictionary with two fields,
>>> "info" and "stats" (resembling the output of query-block and
>>> query-blockstats, but for the mirroring target).  Member "device" of the
>>> BlockInfo structure will be made optional.
>>>
>>>   Rationale: this allows libvirt to observe the high watermark of qcow2
>>>   mirroring targets, and avoids putting a bad iostatus on a working
>>>   migration source.
>>
>> The mirroring target should be present in query-block instead. It is a
>> user-visible BlockDriverState.
> 
> It is not user visible, and making it user visible adds a lot more
> things to worry about (e.g. making sure you cannot use it in a
> device_add).
> 
> It reminds me of Xen's renaming of domains (foo->migrating-foo and
> foo->zombie-foo), which was an endless source of pain.
> 
> I'd rather make the extension of query-block-jobs more generic, with a
> list "devices" instead of a member "target", and making up the device
> name in the implementation (so you have "device": "target" for mirroring).
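If I understand correctly, the result would then look something like
this (made-up example; "info" and "stats" as in the proposal above):

{ "return": [
    { "type": "mirror", "device": "virtio0",
      "len": 10737418240, "offset": 1048576, "speed": 0,
      "devices": [
        { "device": "target",
          "info": { ... },
          "stats": { ... } } ] } ] }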

Well, my idea for blockdev was something like this (represented in
-drive syntax because I don't know what the final syntax will look
like):

(qemu) blockdev_add file=foo.img,id=src
(qemu) device_add virtio-blk-pci,drive=src
...
(qemu) blockdev_add file=bar.img,id=dst
(qemu) blockdev_mirror src dst

Once QOM reaches the block layer, I guess we'll want to make all
BlockDriverStates user visible anyway.

Kevin


