Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.
From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.
Date: Thu, 20 Jan 2011 10:15:20 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Fedora/3.0.10-1.fc12 Thunderbird/3.0.10
On 20.01.2011 06:19, Yoshiaki Tamura wrote:
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + bdrv_aio_writev(bs, blk_req->reqs[0].sector, blk_req->reqs[0].qiov,
>>>>> + blk_req->reqs[0].nb_sectors, blk_req->reqs[0].cb,
>>>>> + blk_req->reqs[0].opaque);
>>>>
>>>> Same here.
>>>>
>>>>> + bdrv_flush(bs);
>>>>
>>>> This looks really strange. What is this supposed to do?
>>>>
>>>> One point is that you write it immediately after bdrv_aio_write, so you
>>>> get an fsync for which you don't know if it includes the current write
>>>> request or if it doesn't. Which data do you want to get flushed to the
>>>> disk?
>>>
>>> I was expecting to flush the aio request that was just initiated.
>>> Am I misunderstanding the function?
>>
>> Seems so. The function names don't use really clear terminology either,
>> so you're not the first one to fall in this trap. Basically we have:
>>
>> * qemu_aio_flush() waits for all AIO requests to complete. I think you
>> wanted to have exactly this, but only for a single block device. Such a
>> function doesn't exist yet.
>>
>> * bdrv_flush() makes sure that all successfully completed requests are
>> written to disk (by calling fsync)
>>
>> * bdrv_aio_flush() is the asynchronous version of bdrv_flush, i.e. run
>> the fsync in the thread pool
>
> Then what I want to do is call qemu_aio_flush() first, then
> bdrv_flush(), as live migration does.
Okay, that makes sense. :-)
>>>> The other thing is that you introduce a bdrv_flush for each request,
>>>> basically forcing everyone to something very similar to writethrough
>>>> mode. I'm sure this will have a big impact on performance.
>>>
>>> The reason is to avoid inversion of queued requests. Although
>>> processing one-by-one is heavy, wouldn't having requests flushed
>>> to disk out of order break the disk image?
>>
>> No, that's fine. If a guest issues two requests at the same time, they
>> may complete in any order. You just need to make sure that you don't
>> call the completion callback before the request really has completed.
>
> We need to flush requests, meaning aio and fsync, before sending
> the final state of the guests, to make sure we can switch to the
> secondary safely.
In theory I think you could just re-submit the requests on the secondary
if they had not completed yet.
But you're right, let's keep things simple for the start.
>> I'm just starting to wonder if the guest won't timeout the requests if
>> they are queued for too long. Even more, with IDE, it can only handle
>> one request at a time, so not completing requests doesn't sound like a
>> good idea at all. In what intervals is the event-tap queue flushed?
>
> The requests are flushed once each transaction completes, so
> they aren't flushed at fixed intervals.
Right. So when is a transaction completed? This is the time that a
single request will take.
>> On the other hand, if you complete before actually writing out, you
>> don't get timeouts, but you signal success to the guest when the request
>> could still fail. What would you do in this case? With a writeback cache
>> mode we're fine, we can just fail the next flush (until then nothing is
>> guaranteed to be on disk and order doesn't matter either), but with
>> cache=writethrough we're in serious trouble.
>>
>> Have you thought about this problem? Maybe we end up having to flush the
>> event-tap queue for each single write in writethrough mode.
>
> Yes, and that's what I'm trying to do at this point.
Oh, I must have missed that code. Which patch/function should I look at?
> I know that
> performance matters a lot, but sacrificing reliability for
> performance now isn't a good idea. I first want to lay the
> groundwork and then focus on optimization. Note that without the
> dirty bitmap optimization, Kemari suffers a lot when sending RAM.
> Anthony and I agreed to take this approach at KVM Forum.
I agree, starting simple makes sense.
Kevin