Re: [Qemu-devel] QEMU interfaces for image streaming and post-copy block

qemu-devel

From:	Avi Kivity
Subject:	Re: [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration
Date:	Sun, 12 Sep 2010 15:40:23 +0200
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.8) Gecko/20100806 Fedora/3.1.2-1.fc13 Lightning/1.0b2pre Thunderbird/3.1.2

 On 09/12/2010 03:25 PM, Anthony Liguori wrote:

On 09/12/2010 07:41 AM, Avi Kivity wrote:

 On 09/07/2010 05:57 PM, Anthony Liguori wrote:

I agree that streaming should be generic, like block migration.  The
trivial generic implementation is:

void bdrv_stream(BlockDriverState* bs)
{
     for (sector = 0; sector < bdrv_getlength(bs); sector += n) {
         if (!bdrv_is_allocated(bs, sector, &n)) {

Three problems here. First problem is that bdrv_is_allocated is synchronous.


Put the whole thing in a thread.

It doesn't fix anything. You don't want stream to serialize all I/O operations.

Why would it serialize all I/O operations? It's just like another vcpu issuing reads.

The second problem is that streaming makes the most sense when it's the smallest useful piece of work whereas bdrv_is_allocated() may return a very large range.
You could cap it here but you then need to make sure that cap is at least cluster_size to avoid a lot of unnecessary I/O.
That seems like a nice solution. You probably want a multiple of the cluster size to retain efficiency.
What you basically do is:

stream_step_three():
   complete()

stream_step_two(offset, length):
   bdrv_aio_readv(offset, length, buffer, stream_step_three)

bdrv_aio_stream():
    bdrv_aio_find_free_cluster(stream_step_two)


Isn't there a write() missing somewhere?

And that's exactly what the current code looks like. The only change to the patch that this does is make some of qed's internals be block layer interfaces.

Why do you need find_free_cluster()? That's a physical offset thing. Just write to the same logical offset.


IOW:

    bdrv_aio_stream():
        bdrv_aio_read(offset, stream_2)

    stream_2():
        if all zeros:
            increment offset
            if more:
                bdrv_aio_stream()
        bdrv_aio_write(offset, stream_3)

    stream_3():
        bdrv_aio_write(offset, stream_4)

    stream_4():
        increment offset
        if more:
            bdrv_aio_stream()

Of course, need to serialize wrt guest writes, which adds a bit more complexity. I'll leave it to you to code the state machine for that.
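
One possible reading of the callback chain above, as a minimal self-contained sketch. Everything here is an assumption for illustration: StreamState, fake_aio_read() and fake_aio_write() are hypothetical in-memory stand-ins, not QEMU block layer interfaces, the fake "AIO" calls invoke their completion callback immediately, and a single write per cluster is issued. Serialization against guest writes is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { CLUSTER_SIZE = 4, NUM_CLUSTERS = 4,
       IMAGE_SIZE = CLUSTER_SIZE * NUM_CLUSTERS };

/* Hypothetical per-stream state; not a QEMU structure. */
typedef struct StreamState {
    const unsigned char *backing;      /* data to stream from */
    unsigned char *image;              /* destination, same logical offsets */
    size_t offset;                     /* current logical offset */
    size_t length;                     /* total bytes to cover */
    unsigned char buf[CLUSTER_SIZE];   /* bounce buffer for one cluster */
    int writes;                        /* number of writes issued */
    bool done;
} StreamState;

typedef void StreamCallback(StreamState *s);

/* Fake AIO: perform the copy, then call the completion callback at once. */
static void fake_aio_read(StreamState *s, StreamCallback *cb)
{
    memcpy(s->buf, s->backing + s->offset, CLUSTER_SIZE);
    cb(s);
}

static void fake_aio_write(StreamState *s, StreamCallback *cb)
{
    memcpy(s->image + s->offset, s->buf, CLUSTER_SIZE);
    s->writes++;
    cb(s);
}

static bool all_zeros(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

static void stream_start(StreamState *s);

/* Last step: move to the next cluster, or finish. */
static void stream_advance(StreamState *s)
{
    s->offset += CLUSTER_SIZE;
    if (s->offset < s->length) {
        stream_start(s);
    } else {
        s->done = true;
    }
}

/* Read completed: skip all-zero clusters, otherwise write the data back. */
static void stream_read_done(StreamState *s)
{
    if (all_zeros(s->buf, CLUSTER_SIZE)) {
        stream_advance(s);
        return;
    }
    fake_aio_write(s, stream_advance);
}

/* First step: read one cluster at the current logical offset. */
static void stream_start(StreamState *s)
{
    assert(s->offset + CLUSTER_SIZE <= s->length);
    fake_aio_read(s, stream_read_done);
}
```

Calling stream_start() on a state whose offset is 0 walks the whole range; with real asynchronous I/O each callback would instead run from the completion path.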

One of the things Stefan has mentioned is that a lot of the QED code could be reused by other formats. All formats implement things like CoW on their own today but if you exposed interfaces like bdrv_aio_find_free_cluster(), you could actually implement a lot more in the generic block layer.
So, I agree with you in principle that this all should be common code. I think it's a larger effort though.


Not that large I think; and it will make commit async as a side effect.

The QED streaming implementation is 140 LOCs too so you quickly end up adding more code to the block formats to support these new interfaces than it takes to just implement it in the block format.
bdrv_is_allocated() already exists (and is needed for commit), what else is needed? cluster size?
Synchronous implementations are not reusable to implement asynchronous anything.


Surely this is easy to fix, at least for qed.

What we need is thread infrastructure that allows us to convert between the two methods.

But you need the code to be cluster aware too.


Yes, another variable in BlockDriverState.
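
To illustrate what "cluster aware" could mean for a generic streaming loop, here is a small sketch that caps a large unallocated run at a multiple of the cluster size and ends each chunk on a cluster boundary. stream_chunk_len() and its parameters are hypothetical; the sketch only assumes that a cluster size is available from somewhere such as BlockDriverState.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose how many bytes to stream next.
 *   offset       - current logical offset in bytes
 *   run_len      - length of the unallocated run starting at offset
 *   cluster_size - format cluster size in bytes (assumed known)
 *   max_clusters - cap on the work done per step, in clusters
 */
static uint64_t stream_chunk_len(uint64_t offset, uint64_t run_len,
                                 uint64_t cluster_size, uint64_t max_clusters)
{
    assert(cluster_size > 0 && max_clusters > 0);

    uint64_t cap = cluster_size * max_clusters;
    uint64_t len = run_len < cap ? run_len : cap;

    /* End the chunk on a cluster boundary when one lies past offset. */
    uint64_t end = offset + len;
    uint64_t aligned_end = end - end % cluster_size;
    if (aligned_end > offset) {
        len = aligned_end - offset;
    }
    return len;
}
```

With 64 KiB clusters and a cap of four clusters, a 1 MiB run starting at offset 0 is streamed 256 KiB at a time, while a run starting at an unaligned offset gets a shorter first chunk so that later chunks begin on a cluster boundary.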

Third problem is that streaming really requires being able to do zero write detection in a meaningful way. You don't want to always do zero write detection so you need another interface to mark a specific write as a write that should be checked for zeros.
You can do that in bdrv_stream(), above, before the actual write, and call bdrv_unmap() if you detect zeros.
My QED branch now does that FWIW. At the moment, it only detects zero reads to unallocated clusters and writes a special zero cluster marker. However, the detection code is in the generic path so once the fsck() logic is working, we can implement a free list in QED.
In QED, the detection code needs to have a lot of knowledge about cluster boundaries and the format of the device. In principle, this should be common code but it's not for the same reason copy-on-write is not common code today.


Parts of it are: commit.  Of course, that's horribly synchronous.
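
As a rough sketch of the zero detection discussed above, the following classifies each cluster of a buffer as all-zero or not before any write is issued. find_zero_clusters() and cluster_is_zero() are hypothetical names, not existing block layer functions; what the caller does with an all-zero cluster (unmap it, write a zero marker, or skip it) is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if every byte in buf[0..len) is zero. */
static bool cluster_is_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Hypothetical helper: mark which clusters of buf are entirely zero.
 * zero_map must have room for one entry per cluster, counting a
 * trailing partial cluster. Returns the number of all-zero clusters.
 */
static size_t find_zero_clusters(const unsigned char *buf, size_t len,
                                 size_t cluster_size, bool *zero_map)
{
    assert(cluster_size > 0);

    size_t zeros = 0;
    for (size_t i = 0; i * cluster_size < len; i++) {
        size_t off = i * cluster_size;
        size_t n = len - off < cluster_size ? len - off : cluster_size;

        zero_map[i] = cluster_is_zero(buf + off, n);
        if (zero_map[i]) {
            zeros++;
        }
    }
    return zeros;
}
```

A byte-at-a-time scan is the simplest correct form; a real implementation would likely compare wider words.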

--
error compiling committee.c: too many arguments to function
