Re: [Qemu-devel] [PATCH v3 1/2] block: allow live commit of active image


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v3 1/2] block: allow live commit of active image
Date: Wed, 18 Sep 2013 11:36:14 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Sep 18, 2013 at 11:32:31AM +0800, Fam Zheng wrote:
> On Wed, 09/04 14:35, Stefan Hajnoczi wrote:
> > On Thu, Aug 15, 2013 at 04:14:06PM +0800, Fam Zheng wrote:
> > > diff --git a/block/commit.c b/block/commit.c
> > > index 2227fc2..b5e024b 100644
> > > --- a/block/commit.c
> > > +++ b/block/commit.c
> > > @@ -17,14 +17,13 @@
> > >  #include "block/blockjob.h"
> > >  #include "qemu/ratelimit.h"
> > >  
> > > -enum {
> > > -    /*
> > > -     * Size of data buffer for populating the image file.  This should be large
> > > -     * enough to process multiple clusters in a single call, so that populating
> > > -     * contiguous regions of the image is efficient.
> > > -     */
> > > -    COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
> > > -};
> > > +/*
> > > + * Size of data buffer for populating the image file.  This should be large
> > > + * enough to process multiple clusters in a single call, so that populating
> > > + * contiguous regions of the image is efficient.
> > > + */
> > > +#define COMMIT_BUFFER_SECTORS 128
> > > +#define COMMIT_BUFFER_BYTES (COMMIT_BUFFER_SECTORS * BDRV_SECTOR_SIZE)
> > 
> > Shrinking the buffer from 512 KB to 64 KB can hurt performance: up to
> > 8 times as many I/O requests may be issued to copy the same data.
> > 
> > Also, the image's cluster size should really be taken into account.
> > Otherwise we suffer additional inefficiency when we populate a 128 KB
> > cluster with a COMMIT_BUFFER_SECTORS (64 KB) write, only to overwrite
> > the remaining part of the cluster in the next loop iteration.
> > 
> > This can be solved by setting the dirty bitmap granularity to the
> > cluster size (with a 64 KB minimum) *and* finding contiguous runs of
> > dirty bits so that the main loop can perform larger I/Os (up to 512 KB
> > in one request).
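> > 
> > As a rough sketch (not the patch's code; is_sector_dirty() and the
> > constants are made up for this example), the main loop could extend
> > each copy to cover a whole run of dirty sectors:
> > 
> >     #include <stdbool.h>
> >     #include <stdint.h>
> > 
> >     #define SECTOR_SIZE      512
> >     #define MAX_COMMIT_BYTES (512 * 1024)   /* cap one copy request */
> > 
> >     /* Stand-in for querying the job's dirty bitmap. */
> >     bool is_sector_dirty(int64_t sector);
> > 
> >     /*
> >      * Count consecutive dirty sectors starting at first_dirty, capped
> >      * so a single request never exceeds MAX_COMMIT_BYTES.  The caller
> >      * then issues one read/write of n sectors instead of one request
> >      * per bitmap granule.
> >      */
> >     int64_t dirty_run_length(int64_t first_dirty, int64_t end)
> >     {
> >         int64_t max_sectors = MAX_COMMIT_BYTES / SECTOR_SIZE;
> >         int64_t n = 0;
> > 
> >         while (first_dirty + n < end && n < max_sectors &&
> >                is_sector_dirty(first_dirty + n)) {
> >             n++;
> >         }
> >         return n;
> >     }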
> > 
> > >  #define SLICE_TIME 100000000ULL /* ns */
> > >  
> > > @@ -34,11 +33,27 @@ typedef struct CommitBlockJob {
> > >      BlockDriverState *active;
> > >      BlockDriverState *top;
> > >      BlockDriverState *base;
> > > +    BlockDriverState *overlay;
> > >      BlockdevOnError on_error;
> > >      int base_flags;
> > >      int orig_overlay_flags;
> > > +    bool should_complete;
> > > +    bool ready;
> > 
> > Why introduce the ready state when the active layer is being committed?
> > 
> > There is no documentation update mentioning that the job will not
> > complete by itself when the top image is active.
> > 
> > > +    for (;;) {
> > > +        int64_t cnt = bdrv_get_dirty_count(s->top);
> > > +        if (cnt == 0) {
> > > +            if (!s->overlay && !s->ready) {
> > > +                s->ready = true;
> > > +                block_job_ready(&s->common);
> > >              }
> > > -            ret = commit_populate(top, base, sector_num, n, buf);
> > > -            bytes_written += n * BDRV_SECTOR_SIZE;
> > > +            /* We can complete if user called complete job or the job is
> > > +             * committing non-active image */
> > > +            if (s->should_complete || s->overlay) {
> > > +                break;
> > 
> > This termination condition is not safe:
> > 
> > A write request only marks the dirty bitmap upon completion.  A guest
> > write request could still be in flight so we get cnt == 0 but we
> > actually have not copied all data into the base.
> 
> Can we mark the dirty bitmap immediately upon getting a guest write
> request?

No, because then the bit might get cleared before the request completes.
The actual request might not hit the disk straight away - it could yield
on an image format coroutine mutex.
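
To make that concrete, here is a toy model of the race in plain C
(self-contained; none of these names are QEMU code):

    #include <stdio.h>
    #include <stdbool.h>

    static bool dirty;          /* one-bit dirty bitmap, one sector */
    static char on_disk = 'A';  /* data currently in the top image */
    static char copied;         /* what the commit job copied so far */

    int main(void)
    {
        /* Guest write 'B' is submitted; bit set at submission time.
         * The write then yields (e.g. on a coroutine mutex), so 'B'
         * has not reached the image yet. */
        dirty = true;

        /* The commit loop runs now: sees the bit, copies stale data,
         * clears the bit. */
        if (dirty) {
            copied = on_disk;   /* copies 'A', not 'B' */
            dirty = false;
        }

        /* The guest write finally completes; the bit stays clear. */
        on_disk = 'B';

        /* The dirty count is now 0, yet base holds stale data. */
        printf("copied=%c on_disk=%c dirty=%d\n", copied, on_disk, dirty);
        return 0;
    }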

We could do the equivalent of drain asynchronously: get a callback when
there are no requests in flight.  There is also a stricter form of this
with a
guarantee that the guest cannot make us wait forever: "freeze" the block
device so new requests will yield immediately until the device is
unfrozen.  Now a guest cannot stop us from completing by continuously
submitting requests.
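
An asynchronous drain could look roughly like this (illustrative C, not
QEMU's actual API):

    #include <stddef.h>
    #include <stdbool.h>

    typedef void drained_cb(void *opaque);

    struct toy_device {
        int in_flight;      /* requests submitted but not completed */
        bool frozen;        /* new requests must wait while true */
        drained_cb *cb;     /* fired once in_flight reaches zero */
        void *opaque;
    };

    /* Returns false while frozen: the request coroutine yields and
     * retries after unfreeze, so a guest cannot keep pushing the
     * drain point further away. */
    static bool request_begin(struct toy_device *d)
    {
        if (d->frozen) {
            return false;
        }
        d->in_flight++;
        return true;
    }

    static void request_end(struct toy_device *d)
    {
        if (--d->in_flight == 0 && d->cb) {
            drained_cb *cb = d->cb;
            d->cb = NULL;
            cb(d->opaque);  /* quiesced: safe to copy the last dirty
                             * sectors and switch images */
        }
    }

    static void freeze_and_notify(struct toy_device *d,
                                  drained_cb *cb, void *opaque)
    {
        d->frozen = true;
        if (d->in_flight == 0) {
            cb(opaque);     /* already drained */
        } else {
            d->cb = cb;
            d->opaque = opaque;
        }
    }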

Note that freeze has the disadvantage that the guest might time out if
we don't unfreeze the device soon.

Stefan


