qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH] mirror: Improve zero-write and discard with fra


From: Fam Zheng
Subject: Re: [Qemu-block] [PATCH] mirror: Improve zero-write and discard with fragmented image
Date: Tue, 10 Nov 2015 14:14:58 +0800
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, 11/09 17:29, Kevin Wolf wrote:
> Am 09.11.2015 um 17:18 hat Paolo Bonzini geschrieben:
> > 
> > 
> > On 09/11/2015 17:04, Kevin Wolf wrote:
> > > Am 06.11.2015 um 11:22 hat Fam Zheng geschrieben:
> > >> The "pnum < nb_sectors" condition in deciding whether to actually copy
> > >> data is unnecessarily strict, and the qiov initialization is
> > >> unnecessarily too, for both bdrv_aio_write_zeroes and bdrv_aio_discard
> > >> branches.
> > >>
> > >> Reorganize mirror_iteration flow so that we:
> > >>
> > >>     1) Find the contiguous zero/discarded sectors with
> > >>     bdrv_get_block_status_above() before deciding what to do. We query
> > >>     s->buf_size sized blocks at a time.
> > >>
> > >>     2) If the sectors in question are zeroed/discarded and aligned to
> > >>     target cluster, issue zero write or discard accordingly. It's done
> > >>     in mirror_do_zero_or_discard, where we don't add buffer to qiov.
> > >>
> > >>     3) Otherwise, do the same loop as before in mirror_do_read.
> > >>
> > >> Signed-off-by: Fam Zheng <address@hidden>
> > > 
> > > I'm not sure where in the patch to comment on this, so I'll just do it
> > > here right in the beginning.
> > > 
> > > I'm concerned that we need to be more careful about races in this patch,
> > > in particular regarding the bitmaps. I think the conditions for the two
> > > bitmaps are:
> > > 
> > > * Dirty bitmap: We must clear the bit after finding the next piece of
> > >   data to be mirrored, but before we yield after getting information
> > >   that we use for the decision which kind of operation we need.
> > > 
> > >   In other words, we need to clear the dirty bitmap bit before calling
> > >   bdrv_get_block_status_above(), because that's both the function that
> > >   retrieves information about the next chunk and also a function that
> > >   can yield.
> > > 
> > >   If after this point the data is written to, we need to mirror it
> > >   again.
> > 
> > With Fam's patch, that's not trivial for two reasons:
> > 
> > 1) bdrv_get_block_status_above() can return a smaller amount than what
> > is asked.
> > 
> > 2) the "read and write" case can handle s->granularity sectors per
> > iteration (many of them can be coalesced, but still that's how the
> > iteration works).
> > 
> > The simplest solution is to perform the query with s->granularity size
> > rather than s->buf_size.
> 
> Then we end up with many small operations, that's not what we want.
> 
> Why can't we mark up to s->buf_size dirty clusters as clean first, then
> query the status, and mark all of those that we can't handle dirty
> again?

Then we may end up marking more clusters as dirty than it should be.

Because all bdrv_set_dirty() and bdrv_set_dirty_bitmap() callers are coroutine,
we can introduce a CoMutex to let bitmap reader block bdrv_set_dirty and
bdrv_set_dirty_bitmap.

Fam

> 
> Kevin
> 
> > > * In-flight bitmap: We need to make sure that we never mirror the same
> > >   data twice at the same time as older data could overwrite newer data
> > >   then.
> > > 
> > >   Strictly speaking, it looks to me as if this meant we can delay
> > >   setting the bit until before we issue an AIO operation. It might be
> > >   more obviously correct to set it at the same time as the dirty bitmap
> > >   is cleared.
> > 
> > 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]