[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [PATCH v4 19/21] block: Make bdrv_is_allocated() byte-b
From: |
Eric Blake |
Subject: |
Re: [Qemu-block] [PATCH v4 19/21] block: Make bdrv_is_allocated() byte-based |
Date: |
Thu, 6 Jul 2017 11:24:27 -0500 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 |
On 07/06/2017 11:02 AM, Kevin Wolf wrote:
> Am 05.07.2017 um 23:08 hat Eric Blake geschrieben:
>> We are gradually moving away from sector-based interfaces, towards
>> byte-based. In the common case, allocation is unlikely to ever use
>> values that are not naturally sector-aligned, but it is possible
>> that byte-based values will let us be more precise about allocation
>> at the end of an unaligned file that can do byte-based access.
>>
>> Changing the signature of the function to use int64_t *pnum ensures
>> that the compiler enforces that all callers are updated. For now,
>> the io.c layer still assert()s that all callers are sector-aligned
>> on input and that *pnum is sector-aligned on return to the caller,
>> but that can be relaxed when a later patch implements byte-based
>> block status. Therefore, this code adds usages like
>> DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned
>> values, where the call might reasonbly give non-aligned results
>> in the future; on the other hand, no rounding is needed for callers
>> that should just continue to work with byte alignment.
>>
>> For the most part this patch is just the addition of scaling at the
>> callers followed by inverse scaling at bdrv_is_allocated(). But
>> some code, particularly bdrv_commit(), gets a lot simpler because it
>> no longer has to mess with sectors; also, it is now possible to pass
>> NULL if the caller does not care how much of the image is allocated
>> beyond the initial offset.
>>
>> For ease of review, bdrv_is_allocated_above() will be tackled
>> separately.
>>
>> @@ -1036,14 +1036,15 @@ static int coroutine_fn
>> bdrv_aligned_preadv(BdrvChild *child,
>>
>> - if (!ret || pnum != nb_sectors) {
>> + if (!ret || pnum != nb_sectors << BDRV_SECTOR_BITS) {
>> ret = bdrv_co_do_copy_on_readv(child, offset, bytes, qiov);
>> goto out;
>> }
>
> This code would become a lot simpler if you just removed the local
> variables start_sector/end_sector and used the byte offsets instead that
> are available in this function. (And even simpler once sector alignment
> isn't required any more for bdrv_is_allocated(). Should we leave a
> comment here so we won't forget the conversion?)
Hmm - that cleanup is not (yet) done in my bdrv_get_block_status series
(where I finally drop the need for sector alignment in
bdrv_is_allocated). Yes, I can add a temporary comment here (that aids
with the understanding of intermediate states), as well as a new
followup patch that actually does the cleanup.
>> -int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
>> - int nb_sectors, int *pnum)
>> +int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
>> + int64_t bytes, int64_t *pnum)
>> {
>> BlockDriverState *file;
>> - int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum,
>> - &file);
>> + int64_t sector_num = offset >> BDRV_SECTOR_BITS;
>> + int nb_sectors = bytes >> BDRV_SECTOR_BITS;
>> + int64_t ret;
>> + int psectors;
>> +
>> + assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
>> + assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) &&
>> + bytes < INT_MAX * BDRV_SECTOR_SIZE);
>
> I think we can actually assert bytes <= INT_MAX. Both ways to write the
> assertion are only true because of BDRV_REQUEST_MAX_SECTORS, which
> ensures that the byte counts fit in an int.
That's a tighter assertion, but if it passes, I'm fine using it.
>
> Some of the places that call bdrv_is_allocated() actually depend on this
> because your patches tend to use nb_sectors << BDRV_SECTOR_BITS (which
> truncates) rather than nb_sectors * BDRV_SECTOR_BYTES (which is always a
> 64 bit calculation).
Actually, for this series of cleanups, I tried really hard to use *
_BYTES everywhere that I widen sectors to bytes, and limit myself to >>
_BITS for going from bytes back to sectors (after also checking if I
needed DIV_ROUND_UP instead of straight division). If you spot places
where I'm still doing a 32-bit << _BITS, please call it out (as I do
remember that I caused bugs that way back when I converted pdiscard to
byte-based, back near commit d2cb36af).
>> +++ b/block/stream.c
>> @@ -137,6 +137,7 @@ static void coroutine_fn stream_run(void *opaque)
>>
>> for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
>> bool copy;
>> + int64_t count = 0;
>>
>> /* Note that even when no rate limit is applied we need to yield
>> * with no pending I/O here so that bdrv_drain_all() returns.
>> @@ -148,8 +149,8 @@ static void coroutine_fn stream_run(void *opaque)
>>
>> copy = false;
>>
>> - ret = bdrv_is_allocated(bs, offset / BDRV_SECTOR_SIZE,
>> - STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
>> + ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &count);
>> + n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>> if (ret == 1) {
>> /* Allocated in the top, no need to copy. */
>> } else if (ret >= 0) {
>
> I'm not sure if I agree with rounding up here. If ret == 0, it could
> mean that we skip some data that is actually allocated. So I'd like an
> assertion better.
>
> The last patch of the series gets rid of this by switching the whole
> function to bytes, so not that important.
A temporary assertion is fine (and you're right that the assertion goes
away once the function uses bytes everywhere).
Sounds like we're racking up enough cleanups that I'll spin a v5, but
also sounds like we're real close to getting this in.
>> @@ -274,9 +275,9 @@ static int mig_save_device_bulk(QEMUFile *f,
>> BlkMigDevState *bmds)
>> /* Skip unallocated sectors; intentionally treats failure as
>> * an allocated sector */
>> while (cur_sector < total_sectors &&
>> - !bdrv_is_allocated(blk_bs(bb), cur_sector,
>> - MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
>> - cur_sector += nr_sectors;
>> + !bdrv_is_allocated(blk_bs(bb), cur_sector * BDRV_SECTOR_SIZE,
>> + MAX_IS_ALLOCATED_SEARCH, &count)) {
>> + cur_sector += DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>> }
>> aio_context_release(blk_get_aio_context(bb));
>> qemu_mutex_unlock_iothread();
>
> This one stays and it has the same problem. I think you need to round
> down if you don't want to accidentally skip migrating allocated data.
I haven't (yet) tackled trying to rewrite this one to do byte-base
iteration, but that may help. I'm also worried (in general) that if
something truncates down a sub-sector result in code that is still
sector-based, then we could get stuck in an inf-loop of querying the
same sector offset over and over again instead of making progress.
But for this particular code, I see what you are saying - we are trying
to optimize the migration by skipping unallocated sectors, but if there
is nothing to skip, we still make migration progress as long as I
remember to break from the loop if I ever get a sub-sector answer (thus
migrating the whole sector, including the [unlikely] initial unallocated
portion). Okay, I'll fix it for v5.
>
>> diff --git a/qemu-img.c b/qemu-img.c
>> index d5a1560..5271b41 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -3229,6 +3229,7 @@ static int img_rebase(int argc, char **argv)
>> int64_t new_backing_num_sectors = 0;
>> uint64_t sector;
>> int n;
>> + int64_t count;
>> float local_progress = 0;
>>
>> buf_old = blk_blockalign(blk, IO_BUF_SIZE);
>> @@ -3276,12 +3277,14 @@ static int img_rebase(int argc, char **argv)
>> }
>>
>> /* If the cluster is allocated, we don't need to take action */
>> - ret = bdrv_is_allocated(bs, sector, n, &n);
>> + ret = bdrv_is_allocated(bs, sector << BDRV_SECTOR_BITS,
>> + n << BDRV_SECTOR_BITS, &count);
>> if (ret < 0) {
>> error_report("error while reading image metadata: %s",
>> strerror(-ret));
>> goto out;
>> }
>> + n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
>> if (ret) {
>> continue;
>> }
>
> Same thing. Sectors that are half allocated and half free must be
> considered completely allocated if the caller can't do byte alignment.
> Just rounding up without looking at ret isn't correct.
>
>> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
>> index b0ea327..e529b8f 100644
>> --- a/qemu-io-cmds.c
>> +++ b/qemu-io-cmds.c
>> @@ -1894,15 +1889,14 @@ static int map_f(BlockBackend *blk, int argc, char
>> **argv)
>> }
>>
>> retstr = ret ? " allocated" : "not allocated";
>> - cvtstr(num << BDRV_SECTOR_BITS, s1, sizeof(s1));
>> - cvtstr(offset << BDRV_SECTOR_BITS, s2, sizeof(s2));
>> + cvtstr(num, s1, sizeof(s1));
>> + cvtstr(offset, s2, sizeof(s2));
>> printf("%s (0x%" PRIx64 ") bytes %s at offset %s (0x%" PRIx64 ")\n",
>> - s1, num << BDRV_SECTOR_BITS, retstr,
>> - s2, offset << BDRV_SECTOR_BITS);
>> + s1, num, retstr, s2, offset);
>>
>> offset += num;
>> - nb_sectors -= num;
>> - } while (offset < total_sectors);
>> + bytes -= num;
>> + };
>
> This semicolon isn't needed.
Indeed. I converted do/while to plain while, but not quite correctly ;)
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
signature.asc
Description: OpenPGP digital signature
- [Qemu-block] [PATCH v4 16/21] backup: Switch block_backup.h to byte-based, (continued)
- [Qemu-block] [PATCH v4 16/21] backup: Switch block_backup.h to byte-based, Eric Blake, 2017/07/05
- [Qemu-block] [PATCH v4 15/21] backup: Switch BackupBlockJob to byte-based, Eric Blake, 2017/07/05
- [Qemu-block] [PATCH v4 17/21] backup: Switch backup_do_cow() to byte-based, Eric Blake, 2017/07/05
- [Qemu-block] [PATCH v4 18/21] backup: Switch backup_run() to byte-based, Eric Blake, 2017/07/05
- [Qemu-block] [PATCH v4 19/21] block: Make bdrv_is_allocated() byte-based, Eric Blake, 2017/07/05
- [Qemu-block] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors, Eric Blake, 2017/07/05
[Qemu-block] [PATCH v4 21/21] block: Make bdrv_is_allocated_above() byte-based, Eric Blake, 2017/07/05