[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] QCow2 compression
From: |
mgreger |
Subject: |
Re: [Qemu-devel] QCow2 compression |
Date: |
Sat, 5 Mar 2016 0:11:44 +0000 |
---- Eric Blake <address@hidden> wrote:
> [any way you could convince your mailer to not break threading?]
>
> On 03/03/2016 09:24 PM, address@hidden wrote:
> >>
> >> The zeros are not part of the compressed data, though, that's why the
> >> Compressed Cluster Descriptor indicates a shorter size. Had another
> >> compressed cluster been written to the same image, it might have ended
> >> up where you are seeing the zero padding now. (The trick with
> >> compression is that multiple guest clusters can end up in a single host
> >> cluster.)
> >>
> >
> > Thanks, but the given length of 0x5600 is still short by 160(decimal) bytes
> > compared to the
> > non-zero data (which occupies an additional 133 bytes beyond the expected
> > end at
> > 0x3DED50) and zero
> > padding (an additional 27 bytes beyond that). Could there be an off-by-one
> > error
> > somewhere?
> > The file doesn't even end on a sector boundary let alone a cluster
> > boundary.
>
> Based on an IRC conversation I had when you first asked the question, I
> think the spec is indeed weak, and that we DO have some fishy code.
>
> Look at what the code does:
>
> uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
> uint64_t offset,
> int compressed_size)
> ...
> nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
> (cluster_offset >> 9);
>
> cluster_offset |= QCOW_OFLAG_COMPRESSED |
> ((uint64_t)nb_csectors << s->csize_shift);
>
> That sure does sound like an off-by-one. cluster_offset does indeed
> look like a byte offset (from qcow2_alloc_bytes); so let's consider what
> happens if we've already allocated one compressed cluster in the past,
> going from 65536 to 66999. So on this call, cluster_offset would be
> 67000, and if compressed size is 1025 (just round numbers to make
> discussion easy; no idea if gzip would really do this on any particular
> data), we are computing ((67000+1025-1)>>9) - (67000>>9) == 2, but 1025
> bytes occupies 3 sectors, not 2.
>
> But we offset it by another off-by-one:
>
> int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
> {
> ...
> nb_csectors = ((cluster_offset >> s->csize_shift) &
> s->csize_mask) + 1;
>
> Yuck. That is horribly ugly.
>
> We need to fix the documentation to make it obvious that the sector
> count is a _LOWER BOUND_ on the number of sectors occupied, and that you
> need to read at least one more cluster's worth of data before decompressing.
>
> It would also be nice to fix qcow2 code to not have quite so many
> off-by-one computations, but be more precise about what data is going where.
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
Thanks for verifying the behavior for me. This will
allow me to add compression support to the code I've
already written.
I've implemented QCow2 support for a fork of the popular
DOSBox emulator.
The project (not mine), is called DOSBox-X, and is geared
towards emulating Windows9x era hardware with a specific
focus on games: http://dosbox-x.com/
If you'd like to give me any feedback on the QCow2 code I've written
please feel free to e-mail me directly. The relevant files are
qcow2_disk.cpp and qcow_disk.h. If you do, please be kind as
my C++ skills are very rusty. Overall I think it is pretty readable.
It took me about 2 and a half weeks working in my spare time to go
from reading the spec to the final code integrated into DOSBox-X.
Thanks,
Michael Greger