qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 4/7] block: initial VHDX driver support framewor


From: Jeff Cody
Subject: Re: [Qemu-devel] [PATCH 4/7] block: initial VHDX driver support framework - supports open and probe
Date: Thu, 7 Mar 2013 10:23:54 -0500
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Mar 07, 2013 at 03:30:44PM +0100, Stefan Hajnoczi wrote:
> On Wed, Mar 06, 2013 at 09:48:11AM -0500, Jeff Cody wrote:
> > +#define leguid_to_cpus(guid) do { \
> > +    le32_to_cpus(&(guid)->data1); \
> > +    le16_to_cpus(&(guid)->data2); \
> > +    le16_to_cpus(&(guid)->data3); } while (0)
> 
> This should be a function.  Please avoid macros.
> 

OK

> > +static const ms_guid logical_sector_guid = {.data1 = 0x8141bf1d,
> > +                                            .data2 = 0xa96f,
> > +                                            .data3 = 0x4709,
> > +                                           .data4 = { 0xba, 0x47, 0xf2, 
> > 0x33,
> 
> Indentation
> 

OK.  I was bumping up against the 80 char limit, on this and the other
one you flagged.  I'll figure something out.

> > +                                                      0xa8, 0xfa, 0xab, 
> > 0x5f} };
> > +
> > +static const ms_guid phys_sector_guid =  { .data1 = 0xcda348c7,
> > +                                           .data2 = 0x445d,
> > +                                           .data3 = 0x4471,
> > +                                           .data4 = { 0x9c, 0xc9, 0xe9, 
> > 0x88,
> > +                                                      0x52, 0x51, 0xc5, 
> > 0x56} };
> > +
> > +static const ms_guid parent_locator_guid = {.data1 = 0xa8d35f2d,
> > +                                            .data2 = 0xb30b,
> > +                                            .data3 = 0x454d,
> > +                                           .data4 = { 0xab, 0xf7, 0xd3, 
> > 0xd8,
> 
> Indentation
> 
> > +typedef struct BDRVVHDXState {
> > +    CoMutex lock;
> > +
> > +    int curr_header;
> > +    vhdx_header *headers[2];
> > +
> > +    vhdx_region_table_header rt;
> > +    vhdx_region_table_entry bat_rt;         /* region table for the BAT */
> > +    vhdx_region_table_entry metadata_rt;    /* region table for the 
> > metadata */
> > +    vhdx_region_table_entry *unknown_rt;
> > +    unsigned int unknown_rt_size;
> 
> This field isn't used yet but "unsigned int" for something that has
> "size" in its name is suspicious.  Should it be size_t (memory) or
> uint64_t (disk)?
> 

Yes, good catch - although I'll just drop them, since they are not
used.

> > +static int vhdx_probe(const uint8_t *buf, int buf_size, const char 
> > *filename)
> > +{
> > +    if (buf_size >= 8 && !strncmp((char *)buf, "vhdxfile", 8)) {
> 
> memcmp() saves you the trouble of casting buf (which should stay const,
> BTW).
> 

Good point; thanks.

> > +static int vhdx_parse_metadata(BlockDriverState *bs, BDRVVHDXState *s)
> > +{
> > +    int ret = 0;
> > +    uint8_t *buffer;
> > +    int offset = 0;
> > +    int i = 0;
> > +    uint32_t block_size, sectors_per_block, logical_sector_size;
> > +    uint64_t chunk_ratio;
> > +    vhdx_metadata_table_entry md_entry;
> > +
> > +    buffer = g_malloc(VHDX_METADATA_TABLE_MAX_SIZE);
> 
> Please use qemu_blockalign() to avoid bounce buffers in case the memory
> is not 512-byte aligned.  Use qemu_vfree() instead of g_free().
> 
> This applies to all I/O buffers that the block driver allocates.
> 

OK.

> > +
> > +    ret = bdrv_pread(bs->file, s->metadata_rt.file_offset, buffer,
> > +                     VHDX_METADATA_TABLE_MAX_SIZE);
> > +    if (ret < 0) {
> > +        goto fail_no_free;
> > +    }
> > +    memcpy(&s->metadata_hdr, buffer, sizeof(s->metadata_hdr));
> > +    offset += sizeof(s->metadata_hdr);
> > +
> > +    le64_to_cpus(&s->metadata_hdr.signature);
> > +    le16_to_cpus(&s->metadata_hdr.reserved);
> > +    le16_to_cpus(&s->metadata_hdr.entry_count);
> > +
> > +    if (s->metadata_hdr.signature != VHDX_METADATA_MAGIC) {
> > +        ret = -EINVAL;
> > +        goto fail_no_free;
> > +    }
> > +
> > +    s->metadata_entries.present = 0;
> > +
> > +    for (i = 0; i < s->metadata_hdr.entry_count; i++) {
> > +        memcpy(&md_entry, buffer+offset, sizeof(md_entry));
> 
> Read beyond end of buffer if metadata_hdr.entry_count is wrong.
> 
> We cannot trust the image file - it can be corrupt or malicious.  Checks
> are required for everything that can be validated.  QEMU cannot do
> anything that would cause a crash or memory corruption on invalid input.
> 

Good point. I'll add a check before the loop - if entry_count is
wrong, we can't really trust the file anyway, so should probably
return -EINVAL.

> > +    ret = bdrv_pread(bs->file,
> > +                     s->metadata_entries.file_parameters_entry.offset
> > +                                         + s->metadata_rt.file_offset,
> > +                     &s->params,
> > +                     sizeof(s->params));
> 
> Missing error code path when ret < 0.
> 

Thanks.

> > +    /* determine virtual disk size, logical sector size,
> > +     * and phys sector size */
> > +
> > +    ret = bdrv_pread(bs->file,
> > +                     s->metadata_entries.virtual_disk_size_entry.offset
> > +                                           + s->metadata_rt.file_offset,
> > +                     &s->virtual_disk_size,
> > +                     sizeof(uint64_t));
> > +    if (ret < 0) {
> > +        goto exit;
> > +    }
> > +    ret = bdrv_pread(bs->file,
> > +                     s->metadata_entries.logical_sector_size_entry.offset
> > +                                             + s->metadata_rt.file_offset,
> > +                     &s->logical_sector_size,
> > +                     sizeof(uint32_t));
> > +    if (ret < 0) {
> > +        goto exit;
> > +    }
> > +    ret = bdrv_pread(bs->file,
> > +                     s->metadata_entries.phys_sector_size_entry.offset
> > +                                          + s->metadata_rt.file_offset,
> > +                     &s->physical_sector_size,
> > +                     sizeof(uint32_t));
> > +    if (ret < 0) {
> > +        goto exit;
> > +    }
> > +
> > +    le64_to_cpus(&s->virtual_disk_size);
> > +    le32_to_cpus(&s->logical_sector_size);
> > +    le32_to_cpus(&s->physical_sector_size);
> > +
> > +    /* both block_size and sector_size are guaranteed powers of 2 */
> > +    s->sectors_per_block = s->params.block_size / s->logical_sector_size;
> 
> Divide-by-zero if file is corrupt/malicious.
> 
> > +    s->chunk_ratio = (VHDX_MAX_SECTORS_PER_BLOCK) *
> > +                     (uint64_t)s->logical_sector_size /
> > +                     (uint64_t)s->params.block_size;
> 
> Divide-by-zero if file is corrupt/malicious.
> 

Thanks, will add checks.

> > +
> > +
> > +    /* These values are ones we will want to use for division / 
> > multiplication
> > +     * later on, and they are all guaranteed (per the spec) to be powers 
> > of 2,
> > +     * so we can take advantage of that for shift operations during
> > +     * reads/writes */
> 
> Before doing that, let's verify that they are powers of two and fail if
> they violate the spec.
> 
> if (x & x - 1) {
>     ret = -EINVAL;
>     goto exit;
> }
> 

Good point, I will do that.

> > +static int vhdx_open(BlockDriverState *bs, int flags)
> > +{
> > +    BDRVVHDXState *s = bs->opaque;
> > +    int ret = 0;
> > +    int i;
> > +
> > +    s->bat = NULL;
> > +
> > +    qemu_co_mutex_init(&s->lock);
> > +
> > +    ret = vhdx_parse_header(bs, s);
> > +    if (ret) {
> > +        goto fail;
> > +    }
> > +
> > +    ret = vhdx_parse_log(bs, s);
> > +    if (ret) {
> > +        goto fail;
> > +    }
> > +
> > +    ret = vhdx_open_region_tables(bs, s);
> > +    if (ret) {
> > +        goto fail;
> > +    }
> > +
> > +    ret = vhdx_parse_metadata(bs, s);
> > +    if (ret) {
> > +        goto fail;
> > +    }
> > +    s->block_size = s->params.block_size;
> > +
> > +    /* the VHDX spec dictates that virtual_disk_size is always a multiple 
> > of
> > +     * logical_sector_size */
> > +    bs->total_sectors = s->virtual_disk_size / s->logical_sector_size;
> > +
> > +    s->bat_offset = s->bat_rt.file_offset;
> > +    s->bat_entries = s->bat_rt.length / sizeof(vhdx_bat_entry);
> > +    s->bat = g_malloc(s->bat_rt.length);
> > +
> > +    ret = bdrv_pread(bs->file, s->bat_offset, s->bat, s->bat_rt.length);
> > +
> > +    for (i = 0; i < s->bat_entries; i++) {
> > +        le64_to_cpus(&s->bat[i]);
> > +    }
> 
> How does BAT size scale when the image size is increased?  QCOW2 and QED
> use caches for metadata that would be too large or wasteful to keep in
> memory.

The BAT size is dependent on the virtual disk size, and the block
size.  The block size is allowed to range from 1MB - 256MB.  There is
one BAT entry per block.

In practice, the large block size keeps the BAT entry table reasonable
(for a 127GB file, the block size created by Hyper-V is 32MB, so the
table is pretty small - 32KB).

However, I don't see anything in the spec that forces the block size
to be larger, for a large virtual disk size.  So for the max size of
64TB, and the smallest block size of 1MB, keeping the BAT in memory
would indeed be excessive.

I'll re-read the spec, and see if there is anything that ties the
block size and virtual size together.  If not, I'll have to add
caching.

Thanks for the detailed review!

Jeff



reply via email to

[Prev in Thread] Current Thread [Next in Thread]