qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm


From: Yehuda Sadeh Weinraub
Subject: Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
Date: Sun, 23 May 2010 19:17:13 -0700

On Sun, May 23, 2010 at 12:59 AM, Blue Swirl <address@hidden> wrote:
> On Thu, May 20, 2010 at 11:02 PM, Yehuda Sadeh Weinraub
> <address@hidden> wrote:
>> On Thu, May 20, 2010 at 1:31 PM, Blue Swirl <address@hidden> wrote:
>>> On Wed, May 19, 2010 at 7:22 PM, Christian Brunner <address@hidden> wrote:
>>>> The attached patch is a block driver for the distributed file system
>>>> Ceph (http://ceph.newdream.net/). This driver uses librados (which
>>>> is part of the Ceph server) for direct access to the Ceph object
>>>> store and is running entirely in userspace. Therefore it is
>>>> called "rbd" - rados block device.
>> ...
>>>
>>> IIRC underscores here may conflict with system header use. Please use
>>> something like QEMU_BLOCK_RADOS_H.
>>
>> This header is shared between the linux kernel client and the ceph
>> userspace servers and client. We can actually get rid of it, as we
>> only need it to define CEPH_OSD_TMAP_SET. We can move this definition
>> to librados.h.
>>
>>>> diff --git a/block/rbd_types.h b/block/rbd_types.h
>>>> new file mode 100644
>>>> index 0000000..dfd5aa0
>>>> --- /dev/null
>>>> +++ b/block/rbd_types.h
>>>> @@ -0,0 +1,48 @@
>>>> +#ifndef _FS_CEPH_RBD
>>>> +#define _FS_CEPH_RBD
>>>
>>> QEMU_BLOCK_RBD?
>>
>> This header is shared between the ceph kernel client, between the qemu
>> rbd module (and between other ceph utilities). It'd be much easier
>> maintaining it without having to have a different implementation for
>> each. The same goes to the use of __le32/64 and __u32/64 within these
>> headers.
>
> This is user space, so identifiers must conform to C standards. The
> identifiers beginning with underscores are reserved.
>
> Doesn't __le32/64 also depend on some GCC extension? Or sparse magic?
It depends on gcc extension. If needed we can probably have a separate
header for the qemu block device that uses alternative types. Though
looking at the qemu code I see use of other gcc extensions so I'm not
sure this is a real issue.

>
>>
>>>
>>>> +
>>>> +#include <linux/types.h>
>>>
>>> Can you use standard includes, like <sys/types.h> or <inttypes.h>? Are
>>> Ceph libraries used in other systems than Linux?
>>
>> Not at the moment. I guess that we can take this include out.
>>
>>>
>>>> +
>>>> +/*
>>>> + * rbd image 'foo' consists of objects
>>>> + *   foo.rbd      - image metadata
>>>> + *   foo.00000000
>>>> + *   foo.00000001
>>>> + *   ...          - data
>>>> + */
>>>> +
>>>> +#define RBD_SUFFIX             ".rbd"
>>>> +#define RBD_DIRECTORY           "rbd_directory"
>>>> +
>>>> +#define RBD_DEFAULT_OBJ_ORDER  22   /* 4MB */
>>>> +
>>>> +#define RBD_MAX_OBJ_NAME_SIZE  96
>>>> +#define RBD_MAX_SEG_NAME_SIZE  128
>>>> +
>>>> +#define RBD_COMP_NONE          0
>>>> +#define RBD_CRYPT_NONE         0
>>>> +
>>>> +static const char rbd_text[] = "<<< Rados Block Device Image >>>\n";
>>>> +static const char rbd_signature[] = "RBD";
>>>> +static const char rbd_version[] = "001.001";
>>>> +
>>>> +struct rbd_obj_snap_ondisk {
>>>> +       __le64 id;
>>>> +       __le64 image_size;
>>>> +} __attribute__((packed));
>>>> +
>>>> +struct rbd_obj_header_ondisk {
>>>> +       char text[64];
>>>> +       char signature[4];
>>>> +       char version[8];
>>>> +       __le64 image_size;
>>>
>>> Unaligned? Is the disk format fixed?
>>
>> This is a packed structure that represents the on disk format.
>> Operations on it are being done only to read from the disk header or
>> to write to the disk header.
>
> That's clear. But what exactly is the alignment of field 'image_size'?
> Could there be implicit padding to mod 8 between 'version' and
> 'image_size' with some compilers?

Obviously it's not 64 bit aligned. As it's an on-disk header, I don't
see alignment a real issue. As was said before, any operation on these
fields have to go through endianity conversion anyway, and this
structure should not be used directly. For such datastructures I'd
rather have the fields ordered in some logical order than maintaining
the alignment by ourselves. That's why we have that __attribute__
packed in the end to let the compiler deal with those issues. Other
compilers though have their own syntax for packed structures (but I do
see other uses of this packed syntax in the qemu code).

>
> If there were no other constraints, I'd either make the padding
> explicit, or rearrange/resize fields so that the field alignment is
> natural. Thus my question, can you change the disk format or are there
> already some deployments?

We can certainly make changes to the disk format at this point. I'm
not very happy with those 3 __u8 in the middle, and they can probably
be changed to a 32 bit flags field. We can get it 64 bit aligned too.

>
> Otherwise, I'd just add some warning comment so people don't try to
> use clever pointer tricks which will crash on machines with enforced
> alignment.
>
Any clever pointer tricks that'll work on one architecture will
probably be wrong on another (different word
size/alignment/endianity), so maybe crashing machines is a good
indicator to bad implementation. We shouldn't try to hide the
problems.

Thanks,
Yehuda



reply via email to

[Prev in Thread] Current Thread [Next in Thread]