qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Outreachy project task: Adding QEMU block layer APIs resembling Linu


From: Sam Li
Subject: Re: Outreachy project task: Adding QEMU block layer APIs resembling Linux ZBD ioctls.
Date: Wed, 1 Jun 2022 18:19:15 +0800

Hi Damien,

Damien Le Moal <damien.lemoal@opensource.wdc.com> 于2022年6月1日周三 13:47写道:
>
> On 6/1/22 11:57, Sam Li wrote:
> > Hi Stefan,
> >
> > Stefan Hajnoczi <stefanha@gmail.com> 于2022年5月30日周一 19:19写道:
> >
> >
> >>
> >> On Mon, 30 May 2022 at 06:09, Sam Li <faithilikerun@gmail.com> wrote:
> >>>
> >>> Hi everyone,
> >>> I'm Sam Li, working on the Outreachy project which is to add zoned
> >>> device support to QEMU's virtio-blk emulation.
> >>>
> >>> For the first goal, adding QEMU block layer APIs resembling Linux ZBD
> >>> ioctls, I think the naive approach would be to introduce a new stable
> >>> struct zbd_zone descriptor for the library function interface. More
> >>> specifically, what I'd like to add to the BlockDriver struct are:
> >>> 1. zbd_info as zone block device information: includes numbers of
> >>> zones, size of logical blocks, and physical blocks.
> >>> 2. zbd_zone_type and zbd_zone_state
> >>> 3. zbd_dev_model: host-managed zbd, host-aware zbd
> >>> With those basic structs, we can start to implement new functions as
> >>> bdrv*() APIs for BLOCK*ZONE ioctls.
> >>>
> >>> I'll start to finish this task based on the above description. If
> >>> there is any problem or something I may miss in the design, please let
> >>> me know.
> >>
> >> Hi Sam,
> >> Can you propose function prototypes for the new BlockDriver callbacks
> >> needed for zoned devices?
> >
> > I have made some modifications based on Damien's device in design part
> > 1 and added the function prototypes in design part 2. If there is any
> > problem or part I missed, please let me know.
> >
> > Design of Block Layer APIs in BlockDriver:
> > 1. introduce a new stable struct zbd_zone descriptor for the library
> > function interface.
> >   a. zbd_info as zone block device information: includes numbers of
> > zones, size of blocks, write granularity in byte(minimal write size
> > and alignment
> >     - write granularity: 512e SMRs: writes in units of physical block
> > size, 4096 bytes; NVMe ZNS write granularity is equal to the block
> > size.
> >     - zone descriptor: start, length, capacity, write pointer, zone type
> >   b. zbd_zone_type
> >     - zone type: conventional, sequential write required, sequential
> > write preferred
> >   c. zbd_dev_model: host-managed zbd, host-aware zbd
>
> This explanation is a little hard to understand. It seems to be mixing up
> device level information and per-zone information. I think it would be a
> lot simpler to write a struct definition to directly illustrate what you
> are planning.
>
> It is something like this ?
>
> struct zbd_zone {
>         enum zone_type  type;
>         enum zone_cond  cond;
>         uint64_t        start;
>         uint32_t        length;
>         uint32_t        cap;
>         uint64_t        wp;
> };
>
> strcut zbd_dev {
>         enum zone_model model;
>         uint32_t        block_size;
>         uint32_t        write_granularity;
>         uint32_t        nr_zones
>         struct zbd_zone *zones; /* array of zones */
> };
>

Yes! Sorry that I was not very clear in the descriptions.

> If yes, then my comments are as follows.
>
> For the device struct: It may be good to have also the maximum number of
> open zones and the maximum number of active zones.
>
> For the zone struct: You may need to add a read-write lock per zone to be
> able to write lock zones to ensure a sequential write pattern (virtio
> devices can be multi-queue and so writes may be coming in from different
> contexts) and to correctly emulate zone append operations with an atomic
> update of the wp field.
>

Yes, I haven't thought through the thread-safety problem but I'll come
up with an approach.

> These need to be integrated into the generic block driver interface in
> include/block/block_int-common.h or include/block/block-common.h.
>
> >
> >  2. implement new functions as bdrv*() APIs for BLK*ZONE ioctls
> >    a. support basic operations: get the APIs working when executing
> > the zone operations from a guest
> >     - zone information access: report
> >     - zone manipulation: reset,open,close,finish
>
> These operations are generally referred to as "zone management" operations.
>
> >   b. support zone append operation: zone capacity, write pointer
> > positions of all zones(excluded for now)
> >     - can track the zone state we need: zone is full or not.
> >
> > More specifically, the function prototypes for 2a are as follows:
> >
> > int zbd_report_zones(int fd, off_t offset, off_t len, enum
> > zbd_report_opetion ro, struct zbd_zone *zones, unsigned int
> > *nr_zones);
>
> The current virtio zone specification draft does not have a reporting
> option field for the report zones command. However, it has a "partial"
> field that will need to be passed to this function so that the correct
> report zones reply can be built by the driver.
>
> > int zbd_reset_zones(int fd, off_t offset, off_t len);
> > int zbd_open_zones(int fd, off_t offset, off_t len);
> > int zbd_close_zones(int fd, off_t offset, off_t len);
> > int zbd_finish_zones(int fd, off_t offset, off_t len);
>
> These 4 functions have the exact same signature, modulo the function name.
> So we should simplify here to reduce the number of functions. Something like:
>
> int zbd_zone_mgmt(int fd, enum zone_op op, off_t offset, off_t len);
>
> where op can be BDRV_ZONE_RESET, BDRV_ZONE_OPEN, etc can aggregate all 4
> functions into one.
>

Thanks for the suggestions. It would be better to use one general
function supporting four operations rather than four.

> In any case, if you look at how block drivers are defined (an example is
> the one for raw files in qemu/block/file-posix.c) using the BlockDriver
> data type (defined as a struct in qemu/include/block/block_int-common.h),
> you will see that the driver callback functions do not use a file
> descriptor (fd) but a pointer to some data structure (most of the time a
> BlockDriverState pointer).
>

Yes, I'll use it instead.

Meanwhile, the BlockDriver callbacks can be(with Damien's suggestion):
-> virtio_blk_handle_zone_mngmt() -> blk_aio_zone_mngmt() ->
blk_aio_prwv() + blk_aio_zone_mngmt_entry() -> bdrv_co_do_zone_mngmt() ->
bdrv_co_zone_mngmt().

I am still on the way to understand the BlockDriver structures. So the
above may need a second thought.


> Another thing: you will need one more callback to get the device
> information related to zones: maximum number of open and active zones at
> least (the number of zones can be discovered with a report zones call).
>

Is this callback zbd_get_sysfs_attrr(char *devname, const char *attr,
char *str, int str_len) in libzbd?

Thanks for reviewing! Have a good night.

Sam

> Cheers.
>
> --
> Damien Le Moal
> Western Digital Research



reply via email to

[Prev in Thread] Current Thread [Next in Thread]