[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 19/25] block_int-common.h: split function pointers in Bloc

From: Hanna Reitz
Subject: Re: [PATCH v4 19/25] block_int-common.h: split function pointers in BlockDriver
Date: Mon, 15 Nov 2021 13:00:07 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:
Similar to the header split, also the function pointers in BlockDriver
can be split in I/O and global state.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
  include/block/block_int-common.h | 458 ++++++++++++++++---------------
  1 file changed, 237 insertions(+), 221 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 79a3d801d2..9857e775fe 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -96,6 +96,7 @@ typedef struct BdrvTrackedRequest {
struct BlockDriver {
+    /* Fields initialized in struct definition and never changed. */

I find this a bit difficult to understand.  Perhaps we could be more verbose to make it easier to read?  Like

“These fields are initialized when this object is created, and are never changed afterwards”

And I’d also add an empty line below to make visually clear that this does not describe the single subsequent field (format_name) but the whole chunk of fields.

I also wonder if we could substantiate the claim “never changed afterwards” by making these fields consts.  Then again, I suppose technically none of the fields in BlockDriver is supposed to be mutable (except for .list), neither these fields nor the function pointers...  Yeah, maybe scratch that idea.

      const char *format_name;
      int instance_size;


+    int coroutine_fn (*bdrv_save_vmstate)(BlockDriverState *bs,
+                                          QEMUIOVector *qiov,
+                                          int64_t pos);
+    int coroutine_fn (*bdrv_load_vmstate)(BlockDriverState *bs,
+                                          QEMUIOVector *qiov,
+                                          int64_t pos);

In patch 18, you classified bdrv_co_readv_vmstate() and bdrv_co_writev_vmstate() as I/O functions.  They call these methods, though, so something doesn’t seem quite right.

Now I also remember you classified the global bdrv_save_vmstate() and bdrv_load_vmstate() functions as GS.  I don’t mind either way; AFAIU they are I/O functions that are generally called under the BQL.  But the classification should be consistent, of course.

+    int (*bdrv_change_backing_file)(BlockDriverState *bs,
+        const char *backing_file, const char *backing_fmt);
+    void (*bdrv_eject)(BlockDriverState *bs, bool eject_flag);
+    void (*bdrv_lock_medium)(BlockDriverState *bs, bool locked);

Above bdrv_is_inserted(), there’s a comment (“removable device specific”) that I think should be duplicated here.

+    /* TODO Better pass a option string/QDict/QemuOpts to add any rule? */
+    int (*bdrv_debug_breakpoint)(BlockDriverState *bs, const char *event,
+        const char *tag);
+    int (*bdrv_debug_remove_breakpoint)(BlockDriverState *bs,
+        const char *tag);
+    int (*bdrv_debug_resume)(BlockDriverState *bs, const char *tag);
+    bool (*bdrv_debug_is_suspended)(BlockDriverState *bs, const char *tag);
+    void (*bdrv_refresh_limits)(BlockDriverState *bs, Error **errp);

Originally, there was an empty line before bdrv_refresh_limits(), and I’d keep it.


+    /**
+     * Try to get @bs's logical and physical block size.
+     * On success, store them in @bsz and return zero.
+     * On failure, return negative errno.
+     */
+    int (*bdrv_probe_blocksizes)(BlockDriverState *bs, BlockSizes *bsz);

Originally, there was a comment above this function (“I/O API, even though if it's a filter jumps on parent”) that you added in patch 6 and that I didn’t understand.  Given this, I’m not unhappy to see it go again, but now I wonder about its purpose even more...

+    /**
+     * Try to get @bs's geometry (cyls, heads, sectors)
+     * On success, store them in @geo and return 0.
+     * On failure return -errno.
+     * Only drivers that want to override guest geometry implement this
+     * callback; see hd_geometry_guess().
+     */
+    int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);

(Here, too)


+    /**
+     * Register/unregister a buffer for I/O. For example, when the driver is
+     * interested to know the memory areas that will later be used in iovs, so
+     * that it can do IOMMU mapping with VFIO etc., in order to get better
+     * performance. In the case of VFIO drivers, this callback is used to do
+     * DMA mapping for hot buffers.
+     */
+    void (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size);
+    void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host);
+    QLIST_ENTRY(BlockDriver) list;

This field is interesting because it’s the only mutable field in the whole struct.  I think it should be separated from the function pointers above and given a comment like “This field is modified only under the BQL, and is part of the global state”.

+    /*
+     * I/O API functions. These functions are thread-safe.
+     *
+     * See include/block/block-io.h for more information about
+     * the I/O API.
+     */
+    int coroutine_fn (*bdrv_co_create)(BlockdevCreateOptions *opts,
+                                       Error **errp);
+    int coroutine_fn (*bdrv_co_create_opts)(BlockDriver *drv,
+                                            const char *filename,
+                                            QemuOpts *opts,
+                                            Error **errp);

Now this is really interesting.  Technically I suppose these should work in any thread, but trying to do so results in:

$ touch /tmp/iothread-create-test.qcow2
$ ./qemu-system-x86_64 -object iothread,id=iothr0 -qmp stdio <<EOF
{"execute": "qmp_capabilities"}
{"QMP": {"version": {"qemu": {"micro": 90, "minor": 1, "major": 6}, "package": "v6.2.0-rc0-40-gd02d5fe5fb-dirty"}, "capabilities": ["oob"]}}
{"return": {}}
{"return": {}}
{"return": {}}
{"timestamp": {"seconds": 1636973542, "microseconds": 338117}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "create"}} {"timestamp": {"seconds": 1636973542, "microseconds": 338197}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "create"}}
{"return": {}}
qemu: qemu_mutex_unlock_impl: Operation not permitted
[1]    86154 IOT instruction (core dumped)  ./qemu-system-x86_64 -object iothread,id=iothr0 -qmp stdio <<<''

So something’s fishy and perhaps we should investigate this...  I mean, I can’t really imagine a case where someone would need to run a blockdev-create job in an I/O thread, but right now the interface allows for it.

And then bdrv_create() is classified as global state, and also bdrv_co_create_opts_simple(), which is supposed to be a drop-in function for this .bdrv_co_create_opts function.  So that can’t work.

Also, I believe there might have been some functions you classified as GS that are called from .create* implementations.  I accepted that, given the abort I sketched above.  However, if we classify image creation as I/O, then those would need to be re-evaluated. For example, qcow2_co_create_opts() calls bdrv_create_file(), which is a GS function.

Some of this issues could be addressed by making .bdrv_co_create_opts a GS function and .bdrv_co_create an I/O function.  I believe that would be the ideal split, even though as shown above .bdrv_co_create doesn’t work in an I/O thread, and then you have the issue of probably all format drivers’ .bdrv_co_create implementations calling bdrv_open_blockdev_ref(), which is a GS function.

(VMDK even calls blk_new_open(), blk_new_with_bs(), and blk_unref(), none of which can ever be I/O functions, I think.)

I believe in practice the best is to for now classify all create-related functions as GS functions.  This is supported by the fact that qmp_blockdev_create() specifically creates the create job in the main context (with a TODO comment) and requires block drivers to error out when they encounter a node in a different AioContext.

+    int coroutine_fn (*bdrv_co_amend)(BlockDriverState *bs,
+                                      BlockdevAmendOptions *opts,
+                                      bool force,
+                                      Error **errp);
      /* aio */
      BlockAIOCB *(*bdrv_aio_preadv)(BlockDriverState *bs,
          int64_t offset, int64_t bytes, QEMUIOVector *qiov,


@@ -443,47 +632,20 @@ struct BlockDriver {

Right above here (i.e. line 631), there’s an attribute `has_variable_length`.  I believe that should go up to the immutable fields section.

      int64_t (*bdrv_get_allocated_file_size)(BlockDriverState *bs);
      BlockMeasureInfo *(*bdrv_measure)(QemuOpts *opts, BlockDriverState *in_bs,
                                        Error **errp);

I’d keep this empty line.


      int coroutine_fn (*bdrv_co_pwritev_compressed)(BlockDriverState *bs,
          int64_t offset, int64_t bytes, QEMUIOVector *qiov);
      int coroutine_fn (*bdrv_co_pwritev_compressed_part)(BlockDriverState *bs,
          int64_t offset, int64_t bytes, QEMUIOVector *qiov,
          size_t qiov_offset);

reply via email to

[Prev in Thread] Current Thread [Next in Thread]