From: Klaus Jensen
Subject: Re: [RFC PATCH 00/11] hw/nvme: reimplement all multi-aio commands with custom aiocbs
Date: Mon, 7 Jun 2021 12:00:16 +0200

On Jun  7 10:11, Vladimir Sementsov-Ogievskiy wrote:
07.06.2021 09:17, Klaus Jensen wrote:
On Jun  7 08:14, Vladimir Sementsov-Ogievskiy wrote:
04.06.2021 09:52, Klaus Jensen wrote:

I've kept the RFC since I'm still new to using the block layer like
this. I was hoping that Stefan could find some time to look over this -
this is a huge series, so I don't expect non-nvme folks to spend a large
amount of time on it, but I would really like feedback on my approach in
the reimplementation of flush and format.

If I understand your code correctly, you start the next io operation from the callback of the previous one. It works, and it is similar to how the mirror block job operated some time ago (it did maintain several in-flight requests simultaneously, but I mean its use of the _aio_ functions). Still, mirror doesn't use the _aio_ functions like this anymore.
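
For illustration, a minimal sketch of that callback-chaining pattern as I understand it (hypothetical names, not code from your series): each completion callback issues the next blk_aio_flush() until the list is exhausted.

/* hypothetical illustration of chaining AIOs from completion callbacks */
typedef struct FlushChain {
    BlockBackend **blk;     /* BlockBackends to flush, one per namespace */
    int count;
    int next;               /* index of the next BlockBackend to flush */
} FlushChain;

static void flush_chain_cb(void *opaque, int ret)
{
    FlushChain *chain = opaque;

    if (ret < 0 || chain->next == chain->count) {
        /* error or done: complete the guest-visible request here */
        g_free(chain);
        return;
    }

    /* issue the next flush from the completion of the previous one */
    blk_aio_flush(chain->blk[chain->next++], flush_chain_cb, chain);
}

/* kicked off from the command handler with flush_chain_cb(chain, 0) */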

A better approach to calling several block-layer io functions one by one is to create a coroutine. You can just add a coroutine function that does all your linear logic as you want, without any callbacks, like:

static void coroutine_fn nvme_co_flush(void *opaque)
{
    for (...) {
        blk_co_flush(...);
    }
}

(and you'll need qemu_coroutine_create() and qemu_coroutine_enter() to start a coroutine).
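
For reference, kicking it off from the command handler would look roughly like this (a sketch, assuming nvme_co_flush() takes the NvmeRequest as its opaque argument):

/* create the coroutine and run it until its first yield */
Coroutine *co = qemu_coroutine_create(nvme_co_flush, req);
qemu_coroutine_enter(co);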


So, this is definitely a tempting way to implement this. I must admit that I did not consider it like this because I thought this was at the wrong level of abstraction (looked to me like this was something that belonged in block/, not hw/). Again, I referred to the Trim implementation in hw/ide as the source of inspiration on the sequential AIOCB approach.

No, I think it's OK from an abstraction point of view. Everybody is welcome to use coroutines where appropriate, especially for doing sequential IOs :) Actually, it's also more efficient: the way I propose, you create one coroutine that does all your logic as you want, whereas the blk_aio_ functions actually create a coroutine under the hood each time (I don't think that noticeably affects performance, but the logic becomes more straightforward).

The only problem is that this way we don't have a cancellation API, so you can't use it for cancellation anyway :(


Yeah, I'm not really feeling up for adding that :P


Still, I'm not sure that moving from issuing several IO commands simultaneously to issuing them sequentially is a good idea.. And this way you of course can't use blk_aio_cancel.. This leads to my last doubt:

One more thing I don't understand after a quick look at the series: how does cancellation work? It seems to me that you just call cancel on the nested AIOCBs produced by blk_<io_functions>, but none of them implement cancel.. I see only four implementations of the .cancel_async callback in the whole QEMU code: in iscsi, in ide/core.c, in dma-helpers.c and in thread-pool.c.. None of them seem related to blk_aio_flush() and the other blk_* functions you call in the series. Or what am I missing?


Right now, cancellation is only initiated by the device when a submission queue is deleted. This causes blk_aio_cancel() to be called on each BlockAIOCB (NvmeRequest.aiocb) for outstanding requests. In most cases this BlockAIOCB is a DMAAIOCB from softmmu/dma-helpers.c, which implements .cancel_async. Prior to this patchset, Flush, DSM, Copy and so on, did not have any BlockAIOCB to cancel since we did not keep references to the ongoing AIOs.
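
Roughly, the queue deletion path boils down to something like this (paraphrased, not verbatim):

/* paraphrased sketch of the submission queue deletion path */
while (!QTAILQ_EMPTY(&sq->out_req_list)) {
    NvmeRequest *r = QTAILQ_FIRST(&sq->out_req_list);

    assert(r->aiocb);

    /* synchronous cancel; the completion callback removes the request
     * from out_req_list, so the loop terminates */
    blk_aio_cancel(r->aiocb);
}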

Hmm. Looking at flush for example, I don't see where a DMAAIOCB comes from.

You do

 iocb->aiocb = blk_aio_flush(ns->blkconf.blk, nvme_flush_ns_cb, iocb);

which calls blk_aio_prwv(), which calls blk_aio_get() with blk_aio_em_aiocb_info, which doesn't implement .cancel_async..


I meant that most I/O in the regular path (read/write) uses the dma helpers (since it does DMA). We might use blk_aio_p{read,write} directly when we read from/write to memory on the device (the controller memory buffer), but that is not the common case.

You are correct that BlkAioEmAIOCB does not implement cancel, but the "wrapper" (NvmeFlushAIOCB) *does*. This means that, from the NVMe controller perspective, we can cancel the flush in between (un-cancellable blk_aio_flush-initiated) flushes to multiple namespaces.
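
The wrapper is essentially an AIOCB of its own with a .cancel_async implementation, along these lines (simplified sketch; the field names are approximate, not lifted verbatim from the patches):

typedef struct NvmeFlushAIOCB {
    BlockAIOCB common;
    BlockAIOCB *aiocb;      /* the currently outstanding blk_aio_flush() */
    int ret;
    /* ... iteration state over the namespaces ... */
} NvmeFlushAIOCB;

static void nvme_flush_cancel(BlockAIOCB *acb)
{
    NvmeFlushAIOCB *iocb = container_of(acb, NvmeFlushAIOCB, common);

    iocb->ret = -ECANCELED;

    if (iocb->aiocb) {
        blk_aio_cancel_async(iocb->aiocb);
    }
}

static const AIOCBInfo nvme_flush_aiocb_info = {
    .aiocb_size   = sizeof(NvmeFlushAIOCB),
    .cancel_async = nvme_flush_cancel,
};

When a nested flush completes, its callback sees that iocb->ret has been set and stops issuing flushes to the remaining namespaces.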


The blk_aio_cancel() call is synchronous, but it does call bdrv_aio_cancel_async(), which invokes the .cancel_async callback if implemented. This means that we can now cancel ongoing DSM or Copy commands while they are processing their individual LBA ranges. So although blk_aio_cancel() subsequently waits for the AIO to complete, the cancellation may cause those commands to complete earlier than if they had run to full completion (i.e. earlier than if they did not implement .cancel_async).
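
For reference, my reading of the relevant block-layer calls (paraphrased, not verbatim):

/*
 * blk_aio_cancel(acb)       -> bdrv_aio_cancel(acb): calls
 *                              bdrv_aio_cancel_async(acb) and then polls
 *                              the AioContext until the AIOCB has completed
 *
 * blk_aio_cancel_async(acb) -> bdrv_aio_cancel_async(acb): invokes
 *                              acb->aiocb_info->cancel_async(acb) if
 *                              implemented and returns immediately
 */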

There are two things I'd like to do subsequent to this patch series:

  1. Fix the Abort command to actually do something. Currently the command is a no-op (which is allowed by the spec), but I'd like it to actually cancel the command that the host specified.

  2. Make submission queue deletion asynchronous.

The infrastructure provided by this refactor should allow this if I am not mistaken.

Overall, I think this "sequentialization" makes it easier to reason about cancellation, but that might just be me ;)


I just don't like sequential logic simulated on top of an aio-callback async API, which in turn is simulated on top of a coroutine-driven sequential API (which is built on top of the real async block API, thread-based or linux-aio based, etc) :)

Ha! Yes, we are not exactly improving on that layering here ;)

Still, I can't suggest an alternative right now that supports cancellation. But I still think that cancellation doesn't work for blk_aio_flush and friends either..


The aiocb from blk_aio_flush is considered "un-cancellable" I guess (by design from the block layer), but the NVMe command "Flush" is cancellable from the perspective of the NVMe controller. Or at least, that's what I am intending to do here.


