qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Na


From: Dmitry Fomichev
Subject: RE: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set
Date: Mon, 28 Sep 2020 02:33:06 +0000

> -----Original Message-----
> From: Klaus Jensen <its@irrelevant.dk>
> Sent: Thursday, September 24, 2020 5:08 PM
> To: Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
> Cc: Keith Busch <kbusch@kernel.org>; Klaus Jensen
> <k.jensen@samsung.com>; Kevin Wolf <kwolf@redhat.com>; Philippe
> Mathieu-Daudé <philmd@redhat.com>; Maxim Levitsky
> <mlevitsk@redhat.com>; Fam Zheng <fam@euphon.net>; Niklas Cassel
> <Niklas.Cassel@wdc.com>; Damien Le Moal <Damien.LeMoal@wdc.com>;
> qemu-block@nongnu.org; qemu-devel@nongnu.org; Alistair Francis
> <Alistair.Francis@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> Subject: Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types
> and Zoned Namespace Command Set
> 
> On Sep 24 03:20, Dmitry Fomichev wrote:
> > v3 -> v4
> >
> >  - Fix bugs introduced in v2/v3 for QD > 1 operation. Now, all writes
> >    to a zone happen at the new write pointer variable, zone->w_ptr,
> >    that is advanced right after submitting the backend i/o. The existing
> >    zone->d.wp variable is updated upon the successful write completion
> >    and it is used for zone reporting. Some code has been split from
> >    nvme_finalize_zoned_write() function to a new function,
> >    nvme_advance_zone_wp().
> >
> 
> Same approach that I've used, +1.
> 
> >  - Make the code compile under mingw. Switch to using QEMU API for
> >    mmap/msync, i.e. memory_region...(). Since mmap is not available in
> >    mingw (even though there is mman-win32 library available on Github),
> >    conditional compilation is added around these calls to avoid
> >    undefined symbols under mingw. A better fix would be to add stub
> >    functions to softmmu/memory.c for the case when CONFIG_POSIX is not
> >    defined, but such change is beyond the scope of this patchset and it
> >    can be made in a separate patch.
> >
> 
> Ewwww.
> 
> This feels like a hack or at the very least an abuse of the physical
> memory management API.
> 
> If it really needs to be memory mapped, then I think a hostmem-based
> approach similar to what Andrzej did for PMR is needed (I think that
> will get rid of the CONFIG_POSIX ifdef at least, but still leave it
> slightly tricky to get it to work on all platforms AFAIK).

Ok, it looks that using the HostMemoryBackendFile backend will be
more appropriate. This will remove the need for conditional compile.

The mmap() portability is pretty decent across software platforms.
Any poor Windows user who is forced to emulate ZNS on mingw will be
able to do so, just without having zone state persistency. Considering
how specialized this stuff is in first place, I estimate the number of users
affected by this "limitation" to be exactly zero.

> But really,
> since we do not require memory semantics for this, then I think the
> abstraction is fundamentally wrong.
> 

Seriously, what is wrong with using mmap :) ? It is used successfully for
similar applications, for example -
https://github.com/open-iscsi/tcmu-runner/blob/master/file_zbc.c

> I am, of course, blowing my own horn, since my implementation uses a
> portable blockdev for this.
> 

You are making it sound like the entire WDC series relies on this approach.
Actually, the persistency is introduced in the second to last patch in the
series and it only adds a couple of lines of code in the i/o path to mark
zones dirty. This is possible because of using mmap() and I find the way
it is done to be quite elegant, not ugly :)

> Another issue is the complete lack of endian conversions. Does it
> matter? It depends. Will anyone ever use this on a big endian host and
> move the meta data backing file to a little endian host? Probably not.
> So does it really matter? Probably not, but it is cutting corners.
> 

Great point on endianness! Naturally, all file backed values are stored in
their native endianness. This way, there is no extra overhead on big endian
hardware architectures. Portability concerns can be easily addressed by
storing metadata endianness as a byte flag in its header. Then, during
initialization, the metadata validation code can detect the possible
discrepancy in endianness and automatically convert the metadata to the
endianness of the host. This part is out of scope of this series, but I would
be able to contribute such a solution as an enhancement in the future.

> >  - Make the list of review comments addressed in v2 of the series
> >    (see below).
> >
> 
> Very detailed! Thanks!

reply via email to

[Prev in Thread] Current Thread [Next in Thread]