
Re: [PATCH RFC v5 12/12] hw/block/nvme: add persistence for zone info


From: Stefan Hajnoczi
Subject: Re: [PATCH RFC v5 12/12] hw/block/nvme: add persistence for zone info
Date: Mon, 30 Nov 2020 14:58:41 +0000

On Mon, Nov 30, 2020 at 01:59:17PM +0100, Klaus Jensen wrote:
> On Nov 30 12:33, Stefan Hajnoczi wrote:
> > On Fri, Nov 27, 2020 at 12:46:01AM +0100, Klaus Jensen wrote:
> > > From: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > > ---
> > >  docs/specs/nvme.txt   |  15 +++
> > >  hw/block/nvme-ns.h    |  16 ++++
> > >  hw/block/nvme-ns.c    | 212 +++++++++++++++++++++++++++++++++++++++++-
> > >  hw/block/nvme.c       |  87 +++++++++++++++++
> > >  hw/block/trace-events |   2 +
> > >  5 files changed, 331 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
> > > index 03bb4d9516b4..05d81c88ad4e 100644
> > > --- a/docs/specs/nvme.txt
> > > +++ b/docs/specs/nvme.txt
> > > @@ -20,6 +20,21 @@ The nvme device (-device nvme) emulates an NVM Express Controller.
> > >   `zns.mor`; Specifies the number of open resources available. This is a 0s
> > >       based value.
> > >  
> > > +  `zns.pstate`; This parameter specifies another blockdev to be used for
> > > +     storing zone state persistently.
> > > +
> > > +       -drive id=zns-pstate,file=zns-pstate.img,format=raw
> > > +       -device nvme-ns,zns.pstate=zns-pstate,...
> > > +
> > > +     To reset (or initialize) state, the blockdev image should be of zero size:
> > > +
> > > +       qemu-img create -f raw zns-pstate.img 0
> > > +
> > > +     The image will be initialized with a file format header and truncated to
> > > +     the required size. If the pstate given is of non-zero size, it will be
> > > +     assumed to already contain zone state information and the header will be
> > > +     checked.
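
(For reference, a complete invocation combining the two documented options might look roughly like the sketch below; apart from the -drive id=zns-pstate and zns.pstate= parts quoted above, the image names, serial and remaining options are illustrative placeholders, not taken from this patch.)

  qemu-img create -f raw zns-pstate.img 0
  qemu-system-x86_64 ... \
      -drive id=nvm,file=nvm.img,format=raw,if=none \
      -drive id=zns-pstate,file=zns-pstate.img,format=raw \
      -device nvme,serial=deadbeef \
      -device nvme-ns,drive=nvm,zns.pstate=zns-pstate,...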
> > 
> > In principle this makes sense but at first glance it looks like the code
> > is synchronous - it blocks the vCPU if zone metadata I/O is necessary.
> > That works for test/bring-up code but can't be used in production due to
> > the performance impact.
> > 
> > Is the expectation that the QEMU NVMe device emulation will only be used
> > for test/bring-up now and in the future?
> > 
> 
> Hi Stefan,
> 
> Thanks for taking a look at this.
> 
> I could see why someone would maybe use the core nvme emulation in
> production (but I'm not aware of anyone doing it), but the zoned
> emulation is *probably* not for production (and that is where the zone
> updates are needed). But someone could surprise me with a use case I
> guess.
> 
> And yes, I know this is synchronous in this version. I have not
> extensively evaluated the performance impact, but crucially the blocking
> only happens when the zone changes state (i.e. every write does NOT
> trigger a blk_pwrite just to persist the updated write pointer).
> 
> I know this can be done asynchronously (I have implemented it that way
> previously), but I wanted to get an opinion on the general strategy
> before adding that. The opposing strategy is to use some form of
> mmap/msync, but I, for one, pushed back against that because I'd like
> this to work on as many platforms as possible. Hence the RFC for this
> blockdev based approach.
> 
> But if you think a blockdev approach like this is a reasonable QEMU-esque
> way of doing it, then I'll proceed to do a v2 with asynchronous updates.
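
(For context, the asynchronous variant mentioned here could be sketched roughly as below; blk_aio_pwritev() and qemu_iovec_init_buf() are the real QEMU APIs, while the completion plumbing and all other names are hypothetical and not taken from any posted version.)

  /* Hypothetical sketch: submit the zone descriptor update asynchronously
   * and finish the triggering NVMe command from the completion callback
   * instead of blocking the vCPU. */
  static void zns_pstate_update_cb(void *opaque, int ret)
  {
      NvmeRequest *req = opaque;

      /* complete (or fail, if ret < 0) the command that changed the zone
       * state; details elided */
  }

  static void zns_pstate_update_zone_async(NvmeNamespace *ns, NvmeZone *zone,
                                           NvmeRequest *req)
  {
      int64_t offset = ns->pstate.hdr_len +
          nvme_zone_index(ns, zone) * sizeof(NvmeZoneDescr);

      qemu_iovec_init_buf(&req->iov, &zone->d, sizeof(zone->d));
      blk_aio_pwritev(ns->pstate.blk, offset, &req->iov, 0,
                      zns_pstate_update_cb, req);
  }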

I don't have a strong opinion, especially since I haven't investigated
how NVMe Zoned Namespaces work (besides watching your interesting KVM
Forum presentation).

If you want to do the mmap approach, please use HostMemoryBackend for
that instead of open coding mmap(2) calls. That way it will support
several types of backing memory, including MAP_SHARED files, persistent
memory, etc.
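
(For readers unfamiliar with it: HostMemoryBackend is what backs the -object memory-backend-* options. A file-backed MAP_SHARED mapping would be configured along these lines; reusing the zns.pstate property to reference a memory backend is purely illustrative, since the current patch takes a blockdev id.)

  -object memory-backend-file,id=zns-pstate,mem-path=zns-pstate.img,size=1M,share=on \
  -device nvme-ns,zns.pstate=zns-pstate,...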

Stefan
