Re: [PATCH] hw/block/nvme: fix zone write finalize
From: Klaus Jensen
Subject: Re: [PATCH] hw/block/nvme: fix zone write finalize
Date: Fri, 15 Jan 2021 07:01:49 +0100
On Jan 14 15:03, Keith Busch wrote:
> On Tue, Jan 12, 2021 at 10:42:35AM +0100, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> >
> > The zone write pointer is unconditionally advanced, even for write
> > faults. Make sure that the zone is always transitioned to Full if the
> > write pointer reaches zone capacity.
>
> Looks like some spec weirdness. It says we can transition to full:
>
> b) as a result of successful completion of a write operation that
> writes one or more logical blocks that causes the zone to reach its
> writeable zone capacity;
>
> But a failed write also advances the write pointer as you've pointed
> out, so they might want to strike "successful".
>
Yes, it's slightly inconsistent.

Empty/Closed to Opened is also kinda fuzzy in the spec. The wording is
slightly different when transitioning from Empty and from Closed to
Implicitly Opened. ZSE->ZSIO just says "and a write operation writes one
or more logical blocks of that zone", while ZSC->ZSIO says "and a write
operation that writes one or more logical blocks to the zone completes
successfully". This has given me some headaches about whether the
transition should occur when the write is prepared/submitted or when it
completes. But I guess this is pretty much an implementation detail of
the device.
I should mention this to my representatives ;)
> Looks fine to me.
>
> Reviewed-by: Keith Busch <kbusch@kernel.org>
>
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> > ---
> > hw/block/nvme.c | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 0854ee307227..280b31b4459d 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -1268,10 +1268,13 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
> > nlb = le16_to_cpu(rw->nlb) + 1;
> > zone = nvme_get_zone_by_slba(ns, slba);
> >
> > + zone->d.wp += nlb;
> > +
> > if (failed) {
> > res->slba = 0;
> > - zone->d.wp += nlb;
> > - } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) {
> > + }
> > +
> > + if (zone->d.wp == nvme_zone_wr_boundary(zone)) {
>
> The previous check was using 'zone->w_ptr', but now it's 'zone->d.wp'.
> As far as I can tell, this difference will mean the zone won't finalize
> until the last write completes, where before it could finalize after the
> zone's last write is submitted. Either way looks okay, but I think these
> two values ought to always be in sync.
>
Right - the zone will transition to Full on the last completed write.
I'm really not sure if this models devices better or not. But I would
argue that a real device would not relinquish resources until the last
write completes.
An issue is that for QD>1 (zone append), we can easily end up with
appends A and B completing in reverse order such that d.wp is advanced
by B_nlb, but we left a "hole" of A_nlb in the zone because that write
has not completed yet.
But - I think *some* inconsistency on the write pointer value is to be
expected when using Zone Append. The host can only ever depend on a
consistent value of the write pointer by quiescing I/O to the zone and
then getting a zone report - and d.wp and d.w_ptr will always
converge in that case.