qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 12/15] hw/nvme: Initialize capability structures for primary/


From: Łukasz Gieryk
Subject: Re: [PATCH 12/15] hw/nvme: Initialize capability structures for primary/secondary controllers
Date: Mon, 8 Nov 2021 14:57:04 +0100
User-agent: Mutt/1.9.4 (2018-02-28)

On Mon, Nov 08, 2021 at 09:25:58AM +0100, Klaus Jensen wrote:
> On Nov  5 15:04, Łukasz Gieryk wrote:
> > On Fri, Nov 05, 2021 at 09:46:28AM +0100, Łukasz Gieryk wrote:
> > > On Thu, Nov 04, 2021 at 04:48:43PM +0100, Łukasz Gieryk wrote:
> > > > On Wed, Nov 03, 2021 at 01:07:31PM +0100, Klaus Jensen wrote:
> > > > > On Oct  7 18:24, Lukasz Maniak wrote:
> > > > > > From: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
> > > > > > 
> > > > > > With two new properties (sriov_max_vi_per_vf, sriov_max_vq_per_vf) 
> > > > > > one
> > > > > > can configure the maximum number of virtual queues and interrupts
> > > > > > assignable to a single virtual device. The primary and secondary
> > > > > > controller capability structures are initialized accordingly.
> > > > > > 
> > > > > > Since the number of available queues (interrupts) now varies between
> > > > > > VF/PF, BAR size calculation is also adjusted.
> > > > > > 
> > > > > 
> > > > > While this patch allows configuring the VQFRSM and VIFRSM fields, it
> > > > > implicitly sets VQFRT and VIFRT (i.e. by setting them to the product 
> > > > > of
> > > > > sriov_max_vi_pervf and max_vfs). Which is just setting it to an upper
> > > > > bound and this removes a testable case for host software (e.g.
> > > > > requesting more flexible resources than what is currently available).
> > > > > 
> > > > > This patch also requires that these parameters are set if 
> > > > > sriov_max_vfs
> > > > > is. I think we can provide better defaults.
> > > > > 
> > > > 
> > > > Originally I considered more params, but ended up coding the simplest,
> > > > user-friendly solution, because I did not like the mess with so many
> > > > parameters, and the flexibility wasn't needed for my use cases. But I do
> > > > agree: others may need the flexibility. Case (FRT < max_vfs * FRSM) is
> > > > valid and resembles an actual device.
> > > > 
> > > > > How about,
> > > > > 
> > > > > 1. if only sriov_max_vfs is set, then all VFs get private resources
> > > > >    equal to max_ioqpairs. Like before this patch. This limits the 
> > > > > number
> > > > >    of parameters required to get a basic setup going.
> > > > > 
> > > > > 2. if sriov_v{q,i}_private is set (I suggested this parameter in patch
> > > > >    10), the difference between that and max_ioqpairs become flexible
> > > > >    resources. Also, I'd be just fine with having sriov_v{q,i}_flexible
> > > > >    instead and just make the difference become private resources.
> > > > >    Potato/potato.
> > > > > 
> > > > >    a. in the absence of sriov_max_v{q,i}_per_vf, set them to the 
> > > > > number
> > > > >       of calculated flexible resources.
> > > > > 
> > > > > This probably smells a bit like bikeshedding, but I think this gives
> > > > > more flexibility and better defaults, which helps with verifying host
> > > > > software.
> > > > > 
> > > > > If we can't agree on this now, I suggest we could go ahead and merge 
> > > > > the
> > > > > base functionality (i.e. private resources only) and ruminate some 
> > > > > more
> > > > > about these parameters.
> > > > 
> > > > The problem is that the spec allows VFs to support either only private,
> > > > or only flexible resources.
> > > > 
> > > > At this point I have to admit, that since my use cases for
> > > > QEMU/Nvme/SRIOV require flexible resources, I haven’t paid much
> > > > attention to the case with VFs having private resources. So this SR/IOV
> > > > implementation doesn’t even support such case (max_vX_per_vf != 0).
> > > > 
> > > > Let me summarize the possible config space, and how the current
> > > > parameters (could) map to these (interrupt-related ones omitted):
> > > > 
> > > > Flexible resources not supported (not implemented):
> > > >  - Private resources for PF     = max_ioqpairs
> > > >  - Private resources per VF     = ?
> > > >  - (error if flexible resources are configured)
> > > > 
> > > > With flexible resources:
> > > >  - VQPRT, private resources for PF      = max_ioqpairs
> > > >  - VQFRT, total flexible resources      = max_vq_per_vf * num_vfs
> > > >  - VQFRSM, maximum assignable per VF    = max_vq_per_vf
> > > >  - VQGRAN, granularity                  = #define constant
> > > >  - (error if private resources per VF are configured)
> > > > 
> > > > Since I don’t want to misunderstand your suggestion: could you provide a
> > > > similar map with your parameters, formulas, and explain how to determine
> > > > if flexible resources are active? I want to be sure we are on the
> > > > same page.
> > > > 
> > > 
> > > I’ve just re-read through my email and decided that some bits need
> > > clarification.
> > > 
> > > This implementation supports the “Flexible”-resources-only flavor of
> > > SR/IOV, while the “Private” also could be supported. Some effort is
> > > required to support both, and I cannot afford that (at least I cannot
> > > commit today, neither the other Lukasz).
> > > 
> > > While I’m ready to rework the Flexible config and prepare it to be
> > > extended later to handle the Private variant, the 2nd version of these
> > > patches will still support the Flexible flavor only.
> > > 
> > > I will include appropriate TODO/open in the next cover letter.
> > > 
> > 
> > The summary of my thoughts, so far:
> > - I'm going to introduce sriov_v{q,i}_flexible and better defaults,
> >   according to your suggestion (as far as I understand your intentions,
> >   please correct me if I've missed something).
> > - The Private SR/IOV flavor, if it's ever implemented, could introduce
> >   sriov_vq_private_per_vf.
> > - The updated formulas are listed below.
> > 
> > Flexible resources not supported (not implemented):
> >  - Private resources for PF     = max_ioqpairs
> >  - Private resources per VF     = sriov_vq_private_per_vf
> 
> I would just keep it simple and say, if sriov_v{q,i}_flexible is not
> set, then each VF gets max_ioqpairs private resources.
> 

Since you did request more tuning knobs for the Flexible variant, the
Private one should follow that and allow full configuration. A device
where PF.priv=64 and each VF.priv=4 makes sense, and I couldn’t
configure it if sriov_v{q,i}_flexible=0 enabled the Private mode.

> >  - (error if sriov_vq_flexible is set)
> > 
> > With flexible resources:
> >  - VQPRT, private resources for PF      = max_ioqpairs - sriov_vq_flexible
> >  - VQFRT, total flexible resources      = sriov_vq_flexible (if set, or)
> >                                           VQPRT * num_vfs
> >  - VQFRSM, maximum assignable per VF    = sriov_max_vq_per_vf (if set, or)
> >                                           VQPRT
> 
> You mean VQFRT here, right?
> 

VQPRT is right, and – in my opinion – makes a better default than VQFRT.

E.g., configuring a device:

(max_vfs=32, PF.priv=VQPRT=X, PF.flex_total=VQFRT=256)

as (num_vfs=1, VF0.flex=256) doesn’t make much sense. Virtualization is
not needed in such case, and user should probably use PF directly. On
the other hand, VQPRT is probably tuned to offer most (if not all) of
the performance and functionality; thus serves as a sane default.

> >  - VQGRAN, granularity                  = #define constant
> 
> Yeah, 1 seems pretty reasonable here.
> 
> >  - (error if sriov_vq_private_per_vf is set)
> > 
> > Is this version acceptable?
> > 
> 
> Sounds good to me. The only one I am not too happy about is the default
> of VQPRT * num_vfs. (i.e. max_ioqpairs * num_vfs) when vq_flexible is
> not set. I think this is the case where we should default to private
> resources. If you don't want to work with private resources right now,
> can we instead have it bug out and complain that sriov_vq_flexible must
> be set? We can then later lift that restriction and implement private
> resources.

I would prefer reserving sriov_v{q,i}_flexible=0 for now. That's my current
plan for V2.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]