qemu-block

Re: [PATCH v4 2/2] nvme: allow cmb and pmr to be enabled on same device


From: Klaus Jensen
Subject: Re: [PATCH v4 2/2] nvme: allow cmb and pmr to be enabled on same device
Date: Wed, 22 Jul 2020 09:43:05 +0200

@keith, please see below - can you comment on the Linux kernel 2 MB
boundary requirement for the CMB? Or should we hail Stephen (or Logan
maybe) since this seems to be related to p2pdma?

On Jul 21 14:54, Andrzej Jakowski wrote:
> On 7/15/20 1:06 AM, Klaus Jensen wrote:
> > Hi Andrzej,
> > 
> > I've not been ignoring this, but sorry for not following up earlier.
> > 
> > I'm hesitant to merge anything that very obviously breaks an OS that we
> > know is widely used with this device. Also, the issue has not been
> > analyzed well enough to actually know whether this is a QEMU or kernel
> > issue.
> 
> Hi Klaus,
> 
> Thanks for your response! I understand your hesitance to merge stuff that
> obviously breaks the guest OS.
> 
> > 
> > Now, as far as I can test, having the MSI-X vector table and PBA in BAR
> > 0, PMR in BAR 2 and CMB in BAR 4 seems to make everyone happy
> > (regardless of IOMMU on/off).
> > 
> > Later, when the issue is better understood, we can add options to set
> > offsets, BIRs etc.
> > 
> > The patch below replaces your "[PATCH v4 2/2] nvme: allow cmb and pmr to
> > be enabled" (but still requires "[PATCH v4 1/2] ...") and applies to
> > git://git.infradead.org/qemu-nvme.git nvme-next branch.
> > 
> > Can you reproduce the issues with that patch? I can't on a stock Arch
> > Linux 5.7.5-arch1-1 kernel.
> 
> While I'm happy that the approach with MSI-X and PBA in BAR0 works fine, I
> feel that the investigation into why it works while mine doesn't is
> missing. It looks to me like both patches follow basically the same
> approach: create a memory subregion and overlay it on top of another
> memory region. Why does one work and the other doesn't?
> 
> With that in mind, I have recently focused on understanding the problem.
> I observed that when the guest assigns an address to BAR4, the addr field
> in the nvme-bar4 memory region gets populated, but it does not get
> automatically populated in the ctrl_mem (CMB) memory subregion. So later,
> when nvme_addr_is_cmb() is called, the address check works incorrectly
> and, as a consequence, the VMM does a DMA read instead of a memcpy.
> I created a patch that sets the correct address on the ctrl_mem subregion,
> and the guest OS boots up correctly.
> 
> When I looked into the PCI and memory region code, I noticed that the
> address is indeed only assigned to the top-level memory region, not to the
> contained subregions. I think your approach works fine because the CMB
> grabs the whole BAR exclusively.
> 
> Here is my question (perhaps the PCI folks can help answer :)): if we
> consider memory region overlapping for PCI devices a valid use case,
> should the PCI code, on configuration cycles, walk through all contained
> subregions and update their addr fields accordingly?
> 
> Thx!
> 

Hi Andrzej,

Thanks for looking into this. I think your analysis helped me nail this.
The problem is that we added the use of a subregion and have some
assumptions that no longer hold.

nvme_addr_is_cmb() assumes that n->ctrl_mem.addr is an absolute address.
But when the memory region is a subregion, addr holds an offset into the
parent container instead. Thus, changing all occurrences of
n->ctrl_mem.addr to (n->bar0.addr + n->ctrl_mem.addr) fixes the issue
(this is required in nvme_addr_is_cmb and nvme_map_prp). I patched that
into your original patch[1]. The reason my version worked is that there
was no subregion involved for the CMB, so the existing address
validation calculations were still correct.

This leaves us with the Linux kernel complaining about not being able to
register the CMB if it is not on a 2 MiB boundary - this is probably just
a constraint in the kernel that we can't do anything about (but I'm no
kernel hacker...). It can be worked around either by being "nice" towards
the Linux kernel and forcing a 2 MiB alignment in the device, or by
exposing the SZU field such that the user can choose 16 MiB size units
(or higher) instead of 1 MiB. I'm leaning towards ensuring the 2 MiB
alignment in the device, such that we do not have to introduce new
cmb_size parameters, while also making it easier for the user to
configure. But I'm not really sure.

  [1]: Message-Id: <20200701214858.28515-3-andrzej.jakowski@linux.intel.com>


