qemu-devel

Re: [PATCH v4 2/2] nvme: allow cmb and pmr to be enabled on same device


From: Andrzej Jakowski
Subject: Re: [PATCH v4 2/2] nvme: allow cmb and pmr to be enabled on same device
Date: Wed, 22 Jul 2020 10:00:30 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.9.0

On 7/22/20 12:43 AM, Klaus Jensen wrote:
> @keith, please see below - can you comment on the Linux kernel 2 MB
> boundary requirement for the CMB? Or should we hail Stephen (or Logan
> maybe) since this seems to be related to p2pdma?
> 
> On Jul 21 14:54, Andrzej Jakowski wrote:
>> On 7/15/20 1:06 AM, Klaus Jensen wrote:
>>> Hi Andrzej,
>>>
>>> I've not been ignoring this, but sorry for not following up earlier.
>>>
>>> I'm hesitant to merge anything that very obviously breaks an OS that we
>>> know is used a lot with this device. Also because the issue has not been
>>> analyzed well enough to actually know whether this is a QEMU or a kernel
>>> issue.
>>
>> Hi Klaus,
>>
>> Thx for your response! I understand your hesitance to merge stuff that
>> obviously breaks the guest OS.
>>
>>>
>>> Now, as far as I can test, having the MSI-X vector table and PBA in BAR
>>> 0, PMR in BAR 2 and CMB in BAR 4 seems to make everyone happy
>>> (regardless of IOMMU on/off).
>>>
>>> Later, when the issue is better understood, we can add options to set
>>> offsets, BIRs etc.
>>>
>>> The patch below replaces your "[PATCH v4 2/2] nvme: allow cmb and pmr to
>>> be enabled" (but still requires "[PATCH v4 1/2] ...") and applies to
>>> git://git.infradead.org/qemu-nvme.git nvme-next branch.
>>>
>>> Can you reproduce the issues with that patch? I can't on a stock Arch
>>> Linux 5.7.5-arch1-1 kernel.
>>
>> While I'm happy that the approach with MSI-X and PBA in BAR0 works fine, I
>> feel that the investigation of why it works while mine doesn't is still
>> missing. It looks to me that both patches follow basically the same
>> approach: create a memory subregion and overlay it on top of another memory
>> region. Why does one work while the other doesn't?
>>
>> With that in mind, I have recently focused on understanding the problem.
>> I observed that when the guest assigns an address to BAR4, the addr field
>> in the nvme-bar4 memory region gets populated, but it doesn't get
>> automatically populated in the ctrl_mem (CMB) memory subregion, so later,
>> when nvme_addr_is_cmb() is called, the address check works incorrectly
>> and, as a consequence, the VMM does a DMA read instead of a memcpy.
>> I created a patch that sets the correct address on the ctrl_mem subregion,
>> and the guest OS boots up correctly.
>>
>> When I looked into the PCI and memory region code I noticed that indeed
>> the address is only assigned to the top-level memory region, not to the
>> contained subregions. I think that is why your approach works fine: the
>> CMB grabs the whole BAR exclusively.
>>
>> Here is my question (perhaps the PCI folks can help answer :)): if we
>> consider memory region overlapping for PCI devices a valid use case,
>> should the PCI code, on configuration cycles, walk through all contained
>> subregions and update their addr fields accordingly?
>>
>> Thx!
>>
> 
> Hi Andrzej,
> 
> Thanks for looking into this. I think your analysis helped me nail this.
> The problem is that we added the use of a subregion and have some
> assumptions that no longer hold.
> 
> nvme_addr_is_cmb() assumes that n->ctrl_mem.addr is an absolute address.
> But when the memory region is a subregion, addr holds an offset into the
> parent container instead. Thus, changing all occurrences of
> n->ctrl_mem.addr to (n->bar0.addr + n->ctrl_mem.addr) fixes the issue
> (this is required in nvme_addr_is_cmb and nvme_map_prp). I patched that
> in your original patch[1]. The reason my version worked is because there
> was no subregion involved for the CMB, so the existing address
> validation calculations were still correct.

I'm a little bit concerned with this approach:
(n->bar0.addr + n->ctrl_mem.addr), and I'm hoping to have some debate. Let me
describe my understanding of the problem.
It looks to me like the addr field sometimes contains an *absolute* address
(when no hierarchy is used) and other times a *relative* address (when a
hierarchy is created). From my perspective the use of this field is
inconsistent and thus error-prone.
Because of that, I think that doing n->bar0.addr + n->ctrl_mem.addr doesn't
solve the root problem and is still prone to the same issue if in the future
we build an even more complicated hierarchy.
I think we could solve this by introducing a helper function like

hwaddr memory_region_get_abs_addr(MemoryRegion *mr)

to retrieve the absolute address, and indicating in the documentation that
the addr field can be relative or absolute, with the recommendation to use
the above function to retrieve the absolute address.
What do you think?

> 
> This leaves us with the Linux kernel complaining about not being able to
> register the CMB if it is not on a 2MB boundary - this is probably just
> a constraint in the kernel that we can't do anything about (but I'm no
> kernel hacker...), which can be fixed by either being "nice" towards the
> Linux kernel by forcing a 2 MB alignment in the device or exposing the
> SZU field such that the user can choose 16MiB size units (or higher)
> instead of 1MiB. I'm leaning towards ensuring the 2 MB alignment in the
> device such that we do not have to introduce new cmb_size parameters,
> while also making it easier for the user to configure. But I'm not
> really sure.
This is a kernel limitation that we have to live with. The minimum
granularity of devm_memremap_pages() is 2MB, and it must be 2MB aligned. It
used to be worse, at 128MB size+alignment (the section size), but the
sub-section memory-hotplug patches reduced that to 2MB.
Ensuring 2MiB size and alignment in the device emulation makes sense to me.
Perhaps we could document that limitation, making users more aware of it.

> 
>   [1]: Message-Id: <20200701214858.28515-3-andrzej.jakowski@linux.intel.com>
> 
