qemu-devel

Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backe


From: Marcel Apfelbaum
Subject: Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram
Date: Thu, 1 Feb 2018 07:36:50 +0200
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 01/02/2018 4:22, Michael S. Tsirkin wrote:
> On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
>> On Wed, Jan 31, 2018 at 11:10:07PM +0200, Michael S. Tsirkin wrote:
>>> On Wed, Jan 31, 2018 at 06:40:59PM -0200, Eduardo Habkost wrote:
>>>> On Wed, Jan 17, 2018 at 11:54:18AM +0200, Marcel Apfelbaum wrote:
>>>>> Currently only file backed memory backend can
>>>>> be created with a "share" flag in order to allow
>>>>> sharing guest RAM with other processes in the host.
>>>>>
>>>>> Add the "share" flag also to RAM Memory Backend
>>>>> in order to allow remapping parts of the guest RAM
>>>>> to different host virtual addresses. This is needed
>>>>> by the RDMA devices in order to remap non-contiguous
>>>>> QEMU virtual addresses to a contiguous virtual address range.
>>>>>
>>>>
>>>> Why do we need to make this configurable?  Would anything break
>>>> if MAP_SHARED was always used if possible?
>>>
>>> See Documentation/vm/numa_memory_policy.txt for a list
>>> of complications.
>>
>> Ew.
>>
>>>
>>> Maybe we should make more of an effort to detect and report these
>>> issues.
>>
>> Probably.  Having other features breaking silently when using
>> pvrdma doesn't sound good.  We must at least document those
>> problems in the documentation for memory-backend-ram.
>>
>> BTW, what's the root cause for requiring HVAs in the buffer?
> 
> It's a side effect of the kernel/userspace API which always wants
> a single HVA/len pair to map memory for the application.
> 
> 

Hi Eduardo and Michael,

>>  Can
>> this be fixed?
> 
> I think yes.  It'd need to be a kernel patch for the RDMA subsystem
> mapping an s/g list with actual memory. The HVA/len pair would then just
> be used to refer to the region, without creating the two mappings.
> 
> Something like splitting the register mr into
> 
> mr = create mr (va/len) - allocate a handle and record the va/len
> 
> addmemory(mr, offset, hva, len) - pin memory
> 
> register mr - pass it to HW
> 
> As a nice side effect we won't burn so much virtual address space.
>
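
To make the three steps proposed above concrete, here is a minimal userspace mock of the split registration flow. Nothing here is an existing kernel or libibverbs API: `struct mr`, `mr_create`, `mr_add_memory` and `mr_register` are hypothetical names taken from the outline, and "pinning" is only simulated by recording each segment. The sketch assumes segments are added in ascending offset order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MR_MAX_SEGS 8

/* Hypothetical types: one pinned piece of the MR and the MR handle itself. */
struct mr_seg { uint64_t offset; void *hva; size_t len; };
struct mr {
    uint64_t va;                    /* VA used only to refer to the region */
    size_t len;
    struct mr_seg segs[MR_MAX_SEGS];
    size_t nsegs;
    int registered;
};

/* Step 1: allocate a handle and record the va/len pair. */
static void mr_create(struct mr *mr, uint64_t va, size_t len)
{
    mr->va = va;
    mr->len = len;
    mr->nsegs = 0;
    mr->registered = 0;
}

/* Step 2: attach (here: just record) host memory at an offset in the MR. */
static int mr_add_memory(struct mr *mr, uint64_t offset, void *hva, size_t len)
{
    if (mr->registered || mr->nsegs == MR_MAX_SEGS)
        return -1;
    if (offset > mr->len || len > mr->len - offset)
        return -1;                  /* segment would fall outside the MR */
    mr->segs[mr->nsegs++] = (struct mr_seg){ offset, hva, len };
    return 0;
}

/* Step 3: hand the MR to the HW; the mock only checks for holes. */
static int mr_register(struct mr *mr)
{
    uint64_t expected = 0;
    for (size_t i = 0; i < mr->nsegs; i++) {
        if (mr->segs[i].offset != expected)
            return -1;              /* hole or overlap */
        expected += mr->segs[i].len;
    }
    if (expected != mr->len)
        return -1;                  /* region not fully covered */
    mr->registered = 1;
    return 0;
}
```

The point of the split is that the host buffers passed to `mr_add_memory` do not have to be contiguous with each other; only the offsets inside the MR are.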

We would still need a contiguous virtual address space range (for post-send)
which we don't have since guest contiguous virtual address space
will always end up as non-contiguous host virtual address space.

I am not sure the RDMA HW can handle a large VA with holes.
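
For reference, the remapping that the "share" flag makes possible today can be sketched as below. This is a minimal illustration, not the actual device code: `remap_contiguous` is a hypothetical helper, and it relies on the documented Linux behaviour that mremap() with old_size == 0 on a MAP_SHARED mapping creates a second mapping of the same pages. With a private mapping the same call fails, which is why the backend has to be shared.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: alias n page-aligned chunks of chunk_len bytes each,
 * all living in MAP_SHARED mappings, into one contiguous virtual range.
 * Returns the base of the new range, or MAP_FAILED on error.
 */
static void *remap_contiguous(void *const chunks[], size_t n, size_t chunk_len)
{
    /* Reserve a contiguous block of virtual address space. */
    char *base = mmap(NULL, n * chunk_len, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return MAP_FAILED;

    for (size_t i = 0; i < n; i++) {
        void *dst = base + i * chunk_len;
        /* old_size == 0: duplicate the shared mapping at dst instead of moving it. */
        if (mremap(chunks[i], 0, chunk_len,
                   MREMAP_MAYMOVE | MREMAP_FIXED, dst) == MAP_FAILED) {
            munmap(base, n * chunk_len);
            return MAP_FAILED;
        }
    }
    return base;
}
```

Every chunk costs an extra mapping, so this is also where the virtual address space mentioned above gets burned.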

An alternative would be a 0-based MR: QEMU intercepts the post-send
operations and subtracts the guest VA base address.
However, I didn't see an implementation of 0-based MRs in the kernel,
and the RDMA maintainer said it would work for local keys
but not for remote keys.
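
The address fix-up for the 0-based MR idea would be roughly the following. It is only a sketch of the arithmetic: `struct zero_based_mr` and `sge_to_mr_offset` are hypothetical names, not existing QEMU or kernel code, and the interception of the post-send itself is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record kept per MR: the guest VA the MR was created with. */
struct zero_based_mr {
    uint64_t guest_va_base;
    uint64_t length;
};

/*
 * Turn the guest VA found in a work request into an offset inside the
 * 0-based MR. Returns false if [guest_va, guest_va + len) is not fully
 * inside the MR.
 */
static bool sge_to_mr_offset(const struct zero_based_mr *mr,
                             uint64_t guest_va, uint64_t len,
                             uint64_t *offset)
{
    if (guest_va < mr->guest_va_base)
        return false;

    uint64_t off = guest_va - mr->guest_va_base;
    /* Compare against the remaining space to avoid overflow in off + len. */
    if (off > mr->length || len > mr->length - off)
        return false;

    *offset = off;
    return true;
}
```

This only helps for addresses QEMU gets to see in the work requests, which fits the remark that it would work for local keys and not for remote keys.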

> This will fix rdma with hugetlbfs as well which is currently broken.
> 
> 

There is already a discussion on the linux-rdma list:
    https://www.spinics.net/lists/linux-rdma/msg60079.html
But it will take some (actually a lot of) time; we are currently still
talking about a possible API. And it does not solve the re-mapping...

Thanks,
Marcel

>> -- 
>> Eduardo



