[PULL v2 14/24] libvhost-user: Map shared RAM with MAP_NORESERVE to supp
From: Michael S. Tsirkin
Subject: [PULL v2 14/24] libvhost-user: Map shared RAM with MAP_NORESERVE to support virtio-mem with hugetlb
Date: Sun, 6 Feb 2022 04:38:23 -0500
From: David Hildenbrand <david@redhat.com>
For fd-based shared memory, MAP_NORESERVE is only effective for hugetlb,
otherwise it's ignored. Older Linux versions that didn't support
reservation of huge pages ignored MAP_NORESERVE completely.
The first client to mmap a hugetlb fd without MAP_NORESERVE will
trigger reservation of huge pages for the whole mmapped range. There are
two cases to consider:
1) QEMU mapped RAM without MAP_NORESERVE
We're not dealing with a sparse mapping, huge pages for the whole range
have already been reserved by QEMU. An additional mmap() without
MAP_NORESERVE won't have any effect on the reservation.
2) QEMU mapped RAM with MAP_NORESERVE
We're dealing with a sparse mapping; no huge pages should be reserved.
Further mappings without MAP_NORESERVE should be avoided.
For 1), it doesn't matter if we set MAP_NORESERVE or not, so we can
simply set it. For 2), we'd be overriding QEMU's decision and trigger
reservation of huge pages, which might just fail if there are not
sufficient huge pages around. We must map with MAP_NORESERVE.
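
The rule above can be sketched in isolation. This is a minimal
illustration, not libvhost-user code: demo_create_shared_fd() and
demo_map_shared_noreserve() are hypothetical helpers, and an unlinked
temporary file under /tmp stands in for the RAM fd that QEMU would pass
over the vhost-user socket. On such a non-hugetlb fd the flag is ignored,
as described above; on a hugetlb fd it is what avoids the reservation.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an unlinked temporary file of `size` bytes.
 * It only stands in for the fd a vhost-user backend receives from QEMU.
 */
static int demo_create_shared_fd(size_t size)
{
    char path[] = "/tmp/noreserve-demo-XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path); /* the file lives on only through the open fd */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: map the fd shared and always pass MAP_NORESERVE,
 * so this mapping never asks for a reservation on its own. Returns
 * MAP_FAILED on error, exactly like mmap().
 */
static void *demo_map_shared_noreserve(int fd, size_t size, int prot)
{
    return mmap(NULL, size, prot, MAP_SHARED | MAP_NORESERVE, fd, 0);
}
```

Two mappings of the same fd created this way see each other's writes,
since MAP_NORESERVE does not change MAP_SHARED semantics.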
This change is required to support virtio-mem with hugetlb: a
virtio-mem device mapped into the guest physical memory corresponds to
a sparse memory mapping and QEMU maps this memory with MAP_NORESERVE.
Whenever memory in that sparse region is about to be accessed by the VM, QEMU
populates huge pages for the affected range by preallocating memory
and handling any preallocation errors gracefully.
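
To illustrate what "preallocating and handling errors gracefully" can
look like for a single range, here is a rough sketch. It is not QEMU's
preallocation code, which is not part of this patch: demo_populate_range()
is a hypothetical helper, and posix_fallocate() is used only as a
stand-in that reports failure through a return value instead of a fault
on first access.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not QEMU code: ask for backing storage for
 * [offset, offset + length) of `fd` before anyone touches that range.
 * Returns 0 on success or an errno value on failure, so the caller can
 * refuse the request instead of crashing later.
 */
static int demo_populate_range(int fd, off_t offset, off_t length)
{
    /* posix_fallocate() returns the error number; it does not set errno. */
    int err = posix_fallocate(fd, offset, length);

    if (err != 0) {
        fprintf(stderr, "populate [%lld, +%lld) failed: %s\n",
                (long long)offset, (long long)length, strerror(err));
    }
    return err;
}
```

A caller would treat a non-zero return as "this range cannot be made
available right now" and leave the rest of the sparse region untouched.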
So let's map shared RAM with MAP_NORESERVE. As libvhost-user only
supports Linux, there shouldn't be anything to take care of with regard to
other OS support.
Without this change, libvhost-user will fail mapping the region if there
are currently not enough huge pages to perform the reservation:
fv_panic: libvhost-user: region mmap error: Cannot allocate memory
Cc: "Marc-André Lureau" <marcandre.lureau@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20220111123939.132659-1-david@redhat.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
---
subprojects/libvhost-user/libvhost-user.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 0ee43b8e93..47d2efc60f 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -751,12 +751,12 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
* accessing it before we userfault.
*/
mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
- PROT_NONE, MAP_SHARED,
+ PROT_NONE, MAP_SHARED | MAP_NORESERVE,
vmsg->fds[0], 0);
} else {
mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
- PROT_READ | PROT_WRITE, MAP_SHARED, vmsg->fds[0],
- 0);
+ PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
+ vmsg->fds[0], 0);
}
if (mmap_addr == MAP_FAILED) {
@@ -920,7 +920,7 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
* accessing it before we userfault
*/
mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
- PROT_NONE, MAP_SHARED,
+ PROT_NONE, MAP_SHARED | MAP_NORESERVE,
vmsg->fds[i], 0);
if (mmap_addr == MAP_FAILED) {
@@ -1007,7 +1007,7 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
* mapped address has to be page aligned, and we use huge
* pages. */
mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
- PROT_READ | PROT_WRITE, MAP_SHARED,
+ PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
vmsg->fds[i], 0);
if (mmap_addr == MAP_FAILED) {
--
MST