Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole


From:	Eric Blake
Subject:	Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base
Date:	Thu, 27 Sep 2018 10:15:12 -0500
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.0

On 9/26/18 3:26 PM, Laszlo Ersek wrote:

(+Eric)

I see shm_open() is used heavily in ivshmem-related tests. I haven't
looked much at shm_open() before. (I've always known it existed in
POSIX, but I've never cared.)

I've never actually played with shm_open() myself, but understand the
theory of it enough to reply.


So now I first checked what shm_open() would give me over a regular
temporary file created with open(); after all, the file descriptor
returned by either would have to be mmap()'d. From the rationale in POSIX:

<http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap02.html#tag_22_02_08_14>,

it seems like the {shm_open(), mmap()} combo has two significant
guarantees over {open(), mmap()}:

- the namespace may be distinct (there need not be a writeable
   filesystem at all),

- the shared object will *always* be locked in RAM ("Shared memory is
   not just simply providing common access to data, it is providing the
   fastest possible communication between the processes").

The rationale seems to permit, on purpose, an shm_open() implementation
that is actually based on open(), using a special file system -- and
AIUI, /dev/shm is just that, on Linux.

Eric, does the above sound more or less correct?

You're right about it being permitted to be a distinct namespace; on the
other hand, it doesn't even have to be a special file system. An
implementation could even use a compile-time fixed directory name
visible to the rest of the system (although of course you shouldn't rely
on being able to use the file system to poke at the shmem objects, nor
should you manipulate the file system underneath the reserved directory
behind shmem's back if that is what the implementation is using). So
I'm less certain of whether you are guaranteed that the shared memory
has to be locked in place (where it can never be paged out), since an
implementation on top of the filesystem does not have to do such locking
- but you are also right that a high quality-of-implementation will
strive to keep the memory live rather than paging it out precisely
because it is used for interprocess communication that would be
penalized if it can be paged out.


If it is correct, then I think shm_open() is exactly what I *don't* want
for this use case. Because, while I do need a pathname for an
mmap()-able object (regular file, or otherwise), just so I can do:

   -object memory-backend-file,id=mem-obj,...,mem-path=... \
   -device ivshmem-plain,memdev=mem-obj,...

, I want the underlying object to put as little pressure on the system
that runs the test suite as possible.

This means I should specifically ask for a regular file, to be mmap()'d
(with MAP_SHARED). Then the kernel knows in advance that it can always
page out the dirty stuff, and the mapping shouldn't clash with cgroups,
or disabled memory overcommit.

Indeed, shmem CAN be a thin veneer on top of the file system, and
support being paged out; but since an implementation that pins the
memory such that it cannot page is permitted (and in fact maybe
desirable), you are right that using shmem can indeed put pressure on
different resources in relation to what you can accomplish by using the
file system yourself.


Now, in order to make that actually safe, I should in theory ask for
preallocation on the filesystem (otherwise, if the filesystem runs out
of space, while the kernel is allocating fs extents in order to flush
the dirty pages to them, the process gets a SIGBUS, IIRC). However,
because I know that nothing will be in fact dirtied, I can minimize the
footprint on the filesystem as well, and forgo preallocation too.
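
To make the preallocation trade-off above concrete, here is a minimal
sketch in C. It assumes a POSIX system; the helper names
size_backing_file() and allocated_bytes() are hypothetical and do not
come from QEMU. With prealloc set to 0 the file is only resized and
stays sparse; with prealloc nonzero, posix_fallocate() reserves the
space up front, so a later flush of dirty pages should not fail for
lack of space.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Resize the file behind `fd` to `size` bytes. When `prealloc` is
 * nonzero, also reserve the storage immediately; otherwise the file
 * stays sparse. Returns 0 on success, -1 on failure.
 */
static int size_backing_file(int fd, off_t size, int prealloc)
{
    if (ftruncate(fd, size) < 0) {
        return -1;
    }
    /* posix_fallocate() returns an error number, not -1/errno. */
    if (prealloc && posix_fallocate(fd, 0, size) != 0) {
        return -1;
    }
    return 0;
}

/*
 * Storage currently allocated to the file, in bytes, or -1 on error.
 * Assumes st_blocks counts 512-byte units, as it does on Linux.
 */
static long long allocated_bytes(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0) {
        return -1;
    }
    return (long long)st.st_blocks * 512;
}
```

Comparing allocated_bytes() before and after a preallocating call is
one way to see how much filesystem space the sparse variant saves; the
exact numbers depend on the filesystem.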

This suggests that, in my test case,
- I call g_file_open_tmp() for creating the temporary file,
- pass the returned fd to ftruncate() for resizing the temporary file,
- pass the returned pathname to the "memory-backend-file" objects, in
   the "mem-path" property,
- set "share=on",
- set "prealloc=off",
- "discard-data" is irrelevant (there won't be any dirty pages).
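
The first two steps of the list above could be sketched roughly as
follows. This is only an illustration under stated assumptions:
mkstemp() stands in for g_file_open_tmp() so that the example does not
depend on GLib, and the helper names and the "/tmp/ivshmem-test-XXXXXX"
template are hypothetical. The pathname written to `path` is what would
then be passed as "mem-path".

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a sparse temporary file of `size` bytes and store its
 * pathname in `path` (capacity `path_len`). Returns an open file
 * descriptor, or -1 on failure. No preallocation is done.
 */
static int create_backing_file(char *path, size_t path_len, off_t size)
{
    /* Hypothetical template; mkstemp() replaces the XXXXXX part. */
    static const char tmpl[] = "/tmp/ivshmem-test-XXXXXX";
    int fd;

    if (path_len < sizeof(tmpl)) {
        return -1;
    }
    memcpy(path, tmpl, sizeof(tmpl));

    fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, size) < 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    return fd;
}

/*
 * Map the file with MAP_SHARED, the kind of mapping the message
 * describes for a regular file. Returns NULL on failure.
 */
static void *map_shared(int fd, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

A caller would unlink() the pathname once the test is done. Pages of
the mapping that are never written read back as zeros and need no
filesystem blocks.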

Thanks
Laszlo


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
