qemu-devel

Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole


From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base
Date: Thu, 27 Sep 2018 10:15:12 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.0

On 9/26/18 3:26 PM, Laszlo Ersek wrote:
> (+Eric)
>
> I see shm_open() is used heavily in ivshmem-related tests. I haven't
> looked much at shm_open() before. (I've always known it existed in
> POSIX, but I've never cared.)

I've never actually used shm_open() myself, but I understand the theory well enough to reply.


> So now I first checked what shm_open() would give me over a regular
> temporary file created with open(); after all, the file descriptor
> returned by either would have to be mmap()'d. From the rationale in POSIX:
>
> <http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap02.html#tag_22_02_08_14>,
>
> it seems like the {shm_open(), mmap()} combo has two significant
> guarantees over {open(), mmap()}:
>
> - the namespace may be distinct (there need not be a writeable
>    filesystem at all),
>
> - the shared object will *always* be locked in RAM ("Shared memory is
>    not just simply providing common access to data, it is providing the
>    fastest possible communication between the processes").
>
> The rationale seems to permit, on purpose, an shm_open() implementation
> that is actually based on open(), using a special file system -- and
> AIUI, /dev/shm is just that, on Linux.
>
> Eric, does the above sound more or less correct?

You're right that it is permitted to be a distinct namespace; on the other hand, it doesn't even have to be a special file system. An implementation could use a fixed directory name, chosen at compile time and visible to the rest of the system (although of course you shouldn't rely on being able to poke at the shmem objects through the file system, nor should you manipulate the reserved directory behind shmem's back if that is what the implementation uses).

So I'm less certain that the shared memory is guaranteed to be locked in place (where it can never be paged out), since an implementation on top of the file system does not have to do such locking. But you are also right that a high-quality implementation will strive to keep the memory resident rather than paging it out, precisely because it is used for interprocess communication that would be penalized by paging.
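
For reference, the {shm_open(), mmap()} combination under discussion can be sketched roughly as follows. This is only an illustration: the helper name map_shm_object() is made up for this sketch, and nothing here says whether the resulting pages are pinned or pageable, since that is left to the implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a POSIX shared memory object called `name`
 * (which should start with '/'), size it, and map it read/write.
 * Returns the mapping, or NULL on failure.
 */
void *map_shm_object(const char *name, size_t size)
{
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0) {
        return NULL;
    }

    /* Drop the name right away; the open fd keeps the object alive. */
    shm_unlink(name);

    /* A new shared memory object has size zero until it is resized. */
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping remains valid after the descriptor is closed. */
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}
```

The name passed to shm_open() lives in its own namespace. Whether it also shows up somewhere in the file system is an implementation detail that portable code should not depend on. On older glibc versions the program may need to be linked with -lrt.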


> If it is correct, then I think shm_open() is exactly what I *don't* want
> for this use case. Because, while I do need a pathname for an
> mmap()-able object (regular file, or otherwise), just so I can do:
>
>    -object memory-backend-file,id=mem-obj,...,mem-path=... \
>    -device ivshmem-plain,memdev=mem-obj,...
>
> , I want the underlying object to put as little pressure on the system
> that runs the test suite as possible.
>
> This means I should specifically ask for a regular file, to be mmap()'d
> (with MAP_SHARED). Then the kernel knows in advance that it can always
> page out the dirty stuff, and the mapping shouldn't clash with cgroups,
> or disabled memory overcommit.

Indeed, shmem CAN be a thin veneer on top of the file system and support being paged out. But since an implementation that pins the memory so that it cannot be paged out is permitted (and may in fact be desirable), you are right that using shmem can put pressure on different resources than using the file system yourself.


> Now, in order to make that actually safe, I should in theory ask for
> preallocation on the filesystem (otherwise, if the filesystem runs out
> of space, while the kernel is allocating fs extents in order to flush
> the dirty pages to them, the process gets a SIGBUS, IIRC). However,
> because I know that nothing will be in fact dirtied, I can minimize the
> footprint on the filesystem as well, and forego preallocation too.
>
> This suggests that, in my test case,
> - I call g_file_open_tmp() for creating the temporary file,
> - pass the returned fd to ftruncate() for resizing the temporary file,
> - pass the returned pathname to the "memory-backend-file" objects, in
>    the "mem-path" property,
> - set "share=on",
> - set "prealloc=off",
> - "discard-data" is irrelevant (there won't be any dirty pages).
>
> Thanks
> Laszlo
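
The file-side steps quoted above (create a temporary file, resize it with ftruncate(), hand out the pathname) could look roughly like the sketch below. It makes two assumptions that are not in the proposal: it uses plain POSIX mkstemp() in place of GLib's g_file_open_tmp() to stay dependency-free, and it performs the MAP_SHARED mmap() itself, whereas in the real test QEMU would do the mapping through "mem-path". The helper name map_sparse_tmpfile() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a temporary regular file, grow it to `size`
 * bytes without preallocating any blocks, and map it MAP_SHARED.
 *
 * `path_template` must end in "XXXXXX"; mkstemp() rewrites it in place
 * with the real pathname, which is what would be passed as "mem-path".
 * On success the open descriptor is stored in *fd_out and the mapping is
 * returned; on failure NULL is returned and the file is removed.
 */
void *map_sparse_tmpfile(char *path_template, size_t size, int *fd_out)
{
    int fd = mkstemp(path_template);
    if (fd < 0) {
        return NULL;
    }

    /* ftruncate() only sets the length; no data blocks are written. */
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        unlink(path_template);
        return NULL;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        unlink(path_template);
        return NULL;
    }

    *fd_out = fd;
    return p;
}
```

As long as nothing writes to the mapping, reads return zeros and the file system never has to allocate space for the file, which is the reason preallocation can be skipped here. A caller would munmap() the region, close() the descriptor, and unlink() the pathname when done.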


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


