[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/3] exec: posix_madvise usage on SunOS.

From: Peter Maydell
Subject: Re: [PATCH 2/3] exec: posix_madvise usage on SunOS.
Date: Sat, 18 Jul 2020 15:15:38 +0100

On Sat, 18 Jul 2020 at 14:21, David CARLIER <devnexen@gmail.com> wrote:
> From a9e3cced279ae55a59847ba232f7828bc2479367 Mon Sep 17 00:00:00 2001
> From: David Carlier <devnexen@gmail.com>
> Date: Sat, 18 Jul 2020 13:29:44 +0100
> Subject: [PATCH 2/3] exec: posix_madvise usage on SunOS.
> with _XOPEN_SOURCE set, the older mman.h API based on caddr_t handling
> is not accessible thus using posix_madvise here.
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
>  exec.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> diff --git a/exec.c b/exec.c
> index 6f381f98e2..0466a75b89 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -3964,7 +3964,15 @@ int ram_block_discard_range(RAMBlock *rb,
> uint64_t start, size_t length)
>               * fallocate'd away).
>               */
>  #if defined(CONFIG_MADVISE)
> +#if !defined(CONFIG_SOLARIS)
>              ret =  madvise(host_startaddr, length, MADV_DONTNEED);
> +#else
> +        /*
> +         * mmap and its caddr_t based api is not accessible
> +         * with _XOPEN_SOURCE set on illumos
> +         */
> +            ret =  posix_madvise(host_startaddr, length, 
> +#endif

Hi. I'm not sure this patch will do the right thing, because
I don't think that Solaris's POSIX_MADV_DONTNEED provides
the semantics that this QEMU function says it needs. The
comment at the top of the function says:

 * Unmap pages of memory from start to start+length such that
 * they a) read as 0, b) Trigger whatever fault mechanism
 * the OS provides for postcopy.
 * The pages must be unmapped by the end of the function.

(Aside: the use of 'unmap' in this comment is a bit confusing,
because it clearly doesn't mean 'unmap' if it wants read-as-0.
And the reference to faults on postcopy is incomprehensible
to me: if memory is read-as-0 it isn't going to fault.)

Linux's madvise(MADV_DONTNEED) does guarantee us this
read-as-zero behaviour. (It's a silly API choice that Linux
put this behaviour behind madvise, which is supposed to be
merely advisory, but that's how it is.) The Solaris
posix_madvise() manpage says it is merely advisory and
doesn't affect the behaviour of accesses to the memory.

If posix_madvise() behaviour was OK in this function, the
right way to fix this would be to use qemu_madvise()
instead, which already provides this "if host has
madvise(), use it, otherwise use posix_madvise()" logic.
But I suspect that the direct madvise() here is deliberate.

Side note: not sure the current code is correct for the
BSDs either -- they have madvise() but don't provide
Linux's really-read-as-zero guarantee for MADV_DONTNEED.
So we should probably be doing something else there, and
whatever that something-else is is probably also what
Solaris wants.

We use ram_block_discard_range() only in migration and
in virtio-balloon and virtio-mem; I've cc'd some people
who hopefully understand what the requirements on this
function are and might have a view on what the not-Linux
implementation should look like. (David Gilbert: git
blame says you wrote this code :-))

-- PMM

reply via email to

[Prev in Thread] Current Thread [Next in Thread]