[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH V7 10/29] machine: memfd-alloc option
From: |
Igor Mammedov |
Subject: |
Re: [PATCH V7 10/29] machine: memfd-alloc option |
Date: |
Fri, 11 Mar 2022 10:42:52 +0100 |
On Thu, 10 Mar 2022 13:18:35 -0500
Steven Sistare <steven.sistare@oracle.com> wrote:
> On 3/10/2022 12:28 PM, Steven Sistare wrote:
> > On 3/10/2022 11:00 AM, Igor Mammedov wrote:
> >> On Thu, 10 Mar 2022 10:36:08 -0500
> >> Steven Sistare <steven.sistare@oracle.com> wrote:
> >>
> >>> On 3/8/2022 2:20 AM, Igor Mammedov wrote:
> >>>> On Tue, 8 Mar 2022 01:50:11 -0500
> >>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>>>
> >>>>> On Mon, Mar 07, 2022 at 09:41:44AM -0500, Steven Sistare wrote:
> >>>>>> On 3/4/2022 5:41 AM, Igor Mammedov wrote:
> >>>>>>> On Thu, 3 Mar 2022 12:21:15 -0500
> >>>>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>>>>>>
> >>>>>>>> On Wed, Dec 22, 2021 at 11:05:15AM -0800, Steve Sistare wrote:
> >>>>>>>>> Allocate anonymous memory using memfd_create if the memfd-alloc
> >>>>>>>>> machine
> >>>>>>>>> option is set.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >>>>>>>>> ---
> >>>>>>>>> hw/core/machine.c | 19 +++++++++++++++++++
> >>>>>>>>> include/hw/boards.h | 1 +
> >>>>>>>>> qemu-options.hx | 6 ++++++
> >>>>>>>>> softmmu/physmem.c | 47
> >>>>>>>>> ++++++++++++++++++++++++++++++++++++++---------
> >>>>>>>>> softmmu/vl.c | 1 +
> >>>>>>>>> trace-events | 1 +
> >>>>>>>>> util/qemu-config.c | 4 ++++
> >>>>>>>>> 7 files changed, 70 insertions(+), 9 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
> >>>>>>>>> index 53a99ab..7739d88 100644
> >>>>>>>>> --- a/hw/core/machine.c
> >>>>>>>>> +++ b/hw/core/machine.c
> >>>>>>>>> @@ -392,6 +392,20 @@ static void machine_set_mem_merge(Object *obj,
> >>>>>>>>> bool value, Error **errp)
> >>>>>>>>> ms->mem_merge = value;
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> +static bool machine_get_memfd_alloc(Object *obj, Error **errp)
> >>>>>>>>> +{
> >>>>>>>>> + MachineState *ms = MACHINE(obj);
> >>>>>>>>> +
> >>>>>>>>> + return ms->memfd_alloc;
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static void machine_set_memfd_alloc(Object *obj, bool value, Error
> >>>>>>>>> **errp)
> >>>>>>>>> +{
> >>>>>>>>> + MachineState *ms = MACHINE(obj);
> >>>>>>>>> +
> >>>>>>>>> + ms->memfd_alloc = value;
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> static bool machine_get_usb(Object *obj, Error **errp)
> >>>>>>>>> {
> >>>>>>>>> MachineState *ms = MACHINE(obj);
> >>>>>>>>> @@ -829,6 +843,11 @@ static void machine_class_init(ObjectClass
> >>>>>>>>> *oc, void *data)
> >>>>>>>>> object_class_property_set_description(oc, "mem-merge",
> >>>>>>>>> "Enable/disable memory merge support");
> >>>>>>>>>
> >>>>>>>>> + object_class_property_add_bool(oc, "memfd-alloc",
> >>>>>>>>> + machine_get_memfd_alloc, machine_set_memfd_alloc);
> >>>>>>>>> + object_class_property_set_description(oc, "memfd-alloc",
> >>>>>>>>> + "Enable/disable allocating anonymous memory using
> >>>>>>>>> memfd_create");
> >>>>>>>>> +
> >>>>>>>>> object_class_property_add_bool(oc, "usb",
> >>>>>>>>> machine_get_usb, machine_set_usb);
> >>>>>>>>> object_class_property_set_description(oc, "usb",
> >>>>>>>>> diff --git a/include/hw/boards.h b/include/hw/boards.h
> >>>>>>>>> index 9c1c190..a57d7a0 100644
> >>>>>>>>> --- a/include/hw/boards.h
> >>>>>>>>> +++ b/include/hw/boards.h
> >>>>>>>>> @@ -327,6 +327,7 @@ struct MachineState {
> >>>>>>>>> char *dt_compatible;
> >>>>>>>>> bool dump_guest_core;
> >>>>>>>>> bool mem_merge;
> >>>>>>>>> + bool memfd_alloc;
> >>>>>>>>> bool usb;
> >>>>>>>>> bool usb_disabled;
> >>>>>>>>> char *firmware;
> >>>>>>>>> diff --git a/qemu-options.hx b/qemu-options.hx
> >>>>>>>>> index 7d47510..33c8173 100644
> >>>>>>>>> --- a/qemu-options.hx
> >>>>>>>>> +++ b/qemu-options.hx
> >>>>>>>>> @@ -30,6 +30,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
> >>>>>>>>> " vmport=on|off|auto controls emulation of
> >>>>>>>>> vmport (default: auto)\n"
> >>>>>>>>> " dump-guest-core=on|off include guest memory
> >>>>>>>>> in a core dump (default=on)\n"
> >>>>>>>>> " mem-merge=on|off controls memory merge
> >>>>>>>>> support (default: on)\n"
> >>>>>>>>> + " memfd-alloc=on|off controls allocating
> >>>>>>>>> anonymous guest RAM using memfd_create (default: off)\n"
> >>>>>>>>
> >>>>>>>> Question: are there any disadvantages associated with using
> >>>>>>>> memfd_create? I guess we are using up an fd, but that seems minor.
> >>>>>>>> Any
> >>>>>>>> reason not to set to on by default? maybe with a fallback option to
> >>>>>>>> disable that?
> >>>>>>
> >>>>>> Old Linux host kernels, circa 4.1, do not support huge pages for
> >>>>>> shared memory.
> >>>>>> Also, the tunable to enable huge pages for share memory is different
> >>>>>> than for
> >>>>>> anon memory, so there could be performance loss if it is not set
> >>>>>> correctly.
> >>>>>> /sys/kernel/mm/transparent_hugepage/enabled
> >>>>>> vs
> >>>>>> /sys/kernel/mm/transparent_hugepage/shmem_enabled
> >>>>>
> >>>>> I guess we can test this when launching the VM, and select
> >>>>> a good default.
> >>>>>
> >>>>>> It might make sense to use memfd_create by default for the secondary
> >>>>>> segments.
> >>>>>
> >>>>> Well there's also KSM now you mention it.
> >>>>
> >>>> then another quest, is there downside to always using memfd_create
> >>>> without any knobs being involved?
> >>>
> >>> Lower performance if small pages are used (but Michael suggests qemu
> >>> could
> >>> automatically check the tunable and use anon memory instead)
> >>>
> >>> KSM (same page merging) is not supported for shared memory, so
> >>> ram_block_add ->
> >>> memory_try_enable_merging will not enable it.
> >>>
> >>> In both cases, I expect the degradation would be negligible if
> >>> memfd_create is
> >>> only automatically applied to the secondary segments, which are typically
> >>> small.
> >>> But, someone's secondary segment could be larger, and it is time
> >>> consuming to
> >>> prove innocence when someone claims your change caused their performance
> >>> regression.
> >>
> >> Adding David as memory subsystem maintainer, maybe he will a better
> >> idea instead of introducing global knob that would also magically alter
> >> backends' behavior despite of its their configured settings.
> >
> > OK, in ram_block_add I can set the RAM_SHARED flag based on the
> > memory-backend object's
> > shared flag. I already set the latter in create_default_memdev when
> > memfd-alloc is
> > specified. With that change, we do not override configured settings.
> > Users can no longer
> > use memory-backend-ram for CPR, and must change all memory-backend-ram to
> > memory-backend-memfd
> > in the command-line arguments. That is fine.
> >
> > With that change, are you OK with this patch?
>
> Sorry, I mis-read my own code in ram_block_add. The existing code is correct
> and does
> not alter any backend's behavior. It only sets the shared flag when the ram
> is *not*
> being allocated for a backend:
>
> if (!object_dynamic_cast(parent, TYPE_MEMORY_BACKEND)) {
> new_block->flags |= RAM_SHARED;
> }
>
ok, maybe instead of introducing a generic option, introduce the high level
feature one that turns this and other necessary quirks for it to work (i.e.
something like live-update=on|off).
That will not make QEMU internals any better but at least it will hide obscure
memfd-alloc from users.
Is there a patch that makes QEMU error out if backend without
shared=on is used?
Also, can you answer question below, pls
or point to a patch in series that takes care of that invariant?
[...]
> >>>>>> There is currently no way to specify memory backends for the secondary
> >>>>>> memory
> >>>>>> segments (vram, roms, etc), and IMO it would be onerous to specify a
> >>>>>> backend for
> >>>>>> each of them. On x86_64, these include pc.bios, vga.vram, pc.rom,
> >>>>>> vga.rom,
> >>>>>> /rom@etc/acpi/tables, /rom@etc/table-loader, /rom@etc/acpi/rsdp.
> >>
> >> MemoryRegion is not the only place where state is stored.
> >> If we only talk about fwcfg entries state, it can also reference
> >> plain malloced memory allocated elsewhere or make a deep copy internally.
> >> Similarly devices also may store state outside of RamBlock framework.
> >>
> >> How are you dealing with that?
[...]
- Re: [PATCH V7 10/29] machine: memfd-alloc option, (continued)
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Steven Sistare, 2022/03/03
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Michael S. Tsirkin, 2022/03/03
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Igor Mammedov, 2022/03/04
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Steven Sistare, 2022/03/07
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Michael S. Tsirkin, 2022/03/08
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Igor Mammedov, 2022/03/08
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Steven Sistare, 2022/03/10
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Igor Mammedov, 2022/03/10
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Steven Sistare, 2022/03/10
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Steven Sistare, 2022/03/10
- Re: [PATCH V7 10/29] machine: memfd-alloc option,
Igor Mammedov <=
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Steven Sistare, 2022/03/29
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Daniel P . Berrangé, 2022/03/11
Re: [PATCH V7 10/29] machine: memfd-alloc option, David Hildenbrand, 2022/03/11
Re: [PATCH V7 10/29] machine: memfd-alloc option, David Hildenbrand, 2022/03/11