qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH V7 10/29] machine: memfd-alloc option


From: Igor Mammedov
Subject: Re: [PATCH V7 10/29] machine: memfd-alloc option
Date: Fri, 11 Mar 2022 10:42:52 +0100

On Thu, 10 Mar 2022 13:18:35 -0500
Steven Sistare <steven.sistare@oracle.com> wrote:

> On 3/10/2022 12:28 PM, Steven Sistare wrote:
> > On 3/10/2022 11:00 AM, Igor Mammedov wrote:  
> >> On Thu, 10 Mar 2022 10:36:08 -0500
> >> Steven Sistare <steven.sistare@oracle.com> wrote:
> >>  
> >>> On 3/8/2022 2:20 AM, Igor Mammedov wrote:  
> >>>> On Tue, 8 Mar 2022 01:50:11 -0500
> >>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>>>     
> >>>>> On Mon, Mar 07, 2022 at 09:41:44AM -0500, Steven Sistare wrote:    
> >>>>>> On 3/4/2022 5:41 AM, Igor Mammedov wrote:      
> >>>>>>> On Thu, 3 Mar 2022 12:21:15 -0500
> >>>>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>>>>>>       
> >>>>>>>> On Wed, Dec 22, 2021 at 11:05:15AM -0800, Steve Sistare wrote:      
> >>>>>>>>> Allocate anonymous memory using memfd_create if the memfd-alloc 
> >>>>>>>>> machine
> >>>>>>>>> option is set.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >>>>>>>>> ---
> >>>>>>>>>  hw/core/machine.c   | 19 +++++++++++++++++++
> >>>>>>>>>  include/hw/boards.h |  1 +
> >>>>>>>>>  qemu-options.hx     |  6 ++++++
> >>>>>>>>>  softmmu/physmem.c   | 47 
> >>>>>>>>> ++++++++++++++++++++++++++++++++++++++---------
> >>>>>>>>>  softmmu/vl.c        |  1 +
> >>>>>>>>>  trace-events        |  1 +
> >>>>>>>>>  util/qemu-config.c  |  4 ++++
> >>>>>>>>>  7 files changed, 70 insertions(+), 9 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
> >>>>>>>>> index 53a99ab..7739d88 100644
> >>>>>>>>> --- a/hw/core/machine.c
> >>>>>>>>> +++ b/hw/core/machine.c
> >>>>>>>>> @@ -392,6 +392,20 @@ static void machine_set_mem_merge(Object *obj, 
> >>>>>>>>> bool value, Error **errp)
> >>>>>>>>>      ms->mem_merge = value;
> >>>>>>>>>  }
> >>>>>>>>>  
> >>>>>>>>> +static bool machine_get_memfd_alloc(Object *obj, Error **errp)
> >>>>>>>>> +{
> >>>>>>>>> +    MachineState *ms = MACHINE(obj);
> >>>>>>>>> +
> >>>>>>>>> +    return ms->memfd_alloc;
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static void machine_set_memfd_alloc(Object *obj, bool value, Error 
> >>>>>>>>> **errp)
> >>>>>>>>> +{
> >>>>>>>>> +    MachineState *ms = MACHINE(obj);
> >>>>>>>>> +
> >>>>>>>>> +    ms->memfd_alloc = value;
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>>  static bool machine_get_usb(Object *obj, Error **errp)
> >>>>>>>>>  {
> >>>>>>>>>      MachineState *ms = MACHINE(obj);
> >>>>>>>>> @@ -829,6 +843,11 @@ static void machine_class_init(ObjectClass 
> >>>>>>>>> *oc, void *data)
> >>>>>>>>>      object_class_property_set_description(oc, "mem-merge",
> >>>>>>>>>          "Enable/disable memory merge support");
> >>>>>>>>>  
> >>>>>>>>> +    object_class_property_add_bool(oc, "memfd-alloc",
> >>>>>>>>> +        machine_get_memfd_alloc, machine_set_memfd_alloc);
> >>>>>>>>> +    object_class_property_set_description(oc, "memfd-alloc",
> >>>>>>>>> +        "Enable/disable allocating anonymous memory using 
> >>>>>>>>> memfd_create");
> >>>>>>>>> +
> >>>>>>>>>      object_class_property_add_bool(oc, "usb",
> >>>>>>>>>          machine_get_usb, machine_set_usb);
> >>>>>>>>>      object_class_property_set_description(oc, "usb",
> >>>>>>>>> diff --git a/include/hw/boards.h b/include/hw/boards.h
> >>>>>>>>> index 9c1c190..a57d7a0 100644
> >>>>>>>>> --- a/include/hw/boards.h
> >>>>>>>>> +++ b/include/hw/boards.h
> >>>>>>>>> @@ -327,6 +327,7 @@ struct MachineState {
> >>>>>>>>>      char *dt_compatible;
> >>>>>>>>>      bool dump_guest_core;
> >>>>>>>>>      bool mem_merge;
> >>>>>>>>> +    bool memfd_alloc;
> >>>>>>>>>      bool usb;
> >>>>>>>>>      bool usb_disabled;
> >>>>>>>>>      char *firmware;
> >>>>>>>>> diff --git a/qemu-options.hx b/qemu-options.hx
> >>>>>>>>> index 7d47510..33c8173 100644
> >>>>>>>>> --- a/qemu-options.hx
> >>>>>>>>> +++ b/qemu-options.hx
> >>>>>>>>> @@ -30,6 +30,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
> >>>>>>>>>      "                vmport=on|off|auto controls emulation of 
> >>>>>>>>> vmport (default: auto)\n"
> >>>>>>>>>      "                dump-guest-core=on|off include guest memory 
> >>>>>>>>> in a core dump (default=on)\n"
> >>>>>>>>>      "                mem-merge=on|off controls memory merge 
> >>>>>>>>> support (default: on)\n"
> >>>>>>>>> +    "                memfd-alloc=on|off controls allocating 
> >>>>>>>>> anonymous guest RAM using memfd_create (default: off)\n"        
> >>>>>>>>
> >>>>>>>> Question: are there any disadvantages associated with using
> >>>>>>>> memfd_create? I guess we are using up an fd, but that seems minor.  
> >>>>>>>> Any
> >>>>>>>> reason not to set to on by default? maybe with a fallback option to
> >>>>>>>> disable that?      
> >>>>>>
> >>>>>> Old Linux host kernels, circa 4.1, do not support huge pages for 
> >>>>>> shared memory.
> >>>>>> Also, the tunable to enable huge pages for share memory is different 
> >>>>>> than for
> >>>>>> anon memory, so there could be performance loss if it is not set 
> >>>>>> correctly.
> >>>>>>     /sys/kernel/mm/transparent_hugepage/enabled
> >>>>>>     vs
> >>>>>>     /sys/kernel/mm/transparent_hugepage/shmem_enabled      
> >>>>>
> >>>>> I guess we can test this when launching the VM, and select
> >>>>> a good default.
> >>>>>    
> >>>>>> It might make sense to use memfd_create by default for the secondary 
> >>>>>> segments.      
> >>>>>
> >>>>> Well there's also KSM now you mention it.    
> >>>>
> >>>> then another quest, is there downside to always using memfd_create
> >>>> without any knobs being involved?    
> >>>
> >>> Lower performance if small pages are used (but Michael suggests qemu 
> >>> could 
> >>> automatically check the tunable and use anon memory instead)
> >>>
> >>> KSM (same page merging) is not supported for shared memory, so 
> >>> ram_block_add ->
> >>> memory_try_enable_merging will not enable it.
> >>>
> >>> In both cases, I expect the degradation would be negligible if 
> >>> memfd_create is
> >>> only automatically applied to the secondary segments, which are typically 
> >>> small.
> >>> But, someone's secondary segment could be larger, and it is time 
> >>> consuming to
> >>> prove innocence when someone claims your change caused their performance 
> >>> regression.  
> >>
> >> Adding David as memory subsystem maintainer, maybe he will a better
> >> idea instead of introducing global knob that would also magically alter 
> >> backends' behavior despite of its their configured settings.  
> > 
> > OK, in ram_block_add I can set the RAM_SHARED flag based on the 
> > memory-backend object's
> > shared flag.  I already set the latter in create_default_memdev when 
> > memfd-alloc is
> > specified.  With that change, we do not override configured settings.  
> > Users can no longer
> > use memory-backend-ram for CPR, and must change all memory-backend-ram to 
> > memory-backend-memfd
> > in the command-line arguments.  That is fine.
> > 
> > With that change, are you OK with this patch?  
> 
> Sorry, I mis-read my own code in ram_block_add.  The existing code is correct 
> and does 
> not alter any backend's behavior.   It only sets the shared flag when the ram 
> is *not* 
> being allocated for a backend:
> 
>                 if (!object_dynamic_cast(parent, TYPE_MEMORY_BACKEND)) {
>                     new_block->flags |= RAM_SHARED;
>                 }
> 

ok, maybe instead of introducing a generic option, introduce the high level
feature one that turns this and other necessary quirks for it to work (i.e.
something like live-update=on|off).
That will not make QEMU internals any better but at least it will hide obscure
memfd-alloc from users.
Is there a patch that makes QEMU error out if backend without
shared=on is used?

Also, can you answer question below, pls
or point to a patch in series that takes care of that invariant?

[...]

> >>>>>> There is currently no way to specify memory backends for the secondary 
> >>>>>> memory
> >>>>>> segments (vram, roms, etc), and IMO it would be onerous to specify a 
> >>>>>> backend for
> >>>>>> each of them.  On x86_64, these include pc.bios, vga.vram, pc.rom, 
> >>>>>> vga.rom,
> >>>>>> /rom@etc/acpi/tables, /rom@etc/table-loader, /rom@etc/acpi/rsdp.  
> >>
> >> MemoryRegion is not the only place where state is stored.
> >> If we only talk about fwcfg entries state, it can also reference
> >> plain malloced memory allocated elsewhere or make a deep copy internally.
> >> Similarly devices also may store state outside of RamBlock framework.
> >>
> >> How are you dealing with that?
[...]




reply via email to

[Prev in Thread] Current Thread [Next in Thread]