qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH V7 10/29] machine: memfd-alloc option


From: Steven Sistare
Subject: Re: [PATCH V7 10/29] machine: memfd-alloc option
Date: Tue, 29 Mar 2022 13:43:59 -0400
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0

On 3/11/2022 4:42 AM, Igor Mammedov wrote:
> On Thu, 10 Mar 2022 13:18:35 -0500
> Steven Sistare <steven.sistare@oracle.com> wrote:
> 
>> On 3/10/2022 12:28 PM, Steven Sistare wrote:
>>> On 3/10/2022 11:00 AM, Igor Mammedov wrote:  
>>>> On Thu, 10 Mar 2022 10:36:08 -0500
>>>> Steven Sistare <steven.sistare@oracle.com> wrote:
>>>>  
>>>>> On 3/8/2022 2:20 AM, Igor Mammedov wrote:  
>>>>>> On Tue, 8 Mar 2022 01:50:11 -0500
>>>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>>>>     
>>>>>>> On Mon, Mar 07, 2022 at 09:41:44AM -0500, Steven Sistare wrote:    
>>>>>>>> On 3/4/2022 5:41 AM, Igor Mammedov wrote:      
>>>>>>>>> On Thu, 3 Mar 2022 12:21:15 -0500
>>>>>>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>>>>>>>       
>>>>>>>>>> On Wed, Dec 22, 2021 at 11:05:15AM -0800, Steve Sistare wrote:      
>>>>>>>>>>> Allocate anonymous memory using memfd_create if the memfd-alloc 
>>>>>>>>>>> machine
>>>>>>>>>>> option is set.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>>>>>>>> ---
>>>>>>>>>>>  hw/core/machine.c   | 19 +++++++++++++++++++
>>>>>>>>>>>  include/hw/boards.h |  1 +
>>>>>>>>>>>  qemu-options.hx     |  6 ++++++
>>>>>>>>>>>  softmmu/physmem.c   | 47 
>>>>>>>>>>> ++++++++++++++++++++++++++++++++++++++---------
>>>>>>>>>>>  softmmu/vl.c        |  1 +
>>>>>>>>>>>  trace-events        |  1 +
>>>>>>>>>>>  util/qemu-config.c  |  4 ++++
>>>>>>>>>>>  7 files changed, 70 insertions(+), 9 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>>>>>>>>>>> index 53a99ab..7739d88 100644
>>>>>>>>>>> --- a/hw/core/machine.c
>>>>>>>>>>> +++ b/hw/core/machine.c
>>>>>>>>>>> @@ -392,6 +392,20 @@ static void machine_set_mem_merge(Object *obj, 
>>>>>>>>>>> bool value, Error **errp)
>>>>>>>>>>>      ms->mem_merge = value;
>>>>>>>>>>>  }
>>>>>>>>>>>  
>>>>>>>>>>> +static bool machine_get_memfd_alloc(Object *obj, Error **errp)
>>>>>>>>>>> +{
>>>>>>>>>>> +    MachineState *ms = MACHINE(obj);
>>>>>>>>>>> +
>>>>>>>>>>> +    return ms->memfd_alloc;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static void machine_set_memfd_alloc(Object *obj, bool value, Error 
>>>>>>>>>>> **errp)
>>>>>>>>>>> +{
>>>>>>>>>>> +    MachineState *ms = MACHINE(obj);
>>>>>>>>>>> +
>>>>>>>>>>> +    ms->memfd_alloc = value;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>>  static bool machine_get_usb(Object *obj, Error **errp)
>>>>>>>>>>>  {
>>>>>>>>>>>      MachineState *ms = MACHINE(obj);
>>>>>>>>>>> @@ -829,6 +843,11 @@ static void machine_class_init(ObjectClass 
>>>>>>>>>>> *oc, void *data)
>>>>>>>>>>>      object_class_property_set_description(oc, "mem-merge",
>>>>>>>>>>>          "Enable/disable memory merge support");
>>>>>>>>>>>  
>>>>>>>>>>> +    object_class_property_add_bool(oc, "memfd-alloc",
>>>>>>>>>>> +        machine_get_memfd_alloc, machine_set_memfd_alloc);
>>>>>>>>>>> +    object_class_property_set_description(oc, "memfd-alloc",
>>>>>>>>>>> +        "Enable/disable allocating anonymous memory using 
>>>>>>>>>>> memfd_create");
>>>>>>>>>>> +
>>>>>>>>>>>      object_class_property_add_bool(oc, "usb",
>>>>>>>>>>>          machine_get_usb, machine_set_usb);
>>>>>>>>>>>      object_class_property_set_description(oc, "usb",
>>>>>>>>>>> diff --git a/include/hw/boards.h b/include/hw/boards.h
>>>>>>>>>>> index 9c1c190..a57d7a0 100644
>>>>>>>>>>> --- a/include/hw/boards.h
>>>>>>>>>>> +++ b/include/hw/boards.h
>>>>>>>>>>> @@ -327,6 +327,7 @@ struct MachineState {
>>>>>>>>>>>      char *dt_compatible;
>>>>>>>>>>>      bool dump_guest_core;
>>>>>>>>>>>      bool mem_merge;
>>>>>>>>>>> +    bool memfd_alloc;
>>>>>>>>>>>      bool usb;
>>>>>>>>>>>      bool usb_disabled;
>>>>>>>>>>>      char *firmware;
>>>>>>>>>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>>>>>>>>>> index 7d47510..33c8173 100644
>>>>>>>>>>> --- a/qemu-options.hx
>>>>>>>>>>> +++ b/qemu-options.hx
>>>>>>>>>>> @@ -30,6 +30,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>>>>>>>>>>>      "                vmport=on|off|auto controls emulation of 
>>>>>>>>>>> vmport (default: auto)\n"
>>>>>>>>>>>      "                dump-guest-core=on|off include guest memory 
>>>>>>>>>>> in a core dump (default=on)\n"
>>>>>>>>>>>      "                mem-merge=on|off controls memory merge 
>>>>>>>>>>> support (default: on)\n"
>>>>>>>>>>> +    "                memfd-alloc=on|off controls allocating 
>>>>>>>>>>> anonymous guest RAM using memfd_create (default: off)\n"        
>>>>>>>>>>
>>>>>>>>>> Question: are there any disadvantages associated with using
>>>>>>>>>> memfd_create? I guess we are using up an fd, but that seems minor.  
>>>>>>>>>> Any
>>>>>>>>>> reason not to set to on by default? maybe with a fallback option to
>>>>>>>>>> disable that?      
>>>>>>>>
>>>>>>>> Old Linux host kernels, circa 4.1, do not support huge pages for 
>>>>>>>> shared memory.
>>>>>>>> Also, the tunable to enable huge pages for share memory is different 
>>>>>>>> than for
>>>>>>>> anon memory, so there could be performance loss if it is not set 
>>>>>>>> correctly.
>>>>>>>>     /sys/kernel/mm/transparent_hugepage/enabled
>>>>>>>>     vs
>>>>>>>>     /sys/kernel/mm/transparent_hugepage/shmem_enabled      
>>>>>>>
>>>>>>> I guess we can test this when launching the VM, and select
>>>>>>> a good default.
>>>>>>>    
>>>>>>>> It might make sense to use memfd_create by default for the secondary 
>>>>>>>> segments.      
>>>>>>>
>>>>>>> Well there's also KSM now you mention it.    
>>>>>>
>>>>>> then another quest, is there downside to always using memfd_create
>>>>>> without any knobs being involved?    
>>>>>
>>>>> Lower performance if small pages are used (but Michael suggests qemu 
>>>>> could 
>>>>> automatically check the tunable and use anon memory instead)
>>>>>
>>>>> KSM (same page merging) is not supported for shared memory, so 
>>>>> ram_block_add ->
>>>>> memory_try_enable_merging will not enable it.
>>>>>
>>>>> In both cases, I expect the degradation would be negligible if 
>>>>> memfd_create is
>>>>> only automatically applied to the secondary segments, which are typically 
>>>>> small.
>>>>> But, someone's secondary segment could be larger, and it is time 
>>>>> consuming to
>>>>> prove innocence when someone claims your change caused their performance 
>>>>> regression.  
>>>>
>>>> Adding David as memory subsystem maintainer, maybe he will a better
>>>> idea instead of introducing global knob that would also magically alter 
>>>> backends' behavior despite of its their configured settings.  
>>>
>>> OK, in ram_block_add I can set the RAM_SHARED flag based on the 
>>> memory-backend object's
>>> shared flag.  I already set the latter in create_default_memdev when 
>>> memfd-alloc is
>>> specified.  With that change, we do not override configured settings.  
>>> Users can no longer
>>> use memory-backend-ram for CPR, and must change all memory-backend-ram to 
>>> memory-backend-memfd
>>> in the command-line arguments.  That is fine.
>>>
>>> With that change, are you OK with this patch?  
>>
>> Sorry, I mis-read my own code in ram_block_add.  The existing code is 
>> correct and does 
>> not alter any backend's behavior.   It only sets the shared flag when the 
>> ram is *not* 
>> being allocated for a backend:
>>
>>                 if (!object_dynamic_cast(parent, TYPE_MEMORY_BACKEND)) {
>>                     new_block->flags |= RAM_SHARED;
>>                 }
>>
> 
> ok, maybe instead of introducing a generic option, introduce the high level
> feature one that turns this and other necessary quirks for it to work (i.e.
> something like live-update=on|off).
> That will not make QEMU internals any better but at least it will hide obscure
> memfd-alloc from users.

That occurred to me, but during this review, a few folks have said it would be 
useful
to expose memfd-alloc directly.  And, I don't think that hiding memfd-alloc 
under
another flag is helpful, as some users will still want to understand how 
enabling cpr 
affects the VM environment.  I would still be documenting the memfd behavior 
for the
hypothetical new flag.

I currently document memfd-alloc in the following places.  If you think this is 
confusing or incomplete, please let me know:

qemu-options.hx

@item memfd-alloc=on|off
Enables or disables allocation of anonymous guest RAM using
memfd_create.  Any associated memory-backend objects are created with
share=on.  The memfd-alloc default is off.

hmp-commands.hx

@item cpr-save @var{filename} @var{mode}
...
If @var{mode} is 'restart', the checkpoint remains valid after restarting qemu
using a subsequent cpr-exec.  All guest RAM objects must be shared.  The
share=on property is required for memory created with an explicit -object
option, and the memfd-alloc machine property is required for memory that is
implicitly created. 

And this error message in a few places for the only-cpr-capable command line 
option:

"only-cpr-capable requires -machine memfd-alloc=on"

> Is there a patch that makes QEMU error out if backend without
> shared=on is used?

No.  I will add that check, thanks.

> Also, can you answer question below, pls
> or point to a patch in series that takes care of that invariant?
> 
> [...]
> 
>>>>>>>> There is currently no way to specify memory backends for the secondary 
>>>>>>>> memory
>>>>>>>> segments (vram, roms, etc), and IMO it would be onerous to specify a 
>>>>>>>> backend for
>>>>>>>> each of them.  On x86_64, these include pc.bios, vga.vram, pc.rom, 
>>>>>>>> vga.rom,
>>>>>>>> /rom@etc/acpi/tables, /rom@etc/table-loader, /rom@etc/acpi/rsdp.  
>>>>
>>>> MemoryRegion is not the only place where state is stored.
>>>> If we only talk about fwcfg entries state, it can also reference
>>>> plain malloced memory allocated elsewhere or make a deep copy internally.
>>>> Similarly devices also may store state outside of RamBlock framework.
>>>>
>>>> How are you dealing with that?

Sorry, I missed this before.
fwcfg defines vmstate handlers that save and restore all state to a file across 
the cpr
operation, similar to live migration.  In general, if it works for live 
migration, then
it works for cpr.  If you find a counter-example, please let me know.

- Steve



reply via email to

[Prev in Thread] Current Thread [Next in Thread]