[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH V7 10/29] machine: memfd-alloc option
From: |
Steven Sistare |
Subject: |
Re: [PATCH V7 10/29] machine: memfd-alloc option |
Date: |
Tue, 29 Mar 2022 13:43:59 -0400 |
User-agent: |
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 |
On 3/11/2022 4:42 AM, Igor Mammedov wrote:
> On Thu, 10 Mar 2022 13:18:35 -0500
> Steven Sistare <steven.sistare@oracle.com> wrote:
>
>> On 3/10/2022 12:28 PM, Steven Sistare wrote:
>>> On 3/10/2022 11:00 AM, Igor Mammedov wrote:
>>>> On Thu, 10 Mar 2022 10:36:08 -0500
>>>> Steven Sistare <steven.sistare@oracle.com> wrote:
>>>>
>>>>> On 3/8/2022 2:20 AM, Igor Mammedov wrote:
>>>>>> On Tue, 8 Mar 2022 01:50:11 -0500
>>>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>>>>
>>>>>>> On Mon, Mar 07, 2022 at 09:41:44AM -0500, Steven Sistare wrote:
>>>>>>>> On 3/4/2022 5:41 AM, Igor Mammedov wrote:
>>>>>>>>> On Thu, 3 Mar 2022 12:21:15 -0500
>>>>>>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Wed, Dec 22, 2021 at 11:05:15AM -0800, Steve Sistare wrote:
>>>>>>>>>>> Allocate anonymous memory using memfd_create if the memfd-alloc
>>>>>>>>>>> machine
>>>>>>>>>>> option is set.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>>>>>>>> ---
>>>>>>>>>>> hw/core/machine.c | 19 +++++++++++++++++++
>>>>>>>>>>> include/hw/boards.h | 1 +
>>>>>>>>>>> qemu-options.hx | 6 ++++++
>>>>>>>>>>> softmmu/physmem.c | 47
>>>>>>>>>>> ++++++++++++++++++++++++++++++++++++++---------
>>>>>>>>>>> softmmu/vl.c | 1 +
>>>>>>>>>>> trace-events | 1 +
>>>>>>>>>>> util/qemu-config.c | 4 ++++
>>>>>>>>>>> 7 files changed, 70 insertions(+), 9 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>>>>>>>>>>> index 53a99ab..7739d88 100644
>>>>>>>>>>> --- a/hw/core/machine.c
>>>>>>>>>>> +++ b/hw/core/machine.c
>>>>>>>>>>> @@ -392,6 +392,20 @@ static void machine_set_mem_merge(Object *obj,
>>>>>>>>>>> bool value, Error **errp)
>>>>>>>>>>> ms->mem_merge = value;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> +static bool machine_get_memfd_alloc(Object *obj, Error **errp)
>>>>>>>>>>> +{
>>>>>>>>>>> + MachineState *ms = MACHINE(obj);
>>>>>>>>>>> +
>>>>>>>>>>> + return ms->memfd_alloc;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static void machine_set_memfd_alloc(Object *obj, bool value, Error
>>>>>>>>>>> **errp)
>>>>>>>>>>> +{
>>>>>>>>>>> + MachineState *ms = MACHINE(obj);
>>>>>>>>>>> +
>>>>>>>>>>> + ms->memfd_alloc = value;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> static bool machine_get_usb(Object *obj, Error **errp)
>>>>>>>>>>> {
>>>>>>>>>>> MachineState *ms = MACHINE(obj);
>>>>>>>>>>> @@ -829,6 +843,11 @@ static void machine_class_init(ObjectClass
>>>>>>>>>>> *oc, void *data)
>>>>>>>>>>> object_class_property_set_description(oc, "mem-merge",
>>>>>>>>>>> "Enable/disable memory merge support");
>>>>>>>>>>>
>>>>>>>>>>> + object_class_property_add_bool(oc, "memfd-alloc",
>>>>>>>>>>> + machine_get_memfd_alloc, machine_set_memfd_alloc);
>>>>>>>>>>> + object_class_property_set_description(oc, "memfd-alloc",
>>>>>>>>>>> + "Enable/disable allocating anonymous memory using
>>>>>>>>>>> memfd_create");
>>>>>>>>>>> +
>>>>>>>>>>> object_class_property_add_bool(oc, "usb",
>>>>>>>>>>> machine_get_usb, machine_set_usb);
>>>>>>>>>>> object_class_property_set_description(oc, "usb",
>>>>>>>>>>> diff --git a/include/hw/boards.h b/include/hw/boards.h
>>>>>>>>>>> index 9c1c190..a57d7a0 100644
>>>>>>>>>>> --- a/include/hw/boards.h
>>>>>>>>>>> +++ b/include/hw/boards.h
>>>>>>>>>>> @@ -327,6 +327,7 @@ struct MachineState {
>>>>>>>>>>> char *dt_compatible;
>>>>>>>>>>> bool dump_guest_core;
>>>>>>>>>>> bool mem_merge;
>>>>>>>>>>> + bool memfd_alloc;
>>>>>>>>>>> bool usb;
>>>>>>>>>>> bool usb_disabled;
>>>>>>>>>>> char *firmware;
>>>>>>>>>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>>>>>>>>>> index 7d47510..33c8173 100644
>>>>>>>>>>> --- a/qemu-options.hx
>>>>>>>>>>> +++ b/qemu-options.hx
>>>>>>>>>>> @@ -30,6 +30,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>>>>>>>>>>> " vmport=on|off|auto controls emulation of
>>>>>>>>>>> vmport (default: auto)\n"
>>>>>>>>>>> " dump-guest-core=on|off include guest memory
>>>>>>>>>>> in a core dump (default=on)\n"
>>>>>>>>>>> " mem-merge=on|off controls memory merge
>>>>>>>>>>> support (default: on)\n"
>>>>>>>>>>> + " memfd-alloc=on|off controls allocating
>>>>>>>>>>> anonymous guest RAM using memfd_create (default: off)\n"
>>>>>>>>>>
>>>>>>>>>> Question: are there any disadvantages associated with using
>>>>>>>>>> memfd_create? I guess we are using up an fd, but that seems minor.
>>>>>>>>>> Any
>>>>>>>>>> reason not to set to on by default? maybe with a fallback option to
>>>>>>>>>> disable that?
>>>>>>>>
>>>>>>>> Old Linux host kernels, circa 4.1, do not support huge pages for
>>>>>>>> shared memory.
>>>>>>>> Also, the tunable to enable huge pages for share memory is different
>>>>>>>> than for
>>>>>>>> anon memory, so there could be performance loss if it is not set
>>>>>>>> correctly.
>>>>>>>> /sys/kernel/mm/transparent_hugepage/enabled
>>>>>>>> vs
>>>>>>>> /sys/kernel/mm/transparent_hugepage/shmem_enabled
>>>>>>>
>>>>>>> I guess we can test this when launching the VM, and select
>>>>>>> a good default.
>>>>>>>
>>>>>>>> It might make sense to use memfd_create by default for the secondary
>>>>>>>> segments.
>>>>>>>
>>>>>>> Well there's also KSM now you mention it.
>>>>>>
>>>>>> then another quest, is there downside to always using memfd_create
>>>>>> without any knobs being involved?
>>>>>
>>>>> Lower performance if small pages are used (but Michael suggests qemu
>>>>> could
>>>>> automatically check the tunable and use anon memory instead)
>>>>>
>>>>> KSM (same page merging) is not supported for shared memory, so
>>>>> ram_block_add ->
>>>>> memory_try_enable_merging will not enable it.
>>>>>
>>>>> In both cases, I expect the degradation would be negligible if
>>>>> memfd_create is
>>>>> only automatically applied to the secondary segments, which are typically
>>>>> small.
>>>>> But, someone's secondary segment could be larger, and it is time
>>>>> consuming to
>>>>> prove innocence when someone claims your change caused their performance
>>>>> regression.
>>>>
>>>> Adding David as memory subsystem maintainer, maybe he will a better
>>>> idea instead of introducing global knob that would also magically alter
>>>> backends' behavior despite of its their configured settings.
>>>
>>> OK, in ram_block_add I can set the RAM_SHARED flag based on the
>>> memory-backend object's
>>> shared flag. I already set the latter in create_default_memdev when
>>> memfd-alloc is
>>> specified. With that change, we do not override configured settings.
>>> Users can no longer
>>> use memory-backend-ram for CPR, and must change all memory-backend-ram to
>>> memory-backend-memfd
>>> in the command-line arguments. That is fine.
>>>
>>> With that change, are you OK with this patch?
>>
>> Sorry, I mis-read my own code in ram_block_add. The existing code is
>> correct and does
>> not alter any backend's behavior. It only sets the shared flag when the
>> ram is *not*
>> being allocated for a backend:
>>
>> if (!object_dynamic_cast(parent, TYPE_MEMORY_BACKEND)) {
>> new_block->flags |= RAM_SHARED;
>> }
>>
>
> ok, maybe instead of introducing a generic option, introduce the high level
> feature one that turns this and other necessary quirks for it to work (i.e.
> something like live-update=on|off).
> That will not make QEMU internals any better but at least it will hide obscure
> memfd-alloc from users.
That occurred to me, but during this review, a few folks have said it would be
useful
to expose memfd-alloc directly. And, I don't think that hiding memfd-alloc
under
another flag is helpful, as some users will still want to understand how
enabling cpr
affects the VM environment. I would still be documenting the memfd behavior
for the
hypothetical new flag.
I currently document memfd-alloc in the following places. If you think this is
confusing or incomplete, please let me know:
qemu-options.hx
@item memfd-alloc=on|off
Enables or disables allocation of anonymous guest RAM using
memfd_create. Any associated memory-backend objects are created with
share=on. The memfd-alloc default is off.
hmp-commands.hx
@item cpr-save @var{filename} @var{mode}
...
If @var{mode} is 'restart', the checkpoint remains valid after restarting qemu
using a subsequent cpr-exec. All guest RAM objects must be shared. The
share=on property is required for memory created with an explicit -object
option, and the memfd-alloc machine property is required for memory that is
implicitly created.
And this error message in a few places for the only-cpr-capable command line
option:
"only-cpr-capable requires -machine memfd-alloc=on"
> Is there a patch that makes QEMU error out if backend without
> shared=on is used?
No. I will add that check, thanks.
> Also, can you answer question below, pls
> or point to a patch in series that takes care of that invariant?
>
> [...]
>
>>>>>>>> There is currently no way to specify memory backends for the secondary
>>>>>>>> memory
>>>>>>>> segments (vram, roms, etc), and IMO it would be onerous to specify a
>>>>>>>> backend for
>>>>>>>> each of them. On x86_64, these include pc.bios, vga.vram, pc.rom,
>>>>>>>> vga.rom,
>>>>>>>> /rom@etc/acpi/tables, /rom@etc/table-loader, /rom@etc/acpi/rsdp.
>>>>
>>>> MemoryRegion is not the only place where state is stored.
>>>> If we only talk about fwcfg entries state, it can also reference
>>>> plain malloced memory allocated elsewhere or make a deep copy internally.
>>>> Similarly devices also may store state outside of RamBlock framework.
>>>>
>>>> How are you dealing with that?
Sorry, I missed this before.
fwcfg defines vmstate handlers that save and restore all state to a file across
the cpr
operation, similar to live migration. In general, if it works for live
migration, then
it works for cpr. If you find a counter-example, please let me know.
- Steve
- Re: [PATCH V7 10/29] machine: memfd-alloc option, (continued)
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Michael S. Tsirkin, 2022/03/03
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Igor Mammedov, 2022/03/04
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Steven Sistare, 2022/03/07
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Michael S. Tsirkin, 2022/03/08
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Igor Mammedov, 2022/03/08
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Steven Sistare, 2022/03/10
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Igor Mammedov, 2022/03/10
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Steven Sistare, 2022/03/10
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Steven Sistare, 2022/03/10
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Igor Mammedov, 2022/03/11
- Re: [PATCH V7 10/29] machine: memfd-alloc option,
Steven Sistare <=
- Re: [PATCH V7 10/29] machine: memfd-alloc option, Daniel P . Berrangé, 2022/03/11
Re: [PATCH V7 10/29] machine: memfd-alloc option, David Hildenbrand, 2022/03/11
Re: [PATCH V7 10/29] machine: memfd-alloc option, David Hildenbrand, 2022/03/11