Re: [PATCH RFC] memory: pause all vCPUs for the duration of memory transactions


From: Vitaly Kuznetsov
Subject: Re: [PATCH RFC] memory: pause all vCPUs for the duration of memory transactions
Date: Tue, 27 Oct 2020 14:02:10 +0100

David Hildenbrand <david@redhat.com> writes:

> On 27.10.20 13:36, Vitaly Kuznetsov wrote:
>> David Hildenbrand <david@redhat.com> writes:
>> 
>>> On 26.10.20 11:43, David Hildenbrand wrote:
>>>> On 26.10.20 09:49, Vitaly Kuznetsov wrote:
>>>>> Currently, KVM doesn't provide an API to make atomic updates to memmap
>>>>> when the change touches more than one memory slot, e.g. in case we'd
>>>>> like to punch a hole in an existing slot.
>>>>>
>>>>> Reports are that multi-CPU Q35 VMs booted with OVMF sometimes print
>>>>> something like
>>>>>
>>>>> !!!! X64 Exception Type - 0E(#PF - Page-Fault)  CPU Apic ID - 00000003 !!!!
>>>>> ExceptionData - 0000000000000010  I:1 R:0 U:0 W:0 P:0 PK:0 SS:0 SGX:0
>>>>> RIP  - 000000007E35FAB6, CS  - 0000000000000038, RFLAGS - 0000000000010006
>>>>> RAX  - 0000000000000000, RCX - 000000007E3598F2, RDX - 00000000078BFBFF
>>>>> ...
>>>>>
>>>>> The problem seems to be that TSEG manipulations on one vCPU are not
>>>>> atomic from other vCPUs' point of view. In particular, here's the strace:
>>>>>
>>>>> Initial creation of the 'problematic' slot:
>>>>>
>>>>> 10085 ioctl(13, KVM_SET_USER_MEMORY_REGION, {slot=6, flags=0, guest_phys_addr=0x100000,
>>>>>     memory_size=2146435072, userspace_addr=0x7fb89bf00000}) = 0
>>>>>
>>>>> ... and then the update (caused by e.g. mch_update_smram()) later:
>>>>>
>>>>> 10090 ioctl(13, KVM_SET_USER_MEMORY_REGION, {slot=6, flags=0, guest_phys_addr=0x100000,
>>>>>     memory_size=0, userspace_addr=0x7fb89bf00000}) = 0
>>>>> 10090 ioctl(13, KVM_SET_USER_MEMORY_REGION, {slot=6, flags=0, guest_phys_addr=0x100000,
>>>>>     memory_size=2129657856, userspace_addr=0x7fb89bf00000}) = 0
>>>>>
>>>>> In case KVM has to handle any event on a different vCPU in between
>>>>> these two calls, the #PF will get triggered.
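>>>>>
>>>>> In userspace terms, the hole punch is a delete-and-recreate sequence; a
>>>>> minimal sketch against the KVM_SET_USER_MEMORY_REGION ABI (vm_fd and hva
>>>>> are placeholders for the VM file descriptor and the backing mapping):
>>>>>
>>>>> #include <stdint.h>
>>>>> #include <sys/ioctl.h>
>>>>> #include <linux/kvm.h>
>>>>>
>>>>> struct kvm_userspace_memory_region r = {
>>>>>     .slot            = 6,
>>>>>     .flags           = 0,
>>>>>     .guest_phys_addr = 0x100000,
>>>>>     .memory_size     = 0,               /* step 1: delete the slot */
>>>>>     .userspace_addr  = (uint64_t)hva,
>>>>> };
>>>>> ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r);
>>>>> /* any vCPU that exits to KVM here sees a hole where RAM used to be */
>>>>> r.memory_size = 2129657856;             /* step 2: recreate it, smaller */
>>>>> ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r);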
>>>>>
>>>>> An ideal solution to the problem would probably require KVM to provide a
>>>>> new API to do the whole transaction in one shot, but as a band-aid we can
>>>>> just pause all vCPUs to make memory transactions atomic.
>>>>>
>>>>> Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>>>> ---
>>>>> RFC: Generally, memmap updates happen only a few times during guest boot,
>>>>> but I'm not sure there are no scenarios where pausing all vCPUs is
>>>>> undesirable from a performance point of view. Also, I'm not sure if the
>>>>> kvm_enabled() check is needed.
>>>>> ---
>>>>>   softmmu/memory.c | 11 +++++++++--
>>>>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/softmmu/memory.c b/softmmu/memory.c
>>>>> index fa280a19f7f7..0bf6f3f6d5dc 100644
>>>>> --- a/softmmu/memory.c
>>>>> +++ b/softmmu/memory.c
>>>>> @@ -28,6 +28,7 @@
>>>>>   
>>>>>   #include "exec/memory-internal.h"
>>>>>   #include "exec/ram_addr.h"
>>>>> +#include "sysemu/cpus.h"
>>>>>   #include "sysemu/kvm.h"
>>>>>   #include "sysemu/runstate.h"
>>>>>   #include "sysemu/tcg.h"
>>>>> @@ -1057,7 +1058,9 @@ static void address_space_update_topology(AddressSpace *as)
>>>>>   void memory_region_transaction_begin(void)
>>>>>   {
>>>>>       qemu_flush_coalesced_mmio_buffer();
>>>>> -    ++memory_region_transaction_depth;
>>>>> +    if ((++memory_region_transaction_depth == 1) && kvm_enabled()) {
>>>>> +        pause_all_vcpus();
>>>>> +    }
>>>>>   }
>>>>>   
>>>>>   void memory_region_transaction_commit(void)
>>>>> @@ -1087,7 +1090,11 @@ void memory_region_transaction_commit(void)
>>>>>               }
>>>>>               ioeventfd_update_pending = false;
>>>>>           }
>>>>> -   }
>>>>> +
>>>>> +        if (kvm_enabled()) {
>>>>> +            resume_all_vcpus();
>>>>> +        }
>>>>> +    }
>>>>>   }
>>>>>   
>>>>>   static void memory_region_destructor_none(MemoryRegion *mr)
>>>>>
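>>>>> For context, begin()/commit() nest, so only the outermost pair would
>>>>> pause and resume; roughly this usage pattern (a sketch with hypothetical
>>>>> regions sysmem, mr1 and mr2):
>>>>>
>>>>> memory_region_transaction_begin();           /* depth 0 -> 1: vCPUs pause */
>>>>> memory_region_del_subregion(sysmem, mr1);
>>>>> memory_region_transaction_begin();           /* depth 1 -> 2: no-op       */
>>>>> memory_region_add_subregion(sysmem, 0x100000, mr2);
>>>>> memory_region_transaction_commit();          /* depth 2 -> 1: no-op       */
>>>>> memory_region_transaction_commit();          /* depth 1 -> 0: memmap      */
>>>>>                                              /* rebuilt, vCPUs resume     */
>>>>>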
>>>>
>>>> This is in general unsafe: pause_all_vcpus() will temporarily drop the
>>>> BQL, resulting in bad things happening at call sites.
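>>>>
>>>> Roughly, any caller that holds the BQL across begin() can now race (a
>>>> sketch with a hypothetical region mr):
>>>>
>>>> /* caller holds the BQL */
>>>> memory_region_transaction_begin();   /* pause_all_vcpus() waits on a       */
>>>>                                      /* condvar, dropping the BQL; another */
>>>>                                      /* thread can grab it and observe or  */
>>>>                                      /* modify the half-updated topology   */
>>>> memory_region_set_size(mr, new_size);
>>>> memory_region_transaction_commit();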
>> 
>> Oh, I see, thanks! I was expecting there was a reason we didn't have this
>> simple fix in already :-)
>> 
>>>>
>>>> I studied the involved issues quite intensively when wanting to resize
>>>> memory regions from the virtio-mem code. It's not that easy.
>>>>
>>>> Have a look at my RFC for resizing. You can apply something similar to
>>>> other operations.
>>>>
>>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg684979.html
>>>
>>> Oh, and I even mentioned the case you try to fix here back then
>>>
>>> "
>>> Instead of inhibiting during the region_resize(), we could inhibit for the
>>> whole memory transaction (from begin() to commit()). This could be nice,
>>> because splitting of memory regions would then also be atomic (I remember
>>> there was a BUG report regarding that); however, I am not sure if that
>>> might impact any RT users.
>>> "
>>>
>>> The current patches live in
>>> https://github.com/davidhildenbrand/qemu/commits/virtio-mem-next
>>>
>>> Especially
>>>
>>> https://github.com/davidhildenbrand/qemu/commit/433fbb3abed20f15030e42f2b2bea7e6b9a15180
>>>
>>>
>> 
>> I'm not sure why we're focusing on ioctls here. I was debugging my case
>> quite some time ago, but from what I remember it had nothing to do with
>> ioctls from QEMU. When we are removing a memslot, any exit to KVM may
>> trigger an error condition, as we'll see that the vCPU or some of our
>> internal structures (e.g. the VMCS for a nested guest) reference
>> non-existent memory. I don't see a good solution other than making the
>> update fully atomic from *all* vCPUs' point of view, and this requires
>> stopping all vCPUs -- either from QEMU or from KVM.
>
> I cannot follow. My patch waits until *all* KVM ioctls are out of the
> kernel. That includes vCPUs, but also other ioctls (because there are
> some that require a consistent memory block state).
>
> So from a KVM point of view, the vCPUs are stopped.

Sorry for not being clear: your patch looks good to me. What I tried to
say is that with the current KVM API, the only way to guarantee atomicity
of the update is to make the vCPUs stop (one way or another). Kicking them
out of the kernel and preventing new ioctls from being dispatched is one
way; temporarily pausing them inside KVM would be another, for example --
but that would require a *new* API that is supplied the whole transaction
and not just one memslot update.

-- 
Vitaly



