Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" ph


From: Joao Martins
Subject: Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase
Date: Sat, 9 Sep 2023 15:57:44 +0100

On 06/09/2023 22:29, William Roche wrote:
> On 9/6/23 17:16, Peter Xu wrote:
>>
>> Just a note..
>>
>> Probably fine for now to reuse block page size, but IIUC the right thing to
>> do is to fetch it from the signal info (in QEMU's sigbus_handler()) of
>> kernel_siginfo.si_addr_lsb.
>>
>> At least for x86 I think that stores the "shift" of covered poisoned page
>> (one needs to track the Linux handling of VM_FAULT_HWPOISON_LARGE for a
>> huge page, though.. not aware of any man page for that).  It'll then work
>> naturally when Linux huge pages will start to support sub-huge-page-size
>> poisoning someday.  We can definitely leave that for later.
>>
> 
> I totally agree with that !
>

Given that this bug affects all QEMU versions so far, perhaps that should be a
follow-up series, to make the change easier to bring into the stable tree.
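
For that follow-up, a minimal sketch of what Peter describes (deriving the
poisoned range from the signal info rather than from the block page size) could
look like the code below. It assumes Linux, where a SIGBUS with si_code
BUS_MCEERR_AR or BUS_MCEERR_AO reports the affected granularity as a shift in
si_addr_lsb. struct poison_range and poison_range_from_siginfo() are
hypothetical names for illustration, not existing QEMU code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the host virtual range covered by the error. */
struct poison_range {
    uintptr_t start;   /* address aligned down to the poisoned granule */
    size_t len;        /* size of the poisoned granule in bytes */
};

/*
 * Hypothetical helper: compute the poisoned range from a SIGBUS siginfo.
 * Returns 0 on success, -1 if the signal is not a memory-error SIGBUS.
 */
int poison_range_from_siginfo(const siginfo_t *si, struct poison_range *out)
{
    assert(si != NULL && out != NULL);

    if (si->si_signo != SIGBUS ||
        (si->si_code != BUS_MCEERR_AR && si->si_code != BUS_MCEERR_AO)) {
        return -1;
    }

    /* si_addr_lsb is the log2 of the affected size (e.g. 12 for 4 KiB). */
    unsigned shift = (unsigned)si->si_addr_lsb;
    if (shift >= sizeof(uintptr_t) * 8) {
        return -1;   /* reject a shift that would overflow */
    }

    size_t len = (size_t)1 << shift;
    out->start = (uintptr_t)si->si_addr & ~((uintptr_t)len - 1);
    out->len = len;
    return 0;
}
```

With a shift of 21 this would cover a whole 2 MiB huge page, and it would
shrink on its own if the kernel later reports a smaller granule.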

> 
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -1145,7 +1145,8 @@ static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file,
>>>>       uint8_t *p = block->host + offset;
>>>>       int len = 0;
>>>> -    if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>>>> +    if ((kvm_enabled() && kvm_hwpoisoned_page(block, (void *)offset)) ||
>>
>> Can we move this out of zero page handling?  Zero detection is not
>> guaranteed to always be the 1st thing to do when processing a guest page.
>> Currently it'll already skip either rdma or when compression enabled, so
>> it'll keep crashing there.
>>
>> Perhaps at the entry of ram_save_target_page_legacy()?
> 
> Right, as expected, using migration compression with poisoned pages crashes
> even with this fix...
> 
> The difficulty I see to place the poisoned page verification on the
> entry of ram_save_target_page_legacy() is what to do to skip the found poison
> page(s) if any ?
> 
> Should I continue to treat them as zero pages written with
> save_zero_page_to_file ? 

The MCE has already been forwarded to the guest, so the guest is supposed to not
be using the page (nor relying on its contents). Hence the destination ought to
just see a zero page. So what you said seems like the best course of action.

> Or should I consider the case of an ongoing compression
> use and create a new code compressing an empty page with save_compress_page() ?
> 
The compress code looks to be a tentative compression (not guaranteed IIUC), so
I am not sure it needs any more logic than just adding the check at the top of
ram_save_target_page_legacy(), as Peter suggested?
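
Roughly, the ordering I have in mind is sketched below as a standalone model,
not as a patch against migration/ram.c. Every name in it (struct poison_set,
page_is_poisoned(), save_target_page(), SKETCH_PAGE_SIZE) is a hypothetical
stand-in; the only point is that the poison check runs before anything reads
the page, and that a poisoned page is accounted for as a zero page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* stand-in for TARGET_PAGE_SIZE */

enum page_result { PAGE_SENT_AS_ZERO, PAGE_SENT_AS_DATA };

/* Hypothetical record of host pages known to be poisoned. */
struct poison_set {
    const uint8_t *pages[8];
    size_t count;
};

struct page_stats {
    unsigned zero_pages;
    unsigned data_pages;
    unsigned page_reads;   /* how many pages had their contents read */
};

bool page_is_poisoned(const struct poison_set *ps, const uint8_t *page)
{
    for (size_t i = 0; i < ps->count; i++) {
        if (ps->pages[i] == page) {
            return true;
        }
    }
    return false;
}

/* Reads the whole page, so it must never run on a poisoned page. */
static bool page_is_zero(const uint8_t *page, struct page_stats *st)
{
    st->page_reads++;
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

enum page_result save_target_page(const struct poison_set *ps,
                                  const uint8_t *page,
                                  struct page_stats *st)
{
    assert(ps != NULL && page != NULL && st != NULL);

    /* First step on entry: a poisoned page is reported as zero, unread. */
    if (page_is_poisoned(ps, page)) {
        st->zero_pages++;
        return PAGE_SENT_AS_ZERO;
    }

    /* Only now is it safe to touch the contents (zero detection, etc.). */
    if (page_is_zero(page, st)) {
        st->zero_pages++;
        return PAGE_SENT_AS_ZERO;
    }

    st->data_pages++;
    return PAGE_SENT_AS_DATA;
}
```

Since the check sits ahead of the zero detection, whichever path would have
handled the page afterwards never gets to read it.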

> And what about an RDMA memory region impacted by a memory error ?
> This is an important aspect.
> Does anyone know how this situation is dealt with ? And how it should be
> handled in Qemu ?
> 

If you are referring to guest RDMA MRs, those are just guest RAM, and I am not
sure we are even aware of them from QEMU. But if you are referring to the RDMA
transport that sits below the QEMUFile (or rather acts as an implementation of
QEMUFile), then handling it in ram_save_target_page_legacy() already seems to
cover it.

> -- 
> Thanks,
> William.
