qemu-devel


From: William Roche
Subject: Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase
Date: Wed, 6 Sep 2023 23:29:29 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.14.0

On 9/6/23 17:16, Peter Xu wrote:

Just a note..

Probably fine for now to reuse the block page size, but IIUC the right thing to
do is to fetch it from the signal info (in QEMU's sigbus_handler()), namely
kernel_siginfo.si_addr_lsb.

At least for x86 I think that stores the "shift" of the covered poisoned page
(one needs to track the Linux handling of VM_FAULT_HWPOISON_LARGE for a
huge page, though.. not aware of any man page for that).  It'll then work
naturally once Linux huge pages start to support sub-huge-page-size
poisoning someday.  We can definitely leave that for later.


I totally agree with that!
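
To make the si_addr_lsb idea concrete, here is a minimal sketch of deriving the poisoned range from the SIGBUS siginfo instead of assuming the block page size. It is not QEMU code: poison_range_from_siginfo() and struct poison_range are hypothetical names used only for illustration, and the sketch assumes Linux with glibc, where BUS_MCEERR_AR/BUS_MCEERR_AO and si_addr_lsb are exposed through <signal.h>.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type, not part of QEMU: the range covered by one
 * memory-error report. */
struct poison_range {
    uintptr_t start;  /* start address, aligned down to the range size */
    size_t len;       /* 1 << si_addr_lsb bytes */
};

/*
 * Hypothetical helper: fill *out from a memory-error SIGBUS siginfo.
 * Returns false when the signal is not a machine-check SIGBUS or when
 * the reported shift is not usable.
 */
static bool poison_range_from_siginfo(const siginfo_t *si,
                                      struct poison_range *out)
{
    if (si->si_signo != SIGBUS ||
        (si->si_code != BUS_MCEERR_AR && si->si_code != BUS_MCEERR_AO)) {
        return false;
    }

    /* si_addr_lsb is the least significant valid bit of si_addr, i.e.
     * the log2 of the affected range. */
    int shift = si->si_addr_lsb;
    if (shift < 0 || (size_t)shift >= sizeof(uintptr_t) * 8) {
        return false;
    }

    size_t len = (size_t)1 << shift;
    out->start = (uintptr_t)si->si_addr & ~((uintptr_t)len - 1);
    out->len = len;
    return true;
}
```

With a shift of 12 this yields a 4 KiB range and with 21 a 2 MiB range, so a caller would not need to know whether the backing page is a huge page.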


--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1145,7 +1145,8 @@ static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file,
      uint8_t *p = block->host + offset;
      int len = 0;
-    if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
+    if ((kvm_enabled() && kvm_hwpoisoned_page(block, (void *)offset)) ||

Can we move this out of zero page handling?  Zero detection is not
guaranteed to always be the 1st thing done when processing a guest page.
Currently it is already skipped with RDMA or when compression is enabled,
so it'll keep crashing there.

Perhaps at the entry of ram_save_target_page_legacy()?

Right, as expected, using migration compression with poisoned pages still crashes even with this fix...

The difficulty I see with placing the poisoned page check at the
entry of ram_save_target_page_legacy() is deciding how to skip any poisoned page(s) found.

Should I continue to treat them as zero pages written with save_zero_page_to_file()? Or should I handle the case where compression is in use and add new code that compresses an empty page with save_compress_page()?

And what about an RDMA memory region impacted by a memory error?
This is an important aspect.
Does anyone know how this situation is dealt with, and how it should be handled in QEMU?
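
To illustrate the ordering being discussed, here is a small self-contained model (not QEMU code) where the poison check runs before any of the zero-detection, compression or RDMA paths. Every name in it (save_target_page(), page_is_poisoned(), struct poison_list, the enums, MODEL_PAGE_SIZE) is hypothetical, and reporting a poisoned page as a zero page is only one of the options raised above, not a settled choice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size for this model only */

enum save_path { PATH_NORMAL, PATH_COMPRESS, PATH_RDMA };
enum save_result { SAVED_ZERO, SAVED_DATA, SAVED_COMPRESSED, SAVED_RDMA };

/* Hypothetical stand-in for a list of poisoned page offsets. */
struct poison_list {
    const uint64_t *offsets;  /* page-aligned offsets */
    size_t count;
};

static bool page_is_poisoned(const struct poison_list *pl, uint64_t offset)
{
    uint64_t page = offset & ~(uint64_t)(MODEL_PAGE_SIZE - 1);

    for (size_t i = 0; i < pl->count; i++) {
        if (pl->offsets[i] == page) {
            return true;
        }
    }
    return false;
}

static bool buffer_all_zero(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Model of a per-page save entry point. */
static enum save_result save_target_page(const struct poison_list *pl,
                                         const uint8_t *host,
                                         uint64_t offset,
                                         enum save_path path)
{
    /* Poison check first: host + offset is never read for a poisoned
     * page, whichever path is configured. */
    if (page_is_poisoned(pl, offset)) {
        return SAVED_ZERO;
    }
    if (path == PATH_RDMA) {
        return SAVED_RDMA;
    }
    if (path == PATH_COMPRESS) {
        return SAVED_COMPRESSED;
    }
    return buffer_all_zero(host + offset, MODEL_PAGE_SIZE) ? SAVED_ZERO
                                                           : SAVED_DATA;
}
```

In this model the compression and RDMA paths never see a poisoned page, but whether sending a zero page record is acceptable on those paths in real QEMU is exactly the open question.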

--
Thanks,
William.


