From: William Roche
Subject: Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase
Date: Wed, 6 Sep 2023 23:29:29 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.14.0
On 9/6/23 17:16, Peter Xu wrote:
Just a note.. Probably fine for now to reuse block page size, but IIUC the right thing to do is to fetch it from the signal info (in QEMU's sigbus_handler()) of kernel_siginfo.si_addr_lsb. At least for x86 I think that stores the "shift" of covered poisoned page (one needs to track the Linux handling of VM_FAULT_HWPOISON_LARGE for a huge page, though.. not aware of any man page for that). It'll then work naturally when Linux huge pages will start to support sub-huge-page-size poisoning someday. We can definitely leave that for later.
I totally agree with that!
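To make the "shift" idea concrete, here is a minimal standalone sketch of how a poisoned range could be derived from si_addr and si_addr_lsb. It is not QEMU code: the struct and helper names are hypothetical, and it assumes si_addr_lsb really holds the log2 of the affected granule, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type, for illustration only (not part of QEMU). */
struct poison_range {
    uintptr_t start; /* fault address rounded down to the poisoned granule */
    size_t len;      /* granule size in bytes, i.e. 1 << lsb */
};

/*
 * Hypothetical helper: derive the poisoned range from the fault address
 * and the least significant valid bit reported with the signal.
 * In a SIGBUS handler this would be fed with something like
 *     poison_range_from_lsb((uintptr_t)si->si_addr, si->si_addr_lsb);
 * when si->si_code is BUS_MCEERR_AR or BUS_MCEERR_AO.
 */
static struct poison_range poison_range_from_lsb(uintptr_t addr, unsigned lsb)
{
    struct poison_range r;

    /* A shift as wide as the address type would be undefined behaviour. */
    assert(lsb < sizeof(uintptr_t) * 8);

    r.len = (size_t)1 << lsb;
    r.start = addr & ~((uintptr_t)r.len - 1);
    return r;
}
```

With lsb = 12 this covers a single 4 KiB page, and with lsb = 21 a whole 2 MiB huge page, so the same code would keep working if the kernel later reports smaller granules inside a huge page.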
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1145,7 +1145,8 @@ static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file,
     uint8_t *p = block->host + offset;
     int len = 0;

-    if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
+    if ((kvm_enabled() && kvm_hwpoisoned_page(block, (void *)offset)) ||

Can we move this out of zero page handling? Zero detection is not guaranteed to always be the 1st thing to do when processing a guest page. Currently it'll already skip either rdma or when compression enabled, so it'll keep crashing there. Perhaps at the entry of ram_save_target_page_legacy()?
Right, as expected, using migration compression with poisoned pages crashes even with this fix...
The difficulty I see with placing the poisoned page check at the entry of ram_save_target_page_legacy() is what to do to skip the poisoned page(s) found, if any.
Should I continue to treat them as zero pages written with save_zero_page_to_file()? Or should I handle the case where compression is in use and add new code that compresses an empty page with save_compress_page()?
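To illustrate the first option, here is a toy model of the ordering being discussed. It is only a sketch under assumptions: none of these names exist in QEMU, the page size is fixed at 4 KiB, and treating a poisoned page as a zero page is just one of the possibilities raised here, not a settled design.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for this model */

/* What the model decides to do with a page (hypothetical). */
enum page_action { SEND_AS_ZERO, SEND_COMPRESSED, SEND_RAW };

/* Hypothetical stand-in for a guest page and its poison state. */
struct model_page {
    const uint8_t *data; /* must not be dereferenced when poisoned */
    bool poisoned;
};

/* Simple byte-wise zero check, standing in for a real zero detector. */
static bool model_is_zero(const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * The poison check comes first, before any path that reads the page,
 * so a poisoned page is never touched whether compression is on or off.
 */
static enum page_action model_save_page(const struct model_page *pg,
                                        bool compression)
{
    if (pg->poisoned) {
        return SEND_AS_ZERO; /* page content is never read */
    }
    if (compression) {
        return SEND_COMPRESSED;
    }
    if (model_is_zero(pg->data, MODEL_PAGE_SIZE)) {
        return SEND_AS_ZERO;
    }
    return SEND_RAW;
}
```

In this model the compression path is simply bypassed for poisoned pages, which would avoid writing a dedicated "compress an empty page" variant, at the cost of mixing page encodings within a compressed stream.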
And what about an RDMA memory region impacted by a memory error? This is an important aspect. Does anyone know how this situation is dealt with, and how it should be handled in QEMU?
-- Thanks, William.