[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: [PATCH] migration: Fix a potential guest memory corruption
From: |
Duan, Zhenzhong |
Subject: |
RE: [PATCH] migration: Fix a potential guest memory corruption |
Date: |
Mon, 17 Oct 2022 02:17:19 +0000 |
>-----Original Message-----
>From: Dr. David Alan Gilbert <dgilbert@redhat.com>
>Sent: Tuesday, October 11, 2022 7:06 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: qemu-devel@nongnu.org; quintela@redhat.com
>Subject: Re: [PATCH] migration: Fix a potential guest memory corruption
>
>* Zhenzhong Duan (zhenzhong.duan@intel.com) wrote:
>
>Hi,
Hi,
Sorry for late response. Just back from vacation.
>
>> Imagine a rare case, after a dirty page is sent to compression
>> threads's ring, dirty bitmap sync trigger right away and mark the same
>> page dirty again and sent. Then the new page may be overwriten by
>> stale page in compression threads's ring in the destination.
>
>Yes, I think we had a similar problem in multifd.
Multifd flush operation multifd_send_sync_main() is called in each memory
iteration
which is more aggressive than in compression. I think not an issue in multifd?
>
>> So we need to ensure there is only one copy of the same dirty page
>> either by flushing the ring after every bitmap sync or avoiding
>> processing same dirty page continuously.
>>
>> I choose the 2nd which avoids the time consuming flush operation.
>
>I'm not sure this guarantees it; it makes it much less likely but if only a few
>pages are dirtied and you have lots of threads, I think the same thing could
>still happy.
I didn't get it, imagine there are dirty page "A B C D E F G" in current
RAMBLOCK.
1. Page "A B C D" are sent to compression thread.
2. dirty page sync triggers, update dirty map to "A B D E F G"
3. Page D is checked and sent to compression thread again, so there may be two
copy of page D in compression thread, corruption!
4. Page "E F G" are sent to compression thread.
5. flush operation triggered at end of current RAMBLOCK.
6. In next iteration, page "A B" are sent.
After patch:
1. Page "A B C D" are sent to compression thread.
2. dirty page sync triggers, update dirty map to " A B D E F G"
3. Page after page D are checked and sent to compression thread which are Page
"E F G".
5. flush operation triggered at end of current RAMBLOCK, ensures page D flushed.
6. In next iteration, page "A B D" are sent.
Thanks
Zhenzhong
>
>I think you're going to need to flush the ring after each sync.
>
>Dave
>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> migration/ram.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c index
>> dc1de9ddbc68..67b2035586bd 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1551,7 +1551,7 @@ static bool find_dirty_block(RAMState *rs,
>PageSearchStatus *pss, bool *again)
>> pss->postcopy_requested = false;
>> pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
>>
>> - pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
>> + pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page
>> + + 1);
>> if (pss->complete_round && pss->block == rs->last_seen_block &&
>> pss->page >= rs->last_page) {
>> /*
>> @@ -1564,7 +1564,7 @@ static bool find_dirty_block(RAMState *rs,
>PageSearchStatus *pss, bool *again)
>> if (!offset_in_ramblock(pss->block,
>> ((ram_addr_t)pss->page) << TARGET_PAGE_BITS)) {
>> /* Didn't find anything in this RAM Block */
>> - pss->page = 0;
>> + pss->page = -1;
>> pss->block = QLIST_NEXT_RCU(pss->block, next);
>> if (!pss->block) {
>> /*
>> @@ -2694,7 +2694,7 @@ static void ram_state_reset(RAMState *rs) {
>> rs->last_seen_block = NULL;
>> rs->last_sent_block = NULL;
>> - rs->last_page = 0;
>> + rs->last_page = -1;
>> rs->last_version = ram_list.version;
>> rs->xbzrle_enabled = false;
>> postcopy_preempt_reset(rs);
>> @@ -2889,7 +2889,7 @@ void
>ram_postcopy_send_discard_bitmap(MigrationState *ms)
>> /* Easiest way to make sure we don't resume in the middle of a host-page
>*/
>> rs->last_seen_block = NULL;
>> rs->last_sent_block = NULL;
>> - rs->last_page = 0;
>> + rs->last_page = -1;
>>
>> postcopy_each_ram_send_discard(ms);
>>
>> --
>> 2.25.1
>>
>--
>Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK