qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH] migration: Fix a potential guest memory corruption


From: Duan, Zhenzhong
Subject: RE: [PATCH] migration: Fix a potential guest memory corruption
Date: Mon, 17 Oct 2022 02:17:19 +0000


>-----Original Message-----
>From: Dr. David Alan Gilbert <dgilbert@redhat.com>
>Sent: Tuesday, October 11, 2022 7:06 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: qemu-devel@nongnu.org; quintela@redhat.com
>Subject: Re: [PATCH] migration: Fix a potential guest memory corruption
>
>* Zhenzhong Duan (zhenzhong.duan@intel.com) wrote:
>
>Hi,
Hi,
Sorry for late response. Just back from vacation.

>
>> Imagine a rare case, after a dirty page is sent to compression
>> threads's ring, dirty bitmap sync trigger right away and mark the same
>> page dirty again and sent. Then the new page may be overwriten by
>> stale page in compression threads's ring in the destination.
>
>Yes, I think we had a similar problem in multifd.
Multifd flush operation multifd_send_sync_main() is called in each memory 
iteration
which is more aggressive than in compression. I think not an issue in multifd?

>
>> So we need to ensure there is only one copy of the same dirty page
>> either by flushing the ring after every bitmap sync or avoiding
>> processing same dirty page continuously.
>>
>> I choose the 2nd which avoids the time consuming flush operation.
>
>I'm not sure this guarantees it; it makes it much less likely but if only a few
>pages are dirtied and you have lots of threads, I think the same thing could
>still happy.
I didn't get it, imagine there are dirty page "A B C D E F G" in current 
RAMBLOCK.
1. Page "A B C D" are sent to compression thread.
2. dirty page sync triggers, update dirty map to "A B D E F G"
3. Page D is checked and sent to compression thread again, so there may be two 
copy of page D in compression thread, corruption!
4. Page "E F G" are sent to compression thread.
5. flush operation triggered at end of current RAMBLOCK.
6. In next iteration, page "A B" are sent.

After patch:
1. Page "A B C D" are sent to compression thread.
2. dirty page sync triggers, update dirty map to " A B D E F G"
3. Page after page D are checked and sent to compression thread which are Page 
"E F G".
5. flush operation triggered at end of current RAMBLOCK, ensures page D flushed.
6. In next iteration, page "A B D" are sent.

Thanks
Zhenzhong
>
>I think you're going to need to flush the ring after each sync.
>
>Dave
>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>  migration/ram.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c index
>> dc1de9ddbc68..67b2035586bd 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1551,7 +1551,7 @@ static bool find_dirty_block(RAMState *rs,
>PageSearchStatus *pss, bool *again)
>>      pss->postcopy_requested = false;
>>      pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
>>
>> -    pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
>> +    pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page
>> + + 1);
>>      if (pss->complete_round && pss->block == rs->last_seen_block &&
>>          pss->page >= rs->last_page) {
>>          /*
>> @@ -1564,7 +1564,7 @@ static bool find_dirty_block(RAMState *rs,
>PageSearchStatus *pss, bool *again)
>>      if (!offset_in_ramblock(pss->block,
>>                              ((ram_addr_t)pss->page) << TARGET_PAGE_BITS)) {
>>          /* Didn't find anything in this RAM Block */
>> -        pss->page = 0;
>> +        pss->page = -1;
>>          pss->block = QLIST_NEXT_RCU(pss->block, next);
>>          if (!pss->block) {
>>              /*
>> @@ -2694,7 +2694,7 @@ static void ram_state_reset(RAMState *rs)  {
>>      rs->last_seen_block = NULL;
>>      rs->last_sent_block = NULL;
>> -    rs->last_page = 0;
>> +    rs->last_page = -1;
>>      rs->last_version = ram_list.version;
>>      rs->xbzrle_enabled = false;
>>      postcopy_preempt_reset(rs);
>> @@ -2889,7 +2889,7 @@ void
>ram_postcopy_send_discard_bitmap(MigrationState *ms)
>>      /* Easiest way to make sure we don't resume in the middle of a host-page
>*/
>>      rs->last_seen_block = NULL;
>>      rs->last_sent_block = NULL;
>> -    rs->last_page = 0;
>> +    rs->last_page = -1;
>>
>>      postcopy_each_ram_send_discard(ms);
>>
>> --
>> 2.25.1
>>
>--
>Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




reply via email to

[Prev in Thread] Current Thread [Next in Thread]