[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page pois
From: |
David Hildenbrand |
Subject: |
Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature |
Date: |
Wed, 15 Apr 2020 15:46:01 -0400 (EDT) |
> Am 15.04.2020 um 21:29 schrieb Alexander Duyck <address@hidden>:
>
> On Wed, Apr 15, 2020 at 11:17 AM David Hildenbrand <address@hidden> wrote:
>>
>>>
>>> The comment above explains the "why". Basically poisoning a page will
>>> dirty it. So why hint a page as free when that will drop it back into
>>> the guest and result in it being dirtied again. What you end up with
>>> is all the pages that were temporarily placed in the balloon are dirty
>>> after the hinting report is finished at which point you made things
>>> worse instead of helping to improve them.
>>
>> Let me think this through. What happens on free page hinting is that
>>
>> a) we tell the guest that migration starts
>> b) it allocates pages (alloc_pages()), sends them to us and adds them to
>> a list
>> b) we exclude these pages on migration
>> c) we tell the guest that migration is over
>> d) the guest frees all allocated pages
>>
>> The "issue" with VIRTIO_BALLOON_F_PAGE_POISON is, that in d), the guest
>> will fill all pages again with a pattern (or zero).
>
> They should have already been filled with the poison pattern before we
> got to d). A bigger worry is that we at some point in the future
> update the kernel so that d) doesn't trigger re-poisoning, in which
> case the pages won't be marked as dirty, we will have skipped the
> migration, and we have no idea what will be in the pages at the end.
>
>> AFAIKs, it's either
>>
>> 1) Not performing free page hinting, migrating pages filled with a
>> poison value (or zero).
>> 2) Performing free page hinting, not migrating pages filled with a
>> poison value (or zero), letting only the guest fill them again.
>>
>> I have the feeling, that 2) is still better for migration, because we
>> don't migrate useless pages and let the guest reconstruct the content?
>> (having a poison value of zero might be debatable)
>>
>> Can you tell me what I am missing? :)
>
> The goal of the free page hinting was to reduce the number of pages
> that have to be migrated, in the second case you point out is is
> basically deferring it and will make the post-copy quite more
> expensive since all of the free memory will be pushed to the post-copy
> which I would think would be undesirable since it means the VM would
> have to be down for a greater amount of time with the poisoning
> enabled.
Postcopy is a very good point, bought!
But (what you wrote above) it sounds like that this is really what we *have to*
do, not an optimization. I‘ll double check the spec tomorrow (hopefully it was
documented). We should rephrase the comment then.
I could imagine that we also have to care about ordinary page
inflation/deflation when handling page poisoning. (Iow, don‘t inflate/deflate
if set for now).
>
> The worst case scenario would be one in which the VM was suspended for
> migration while there were still pages in the balloon that needed to
> be drained. In such a case you would have them in an indeterminate
> state since the last page poisoning for them might have been ignored
> since they were marked as "free", so they are just going to be
> whatever value they were if they had been migrated at all.
>
>>>
>>>>
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> if (s->free_page_report_cmd_id == UINT_MAX) {
>>>>> s->free_page_report_cmd_id =
>>>>> VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
>>>>
>>>> We should rename all "free_page_report" stuff we can to
>>>> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
>>>> side :) ) before adding free page reporting .
>>>>
>>>> (looking at the virtio-balloon linux header, it's also confusing but
>>>> we're stuck with that - maybe we should add better comments)
>>>
>>> Are we stuck? Couldn't we just convert it to an anonymous union with
>>> free_page_hint_cmd_id and then use that where needed?
>>
>> Saw your patch already :)
>>
>>>
>>>>> @@ -618,12 +627,10 @@ static size_t
>>>>> virtio_balloon_config_size(VirtIOBalloon *s)
>>>>> if (s->qemu_4_0_config_size) {
>>>>> return sizeof(struct virtio_balloon_config);
>>>>> }
>>>>> - if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
>>>>> + if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
>>>>> + virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>>>> return sizeof(struct virtio_balloon_config);
>>>>> }
>>>>> - if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>>>> - return offsetof(struct virtio_balloon_config, poison_val);
>>>>> - }
>>>>
>>>> I am not sure this change is completely sane. Why is that necessary at all?
>>>
>>> The poison_val is stored at the end of the structure and is required
>>> in order to make free page hinting to work. What this change is doing
>>
>> Required to make it work? In the kernel,
>>
>> commit 2e991629bcf55a43681aec1ee096eeb03cf81709
>> Author: Wei Wang <address@hidden>
>> Date: Mon Aug 27 09:32:19 2018 +0800
>>
>> virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
>>
>> was merged after
>>
>> commit 86a559787e6f5cf662c081363f64a20cad654195
>> Author: Wei Wang <address@hidden>
>> Date: Mon Aug 27 09:32:17 2018 +0800
>>
>> virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>>
>> So I assume it's perfectly fine to not have it.
>>
>> I'd say it's the responsibility of the guest to *not* use
>> VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
>> without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
>> into the foot.
>
> Based on the timing I am guessing the page poisoning was in the same
> patch set as the free page hinting since there is only 2 seconds
> between when the two are merged. If I recall the page poisoning logic
> was added after the issue was pointed out that it needed to address
> it.
>
Yeah, but any other OS implementing this, not caring about page poisoning
wouldn‘t need it. Maybe there is something in the spec.
Mental note: we‘ll have to migrate the poison value if not already done
(writing on my mobile).
- [PATCH v19 QEMU 0/4] virtio-balloon: add support for free page reporting, Alexander Duyck, 2020/04/09
- [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature, Alexander Duyck, 2020/04/09
- Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature, David Hildenbrand, 2020/04/15
- Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature, Alexander Duyck, 2020/04/15
- Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature, David Hildenbrand, 2020/04/15
- Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature, Alexander Duyck, 2020/04/15
- Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature,
David Hildenbrand <=
- Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature, Alexander Duyck, 2020/04/15
- Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature, David Hildenbrand, 2020/04/16
- Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature, David Hildenbrand, 2020/04/16
- Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature, Michael S. Tsirkin, 2020/04/16
- Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature, David Hildenbrand, 2020/04/16
- Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature, Alexander Duyck, 2020/04/16
- Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature, David Hildenbrand, 2020/04/16
[PATCH v19 QEMU 2/4] linux-headers: update to contain virito-balloon free page reporting, Alexander Duyck, 2020/04/09
[PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting, Alexander Duyck, 2020/04/09