From: David Gibson
Subject: Re: [Qemu-devel] [PATCH] hw/virtio/balloon: Fixes for different host page sizes
Date: Thu, 14 Apr 2016 13:45:32 +1000

On Wed, 13 Apr 2016 21:14:44 +0300
"Michael S. Tsirkin" <address@hidden> wrote:

> On Wed, Apr 13, 2016 at 08:11:36PM +0200, Thomas Huth wrote:
> > On 13.04.2016 19:55, Michael S. Tsirkin wrote:  
> > > On Wed, Apr 13, 2016 at 07:38:12PM +0200, Thomas Huth wrote:  
> > >> On 13.04.2016 19:07, Michael S. Tsirkin wrote:  
> > >>> On Wed, Apr 13, 2016 at 04:51:49PM +0200, Thomas Huth wrote:  
> > >>>> On 13.04.2016 15:15, Michael S. Tsirkin wrote:  
> > >>>>> On Wed, Apr 13, 2016 at 01:52:44PM +0200, Thomas Huth wrote:  
> > >> ...  
> > >>>>>> Then, there's yet another problem: If the host page size is bigger
> > >>>>>> than the 4k balloon page size, we cannot simply call madvise() on
> > >>>>>> each of the 4k balloon addresses that we get from the guest - since
> > >>>>>> the madvise() always evicts the whole host page, not only a 4k area!
> > >>>>>>
> > >>>>>> So in this case, we've got to track the 4k fragments of a host page
> > >>>>>> and only call madvise(DONTNEED) when all fragments have been collected.
> > >>>>>> This of course only works fine if the guest sends consecutive 4k
> > >>>>>> fragments - which is the case in the most important scenarios that
> > >>>>>> I try to address here (like a ppc64 guest with 64k page size that
> > >>>>>> is running on a ppc64 host with 64k page size). In case the guest
> > >>>>>> uses a page size that is smaller than the host page size, we might
> > >>>>>> need to add some additional logic here later to increase the
> > >>>>>> probability of being able to release memory, but at least the guest
> > >>>>>> should not crash anymore due to unintentionally evicted pages.  
> > >>>> ...  
> > >>>>>>  static void virtio_balloon_instance_init(Object *obj)
> > >>>>>> diff --git a/include/hw/virtio/virtio-balloon.h 
> > >>>>>> b/include/hw/virtio/virtio-balloon.h
> > >>>>>> index 35f62ac..04b7c0c 100644
> > >>>>>> --- a/include/hw/virtio/virtio-balloon.h
> > >>>>>> +++ b/include/hw/virtio/virtio-balloon.h
> > >>>>>> @@ -43,6 +43,9 @@ typedef struct VirtIOBalloon {
> > >>>>>>      int64_t stats_last_update;
> > >>>>>>      int64_t stats_poll_interval;
> > >>>>>>      uint32_t host_features;
> > >>>>>> +    void *current_addr;
> > >>>>>> +    unsigned long *fragment_bits;
> > >>>>>> +    int fragment_bits_size;
> > >>>>>>  } VirtIOBalloon;
> > >>>>>>  
> > >>>>>>  #endif  
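
A rough sketch of how these new fields could be used to defer the madvise()
until every 4k fragment of a host page has been reported by the guest - the
function name and exact logic below are illustrative assumptions, not the
actual patch code:

    #include "qemu/osdep.h"
    #include "qemu/bitops.h"               /* set_bit, find_first_zero_bit */
    #include "hw/virtio/virtio-balloon.h"  /* VirtIOBalloon */
    #include <sys/mman.h>                  /* madvise, MADV_DONTNEED */

    #define BALLOON_PAGE_SIZE 4096

    /* Hypothetical helper: called once per 4k balloon page address that the
     * guest hands to us, when host_page_size > BALLOON_PAGE_SIZE. */
    static void balloon_track_fragment(VirtIOBalloon *s, void *addr,
                                       size_t host_page_size)
    {
        void *host_page = (void *)((uintptr_t)addr & ~(host_page_size - 1));
        unsigned long nfrags = host_page_size / BALLOON_PAGE_SIZE;
        unsigned long frag = ((uintptr_t)addr & (host_page_size - 1))
                             / BALLOON_PAGE_SIZE;

        if (host_page != s->current_addr) {
            /* A different host page: drop any partially collected fragments. */
            memset(s->fragment_bits, 0, s->fragment_bits_size);
            s->current_addr = host_page;
        }

        set_bit(frag, s->fragment_bits);

        /* Only evict once all 4k fragments of this host page were reported,
         * since madvise(MADV_DONTNEED) always drops the whole host page. */
        if (find_first_zero_bit(s->fragment_bits, nfrags) == nfrags) {
            madvise(host_page, host_page_size, MADV_DONTNEED);
            memset(s->fragment_bits, 0, s->fragment_bits_size);
            s->current_addr = NULL;
        }
    }

This only releases memory when the guest happens to report all fragments of
one host page together, which is exactly the limitation discussed above.
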
> > >>>>>
> > >>>>> It looks like fragment_bits would have to be migrated.
> > >>>>> Which is a lot of complexity.  
> > >> ...  
> > >>>>> How about we just skip madvise if host page size is > balloon
> > >>>>> page size, for 2.6?  
> > >>>>
> > >>>> That would mean a regression compared to what we have today. Currently,
> > >>>> the ballooning is working OK for 64k guests on a 64k ppc host - rather
> > >>>> by chance than on purpose, but it's working. The guest is always sending
> > >>>> all the 4k fragments of a 64k page, and QEMU is trying to call madvise()
> > >>>> for every one of them, but the kernel is ignoring madvise() on
> > >>>> non-64k-aligned addresses, so we end up with a situation where the
> > >>>> madvise() frees a whole 64k page which is also declared as free by the
> > >>>> guest.
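
In other words, of the sixteen madvise() calls covering one 64k host page only
the 64k-aligned one actually takes effect (the unaligned ones fail with EINVAL
on Linux), and that single call already drops the whole host page. A small
stand-alone illustration of that behaviour (not QEMU code) might look like:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t host_page = sysconf(_SC_PAGESIZE);   /* 65536 on a 64k ppc64 host */
        char *buf = mmap(NULL, host_page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
            return 1;
        }

        /* Mimic the guest-driven loop: madvise() each 4k fragment in turn. */
        for (size_t off = 0; off < host_page; off += 4096) {
            int ret = madvise(buf + off, 4096, MADV_DONTNEED);
            /* Only the host-page-aligned call succeeds; the others are
             * rejected with EINVAL, i.e. effectively ignored.  The one
             * successful call already evicts the entire 64k host page. */
            printf("offset %6zu: ret=%d errno=%d\n", off, ret, ret ? errno : 0);
        }
        return 0;
    }
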
> > >>>>
> > >>>> I think we should either take this patch as it is right now (without
> > >>>> adding extra code for migration) and later update it to the bitmap code
> > >>>> by Jitendra Kolhe, or omit it completely (leaving 4k guests broken) and
> > >>>> fix it properly after the bitmap code has been applied. But disabling
> > >>>> the balloon code for 64k guests on 64k hosts completely does not sound
> > >>>> very appealing to me. What do you think?  
> > >>>
> > >>> True. As a simple hack - how about disabling madvise when host page
> > >>> size > target page size?  
> > >>
> > >> That could work - but is there a generic way in QEMU to get the current
> > >> page size from a guest (since this might differ from TARGET_PAGE_SIZE)?
> > >> Or would that mean polluting the virtio-balloon code with ugly #ifdefs? 
> > >>  
> > > 
> > > let's just use TARGET_PAGE_SIZE, that's the best I can think of.  
> > 
> > That won't work - at least not on ppc: TARGET_PAGE_SIZE is always
> > defined as 4096 here. The Linux kernel then switches the real page size
> > to 65536 at runtime. So we'd need a way to detect this automatically...
> > 
> >  Thomas  
> 
> I see. I don't know how to do this.  If we can't find a quick fix, leave
> this part around for now then?  Just fix the page size.

I'm not sure what you're suggesting by "fix the page size".
TARGET_PAGE_SIZE remains 4kiB because that's the default hardware page
size.  However, modern Linux guests always use 64kiB pages in
practice.  This isn't a global setting; it just means the guest uses 64kiB
pages for every mapping it establishes, so there's really no sane way
for qemu to determine the "guest page size" because that's not really a
well-defined concept.

Even having some flag that's cleared if the guest ever makes a < 64kiB
mapping won't work, because I think there are a few edge cases where a
"64kiB" guest which uses 64kiB pages for all RAM mappings will use 4kiB
pages for certain IO mappings.

-- 
David Gibson <address@hidden>
Senior Software Engineer, Virtualization, Red Hat


