Re: HPT allocation failures on POWER8 KVM hosts

From: Roman Bolshakov
Subject: Re: HPT allocation failures on POWER8 KVM hosts
Date: Mon, 18 Nov 2019 14:42:42 +0300

On Mon, Nov 18, 2019 at 01:02:00PM +1100, Daniel Axtens wrote:
> Hi Roman,
> > We're running a lot of KVM virtual machines on POWER8 hosts, and
> > sometimes new VMs can't be started because fragmentation of the CMA
> > region leaves no contiguous area large enough for the HPT.
> >
> > The issue is covered in the LWN article: https://lwn.net/Articles/684611/
> > The article points out that you raised the problem at LSFMM 2016.
> > However, I couldn't find a follow-up article on the issue.
> >
> > Looking at the kernel commit log, I've identified a few commits that
> > might reduce CMA fragmentation and help avoid HPT allocation failures:
> >   - bd2e75633c801 ("dma-contiguous: use fallback alloc_pages for single
> >     pages")
> >   - 678e174c4c16a ("powerpc/mm/iommu: allow migration of cma allocated
> >     pages during mm_iommu_do_alloc")
> >   - 9a4e9f3b2d739 ("mm: update get_user_pages_longterm to migrate pages
> >     allocated from CMA region")
> >   - d7fefcc8de914 ("mm/cma: add PF flag to force non cma alloc")
> >
> > Are there any other commits that address the issue? What is the first
> > kernel version that shouldn't have the HPT allocation problem due to CMA
> > fragmentation?
> I've had some success increasing the CMA allocation with the
> kvm_cma_resv_ratio boot parameter; see
> arch/powerpc/kvm/book3s_hv_builtin.c.
> The default is 5%. In a support case in a former job we had a customer
> who increased this to I think 7 or 8% and saw the symptoms subside
> dramatically.

Hi Daniel,

Thank you, I'll try increasing kvm_cma_resv_ratio for now, but even the
default 5% CMA reserve should be more than enough, given that the HPT is
1/128th of the VM's maximum memory.

For a 16GB RAM VM without a balloon device, only 128MB (16GB / 128) is
reserved for the HPT from CMA. So a 5% CMA reserve (12.8GB on a 256GB
host) should allow provisioning VMs with over 1.5TB of RAM in total
(12.8GB * 128 = 1.6TB). In other words, the default CMA reserve lets us
overprovision VM memory by about 6 times the RAM present on the host.
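The arithmetic above can be checked with a back-of-the-envelope sketch. It assumes the reserve is exactly kvm_cma_resv_ratio percent of host RAM and that each HPT is exactly 1/128 of the guest's max memory, ignoring alignment, rounding and fragmentation; the helper names are made up for illustration.

```python
# Back-of-the-envelope model of how much guest memory the KVM CMA reserve
# can back with HPTs. Simplifying assumptions (not verified kernel/QEMU
# behaviour):
#   * the reserve is exactly ratio% of host RAM (no alignment),
#   * each guest's HPT is exactly 1/128 of its max memory (no rounding),
#   * the reserve is not fragmented.

HPT_DIVISOR = 128  # HPT size = VM max memory / 128


def cma_reserve_gib(host_ram_gib: float, resv_ratio_pct: float = 5) -> float:
    """Size of the KVM CMA reserve for a given kvm_cma_resv_ratio."""
    return host_ram_gib * resv_ratio_pct / 100


def hpt_size_gib(vm_max_mem_gib: float) -> float:
    """HPT size for a guest with the given max memory."""
    return vm_max_mem_gib / HPT_DIVISOR


def max_guest_mem_gib(host_ram_gib: float, resv_ratio_pct: float = 5) -> float:
    """Total guest max memory whose HPTs fit in an unfragmented reserve."""
    return cma_reserve_gib(host_ram_gib, resv_ratio_pct) * HPT_DIVISOR


if __name__ == "__main__":
    host = 256  # GiB of host RAM
    print(f"HPT for a 16 GiB guest: {hpt_size_gib(16) * 1024:.0f} MiB")
    for ratio in (5, 7, 8):
        total = max_guest_mem_gib(host, ratio)
        print(f"ratio {ratio}%: reserve {cma_reserve_gib(host, ratio):.1f} GiB, "
              f"guest memory {total / 1024:.2f} TiB, "
              f"{total / host:.1f}x host RAM")
```

With the default 5% on a 256GB host it reports a 12.8GB reserve and about 1.6TB (6.4x host RAM) of guest memory; the 7% and 8% rows show how far raising the ratio stretches the same estimate, assuming no fragmentation.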

We rarely add a balloon device, and sometimes don't add one at all.
Therefore, I'm still looking for commits that would help avoid the issue
with the default CMA reserve.
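To illustrate why the total size of the reserve is not the whole story, here is a toy model of a contiguous allocation: it only succeeds if some window of the requested size contains no unmovable pages. This is an assumption-laden sketch, not the kernel's cma_alloc()/alloc_contig_range() code; the page states and helper names are hypothetical. Treating scattered in-use pages as migratable rather than pinned roughly corresponds to what the commits listed above aim for.

```python
# Toy model of a contiguous allocation from a CMA area. Illustrative only:
# this is NOT the kernel's cma_alloc()/alloc_contig_range() logic.
# Each list slot is one page in one of three (hypothetical) states:
#   FREE    - unused
#   MOVABLE - in use, but can be migrated elsewhere on demand
#   PINNED  - in use and unmovable (a live HPT, a long-term pinned page, ...)
from typing import List, Optional

FREE, MOVABLE, PINNED = "free", "movable", "pinned"


def find_window(area: List[str], npages: int) -> Optional[int]:
    """First-fit search for npages consecutive slots containing no PINNED page."""
    if npages <= 0:
        raise ValueError("npages must be positive")
    run_start, run_len = 0, 0
    for i, state in enumerate(area):
        if state == PINNED:
            # A pinned page breaks the run; restart just after it.
            run_start, run_len = i + 1, 0
            continue
        run_len += 1
        if run_len == npages:
            return run_start
    return None


def alloc_contig(area: List[str], npages: int) -> Optional[int]:
    """Claim npages contiguous slots, treating MOVABLE pages as migratable.

    Returns the start index, or None if no suitable window exists.
    """
    start = find_window(area, npages)
    if start is not None:
        for i in range(start, start + npages):
            area[i] = PINNED  # the new allocation (e.g. an HPT) is unmovable
    return start


if __name__ == "__main__":
    # 16 pages with three unmovable pages scattered through the area.
    fragmented = [PINNED if i in (3, 8, 13) else FREE for i in range(16)]
    print("usable pages:", sum(s != PINNED for s in fragmented))  # 13
    print("6-page alloc:", alloc_contig(fragmented, 6))           # None

    # Same layout, but those pages are movable and can be migrated away.
    movable = [MOVABLE if i in (3, 8, 13) else FREE for i in range(16)]
    print("6-page alloc:", alloc_contig(movable, 6))              # 0
```

In the first case 13 of 16 pages are not pinned, yet a 6-page request fails because the longest run between pinned pages is 4; in the second, the same pages can be migrated away and the request succeeds at the start of the area.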

Thank you,

