qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2] pc: memhp: enforce minimal 128Mb alignment f


From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCH v2] pc: memhp: enforce minimal 128Mb alignment for pc-dimm
Date: Mon, 26 Oct 2015 14:24:32 +0100

On Mon, 26 Oct 2015 12:28:21 +0200
"Michael S. Tsirkin" <address@hidden> wrote:

> On Mon, Oct 26, 2015 at 10:46:55AM +0100, Igor Mammedov wrote:
> > commit aa8580cd "pc: memhp: force gaps between DIMM's GPA"
> > regressed memory hot-unplug for linux guests triggering
> > following BUGON
> >  =====
> >  kernel BUG at mm/memory_hotplug.c:703!
> >  ...
> >  [<ffffffff81385fa7>] acpi_memory_device_remove+0x79/0xa5
> >  [<ffffffff81357818>] acpi_bus_trim+0x5a/0x8d
> >  [<ffffffff81359026>] acpi_device_hotplug+0x1b7/0x418
> >  ===
> >     BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
> >  ===
> > 
> > reson for it is that x86-64 linux guest supports memory
> > hotplug in chunks of 128Mb and memory section also should
> > be 128Mb aligned.
> > However gaps forced between 128Mb DIMMs with backend's
> > natural alignment of 2Mb make the 2nd and following
> > DIMMs not being aligned on 128Mb boundary as it was
> > originally. To fix regression enforce minimal 128Mb
> > alignment like it was done for PPC.
> > 
> > Signed-off-by: Igor Mammedov <address@hidden>
> 
> So our temporary work around is creating more trouble.  I'm inclined to just
> revert aa8580cd and df0acded19 with it.
> 
> > ---
> > PS:
> >   PAGE_SECTION_MASK is derived from SECTION_SIZE_BITS which
> >   is arch dependent so this is fix for x86-64 target only.
> >   If anyone cares obout 32bit guests, it should also be fine
> >   for x86-32 which has 64Mb memory sections/alignment.
> 
> Like 32 bit guests are unheard of?  This does not inspire confidence at all.
> 
> 
> So I dug in linux guest code:
> 
> #ifdef CONFIG_X86_32
> # ifdef CONFIG_X86_PAE
> #  define SECTION_SIZE_BITS     29
> #  define MAX_PHYSADDR_BITS     36
> #  define MAX_PHYSMEM_BITS      36
> # else
> #  define SECTION_SIZE_BITS     26
> #  define MAX_PHYSADDR_BITS     32
> #  define MAX_PHYSMEM_BITS      32
> # endif
> #else /* CONFIG_X86_32 */
> # define SECTION_SIZE_BITS      27 /* matt - 128 is convenient right now */
> # define MAX_PHYSADDR_BITS      44
> # define MAX_PHYSMEM_BITS       46
> #endif
> 
> Looks like PAE needs more alignment.
> And it looks like 128 is arbitrary here.
I'll amend it to 512Mb.


> 
> So we are tying ourselves to specific guest quirks.
> All this just looks wrong to me.
On real hardware I've seen, slots were aligned to 1Gb
which is also arbitrary but happens to work with exiting
linux/windows OSes.
There is no point to make broken hardware or in this
case to make with QEMU intentionally broken (alignment)
for linux guests when we can accommodate it.

> 
> 
> > ---
> >  hw/i386/pc.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 3d958ba..0f7cf7c 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -1610,6 +1610,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char 
> > *parent_name)
> >      }
> >  }
> >  
> > +#define PC_MIN_DIMM_ALIGNMENT (1ULL << 27) /* 128Mb */
> > +
> 
> This kind of comment doesn't really help.
Sorry forgot to add comment here, how about:

/* minimal DIMM alignment is derived from
 * arch/x86/include/asm/sparsemem.h:SECTION_SIZE_BITS
 * from PAE variant as the largest required alignment
 * among of PAE/x32/x64 arch-s.
 */
> 
> >  static void pc_dimm_plug(HotplugHandler *hotplug_dev,
> >                           DeviceState *dev, Error **errp)
> >  {
> > @@ -1624,6 +1626,16 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
> >  
> >      if (memory_region_get_alignment(mr) && pcms->enforce_aligned_dimm) {
> >          align = memory_region_get_alignment(mr);
> > +        /*
> > +         * Linux x64 guests expect 128Mb aligned DIMM,
> 
> this implies no other guest cares. which isn't true.
> 
> > +         * but this change
> 
> which change?
> 
> > causes memory layout change
> 
> change compared to what?
> 
> > so
> > +         * for compatibility
> 
> compatibility with what?
> 
> > apply 128Mb alignment only
> > +         * when forced gaps are enabled since it is the cause
> > +         * of misalignment.
> 
> Which makes no sense, sorry.
> 
> Can it be misaligned for some other reason?
> 
> If not, why limit to this case?
2.4 machine will have DIMM's plugged in with backend's alignment
which is 2Mb for memory-backend-ram, so forcing minimal 128Mb alignment
will change DIMM's GPA and break migration from old 2.4 machine to new
QEMU 2.4 machine type.

Since gaps introduced regression it makes sense to condition
alignment adjustment on inter_dimm_gap property so it would not
affect machine types prior 2.5.

How about following comment:
/* Linux guests expect 512/64/128Mb alignment for PAE/x32/x64 arch
 * respectively. Windows works fine with 2Mb. To make sure that
 * memory hotplug would work with above flavors of Linux set
 * minimal alignment to 512Mb (i.e. PAE arch).
 * Enforcing minimal alignment bigger than that of a backend
 * (2Mb for memory-backend-ram) changes GPAs of the 2nd and
 * following DIMMs which breaks migration for older machine types,
 * so force minimal alignment only since 2.5 machine type and do not
 * apply it to older machine types (prior 2.5)
 */

> 
> > +         */
> > +        if (pcmc->inter_dimm_gap && align < PC_MIN_DIMM_ALIGNMENT) {
> > +            align = PC_MIN_DIMM_ALIGNMENT;
> > +        }
> >      }
> >  
> >      if (!pcms->acpi_dev) {
> 
> All this sounds pretty fragile. How about we revert the inter dimm gap
> thing for 2.4? It's just a work around, this is piling work arounds on
> top of work arounds.
Yep it's workaround but it works around QEMU's broken virtio
implementation in a simple way without need for guest side changes.

Without foreseeable virtio fix it makes memory hotplug unusable and even
more so if there were a virtio fix it won't fix old guests since you've
said that virtio fix would require changes of both QEMU and guest sides.

> 
> > -- 
> > 1.8.3.1
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]