
Re: [Qemu-devel] [PATCH 1/1] e820: pass high memory too.


From: Andrea Arcangeli
Subject: Re: [Qemu-devel] [PATCH 1/1] e820: pass high memory too.
Date: Thu, 17 Oct 2013 18:15:33 +0200

On Thu, Oct 17, 2013 at 04:30:27PM +0200, Gerd Hoffmann wrote:
> On Do, 2013-10-17 at 15:00 +0200, Andrea Arcangeli wrote:
> > Hi,
> > 
> > On Thu, Oct 17, 2013 at 01:09:38PM +0200, Gerd Hoffmann wrote:
> > > We have a fw_cfg entry to pass e820 entries from qemu to the firmware.
> > > Today it's used to pass reservations only.  This patch makes qemu pass
> > > entries for RAM too.
> > > 
> > > This allows to pass RAM sizes larger than 1TB to the firmware and it
> > > will also allow to pass non-contiguous memory ranges should we decide
> > > to implement that some day, say for our virtual numa nodes.
> > > 
> > > Obviously this needs some extra care to not break existing firmware.
> > > 
> > > SeaBIOS loads the entries and happily adds them without looking at the
> > > type.  Which is problematic for memory below 4g as this will overwrite
> > > reservations added for bios memory etc.  For memory above 4g it works
> > > just fine, seabios will merge the entry derived from cmos with the one
> > > loaded from fw_cfg.
> > 
> > The reason for not fixing the cmos, and for deferring the fix for
> > >1TB boot, is to develop a better approach, and this mixture of e820
> > and cmos doesn't look like an improvement. The only thing it avoids
> > is touching seabios, but it provides no benefit whatsoever compared
> > to fixing the cmos, which looks cleaner to me than having to compute
> > a mix of cmos and e820 in seabios (and potentially having other
> > bioses follow this mixed, incomplete API).
> 
> e820 allows to pass non-contiguous ram ranges to seabios (not that qemu
> supports that today, but when implemented the qemu/seabios interface
> will deal with it just fine).  How will you do that with the cmos?

You're changing the qemu-bios paravirt protocol, and to boot with >1TB
seabios is now required to mix information from two APIs (the rtc and
the e820 fw_cfg command).

> IMO e820 is better than CMOS.

Agreed.

> 
> > The premise that "this will also allow to pass non-contiguous memory"
> > is partly false, as you can't use the e820 API below 4g so there's no
> > way to create non contiguous memory with this mix-cmos-e820-API.
> 
> Sure you can.  Why do you think you can't?

How do you specify a hole below 4g unless you modify seabios first?

> That is the goal.  seabios will be fixed to deal with this correctly.
> I don't want break old seabios versions though (especially not before we
> have a seabios release which can handle it), so I'll wait with flipping
> the switch for that.

Why ship qemu with an intermediate paravirt protocol?

And if you don't want to break old seabios I guess you should use a
new fw_cfg command.

Just to show you how flaky this intermediate paravirt interface is,
assume I boot with -m 1029g. The cmos value wraps at 1TB, so "high" is
1g in seabios. So RamSizeOver4G is 1g.

    RamSizeOver4G = high;
    add_e820(0x100000000ull, high, E820_RAM);

so far so good for e820 maps, that gets overwritten later. But that's
not the end of it.
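
To spell out where the 1g above comes from, here is a minimal sketch of
the wraparound. It assumes the cmos high-memory field is 3 bytes
counting 64 KiB units (which is what gives the 1TB limit), and for
simplicity it ignores the pci hole and treats the first 4g as low
memory. The helper name cmos_high_bytes is made up for illustration,
it is not a seabios function.

```c
#include <stdint.h>

#define GiB (1ULL << 30)

/*
 * Hypothetical helper: what a 3-byte cmos field counting 64 KiB units
 * can report for the amount of ram above 4g. Only 24 bits of the unit
 * count survive, so anything at or beyond 2^24 * 64 KiB = 1 TiB wraps.
 */
static uint64_t cmos_high_bytes(uint64_t above_4g)
{
    uint64_t units = (above_4g >> 16) & 0xffffff; /* 64 KiB units, 24 bits */
    return units << 16;                           /* back to bytes */
}
```

With -m 1029g and 4g assumed low, cmos_high_bytes(1025 * GiB) comes
back as 1 * GiB, while anything below 1TB is reported unchanged.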

Then seabios does:

        r64_mem.base = ALIGN(0x100000000LL + RamSizeOver4G, align_mem);
        r64_pref.base = ALIGN(r64_mem.base + sum_mem, align_pref);

So seabios will map pci space at 5g where there is ram, instead of at
1024g.

And in smbios (what is smbios anyway? :)

    add_struct(19, p, 0, RamSize >> 20, 0);
    if (RamSizeOver4G)
        add_struct(19, p, 4096, RamSizeOver4G >> 20, 1);


I doubt you intended the above range to be 4g-5g on a guest with 1029g
of ram.

Not to mention:

    int ram_mb = (RamSize + RamSizeOver4G) >> 20;


ram_mb is actually 5g when it should be 1029g.
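
The same arithmetic as a minimal sketch, mirroring the r64_mem.base and
ram_mb expressions quoted above. ALIGN_UP, pci64_base and total_ram_mb
are illustrative names of mine, the alignment passed in is arbitrary
(it must be a power of two), and 4g of low memory is again an
assumption.

```c
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (uint64_t)(a) - 1) & ~((uint64_t)(a) - 1))

/* Base of the 64-bit pci window for a given RamSizeOver4G value. */
static uint64_t pci64_base(uint64_t ram_over_4g, uint64_t align_mem)
{
    return ALIGN_UP(0x100000000ULL + ram_over_4g, align_mem);
}

/* Total ram in MiB for given RamSize / RamSizeOver4G values. */
static uint64_t total_ram_mb(uint64_t ram_below_4g, uint64_t ram_over_4g)
{
    return (ram_below_4g + ram_over_4g) >> 20;
}
```

Feeding in the stale 1g for ram_over_4g puts the pci window at 5g and
gives 5120 MiB of total ram. Feeding in the real 1025g puts the window
past the end of ram and gives the full 1029g.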

In short your change is already breaking current seabios.

But even if it worked, my fundamental problem is that this is a flaky
mixture of APIs creating a new intermediate paravirt interface that
some other bios could decide to support, and if they do, they risk
breaking again when qemu speaks the final paravirt protocol that
allows real ram holes to be created below 4g.

If we don't want to fix the rtc interface to lift the 1TB limit, and
want a better paravirt protocol implemented instead, then the only way
is to first modify seabios to pick the ramover4g info from the highest
address of the e820 table, and to keep the e820 reservations from
being overwritten by ram ranges below 4g. And then use a different
fw_cfg command value, if you intend to be backwards compatible with
old seabios that wouldn't cope with qemu initially passing
(0, ram_size) as the e820 range for the RAM.
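
Roughly what I mean on the seabios side, as a minimal sketch and not
actual seabios code: struct e820_entry here is a simplified stand-in
for the real table layout, and both helper names are made up. The
first helper takes the top of RAM from the highest RAM end address
above 4g (so it is the top address, not the sum of the ranges), the
second one detects a RAM range that would land on top of an existing
reservation.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2
#define FOUR_GIB      0x100000000ULL

/* Simplified stand-in for an e820 table entry. */
struct e820_entry {
    uint64_t start;
    uint64_t size;
    uint32_t type;
};

/* RamSizeOver4G derived from the highest RAM end address above 4g. */
static uint64_t ram_over_4g_from_e820(const struct e820_entry *tbl, size_t n)
{
    uint64_t top = FOUR_GIB;
    for (size_t i = 0; i < n; i++) {
        uint64_t end = tbl[i].start + tbl[i].size;
        if (tbl[i].type == E820_RAM && end > top)
            top = end;
    }
    return top - FOUR_GIB;
}

/* Nonzero if the RAM range overlaps any non-RAM entry in the table. */
static int overlaps_reservation(const struct e820_entry *ram,
                                const struct e820_entry *tbl, size_t n)
{
    uint64_t ram_end = ram->start + ram->size;
    for (size_t i = 0; i < n; i++) {
        uint64_t end = tbl[i].start + tbl[i].size;
        if (tbl[i].type != E820_RAM &&
            ram->start < end && tbl[i].start < ram_end)
            return 1;
    }
    return 0;
}
```

A (0, ram_size) entry from qemu would trip overlaps_reservation against
any bios reservation below 4g, which is exactly the case an old seabios
gets wrong.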

When seabios speaks the new paravirt interface, only then modify qemu
to use the new paravirt interface.


