
Re: [Qemu-devel] [RFC] Memory API


From: Gleb Natapov
Subject: Re: [Qemu-devel] [RFC] Memory API
Date: Fri, 20 May 2011 14:25:34 +0300

On Fri, May 20, 2011 at 09:40:13AM +0200, Jan Kiszka wrote:
> On 2011-05-20 09:23, Gleb Natapov wrote:
> > On Thu, May 19, 2011 at 08:55:49PM +0200, Jan Kiszka wrote:
> >>>>>> Because we should catch accidental overlaps in all those non PCI
> >>>>>> devices with hard-wired addressing. That's a bug in the
> >>>>>> device/machine model and should be reported as such by QEMU.
> >>>>> Why should we complicate API to catch unlikely errors? If you want to
> >>>>> debug that add capability to dump memory map from the monitor.
> >>>>
> >>>> Because we need to switch tons of code that so far saw a fairly
> >>>> different reaction of the core to overlapping regions.
> >>>>
> >>> How so? Today if there is accidental overlap device will not function
> >>> properly. With new API it will be the same.
> >>
> >> I rather expect subtle differences as overlapping registration changes
> >> existing regions, in the future those will recover.
> >>
> > Where do you expect the differences will come from? Conversion to the new
> > API shouldn't change the order of registration, and if the last
> > registration overrides the previous one the end result should be the same
> > as we have today.
> 
> A) Removing regions will change significantly. So far this is done by
> setting a region to IO_MEM_UNASSIGNED, keeping truncation. With the new
> API that will be a true removal which will additionally restore hidden
> regions.
> 
And what problem do you expect may arise from that? Currently accessing
such a region after unassign results in undefined behaviour, so this code
is not working today; you can't make it worse.
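
To make the difference concrete, this is roughly how the two removal
paths look (the memory_region_* name is taken from your RFC, so treat it
as a sketch rather than the final API; system_memory and dev are made up):

  /* Today: "removal" is really re-registration as unassigned memory.
   * Whatever used to be hidden underneath the range stays hidden. */
  cpu_register_physical_memory(addr, size, IO_MEM_UNASSIGNED);

  /* RFC: the region is actually detached from its container, and a
   * lower-priority region it used to shadow becomes visible again. */
  memory_region_del_subregion(system_memory, &dev->mmio);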

> B) Uncontrolled overlapping is a bug that should be caught by the core,
> and a new API is a perfect chance to do this.
> 
Well, this will indeed introduce a difference in behaviour :) The guest
that ran before will abort now. Are you actually aware of any such
overlaps in the current code base?

But if priorities are going to stay, why not fail if two regions with the
same priority overlap? If that happens it means that the memory map
creation didn't pass through the point where the conflict should have been
resolved (by assigning different priorities), which means the overlap is
unintentional, no?
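
A check at subregion registration time would be enough; a rough sketch
(the struct fields and check_overlap itself are invented here just for
illustration):

  /* Overlap between regions of different priority is an intentional
   * override; overlap at the same priority is an unresolved conflict. */
  static void check_overlap(MemoryRegion *parent, MemoryRegion *new)
  {
      MemoryRegion *mr;

      QTAILQ_FOREACH(mr, &parent->subregions, subregions_link) {
          bool overlap = new->addr < mr->addr + mr->size &&
                         mr->addr < new->addr + new->size;
          if (overlap && mr->priority == new->priority) {
              hw_error("unintentional overlap between %s and %s",
                       mr->name, new->name);
          }
      }
  }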

> > 
> >>>>>>>> new region management will not cause any harm to overlapping
> >>>>>>>> regions so that they can "recover" when the overlap is gone.
> >>>>>>>>
> >>>>>>>>> Another example may be APIC region and PCI. They overlap, but
> >>>>>>>>> neither CPU nor PCI knows about it.
> >>>>>>>>
> >>>>>>>> And they do not need to. The APIC regions will be managed by the
> >>>>>>>> per-CPU region management, reusing the tool box we need for all
> >>>>>>>> bridges. It will register the APIC page with a priority higher
> >>>>>>>> than the default one, thus overriding everything that comes from
> >>>>>>>> the host bridge. I think that reflects pretty well real machine
> >>>>>>>> behaviour.
> >>>>>>>>
> >>>>>>> What is "higher"? How does it know that priority is high enough?
> >>>>>>
> >>>>>> Because no one else manages priorities at a specific hierarchy level.
> >>>>>> There is only one.
> >>>>>>
> >>>>> PCI and CPU are on different hierarchy levels. PCI is under the PIIX and
> >>>>> CPU is on a system BUS.
> >>>>
> >>>> The priority for the APIC mapping will be applied at CPU level, of
> >>>> course. So it will override everything, not just PCI.
> >>>>
> >>> So you do not need explicit priority because the place in hierarchy
> >>> implicitly provides you with one.
> >>
> >> Yes.
> > OK :) So you agree that we can do without priorities :)
> 
> Nope, see below how your own example depends on them.
> 
It depends on them in a very well-defined way. Only the layer that knows
exactly what is going on defines priorities. The priorities do not leak to
any other level or to a global database. That is different from propagating
a priority from a PCI BAR up to the core memory API.

I am starting to see how you can represent all these local decisions as
priority numbers and then traverse this weighted tree to find which memory
region should be accessed (memory registration _has_ to be hierarchical
for that to work in a meaningful way). I still don't see why it is better
than flattening the tree at the point of conflict.
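
For the record, the flattening I have in mind is nothing fancy; roughly
something like this (pure sketch: FlatView/flatview_insert are invented
names and I make no claim this matches your RFC internals):

  /* Walk the tree and render it into a flat view.  Children are
   * visited in ascending priority order (the subregion list is assumed
   * to be kept sorted), so a higher-priority sibling simply overwrites
   * whatever a lower-priority one mapped at the same range. */
  static void render(FlatView *view, MemoryRegion *mr, uint64_t base)
  {
      MemoryRegion *sub;

      if (mr->terminates) {
          flatview_insert(view, base, mr->size, mr);
          return;
      }
      QTAILQ_FOREACH(sub, &mr->subregions, subregions_link) {
          render(view, sub, base + sub->addr);
      }
  }

E.g. the APIC page registered at the CPU level with a priority above the
host bridge container ends up on top of whatever PCI mapped there.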
 
> > 
> >>       Alternatively, you could add a prio offset to all mappings when
> >> climbing one level up, provided that offset is smaller than the prio
> >> range locally available to each level.
> >>
> > Then a memory region's final priority will depend on the tree height.
> > If two disjoint tree branches of different heights claim the same memory
> > region, the higher one will have the higher priority. I think this
> > priority management is a can of worms.
> 
> It is not, as it remains a purely local thing and helps implement the
> sketched scenarios. Believe me, I tried to fix PAM/SMRAM already.
If it remains a local thing then I misunderstand what you mean by
"could add a prio offset to all mappings when climbing one level up".
That doesn't sound like a local thing to me any more.
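
Just to spell out why this doesn't look local to me: if I read the
proposal right, the effective priority of a leaf ends up as something
like

  effective_prio = local_prio + depth * OFFSET

  device behind two bridges:        0 + 3 * OFFSET
  device right on the system bus:   0 + 1 * OFFSET

so with both registered at the default local priority, the deeper one
wins purely because of where it happens to sit in the tree.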

What problem did you have with PAM, apart from the low number of KVM
slots, btw?

> 
> > 
> > Only the lowest level (aka system bus) will use memory API directly.
> 
> Not necessarily. It depends on how much added value buses like PCI or
> ISA or whatever can offer for managing I/O regions. For some purposes,
> it may as well be fine to just call the memory_* service directly and
> pass the result of some operation to the bus API later on.
That depends on which memory_* service you are talking about. Just creating
an unattached memory region is OK. But if two independent pieces of code
want to map two different memory regions at the same physical address, I do
not see who will resolve the conflict.
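
E.g. this split is fine (mydev_ops, pci_address_space and bar_addr are
made-up names, and the exact memory_region_init_io signature may differ
from the RFC):

  /* Device code: create the region, but do not map it anywhere yet. */
  memory_region_init_io(&s->mmio, &mydev_ops, s, "mydev-mmio", 0x1000);

  /* Bus code: decide later where the region actually lives. */
  memory_region_add_subregion(pci_address_space, bar_addr, &s->mmio);

But when two unrelated callers both do the add_subregion() for the same
physical range, neither of them is in a position to arbitrate.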

> 
> > PCI
> > device will call PCI subsystem. PCI subsystem, instead of assigning
> > arbitrary priorities to all overlappings,
> 
> Again: PCI will _not_ assign arbitrary priorities but only
> MEMORY_REGION_DEFAULT_PRIORITY, likely 0.
That is as arbitrary as it can get. Just assigning a
MEMORY_REGION_DEFAULT_PRIORITY of 2^0xfff would work equally well, so what
is not arbitrary about that number?

BTW, why wouldn't the PCI layer assign different priorities to overlapping
regions to let the core know which one should actually be available? Why
leave this decision to the core if it clearly belongs to PCI?

> 
> > may just resolve them and pass
> > flattened view to the chipset. Chipset in turn will look for overlappings
> > between PCI memory areas and RAM/ISA/other memory areas that are outside
> > of PCI windows and resolve all those passing the flattened view to system
> > bus where APIC/PCI conflict will be resolved and finally memory API will
> > be used to create memory map. In such a model I do not see the need for
> > priorities. All overlappings are resolved in the most logical place,
> > the one that has the best knowledge about how to resolve the conflict.
> > There will be no code duplication. Overlap resolution code will be in a
> > separate library used by all layers.
> 
> That does not specify how the PCI bridge or the chipset will tell that
> overlapping resolution lib _how_ overlapping regions shall be translated
> into a flat representation. And precisely here come priorities into
> play. It is the way to tell that lib either "region A shall override
> region B" if A has higher prio or "if region A and B overlap, do
> whatever you want" if both have the same prio.
> 
Yep! And the question is why this shouldn't be done at the level that
knows the most about the conflict, rather than propagated to the core. I
am not arguing that priorities do not exist! Obviously they do. I am
questioning the usefulness of making priorities part of the core memory
API.
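
Put differently, the semantics you describe would look roughly like this
with the proposed API (a sketch using the RFC names):

  /* "region A shall override region B": */
  memory_region_add_subregion_overlap(parent, addr, &region_b, 0);
  memory_region_add_subregion_overlap(parent, addr, &region_a, 1);

  /* "if A and B overlap, do whatever you want": */
  memory_region_add_subregion_overlap(parent, addr, &region_a, 0);
  memory_region_add_subregion_overlap(parent, addr, &region_b, 0);

and my question remains why the layer that actually knows about the
conflict cannot make that call itself and pass the already flattened
result up, instead of encoding its decision as numbers for the core.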

--
                        Gleb.


