From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] Memory API
Date: Wed, 18 May 2011 12:04:13 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110424 Lightning/1.0b2 Thunderbird/3.1.10
On 05/18/2011 11:41 AM, Avi Kivity wrote:
> On 05/18/2011 07:33 PM, Anthony Liguori wrote:
>> On 05/18/2011 10:23 AM, Avi Kivity wrote:
>>>> The tricky part is wiring this up efficiently for TCG, i.e. in
>>>> QEMU's softmmu. I played with passing the issuing CPUState (or NULL
>>>> for devices) down the MMIO handler chain. Not totally beautiful, as
>>>> decentralized dispatching was still required, but at least only
>>>> moderately invasive. Maybe your API allows for cleaning up the
>>>> management and dispatching part; need to rethink...
>>> My suggestion is the opposite - have a different MemoryRegion for
>>> each (e.g. CPUState::memory). Then the TLBs will resolve to a
>>> different ram_addr_t for the same physical address, for the local
>>> APIC range.
>> I don't understand the different ram_addr_t part.
> The TLBs map a virtual address to a ram_addr_t.

The TLB actually maps virtual addresses to host virtual addresses. Virtual addresses that map to I/O memory never get stored in the TLB.
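A minimal sketch of that fast-path behavior, with hypothetical names (`tlb_entry`, `tlb_lookup`, `TLB_MMIO_FLAG` are illustrative, not QEMU's actual `CPUTLBEntry` fields): a TLB entry for a RAM page stores an addend such that host address = guest virtual address + addend, while an I/O page is flagged so the access falls out of the fast path into the MMIO dispatch chain.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified model of a softmmu TLB entry. */
#define TLB_MMIO_FLAG 1u  /* low bit marks an I/O page: take the slow path */

typedef struct {
    uintptr_t vaddr_tag;  /* guest virtual page address (plus flag bits) */
    intptr_t  addend;     /* host_addr = guest_vaddr + addend for RAM pages */
} tlb_entry;

/* Resolve a guest virtual address to a host pointer, or NULL for MMIO:
 * I/O accesses are never satisfied from the TLB fast path. */
static void *tlb_lookup(const tlb_entry *e, uintptr_t vaddr)
{
    if (e->vaddr_tag & TLB_MMIO_FLAG) {
        return NULL;  /* must dispatch through the I/O handler chain */
    }
    return (void *)(vaddr + e->addend);
}
```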
You don't need separate I/O registration addresses in order to do per-CPU dispatch provided that you route the dispatch routines through the CPUs first.
>> Overlapping regions can be handled differently at each level. For
>> instance, if a PCI device registers an I/O region at the same location
>> as the APIC, the APIC always wins, because the PCI bus will never see
>> the access.
> That's inefficient, since you always have to traverse the hierarchy.
Is efficiency really a problem here? Besides, I don't think that's correct: you're adding at most 2-3 extra function pointer invocations per access, and I don't think you can call that inefficient.
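The hierarchy traversal being debated can be sketched as a short walk down a list of layers, where each layer either claims the access or passes it to the layer below; the names here (`mem_layer`, `dispatch_read`) are illustrative, not the proposed API.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch of hierarchical MMIO dispatch. */
typedef struct mem_layer mem_layer;
struct mem_layer {
    uint64_t base, size;             /* range this layer claims */
    uint32_t (*read)(uint64_t addr); /* handler if claimed */
    mem_layer *next;                 /* lower layer (e.g. the PCI bus) */
};

/* Walk down the hierarchy: the first layer covering the address wins,
 * so an APIC layer in front of the PCI bus shadows a PCI BAR registered
 * at the same physical address - the PCI bus never sees the access. */
static uint32_t dispatch_read(mem_layer *top, uint64_t addr)
{
    for (mem_layer *l = top; l != NULL; l = l->next) {
        if (addr >= l->base && addr < l->base + l->size) {
            return l->read(addr);
        }
    }
    return 0xffffffffu;  /* unassigned address */
}
```

The per-access cost is one range check and one indirect call per layer, which is the "2-3 extra function pointer invocations" figure above.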
>> You cannot do this properly with a single dispatch table, because the
>> behavior depends on where in the hierarchy the I/O is being handled.
> You can. When you have a TLB miss, you walk the memory hierarchy (which
> is per-CPU) and end up with a ram_addr_t which is stowed in the TLB
> entry.
I think we're overloading the term TLB. Are you referring to l1_phys_map as the TLB? I thought Jan was referring to the actual emulated TLB that TCG uses.
> Further accesses dispatch via this ram_addr_t, without taking the CPU
> into consideration (the TLB is, after all, already per-CPU). Since each
> APIC will have its own ram_addr_t, we don't need per-CPU dispatch.
You need to have per-CPU l1_phys_maps, which would result in quite a lot of additional memory overhead.
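The per-CPU resolution scheme being proposed can be sketched as follows; everything here is an assumption for illustration (the identity mapping, the alias base, and the name `resolve_phys` are made up, and ram_addr_t is modeled as a plain uint64_t). The point is only that the same physical address resolves to a different ram_addr_t per CPU for the local APIC window, so a shared TLB-fill path can cache the result without consulting the CPU again.

```c
#include <stdint.h>

#define APIC_BASE 0xfee00000u
#define APIC_SIZE 0x1000u

typedef uint64_t ram_addr_t;  /* modeled as uint64_t for this sketch */

/* Hypothetical per-CPU resolution at TLB-fill time: the local APIC range
 * resolves to a distinct per-CPU alias; everything else is identity-mapped
 * here purely for the sake of the example. */
static ram_addr_t resolve_phys(int cpu_index, uint64_t phys)
{
    if (phys >= APIC_BASE && phys < APIC_BASE + APIC_SIZE) {
        /* carve a private alias per CPU above the shared address space */
        return 0x100000000ull + (uint64_t)cpu_index * APIC_SIZE
               + (phys - APIC_BASE);
    }
    return (ram_addr_t)phys;  /* shared by all CPUs */
}
```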
Regards,

Anthony Liguori