[Qemu-devel] Re: [RFC] Moving the kvm ioapic, pic, and pit back to userspace

qemu-devel

From:	Anthony Liguori
Subject:	[Qemu-devel] Re: [RFC] Moving the kvm ioapic, pic, and pit back to userspace
Date:	Mon, 07 Jun 2010 17:23:57 -0500
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100423 Lightning/1.0b1 Thunderbird/3.0.4

On 06/07/2010 01:42 PM, Avi Kivity wrote:

On 06/07/2010 08:04 PM, Anthony Liguori wrote:
I think we could also move the local APIC.
I'm not even sure we can safely move the ioapic/pic (mostly due to churn). But the local APIC is so heavily accessed by the guest that it's impossible to move it. Run an ftrace one day, especially on an smp guest. Every IPI requires several APIC accesses. Before a halt a tickless kernel sets the wakeup timer. EOIs.
To optimize device models, we've tended to put the full device model in the kernel whereas the hardware vendors have tended to put only the fast paths of the device models in hardware.
For instance, we could introduce a userspace interface similar to vapic support whereby a shared page that mapped the APIC's layout was used with a mask to select which registers trapped on read/write.
That leads to very problematic interfaces. When you separate along a device boundary, you have a spec that defines the software interfaces. When you separate along a boundary that you define, it's up to you to get everything right.
In fact with the ioapic/pic/lapic one of the problems is that the interconnection between the devices is not well defined, and that's where we have bugs.
That said, I can understand an argument that the local APIC is part of the CPU state since it's a very special type of device.
A better example would be a generic counter kernel mechanism. I can envision such a device as doing nothing more than providing a read-only view of a counter with a userspace configurable divider and width. Any write to the counter or read of any other byte outside the counter register would result in a trap to userspace.
What about latches? Byte access to word registers? There will be as many special cases as there are timers.
If the kernel supported a bytecode/jit facility I'd happily use that to download portions of the device model into the kernel.
That should allow both the PIT and the HPET to be accelerated with minimal effort in the kernel.
IMO it's probably more effort than porting HPET to the kernel. Try outlining an interface that supports PIT, HPET, RTC, and ACPI PMTIMER.


I was referring specifically to time sources, not time events.

An accelerated counter for HPET is pretty trivial. It's a 32-bit register that's actually a nanosecond value in qemu. We need to be able to set an offset from the host wall clock time, a means to stop it, and a means to start it.
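
A minimal sketch of the three operations listed above (set an offset, stop, start), assuming the caller passes in the current host clock reading in nanoseconds. The struct and function names are hypothetical and are not taken from qemu or kvm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for an accelerated counter; all names are illustrative. */
struct accel_counter {
    uint64_t offset_ns; /* guest value = host_ns + offset_ns (mod 2^64) while running */
    uint64_t frozen_ns; /* value reported while the counter is stopped */
    bool running;
};

/* Read the counter given the current host clock reading in nanoseconds. */
static uint64_t accel_counter_read(const struct accel_counter *c, uint64_t host_ns)
{
    return c->running ? host_ns + c->offset_ns : c->frozen_ns;
}

/* Stop: freeze the current value so later reads no longer advance. */
static void accel_counter_stop(struct accel_counter *c, uint64_t host_ns)
{
    c->frozen_ns = accel_counter_read(c, host_ns);
    c->running = false;
}

/* Start: pick the offset that makes the counter resume from frozen_ns. */
static void accel_counter_start(struct accel_counter *c, uint64_t host_ns)
{
    assert(!c->running);
    c->offset_ns = c->frozen_ns - host_ns; /* unsigned wraparound is intended */
    c->running = true;
}

/* The guest-visible register is 32 bits wide, so expose the low word. */
static uint32_t accel_counter_read32(const struct accel_counter *c, uint64_t host_ns)
{
    return (uint32_t)accel_counter_read(c, host_ns);
}
```

For live migration, only frozen_ns (or the current value) and the running flag would need to be transferred; the offset can be recomputed against the destination host clock.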

The PIT is latched so the kernel needs to know enough about how to decode the PIT state to understand the latching. There's very little state associated with latching though so I don't think this is a huge problem. It's a fixed value write to a fixed register followed by a read to a fixed register. The act of latching doesn't affect the state beyond the fact that you need to save the latched value in the event that you have a live migration before reading the latched value.

The PMTIMER is also pretty straightforward. It's a variable port address (that's fixed during execution).
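
As a rough illustration, a PMTIMER read could be derived from a nanosecond clock as below. The 3.579545 MHz rate and the 24- or 32-bit width come from the ACPI specification; the struct, the function names, and the idea of passing guest time in as a parameter are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PM_TIMER_HZ  3579545ULL    /* ACPI PM timer frequency */
#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical configuration, set once and fixed while the guest runs. */
struct pmtimer {
    uint16_t port_base;  /* I/O port of the timer register */
    unsigned width_bits; /* 24 or 32 */
};

/* Convert elapsed guest nanoseconds to timer ticks, truncated to the width. */
static uint32_t pmtimer_value(const struct pmtimer *t, uint64_t guest_ns)
{
    assert(t->width_bits == 24 || t->width_bits == 32);
    /* Split seconds and remainder so the multiplication cannot overflow. */
    uint64_t ticks = (guest_ns / NSEC_PER_SEC) * PM_TIMER_HZ +
                     (guest_ns % NSEC_PER_SEC) * PM_TIMER_HZ / NSEC_PER_SEC;
    uint64_t mask = (1ULL << t->width_bits) - 1;
    return (uint32_t)(ticks & mask);
}

/* Returns true if handled here; false means "exit to userspace". */
static bool pmtimer_fast_read(const struct pmtimer *t, uint16_t port,
                              unsigned size, uint64_t guest_ns, uint32_t *out)
{
    if (port != t->port_base || size != 4)
        return false;
    *out = pmtimer_value(t, guest_ns);
    return true;
}
```

The port value would be whatever the guest firmware advertises; the sketch only matches a 4-byte read of that one register and leaves everything else to the slow path.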

Even if we require three separate interfaces, the interfaces are so simple that it seems like an obvious win.

I'd be in favor of a straight port to userspace. We already have the interfaces to communicate with an external device model for these devices so let's just take the kernel code and stick it into dedicated threads in userspace.
Currently we support an all-or-nothing approach. I don't think local APIC in userspace is worthwhile. Esp. as it will slow down vhost and assigned devices significantly - interrupts will have to be mediated by userspace.

Yeah, as I said, I can understand the arguments for keeping the lapic in the kernel.

I think it's easier to then work to merge the two bits of code in the same tree than it is to try and take out-of-tree code and merge it incrementally.
Are you talking about qemu.git/qemu-kvm.git? That's the least of my concerns, I'm worried about kvm.git.


qemu.git.

5. Risk
We may find out after all this is implemented that performance is not acceptable and all the work will have to be dropped.
That's another advantage to a straight port to userspace. We can collect performance data with only a modest amount of engineering effort.
Port what exactly? We have a userspace irqchip implementation. What we don't have is just the ioapic/pic/pit in userspace, and the only way to try it out is to implement the whole thing.

If you take the kernel code and do a pretty straight port, switching kernel functions to libc functions and maintaining all the existing locking via pthreads, you could then implement a very simple MMIO/PIO dispatch mechanism in the kvm code that short-circuited those devices before we ever hit the qemu_mutex and the traditional qemu code paths. It should be a relatively easy conversion and it gives a proper vehicle for doing experimentation.
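
One possible shape for such a dispatch mechanism, shown here for PIO only, is sketched below. Each ported device keeps its own pthread mutex, and a lookup miss tells the caller to fall through to the normal path under the global lock. All names, including the toy scratch-register device, are hypothetical and are not existing qemu or kvm interfaces.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t (*pio_read_fn)(void *opaque, uint16_t port);
typedef void (*pio_write_fn)(void *opaque, uint16_t port, uint8_t val);

/* Hypothetical registration record for a device ported from the kernel. */
struct fast_pio_dev {
    uint16_t base, len;   /* claimed port range [base, base + len) */
    pthread_mutex_t lock; /* per-device lock replacing the kernel lock */
    pio_read_fn read;
    pio_write_fn write;
    void *opaque;
};

static struct fast_pio_dev *fast_pio_find(struct fast_pio_dev *devs, size_t n,
                                          uint16_t port)
{
    for (size_t i = 0; i < n; i++)
        if (port >= devs[i].base && (uint16_t)(port - devs[i].base) < devs[i].len)
            return &devs[i];
    return NULL;
}

/* Returns true if a fast device handled the access; false means the caller
 * should take the global lock and use the traditional code path. */
static bool fast_pio_dispatch(struct fast_pio_dev *devs, size_t n,
                              uint16_t port, bool is_write, uint8_t *val)
{
    assert(val != NULL);
    struct fast_pio_dev *dev = fast_pio_find(devs, n, port);
    if (!dev)
        return false;
    pthread_mutex_lock(&dev->lock);
    if (is_write)
        dev->write(dev->opaque, port, *val);
    else
        *val = dev->read(dev->opaque, port);
    pthread_mutex_unlock(&dev->lock);
    return true;
}

/* Toy device used only to exercise the dispatcher: one scratch byte. */
static uint8_t scratch_read(void *opaque, uint16_t port)
{
    (void)port;
    return *(uint8_t *)opaque;
}

static void scratch_write(void *opaque, uint16_t port, uint8_t val)
{
    (void)port;
    *(uint8_t *)opaque = val;
}
```

A linear scan is enough for the handful of devices under discussion; an MMIO variant would look the same with wider addresses and access sizes.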

In fact, you could pretty quickly determine viability by porting the PIT to userspace and implementing a vpit interface in the kernel that allowed the channel 0 counters to be latched and read within lightweight exits.


Regards,

Anthony Liguori
