
From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] use little granularity lock to substitue qemu_mutex_lock_iothread
Date: Fri, 22 Jun 2012 17:56:30 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1

On 06/22/2012 05:27 PM, Jan Kiszka wrote:
> On 2012-06-22 23:44, Anthony Liguori wrote:
>> 1) unlock iothread before entering the do {} loop in kvm_cpu_exec()
>>     a) reacquire the lock after the loop
>>     b) reacquire the lock in kvm_handle_io()
>>     c) introduce an unlocked memory accessor that, for now, just takes
>> the iothread lock and calls cpu_physical_memory_rw()

> Right, that's what we have here as well. The latter is modeled as a
> so-called "I/O pathway", a thread-based execution context for
> frontend/backend pairs with some logic to transfer certain I/O requests
> asynchronously to the pathway thread.

Interesting, so the VCPU threads always hold the iothread mutex but some requests are routed to other threads?

I hadn't considered a design like that. I've been thinking about a long term architecture that's a bit more invasive.

What we think of as the I/O thread today wouldn't be special. It would be one of N I/O threads, all running separate copies of the main loop. All of the functions that defer dispatch to a main loop would take a context as an argument, and devices would essentially have a "vector" array of main loops as input.

So virtio-net probably would have two main loop "vectors" since it would like to schedule tx and rx independently. There's nothing that says that you can't pass the same main loop context for each vector but that's a configuration choice.

Dispatch from VCPU context would behave the same as it does today, but obviously per-device locking is needed.

> The tricky part was to get nested requests right, i.e. when a request
> triggers another one from within the device model. This is where things
> get ugly. In theory, you can end up with a vm deadlock if you just apply
> per-device locking. I'm currently trying to rebase our patches, review
> and document the logic behind it.

I really think the only way to solve this is to separate map()'d DMA access (where the device really wants to deal with RAM only) from copy-based access (where devices map DMA to other devices).

For copy-based access, we really ought to move to a callback based API. It adds quite a bit of complexity but it's really the only way to solve the problem robustly.

>> 2) focus initially on killing the lock in kvm_handle_io()
>>     a) the ioport table is pretty simplistic, so adding fine-grained
>> locking won't be hard
>>     b) reacquire the lock right before ioport dispatch
>>
>> 3) allow registering ioport handlers without the dispatch function
>> carrying the iothread lock
>>     a) this is mostly memory API plumbing

> We skipped this as our NICs didn't do PIO, but you clearly need it for


>> 4) focus on going back and adding fine-grained locking to the
>> cpu_physical_memory_rw() accessor

> In the end, PIO and MMIO should use the same patterns - and will face
> the same challenges. Ideally, we model things very similarly right from
> the start.


> And then there is also
>
> 5) provide direct IRQ delivery from the device model to the IRQ chip.
> That's much like what we need for VFIO and KVM device assignment. But
> here we won't be able to cheat and ignore correct generation of vmstates
> of the bypassed PCI host bridges etc... Which leads me to that other
> thread about how to handle this for PCI device pass-through.
> Contributions to that discussion are welcome as well.

I think you mean direct delivery to the in-kernel IRQ chip. I'm still thinking about this, so I don't have a plan yet that I'm ready to share. I have some ideas, though.

>> Note that whenever possible, we should be using rwlocks instead of a
>> normal mutex. In particular, for the ioport data structures, a rwlock
>> seems pretty obvious.

> I think we should mostly be fine with a "big hammer" rwlock: unlocked
> read access from VCPUs and iothreads, and vmstop/resume around
> modifications of fast path data structures (like the memory region
> hierarchy or the PIO table). Where that's not sufficient, RCU will be
> needed. Sleeping rwlocks have horrible semantics (specifically when
> thread priorities come into play) and are performance-wise inferior. We
> should avoid them completely.

Yes, I think RCU is inevitable here but I think starting with rwlocks will help with the big refactoring.

>> To be clear, I'm not advocating introducing cpu_lock. We should do
>> whatever makes the most sense to not have to hold the iothread lock
>> while processing an exit from KVM.

> Good that we agree. :)

>> Note that this is an RFC; the purpose of this series is to have this
>> discussion :-)

> Yep, I think we have it now ;). Hope I can contribute some code bits to
> it soon, though I didn't schedule this task for the next week.

Great! If you have something you can share, I'd be eager to look at it regardless of the condition of the code.


Anthony Liguori

