From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] use little granularity lock to substitue qemu_mutex_lock_iothread
Date: Fri, 22 Jun 2012 15:11:08 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1
On 06/22/2012 05:37 AM, Jan Kiszka wrote:
> On 2012-06-22 12:24, liu ping fan wrote:
>> On Thu, Jun 21, 2012 at 11:23 PM, Jan Kiszka <address@hidden> wrote:
>>> On 2012-06-21 16:49, Liu Ping Fan wrote:
>>>> Nowadays, we use qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread()
>>>> to protect the race to access the emulated dev launched by vcpu threads
>>>> & iothread. But this lock is too big. We can break it down. These
>>>> patches separate the CPUArchState's protection from the other devices,
>>>> so we can have a per-cpu lock for each CPUArchState, not the big lock
>>>> any longer.
>>>
>>> Anything that reduces lock dependencies is generally welcome. But can
>>> you specify in more details what you gain, and under which conditions?
>>
>> In fact, there are several steps to break down the Qemu big lock. This
>> step just aims to shrink the code area protected by
>> qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread(). And I am
>> working on the following steps, which focus on breaking down the big
>> lock when calling handle_{io,mmio}
>
> Then let us discuss the strategy. This is important as it is unrealistic
> to break up the lock for all code paths. We really need to focus on
> goals that provide benefits for relevant use cases.
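[Editor's illustration: the per-CPUArchState lock proposed in the quoted series amounts to roughly the following sketch. All names here are invented for the example; this is not the patch's actual code.]

```c
/* Sketch of the per-CPU lock idea: each vCPU protects its own
 * architectural state with its own mutex, so touching one CPU's state
 * no longer serializes against every other thread on the single
 * global mutex.  Names are invented for illustration. */
#include <pthread.h>
#include <stdint.h>

typedef struct CPUState {
    pthread_mutex_t lock;        /* per-CPU lock replacing the big lock */
    uint64_t interrupt_request;  /* stand-in for per-CPU state */
} CPUState;

void cpu_lock(CPUState *cpu)   { pthread_mutex_lock(&cpu->lock); }
void cpu_unlock(CPUState *cpu) { pthread_mutex_unlock(&cpu->lock); }

/* Update one vCPU's state under its own lock only; other vCPUs and
 * the iothread are never blocked by this. */
void cpu_set_interrupt(CPUState *cpu, uint64_t mask)
{
    cpu_lock(cpu);
    cpu->interrupt_request |= mask;
    cpu_unlock(cpu);
}
```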
Stefan put together a proof of concept that implemented the data-plane portion of virtio-blk in a separate thread. This is possible because of I/O eventfd (we were able to select() on that fd in a separate thread).
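[Editor's illustration: the eventfd/select() pattern the prototype relies on can be sketched as below. This is a minimal standalone Linux example, not QEMU code; the function names are invented.]

```c
/* Minimal sketch of the eventfd + select() pattern: a notifier (in the
 * real case, KVM's ioeventfd on a guest I/O access) writes to an
 * eventfd, and a dedicated I/O thread select()s on that fd and handles
 * the request without ever taking the global mutex.  Linux-only;
 * names are invented for this example. */
#include <sys/eventfd.h>
#include <sys/select.h>
#include <unistd.h>
#include <stdint.h>

/* Signal the I/O thread that a request is pending. */
int notify_request(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof(one)) == sizeof(one) ? 0 : -1;
}

/* Wait for a notification; returns the number of coalesced events,
 * 0 on timeout, or -1 on error. */
int64_t wait_for_request(int efd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };

    FD_ZERO(&rfds);
    FD_SET(efd, &rfds);
    int r = select(efd + 1, &rfds, NULL, NULL, &tv);
    if (r <= 0) {
        return r;                 /* 0 = timeout, -1 = error */
    }
    uint64_t count;
    if (read(efd, &count, sizeof(count)) != sizeof(count)) {
        return -1;
    }
    return (int64_t)count;        /* eventfd coalesces multiple writes */
}
```

In non-semaphore mode the eventfd read drains and resets the counter, so back-to-back notifications coalesce into one wakeup, which is exactly the batching behavior that makes the data-plane thread cheap.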
The performance difference between virtio-blk-pci and virtio-blk-data-plane is staggering when dealing with a very large storage system.
So we'd like to get the infrastructure in place where we can start multithreading devices in QEMU so we can integrate this work.
The basic plan is introduce granular locking starting at the KVM dispatch level until we can get to MemoryRegion dispatch. We'll then have some way to indicate that a MemoryRegion's callbacks should be invoked without holding the qemu global mutex.
We can then convert devices one at a time. While the threading in the KVM code is certainly complex, it's also relatively isolated from the rest of QEMU. So we don't have to worry about auditing large subsystems for re-entrancy safety.
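[Editor's illustration: the "invoke a MemoryRegion's callbacks without the global mutex" step might look conceptually like the following. All names are invented for the sketch; this is not the QEMU memory API.]

```c
/* Rough sketch of the dispatch-level idea: a region that opts in to
 * unlocked dispatch has its callbacks invoked under its own per-device
 * lock instead of the global mutex, so converted devices run in
 * parallel while unconverted ones keep the old big-lock behavior.
 * All names are invented for illustration. */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;

typedef struct Region {
    bool unlocked;             /* device opted in to granular locking */
    pthread_mutex_t dev_lock;  /* per-device lock, used when unlocked */
    uint64_t (*read)(struct Region *r, uint64_t addr);
    uint64_t state;            /* stand-in for device state */
} Region;

/* Dispatch a read, choosing the lock based on the region's flag. */
uint64_t dispatch_read(Region *r, uint64_t addr)
{
    pthread_mutex_t *lock = r->unlocked ? &r->dev_lock : &global_mutex;
    pthread_mutex_lock(lock);
    uint64_t val = r->read(r, addr);
    pthread_mutex_unlock(lock);
    return val;
}

/* Example callback: a register whose value increments on each read. */
static uint64_t counter_read(Region *r, uint64_t addr)
{
    (void)addr;
    return ++r->state;
}
```

The point of the flag is incremental conversion: dispatch stays correct for every device, and only audited devices ever see their callbacks run outside the big lock.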
Once we have unlocked MemoryRegions, we can start writing some synthetic test cases to really stress the locking code too.
Regards, Anthony Liguori
Jan