

From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] use little granularity lock to substitute qemu_mutex_lock_iothread
Date: Fri, 22 Jun 2012 15:11:08 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1

On 06/22/2012 05:37 AM, Jan Kiszka wrote:
On 2012-06-22 12:24, liu ping fan wrote:
On Thu, Jun 21, 2012 at 11:23 PM, Jan Kiszka <address@hidden> wrote:
On 2012-06-21 16:49, Liu Ping Fan wrote:
Nowadays, we use qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread() to
protect against races when the emulated devices are accessed by vcpu threads and
the iothread.

But this lock is too big. We can break it down. These patches separate the
CPUArchState's protection from that of the other devices, so we can have a
per-cpu lock for each CPUArchState instead of the single big lock.
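A minimal sketch of the per-CPU lock idea (the type and function names below are invented for illustration and are not taken from the posted patches):

    #include <pthread.h>

    /* Stand-in for CPUArchState; only the locking part is shown. */
    typedef struct {
        pthread_mutex_t cpu_lock;   /* protects this vCPU's state only */
        /* ... registers, TLB, and other per-CPU state ... */
    } CPUStateSketch;

    static void cpu_state_lock(CPUStateSketch *cpu)
    {
        pthread_mutex_lock(&cpu->cpu_lock);
    }

    static void cpu_state_unlock(CPUStateSketch *cpu)
    {
        pthread_mutex_unlock(&cpu->cpu_lock);
    }

    int main(void)
    {
        CPUStateSketch cpu = { .cpu_lock = PTHREAD_MUTEX_INITIALIZER };

        /* A vcpu thread takes only its own lock while touching its
         * CPUArchState, instead of qemu_mutex_lock_iothread(). */
        cpu_state_lock(&cpu);
        /* ... access per-CPU state ... */
        cpu_state_unlock(&cpu);
        return 0;
    }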

Anything that reduces lock dependencies is generally welcome. But can
you specify in more detail what you gain, and under which conditions?

In fact, there are several steps to breaking down the QEMU big lock. This
step just aims to shrink the code region protected by
qemu_mutex_lock_iothread()/qemu_mutex_unlock_iothread(). I am working on the
following steps, which focus on breaking down the big lock around the calls
to handle_{io,mmio}.

Then let us discuss the strategy. This is important as it is unrealistic
to break up the lock for all code paths. We really need to focus on
goals that provide benefits for relevant use cases.

Stefan put together a proof of concept that implemented the data-plane portion of virtio-blk in a separate thread. This is possible because of ioeventfd (we were able to select() on that fd in a separate thread).
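For context, the pattern that makes this possible is the eventfd + select() loop sketched below; the virtio/KVM ioeventfd registration is omitted and the names are illustrative, not QEMU's:

    #include <sys/eventfd.h>
    #include <sys/select.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <unistd.h>

    static void *dataplane_thread(void *opaque)
    {
        int efd = *(int *)opaque;

        for (;;) {
            fd_set rfds;
            FD_ZERO(&rfds);
            FD_SET(efd, &rfds);

            if (select(efd + 1, &rfds, NULL, NULL, NULL) < 0) {
                break;
            }

            uint64_t count;
            if (read(efd, &count, sizeof(count)) == sizeof(count)) {
                /* Guest kicked the virtqueue: process requests here,
                 * without ever taking the global mutex. */
            }
        }
        return NULL;
    }

    int main(void)
    {
        int efd = eventfd(0, 0);   /* in QEMU this fd comes from ioeventfd */
        pthread_t tid;
        uint64_t kick = 1;

        pthread_create(&tid, NULL, dataplane_thread, &efd);
        write(efd, &kick, sizeof(kick));   /* pretend the guest kicked us */
        sleep(1);                          /* let the worker observe it */
        return 0;
    }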

The performance difference between virtio-blk-pci and virtio-blk-data-plane is staggering when dealing with a very large storage system.

So we'd like to get the infrastructure in place where we can start multithreading devices in QEMU so that we can integrate this work.

The basic plan is to introduce granular locking, starting at the KVM dispatch level, until we can get down to MemoryRegion dispatch. We'll then have some way to indicate that a MemoryRegion's callbacks should be invoked without holding the QEMU global mutex.
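To make the goal concrete, here is an illustrative sketch of a device that serializes its own MMIO handlers with a private lock instead of the global mutex; the callback signatures are simplified stand-ins for MemoryRegionOps, not the real prototypes:

    #include <pthread.h>
    #include <stdint.h>

    typedef struct {
        pthread_mutex_t lock;   /* per-device lock */
        uint64_t reg;           /* some device state */
    } DemoDevice;

    /* Would be wired up as the region's read callback. */
    static uint64_t demo_mmio_read(void *opaque, uint64_t addr, unsigned size)
    {
        DemoDevice *d = opaque;
        pthread_mutex_lock(&d->lock);   /* not qemu_mutex_lock_iothread() */
        uint64_t val = d->reg;
        pthread_mutex_unlock(&d->lock);
        return val;
    }

    static void demo_mmio_write(void *opaque, uint64_t addr, uint64_t val,
                                unsigned size)
    {
        DemoDevice *d = opaque;
        pthread_mutex_lock(&d->lock);
        d->reg = val;
        pthread_mutex_unlock(&d->lock);
    }

    int main(void)
    {
        DemoDevice d = { .lock = PTHREAD_MUTEX_INITIALIZER, .reg = 0 };

        demo_mmio_write(&d, 0x0, 42, 4);
        return demo_mmio_read(&d, 0x0, 4) == 42 ? 0 : 1;
    }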

We can then convert devices one at a time.

While the threading in the KVM code is certainly complex, it's also relatively isolated from the rest of QEMU. So we don't have to worry about auditing large subsystems for re-entrancy safety.

Once we have unlocked MemoryRegions, we can start writing some synthetic test cases to really stress the locking code too.
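In the same spirit, here is a toy example of what such a synthetic stress test could look like (generic pthread code, not tied to the MemoryRegion API): several threads hammer one lock-protected counter and we check that no update was lost.

    #include <pthread.h>
    #include <stdio.h>

    enum { NTHREADS = 8, ITERS = 100000 };

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static long counter;

    static void *hammer(void *arg)
    {
        (void)arg;
        for (int i = 0; i < ITERS; i++) {
            pthread_mutex_lock(&lock);
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tids[NTHREADS];
        long expected = (long)NTHREADS * ITERS;

        for (int i = 0; i < NTHREADS; i++) {
            pthread_create(&tids[i], NULL, hammer, NULL);
        }
        for (int i = 0; i < NTHREADS; i++) {
            pthread_join(tids[i], NULL);
        }

        printf("%s: counter=%ld expected=%ld\n",
               counter == expected ? "PASS" : "FAIL", counter, expected);
        return counter == expected ? 0 : 1;
    }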

Regards,

Anthony Liguori


Jan