
From: Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH -V3 09/32] virtio-9p: Implement P9_TWRITE/ Thread model in QEMU
Date: Tue, 30 Mar 2010 08:13:57 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091209 Fedora/3.0-4.fc12 Lightning/1.0pre Thunderbird/3.0

On 03/30/2010 05:24 AM, Avi Kivity wrote:
On 03/30/2010 12:23 AM, Anthony Liguori wrote:
It's not sufficient. If you have a single thread that runs both live migrations and timers, then timers will be backlogged behind live migration, or you'll have to yield often. This is regardless of the locking model (and of course having threads without fixing the locking is insufficient as well, live migration accesses guest memory so it needs the big qemu lock).


But what's the solution? Sending every timer in a separate thread? We'll hit the same problem if we implement an arbitrary limit to number of threads.

A completion that's expected to take a couple of microseconds at most can live in the iothread. A completion that's expected to take a couple of milliseconds wants its own thread. We'll have to think about anything in between.

vnc and migration can perform large amounts of work in a single completion; they're limited only by the socket send rate and our internal rate-limiting which are both outside our control. Most device timers are O(1). virtio completions probably fall into the annoying "have to think about it" department.
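To make the distinction concrete, here is a minimal sketch of what routing by expected cost could look like. This is not QEMU code: iothread_schedule() and the Completion struct are invented for illustration, under the assumption that the event loop has some way of queueing work.

/* Hypothetical sketch (not a QEMU API): route a completion either to the
 * event-loop thread or to a dedicated worker, based on its expected cost. */
#include <pthread.h>
#include <stdbool.h>

typedef void CompletionFunc(void *opaque);

typedef struct Completion {
    CompletionFunc *fn;
    void *opaque;
    bool long_running;      /* e.g. migration/VNC work vs. a device timer */
} Completion;

/* Assumed helper: queue work onto the single event-loop ("iothread"). */
extern void iothread_schedule(CompletionFunc *fn, void *opaque);

static void *worker_main(void *arg)
{
    Completion *c = arg;
    c->fn(c->opaque);       /* may block or run for milliseconds */
    return NULL;
}

static void dispatch_completion(Completion *c)
{
    if (!c->long_running) {
        /* O(microseconds): fine to run inline in the event loop. */
        iothread_schedule(c->fn, c->opaque);
    } else {
        /* O(milliseconds) or unbounded: give it its own thread so it
         * cannot back timers up behind it. */
        pthread_t tid;
        pthread_create(&tid, NULL, worker_main, c);
        pthread_detach(tid);
    }
}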

I think it may make more sense to have vcpu completions vs. iothread completions and make vcpu completions target short-lived operations.

What I'm skeptical of, is whether converting virtio-9p or qcow2 to handle each request in a separate thread is really going to improve things.

Currently qcow2 isn't even fully asynchronous, so it can't fail to improve things.

Unless it introduces more data corruptions which is my concern with any significant change to qcow2.

It's possible to move qcow2 to a thread without any significant change to it (simply run the current code in its own thread, protected by a mutex). Further changes would be very incremental.
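As a rough sketch of that "one thread, one mutex" step (qcow2_do_request() and complete_request() are placeholders standing in for the existing synchronous path and the completion hand-off, not real functions):

/* Minimal sketch, not the actual qcow2 code: run the existing synchronous
 * request path in a single dedicated thread, serialized by one mutex. */
#include <pthread.h>

/* Assumed stand-ins for the existing driver entry points. */
extern int qcow2_do_request(void *bs, void *req);    /* current sync code */
extern void complete_request(void *req, int ret);    /* hand back to iothread */

static pthread_mutex_t qcow2_lock = PTHREAD_MUTEX_INITIALIZER;

struct job { void *bs; void *req; };

static void *qcow2_worker(void *arg)
{
    struct job *j = arg;

    /* One big lock: the driver itself is unchanged and still assumes it
     * has exclusive access to its metadata. */
    pthread_mutex_lock(&qcow2_lock);
    int ret = qcow2_do_request(j->bs, j->req);
    pthread_mutex_unlock(&qcow2_lock);

    complete_request(j->req, ret);   /* back to the event loop */
    return NULL;
}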

But that offers no advantage over what we have, so it fails the proof-by-example that threading makes the situation better. To convert qcow2 to be threaded, I think you would have to wrap the whole thing in a lock, then convert the current asynchronous functions to synchronous (that's the whole point, right). At that point, you've regressed performance because you can only handle one read/write outstanding at a given time. So now you have to make the locking more granular, but because we do layered block devices, you've got to make most of the core block driver functions thread safe.

Once you get basic data operations concurrent, which I expect won't be so bad, then to get an improvement over the current code you have to allow simultaneous access to metadata, which is where I think the vast majority of the complexity will come from.

You could argue that we stick qcow2 into a thread and stop there, and that fixes the problems with synchronous data access. If that's the argument, then let's not even bother doing it at the qcow2 layer; let's just switch the block aio emulation to use a dedicated thread.
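For reference, the "dedicated thread for aio emulation" variant could look roughly like this. This is only a sketch: the queue, the wakeup pipe, and the AioReq layout are illustrative, not the actual posix-aio-compat code, and wakeup_pipe is assumed to have been set up with pipe() at init time.

/* Sketch: emulate AIO with one dedicated thread that performs the blocking
 * syscalls and wakes the event loop when a request completes. */
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct AioReq {
    int fd;
    void *buf;
    size_t len;
    off_t offset;
    int is_write;
    ssize_t ret;
    struct AioReq *next;
} AioReq;

static AioReq *queue_head;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;
static int wakeup_pipe[2];          /* read end is polled by the iothread */

static void *aio_thread(void *unused)
{
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (!queue_head) {
            pthread_cond_wait(&q_cond, &q_lock);
        }
        AioReq *req = queue_head;
        queue_head = req->next;
        pthread_mutex_unlock(&q_lock);

        /* The blocking syscall now happens off the iothread. */
        req->ret = req->is_write
            ? pwrite(req->fd, req->buf, req->len, req->offset)
            : pread(req->fd, req->buf, req->len, req->offset);

        /* Kick the event loop so it can run the completion callback. */
        char c = 0;
        (void)write(wakeup_pipe[1], &c, 1);
    }
    return NULL;
}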

Sticking the VNC server in its own thread would be fine. Trying to make the VNC server multithreaded, though, would be problematic.

Why would it be problematic? Each client gets its own threads; they don't interact at all, do they?

Dealing with locking of the core display which each client uses for rendering. Things like CopyRect will get ugly quickly.

Ultimately, this comes down to a question of lock granularity and thread granularity. I don't think it's a good idea to start with the assumption that we want extremely fine granularity. There's certainly very low hanging fruit with respect to threading.

Sure. Currently the hotspots are block devices (except raw) and hpet (seen with large Windows guests). The latter includes the bus lookup and hpet itself, hpet reads can be performed locklessly if we're clever.
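On the "clever" lockless hpet read, one plausible shape is a seqlock-style counter. This is purely an assumption about how it could be done, with made-up field names, not how QEMU implements it; the plain read of counter_offset is a data race tolerated here for brevity.

/* Sketch: writer publishes state under an even/odd sequence,
 * readers retry if they raced with an update. */
#include <stdatomic.h>
#include <stdint.h>

struct hpet_state {
    atomic_uint seq;            /* odd while an update is in progress */
    uint64_t counter_offset;    /* example field read on the hot path */
};

static void hpet_writer_update(struct hpet_state *s, uint64_t new_offset)
{
    atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed); /* -> odd */
    atomic_thread_fence(memory_order_release);
    s->counter_offset = new_offset;
    atomic_thread_fence(memory_order_release);
    atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed); /* -> even */
}

static uint64_t hpet_read_counter(struct hpet_state *s, uint64_t now)
{
    unsigned start;
    uint64_t off;

    do {
        start = atomic_load_explicit(&s->seq, memory_order_acquire);
        off = s->counter_offset;      /* plain read; retried if racing */
        atomic_thread_fence(memory_order_acquire);
    } while ((start & 1) ||
             start != atomic_load_explicit(&s->seq, memory_order_relaxed));

    return now + off;
}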

I'm all for making devices thread safe, and the hpet is probably a good candidate for an initial conversion.


I meant exposing qemu core to the threads instead of pretending they aren't there. I'm not familiar with 9p, so I don't hold much of an opinion, but didn't you say you need threads in order to handle async syscalls? That may not be the deep threading we're discussing here.

btw, IIUC currently disk hotunplug will stall a guest, no? We need async aio_flush().

But aio_flush() never takes a very long time, right :-)

We had this discussion in the past re: live migration because we do an aio_flush() in the critical stage.
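For context, the stall comes from a synchronous drain loop of roughly this shape (aio_num_pending() and aio_poll_once() are invented names standing in for the real primitives); while the iothread spins here, nothing else gets to run under the big lock.

/* Illustrative only: blocking until every outstanding request completes. */
extern int aio_num_pending(void);     /* assumed helper */
extern void aio_poll_once(void);      /* assumed helper: wait for one event */

static void flush_all_requests(void)
{
    while (aio_num_pending() > 0) {
        aio_poll_once();              /* blocks until some request finishes */
    }
}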

Regards,

Anthony Liguori




