Re: [Qemu-devel] Docs for and debugging of Asynchronous I/O

From: Anthony Liguori
Subject: Re: [Qemu-devel] Docs for and debugging of Asynchronous I/O
Date: Tue, 20 Jul 2010 16:47:42 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20100528 Lightning/1.0b1 Thunderbird/3.0.5

On 07/20/2010 01:34 PM, Ot ten Thije wrote:

I am working on fixing the savevm/loadvm functionality in the Android emulator, and the two issues I've encountered so far both appear to stem from the asynchronous I/O (AIO) code. In both cases, the emulator busy-waits indefinitely for an operation that never signals completion.

Unfortunately I am not really familiar with AIO, so I was hoping one of the emulator devs could point me to some resources (design docs, general introduction, etc.). I've done some searching myself and found some docs for the Linux kernel AIO implementation (http://lse.sourceforge.net/io/aio.html), but I'm not sure to what extent they apply to the QEMU code.

Tips for debugging AIO would also be greatly appreciated. I can trace the execution until I am within the (emulated) device driver (i.e. block/qcow2.c:qcow_aio_writev()), but haven't been able to pinpoint the exact location where the actual async call is made. This makes it difficult to identify the code that should signal completion back to the main process (and apparently fails to do so). I know this code is called though, because some asynchronous calls *do* signal completion.

TCG translates guest code into small sequences of host code (basic blocks). These basic blocks can be chained together such that one block jumps directly to the next. The effect is that a guest running a tight loop can execute continuously, without giving QEMU a chance to do any work.

To allow QEMU to make forward progress in such a scenario, we program signals to fire. Currently, the signals fire in a number of circumstances, including when AIO operations complete or when a periodic timer expires.
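To make the idea concrete, here is a minimal sketch in plain POSIX C. It is not QEMU code: the names `exit_request`, `on_alarm` and `run_guest_until_interrupted` are made up for illustration, and the busy loop merely stands in for chained translated blocks. A one-shot timer signal sets a flag, and the loop only stops because it checks that flag.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Set by the signal handler; the "guest" loop polls it. (Hypothetical name.) */
static volatile sig_atomic_t exit_request;

static void on_alarm(int signo)
{
    (void)signo;
    exit_request = 1;   /* only async-signal-safe work in a handler */
}

/*
 * Spin like a tight guest loop until a one-shot SIGALRM arrives after
 * `usec` microseconds. Returns the number of loop iterations executed.
 */
unsigned long run_guest_until_interrupted(long usec)
{
    struct sigaction sa;
    struct itimerval it;
    unsigned long iterations = 0;
    int rc;

    if (usec <= 0) {
        return 0;       /* a zero timer would never fire: refuse to spin */
    }

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);

    memset(&it, 0, sizeof(it));
    it.it_value.tv_sec = usec / 1000000;
    it.it_value.tv_usec = usec % 1000000;
    exit_request = 0;
    rc = setitimer(ITIMER_REAL, &it, NULL);
    assert(rc == 0);

    /* Stand-in for chained basic blocks: nothing else runs in here. */
    while (!exit_request) {
        iterations++;
    }
    return iterations;
}
```

If the signal is never delivered to this thread (for example because it is blocked there), the loop above never exits, which is the same symptom as a busy-wait on a completion that is never signalled.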

When dealing with multiple threads, it's very easy to screw things up by not masking signals properly. Often this is hidden because the periodic timer runs frequently enough that a missed signal doesn't matter. An exception, however, is the emulation of synchronous code. This tends to happen in qcow2 metadata operations, since they are still synchronous. To complete this emulation, we have to block the current thread until the I/O operation completes. But since QEMU isn't re-entrant, we can't run the full main loop, as that could trigger re-entrancy in qcow2. To work around this, we implement "idle bottom halves", which are special bottom halves that are run by the normal I/O loop but also by a special I/O loop used exclusively for emulating synchronous writes.
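A toy model of that synchronous emulation might look like the following. This is only a sketch under heavy assumptions: every `toy_*` name is hypothetical, the one-slot queue replaces the real completion machinery, and `max_spins` exists purely so the example terminates. The point is the shape of the wait loop: it dispatches I/O completions only, never the full main loop, and it hangs if the completion is never queued.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NOT_DONE (-1000)    /* sentinel: completion not seen yet */

typedef void toy_completion_fn(void *opaque, int ret);

/* One-slot stand-in for a queue of finished I/O requests. */
struct toy_request {
    toy_completion_fn *cb;
    void *opaque;
    int ret;
    int pending;                /* 0 models a completion that is never signalled */
};

static struct toy_request toy_queue;

static void toy_aio_submit(toy_completion_fn *cb, void *opaque,
                           int ret, int will_complete)
{
    toy_queue.cb = cb;
    toy_queue.opaque = opaque;
    toy_queue.ret = ret;
    toy_queue.pending = will_complete;
}

/* Restricted loop body: runs I/O completions only, no timers or devices. */
static int toy_aio_wait(void)
{
    if (!toy_queue.pending) {
        return 0;
    }
    toy_queue.pending = 0;
    toy_queue.cb(toy_queue.opaque, toy_queue.ret);
    return 1;
}

static void toy_sync_done(void *opaque, int ret)
{
    *(int *)opaque = ret;       /* publish the result to the waiter */
}

/*
 * Emulate a synchronous write on top of the async interface.
 * Returns the byte count, or TOY_NOT_DONE if no completion arrived
 * within max_spins iterations (a real loop would spin forever).
 */
int toy_sync_write(int nbytes, int will_complete, int max_spins)
{
    int result = TOY_NOT_DONE;

    toy_aio_submit(toy_sync_done, &result, nbytes, will_complete);
    while (result == TOY_NOT_DONE && max_spins-- > 0) {
        toy_aio_wait();
    }
    return result;
}
```

Calling `toy_sync_write(512, 1, 10)` returns 512, while `toy_sync_write(512, 0, 10)` gives up with `TOY_NOT_DONE`. When debugging a hang like this, a bounded wait or a breakpoint in the completion callback is one way to tell "never submitted" apart from "never signalled".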

To further complicate matters, non-x86 platforms (like ARM) are less likely to use a periodic timer, which makes these bugs much more obvious.

I realize that the Android emulator is a rather heavy fork of QEMU, so giving specific advice will probably be difficult. However, the overall approach is still the same, so I hope you can help me get a better understanding of that.

This is the problem with forking. This is very hairy code that requires careful attention to detail. If you're introducing any type of threading, disk emulation, or changes to the block subsystem, chances are you've done it wrong.


Anthony Liguori

Ot ten Thije
