[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] QEMU/KVM SCSI lock up
From: |
Jamie Lokier |
Subject: |
Re: [Qemu-devel] QEMU/KVM SCSI lock up |
Date: |
Thu, 3 Apr 2008 12:31:29 +0100 |
User-agent: |
Mutt/1.5.13 (2006-08-11) |
Avi Kivity wrote:
> > bdrv_aio_read()
> > r->sector += n;
> > r->sector_count -= n;
> >
> >For reasons that I do not fully understand, bdrv_aio_read() does
> >not return immediately, but instead it calls scsi_read_data()
> >recursively.
>
> What happens (I think) is that bdrv_aio_read() completes immediately,
> calls the completion callback, which starts a read for the next batch of
> sectors.
>
> Long term we want to replace the recursion by queuing.
Ah yes. Reminds me of a bug in a program of mine with asynchronous
sockets, which sometimes completes immediately. My async connection
function took a callback argument. The callback was called
immediately if the connect returned success immediately, otherwise was
queued.
Then I wrote some code like this:
// some stuff
my_obj->connector = async_connection (address, my_obj_connect_cb, my_obj);
// some more stuff
void my_obj_connect_cb (void * arg)
{
MyObj * my_obj = arg;
do_stuff_with (my_obj->connector);
}
That worked for several months flawlessly, the code was deployed on
500 units around the country, then one day an async connect happened
to complete fast enough that the callback was called recursively, and
a core dump followed.
I learned a few lessons about async callbacks:
1. Queue them, even when the async operation completes immediately.
It's really not worth the "optimisation" of immediate callback,
especially if the code for queuing is already there.
Even when immediate callback is documented, mistakes are too
easy to make when calling it, because the async call pattern
_looks_ safe, and may work most of the time or always with some
configurations.
2. If you really must have recursion, e.g. perhaps the synchronous
path is common and a hot path, beware that this is prone to
mistakes in usage. Check that it's worth the optimisation.
Then document with words like "warning" the need for special
care.
3. If there is immediate callback possible, sometimes it's useful
API to have a flag argument to enable that, so that always
queueing is an option, and so the programmer has to think
about consequences if they set the flag. E.g.:
async_id = async_func (args, callback, cb_arg, CB_ALWAYS_QUEUED);
async_id = async_func (args, callback, cb_arg, CB_MAYBE_IMMEDIATE);
3. The code pattern "async_id = async_func (args, callback,
callback_arg)" looks very natural, but it doesn't work when
there is immediate callback.
Callback is called before the id can be stored anywhere. If
the async_id is still meaningful at the point of callback, the
API is fundamentally broken.
Just a few notes on async callback API pitfalls I've had...
-- Jamie