qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] QEMU/KVM SCSI lock up


From: Jamie Lokier
Subject: Re: [Qemu-devel] QEMU/KVM SCSI lock up
Date: Thu, 3 Apr 2008 12:31:29 +0100
User-agent: Mutt/1.5.13 (2006-08-11)

Avi Kivity wrote:
> >    bdrv_aio_read()
> >    r->sector += n;
> >    r->sector_count -= n;
> >
> >For reasons that I do not fully understand, bdrv_aio_read() does
> >not return immediately, but instead it calls scsi_read_data()
> >recursively.
> 
> What happens (I think) is that bdrv_aio_read() completes immediately, 
> calls the completion callback, which starts a read for the next batch of 
> sectors.
> 
> Long term we want to replace the recursion by queuing.

Ah yes.  Reminds me of a bug in a program of mine with asynchronous
sockets, which sometimes completes immediately.  My async connection
function took a callback argument.  The callback was called
immediately if the connect returned success immediately, otherwise was
queued.

Then I wrote some code like this:

     // some stuff
     my_obj->connector = async_connection (address, my_obj_connect_cb, my_obj);
     // some more stuff

     void my_obj_connect_cb (void * arg)
     {
        MyObj * my_obj = arg;
        do_stuff_with (my_obj->connector);
     }

That worked for several months flawlessly, the code was deployed on
500 units around the country, then one day an async connect happened
to complete fast enough that the callback was called recursively, and
a core dump followed.

I learned a few lessons about async callbacks:

    1. Queue them, even when the async operation completes immediately.
       It's really not worth the "optimisation" of immediate callback,
       especially if the code for queuing is already there.

       Even when immediate callback is documented, mistakes are too
       easy to make when calling it, because the async call pattern
       _looks_ safe, and may work most of the time or always with some
       configurations.

    2. If you really must have recursion, e.g. perhaps the synchronous
       path is common and a hot path, beware that this is prone to
       mistakes in usage.  Check that it's worth the optimisation.
       Then document with words like "warning" the need for special
       care.

    3. If there is immediate callback possible, sometimes it's useful
       API to have a flag argument to enable that, so that always
       queueing is an option, and so the programmer has to think
       about consequences if they set the flag.  E.g.:

          async_id = async_func (args, callback, cb_arg, CB_ALWAYS_QUEUED);
          async_id = async_func (args, callback, cb_arg, CB_MAYBE_IMMEDIATE);

    3. The code pattern "async_id = async_func (args, callback,
       callback_arg)" looks very natural, but it doesn't work when
       there is immediate callback.

       Callback is called before the id can be stored anywhere.  If
       the async_id is still meaningful at the point of callback, the
       API is fundamentally broken.

Just a few notes on async callback API pitfalls I've had...

-- Jamie




reply via email to

[Prev in Thread] Current Thread [Next in Thread]