From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH v3 19/19] test-bdrv-drain: Test draining job source child and parent
Date: Fri, 21 Sep 2018 19:34:08 +0200
User-agent: Mutt/1.9.1 (2017-09-22)

On 21.09.2018 at 19:01, Eric Blake wrote:
> On 9/20/18 11:19 AM, Kevin Wolf wrote:
> > For the block job drain test, don't only test draining the source and
> > the target node, but create a backing chain for the source
> > (source_backing <- source <- source_overlay) and test draining each of
> > the nodes in it.
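
A rough sketch of how such a chain could be wired up, assuming the bdrv_test
driver and the helpers test-bdrv-drain.c already uses (bdrv_new_open_driver(),
bdrv_set_backing_hd()); the node names and error handling here are
illustrative, not the literal patch code:

    BlockDriverState *src, *src_backing, *src_overlay;

    /* Create the three test nodes */
    src         = bdrv_new_open_driver(&bdrv_test, "source", BDRV_O_RDWR,
                                       &error_abort);
    src_backing = bdrv_new_open_driver(&bdrv_test, "source-backing",
                                       BDRV_O_RDWR, &error_abort);
    src_overlay = bdrv_new_open_driver(&bdrv_test, "source-overlay",
                                       BDRV_O_RDWR, &error_abort);

    /* source_backing <- source <- source_overlay */
    bdrv_set_backing_hd(src, src_backing, &error_abort);
    bdrv_set_backing_hd(src_overlay, src, &error_abort);

Draining any node of that chain should pause the job, which is what the test
then exercises for each of the three nodes.
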
> > 
> > When using iothreads, the source node (and therefore the job) is in a
> > different AioContext than the drain, which happens from the main
> > thread. This way, the main thread waits in AIO_WAIT_WHILE() for the
> > iothread to make progress, and aio_wait_kick() is required to notify it.
> > The test validates that calling bdrv_wakeup() for a child or a parent
> > node will actually notify AIO_WAIT_WHILE() instead of letting it hang.
> > 
> > Signed-off-by: Kevin Wolf <address@hidden>
> > ---
> >   tests/test-bdrv-drain.c | 75 +++++++++++++++++++++++++++++++++++++++++++------
> >   1 file changed, 67 insertions(+), 8 deletions(-)
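
To make the AIO_WAIT_WHILE()/aio_wait_kick() interaction described above
easier to follow, here is a minimal self-contained model of the pattern in
plain pthreads. It is deliberately not QEMU's aio-wait.h implementation, just
its shape: the waiter blocks on a condition and hangs unless the side that
changes the condition also kicks it.

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t kick = PTHREAD_COND_INITIALIZER;
    static bool job_busy = true;    /* stands in for "job still has work in flight" */

    /* Stands in for the job running in the iothread */
    static void *iothread_fn(void *opaque)
    {
        (void)opaque;
        usleep(1000);               /* "some actual activity" */

        pthread_mutex_lock(&lock);
        job_busy = false;           /* the job becomes idle... */
        pthread_cond_signal(&kick); /* ...and this plays the role of aio_wait_kick();
                                     * without it, the waiter below never re-checks
                                     * the condition and hangs. */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    /* Stands in for AIO_WAIT_WHILE(ctx, job_busy) in the main thread */
    static void wait_while_busy(void)
    {
        pthread_mutex_lock(&lock);
        while (job_busy) {
            pthread_cond_wait(&kick, &lock);
        }
        pthread_mutex_unlock(&lock);
    }

    int main(void)
    {
        pthread_t iothread;

        pthread_create(&iothread, NULL, iothread_fn, NULL);
        wait_while_busy();          /* the drain blocks here until the kick arrives */
        pthread_join(iothread, NULL);
        printf("drained\n");
        return 0;
    }

Compile with 'cc -pthread'. The real code differs in that AIO_WAIT_WHILE()
keeps running the event loop (aio_poll()) while it waits, but the need for an
explicit kick after the condition changes is the same.
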
> > 
> 
> > @@ -818,12 +819,17 @@ static int coroutine_fn test_job_run(Job *job, Error **errp)
> >   {
> >       TestBlockJob *s = container_of(job, TestBlockJob, common.job);
> > +    /* We are running the actual job code past the pause point in
> > +     * job_co_entry(). */
> > +    s->running = true;
> > +
> >       job_transition_to_ready(&s->common.job);
> >       while (!s->should_complete) {
> >           /* Avoid job_sleep_ns() because it marks the job as !busy. We want to
> >            * emulate some actual activity (probably some I/O) here so that drain
> >            * has to wait for this activity to stop. */
> > -        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 100000);
> > +        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 1000000);
> 
> The commit message didn't mention why you had to lengthen this sleep, but
> I'm guessing it's because you are now doing enough additional things that
> you have to scale the delay to match?

Maybe it's actually interesting enough to add a paragraph to the commit
message:

When running under 'rr record -n' (because I wanted to debug some
behaviour), the bug suddenly didn't reproduce any more. This was
because bdrv_drain_invoke_entry() (in the main thread) was only called
after the job had already reached the pause point, so we got a
bdrv_dec_in_flight() from the main thread and the additional
aio_wait_kick() when the job becomes idle wasn't necessary any more.

I don't think this would happen outside a debugging environment (no idea
what rr does to multithreading), but I thought increasing the delay
couldn't hurt because it's still quite short (1 ms).

Of course, if you have an idea how to check the actual condition (only
allow a pause point after the wakeup from bdrv_drain_invoke_entry() has
already happened), I'd consider that instead. But I don't think there's
an easy way to get this information.

Kevin


