qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH] main-loop: Unconditionally unlock iothread


From: Peter Crosthwaite
Subject: Re: [Qemu-devel] [RFC PATCH] main-loop: Unconditionally unlock iothread
Date: Wed, 3 Apr 2013 12:17:41 +1000

Hi Paolo,

Thanks for the clues.

On Tue, Apr 2, 2013 at 9:11 PM, Paolo Bonzini <address@hidden> wrote:
> Il 02/04/2013 11:04, Peter Crosthwaite ha scritto:
>> Public bug: 1154328
>> Broken Commit: a29753f8aa79a34a324afebe340182a51a5aef11
>>
>> ATM, the timeout from g_pollfds_fill is inhibiting unlocking of the
>> iothread. This is capable of causing a total deadlock when hw/serial
>> is used as a device. The bug manifests when you go -nographic -serial
>> mon:stdio and then paste 40 or more chars into the terminal.
>>
>> My knowledge of this g_foo is vague at best, but my best working
>> theory is this:
>>
>> - First 8 chars are recieved by the serial device no complaints.
>> - The next 32 chars, serial returns false for can_receive() so they
>>       are buffered by the MuxDriver object - mux_chr_read()
>> - Buffer is full, so 41st char causes false return from Muxes own
>>       can_read()
>> - This propagates all the way up to glib_pollfds_fill and manifests
>>       as a timeout
>
> I suppose you mean "manifests as timeout==0".  The question is *which*
> GSource has a timeout of zero?

None of the GSources are to blame. The timeout actually starts as zero
as set by mainloop wait when the nonblocking flag is set. This little
hack also fixes the bug (while leaving the mutex ifery in).

--- a/main-loop.c
+++ b/main-loop.c
@@ -428,10 +428,6 @@ int main_loop_wait(int nonblocking)
     int ret;
     uint32_t timeout = UINT32_MAX;

-    if (nonblocking) {
-        timeout = 0;
-    }
-
     /* poll any events */
     g_array_set_size(gpollfds, 0); /* reset for new iteration */
     /* XXX: separate device handlers from system ones */

---
Is it expected that this non-blocking condition implies lockup of the iothread?

Diving deeper again, I notice that this non-blocking feature isn't
even enabled at all for KVM. Which would probably mean that this bug
is not replicable by anyone testing with KVM. We could just make all
the CPU backends consistent with KVM by removing the nonblocking
altogether. Any comments from TCG people :) ?

--- a/vl.c
+++ b/vl.c
@@ -2030,17 +2030,15 @@ static bool main_loop_should_exit(void)

 static void main_loop(void)
 {
-    bool nonblocking;
     int last_io = 0;
 #ifdef CONFIG_PROFILER
     int64_t ti;
 #endif
     do {
-        nonblocking = !kvm_enabled() && last_io > 0;
 #ifdef CONFIG_PROFILER
         ti = profile_getclock();
 #endif
-        last_io = main_loop_wait(nonblocking);
+        last_io = main_loop_wait(0);
 #ifdef CONFIG_PROFILER
         dev_time += profile_getclock() - ti;
 #endif

Not the mux's: if mux_chr_can_read()
> returns zero, the prepare function returns FALSE without touching the
> timeout at all...
>
> static gboolean io_watch_poll_prepare(GSource *source, gint *timeout_)
> {
>     IOWatchPoll *iwp = io_watch_poll_from_source(source);
>
>     iwp->max_size = iwp->fd_can_read(iwp->opaque);
>     if (iwp->max_size == 0) {
>         return FALSE;
>     }
>
>     return g_io_watch_funcs.prepare(source, timeout_);
> }
>
>> - Timeout means no unlock of IOthread. Device land never sees any more
>>       cycles so the serial port never progresses - no flushing of
>>       buffer
>
> Still, this is plausible, so the patch looks correct.
>

Ok,

Ill give it some more time and fix commit message. if we don't figure
out a better patch.

Regards,
Peter

> Paolo
>
>> - Deadlock
>>
>> Tested on petalogix_ml605 microblazeel machine model, which was faulty
>> due to 1154328.
>>
>> Fix by removing the conditions on unlocking the iothread. Don't know
>> what else this will break but the timeout is certainly the wrong
>> condition for the unlock. Probably the real solution is to have a more
>> selective unlock policy.
>>
>> I'm happy for someone to take this patch off my hands, or educate me on
>> the correct implementation. For the peeps doing automated testing on
>> nographic platforms this will get your build working again.
>>
>> Signed-off-by: Peter Crosthwaite <address@hidden>
>> ---
>>  main-loop.c |    8 ++------
>>  1 files changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/main-loop.c b/main-loop.c
>> index eb80ff3..a376898 100644
>> --- a/main-loop.c
>> +++ b/main-loop.c
>> @@ -194,15 +194,11 @@ static int os_host_main_loop_wait(uint32_t timeout)
>>
>>      glib_pollfds_fill(&timeout);
>>
>> -    if (timeout > 0) {
>> -        qemu_mutex_unlock_iothread();
>> -    }
>> +    qemu_mutex_unlock_iothread();
>>
>>      ret = g_poll((GPollFD *)gpollfds->data, gpollfds->len, timeout);
>>
>> -    if (timeout > 0) {
>> -        qemu_mutex_lock_iothread();
>> -    }
>> +    qemu_mutex_lock_iothread();
>>
>>      glib_pollfds_poll();
>>      return ret;
>>
>
>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]