[Qemu-devel] failed migration makes monitor stuck
From: Michael Tokarev
Subject: [Qemu-devel] failed migration makes monitor stuck
Date: Sat, 09 Jul 2011 15:07:00 +0400
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.1.16) Gecko/20110704 Icedove/3.0.11
After some debugging I found a programming error in
error handling in migration, but I'm not sure how to
fix it.
When migration starts, the monitor gets suspended by a
call to monitor_suspend(), which increments the associated
suspend_cnt counter.
At the end of migration, in migrate_fd_cleanup(),
monitor_resume() gets called, which decrements the
counter.
But monitor_resume() also gets called from another
place, migrate_fd_put_buffer(), when we encounter
a write error.
So, suppose a tcp endpoint has disconnected, or the
exec: program terminated due to error or whatnot --
in all these cases write will fail, and we'll call
monitor_resume() twice as a result: once in this
place in migrate_fd_put_buffer(), and once more at
the end in migrate_fd_cleanup().
This results in suspend_cnt being decremented twice,
with the resultant value being -1.
So monitor_can_read() will return 0 from now on, since
it compares suspend_cnt with 0. And hence, monitor will
stop working.
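
The sequence above can be modelled in a few lines of C.
This is only a sketch, not the QEMU source: the Monitor
struct is reduced to the counter, the function bodies are
assumptions based on the description above, and
simulate_failed_migration() is a hypothetical helper that
replays the three calls in order.

```c
/* Simplified stand-in for QEMU's monitor; only the counter matters here. */
typedef struct Monitor {
    int suspend_cnt;
} Monitor;

static void monitor_suspend(Monitor *mon) { mon->suspend_cnt++; }
static void monitor_resume(Monitor *mon)  { mon->suspend_cnt--; }

/* Assumed check: input is accepted only while the counter is exactly 0. */
static int monitor_can_read(const Monitor *mon)
{
    return mon->suspend_cnt == 0;
}

/* Hypothetical helper: replay a migration whose write fails. */
static int simulate_failed_migration(Monitor *mon)
{
    monitor_suspend(mon);   /* migration starts:                 0 -> 1  */
    monitor_resume(mon);    /* error in migrate_fd_put_buffer(): 1 -> 0  */
    monitor_resume(mon);    /* migrate_fd_cleanup():             0 -> -1 */
    return mon->suspend_cnt;
}
```

Starting from a zeroed Monitor, the helper leaves
suspend_cnt at -1, and monitor_can_read() returns 0
from then on.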
To me it looks like the monitor_resume() call should be
removed from migrate_fd_put_buffer(), but I'm not sure
_why_ it was there in the first place.
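
A rough sketch of what that removal would look like, using
a toy model rather than the real migration code. The
MigrationModel struct, its has_error field and the model_*
functions are all hypothetical names made up for
illustration; the point is only that the write-error path
records the failure and leaves the single resume to cleanup.

```c
/* Simplified stand-in for QEMU's monitor; only the counter matters here. */
typedef struct Monitor {
    int suspend_cnt;
} Monitor;

static void monitor_suspend(Monitor *mon) { mon->suspend_cnt++; }
static void monitor_resume(Monitor *mon)  { mon->suspend_cnt--; }

/* Assumed check: input is accepted only while the counter is exactly 0. */
static int monitor_can_read(const Monitor *mon)
{
    return mon->suspend_cnt == 0;
}

/* Hypothetical migration state, reduced to what the sketch needs. */
typedef struct MigrationModel {
    Monitor *mon;
    int has_error;
} MigrationModel;

static void model_start(MigrationModel *s)
{
    monitor_suspend(s->mon);        /* exactly one suspend per migration */
}

/* Stand-in for the write path: note the failure, do NOT resume here. */
static void model_put_buffer(MigrationModel *s, int write_ok)
{
    if (!write_ok) {
        s->has_error = 1;
    }
}

/* Stand-in for the cleanup path: the only place that resumes. */
static void model_cleanup(MigrationModel *s)
{
    monitor_resume(s->mon);         /* exactly one resume per migration */
}
```

With this pairing a failed write followed by cleanup brings
suspend_cnt back to 0, so the monitor becomes readable
again. Whether the real code relies on the early resume for
some other reason is the open question.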
There's more: monitor_suspend() gets called from within
the protocol handlers (via the migrate_fd_monitor_suspend()
routine). Are we sure that all current and future
protocol handlers will call this function?
Thanks!
/mjt