


From: wangjie (P)
Subject: [Qemu-devel] I have two issues about it to discuss with you RE: ping RE: question: I found a qemu crash about migration
Date: Fri, 29 Sep 2017 03:20:53 +0000

Excuse me, I have two issues to discuss with you :>

//Question 1:

Can we fix it with the attached demo patch? (The patch is based on 
qemu-2.8.1 and is just a demo for discussion.)

I think we should not set the INACTIVE flag on the drive-mirror 
BlockDriverState, because the mirror I/O does not affect the memory 
migration. So we can filter out the drive-mirror BDS in 
bdrv_inactivate_all. I tested this, and my demo patch fixes the crash.
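
The idea is roughly the following (just a minimal sketch of the approach, not 
the attached patch itself; it is modelled on the 2.8-era bdrv_inactivate_all, 
and bdrv_is_mirror_target() is a placeholder name for however we detect the 
mirror target):

    int bdrv_inactivate_all(void)
    {
        BlockDriverState *bs = NULL;
        BdrvNextIterator it;
        int ret = 0;

        for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
            AioContext *aio_context = bdrv_get_aio_context(bs);

            aio_context_acquire(aio_context);
            /* Skip the target BDS of an active drive-mirror job: its I/O
             * only touches the mirror destination image, not the images
             * the migration destination will open, so it can stay
             * writable until the job finishes. */
            if (!bdrv_is_mirror_target(bs)) {
                ret = bdrv_inactivate(bs);
            }
            aio_context_release(aio_context);
            if (ret < 0) {
                return ret;
            }
        }

        return 0;
    }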



//Question 2:

I used git-bisect and found that qemu-2.6.0 is OK: I could not trigger the 
bug on qemu-2.6.0 even though I tried many times. But since qemu-2.7.0-rc0 
I can trigger and reproduce the bug very easily. So I continued bisecting 
and found that the bug first appears at the following commit:

commit cbd614870fce00f46088be7054a7bf5eadcc77ac
Merge: 500acc9 0bc7a6f
Author: Peter Maydell <address@hidden>
Date:   Thu Jun 2 13:42:52 2016 +0100

    Merge remote-tracking branch 'remotes/famz/tags/pull-docker-20160601' into staging


But I reviewed the patches in the merge commit 
(cbd614870fce00f46088be7054a7bf5eadcc77ac) above and didn't find anything 
connected to this bug. Did I overlook some detail?





-----Original Message-----
From: Dr. David Alan Gilbert [mailto:address@hidden 
Sent: Friday, September 29, 2017 1:01 AM
To: wangjie (P) <address@hidden>
Cc: address@hidden; address@hidden; address@hidden; address@hidden; 
address@hidden; fuweiwei (C) <address@hidden>; address@hidden; address@hidden; 
address@hidden <address@hidden>; address@hidden
Subject: Re: ping RE: question: I found a qemu crash about migration

* wangjie (P) (address@hidden) wrote:
> Ping?
> 
> From: wangjie (P)
> Sent: Tuesday, September 26, 2017 9:10 PM
> To: address@hidden; address@hidden; address@hidden
> Cc: wangjie (P) <address@hidden>; fuweiwei (C) 
> <address@hidden>; address@hidden; address@hidden; 
> address@hidden; address@hidden; Wubin (H) <address@hidden>
> Subject: question: I found a qemu crash about migration
> 
> Hi,
> 
> When I use qemuMigrationRun to migrate both memory and storage, with 
> some I/O pressure in the VM and iothreads configured, we triggered an 
> error report (I use the current qemu master branch): "bdrv_co_do_pwritev: 
> Assertion `!(bs->open_flags & 0x0800)' failed".
> 
> I reviewed the code and examined the core dump with gdb; I think the 
> following case can trigger the error report:
> 
> Case:
> 
> migration_thread()
>   migration_completion() ----------> last iteration of memory migration
>     vm_stop_force_state() ---------> stops the VM and calls
>       bdrv_drain_all(); but in the core file I found that the dirty
>       bitmap count of the drive-mirror is not 0, and 16 mirror I/Os are
>       still in flight
>     bdrv_inactivate_all() ---------> inactivates the images and sets
>       the INACTIVE flag
>   bdrv_co_do_pwritev() ------------> mirror I/O handled after this
>       point trips the Assertion `!(bs->open_flags & 0x0800)' and qemu
>       crashes
> 
> 
> 
> 
> As we can see above, migration_completion calls bdrv_inactivate_all to 
> inactivate the images, but mirror_run is not done yet (it still has 
> dirty clusters), so the mirror I/O issued afterwards triggers the error 
> report: "bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' 
> failed".
> 
> It seems that memory migration and storage mirroring proceed independently, 
> and the relative ordering of the two is quite random.
> 
> How can I solve this problem? Should we avoid setting the INACTIVE flag 
> on the drive-mirror BlockDriverState?

Hi,
  This is a 'fun' bug;  I had a good chat to kwolf about it earlier.
A proper fix really needs to be done together with libvirt so that we can 
sequence:
   a) The stopping of the CPU on the source
   b) The termination of the mirroring block job
   c) The inactivation of the block devices on the source
       (bdrv_inactivate_all)
   d) The activation of the block devices on the destination
       (bdrv_invalidate_cache_all)
   e) The start of the CPU on the destination

It looks like you're hitting a race between b/c;  we've had races between c/d 
in the past and moved the bdrv_inactivate_all.
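
(For reference, the 0x0800 in the assertion is BDRV_O_INACTIVE; the check 
that fires is the one in bdrv_co_pwritev(), abridged here from block/io.c:

    int coroutine_fn bdrv_co_pwritev(BdrvChild *child, int64_t offset,
                                     unsigned int bytes, QEMUIOVector *qiov,
                                     BdrvRequestFlags flags)
    {
        BlockDriverState *bs = child->bs;
        ...
        /* Any write arriving after bdrv_inactivate_all() trips this. */
        assert(!(bs->open_flags & BDRV_O_INACTIVE));
        ...
    }

so every mirror write that lands after inactivation aborts qemu.)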

During the discussion we ended up with two proposed solutions; both of them 
require one extra command and one extra migration capability.

The block way
-------------
   1) Add a new migration capability pause-at-complete
   2) Add a new migration state almost-complete
   3) After saving devices, if pause-at-complete is set,
      transition to almost-complete
   4) Add a new command (migration-continue) that 
      causes the migration to inactivate the devices (c)
      and send the final EOF to the destination.

You set pause-at-complete, wait until migration hits almost-complete, clean 
up the mirror job, and then issue migration-continue.  When it completes, do 
'cont' on the destination.
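
On the QEMU side that would look something like this (a sketch only; the 
capability, state and semaphore names are invented from the steps above):

    static void migration_completion(MigrationState *s, ...)
    {
        ...
        vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
        qemu_savevm_state_complete_precopy(...);   /* final device save */

        if (migrate_pause_at_complete()) {
            /* Steps 2/3: park in the new almost-complete state... */
            migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
                              MIGRATION_STATUS_ALMOST_COMPLETE);
            /* ...until step 4: migration-continue posts this semaphore
             * once the mirror job has been cleaned up. */
            qemu_sem_wait(&s->continue_sem);
        }

        bdrv_inactivate_all();   /* (c) now happens after the mirror */
        /* ...then send the final EOF to the destination... */
    }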

The migration way
-----------------
   1) Stop doing (d) when the destination is started with -S
      since it happens anyway when 'cont' is issued
   2) Add a new migration capability ext-manage-storage
   3) When 'ext-manage-storage' is set, we don't bother doing (c)
   4) Add a new command 'block-inactivate' on the source

You set ext-manage-storage, do the migration, and when it's finished clean up 
the block job, issue block-inactivate on the source, and then 'cont' on the 
destination.
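
The QEMU side of that is pleasantly small; presumably little more than 
(again a sketch - neither the command nor qmp_block_inactivate() exists 
today):

    void qmp_block_inactivate(Error **errp)
    {
        int ret = bdrv_inactivate_all();

        if (ret < 0) {
            error_setg(errp, "Failed to inactivate block devices: %s",
                       strerror(-ret));
        }
    }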
  

My worry about the 'block way' is that the point at which we do the pause 
seems pretty interesting; it is probably best done after the final device 
save but before the inactivate, though it could also be done earlier.  But it 
probably becomes API, and something might become dependent on where we did it.

I think Kevin's worry about the 'migration way' is that it's a bit of a 
block-specific fudge, which is probably right.


I've not really thought what happens when you have a mix of shared and 
non-shared storage.

Could we do any hack that isn't libvirt-visible for existing versions?
I guess maybe hack drive-mirror so it interlocks with the migration code 
somehow to hold off on that inactivate?

This code is visible probably from 2.9-ish with the new locking code; but 
really that b/c race has been there forever - there's maybe always been the 
chance that the last few blocks of mirroring happen too late?

Thoughts?
What is the libvirt view on the preferred solution?

Dave


> Qemu Crash bt:
> (gdb) bt
> #0  0x00007f6b6e2a71d7 in raise () from /usr/lib64/libc.so.6
> #1  0x00007f6b6e2a88c8 in abort () from /usr/lib64/libc.so.6
> #2  0x00007f6b6e2a0146 in __assert_fail_base () from 
> /usr/lib64/libc.so.6
> #3  0x00007f6b6e2a01f2 in __assert_fail () from /usr/lib64/libc.so.6
> #4  0x00000000007b9211 in bdrv_co_pwritev (child=<optimized out>, 
> address@hidden, address@hidden,
>     address@hidden, flags=0) at block/io.c:1536
> #5  0x00000000007a6f02 in blk_co_pwritev (blk=0x2f92750, offset=7034896384, 
> bytes=65536, qiov=0x7f69cc09b068,
>     flags=<optimized out>) at block/block-backend.c:851
> #6  0x00000000007a6fc1 in blk_aio_write_entry (opaque=0x301dad0) at 
> block/block-backend.c:1043
> #7  0x0000000000835e2a in coroutine_trampoline (i0=<optimized out>, 
> i1=<optimized out>) at util/coroutine-ucontext.c:79
> #8  0x00007f6b6e2b8cf0 in ?? () from /usr/lib64/libc.so.6
> #9  0x00007f6a1bcfc780 in ?? ()
> #10 0x0000000000000000 in ?? ()
> 
> And I can see that mirror_run is not done; gdb info as follows:
> [inline screenshot: gdb output showing mirror_run is not done]
> 
> 
> Src VM qemu log:
> 
> [inline screenshot: source VM qemu log]
> 



--
Dr. David Alan Gilbert / address@hidden / Manchester, UK

Attachment: demo.patch
Description: demo.patch

