From: zhanghailiang
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v8 00/34] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Tue, 25 Aug 2015 15:03:24 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0

On 2015/8/24 22:38, Dr. David Alan Gilbert wrote:
* zhanghailiang (address@hidden) wrote:
This is the 8th version of COLO.

I'm seeing an occasional error:

   pcibus_reset: Assertion `bus->irq_count[i] == 0' failed.

on the secondary; have you seen that?

bus->irq_count[4] is -1 in my backtrace; it's
colo_process_incoming_checkpoints->qemu_devices_reset->qbus_walk_children->qbus_reset_one->pcibus_reset


No, we haven't come across such a problem. Is there anything special about your test?
What's your command line?
Did it happen during the first checkpoint?

Thanks,
zhanghailiang

Dave

This series contains only the COLO frame part, including VM checkpoint,
failover, the proxy API and the block replication API, but not block
replication itself. The block part is treated as a separate series.

As usual, we provide 'basic' and 'developing' branches on GitHub:
https://github.com/coloft/qemu/commits/colo-v1.5-basic
https://github.com/coloft/qemu/commits/colo-v1.5-developing (more features)

The 'basic' branch is exactly the same as this patch series.
We will keep this series as simple as possible, for ease of review.

Extra features in the colo-v1.5-developing branch:
1) Separate RAM and device save/load processes to reduce the extra memory
used during a checkpoint
2) Live-migrate part of the dirty pages to the slave during sleep time
3) Report checkpoint statistics via the 'info migrate' command

Please refer to the following link to test COLO:
http://wiki.qemu.org/Features/COLO

COLO is a totally new feature that is still at an early stage;
your comments and feedback are warmly welcome.

NOTE:
We have decided to re-implement the COLO proxy in userspace (in QEMU, to be exact).
You can find the discussion about why and how to implement the COLO proxy in QEMU
at the following link:
http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg04069.html

TODO:
1. COLO function switch on/off
2. The capability of continuous FT
3. Optimize the performance.

v8:
- Move some global variables into MigrationIncomingState and MigrationState
- Move some cleanup work from the COLO thread and COLO incoming thread into the
   failover BH function, and fix the logic of the cleanup work.
- Fix a bug where the COLO thread and COLO incoming thread could block in the
   socket 'recv' call during failover.
- Optimize colo_flush_ram_cache()
- Add a migration state for the incoming side; we use it to check whether the
   incoming side is in COLO state (Patch 5).
- Drop the patch 'COLO: Disable qdev hotplug when VM is in COLO mode', since it
   is not correct.

zhanghailiang (34):
   configure: Add parameter for configure to enable/disable COLO support
   migration: Introduce capability 'colo' to migration
   COLO: migrate colo related info to slave
   colo-comm/migration: skip colo info section for special cases
   migration: Add state records for migration incoming
   migration: Integrate COLO checkpoint process into migration
   migration: Integrate COLO checkpoint process into loadvm
   COLO: Implement colo checkpoint protocol
   COLO: Add a new RunState RUN_STATE_COLO
   QEMUSizedBuffer: Introduce two help functions for qsb
   COLO: Save VM state to slave when do checkpoint
   COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
   COLO VMstate: Load VM state into qsb before restore it
   arch_init: Start to trace dirty pages of SVM
   COLO RAM: Flush cached RAM into SVM's memory
   COLO failover: Introduce a new command to trigger a failover
   COLO failover: Introduce state to record failover process
   COLO failover: Implement COLO primary/secondary vm failover work
   qmp event: Add event notification for COLO error
   COLO failover: Don't do failover during loading VM's state
   COLO: Add new command parameter 'forward_nic' 'colo_script' for net
   COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
   tap: Make launch_script() public
   COLO NIC: Implement colo nic device interface configure()
   colo-nic: Handle secondary VM's original net device configure
   COLO NIC: Implement colo nic init/destroy function
   COLO NIC: Some init work related with proxy module
   COLO: Handle nfnetlink message from proxy module
   COLO: Do checkpoint according to the result of packets comparation
   COLO: Improve checkpoint efficiency by do additional periodic
     checkpoint
   COLO: Add colo-set-checkpoint-period command
   COLO NIC: Implement NIC checkpoint and failover
   COLO: Implement shutdown checkpoint
   COLO: Add block replication into colo process

  configure                     |  33 +-
  docs/qmp/qmp-events.txt       |  16 +
  hmp-commands.hx               |  30 ++
  hmp.c                         |  15 +
  hmp.h                         |   2 +
  include/exec/cpu-all.h        |   1 +
  include/migration/colo.h      |  45 +++
  include/migration/failover.h  |  33 ++
  include/migration/migration.h |  19 +
  include/migration/qemu-file.h |   3 +-
  include/net/colo-nic.h        |  37 ++
  include/net/net.h             |   2 +
  include/net/tap.h             |  19 +
  include/sysemu/sysemu.h       |   3 +
  migration/Makefile.objs       |   2 +
  migration/colo-comm.c         |  75 ++++
  migration/colo-failover.c     |  83 +++++
  migration/colo.c              | 805 ++++++++++++++++++++++++++++++++++++++++++
  migration/migration.c         | 116 ++++--
  migration/qemu-file-buf.c     |  58 +++
  migration/ram.c               | 242 ++++++++++++-
  migration/savevm.c            |   2 +-
  net/Makefile.objs             |   1 +
  net/colo-nic.c                | 457 ++++++++++++++++++++++++
  net/net.c                     |   2 +
  net/tap.c                     |  90 +++--
  qapi-schema.json              |  58 ++-
  qapi/event.json               |  15 +
  qemu-options.hx               |   7 +
  qmp-commands.hx               |  42 +++
  scripts/colo-proxy-script.sh  | 145 ++++++++
  stubs/Makefile.objs           |   1 +
  stubs/migration-colo.c        |  58 +++
  trace-events                  |  10 +
  vl.c                          |  37 +-
  35 files changed, 2474 insertions(+), 90 deletions(-)
  create mode 100644 include/migration/colo.h
  create mode 100644 include/migration/failover.h
  create mode 100644 include/net/colo-nic.h
  create mode 100644 migration/colo-comm.c
  create mode 100644 migration/colo-failover.c
  create mode 100644 migration/colo.c
  create mode 100644 net/colo-nic.c
  create mode 100755 scripts/colo-proxy-script.sh
  create mode 100644 stubs/migration-colo.c

--
1.8.3.1


--
Dr. David Alan Gilbert / address@hidden / Manchester, UK
