From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v8 00/34] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Wed, 26 Aug 2015 17:49:37 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

* zhanghailiang (address@hidden) wrote:
> On 2015/8/24 22:38, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (address@hidden) wrote:
> >>This is the 8th version of COLO.
> >
> >I'm seeing an occasional error:
> >
> > pcibus_reset: Assertion `bus->irq_count[i] == 0' failed.
> >
> >on the secondary; have you seen that?
> >
> >bus->irq_count[4] is -1 in my backtrace; it's
> >colo_process_incoming_checkpoints->qemu_devices_reset->qbus_walk_children->qbus_reset_one->pcibus_reset
> >
>
> No, we haven't come across such a problem. Is there anything special about
> your test? What's your command line?
> Did it happen during the first checkpoint process?

I was using e1000; it hasn't happened again since I switched to virtio-net-pci,
so I suspect it's the e1000 having an outstanding interrupt while it's being
reset.

block_param="-drive if=none,driver=raw,file=$disk_path,id=colo1,cache=none,aio=native \
-drive if=virtio,driver=replication,mode=secondary,throttling.bps-total-max=70000000,\
file.file.filename=$TMPDISKS/colo-active-disk.qcow2,\
file.driver=qcow2,\
file.backing.file.filename=$TMPDISKS/colo-hidden-disk.qcow2,\
file.backing.driver=qcow2,\
file.backing.allow-write-backing-file=on,\
file.backing.backing.backing_reference=colo1"

net_param="-netdev tap,id=hn0,script=$PWD/ifup-slave,\
downscript=$PWD/ifdown-slave,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,forward_nic=em4 \
-device virtio-net-pci,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0"

console_param="-chardev socket,id=hmpfeed,server,nowait,telnet,port=9999,host=localhost \
-mon hmpfeed -nographic -chardev stdio,mux=on,id=mon \
-mon chardev=mon,mode=readline --device isa-serial,chardev=mon"

./try/bin/qemu-system-x86_64 -enable-kvm $console_param \
  -boot c -m 4096 -smp 4 -machine pc-i440fx-2.3,accel=kvm -S \
  -name debug-threads=on -trace events=trace-file \
  -device virtio-rng-pci \
  $block_param $net_param \
  -incoming tcp:0:8888
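
To check the e1000 suspicion, it may help to run the same setup with only the
NIC model changed. The fragment below is a minimal sketch of that, not part of
the original script: build_net_param and the NIC_MODEL environment variable are
hypothetical names introduced here, and the remaining options are copied from
net_param above.

```sh
#!/bin/sh
# Hypothetical helper (an assumption, not from the original script): build the
# NIC arguments for a given device model so that two runs differ only there.
build_net_param() {
    nic_model=$1    # e.g. e1000 or virtio-net-pci
    # Inside double quotes a backslash-newline is removed, so the option
    # string is joined exactly as in the original net_param.
    printf '%s' "-netdev tap,id=hn0,script=$PWD/ifup-slave,\
downscript=$PWD/ifdown-slave,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,forward_nic=em4 \
-device $nic_model,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0"
}

# NIC_MODEL is an assumed environment variable; the default is the model that
# has not triggered the assertion so far.
net_param=$(build_net_param "${NIC_MODEL:-virtio-net-pci}")
```

Running the secondary once with NIC_MODEL=e1000 and once without it would then
show whether the pcibus_reset assertion follows the NIC model.
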
Dave
>
> Thanks,
> zhanghailiang
>
> >Dave
> >
> >>This series covers only the COLO frame part: VM checkpoint,
> >>failover, the proxy API, and the block replication API. It does not include
> >>block replication itself; the block part is treated as a separate series.
> >>
> >>As usual, we provide 'basic' and 'developing' branches on GitHub:
> >>https://github.com/coloft/qemu/commits/colo-v1.5-basic
> >>https://github.com/coloft/qemu/commits/colo-v1.5-developing (more features)
> >>
> >>The 'basic' branch is exactly the same as this patch series.
> >>We will keep this series as simple as possible to make review easy.
> >>
> >>The extra features in colo-v1.5-developing branch:
> >>1) Separate ram and device save/load process to reduce size of extra memory
> >>used during checkpoint
> >>2) Live migrate part of dirty pages to slave during sleep time.
> >>3) You can get statistics about checkpoints with the 'info migrate' command
> >>
> >>Please refer to the following link to test COLO:
> >>http://wiki.qemu.org/Features/COLO
> >>
> >>COLO is a totally new feature which is still at an early stage;
> >>your comments and feedback are warmly welcome.
> >>
> >>NOTE:
> >>We have decided to re-implement the colo proxy in userspace (in qemu, to be
> >>exact). You can find the discussion about why and how to implement the colo
> >>proxy in qemu at the following link:
> >>http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg04069.html
> >>
> >>TODO:
> >>1. COLO function switch on/off
> >>2. The capability of continuous FT
> >>3. Optimize the performance.
> >>
> >>v8:
> >>- Move some global variables into MigrationIncomingState and MigrationState
> >>- Move some cleanup work from the colo thread and colo incoming thread into
> >>  the failover BH function, and fix the code logic for the cleanup work.
> >>- Fix the bug where the colo thread and colo incoming thread could block in
> >>  the socket 'recv' call during failover.
> >>- Optimize colo_flush_ram_cache()
> >>- Add migration state for the incoming side; we use the state to verify
> >>  whether the migration incoming side is in COLO state or not (Patch 5).
> >>- Drop the patch 'COLO: Disable qdev hotplug when VM is in COLO mode',
> >>  since it is not correct.
> >>
> >>zhanghailiang (34):
> >> configure: Add parameter for configure to enable/disable COLO support
> >> migration: Introduce capability 'colo' to migration
> >> COLO: migrate colo related info to slave
> >> colo-comm/migration: skip colo info section for special cases
> >> migration: Add state records for migration incoming
> >> migration: Integrate COLO checkpoint process into migration
> >> migration: Integrate COLO checkpoint process into loadvm
> >> COLO: Implement colo checkpoint protocol
> >> COLO: Add a new RunState RUN_STATE_COLO
> >> QEMUSizedBuffer: Introduce two help functions for qsb
> >> COLO: Save VM state to slave when do checkpoint
> >> COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
> >> COLO VMstate: Load VM state into qsb before restore it
> >> arch_init: Start to trace dirty pages of SVM
> >> COLO RAM: Flush cached RAM into SVM's memory
> >> COLO failover: Introduce a new command to trigger a failover
> >> COLO failover: Introduce state to record failover process
> >> COLO failover: Implement COLO primary/secondary vm failover work
> >> qmp event: Add event notification for COLO error
> >> COLO failover: Don't do failover during loading VM's state
> >> COLO: Add new command parameter 'forward_nic' 'colo_script' for net
> >> COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
> >> tap: Make launch_script() public
> >> COLO NIC: Implement colo nic device interface configure()
> >> colo-nic: Handle secondary VM's original net device configure
> >> COLO NIC: Implement colo nic init/destroy function
> >> COLO NIC: Some init work related with proxy module
> >> COLO: Handle nfnetlink message from proxy module
> >> COLO: Do checkpoint according to the result of packets comparation
> >> COLO: Improve checkpoint efficiency by do additional periodic
> >> checkpoint
> >> COLO: Add colo-set-checkpoint-period command
> >> COLO NIC: Implement NIC checkpoint and failover
> >> COLO: Implement shutdown checkpoint
> >> COLO: Add block replication into colo process
> >>
> >> configure                     |  33 +-
> >> docs/qmp/qmp-events.txt       |  16 +
> >> hmp-commands.hx               |  30 ++
> >> hmp.c                         |  15 +
> >> hmp.h                         |   2 +
> >> include/exec/cpu-all.h        |   1 +
> >> include/migration/colo.h      |  45 +++
> >> include/migration/failover.h  |  33 ++
> >> include/migration/migration.h |  19 +
> >> include/migration/qemu-file.h |   3 +-
> >> include/net/colo-nic.h        |  37 ++
> >> include/net/net.h             |   2 +
> >> include/net/tap.h             |  19 +
> >> include/sysemu/sysemu.h       |   3 +
> >> migration/Makefile.objs       |   2 +
> >> migration/colo-comm.c         |  75 ++++
> >> migration/colo-failover.c     |  83 +++++
> >> migration/colo.c              | 805 ++++++++++++++++++++++++++++++++++++++++++
> >> migration/migration.c         | 116 ++++--
> >> migration/qemu-file-buf.c     |  58 +++
> >> migration/ram.c               | 242 ++++++++++++-
> >> migration/savevm.c            |   2 +-
> >> net/Makefile.objs             |   1 +
> >> net/colo-nic.c                | 457 ++++++++++++++++++++++++
> >> net/net.c                     |   2 +
> >> net/tap.c                     |  90 +++--
> >> qapi-schema.json              |  58 ++-
> >> qapi/event.json               |  15 +
> >> qemu-options.hx               |   7 +
> >> qmp-commands.hx               |  42 +++
> >> scripts/colo-proxy-script.sh  | 145 ++++++++
> >> stubs/Makefile.objs           |   1 +
> >> stubs/migration-colo.c        |  58 +++
> >> trace-events                  |  10 +
> >> vl.c                          |  37 +-
> >> 35 files changed, 2474 insertions(+), 90 deletions(-)
> >> create mode 100644 include/migration/colo.h
> >> create mode 100644 include/migration/failover.h
> >> create mode 100644 include/net/colo-nic.h
> >> create mode 100644 migration/colo-comm.c
> >> create mode 100644 migration/colo-failover.c
> >> create mode 100644 migration/colo.c
> >> create mode 100644 net/colo-nic.c
> >> create mode 100755 scripts/colo-proxy-script.sh
> >> create mode 100644 stubs/migration-colo.c
> >>
> >>--
> >>1.8.3.1
> >>
> >>
> >--
> >Dr. David Alan Gilbert / address@hidden / Manchester, UK
> >
> >.
> >
>
>
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK