
Re: [Qemu-devel] [RFC PATCH RDMA support v1: 12/13] updated protocol doc

From: Michael R. Hines
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v1: 12/13] updated protocol documentation
Date: Wed, 10 Apr 2013 22:47:10 -0400
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106 Thunderbird/17.0.2

Great comments, thanks.

On 04/10/2013 10:43 PM, Eric Blake wrote:
On 04/10/2013 04:28 PM, address@hidden wrote:
From: "Michael R. Hines" <address@hidden>

Full documentation on the rdma protocol: docs/rdma.txt

Signed-off-by: Michael R. Hines <address@hidden>
  docs/rdma.txt |  331 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 331 insertions(+)
  create mode 100644 docs/rdma.txt

diff --git a/docs/rdma.txt b/docs/rdma.txt
new file mode 100644
index 0000000..ae68d2f
--- /dev/null
+++ b/docs/rdma.txt
@@ -0,0 +1,331 @@
+Changes since v6:
+(Thanks, Paolo - things look much cleaner now.)
+- Try to get patch-ordering correct =)
+- Much cleaner use of QEMUFileOps
+- Far fewer header file changes
+- Convert zero check capability to QMP command instead
+- Updated documentation
The above text probably shouldn't be in the file.

+Wiki: http://wiki.qemu.org/Features/RDMALiveMigration
+Github: address@hidden:hinesmr/qemu.git
+Contact: Michael R. Hines, address@hidden
Missing a copyright statement, but that's just following the example of
other docs, so I guess it's okay?

+RDMA Live Migration Specification, Version # 1
+* Running
+* RDMA Protocol Description
+* Versioning and Capabilities
+* QEMUFileRDMA Interface
+* Migration of pc.ram
+* Error handling
+* Performance
No high-level overview of what the acronym RDMA even stands for?

+First, decide if you want dynamic page registration on the server-side.
+This always happens on the primary-VM side, but is optional on the server.
+Doing this allows you to support overcommit (such as cgroups or ballooning)
+with a smaller footprint on the server-side without having to register the
+entire VM memory footprint.
+NOTE: This significantly slows down RDMA throughput (about 30% slower).
+$ virsh qemu-monitor-command --hmp \
+    --cmd "migrate_set_capability chunk_register_destination off" # enabled by default
'virsh qemu-monitor-command' is documented as unsupported by libvirt
(it's intended solely as a development/debugging aid); but I guess until
libvirt learns to expose RDMA support by default, this is okay for a
first cut of documentation.  Furthermore, you are missing a domain argument.

Do you really want to be requiring the user to do everything through
libvirt?  This is qemu documentation, so you should document how things
work without needing libvirt in the picture.
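Following that suggestion, here is what the libvirt-free path could look like: a minimal
Python sketch that speaks QMP directly over QEMU's monitor socket (started with e.g.
-qmp unix:/tmp/qmp.sock,server,nowait). The socket path is a placeholder, and the
capability name chunk_register_destination is taken from this patch series and may not
match the final QMP spelling, so treat both as assumptions:

```python
import json
import socket

def capability_payload(name, state):
    """Build a QMP migrate-set-capabilities command body for one capability."""
    return {
        "execute": "migrate-set-capabilities",
        "arguments": {
            "capabilities": [{"capability": name, "state": state}],
        },
    }

def send_qmp(sock_path, payload):
    """Send one QMP command over QEMU's monitor socket and return the reply."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(sock_path)
    s.recv(4096)                                   # discard the greeting banner
    s.sendall(b'{"execute": "qmp_capabilities"}')  # leave capabilities-negotiation mode
    s.recv(4096)
    s.sendall(json.dumps(payload).encode())
    return json.loads(s.recv(4096))

# Disable chunked registration on the destination, as the HMP command above does:
cmd = capability_payload("chunk_register_destination", False)
# send_qmp("/tmp/qmp.sock", cmd)   # placeholder socket path; needs a running QEMU
```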

+Next, if you decided *not* to use chunked registration on the server,
+it is recommended to also disable zero page detection. While this is not
+strictly necessary, zero page detection also significantly slows down
+throughput on higher-performance links (by about 50%), such as 40 gbps infiniband.
+$ virsh qemu-monitor-command --hmp \
+    --cmd "migrate_check_for_zero off" # enabled by default
Missing a domain argument.

+Finally, set the migration speed to match your hardware's capabilities:
+$ virsh qemu-monitor-command --hmp \
+    --cmd "migrate_set_speed 40g" # or whatever is the MAX of your RDMA device
This modifies qemu state behind libvirt's back, and won't necessarily do
what you want if libvirt tries to change things back to the speed it
thought it was managing.  Instead, use 'virsh migrate-setspeed $dom 40'.
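If one does drive QEMU directly instead, the QMP counterpart of the HMP command above is
migrate_set_speed, which takes the limit in bytes per second rather than a suffixed
string; the conversion below (a 40 Gb/s link corresponding to 5 GB/s of payload) is a
sketch under that assumption:

```python
def speed_payload(gbits):
    """Build a QMP migrate_set_speed command from a link speed in gigabits/s."""
    bytes_per_sec = gbits * 10**9 // 8   # 40 Gb/s -> 5,000,000,000 B/s
    return {
        "execute": "migrate_set_speed",
        "arguments": {"value": bytes_per_sec},
    }

cmd = speed_payload(40)   # match a 40 Gb/s RDMA link
```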

+Finally, perform the actual migration:
+$ virsh migrate domain rdma:xx.xx.xx.xx:port
That's not quite valid syntax for 'virsh migrate'.  Again, do you really
want to be documenting libvirt's interface, or qemu's interface?
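For the qemu-only view of that last step, the corresponding QMP command is migrate with
an rdma: transport URI; the host and port below are placeholders standing in for the
xx.xx.xx.xx:port in the quoted text:

```python
def migrate_payload(host, port):
    """Build the QMP migrate command for an RDMA transport URI."""
    return {
        "execute": "migrate",
        "arguments": {"uri": "rdma:%s:%d" % (host, port)},
    }

cmd = migrate_payload("192.168.1.2", 4444)   # placeholder destination
```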

+RDMA Protocol Description:
Aesthetics: match the length of === to the line above it.

<snip> I'm not reviewing technical content, just face value...

+These two functions are very short and simply use the protocol
+described above to deliver bytes without changing the upper-level
+users of QEMUFile that depend on a bytestream abstraction.

+After pinning, an RDMA Write is generated and transmitted
+for the entire chunk.

+5. Also, some form of balloon-device usage tracking would
+   help alleviate some of these issues.

+Using a 40gbps infiniband link, performing a worst-case stress test:

+RDMA Throughput With $ stress --vm-bytes 1024M --vm 1 --vm-keep
+Approximately 30 gbps (a little better than the paper)
which paper? Call that out in your high-level summary

+An *exhaustive* paper (2010) shows additional performance details
+linked on the QEMU wiki:
Missing the actual reference?  And it would help to mention it at the
beginning of the file.
