[Qemu-devel] [PATCH v5 01/45] Start documenting how postcopy works.
Dr. David Alan Gilbert (git)
Wed, 25 Feb 2015 16:51:24 +0000
From: "Dr. David Alan Gilbert" <address@hidden>
Signed-off-by: Dr. David Alan Gilbert <address@hidden>
docs/migration.txt | 189 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 189 insertions(+)
diff --git a/docs/migration.txt b/docs/migration.txt
index 0492a45..c6c3798 100644
@@ -294,3 +294,192 @@ save/send this state when we are in the middle of a pio
(that is what ide_drive_pio_state_needed() checks). If DRQ_STAT is
not enabled, the values on that fields are garbage and don't need to
+= Return path =
+In most migration scenarios there is only a single data path that runs
+from the source VM to the destination, typically along a single fd (although
+possibly with another fd or similar for some fast way of throwing pages
+across).
+However, some uses need two-way communication; in particular, the postcopy
+destination needs to be able to request pages on demand from the source.
+For these scenarios there is a 'return path' from the destination to the
+source; qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for
+the return path.
+ Source side
+ Forward path - written by migration thread
+ Return path - opened by main thread, read by return-path thread
+ Destination side
+ Forward path - read by main thread
+ Return path - opened by main thread, written by main thread AND postcopy
+ thread (protected by rp_mutex)
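As a rough illustration of the return path, the sketch below uses a socketpair as the single bidirectional connection: the destination writes a page request back along the connection it receives the forward stream on, and the source reads it. The 12-byte big-endian message (RAM offset plus length), the helper names, and the use of raw fds instead of QEMUFile are all hypothetical; only the role of rp_mutex, serialising the destination's writers, comes from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical page-request message: be64 RAM offset, be32 length. */
#define RP_REQ_LEN 12

/* Serialises the destination's writers (main and postcopy threads). */
static pthread_mutex_t rp_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Loop until all bytes are written, tolerating short writes and EINTR. */
static int write_full(int fd, const uint8_t *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0 && errno == EINTR) {
            continue;                       /* interrupted: retry */
        }
        if (n <= 0) {
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Loop until all bytes are read; EOF before that is an error. */
static int read_full(int fd, uint8_t *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = read(fd, buf, len);
        if (n < 0 && errno == EINTR) {
            continue;                       /* interrupted: retry */
        }
        if (n <= 0) {
            return -1;                      /* error, or peer closed */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Destination: ask the source for len bytes at offset (hypothetical API). */
static int rp_request_page(int rp_fd, uint64_t offset, uint32_t len)
{
    uint8_t msg[RP_REQ_LEN];
    assert(rp_fd >= 0);
    for (int i = 0; i < 8; i++) {
        msg[i] = (uint8_t)(offset >> (56 - 8 * i));
    }
    for (int i = 0; i < 4; i++) {
        msg[8 + i] = (uint8_t)(len >> (24 - 8 * i));
    }
    pthread_mutex_lock(&rp_mutex);          /* keep each message contiguous */
    int ret = write_full(rp_fd, msg, sizeof(msg));
    pthread_mutex_unlock(&rp_mutex);
    return ret;
}

/* Source: the return-path thread would call this in a loop. */
static int rp_read_page_request(int rp_fd, uint64_t *offset, uint32_t *len)
{
    uint8_t msg[RP_REQ_LEN];
    if (read_full(rp_fd, msg, sizeof(msg)) < 0) {
        return -1;
    }
    *offset = 0;
    for (int i = 0; i < 8; i++) {
        *offset = (*offset << 8) | msg[i];
    }
    *len = 0;
    for (int i = 0; i < 4; i++) {
        *len = (*len << 8) | msg[8 + i];
    }
    return 0;
}
```

In use, socketpair(AF_UNIX, SOCK_STREAM, 0, sv) provides the two ends of one connection; calling rp_request_page() on the destination end and rp_read_page_request() on the source end yields the same offset and length, in order.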
+= Postcopy =
+'Postcopy' migration is a way to deal with migrations that refuse to converge;
+its plus side is that there is an upper bound on the amount of migration
+traffic and the time it takes; the downside is that during the postcopy phase,
+a failure of *either* side or of the network connection causes the guest to
+be lost.
+In postcopy the destination CPUs are started before all the memory has been
+transferred, and accesses to pages that are yet to be transferred cause
+a fault that's translated by QEMU into a request to the source QEMU.
+Postcopy can be combined with precopy (i.e. normal migration) so that if
+precopy doesn't finish in a given time, the switch is made to postcopy.
+=== Enabling postcopy ===
+To enable postcopy (prior to the start of migration):
+migrate_set_capability x-postcopy-ram on
+The migration will still start in precopy mode; however, issuing:
+will now cause the transition from precopy to postcopy.
+It can be issued immediately after migration is started or any
+time later on. Issuing it after the end of a migration is harmless.
+=== Postcopy device transfer ===
+Loading device data may cause the device emulation to access guest RAM,
+which may trigger faults that have to be resolved by the source. The
+migration stream therefore has to be able to deliver page data *during* the
+device load, so the device data must be read off the stream before the device
+load begins, freeing the stream for page data. This is achieved by
+'packaging' the device data into a blob that's read in one go.
+Until postcopy is entered the migration stream is identical to normal
+precopy, except for the addition of a 'postcopy advise' command at
+the beginning, to tell the destination that postcopy might happen.
+When postcopy starts the source sends the page discard data and then
+forms the 'package' containing:
+ Command: 'postcopy ram listen'
+ The device state
 A series of sections, identical to the precopy stream's device state
+ containing everything except postcopiable devices (i.e. RAM)
+ Command: 'postcopy ram run'
+The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
+contents are formatted in the same way as the main migration stream.
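The packaging step can be sketched as follows. The point illustrated is that the destination copies the whole CMD_PACKAGED payload off the stream before walking it, so the stream position is already past the package (where page data can follow) while the device sections are processed. The one-byte tags, the [tag][be32 length][payload] framing and the fixed-size buffer are assumptions made for this example, not the actual migration stream format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tags and framing; not QEMU's actual command encoding. */
enum { TAG_LISTEN = 1, TAG_DEVICE = 2, TAG_RUN = 3, CMD_PACKAGED = 0x10 };

struct buf {
    uint8_t data[1024];
    size_t len;
};

static uint32_t get_be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Append one entry: [1-byte tag][be32 payload length][payload]. */
static void put_entry(struct buf *b, uint8_t tag, const void *payload,
                      uint32_t len)
{
    assert(b->len + 5 + len <= sizeof(b->data));
    b->data[b->len++] = tag;
    for (int shift = 24; shift >= 0; shift -= 8) {
        b->data[b->len++] = (uint8_t)(len >> shift);
    }
    if (len > 0) {
        memcpy(b->data + b->len, payload, len);
        b->len += len;
    }
}

/*
 * Destination: copy the whole CMD_PACKAGED blob off the stream in one go,
 * then walk its entries from the private copy. Returns the number of
 * entries (their tags go to tags_out) or -1 on a malformed package.
 * *pos is left just past the package, where page data would follow.
 */
static int load_package(const struct buf *stream, size_t *pos,
                        uint8_t *tags_out, int max_tags)
{
    if (*pos + 5 > stream->len || stream->data[*pos] != CMD_PACKAGED) {
        return -1;
    }
    uint32_t len = get_be32(stream->data + *pos + 1);
    if (*pos + 5 + len > stream->len) {
        return -1;
    }
    uint8_t *blob = malloc(len ? len : 1);
    if (blob == NULL) {
        return -1;
    }
    memcpy(blob, stream->data + *pos + 5, len);
    *pos += 5 + (size_t)len;          /* stream is free for page data now */

    int n = 0;
    for (size_t off = 0; off < len; ) {
        if (off + 5 > len || n == max_tags) {
            n = -1;
            break;
        }
        uint32_t elen = get_be32(blob + off + 1);
        if (off + 5 + elen > len) {
            n = -1;
            break;
        }
        tags_out[n++] = blob[off];    /* a real loader would dispatch here */
        off += 5 + (size_t)elen;
    }
    free(blob);
    return n;
}
```

Building a package of LISTEN, one device section and RUN, wrapping it in a CMD_PACKAGED entry and following it with a page entry, load_package() reports three entries and leaves the stream positioned at the page entry.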
+Initially the destination looks the same as precopy, with a single thread
+reading the migration stream; the 'postcopy advise' and 'discard' commands
+are processed to change the way RAM is managed, but don't affect the stream
+processing.
+ 1 2 3 4 5 6 7
+main -----DISCARD-CMD_PACKAGED ( LISTEN DEVICE DEVICE DEVICE RUN )
+thread | |
+ | (page request)
+ | \___
+ v \
+listen thread: --- page -- page -- page -- page -- page --
+ a b c
+On receipt of CMD_PACKAGED (1), all the data associated with the package,
+i.e. the ( ... ) section in the diagram, is read into memory (into a
+QEMUSizedBuffer), and the main thread recurses into qemu_loadvm_state_main to
+process the contents of the package (2), which contains commands (3, 6) and
+devices (4...).
+On receipt of 'postcopy ram listen' (3), i.e. the first command in the
+package, a new thread (a) is started that takes over servicing the migration
+stream, while the main thread carries on loading the package. The listen
+thread loads normal background page data (b), but if a fault happens during
+a device load (5), the returned page (c) is loaded by the listen thread,
+allowing the main thread's device load to carry on.
+The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the
+CPUs start running.
+At the end of the CMD_PACKAGED (7) the main thread returns to normal running
+and is no longer used by migration, while the listen thread carries
+on servicing page data until the end of migration.
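The interplay of the main and listen threads can be modelled with pthreads. In this hypothetical sketch the package is already in memory as an array of commands; LISTEN starts a listen thread that streams every page into a toy guest RAM, and a DEVICE entry blocks on a condition variable until the page it touches has arrived. Page requests to the source are not modelled, and the main thread joins the listen thread at the end only so the example terminates; in the scheme described above the listen thread outlives the package.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NPAGES 8
#define PAGE_SIZE 16

enum pkg_cmd { PKG_LISTEN, PKG_DEVICE, PKG_RUN };

/* Toy destination state shared by the main thread and the listen thread. */
static unsigned char guest_ram[NPAGES][PAGE_SIZE];
static bool received[NPAGES];
static pthread_mutex_t ram_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t page_arrived = PTHREAD_COND_INITIALIZER;

/* Listen thread (a): drains the simulated page stream (b, c). */
static void *listen_thread(void *opaque)
{
    (void)opaque;
    for (size_t i = 0; i < NPAGES; i++) {
        pthread_mutex_lock(&ram_lock);
        memset(guest_ram[i], (int)(i + 1), PAGE_SIZE); /* stand-in data */
        received[i] = true;
        pthread_cond_broadcast(&page_arrived);         /* wake waiters */
        pthread_mutex_unlock(&ram_lock);
    }
    return NULL;
}

/* A device load (4, 5) touching 'page' blocks until the page is present. */
static unsigned char device_load_touching(size_t page)
{
    assert(page < NPAGES);
    pthread_mutex_lock(&ram_lock);
    while (!received[page]) {
        pthread_cond_wait(&page_arrived, &ram_lock);
    }
    unsigned char v = guest_ram[page][0];
    pthread_mutex_unlock(&ram_lock);
    return v;
}

/* Main thread (2): walk the in-memory package; true if it reached RUN. */
static bool process_package(const enum pkg_cmd *cmds, const size_t *dev_page,
                            size_t n, unsigned *dev_sum)
{
    pthread_t listener;
    bool listening = false, running = false, ok = true;

    *dev_sum = 0;
    for (size_t i = 0; i < n && ok; i++) {
        switch (cmds[i]) {
        case PKG_LISTEN:                       /* (3) start listen thread */
            if (listening ||
                pthread_create(&listener, NULL, listen_thread, NULL) != 0) {
                ok = false;
            } else {
                listening = true;
            }
            break;
        case PKG_DEVICE:          /* without a listener a fault would hang */
            if (!listening || dev_page[i] >= NPAGES) {
                ok = false;
            } else {
                *dev_sum += device_load_touching(dev_page[i]);
            }
            break;
        case PKG_RUN:                          /* (6) CPUs may start */
            running = true;
            break;
        }
    }
    if (listening) {
        pthread_join(listener, NULL);  /* sketch only; see lead-in */
    }
    return ok && running;
}
```

The check that a listen thread exists before any device load mirrors the ordering requirement in the text: if device loading started while the main thread still owned the stream, a fault on a missing page could never be satisfied.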
+=== Postcopy states ===
+Postcopy moves through a series of states (see postcopy_state), in order:
+ Advise: Set at the start of migration if postcopy is enabled, even
+ if it hasn't had the start command; here the destination
+ checks that its OS has the support needed for postcopy, and performs
+ setup to ensure the RAM mappings are suitable for later postcopy.
+ (Triggered by reception of POSTCOPY_ADVISE command)
+ Listen: The first command in the package, POSTCOPY_LISTEN, switches
+ the destination state to Listen, and starts a new thread
+ (the 'listen thread') which takes over the job of receiving
+ pages off the migration stream, while the main thread carries
+ on processing the blob. With this thread able to process page
+ reception, the destination now 'sensitises' the RAM to detect
 any access to missing pages (on Linux using the 'userfault' system).
 Running: POSTCOPY_RUN causes the destination to synchronise all
 state and start the CPUs and IO devices running. The main
 thread then finishes processing the migration package and
 carries on as it would for normal precopy migration
 (although it can't do the cleanup it would do as it
 finishes a normal migration).
 End: The listen thread can now quit and perform the cleanup of migration
 state; the migration is now complete.
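The state sequence above amounts to a small linear state machine. The enum and event names below are hypothetical (they are not QEMU's postcopy_state values), and only the in-order transitions described above are accepted; any out-of-order command is rejected and leaves the state unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the destination states described above. */
typedef enum {
    PS_NONE,     /* nothing postcopy-related received yet */
    PS_ADVISE,   /* OS support checked, RAM mappings prepared */
    PS_LISTEN,   /* listen thread owns the stream; RAM sensitised */
    PS_RUNNING,  /* CPUs and devices started */
    PS_END,      /* listen thread has cleaned up; migration complete */
} PostcopyStateSketch;

/* Hypothetical events: the three commands plus end of the page stream. */
typedef enum { EV_ADVISE, EV_LISTEN, EV_RUN, EV_STREAM_END } PostcopyEvent;

/* Legal transitions, indexed by event: from_state[ev] -> to_state[ev]. */
static const PostcopyStateSketch from_state[] = {
    [EV_ADVISE] = PS_NONE,     [EV_LISTEN] = PS_ADVISE,
    [EV_RUN] = PS_LISTEN,      [EV_STREAM_END] = PS_RUNNING,
};
static const PostcopyStateSketch to_state[] = {
    [EV_ADVISE] = PS_ADVISE,   [EV_LISTEN] = PS_LISTEN,
    [EV_RUN] = PS_RUNNING,     [EV_STREAM_END] = PS_END,
};

/* Apply ev to *st; returns false (state unchanged) if ev is out of order. */
static bool postcopy_step(PostcopyStateSketch *st, PostcopyEvent ev)
{
    assert((unsigned)ev <= EV_STREAM_END);
    if (*st != from_state[ev]) {
        return false;
    }
    *st = to_state[ev];
    return true;
}
```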
+=== Source side page maps ===
+The source side keeps two bitmaps during postcopy: the 'migration bitmap'
+and the 'sent map'. The 'migration bitmap' is basically the same as in
+the precopy case, and holds a bit to indicate that a page is 'dirty',
+i.e. needs sending. During the precopy phase this is updated as the CPU
+dirties pages; however, during postcopy the source CPUs are stopped and
+nothing should dirty anything any more.
+The 'sent map' is used for the transition to postcopy. It is a bitmap that
+has a bit set whenever a page is sent to the destination; however, during
+the transition to postcopy mode it is masked against the migration bitmap
+(sentmap &= migrationbitmap) to generate a bitmap recording pages that
+have previously been sent but are now dirty again. This masked
+sentmap is sent to the destination, which discards those now-dirty pages
+before starting the CPUs.
+Note that once in postcopy mode, the sent map is still updated; however,
+its contents are not necessarily consistent with the pages already sent
+due to the masking with the migration bitmap.
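The masking step can be shown on plain 64-bit word bitmaps. This sketch assumes one bit per target page and uses hypothetical helper names; sending a page clears its dirty bit and sets its sent bit, and mask_sentmap() applies sentmap &= migrationbitmap and counts the pages the destination would have to discard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64
#define BITMAP_WORDS(npages) (((npages) + BITS_PER_WORD - 1) / BITS_PER_WORD)

static void bitmap_set_page(uint64_t *map, size_t page)
{
    map[page / BITS_PER_WORD] |= UINT64_C(1) << (page % BITS_PER_WORD);
}

static void bitmap_clear_page(uint64_t *map, size_t page)
{
    map[page / BITS_PER_WORD] &= ~(UINT64_C(1) << (page % BITS_PER_WORD));
}

static bool bitmap_test_page(const uint64_t *map, size_t page)
{
    return (map[page / BITS_PER_WORD] >> (page % BITS_PER_WORD)) & 1;
}

/* Precopy send: the page is no longer dirty and is recorded as sent. */
static void send_page(uint64_t *sentmap, uint64_t *migration_bitmap,
                      size_t page)
{
    bitmap_clear_page(migration_bitmap, page);
    bitmap_set_page(sentmap, page);
}

/*
 * Transition to postcopy: sentmap &= migration_bitmap, leaving only pages
 * that were sent but have since been dirtied again. Returns how many such
 * pages there are, i.e. how many the destination must discard.
 */
static size_t mask_sentmap(uint64_t *sentmap, const uint64_t *migration_bitmap,
                           size_t npages)
{
    size_t count = 0;
    for (size_t w = 0; w < BITMAP_WORDS(npages); w++) {
        sentmap[w] &= migration_bitmap[w];
        for (uint64_t x = sentmap[w]; x != 0; x &= x - 1) {
            count++;                  /* x &= x - 1 clears the lowest set bit */
        }
    }
    return count;
}
```

For example, if pages 0 to 9 are sent and the guest then re-dirties pages 3 and 7, the masked map contains exactly those two pages; a dirty page that was never sent (say page 50) is not included, because the destination has nothing to discard for it.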
+=== Destination side page maps ===
+(Needs to be changed so we can update both easily; at the moment updates are
+ done with a lock.)
+The destination keeps a state for each page which is 'missing', 'received'
+or 'requested'; these three states are encoded in a 2-bit state array.
+Incoming requests from the kernel cause the state to transition from 'missing'
+to 'requested'. Received pages cause a transition from either 'missing' or
+'requested' to 'received'; the kernel is notified on reception to wake up
+any threads that were waiting for the page.
+If the kernel requests a page that has already been 'received', the kernel is
+notified without re-requesting the page.
+This leads to three valid page states:
 missing - page not yet received or requested
 received - page received
 requested - page requested but not yet received
+and four valid transitions:
 received -> missing (only during setup/discard)
 missing -> received (normal incoming page)
 requested -> received (incoming page previously requested)
 missing -> requested (userfault request)
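A possible packing of the three page states into a 2-bit-per-page array, with the transitions listed above, is sketched below. The numeric encoding and function names are hypothetical, the locking mentioned above is omitted, and treating a second fault on an already 'requested' page as "keep waiting" (no new request) is an assumption the text does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: four pages per byte, value 3 unused. */
enum page_state { PAGE_MISSING = 0, PAGE_RECEIVED = 1, PAGE_REQUESTED = 2 };

static enum page_state page_get(const uint8_t *map, size_t page)
{
    unsigned v = (map[page / 4] >> (2 * (page % 4))) & 3u;
    assert(v != 3u);                  /* 3 is unused in this encoding */
    return (enum page_state)v;
}

static void page_set(uint8_t *map, size_t page, enum page_state s)
{
    unsigned shift = (unsigned)(2 * (page % 4));
    map[page / 4] = (uint8_t)((map[page / 4] & ~(3u << shift)) |
                              ((unsigned)s << shift));
}

enum fault_action { FAULT_REQUEST, FAULT_WAKE, FAULT_WAIT };

/* The kernel reports a fault on 'page'. */
static enum fault_action on_fault(uint8_t *map, size_t page)
{
    switch (page_get(map, page)) {
    case PAGE_MISSING:
        page_set(map, page, PAGE_REQUESTED);  /* missing -> requested */
        return FAULT_REQUEST;                 /* ask the source for it */
    case PAGE_RECEIVED:
        return FAULT_WAKE;                    /* already here: just notify */
    default:
        return FAULT_WAIT;                    /* assumption: no re-request */
    }
}

/* A page arrived from the source (background or previously requested). */
static bool on_page_received(uint8_t *map, size_t page)
{
    enum page_state s = page_get(map, page);
    if (s != PAGE_MISSING && s != PAGE_REQUESTED) {
        return false;                         /* not a listed transition */
    }
    page_set(map, page, PAGE_RECEIVED);
    return true;          /* caller would now wake threads waiting on it */
}

/* Discard during setup: received -> missing. */
static void on_discard(uint8_t *map, size_t page)
{
    if (page_get(map, page) == PAGE_RECEIVED) {
        page_set(map, page, PAGE_MISSING);
    }
}
```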