Lines Matching refs:of
21 * Migration of VM's RAM
29 of the significantly lower latency and higher throughput compared to TCP/IP. This is
30 because the RDMA I/O architecture reduces the number of interrupts and
32 migration, under certain types of memory-bound workloads, may take a more
33 unpredictable amount of time to complete if the amount of
35 with the rate of dirty memory produced by the workload.
38 over Converged Ethernet) and InfiniBand-based. This implementation of
39 migration using RDMA is capable of using both technologies because of
40 the use of the OpenFabrics OFED software stack that abstracts out the
41 programming model irrespective of the underlying hardware.
47 for a working build of QEMU to run successfully using RDMA Migration.
52 Use of RDMA during migration requires pinning and registering memory
56 of RDMA migration may in fact be harmful to co-located VMs or other
58 relocate the entire footprint of the virtual machine. If so, then the
59 use of RDMA is discouraged, and standard TCP migration is recommended instead.
65 bulk-phase round of the migration and can be enabled for extremely
75 of the migration, which can greatly reduce the "total" migration time.
76 Example performance of this using an idle VM in the previous example
79 Note: for very large virtual machines (hundreds of GBs), pinning
80 *all* of the memory of your virtual machine in the kernel is very expensive
83 affect the determinism or predictability of your migration, you will
84 still gain the benefits of advanced pinning with RDMA.
92 $ migrate_set_parameter max-bandwidth 40g # or whatever the maximum is for your RDMA device
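Put together, a migration over RDMA might look like the following sketch. The host name, port, memory size, and disk image are placeholders, and the capability line is optional (it is off by default):

```shell
# On the destination (placeholder host/port and image path):
qemu-system-x86_64 -m 8192 disk.img -incoming rdma:dest_host:4444

# On the source QEMU monitor:
# (qemu) migrate_set_capability rdma-pin-all on    # optional, disabled by default
# (qemu) migrate_set_parameter max-bandwidth 40g   # or your device's maximum
# (qemu) migrate -d rdma:dest_host:4444
```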
106 Here is a brief summary of total migration time and downtime using RDMA:
117 Effects of memory registration on the bulk-phase round:
119 For example, in the same 8GB RAM example with all 8GB of memory in
126 These numbers would of course scale up to whatever size virtual machine
130 migration *downtime*. This is because, without this feature, all of the
140 1. The transmission of the pages using RDMA
144 protocol now, consisting of InfiniBand SEND messages.
147 message used by applications using InfiniBand hardware.
156 1. registration of the memory that will be transmitted
158 sides of the network before the actual transmission
173 a control transport for migration of device state.
185 At this point, we define a control channel on top of SEND messages
191 * Length (of the data portion, uint32, network byte order)
193 * Repeat (Number of commands in data portion, same type only)
197 so that the protocol is compatible across multiple versions of QEMU.
198 Version #1 requires that all server implementations of the protocol
199 check this field, register all requests found in the array of commands located
200 in the data portion, and return an equal number of results in the response.
201 The maximum number of repeats is hard-coded to 4096. This is a conservative
202 limit based on the maximum size of a SEND message and empirical
203 observations on the maximum future benefit of simultaneous page registrations.
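The control header described above (Type, Length, Repeat, all sent in network byte order, with Repeat capped at 4096) could be declared along these lines. This is a minimal sketch; the field and function names here are illustrative, not QEMU's actual identifiers:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htonl / ntohl */

/* Illustrative control-channel header: Type, Length, Repeat. */
typedef struct {
    uint32_t type;    /* which command this message carries */
    uint32_t len;     /* length of the data portion, in bytes */
    uint32_t repeat;  /* number of same-type commands in the data portion */
} ControlHeader;

/* The protocol hard-codes the maximum number of repeats to 4096. */
static int header_valid(const ControlHeader *h)
{
    return h->repeat >= 1 && h->repeat <= 4096;
}

/* Serialize to network byte order before posting the SEND message. */
static void header_pack(const ControlHeader *h, uint8_t *wire)
{
    uint32_t v;
    v = htonl(h->type);   memcpy(wire + 0, &v, 4);
    v = htonl(h->len);    memcpy(wire + 4, &v, 4);
    v = htonl(h->repeat); memcpy(wire + 8, &v, 4);
}

/* Deserialize a received SEND message back to host byte order. */
static void header_unpack(const uint8_t *wire, ControlHeader *h)
{
    uint32_t v;
    memcpy(&v, wire + 0, 4); h->type   = ntohl(v);
    memcpy(&v, wire + 4, 4); h->len    = ntohl(v);
    memcpy(&v, wire + 8, 4); h->repeat = ntohl(v);
}
```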
220 portion an array of many commands of the same type. If there is more than
228 using the above list of values:
254 the command type requested in the header.
261 of the connection (described below).
263 All of the remaining command types (not including 'ready')
268 a description of each RAMBlock on the server side as well as the virtual addresses
269 and lengths of each RAMBlock. This is used by the client to determine the
270 start and stop locations of chunks and how to register them dynamically
272 2. During runtime, once a 'chunk' becomes full of pages ready to
275 with the result (rkey) of the registration.
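The dynamic-registration exchange above (a full chunk is registered on demand and the server replies with an rkey) could be sketched as follows. All names here are hypothetical; the chunk-address arithmetic simply rounds an address down to its chunk boundary within a RAMBlock:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative request: ask the server to register one chunk. */
typedef struct {
    uint64_t chunk_addr;   /* start of the chunk within the RAMBlock */
    uint64_t chunk_len;    /* size of the chunk in bytes */
    uint32_t block_index;  /* which RAMBlock this chunk belongs to */
} RegisterRequest;

/* Illustrative reply: the rkey needed for RDMA writes to that chunk. */
typedef struct {
    uint32_t rkey;
} RegisterResult;

/* Build a request for the chunk containing 'addr' in a RAMBlock that
 * starts at 'block_start' and is divided into fixed-size chunks. */
static RegisterRequest make_request(uint64_t block_start, uint64_t addr,
                                    uint64_t chunk_size, uint32_t block_index)
{
    RegisterRequest req;
    req.chunk_addr = block_start
                   + ((addr - block_start) / chunk_size) * chunk_size;
    req.chunk_len = chunk_size;
    req.block_index = block_index;
    return req;
}
```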
288 The current version of the protocol is version #1.
300 * Flags (bitwise OR of each capability),
303 There is no data portion of this header right now, so there is
304 no length field. The maximum size of the 'private data' section
310 transmit a few bytes of version information.
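The connection-setup exchange above carries only a version number and a bitwise OR of capability flags in the private data. A minimal sketch of how a server might validate it, assuming illustrative names and flag values (not QEMU's actual ones):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bit; real flag values may differ. */
enum {
    CAP_PIN_ALL = 0x01,  /* "pin all memory up front" */
};

/* Illustrative setup header: version plus OR'd capability flags.
 * There is no data portion, so no length field is needed. */
typedef struct {
    uint32_t version;  /* protocol version, currently 1 */
    uint32_t flags;    /* bitwise OR of each requested capability */
} CapHeader;

/* Accept only a known version and only capabilities we understand. */
static int server_accepts(const CapHeader *h, uint32_t known_caps)
{
    return h->version == 1 && (h->flags & ~known_caps) == 0;
}
```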
331 QEMUFileRDMA introduces a couple of new functions:
338 users of QEMUFile that depend on a bytestream abstraction.
350 Then, we return the number of bytes requested by get_buffer()
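The bytestream-on-top-of-messages idea behind get_buffer() can be sketched as below: drain whatever remains of the last received message, returning at most the number of bytes requested (real code would then block for the next message when the buffer runs dry). The names are illustrative, not QEMU's:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative state: the last message received and a read cursor. */
typedef struct {
    const unsigned char *msg;  /* last received message body */
    size_t msg_len;            /* its total length */
    size_t msg_pos;            /* how much has already been consumed */
} RecvStream;

/* Copy up to 'want' bytes into 'dst'; returns how many were copied.
 * A return of 0 means a new message must be awaited first. */
static size_t stream_get_buffer(RecvStream *s, unsigned char *dst, size_t want)
{
    size_t avail = s->msg_len - s->msg_pos;
    size_t n = want < avail ? want : avail;
    memcpy(dst, s->msg + s->msg_pos, n);
    s->msg_pos += n;
    return n;
}
```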
358 Migration of VM's RAM:
361 At the beginning of the migration, (migration-rdma.c),
362 the sender and the receiver populate the list of RAMBlocks
365 description of these blocks with each other, to be used later
366 during the iteration of main memory. This description includes
367 a list of all the RAMBlocks, their offsets and lengths, virtual
386 for the completion of *every* chunk. The current batch size
387 is about 64 chunks (corresponding to 64 MB of memory).
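The batching described above, waiting for completions only after a whole batch of chunk writes rather than after every chunk, can be sketched like this. The constants follow the text (64 chunks of roughly 1 MB each); the names and structure are illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE   (1ULL << 20)  /* ~1 MB per chunk */
#define BATCH_CHUNKS 64            /* drain completions every 64 chunks */

/* Illustrative batching state for posted RDMA chunk writes. */
typedef struct {
    unsigned posted;  /* chunks posted since the last drain */
    unsigned drains;  /* how many times we blocked on completions */
} Batcher;

/* Post one chunk; block for completions only when a batch fills. */
static void post_chunk(Batcher *b)
{
    if (++b->posted == BATCH_CHUNKS) {
        /* In real code: poll the completion queue here until all
         * outstanding writes in the batch have completed. */
        b->drains++;
        b->posted = 0;
    }
}
```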
396 link (one of 4 choices). This is the mode in which
413 2. Use of the recent /proc/<pid>/pagemap would likely speed up
414 the use of KSM and ballooning while using RDMA.
415 3. Some form of balloon-device usage tracking would also
417 4. Use LRU to provide more fine-grained direction of UNREGISTER
419 5. Expose UNREGISTER support to the user by way of workload-specific