Lines Matching full:the

10   Licence: This work is licensed under the terms of the GNU GPL,
11 version 2 or later. See the COPYING file in the top-level
19 This protocol aims to complement the ``ioctl`` interface used to
20 control the vhost implementation in the Linux kernel. It implements
21 the control plane needed to establish virtqueue sharing with a user
22 space process on the same host. It uses communication over a Unix
23 domain socket to share file descriptors in the ancillary data of the
26 The protocol defines 2 sides of the communication, *front-end* and
27 *back-end*. The *front-end* is the application that shares its virtqueues, in
28 our case QEMU. The *back-end* is the consumer of the virtqueues.
30 In the current implementation QEMU is the *front-end*, and the *back-end*
31 is the external process consuming the virtio queues, for example a
35 implementations, it is recommended to follow the :ref:`Backend program
38 The *front-end* and *back-end* can be either a client (i.e. connecting) or
39 server (listening) in the socket communication.
45 is supported on any platform that provides the following features:
48 so it can be passed over a UNIX domain socket and then mapped by the
51 - AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can
58 to the corresponding descriptor. The 8-byte value itself has no meaning and
64 .. Note:: All numbers are in the machine native byte order.
75 :request: 32-bit type of the request
79 - Lower 2 bits are the version (currently 0x01)
80 - Bit 2 is the reply flag - needs to be sent on each reply from the back-end
81 - Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
84 :size: 32-bit size of the payload
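As a rough illustration of the header fields listed above, the following Python sketch packs and unpacks the three 32-bit header words (request, flags, size) in native byte order. The constant and function names are invented for this example and are not QEMU identifiers; only the bit positions are taken from the list above.

```python
import struct

# Hypothetical names for illustration; bit positions follow the flags list.
VERSION_MASK = 0x3        # lower 2 bits: protocol version
VERSION = 0x1             # current version
REPLY_FLAG = 1 << 2       # bit 2: set on each reply from the back-end
NEED_REPLY_FLAG = 1 << 3  # bit 3: need_reply (see REPLY_ACK)

# Three 32-bit words (request, flags, size) in native byte order.
HEADER = struct.Struct("=III")


def pack_header(request, payload_size, need_reply=False, reply=False):
    """Build the 12-byte message header."""
    flags = VERSION
    if reply:
        flags |= REPLY_FLAG
    if need_reply:
        flags |= NEED_REPLY_FLAG
    return HEADER.pack(request, flags, payload_size)


def unpack_header(data):
    """Split a received header into its fields."""
    request, flags, size = HEADER.unpack(data[:HEADER.size])
    return {
        "request": request,
        "version": flags & VERSION_MASK,
        "reply": bool(flags & REPLY_FLAG),
        "need_reply": bool(flags & NEED_REPLY_FLAG),
        "size": size,
    }
```

A receiver would typically read ``HEADER.size`` bytes first, decode them, and then read exactly ``size`` further bytes of payload.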
89 Depending on the request type, **payload** can be:
118 :vring index: 32-bit index of the respective virtqueue
120 :index in avail ring: 32-bit value, of which currently only the lower 16
123 - Bits 0–15: Index of the next *Available Ring* descriptor that the
125 wrapped by the ring size.
135 :vring index: 32-bit index of the respective virtqueue
139 - Bits 0–14: Index of the next *Available Ring* descriptor that the
141 wrapped by the ring size.
143 - Bits 16–30: Index of the entry in the *Used Ring* where the back-end
144 will place the next descriptor. This is a free-running index that
145 is not wrapped by the ring size.
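The bit layout above can be sketched as a pair of helpers that split and rebuild the 32-bit value for a packed virtqueue. The function names are hypothetical. Bits 15 and 31 are not described in the lines shown here; the sketch treats them as the two ring wrap counters, which is an assumption.

```python
INDEX_MASK = 0x7FFF  # 15-bit index fields


def decode_packed_vring_state(value):
    """Split the 32-bit packed-virtqueue index value into its fields."""
    return {
        "avail_index": value & INDEX_MASK,          # bits 0-14
        "avail_wrap": (value >> 15) & 0x1,          # bit 15 (assumed wrap counter)
        "used_index": (value >> 16) & INDEX_MASK,   # bits 16-30
        "used_wrap": (value >> 31) & 0x1,           # bit 31 (assumed wrap counter)
    }


def encode_packed_vring_state(avail_index, used_index, avail_wrap=0, used_wrap=0):
    """Inverse of decode_packed_vring_state()."""
    if not (0 <= avail_index <= INDEX_MASK and 0 <= used_index <= INDEX_MASK):
        raise ValueError("indices must fit in 15 bits")
    return (
        avail_index
        | ((avail_wrap & 0x1) << 15)
        | (used_index << 16)
        | ((used_wrap & 0x1) << 31)
    )
```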
159 :descriptor: a 64-bit ring address of the vring descriptor table
161 :used: a 64-bit ring address of the vring used ring
163 :available: a 64-bit ring address of the vring available ring
179 :guest address: a 64-bit guest address of the region
185 :mmap offset: a 64-bit offset where region starts in the mapped memory
187 When the ``VHOST_USER_PROTOCOL_F_XEN_MMAP`` protocol feature has been
188 successfully negotiated, the memory region description contains two extra
189 fields at the end.
199 - Bit 8 is set if the memory region can not be mapped in advance, and memory
200 areas within this region must be mapped / unmapped only when required by the
201 back-end. The back-end shouldn't try to map the entire region at once, as the
202 front-end may not allow it. The back-end should rather map only the required
251 :iova: a 64-bit I/O virtual address programmed by the guest
284 :payload: Size bytes array holding the contents of the virtio
298 :offset: a 64-bit offset of this area from the start of the
310 :mmap offset: a 64-bit offset of this area from the start
311 of the supplied file descriptor
334 :transfer direction: a 32-bit enum, describing the direction in which
335 the state is transferred:
337 - 0: Save: Transfer the state from the back-end to the front-end,
338 which happens on the source side of migration
339 - 1: Load: Transfer the state from the front-end to the back-end,
340 which happens on the destination side of migration
342 :migration phase: a 32-bit enum, describing the state in which the VM
345 - 0: Stopped (in the period after the transfer of memory-mapped
346 regions before switch-over to the destination): The VM guest is
347 stopped, and the vhost-user device is suspended (see
350 In the future, additional phases might be added e.g. to allow
351 iterative migration while the device is running.
356 In QEMU the vhost-user message is implemented with the following struct:
380 The protocol for vhost-user is based on the existing implementation of
381 vhost for the Linux Kernel. Most messages that can be sent via the
383 the kernel implementation.
385 The communication consists of the *front-end* sending message requests and
386 the *back-end* sending message replies. Most of the requests don't require
387 replies, except for the following requests:
398 The section on ``REPLY_ACK`` protocol extension.
400 There are several messages that the front-end sends with file descriptors passed
401 in the ancillary data:
414 If the *front-end* is unable to send the full message or receives a wrong
415 reply it will close the connection. An optional reconnection mechanism
419 close the connection. This should only happen in exceptional circumstances.
428 Note that VHOST_USER_F_PROTOCOL_FEATURES is the UNUSED (30) feature
434 This reserved feature bit was reused by the vhost-user protocol to add
445 * While a ring is stopped, the back-end must not process the ring at
446 all, regardless of whether it is enabled or disabled. The
448 into effect once the ring is started.
450 * started and disabled: The back-end must process the ring without
452 in the disabled state the back-end must not supply any new RX packets,
455 * started and enabled: The back-end must process the ring normally, i.e.
458 Each ring is initialized in a stopped and disabled state. The back-end
460 descriptor is readable) on the descriptor specified by
461 ``VHOST_USER_SET_VRING_KICK`` or receiving the in-band message
468 the front-end without ``VHOST_USER_F_PROTOCOL_FEATURES`` set, the
471 While processing the rings (whether they are enabled or not), the back-end
472 must support changing some configuration aspects on the fly.
479 While all vrings are stopped, the device is *suspended*. In addition to
480 not processing any vring (because they are stopped), the device must:
483 * not send any notifications to the guest,
484 * not send any messages to the front-end,
485 * still process and reply to messages from the front-end.
490 Many devices have a fixed number of virtqueues. In this case the front-end
491 already knows the number of available virtqueues without communicating with the
494 Some devices do not have a fixed number of virtqueues. Instead the maximum
495 number of virtqueues is chosen by the back-end. The number can depend on host
499 Multiple queue support allows the back-end to advertise the maximum number of
500 queues. This is treated as a protocol extension, hence the back-end has to
501 implement protocol features first. The multiple queues feature is supported
502 only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.
504 The max number of queues the back-end supports can be queried with message
505 ``VHOST_USER_GET_QUEUE_NUM``. The front-end should stop when the number of requested
508 As all queues share one connection, the front-end uses a unique index for each
509 queue in the sent message to identify a specified queue.
511 The front-end enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
512 vhost-user-net has historically automatically enabled the first queue pair.
514 Back-ends should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
518 Front-ends must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
525 During live migration, the front-end may need to track the modifications
526 the back-end makes to the memory mapped regions. The front-end should mark
527 the dirty pages in a log. Once it complies with this logging, it may
528 declare the ``VHOST_F_LOG_ALL`` vhost feature.
530 To start/stop logging of data/used ring writes, the front-end may send
535 All the modifications to memory pointed to by vring "descriptor" should
543 The log memory fd is provided in the ancillary data of
544 ``VHOST_USER_SET_LOG_BASE`` message when the back-end has
547 The size of the log is supplied as part of ``VhostUserMsg`` which
549 at the supplied offset in the supplied file descriptor. The log
550 covers from address 0 to the maximum of guest regions. In pseudo-code,
556 Where ``addr`` is the guest physical address.
558 Use atomic operations, as the log may be concurrently manipulated.
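A minimal sketch of the dirty-log bitmap update, assuming one bit per 4096-byte page (the page size is an assumption of this example, as are the function names). Plain Python read-modify-write is not atomic, so a real back-end sharing the log with the front-end would use an atomic OR instead.

```python
LOG_PAGE_SIZE = 4096  # assumed page granularity of the log


def mark_dirty(log, addr, length=1):
    """Set the log bit of every page touched by [addr, addr + length)."""
    first = addr // LOG_PAGE_SIZE
    last = (addr + length - 1) // LOG_PAGE_SIZE
    for page in range(first, last + 1):
        # One bit per page: byte page // 8, bit page % 8.
        log[page // 8] |= 1 << (page % 8)


def is_dirty(log, addr):
    """Return True if the page containing addr is marked dirty."""
    page = addr // LOG_PAGE_SIZE
    return bool(log[page // 8] & (1 << (page % 8)))
```

Here ``addr`` is the guest physical address and ``log`` is a mutable byte buffer such as a ``bytearray`` or an ``mmap`` of the log file descriptor.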
560 Note that when logging modifications to the used ring (when
562 be used to calculate the log offset: the write to the first byte of the
564 value might be outside the legal guest physical address range
565 (i.e. does not have to be covered by the ``VhostUserMemory`` table), but
566 the bit offset of the last byte of the ring must fall within the size
570 ancillary data, it may be used to inform the front-end that the log has
573 Once the source has finished migration, rings will be stopped by the
577 In postcopy migration the back-end is started before all the memory has
578 been received from the source host, and care must be taken to avoid
579 accessing pages that have yet to be received. The back-end opens a
580 'userfault'-fd and registers the memory with it; this fd is then
581 passed back over to the front-end. The front-end services requests on the
582 userfaultfd for pages that are accessed and when the page is available
583 it performs WAKE ioctls on the userfaultfd to wake the stalled
584 back-end. The front-end indicates support for this via the
592 Migrating device state involves transferring the state from one
593 back-end, called the source, to another back-end, called the
594 destination. After migration, the destination transparently resumes
595 operation without requiring the driver to re-initialize the device at
596 the VIRTIO level. If the migration fails, then the source can
599 Generally, the front-end is connected to a virtual machine guest (which
600 contains the driver), which has its own state to transfer between source
602 mechanism to do so. The ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature
603 provides functionality to have the front-end include the back-end's
604 state in this transfer operation so the back-end does not need to
605 implement its own mechanism, and so the virtual machine may have its
609 To do this, the back-end state is transferred from back-end to front-end
610 on the source side, and vice versa on the destination side. This
611 transfer happens over a channel that is negotiated using the
615 * Direction of transfer: On the source, the data is saved, transferring
616 it from the back-end to the front-end. On the destination, the data
617 is loaded, transferring it from the front-end to the back-end.
619 * Migration phase: Currently, the only supported phase is the period
620 after the transfer of memory-mapped regions before switch-over to the
621 destination, when both the source and destination devices are
623 In the future, additional phases might be supported to allow iterative
624 migration while the device is running.
626 The nature of the channel is implementation-defined, but it must
627 generally behave like a pipe: The writing end will write all the data it
628 has into it, signalling the end of data by closing its end. The reading
629 end must read all of this data (until encountering the end of file) and
632 * When saving, the writing end is the source back-end, and the reading
633 end is the source front-end. After reading the state data from the
634 channel, the source front-end must transfer it to the destination
637 * When loading, the writing end is the destination front-end, and the
638 reading end is the destination back-end. After reading the state data
639 from the channel, the destination back-end must deserialize its
640 internal state from that data and set itself up to allow the driver to
641 seamlessly resume operation on the VIRTIO level.
643 Seamlessly resuming operation means that the migration must be
644 transparent to the guest driver, which operates on the VIRTIO level.
646 to use the device as if no migration had occurred. The vhost-user
647 front-end, however, will re-initialize the vhost state on the
648 destination, following the usual protocol for establishing a connection
651 features, or setting the initial vring base indices (to the same value
652 as on the source side, so that operation can resume).
654 Both on the source and on the destination side, after the respective
655 front-end has seen all data transferred (when the transfer FD has been
656 closed), it sends the ``VHOST_USER_CHECK_DEVICE_STATE`` message to
657 verify that data transfer was successful in the back-end, too. The
658 back-end responds once it knows whether the transfer and processing was
664 The front-end sends a list of vhost memory regions to the back-end using the
669 within the shared memory. The mapping of these addresses works as follows.
671 User addresses map to the vhost memory region containing that user address.
673 When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:
675 * Guest addresses map to the vhost memory region containing that guest
678 When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:
681 translated to user addresses via the IOTLB.
683 * The vhost memory region guest address is not used.
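The lookup described above can be sketched as a search for the region that contains a given address. The ``MemRegion`` class and the function names are hypothetical. The returned offset assumes the back-end mapped each region's file descriptor starting at file offset 0, which is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class MemRegion:
    guest_addr: int   # guest address of the region
    size: int         # size of the region in bytes
    user_addr: int    # front-end user address of the region
    mmap_offset: int  # where the region starts in the mapped memory


def _lookup(regions, addr, base_of):
    for index, region in enumerate(regions):
        base = base_of(region)
        if base <= addr < base + region.size:
            # Offset into the back-end's mapping of this region's fd.
            return index, addr - base + region.mmap_offset
    raise LookupError(f"address {addr:#x} is not covered by any region")


def translate_user(regions, user_addr):
    """User addresses map to the region containing that user address."""
    return _lookup(regions, user_addr, lambda r: r.user_addr)


def translate_guest(regions, guest_addr):
    """Only meaningful when VIRTIO_F_IOMMU_PLATFORM was not negotiated."""
    return _lookup(regions, guest_addr, lambda r: r.guest_addr)
```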
688 When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
690 ``VHOST_USER_IOTLB_MSG`` requests to the back-end with a ``struct
691 vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
692 has to be filled with the update message type (2), the I/O virtual
693 address, the size, the user virtual address, and the permissions
695 the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
696 ``iotlb`` payload has to be filled with the invalidation message type
697 (3), the I/O virtual address and the size. On success, the back-end is
700 The back-end relies on the back-end communication channel (see :ref:`Back-end
703 requests to the front-end with a ``struct vhost_iotlb_msg`` as
704 payload. For miss events, the iotlb payload has to be filled with the
705 miss message type (1), the I/O virtual address and the permissions
706 flags. For access failure events, the iotlb payload has to be filled
707 with the access failure message type (4), the I/O virtual address and
708 the permissions flags. For synchronization purposes, the back-end may
709 rely on the reply-ack feature, so the front-end may send a reply when
710 operation is completed if the reply-ack feature is negotiated and
712 either front-end sent an update message containing the IOTLB entry
714 the IOTLB miss message is invalid (invalid IOVA or permission).
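To summarize which fields each IOTLB message type needs, here is a small Python sketch. The type codes (1 to 4) come from the text above; the enum, the field names ``iova``, ``size``, ``uaddr`` and ``perm``, and the builder function are illustrative choices, not the layout of ``struct vhost_iotlb_msg``.

```python
from enum import IntEnum


class IotlbType(IntEnum):
    MISS = 1         # back-end to front-end
    UPDATE = 2       # front-end to back-end
    INVALIDATE = 3   # front-end to back-end
    ACCESS_FAIL = 4  # back-end to front-end


# Fields that have to be filled in for each message type.
REQUIRED_FIELDS = {
    IotlbType.MISS: ("iova", "perm"),
    IotlbType.UPDATE: ("iova", "size", "uaddr", "perm"),
    IotlbType.INVALIDATE: ("iova", "size"),
    IotlbType.ACCESS_FAIL: ("iova", "perm"),
}


def make_iotlb_msg(msg_type, **fields):
    """Return a dict describing an IOTLB message, checking required fields."""
    msg_type = IotlbType(msg_type)
    msg = {"iova": 0, "size": 0, "uaddr": 0, "perm": 0}
    unknown = set(fields) - set(msg)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    missing = [name for name in REQUIRED_FIELDS[msg_type] if name not in fields]
    if missing:
        raise ValueError(f"{msg_type.name} requires {missing}")
    msg.update(fields)
    msg["type"] = int(msg_type)
    return msg
```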
716 The front-end isn't expected to take the initiative to send IOTLB update
717 messages, as the back-end sends IOTLB miss messages for the guest virtual
725 An optional communication channel is provided if the back-end declares
726 ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` protocol feature, to allow the
727 back-end to make requests to the front-end.
729 The fd is provided via ``VHOST_USER_SET_BACKEND_REQ_FD`` ancillary data.
731 A back-end may then send ``VHOST_USER_BACKEND_*`` messages to the front-end
744 easily achieve that by getting the inflight descriptors from
747 out-of-order because some entries which store the information of
750 this problem, the back-end needs to allocate an extra buffer to store this
754 between front-end and back-end. And the format of this buffer is described
761 N is the number of available virtqueues. The back-end could get it from num
776 /* Maintain a list for the last batch of used descriptors.
780 /* Used to preserve the order of fetching available descriptors.
786 /* The feature flags of this region. Now it's initialized to 0. */
789 /* The version of this region. It's 1 currently.
793 /* The size of DescStateSplit array. It's equal to the virtqueue size.
794 * The back-end could get it from queue size field of VhostUserInflight. */
797 /* The head of the list that tracks the last batch of used descriptors. */
800 /* Store the idx value of used ring */
803 /* Used to track the state of each descriptor in descriptor table */
807 To track inflight I/O, the queue region should be processed as follows:
809 When receiving available buffers from the driver:
811 #. Get the next available head-descriptor index from available ring, ``i``
813 #. Set ``desc[i].counter`` to the value of global counter
819 When supplying used buffers to the driver:
829 #. Increase the ``idx`` value of used ring by the size of the batch
831 #. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0
833 #. Set ``used_idx`` to the ``idx`` value of used ring
837 #. If the value of ``used_idx`` does not match the ``idx`` value of
838 used ring (means the inflight field of ``DescStateSplit`` entries in
841 a. Subtract the value of ``used_idx`` from the ``idx`` value of
844 #. Set the ``inflight`` field of each ``DescStateSplit`` entry to 0 in last batch
847 #. Set ``used_idx`` to the ``idx`` value of used ring
864 /* Link to the next free entry */
867 /* Link to the last entry of descriptor list.
871 /* The length of descriptor list.
875 /* Used to preserve the order of fetching available descriptors.
879 /* The buffer id */
882 /* The descriptor flags */
885 /* The buffer length */
888 /* The buffer address */
893 /* The feature flags of this region. Now it's initialized to 0. */
896 /* The version of this region. It's 1 currently.
900 /* The size of DescStatePacked array. It's equal to the virtqueue size.
901 * The back-end could get it from queue size field of VhostUserInflight. */
904 /* The head of free DescStatePacked entry list */
907 /* The old head of free DescStatePacked entry list */
910 /* The used index of descriptor ring */
913 /* The old used index of descriptor ring */
919 /* The old device ring wrap counter */
925 /* Used to track the state of each descriptor fetched from descriptor ring */
929 To track inflight I/O, the queue region should be processed as follows:
931 When receiving available buffers from the driver:
933 #. Get the next available descriptor entry from descriptor ring, ``d``
939 #. Set ``desc[old_free_head].counter`` to the value of global counter
958 When supplying used buffers to the driver:
967 4. Set ``free_head`` to the index of ``e``
971 #. Increase ``used_idx`` by the size of the batch and update
976 #. Set the ``inflight`` field of each head ``DescStatePacked`` entry
977 in the batch to 0
984 #. If ``used_idx`` does not match ``old_used_idx`` (means the
988 a. Get the next descriptor ring entry through ``old_used_idx``, ``d``
990 #. Use ``old_used_wrap_counter`` to calculate the available flags
992 #. If ``d.flags`` is not equal to the calculated flags value (means
993 back-end has submitted the buffer to guest driver before crash, so
994 it has to commit the in-progress update), set ``old_free_head``,
1002 #. Set the ``inflight`` field of each ``DescStatePacked`` entry in
1012 have the kick, call and error (if used) signals done via in-band
1014 done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS``
1017 Note that due to the fact that too many messages on the sockets can
1018 cause the sending application(s) to block, it is not advised to use
1022 the former is necessary for getting a message channel from the back-end
1023 to the front-end, while the latter needs to be used with the in-band
1025 blocking later and for proper processing (at least in the simulation
1026 use case.) As it has no other way of signalling this error, the back-end
1027 should close the connection as a response to a
1028 ``VHOST_USER_SET_PROTOCOL_FEATURES`` message that sets the in-band
1029 notifications feature flag without the other two.
1066 Get the features bitmask from the underlying vhost implementation.
1077 Enable features in the underlying vhost implementation using a
1088 Get the protocol feature bitmask from the underlying vhost
1105 Enable protocol features in the underlying vhost implementation.
1121 Issued when a new connection is established. It marks the sender
1122 as the front-end that owns the session. This can be used on the *back-end*
1144 Sets the memory map regions on the back-end so it can translate the
1145 vring addresses. In the ancillary data there is an array of file
1146 descriptors for each memory mapped region. The size and ordering of
1147 the fds matches the number and ordering of memory regions.
1150 ``SET_MEM_TABLE`` replies with the bases of the memory mapped
1151 regions to the front-end. The back-end must have mmap'd the regions but
1157 reply back to the list of mappings with an empty
1159 reception of this message may the guest start accessing the memory
1170 When the back-end has ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature,
1171 the log memory fd is provided in the ancillary data of
1172 ``VHOST_USER_SET_LOG_BASE`` message, the size and offset of shared
1173 memory area provided in the message.
1181 Sets the logging file descriptor, which is passed as ancillary data.
1189 Set the size of the queue.
1197 Sets the addresses of the different aspects of the vring.
1205 Sets the next index to use for descriptors in this vring:
1207 * For a split virtqueue, sets only the next descriptor index to
1208 process in the *Available Ring*. The device is supposed to read the
1209 next index in the *Used Ring* from the respective vring structure in
1215 Consequently, the payload type is specific to the type of virt queue
1225 Stops the vring and returns the current descriptor index or indices:
1227 * For a split virtqueue, returns only the 16-bit next descriptor
1228 index to process in the *Available Ring*. Note that this may
1229 differ from the available ring index in the vring structure in
1230 memory, which points to where the driver will put new available
1231 descriptors. For the *Used Ring*, the device only needs the next
1232 descriptor index at which to put new descriptors, which is the
1233 value in the vring structure in memory, so this value is not
1237 read from memory, so both indices (as maintained by the device) are
1240 Consequently, the payload type is specific to the type of virt queue
1248 The request payload's *num* field is currently reserved and must be
1257 Set the event file descriptor for adding buffers to the vring. It is
1258 passed in the ancillary data.
1260 Bits (0-7) of the payload contain the vring index. Bit 8 is the
1262 in the ancillary data. This signals that polling should be used
1263 instead of waiting for the kick. Note that if the protocol feature
1265 this message isn't necessary as the ring is also started on the
1267 set an event file descriptor (which will be preferred over the
1276 Set the event file descriptor to signal when buffers are used. It is
1277 passed in the ancillary data.
1279 Bits (0-7) of the payload contain the vring index. Bit 8 is the
1281 in the ancillary data. This signals that polling will be used
1282 instead of waiting for the call. Note that if the protocol features
1285 isn't necessary as the ``VHOST_USER_BACKEND_VRING_CALL`` message can be
1295 Set the event file descriptor to signal when an error occurs. It is
1296 passed in the ancillary data.
1298 Bits (0-7) of the payload contain the vring index. Bit 8 is the
1300 in the ancillary data. Note that if the protocol features
1303 isn't necessary as the ``VHOST_USER_BACKEND_VRING_ERR`` message can be
1305 (which will be preferred over the message).
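The payload layout shared by the kick, call and error messages (vring index in bits 0-7, a flag in bit 8) can be sketched as follows. The helper names are hypothetical, and the reading of bit 8 as "no file descriptor in the ancillary data" is an assumption based on the surrounding text.

```python
VRING_IDX_MASK = 0xFF  # bits 0-7: vring index
NOFD_FLAG = 1 << 8     # bit 8: assumed to mean "no fd in the ancillary data"


def encode_vring_fd_payload(vring_index, has_fd=True):
    """Build the payload for a SET_VRING_KICK/CALL/ERR style message."""
    if not 0 <= vring_index <= VRING_IDX_MASK:
        raise ValueError("vring index must fit in 8 bits")
    value = vring_index
    if not has_fd:
        value |= NOFD_FLAG
    return value


def decode_vring_fd_payload(value):
    """Return (vring_index, has_fd) from a received payload."""
    return value & VRING_IDX_MASK, not (value & NOFD_FLAG)
```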
1313 Query how many queues the back-end supports.
1325 Signal the back-end to enable or disable corresponding vring.
1336 Ask the vhost-user back-end to broadcast a fake RARP to notify the migration
1342 ``VHOST_USER_GET_PROTOCOL_FEATURES``. The first 6 bytes of the
1343 payload contain the mac address of the guest to allow the vhost user
1344 back-end to construct and broadcast the fake RARP.
1352 Set host MTU value exposed to the guest.
1360 If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
1361 respond with zero in case the specified MTU is valid, or non-zero
1370 Set the socket file descriptor for back-end initiated requests. It is passed
1371 in the ancillary data.
1377 ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
1388 The front-end sends such requests to update and invalidate entries in the
1389 device IOTLB. The back-end has to acknowledge the request by sending
1401 Set the endianness of a VQ for legacy devices. Little-endian is
1409 configuration (i.e. before the front-end starts the VQ).
1418 submitted by the vhost-user front-end to fetch the contents of the
1420 MUST match the front-end's request. The vhost-user back-end uses a zero-length
1421 payload to indicate an error to the vhost-user front-end. The vhost-user
1422 front-end may cache the contents to avoid repeated
1432 submitted by the vhost-user front-end when the Guest changes the virtio
1434 on the destination host. The vhost-user back-end must check the flags
1436 configuration space fields unless the live migration bit is set.
1444 Create a session for crypto operation. The back-end must return
1445 the session id, 0 or positive for success, negative for failure.
1470 When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the front-end
1472 the back-end must open a userfaultfd for later use. Note that at this
1473 stage the migration is still in precopy mode.
1480 The front-end advises the back-end that a transition to postcopy mode has
1481 happened. The back-end must ensure that shared memory is registered
1492 The front-end advises that postcopy migration has now completed. The back-end
1493 must disable the userfaultfd. The reply is an acknowledgement
1497 is sent at the end of the migration, after
1500 The value returned is an error indication; 0 is success.
1509 been successfully negotiated, this message is submitted by the front-end to
1510 get a shared buffer from the back-end. The shared buffer will be used to
1521 been successfully negotiated, this message is submitted by the front-end to
1522 send the shared inflight buffer back to the back-end so that the back-end
1531 Sets the GPU protocol socket file descriptor, which is passed as
1532 ancillary data. The GPU protocol is used to inform the front-end of
1541 Ask the vhost user back-end to disable all rings and reset all
1542 internal device state to the initial state, ready to be
1543 reinitialized. The back-end retains ownership of the device
1544 throughout the reset operation.
1546 Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
1547 feature is set by the back-end.
1555 When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
1557 submitted by the front-end to indicate that a buffer was added to
1558 the vring instead of signalling it using the vring's kick file
1559 descriptor or having the back-end rely on polling.
1561 The state.num field is currently reserved and must be set to 0.
1569 When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
1571 by the front-end to the back-end. The back-end should return the message with a
1572 u64 payload containing the maximum number of memory slots for
1573 QEMU to expose to the guest. The value returned by the back-end
1574 will be capped at the maximum number of ram slots which can be
1575 supported by the target platform.
1583 When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
1585 by the front-end to the back-end. The message payload contains a memory
1587 the back-end device must map in. When the
1589 been successfully negotiated, along with the
1591 update the memory tables of the back-end device.
1593 Exactly one file descriptor from which the memory is mapped is
1594 passed in the ancillary data.
1596 In postcopy mode (see ``VHOST_USER_POSTCOPY_LISTEN``), the back-end
1597 replies with the bases of the memory mapped region to the front-end.
1607 When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
1609 by the front-end to the back-end. The message payload contains a memory
1611 the back-end device must unmap. When the
1613 been successfully negotiated, along with the
1615 update the memory tables of the back-end device.
1617 The memory region to be removed is identified by its guest address,
1618 user address and size. The mmap offset is ignored.
1620 No file descriptors SHOULD be passed in the ancillary data. For
1621 compatibility with existing incorrect implementations, the back-end MAY
1623 passed, the back-end MUST close it without using it otherwise.
1631 When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
1632 successfully negotiated, this message is submitted by the front-end to
1633 notify the back-end of the updated device status as defined in the Virtio
1642 When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
1643 successfully negotiated, this message is submitted by the front-end to
1644 query the back-end for its device status as defined in the Virtio
1653 When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
1654 feature has been successfully negotiated, and the UUID is found
1655 in the exporters cache, this message is submitted by the front-end
1657 the requested UUID. The back-end will reply passing the fd when the operation
1666 Front-end and back-end negotiate a channel over which to transfer the
1668 back-end) may create the channel. The nature of this channel is not
1670 must create a file descriptor that is provided to the respective
1671 other side, allowing access to the channel. This FD must behave as
1674 * For the writing end, it must allow writing the whole back-end state
1675 sequentially. Closing the file descriptor signals the end of
1678 * For the reading end, it must allow reading the whole back-end state
1679 sequentially. The end of file signals the end of the transfer.
1681 For example, the channel may be a pipe, in which case the two ends of
1682 the pipe fulfill these requirements respectively.
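The pipe-like behaviour required of the transfer channel can be sketched with an ordinary pipe: the writing end sends everything and closes, and the reading end reads until end of file. The function names are hypothetical, and the sketch leaves out the vhost-user messages that negotiate the channel.

```python
import os


def write_state(fd, state):
    """Writing end: send all the data, then close to signal the end."""
    view = memoryview(state)
    while view:
        written = os.write(fd, view)  # may write fewer bytes than requested
        view = view[written:]
    os.close(fd)


def read_state(fd):
    """Reading end: read until end of file, then close."""
    chunks = []
    while True:
        chunk = os.read(fd, 65536)
        if not chunk:  # empty read means the writer closed its end
            break
        chunks.append(chunk)
    os.close(fd)
    return b"".join(chunks)
```

In a real setup the two ends live in different processes. When both run in a single process, as in a quick experiment, the state has to fit into the pipe buffer or the writer will block.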
1684 Initially, the front-end creates a channel along with such an FD. It
1685 passes the FD to the back-end as ancillary data of a
1686 ``VHOST_USER_SET_DEVICE_STATE_FD`` message. The back-end may create a
1687 different transfer channel, passing the respective FD back to the
1688 front-end as ancillary data of the reply. If so, the front-end must
1689 then discard its channel and use the one provided by the back-end.
1691 Whether the back-end uses its own channel is decided
1692 based on efficiency: If the channel is a pipe, both ends will most
1695 zero-copy, is considered more efficient and thus preferred. If the
1698 The request payload contains parameters for the subsequent data
1699 transfer, as described in the :ref:`Migrating back-end state
1702 The value returned indicates both success and whether a
1704 are 0 on success, and non-zero on error. Bit 8 is the invalid FD
1706 When this flag is not set, the front-end must use the returned file
1707 descriptor as its end of the transfer channel. The back-end must not
1710 Using this function requires prior negotiation of the
1719 After transferring the back-end's internal state during migration (see
1720 the :ref:`Migrating back-end state <migrating_backend_state>`
1721 section), check whether the back-end was able to fully
1722 process the state.
1724 The value returned indicates success or error; 0 is success, any
1727 Using this function requires prior negotiation of the
1733 For this type of message, the request is sent by the back-end and the reply
1734 is sent by the front-end.
1743 The back-end sends such requests to notify of an IOTLB miss, or an IOTLB
1745 negotiated, and the back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end
1758 back-end sends such messages to notify that the virtio device's
1761 message to the back-end to get the latest content. If
1762 ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the back-end sets the
1763 ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond with zero when
1772 Sets host notifier for a specified queue. The queue index is
1773 contained in the ``u64`` field of the vring area description. The
1774 host notifier is described by the file descriptor (typically it's a
1775 VFIO device fd) which is passed as ancillary data and the size
1776 (which is mmap size and should be the same as host page size) and
1777 offset (which is mmap offset) carried in the vring area
1778 description. QEMU can mmap the file descriptor based on the size and
1780 mapping this memory range to the VM as the specified queue's notify
1781 MMIO region. The back-end sends this request to tell QEMU to de-register
1782 the existing notifier if any and register the new notifier if the
1795 When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
1797 submitted by the back-end to indicate that a buffer was used from
1798 the vring instead of signalling this using the vring's call file
1799 descriptor or having the front-end relying on polling.
1801 The state.num field is currently reserved and must be set to 0.
1809 When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
1811 submitted by the back-end to indicate that an error occurred on the
1812 specific vring, instead of signalling the error file descriptor
1813 set by the front-end via ``VHOST_USER_SET_VRING_ERR``.
1815 The state.num field is currently reserved and must be set to 0.
1823 When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
1825 by back-ends to add themselves as exporters to the virtio shared lookup
1826 table. The back-end device gets associated with a UUID in the shared table.
1827 The back-end is responsible for keeping its own table with exported dma-buf fds.
1828 When another back-end tries to import the resource associated with the UUID,
1829 it will send a message to the front-end, which will act as a proxy to the
1831 the back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must
1841 When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
1843 by the back-end to remove itself from the virtio-dmabuf shared
1844 table API. Only the back-end owning the entry (i.e., the one that first added
1845 it) will have permission to remove it. Otherwise, the message is ignored.
1846 The shared table will remove the back-end device associated with
1847 the UUID. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the
1848 back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond
1857 When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
1859 by back-ends to retrieve a given dma-buf fd from the virtio-dmabuf
1860 shared table given a UUID. The front-end replies with the fd and a zero
1861 when the operation is successful, or non-zero otherwise. Note that if the
1862 operation fails, no fd is sent to the back-end.
1869 The original vhost-user specification only demands replies for certain
1870 commands. This differs from the vhost protocol implementation where
1871 commands are sent over an ``ioctl()`` call and block until the back-end
1874 With this protocol extension negotiated, the sender (QEMU) can set the
1875 ``need_reply`` [Bit 3] flag on any command. This indicates that the
1877 or failure. The payload should be set to zero on success or non-zero
1878 on failure, unless the message already has an explicit reply body.
1880 The reply payload gives QEMU a deterministic indication of the result
1881 of the command. Today, QEMU is expected to terminate the main vhost-user
1885 For the message types that already solicit a reply from the back-end,
1886 the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or need_reply bit
1887 being set brings no behavioural change. (See the Communication_
1896 need to be configured manually depending on the use case. However, it
1897 is a good idea to follow the conventions listed here when
1899 behaviour to avoid heterogeneous configuration and management of the
1903 JSON file that conforms to the vhost-user.json schema. Each file
1904 informs the management applications about the back-end type, and binary
1906 picking the highest priority back-end when multiple match the search
1907 criteria (see ``@VhostUserBackend`` documentation in the schema file).
1909 If the back-end is not capable of enabling a requested feature on the
1910 host (such as 3D acceleration with virgl), or the initialization
1911 failed, the back-end should fail to start early and exit with a status
1914 The back-end program must not daemonize itself, but it may be
1915 daemonized by the management layer. It may also have restricted
1916 access to the system.
1920 by the management layer, or to a log handler).
1922 The back-end program must end (as quickly and cleanly as possible) when
1923 the SIGTERM signal is received. Eventually, it may receive SIGKILL from
1924 the management layer after a few seconds.
1926 The following command line options have an expected behaviour. They
1931 This option specifies the location of the vhost-user Unix domain socket.
1936 When this argument is given, the back-end program is started with the
1942 Output the back-end capabilities to stdout in JSON format, and then
1944 the back-end program should not perform its normal function. The
1945 capabilities can be reported dynamically depending on the host
1948 The JSON output is described in the ``vhost-user.json`` schema, by
1968 Specify the Linux input device.
1974 Do not request exclusive access to the input device.
1985 Specify the GPU DRM render node.