.. include:: <isonum.txt>
.. SPDX-License-Identifier: GPL-2.0-or-later

================================
vfio-user Protocol Specification
================================

.. contents:: Table of Contents

Introduction
============
vfio-user is a protocol that allows a device to be emulated in a separate
process outside of a Virtual Machine Monitor (VMM). vfio-user devices consist
of a generic VFIO device type, living inside the VMM, which we call the client,
and the core device implementation, living outside the VMM, which we call the
server.

The vfio-user specification is partly based on the
`Linux VFIO ioctl interface <https://www.kernel.org/doc/html/latest/driver-api/vfio.html>`_.

VFIO is a mature and stable API, backed by an extensively used framework. The
existing VFIO client implementation in QEMU (``qemu/hw/vfio/``) can be largely
re-used, though there is nothing in this specification that requires that
particular implementation. None of the VFIO kernel modules are required for
supporting the protocol, on either the client or server side. Some source
definitions in VFIO are re-used for vfio-user.

The main idea is to allow a virtual device to function in a separate process in
the same host over a UNIX domain socket. A UNIX domain socket (``AF_UNIX``) is
chosen because file descriptors can be trivially sent over it, which in turn
allows:

* Sharing of client memory for DMA with the server.
* Sharing of server memory with the client for fast MMIO.
* Efficient sharing of eventfds for triggering interrupts.
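
As a minimal sketch of why ``AF_UNIX`` is convenient, the following Python
snippet passes a file descriptor between the two ends of a socket pair as
``SCM_RIGHTS`` ancillary data, using ``socket.send_fds()`` and
``socket.recv_fds()`` (Python 3.9 or later, UNIX only). The function name and
the use of a temporary file as the memory backing are assumptions made for
illustration; they are not part of the protocol.

.. code-block:: python

   import mmap
   import os
   import socket
   import tempfile


   def share_memory_over_socket(payload: bytes) -> bytes:
       """Send an fd over AF_UNIX and read the shared bytes through it."""
       client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
       with client, server, tempfile.TemporaryFile() as backing:
           # The "client" side prepares file-backed memory to share.
           backing.write(payload)
           backing.flush()
           # The fd travels as ancillary data next to a one-byte message.
           socket.send_fds(client, [b"\x00"], [backing.fileno()])
           _msg, fds, _flags, _addr = socket.recv_fds(server, 1, 1)
           try:
               # The "server" side maps the same memory via its own fd.
               with mmap.mmap(fds[0], len(payload), prot=mmap.PROT_READ) as view:
                   return bytes(view[:])
           finally:
               os.close(fds[0])

The receiving side ends up with its own descriptor for the same underlying
file, which is what makes shared DMA memory, mapped device regions and shared
eventfds possible without copying data through the socket.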

Other socket types could be used which allow the server to run in a separate
guest in the same host (``AF_VSOCK``) or remotely (``AF_INET``). Theoretically
the underlying transport does not necessarily have to be a socket; however, we
do not examine such alternatives. In this protocol version we focus on using a
UNIX domain socket and introduce basic support for the other two types of
sockets without considering performance implications.

While passing of file descriptors is desirable for performance reasons, support
is not necessary for either the client or the server in order to implement the
protocol. There is always an in-band, message-passing fallback mechanism.

Overview
========

VFIO is a framework that allows a physical device to be securely passed through
to a user space process; the device-specific kernel driver does not drive the
device at all.  Typically, the user space process is a VMM and the device is
passed through to it in order to achieve high performance. VFIO provides an API
and the required functionality in the kernel. QEMU has adopted VFIO to allow a
guest to directly access physical devices, instead of emulating them in
software.

vfio-user reuses the core VFIO concepts defined in its API, but implements them
as messages to be sent over a socket. It does not change the kernel-based VFIO
in any way; in fact, none of the VFIO kernel modules need to be loaded to use
vfio-user. It is also possible for the client to concurrently use the current
kernel-based VFIO for one device, and vfio-user for another device.

VFIO Device Model
-----------------

A device under VFIO presents a standard interface to the user process. Many of
the VFIO operations in the existing interface use the ``ioctl()`` system call, and
references to the existing interface are called the ``ioctl()`` implementation in
this document.

The following sections describe the set of messages that implement the vfio-user
interface over a socket. In many cases, the messages are analogous to data
structures used in the ``ioctl()`` implementation. Messages derived from the
``ioctl()`` will have a name derived from the ``ioctl()`` command name.  E.g., the
``VFIO_DEVICE_GET_INFO`` ``ioctl()`` command becomes a
``VFIO_USER_DEVICE_GET_INFO`` message.  The purpose of this reuse is to share as
much code as feasible with the ``ioctl()`` implementation.

Connection Initiation
^^^^^^^^^^^^^^^^^^^^^

After the client connects to the server, the initial client message is
``VFIO_USER_VERSION`` to propose a protocol version and set of capabilities to
apply to the session. The server replies with a compatible version and set of
capabilities it supports, or closes the connection if it cannot support the
advertised version.

Device Information
^^^^^^^^^^^^^^^^^^

The client uses a ``VFIO_USER_DEVICE_GET_INFO`` message to query the server for
information about the device. This information includes:

* the device type and whether it supports reset (``VFIO_DEVICE_FLAGS_``),
* the number of device regions, and
* the number of interrupt types the device supports.

Region Information
^^^^^^^^^^^^^^^^^^

The client uses ``VFIO_USER_DEVICE_GET_REGION_INFO`` messages to query the
server for information about the device's regions. This information describes:

* Read and write permissions, whether it can be memory mapped, and whether it
  supports additional capabilities (``VFIO_REGION_INFO_CAP_``).
* Region index, size, and offset.

When a device region can be mapped by the client, the server provides a file
descriptor which the client can ``mmap()``. The server is responsible for
polling for client updates to memory mapped regions.

Region Capabilities
"""""""""""""""""""

Some regions have additional capabilities that cannot be described adequately
by the region info data structure. These capabilities are returned in the
region info reply in a list similar to PCI capabilities in a PCI device's
configuration space.

Sparse Regions
""""""""""""""
A region can be memory-mappable in whole or in part. When only a subset of a
region can be mapped by the client, a ``VFIO_REGION_INFO_CAP_SPARSE_MMAP``
capability is included in the region info reply. This capability describes
which portions can be mapped by the client.

.. Note::
   For example, in a virtual NVMe controller, sparse regions can be used so
   that accesses to the NVMe registers (found in the beginning of BAR0) are
   trapped (an infrequent event), while allowing direct access to the doorbells
   (an extremely frequent event as every I/O submission requires a write to
   BAR0), found in the next page after the NVMe registers in BAR0.
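
The NVMe example can be sketched as follows. The snippet models the mappable
portions of a region as ``(offset, size)`` pairs and decides whether an access
can go through a mapping or has to be trapped. The function name, the 4 KiB
page size and the BAR0 layout are assumptions for illustration only.

.. code-block:: python

   from typing import List, Tuple

   Area = Tuple[int, int]  # (offset, size) of one mappable area, in bytes


   def is_directly_mappable(areas: List[Area], offset: int, length: int) -> bool:
       """Return True if [offset, offset + length) fits in one mappable area."""
       return any(start <= offset and offset + length <= start + size
                  for start, size in areas)


   # Hypothetical BAR0: registers in the first page are trapped, the doorbell
   # page that follows is the only mappable area.
   NVME_BAR0_AREAS = [(0x1000, 0x1000)]

With this layout, a 4-byte access at offset ``0x14`` is not mappable and is
forwarded as a message, while a doorbell write at ``0x1000`` lands in the
mapped area.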

Device-Specific Regions
"""""""""""""""""""""""

A device can define regions additional to the standard ones (e.g. PCI indexes
0-8). This is achieved by including a ``VFIO_REGION_INFO_CAP_TYPE`` capability
in the region info reply of a device-specific region. Such regions are reflected
in ``struct vfio_user_device_info.num_regions``. Thus, for PCI devices this
value can be equal to, or higher than, ``VFIO_PCI_NUM_REGIONS``.

Region I/O via file descriptors
-------------------------------

For unmapped regions, region I/O from the client is done via
``VFIO_USER_REGION_READ/WRITE``.  As an optimization, ioeventfds or ioregionfds
may be configured for sub-regions of some regions. A client may request
information on these sub-regions via ``VFIO_USER_DEVICE_GET_REGION_IO_FDS``; by
configuring the returned file descriptors as ioeventfds or ioregionfds, the
server can be directly notified of I/O (for example, by KVM) without taking a
trip through the client.

Interrupts
^^^^^^^^^^

The client uses ``VFIO_USER_DEVICE_GET_IRQ_INFO`` messages to query the server
for the device's interrupt types. The interrupt types are specific to the bus
the device is attached to, and the client is expected to know the capabilities
of each interrupt type. The server can signal an interrupt by directly injecting
interrupts into the guest via an event file descriptor. The client configures
how the server signals an interrupt with ``VFIO_USER_DEVICE_SET_IRQS`` messages.

Device Read and Write
^^^^^^^^^^^^^^^^^^^^^

When the guest executes load or store operations to an unmapped device region,
the client forwards these operations to the server with
``VFIO_USER_REGION_READ`` or ``VFIO_USER_REGION_WRITE`` messages. The server
will reply with data from the device on read operations or an acknowledgement on
write operations. See `Read and Write Operations`_.

Client memory access
--------------------

The client uses ``VFIO_USER_DMA_MAP`` and ``VFIO_USER_DMA_UNMAP`` messages to
inform the server of the valid DMA ranges that the server can access on behalf
of a device (typically, VM guest memory). DMA memory may be accessed by the
server via ``VFIO_USER_DMA_READ`` and ``VFIO_USER_DMA_WRITE`` messages over the
socket. In this case, the "DMA" part of the naming is a misnomer.

Actual direct memory access of client memory from the server is possible if the
client provides file descriptors the server can ``mmap()``. Note that ``mmap()``
privileges cannot be revoked by the client, therefore file descriptors should
only be exported in environments where the client trusts the server not to
corrupt guest memory.
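
The two access paths can be illustrated with a small server-side table of DMA
ranges. This is only a sketch: the class and method names are hypothetical, and
a real server would also have to handle overlapping ranges and accesses that
span more than one range.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Dict, Optional


   @dataclass
   class DmaRange:
       addr: int
       size: int
       fd: Optional[int] = None  # set only if the client passed a descriptor


   class DmaTable:
       """Ranges announced by VFIO_USER_DMA_MAP, keyed by start address."""

       def __init__(self) -> None:
           self._ranges: Dict[int, DmaRange] = {}

       def dma_map(self, addr: int, size: int, fd: Optional[int] = None) -> None:
           self._ranges[addr] = DmaRange(addr, size, fd)

       def dma_unmap(self, addr: int) -> None:
           self._ranges.pop(addr, None)

       def access_path(self, addr: int, length: int) -> str:
           """Pick "mmap" when an fd was shared, else message-based access."""
           for r in self._ranges.values():
               if r.addr <= addr and addr + length <= r.addr + r.size:
                   return "mmap" if r.fd is not None else "message"
           raise ValueError("access outside any mapped DMA range")

A range mapped without a descriptor can only be reached through
``VFIO_USER_DMA_READ`` and ``VFIO_USER_DMA_WRITE``; a range mapped with one can
be accessed directly.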

See `Read and Write Operations`_.

Client/server interactions
==========================

Socket
------

A server can serve:

1) one or more clients, and/or
2) one or more virtual devices, belonging to one or more clients.

The current protocol specification requires a dedicated socket per
client/server connection. It is a server-side implementation detail whether a
single server handles multiple virtual devices from the same or multiple
clients. The location of the socket is implementation-specific. Multiplexing
clients, devices, and servers over the same socket is not supported in this
version of the protocol.

Authentication
--------------

For ``AF_UNIX``, we rely on OS mandatory access controls on the socket files,
therefore it is up to the management layer to set up the socket as required.
Socket types that span guests or hosts will require a proper authentication
mechanism. Defining that mechanism is deferred to a future version of the
protocol.

Command Concurrency
-------------------

A client may pipeline multiple commands without waiting for previous command
replies.  The server will process commands in the order they are received.  A
consequence of this is that if a client issues a command with the *No_reply*
bit set, then subsequently issues a command without *No_reply*, the older
command will have been processed before the reply to the younger command is
sent by the server.  The client must be aware of the device's capability to
process concurrent commands if pipelining is used.  For example, pipelining
allows multiple client threads to concurrently access device regions; the
client must ensure these accesses obey device semantics.
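
The ordering guarantee can be sketched with a toy server loop. The function
below is hypothetical and only models message IDs and the *No_reply* bit; it
shows that a reply to a later command implies that every earlier command has
already been processed.

.. code-block:: python

   from typing import Iterable, List, Tuple


   def serve_in_order(
       commands: Iterable[Tuple[int, bool]],
   ) -> Tuple[List[int], List[int]]:
       """Process (message_id, no_reply) pairs strictly in arrival order."""
       processed: List[int] = []
       replied: List[int] = []
       for message_id, no_reply in commands:
           processed.append(message_id)    # the command takes effect here
           if not no_reply:
               replied.append(message_id)  # a reply is sent only on request
       return processed, replied

For the sequence ``[(1, True), (2, True), (3, False)]`` the only reply is for
message 3, and by the time it is sent messages 1 and 2 have been processed.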

An example is a frame buffer device, where the device may allow concurrent
access to different areas of video memory, but may have indeterminate behavior
if concurrent accesses are performed to command or status registers.

Note that unrelated messages sent from the server to the client can appear in
between a client to server request/reply and vice versa.

Implementers should be prepared for certain commands to exhibit potentially
unbounded latencies.  For example, ``VFIO_USER_DEVICE_RESET`` may take an
arbitrarily long time to complete; clients should take care not to block
unnecessarily.

Socket Disconnection Behavior
-----------------------------
The server and the client can disconnect from each other, either intentionally
or unexpectedly. Both the client and the server need to know how to handle such
events.

Server Disconnection
^^^^^^^^^^^^^^^^^^^^
A server disconnecting from the client may indicate that:

1) A virtual device has been restarted, either intentionally (e.g. because of a
   device update) or unintentionally (e.g. because of a crash).
2) A virtual device has been shut down with no intention to be restarted.

It is impossible for the client to know whether a failure is intermittent or
innocuous and should be retried. The client should therefore reset the VFIO
device when it detects that the socket has been disconnected. Error recovery
will be driven by the guest's device error handling behavior.

Client Disconnection
^^^^^^^^^^^^^^^^^^^^
The client disconnecting from the server primarily means that the client
has exited. Currently, this means that the guest has shut down, so the device
is no longer needed and the server can automatically exit. However, there
can be cases where a client disconnection should not result in a server exit:

1) A single server serving multiple clients.
2) A multi-process QEMU upgrading itself step by step, which is not yet
   implemented.

Therefore, in order for the protocol to be forward compatible, the server should
respond to a client disconnection as follows:

 - all client memory regions are unmapped and cleaned up (including closing any
   passed file descriptors)
 - all IRQ file descriptors passed from the old client are closed
 - the device state should otherwise be retained

The expectation is that when a client reconnects, it will re-establish IRQ and
client memory mappings.
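
The server-side steps listed above might be sketched as follows. The class and
its fields are hypothetical; a real server would also ``munmap()`` any client
memory it had mapped before closing the descriptors.

.. code-block:: python

   import os
   from dataclasses import dataclass, field
   from typing import Dict, List


   @dataclass
   class ServerSession:
       device_state: Dict[str, object] = field(default_factory=dict)
       dma_fds: List[int] = field(default_factory=list)  # client memory fds
       irq_fds: List[int] = field(default_factory=list)  # eventfds for IRQs

       def on_client_disconnect(self) -> None:
           """Drop everything tied to the old client, keep the device state."""
           for fd in self.dma_fds + self.irq_fds:
               os.close(fd)
           self.dma_fds.clear()
           self.irq_fds.clear()
           # device_state is deliberately left untouched so that a
           # reconnecting client finds the device as it was.

Keeping the device state separate from per-connection resources is what lets a
reconnecting client simply re-send its DMA and IRQ setup.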

If anything happens to the client (such as QEMU actually exiting), the control
stack will know about it and can clean up resources accordingly.

Security Considerations
-----------------------

Speaking generally, vfio-user clients should not trust servers, and vice versa.
Standard tools and mechanisms should be used on both sides to validate input and
protect against denial-of-service scenarios, buffer overflows, etc.

Request Retry and Response Timeout
----------------------------------
A failed command is a command that has been successfully sent and has been
responded to with an error code. Failure to send the command in the first place
(e.g. because the socket is disconnected) is a different type of error, examined
earlier in `Socket Disconnection Behavior`_.

.. Note::
   QEMU's VFIO retries certain operations if they fail. While this makes sense
   for real HW, we don't know for sure whether it makes sense for virtual
   devices.

Defining a retry and timeout scheme is deferred to a future version of the
protocol.

Message sizes
-------------

Some requests have an ``argsz`` field. In a request, it defines the maximum
expected reply payload size, which should be at least the size of the fixed
reply payload headers defined here. The *request* payload size is defined by the
usual ``msg_size`` field in the header, not the ``argsz`` field.

In a reply, the server sets the ``argsz`` field to the size needed for the full
payload. This may be less than the requested maximum size. It may also be
larger than the requested maximum size: in that case, the full payload is not
included in the reply, but the ``argsz`` field in the reply indicates the needed
size, allowing a client to allocate a larger buffer for holding the reply before
trying again.
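
A client-side retry loop for ``argsz`` could look like the sketch below. The
``send`` callable stands in for a real request/reply exchange and, like
``fake_server``, is purely hypothetical; truncating the payload to ``argsz``
bytes is an assumption of this example, not something the text above
specifies.

.. code-block:: python

   from typing import Callable, Tuple

   Send = Callable[[int], Tuple[int, bytes]]  # argsz -> (reply argsz, payload)


   def request_with_argsz(send: Send, initial_argsz: int,
                          max_attempts: int = 2) -> bytes:
       """Retry with a larger buffer when the reply reports a bigger argsz."""
       argsz = initial_argsz
       for _ in range(max_attempts):
           reply_argsz, payload = send(argsz)
           if reply_argsz <= argsz:
               return payload       # the full payload fit
           argsz = reply_argsz      # grow to the size the server asked for
       raise RuntimeError("reply still does not fit after retrying")


   def fake_server(argsz: int, full_payload: bytes = bytes(48)) -> Tuple[int, bytes]:
       """Hypothetical peer: reports the needed size, truncates if too small."""
       return len(full_payload), full_payload[:argsz]

Calling ``request_with_argsz(fake_server, 32)`` first receives a truncated
reply announcing 48 bytes, then retries with ``argsz`` set to 48 and returns
the complete payload.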

In addition, during negotiation (see `Version`_), the client and server may
each specify a ``max_data_xfer_size`` value; this defines the maximum data that
may be read or written via one of the ``VFIO_USER_DMA/REGION_READ/WRITE``
messages; see `Read and Write Operations`_.
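
One way a sender might respect ``max_data_xfer_size`` is to split a large
access into several messages. The helper below is a sketch under that
assumption; the specification only defines the limit, and the function name is
hypothetical.

.. code-block:: python

   from typing import List, Tuple


   def split_transfer(offset: int, count: int,
                      max_data_xfer_size: int) -> List[Tuple[int, int]]:
       """Split [offset, offset + count) into chunks no larger than the limit."""
       if max_data_xfer_size <= 0:
           raise ValueError("max_data_xfer_size must be positive")
       chunks = []
       while count > 0:
           n = min(count, max_data_xfer_size)
           chunks.append((offset, n))  # one read or write message per chunk
           offset += n
           count -= n
       return chunks

For example, a 10-byte access at offset ``0x1000`` with a limit of 4 bytes
becomes three messages of 4, 4 and 2 bytes.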

Protocol Specification
======================

To distinguish from the base VFIO symbols, all vfio-user symbols are prefixed
with ``vfio_user`` or ``VFIO_USER``. In this revision, all data is in the
endianness of the host system, although this may be relaxed in future
revisions in cases where the client and server run on different hosts
with different endianness.

Unless otherwise specified, all sizes should be presumed to be in bytes.

.. _Commands:

Commands
--------
The following table lists the vfio-user message command IDs, and whether the
message command is sent from the client or the server.

======================================  =========  =================
Name                                    Command    Request Direction
======================================  =========  =================
``VFIO_USER_VERSION``                   1          client -> server
``VFIO_USER_DMA_MAP``                   2          client -> server
``VFIO_USER_DMA_UNMAP``                 3          client -> server
``VFIO_USER_DEVICE_GET_INFO``           4          client -> server
``VFIO_USER_DEVICE_GET_REGION_INFO``    5          client -> server
``VFIO_USER_DEVICE_GET_REGION_IO_FDS``  6          client -> server
``VFIO_USER_DEVICE_GET_IRQ_INFO``       7          client -> server
``VFIO_USER_DEVICE_SET_IRQS``           8          client -> server
``VFIO_USER_REGION_READ``               9          client -> server
``VFIO_USER_REGION_WRITE``              10         client -> server
``VFIO_USER_DMA_READ``                  11         server -> client
``VFIO_USER_DMA_WRITE``                 12         server -> client
``VFIO_USER_DEVICE_RESET``              13         client -> server
``VFIO_USER_REGION_WRITE_MULTI``        15         client -> server
======================================  =========  =================
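
For reference, the table can be transcribed into an enumeration. The Python
names below drop the ``VFIO_USER_`` prefix and are an illustrative choice; the
numeric values are taken from the table, which lists no command 14.

.. code-block:: python

   import enum


   class VfioUserCommand(enum.IntEnum):
       VERSION = 1
       DMA_MAP = 2
       DMA_UNMAP = 3
       DEVICE_GET_INFO = 4
       DEVICE_GET_REGION_INFO = 5
       DEVICE_GET_REGION_IO_FDS = 6
       DEVICE_GET_IRQ_INFO = 7
       DEVICE_SET_IRQS = 8
       REGION_READ = 9
       REGION_WRITE = 10
       DMA_READ = 11
       DMA_WRITE = 12
       DEVICE_RESET = 13
       REGION_WRITE_MULTI = 15


   # Only the DMA read/write requests originate at the server.
   SERVER_INITIATED = frozenset({VfioUserCommand.DMA_READ,
                                 VfioUserCommand.DMA_WRITE})


   def request_direction(command: VfioUserCommand) -> str:
       if command in SERVER_INITIATED:
           return "server -> client"
       return "client -> server"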

Header
------

All messages, both command messages and reply messages, are preceded by a
16-byte header that contains basic information about the message. The header is
followed by message-specific data described in the sections below.

+----------------+--------+-------------+
| Name           | Offset | Size        |
+================+========+=============+
| Message ID     | 0      | 2           |
+----------------+--------+-------------+
| Command        | 2      | 2           |
+----------------+--------+-------------+
| Message size   | 4      | 4           |
+----------------+--------+-------------+
| Flags          | 8      | 4           |
+----------------+--------+-------------+
|                | +-----+------------+ |
|                | | Bit | Definition | |
|                | +=====+============+ |
|                | | 0-3 | Type       | |
|                | +-----+------------+ |
|                | | 4   | No_reply   | |
|                | +-----+------------+ |
|                | | 5   | Error      | |
|                | +-----+------------+ |
+----------------+--------+-------------+
| Error          | 12     | 4           |
+----------------+--------+-------------+
| <message data> | 16     | variable    |
+----------------+--------+-------------+

* *Message ID* identifies the message, and is echoed in the command's reply
  message. Message IDs belong entirely to the sender, can be re-used (even
  concurrently) and the receiver must not make any assumptions about their
  uniqueness.
* *Command* specifies the command to be executed, listed in Commands_. It is
  also set in the reply header.
* *Message size* contains the size of the entire message, including the header.
* *Flags* contains attributes of the message:

  * The *Type* bits indicate the message type.

    *  *Command* (value 0x0) indicates a command message.
    *  *Reply* (value 0x1) indicates a reply message acknowledging a previous
       command with the same message ID.
  * *No_reply* in a command message indicates that no reply is needed for this
    command.  This is commonly used when multiple commands are sent, and only
    the last needs acknowledgement.
  * *Error* in a reply message indicates the command being acknowledged had
    an error. In this case, the *Error* field will be valid.

* *Error* in a reply message is an optional UNIX errno value. It may be zero
  even if the Error bit is set in Flags. It is reserved in a command message.
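
The header layout can be expressed with Python's ``struct`` module. This is a
sketch rather than a reference implementation: the helper names are
hypothetical, host byte order is used because this revision of the protocol
specifies it, and bit 0 of *Flags* is assumed to be the least significant bit.

.. code-block:: python

   import struct
   from typing import NamedTuple

   # "=" selects host byte order without padding: 2 + 2 + 4 + 4 + 4 = 16 bytes.
   HEADER = struct.Struct("=HHIII")

   TYPE_MASK = 0xF          # flag bits 0-3
   TYPE_COMMAND = 0x0
   TYPE_REPLY = 0x1
   FLAG_NO_REPLY = 1 << 4   # flag bit 4
   FLAG_ERROR = 1 << 5      # flag bit 5


   class Header(NamedTuple):
       message_id: int
       command: int
       message_size: int
       flags: int
       error: int

       @property
       def is_reply(self) -> bool:
           return (self.flags & TYPE_MASK) == TYPE_REPLY

       @property
       def no_reply(self) -> bool:
           return bool(self.flags & FLAG_NO_REPLY)

       @property
       def has_error(self) -> bool:
           return bool(self.flags & FLAG_ERROR)


   def pack_message(message_id: int, command: int, payload: bytes = b"",
                    flags: int = TYPE_COMMAND, error: int = 0) -> bytes:
       """Prefix the payload with a header; the size covers header + payload."""
       size = HEADER.size + len(payload)
       return HEADER.pack(message_id, command, size, flags, error) + payload


   def unpack_header(data: bytes) -> Header:
       """Decode the first 16 bytes of a message."""
       return Header(*HEADER.unpack_from(data))

A receiver would typically read ``HEADER.size`` bytes first, decode them, and
then read the remaining ``message_size - HEADER.size`` bytes of payload.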

Each command message in Commands_ must be replied to with a reply message,
unless the message sets the *No_reply* bit.  The reply consists of the header
with the *Reply* bit set, plus any additional data.

If an error occurs, the reply message must only include the reply header.

As the header is standard in both requests and replies, it is not included in
the command-specific specifications below; each message definition should be
appended to the standard header, and the offsets are given from the end of the
standard header.
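
A header-only error reply, as required when an error occurs, might be built as
in the sketch below. The function name is hypothetical, and the snippet
assumes host byte order and that bit 0 of *Flags* is the least significant
bit.

.. code-block:: python

   import errno
   import struct

   HEADER = struct.Struct("=HHIII")  # message ID, command, size, flags, error
   TYPE_REPLY = 0x1
   FLAG_ERROR = 1 << 5


   def make_error_reply(request: bytes, err: int) -> bytes:
       """Echo the request's message ID and command; carry no payload."""
       message_id, command, _size, _flags, _err = HEADER.unpack_from(request)
       return HEADER.pack(message_id, command, HEADER.size,
                          TYPE_REPLY | FLAG_ERROR, err)

For a request with message ID 3 and command 9, ``make_error_reply(request,
errno.EINVAL)`` yields exactly 16 bytes with the *Reply* type and the *Error*
bit set.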
435*da198e8fSThanos Makatos
436*da198e8fSThanos Makatos``VFIO_USER_VERSION``
437*da198e8fSThanos Makatos---------------------
438*da198e8fSThanos Makatos
439*da198e8fSThanos Makatos.. _Version:
440*da198e8fSThanos Makatos
441*da198e8fSThanos MakatosThis is the initial message sent by the client after the socket connection is
442*da198e8fSThanos Makatosestablished; the same format is used for the server's reply.
443*da198e8fSThanos Makatos
444*da198e8fSThanos MakatosUpon establishing a connection, the client must send a ``VFIO_USER_VERSION``
445*da198e8fSThanos Makatosmessage proposing a protocol version and a set of capabilities. The server
446*da198e8fSThanos Makatoscompares these with the versions and capabilities it supports and sends a
447*da198e8fSThanos Makatos``VFIO_USER_VERSION`` reply according to the following rules.
448*da198e8fSThanos Makatos
449*da198e8fSThanos Makatos* The major version in the reply must be the same as proposed. If the client
450*da198e8fSThanos Makatos  does not support the proposed major, it closes the connection.
451*da198e8fSThanos Makatos* The minor version in the reply must be equal to or less than the minor
452*da198e8fSThanos Makatos  version proposed.
453*da198e8fSThanos Makatos* The capability list must be a subset of those proposed. If the server
454*da198e8fSThanos Makatos  requires a capability the client did not include, it closes the connection.
455*da198e8fSThanos Makatos
456*da198e8fSThanos MakatosThe protocol major version will only change when incompatible protocol changes
457*da198e8fSThanos Makatosare made, such as changing the message format. The minor version may change
458*da198e8fSThanos Makatoswhen compatible changes are made, such as adding new messages or capabilities,
459*da198e8fSThanos MakatosBoth the client and server must support all minor versions less than the
460*da198e8fSThanos Makatosmaximum minor version it supports. E.g., an implementation that supports
461*da198e8fSThanos Makatosversion 1.3 must also support 1.0 through 1.2.
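
The negotiation rules above can be sketched from the server's point of view as
follows. This is only an illustration: the function names and the use of
``None`` to mean "close the connection" are assumptions made for the example,
not part of the protocol.

```python
from typing import Optional, Set, Tuple


def negotiate_version(server_major: int, server_max_minor: int,
                      client_major: int, client_minor: int
                      ) -> Optional[Tuple[int, int]]:
    """Return the (major, minor) to put in the reply, or None to disconnect."""
    if client_major != server_major:
        # The reply must carry the proposed major, so a mismatch cannot
        # be answered and the connection is closed instead.
        return None
    # The reply minor must not exceed the proposed minor. Both sides support
    # every minor below their own maximum, so the smaller maximum works.
    return client_major, min(client_minor, server_max_minor)


def negotiate_capabilities(proposed: Set[str], supported: Set[str],
                           required: Set[str]) -> Optional[Set[str]]:
    """Return the capability names for the reply, or None to disconnect."""
    if not required <= proposed:
        # The server needs something the client did not offer.
        return None
    # The reply must be a subset of what the client proposed.
    return proposed & supported
```

For instance, a server that supports up to 1.3 and receives a proposal for 1.5
would answer with 1.3 under this sketch, while a proposal with a different
major version would lead it to close the connection.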

When making a change to this specification, the protocol version number must
be included in the form "added in version X.Y".

Request
^^^^^^^

==============  ======  ====
Name            Offset  Size
==============  ======  ====
version major   0       2
version minor   2       2
version data    4       variable (including terminating NUL). Optional.
==============  ======  ====

The version data is an optional UTF-8 encoded JSON byte array with the following
format:

+--------------+--------+-----------------------------------+
| Name         | Type   | Description                       |
+==============+========+===================================+
| capabilities | object | Contains common capabilities that |
|              |        | the sender supports. Optional.    |
+--------------+--------+-----------------------------------+

Capabilities:

+--------------------+---------+------------------------------------------------+
| Name               | Type    | Description                                    |
+====================+=========+================================================+
| max_msg_fds        | number  | Maximum number of file descriptors that can be |
|                    |         | received by the sender in one message.         |
|                    |         | Optional. If not specified then the receiver   |
|                    |         | must assume a value of ``1``.                  |
+--------------------+---------+------------------------------------------------+
| max_data_xfer_size | number  | Maximum ``count`` for data transfer messages;  |
|                    |         | see `Read and Write Operations`_. Optional,    |
|                    |         | with a default value of 1048576 bytes.         |
+--------------------+---------+------------------------------------------------+
| pgsizes            | number  | Page sizes supported in DMA map operations     |
|                    |         | or'ed together. Optional, with a default value |
|                    |         | of supporting only 4k pages.                   |
+--------------------+---------+------------------------------------------------+
| max_dma_maps       | number  | Maximum number of DMA map windows that can be  |
|                    |         | valid simultaneously. Optional, with a default |
|                    |         | value of 65535 (64k-1).                        |
+--------------------+---------+------------------------------------------------+
| migration          | object  | Migration capability parameters. If missing    |
|                    |         | then migration is not supported by the sender. |
+--------------------+---------+------------------------------------------------+
| write_multiple     | boolean | ``VFIO_USER_REGION_WRITE_MULTI`` messages      |
|                    |         | are supported if the value is ``true``.        |
+--------------------+---------+------------------------------------------------+

The migration capability contains the following name/value pairs:

+-----------------+--------+--------------------------------------------------+
| Name            | Type   | Description                                      |
+=================+========+==================================================+
| pgsize          | number | Page size of dirty pages bitmap. The smallest    |
|                 |        | between the client and the server is used.       |
+-----------------+--------+--------------------------------------------------+
| max_bitmap_size | number | Maximum bitmap size in ``VFIO_USER_DIRTY_PAGES`` |
|                 |        | and ``VFIO_USER_DMA_UNMAP`` messages. Optional,  |
|                 |        | with a default value of 256MB.                   |
+-----------------+--------+--------------------------------------------------+
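
As a rough illustration of the payload layout and the capability defaults
listed above, the sketch below encodes and decodes a ``VFIO_USER_VERSION``
payload (without the standard header). The helper names are hypothetical, the
little-endian byte order is an assumption made for the example, and the
``pgsizes`` default of ``4096`` is one reading of "only 4k pages".

```python
import json
import struct

# Defaults taken from the capability table; the pgsizes value is an
# interpretation of "supporting only 4k pages" as a bit mask.
CAPABILITY_DEFAULTS = {
    "max_msg_fds": 1,
    "max_data_xfer_size": 1048576,
    "pgsizes": 4096,
    "max_dma_maps": 65535,
}


def encode_version(major, minor, capabilities=None):
    """Build the payload: two 16-bit versions, then optional JSON plus NUL."""
    payload = struct.pack("<HH", major, minor)  # byte order is an assumption
    if capabilities is not None:
        text = json.dumps({"capabilities": capabilities})
        payload += text.encode("utf-8") + b"\x00"
    return payload


def decode_version(payload):
    """Return (major, minor, capabilities) with defaults filled in."""
    major, minor = struct.unpack_from("<HH", payload, 0)
    capabilities = dict(CAPABILITY_DEFAULTS)
    data = payload[4:].rstrip(b"\x00")  # version data is optional
    if data:
        parsed = json.loads(data.decode("utf-8"))
        capabilities.update(parsed.get("capabilities", {}))
    return major, minor, capabilities
```

A peer that sends no version data at all is treated here as if it had sent
every default, which matches the "Optional" wording in the tables.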

Reply
^^^^^

The same message format is used in the server's reply with the semantics
described above.

``VFIO_USER_DMA_MAP``
---------------------

This command message is sent by the client to the server to inform it of the
memory regions the server can access. It must be sent before the server can
perform any DMA to the client. It is normally sent directly after the version
handshake is completed, but may also occur when memory is added to the client,
or if the client uses a vIOMMU.

Request
^^^^^^^

The request payload for this message is a structure of the following format:

+-------------+--------+-------------+
| Name        | Offset | Size        |
+=============+========+=============+
| argsz       | 0      | 4           |
+-------------+--------+-------------+
| flags       | 4      | 4           |
+-------------+--------+-------------+
|             | +-----+------------+ |
|             | | Bit | Definition | |
|             | +=====+============+ |
|             | | 0   | readable   | |
|             | +-----+------------+ |
|             | | 1   | writeable  | |
|             | +-----+------------+ |
+-------------+--------+-------------+
| offset      | 8      | 8           |
+-------------+--------+-------------+
| address     | 16     | 8           |
+-------------+--------+-------------+
| size        | 24     | 8           |
+-------------+--------+-------------+

* *argsz* is the size of the above structure. Note there is no reply payload,
  so this field differs from other message types.
* *flags* contains the following region attributes:

  * *readable* indicates that the region can be read from.

  * *writeable* indicates that the region can be written to.

* *offset* is the file offset of the region with respect to the associated file
  descriptor, or zero if the region is not mappable.
* *address* is the base DMA address of the region.
* *size* is the size of the region.

This structure is 32 bytes in size, so the message size is 16 + 32 bytes.
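
The size arithmetic above can be checked with a small packing sketch. The
constant and function names are made up for the example, and little-endian
byte order is assumed.

```python
import struct

DMA_MAP_FORMAT = "<IIQQQ"  # argsz, flags, offset, address, size
DMA_MAP_READABLE = 1 << 0  # flags bit 0
DMA_MAP_WRITEABLE = 1 << 1  # flags bit 1
HEADER_SIZE = 16  # size of the standard header, per the text above


def pack_dma_map(flags, offset, address, size):
    """Pack the request payload; argsz is the size of the structure itself."""
    argsz = struct.calcsize(DMA_MAP_FORMAT)
    return struct.pack(DMA_MAP_FORMAT, argsz, flags, offset, address, size)
```

Packing a readable and writeable 1 GiB region at DMA address 0 gives a 32-byte
payload, so the full message is 48 bytes once the header is prepended.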

If the DMA region being added can be directly mapped by the server, a file
descriptor must be sent as part of the message meta-data. The region can be
mapped via the mmap() system call. On ``AF_UNIX`` sockets, the file descriptor
must be passed as ``SCM_RIGHTS`` type ancillary data.  Otherwise, if the DMA
region cannot be directly mapped by the server, a file descriptor must not be
sent as part of the message meta-data, and the server can access the DMA region
using ``VFIO_USER_DMA_READ`` and ``VFIO_USER_DMA_WRITE`` messages,
explained in `Read and Write Operations`_. A command to map over an existing
region must be failed by the server with ``EEXIST`` set in the error field of
the reply.
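
Passing a descriptor as ``SCM_RIGHTS`` ancillary data can be sketched with
Python's ``socket.sendmsg()`` and ``socket.recvmsg()``. The helper names are
hypothetical, and the sketch handles at most one descriptor per message.

```python
import array
import socket


def send_with_fd(sock, data, fd=None):
    """Send data, attaching fd as SCM_RIGHTS ancillary data if given."""
    ancillary = []
    if fd is not None:
        fds = array.array("i", [fd])
        ancillary.append((socket.SOL_SOCKET, socket.SCM_RIGHTS, fds))
    sock.sendmsg([data], ancillary)


def recv_with_fd(sock, bufsize):
    """Receive data and at most one descriptor; fd is None if none was sent."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    return data, (fds[0] if fds else None)
```

A receiver that gets a descriptor could then ``mmap()`` it at the *offset*
given in the request; a receiver that gets none would fall back to the
``VFIO_USER_DMA_READ`` and ``VFIO_USER_DMA_WRITE`` messages.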

Reply
^^^^^

There is no payload in the reply message.

``VFIO_USER_DMA_UNMAP``
-----------------------

This command message is sent by the client to the server to inform it that a
DMA region, previously made available via a ``VFIO_USER_DMA_MAP`` command
message, is no longer available for DMA. It typically occurs when memory is
removed from the client or if the client uses a vIOMMU.

Request
^^^^^^^

The request payload for this message is a structure of the following format:

+--------------+--------+------------------------+
| Name         | Offset | Size                   |
+==============+========+========================+
| argsz        | 0      | 4                      |
+--------------+--------+------------------------+
| flags        | 4      | 4                      |
+--------------+--------+------------------------+
| address      | 8      | 8                      |
+--------------+--------+------------------------+
| size         | 16     | 8                      |
+--------------+--------+------------------------+

* *argsz* is the maximum size of the reply payload.
* *flags* is unused in this version.
* *address* is the base DMA address of the DMA region.
* *size* is the size of the DMA region.

The address and size of the DMA region being unmapped must match exactly a
previous mapping.
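
One way a server might track mappings so that it can enforce both the
``EEXIST`` rule for ``VFIO_USER_DMA_MAP`` and the exact-match rule above is
sketched below. The class is hypothetical. Treating any overlap as "mapping
over an existing region" is an interpretation, and ``EINVAL`` for a
non-matching unmap is an assumption, since the text here does not name an
errno for that case.

```python
import errno


class DmaTable:
    """Tracks mapped DMA regions as address -> size."""

    def __init__(self):
        self._regions = {}

    def map(self, address, size):
        """Return 0 on success or EEXIST if the range overlaps a mapping."""
        end = address + size
        for base, length in self._regions.items():
            if address < base + length and base < end:
                return errno.EEXIST
        self._regions[address] = size
        return 0

    def unmap(self, address, size):
        """Return 0 only if (address, size) exactly matches a mapping."""
        if self._regions.get(address) != size:
            return errno.EINVAL  # assumed errno, not taken from the text
        del self._regions[address]
        return 0
```

The value returned by either method would go into the *Error* field of the
reply header.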

Reply
^^^^^

Upon receiving a ``VFIO_USER_DMA_UNMAP`` command, if the file descriptor is
mapped then the server must release all references to that DMA region before
replying, which potentially includes in-flight DMA transactions.

The server responds with the original DMA entry in the request.

``VFIO_USER_DEVICE_GET_INFO``
-----------------------------

This command message is sent by the client to the server to query for basic
information about the device.

Request
^^^^^^^

+-------------+--------+--------------------------+
| Name        | Offset | Size                     |
+=============+========+==========================+
| argsz       | 0      | 4                        |
+-------------+--------+--------------------------+
| flags       | 4      | 4                        |
+-------------+--------+--------------------------+
|             | +-----+-------------------------+ |
|             | | Bit | Definition              | |
|             | +=====+=========================+ |
|             | | 0   | VFIO_DEVICE_FLAGS_RESET | |
|             | +-----+-------------------------+ |
|             | | 1   | VFIO_DEVICE_FLAGS_PCI   | |
|             | +-----+-------------------------+ |
+-------------+--------+--------------------------+
| num_regions | 8      | 4                        |
+-------------+--------+--------------------------+
| num_irqs    | 12     | 4                        |
+-------------+--------+--------------------------+

* *argsz* is the maximum size of the reply payload.
* All other fields must be zero.

Reply
^^^^^

+-------------+--------+--------------------------+
| Name        | Offset | Size                     |
+=============+========+==========================+
| argsz       | 0      | 4                        |
+-------------+--------+--------------------------+
| flags       | 4      | 4                        |
+-------------+--------+--------------------------+
|             | +-----+-------------------------+ |
|             | | Bit | Definition              | |
|             | +=====+=========================+ |
|             | | 0   | VFIO_DEVICE_FLAGS_RESET | |
|             | +-----+-------------------------+ |
|             | | 1   | VFIO_DEVICE_FLAGS_PCI   | |
|             | +-----+-------------------------+ |
+-------------+--------+--------------------------+
| num_regions | 8      | 4                        |
+-------------+--------+--------------------------+
| num_irqs    | 12     | 4                        |
+-------------+--------+--------------------------+

* *argsz* is the size required for the full reply payload (16 bytes today).
* *flags* contains the following device attributes.

  * ``VFIO_DEVICE_FLAGS_RESET`` indicates that the device supports the
    ``VFIO_USER_DEVICE_RESET`` message.
  * ``VFIO_DEVICE_FLAGS_PCI`` indicates that the device is a PCI device.

* *num_regions* is the number of memory regions that the device exposes.
* *num_irqs* is the number of distinct interrupt types that the device supports.

This version of the protocol only supports PCI devices. Additional devices may
be supported in future versions.
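
The request and reply layouts above could be handled on the client side
roughly as follows. The function names are hypothetical and little-endian
byte order is assumed; the flag bit positions come from the tables above.

```python
import struct

DEVICE_INFO_FORMAT = "<IIII"  # argsz, flags, num_regions, num_irqs
VFIO_DEVICE_FLAGS_RESET = 1 << 0
VFIO_DEVICE_FLAGS_PCI = 1 << 1


def pack_device_info_request():
    """Only argsz is set; all other fields must be zero."""
    argsz = struct.calcsize(DEVICE_INFO_FORMAT)  # 16 bytes
    return struct.pack(DEVICE_INFO_FORMAT, argsz, 0, 0, 0)


def parse_device_info_reply(payload):
    """Decode the reply payload into a dictionary."""
    argsz, flags, num_regions, num_irqs = struct.unpack_from(
        DEVICE_INFO_FORMAT, payload)
    return {
        "argsz": argsz,
        "supports_reset": bool(flags & VFIO_DEVICE_FLAGS_RESET),
        "is_pci": bool(flags & VFIO_DEVICE_FLAGS_PCI),
        "num_regions": num_regions,
        "num_irqs": num_irqs,
    }
```

Since this version of the protocol only supports PCI devices, a client might
reasonably reject a reply in which ``is_pci`` is false.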

``VFIO_USER_DEVICE_GET_REGION_INFO``
------------------------------------

This command message is sent by the client to the server to query for
information about device regions. The VFIO region info structure is defined in
``<linux/vfio.h>`` (``struct vfio_region_info``).

Request
^^^^^^^

+------------+--------+------------------------------+
| Name       | Offset | Size                         |
+============+========+==============================+
| argsz      | 0      | 4                            |
+------------+--------+------------------------------+
| flags      | 4      | 4                            |
+------------+--------+------------------------------+
| index      | 8      | 4                            |
+------------+--------+------------------------------+
| cap_offset | 12     | 4                            |
+------------+--------+------------------------------+
| size       | 16     | 8                            |
+------------+--------+------------------------------+
| offset     | 24     | 8                            |
+------------+--------+------------------------------+

* *argsz* is the maximum size of the reply payload.
* *index* is the index of the memory region being queried; it is the only field
  that is required to be set in the command message.
* All other fields must be zero.

Reply
^^^^^

+------------+--------+------------------------------+
| Name       | Offset | Size                         |
+============+========+==============================+
| argsz      | 0      | 4                            |
+------------+--------+------------------------------+
| flags      | 4      | 4                            |
+------------+--------+------------------------------+
|            | +-----+-----------------------------+ |
|            | | Bit | Definition                  | |
|            | +=====+=============================+ |
|            | | 0   | VFIO_REGION_INFO_FLAG_READ  | |
|            | +-----+-----------------------------+ |
|            | | 1   | VFIO_REGION_INFO_FLAG_WRITE | |
|            | +-----+-----------------------------+ |
|            | | 2   | VFIO_REGION_INFO_FLAG_MMAP  | |
|            | +-----+-----------------------------+ |
|            | | 3   | VFIO_REGION_INFO_FLAG_CAPS  | |
|            | +-----+-----------------------------+ |
+------------+--------+------------------------------+
| index      | 8      | 4                            |
+------------+--------+------------------------------+
| cap_offset | 12     | 4                            |
+------------+--------+------------------------------+
| size       | 16     | 8                            |
+------------+--------+------------------------------+
| offset     | 24     | 8                            |
+------------+--------+------------------------------+

* *argsz* is the size required for the full reply payload (region info structure
  plus the size of any region capabilities).
* *flags* are attributes of the region:

  * ``VFIO_REGION_INFO_FLAG_READ`` allows client read access to the region.
  * ``VFIO_REGION_INFO_FLAG_WRITE`` allows client write access to the region.
  * ``VFIO_REGION_INFO_FLAG_MMAP`` specifies the client can mmap() the region.
    When this flag is set, the reply will include a file descriptor in its
    meta-data. On ``AF_UNIX`` sockets, the file descriptors will be passed as
    ``SCM_RIGHTS`` type ancillary data.
  * ``VFIO_REGION_INFO_FLAG_CAPS`` indicates additional capabilities found in the
    reply.

* *index* is the index of the memory region being queried; it is the only field
  that is required to be set in the command message.
* *cap_offset* describes where additional region capabilities can be found.
  cap_offset is relative to the beginning of the VFIO region info structure.
  The data structure it points to is a VFIO cap header defined in
  ``<linux/vfio.h>``.
* *size* is the size of the region.
* *offset* is the offset that should be given to the mmap() system call for
  regions with the MMAP attribute. It is also used as the base offset when
  mapping a VFIO sparse mmap area, described below.
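
A client-side sketch of this exchange is shown below. The helper names are
hypothetical and little-endian byte order is assumed. Treating a reply
*argsz* larger than the request *argsz* as a hint to retry with a bigger
buffer follows the usual VFIO ioctl convention and is an assumption here,
not something this section states.

```python
import struct

REGION_INFO_FORMAT = "<IIIIQQ"  # argsz, flags, index, cap_offset, size, offset
REGION_INFO_SIZE = struct.calcsize(REGION_INFO_FORMAT)  # 32 bytes

VFIO_REGION_INFO_FLAG_READ = 1 << 0
VFIO_REGION_INFO_FLAG_WRITE = 1 << 1
VFIO_REGION_INFO_FLAG_MMAP = 1 << 2
VFIO_REGION_INFO_FLAG_CAPS = 1 << 3


def pack_region_info_request(index, argsz=REGION_INFO_SIZE):
    """Only argsz and index are set; every other field is zero."""
    return struct.pack(REGION_INFO_FORMAT, argsz, 0, index, 0, 0, 0)


def parse_region_info_reply(payload, request_argsz):
    """Decode the fixed part of the reply payload."""
    argsz, flags, index, cap_offset, size, offset = struct.unpack_from(
        REGION_INFO_FORMAT, payload)
    has_caps = bool(flags & VFIO_REGION_INFO_FLAG_CAPS)
    return {
        "index": index,
        "size": size,
        "offset": offset,
        "readable": bool(flags & VFIO_REGION_INFO_FLAG_READ),
        "writable": bool(flags & VFIO_REGION_INFO_FLAG_WRITE),
        "mmap": bool(flags & VFIO_REGION_INFO_FLAG_MMAP),
        "cap_offset": cap_offset if has_caps else 0,
        # Assumed convention: the full reply did not fit in request_argsz.
        "needs_retry": argsz > request_argsz,
    }
```

When ``mmap`` is true, the client would also expect a file descriptor in the
reply's ancillary data and could pass ``offset`` to the mmap() call.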

VFIO region capabilities
""""""""""""""""""""""""

The VFIO region information can also include a capabilities list. This list is
similar to a PCI capability list - each entry has a common header that
identifies a capability and where the next capability in the list can be found.
The VFIO capability header format is defined in ``<linux/vfio.h>`` (``struct
vfio_info_cap_header``).

VFIO cap header format
""""""""""""""""""""""

+---------+--------+------+
| Name    | Offset | Size |
+=========+========+======+
| id      | 0      | 2    |
+---------+--------+------+
| version | 2      | 2    |
+---------+--------+------+
| next    | 4      | 4    |
+---------+--------+------+

* *id* is the capability identity.
* *version* is a capability-specific version number.
* *next* specifies the offset of the next capability in the capability list. It
  is relative to the beginning of the VFIO region info structure.
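
Walking the capability list could look like the sketch below, where
``region_info`` is the whole reply payload starting at the region info
structure. The function name is hypothetical and little-endian byte order is
assumed. Ending the chain at a *next* value of zero follows the Linux VFIO
convention and is an assumption, as this section does not spell out the
terminator.

```python
import struct

CAP_HEADER_FORMAT = "<HHI"  # id, version, next
CAP_HEADER_SIZE = struct.calcsize(CAP_HEADER_FORMAT)  # 8 bytes


def walk_capabilities(region_info, cap_offset):
    """Return a list of (id, version, offset) for each capability found."""
    found = []
    seen = set()
    offset = cap_offset
    while offset != 0:  # assumed terminator: next == 0
        if offset in seen or offset + CAP_HEADER_SIZE > len(region_info):
            raise ValueError("malformed capability chain")
        seen.add(offset)
        cap_id, version, next_offset = struct.unpack_from(
            CAP_HEADER_FORMAT, region_info, offset)
        found.append((cap_id, version, offset))
        offset = next_offset  # relative to the start of the region info
    return found
```

The loop guards against chains that point outside the payload or back at an
earlier entry, since the offsets come from the peer.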

VFIO sparse mmap cap header
"""""""""""""""""""""""""""

+------------------+----------------------------------+
| Name             | Value                            |
+==================+==================================+
| id               | VFIO_REGION_INFO_CAP_SPARSE_MMAP |
+------------------+----------------------------------+
| version          | 0x1                              |
+------------------+----------------------------------+
| next             | <next>                           |
+------------------+----------------------------------+
| sparse mmap info | VFIO region info sparse mmap     |
+------------------+----------------------------------+

This capability is defined when only a subrange of the region supports
direct access by the client via mmap(). The VFIO sparse mmap area is defined in
``<linux/vfio.h>`` (``struct vfio_region_sparse_mmap_area`` and ``struct
vfio_region_info_cap_sparse_mmap``).

VFIO region info cap sparse mmap
""""""""""""""""""""""""""""""""

+----------+--------+------+
| Name     | Offset | Size |
+==========+========+======+
| nr_areas | 0      | 4    |
+----------+--------+------+
| reserved | 4      | 4    |
+----------+--------+------+
| offset   | 8      | 8    |
+----------+--------+------+
| size     | 16     | 8    |
+----------+--------+------+
| ...      |        |      |
+----------+--------+------+

* *nr_areas* is the number of sparse mmap areas in the region.
* *offset* and *size* describe a single area that can be mapped by the client.
  There will be *nr_areas* pairs of offset and size. The offset will be added to
  the base offset given in the ``VFIO_USER_DEVICE_GET_REGION_INFO`` reply to
  form the offset argument of the subsequent mmap() call.

The VFIO sparse mmap area is defined in ``<linux/vfio.h>`` (``struct
vfio_region_info_cap_sparse_mmap``).
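
Decoding a sparse mmap capability and turning its areas into mmap() offsets
might look like this. The function name is hypothetical, little-endian byte
order is assumed, and the value ``1`` for
``VFIO_REGION_INFO_CAP_SPARSE_MMAP`` is taken from ``<linux/vfio.h>``.

```python
import struct

VFIO_REGION_INFO_CAP_SPARSE_MMAP = 1  # from <linux/vfio.h>
CAP_HEADER_SIZE = 8  # id (2) + version (2) + next (4)


def parse_sparse_mmap(region_info, cap_offset, region_offset):
    """Return a list of (mmap_offset, size), one entry per sparse area."""
    cap_id, _version, _next = struct.unpack_from(
        "<HHI", region_info, cap_offset)
    if cap_id != VFIO_REGION_INFO_CAP_SPARSE_MMAP:
        raise ValueError("not a sparse mmap capability")
    body = cap_offset + CAP_HEADER_SIZE
    nr_areas, _reserved = struct.unpack_from("<II", region_info, body)
    areas = []
    for i in range(nr_areas):
        # Each area is an (offset, size) pair of 64-bit values.
        area_offset, size = struct.unpack_from(
            "<QQ", region_info, body + 8 + 16 * i)
        # The mmap() offset is the region's base offset plus the area offset.
        areas.append((region_offset + area_offset, size))
    return areas
```

Here ``region_offset`` is the *offset* field from the region info structure,
and ``cap_offset`` is the position of this capability within the reply.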


``VFIO_USER_DEVICE_GET_REGION_IO_FDS``
--------------------------------------

Clients can access regions via ``VFIO_USER_REGION_READ/WRITE`` or, if provided, by
``mmap()`` of a file descriptor provided by the server.

``VFIO_USER_DEVICE_GET_REGION_IO_FDS`` provides an alternative access mechanism via
file descriptors. This is an optional feature intended for performance
improvements where an underlying sub-system (such as KVM) supports communication
across such file descriptors to the vfio-user server, without needing to
round-trip through the client.

The server returns an array of sub-regions for the requested region. Each
sub-region describes a span (offset and size) of a region, along with the
requested file descriptor notification mechanism to use.  Each sub-region in the
response message may choose to use a different method, as defined below.  The
two mechanisms supported in this specification are ioeventfds and ioregionfds.

In addition, the server returns a file descriptor in the ancillary data; clients
are expected to configure each sub-region's file descriptor with the requested
notification method. For example, a client could configure KVM with the
requested ioeventfd via a ``KVM_IOEVENTFD`` ``ioctl()``.

Request
^^^^^^^

+-------------+--------+------+
| Name        | Offset | Size |
+=============+========+======+
| argsz       | 0      | 4    |
+-------------+--------+------+
| flags       | 4      | 4    |
+-------------+--------+------+
| index       | 8      | 4    |
+-------------+--------+------+
| count       | 12     | 4    |
+-------------+--------+------+

* *argsz* is the maximum size of the reply payload
* *index* is the index of the memory region being queried
* all other fields must be zero

The client must set ``flags`` to zero and specify the region being queried in
the ``index``.

Reply
^^^^^

+-------------+--------+------+
| Name        | Offset | Size |
+=============+========+======+
| argsz       | 0      | 4    |
+-------------+--------+------+
| flags       | 4      | 4    |
+-------------+--------+------+
| index       | 8      | 4    |
+-------------+--------+------+
| count       | 12     | 4    |
+-------------+--------+------+
| sub-regions | 16     | ...  |
+-------------+--------+------+

* *argsz* is the size of the region IO FD info structure plus the
  total size of the sub-region array. Thus, each array entry "i" is at offset
  i * ((argsz - 16) / count) from the start of the sub-region array, where 16 is
  the size of the fixed fields shown above. Note that each entry is currently
  40 bytes for both IO FD types, but this is not to be relied on. As elsewhere,
  this indicates the full reply payload size needed.
* *flags* must be zero
* *index* is the index of the memory region being queried
* *count* is the number of sub-regions in the array
* *sub-regions* is the array of Sub-Region IO FD info structures
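
As an illustration of the *argsz* arithmetic, the following is a minimal sketch
of how a client might locate entry "i" without hard-coding the entry size. It
assumes the fixed part of the reply is the 16 bytes laid out in the table above
and that both peers share the same byte order; ``struct region_io_fds_hdr`` and
``sub_region_offset()`` are hypothetical names, not part of the protocol.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Fixed part of the reply, laid out as in the table above (assumed name). */
   struct region_io_fds_hdr {
       uint32_t argsz;
       uint32_t flags;
       uint32_t index;
       uint32_t count;
   };
   static_assert(sizeof(struct region_io_fds_hdr) == 16, "header is 16 bytes");

   /*
    * Return the byte offset of sub-region entry i from the start of the reply
    * payload, or 0 if the reply is malformed. 0 is never a valid entry offset
    * because the array starts after the 16-byte header.
    */
   static size_t sub_region_offset(const struct region_io_fds_hdr *hdr,
                                   uint32_t i)
   {
       size_t array_bytes, stride;

       if (hdr->count == 0 || i >= hdr->count || hdr->argsz <= sizeof(*hdr)) {
           return 0;
       }
       array_bytes = hdr->argsz - sizeof(*hdr);
       if (array_bytes % hdr->count != 0) {
           return 0;
       }
       stride = array_bytes / hdr->count;   /* 40 today; derived, not assumed */
       return sizeof(*hdr) + (size_t)i * stride;
   }

Deriving the stride from *argsz* and *count* keeps such a client working if a
later revision grows the sub-region structures.
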

The reply message will additionally include at least one file descriptor in the
ancillary data. Note that more than one sub-region may share the same file
descriptor.

Note that it is the client's responsibility to verify the requested values (for
example, that the requested offset does not exceed the region's bounds).
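
A minimal sketch of such a client-side check follows. The function name and the
exact set of checks are assumptions for illustration: the specification only
requires that the client verify the values, not how. ``region_size`` is the
size the client learned for the region, and ``nr_fds`` is the number of file
descriptors received in the ancillary data.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /*
    * Check one sub-region (hypothetical helper): the span must lie inside the
    * region and fd_index must refer to a descriptor that was actually received.
    */
   static bool sub_region_valid(uint64_t offset, uint64_t size,
                                uint32_t fd_index, uint64_t region_size,
                                uint32_t nr_fds)
   {
       if (fd_index >= nr_fds) {
           return false;
       }
       if (offset >= region_size) {
           return false;
       }
       /* Compare against the remaining space so offset + size cannot wrap. */
       return size <= region_size - offset;
   }

A *size* of zero passes this check as long as the offset itself is inside the
region, matching the case where the access size is not relevant.
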

Each sub-region given in the response has one of two possible structures,
depending on whether *type* is ``VFIO_USER_IO_FD_TYPE_IOEVENTFD`` or
``VFIO_USER_IO_FD_TYPE_IOREGIONFD``:

Sub-Region IO FD info format (ioeventfd)
""""""""""""""""""""""""""""""""""""""""

+-----------+--------+------+
| Name      | Offset | Size |
+===========+========+======+
| offset    | 0      | 8    |
+-----------+--------+------+
| size      | 8      | 8    |
+-----------+--------+------+
| fd_index  | 16     | 4    |
+-----------+--------+------+
| type      | 20     | 4    |
+-----------+--------+------+
| flags     | 24     | 4    |
+-----------+--------+------+
| padding   | 28     | 4    |
+-----------+--------+------+
| datamatch | 32     | 8    |
+-----------+--------+------+

* *offset* is the offset of the start of the sub-region within the region
  requested ("physical address offset" for the region)
* *size* is the length of the sub-region. This may be zero if the access size is
  not relevant, which may allow for optimizations
* *fd_index* is the index in the ancillary data of the FD to use for ioeventfd
  notification; it may be shared.
* *type* is ``VFIO_USER_IO_FD_TYPE_IOEVENTFD``
* *flags* is any of:

  * ``KVM_IOEVENTFD_FLAG_DATAMATCH``
  * ``KVM_IOEVENTFD_FLAG_PIO``
  * ``KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY`` (FIXME: makes sense?)

* *datamatch* is the datamatch value if needed

See https://www.kernel.org/doc/Documentation/virtual/kvm/api.txt, *4.59
KVM_IOEVENTFD* for further context on the ioeventfd-specific fields.
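
To show how these fields line up with KVM, here is a rough, Linux-only sketch
that turns one ioeventfd sub-region into a ``struct kvm_ioeventfd``.
``struct sub_region_ioeventfd`` and ``to_kvm_ioeventfd()`` are hypothetical
names, and ``region_gpa`` (the guest address at which the client placed the
region) is an assumption about the client, not something the reply carries.
The sketch assumes the sub-region has already been validated.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>
   #include <string.h>
   #include <linux/kvm.h>

   /* Sub-region IO FD info (ioeventfd), as in the table above. */
   struct sub_region_ioeventfd {
       uint64_t offset;
       uint64_t size;
       uint32_t fd_index;
       uint32_t type;
       uint32_t flags;
       uint32_t padding;
       uint64_t datamatch;
   };
   static_assert(sizeof(struct sub_region_ioeventfd) == 40, "entry is 40 bytes");

   /*
    * Build the KVM_IOEVENTFD argument for one sub-region. fds holds the file
    * descriptors received in the ancillary data, indexed by fd_index.
    */
   static struct kvm_ioeventfd
   to_kvm_ioeventfd(const struct sub_region_ioeventfd *sr, uint64_t region_gpa,
                    const int *fds)
   {
       struct kvm_ioeventfd ev;

       memset(&ev, 0, sizeof(ev));
       ev.addr = region_gpa + sr->offset;
       ev.len = (uint32_t)sr->size;
       ev.fd = fds[sr->fd_index];
       ev.flags = sr->flags;
       ev.datamatch = sr->datamatch;
       return ev;
   }

A client would then pass the result to ``ioctl(vm_fd, KVM_IOEVENTFD, &ev)`` on
its VM file descriptor; that call is omitted here because it needs a live VM.
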

Sub-Region IO FD info format (ioregionfd)
"""""""""""""""""""""""""""""""""""""""""

+-----------+--------+------+
| Name      | Offset | Size |
+===========+========+======+
| offset    | 0      | 8    |
+-----------+--------+------+
| size      | 8      | 8    |
+-----------+--------+------+
| fd_index  | 16     | 4    |
+-----------+--------+------+
| type      | 20     | 4    |
+-----------+--------+------+
| flags     | 24     | 4    |
+-----------+--------+------+
| padding   | 28     | 4    |
+-----------+--------+------+
| user_data | 32     | 8    |
+-----------+--------+------+

* *offset* is the offset of the start of the sub-region within the region
  requested ("physical address offset" for the region)
* *size* is the length of the sub-region. This may be zero if the access size is
  not relevant, which may allow for optimizations; ``KVM_IOREGION_POSTED_WRITES``
  must be set in *flags* in this case
* *fd_index* is the index in the ancillary data of the FD to use for ioregionfd
  messages; it may be shared
* *type* is ``VFIO_USER_IO_FD_TYPE_IOREGIONFD``
* *flags* is any of:

  * ``KVM_IOREGION_PIO``
  * ``KVM_IOREGION_POSTED_WRITES``

* *user_data* is an opaque value passed back to the server via a message on the
  file descriptor

For further information on the ioregionfd-specific fields, see:
https://lore.kernel.org/kvm/cover.1613828726.git.eafanasova@gmail.com/

(FIXME: update with final API docs.)

``VFIO_USER_DEVICE_GET_IRQ_INFO``
---------------------------------

This command message is sent by the client to the server to query for
information about device interrupt types. The VFIO IRQ info structure is
defined in ``<linux/vfio.h>`` (``struct vfio_irq_info``).

Request
^^^^^^^

+-------+--------+---------------------------+
| Name  | Offset | Size                      |
+=======+========+===========================+
| argsz | 0      | 4                         |
+-------+--------+---------------------------+
| flags | 4      | 4                         |
+-------+--------+---------------------------+
|       | +-----+--------------------------+ |
|       | | Bit | Definition               | |
|       | +=====+==========================+ |
|       | | 0   | VFIO_IRQ_INFO_EVENTFD    | |
|       | +-----+--------------------------+ |
|       | | 1   | VFIO_IRQ_INFO_MASKABLE   | |
|       | +-----+--------------------------+ |
|       | | 2   | VFIO_IRQ_INFO_AUTOMASKED | |
|       | +-----+--------------------------+ |
|       | | 3   | VFIO_IRQ_INFO_NORESIZE   | |
|       | +-----+--------------------------+ |
+-------+--------+---------------------------+
| index | 8      | 4                         |
+-------+--------+---------------------------+
| count | 12     | 4                         |
+-------+--------+---------------------------+

* *argsz* is the maximum size of the reply payload (16 bytes today)
* *index* is the index of the IRQ type being queried (e.g. ``VFIO_PCI_MSIX_IRQ_INDEX``)
* all other fields must be zero

Reply
^^^^^

+-------+--------+---------------------------+
| Name  | Offset | Size                      |
+=======+========+===========================+
| argsz | 0      | 4                         |
+-------+--------+---------------------------+
| flags | 4      | 4                         |
+-------+--------+---------------------------+
|       | +-----+--------------------------+ |
|       | | Bit | Definition               | |
|       | +=====+==========================+ |
|       | | 0   | VFIO_IRQ_INFO_EVENTFD    | |
|       | +-----+--------------------------+ |
|       | | 1   | VFIO_IRQ_INFO_MASKABLE   | |
|       | +-----+--------------------------+ |
|       | | 2   | VFIO_IRQ_INFO_AUTOMASKED | |
|       | +-----+--------------------------+ |
|       | | 3   | VFIO_IRQ_INFO_NORESIZE   | |
|       | +-----+--------------------------+ |
+-------+--------+---------------------------+
| index | 8      | 4                         |
+-------+--------+---------------------------+
| count | 12     | 4                         |
+-------+--------+---------------------------+

* *argsz* is the size required for the full reply payload (16 bytes today)
* *flags* defines IRQ attributes:

  * ``VFIO_IRQ_INFO_EVENTFD`` indicates the IRQ type can support server eventfd
    signalling.
  * ``VFIO_IRQ_INFO_MASKABLE`` indicates that the IRQ type supports the ``MASK``
    and ``UNMASK`` actions in a ``VFIO_USER_DEVICE_SET_IRQS`` message.
  * ``VFIO_IRQ_INFO_AUTOMASKED`` indicates the IRQ type masks itself after being
    triggered, and the client must send an ``UNMASK`` action to receive new
    interrupts.
  * ``VFIO_IRQ_INFO_NORESIZE`` indicates that ``VFIO_USER_DEVICE_SET_IRQS``
    operations set up interrupts as a set, and new sub-indexes cannot be enabled
    without disabling the entire type.

* *index* is the index of the IRQ type being queried
* *count* describes the number of interrupts of the queried type.
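
The request and reply share one 16-byte layout, so a client can be sketched
roughly as below. The flag values are taken from the bit table above; real code
would normally get them, and ``struct vfio_irq_info``, from ``<linux/vfio.h>``.
``struct irq_info`` and the helper names are hypothetical, and host byte order
is assumed.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Bit positions from the flags table above. */
   #define VFIO_IRQ_INFO_EVENTFD    (1u << 0)
   #define VFIO_IRQ_INFO_MASKABLE   (1u << 1)
   #define VFIO_IRQ_INFO_AUTOMASKED (1u << 2)
   #define VFIO_IRQ_INFO_NORESIZE   (1u << 3)

   /* Stand-in for struct vfio_irq_info, same field order as the tables. */
   struct irq_info {
       uint32_t argsz;
       uint32_t flags;
       uint32_t index;
       uint32_t count;
   };
   static_assert(sizeof(struct irq_info) == 16, "payload is 16 bytes today");

   /* Build a request: only argsz and index are set, everything else is zero. */
   static struct irq_info irq_info_request(uint32_t index)
   {
       struct irq_info req = { .argsz = sizeof(req), .index = index };
       return req;
   }

   /* The IRQ type exists and can be signalled through an eventfd. */
   static bool irq_usable_with_eventfd(const struct irq_info *reply)
   {
       return reply->count > 0 && (reply->flags & VFIO_IRQ_INFO_EVENTFD) != 0;
   }

   /* The client has to send UNMASK after each interrupt of this type. */
   static bool irq_needs_unmask(const struct irq_info *reply)
   {
       return (reply->flags & VFIO_IRQ_INFO_AUTOMASKED) != 0;
   }
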

``VFIO_USER_DEVICE_SET_IRQS``
-----------------------------

This command message is sent by the client to the server to set actions for
device interrupt types. The VFIO IRQ set structure is defined in
``<linux/vfio.h>`` (``struct vfio_irq_set``).

Request
^^^^^^^

+-------+--------+------------------------------+
| Name  | Offset | Size                         |
+=======+========+==============================+
| argsz | 0      | 4                            |
+-------+--------+------------------------------+
| flags | 4      | 4                            |
+-------+--------+------------------------------+
|       | +-----+-----------------------------+ |
|       | | Bit | Definition                  | |
|       | +=====+=============================+ |
|       | | 0   | VFIO_IRQ_SET_DATA_NONE      | |
|       | +-----+-----------------------------+ |
|       | | 1   | VFIO_IRQ_SET_DATA_BOOL      | |
|       | +-----+-----------------------------+ |
|       | | 2   | VFIO_IRQ_SET_DATA_EVENTFD   | |
|       | +-----+-----------------------------+ |
|       | | 3   | VFIO_IRQ_SET_ACTION_MASK    | |
|       | +-----+-----------------------------+ |
|       | | 4   | VFIO_IRQ_SET_ACTION_UNMASK  | |
|       | +-----+-----------------------------+ |
|       | | 5   | VFIO_IRQ_SET_ACTION_TRIGGER | |
|       | +-----+-----------------------------+ |
+-------+--------+------------------------------+
| index | 8      | 4                            |
+-------+--------+------------------------------+
| start | 12     | 4                            |
+-------+--------+------------------------------+
| count | 16     | 4                            |
+-------+--------+------------------------------+
| data  | 20     | variable                     |
+-------+--------+------------------------------+

* *argsz* is the size of the VFIO IRQ set request payload, including any *data*
  field. Note there is no reply payload, so this field differs from other
  message types.
* *flags* defines the action performed on the interrupt range. The ``DATA``
  flags describe the data field sent in the message; the ``ACTION`` flags
  describe the action to be performed. Within each of the two sets, the flags
  are mutually exclusive.

  * ``VFIO_IRQ_SET_DATA_NONE`` indicates there is no data field in the command.
    The action is performed unconditionally.
  * ``VFIO_IRQ_SET_DATA_BOOL`` indicates the data field is an array of boolean
    bytes. The action is performed if the corresponding boolean is true.
  * ``VFIO_IRQ_SET_DATA_EVENTFD`` indicates an array of event file descriptors
    was sent in the message meta-data. These descriptors will be signalled when
    the action defined by the action flags occurs. In ``AF_UNIX`` sockets, the
    descriptors are sent as ``SCM_RIGHTS`` type ancillary data.
    If no file descriptors are provided, this de-assigns the specified
    previously configured interrupts.
  * ``VFIO_IRQ_SET_ACTION_MASK`` indicates a masking event. It can be used with
    ``VFIO_IRQ_SET_DATA_BOOL`` or ``VFIO_IRQ_SET_DATA_NONE`` to mask an interrupt,
    or with ``VFIO_IRQ_SET_DATA_EVENTFD`` to generate an event when the guest masks
    the interrupt.
  * ``VFIO_IRQ_SET_ACTION_UNMASK`` indicates an unmasking event. It can be used
    with ``VFIO_IRQ_SET_DATA_BOOL`` or ``VFIO_IRQ_SET_DATA_NONE`` to unmask an
    interrupt, or with ``VFIO_IRQ_SET_DATA_EVENTFD`` to generate an event when the
    guest unmasks the interrupt.
  * ``VFIO_IRQ_SET_ACTION_TRIGGER`` indicates a triggering event. It can be used
    with ``VFIO_IRQ_SET_DATA_BOOL`` or ``VFIO_IRQ_SET_DATA_NONE`` to trigger an
    interrupt, or with ``VFIO_IRQ_SET_DATA_EVENTFD`` to generate an event when the
    server triggers the interrupt.

* *index* is the index of the IRQ type being set up.
* *start* is the start of the sub-index being set.
* *count* describes the number of sub-indexes being set. As a special case, a
  count (and start) of 0, with data flags of ``VFIO_IRQ_SET_DATA_NONE`` disables
  all interrupts of the index.
* *data* is an optional field included when the
  ``VFIO_IRQ_SET_DATA_BOOL`` flag is present. It contains an array of booleans
  that specify whether the action is to be performed on the corresponding
  index. It's used when the action is only performed on a subset of the range
  specified.
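
The flag rules and the *argsz* calculation can be sketched as follows for the
``VFIO_IRQ_SET_DATA_BOOL`` case. The flag values come from the bit table above
(real code would take them from ``<linux/vfio.h>``); ``struct irq_set_hdr`` and
the two functions are hypothetical names, and host byte order is assumed.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   /* Bit positions from the flags table above. */
   #define VFIO_IRQ_SET_DATA_NONE      (1u << 0)
   #define VFIO_IRQ_SET_DATA_BOOL      (1u << 1)
   #define VFIO_IRQ_SET_DATA_EVENTFD   (1u << 2)
   #define VFIO_IRQ_SET_ACTION_MASK    (1u << 3)
   #define VFIO_IRQ_SET_ACTION_UNMASK  (1u << 4)
   #define VFIO_IRQ_SET_ACTION_TRIGGER (1u << 5)
   #define DATA_BITS   0x07u   /* bits 0-2 */
   #define ACTION_BITS 0x38u   /* bits 3-5 */

   /* Fixed part of the request; the optional data array follows at offset 20. */
   struct irq_set_hdr {
       uint32_t argsz;
       uint32_t flags;
       uint32_t index;
       uint32_t start;
       uint32_t count;
   };
   static_assert(sizeof(struct irq_set_hdr) == 20, "data starts at offset 20");

   static bool exactly_one_bit(uint32_t v)
   {
       return v != 0 && (v & (v - 1)) == 0;
   }

   /* Exactly one DATA flag, exactly one ACTION flag, and nothing else. */
   static bool irq_set_flags_valid(uint32_t flags)
   {
       return (flags & ~(DATA_BITS | ACTION_BITS)) == 0 &&
              exactly_one_bit(flags & DATA_BITS) &&
              exactly_one_bit(flags & ACTION_BITS);
   }

   /*
    * Serialize a DATA_BOOL request (one byte per sub-index) into buf.
    * Returns argsz, or 0 if the flags are invalid or buf is too small.
    */
   static size_t irq_set_bool_request(uint8_t *buf, size_t buflen,
                                      uint32_t action, uint32_t index,
                                      uint32_t start, const uint8_t *bools,
                                      uint32_t count)
   {
       uint32_t flags = VFIO_IRQ_SET_DATA_BOOL | action;
       size_t argsz = sizeof(struct irq_set_hdr) + count;
       struct irq_set_hdr hdr = {
           .argsz = (uint32_t)argsz, .flags = flags,
           .index = index, .start = start, .count = count,
       };

       if (!irq_set_flags_valid(flags) || buflen < argsz) {
           return 0;
       }
       memcpy(buf, &hdr, sizeof(hdr));
       memcpy(buf + sizeof(hdr), bools, count);
       return argsz;
   }

For example, masking sub-indexes 0 and 2 of a three-entry range yields a
23-byte payload: the 20-byte header followed by the bytes 1, 0, 1.
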

Not all interrupt types support every combination of data and action flags.
The client must know the capabilities of the device and IRQ index before it
sends a ``VFIO_USER_DEVICE_SET_IRQS`` message.

In typical operation, a specific IRQ may operate as follows:

1. The client sends a ``VFIO_USER_DEVICE_SET_IRQS`` message with
   ``flags=(VFIO_IRQ_SET_DATA_EVENTFD|VFIO_IRQ_SET_ACTION_TRIGGER)`` along
   with an eventfd. This associates the IRQ with a particular eventfd on the
   server side.

#. The client may send a ``VFIO_USER_DEVICE_SET_IRQS`` message with
   ``flags=(VFIO_IRQ_SET_DATA_EVENTFD|VFIO_IRQ_SET_ACTION_MASK/UNMASK)`` along
   with another eventfd. This associates the given eventfd with the
   mask/unmask state on the server side.

#. The server may trigger the IRQ by writing 1 to the eventfd.

#. The server may mask/unmask an IRQ which will write 1 to the corresponding
   mask/unmask eventfd, if there is one.

5. A client may trigger a device IRQ itself, by sending a
   ``VFIO_USER_DEVICE_SET_IRQS`` message with
   ``flags=(VFIO_IRQ_SET_DATA_NONE/BOOL|VFIO_IRQ_SET_ACTION_TRIGGER)``.

6. A client may mask or unmask the IRQ, by sending a
   ``VFIO_USER_DEVICE_SET_IRQS`` message with
   ``flags=(VFIO_IRQ_SET_DATA_NONE/BOOL|VFIO_IRQ_SET_ACTION_MASK/UNMASK)``.
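
The first and third steps can be illustrated with a small, Linux-only sketch
that runs both roles in one process: the "client" passes an eventfd over an
``AF_UNIX`` socket as ``SCM_RIGHTS`` ancillary data, and the "server" triggers
the IRQ by writing 1 to its copy. A single dummy byte stands in for the real
message header and payload, all function names are hypothetical, and the error
paths do not close descriptors, which real code would need to do.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>
   #include <string.h>
   #include <sys/eventfd.h>
   #include <sys/socket.h>
   #include <sys/uio.h>
   #include <unistd.h>

   /* Control buffer sized and aligned for exactly one file descriptor. */
   union fd_cmsg {
       struct cmsghdr align;
       char buf[CMSG_SPACE(sizeof(int))];
   };

   /* Send one file descriptor as SCM_RIGHTS ancillary data; 0 on success. */
   static int send_fd(int sock, int fd)
   {
       char byte = 0;   /* placeholder for the vfio-user message itself */
       struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
       union fd_cmsg u;
       struct msghdr msg;
       struct cmsghdr *cmsg;

       memset(&u, 0, sizeof(u));
       memset(&msg, 0, sizeof(msg));
       msg.msg_iov = &iov;
       msg.msg_iovlen = 1;
       msg.msg_control = u.buf;
       msg.msg_controllen = sizeof(u.buf);
       cmsg = CMSG_FIRSTHDR(&msg);
       cmsg->cmsg_level = SOL_SOCKET;
       cmsg->cmsg_type = SCM_RIGHTS;
       cmsg->cmsg_len = CMSG_LEN(sizeof(int));
       memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
       return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
   }

   /* Receive the descriptor sent by send_fd(); returns it, or -1 on error. */
   static int recv_fd(int sock)
   {
       char byte;
       struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
       union fd_cmsg u;
       struct msghdr msg;
       struct cmsghdr *cmsg;
       int fd;

       memset(&msg, 0, sizeof(msg));
       msg.msg_iov = &iov;
       msg.msg_iovlen = 1;
       msg.msg_control = u.buf;
       msg.msg_controllen = sizeof(u.buf);
       if (recvmsg(sock, &msg, 0) != 1) {
           return -1;
       }
       cmsg = CMSG_FIRSTHDR(&msg);
       if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
           cmsg->cmsg_type != SCM_RIGHTS) {
           return -1;
       }
       memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
       return fd;
   }

   /* Returns the eventfd counter the client observes, or 0 on any error. */
   static uint64_t trigger_roundtrip(void)
   {
       int sv[2], client_efd, server_efd;
       uint64_t one = 1, seen = 0;

       if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
           return 0;
       }
       client_efd = eventfd(0, 0);
       if (client_efd < 0 || send_fd(sv[0], client_efd) < 0) {
           return 0;
       }
       server_efd = recv_fd(sv[1]);   /* server now holds the IRQ eventfd */
       if (server_efd < 0 ||
           write(server_efd, &one, sizeof(one)) != (ssize_t)sizeof(one) ||
           read(client_efd, &seen, sizeof(seen)) != (ssize_t)sizeof(seen)) {
           return 0;
       }
       close(server_efd);
       close(client_efd);
       close(sv[0]);
       close(sv[1]);
       return seen;
   }

In a VMM the client would usually not read the eventfd itself but hand it to
the hypervisor so that a write by the server injects the interrupt directly.
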

Reply
^^^^^

There is no payload in the reply.

.. _Read and Write Operations:

Note that all of these operations must be supported by the client and/or server,
even if the corresponding memory or device region has been shared as mappable.

The ``count`` field must not exceed the value of ``max_data_xfer_size`` of the
peer, for both reads and writes.
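
One way to respect this limit is to split a large access into several
messages. The sketch below only shows the arithmetic; the function names are
hypothetical, and whether a given access may be split at all depends on the
device, which this specification section does not address.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Number of messages needed to move total bytes under the peer's limit. */
   static uint64_t xfer_message_count(uint64_t total, uint32_t max_data_xfer_size)
   {
       if (max_data_xfer_size == 0) {
           return 0;   /* no usable limit was negotiated */
       }
       return total / max_data_xfer_size + (total % max_data_xfer_size != 0);
   }

   /* Value for the count field of the next message. */
   static uint32_t next_chunk(uint64_t remaining, uint32_t max_data_xfer_size)
   {
       return remaining < max_data_xfer_size ? (uint32_t)remaining
                                             : max_data_xfer_size;
   }

With a limit of 4096 bytes, a 10000-byte transfer becomes three messages of
4096, 4096 and 1808 bytes.
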

``VFIO_USER_REGION_READ``
-------------------------

If a device region is not mappable, it's not directly accessible by the client
via ``mmap()`` of the underlying file descriptor. In this case, a client can
read from a device region with this message.

Request
^^^^^^^

+--------+--------+----------+
| Name   | Offset | Size     |
+========+========+==========+
| offset | 0      | 8        |
+--------+--------+----------+
| region | 8      | 4        |
+--------+--------+----------+
| count  | 12     | 4        |
+--------+--------+----------+

* *offset* into the region being accessed.
* *region* is the index of the region being accessed.
* *count* is the size of the data to be transferred.

Reply
^^^^^

+--------+--------+----------+
| Name   | Offset | Size     |
+========+========+==========+
| offset | 0      | 8        |
+--------+--------+----------+
| region | 8      | 4        |
+--------+--------+----------+
| count  | 12     | 4        |
+--------+--------+----------+
| data   | 16     | variable |
+--------+--------+----------+

* *offset* into the region accessed.
* *region* is the index of the region accessed.
* *count* is the size of the data transferred.
* *data* is the data that was read from the device region.
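
A client might sanity-check a read reply against its request along these
lines. ``struct region_access_hdr`` and ``region_read_reply_ok()`` are
hypothetical names, host byte order is assumed, and accepting a reply that
transfers fewer bytes than requested is an assumption of this sketch rather
than something stated above.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Fixed fields shared by the request and the reply; data follows at 16. */
   struct region_access_hdr {
       uint64_t offset;
       uint32_t region;
       uint32_t count;
   };
   static_assert(sizeof(struct region_access_hdr) == 16, "data starts at 16");

   /*
    * The reply must echo the request's offset and region, must not return
    * more than was asked for, and its payload must hold exactly count bytes
    * of data after the header.
    */
   static bool region_read_reply_ok(const struct region_access_hdr *req,
                                    const struct region_access_hdr *rep,
                                    size_t reply_payload_len)
   {
       return rep->offset == req->offset &&
              rep->region == req->region &&
              rep->count <= req->count &&
              reply_payload_len == sizeof(*rep) + rep->count;
   }
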

``VFIO_USER_REGION_WRITE``
--------------------------

If a device region is not mappable, it's not directly accessible by the client
via ``mmap()`` of the underlying file descriptor. In this case, a client can
write to a device region with this message.

Request
^^^^^^^

+--------+--------+----------+
| Name   | Offset | Size     |
+========+========+==========+
| offset | 0      | 8        |
+--------+--------+----------+
| region | 8      | 4        |
+--------+--------+----------+
| count  | 12     | 4        |
+--------+--------+----------+
| data   | 16     | variable |
+--------+--------+----------+

* *offset* into the region being accessed.
* *region* is the index of the region being accessed.
* *count* is the size of the data to be transferred.
* *data* is the data to write.

Reply
^^^^^

+--------+--------+----------+
| Name   | Offset | Size     |
+========+========+==========+
| offset | 0      | 8        |
+--------+--------+----------+
| region | 8      | 4        |
+--------+--------+----------+
| count  | 12     | 4        |
+--------+--------+----------+

* *offset* into the region accessed.
* *region* is the index of the region accessed.
* *count* is the size of the data transferred.

``VFIO_USER_DMA_READ``
-----------------------

If the client has not shared mappable memory, the server can use this message to
read from guest memory.

Request
^^^^^^^

+---------+--------+----------+
| Name    | Offset | Size     |
+=========+========+==========+
| address | 0      | 8        |
+---------+--------+----------+
| count   | 8      | 8        |
+---------+--------+----------+

* *address* is the client DMA memory address being accessed. This address must have
  been previously exported to the server with a ``VFIO_USER_DMA_MAP`` message.
* *count* is the size of the data to be transferred.

Reply
^^^^^

+---------+--------+----------+
| Name    | Offset | Size     |
+=========+========+==========+
| address | 0      | 8        |
+---------+--------+----------+
| count   | 8      | 8        |
+---------+--------+----------+
| data    | 16     | variable |
+---------+--------+----------+

* *address* is the client DMA memory address being accessed.
* *count* is the size of the data transferred.
* *data* is the data read.
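
On the client side, serving such a request amounts to checking that the range
was exported and copying the bytes out. The sketch below assumes the client
keeps a simple table of the ranges it sent in ``VFIO_USER_DMA_MAP`` messages;
``struct dma_mapping`` and both functions are hypothetical, and a real client
would reply with an error rather than return 0.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   /* One range previously exported with VFIO_USER_DMA_MAP (assumed bookkeeping). */
   struct dma_mapping {
       uint64_t iova;    /* DMA address as seen by the server */
       uint64_t size;
       uint8_t *host;    /* client memory backing the range */
   };

   /* Host pointer for [address, address + count) if one mapping covers it. */
   static uint8_t *dma_lookup(const struct dma_mapping *maps, size_t nr_maps,
                              uint64_t address, uint64_t count)
   {
       for (size_t i = 0; i < nr_maps; i++) {
           const struct dma_mapping *m = &maps[i];
           uint64_t off;

           if (address < m->iova) {
               continue;
           }
           off = address - m->iova;
           /* Compare against the space left so address + count cannot wrap. */
           if (off < m->size && count <= m->size - off) {
               return m->host + off;
           }
       }
       return NULL;
   }

   /* Copy count bytes into data; returns the number of bytes transferred. */
   static uint64_t serve_dma_read(const struct dma_mapping *maps, size_t nr_maps,
                                  uint64_t address, uint64_t count,
                                  uint8_t *data)
   {
       const uint8_t *src = dma_lookup(maps, nr_maps, address, count);

       if (src == NULL) {
           return 0;
       }
       memcpy(data, src, (size_t)count);
       return count;
   }
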

``VFIO_USER_DMA_WRITE``
-----------------------

If the client has not shared mappable memory, the server can use this message to
write to guest memory.

Request
^^^^^^^

+---------+--------+----------+
| Name    | Offset | Size     |
+=========+========+==========+
| address | 0      | 8        |
+---------+--------+----------+
| count   | 8      | 8        |
+---------+--------+----------+
| data    | 16     | variable |
+---------+--------+----------+

* *address* is the client DMA memory address being accessed. This address must have
  been previously exported to the server with a ``VFIO_USER_DMA_MAP`` message.
* *count* is the size of the data to be transferred.
* *data* is the data to write.

Reply
^^^^^

+---------+--------+----------+
| Name    | Offset | Size     |
+=========+========+==========+
| address | 0      | 8        |
+---------+--------+----------+
| count   | 8      | 4        |
+---------+--------+----------+

* *address* is the client DMA memory address being accessed.
* *count* is the size of the data transferred.
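
The request and reply layouts above can be sketched as follows. This is an
illustration only: both function names are hypothetical, little-endian byte
order is an assumption, and the message header is not included. The field sizes
are taken from the tables as written, so *count* is packed as 8 bytes in the
request and read as 4 bytes from the reply.

```python
import struct


def build_dma_write_request(address: int, data: bytes) -> bytes:
    """Pack address (8 bytes), count (8 bytes), then the data to write."""
    return struct.pack("<QQ", address, len(data)) + data


def parse_dma_write_reply(payload: bytes) -> tuple:
    """Return (address, count) from a reply: 8-byte address, 4-byte count."""
    return struct.unpack_from("<QI", payload, 0)
```

A server would typically compare the *count* in the reply with the length of
the data it sent to confirm that the whole write was performed.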

``VFIO_USER_DEVICE_RESET``
--------------------------

This command message is sent from the client to the server to reset the device.
Neither the request nor the reply has a payload.

``VFIO_USER_REGION_WRITE_MULTI``
--------------------------------

This message can be used to coalesce multiple device write operations
into a single message. It is only used as an optimization when the
outgoing message queue is relatively full.

Request
^^^^^^^

+---------+--------+----------+
| Name    | Offset | Size     |
+=========+========+==========+
| wr_cnt  | 0      | 8        |
+---------+--------+----------+
| wrs     | 8      | variable |
+---------+--------+----------+

* *wr_cnt* is the number of device writes coalesced in the message.
* *wrs* is an array of device writes, in the format defined below.

Single Device Write Format
""""""""""""""""""""""""""

+--------+--------+----------+
| Name   | Offset | Size     |
+========+========+==========+
| offset | 0      | 8        |
+--------+--------+----------+
| region | 8      | 4        |
+--------+--------+----------+
| count  | 12     | 4        |
+--------+--------+----------+
| data   | 16     | 8        |
+--------+--------+----------+

* *offset* is the offset into the region being accessed.
* *region* is the index of the region being accessed.
* *count* is the size of the data to be transferred. This format can
  only describe writes of 8 bytes or less.
* *data* is the data to write.

Reply
^^^^^

+---------+--------+----------+
| Name    | Offset | Size     |
+=========+========+==========+
| wr_cnt  | 0      | 8        |
+---------+--------+----------+

* *wr_cnt* is the number of device writes completed.
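
To make the coalesced layout concrete, the sketch below packs several small
writes into one request payload and reads *wr_cnt* back from a reply. The
function names are hypothetical and little-endian byte order is an assumption.
Padding writes shorter than 8 bytes with trailing zero bytes is also an
assumption of this example, since the tables above only fix the *data* field at
8 bytes.

```python
import struct

# One entry: offset (8 bytes), region (4), count (4), data (8) = 24 bytes.
WRITE_ENTRY = struct.Struct("<QII8s")


def build_write_multi(writes) -> bytes:
    """Pack (region, offset, data) tuples into a request payload."""
    writes = list(writes)
    parts = [struct.pack("<Q", len(writes))]  # wr_cnt
    for region, offset, data in writes:
        if len(data) > 8:
            raise ValueError("this format only describes writes of 8 bytes or less")
        # "8s" pads data shorter than 8 bytes with trailing zero bytes.
        parts.append(WRITE_ENTRY.pack(offset, region, len(data), data))
    return b"".join(parts)


def parse_write_multi_reply(payload: bytes) -> int:
    """Return wr_cnt, the number of device writes completed."""
    return struct.unpack_from("<Q", payload, 0)[0]
```

With this layout, a request carrying N writes has a payload of 8 + 24 * N
bytes.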


Appendices
==========

Unused VFIO ``ioctl()`` commands
--------------------------------

The following VFIO commands do not have an equivalent vfio-user command:

* ``VFIO_GET_API_VERSION``
* ``VFIO_CHECK_EXTENSION``
* ``VFIO_SET_IOMMU``
* ``VFIO_GROUP_GET_STATUS``
* ``VFIO_GROUP_SET_CONTAINER``
* ``VFIO_GROUP_UNSET_CONTAINER``
* ``VFIO_GROUP_GET_DEVICE_FD``
* ``VFIO_IOMMU_GET_INFO``

However, once support for live migration for VFIO devices is finalized, some
of the above commands may have to be handled by the client in their
corresponding vfio-user form. This will be addressed in a future protocol
version.

VFIO groups and containers
^^^^^^^^^^^^^^^^^^^^^^^^^^

The current VFIO implementation includes group and container idioms that
describe how a device relates to the host IOMMU. In the vfio-user
implementation, the IOMMU is implemented in software by the client and is not
visible to the server. The simplest approach is for the client to put each
device into its own group and container.

Backend Program Conventions
---------------------------

vfio-user backend program conventions are based on the vhost-user ones.

* The backend program must not daemonize itself.
* No assumptions must be made as to what access the backend program has on the
  system.
* File descriptors 0, 1 and 2 must exist, must have regular
  stdin/stdout/stderr semantics, and can be redirected.
* The backend program must honor the SIGTERM signal.
* The backend program must accept the following command line options:

  * ``--socket-path=PATH``: path to UNIX domain socket.
  * ``--fd=FDNUM``: file descriptor for UNIX domain socket, incompatible with
    ``--socket-path``.

* The backend program must be accompanied by a JSON file stored under
  ``/usr/share/vfio-user``.
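
The command line and signal conventions above could be handled along the
following lines. This is a sketch, not a reference backend: the function names
are hypothetical, and requiring exactly one of the two options is an assumption
of this example (the list above only states that they are incompatible with
each other).

```python
import argparse
import signal


def parse_args(argv):
    """Parse the two conventional options; they are mutually exclusive."""
    parser = argparse.ArgumentParser(description="example vfio-user backend")
    # required=True is an assumption of this sketch.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--socket-path", metavar="PATH",
                       help="path to UNIX domain socket")
    group.add_argument("--fd", metavar="FDNUM", type=int,
                       help="file descriptor for UNIX domain socket")
    return parser.parse_args(argv)


def install_sigterm_handler():
    """Honor SIGTERM by exiting cleanly instead of being killed mid-request."""
    def on_sigterm(signum, frame):
        raise SystemExit(0)
    signal.signal(signal.SIGTERM, on_sigterm)
```

A backend would call ``install_sigterm_handler()`` and
``parse_args(sys.argv[1:])`` at startup and stay in the foreground, since it
must not daemonize itself.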

TODO: add a schema similar to ``docs/interop/vhost-user.json``.