.. include:: <isonum.txt>
.. SPDX-License-Identifier: GPL-2.0-or-later

================================
vfio-user Protocol Specification
================================

.. contents:: Table of Contents

Introduction
============
vfio-user is a protocol that allows a device to be emulated in a separate
process outside of a Virtual Machine Monitor (VMM). vfio-user devices consist
of a generic VFIO device type, living inside the VMM, which we call the client,
and the core device implementation, living outside the VMM, which we call the
server.

The vfio-user specification is partly based on the
`Linux VFIO ioctl interface <https://www.kernel.org/doc/html/latest/driver-api/vfio.html>`_.

VFIO is a mature and stable API, backed by an extensively used framework. The
existing VFIO client implementation in QEMU (``qemu/hw/vfio/``) can be largely
re-used, though there is nothing in this specification that requires that
particular implementation. None of the VFIO kernel modules are required for
supporting the protocol, on either the client or server side. Some source
definitions in VFIO are re-used for vfio-user.

The main idea is to allow a virtual device to function in a separate process in
the same host over a UNIX domain socket.
A UNIX domain socket (``AF_UNIX``) is
chosen because file descriptors can be trivially sent over it, which in turn
allows:

* Sharing of client memory for DMA with the server.
* Sharing of server memory with the client for fast MMIO.
* Efficient sharing of eventfds for triggering interrupts.

Other socket types could be used which allow the server to run in a separate
guest in the same host (``AF_VSOCK``) or remotely (``AF_INET``). Theoretically
the underlying transport does not necessarily have to be a socket, however we do
not examine such alternatives. In this protocol version we focus on using a UNIX
domain socket and introduce basic support for the other two types of sockets
without considering performance implications.

While passing of file descriptors is desirable for performance reasons, support
is not necessary for either the client or the server in order to implement the
protocol. There is always an in-band, message-passing fallback mechanism.

Overview
========

VFIO is a framework that allows a physical device to be securely passed through
to a user space process; the device-specific kernel driver does not drive the
device at all. Typically, the user space process is a VMM and the device is
passed through to it in order to achieve high performance.
VFIO provides an API
and the required functionality in the kernel. QEMU has adopted VFIO to allow a
guest to directly access physical devices, instead of emulating them in
software.

vfio-user reuses the core VFIO concepts defined in its API, but implements them
as messages to be sent over a socket. It does not change the kernel-based VFIO
in any way; in fact, none of the VFIO kernel modules need to be loaded to use
vfio-user. It is also possible for the client to concurrently use the current
kernel-based VFIO for one device, and vfio-user for another device.

VFIO Device Model
-----------------

A device under VFIO presents a standard interface to the user process. Many of
the VFIO operations in the existing interface use the ``ioctl()`` system call, and
references to the existing interface are called the ``ioctl()`` implementation in
this document.

The following sections describe the set of messages that implement the vfio-user
interface over a socket. In many cases, the messages are analogous to data
structures used in the ``ioctl()`` implementation. Messages derived from the
``ioctl()`` will have a name derived from the ``ioctl()`` command name. E.g., the
``VFIO_DEVICE_GET_INFO`` ``ioctl()`` command becomes a
``VFIO_USER_DEVICE_GET_INFO`` message.
The purpose of this reuse is to share as
much code as feasible with the ``ioctl()`` implementation.

Connection Initiation
^^^^^^^^^^^^^^^^^^^^^

After the client connects to the server, the initial client message is
``VFIO_USER_VERSION`` to propose a protocol version and set of capabilities to
apply to the session. The server replies with a compatible version and set of
capabilities it supports, or closes the connection if it cannot support the
advertised version.

Device Information
^^^^^^^^^^^^^^^^^^

The client uses a ``VFIO_USER_DEVICE_GET_INFO`` message to query the server for
information about the device. This information includes:

* The device type and whether it supports reset (``VFIO_DEVICE_FLAGS_``),
* the number of device regions, and
* the number of interrupt types the device supports.

Region Information
^^^^^^^^^^^^^^^^^^

The client uses ``VFIO_USER_DEVICE_GET_REGION_INFO`` messages to query the
server for information about the device's regions. This information describes:

* Read and write permissions, whether it can be memory mapped, and whether it
  supports additional capabilities (``VFIO_REGION_INFO_CAP_``).
* Region index, size, and offset.

When a device region can be mapped by the client, the server provides a file
descriptor which the client can ``mmap()``. The server is responsible for
polling for client updates to memory mapped regions.

Region Capabilities
"""""""""""""""""""

Some regions have additional capabilities that cannot be described adequately
by the region info data structure. These capabilities are returned in the
region info reply in a list similar to PCI capabilities in a PCI device's
configuration space.

Sparse Regions
""""""""""""""
A region can be memory-mappable in whole or in part. When only a subset of a
region can be mapped by the client, a ``VFIO_REGION_INFO_CAP_SPARSE_MMAP``
capability is included in the region info reply. This capability describes
which portions can be mapped by the client.

.. Note::
   For example, in a virtual NVMe controller, sparse regions can be used so
   that accesses to the NVMe registers (found in the beginning of BAR0) are
   trapped (an infrequent event), while allowing direct access to the doorbells
   (an extremely frequent event as every I/O submission requires a write to
   BAR0), found in the next page after the NVMe registers in BAR0.
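
As a rough illustration of how a client might use a sparse mmap description,
the sketch below decides whether an access can go through a mapping or has to
be trapped. The wire layout of the capability is not described in this section,
so the list of ``(offset, size)`` pairs, the helper name ``find_mmap_area`` and
the example BAR0 layout are all assumptions made for this example.

```python
def find_mmap_area(sparse_areas, offset, length):
    """Return the (area_offset, area_size) that fully contains the access.

    Returns None when the access touches any byte outside the mappable
    areas, in which case it has to be handled as a trapped access.
    """
    for area_offset, area_size in sparse_areas:
        if area_offset <= offset and offset + length <= area_offset + area_size:
            return (area_offset, area_size)
    return None


# Hypothetical BAR0 layout, not taken from a real device: the registers in the
# first 4 KiB page are trapped, the doorbells in the following page are mappable.
nvme_bar0_areas = [(0x1000, 0x1000)]
```

With this layout, ``find_mmap_area(nvme_bar0_areas, 0x14, 4)`` yields ``None``
(a register access that must be trapped), while an access at ``0x1000`` falls
inside the mappable doorbell page.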

Device-Specific Regions
"""""""""""""""""""""""

A device can define regions additional to the standard ones (e.g. PCI indexes
0-8). This is achieved by including a ``VFIO_REGION_INFO_CAP_TYPE`` capability
in the region info reply of a device-specific region. Such regions are reflected
in ``struct vfio_user_device_info.num_regions``. Thus, for PCI devices this
value can be equal to, or higher than, ``VFIO_PCI_NUM_REGIONS``.

Region I/O via file descriptors
-------------------------------

For unmapped regions, region I/O from the client is done via
``VFIO_USER_REGION_READ/WRITE``. As an optimization, ioeventfds or ioregionfds
may be configured for sub-regions of some regions. A client may request
information on these sub-regions via ``VFIO_USER_DEVICE_GET_REGION_IO_FDS``; by
configuring the returned file descriptors as ioeventfds or ioregionfds, the
server can be directly notified of I/O (for example, by KVM) without taking a
trip through the client.

Interrupts
^^^^^^^^^^

The client uses ``VFIO_USER_DEVICE_GET_IRQ_INFO`` messages to query the server
for the device's interrupt types.
The interrupt types are specific to the bus
the device is attached to, and the client is expected to know the capabilities
of each interrupt type. The server can signal an interrupt by directly injecting
interrupts into the guest via an event file descriptor. The client configures
how the server signals an interrupt with ``VFIO_USER_DEVICE_SET_IRQS`` messages.

Device Read and Write
^^^^^^^^^^^^^^^^^^^^^

When the guest executes load or store operations to an unmapped device region,
the client forwards these operations to the server with
``VFIO_USER_REGION_READ`` or ``VFIO_USER_REGION_WRITE`` messages. The server
will reply with data from the device on read operations or an acknowledgement on
write operations. See `Read and Write Operations`_.

Client memory access
--------------------

The client uses ``VFIO_USER_DMA_MAP`` and ``VFIO_USER_DMA_UNMAP`` messages to
inform the server of the valid DMA ranges that the server can access on behalf
of a device (typically, VM guest memory). DMA memory may be accessed by the
server via ``VFIO_USER_DMA_READ`` and ``VFIO_USER_DMA_WRITE`` messages over the
socket. In this case, the "DMA" part of the naming is a misnomer.

Actual direct memory access of client memory from the server is possible if the
client provides file descriptors the server can ``mmap()``.
Note that ``mmap()``
privileges cannot be revoked by the client, therefore file descriptors should
only be exported in environments where the client trusts the server not to
corrupt guest memory.

See `Read and Write Operations`_.

Client/server interactions
==========================

Socket
------

A server can serve:

1) one or more clients, and/or
2) one or more virtual devices, belonging to one or more clients.

The current protocol specification requires a dedicated socket per
client/server connection. It is a server-side implementation detail whether a
single server handles multiple virtual devices from the same or multiple
clients. The location of the socket is implementation-specific. Multiplexing
clients, devices, and servers over the same socket is not supported in this
version of the protocol.

Authentication
--------------

For ``AF_UNIX``, we rely on OS mandatory access controls on the socket files,
therefore it is up to the management layer to set up the socket as required.
Socket types that span guests or hosts will require a proper authentication
mechanism.
Defining that mechanism is deferred to a future version of the
protocol.

Command Concurrency
-------------------

A client may pipeline multiple commands without waiting for previous command
replies. The server will process commands in the order they are received. A
consequence of this is if a client issues a command with the *No_reply* bit,
then subsequently issues a command without *No_reply*, the older command will
have been processed before the reply to the younger command is sent by the
server. The client must be aware of the device's capability to process
concurrent commands if pipelining is used. For example, pipelining allows
multiple client threads to concurrently access device regions; the client must
ensure these accesses obey device semantics.

An example is a frame buffer device, where the device may allow concurrent
access to different areas of video memory, but may have indeterminate behavior
if concurrent accesses are performed to command or status registers.

Note that unrelated messages sent from the server to the client can appear in
between a client to server request/reply and vice versa.

Implementers should be prepared for certain commands to exhibit potentially
unbounded latencies.
For example, ``VFIO_USER_DEVICE_RESET`` may take an
arbitrarily long time to complete; clients should take care not to block
unnecessarily.

Socket Disconnection Behavior
-----------------------------
The server and the client can disconnect from each other, either intentionally
or unexpectedly. Both the client and the server need to know how to handle such
events.

Server Disconnection
^^^^^^^^^^^^^^^^^^^^
A server disconnecting from the client may indicate that:

1) A virtual device has been restarted, either intentionally (e.g. because of a
   device update) or unintentionally (e.g. because of a crash).
2) A virtual device has been shut down with no intention to be restarted.

It is impossible for the client to know whether or not a failure is
intermittent or innocuous and should be retried, therefore the client should
reset the VFIO device when it detects the socket has been disconnected.
Error recovery will be driven by the guest's device error handling
behavior.

Client Disconnection
^^^^^^^^^^^^^^^^^^^^
The client disconnecting from the server primarily means that the client
has exited. Currently, this means that the guest is shut down, so the device is
no longer needed and the server can automatically exit.
However, there
can be cases where a client disconnection should not result in a server exit:

1) A single server serving multiple clients.
2) A multi-process QEMU upgrading itself step by step, which is not yet
   implemented.

Therefore, in order for the protocol to be forward compatible, the server should
respond to a client disconnection as follows:

 - all client memory regions are unmapped and cleaned up (including closing any
   passed file descriptors)
 - all IRQ file descriptors passed from the old client are closed
 - the device state should otherwise be retained

The expectation is that when a client reconnects, it will re-establish IRQ and
client memory mappings.

If anything happens to the client (such as QEMU really exiting), the control
stack will know about it and can clean up resources accordingly.

Security Considerations
-----------------------

Speaking generally, vfio-user clients should not trust servers, and vice versa.
Standard tools and mechanisms should be used on both sides to validate input and
protect against denial of service scenarios, buffer overflow, etc.

Request Retry and Response Timeout
----------------------------------
A failed command is a command that has been successfully sent and has been
responded to with an error code. Failure to send the command in the first place
(e.g. because the socket is disconnected) is a different type of error examined
earlier in the disconnect section.

.. Note::
   QEMU's VFIO retries certain operations if they fail. While this makes sense
   for real HW, we don't know for sure whether it makes sense for virtual
   devices.

Defining a retry and timeout scheme is deferred to a future version of the
protocol.

Message sizes
-------------

Some requests have an ``argsz`` field. In a request, it defines the maximum
expected reply payload size, which should be at least the size of the fixed
reply payload headers defined here. The *request* payload size is defined by the
usual ``msg_size`` field in the header, not the ``argsz`` field.

In a reply, the server sets the ``argsz`` field to the size needed for a full
payload. This may be less than the requested maximum size.
This may be
larger than the requested maximum size: in that case, the full payload is not
included in the reply, but the ``argsz`` field in the reply indicates the needed
size, allowing a client to allocate a larger buffer for holding the reply before
trying again.

In addition, during negotiation (see `Version`_), the client and server may
each specify a ``max_data_xfer_size`` value; this defines the maximum data that
may be read or written via one of the ``VFIO_USER_DMA/REGION_READ/WRITE``
messages; see `Read and Write Operations`_.

Protocol Specification
======================

To distinguish from the base VFIO symbols, all vfio-user symbols are prefixed
with ``vfio_user`` or ``VFIO_USER``. In this revision, all data is in the
endianness of the host system, although this may be relaxed in future
revisions in cases where the client and server run on different hosts
with different endianness.

Unless otherwise specified, all sizes should be presumed to be in bytes.

.. _Commands:

Commands
--------
The following table lists the VFIO message command IDs, and whether the
message command is sent from the client or the server.

====================================== ========= =================
Name                                   Command   Request Direction
====================================== ========= =================
``VFIO_USER_VERSION``                  1         client -> server
``VFIO_USER_DMA_MAP``                  2         client -> server
``VFIO_USER_DMA_UNMAP``                3         client -> server
``VFIO_USER_DEVICE_GET_INFO``          4         client -> server
``VFIO_USER_DEVICE_GET_REGION_INFO``   5         client -> server
``VFIO_USER_DEVICE_GET_REGION_IO_FDS`` 6         client -> server
``VFIO_USER_DEVICE_GET_IRQ_INFO``      7         client -> server
``VFIO_USER_DEVICE_SET_IRQS``          8         client -> server
``VFIO_USER_REGION_READ``              9         client -> server
``VFIO_USER_REGION_WRITE``             10        client -> server
``VFIO_USER_DMA_READ``                 11        server -> client
``VFIO_USER_DMA_WRITE``                12        server -> client
``VFIO_USER_DEVICE_RESET``             13        client -> server
``VFIO_USER_REGION_WRITE_MULTI``       15        client -> server
====================================== ========= =================

Header
------

All messages, both command messages and reply messages, are preceded by a
16-byte header that contains basic information about the message. The header is
followed by message-specific data described in the sections below.

+----------------+--------+-------------+
| Name           | Offset | Size        |
+================+========+=============+
| Message ID     | 0      | 2           |
+----------------+--------+-------------+
| Command        | 2      | 2           |
+----------------+--------+-------------+
| Message size   | 4      | 4           |
+----------------+--------+-------------+
| Flags          | 8      | 4           |
+----------------+--------+-------------+
|                | +-----+------------+ |
|                | | Bit | Definition | |
|                | +=====+============+ |
|                | | 0-3 | Type       | |
|                | +-----+------------+ |
|                | | 4   | No_reply   | |
|                | +-----+------------+ |
|                | | 5   | Error      | |
|                | +-----+------------+ |
+----------------+--------+-------------+
| Error          | 12     | 4           |
+----------------+--------+-------------+
| <message data> | 16     | variable    |
+----------------+--------+-------------+

* *Message ID* identifies the message, and is echoed in the command's reply
  message. Message IDs belong entirely to the sender, can be re-used (even
  concurrently) and the receiver must not make any assumptions about their
  uniqueness.
* *Command* specifies the command to be executed, listed in Commands_.
  It is
  also set in the reply header.
* *Message size* contains the size of the entire message, including the header.
* *Flags* contains attributes of the message:

  * The *Type* bits indicate the message type.

    * *Command* (value 0x0) indicates a command message.
    * *Reply* (value 0x1) indicates a reply message acknowledging a previous
      command with the same message ID.

  * *No_reply* in a command message indicates that no reply is needed for this
    command. This is commonly used when multiple commands are sent, and only
    the last needs acknowledgement.
  * *Error* in a reply message indicates the command being acknowledged had
    an error. In this case, the *Error* field will be valid.

* *Error* in a reply message is an optional UNIX errno value. It may be zero
  even if the Error bit is set in Flags. It is reserved in a command message.

Each command message in Commands_ must be replied to with a reply message,
unless the message sets the *No_reply* bit. The reply consists of the header
with the *Reply* bit set, plus any additional data.

If an error occurs, the reply message must only include the reply header.
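
The header layout above can be sketched with Python's ``struct`` module. This
is an illustrative sketch only: the helper names (``pack_message``,
``unpack_header``) and the dictionary keys are assumptions made for this
example and are not part of the protocol. The offsets, sizes and flag bits
come from the header table, and the native byte order follows the
host-endianness rule stated earlier.

```python
import struct

# Field order and sizes follow the header table: message ID (2 bytes),
# command (2), message size (4), flags (4), error (4). "=" selects native
# byte order with standard sizes and no padding, giving exactly 16 bytes.
HEADER = struct.Struct("=HHIII")

TYPE_MASK = 0xF          # flags bits 0-3 hold the message type
TYPE_COMMAND = 0x0
TYPE_REPLY = 0x1
FLAG_NO_REPLY = 1 << 4   # flags bit 4
FLAG_ERROR = 1 << 5      # flags bit 5

VFIO_USER_VERSION = 1    # command ID from the Commands table


def pack_message(msg_id, command, payload=b"", flags=TYPE_COMMAND, error=0):
    """Prepend the header; the message size field counts the header too."""
    size = HEADER.size + len(payload)
    return HEADER.pack(msg_id, command, size, flags, error) + payload


def unpack_header(data):
    """Decode the first 16 bytes of a message into named fields."""
    msg_id, command, msg_size, flags, error = HEADER.unpack_from(data)
    return {
        "msg_id": msg_id,
        "command": command,
        "msg_size": msg_size,
        "is_reply": (flags & TYPE_MASK) == TYPE_REPLY,
        "no_reply": bool(flags & FLAG_NO_REPLY),
        "is_error": bool(flags & FLAG_ERROR),
        "error": error,
    }
```

For instance, ``pack_message(7, VFIO_USER_VERSION, b"abcd")`` produces a
20-byte message whose *Message size* field is 20, because the 16-byte header
is included in the size.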

As the header is standard in both requests and replies, it is not included in
the command-specific specifications below; each message definition should be
appended to the standard header, and the offsets are given from the end of the
standard header.

``VFIO_USER_VERSION``
---------------------

.. _Version:

This is the initial message sent by the client after the socket connection is
established; the same format is used for the server's reply.

Upon establishing a connection, the client must send a ``VFIO_USER_VERSION``
message proposing a protocol version and a set of capabilities. The server
compares these with the versions and capabilities it supports and sends a
``VFIO_USER_VERSION`` reply according to the following rules.

* The major version in the reply must be the same as proposed. If the server
  does not support the proposed major, it closes the connection.
* The minor version in the reply must be equal to or less than the minor
  version proposed.
* The capability list must be a subset of those proposed. If the server
  requires a capability the client did not include, it closes the connection.
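
The reply rules above can be sketched from the server's point of view as
follows. This is a hedged illustration, not a reference implementation: the
function name ``negotiate_version``, the tuple layout, and the modelling of
capabilities as plain sets of names are assumptions made for this example, and
returning ``None`` stands in for closing the connection.

```python
def negotiate_version(proposed, supported, required_caps=frozenset()):
    """Build the server's reply to a proposed version, or None to disconnect.

    ``proposed`` and ``supported`` are (major, minor, capability_names)
    tuples; capability values are deliberately left out of this sketch.
    """
    p_major, p_minor, p_caps = proposed
    s_major, s_minor, s_caps = supported

    # The reply must carry the proposed major version unchanged.
    if p_major != s_major:
        return None

    # A capability the server requires must have been proposed by the client.
    if not set(required_caps) <= set(p_caps):
        return None

    # The minor version may be lowered, never raised.
    minor = min(p_minor, s_minor)

    # The reply only lists capabilities that the client proposed.
    caps = set(p_caps) & set(s_caps)
    return (p_major, minor, caps)
```

For example, a client proposing 1.3 to a server that supports up to 1.2
receives a 1.2 reply, while a proposal with a different major version yields
``None``.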

The protocol major version will only change when incompatible protocol changes
are made, such as changing the message format. The minor version may change
when compatible changes are made, such as adding new messages or capabilities.
Both the client and server must support all minor versions less than the
maximum minor version they support. E.g., an implementation that supports
version 1.3 must also support 1.0 through 1.2.

When making a change to this specification, the protocol version number must
be included in the form "added in version X.Y".

Request
^^^^^^^

============== ====== ====
Name           Offset Size
============== ====== ====
version major  0      2
version minor  2      2
version data   4      variable (including terminating NUL). Optional.
============== ====== ====

The version data is an optional UTF-8 encoded JSON byte array with the following
format:

+--------------+--------+-----------------------------------+
| Name         | Type   | Description                       |
+==============+========+===================================+
| capabilities | object | Contains common capabilities that |
|              |        | the sender supports. Optional.    |
+--------------+--------+-----------------------------------+

Capabilities:

+--------------------+---------+------------------------------------------------+
| Name               | Type    | Description                                    |
+====================+=========+================================================+
| max_msg_fds        | number  | Maximum number of file descriptors that can be |
|                    |         | received by the sender in one message.         |
|                    |         | Optional. If not specified then the receiver   |
|                    |         | must assume a value of ``1``.                  |
+--------------------+---------+------------------------------------------------+
| max_data_xfer_size | number  | Maximum ``count`` for data transfer messages;  |
|                    |         | see `Read and Write Operations`_. Optional,    |
|                    |         | with a default value of 1048576 bytes.         |
+--------------------+---------+------------------------------------------------+
| pgsizes            | number  | Page sizes supported in DMA map operations     |
|                    |         | or'ed together. Optional, with a default value |
|                    |         | of supporting only 4k pages.                   |
+--------------------+---------+------------------------------------------------+
| max_dma_maps       | number  | Maximum number of DMA map windows that can be  |
|                    |         | valid simultaneously. Optional, with a default |
|                    |         | value of 65535 (64k-1).                        |
+--------------------+---------+------------------------------------------------+
| migration          | object  | Migration capability parameters. If missing    |
|                    |         | then migration is not supported by the sender. |
+--------------------+---------+------------------------------------------------+
| write_multiple     | boolean | ``VFIO_USER_REGION_WRITE_MULTI`` messages      |
|                    |         | are supported if the value is ``true``.        |
+--------------------+---------+------------------------------------------------+

The migration capability contains the following name/value pairs:

+-----------------+--------+--------------------------------------------------+
| Name            | Type   | Description                                      |
+=================+========+==================================================+
| pgsize          | number | Page size of dirty pages bitmap. The smaller of  |
|                 |        | the client's and server's values is used.        |
+-----------------+--------+--------------------------------------------------+
| max_bitmap_size | number | Maximum bitmap size in ``VFIO_USER_DIRTY_PAGES`` |
|                 |        | and ``VFIO_USER_DMA_UNMAP`` messages. Optional,  |
|                 |        | with a default value of 256MB.                   |
+-----------------+--------+--------------------------------------------------+

Reply
^^^^^

The same message format is used in the server's reply with the semantics
described above.

``VFIO_USER_DMA_MAP``
---------------------

This command message is sent by the client to the server to inform it of the
memory regions the server can access. It must be sent before the server can
perform any DMA to the client. It is normally sent directly after the version
handshake is completed, but may also occur when memory is added to the client,
or if the client uses a vIOMMU.

Request
^^^^^^^

The request payload for this message is a structure of the following format:

+-------------+--------+-------------+
| Name        | Offset | Size        |
+=============+========+=============+
| argsz       | 0      | 4           |
+-------------+--------+-------------+
| flags       | 4      | 4           |
+-------------+--------+-------------+
|             | +-----+------------+ |
|             | | Bit | Definition | |
|             | +=====+============+ |
|             | | 0   | readable   | |
|             | +-----+------------+ |
|             | | 1   | writeable  | |
|             | +-----+------------+ |
+-------------+--------+-------------+
| offset      | 8      | 8           |
+-------------+--------+-------------+
| address     | 16     | 8           |
+-------------+--------+-------------+
| size        | 24     | 8           |
+-------------+--------+-------------+

* *argsz* is the size of the above structure. Note there is no reply payload,
  so this field differs from other message types.
* *flags* contains the following region attributes:

  * *readable* indicates that the region can be read from.

  * *writeable* indicates that the region can be written to.

* *offset* is the file offset of the region with respect to the associated file
  descriptor, or zero if the region is not mappable.
* *address* is the base DMA address of the region.
* *size* is the size of the region.

This structure is 32 bytes in size, so the message size is 16 + 32 = 48 bytes.

If the DMA region being added can be directly mapped by the server, a file
descriptor must be sent as part of the message meta-data. The region can be
mapped via the mmap() system call. On ``AF_UNIX`` sockets, the file descriptor
must be passed as ``SCM_RIGHTS`` type ancillary data. Otherwise, if the DMA
region cannot be directly mapped by the server, a file descriptor must not be
sent as part of the message meta-data, and the DMA region can be accessed by
the server using ``VFIO_USER_DMA_READ`` and ``VFIO_USER_DMA_WRITE`` messages,
explained in `Read and Write Operations`_. A command to map over an existing
region must be failed by the server with ``EEXIST`` set in the error field in
the reply.

Reply
^^^^^

There is no payload in the reply message.
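
As an illustration of the request layout, the following Python sketch packs
the 32-byte payload and shows one way a server might detect a mapping that
collides with an existing one. Little-endian byte order, the helper names, and
the reading of "map over an existing region" as any address-range overlap are
assumptions made for the example.

.. code-block:: python

   import struct

   HEADER_SIZE = 16                   # standard header size used above
   DMA_MAP = struct.Struct("<IIQQQ")  # argsz, flags, offset, address, size
   DMA_MAP_READABLE = 1 << 0          # flags bit 0
   DMA_MAP_WRITEABLE = 1 << 1         # flags bit 1


   def pack_dma_map(address, size, offset=0, readable=True, writeable=True):
       """Build the 32-byte VFIO_USER_DMA_MAP request payload."""
       flags = ((DMA_MAP_READABLE if readable else 0)
                | (DMA_MAP_WRITEABLE if writeable else 0))
       # argsz is the size of this structure, as there is no reply payload.
       return DMA_MAP.pack(DMA_MAP.size, flags, offset, address, size)


   def overlaps(regions, address, size):
       """True if [address, address + size) intersects a mapped region, in
       which case the server would fail the command with EEXIST."""
       return any(address < a + s and a < address + size for a, s in regions)

Here ``regions`` is a hypothetical list of ``(address, size)`` tuples that the
server keeps for the mappings it has already accepted.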

``VFIO_USER_DMA_UNMAP``
-----------------------

This command message is sent by the client to the server to inform it that a
DMA region, previously made available via a ``VFIO_USER_DMA_MAP`` command
message, is no longer available for DMA. It typically occurs when memory is
removed from the client or if the client uses a vIOMMU. The DMA region is
described by the request payload below.

Request
^^^^^^^

The request payload for this message is a structure of the following format:

+--------------+--------+------------------------+
| Name         | Offset | Size                   |
+==============+========+========================+
| argsz        | 0      | 4                      |
+--------------+--------+------------------------+
| flags        | 4      | 4                      |
+--------------+--------+------------------------+
| address      | 8      | 8                      |
+--------------+--------+------------------------+
| size         | 16     | 8                      |
+--------------+--------+------------------------+

* *argsz* is the maximum size of the reply payload.
* *flags* is unused in this version.
* *address* is the base DMA address of the DMA region.
* *size* is the size of the DMA region.

The address and size of the DMA region being unmapped must exactly match a
previous mapping.

Reply
^^^^^

Upon receiving a ``VFIO_USER_DMA_UNMAP`` command, if the file descriptor is
mapped then the server must release all references to that DMA region before
replying, which potentially includes in-flight DMA transactions.

The server responds with the original DMA entry in the request.


``VFIO_USER_DEVICE_GET_INFO``
-----------------------------

This command message is sent by the client to the server to query for basic
information about the device.

Request
^^^^^^^

+-------------+--------+--------------------------+
| Name        | Offset | Size                     |
+=============+========+==========================+
| argsz       | 0      | 4                        |
+-------------+--------+--------------------------+
| flags       | 4      | 4                        |
+-------------+--------+--------------------------+
|             | +-----+-------------------------+ |
|             | | Bit | Definition              | |
|             | +=====+=========================+ |
|             | | 0   | VFIO_DEVICE_FLAGS_RESET | |
|             | +-----+-------------------------+ |
|             | | 1   | VFIO_DEVICE_FLAGS_PCI   | |
|             | +-----+-------------------------+ |
+-------------+--------+--------------------------+
| num_regions | 8      | 4                        |
+-------------+--------+--------------------------+
| num_irqs    | 12     | 4                        |
+-------------+--------+--------------------------+

* *argsz* is the maximum size of the reply payload.
* All other fields must be zero.
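
A request following these two rules could be built as in the short Python
sketch below. Little-endian byte order and the function name are assumptions
made for illustration.

.. code-block:: python

   import struct

   DEVICE_INFO = struct.Struct("<IIII")  # argsz, flags, num_regions, num_irqs


   def pack_get_info_request(max_reply=DEVICE_INFO.size):
       """argsz carries the largest reply payload the client accepts;
       flags, num_regions and num_irqs must be zero in the request."""
       return DEVICE_INFO.pack(max_reply, 0, 0, 0)

A client that wants room for future extensions of the reply can pass a larger
``max_reply`` without changing the rest of the request.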

Reply
^^^^^

+-------------+--------+--------------------------+
| Name        | Offset | Size                     |
+=============+========+==========================+
| argsz       | 0      | 4                        |
+-------------+--------+--------------------------+
| flags       | 4      | 4                        |
+-------------+--------+--------------------------+
|             | +-----+-------------------------+ |
|             | | Bit | Definition              | |
|             | +=====+=========================+ |
|             | | 0   | VFIO_DEVICE_FLAGS_RESET | |
|             | +-----+-------------------------+ |
|             | | 1   | VFIO_DEVICE_FLAGS_PCI   | |
|             | +-----+-------------------------+ |
+-------------+--------+--------------------------+
| num_regions | 8      | 4                        |
+-------------+--------+--------------------------+
| num_irqs    | 12     | 4                        |
+-------------+--------+--------------------------+

* *argsz* is the size required for the full reply payload (16 bytes today).
* *flags* contains the following device attributes:

  * ``VFIO_DEVICE_FLAGS_RESET`` indicates that the device supports the
    ``VFIO_USER_DEVICE_RESET`` message.
  * ``VFIO_DEVICE_FLAGS_PCI`` indicates that the device is a PCI device.

* *num_regions* is the number of memory regions that the device exposes.
* *num_irqs* is the number of distinct interrupt types that the device supports.

This version of the protocol only supports PCI devices. Additional devices may
be supported in future versions.

``VFIO_USER_DEVICE_GET_REGION_INFO``
------------------------------------

This command message is sent by the client to the server to query for
information about device regions. The VFIO region info structure is defined in
``<linux/vfio.h>`` (``struct vfio_region_info``).

Request
^^^^^^^

+------------+--------+------------------------------+
| Name       | Offset | Size                         |
+============+========+==============================+
| argsz      | 0      | 4                            |
+------------+--------+------------------------------+
| flags      | 4      | 4                            |
+------------+--------+------------------------------+
| index      | 8      | 4                            |
+------------+--------+------------------------------+
| cap_offset | 12     | 4                            |
+------------+--------+------------------------------+
| size       | 16     | 8                            |
+------------+--------+------------------------------+
| offset     | 24     | 8                            |
+------------+--------+------------------------------+

* *argsz* is the maximum size of the reply payload.
* *index* is the index of the memory region being queried; it is the only field
  that is required to be set in the command message.
* All other fields must be zero.

Reply
^^^^^

+------------+--------+------------------------------+
| Name       | Offset | Size                         |
+============+========+==============================+
| argsz      | 0      | 4                            |
+------------+--------+------------------------------+
| flags      | 4      | 4                            |
+------------+--------+------------------------------+
|            | +-----+-----------------------------+ |
|            | | Bit | Definition                  | |
|            | +=====+=============================+ |
|            | | 0   | VFIO_REGION_INFO_FLAG_READ  | |
|            | +-----+-----------------------------+ |
|            | | 1   | VFIO_REGION_INFO_FLAG_WRITE | |
|            | +-----+-----------------------------+ |
|            | | 2   | VFIO_REGION_INFO_FLAG_MMAP  | |
|            | +-----+-----------------------------+ |
|            | | 3   | VFIO_REGION_INFO_FLAG_CAPS  | |
|            | +-----+-----------------------------+ |
+------------+--------+------------------------------+
| index      | 8      | 4                            |
+------------+--------+------------------------------+
| cap_offset | 12     | 4                            |
+------------+--------+------------------------------+
| size       | 16     | 8                            |
+------------+--------+------------------------------+
| offset     | 24     | 8                            |
+------------+--------+------------------------------+

* *argsz* is the size required for the full reply payload (region info structure
  plus the size of any region capabilities).
* *flags* are attributes of the region:

  * ``VFIO_REGION_INFO_FLAG_READ`` allows client read access to the region.
  * ``VFIO_REGION_INFO_FLAG_WRITE`` allows client write access to the region.
  * ``VFIO_REGION_INFO_FLAG_MMAP`` specifies the client can mmap() the region.
    When this flag is set, the reply will include a file descriptor in its
    meta-data. On ``AF_UNIX`` sockets, the file descriptors will be passed as
    ``SCM_RIGHTS`` type ancillary data.
  * ``VFIO_REGION_INFO_FLAG_CAPS`` indicates additional capabilities found in the
    reply.

* *index* is the index of the memory region being queried; it is the only field
  that is required to be set in the command message.
* *cap_offset* describes where additional region capabilities can be found.
  cap_offset is relative to the beginning of the VFIO region info structure.
  The data structure it points to is a VFIO cap header defined in
  ``<linux/vfio.h>``.
* *size* is the size of the region.
* *offset* is the offset that should be given to the mmap() system call for
  regions with the MMAP attribute. It is also used as the base offset when
  mapping a VFIO sparse mmap area, described below.

VFIO region capabilities
""""""""""""""""""""""""

The VFIO region information can also include a capabilities list. This list is
similar to a PCI capability list: each entry has a common header that
identifies a capability and where the next capability in the list can be found.
The VFIO capability header format is defined in ``<linux/vfio.h>`` (``struct
vfio_info_cap_header``).

VFIO cap header format
""""""""""""""""""""""

+---------+--------+------+
| Name    | Offset | Size |
+=========+========+======+
| id      | 0      | 2    |
+---------+--------+------+
| version | 2      | 2    |
+---------+--------+------+
| next    | 4      | 4    |
+---------+--------+------+

* *id* is the capability identity.
* *version* is a capability-specific version number.
* *next* specifies the offset of the next capability in the capability list. It
  is relative to the beginning of the VFIO region info structure.
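
A client could walk the capability list as in the following Python sketch. It
assumes little-endian byte order and that a *next* value of zero terminates
the list, following the convention used by ``<linux/vfio.h>``; the function
name is hypothetical.

.. code-block:: python

   import struct

   # argsz, flags, index, cap_offset, size, offset
   REGION_INFO = struct.Struct("<IIIIQQ")
   CAP_HEADER = struct.Struct("<HHI")  # id, version, next
   FLAG_CAPS = 1 << 3                  # VFIO_REGION_INFO_FLAG_CAPS


   def iter_caps(reply):
       """Yield (id, version, offset) for each capability in a region info
       reply payload. All offsets are relative to the start of ``reply``."""
       fields = REGION_INFO.unpack_from(reply)
       argsz, flags, cap_offset = fields[0], fields[1], fields[3]
       if not flags & FLAG_CAPS:
           return
       off, seen = cap_offset, set()
       while off:  # assumption: next == 0 ends the list
           if off in seen or off + CAP_HEADER.size > min(argsz, len(reply)):
               raise ValueError("malformed capability list")
           seen.add(off)
           cap_id, version, nxt = CAP_HEADER.unpack_from(reply, off)
           yield cap_id, version, off
           off = nxt

The ``seen`` set and the bounds check guard against a list that loops or
points outside the reply, which a client should not trust blindly.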

VFIO sparse mmap cap header
"""""""""""""""""""""""""""

+------------------+----------------------------------+
| Name             | Value                            |
+==================+==================================+
| id               | VFIO_REGION_INFO_CAP_SPARSE_MMAP |
+------------------+----------------------------------+
| version          | 0x1                              |
+------------------+----------------------------------+
| next             | <next>                           |
+------------------+----------------------------------+
| sparse mmap info | VFIO region info sparse mmap     |
+------------------+----------------------------------+

This capability is defined when only a subrange of the region supports
direct access by the client via mmap(). The VFIO sparse mmap area is defined in
``<linux/vfio.h>`` (``struct vfio_region_sparse_mmap_area`` and ``struct
vfio_region_info_cap_sparse_mmap``).

VFIO region info cap sparse mmap
""""""""""""""""""""""""""""""""

+----------+--------+------+
| Name     | Offset | Size |
+==========+========+======+
| nr_areas | 0      | 4    |
+----------+--------+------+
| reserved | 4      | 4    |
+----------+--------+------+
| offset   | 8      | 8    |
+----------+--------+------+
| size     | 16     | 8    |
+----------+--------+------+
| ...      |        |      |
+----------+--------+------+

* *nr_areas* is the number of sparse mmap areas in the region.
* *offset* and *size* describe a single area that can be mapped by the client.
  There will be *nr_areas* pairs of offset and size. The offset will be added to
  the base offset given in the ``VFIO_USER_DEVICE_GET_REGION_INFO`` reply to
  form the offset argument of the subsequent mmap() call.

The VFIO sparse mmap area is defined in ``<linux/vfio.h>`` (``struct
vfio_region_info_cap_sparse_mmap``).


``VFIO_USER_DEVICE_GET_REGION_IO_FDS``
--------------------------------------

Clients can access regions via ``VFIO_USER_REGION_READ`` and
``VFIO_USER_REGION_WRITE`` messages or, where available, by ``mmap()`` of a
file descriptor provided by the server.

``VFIO_USER_DEVICE_GET_REGION_IO_FDS`` provides an alternative access mechanism via
file descriptors. This is an optional feature intended for performance
improvements where an underlying sub-system (such as KVM) supports communication
across such file descriptors to the vfio-user server, without needing to
round-trip through the client.

The server returns an array of sub-regions for the requested region. Each
sub-region describes a span (offset and size) of a region, along with the
requested file descriptor notification mechanism to use. Each sub-region in the
response message may choose to use a different method, as defined below. The
two mechanisms supported in this specification are ioeventfds and ioregionfds.

The server additionally returns one or more file descriptors in the ancillary
data; clients are expected to configure each sub-region's file descriptor with
the requested notification method. For example, a client could configure KVM
with the requested ioeventfd via a ``KVM_IOEVENTFD`` ``ioctl()``.

Request
^^^^^^^

+-------------+--------+------+
| Name        | Offset | Size |
+=============+========+======+
| argsz       | 0      | 4    |
+-------------+--------+------+
| flags       | 4      | 4    |
+-------------+--------+------+
| index       | 8      | 4    |
+-------------+--------+------+
| count       | 12     | 4    |
+-------------+--------+------+

* *argsz* is the maximum size of the reply payload.
* *index* is the index of the memory region being queried.
* All other fields must be zero.

The client must set ``flags`` to zero and specify the region being queried in
``index``.

Reply
^^^^^

+-------------+--------+------+
| Name        | Offset | Size |
+=============+========+======+
| argsz       | 0      | 4    |
+-------------+--------+------+
| flags       | 4      | 4    |
+-------------+--------+------+
| index       | 8      | 4    |
+-------------+--------+------+
| count       | 12     | 4    |
+-------------+--------+------+
| sub-regions | 16     | ...  |
+-------------+--------+------+

* *argsz* is the size of the region IO FD info structure plus the
  total size of the sub-region array. Thus, each array entry "i" is at offset
  i * ((argsz - 16) / count) from the start of the sub-region array. Note that
  currently each entry is 40 bytes for both IO FD types, but this is not to be
  relied on. As elsewhere, this indicates the full reply payload size needed.
* *flags* must be zero.
* *index* is the index of the memory region being queried.
* *count* is the number of sub-regions in the array.
* *sub-regions* is the array of Sub-Region IO FD info structures.

The reply message will additionally include at least one file descriptor in the
ancillary data. Note that more than one sub-region may share the same file
descriptor.

Note that it is the client's responsibility to verify the requested values (for
example, that the requested offset does not exceed the region's bounds).
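
The entry stride can be derived from *argsz* and *count* rather than
hard-coded, as in the Python sketch below. It relies on the reply table above,
in which the fixed part is 16 bytes and the sub-region array starts at offset
16; little-endian byte order and the function name are assumptions made for
illustration.

.. code-block:: python

   import struct

   IO_FDS_REPLY = struct.Struct("<IIII")  # argsz, flags, index, count


   def split_sub_regions(reply):
       """Return the raw bytes of each sub-region entry in a reply payload,
       without assuming that entries are 40 bytes long."""
       argsz, _flags, _index, count = IO_FDS_REPLY.unpack_from(reply)
       if count == 0:
           return []
       array_size = argsz - IO_FDS_REPLY.size
       if argsz > len(reply) or array_size <= 0 or array_size % count:
           raise ValueError("malformed sub-region array")
       stride = array_size // count  # size of one array entry
       start = IO_FDS_REPLY.size
       return [reply[start + i * stride:start + (i + 1) * stride]
               for i in range(count)]

Each returned slice can then be decoded according to its *type* field, which
keeps the parser working if a later protocol version grows the entries.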

Each sub-region given in the response has one of two possible structures,
depending on whether *type* is ``VFIO_USER_IO_FD_TYPE_IOEVENTFD`` or
``VFIO_USER_IO_FD_TYPE_IOREGIONFD``:

Sub-Region IO FD info format (ioeventfd)
""""""""""""""""""""""""""""""""""""""""

+-----------+--------+------+
| Name      | Offset | Size |
+===========+========+======+
| offset    | 0      | 8    |
+-----------+--------+------+
| size      | 8      | 8    |
+-----------+--------+------+
| fd_index  | 16     | 4    |
+-----------+--------+------+
| type      | 20     | 4    |
+-----------+--------+------+
| flags     | 24     | 4    |
+-----------+--------+------+
| padding   | 28     | 4    |
+-----------+--------+------+
| datamatch | 32     | 8    |
+-----------+--------+------+

* *offset* is the offset of the start of the sub-region within the region
  requested ("physical address offset" for the region)
* *size* is the length of the sub-region. This may be zero if the access size is
  not relevant, which may allow for optimizations
* *fd_index* is the index in the ancillary data of the FD to use for ioeventfd
  notification; it may be shared.
* *type* is ``VFIO_USER_IO_FD_TYPE_IOEVENTFD``
* *flags* is any of:

  * ``KVM_IOEVENTFD_FLAG_DATAMATCH``
  * ``KVM_IOEVENTFD_FLAG_PIO``
  * ``KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY`` (FIXME: makes sense?)

* *datamatch* is the datamatch value if needed

See https://www.kernel.org/doc/Documentation/virtual/kvm/api.txt, *4.59
KVM_IOEVENTFD* for further context on the ioeventfd-specific fields.

Sub-Region IO FD info format (ioregionfd)
"""""""""""""""""""""""""""""""""""""""""

+-----------+--------+------+
| Name      | Offset | Size |
+===========+========+======+
| offset    | 0      | 8    |
+-----------+--------+------+
| size      | 8      | 8    |
+-----------+--------+------+
| fd_index  | 16     | 4    |
+-----------+--------+------+
| type      | 20     | 4    |
+-----------+--------+------+
| flags     | 24     | 4    |
+-----------+--------+------+
| padding   | 28     | 4    |
+-----------+--------+------+
| user_data | 32     | 8    |
+-----------+--------+------+

* *offset* is the offset of the start of the sub-region within the region
  requested ("physical address offset" for the region)
* *size* is the length of the sub-region. This may be zero if the access size is
  not relevant, which may allow for optimizations; ``KVM_IOREGION_POSTED_WRITES``
  must be set in *flags* in this case
* *fd_index* is the index in the ancillary data of the FD to use for ioregionfd
  messages; it may be shared
* *type* is ``VFIO_USER_IO_FD_TYPE_IOREGIONFD``
* *flags* is any of:

  * ``KVM_IOREGION_PIO``
  * ``KVM_IOREGION_POSTED_WRITES``

* *user_data* is an opaque value passed back to the server via a message on the
  file descriptor

For further information on the ioregionfd-specific fields, see:
https://lore.kernel.org/kvm/cover.1613828726.git.eafanasova@gmail.com/

(FIXME: update with final API docs.)

``VFIO_USER_DEVICE_GET_IRQ_INFO``
---------------------------------

This command message is sent by the client to the server to query for
information about device interrupt types. The VFIO IRQ info structure is
defined in ``<linux/vfio.h>`` (``struct vfio_irq_info``).
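
The four 32-bit fields of ``struct vfio_irq_info`` can be packed and decoded
along the following lines. This is only a sketch: little-endian encoding is
assumed, the flag and index values are copied from ``<linux/vfio.h>``, and the
helper names ``build_irq_info_request`` and ``parse_irq_info_reply`` are
hypothetical.

.. code-block:: python

   import struct

   # Values as defined in <linux/vfio.h>.
   VFIO_IRQ_INFO_EVENTFD = 1 << 0
   VFIO_IRQ_INFO_MASKABLE = 1 << 1
   VFIO_IRQ_INFO_AUTOMASKED = 1 << 2
   VFIO_IRQ_INFO_NORESIZE = 1 << 3
   VFIO_PCI_MSIX_IRQ_INDEX = 2

   FLAG_NAMES = (
       (VFIO_IRQ_INFO_EVENTFD, "EVENTFD"),
       (VFIO_IRQ_INFO_MASKABLE, "MASKABLE"),
       (VFIO_IRQ_INFO_AUTOMASKED, "AUTOMASKED"),
       (VFIO_IRQ_INFO_NORESIZE, "NORESIZE"),
   )

   # Assumption: little-endian encoding of argsz, flags, index, count.
   IRQ_INFO = struct.Struct("<IIII")


   def build_irq_info_request(index):
       """Only argsz and index are set; flags and count must be zero."""
       return IRQ_INFO.pack(IRQ_INFO.size, 0, index, 0)


   def parse_irq_info_reply(payload):
       """Decode a reply into its index, count and the names of set flags."""
       argsz, flags, index, count = IRQ_INFO.unpack_from(payload, 0)
       names = [name for bit, name in FLAG_NAMES if flags & bit]
       return {"index": index, "count": count, "flags": names}

A client would typically call ``build_irq_info_request(VFIO_PCI_MSIX_IRQ_INDEX)``
and inspect the decoded flags before deciding which ``VFIO_USER_DEVICE_SET_IRQS``
combinations to use.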

Request
^^^^^^^

+-------+--------+---------------------------+
| Name  | Offset | Size                      |
+=======+========+===========================+
| argsz | 0      | 4                         |
+-------+--------+---------------------------+
| flags | 4      | 4                         |
+-------+--------+---------------------------+
|       | +-----+--------------------------+ |
|       | | Bit | Definition               | |
|       | +=====+==========================+ |
|       | | 0   | VFIO_IRQ_INFO_EVENTFD    | |
|       | +-----+--------------------------+ |
|       | | 1   | VFIO_IRQ_INFO_MASKABLE   | |
|       | +-----+--------------------------+ |
|       | | 2   | VFIO_IRQ_INFO_AUTOMASKED | |
|       | +-----+--------------------------+ |
|       | | 3   | VFIO_IRQ_INFO_NORESIZE   | |
|       | +-----+--------------------------+ |
+-------+--------+---------------------------+
| index | 8      | 4                         |
+-------+--------+---------------------------+
| count | 12     | 4                         |
+-------+--------+---------------------------+

* *argsz* is the maximum size of the reply payload (16 bytes today)
* *index* is the index of the IRQ type being queried (e.g.
  ``VFIO_PCI_MSIX_IRQ_INDEX``)
* all other fields must be zero

Reply
^^^^^

+-------+--------+---------------------------+
| Name  | Offset | Size                      |
+=======+========+===========================+
| argsz | 0      | 4                         |
+-------+--------+---------------------------+
| flags | 4      | 4                         |
+-------+--------+---------------------------+
|       | +-----+--------------------------+ |
|       | | Bit | Definition               | |
|       | +=====+==========================+ |
|       | | 0   | VFIO_IRQ_INFO_EVENTFD    | |
|       | +-----+--------------------------+ |
|       | | 1   | VFIO_IRQ_INFO_MASKABLE   | |
|       | +-----+--------------------------+ |
|       | | 2   | VFIO_IRQ_INFO_AUTOMASKED | |
|       | +-----+--------------------------+ |
|       | | 3   | VFIO_IRQ_INFO_NORESIZE   | |
|       | +-----+--------------------------+ |
+-------+--------+---------------------------+
| index | 8      | 4                         |
+-------+--------+---------------------------+
| count | 12     | 4                         |
+-------+--------+---------------------------+

* *argsz* is the size required for the full reply payload (16 bytes today)
* *flags* defines IRQ attributes:

  * ``VFIO_IRQ_INFO_EVENTFD`` indicates the IRQ type can support server eventfd
    signalling.
  * ``VFIO_IRQ_INFO_MASKABLE`` indicates that the IRQ type supports the ``MASK``
    and ``UNMASK`` actions in a ``VFIO_USER_DEVICE_SET_IRQS`` message.
  * ``VFIO_IRQ_INFO_AUTOMASKED`` indicates the IRQ type masks itself after being
    triggered, and the client must send an ``UNMASK`` action to receive new
    interrupts.
  * ``VFIO_IRQ_INFO_NORESIZE`` indicates that ``VFIO_USER_DEVICE_SET_IRQS``
    operations set up interrupts as a set, and new sub-indexes cannot be
    enabled without disabling the entire type.

* *index* is the index of the IRQ type being queried
* *count* describes the number of interrupts of the queried type.

``VFIO_USER_DEVICE_SET_IRQS``
-----------------------------

This command message is sent by the client to the server to set actions for
device interrupt types. The VFIO IRQ set structure is defined in
``<linux/vfio.h>`` (``struct vfio_irq_set``).

Request
^^^^^^^

+-------+--------+------------------------------+
| Name  | Offset | Size                         |
+=======+========+==============================+
| argsz | 0      | 4                            |
+-------+--------+------------------------------+
| flags | 4      | 4                            |
+-------+--------+------------------------------+
|       | +-----+-----------------------------+ |
|       | | Bit | Definition                  | |
|       | +=====+=============================+ |
|       | | 0   | VFIO_IRQ_SET_DATA_NONE      | |
|       | +-----+-----------------------------+ |
|       | | 1   | VFIO_IRQ_SET_DATA_BOOL      | |
|       | +-----+-----------------------------+ |
|       | | 2   | VFIO_IRQ_SET_DATA_EVENTFD   | |
|       | +-----+-----------------------------+ |
|       | | 3   | VFIO_IRQ_SET_ACTION_MASK    | |
|       | +-----+-----------------------------+ |
|       | | 4   | VFIO_IRQ_SET_ACTION_UNMASK  | |
|       | +-----+-----------------------------+ |
|       | | 5   | VFIO_IRQ_SET_ACTION_TRIGGER | |
|       | +-----+-----------------------------+ |
+-------+--------+------------------------------+
| index | 8      | 4                            |
+-------+--------+------------------------------+
| start | 12     | 4                            |
+-------+--------+------------------------------+
| count | 16     | 4                            |
+-------+--------+------------------------------+
| data  | 20     | variable                     |
+-------+--------+------------------------------+

* *argsz* is the size of the VFIO IRQ set request payload, including any *data*
  field. Note there is no reply payload, so this field differs from other
  message types.
* *flags* defines the action performed on the interrupt range. The ``DATA``
  flags describe the data field sent in the message; the ``ACTION`` flags
  describe the action to be performed. The flags within each set are mutually
  exclusive.

  * ``VFIO_IRQ_SET_DATA_NONE`` indicates there is no data field in the command.
    The action is performed unconditionally.
  * ``VFIO_IRQ_SET_DATA_BOOL`` indicates the data field is an array of boolean
    bytes. The action is performed if the corresponding boolean is true.
  * ``VFIO_IRQ_SET_DATA_EVENTFD`` indicates an array of event file descriptors
    was sent in the message meta-data. These descriptors will be signalled when
    the action defined by the action flags occurs. In ``AF_UNIX`` sockets, the
    descriptors are sent as ``SCM_RIGHTS`` type ancillary data.
    If no file descriptors are provided, this de-assigns the specified
    previously configured interrupts.
  * ``VFIO_IRQ_SET_ACTION_MASK`` indicates a masking event. It can be used with
    ``VFIO_IRQ_SET_DATA_BOOL`` or ``VFIO_IRQ_SET_DATA_NONE`` to mask an interrupt,
    or with ``VFIO_IRQ_SET_DATA_EVENTFD`` to generate an event when the guest masks
    the interrupt.
  * ``VFIO_IRQ_SET_ACTION_UNMASK`` indicates an unmasking event. It can be used
    with ``VFIO_IRQ_SET_DATA_BOOL`` or ``VFIO_IRQ_SET_DATA_NONE`` to unmask an
    interrupt, or with ``VFIO_IRQ_SET_DATA_EVENTFD`` to generate an event when the
    guest unmasks the interrupt.
  * ``VFIO_IRQ_SET_ACTION_TRIGGER`` indicates a triggering event. It can be used
    with ``VFIO_IRQ_SET_DATA_BOOL`` or ``VFIO_IRQ_SET_DATA_NONE`` to trigger an
    interrupt, or with ``VFIO_IRQ_SET_DATA_EVENTFD`` to generate an event when the
    server triggers the interrupt.

* *index* is the index of the IRQ type being set up.
* *start* is the start of the sub-index being set.
* *count* describes the number of sub-indexes being set. As a special case, a
  count (and start) of 0, with data flags of ``VFIO_IRQ_SET_DATA_NONE``, disables
  all interrupts of the index.
* *data* is an optional field included when the
  ``VFIO_IRQ_SET_DATA_BOOL`` flag is present. It contains an array of booleans
  that specify whether the action is to be performed on the corresponding
  index. It is used when the action is only performed on a subset of the range
  specified.
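
The request layout and the flag rules above can be sketched in Python as
follows. The flag values are those defined in ``<linux/vfio.h>``; little-endian
encoding is an assumption, and ``build_set_irqs`` is a hypothetical helper
rather than an existing API. Event file descriptors are not part of the payload,
so the sketch leaves sending them as ancillary data to the caller.

.. code-block:: python

   import struct

   # Flag values as defined in <linux/vfio.h>.
   VFIO_IRQ_SET_DATA_NONE = 1 << 0
   VFIO_IRQ_SET_DATA_BOOL = 1 << 1
   VFIO_IRQ_SET_DATA_EVENTFD = 1 << 2
   VFIO_IRQ_SET_ACTION_MASK = 1 << 3
   VFIO_IRQ_SET_ACTION_UNMASK = 1 << 4
   VFIO_IRQ_SET_ACTION_TRIGGER = 1 << 5

   DATA_FLAGS = 0x07    # the three DATA bits
   ACTION_FLAGS = 0x38  # the three ACTION bits

   # Assumption: little-endian encoding of argsz, flags, index, start, count.
   IRQ_SET_HDR = struct.Struct("<IIIII")  # 20 bytes; the data field follows


   def _exactly_one_bit(value):
       return value != 0 and (value & (value - 1)) == 0


   def build_set_irqs(flags, index, start, count, bools=None):
       """Pack a request payload; eventfds travel separately as ancillary data."""
       if flags & ~(DATA_FLAGS | ACTION_FLAGS):
           raise ValueError("unknown flag bits")
       if not _exactly_one_bit(flags & DATA_FLAGS):
           raise ValueError("exactly one DATA flag is required")
       if not _exactly_one_bit(flags & ACTION_FLAGS):
           raise ValueError("exactly one ACTION flag is required")
       data = b""
       if flags & VFIO_IRQ_SET_DATA_BOOL:
           if bools is None or len(bools) != count:
               raise ValueError("DATA_BOOL needs one boolean per sub-index")
           data = bytes(1 if b else 0 for b in bools)
       elif bools is not None:
           raise ValueError("the data field is only sent with DATA_BOOL")
       # argsz covers the request payload itself, including the data field.
       argsz = IRQ_SET_HDR.size + len(data)
       return IRQ_SET_HDR.pack(argsz, flags, index, start, count) + data

For example, ``build_set_irqs(VFIO_IRQ_SET_DATA_NONE |
VFIO_IRQ_SET_ACTION_TRIGGER, index, 0, 0)`` produces the special-case payload
that disables all interrupts of ``index``.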

Not all interrupt types support every combination of data and action flags.
The client must know the capabilities of the device and IRQ index before it
sends a ``VFIO_USER_DEVICE_SET_IRQS`` message.

In typical operation, a specific IRQ may operate as follows:

1. The client sends a ``VFIO_USER_DEVICE_SET_IRQS`` message with
   ``flags=(VFIO_IRQ_SET_DATA_EVENTFD|VFIO_IRQ_SET_ACTION_TRIGGER)`` along
   with an eventfd. This associates the IRQ with a particular eventfd on the
   server side.

#. The client may send a ``VFIO_USER_DEVICE_SET_IRQS`` message with
   ``flags=(VFIO_IRQ_SET_DATA_EVENTFD|VFIO_IRQ_SET_ACTION_MASK/UNMASK)`` along
   with another eventfd. This associates the given eventfd with the
   mask/unmask state on the server side.

#. The server may trigger the IRQ by writing 1 to the eventfd.

#. The server may mask or unmask an IRQ, which writes 1 to the corresponding
   mask/unmask eventfd, if there is one.

#. A client may trigger a device IRQ itself, by sending a
   ``VFIO_USER_DEVICE_SET_IRQS`` message with
   ``flags=(VFIO_IRQ_SET_DATA_NONE/BOOL|VFIO_IRQ_SET_ACTION_TRIGGER)``.

#. A client may mask or unmask the IRQ, by sending a
   ``VFIO_USER_DEVICE_SET_IRQS`` message with
   ``flags=(VFIO_IRQ_SET_DATA_NONE/BOOL|VFIO_IRQ_SET_ACTION_MASK/UNMASK)``.

Reply
^^^^^

There is no payload in the reply.

.. _Read and Write Operations:

Note that all of these operations must be supported by the client and/or server,
even if the corresponding memory or device region has been shared as mappable.

The ``count`` field must not exceed the value of ``max_data_xfer_size`` of the
peer, for both reads and writes.

``VFIO_USER_REGION_READ``
-------------------------

If a device region is not mappable, it's not directly accessible by the client
via ``mmap()`` of the underlying file descriptor. In this case, a client can
read from a device region with this message.

Request
^^^^^^^

+--------+--------+----------+
| Name   | Offset | Size     |
+========+========+==========+
| offset | 0      | 8        |
+--------+--------+----------+
| region | 8      | 4        |
+--------+--------+----------+
| count  | 12     | 4        |
+--------+--------+----------+

* *offset* is the offset into the region being accessed.
* *region* is the index of the region being accessed.
* *count* is the size of the data to be transferred.

Reply
^^^^^

+--------+--------+----------+
| Name   | Offset | Size     |
+========+========+==========+
| offset | 0      | 8        |
+--------+--------+----------+
| region | 8      | 4        |
+--------+--------+----------+
| count  | 12     | 4        |
+--------+--------+----------+
| data   | 16     | variable |
+--------+--------+----------+

* *offset* is the offset into the region accessed.
* *region* is the index of the region accessed.
* *count* is the size of the data transferred.
* *data* is the data that was read from the device region.
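
A client-side read could be packed and unpacked roughly as shown below. The
sketch assumes little-endian encoding; ``build_region_read`` and
``parse_region_read_reply`` are hypothetical names, and the caller is expected
to pass in the ``max_data_xfer_size`` value it learned from the peer.

.. code-block:: python

   import struct

   # Assumption: little-endian encoding of offset (8), region (4), count (4).
   REGION_RW = struct.Struct("<QII")


   def build_region_read(offset, region, count, max_data_xfer_size):
       """Pack a read request, enforcing the peer's transfer limit."""
       if count > max_data_xfer_size:
           raise ValueError("count exceeds the peer's max_data_xfer_size")
       return REGION_RW.pack(offset, region, count)


   def parse_region_read_reply(payload):
       """Return (offset, region, data) from a read reply payload."""
       offset, region, count = REGION_RW.unpack_from(payload, 0)
       data = payload[REGION_RW.size:REGION_RW.size + count]
       if len(data) != count:
           raise ValueError("reply is shorter than its count field")
       return offset, region, data

Checking ``count`` against the actual payload length guards against a truncated
or malformed reply before the data is handed to the guest.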

``VFIO_USER_REGION_WRITE``
--------------------------

If a device region is not mappable, it's not directly accessible by the client
via ``mmap()`` of the underlying file descriptor. In this case, a client can
write to a device region with this message.

Request
^^^^^^^

+--------+--------+----------+
| Name   | Offset | Size     |
+========+========+==========+
| offset | 0      | 8        |
+--------+--------+----------+
| region | 8      | 4        |
+--------+--------+----------+
| count  | 12     | 4        |
+--------+--------+----------+
| data   | 16     | variable |
+--------+--------+----------+

* *offset* is the offset into the region being accessed.
* *region* is the index of the region being accessed.
* *count* is the size of the data to be transferred.
* *data* is the data to write.

Reply
^^^^^

+--------+--------+----------+
| Name   | Offset | Size     |
+========+========+==========+
| offset | 0      | 8        |
+--------+--------+----------+
| region | 8      | 4        |
+--------+--------+----------+
| count  | 12     | 4        |
+--------+--------+----------+

* *offset* is the offset into the region accessed.
* *region* is the index of the region accessed.
* *count* is the size of the data transferred.

``VFIO_USER_DMA_READ``
----------------------

If the client has not shared mappable memory, the server can use this message to
read from guest memory.

Request
^^^^^^^

+---------+--------+----------+
| Name    | Offset | Size     |
+=========+========+==========+
| address | 0      | 8        |
+---------+--------+----------+
| count   | 8      | 8        |
+---------+--------+----------+

* *address* is the client DMA memory address being accessed. This address must have
  been previously exported to the server with a ``VFIO_USER_DMA_MAP`` message.
* *count* is the size of the data to be transferred.

Reply
^^^^^

+---------+--------+----------+
| Name    | Offset | Size     |
+=========+========+==========+
| address | 0      | 8        |
+---------+--------+----------+
| count   | 8      | 8        |
+---------+--------+----------+
| data    | 16     | variable |
+---------+--------+----------+

* *address* is the client DMA memory address being accessed.
* *count* is the size of the data transferred.
* *data* is the data read.

``VFIO_USER_DMA_WRITE``
-----------------------

If the client has not shared mappable memory, the server can use this message to
write to guest memory.

Request
^^^^^^^

+---------+--------+----------+
| Name    | Offset | Size     |
+=========+========+==========+
| address | 0      | 8        |
+---------+--------+----------+
| count   | 8      | 8        |
+---------+--------+----------+
| data    | 16     | variable |
+---------+--------+----------+

* *address* is the client DMA memory address being accessed. This address must have
  been previously exported to the server with a ``VFIO_USER_DMA_MAP`` message.
* *count* is the size of the data to be transferred.
* *data* is the data to write

Reply
^^^^^

+---------+--------+----------+
| Name    | Offset | Size     |
+=========+========+==========+
| address | 0      | 8        |
+---------+--------+----------+
| count   | 8      | 4        |
+---------+--------+----------+

* *address* is the client DMA memory address being accessed.
* *count* is the size of the data transferred.

``VFIO_USER_DEVICE_RESET``
--------------------------

This command message is sent from the client to the server to reset the device.
Neither the request nor the reply has a payload.

``VFIO_USER_REGION_WRITE_MULTI``
--------------------------------

This message can be used to coalesce multiple device write operations
into a single message. It is only used as an optimization when the
outgoing message queue is relatively full.

Request
^^^^^^^

+---------+--------+----------+
| Name    | Offset | Size     |
+=========+========+==========+
| wr_cnt  | 0      | 8        |
+---------+--------+----------+
| wrs     | 8      | variable |
+---------+--------+----------+

* *wr_cnt* is the number of device writes coalesced in the message
* *wrs* is an array of device writes defined below

Single Device Write Format
""""""""""""""""""""""""""

+--------+--------+----------+
| Name   | Offset | Size     |
+========+========+==========+
| offset | 0      | 8        |
+--------+--------+----------+
| region | 8      | 4        |
+--------+--------+----------+
| count  | 12     | 4        |
+--------+--------+----------+
| data   | 16     | 8        |
+--------+--------+----------+

* *offset* is the offset into the region being accessed.
* *region* is the index of the region being accessed.
* *count* is the size of the data to be transferred. This format can
  only describe writes of 8 bytes or less.
* *data* is the data to write.

Reply
^^^^^

+---------+--------+----------+
| Name    | Offset | Size     |
+=========+========+==========+
| wr_cnt  | 0      | 8        |
+---------+--------+----------+

* *wr_cnt* is the number of device writes completed.


Appendices
==========

Unused VFIO ``ioctl()`` commands
--------------------------------

The following VFIO commands do not have an equivalent vfio-user command:

* ``VFIO_GET_API_VERSION``
* ``VFIO_CHECK_EXTENSION``
* ``VFIO_SET_IOMMU``
* ``VFIO_GROUP_GET_STATUS``
* ``VFIO_GROUP_SET_CONTAINER``
* ``VFIO_GROUP_UNSET_CONTAINER``
* ``VFIO_GROUP_GET_DEVICE_FD``
* ``VFIO_IOMMU_GET_INFO``

However, once support for live migration for VFIO devices is finalized some
1488*da198e8fSThanos Makatosof the above commands may have to be handled by the client in their 1489*da198e8fSThanos Makatoscorresponding vfio-user form. This will be addressed in a future protocol 1490*da198e8fSThanos Makatosversion. 1491*da198e8fSThanos Makatos 1492*da198e8fSThanos MakatosVFIO groups and containers 1493*da198e8fSThanos Makatos^^^^^^^^^^^^^^^^^^^^^^^^^^ 1494*da198e8fSThanos Makatos 1495*da198e8fSThanos MakatosThe current VFIO implementation includes group and container idioms that 1496*da198e8fSThanos Makatosdescribe how a device relates to the host IOMMU. In the vfio-user 1497*da198e8fSThanos Makatosimplementation, the IOMMU is implemented in SW by the client, and is not 1498*da198e8fSThanos Makatosvisible to the server. The simplest idea would be that the client put each 1499*da198e8fSThanos Makatosdevice into its own group and container. 1500*da198e8fSThanos Makatos 1501*da198e8fSThanos MakatosBackend Program Conventions 1502*da198e8fSThanos Makatos--------------------------- 1503*da198e8fSThanos Makatos 1504*da198e8fSThanos Makatosvfio-user backend program conventions are based on the vhost-user ones. 1505*da198e8fSThanos Makatos 1506*da198e8fSThanos Makatos* The backend program must not daemonize itself. 1507*da198e8fSThanos Makatos* No assumptions must be made as to what access the backend program has on the 1508*da198e8fSThanos Makatos system. 1509*da198e8fSThanos Makatos* File descriptors 0, 1 and 2 must exist, must have regular 1510*da198e8fSThanos Makatos stdin/stdout/stderr semantics, and can be redirected. 1511*da198e8fSThanos Makatos* The backend program must honor the SIGTERM signal. 
1512*da198e8fSThanos Makatos* The backend program must accept the following commands line options: 1513*da198e8fSThanos Makatos 1514*da198e8fSThanos Makatos * ``--socket-path=PATH``: path to UNIX domain socket, 1515*da198e8fSThanos Makatos * ``--fd=FDNUM``: file descriptor for UNIX domain socket, incompatible with 1516*da198e8fSThanos Makatos ``--socket-path`` 1517*da198e8fSThanos Makatos* The backend program must be accompanied with a JSON file stored under 1518*da198e8fSThanos Makatos ``/usr/share/vfio-user``. 1519*da198e8fSThanos Makatos 1520*da198e8fSThanos MakatosTODO add schema similar to docs/interop/vhost-user.json. 1521
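
As an illustration of the command-line convention above, the following is a
minimal sketch of how a backend program might parse ``--socket-path`` and
``--fd`` and reject their combined use. It is not part of the protocol. The
names ``struct backend_opts`` and ``parse_backend_opts`` are hypothetical,
``getopt_long()`` is assumed to be available (glibc, musl and the BSDs provide
it), and the requirement that exactly one of the two options is given is an
assumption of this sketch; the specification only states that they are
incompatible.

```c
#include <assert.h>
#include <errno.h>
#include <getopt.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical holder for the parsed transport options. */
struct backend_opts {
    const char *socket_path; /* value of --socket-path=PATH, or NULL */
    int fd;                  /* value of --fd=FDNUM, or -1 if not given */
};

/*
 * Parse --socket-path and --fd from argv.
 * Returns 0 on success and -1 on any error: unknown option, stray
 * argument, malformed descriptor number, both options given, or
 * (an assumption of this sketch) neither option given.
 */
static int parse_backend_opts(int argc, char **argv, struct backend_opts *out)
{
    static const struct option longopts[] = {
        { "socket-path", required_argument, NULL, 's' },
        { "fd",          required_argument, NULL, 'f' },
        { NULL, 0, NULL, 0 },
    };
    int c;

    assert(out != NULL);
    out->socket_path = NULL;
    out->fd = -1;
    optind = 1; /* start scanning at argv[1] */
    opterr = 0; /* do not let getopt print to stderr on our behalf */

    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        switch (c) {
        case 's':
            out->socket_path = optarg;
            break;
        case 'f': {
            char *end;
            long v;

            errno = 0;
            v = strtol(optarg, &end, 10);
            /* Reject empty, non-numeric, negative or out-of-range values. */
            if (errno != 0 || end == optarg || *end != '\0' ||
                v < 0 || v > INT_MAX) {
                return -1;
            }
            out->fd = (int)v;
            break;
        }
        default:
            return -1; /* unknown option or missing argument */
        }
    }

    if (optind < argc) {
        return -1; /* stray non-option argument */
    }

    /* --fd is incompatible with --socket-path; require exactly one. */
    if ((out->socket_path != NULL) == (out->fd >= 0)) {
        return -1;
    }
    return 0;
}
```

A caller would typically invoke ``parse_backend_opts(argc, argv, &opts)`` early
in ``main()`` and then either connect to or listen on ``opts.socket_path``, or
use the already-open descriptor ``opts.fd`` inherited from the parent process.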