======================================================
Device Specification for Inter-VM shared memory device
======================================================

The Inter-VM shared memory device (ivshmem) is designed to share a
memory region between multiple QEMU processes running different guests
and the host.  In order for all guests to be able to pick up the
shared memory area, it is modeled by QEMU as a PCI device exposing
said memory to the guest as a PCI BAR.

The device can use a shared memory object on the host directly, or it
can obtain one from an ivshmem server.

In the latter case, the device can additionally interrupt its peers, and
get interrupted by its peers.

For information on configuring the ivshmem device on the QEMU
command line, see :doc:`../system/devices/ivshmem`.

The ivshmem PCI device's guest interface
========================================

The device has vendor ID 1af4, device ID 1110, revision 1.  Before
QEMU 2.6.0, it had revision 0.

PCI BARs
--------

The ivshmem PCI device has two or three BARs:

- BAR0 holds device registers (256 Byte MMIO)
- BAR1 holds MSI-X table and PBA (only ivshmem-doorbell)
- BAR2 maps the shared memory object

There are two ways to use this device:

- If you only need the shared memory part, BAR2 suffices.  This way,
  you have access to the shared memory in the guest and can use it as
  you see fit.

- If you additionally need the capability for peers to interrupt each
  other, you need BAR0 and BAR1.  You will most likely want to write a
  kernel driver to handle interrupts.  This requires the device to be
  configured for interrupts.

Before QEMU 2.6.0, BAR2 could initially be invalid if the device was
configured for interrupts.  It became safely accessible only after
the ivshmem server had provided the shared memory.  These devices have
PCI revision 0 rather than 1.  Guest software should wait for the
IVPosition register (described below) to become non-negative before
accessing BAR2.

Revision 0 of the device cannot tell guest software whether it is
configured for interrupts.

PCI device registers
--------------------

BAR0 contains the following registers:

::

    Offset  Size  Access      On reset  Function
        0     4   read/write        0   Interrupt Mask
                                        bit 0: peer interrupt (rev 0)
                                               reserved       (rev 1)
                                        bit 1..31: reserved
        4     4   read/write        0   Interrupt Status
                                        bit 0: peer interrupt (rev 0)
                                               reserved       (rev 1)
                                        bit 1..31: reserved
        8     4   read-only   0 or ID   IVPosition
       12     4   write-only      N/A   Doorbell
                                        bit 0..15: vector
                                        bit 16..31: peer ID
       16   240   none            N/A   reserved

Software should only access the registers as specified in column
"Access".  Reserved bits should be ignored on read, and preserved on
write.
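
As an illustration only, the register layout can be captured in a
small lookup table that driver or test code consults before touching
BAR0.  The names ``Register``, ``REGISTERS`` and ``access_allowed`` are
made up for this sketch and are not part of QEMU.

```python
from typing import NamedTuple


class Register(NamedTuple):
    offset: int      # byte offset within BAR0
    size: int        # size in bytes
    readable: bool
    writable: bool


# Layout copied from the register table; the Python names are hypothetical.
REGISTERS = {
    "INTR_MASK":   Register(0, 4, True, True),
    "INTR_STATUS": Register(4, 4, True, True),
    "IV_POSITION": Register(8, 4, True, False),
    "DOORBELL":    Register(12, 4, False, True),
}


def access_allowed(offset: int, write: bool) -> bool:
    """Return True if a 4-byte access at `offset` matches the "Access" column."""
    for reg in REGISTERS.values():
        if reg.offset == offset:
            return reg.writable if write else reg.readable
    return False  # offsets 16..255 are reserved
```

For example, ``access_allowed(12, write=False)`` is false because the
Doorbell register is write-only.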

In revision 0 of the device, the Interrupt Status and Interrupt Mask
registers together control the legacy INTx interrupt when the device
has no MSI-X capability: INTx is asserted when the bit-wise AND of
Status and Mask is non-zero.  Interrupt Status register bit 0 becomes
1 when an interrupt request from a peer is received.  Reading the
register clears it.
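
The interplay of the two registers can be modeled in a few lines.
This is a toy model for reasoning about the rules in this paragraph,
not QEMU's implementation; the class and method names are invented
for the example.

```python
class Rev0InterruptModel:
    """Toy model of the revision 0 Interrupt Mask/Status pair (no MSI-X)."""

    def __init__(self) -> None:
        self.mask = 0      # Interrupt Mask register, 0 on reset
        self.status = 0    # Interrupt Status register, 0 on reset

    def peer_interrupt(self) -> None:
        # An interrupt request from a peer sets bit 0 of Interrupt Status.
        self.status |= 1

    def intx_asserted(self) -> bool:
        # INTx is asserted while (Status AND Mask) is non-zero.
        return (self.status & self.mask) != 0

    def read_status(self) -> int:
        # Reading Interrupt Status returns its value and clears it.
        value, self.status = self.status, 0
        return value
```

In this model, a peer interrupt that arrives while bit 0 of the mask
is clear stays recorded in the status register, and INTx is asserted
as soon as the guest sets the mask bit.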

IVPosition Register: if the device is not configured for interrupts,
this is zero.  Else, it is the device's ID (between 0 and 65535).

Before QEMU 2.6.0, the register may read -1 for a short while after
reset.  These devices have PCI revision 0 rather than 1.
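
Guest software that has to cope with revision 0 devices could poll
the register along the following lines.  The ``read32`` callback, the
timeout and the polling interval are assumptions of this sketch; the
specification itself only says that the value is -1 for a short while.

```python
import time
from typing import Callable

IV_POSITION_OFFSET = 8  # from the register table


def to_signed32(value: int) -> int:
    """Interpret a raw 32-bit register value as a signed number."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def wait_for_iv_position(read32: Callable[[int], int],
                         timeout: float = 1.0,
                         interval: float = 0.01) -> int:
    """Poll IVPosition until it is non-negative, else raise TimeoutError.

    `read32` is a hypothetical callback returning the raw 32-bit value
    found at the given BAR0 offset.
    """
    deadline = time.monotonic() + timeout
    while True:
        position = to_signed32(read32(IV_POSITION_OFFSET))
        if position >= 0:
            return position
        if time.monotonic() >= deadline:
            raise TimeoutError("IVPosition still reads -1")
        time.sleep(interval)
```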

There is no good way for software to find out whether the device is
configured for interrupts.  A positive IVPosition means interrupts,
but zero could be either.

Doorbell Register: writing this register requests to interrupt a peer.
The written value's high 16 bits are the ID of the peer to interrupt,
and its low 16 bits select an interrupt vector.
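
A minimal sketch of how the Doorbell value is put together and taken
apart follows.  The function names are hypothetical; only the bit
layout comes from this specification.

```python
from typing import Tuple


def doorbell_value(peer_id: int, vector: int) -> int:
    """Build the 32-bit value to write to the Doorbell register."""
    if not 0 <= peer_id <= 0xFFFF:
        raise ValueError("peer ID must fit in 16 bits")
    if not 0 <= vector <= 0xFFFF:
        raise ValueError("vector must fit in 16 bits")
    return (peer_id << 16) | vector


def split_doorbell(value: int) -> Tuple[int, int]:
    """Return (peer_id, vector) for a 32-bit Doorbell value."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF
```

For instance, interrupting peer 3 on vector 1 means writing
``0x00030001`` to offset 12 of BAR0.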

If the device is not configured for interrupts, the write is ignored.

If interrupt setup hasn't completed, the write is ignored.  The
device cannot tell guest software whether setup is complete.
Interrupts can regress to this state on migration.

If the peer with the requested ID isn't connected, or it has fewer
interrupt vectors connected, the write is ignored.  The device cannot
tell guest software what peers are connected, or how many interrupt
vectors are connected.

Otherwise, the peer's interrupt for this vector becomes pending.
There is no way for software to clear the pending bit, and a polling
mode of operation is therefore impossible.

If the peer is a revision 0 device without MSI-X capability, its
Interrupt Status register is set to 1.  This asserts INTx unless
masked by the Interrupt Mask register.  In that case, the device
cannot communicate the interrupt vector to guest software.

With multiple MSI-X vectors, different vectors can be used to indicate
different events have occurred.  The semantics of interrupt vectors
are left to the application.

Interrupt infrastructure
========================

When configured for interrupts, the peers share eventfd objects in
addition to shared memory.  The shared resources are managed by an
ivshmem server.

The ivshmem server
------------------

The server listens on a UNIX domain socket.

For each new client that connects to the server, the server

- picks an ID,
- creates eventfd file descriptors for the interrupt vectors,
- sends the ID and the file descriptor for the shared memory to the
  new client,
- sends connect notifications for the new client to the other clients
  (these contain file descriptors for sending interrupts),
- sends connect notifications for the other clients to the new client,
  and
- sends interrupt setup messages to the new client (these contain file
  descriptors for receiving interrupts).

The first client to connect to the server receives ID zero.

When a client disconnects from the server, the server sends disconnect
notifications to the other clients.

The next section describes the protocol in detail.

If the server terminates without sending disconnect notifications for
its connected clients, the clients can elect to continue.  They can
communicate with each other normally, but won't receive disconnect
notifications when a peer disconnects, and no new clients can connect.
There is no way for the clients to connect to a restarted server.  The
device cannot tell guest software whether the server is still up.

Example server code is in contrib/ivshmem-server/.  It is not meant to
be used in production.  It assumes all clients use the same number of
interrupt vectors.

A standalone client is in contrib/ivshmem-client/.  It can be useful
for debugging.

The ivshmem Client-Server Protocol
----------------------------------

An ivshmem device configured for interrupts connects to an ivshmem
server.  This section details the protocol between the two.

The connection is one-way: the server sends messages to the client.
Each message consists of a single 8 byte little-endian signed number,
and may be accompanied by a file descriptor via SCM_RIGHTS.  Both
client and server close the connection on error.
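
The wire format can be sketched with Python's ``struct`` and
``socket`` modules.  The helpers ``send_message`` and ``recv_message``
are hypothetical names for this example; the sketch assumes a
connected UNIX domain stream socket and treats a short read as an
error instead of retrying.

```python
import array
import socket
import struct
from typing import Optional, Tuple

MESSAGE = struct.Struct("<q")  # one 8-byte little-endian signed integer


def send_message(sock: socket.socket, value: int,
                 fd: Optional[int] = None) -> None:
    """Send one message, optionally passing `fd` via SCM_RIGHTS."""
    ancillary = []
    if fd is not None:
        ancillary.append((socket.SOL_SOCKET, socket.SCM_RIGHTS,
                          array.array("i", [fd]).tobytes()))
    sock.sendmsg([MESSAGE.pack(value)], ancillary)


def recv_message(sock: socket.socket) -> Tuple[int, Optional[int]]:
    """Receive one message; return (value, fd), where fd may be None."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        MESSAGE.size, socket.CMSG_SPACE(fds.itemsize))
    if len(data) != MESSAGE.size:
        raise ConnectionError("short read or connection closed")
    for level, kind, payload in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            usable = len(payload) - len(payload) % fds.itemsize
            fds.frombytes(payload[:usable])
    (value,) = MESSAGE.unpack(data)
    return value, (fds[0] if fds else None)
```

A received file descriptor is a new descriptor in the receiving
process, so its number generally differs from the one the sender used.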

Note: QEMU currently doesn't close the connection immediately on
error, but only when the character device is destroyed.

On connect, the server sends the following messages in order:

1. The protocol version number, currently zero.  The client should
   close the connection on receipt of versions it can't handle.

2. The client's ID.  This is unique among all clients of this server.
   IDs must be between 0 and 65535, because the Doorbell register
   provides only 16 bits for them.

3. The number -1, accompanied by the file descriptor for the shared
   memory.

4. Connect notifications for existing other clients, if any.  This is
   a peer ID (number between 0 and 65535 other than the client's ID),
   repeated N times.  Each repetition is accompanied by one file
   descriptor.  These are for interrupting the peer with that ID using
   vectors 0,..,N-1, in order.  If the client is configured for fewer
   vectors, it closes the extra file descriptors.  If it is configured
   for more, the extra vectors remain unconnected.

5. Interrupt setup.  This is the client's own ID, repeated N times.
   Each repetition is accompanied by one file descriptor.  These are
   for receiving interrupts from peers using vectors 0,..,N-1, in
   order.  If the client is configured for fewer vectors, it closes
   the extra file descriptors.  If it is configured for more, the
   extra vectors remain unconnected.

From then on, the server sends these kinds of messages:

6. Connection / disconnection notification.  This is a peer ID.

   - If the number comes with a file descriptor, it's a connection
     notification, exactly like in step 4.

   - Else, it's a disconnection notification for the peer with that ID.
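
Seen from the client, steps 1 to 6 amount to a small state machine.
The following sketch classifies each incoming ``(value, fd)`` pair
accordingly.  It is an illustration under simplifying assumptions, not
QEMU's code: the class name is invented, file descriptors are only
recorded (a real client would also close the ones it drops), and the
client's configured vector count is ignored.

```python
from typing import Dict, List, Optional


class ClientState:
    """Tracks what an ivshmem client learns from the server's messages."""

    def __init__(self) -> None:
        self.version: Optional[int] = None
        self.own_id: Optional[int] = None
        self.shmem_fd: Optional[int] = None
        self.peers: Dict[int, List[int]] = {}  # peer ID -> fds, index = vector
        self.own_vectors: List[int] = []       # receive fds, index = vector

    def handle(self, value: int, fd: Optional[int] = None) -> None:
        if self.version is None:                 # step 1: protocol version
            if value != 0:
                raise ConnectionError("unsupported protocol version")
            self.version = value
        elif self.own_id is None:                # step 2: own ID
            if not 0 <= value <= 65535 or fd is not None:
                raise ConnectionError("bad client ID message")
            self.own_id = value
        elif self.shmem_fd is None:              # step 3: shared memory
            if value != -1 or fd is None:
                raise ConnectionError("expected shared memory descriptor")
            self.shmem_fd = fd
        elif not 0 <= value <= 65535:
            raise ConnectionError("ID out of range")
        elif fd is None:                         # step 6: peer disconnected
            self.peers.pop(value, None)
        elif value == self.own_id:               # step 5: interrupt setup
            self.own_vectors.append(fd)
        else:                                    # steps 4 and 6: peer connected
            self.peers.setdefault(value, []).append(fd)
```

Because the N repetitions for one ID arrive in vector order, the
position of a descriptor in each list is its vector number.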

Known bugs:

* The protocol changed incompatibly in QEMU 2.5.  Before, messages
  were native endian long, and there was no version number.

* The protocol is poorly designed.

The ivshmem Client-Client Protocol
----------------------------------

An ivshmem device configured for interrupts receives eventfd file
descriptors for interrupting peers and getting interrupted by peers
from the server, as explained in the previous section.

To interrupt a peer, the device writes the 8-byte integer 1 in native
byte order to the respective file descriptor.

To receive an interrupt, the device reads and discards as many 8-byte
integers as it can.
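
Both directions can be sketched as follows.  The function names are
hypothetical, and ``drain`` assumes the descriptor has been put into
non-blocking mode so that it can tell when nothing is left to read.

```python
import os
import struct

COUNTER = struct.Struct("=Q")  # 8-byte unsigned integer, native byte order


def notify(fd: int) -> None:
    """Interrupt a peer: write the 8-byte integer 1 to its descriptor."""
    os.write(fd, COUNTER.pack(1))


def drain(fd: int) -> int:
    """Read and discard 8-byte integers until none are left.

    Assumes `fd` is non-blocking.  Returns how many reads succeeded.
    """
    reads = 0
    while True:
        try:
            data = os.read(fd, COUNTER.size)
        except BlockingIOError:
            return reads
        if len(data) != COUNTER.size:
            return reads
        reads += 1
```

On Linux with Python 3.10 or later, ``os.eventfd(0, os.EFD_NONBLOCK)``
creates a suitable descriptor for experiments.  Since an eventfd
accumulates writes into a single counter, several ``notify`` calls may
be consumed by one read in ``drain``.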