
Multi-process QEMU
==================

This is the design document for multi-process QEMU. It does not
necessarily reflect the status of the current implementation, which
may lack features or be considerably different from what is described
here. It remains useful as a description of the goals and general
direction of the feature.

Please refer to the following wiki for the latest details:
https://wiki.qemu.org/Features/MultiProcessQEMU

QEMU is often used as the hypervisor for virtual machines running in the
cloud. Since one of the advantages of cloud computing is the ability to
run many VMs from different tenants on the same infrastructure, a guest
that compromised its hypervisor could potentially use the hypervisor's
access privileges to access data it is not authorized for.

QEMU can be susceptible to security attacks because it is a large,
monolithic program that provides many features to the VMs it services.
Many of these features can be configured out of QEMU, but even a reduced
configuration QEMU has a large amount of code a guest can potentially
attack. Separating QEMU into multiple processes reduces the attack
surface by helping to limit each component to only the resources it
needs to perform its job.

QEMU services
-------------

QEMU can be broadly described as providing three main services. One is a
VM control point, where VMs can be created, migrated, re-configured, and
destroyed. A second is to emulate the CPU instructions within the VM,
often accelerated by HW virtualization features such as Intel's VT
extensions. Finally, it provides IO services to the VM by emulating HW
IO devices, such as disk and network devices.

A multi-process QEMU
~~~~~~~~~~~~~~~~~~~~

A multi-process QEMU involves separating QEMU services into separate
host processes. Each of these processes can be given only the privileges
it needs to provide its service, e.g., a disk service could be given
access only to the disk images it provides, and not be allowed to
access other files or any network devices.

A QEMU control process would remain, but in multi-process mode, it will
have no direct interfaces to the VM. During VM execution, it would still
provide the user interface to hot-plug devices or live migrate the VM.

A first step in creating a multi-process QEMU is to separate IO services
from the main QEMU program, which would continue to provide CPU
emulation; i.e., the control process would also be the CPU emulation
process. In a later phase, CPU emulation could be separated from the
control process.

Separating IO services
----------------------

Separating IO services into individual host processes is a good place to
begin for a couple of reasons. One is that the sheer number of IO devices
QEMU can emulate provides a large surface of interfaces which could
potentially be exploited, and, indeed, has been a source of exploits in
the past. Another is that the modular nature of QEMU device emulation
code provides interface points where the QEMU functions that perform
device emulation can be separated from the QEMU functions that manage
the emulation of guest CPU instructions.

QEMU device emulation
~~~~~~~~~~~~~~~~~~~~~

QEMU uses an object oriented SW architecture for device emulation code.
Configured objects are all compiled into the QEMU binary, then objects
are instantiated by name when used by the guest VM. For example, the
code to emulate a device named "foo" is always present in QEMU, but its
instantiation code is only run when the device is included in the target
VM (e.g., via the QEMU command line as *-device foo*).

The object model is hierarchical, so device emulation code names its
parent object (such as "pci-device" for a PCI device) and QEMU will
instantiate the parent object before calling the device's own
instantiation code.

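For example, a device's emulation code declares its type and names its
parent with a ``TypeInfo`` structure. A minimal sketch, assuming a
hypothetical PCI device named "foo" (a real device would also supply
class and instance init functions):

::

    #include "qemu/osdep.h"
    #include "hw/pci/pci.h"

    /* Sketch: register a device type named "foo". Naming
     * TYPE_PCI_DEVICE as the parent makes QEMU instantiate the PCI
     * parent object before running this device's own instance code. */
    static const TypeInfo foo_device_info = {
        .name          = "foo",
        .parent        = TYPE_PCI_DEVICE,
        .instance_size = sizeof(PCIDevice),
    };

    static void foo_register_types(void)
    {
        type_register_static(&foo_device_info);
    }

    type_init(foo_register_types)
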
Current separation models
~~~~~~~~~~~~~~~~~~~~~~~~~

In order to separate the device emulation code from the CPU emulation
code, the device object code must run in a different process. There are
a couple of existing QEMU features that can run emulation code
separately from the main QEMU process. These are examined below.

vhost user model
^^^^^^^^^^^^^^^^

Virtio guest device drivers can be connected to vhost user applications
in order to perform their IO operations. This model uses special virtio
device drivers in the guest and vhost user device objects in QEMU, but
once the QEMU vhost user code has configured the vhost user application,
mission-mode IO is performed by the application. The vhost user
application is a daemon process that can be contacted via a known UNIX
domain socket.

vhost socket
''''''''''''

One of the tasks of the vhost device object within
QEMU is to contact the vhost application and send it configuration
information about this device instance. As part of the configuration
process, the application can also be sent other file descriptors over
the socket, which it can use in various ways, some of which are
described below.

vhost MMIO store acceleration
'''''''''''''''''''''''''''''

VMs are often run using HW virtualization features via the KVM kernel
driver. This driver allows QEMU to accelerate the emulation of guest CPU
instructions by running the guest in a virtual HW mode. When the guest
executes instructions that cannot be executed in virtual HW mode,
execution returns to the KVM driver so it can inform QEMU to emulate the
instructions in SW.

One of the events that can cause a return to QEMU is when a guest device
driver accesses an IO location. QEMU then dispatches the memory
operation to the corresponding QEMU device object. In the case of a
vhost user device, the memory operation would need to be sent over a
socket to the vhost application. This path is accelerated by the QEMU
virtio code setting up an eventfd file descriptor over which the vhost
application can directly receive MMIO store notifications from the KVM
driver, instead of needing them to be sent to the QEMU process first.

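The mechanism behind this acceleration is KVM's *ioeventfd* interface.
A sketch of how a doorbell range might be wired to an eventfd (the
guest physical address and length are illustrative):

::

    #include <sys/eventfd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Sketch: ask KVM to signal an eventfd on guest stores to a
     * doorbell address instead of exiting to the QEMU process.
     * vm_fd is the VM's KVM file descriptor. */
    int wire_doorbell(int vm_fd)
    {
        int efd = eventfd(0, EFD_CLOEXEC);
        struct kvm_ioeventfd conf = {
            .addr  = 0xfe003000,  /* illustrative doorbell address */
            .len   = 4,           /* match 4-byte stores */
            .fd    = efd,
            .flags = 0,           /* no datamatch: any value triggers */
        };

        if (efd < 0 || ioctl(vm_fd, KVM_IOEVENTFD, &conf) < 0) {
            return -1;
        }
        return efd;               /* passed to the vhost application */
    }
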
vhost interrupt acceleration
''''''''''''''''''''''''''''

Another optimization used by the vhost application is the ability to
directly inject interrupts into the VM via the KVM driver, again
bypassing the need to send the interrupt back to the QEMU process first.
The QEMU virtio setup code configures the KVM driver with an eventfd
that triggers the device interrupt in the guest when the eventfd is
written; this irqfd file descriptor is then passed to the vhost user
application.

vhost access to guest memory
''''''''''''''''''''''''''''

The vhost application is also allowed to directly access guest memory,
instead of needing to send the data as messages to QEMU. This is also
done with file descriptors sent to the vhost user application by QEMU,
which the application can pass to ``mmap()`` to map the guest address
space into its own.

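Mapping one such region might look like the following sketch, where
``region_fd``, ``region_size``, and ``region_offset`` come from the
configuration messages (the names are illustrative):

::

    #include <sys/mman.h>
    #include <sys/types.h>

    /* Sketch: map a guest memory region from a file descriptor
     * received over the vhost user socket. */
    void *map_guest_region(int region_fd, size_t region_size,
                           off_t region_offset)
    {
        void *p = mmap(NULL, region_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, region_fd, region_offset);
        return p == MAP_FAILED ? NULL : p;
    }
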
IOMMUs introduce another level of complexity, since the address given to
the guest virtio device to DMA to or from is not a guest physical
address. This case is handled by having vhost code within QEMU register
as a listener for IOMMU mapping changes. The vhost application maintains
a cache of IOMMU translations, sending translation requests back to
QEMU on cache misses, and in turn receiving flush requests from QEMU
when mappings are purged.

applicability to device separation
''''''''''''''''''''''''''''''''''

Much of the vhost model can be re-used by separated device emulation. In
particular, the ideas of using a socket between QEMU and the device
emulation application, injecting interrupts into the VM via KVM, and
allowing the application to ``mmap()`` the guest memory should all be
re-used.

There are, however, some notable differences. Vhost uses custom virtio
guest drivers that only trigger IO with MMIO stores, while separated
device emulation must work with existing device models and guest
drivers. MMIO loads break vhost store acceleration since they are
synchronous: guest progress cannot continue until the load has been
emulated. Another difference is that, in the vhost user model, a single
daemon can support multiple QEMU instances. This is contrary to the
security regime desired, in which the emulation application should only
be allowed to access the files and devices of the VM it is running on
behalf of.

qemu-io model
^^^^^^^^^^^^^

``qemu-io`` is a test harness used to test changes to the QEMU block backend
object code (e.g., the code that implements disk images for disk driver
emulation). ``qemu-io`` is not a device emulation application per se, but it
does compile the QEMU block objects into a separate binary from the main
QEMU one. This could be useful for disk device emulation, since disk
emulation applications will need to include the QEMU block objects.

New separation model based on proxy objects
-------------------------------------------

A different model, based on proxy objects in the QEMU program
communicating with remote emulation processes, is described here.

The remote emulation process will run the QEMU object hierarchy without
modification. The device emulation objects will also be based on the
QEMU code, because for anything but the simplest device, it would not be
a tractable task to re-implement both the object model and the many
device backends that QEMU has.

The processes will communicate with the QEMU process over UNIX domain
sockets. They can either be executed as standalone processes,
or be executed by QEMU. In both cases, the host backends the emulation
processes provide are specified on their command lines, as they would
be for QEMU. For example:

::

    disk-proc -blockdev driver=file,node-name=file0,filename=disk-file0 \
        -blockdev driver=qcow2,node-name=drive0,file=file0

would indicate that process *disk-proc* provides a qcow2 emulated disk
named *drive0*, backed by the image file *disk-file0*.

communication with QEMU
~~~~~~~~~~~~~~~~~~~~~~~

The first argument to the remote emulation process is the number of the
socket it uses to communicate with QEMU, followed by the list of
backends it provides:

::

    disk-proc <socket number> <backend list>

remote process QMP monitor
~~~~~~~~~~~~~~~~~~~~~~~~~~

Remote emulation processes can be monitored via QMP, similar to QEMU
itself. The QMP monitor socket is specified the same way as for a QEMU
process, e.g.,

::

    disk-proc -qmp unix:/tmp/disk-mon,server

can be monitored over the UNIX socket path */tmp/disk-mon*.

QEMU command line
~~~~~~~~~~~~~~~~~

Each device emulated in a remote process is represented on the QEMU
command line as a *-device* of type *pci-proxy-dev*. A *socket*
sub-option to this option specifies the Unix socket that connects
to the remote process. An *id* sub-option is required, and it should
be the same id as used in the remote process. For example,

::

    qemu-system-x86_64 ... -device pci-proxy-dev,id=lsi0,socket=3

adds a device with id *lsi0* emulated in the remote process reachable
via socket *3*.

QEMU management of remote processes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

QEMU is not aware of the type of the remote PCI device; it is treated
as a pass-through device as far as QEMU is concerned.

communication with emulation process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

primary channel
'''''''''''''''

The primary channel is used to bootstrap
the remote process. It is also used to pass on device-agnostic commands,
like reset.

per-device channels
'''''''''''''''''''

Each remote device communicates with QEMU using a dedicated communication
channel, which is set up over the primary channel when the device's
proxy object is initialized.

QEMU device proxy objects
~~~~~~~~~~~~~~~~~~~~~~~~~

QEMU has an object model based on sub-classes inherited from the
"object" super-class. The sub-classes that are of interest here are the
"device" and "bus" sub-classes whose child sub-classes make up the
device tree of a QEMU emulated system.

The proxy object model will use device proxy objects to replace the
device emulation code within the QEMU process. These objects will live
in the same place in the object and bus hierarchies as the objects they
replace; e.g., the proxy object for an LSI SCSI controller will be a
sub-class of the "pci-device" class, and will have the same PCI bus
parent as the device it replaces.

object initialization
^^^^^^^^^^^^^^^^^^^^^

The proxy device objects are initialized in the exact same manner in
which any other QEMU device would be initialized. In addition, each
proxy object performs two further tasks (a connection sketch follows
the list):

- Parses the *socket* sub-option and connects to the remote process
  over this channel
- Uses the *id* sub-option to connect to the corresponding emulated
  device in the remote process

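A sketch of the connection step, assuming the *socket* sub-option names
a UNIX socket path (``proxy_connect()`` is an illustrative helper, not
an existing QEMU function):

::

    #include <sys/socket.h>
    #include <sys/un.h>
    #include <string.h>
    #include <unistd.h>

    /* Sketch: connect a proxy object to its remote emulation process
     * over a UNIX domain socket. */
    int proxy_connect(const char *path)
    {
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        if (fd < 0) {
            return -1;
        }
        strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }
        return fd;   /* becomes the device's communication channel */
    }
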
Other tasks will be device-specific. For example, PCI device objects
will initialize their PCI config space in order to present a valid PCI
device tree within the QEMU process.

address space registration
^^^^^^^^^^^^^^^^^^^^^^^^^^

Most devices are driven by guest device driver accesses to IO addresses
or ports. The QEMU device emulation code uses QEMU's memory region
function calls (such as ``memory_region_init_io()``) to add callback
functions that QEMU will invoke when the guest accesses the device's
areas of the IO address space. When a guest driver does access the
device, the VM will exit HW virtualization mode and return to QEMU,
which will then look up and execute the corresponding callback function.

A proxy object would need to mirror the memory region calls the actual
device emulator would perform in its initialization code, but with its
own callbacks. When invoked by QEMU as a result of a guest IO operation,
they will forward the operation to the device emulation process.

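A sketch of such mirrored registration, where ``ProxyDevice`` and
``proxy_send_mmio()`` are hypothetical stand-ins for the proxy's state
and its message-marshalling helper:

::

    #include "qemu/osdep.h"
    #include "exec/memory.h"
    #include "hw/pci/pci.h"

    typedef struct ProxyDevice {
        PCIDevice parent_obj;
        MemoryRegion bar0;
        int dev_fd;              /* per-device channel to the remote */
    } ProxyDevice;

    /* Hypothetical helper: marshal an MMIO request over the device's
     * channel; for loads it blocks until the remote process replies. */
    uint64_t proxy_send_mmio(void *dev, hwaddr addr, bool is_store,
                             unsigned size, uint64_t val);

    static uint64_t proxy_mmio_read(void *opaque, hwaddr addr,
                                    unsigned size)
    {
        return proxy_send_mmio(opaque, addr, false, size, 0);
    }

    static void proxy_mmio_write(void *opaque, hwaddr addr,
                                 uint64_t val, unsigned size)
    {
        proxy_send_mmio(opaque, addr, true, size, val);
    }

    static const MemoryRegionOps proxy_mmio_ops = {
        .read  = proxy_mmio_read,
        .write = proxy_mmio_write,
    };

    /* Called from the proxy's realize method; the region size would
     * come from the remote process rather than being hard-coded. */
    void proxy_init_bar0(ProxyDevice *dev)
    {
        memory_region_init_io(&dev->bar0, OBJECT(dev), &proxy_mmio_ops,
                              dev, "proxy-bar0", 0x1000);
    }
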
PCI config space
^^^^^^^^^^^^^^^^

PCI devices also have a configuration space that can be accessed by the
guest driver. Guest accesses to this space are not handled by the device
emulation object, but by its PCI parent object. Much of this space is
read-only, but certain registers (especially BAR and MSI-related ones)
need to be propagated to the emulation process.

375 "pci-device-proxy" class that can serve as the parent of a PCI device
376 proxy object. This class's parent would be "pci-device" and it would
QEMU remote device operation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generic device operations, such as DMA, will be performed by the remote
process directly. One of the initial messages sent to the emulation
process is a guest memory table; each entry can be ``mmap()``ed by the
process to access guest RAM, much as a vhost application does. Note
that guest memory
must be backed by shared file-backed memory, for example, using
*-object memory-backend-file,share=on* and setting that memory backend
as RAM for the machine.

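For example, a shareable RAM backend might be configured as follows
(the backend id, size, and path are illustrative):

::

    qemu-system-x86_64 ... \
        -object memory-backend-file,id=sysmem,size=4G,mem-path=/dev/shm/vm0,share=on \
        -machine memory-backend=sysmem
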
When the emulated system includes an IOMMU,
QEMU will need to create a socket for IOMMU requests from the emulation
process, replying with the corresponding translations.

device hot-plug via QMP
^^^^^^^^^^^^^^^^^^^^^^^

A QMP *device_add* command can add a device emulated by a remote
process. It will have *socket* and *id* options, just as the
*-device* command line option does. The remote process may either be one
started at QEMU startup, or be one added by the *add-process* QMP
command.

The parts of QEMU that the emulation program will need include the
object model, the memory emulation objects, the device emulation objects
of the targeted device and any dependent devices, and the device's
backends. It will also need code to set up the machine environment,
handle requests from the QEMU process, and route machine-level requests
(such as interrupts or IOMMU mappings) back to the QEMU process.

initialization
^^^^^^^^^^^^^^

The process initialization sequence will follow the same order as
followed by QEMU: it will first initialize the backend objects, then the
device emulation objects. The JSON descriptions sent by the QEMU process
will drive which objects need to be created.

- address spaces

Before the device objects are created, the initial address spaces and
memory regions must be configured with ``memory_map_init()``. This
creates a RAM memory region object and an IO memory region object to
handle any IO port accesses from the guest VM.

- RAM

RAM memory region creation will follow how ``pc_memory_init()`` creates
them, but must use ``memory_region_init_ram_from_fd()`` to create the
regions from the file descriptors received from the QEMU process.

- PCI

IO initialization will be driven by the JSON descriptions sent from the
QEMU process. For a PCI device, a PCI bus will need to be created with
``pci_root_bus_new()``, and a PCI memory region will need to be created
and added to the system memory address space.

The device emulation objects will use ``memory_region_init_io()`` to
install their MMIO handlers, and ``pci_register_bar()`` to associate
those handlers with a PCI BAR, as they do within QEMU currently.

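That registration might look the same as it does for an in-QEMU device
today; a sketch, with ``MyDevState`` and ``mydev_mmio_ops`` as
illustrative names:

::

    /* Sketch: in the emulation process, a device's realize method
     * installs its MMIO handlers and ties them to BAR 0 exactly as an
     * in-QEMU device would. */
    static void mydev_realize(PCIDevice *pci_dev, Error **errp)
    {
        MyDevState *s = MYDEV(pci_dev);

        memory_region_init_io(&s->mmio, OBJECT(s), &mydev_mmio_ops, s,
                              "mydev-mmio", 0x1000);
        pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
                         &s->mmio);
    }
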
In order to use ``address_space_rw()`` in the emulation process to
handle MMIO requests from QEMU, the PCI physical addresses must be the
same in the QEMU process and the device emulation process. To
accomplish that, guest BAR programming must also be forwarded from QEMU
to the emulation process.

- PCI pin interrupts

On x86 systems, there is an emulated IOAPIC object that the PCI bus
routes device interrupts to when a device calls ``pci_set_irq()``;
it, in turn, calls the KVM driver to inject the
interrupt into the VM. In the emulation process, the PCI bus object can
instead be given interrupt callbacks (via
``pci_bus_irqs()``) that send an interrupt request back to the QEMU
process.

- PCI MSI/X interrupts

MSI/X interrupts are implemented in HW as DMA writes to a
CPU-specific PCI address. In QEMU on x86, a KVM APIC object receives
these DMA writes, then calls into the KVM driver to inject the interrupt
into the VM. A simple emulation process implementation would be to receive
the MSI DMA address from QEMU as a message at initialization, then
install an address space handler at that address which forwards each MSI
write as a
message back to QEMU.

When an IOMMU is in use in QEMU, DMA translation uses IOMMU memory
regions rather than accessing guest memory directly, so the emulation
process must emulate this translation as well.

- IOTLB cache

The emulation process will maintain a cache of recent IOMMU translations
(the IOTLB). When the translation of an IOVA misses in the cache, the
IOMMU emulation sends a request message
to QEMU requesting the corresponding translation entry, which will both
be used to complete the current access and be added to the cache.

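A sketch of such a translate handler, where ``iotlb_lookup()``,
``iotlb_insert()``, and ``request_translation()`` are hypothetical
helpers for the cache and the message exchange with QEMU:

::

    #include "qemu/osdep.h"
    #include "exec/memory.h"

    static IOMMUTLBEntry proxy_iommu_translate(IOMMUMemoryRegion *iommu,
                                               hwaddr addr,
                                               IOMMUAccessFlags flag,
                                               int iommu_idx)
    {
        IOMMUTLBEntry entry;

        if (iotlb_lookup(addr, &entry)) {
            return entry;                   /* cache hit */
        }
        entry = request_translation(addr);  /* synchronous message to QEMU */
        iotlb_insert(&entry);               /* cache for later accesses */
        return entry;
    }
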
- IOTLB purge

The IOMMU emulation will also need to act on unmap requests from QEMU,
which happen when the guest's IOMMU driver purges translation entries;
the corresponding IOTLB cache entries must be flushed.

live migration
^^^^^^^^^^^^^^

When a remote process receives a live migration indication from QEMU, it
will set up a channel and call ``qemu_save_device_state()`` to send
the process's device state back to QEMU. This method will be reversed on
restore: the channel will be passed to ``qemu_loadvm_state()`` to
restore the device state.

Accelerating device emulation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The messages that are required to be sent between QEMU and the emulation
process can add latency to IO operations. Some of this messaging could
be bypassed by having the
emulation process communicate directly with the kernel KVM driver.
The KVM file descriptors created would be passed to the emulation process
via initialization messages, much as the guest memory table is.

MMIO acceleration
^^^^^^^^^^^^^^^^^

Vhost user applications can receive guest virtio driver stores directly
from KVM. The issue with the eventfd mechanism used by vhost user is
that it does not pass any data with the event; it is only a
notification. That is sufficient for a virtio driver, but generic device
emulation needs the address and data of each store.

The expanded idea would require a new type of KVM device:
*KVM\_DEV\_TYPE\_USER*. This device has two file descriptors: a master
descriptor that QEMU can use for configuration, and a slave descriptor
that the emulation process can use to receive MMIO notifications. QEMU
would create both descriptors using the KVM driver, and pass the slave
descriptor to the emulation process via an initialization message.

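A sketch of that setup, noting that *KVM\_DEV\_TYPE\_USER* and
*KVM\_DEV\_USER\_SLAVE\_FD* are the interfaces proposed here, not ones
that exist in current kernels:

::

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Sketch: create the proposed KVM user device and fetch its slave
     * descriptor; cd.fd becomes the master (configuration) fd. */
    int create_user_device(int vm_fd, int *slave_fd)
    {
        struct kvm_create_device cd = {
            .type = KVM_DEV_TYPE_USER,   /* proposed device type */
        };

        if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0) {
            return -1;
        }
        *slave_fd = ioctl(cd.fd, KVM_DEV_USER_SLAVE_FD, 0);
        return cd.fd;
    }
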
data structures
'''''''''''''''

- guest physical range

The guest physical range structure describes an address range that a
device will respond to. A device can respond to multiple physical
address ranges (e.g., a PCI device can have multiple BARs), so the
structure includes an enumerated identifier that specifies which of the
device's ranges is being referred to.

+--------+----------------------------+
| Name   | Description                |
+========+============================+
| addr   | range base address         |
+--------+----------------------------+
| len    | range length               |
+--------+----------------------------+
| bus    | addr type (memory or IO)   |
+--------+----------------------------+
| id     | range ID (e.g., PCI BAR)   |
+--------+----------------------------+

- MMIO request structure

This structure describes an MMIO operation. It includes which guest
physical range the MMIO was within, the offset within that range, the
MMIO type (e.g., load or store), its length and data, and a sequence
number used to match replies to requests.

+----------+------------------------+
| Name     | Description            |
+==========+========================+
| rid      | range MMIO is within   |
+----------+------------------------+
| offset   | offset within *rid*    |
+----------+------------------------+
| type     | e.g., load or store    |
+----------+------------------------+
| len      | MMIO length            |
+----------+------------------------+
| data     | store data             |
+----------+------------------------+
| seq      | sequence ID used to    |
|          | reply                  |
+----------+------------------------+

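Expressed as C structures, the two records above might look like this
sketch; the field names follow the tables, but the layout is a design
choice rather than an existing kernel ABI:

::

    #include <stdint.h>

    struct user_dev_pa_range {
        uint64_t addr;    /* range base address */
        uint64_t len;     /* range length */
        uint32_t bus;     /* addr type (memory or IO) */
        uint32_t id;      /* range ID (e.g., PCI BAR) */
    };

    struct user_dev_mmio_req {
        uint32_t rid;     /* range the MMIO is within */
        uint32_t type;    /* load or store */
        uint64_t offset;  /* offset within the range */
        uint32_t len;     /* MMIO length in bytes */
        uint32_t seq;     /* sequence ID used to reply */
        uint64_t data;    /* store data, or space for the load reply */
    };
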
- MMIO request queues

MMIO request queues are FIFO arrays of MMIO request structures. One
queue holds requests that have not yet been sent to the emulation
process; another holds requests that have been sent but not yet replied
to.

- scoreboard

Each CPU in the VM is emulated in QEMU by a separate thread, so multiple
MMIO requests may be outstanding at once. The scoreboard would contain a
wait queue and sequence number for the per-CPU threads, allowing them to
be individually woken when their replies arrive.

- device shadow memory

Some MMIO loads do not have device side-effects. These MMIOs can be
completed without sending an MMIO request to the emulation program if
the emulation program shares a view of the device's memory image
with the KVM driver.

The emulation program will ask the KVM driver to allocate memory for the
shadow image, and will then use ``mmap()`` to directly access it. The
emulation program can control KVM access to the shadow image by sending
KVM an access map telling it which areas of the image have no
side-effects (and can be completed immediately), and which require an
MMIO request to the emulation program. The access map can also inform
the KVM driver which access sizes are allowed on the image.

master descriptor
'''''''''''''''''

The master descriptor is used by QEMU to configure the new KVM device.
The descriptor would be returned by the KVM driver when QEMU issues a
*KVM\_CREATE\_DEVICE* ``ioctl()`` with a *KVM\_DEV\_TYPE\_USER* type.

KVM\_DEV\_TYPE\_USER device ops
'''''''''''''''''''''''''''''''

The *KVM\_DEV\_TYPE\_USER* operations vector will be registered by a
``kvm_register_device_ops()`` call when the KVM system is initialized by
``kvm_init()``. These device ops are called by the KVM driver when QEMU
executes certain ``ioctl()`` operations on its KVM file descriptors.
They include:

- create

This routine is called when QEMU issues a *KVM\_CREATE\_DEVICE*
``ioctl()`` on its per-VM file descriptor. It will allocate and
initialize a KVM user device specific data structure, and assign the
*kvm\_device* private field to it.

- ioctl

This routine is invoked when QEMU issues an ``ioctl()`` on the master
descriptor. The ``ioctl()`` commands supported are defined by the KVM
device type. *KVM\_DEV\_TYPE\_USER* ones will need several commands:

*KVM\_DEV\_USER\_SLAVE\_FD* creates the slave file descriptor that will
be passed to the device emulation process.

The *KVM\_DEV\_USER\_PA\_RANGE* command configures a guest physical
address range that the slave descriptor will receive MMIO notifications
for. The range is specified by a guest physical range structure
argument.

*KVM\_DEV\_USER\_PA\_RANGE* will use ``kvm_io_bus_register_dev()`` to
register *kvm\_io\_device\_ops* callbacks to be invoked when the guest
performs an MMIO operation within the range.

*KVM\_DEV\_USER\_TIMEOUT* will configure a timeout value that specifies
how long KVM will wait for the emulation process to respond to an MMIO
request.

- destroy

This routine is called when the VM instance is destroyed. It will need
to destroy the slave descriptor and free any memory allocated by the
driver, as well as the *kvm\_device* structure itself.

slave descriptor
''''''''''''''''

The slave descriptor will have its own file operations vector, which
responds to system calls on the descriptor performed by the device
emulation process.

- read

A read returns any pending MMIO requests from the KVM driver as MMIO
request structures. Multiple structures can be returned if multiple
MMIO operations are pending.

- write

A write also consists of a set of MMIO request structures, treated as
replies to the corresponding requests returned by a read. If a store
request is completed and
removed, then the number of posted stores in the per-CPU scoreboard is
decremented. When the number reaches zero, and a non side-effect load
was waiting on the posted stores, the load can then be satisfied.

- ioctl

Several ``ioctl()`` commands control the device shadow image.

A *KVM\_DEV\_USER\_SHADOW\_SIZE* ``ioctl()`` causes the KVM driver to
allocate memory for the shadow image. This memory can later be
``mmap()``ed by the emulation process to share the emulation's view of
device memory with the KVM driver.

A *KVM\_DEV\_USER\_SHADOW\_CTRL* ``ioctl()`` controls access to the
shadow image. It will send the KVM driver a shadow control map, which
specifies which areas of the image can satisfy guest loads without
forwarding the load request to the emulation program, as well as which
access sizes are allowed.

- poll

A poll of the slave descriptor will return when MMIO requests are
pending and ready to be read by the emulation process.

- mmap

An ``mmap()`` of the slave descriptor will map the shadow
image allocated by the KVM driver. As device emulation updates device
memory, changes with no side-effects will be reflected in the shadow,
and the KVM driver can satisfy guest loads from the shadow image without
needing to wait for the emulation program.

kvm\_io\_device ops
'''''''''''''''''''

Each KVM per-CPU thread can handle MMIO operations on behalf of the
guest VM. KVM will use the MMIO's guest physical address to search for a
matching *kvm\_io\_device* to see if the MMIO can be handled by the KVM
driver instead of exiting back to QEMU. If a match is found, the
corresponding callback will be invoked.

- read

This callback is invoked when the guest performs a load from the device.
Loads with side-effects must be handled synchronously, with the KVM
driver putting the QEMU thread to sleep waiting for the emulation
process reply before re-starting the guest. Loads that do not have
side-effects may be optimized by satisfying them from the shadow image,
provided there are no outstanding posted stores from this CPU to the
device: PCI ordering demands that a load cannot complete before all
older stores to the same device have completed.

- write

Stores can be handled asynchronously unless the pending MMIO request
queue is full, in which case the QEMU thread must sleep waiting for
space in the queue. Stores will increment the number of posted stores in
the per-CPU scoreboard, in order to implement the PCI ordering
constraint above.

interrupt acceleration
^^^^^^^^^^^^^^^^^^^^^^

This performance optimization would work much like a vhost user
application does, where the QEMU process sets up *eventfds* that cause
the device's corresponding interrupt to be triggered by the KVM driver.
These irq file descriptors are sent to the emulation process at
initialization, and are written when the emulation code raises a device
interrupt.

Traditional PCI pin interrupts are level based, so, in addition to an
irq file descriptor, a re-sampling file descriptor needs to be sent to
the emulation process. This second descriptor is signaled by KVM when
the interrupt is
acknowledged by the guest, so the emulation process can re-trigger the
interrupt if its device has not de-asserted its interrupt line.

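The existing KVM irqfd interface already supports this re-sampling
flavor; a sketch of registering a level-triggered irqfd (the GSI number
is illustrative):

::

    #include <sys/eventfd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Sketch: register an irqfd with a resample eventfd. KVM signals
     * the resample fd on guest EOI so the emulation process can
     * re-assert the interrupt if its device still asserts it. */
    int wire_intx(int vm_fd, int *resample_fd)
    {
        int irq_fd = eventfd(0, EFD_CLOEXEC);
        *resample_fd = eventfd(0, EFD_CLOEXEC);

        struct kvm_irqfd conf = {
            .fd         = irq_fd,
            .gsi        = 10,    /* illustrative guest irq line */
            .flags      = KVM_IRQFD_FLAG_RESAMPLE,
            .resamplefd = *resample_fd,
        };

        if (ioctl(vm_fd, KVM_IRQFD, &conf) < 0) {
            return -1;
        }
        return irq_fd;   /* written to raise the device's intx */
    }
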
The irq descriptors are created by the proxy object
using ``event_notifier_init()`` to create the irq and re-sampling
eventfds, and ``event_notifier_get_fd()`` to retrieve the file
descriptors so they can be sent to the emulation process.

The proxy object must also track which global interrupt line the
device's PCI interrupt
pin is connected to. The proxy object in QEMU will use
``pci_device_route_intx_to_irq()`` to find this route and, if it later
changes, will tell the KVM driver
interrupt logic to change the route: de-assigning the existing irq
descriptor from its route, then assigning it to the new one.

The guest may dynamically update several MSI-related tables in the
device's PCI config space. These include per-MSI interrupt enables and
vector data. Additionally, MSI-X tables exist in device memory space,
not config space. Much as with BAR programming, the proxy object must
forward these guest accesses to keep the MSI state
consistent between QEMU and the emulation program.

disaggregated CPU emulation
---------------------------

After IO services have been disaggregated, a second phase would be to
separate a process to handle CPU instruction emulation from the main
QEMU control function. There are no object separation points for this
code, so the first task would be to create one.

Host access controls
--------------------

Separating QEMU relies on the host OS's access restriction mechanisms to
enforce that the differing processes can only access the objects they
are entitled to. There are a couple of different mechanisms usually
provided by general purpose OSs.

Discretionary access control (the traditional user/group/other
permission bits) is not ideal for
QEMU separation, since it only provides three separate access controls:
user, group, and other, which cannot distinguish the many emulation
processes of many VMs running on one host.

Mandatory access control mechanisms, such as SELinux's type enforcement,
are a better fit. SELinux rules specify the operations a process with a
given
type can perform on a file with a given type. QEMU separation could take
advantage of this by running the emulation processes with
different types, both from the main QEMU process, and from the emulation
processes of other VMs.

For example, disk emulation processes can be given
types separate from the main QEMU process and non-disk emulation
processes, and only those types can access the VM's disk image files.
Likewise, network
emulation processes can have a type separate from the main QEMU process
and non-network emulation processes, and only that type can access the
network-related host files.