========
Fuzzing
========

This document describes the virtual-device fuzzing infrastructure in QEMU and
how to use it to implement additional fuzzers.

Basics
------

Fuzzing operates by passing inputs to an entry point/target function. The
fuzzer tracks the code coverage triggered by the input. Based on these
findings, the fuzzer mutates the input and repeats the fuzzing.

To fuzz QEMU, we rely on libFuzzer. Unlike other fuzzers such as AFL, libFuzzer
is an *in-process* fuzzer. For the developer, this means that it is their
responsibility to ensure that state is reset between fuzzing runs.

Building the fuzzers
--------------------

*NOTE*: If possible, build a 32-bit binary. When forking, the 32-bit fuzzer is
much faster, since the page-map has a smaller size. This is due to the fact that
AddressSanitizer maps ~20TB of memory, as part of its detection. This results
in a large page-map, and a much slower ``fork()``.

To build the fuzzers, install a recent version of clang, then configure the
build (substitute the clang binaries with the version you installed).
Here, ``--enable-sanitizers`` is optional, but it allows us to reliably detect
bugs such as out-of-bounds accesses, use-after-frees, double-frees etc.::

    CC=clang-8 CXX=clang++-8 /path/to/configure --enable-fuzzing \
        --enable-sanitizers

Fuzz targets are built similarly to system targets::

    make qemu-fuzz-i386

This builds ``./qemu-fuzz-i386``.

The first option to this command is ``--fuzz-target=FUZZ_NAME``.
To list all of the available fuzzers, run ``qemu-fuzz-i386`` with no arguments.

For example::

    ./qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz

Internally, libFuzzer parses all arguments that do not begin with ``"--"``.
Information about these is available by passing ``-help=1``.

Now the only thing left to do is wait for the fuzzer to trigger potential
crashes.

Useful libFuzzer flags
----------------------

As mentioned above, libFuzzer accepts some arguments. Passing ``-help=1`` will
list the available arguments. In particular, these arguments might be helpful:

* ``CORPUS_DIR/`` : Specify a directory as the last argument to libFuzzer.
  libFuzzer stores each "interesting" input in this corpus directory. The next
  time you run libFuzzer, it will read all of the inputs from the corpus, and
  continue fuzzing from there. You can also specify multiple directories.
  libFuzzer loads existing inputs from all specified directories, but will only
  write new ones to the first one specified.

* ``-max_len=4096`` : Specify the maximum byte-length of the inputs libFuzzer
  will generate.

* ``-close_fd_mask={1,2,3}`` : Close stdout, stderr, or both. Useful for targets
  that trigger many debug/error messages, or create output on the serial
  console.

* ``-jobs=4 -workers=4`` : These arguments configure libFuzzer to run 4 fuzzers in
  parallel (4 fuzzing jobs in 4 worker processes). Alternatively, with only
  ``-jobs=N``, libFuzzer automatically spawns a number of workers less than or equal
  to half the available CPU cores. Replace 4 with a number appropriate for your
  machine. Make sure to specify a ``CORPUS_DIR``, which will allow the parallel
  fuzzers to share information about the interesting inputs they find.

* ``-use_value_profile=1`` : For each comparison operation, libFuzzer computes
  ``(caller_pc&4095) | (popcnt(Arg1 ^ Arg2) << 12)`` and places this in the
  coverage table. Useful for targets with "magic" constants.
  If Arg1 came from the fuzzer's input and Arg2 is a magic constant, then each
  time the Hamming distance between Arg1 and Arg2 decreases, libFuzzer adds the
  input to the corpus.

* ``-shrink=1`` : Tries to make elements of the corpus "smaller". Might lead to
  better coverage performance, depending on the target.

Note that libFuzzer's exact behavior will depend on the version of
clang and libFuzzer used to build the device fuzzers.

Generating Coverage Reports
---------------------------

Code coverage is a crucial metric for evaluating a fuzzer's performance.
libFuzzer's output includes a "cov: " column that reports the total number of
unique blocks/edges covered. To examine coverage on a line-by-line basis we
can use Clang coverage:

1. Configure libFuzzer to store a corpus of all interesting inputs (see
   ``CORPUS_DIR`` above).
2. ``./configure`` the QEMU build with ::

       --enable-fuzzing \
       --extra-cflags="-fprofile-instr-generate -fcoverage-mapping"

3. Re-run the fuzzer. Specify ``$CORPUS_DIR/*`` as an argument, telling libFuzzer
   to execute all of the inputs in ``$CORPUS_DIR`` and exit. Once the process
   exits, you should find a file, ``default.profraw``, in the working directory.
4. Execute these commands to generate a detailed HTML coverage report::

       llvm-profdata merge -output=default.profdata default.profraw
       llvm-cov show ./path/to/qemu-fuzz-i386 -instr-profile=default.profdata \
       --format html -output-dir=/path/to/output/report

Adding a new fuzzer
-------------------

Coverage over virtual devices can be improved by adding additional fuzzers.
Fuzzers are kept in ``tests/qtest/fuzz/`` and should be added to
``tests/qtest/fuzz/Makefile.include``.

Fuzzers can rely on both qtest and libqos to communicate with virtual devices.

1. Create a new source file. For example ``tests/qtest/fuzz/foo-device-fuzz.c``.

2. Write the fuzzing code using the libqtest/libqos API. See existing fuzzers
   for reference.

3. Register the fuzzer in ``tests/qtest/fuzz/Makefile.include`` by appending the
   corresponding object to ``fuzz-obj-y``.

Fuzzers can be more-or-less thought of as special qtest programs which can
modify the qtest commands and/or qtest command arguments based on inputs
provided by libFuzzer. libFuzzer passes a byte array and length. Commonly the
fuzzer loops over the byte array, interpreting it as a list of qtest commands,
addresses, or values.
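
The byte-array loop described above can be sketched roughly as follows. This is
a minimal, standalone illustration rather than code from an existing QEMU
fuzzer: the record layout (one opcode byte, a two-byte little-endian address,
one value byte) and the names ``fuzz_op``, ``OP_WRITE8``, ``OP_READ8`` and
``decode_ops`` are assumptions made for the example. A real target would
typically dispatch each decoded operation to a libqtest call such as
``qtest_outb()`` or ``qtest_inb()`` instead of storing it in an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation kinds; a real fuzzer maps these to qtest calls. */
enum fuzz_op_kind { OP_WRITE8, OP_READ8, OP_KIND_COUNT };

/* One decoded operation (hypothetical layout, chosen for this sketch). */
struct fuzz_op {
    enum fuzz_op_kind kind;
    uint16_t addr;
    uint8_t value;
};

/*
 * Decode up to max_ops fixed-size records from the fuzzer-provided buffer.
 * Each record is 4 bytes: opcode, address (little-endian, 2 bytes), value.
 * Trailing bytes that do not form a full record are ignored.
 * Returns the number of operations decoded.
 */
static size_t decode_ops(const uint8_t *data, size_t size,
                         struct fuzz_op *ops, size_t max_ops)
{
    size_t n = 0;

    assert(ops != NULL);
    while (size >= 4 && n < max_ops) {
        /* Reduce the opcode modulo the op count so every byte is valid. */
        ops[n].kind = (enum fuzz_op_kind)(data[0] % OP_KIND_COUNT);
        ops[n].addr = (uint16_t)(data[1] | (data[2] << 8));
        ops[n].value = data[3];
        data += 4;
        size -= 4;
        n++;
    }
    return n;
}
```

Reducing the opcode modulo the number of operations means that any input byte
selects a valid command, so the fuzzer does not waste mutations on inputs that
are rejected outright.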

The Generic Fuzzer
------------------

Writing a fuzz target can be a lot of effort (especially if a device driver has
not been built out within libqos). Many devices can be fuzzed to some degree,
without any device-specific code, using the generic-fuzz target.

The generic-fuzz target is capable of fuzzing devices over their PIO, MMIO,
and DMA input-spaces. To apply the generic-fuzz to a device, we need to define
two env-variables, at minimum:

* ``QEMU_FUZZ_ARGS=`` is the set of QEMU arguments used to configure a machine, with
  the device attached. For example, if we want to fuzz the virtio-net device
  attached to a pc-i440fx machine, we can specify::

    QEMU_FUZZ_ARGS="-M pc -nodefaults -netdev user,id=user0 \
    -device virtio-net,netdev=user0"

* ``QEMU_FUZZ_OBJECTS=`` is a set of space-delimited strings used to identify
  the MemoryRegions that will be fuzzed. These strings are compared against
  MemoryRegion names and MemoryRegion owner names, to decide whether each
  MemoryRegion should be fuzzed. These strings support globbing.
  For the virtio-net example, we could use one of ::

    QEMU_FUZZ_OBJECTS='virtio-net'
    QEMU_FUZZ_OBJECTS='virtio*'
    QEMU_FUZZ_OBJECTS='virtio* pcspk' # Fuzz the virtio devices and the speaker
    QEMU_FUZZ_OBJECTS='*' # Fuzz the whole machine

The ``"info mtree"`` and ``"info qom-tree"`` monitor commands can be especially
useful for identifying the ``MemoryRegion`` and ``Object`` names used for
matching.

As a general rule of thumb, the more ``MemoryRegions``/Devices we match, the
greater the input-space, and the smaller the probability of finding crashing
inputs for individual devices. As such, it is usually a good idea to limit the
fuzzer to only a few ``MemoryRegions``.

To ensure that these env variables have been configured correctly, we can use::

    ./qemu-fuzz-i386 --fuzz-target=generic-fuzz -runs=0

The output should contain a complete list of matched MemoryRegions.

Implementation Details / Fuzzer Lifecycle
-----------------------------------------

The fuzzer has two entrypoints that libFuzzer calls.
libFuzzer provides its own ``main()``, which performs some setup and calls the
entrypoints:

``LLVMFuzzerInitialize``: called prior to fuzzing. Used to initialize all of the
necessary state.

``LLVMFuzzerTestOneInput``: called for each fuzzing run. Processes the input and
resets the state at the end of each run.

In more detail:

``LLVMFuzzerInitialize`` parses the arguments to the fuzzer (must start with two
dashes, so they are ignored by libFuzzer's ``main()``). Currently, the arguments
select the fuzz target. Then, the qtest client is initialized. If the target
requires qos, qgraph is set up and the QOM/LIBQOS modules are initialized.
Then the QGraph is walked and the QEMU cmd_line is determined and saved.

After this, ``vl.c:qemu_main`` is called to set up the guest. There are
target-specific hooks that can be called before and after ``qemu_main``, for
additional setup (e.g. PCI setup, or VM snapshotting).

``LLVMFuzzerTestOneInput``: Uses qtest/qos functions to act based on the fuzz
input. It is also responsible for manually calling ``main_loop_wait`` to ensure
that bottom halves are executed and that any cleanup required before the next
input is performed.
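
The two-entrypoint lifecycle described above can be illustrated with a
self-contained sketch. The entrypoint signatures are the ones libFuzzer
expects, but everything else is a stand-in invented for the example: the
``dev`` structure plays the role of guest/device state, ``selected_target``
and ``total_runs`` are hypothetical bookkeeping variables, and the ``memset()``
call is only a placeholder for QEMU's real reset logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS 4

/* Hypothetical device state, standing in for QEMU's guest state. */
static struct {
    int initialized;
    uint8_t regs[NUM_REGS];
} dev;

static const char *selected_target;  /* value of --fuzz-target=, if given */
static unsigned long total_runs;     /* harness bookkeeping, never reset */

/* Called once by libFuzzer's main() before any input is executed. */
int LLVMFuzzerInitialize(int *argc, char ***argv)
{
    static const char prefix[] = "--fuzz-target=";

    assert(argc != NULL && argv != NULL);
    /* Only arguments starting with "--" are meant for the harness. */
    for (int i = 1; i < *argc; i++) {
        if (strncmp((*argv)[i], prefix, strlen(prefix)) == 0) {
            selected_target = (*argv)[i] + strlen(prefix);
        }
    }
    dev.initialized = 1;  /* placeholder for the expensive one-time setup */
    return 0;
}

/* Called once per input; must leave the state clean for the next run. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* Act on the input: each byte perturbs one hypothetical register. */
    for (size_t i = 0; i < size; i++) {
        dev.regs[data[i] % NUM_REGS] ^= data[i];
    }
    total_runs++;

    /* Reset per-run state so that the next input starts from scratch. */
    memset(dev.regs, 0, sizeof(dev.regs));
    return 0;
}
```

Built with ``-fsanitize=fuzzer``, libFuzzer supplies ``main()`` and calls these
two functions itself; no ``main()`` is defined in the sketch for that reason.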

Since the same process is reused for many fuzzing runs, QEMU state needs to
be reset at the end of each run. There are currently two implemented
options for resetting state:

- Reboot the guest between runs.

  - *Pros*: Straightforward and fast for simple fuzz targets.

  - *Cons*: Depending on the device, does not reset all device state. If the
    device requires some initialization prior to being ready for fuzzing (common
    for QOS-based targets), this initialization needs to be done after each
    reboot.

  - *Example target*: ``i440fx-qtest-reboot-fuzz``

- Run each test case in a separate forked process and copy the coverage
  information back to the parent. This is fairly similar to AFL's "deferred"
  fork-server mode [3]

  - *Pros*: Relatively fast. Devices only need to be initialized once. No need to
    do slow reboots or vmloads.

  - *Cons*: Not officially supported by libFuzzer. Does not work well for
    devices that rely on dedicated threads.

  - *Example target*: ``virtio-net-fork-fuzz``
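
The fork-based option can be sketched in isolation as follows. This is not
QEMU's implementation, only an illustration of the idea under simplifying
assumptions: ``shared_cov`` is a hypothetical coverage table placed in
anonymous shared memory so that the child's updates are visible to the parent,
and ``device_state`` stands in for mutable device state that must not leak
between runs. The sketch assumes a POSIX system that provides
``MAP_ANONYMOUS``.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define COV_SLOTS 256

static uint8_t *shared_cov;    /* hypothetical coverage table, shared */
static unsigned device_state;  /* hypothetical state, private per process */

/* Map the coverage table so that parent and child see the same pages. */
static int cov_init(void)
{
    shared_cov = mmap(NULL, COV_SLOTS, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return shared_cov == MAP_FAILED ? -1 : 0;
}

/* Process one input: mutates device state and records "coverage". */
static void run_input(const uint8_t *data, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        device_state += data[i];
        shared_cov[data[i]] = 1;
    }
}

/*
 * Run one input in a forked child. The child's changes to device_state
 * are discarded when it exits, so the parent needs no explicit reset.
 * Returns 0 if the child exited cleanly, -1 otherwise.
 */
static int run_forked(const uint8_t *data, size_t size)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        run_input(data, size);
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

After ``run_forked()`` returns, the parent's ``device_state`` is unchanged
while ``shared_cov`` reflects what the child executed, which mirrors the
trade-off listed above: initialization happens once in the parent, and every
run pays the cost of a ``fork()``.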