========
Fuzzing
========

This document describes the virtual-device fuzzing infrastructure in QEMU and
how to use it to implement additional fuzzers.

Basics
------

Fuzzing operates by passing inputs to an entry point/target function. The
fuzzer tracks the code coverage triggered by the input. Based on these
findings, the fuzzer mutates the input and repeats the fuzzing.

To fuzz QEMU, we rely on libFuzzer. Unlike other fuzzers such as AFL, libFuzzer
is an *in-process* fuzzer. For the developer, this means that it is their
responsibility to ensure that state is reset between fuzzing runs.

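As a minimal sketch of what that responsibility means in practice, the
following C fragment shows an in-process target that restores its state after
every input. The ``device_state`` structure and ``reset_state()`` helper are
hypothetical stand-ins for real QEMU device state; only the
``LLVMFuzzerTestOneInput`` signature comes from libFuzzer:

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical stand-in for device state that outlives a single run. */
    static struct {
        uint8_t regs[16];
        size_t writes;
    } device_state;

    /* Restore the state that every run is expected to start from. */
    static void reset_state(void)
    {
        memset(&device_state, 0, sizeof(device_state));
    }

    /* libFuzzer calls this once per input, always in the same process. */
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
    {
        /* Interpret each byte pair as (register index, value). */
        for (size_t i = 0; i + 1 < size; i += 2) {
            device_state.regs[data[i] % 16] = data[i + 1];
            device_state.writes++;
        }

        /* Without this call, the next input would see leftover state. */
        reset_state();
        return 0;
    }

Without the final call to ``reset_state()``, register values written by one
input would still be visible to the next one, and a crash might not reproduce
from the single input that libFuzzer reports.
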
Building the fuzzers
--------------------

*NOTE*: If possible, build a 32-bit binary. When forking, the 32-bit fuzzer is
much faster, since the page-map has a smaller size. This is because
AddressSanitizer maps ~20TB of memory as part of its detection, which results
in a large page-map and a much slower ``fork()``.

To build the fuzzers, install a recent version of clang, then configure the
build as shown below, substituting the clang binaries with the version you
installed. Here, ``--enable-sanitizers`` is optional, but it allows us to
reliably detect bugs such as out-of-bounds accesses, use-after-frees, and
double-frees::

    CC=clang-8 CXX=clang++-8 /path/to/configure --enable-fuzzing \
                                                --enable-sanitizers

Fuzz targets are built similarly to system targets::

    make qemu-fuzz-i386

This builds ``./qemu-fuzz-i386``.

The first option to this command is ``--fuzz-target=FUZZ_NAME``.
To list all of the available fuzzers, run ``qemu-fuzz-i386`` with no arguments.

For example::

    ./qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz

Internally, libFuzzer parses all arguments that do not begin with ``"--"``.
Information about these is available by passing ``-help=1``.

Now the only thing left to do is wait for the fuzzer to trigger potential
crashes.

Useful libFuzzer flags
----------------------

As mentioned above, libFuzzer accepts some arguments. Passing ``-help=1`` will
list the available arguments. In particular, these arguments might be helpful:

* ``CORPUS_DIR/`` : Specify a directory as the last argument to libFuzzer.
  libFuzzer stores each "interesting" input in this corpus directory. The next
  time you run libFuzzer, it will read all of the inputs from the corpus, and
  continue fuzzing from there. You can also specify multiple directories.
  libFuzzer loads existing inputs from all specified directories, but will only
  write new ones to the first one specified.

* ``-max_len=4096`` : Specify the maximum byte-length of the inputs libFuzzer
  will generate.

* ``-close_fd_mask={1,2,3}`` : Close stdout (1), stderr (2), or both (3). Useful
  for targets that trigger many debug/error messages, or create output on the
  serial console.

* ``-jobs=4 -workers=4`` : These arguments configure libFuzzer to run 4 fuzzers in
  parallel (4 fuzzing jobs in 4 worker processes). Alternatively, with only
  ``-jobs=N``, libFuzzer automatically spawns a number of workers less than or equal
  to half the available CPU cores. Replace 4 with a number appropriate for your
  machine. Make sure to specify a ``CORPUS_DIR``, which will allow the parallel
  fuzzers to share information about the interesting inputs they find.

* ``-use_value_profile=1`` : For each comparison operation, libFuzzer computes
  ``(caller_pc&4095) | (popcnt(Arg1 ^ Arg2) << 12)`` and places this in the
  coverage table. Useful for targets with "magic" constants. If Arg1 came from
  the fuzzer's input and Arg2 is a magic constant, then each time the Hamming
  distance between Arg1 and Arg2 decreases, libFuzzer adds the input to the
  corpus.

* ``-shrink=1`` : Tries to make elements of the corpus "smaller". Might lead to
  better coverage performance, depending on the target.

Note that libFuzzer's exact behavior will depend on the version of
clang and libFuzzer used to build the device fuzzers.

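To make the ``-use_value_profile`` formula quoted above concrete, here is a
small C sketch that evaluates the same expression. It mirrors only the formula
as written in this document, not libFuzzer's source code; the function names
``popcount64`` and ``value_profile_feature`` are made up for illustration:

.. code-block:: c

    #include <stdint.h>

    /* Count the set bits in v (portable stand-in for a popcnt instruction). */
    static unsigned popcount64(uint64_t v)
    {
        unsigned n = 0;

        while (v) {
            v &= v - 1; /* clear the lowest set bit */
            n++;
        }
        return n;
    }

    /*
     * (caller_pc & 4095) | (popcnt(Arg1 ^ Arg2) << 12)
     *
     * The low 12 bits identify the call site; the bits above them hold the
     * Hamming distance between the two compared values.
     */
    static uint64_t value_profile_feature(uint64_t caller_pc,
                                          uint64_t arg1, uint64_t arg2)
    {
        return (caller_pc & 4095) | ((uint64_t)popcount64(arg1 ^ arg2) << 12);
    }

For a fixed call site, every change in the Hamming distance yields a different
table entry, so an input that moves one bit closer to the magic constant shows
up as new coverage.
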
Generating Coverage Reports
---------------------------

Code coverage is a crucial metric for evaluating a fuzzer's performance.
libFuzzer's output includes a "cov: " column that reports the total number of
unique blocks/edges covered. To examine coverage on a line-by-line basis we
can use Clang coverage:

 1. Configure libFuzzer to store a corpus of all interesting inputs (see
    ``CORPUS_DIR`` above).
 2. ``./configure`` the QEMU build with::

        --enable-fuzzing \
        --extra-cflags="-fprofile-instr-generate -fcoverage-mapping"

 3. Re-run the fuzzer. Specify ``$CORPUS_DIR/*`` as an argument, telling
    libFuzzer to execute all of the inputs in ``$CORPUS_DIR`` and exit. Once
    the process exits, you should find a file, ``default.profraw``, in the
    working directory.
 4. Execute these commands to generate a detailed HTML coverage report::

      llvm-profdata merge -output=default.profdata default.profraw
      llvm-cov show ./path/to/qemu-fuzz-i386 -instr-profile=default.profdata \
      --format html -output-dir=/path/to/output/report

Adding a new fuzzer
-------------------

Coverage over virtual devices can be improved by adding additional fuzzers.
Fuzzers are kept in ``tests/qtest/fuzz/`` and should be added to
``tests/qtest/fuzz/Makefile.include``.

Fuzzers can rely on both qtest and libqos to communicate with virtual devices.

1. Create a new source file. For example ``tests/qtest/fuzz/foo-device-fuzz.c``.

2. Write the fuzzing code using the libqtest/libqos API. See existing fuzzers
   for reference.

3. Register the fuzzer in ``tests/qtest/fuzz/Makefile.include`` by appending the
   corresponding object to ``fuzz-obj-y``.

Fuzzers can be more-or-less thought of as special qtest programs which can
modify the qtest commands and/or qtest command arguments based on inputs
provided by libFuzzer. libFuzzer passes a byte array and length. Commonly the
fuzzer loops over the byte array, interpreting it as a list of qtest commands,
addresses, or values.

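The loop described above can be sketched in C as follows. This is only an
illustration of the idea: the ``fuzz_op`` commands, the ``fuzz_cmd`` structure,
and the 4-byte encoding are assumptions made for this example and are not taken
from an existing QEMU fuzzer:

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical command set; a real fuzzer would map these to qtest calls. */
    enum fuzz_op { OP_WRITEB, OP_READB, OP_COUNT };

    struct fuzz_cmd {
        enum fuzz_op op;
        uint16_t addr;
        uint8_t value;
    };

    /*
     * Decode up to max commands from the fuzzer input. Each command occupies
     * 4 bytes: opcode, 16-bit little-endian address, value. A trailing partial
     * command is ignored. Returns the number of commands decoded.
     */
    static size_t decode_cmds(const uint8_t *data, size_t size,
                              struct fuzz_cmd *out, size_t max)
    {
        size_t n = 0;

        while (size >= 4 && n < max) {
            /* The modulo keeps every input byte mapped to a valid opcode. */
            out[n].op = (enum fuzz_op)(data[0] % OP_COUNT);
            out[n].addr = (uint16_t)(data[1] | (data[2] << 8));
            out[n].value = data[3];
            data += 4;
            size -= 4;
            n++;
        }
        return n;
    }

In a real fuzzer, each decoded command would be issued through the qtest API
(for example ``qtest_writeb()`` or ``qtest_readb()``) rather than stored in an
array. Mapping every byte to some valid command, instead of rejecting unknown
opcodes, keeps most mutated inputs meaningful.
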
The Generic Fuzzer
------------------

Writing a fuzz target can be a lot of effort (especially if a device driver has
not been built out within libqos). Many devices can be fuzzed to some degree,
without any device-specific code, using the generic-fuzz target.

The generic-fuzz target is capable of fuzzing devices over their PIO, MMIO,
and DMA input-spaces. To apply the generic-fuzz target to a device, we need to
define two environment variables, at minimum:

* ``QEMU_FUZZ_ARGS=`` is the set of QEMU arguments used to configure a machine, with
  the device attached. For example, if we want to fuzz the virtio-net device
  attached to a pc-i440fx machine, we can specify::

    QEMU_FUZZ_ARGS="-M pc -nodefaults -netdev user,id=user0 \
    -device virtio-net,netdev=user0"

* ``QEMU_FUZZ_OBJECTS=`` is a set of space-delimited strings used to identify
  the MemoryRegions that will be fuzzed. These strings are compared against
  MemoryRegion names and MemoryRegion owner names, to decide whether each
  MemoryRegion should be fuzzed. These strings support globbing. For the
  virtio-net example, we could use one of::

    QEMU_FUZZ_OBJECTS='virtio-net'
    QEMU_FUZZ_OBJECTS='virtio*'
    QEMU_FUZZ_OBJECTS='virtio* pcspk' # Fuzz the virtio devices and the speaker
    QEMU_FUZZ_OBJECTS='*' # Fuzz the whole machine

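The matching of ``QEMU_FUZZ_OBJECTS`` against a name can be pictured with the
following C sketch. It is a stand-in only: this document does not say which
matching routine QEMU uses, so POSIX ``fnmatch()`` is assumed here, and the
helper name ``matches_fuzz_objects`` is hypothetical:

.. code-block:: c

    #include <fnmatch.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /*
     * Return true if name matches any of the space-delimited glob patterns.
     * Patterns of 128 characters or more are skipped in this sketch.
     */
    static bool matches_fuzz_objects(const char *patterns, const char *name)
    {
        char pattern[128];

        while (*patterns) {
            patterns += strspn(patterns, " ");      /* skip delimiters */
            size_t len = strcspn(patterns, " ");    /* length of next pattern */

            if (len > 0 && len < sizeof(pattern)) {
                memcpy(pattern, patterns, len);
                pattern[len] = '\0';
                if (fnmatch(pattern, name, 0) == 0) {
                    return true;
                }
            }
            patterns += len;
        }
        return false;
    }

With this sketch, ``matches_fuzz_objects("virtio* pcspk", "pcspk")`` is true,
while ``matches_fuzz_objects("virtio-net", "e1000")`` is false. A fuzzer would
apply such a check to both the MemoryRegion name and the name of its owner.
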
The ``"info mtree"`` and ``"info qom-tree"`` monitor commands can be especially
useful for identifying the ``MemoryRegion`` and ``Object`` names used for
matching.

As a general rule of thumb, the more ``MemoryRegions``/Devices we match, the
greater the input-space, and the smaller the probability of finding crashing
inputs for individual devices. As such, it is usually a good idea to limit the
fuzzer to only a few ``MemoryRegions``.

To ensure that these environment variables have been configured correctly, we
can use::

    ./qemu-fuzz-i386 --fuzz-target=generic-fuzz -runs=0

The output should contain a complete list of matched MemoryRegions.

Implementation Details / Fuzzer Lifecycle
-----------------------------------------

The fuzzer has two entrypoints that libFuzzer calls. libFuzzer provides its
own ``main()``, which performs some setup, and calls the entrypoints:

``LLVMFuzzerInitialize``: called prior to fuzzing. Used to initialize all of the
necessary state.

``LLVMFuzzerTestOneInput``: called for each fuzzing run. Processes the input and
resets the state at the end of each run.

In more detail:

``LLVMFuzzerInitialize`` parses the arguments to the fuzzer (must start with two
dashes, so they are ignored by libFuzzer's ``main()``). Currently, the arguments
select the fuzz target. Then, the qtest client is initialized. If the target
requires qos, qgraph is set up and the QOM/LIBQOS modules are initialized.
Then the QGraph is walked and the QEMU cmd_line is determined and saved.

After this, ``vl.c:qemu_main`` is called to set up the guest. There are
target-specific hooks that can be called before and after ``qemu_main``, for
additional setup (e.g. PCI setup, or VM snapshotting).

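The argument handling described for ``LLVMFuzzerInitialize`` above can be
sketched in C as a scan of ``argv`` for the ``--fuzz-target=`` option. The
helper name ``find_fuzz_target`` is hypothetical and the code is not QEMU's
actual parser; it only illustrates how double-dash options can be picked out
while single-dash options are left to libFuzzer:

.. code-block:: c

    #include <stddef.h>
    #include <string.h>

    /*
     * Return the name given with --fuzz-target=NAME, or NULL if the option is
     * absent. Arguments that do not start with the prefix are skipped.
     */
    static const char *find_fuzz_target(int argc, char **argv)
    {
        static const char prefix[] = "--fuzz-target=";
        const size_t plen = sizeof(prefix) - 1;

        for (int i = 1; i < argc; i++) {
            if (strncmp(argv[i], prefix, plen) == 0) {
                return argv[i] + plen;
            }
        }
        return NULL;
    }

An ``LLVMFuzzerInitialize(int *argc, char ***argv)`` implementation could call
such a helper as ``find_fuzz_target(*argc, *argv)`` and, when it returns
``NULL``, print the list of available targets.
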
``LLVMFuzzerTestOneInput``: Uses qtest/qos functions to act based on the fuzz
input. It is also responsible for manually calling ``main_loop_wait`` to ensure
that bottom halves are executed and that any cleanup required before the next
input is performed.

Since the same process is reused for many fuzzing runs, QEMU state needs to
be reset at the end of each run. There are currently two implemented
options for resetting state:

- Reboot the guest between runs.

  - *Pros*: Straightforward and fast for simple fuzz targets.

  - *Cons*: Depending on the device, does not reset all device state. If the
    device requires some initialization prior to being ready for fuzzing (common
    for QOS-based targets), this initialization needs to be done after each
    reboot.

  - *Example target*: ``i440fx-qtest-reboot-fuzz``

- Run each test case in a separate forked process and copy the coverage
  information back to the parent. This is fairly similar to AFL's "deferred"
  fork-server mode [3].

  - *Pros*: Relatively fast. Devices only need to be initialized once. No need to
    do slow reboots or vmloads.

  - *Cons*: Not officially supported by libFuzzer. Does not work well for
    devices that rely on dedicated threads.

  - *Example target*: ``virtio-net-fork-fuzz``
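
The fork-based option can be illustrated with the following C sketch. It is a
simplified model under stated assumptions, not QEMU's implementation:
``shared_cov`` is a hypothetical coverage table placed in shared memory so that
the child's updates reach the parent, and ``device_state`` stands in for any
state that an input may modify:

.. code-block:: c

    #define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define COV_SIZE 64

    static uint8_t *shared_cov; /* hypothetical coverage table (shared) */
    static int device_state;    /* hypothetical state dirtied by inputs */

    /* Map the coverage table so that writes made by a child reach the parent. */
    static int cov_init(void)
    {
        void *p = mmap(NULL, COV_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED) {
            return -1;
        }
        shared_cov = p;
        return 0;
    }

    /* Run one input in a forked child; returns the wait status, -1 on error. */
    static int run_forked(const uint8_t *data, size_t size)
    {
        int status;
        pid_t pid = fork();

        if (pid < 0) {
            return -1;
        }
        if (pid == 0) {
            for (size_t i = 0; i < size; i++) {
                device_state += data[i];            /* private to the child */
                shared_cov[data[i] % COV_SIZE] = 1; /* visible to the parent */
            }
            _exit(0);
        }
        if (waitpid(pid, &status, 0) < 0) {
            return -1;
        }
        return status;
    }

After ``run_forked()`` returns, the parent's ``device_state`` is unchanged
because the child modified its own copy, while the entries set in
``shared_cov`` remain visible. This is why the fork-based approach needs no
explicit reset step, at the cost of one ``fork()`` per input.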