docs/devel/multi-thread-tcg.rst

2   Copyright (c) 2015-2020 Linaro Ltd.
5   later. See the COPYING file in the top-level directory.
10 Multi-threaded TCG
13 This document outlines the design for multi-threaded TCG (a.k.a. MTTCG)
14 system-mode emulation. User-mode emulation has always mirrored the
17 linux-user emulation.
19 The original system-mode TCG implementation was single threaded and
20 dealt with multiple CPUs with simple round-robin scheduling. This
22 being emulated gained additional cores and per-core performance gains
29 user-space thread. This is enabled by default for all FE/BE
38 * forced by --accel tcg,thread=single
39 * enabling --icount mode
42 inter-vCPU dependencies and all vCPUs should be able to run at full
51 -------------
54 structures associated with the hot-path through the main run-loop.
58     tb_jmp_cache (per-vCPU, cache of recent jumps)
59     tb_ctx.htable (global hash table, phys address->tb lookup)
68 The hot-path avoids using locks where possible. The tb_jmp_cache is
72 have their block-to-block jumps patched.
75 ----------------
77 User-mode emulation
86 per-vCPU basis won't need locking unless other vCPUs will need to
96 !User-mode emulation
103 ------------------
110   - debugging operations (breakpoint insertion/removal)
111   - some CPU helper functions
112   - linux-user spawning its first thread
113   - operations related to TCG Plugins
122   - code modification (self-modifying code, patching code)
123   - page changes (new page mapping in linux-user mode)
126 being used when looked up in the hot-path there are a number of other
127 book-keeping structures that need to be safely cleared.
133 There are a number of look-up caches that need to be properly updated
136   - jump lookup cache
137   - the physical-to-tb lookup hash table
138   - the global page table
140 The global page table (l1_map) which provides a multi-level look-up
148                       - safely patch/revert direct jumps
149                       - remove central PageDesc lookup entries
150                       - ensure lookup caches/hashes are safely updated
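
The last requirement above, keeping lookup caches safe while a block is
being invalidated, can be illustrated with a minimal C11 sketch. It is
not QEMU's implementation: ``TB``, ``jmp_cache``, ``jmp_hash`` and the
three functions are hypothetical names, and the sizes are arbitrary. The
idea shown is that readers use an atomic load, so they see either a valid
block or ``NULL``, while invalidation clears matching entries in every
vCPU's cache with an atomic compare-and-swap.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_VCPUS      4    /* assumption: fixed number of vCPUs */
#define JMP_CACHE_SIZE 64   /* assumption: small direct-mapped cache */

/* Hypothetical translation block, reduced to the guest PC it starts at. */
typedef struct TB {
    uint64_t pc;
} TB;

/* One direct-mapped cache of recently used blocks per vCPU. */
static TB *_Atomic jmp_cache[NUM_VCPUS][JMP_CACHE_SIZE];

static size_t jmp_hash(uint64_t pc)
{
    return (size_t)(pc >> 2) % JMP_CACHE_SIZE;
}

/* Hot-path lookup: a single atomic load, no lock taken. */
TB *jmp_cache_lookup(int cpu, uint64_t pc)
{
    TB *tb = atomic_load_explicit(&jmp_cache[cpu][jmp_hash(pc)],
                                  memory_order_acquire);
    return (tb && tb->pc == pc) ? tb : NULL;
}

/* Publish a block in one vCPU's cache. */
void jmp_cache_insert(int cpu, TB *tb)
{
    atomic_store_explicit(&jmp_cache[cpu][jmp_hash(tb->pc)], tb,
                          memory_order_release);
}

/* Invalidation: clear the entry in every vCPU's cache, but only if it
 * still points at the block being removed. */
void jmp_cache_invalidate(TB *tb)
{
    size_t slot = jmp_hash(tb->pc);

    for (int cpu = 0; cpu < NUM_VCPUS; cpu++) {
        TB *expected = tb;
        atomic_compare_exchange_strong(&jmp_cache[cpu][slot],
                                       &expected, (TB *)NULL);
    }
}
```

The sketch deliberately leaves out the other requirements (reverting
patched jumps, removing page-level entries) and does not address when the
memory of an invalidated block may be reused while another vCPU could
still be executing it.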
156 searching for linked pages are done under the protection of tb->jmp_lock,
172 --------------------
176 hot-path can be handled entirely within translated code. This is
177 handled with a per-vCPU TLB structure which once populated will allow
180 will ensure the slow-path is taken for each access. This can be done
183   - Memory regions (dividing up access to PIO, MMIO and RAM)
184   - Dirty page tracking (for code gen, SMC detection, migration and display)
185   - Virtual TLB (for translating guest address->real address)
196   - TLB Flush All/Page
197     - can be across-vCPUs
198     - cross vCPU TLB flush may need other vCPUs brought to a halt
199     - change may need to be visible to the calling vCPU immediately
200   - TLB Flag Update
201     - usually cross-vCPU
202     - want change to be visible as soon as possible
203   - TLB Update (update a CPUTLBEntry, via tlb_set_page_with_attrs)
204     - This is a per-vCPU table - by definition can't race
205     - updated by its own thread when the slow-path is forced
224 -----------------------
226 Currently, thanks to KVM work, any access to IO memory is automatically protected
227 by the BQL (Big QEMU Lock). Any IO region that doesn't use the BQL is expected
230 However, IO memory isn't the only way emulated hardware state can be
255 ordered hosts needs to ensure things like store-after-load re-ordering
259 ---------------
266 The Linux kernel has an excellent `write-up
267 <https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/memory-barrier…
280 complete at the memory barrier. On single-core non-SMP strongly
295        - host systems with stronger implied guarantees can skip some barriers
296        - merge consecutive barriers to the strongest one
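
The two optimisations listed above can be sketched as a small pass over
an operation list. This is an illustrative sketch only: the ``MO_*``
bits, the ``Op`` type and ``optimise_barriers`` are hypothetical names,
not the TCG optimiser's actual interface. A run of adjacent barriers is
merged into one barrier carrying the union of their ordering bits (at
least as strong as each original), and any ordering the host is assumed
to guarantee already is dropped.

```c
#include <stddef.h>

/* Hypothetical ordering bits: which earlier accesses must be ordered
 * before which later accesses. */
enum {
    MO_LD_LD = 1 << 0,  /* earlier loads before later loads   */
    MO_ST_LD = 1 << 1,  /* earlier stores before later loads  */
    MO_LD_ST = 1 << 2,  /* earlier loads before later stores  */
    MO_ST_ST = 1 << 3   /* earlier stores before later stores */
};

typedef enum { OP_OTHER, OP_BARRIER } OpKind;

typedef struct {
    OpKind kind;
    unsigned order;     /* ordering bits, used by OP_BARRIER only */
} Op;

/*
 * Compact ops[0..n) in place and return the new length.
 * host_implied holds the ordering bits the host provides for free.
 */
size_t optimise_barriers(Op *ops, size_t n, unsigned host_implied)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (ops[i].kind != OP_BARRIER) {
            ops[out++] = ops[i];
            continue;
        }

        /* Merge a run of adjacent barriers into the union of their bits. */
        unsigned need = ops[i].order;
        while (i + 1 < n && ops[i + 1].kind == OP_BARRIER) {
            need |= ops[++i].order;
        }

        /* Skip whatever the host already guarantees. */
        need &= ~host_implied;
        if (need) {
            ops[out].kind = OP_BARRIER;
            ops[out].order = need;
            out++;
        }
    }
    return out;
}
```

For example, with ``host_implied = MO_LD_LD`` the sequence
``barrier(LD_LD), barrier(ST_ST), other, barrier(LD_LD)`` shrinks to
``barrier(ST_ST), other``.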
304 originally developed and tested for linux-user based systems. All
306 following front-ends have been updated to emit fences when required:
308     - target-i386
309     - target-arm
310     - target-aarch64
311     - target-alpha
312     - target-mips
315 ------------------------------
323 --------------------------
347   - Support classic atomic instructions
348   - Support load/store exclusive (or load link/store conditional) pairs
349   - Generic enough infrastructure to support all guest architectures
351   - How problematic is the ABA problem in general?
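
To make the ABA question concrete, here is a minimal C11 sketch of one
common way to emulate a load-exclusive/store-exclusive pair with a
compare-and-swap. The names ``ExclusiveState``, ``load_exclusive`` and
``store_exclusive`` are hypothetical and the sketch is an assumption for
illustration, not a description of any particular front-end.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU record of the last load-exclusive. */
typedef struct {
    _Atomic uint32_t *addr;  /* monitored address, NULL if none */
    uint32_t val;            /* value seen by the load-exclusive */
} ExclusiveState;

/* Load-exclusive: remember where we loaded from and what we saw. */
uint32_t load_exclusive(ExclusiveState *s, _Atomic uint32_t *addr)
{
    s->addr = addr;
    s->val = atomic_load(addr);
    return s->val;
}

/*
 * Store-exclusive: succeeds only if the address matches and memory
 * still holds the remembered value. Returns true on success.
 */
bool store_exclusive(ExclusiveState *s, _Atomic uint32_t *addr,
                     uint32_t newval)
{
    bool ok = false;

    if (s->addr == addr) {
        uint32_t expected = s->val;
        ok = atomic_compare_exchange_strong(addr, &expected, newval);
    }
    s->addr = NULL;  /* the monitor is cleared either way */
    return ok;
}
```

Because only the value is compared, a location that another vCPU changes
from A to B and back to A between the two operations still lets
``store_exclusive`` succeed, whereas a hardware exclusive monitor would
normally report failure. That difference is the ABA problem.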
359 this may be a problem - typically presenting a locking ABI which
362 The code also includes a fall-back for cases where multi-threaded TCG
363 ops can't work (e.g. guest atomic width > host atomic width). In this