Lines Matching full:rcu

2 A Tour Through RCU's Requirements
18 Read-copy update (RCU) is a synchronization mechanism that is often used
19 as a replacement for reader-writer locking. RCU is unusual in that
20 updaters do not block readers, which means that RCU's read-side
23 this concurrency between RCU readers and updaters does raise the
24 question of exactly what RCU readers are doing, which in turn raises the
25 question of exactly what RCU's requirements are.
27 This document therefore summarizes RCU's requirements, and can be
28 thought of as an informal, high-level specification for RCU. It is
29 important to understand that RCU's specification is primarily empirical
36 All that aside, here are the categories of currently known RCU
45 #. `Other RCU Flavors`_
55 RCU's fundamental requirements are the closest thing RCU has to hard
61 #. `RCU Primitives Guaranteed to Execute Unconditionally`_
67 RCU's grace-period guarantee is unusual in being premeditated: Jack
69 on RCU (then called “rclock”) in the early 1990s. That said, the past
70 two decades of experience with RCU have produced a much more detailed
73 RCU's grace-period guarantee allows updaters to wait for the completion
74 of all pre-existing RCU read-side critical sections. An RCU read-side
77 RCU treats a nested set as one big RCU read-side critical section.
135 This scenario resembles one of the first uses of RCU in
173 The RCU read-side critical section in ``do_something_dlm()`` works with
190 In order to avoid fatal problems such as deadlocks, an RCU read-side
192 Similarly, an RCU read-side critical section must not contain anything
196 Although RCU's grace-period guarantee is useful in and of itself, with
198 be good to be able to use RCU to coordinate read-side access to linked
246 If an RCU reader fetches ``gp`` just after ``add_gp_buggy_optimized``
257 RCU's publish-subscribe guarantee allows data to be inserted into a
258 linked data structure without disrupting RCU readers. The updater uses
305 control its accesses to the RCU-protected data, as shown in
368 [PDF] <http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf>`__
372 outermost RCU read-side critical section containing that
374 element has been passed from RCU to some other synchronization
376 counting <https://www.kernel.org/doc/Documentation/RCU/rcuref.txt>`__.
379 ``rcu_dereference()``, and these two RCU API elements work together to
382 Of course, it is also necessary to remove elements from RCU-protected
386 #. Wait for all pre-existing RCU read-side critical sections to complete
425 contrast, ``rcu_dereference()`` must either be within an RCU
455 In short, RCU's publish-subscribe guarantee is provided by the
457 guarantee allows data elements to be safely added to RCU-protected
458 linked data structures without disrupting RCU readers. This guarantee
460 data elements to be removed from RCU-protected linked data structures,
461 again without disrupting RCU readers.
483 demonstrates the need for RCU's stringent memory-ordering guarantees on
486 #. Each CPU that has an RCU read-side critical section that begins
488 memory barrier between the time that the RCU read-side critical
490 this guarantee, a pre-existing RCU read-side critical section might
493 #. Each CPU that has an RCU read-side critical section that ends after
496 time that the RCU read-side critical section begins. Without this
497 guarantee, a later RCU read-side critical section running after the
516 | Given that multiple CPUs can start RCU read-side critical sections at |
517 | any time without any ordering whatsoever, how can RCU possibly tell |
518 | whether or not a given RCU read-side critical section starts before a |
523 | If RCU cannot tell whether or not a given RCU read-side critical |
525 | it must assume that the RCU read-side critical section started first. |
527 | waiting on a given RCU read-side critical section only if it can |
533 | within the enclosed RCU read-side critical section to the code |
535 | then a given RCU read-side critical section begins before a given |
541 | As of late 2016, mathematical models of RCU take this viewpoint, for |
570 | end of the RCU read-side critical section and the end of the grace |
587 | grace period and the beginning of the RCU read-side critical section, |
602 | compiler might arbitrarily rearrange consecutive RCU read-side |
603 | critical sections. Given such rearrangement, if a given RCU read-side |
604 | critical section is done, how can you be sure that all prior RCU |
611 | absolutely no code, RCU infers quiescent states only at special |
615 | RCU detects the end of a given RCU read-side critical section, it |
616 | will necessarily detect the end of all prior RCU read-side critical |
621 | that sort of scrambling, you have broken far more than just RCU! |
625 fundamental RCU requirement that a grace period wait for all
631 RCU Primitives Guaranteed to Execute Unconditionally
634 The common-case RCU primitives are unconditional. They are invoked, they
636 to retry. This is a key RCU design philosophy.
639 comes up with a good justification for a particular conditional RCU
642 nature of the RCU primitives was initially an accident of
646 primitive to RCU would need to be based on detailed and compelling use
652 As far as RCU is concerned, it is always possible to carry out an update
653 within an RCU read-side critical section. For example, that RCU
656 element, all while remaining in that RCU read-side critical section. Of
657 course, it is necessary to exit the RCU read-side critical section
669 | It doesn't, just like normal RCU updates, which also do not exclude |
670 | RCU readers. |
675 DYNIX/ptx RCU documentation.
680 RCU provides extremely lightweight readers, and its read-side
682 therefore all too easy to assume that RCU is guaranteeing more than it
683 really is. Of course, the list of things that RCU does not guarantee is
811 It is tempting to assume that if any part of one RCU read-side critical
812 section precedes a given grace period, and if any part of another RCU
814 the first RCU read-side critical section must precede all of the second.
816 partition the set of RCU read-side critical sections. An example of this
852 with each circled ``QS`` indicating the point at which RCU recorded a
853 *quiescent state* for each thread, that is, a state in which RCU knows
854 that the thread cannot be in the midst of an RCU read-side critical
859 If it is necessary to partition RCU read-side critical sections in this
901 the two RCU read-side critical sections cannot overlap, guaranteeing
911 studying RCU's interaction with memory ordering.
916 It is also tempting to assume that if an RCU read-side critical section
971 Again, an RCU read-side critical section can overlap almost all of a
973 period. As a result, an RCU read-side critical section cannot partition
974 a pair of RCU grace periods.
979 | How long a sequence of grace periods, each separated by an RCU |
980 | read-side critical section, would be required to partition the RCU |
987 | Therefore, even in practice, RCU users must abide by the theoretical |
994 These parallelism facts of life are by no means specific to RCU, but the
995 RCU implementation must abide by them. They therefore bear repeating:
1004 than about 20 seconds can result in splats, the RCU implementation is
1009 matters, RCU must use compiler directives and memory-barrier
1014 dramatic slowdowns. RCU is therefore obligated to use algorithms that
1018 carried out under the protection of any given exclusive lock. RCU
1020 #. Counters are finite, especially on 32-bit systems. RCU's use of
1024 a century much less so. As an example of the latter, RCU's
1032 kernel in a single shared-memory environment. RCU must therefore pay
1035 This last parallelism fact of life means that RCU must pay special
1045 RCU implementation that ignores these requirements could still be used,
1061 RCU is and always has been intended primarily for read-mostly
1062 situations, which means that RCU's read-side primitives are optimized,
1067 RCU works great!
1068 #. Read-mostly data, where data must be consistent: RCU works well.
1069 #. Read-write data, where data must be consistent: RCU *might* work OK.
1071 #. Write-mostly data, where data must be consistent: RCU is very
1073 exceptions, where RCU can provide:
1078 This focus on read-mostly situations means that RCU must interoperate
1080 ``remove_gp_synchronous()`` examples discussed earlier use RCU to
1083 primitives be legal within RCU read-side critical sections, including
1094 | These are forbidden within Linux-kernel RCU read-side critical |
1096 | case, voluntary context switch) within an RCU read-side critical |
1097 | section. However, sleeping locks may be used within userspace RCU |
1099 | RCU `(SRCU) <#Sleepable%20RCU>`__ read-side critical sections. In |
1103 | locks!) may be acquire within -rt-Linux-kernel RCU read-side critical |
1105 | Note that it *is* legal for a normal RCU read-side critical section |
1152 to the corresponding in-kernel data structure is protected by RCU, but
1156 within the RCU read-side critical section, and this is indicated by the
1160 In short, RCU is not required to maintain consistency, and other
1161 mechanisms may be used in concert with RCU when consistency is required.
1162 RCU's specialization allows it to do its job extremely well, and its
1170 Linux-kernel RCU implementations must therefore avoid unnecessarily
1175 energy-efficiency shortcomings of the Linux-kernel RCU implementation.
1186 RCU <https://lkml.kernel.org/g/20090113221724.GA15307@linux.vnet.ibm.com>`__
1189 project, which resulted in `SRCU <#Sleepable%20RCU>`__ becoming optional
1193 unsurprising. For example, in keeping with RCU's read-side
1199 In preemptible environments, in the case where the RCU read-side
1205 branches. However, in the case where the RCU read-side critical section
1207 interrupts. This is why it is better to nest an RCU read-side critical
1214 addition to the duration of the longest RCU read-side critical section.
1220 invocations <https://www.usenix.org/conference/2004-usenix-annual-technical-conference/making-rcu-s…
1229 microseconds on small systems, at least in cases where the RCU read-side
1232 consistent with the empirical nature of the RCU specification, that is
1291 userspace, it is bad practice to write an RCU callback function that
1347 fact that RCU was not heavily used within DYNIX/ptx, so the very few
1400 On line 14, ``get_state_synchronize_rcu()`` obtains a “cookie” from RCU,
1407 RCU thus provides a range of tools to allow updaters to strike the
1416 difficult to distinguish from system hangs. Therefore, RCU must provide
1420 example, an infinite loop in an RCU read-side critical section must by
1430 responsibility. However, short of this level of abuse, RCU is required
1434 RCU takes the following steps to encourage timely completion of grace
1437 #. If a grace period fails to complete within 100 milliseconds, RCU
1439 to provide an RCU quiescent state. RCU also causes those CPUs'
1444 defeats the above ``need_resched()`` strategem. RCU will therefore
1448 has been preempted within an RCU read-side critical section is
1449 holding out for more than 500 milliseconds, RCU will resort to
1451 #. If a CPU is still holding out 10 seconds into the grace period, RCU
1457 the relevant Kconfig options and kernel boot parameters. RCU currently
1460 are provided only for RCU, not for `SRCU <#Sleepable%20RCU>`__ or `Tasks
1461 RCU <#Tasks%20RCU>`__.
1463 RCU takes the following steps in ``call_rcu()`` to encourage timely
1480 RCU, not for `SRCU <#Sleepable%20RCU>`__ or `Tasks
1481 RCU <#Tasks%20RCU>`__. Even for RCU, callback-invocation forward
1494 use. And in theory, RCU read-side critical sections may be composed, and
1499 Implementations of RCU for which ``rcu_read_lock()`` and
1500 ``rcu_read_unlock()`` generate no code, such as Linux-kernel RCU when
1510 kernel) you will get an RCU CPU stall warning. Nevertheless, this class
1511 of RCU implementations is one of the most composable constructs in
1514 RCU implementations that explicitly track nesting depth are limited by
1516 RCU limits nesting to ``INT_MAX``. This should suffice for almost all
1517 practical purposes. That said, a consecutive pair of RCU read-side
1519 grace period cannot be enclosed in another RCU read-side critical
1521 within an RCU read-side critical section: To do so would result either
1522 in deadlock or in RCU implicitly splitting the enclosing RCU read-side
1526 It is worth noting that RCU is not alone in limiting composability. For
1533 In short, although RCU read-side critical sections are highly
1540 A given RCU workload might have an endless and intense stream of RCU
1542 never a point in time during which there was not at least one RCU
1543 read-side critical section in flight. RCU cannot allow this situation to
1544 block grace periods: As long as all the RCU read-side critical sections
1547 That said, preemptible RCU implementations could potentially result in
1548 RCU read-side critical sections being preempted for long durations,
1549 which has the effect of creating a long-duration RCU read-side critical
1552 Therefore, RCU priority boosting is provided to help deal with this
1553 case. That said, the exact requirements on RCU priority boosting will
1557 argue that such workloads should instead use something other than RCU,
1558 the fact remains that RCU must handle such workloads gracefully. This
1561 RCU callbacks in the ``call_rcu()`` code path. Finally, high update
1562 rates should not delay RCU read-side critical sections, although some
1571 motivated addition of some RCU code to react to high update rates, for
1572 example, if a given CPU finds itself with more than 10,000 RCU callbacks
1573 queued, it will cause RCU to take evasive action by more aggressively
1576 complete more quickly, but at the cost of restricting RCU's batching
1588 splat if ``rcu_dereference()`` is used outside of an RCU read-side
1601 that it has been invoked within an RCU read-side critical section. I
1603 Gleixner audited a number of RCU uses.
1604 #. A given function might wish to check for RCU-related preconditions
1605 upon entry, before using any other RCU API. The
1610 assignment. To catch this sort of error, a given RCU-protected
1628 #. An infinite loop in an RCU read-side critical section will eventually
1629 trigger an RCU CPU stall warning splat, with the duration of
1632 ``rcupdate.rcu_cpu_stall_timeout`` boot/sysfs parameter. However, RCU
1634 waiting on that particular RCU read-side critical section.
1636 Some extreme workloads might intentionally delay RCU grace periods,
1639 kernel parameter may also be set via ``sysfs``. Furthermore, RCU CPU
1641 panics. RCU therefore supplies the ``rcu_sysrq_start()`` and
1643 sysrq dumps. RCU also supplies the ``rcu_panic()`` notifier that is
1645 RCU CPU stall warnings.
1652 #. Although it would be very good to detect pointers leaking out of RCU
1655 leaking and pointers that have been handed off from RCU to some other
1657 #. In kernels built with ``CONFIG_RCU_TRACE=y``, RCU-related information
1661 error-prone. Therefore, RCU-protected `linked
1662 lists <https://lwn.net/Articles/609973/#RCU%20List%20APIs>`__ and,
1663 more recently, RCU-protected `hash
1665 other special-purpose RCU-protected data structures are available in
1666 the Linux kernel and the userspace RCU library.
1675 This not a hard-and-fast list: RCU's diagnostic capabilities will
1677 real-world RCU usage.
1683 software, including RCU. Some of the relevant points of interest are as
1692 #. `Scheduler and RCU`_
1693 #. `Tracing and RCU`_
1694 #. `Accesses to User Memory and RCU`_
1696 #. `Scheduling-Clock Interrupts and RCU`_
1707 RCU's goal is automatic configuration, so that almost nobody needs to
1708 worry about RCU's ``Kconfig`` options. And for almost all users, RCU
1730 sometimes by a large factor. If RCU naively believed the firmware, as it
1736 RCU must therefore wait for a given CPU to actually come online before
1745 The Linux kernel's boot sequence is an interesting process, and RCU is
1747 RCU's primitives can be used as soon as the initial task's
1755 callbacks are not guaranteed to be invoked until after all of RCU's
1757 This delay in callback invocation is due to the fact that RCU does not
1760 itself to the point where RCU can spawn and run its kthreads. In theory,
1775 reason is that an RCU read-side critical section might be preempted,
1780 ``early_initcalls()`` time. But this is no excuse: RCU is nevertheless
1782 period. Once all of its kthreads are up and running, RCU starts running
1788 | How can RCU possibly handle grace periods before all of its kthreads |
1795 | first task and the time that all of RCU's kthreads have been spawned, |
1808 | kthread and the time that RCU's kthreads have all been spawned. If |
1821 The Linux kernel has interrupts, and RCU read-side critical sections are
1829 “half-interrupts” mean that RCU has to be very careful about how it
1831 way during a rewrite of RCU's dyntick-idle code.
1833 The Linux kernel has non-maskable interrupts (NMIs), and RCU read-side
1834 critical sections are legal within NMI handlers. Thankfully, RCU
1839 nested NMIs, which RCU must handle correctly. Andy Lutomirski `surprised
1845 Furthermore, NMI handlers can be interrupted by what appear to RCU to be
1863 Unfortunately, there is no way to cancel an RCU callback; once you
1867 module unload request, we need some other way to deal with in-flight RCU
1870 RCU therefore provides ``rcu_barrier()``, which waits until all
1871 in-flight RCU callbacks have been invoked. If a module uses
1880 ``rcu_barrier()`` into RCU. The need for ``rcu_barrier()`` for module
1887 to wait for RCU callbacks that have already been posted. Therefore, if
1888 there are no RCU callbacks posted anywhere in the system,
1896 | Wait a minute! Each RCU callbacks must wait for a grace period to |
1905 | Yes, each RCU callbacks must wait for a grace period to complete, but |
1922 and go. It is of course illegal to use any RCU API member from an
1923 offline CPU, with the exception of `SRCU <#Sleepable%20RCU>`__ read-side
1929 to allow the various kernel subsystems (including RCU) to respond
1930 appropriately to a given CPU-hotplug operation. Most RCU operations may
1943 Scheduler and RCU
1946 RCU makes use of kthreads, and it is necessary to avoid excessive CPU-time
1948 RCU's violation of it when running context-switch-heavy workloads when
1951 RCU has made good progress towards meeting this requirement, even for
1958 somewhere within the corresponding RCU read-side critical section.
1963 Similarly, the RCU flavor consolidation has removed the need for negative
1964 nesting. The fact that interrupt-disabled regions of code act as RCU
1966 to result in destructive recursion via interrupt handler's use of RCU.
1968 Tracing and RCU
1971 It is possible to use tracing on RCU code, but tracing itself uses RCU.
1975 where RCU readers execute in environments in which tracing cannot be
1979 Accesses to User Memory and RCU
1988 reorder a ``get_user()`` invocation into an RCU read-side critical section.
2015 state in the middle of an RCU read-side critical section. This misplaced
2030 of RCU read-side critical sections.
2036 by people with battery-powered embedded systems. RCU therefore conserves
2042 Because RCU avoids interrupting idle CPUs, it is illegal to execute an
2043 RCU read-side critical section on an idle CPU. (Kernels built with
2047 whether or not it is currently legal to run RCU read-side critical
2081 running in userspace. RCU must therefore track ``nohz_full`` userspace
2082 execution. RCU must therefore be able to sample state at two points in
2088 clean-sheet rewrites of RCU's energy-efficiency code, the last of which
2094 not sufficient to fully vent their ire at RCU's energy-efficiency bugs!
2096 Scheduling-Clock Interrupts and RCU
2100 execution, and the idle loop. Depending on kernel configuration, RCU
2107 | | scheduling-clock | scheduling-clock | RCU's |
2114 | | scheduling-clock | scheduling-clock | RCU's |
2121 | | sometimes rely | RCU's | RCU's |
2147 However, RCU must be reliably informed as to whether any given CPU is
2151 scheduling-clock interrupt be enabled when RCU needs it to be:
2153 #. If a CPU is either idle or executing in usermode, and RCU believes it
2155 Otherwise, you will get RCU CPU stall warnings. Or at best, very long
2158 #. If a CPU is in a portion of the kernel that executes RCU read-side
2159 critical sections, and RCU believes this CPU to be idle, you will get
2164 no-joking guaranteed to never execute any RCU read-side critical
2165 sections, and RCU believes this CPU to be idle, no problem. This
2173 fact joking about not doing RCU read-side critical sections.
2175 interrupt disabled and RCU believes this CPU to be non-idle, and if
2176 the CPU goes idle (from an RCU perspective) every few jiffies, no
2179 If the gap grows too long, you get RCU CPU stall warnings.
2180 #. If a CPU is either idle or executing in usermode, and RCU believes it
2188 long, you get RCU CPU stall warnings.
2205 But as long as RCU is properly informed of kernel state transitions
2207 as the scheduling-clock interrupt is enabled when RCU needs it to be,
2209 part of RCU or some other part of the kernel!
2214 Although small-memory non-realtime systems can simply use Tiny RCU, code
2218 pair of pointers, it does appear in many RCU-protected data structures,
2223 This need for memory efficiency is one reason that RCU uses hand-crafted
2241 ``->next`` field. RCU makes this guarantee as long as ``call_rcu()`` is
2246 That said, there are limits. RCU requires that the ``rcu_head``
2267 discussion <#Performance%20and%20Scalability>`__, RCU is used heavily by
2269 networking, security, virtualization, and scheduling code paths. RCU
2271 read-side primitives. To that end, it would be good if preemptible RCU's
2277 which means that RCU must be extremely scalable. Algorithms that involve
2279 global variables simply cannot be tolerated within the RCU
2280 implementation. RCU therefore makes heavy use of a combining tree based
2281 on the ``rcu_node`` structure. RCU is required to tolerate all CPUs
2282 continuously invoking any combination of RCU's runtime primitives with
2287 rule, RCU must cheerfully accept whatever the rest of the Linux kernel
2294 approach of disabling preemption across RCU read-side critical sections
2296 an RCU implementation that allows RCU read-side critical sections to be
2300 conjunction with some `RCU
2304 In addition, RCU must make do with a sub-100-microsecond real-time
2307 whole kernel, including RCU. RCU's scalability and latency must
2316 RCU must avoid degrading real-time response for CPU-bound threads,
2320 milliseconds in order to avoid receiving an IPI from RCU.
2322 Finally, RCU's status as a synchronization primitive means that any RCU
2324 difficult to debug. This means that RCU must be extremely reliable,
2325 which in practice also means that RCU must have an aggressive
2336 Suppose that RCU contains a race condition that manifests on average
2338 three times per *day* across the installed base. RCU could simply hide
2347 bug in RCU killed someone. Which might explain my recent focus on
2350 Other RCU Flavors
2353 One of the more surprising things about RCU is that there are now no
2362 #. `Sleepable RCU`_
2363 #. `Tasks RCU`_
2368 The RCU-bh flavor of RCU has since been expressed in terms of the other
2369 RCU flavors as part of a consolidation of the three flavors into a
2375 flavor of RCU, or *RCU-bh*, was developed by Dipankar Sarma to provide a
2376 flavor of RCU that could withstand the network-based denial-of-service
2380 context switch, which, in the RCU implementation of that time, prevented
2384 The solution was the creation of RCU-bh, which does
2388 offline. This means that RCU-bh grace periods can complete even when
2390 algorithms based on RCU-bh to withstand network-based denial-of-service
2395 during the RCU-bh read-side critical section will be deferred. In this
2398 overhead should be associated with the code following the RCU-bh
2402 RCU-bh read-side critical section executes during a time of heavy
2409 The `RCU-bh
2410 API <https://lwn.net/Articles/609973/#RCU%20Per-Flavor%20API%20Table>`__
2416 simple wrappers for other RCU flavors, namely RCU-sched in
2417 CONFIG_PREEMPT=n kernels and RCU-preempt otherwise.
2422 The RCU-sched flavor of RCU has since been expressed in terms of the
2423 other RCU flavors as part of a consolidation of the three flavors into a
2428 Before preemptible RCU, waiting for an RCU grace period had the side
2430 However, there are legitimate preemptible-RCU implementations that do
2432 RCU read-side critical section can be a quiescent state. Therefore,
2433 *RCU-sched* was created, which follows “classic” RCU in that an
2434 RCU-sched grace period waits for pre-existing interrupt and NMI
2435 handlers. In kernels built with ``CONFIG_PREEMPT=n``, the RCU and
2436 RCU-sched APIs have identical implementations, while kernels built with
2442 preemption attempt during the RCU-sched read-side critical section,
2450 The `RCU-sched
2451 API <https://lwn.net/Articles/609973/#RCU%20Per-Flavor%20API%20Table>`__
2458 preemption also marks an RCU-sched read-side critical section, including
2462 Sleepable RCU
2465 For well over a decade, someone saying “I need to block within an RCU
2467 did not understand RCU. After all, if you are always blocking in an RCU
2470 the advent of the Linux kernel's notifiers, whose RCU read-side critical
2472 introduction of `sleepable RCU <https://lwn.net/Articles/202847/>`__, or
2521 Unlike the other RCU flavors, SRCU read-side critical sections can run
2524 which means that SRCU readers will run a bit slower than would RCU
2529 Also unlike other RCU flavors, ``synchronize_srcu()`` may **not** be
2541 SRCU also differs from other RCU flavors in that SRCU's expedited and
2564 API <https://lwn.net/Articles/609973/#RCU%20Per-Flavor%20API%20Table>`__
2573 Tasks RCU
2579 RCU. However, because it is necessary to be able to install a trace
2594 RCU <https://lwn.net/Articles/607117/>`__, is to have implicit read-side
2598 userspace execution also delimit tasks-RCU read-side critical sections.
2600 The tasks-RCU API is quite compact, consisting only of
2612 One of the tricks that RCU uses to attain update-side scalability is to
2618 RCU disables CPU hotplug in a few places, perhaps most notably in the
2631 The multiprocessor implementations of RCU use a combining tree that
2649 Please note that arrangements that require RCU to remap CPU numbers will
2653 RCU's various kthreads are reasonably recent additions. It is quite
2656 utilization by RCU's kthreads and softirq handlers to the code that
2657 instigated this CPU utilization. For example, RCU callback overhead
2668 This document has presented more than two decade's worth of RCU