.. _whatisrcu_doc:

What is RCU? -- "Read, Copy, Update"
======================================

Please note that the "What is RCU?" LWN series is an excellent place
to start learning about RCU:

| 1. What is RCU, Fundamentally?   https://lwn.net/Articles/262464/
| 2. What is RCU? Part 2: Usage    https://lwn.net/Articles/263130/
| 3. RCU part 3: the RCU API       https://lwn.net/Articles/264090/
| 4. The RCU API, 2010 Edition     https://lwn.net/Articles/418853/
|    2010 Big API Table            https://lwn.net/Articles/419086/
| 5. The RCU API, 2014 Edition     https://lwn.net/Articles/609904/
|    2014 Big API Table            https://lwn.net/Articles/609973/
| 6. The RCU API, 2019 Edition     https://lwn.net/Articles/777036/
|    2019 Big API Table            https://lwn.net/Articles/777165/
| 7. The RCU API, 2024 Edition     https://lwn.net/Articles/988638/
|    2024 Background Information   https://lwn.net/Articles/988641/
|    2024 Big API Table            https://lwn.net/Articles/988666/

For those preferring video:

| 1. Unraveling RCU Mysteries: Fundamentals          https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries
| 2. Unraveling RCU Mysteries: Additional Use Cases  https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries-additional-use-cases


What is RCU?

RCU is a synchronization mechanism that was added to the Linux kernel
during the 2.5 development effort and that is optimized for read-mostly
situations.  Although RCU is actually quite simple, making effective use
of it requires you to think differently about your code.  Another part
of the problem is the mistaken assumption that there is "one true way" to
describe and to use RCU.  Instead, the experience has been that different
people must take different paths to arrive at an understanding of RCU,
depending on their experiences and use cases.  This document provides
several different paths, as follows:

:ref:`1. RCU OVERVIEW <1_whatisRCU>`

:ref:`2. WHAT IS RCU'S CORE API? <2_whatisRCU>`

:ref:`3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`

:ref:`4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`

:ref:`5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`

:ref:`6. ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`

:ref:`7. ANALOGY WITH REFERENCE COUNTING <7_whatisRCU>`

:ref:`8. FULL LIST OF RCU APIs <8_whatisRCU>`

:ref:`9. ANSWERS TO QUICK QUIZZES <9_whatisRCU>`

People who prefer starting with a conceptual overview should focus on
Section 1, though most readers will profit by reading this section at
some point.  People who prefer to start with an API that they can then
experiment with should focus on Section 2.  People who prefer to start
with example uses should focus on Sections 3 and 4.  People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code.  People who reason best by analogy should
focus on Sections 6 and 7.  Section 8 serves as an index to the docbook
API documentation, and Section 9 is the traditional answer key.

So, start with the section that makes the most sense to you and your
preferred method of learning.  If you need to know everything about
everything, feel free to read the whole thing -- but if you are really
that type of person, you have perused the source code and will therefore
never need this document anyway.  ;-)

.. _1_whatisRCU:

1.  RCU OVERVIEW
----------------

The basic idea behind RCU is to split updates into "removal" and
"reclamation" phases.  The removal phase removes references to data items
within a data structure (possibly by replacing them with references to
new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is that the semantics of modern CPUs guarantee that readers will
see either the old or the new version of the data structure rather than a
partially updated reference.  The reclamation phase does the work of
reclaiming (e.g., freeing) the data items removed from the data structure
during the removal phase.  Because reclaiming data items can disrupt any
readers concurrently referencing those data items, the reclamation phase
must not start until readers no longer hold references to those data items.

Splitting the update into removal and reclamation phases permits the
updater to perform the removal phase immediately, and to defer the
reclamation phase until all readers active during the removal phase have
completed, either by blocking until they finish or by registering a
callback that is invoked after they finish.  Only readers that are active
during the removal phase need be considered, because any reader starting
after the removal phase will be unable to gain a reference to the removed
data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following:

a.  Remove pointers to a data structure, so that subsequent
    readers cannot gain a reference to it.

b.  Wait for all previous readers to complete their RCU read-side
    critical sections.

c.  At this point, there cannot be any readers who hold references
    to the data structure, so it now may safely be reclaimed
    (e.g., kfree()d).

Step (b) above is the key idea underlying RCU's deferred destruction.
The ability to wait until all readers are done allows RCU readers to
use much lighter-weight synchronization, in some cases, absolutely no
synchronization at all.  In contrast, in more conventional lock-based
schemes, readers must use heavy-weight synchronization in order to
prevent an updater from deleting the data structure out from under them.
This is because lock-based updaters typically update data items in place,
and must therefore exclude readers.  In contrast, RCU-based updaters
typically take advantage of the fact that writes to single aligned
pointers are atomic on modern CPUs, allowing atomic insertion, removal,
and replacement of data items in a linked structure without disrupting
readers.  Concurrent RCU readers can then continue accessing the old
versions, and can dispense with the atomic operations, memory barriers,
and communications cache misses that are so expensive on present-day
SMP computer systems, even in the absence of lock contention.

In the three-step procedure shown above, the updater is performing both
the removal and the reclamation step, but it is often helpful for an
entirely different thread to do the reclamation, as is in fact the case
in the Linux kernel's directory-entry cache (dcache).  Even if the same
thread performs both the update step (step (a) above) and the reclamation
step (step (c) above), it is often helpful to think of them separately.
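
To make these three steps concrete, here is a minimal sketch of this
sequence, using APIs that are described in Section 2.  The RCU-protected
global pointer gptr, the lock update_lock that serializes updaters, and
struct foo are all hypothetical::

    struct foo *p;

    spin_lock(&update_lock);
    p = rcu_dereference_protected(gptr, lockdep_is_held(&update_lock));
    rcu_assign_pointer(gptr, NULL); /* (a) Remove the pointer. */
    spin_unlock(&update_lock);
    synchronize_rcu();              /* (b) Wait for pre-existing readers. */
    kfree(p);                       /* (c) Reclaim. */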

Note also that RCU readers and updaters need not communicate at all,
but RCU provides implicit low-overhead communication between readers
and reclaimers, namely, in step (b) above.

So how the heck can a reclaimer tell when a reader is done, given
that readers are not doing any sort of synchronization operations???
Read on to learn about how RCU's API makes this easy.

.. _2_whatisRCU:

2.  WHAT IS RCU'S CORE API?
---------------------------

The core RCU API is quite small:

a.  rcu_read_lock()
b.  rcu_read_unlock()
c.  synchronize_rcu() / call_rcu()
d.  rcu_assign_pointer()
e.  rcu_dereference()

There are many other members of the RCU API, but the rest can be
expressed in terms of these five, though most implementations instead
express synchronize_rcu() in terms of the call_rcu() callback API.

The five core RCU APIs are described below; the others are enumerated
later.  See the kernel docbook documentation for more info, or look
directly at the function header comments.

rcu_read_lock()
^^^^^^^^^^^^^^^
    void rcu_read_lock(void);

    This temporal primitive is used by a reader to inform the
    reclaimer that the reader is entering an RCU read-side critical
    section.  It is illegal to block while in an RCU read-side
    critical section, though kernels built with CONFIG_PREEMPT_RCU
    can preempt RCU read-side critical sections.  Any RCU-protected
    data structure accessed during an RCU read-side critical section
    is guaranteed to remain unreclaimed for the full duration of that
    critical section.  Reference counts may be used in conjunction
    with RCU to maintain longer-term references to data structures.

    Note that anything that disables bottom halves, preemption,
    or interrupts also enters an RCU read-side critical section.
    Acquiring a spinlock also enters an RCU read-side critical
    section, even for spinlocks that do not disable preemption,
    as is the case in kernels built with CONFIG_PREEMPT_RT=y.
    Sleeplocks do *not* enter RCU read-side critical sections.

rcu_read_unlock()
^^^^^^^^^^^^^^^^^
    void rcu_read_unlock(void);

    This temporal primitive is used by a reader to inform the
    reclaimer that the reader is exiting an RCU read-side critical
    section.  Anything that enables bottom halves, preemption,
    or interrupts also exits an RCU read-side critical section.
    Releasing a spinlock also exits an RCU read-side critical section.

    Note that RCU read-side critical sections may be nested and/or
    overlapping.
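
As a quick illustration of these two primitives, a reader might do the
following (a sketch only; the RCU-protected pointer gptr, struct foo,
and do_something_with() are hypothetical, and rcu_dereference() is
described later in this section)::

    struct foo *p;

    rcu_read_lock();
    p = rcu_dereference(gptr);
    if (p)
        do_something_with(p);   /* Must not block. */
    rcu_read_unlock();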

synchronize_rcu()
^^^^^^^^^^^^^^^^^
    void synchronize_rcu(void);

    This temporal primitive marks the end of updater code and the
    beginning of reclaimer code.  It does this by blocking until
    all pre-existing RCU read-side critical sections on all CPUs
    have completed.  Note that synchronize_rcu() will **not**
    necessarily wait for any subsequent RCU read-side critical
    sections to complete.  For example, consider the following
    sequence of events::

            CPU 0                 CPU 1                  CPU 2
        ----------------- ------------------------- ---------------
        1.  rcu_read_lock()
        2.                    enters synchronize_rcu()
        3.                                               rcu_read_lock()
        4.  rcu_read_unlock()
        5.                    exits synchronize_rcu()
        6.                                               rcu_read_unlock()

    To reiterate, synchronize_rcu() waits only for ongoing RCU
    read-side critical sections to complete, not necessarily for
    any that begin after synchronize_rcu() is invoked.

    Of course, synchronize_rcu() does not necessarily return
    **immediately** after the last pre-existing RCU read-side critical
    section completes.  For one thing, there might well be scheduling
    delays.  For another thing, many RCU implementations process
    requests in batches in order to improve efficiency, which can
    further delay synchronize_rcu().

    Since synchronize_rcu() is the API that must figure out when
    readers are done, its implementation is key to RCU.  For RCU
    to be useful in all but the most read-intensive situations,
    synchronize_rcu()'s overhead must also be quite small.

    The call_rcu() API is an asynchronous callback form of
    synchronize_rcu(), and is described in more detail in a later
    section.  Instead of blocking, it registers a function and
    argument which are invoked after all ongoing RCU read-side
    critical sections have completed.  This callback variant is
    particularly useful in situations where it is illegal to block
    or where update-side performance is critically important.

    However, the call_rcu() API should not be used lightly, as use
    of the synchronize_rcu() API generally results in simpler code.
    In addition, the synchronize_rcu() API has the nice property
    of automatically limiting update rate should grace periods
    be delayed.  This property results in system resilience in the
    face of denial-of-service attacks.  Code using call_rcu() should
    limit update rate in order to gain this same sort of resilience.
    See checklist.rst for some approaches to limiting the update rate.

rcu_assign_pointer()
^^^^^^^^^^^^^^^^^^^^
    void rcu_assign_pointer(p, typeof(p) v);

    Yes, rcu_assign_pointer() **is** implemented as a macro, though
    it would be cool to be able to declare a function in this manner.
    (And there has been some discussion of adding overloaded functions
    to the C language, so who knows?)

    The updater uses this spatial macro to assign a new value to an
    RCU-protected pointer, in order to safely communicate the change
    in value from the updater to the reader.  This is a spatial (as
    opposed to temporal) macro.  It does not evaluate to an rvalue,
    but it does provide any compiler directives and memory-barrier
    instructions required for a given compiler or CPU architecture.
    Its ordering properties are that of a store-release operation,
    that is, any prior loads and stores required to initialize the
    structure are ordered before the store that publishes the pointer
    to that structure.

    Perhaps just as important, rcu_assign_pointer() serves to document
    (1) which pointers are protected by RCU and (2) the point at which
    a given structure becomes accessible to other CPUs.  That said,
    rcu_assign_pointer() is most frequently used indirectly, via
    the _rcu list-manipulation primitives such as list_add_rcu().
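
For example, an updater might initialize and then publish a new
structure as follows (a sketch; gptr and struct foo are again
hypothetical)::

    struct foo *p = kmalloc(sizeof(*p), GFP_KERNEL);

    if (p) {
        p->a = 42;                   /* Initialization is ordered... */
        rcu_assign_pointer(gptr, p); /* ...before this publication. */
    }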

rcu_dereference()
^^^^^^^^^^^^^^^^^
    typeof(p) rcu_dereference(p);

    Like rcu_assign_pointer(), rcu_dereference() must be implemented
    as a macro.

    The reader uses the spatial rcu_dereference() macro to fetch
    an RCU-protected pointer, which returns a value that may
    then be safely dereferenced.  Note that rcu_dereference()
    does not actually dereference the pointer; instead, it
    protects the pointer for later dereferencing.  It also
    executes any needed memory-barrier instructions for a given
    CPU architecture.  Currently, only Alpha needs memory barriers
    within rcu_dereference() -- on other CPUs, it compiles to a
    volatile load.  However, no mainstream C compilers respect
    address dependencies, so rcu_dereference() uses volatile casts,
    which, in combination with the coding guidelines listed in
    rcu_dereference.rst, prevent current compilers from breaking
    these dependencies.

    Common coding practice uses rcu_dereference() to copy an
    RCU-protected pointer to a local variable, then dereferences
    this local variable, for example as follows::

        p = rcu_dereference(head.next);
        return p->data;

    However, in this case, one could just as easily combine these
    into one statement::

        return rcu_dereference(head.next)->data;

    If you are going to be fetching multiple fields from the
    RCU-protected structure, using the local variable is of
    course preferred.  Repeated rcu_dereference() calls look
    ugly, do not guarantee that the same pointer will be returned
    if an update happened while in the critical section, and incur
    unnecessary overhead on Alpha CPUs.

    Note that the value returned by rcu_dereference() is valid
    only within the enclosing RCU read-side critical section [1]_.
    For example, the following is **not** legal::

        rcu_read_lock();
        p = rcu_dereference(head.next);
        rcu_read_unlock();
        x = p->address; /* BUG!!! */
        rcu_read_lock();
        y = p->data;    /* BUG!!! */
        rcu_read_unlock();

    Holding a reference from one RCU read-side critical section
    to another is just as illegal as holding a reference from
    one lock-based critical section to another!  Similarly,
    using a reference outside of the critical section in which
    it was acquired is just as illegal as doing so with normal
    locking.

    As with rcu_assign_pointer(), an important function of
    rcu_dereference() is to document which pointers are protected by
    RCU, in particular, flagging a pointer that is subject to changing
    at any time, including immediately after the rcu_dereference().
    And, again like rcu_assign_pointer(), rcu_dereference() is
    typically used indirectly, via the _rcu list-manipulation
    primitives, such as list_for_each_entry_rcu() [2]_.

.. [1] The variant rcu_dereference_protected() can be used outside
       of an RCU read-side critical section as long as the usage is
       protected by locks acquired by the update-side code.  This variant
       avoids the lockdep warning that would happen when using (for
       example) rcu_dereference() without rcu_read_lock() protection.
       Using rcu_dereference_protected() also has the advantage
       of permitting compiler optimizations that rcu_dereference()
       must prohibit.  The rcu_dereference_protected() variant takes
       a lockdep expression to indicate which locks must be acquired
       by the caller.  If the indicated protection is not provided,
       a lockdep splat is emitted.  See Design/Requirements/Requirements.rst
       and the API's code comments for more details and example usage.

.. [2] If the list_for_each_entry_rcu() instance might be used by
       update-side code as well as by RCU readers, then an additional
       lockdep expression can be added to its list of arguments.
       For example, given an additional "lockdep_is_held(&mylock)"
       argument, the RCU lockdep code would complain only if this
       instance was invoked outside of an RCU read-side critical
       section and without the protection of mylock.
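
For example, the dual-use case in footnote [2] might look as follows
(a sketch; the list mylist, the lock mylock, and struct foo with its
list field are all hypothetical)::

    struct foo *p;

    list_for_each_entry_rcu(p, &mylist, list,
                            lockdep_is_held(&mylock)) {
        do_something_with(p);
    }

Here lockdep complains only if this loop runs with neither
rcu_read_lock() nor mylock held.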

The following diagram shows how each API communicates among the
reader, updater, and reclaimer.
::


        rcu_assign_pointer()
                                +--------+
        +---------------------->| reader |---------+
        |                       +--------+         |
        |                           |              |
        |                           |              | Protect:
        |                           |              | rcu_read_lock()
        |                           |              | rcu_read_unlock()
        |        rcu_dereference()  |              |
        +---------+                 |              |
        | updater |<----------------+              |
        +---------+                                V
        |                                    +-----------+
        +----------------------------------->| reclaimer |
                                             +-----------+
          Defer:
          synchronize_rcu() & call_rcu()


The RCU infrastructure observes the temporal sequence of rcu_read_lock(),
rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
order to determine when (1) synchronize_rcu() invocations may return
to their callers and (2) call_rcu() callbacks may be invoked.  Efficient
implementations of the RCU infrastructure make heavy use of batching in
order to amortize their overhead over many uses of the corresponding APIs.
The rcu_assign_pointer() and rcu_dereference() invocations communicate
spatial changes via stores to and loads from the RCU-protected pointer in
question.

There are at least three flavors of RCU usage in the Linux kernel.  The
diagram above shows the most common one.  On the updater side, the
rcu_assign_pointer(), synchronize_rcu() and call_rcu() primitives used
are the same for all three flavors.  However, for protection (on the
reader side), the primitives used vary depending on the flavor:

a.  rcu_read_lock() / rcu_read_unlock()
    rcu_dereference()

b.  rcu_read_lock_bh() / rcu_read_unlock_bh()
    local_bh_disable() / local_bh_enable()
    rcu_dereference_bh()

c.  rcu_read_lock_sched() / rcu_read_unlock_sched()
    preempt_disable() / preempt_enable()
    local_irq_save() / local_irq_restore()
    hardirq enter / hardirq exit
    NMI enter / NMI exit
    rcu_dereference_sched()

These three flavors are used as follows:

a.  RCU applied to normal data structures.

b.  RCU applied to networking data structures that may be subjected
    to remote denial-of-service attacks.

c.  RCU applied to scheduler and interrupt/NMI-handler tasks.

Again, most uses will be of (a).  The (b) and (c) cases are important
for specialized uses, but are relatively uncommon.  The SRCU, RCU-Tasks,
RCU-Tasks-Rude, and RCU-Tasks-Trace flavors have similar relationships
among their assorted primitives.
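
For example, a reader using the bh flavor (b) might look as follows,
with gptr once again a hypothetical RCU-protected pointer::

    struct foo *p;

    rcu_read_lock_bh();
    p = rcu_dereference_bh(gptr);
    if (p)
        do_something_with(p);
    rcu_read_unlock_bh();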

.. _3_whatisRCU:

3.  WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
-----------------------------------------------

This section shows a simple use of the core RCU API to protect a
global pointer to a dynamically allocated structure.  More-typical
uses of RCU may be found in listRCU.rst and NMI-RCU.rst.
::

    struct foo {
        int a;
        char b;
        long c;
    };
    DEFINE_SPINLOCK(foo_mutex);

    struct foo __rcu *gbl_foo;

    /*
     * Create a new struct foo that is the same as the one currently
     * pointed to by gbl_foo, except that field "a" is replaced
     * with "new_a".  Points gbl_foo to the new structure, and
     * frees up the old structure after a grace period.
     *
     * Uses rcu_assign_pointer() to ensure that concurrent readers
     * see the initialized version of the new structure.
     *
     * Uses synchronize_rcu() to ensure that any readers that might
     * have references to the old structure complete before freeing
     * the old structure.
     */
    void foo_update_a(int new_a)
    {
        struct foo *new_fp;
        struct foo *old_fp;

        new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
        spin_lock(&foo_mutex);
        old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
        *new_fp = *old_fp;
        new_fp->a = new_a;
        rcu_assign_pointer(gbl_foo, new_fp);
        spin_unlock(&foo_mutex);
        synchronize_rcu();
        kfree(old_fp);
    }

    /*
     * Return the value of field "a" of the current gbl_foo
     * structure.  Use rcu_read_lock() and rcu_read_unlock()
     * to ensure that the structure does not get deleted out
     * from under us, and use rcu_dereference() to ensure that
     * we see the initialized version of the structure (important
     * for DEC Alpha and for people reading the code).
     */
    int foo_get_a(void)
    {
        int retval;

        rcu_read_lock();
        retval = rcu_dereference(gbl_foo)->a;
        rcu_read_unlock();
        return retval;
    }
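
If a reader instead needed several fields from the structure, it would
copy the pointer to a local variable, per the advice in Section 2.  A
hypothetical foo_get_bc(), reusing this section's definitions, might
look as follows::

    void foo_get_bc(char *b, long *c)
    {
        struct foo *fp;

        rcu_read_lock();
        fp = rcu_dereference(gbl_foo);
        *b = fp->b;
        *c = fp->c;
        rcu_read_unlock();
    }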

So, to sum up:

-  Use rcu_read_lock() and rcu_read_unlock() to guard RCU
   read-side critical sections.

-  Within an RCU read-side critical section, use rcu_dereference()
   to dereference RCU-protected pointers.

-  Use some solid design (such as locks or semaphores) to
   keep concurrent updates from interfering with each other.

-  Use rcu_assign_pointer() to update an RCU-protected pointer.
   This primitive protects concurrent readers from the updater,
   **not** concurrent updates from each other!  You therefore still
   need to use locking (or something similar) to keep concurrent
   rcu_assign_pointer() primitives from interfering with each other.

-  Use synchronize_rcu() **after** removing a data element from an
   RCU-protected data structure, but **before** reclaiming/freeing
   the data element, in order to wait for the completion of all
   RCU read-side critical sections that might be referencing that
   data item.

See checklist.rst for additional rules to follow when using RCU.
And again, more-typical uses of RCU may be found in listRCU.rst
and NMI-RCU.rst.

.. _4_whatisRCU:

4.  WHAT IF MY UPDATING THREAD CANNOT BLOCK?
--------------------------------------------

In the example above, foo_update_a() blocks until a grace period elapses.
This is quite simple, but in some cases one cannot afford to wait so
long -- there might be other high-priority work to be done.

In such cases, one uses call_rcu() rather than synchronize_rcu().
The call_rcu() API is as follows::

    void call_rcu(struct rcu_head *head, rcu_callback_t func);

This function invokes func(head) after a grace period has elapsed.
This invocation might happen from either softirq or process context,
so the function is not permitted to block.  The foo struct needs to
have an rcu_head structure added, perhaps as follows::

    struct foo {
        int a;
        char b;
        long c;
        struct rcu_head rcu;
    };

The foo_update_a() function might then be written as follows::

    /*
     * Create a new struct foo that is the same as the one currently
     * pointed to by gbl_foo, except that field "a" is replaced
     * with "new_a".  Points gbl_foo to the new structure, and
     * frees up the old structure after a grace period.
     *
     * Uses rcu_assign_pointer() to ensure that concurrent readers
     * see the initialized version of the new structure.
     *
     * Uses call_rcu() to ensure that any readers that might have
     * references to the old structure complete before freeing the
     * old structure.
     */
    void foo_update_a(int new_a)
    {
        struct foo *new_fp;
        struct foo *old_fp;

        new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
        spin_lock(&foo_mutex);
        old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
        *new_fp = *old_fp;
        new_fp->a = new_a;
        rcu_assign_pointer(gbl_foo, new_fp);
        spin_unlock(&foo_mutex);
        call_rcu(&old_fp->rcu, foo_reclaim);
    }

The foo_reclaim() function might appear as follows::

    void foo_reclaim(struct rcu_head *rp)
    {
        struct foo *fp = container_of(rp, struct foo, rcu);

        foo_cleanup(fp->a);

        kfree(fp);
    }

The container_of() primitive is a macro that, given a pointer into a
struct, the type of the struct, and the pointed-to field within the
struct, returns a pointer to the beginning of the struct.

The use of call_rcu() permits the caller of foo_update_a() to
immediately regain control, without needing to worry further about the
old version of the newly updated element.  It also clearly shows the
RCU distinction between updater, namely foo_update_a(), and reclaimer,
namely foo_reclaim().

The summary of advice is the same as for the previous section, except
that we are now using call_rcu() rather than synchronize_rcu():

-  Use call_rcu() **after** removing a data element from an
   RCU-protected data structure in order to register a callback
   function that will be invoked after the completion of all RCU
   read-side critical sections that might be referencing that
   data item.

If the callback for call_rcu() is not doing anything more than calling
kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
to avoid having to write your own callback::

    kfree_rcu(old_fp, rcu);

If the occasional sleep is permitted, the single-argument form may
be used, omitting the rcu_head structure from struct foo::

    kfree_rcu_mightsleep(old_fp);

This variant almost never blocks, but might do so by invoking
synchronize_rcu() in response to memory-allocation failure.
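
Assuming that the reclaimer has no cleanup work beyond freeing the
structure (so that no foo_cleanup() call is needed), the two-argument
kfree_rcu() lets the foo_update_a() shown above shrink to the
following sketch, with no separate reclaimer function at all::

    void foo_update_a(int new_a)
    {
        struct foo *new_fp;
        struct foo *old_fp;

        new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
        spin_lock(&foo_mutex);
        old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
        *new_fp = *old_fp;
        new_fp->a = new_a;
        rcu_assign_pointer(gbl_foo, new_fp);
        spin_unlock(&foo_mutex);
        kfree_rcu(old_fp, rcu); /* Freed after a grace period. */
    }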
"TOY" IMPLEMENTATION #1: LOCKING 652^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 653This section presents a "toy" RCU implementation that is based on 654familiar locking primitives. Its overhead makes it a non-starter for 655real-life use, as does its lack of scalability. It is also unsuitable 656for realtime use, since it allows scheduling latency to "bleed" from 657one read-side critical section to another. It also assumes recursive 658reader-writer locks: If you try this with non-recursive locks, and 659you allow nested rcu_read_lock() calls, you can deadlock. 660 661However, it is probably the easiest implementation to relate to, so is 662a good starting point. 663 664It is extremely simple:: 665 666 static DEFINE_RWLOCK(rcu_gp_mutex); 667 668 void rcu_read_lock(void) 669 { 670 read_lock(&rcu_gp_mutex); 671 } 672 673 void rcu_read_unlock(void) 674 { 675 read_unlock(&rcu_gp_mutex); 676 } 677 678 void synchronize_rcu(void) 679 { 680 write_lock(&rcu_gp_mutex); 681 smp_mb__after_spinlock(); 682 write_unlock(&rcu_gp_mutex); 683 } 684 685[You can ignore rcu_assign_pointer() and rcu_dereference() without missing 686much. But here are simplified versions anyway. And whatever you do, 687don't forget about them when submitting patches making use of RCU!]:: 688 689 #define rcu_assign_pointer(p, v) \ 690 ({ \ 691 smp_store_release(&(p), (v)); \ 692 }) 693 694 #define rcu_dereference(p) \ 695 ({ \ 696 typeof(p) _________p1 = READ_ONCE(p); \ 697 (_________p1); \ 698 }) 699 700 701The rcu_read_lock() and rcu_read_unlock() primitive read-acquire 702and release a global reader-writer lock. The synchronize_rcu() 703primitive write-acquires this same lock, then releases it. This means 704that once synchronize_rcu() exits, all RCU read-side critical sections 705that were in progress before synchronize_rcu() was called are guaranteed 706to have completed -- there is no way that synchronize_rcu() would have 707been able to write-acquire the lock otherwise. The smp_mb__after_spinlock() 708promotes synchronize_rcu() to a full memory barrier in compliance with 709the "Memory-Barrier Guarantees" listed in: 710 711 Design/Requirements/Requirements.rst 712 713It is possible to nest rcu_read_lock(), since reader-writer locks may 714be recursively acquired. Note also that rcu_read_lock() is immune 715from deadlock (an important property of RCU). The reason for this is 716that the only thing that can block rcu_read_lock() is a synchronize_rcu(). 717But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex, 718so there can be no deadlock cycle. 719 720.. _quiz_1: 721 722Quick Quiz #1: 723 Why is this argument naive? How could a deadlock 724 occur when using this algorithm in a real-world Linux 725 kernel? How could this deadlock be avoided? 726 727:ref:`Answers to Quick Quiz <9_whatisRCU>` 728 7295B. "TOY" EXAMPLE #2: CLASSIC RCU 730^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 731This section presents a "toy" RCU implementation that is based on 732"classic RCU". It is also short on performance (but only for updates) and 733on features such as hotplug CPU and the ability to run in CONFIG_PREEMPTION 734kernels. The definitions of rcu_dereference() and rcu_assign_pointer() 735are the same as those shown in the preceding section, so they are omitted. 736:: 737 738 void rcu_read_lock(void) { } 739 740 void rcu_read_unlock(void) { } 741 742 void synchronize_rcu(void) 743 { 744 int cpu; 745 746 for_each_possible_cpu(cpu) 747 run_on(cpu); 748 } 749 750Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing. 

5B.  "TOY" EXAMPLE #2: CLASSIC RCU
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This section presents a "toy" RCU implementation that is based on
"classic RCU".  It is also short on performance (but only for updates) and
on features such as hotplug CPU and the ability to run in CONFIG_PREEMPTION
kernels.  The definitions of rcu_dereference() and rcu_assign_pointer()
are the same as those shown in the preceding section, so they are omitted.
::

    void rcu_read_lock(void) { }

    void rcu_read_unlock(void) { }

    void synchronize_rcu(void)
    {
        int cpu;

        for_each_possible_cpu(cpu)
            run_on(cpu);
    }

Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
This is the great strength of classic RCU in a non-preemptive kernel:
read-side overhead is precisely zero, at least on non-Alpha CPUs.
And there is absolutely no way that rcu_read_lock() can possibly
participate in a deadlock cycle!

The implementation of synchronize_rcu() simply schedules itself on each
CPU in turn.  The run_on() primitive can be implemented straightforwardly
in terms of the sched_setaffinity() primitive.  Of course, a somewhat less
"toy" implementation would restore the affinity upon completion rather
than just leaving all tasks running on the last CPU, but when I said
"toy", I meant **toy**!
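
For example, keeping with the toy theme, run_on() might be sketched
roughly as follows, making no attempt to restore the original affinity
or to handle errors::

    static void run_on(int cpu)
    {
        sched_setaffinity(current->pid, cpumask_of(cpu));
    }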

So how the heck is this supposed to work???

Remember that it is illegal to block while in an RCU read-side critical
section.  Therefore, if a given CPU executes a context switch, we know
that it must have completed all preceding RCU read-side critical sections.
Once **all** CPUs have executed a context switch, then **all** preceding
RCU read-side critical sections will have completed.

So, suppose that we remove a data item from its structure and then invoke
synchronize_rcu().  Once synchronize_rcu() returns, we are guaranteed
that there are no RCU read-side critical sections holding a reference
to that data item, so we can safely reclaim it.

.. _quiz_2:

Quick Quiz #2:
    Give an example where Classic RCU's read-side
    overhead is **negative**.

:ref:`Answers to Quick Quiz <9_whatisRCU>`

.. _quiz_3:

Quick Quiz #3:
    If it is illegal to block in an RCU read-side
    critical section, what the heck do you do in
    CONFIG_PREEMPT_RT, where normal spinlocks can block???

:ref:`Answers to Quick Quiz <9_whatisRCU>`

.. _6_whatisRCU:

6.  ANALOGY WITH READER-WRITER LOCKING
--------------------------------------

Although RCU can be used in many different ways, a very common use of
RCU is analogous to reader-writer locking.  The following unified
diff shows how closely related RCU and reader-writer locking can be.
::

    @@ -5,5 +5,5 @@ struct el {
        int data;
        /* Other data fields */
     };
    -rwlock_t listmutex;
    +spinlock_t listmutex;
     struct el head;

    @@ -13,15 +14,15 @@
        struct list_head *lp;
        struct el *p;

    -   read_lock(&listmutex);
    -   list_for_each_entry(p, head, lp) {
    +   rcu_read_lock();
    +   list_for_each_entry_rcu(p, head, lp) {
            if (p->key == key) {
                *result = p->data;
    -           read_unlock(&listmutex);
    +           rcu_read_unlock();
                return 1;
            }
        }
    -   read_unlock(&listmutex);
    +   rcu_read_unlock();
        return 0;
     }

    @@ -29,15 +30,16 @@
     {
        struct el *p;

    -   write_lock(&listmutex);
    +   spin_lock(&listmutex);
        list_for_each_entry(p, head, lp) {
            if (p->key == key) {
    -           list_del(&p->list);
    -           write_unlock(&listmutex);
    +           list_del_rcu(&p->list);
    +           spin_unlock(&listmutex);
    +           synchronize_rcu();
                kfree(p);
                return 1;
            }
        }
    -   write_unlock(&listmutex);
    +   spin_unlock(&listmutex);
        return 0;
     }

Or, for those who prefer a side-by-side listing::

    1 struct el {                           1 struct el {
    2   struct list_head list;              2   struct list_head list;
    3   long key;                           3   long key;
    4   spinlock_t mutex;                   4   spinlock_t mutex;
    5   int data;                           5   int data;
    6   /* Other data fields */             6   /* Other data fields */
    7 };                                    7 };
    8 rwlock_t listmutex;                   8 spinlock_t listmutex;
    9 struct el head;                       9 struct el head;

::

     1 int search(long key, int *result)     1 int search(long key, int *result)
     2 {                                     2 {
     3   struct list_head *lp;               3   struct list_head *lp;
     4   struct el *p;                       4   struct el *p;
     5                                       5
     6   read_lock(&listmutex);              6   rcu_read_lock();
     7   list_for_each_entry(p, head, lp) {  7   list_for_each_entry_rcu(p, head, lp) {
     8     if (p->key == key) {              8     if (p->key == key) {
     9       *result = p->data;              9       *result = p->data;
    10       read_unlock(&listmutex);       10       rcu_read_unlock();
    11       return 1;                      11       return 1;
    12     }                                12     }
    13   }                                  13   }
    14   read_unlock(&listmutex);           14   rcu_read_unlock();
    15   return 0;                          15   return 0;
    16 }                                    16 }

::

     1 int delete(long key)                  1 int delete(long key)
     2 {                                     2 {
     3   struct el *p;                       3   struct el *p;
     4                                       4
     5   write_lock(&listmutex);             5   spin_lock(&listmutex);
     6   list_for_each_entry(p, head, lp) {  6   list_for_each_entry(p, head, lp) {
     7     if (p->key == key) {              7     if (p->key == key) {
     8       list_del(&p->list);             8       list_del_rcu(&p->list);
     9       write_unlock(&listmutex);       9       spin_unlock(&listmutex);
                                            10       synchronize_rcu();
    10       kfree(p);                      11       kfree(p);
    11       return 1;                      12       return 1;
    12     }                                13     }
    13   }                                  14   }
    14   write_unlock(&listmutex);          15   spin_unlock(&listmutex);
    15   return 0;                          16   return 0;
    16 }                                    17 }

Either way, the differences are quite small.  Read-side locking moves
to rcu_read_lock() and rcu_read_unlock(), update-side locking moves from
a reader-writer lock to a simple spinlock, and a synchronize_rcu()
precedes the kfree().

However, there is one potential catch: the read-side and update-side
critical sections can now run concurrently.  In many cases, this will
not be a problem, but it is necessary to check carefully regardless.
For example, if multiple independent list updates must be seen as
a single atomic update, converting to RCU will require special care.

Also, the presence of synchronize_rcu() means that the RCU version of
delete() can now block.  If this is a problem, there is a callback-based
mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can
be used in place of synchronize_rcu().

.. _7_whatisRCU:

7.  ANALOGY WITH REFERENCE COUNTING
-----------------------------------

The reader-writer analogy (illustrated by the previous section) is not
always the best way to think about using RCU.  Another helpful analogy
considers RCU an effective reference count on everything which is
protected by RCU.

A reference count typically does not prevent the referenced object's
values from changing, but does prevent changes to type -- particularly the
gross change of type that happens when that object's memory is freed and
re-allocated for some other purpose.  Once a type-safe reference to the
object is obtained, some other mechanism is needed to ensure consistent
access to the data in the object.  This could involve taking a spinlock,
but with RCU the typical approach is to perform reads with SMP-aware
operations such as smp_load_acquire(), to perform updates with atomic
read-modify-write operations, and to provide the necessary ordering.
RCU provides a number of support functions that embed the required
operations and ordering, such as the list_for_each_entry_rcu() macro
used in the previous section.

A more focused view of the reference counting behavior is that,
between rcu_read_lock() and rcu_read_unlock(), any reference taken with
rcu_dereference() on a pointer marked as ``__rcu`` can be treated as
though a reference count on that object has been temporarily increased.
This prevents the object from changing type.  Exactly what this means
will depend on normal expectations of objects of that type, but it
typically includes that spinlocks can still be safely locked, normal
reference counters can be safely manipulated, and ``__rcu`` pointers
can be safely dereferenced.

Some operations that one might expect to see on an object for
which an RCU reference is held include:

-  Copying out data that is guaranteed to be stable by the object's type.

-  Using kref_get_unless_zero() or similar to get a longer-term
   reference, as sketched below.  This may fail of course.

-  Acquiring a spinlock in the object, and checking if the object still
   is the expected object and if so, manipulating it freely.
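
A minimal sketch of the kref_get_unless_zero() case follows, assuming a
hypothetical RCU-protected pointer gptr to an object containing a
struct kref named ref and having a release function foo_release()::

    struct foo *p;

    rcu_read_lock();
    p = rcu_dereference(gptr);
    if (p && !kref_get_unless_zero(&p->ref))
        p = NULL;   /* The object is already going away. */
    rcu_read_unlock();

    if (p) {
        /* p may now be used even after rcu_read_unlock()... */
        do_something_with(p);
        kref_put(&p->ref, foo_release); /* ...until this point. */
    }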

The understanding that RCU provides a reference that only prevents a
change of type is particularly visible with objects allocated from a
slab cache marked ``SLAB_TYPESAFE_BY_RCU``.  RCU operations may yield a
reference to an object from such a cache that has been concurrently freed
and the memory reallocated to a completely different object, though of
the same type.  In this case RCU doesn't even protect the identity of the
object from changing, only its type.  So the object found may not be the
one expected, but it will be one where it is safe to take a reference
(and then potentially to acquire a spinlock), allowing subsequent code
to check whether the identity matches expectations.  It is tempting
to simply acquire the spinlock without first taking the reference, but
unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
initialized after each and every call to kmem_cache_alloc(), which renders
reference-free spinlock acquisition completely unsafe.  Therefore, when
using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
If using refcount_t, the specialized refcount_{add|inc}_not_zero_acquire()
and refcount_set_release() APIs should be used to ensure correct operation
ordering when verifying object identity and when initializing newly
allocated objects.  The acquire fence in
refcount_{add|inc}_not_zero_acquire() ensures that identity checks happen
*after* the reference count is taken.  refcount_set_release() should be
called after a newly allocated object is fully initialized, and its
release fence ensures that the new values are visible *before* the
refcount can be successfully taken by other users.  Once
refcount_set_release() is called, the object should be considered visible
to other tasks.
(Those willing to initialize their locks in a kmem_cache constructor
may also use locking, including cache-friendly sequence locking.)

With traditional reference counting -- such as that implemented by the
kref library in Linux -- there is typically code that runs when the last
reference to an object is dropped.  With kref, this is the function
passed to kref_put().  When RCU is being used, such finalization code
must not be run until all ``__rcu`` pointers referencing the object have
been updated, and then a grace period has passed.  Every remaining
globally visible pointer to the object must be considered to be a
potential counted reference, and the finalization code is typically run
using call_rcu() only after all those pointers have been changed.

To see how to choose between these two analogies -- of RCU as a
reader-writer lock and RCU as a reference counting system -- it is useful
to reflect on the scale of the thing being protected.  The reader-writer
lock analogy looks at larger multi-part objects such as a linked list
and shows how RCU can facilitate concurrency while elements are added
to, and removed from, the list.  The reference-count analogy looks at
the individual objects and looks at how they can be accessed safely
within whatever whole they are a part of.

.. _8_whatisRCU:

8.  FULL LIST OF RCU APIs
-------------------------

The RCU APIs are documented in docbook-format header comments in the
Linux-kernel source code, but it helps to have a full list of the
APIs, since there does not appear to be a way to categorize them
in docbook.  Here is the list, by category.

RCU list traversal::

    list_entry_rcu
    list_entry_lockless
    list_first_entry_rcu
    list_next_rcu
    list_for_each_entry_rcu
    list_for_each_entry_continue_rcu
    list_for_each_entry_from_rcu
    list_first_or_null_rcu
    list_next_or_null_rcu
    hlist_first_rcu
    hlist_next_rcu
    hlist_pprev_rcu
    hlist_for_each_entry_rcu
    hlist_for_each_entry_rcu_bh
    hlist_for_each_entry_from_rcu
    hlist_for_each_entry_continue_rcu
    hlist_for_each_entry_continue_rcu_bh
    hlist_nulls_first_rcu
    hlist_nulls_for_each_entry_rcu
    hlist_bl_first_rcu
    hlist_bl_for_each_entry_rcu

RCU pointer/list update::

    rcu_assign_pointer
    list_add_rcu
    list_add_tail_rcu
    list_del_rcu
    list_replace_rcu
    hlist_add_behind_rcu
    hlist_add_before_rcu
    hlist_add_head_rcu
    hlist_add_tail_rcu
    hlist_del_rcu
    hlist_del_init_rcu
    hlist_replace_rcu
    list_splice_init_rcu
    list_splice_tail_init_rcu
    hlist_nulls_del_init_rcu
    hlist_nulls_del_rcu
    hlist_nulls_add_head_rcu
    hlist_bl_add_head_rcu
    hlist_bl_del_init_rcu
    hlist_bl_del_rcu
    hlist_bl_set_first_rcu

RCU::

    Critical sections           Grace period                Barrier

    rcu_read_lock               synchronize_net             rcu_barrier
    rcu_read_unlock             synchronize_rcu
    rcu_dereference             synchronize_rcu_expedited
    rcu_read_lock_held          call_rcu
    rcu_dereference_check       kfree_rcu
    rcu_dereference_protected

bh::

    Critical sections           Grace period                Barrier

    rcu_read_lock_bh            call_rcu                    rcu_barrier
    rcu_read_unlock_bh          synchronize_rcu
    [local_bh_disable]          synchronize_rcu_expedited
    [and friends]
    rcu_dereference_bh
    rcu_dereference_bh_check
    rcu_dereference_bh_protected
    rcu_read_lock_bh_held

sched::

    Critical sections           Grace period                Barrier

    rcu_read_lock_sched         call_rcu                    rcu_barrier
    rcu_read_unlock_sched       synchronize_rcu
    [preempt_disable]           synchronize_rcu_expedited
    [and friends]
    rcu_read_lock_sched_notrace
    rcu_read_unlock_sched_notrace
    rcu_dereference_sched
    rcu_dereference_sched_check
    rcu_dereference_sched_protected
    rcu_read_lock_sched_held


RCU-Tasks::

    Critical sections           Grace period                Barrier

    N/A                         call_rcu_tasks              rcu_barrier_tasks
                                synchronize_rcu_tasks


RCU-Tasks-Rude::

    Critical sections           Grace period                Barrier

    N/A                         synchronize_rcu_tasks_rude  N/A


RCU-Tasks-Trace::

    Critical sections           Grace period                Barrier

    rcu_read_lock_trace         call_rcu_tasks_trace        rcu_barrier_tasks_trace
    rcu_read_unlock_trace       synchronize_rcu_tasks_trace


SRCU::

    Critical sections           Grace period                Barrier

    srcu_read_lock              call_srcu                   srcu_barrier
    srcu_read_unlock            synchronize_srcu
    srcu_dereference            synchronize_srcu_expedited
    srcu_dereference_check
    srcu_read_lock_held

SRCU: Initialization/cleanup::

    DEFINE_SRCU
    DEFINE_STATIC_SRCU
    init_srcu_struct
    cleanup_srcu_struct

All: lockdep-checked RCU utility APIs::

    RCU_LOCKDEP_WARN
    rcu_sleep_check

All: Unchecked RCU-protected pointer access::

    rcu_dereference_raw

All: Unchecked RCU-protected pointer access with dereferencing prohibited::

    rcu_access_pointer

See the comment headers in the source code (or the docbook generated
from them) for more information.
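
As one example drawn from these tables, an SRCU reader uses a per-domain
srcu_struct and an index returned by srcu_read_lock().  In the following
sketch, my_srcu, gptr, and do_something_that_may_block() are all
hypothetical::

    DEFINE_SRCU(my_srcu);

    int idx;
    struct foo *p;

    idx = srcu_read_lock(&my_srcu);
    p = srcu_dereference(gptr, &my_srcu);
    if (p)
        do_something_that_may_block(p); /* SRCU readers may block. */
    srcu_read_unlock(&my_srcu, idx);

An updater would then wait for pre-existing readers in this domain
using synchronize_srcu(&my_srcu).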

However, given that there are no fewer than four families of RCU APIs
in the Linux kernel, how do you choose which one to use?  The following
list can be helpful:

a.  Will readers need to block?  If so, you need SRCU.

b.  Will readers need to block and are you doing tracing, for
    example, ftrace or BPF?  If so, you need RCU-tasks,
    RCU-tasks-rude, and/or RCU-tasks-trace.

c.  What about the -rt patchset?  If readers would need to block in
    a non-rt kernel, you need SRCU.  If readers would block when
    acquiring spinlocks in a -rt kernel, but not in a non-rt kernel,
    SRCU is not necessary.  (The -rt patchset turns spinlocks into
    sleeplocks, hence this distinction.)

d.  Do you need to treat NMI handlers, hardirq handlers,
    and code segments with preemption disabled (whether
    via preempt_disable(), local_irq_save(), local_bh_disable(),
    or some other mechanism) as if they were explicit RCU readers?
    If so, RCU-sched readers are the only choice that will work
    for you, but since about v4.20 you can use the vanilla RCU
    update primitives.

e.  Do you need RCU grace periods to complete even in the face of
    softirq monopolization of one or more of the CPUs?  For example,
    is your code subject to network-based denial-of-service attacks?
    If so, you should disable softirq across your readers, for
    example, by using rcu_read_lock_bh().  Since about v4.20 you
    can use the vanilla RCU update primitives.

f.  Is your workload too update-intensive for normal use of
    RCU, but inappropriate for other synchronization mechanisms?
    If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
    named SLAB_DESTROY_BY_RCU).  But please be careful!

g.  Do you need read-side critical sections that are respected even
    on CPUs that are deep in the idle loop, during entry to or exit
    from user-mode execution, or on an offlined CPU?  If so, SRCU
    and RCU Tasks Trace are the only choices that will work for you,
    with SRCU being strongly preferred in almost all cases.

h.  Otherwise, use RCU.

Of course, this all assumes that you have determined that RCU is in fact
the right tool for your job.

.. _9_whatisRCU:

9.  ANSWERS TO QUICK QUIZZES
----------------------------

Quick Quiz #1:
    Why is this argument naive?  How could a deadlock
    occur when using this algorithm in a real-world Linux
    kernel?  [Referring to the lock-based "toy" RCU
    algorithm.]

Answer:
    Consider the following sequence of events:

    1.  CPU 0 acquires some unrelated lock, call it
        "problematic_lock", disabling irq via
        spin_lock_irqsave().

    2.  CPU 1 enters synchronize_rcu(), write-acquiring
        rcu_gp_mutex.

    3.  CPU 0 enters rcu_read_lock(), but must wait
        because CPU 1 holds rcu_gp_mutex.

    4.  CPU 1 is interrupted, and the irq handler
        attempts to acquire problematic_lock.

    The system is now deadlocked.

    One way to avoid this deadlock is to use an approach like
    that of CONFIG_PREEMPT_RT, where all normal spinlocks
    become blocking locks, and all irq handlers execute in
    the context of special tasks.  In this case, in step 4
    above, the irq handler would block, allowing CPU 1 to
    release rcu_gp_mutex, avoiding the deadlock.

    Even in the absence of deadlock, this RCU implementation
    allows latency to "bleed" from readers to other
    readers through synchronize_rcu().  To see this,
    consider task A in an RCU read-side critical section
    (thus read-holding rcu_gp_mutex), task B blocked
    attempting to write-acquire rcu_gp_mutex, and
    task C blocked in rcu_read_lock() attempting to
    read-acquire rcu_gp_mutex.  Task A's RCU read-side
    latency is holding up task C, albeit indirectly via
    task B.

    Realtime RCU implementations therefore use a counter-based
    approach where tasks in RCU read-side critical sections
    cannot be blocked by tasks executing synchronize_rcu().

:ref:`Back to Quick Quiz #1 <quiz_1>`

Quick Quiz #2:
    Give an example where Classic RCU's read-side
    overhead is **negative**.

Answer:
    Imagine a single-CPU system with a non-CONFIG_PREEMPTION
    kernel where a routing table is used by process-context
    code, but can be updated by irq-context code (for example,
    by an "ICMP REDIRECT" packet).  The usual way of handling
    this would be to have the process-context code disable
    interrupts while searching the routing table.  Use of
    RCU allows such interrupt-disabling to be dispensed with.
    Thus, without RCU, you pay the cost of disabling interrupts,
    and with RCU you don't.

    One can argue that the overhead of RCU in this
    case is negative with respect to the single-CPU
    interrupt-disabling approach.  Others might argue that
    the overhead of RCU is merely zero, and that replacing
    the positive overhead of the interrupt-disabling scheme
    with the zero-overhead RCU scheme does not constitute
    negative overhead.

    In real life, of course, things are more complex.  But
    even the theoretical possibility of negative overhead for
    a synchronization primitive is a bit unexpected.  ;-)

:ref:`Back to Quick Quiz #2 <quiz_2>`

Quick Quiz #3:
    If it is illegal to block in an RCU read-side
    critical section, what the heck do you do in
    CONFIG_PREEMPT_RT, where normal spinlocks can block???

Answer:
    Just as CONFIG_PREEMPT_RT permits preemption of spinlock
    critical sections, it permits preemption of RCU
    read-side critical sections.  It also permits
    spinlocks blocking while in RCU read-side critical
    sections.

    Why the apparent inconsistency?  Because it is
    possible to use priority boosting to keep the RCU
    grace periods short if need be (for example, if running
    short of memory).  In contrast, if blocking waiting
    for (say) network reception, there is no way to know
    what should be boosted.  Especially given that the
    process we need to boost might well be a human being
    who just went out for a pizza or something.  And although
    a computer-operated cattle prod might arouse serious
    interest, it might also provoke serious objections.
    Besides, how does the computer know what pizza parlor
    the human being went to???

:ref:`Back to Quick Quiz #3 <quiz_3>`

ACKNOWLEDGEMENTS

My thanks to the people who helped make this human-readable, including
Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.


For more information, see http://www.rdrop.com/users/paulmck/RCU.