Lines Matching +full:sense +full:- +full:bitfield +full:- +full:width
19 documentation at tools/memory-model/. Nevertheless, even this memory
37 Note also that it is possible that a barrier may be a no-op for an
48 - Device operations.
49 - Guarantees.
53 - Varieties of memory barrier.
54 - What may not be assumed about memory barriers?
55 - Address-dependency barriers (historical).
56 - Control dependencies.
57 - SMP barrier pairing.
58 - Examples of memory barrier sequences.
59 - Read memory barriers vs load speculation.
60 - Multicopy atomicity.
64 - Compiler barrier.
65 - CPU memory barriers.
69 - Lock acquisition functions.
70 - Interrupt disabling functions.
71 - Sleep and wake-up functions.
72 - Miscellaneous functions.
74 (*) Inter-CPU acquiring barrier effects.
76 - Acquires vs memory accesses.
80 - Interprocessor interaction.
81 - Atomic operations.
82 - Accessing devices.
83 - Interrupts.
91 - Cache coherency.
92 - Cache coherency vs DMA.
93 - Cache coherency vs MMIO.
97 - And then there's the Alpha.
98 - Virtual Machine Guests.
102 - Circular buffers.
116 +-------+ : +--------+ : +-------+
119 | CPU 1 |<----->| Memory |<----->| CPU 2 |
122 +-------+ : +--------+ : +-------+
127 | : +--------+ : |
130 +---------->| Device |<----------+
133 : +--------+ :
159 STORE A=3, STORE B=4, y=LOAD A->3, x=LOAD B->4
160 STORE A=3, STORE B=4, x=LOAD B->4, y=LOAD A->3
161 STORE A=3, y=LOAD A->3, STORE B=4, x=LOAD B->4
162 STORE A=3, y=LOAD A->3, x=LOAD B->2, STORE B=4
163 STORE A=3, x=LOAD B->2, STORE B=4, y=LOAD A->3
164 STORE A=3, x=LOAD B->2, y=LOAD A->3, STORE B=4
165 STORE B=4, STORE A=3, y=LOAD A->3, x=LOAD B->4
203 -----------------
225 ----------
239 emits a memory-barrier instruction, so that a DEC Alpha CPU will
310 And there are anti-guarantees:
313 generate code to modify these using non-atomic read-modify-write
318 in a given bitfield must be protected by one lock. If two fields
319 in a given bitfield are protected by different locks, the compiler's
320 non-atomic read-modify-write sequences can cause an update to one
327 "char", two-byte alignment for "short", four-byte alignment for
328 "int", and either four-byte or eight-byte alignment for "long",
329 on 32-bit and 64-bit systems, respectively. Note that these
331 using older pre-C11 compilers (for example, gcc 4.6). The portion
337 of adjacent bit-fields all having nonzero width
343 NOTE 2: A bit-field and an adjacent non-bit-field member
345 to two bit-fields, if one is declared inside a nested
347 are separated by a zero-length bit-field declaration,
348 or if they are separated by a non-bit-field member
350 bit-fields in the same structure if all members declared
351 between them are also bit-fields, no matter what the
352 sizes of those intervening bit-fields happen to be.
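The fragments above state when bit-fields count as one memory location and when they do not. A minimal userspace sketch of that rule follows, assuming a C11 compiler. The struct and function names (shared_loc, split_loc, update_both, split_is_not_smaller) are hypothetical and do not come from the source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a and b are adjacent nonzero-width bit-fields, so they
 * form ONE memory location and must be protected by one lock. */
struct shared_loc {
	unsigned int a : 4;
	unsigned int b : 4;
};

/* Hypothetical: the zero-width bit-field separates a and b into two
 * memory locations, so each may be protected by its own lock. */
struct split_loc {
	unsigned int a : 4;
	unsigned int   : 0;
	unsigned int b : 4;
};

/* Separating the fields costs space: the split layout is never smaller. */
int split_is_not_smaller(void)
{
	return sizeof(struct split_loc) >= sizeof(struct shared_loc);
}

/* Updates both fields; in concurrent code each store would be made
 * under the lock that protects that field. */
unsigned int update_both(struct split_loc *s)
{
	assert(s != NULL);
	s->a = 3;	/* would be done under the lock for a */
	s->b = 5;	/* would be done under the lock for b */
	return s->a * 16u + s->b;
}
```

The exact sizes are implementation-defined, so the sketch only compares them rather than assuming particular values.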
360 in random order, but this can be a problem for CPU-CPU interaction and for I/O.
376 ---------------------------
395 address-dependency barriers; see the "SMP barrier pairing" subsection.
398 (2) Address-dependency barriers (historical).
399 [!] This section is marked as HISTORICAL: it covers the long-obsolete
401 implicit in all marked accesses. For more up-to-date information,
405 An address-dependency barrier is a weaker form of read barrier. In the
408 the second load will be directed), an address-dependency barrier would
412 An address-dependency barrier is a partial ordering on interdependent
418 considered can then perceive. An address-dependency barrier issued by
423 the address-dependency barrier.
435 [!] Note that address-dependency barriers should normally be paired with
438 [!] Kernel release v5.9 removed kernel APIs for explicit address-
441 address-dependency barriers.
445 A read barrier is an address-dependency barrier plus a guarantee that all
453 Read memory barriers imply address-dependency barriers, and so can
477 This acts as a one-way permeable barrier. It guarantees that all memory
492 This also acts as a one-way permeable barrier. It guarantees that all
503 -not- guaranteed to act as a full memory barrier. However, after an
514 RELEASE variants in addition to fully-ordered and relaxed (no barrier
531 ----------------------------------------------
550 (*) There is no guarantee that some intervening piece of off-the-CPU
557 Documentation/driver-api/pci/pci.rst
558 Documentation/core-api/dma-api-howto.rst
559 Documentation/core-api/dma-api.rst
562 ADDRESS-DEPENDENCY BARRIERS (HISTORICAL)
563 ----------------------------------------
564 [!] This section is marked as HISTORICAL: it covers the long-obsolete
566 in all marked accesses. For more up-to-date information, including
572 to this section are those working on DEC Alpha architecture-specific code
575 address-dependency barriers.
577 [!] While address dependencies are observed in both load-to-load and
578 load-to-store relations, address-dependency barriers are not necessary
579 for load-to-store situations.
581 The requirement of address-dependency barriers is a little subtle, and
594 [!] READ_ONCE_OLD() corresponds to READ_ONCE() of pre-4.15 kernels, which
595 doesn't imply an address-dependency barrier.
612 To deal with this, READ_ONCE() provides an implicit address-dependency barrier
622 <implicit address-dependency barrier>
631 even-numbered cache lines and the other bank processes odd-numbered cache
632 lines. The pointer P might be stored in an odd-numbered cache line, and the
633 variable B might be stored in an even-numbered cache line. Then, if the
634 even-numbered bank of the reading CPU's cache is extremely busy while the
635 odd-numbered bank is idle, one can see the new value of the pointer P (&B),
639 An address-dependency barrier is not required to order dependent writes
656 Therefore, no address-dependency barrier is required to order the read into
658 even without an implicit address-dependency barrier of modern READ_ONCE():
663 of dependency ordering is to -prevent- writes to the data structure, along
674 The address-dependency barrier is very important to the RCU system,
684 --------------------
690 A load-load control dependency requires a full read memory barrier, not
691 simply an (implicit) address-dependency barrier to make it work correctly.
695 <implicit address-dependency barrier>
702 dependency, but rather a control dependency that the CPU may short-circuit
713 However, stores are not speculated. This means that ordering -is- provided
714 for load-store control dependencies, as in the following example:
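The example referred to here is not among the matched lines. A minimal userspace sketch of a load-store control dependency follows, assuming GCC or Clang (for __typeof__). READ_ONCE() and WRITE_ONCE() below are simplified stand-ins, not the kernel's definitions, and store_if_set() and check_store_if_set() are hypothetical names.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel macros (assumption: GCC or Clang). */
#define READ_ONCE(x)	 (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

int a, b;

/* The store to b executes only if the value loaded from a is non-zero.
 * Because stores are not speculated, the load is ordered before the store. */
int store_if_set(void)
{
	int q = READ_ONCE(a);

	if (q)
		WRITE_ONCE(b, 1);
	return q;
}

/* Single-threaded check of the visible behaviour only; it cannot
 * demonstrate the inter-CPU ordering itself. */
void check_store_if_set(void)
{
	a = 0;
	b = 0;
	store_if_set();
	assert(b == 0);

	a = 1;
	store_if_set();
	assert(b == 1);
}
```

The marked accesses matter here: with plain accesses the compiler would be free to combine or hoist the load and the store, which could remove the conditional and with it the ordering.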
729 variable 'a' is always non-zero, it would be well within its rights
759 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
762 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
782 In contrast, without explicit memory barriers, two-legged-if control
839 You must also be careful not to rely too much on boolean short-circuit
854 out-guess your code. More generally, although READ_ONCE() does force
858 In addition, control dependencies apply only to the then-clause and
859 else-clause of the if-statement in question. In particular, it does
860 not necessarily apply to code following the if-statement:
874 conditional-move instructions, as in this fanciful pseudo-assembly
887 In short, control dependencies apply only to the stores in the then-clause
888 and else-clause of the if-statement in question (including functions
889 invoked by those two clauses), not to code following that if-statement.
900 However, they do -not- guarantee any other sort of ordering:
909 to carry out the stores. Please note that it is -not- sufficient
915 (*) Control dependencies require at least one run-time conditional
927 (*) Control dependencies apply only to the then-clause and else-clause
928 of the if-statement containing the control dependency, including
930 do -not- apply to code following the if-statement containing the
935 (*) Control dependencies do -not- provide multicopy atomicity. If you
943 -------------------
945 When dealing with CPU-CPU interactions, certain types of memory barrier should
952 with an address-dependency barrier, a control dependency, an acquire barrier,
954 read barrier, control dependency, or an address-dependency barrier pairs
973 <implicit address-dependency barrier>
993 match the loads after the read barrier or the address-dependency barrier, and
998 WRITE_ONCE(a, 1); }---- --->{ v = READ_ONCE(c);
1002 WRITE_ONCE(d, 4); }---- --->{ y = READ_ONCE(b);
1006 ------------------------------------
1025 +-------+ : :
1026 | | +------+
1027 | |------>| C=3 | } /\
1028 | | : +------+ }----- \ -----> Events perceptible to
1030 | | : +------+ }
1032 | | +------+ }
1033 | | wwwwwwwwwwwwwwww } <--- At this point the write barrier
1034 | | +------+ } requires all stores prior to the
1036 | | : +------+ } further stores may take place
1037 | |------>| D=4 | }
1038 | | +------+
1039 +-------+ : :
1046 Secondly, address-dependency barriers act as partial orderings on address-
1062 +-------+ : : : :
1063 | | +------+ +-------+ | Sequence of update
1064 | |------>| B=2 |----- --->| Y->8 | | of perception on
1065 | | : +------+ \ +-------+ | CPU 2
1066 | CPU 1 | : | A=1 | \ --->| C->&Y | V
1067 | | +------+ | +-------+
1069 | | +------+ | : :
1070 | | : | C=&B |--- | : : +-------+
1071 | | : +------+ \ | +-------+ | |
1072 | |------>| D=4 | ----------->| C->&B |------>| |
1073 | | +------+ | +-------+ | |
1074 +-------+ : : | : : | |
1077 | +-------+ | |
1078 Apparently incorrect ---> | | B->7 |------>| |
1079 perception of B (!) | +-------+ | |
1081 | +-------+ | |
1082 The load of X holds ---> \ | X->9 |------>| |
1083 up the maintenance \ +-------+ | |
1084 of coherence of B ----->| B->2 | +-------+
1085 +-------+
1092 If, however, an address-dependency barrier were to be placed between the load
1103 <address-dependency barrier>
1108 +-------+ : : : :
1109 | | +------+ +-------+
1110 | |------>| B=2 |----- --->| Y->8 |
1111 | | : +------+ \ +-------+
1112 | CPU 1 | : | A=1 | \ --->| C->&Y |
1113 | | +------+ | +-------+
1115 | | +------+ | : :
1116 | | : | C=&B |--- | : : +-------+
1117 | | : +------+ \ | +-------+ | |
1118 | |------>| D=4 | ----------->| C->&B |------>| |
1119 | | +------+ | +-------+ | |
1120 +-------+ : : | : : | |
1123 | +-------+ | |
1124 | | X->9 |------>| |
1125 | +-------+ | |
1126 Makes sure all effects ---> \ aaaaaaaaaaaaaaaaa | |
1127 prior to the store of C \ +-------+ | |
1128 are perceptible to ----->| B->2 |------>| |
1129 subsequent loads +-------+ | |
1130 : : +-------+
1148 +-------+ : : : :
1149 | | +------+ +-------+
1150 | |------>| A=1 |------ --->| A->0 |
1151 | | +------+ \ +-------+
1152 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1153 | | +------+ | +-------+
1154 | |------>| B=2 |--- | : :
1155 | | +------+ \ | : : +-------+
1156 +-------+ : : \ | +-------+ | |
1157 ---------->| B->2 |------>| |
1158 | +-------+ | CPU 2 |
1159 | | A->0 |------>| |
1160 | +-------+ | |
1161 | : : +-------+
1163 \ +-------+
1164 ---->| A->1 |
1165 +-------+
1185 +-------+ : : : :
1186 | | +------+ +-------+
1187 | |------>| A=1 |------ --->| A->0 |
1188 | | +------+ \ +-------+
1189 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1190 | | +------+ | +-------+
1191 | |------>| B=2 |--- | : :
1192 | | +------+ \ | : : +-------+
1193 +-------+ : : \ | +-------+ | |
1194 ---------->| B->2 |------>| |
1195 | +-------+ | CPU 2 |
1198 At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
1199 barrier causes all effects \ +-------+ | |
1200 prior to the storage of B ---->| A->1 |------>| |
1201 to be perceptible to CPU 2 +-------+ | |
1202 : : +-------+
1222 +-------+ : : : :
1223 | | +------+ +-------+
1224 | |------>| A=1 |------ --->| A->0 |
1225 | | +------+ \ +-------+
1226 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1227 | | +------+ | +-------+
1228 | |------>| B=2 |--- | : :
1229 | | +------+ \ | : : +-------+
1230 +-------+ : : \ | +-------+ | |
1231 ---------->| B->2 |------>| |
1232 | +-------+ | CPU 2 |
1235 | +-------+ | |
1236 | | A->0 |------>| 1st |
1237 | +-------+ | |
1238 At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
1239 barrier causes all effects \ +-------+ | |
1240 prior to the storage of B ---->| A->1 |------>| 2nd |
1241 to be perceptible to CPU 2 +-------+ | |
1242 : : +-------+
1248 +-------+ : : : :
1249 | | +------+ +-------+
1250 | |------>| A=1 |------ --->| A->0 |
1251 | | +------+ \ +-------+
1252 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1253 | | +------+ | +-------+
1254 | |------>| B=2 |--- | : :
1255 | | +------+ \ | : : +-------+
1256 +-------+ : : \ | +-------+ | |
1257 ---------->| B->2 |------>| |
1258 | +-------+ | CPU 2 |
1261 \ +-------+ | |
1262 ---->| A->1 |------>| 1st |
1263 +-------+ | |
1265 +-------+ | |
1266 | A->1 |------>| 2nd |
1267 +-------+ | |
1268 : : +-------+
1277 ----------------------------------------
1281 other loads, and so do the load in advance - even though they haven't actually
1286 It may turn out that the CPU didn't actually need the value - perhaps because a
1287 branch circumvented the load - in which case it can discard the value or just
1301 : : +-------+
1302 +-------+ | |
1303 --->| B->2 |------>| |
1304 +-------+ | CPU 2 |
1306 +-------+ | |
1307 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1308 division speculates on the +-------+ ~ | |
1312 Once the divisions are complete --> : : ~-->| |
1314 LOAD with immediate effect : : +-------+
1317 Placing a read barrier or an address-dependency barrier just before the second
1332 : : +-------+
1333 +-------+ | |
1334 --->| B->2 |------>| |
1335 +-------+ | CPU 2 |
1337 +-------+ | |
1338 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1339 division speculates on the +-------+ ~ | |
1346 : : ~-->| |
1348 : : +-------+
1354 : : +-------+
1355 +-------+ | |
1356 --->| B->2 |------>| |
1357 +-------+ | CPU 2 |
1359 +-------+ | |
1360 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1361 division speculates on the +-------+ ~ | |
1367 +-------+ | |
1368 The speculation is discarded ---> --->| A->1 |------>| |
1369 and an updated value is +-------+ | |
1370 retrieved : : +-------+
1374 --------------------
1383 time to all -other- CPUs. The remainder of this document discusses this
1402 Because CPU 3's load from X in some sense comes after CPU 2's load, it
1407 multicopy-atomic systems, CPU B's load must return either the same value
1417 able to compensate for non-multicopy atomicity. For example, suppose
1428 This substitution allows non-multicopy atomicity to run rampant: in
1434 example runs on a non-multicopy-atomic system where CPUs 1 and 2 share a
1439 General barriers can compensate not only for non-multicopy atomicity,
1440 but can also generate additional ordering that can ensure that -all-
1441 CPUs will perceive the same order of -all- operations. In contrast, a
1442 chain of release-acquire pairs does not provide this additional ordering,
1483 Furthermore, because of the release-acquire relationship between cpu0()
1489 However, the ordering provided by a release-acquire chain is local
1500 writes in order, CPUs not involved in the release-acquire chain might
1502 the weak memory-barrier instructions used to implement smp_load_acquire()
1505 store to u as happening -after- cpu1()'s load from v, even though
1511 -not- ensure that any particular value will be read. Therefore, the
1536 ----------------
1543 This is a general barrier -- there are no read-read or write-write
1553 interrupt-handler code and the code that was interrupted.
1559 optimizations that, while perfectly safe in single-threaded code, can
1587 into the following code, which, although in some sense legitimate
1588 for single-threaded code, is almost certainly not what the developer
1609 single-threaded code, but can be fatal in concurrent code:
1627 single-threaded code, so you need to tell the compiler about cases
1641 This transformation is a win for single-threaded code because it
1660 the code into near-nonexistence. (It will still load from the
1688 between process-level code and an interrupt handler:
1704 win for single-threaded code:
1765 In single-threaded code, this is not only safe, but also saves
1767 could cause some other CPU to see a spurious value of 42 -- even
1768 if variable 'a' was never zero -- when loading variable 'b'.
1777 damaging, but they can result in cache-line bouncing and thus in
1782 with a single memory-reference instruction, prevents "load tearing"
1785 16-bit store instructions with 7-bit immediate fields, the compiler
1786 might be tempted to use two 16-bit store-immediate instructions to
1787 implement the following 32-bit store:
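The store in question is not among the matched lines. A minimal userspace sketch of the plain store and its marked counterpart follows, assuming GCC or Clang. WRITE_ONCE() is a simplified stand-in for the kernel macro, the constant is illustrative, and plain_store() and marked_store() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel macro (assumption: GCC or Clang). */
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

uint32_t p;

/* The tearing scenario concerns a 32-bit object. */
static_assert(sizeof(p) == 4, "p must be a 32-bit object");

/* Plain store: the compiler may in principle split this into two
 * 16-bit store-immediate instructions ("store tearing"). */
void plain_store(void)
{
	p = 0x00010002;
}

/* Marked store: the volatile access tells the compiler not to split
 * the store when a single instruction can perform it. */
void marked_store(void)
{
	WRITE_ONCE(p, 0x00010002);
}
```

Both functions leave the same value in p when run on a single thread. The difference only matters to a concurrent reader, which could otherwise observe a half-updated value.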
1794 This optimization can therefore be a win in single-threaded code.
1818 implement these three assignment statements as a pair of 32-bit
1819 loads followed by a pair of 32-bit stores. This would result in
1839 -------------------
1851 All memory barriers except the address-dependency barriers imply a compiler
1865 systems because it is assumed that a CPU will appear to be self-consistent,
1876 windows. These barriers are required even on non-SMP systems as they affect
1907 obj->dead = 1;
1909 atomic_dec(&obj->ref_count);
1923 DMA capable device. See the Documentation/core-api/dma-api.rst file for more
1931 if (desc->status != DEVICE_OWN) {
1936 read_data = desc->data;
1937 desc->data = write_data;
1943 desc->status = DEVICE_OWN;
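The descriptor fragments above can be read as a hand-off protocol: check ownership, read and modify the data, then return ownership. A minimal userspace sketch follows, assuming C11 atomics. The fences are only stand-ins for the kernel's DMA barriers and do not order real device accesses. The struct layout, the CPU_OWN value, and the handle_desc() name are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

enum { CPU_OWN = 0, DEVICE_OWN = 1 };	/* hypothetical values */

struct desc {				/* hypothetical layout */
	_Atomic uint32_t status;
	uint32_t data;
};

/* Returns 1 if the descriptor was processed, 0 if the device owns it. */
int handle_desc(struct desc *desc, uint32_t write_data, uint32_t *read_data)
{
	assert(desc != NULL && read_data != NULL);

	if (atomic_load_explicit(&desc->status, memory_order_relaxed) == DEVICE_OWN)
		return 0;

	/* Stand-in for a DMA read barrier: do not read the data until
	 * the ownership check has completed. */
	atomic_thread_fence(memory_order_acquire);

	*read_data = desc->data;
	desc->data = write_data;

	/* Stand-in for a DMA write barrier: make the data update visible
	 * before the status update. */
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&desc->status, DEVICE_OWN, memory_order_relaxed);
	return 1;
}
```

The ordering of the two fences mirrors the two hazards: reading stale data before ownership is confirmed, and handing the descriptor back before the new data is visible.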
1967 For example, after a non-temporal write to a pmem region, we use pmem_wmb()
1978 For memory accesses with write-combining attributes (e.g. those returned
1981 write-combining memory accesses before this macro with those after it when
1997 --------------------------
2044 one-way barriers is that the effects of instructions outside of a critical
2065 RELEASE may -not- be assumed to be a full memory barrier.
2090 -could- occur.
2105 a sleep-unlock race, but the locking primitive needs to resolve
2110 anything at all - especially with respect to I/O accesses - unless combined
2113 See also the section on "Inter-CPU acquiring barrier effects".
2143 -----------------------------
2151 SLEEP AND WAKE-UP FUNCTIONS
2152 ---------------------------
2177 STORE current->state
2220 STORE current->state ...
2222 LOAD event_indicated if ((LOAD task->state) & TASK_NORMAL)
2223 STORE task->state
2268 order multiple stores before the wake-up with respect to loads of those stored
2304 -----------------------
2312 INTER-CPU ACQUIRING BARRIER EFFECTS
2321 ---------------------------
2354 be a problem as a single-threaded linear piece of code will still appear to
2368 --------------------------
2408 LOAD waiter->list.next;
2409 LOAD waiter->task;
2410 STORE waiter->task;
2432 LOAD waiter->task;
2433 STORE waiter->task;
2441 LOAD waiter->list.next;
2442 --- OOPS ---
2449 LOAD waiter->list.next;
2450 LOAD waiter->task;
2452 STORE waiter->task;
2462 On a UP system - where this wouldn't be a problem - the smp_mb() is just a
2469 -----------------
2480 -----------------
2489 efficient to reorder, combine or merge accesses - something that would cause
2493 routines - such as inb() or writel() - which know how to make such accesses
2499 See Documentation/driver-api/device-io.rst for more information.
2503 ----------
2509 This may be alleviated - at least in part - by disabling local interrupts (a
2511 the interrupt-disabled section in the driver. While the driver's interrupt
2518 under interrupt-disablement and then the driver's interrupt handler is invoked:
2537 accesses performed in an interrupt - and vice versa - unless implicit or
2547 likely, then interrupt-disabling locks should be used to guarantee ordering.
2555 specific. Therefore, drivers which are inherently non-portable may rely on
2607 The ordering properties of __iomem pointers obtained with non-default
2617 bullets 2-5 above) but they are still guaranteed to be ordered with
2625 register-based, memory-mapped FIFOs residing on peripherals that are not
2631 The inX() and outX() accessors are intended to access legacy port-mapped
2642 Device drivers may expect outX() to emit a non-posted write transaction
2660 little-endian and will therefore perform byte-swapping operations on big-endian
2668 It has to be assumed that the conceptual CPU is weakly-ordered but that it will
2672 of arch-specific code.
2675 stream in any order it feels like - or even in parallel - provided that if an
2681 [*] Some instructions have more than one effect - such as changing the
2682 condition codes, changing registers or changing memory - and different
2708 <--- CPU ---> : <----------- Memory ----------->
2710 +--------+ +--------+ : +--------+ +-----------+
2711 | | | | : | | | | +--------+
2713 | Core |--->| Access |----->| Cache |<-->| | | |
2714 | | | Queue | : | | | |--->| Memory |
2716 +--------+ +--------+ : +--------+ | | | |
2717 : | Cache | +--------+
2719 : | Mechanism | +--------+
2720 +--------+ +--------+ : +--------+ | | | |
2722 | CPU | | Memory | : | CPU | | |--->| Device |
2723 | Core |--->| Access |----->| Cache |<-->| | | |
2725 | | | | : | | | | +--------+
2726 +--------+ +--------+ : +--------+ +-----------+
2757 ----------------------
2774 See Documentation/core-api/cachetlb.rst for more information on cache
2779 -----------------------
2835 (*) the CPU's data cache may affect the ordering, and while cache-coherency
2836 mechanisms may alleviate this - once the store has actually hit the cache
2837 - there's no guarantee that the coherency management will be propagated in
2848 However, it is guaranteed that a CPU will be self-consistent: it will see its
2875 are -not- optional in the above example, as there are architectures
2910 --------------------------
2914 two semantically-related cache lines updated at separate times. This is where
2915 the address-dependency barrier really becomes necessary as this synchronises
2925 ----------------------
2930 barriers for this use-case would be possible but is often suboptimal.
2932 To handle this case optimally, low-level virt_mb() etc. macros are available.
2934 identical code for SMP and non-SMP systems. For example, virtual machine guests
2948 ----------------
2953 Documentation/core-api/circular-buffers.rst
2970 Chapter 7.1: Memory-Access Ordering
2973 ARM Architecture Reference Manual (ARMv8, for ARMv8-A architecture profile)
2976 IA-32 Intel Architecture Software Developer's Manual, Volume 3:
2991 Chapter 15: Sparc-V9 Memory Models
3007 Solaris Internals, Core Kernel Architecture, p63-68: