Lines Matching +full:sense +full:- +full:bitfield +full:- +full:width

19 documentation at tools/memory-model/.  Nevertheless, even this memory
37 Note also that it is possible that a barrier may be a no-op for an
48 - Device operations.
49 - Guarantees.
53 - Varieties of memory barrier.
54 - What may not be assumed about memory barriers?
55 - Data dependency barriers (historical).
56 - Control dependencies.
57 - SMP barrier pairing.
58 - Examples of memory barrier sequences.
59 - Read memory barriers vs load speculation.
60 - Multicopy atomicity.
64 - Compiler barrier.
65 - CPU memory barriers.
69 - Lock acquisition functions.
70 - Interrupt disabling functions.
71 - Sleep and wake-up functions.
72 - Miscellaneous functions.
74 (*) Inter-CPU acquiring barrier effects.
76 - Acquires vs memory accesses.
80 - Interprocessor interaction.
81 - Atomic operations.
82 - Accessing devices.
83 - Interrupts.
91 - Cache coherency.
92 - Cache coherency vs DMA.
93 - Cache coherency vs MMIO.
97 - And then there's the Alpha.
98 - Virtual Machine Guests.
102 - Circular buffers.
116 +-------+ : +--------+ : +-------+
119 | CPU 1 |<----->| Memory |<----->| CPU 2 |
122 +-------+ : +--------+ : +-------+
127 | : +--------+ : |
130 +---------->| Device |<----------+
133 : +--------+ :
159 STORE A=3, STORE B=4, y=LOAD A->3, x=LOAD B->4
160 STORE A=3, STORE B=4, x=LOAD B->4, y=LOAD A->3
161 STORE A=3, y=LOAD A->3, STORE B=4, x=LOAD B->4
162 STORE A=3, y=LOAD A->3, x=LOAD B->2, STORE B=4
163 STORE A=3, x=LOAD B->2, STORE B=4, y=LOAD A->3
164 STORE A=3, x=LOAD B->2, y=LOAD A->3, STORE B=4
165 STORE B=4, STORE A=3, y=LOAD A->3, x=LOAD B->4
203 -----------------
225 ----------
239 emits a memory-barrier instruction, so that a DEC Alpha CPU will
310 And there are anti-guarantees:
313 generate code to modify these using non-atomic read-modify-write
318 in a given bitfield must be protected by one lock. If two fields
319 in a given bitfield are protected by different locks, the compiler's
320 non-atomic read-modify-write sequences can cause an update to one
327 "char", two-byte alignment for "short", four-byte alignment for
328 "int", and either four-byte or eight-byte alignment for "long",
329 on 32-bit and 64-bit systems, respectively. Note that these
331 using older pre-C11 compilers (for example, gcc 4.6). The portion
337 of adjacent bit-fields all having nonzero width
343 NOTE 2: A bit-field and an adjacent non-bit-field member
345 to two bit-fields, if one is declared inside a nested
347 are separated by a zero-length bit-field declaration,
348 or if they are separated by a non-bit-field member
350 bit-fields in the same structure if all members declared
351 between them are also bit-fields, no matter what the
352 sizes of those intervening bit-fields happen to be.
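The lost-update hazard for bit-fields that share a memory location can be sketched in plain C by writing out by hand the non-atomic read-modify-write sequence a compiler may generate. This is a single-threaded simulation, not kernel code: the names shared_word, FIELD_A_MASK, FIELD_B_MASK and lost_update_demo() are hypothetical, and the interleaving of the two "CPUs" is forced manually rather than produced by real threads.

```c
#include <stdint.h>

/* Hypothetical layout: two 4-bit fields packed into one byte, that is,
 * a single memory location shared by both fields. */
#define FIELD_A_MASK 0x0Fu
#define FIELD_B_MASK 0xF0u

/* Simulates two CPUs updating different fields of the same byte with
 * non-atomic read-modify-write sequences.  Returns the final byte. */
static uint8_t lost_update_demo(void)
{
	uint8_t shared_word = 0x00;

	/* Both "CPUs" load the byte before either one stores. */
	uint8_t cpu1_tmp = shared_word;
	uint8_t cpu2_tmp = shared_word;

	/* CPU 1 sets field A to 0x5 and writes the whole byte back. */
	cpu1_tmp = (uint8_t)((cpu1_tmp & ~FIELD_A_MASK) | 0x05u);
	shared_word = cpu1_tmp;

	/* CPU 2 sets field B to 0x3 using its stale copy, so field A is
	 * overwritten with the old value 0. */
	cpu2_tmp = (uint8_t)((cpu2_tmp & ~FIELD_B_MASK) | 0x30u);
	shared_word = cpu2_tmp;

	return shared_word;
}
```

With this interleaving the function returns 0x30 rather than 0x35: CPU 1's update to field A is lost. This is why all fields sharing one memory location need to be protected by the same lock.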
360 in random order, but this can be a problem for CPU-CPU interaction and for I/O.
376 ---------------------------
468 This acts as a one-way permeable barrier. It guarantees that all memory
483 This also acts as a one-way permeable barrier. It guarantees that all
494 -not- guaranteed to act as a full memory barrier. However, after an
505 RELEASE variants in addition to fully-ordered and relaxed (no barrier
522 ----------------------------------------------
541 (*) There is no guarantee that some intervening piece of off-the-CPU
548 Documentation/driver-api/pci/pci.rst
549 Documentation/core-api/dma-api-howto.rst
550 Documentation/core-api/dma-api.rst
554 -------------------------------------
558 to this section are those working on DEC Alpha architecture-specific code
561 data-dependency barriers.
610 even-numbered cache lines and the other bank processes odd-numbered cache
611 lines. The pointer P might be stored in an odd-numbered cache line, and the
612 variable B might be stored in an even-numbered cache line. Then, if the
613 even-numbered bank of the reading CPU's cache is extremely busy while the
614 odd-numbered bank is idle, one can see the new value of the pointer P (&B),
618 A data-dependency barrier is not required to order dependent writes
635 Therefore, no data-dependency barrier is required to order the read into
637 even without a data-dependency barrier:
642 of dependency ordering is to -prevent- writes to the data structure, along
663 --------------------
669 A load-load control dependency requires a full read memory barrier, not
680 dependency, but rather a control dependency that the CPU may short-circuit
691 However, stores are not speculated. This means that ordering -is- provided
692 for load-store control dependencies, as in the following example:
707 variable 'a' is always non-zero, it would be well within its rights
737 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
740 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
760 In contrast, without explicit memory barriers, two-legged-if control
817 You must also be careful not to rely too much on boolean short-circuit
832 out-guess your code. More generally, although READ_ONCE() does force
836 In addition, control dependencies apply only to the then-clause and
837 else-clause of the if-statement in question. In particular, they do
838 not necessarily apply to code following the if-statement:
852 conditional-move instructions, as in this fanciful pseudo-assembly
865 In short, control dependencies apply only to the stores in the then-clause
866 and else-clause of the if-statement in question (including functions
867 invoked by those two clauses), not to code following that if-statement.
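The scope of a control dependency can be sketched as follows. This is a userspace illustration only: the READ_ONCE() and WRITE_ONCE() definitions below are simplified volatile-cast stand-ins (an assumption, relying on the GCC/Clang __typeof__ extension), and the variables a, b, c and the function control_dep_sketch() are hypothetical.

```c
/* Simplified userspace stand-ins for the kernel macros (assumption). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int a, b, c;	/* hypothetical shared variables */

static void control_dep_sketch(void)
{
	int q = READ_ONCE(a);

	if (q) {
		/* Then-clause: ordered after the load from 'a'. */
		WRITE_ONCE(b, 1);
	} else {
		/* Else-clause: depends on the same load.  The stored
		 * value differs from the then-clause so that the two
		 * stores cannot be merged and hoisted above the 'if'. */
		WRITE_ONCE(b, 2);
	}

	/* Follows the if-statement: the control dependency provides
	 * no ordering for this store. */
	WRITE_ONCE(c, 1);
}
```

If the store to c must also be ordered after the load from a, an explicit barrier or an acquire load is needed; the conditional alone is not enough.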
878 However, they do -not- guarantee any other sort of ordering:
887 to carry out the stores. Please note that it is -not- sufficient
893 (*) Control dependencies require at least one run-time conditional
905 (*) Control dependencies apply only to the then-clause and else-clause
906 of the if-statement containing the control dependency, including
908 do -not- apply to code following the if-statement containing the
913 (*) Control dependencies do -not- provide multicopy atomicity. If you
921 -------------------
923 When dealing with CPU-CPU interactions, certain types of memory barrier should
976 WRITE_ONCE(a, 1); }---- --->{ v = READ_ONCE(c);
980 WRITE_ONCE(d, 4); }---- --->{ y = READ_ONCE(b);
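The pairing of a write-side barrier on one CPU with a read-side barrier on another can be sketched in userspace with C11 fences. This is an analogue under stated assumptions, not the kernel API: a C11 release fence is somewhat stronger than a pure write barrier and an acquire fence somewhat stronger than a pure read barrier, and the names payload, ready, publish() and try_consume() are hypothetical.

```c
#include <stdatomic.h>

static int payload;		/* plain data, published via 'ready' */
static atomic_int ready;	/* flag, zero-initialized */

/* Writer side: the release fence plays the role of the write barrier,
 * keeping the store to 'payload' ahead of the store to 'ready'. */
static void publish(int value)
{
	payload = value;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: the acquire fence plays the role of the read barrier,
 * keeping the load of 'payload' after the load of 'ready'.
 * Returns 1 and fills *out if the payload was published, else 0. */
static int try_consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_relaxed))
		return 0;
	atomic_thread_fence(memory_order_acquire);
	*out = payload;
	return 1;
}
```

In real use publish() and try_consume() would run on different threads. Removing either fence breaks the pairing, and the reader could then observe the flag set while still seeing a stale payload.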
984 ------------------------------------
1003 +-------+ : :
1004 | | +------+
1005 | |------>| C=3 | } /\
1006 | | : +------+ }----- \ -----> Events perceptible to
1008 | | : +------+ }
1010 | | +------+ }
1011 | | wwwwwwwwwwwwwwww } <--- At this point the write barrier
1012 | | +------+ } requires all stores prior to the
1014 | | : +------+ } further stores may take place
1015 | |------>| D=4 | }
1016 | | +------+
1017 +-------+ : :
1024 Secondly, data dependency barriers act as partial orderings on data-dependent
1040 +-------+ : : : :
1041 | | +------+ +-------+ | Sequence of update
1042 | |------>| B=2 |----- --->| Y->8 | | of perception on
1043 | | : +------+ \ +-------+ | CPU 2
1044 | CPU 1 | : | A=1 | \ --->| C->&Y | V
1045 | | +------+ | +-------+
1047 | | +------+ | : :
1048 | | : | C=&B |--- | : : +-------+
1049 | | : +------+ \ | +-------+ | |
1050 | |------>| D=4 | ----------->| C->&B |------>| |
1051 | | +------+ | +-------+ | |
1052 +-------+ : : | : : | |
1055 | +-------+ | |
1056 Apparently incorrect ---> | | B->7 |------>| |
1057 perception of B (!) | +-------+ | |
1059 | +-------+ | |
1060 The load of X holds ---> \ | X->9 |------>| |
1061 up the maintenance \ +-------+ | |
1062 of coherence of B ----->| B->2 | +-------+
1063 +-------+
1086 +-------+ : : : :
1087 | | +------+ +-------+
1088 | |------>| B=2 |----- --->| Y->8 |
1089 | | : +------+ \ +-------+
1090 | CPU 1 | : | A=1 | \ --->| C->&Y |
1091 | | +------+ | +-------+
1093 | | +------+ | : :
1094 | | : | C=&B |--- | : : +-------+
1095 | | : +------+ \ | +-------+ | |
1096 | |------>| D=4 | ----------->| C->&B |------>| |
1097 | | +------+ | +-------+ | |
1098 +-------+ : : | : : | |
1101 | +-------+ | |
1102 | | X->9 |------>| |
1103 | +-------+ | |
1104 Makes sure all effects ---> \ ddddddddddddddddd | |
1105 prior to the store of C \ +-------+ | |
1106 are perceptible to ----->| B->2 |------>| |
1107 subsequent loads +-------+ | |
1108 : : +-------+
1126 +-------+ : : : :
1127 | | +------+ +-------+
1128 | |------>| A=1 |------ --->| A->0 |
1129 | | +------+ \ +-------+
1130 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1131 | | +------+ | +-------+
1132 | |------>| B=2 |--- | : :
1133 | | +------+ \ | : : +-------+
1134 +-------+ : : \ | +-------+ | |
1135 ---------->| B->2 |------>| |
1136 | +-------+ | CPU 2 |
1137 | | A->0 |------>| |
1138 | +-------+ | |
1139 | : : +-------+
1141 \ +-------+
1142 ---->| A->1 |
1143 +-------+
1163 +-------+ : : : :
1164 | | +------+ +-------+
1165 | |------>| A=1 |------ --->| A->0 |
1166 | | +------+ \ +-------+
1167 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1168 | | +------+ | +-------+
1169 | |------>| B=2 |--- | : :
1170 | | +------+ \ | : : +-------+
1171 +-------+ : : \ | +-------+ | |
1172 ---------->| B->2 |------>| |
1173 | +-------+ | CPU 2 |
1176 At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
1177 barrier causes all effects \ +-------+ | |
1178 prior to the storage of B ---->| A->1 |------>| |
1179 to be perceptible to CPU 2 +-------+ | |
1180 : : +-------+
1200 +-------+ : : : :
1201 | | +------+ +-------+
1202 | |------>| A=1 |------ --->| A->0 |
1203 | | +------+ \ +-------+
1204 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1205 | | +------+ | +-------+
1206 | |------>| B=2 |--- | : :
1207 | | +------+ \ | : : +-------+
1208 +-------+ : : \ | +-------+ | |
1209 ---------->| B->2 |------>| |
1210 | +-------+ | CPU 2 |
1213 | +-------+ | |
1214 | | A->0 |------>| 1st |
1215 | +-------+ | |
1216 At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
1217 barrier causes all effects \ +-------+ | |
1218 prior to the storage of B ---->| A->1 |------>| 2nd |
1219 to be perceptible to CPU 2 +-------+ | |
1220 : : +-------+
1226 +-------+ : : : :
1227 | | +------+ +-------+
1228 | |------>| A=1 |------ --->| A->0 |
1229 | | +------+ \ +-------+
1230 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1231 | | +------+ | +-------+
1232 | |------>| B=2 |--- | : :
1233 | | +------+ \ | : : +-------+
1234 +-------+ : : \ | +-------+ | |
1235 ---------->| B->2 |------>| |
1236 | +-------+ | CPU 2 |
1239 \ +-------+ | |
1240 ---->| A->1 |------>| 1st |
1241 +-------+ | |
1243 +-------+ | |
1244 | A->1 |------>| 2nd |
1245 +-------+ | |
1246 : : +-------+
1255 ----------------------------------------
1259 other loads, and so do the load in advance - even though they haven't actually
1264 It may turn out that the CPU didn't actually need the value - perhaps because a
1265 branch circumvented the load - in which case it can discard the value or just
1279 : : +-------+
1280 +-------+ | |
1281 --->| B->2 |------>| |
1282 +-------+ | CPU 2 |
1284 +-------+ | |
1285 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1286 division speculates on the +-------+ ~ | |
1290 Once the divisions are complete --> : : ~-->| |
1292 LOAD with immediate effect : : +-------+
1310 : : +-------+
1311 +-------+ | |
1312 --->| B->2 |------>| |
1313 +-------+ | CPU 2 |
1315 +-------+ | |
1316 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1317 division speculates on the +-------+ ~ | |
1324 : : ~-->| |
1326 : : +-------+
1332 : : +-------+
1333 +-------+ | |
1334 --->| B->2 |------>| |
1335 +-------+ | CPU 2 |
1337 +-------+ | |
1338 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1339 division speculates on the +-------+ ~ | |
1345 +-------+ | |
1346 The speculation is discarded ---> --->| A->1 |------>| |
1347 and an updated value is +-------+ | |
1348 retrieved : : +-------+
1352 --------------------
1361 time to all -other- CPUs. The remainder of this document discusses this
1380 Because CPU 3's load from X in some sense comes after CPU 2's load, it
1385 multicopy-atomic systems, CPU B's load must return either the same value
1395 able to compensate for non-multicopy atomicity. For example, suppose
1406 This substitution allows non-multicopy atomicity to run rampant: in
1412 example runs on a non-multicopy-atomic system where CPUs 1 and 2 share a
1417 General barriers can compensate not only for non-multicopy atomicity,
1418 but can also generate additional ordering that can ensure that -all-
1419 CPUs will perceive the same order of -all- operations. In contrast, a
1420 chain of release-acquire pairs does not provide this additional ordering,
1461 Furthermore, because of the release-acquire relationship between cpu0()
1467 However, the ordering provided by a release-acquire chain is local
1478 writes in order, CPUs not involved in the release-acquire chain might
1480 the weak memory-barrier instructions used to implement smp_load_acquire()
1483 store to u as happening -after- cpu1()'s load from v, even though
1489 -not- ensure that any particular value will be read. Therefore, the
1514 ----------------
1521 This is a general barrier -- there are no read-read or write-write
1531 interrupt-handler code and the code that was interrupted.
1537 optimizations that, while perfectly safe in single-threaded code, can
1565 into the following code, which, although in some sense legitimate
1566 for single-threaded code, is almost certainly not what the developer
1587 single-threaded code, but can be fatal in concurrent code:
1605 single-threaded code, so you need to tell the compiler about cases
1619 This transformation is a win for single-threaded code because it
1638 the code into near-nonexistence. (It will still load from the
1666 between process-level code and an interrupt handler:
1682 win for single-threaded code:
1743 In single-threaded code, this is not only safe, but also saves
1745 could cause some other CPU to see a spurious value of 42 -- even
1746 if variable 'a' was never zero -- when loading variable 'b'.
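The effect of such an invented store can be made visible by logging every value written to 'b'. The sketch below is a single-threaded model with hypothetical names (struct store_log, log_store(), as_written(), as_transformed()); it writes out by hand what a compiler might do rather than relying on any particular compiler actually doing it.

```c
#define MAX_STORES 4

/* Records, in order, every value stored to the shared variable. */
struct store_log {
	int values[MAX_STORES];
	int count;
};

static void log_store(struct store_log *log, int *b, int v)
{
	*b = v;
	if (log->count < MAX_STORES)
		log->values[log->count++] = v;
}

/* As written by the developer: exactly one store to 'b'. */
static void as_written(int a, int *b, struct store_log *log)
{
	if (a)
		log_store(log, b, a);
	else
		log_store(log, b, 42);
}

/* As a compiler might transform it to save a branch: 42 is stored
 * unconditionally and then overwritten when 'a' is non-zero. */
static void as_transformed(int a, int *b, struct store_log *log)
{
	log_store(log, b, 42);
	if (a)
		log_store(log, b, a);
}
```

For a non-zero 'a' both versions leave the same final value in 'b', but the transformed version's log contains an extra leading 42. That transient value is what another CPU could observe.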
1755 damaging, but they can result in cache-line bouncing and thus in
1760 with a single memory-reference instruction, prevents "load tearing"
1763 16-bit store instructions with 7-bit immediate fields, the compiler
1764 might be tempted to use two 16-bit store-immediate instructions to
1765 implement the following 32-bit store:
1772 This optimization can therefore be a win in single-threaded code.
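Store tearing of this kind can be modelled by splitting a 32-bit value into two 16-bit halves and sampling it between the two partial stores. This is a single-threaded simulation with hypothetical names (struct word32, observe(), torn_store()); the halves are modelled arithmetically so the result does not depend on host endianness, and which half is stored first is an arbitrary choice.

```c
#include <stdint.h>

/* Models one 32-bit variable as two independently stored halves. */
struct word32 {
	uint16_t lo;
	uint16_t hi;
};

/* What a reader sees when it loads the full 32-bit value. */
static uint32_t observe(const struct word32 *w)
{
	return ((uint32_t)w->hi << 16) | w->lo;
}

/* Performs the 32-bit store as two 16-bit stores and returns what a
 * racing reader would see if it loaded between them. */
static uint32_t torn_store(struct word32 *w, uint32_t value)
{
	uint32_t seen_midway;

	w->lo = (uint16_t)(value & 0xFFFFu);	/* first 16-bit store */
	seen_midway = observe(w);		/* hypothetical racing load */
	w->hi = (uint16_t)(value >> 16);	/* second 16-bit store */

	return seen_midway;
}
```

Starting from zero and storing 0x00010002, the value seen midway is 0x00000002, which is neither the old nor the new value. Marking the store so that it must be performed as a single access rules this out.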
1796 implement these three assignment statements as a pair of 32-bit
1797 loads followed by a pair of 32-bit stores. This would result in
1817 -------------------
1843 systems because it is assumed that a CPU will appear to be self-consistent,
1854 windows. These barriers are required even on non-SMP systems as they affect
1885 obj->dead = 1;
1887 atomic_dec(&obj->ref_count);
1907 if (desc->status != DEVICE_OWN) {
1912 read_data = desc->data;
1913 desc->data = write_data;
1919 desc->status = DEVICE_OWN;
1935 relaxed I/O accessors and the Documentation/core-api/dma-api.rst file for
1944 For example, after a non-temporal write to a pmem region, we use pmem_wmb()
1966 --------------------------
2013 one-way barriers is that the effects of instructions outside of a critical
2034 RELEASE may -not- be assumed to be a full memory barrier.
2059 -could- occur.
2074 a sleep-unlock race, but the locking primitive needs to resolve
2079 anything at all - especially with respect to I/O accesses - unless combined
2082 See also the section on "Inter-CPU acquiring barrier effects".
2112 -----------------------------
2120 SLEEP AND WAKE-UP FUNCTIONS
2121 ---------------------------
2146 STORE current->state
2189 STORE current->state ...
2191 LOAD event_indicated if ((LOAD task->state) & TASK_NORMAL)
2192 STORE task->state
2237 order multiple stores before the wake-up with respect to loads of those stored
2273 -----------------------
2281 INTER-CPU ACQUIRING BARRIER EFFECTS
2290 ---------------------------
2323 be a problem as a single-threaded linear piece of code will still appear to
2337 --------------------------
2377 LOAD waiter->list.next;
2378 LOAD waiter->task;
2379 STORE waiter->task;
2401 LOAD waiter->task;
2402 STORE waiter->task;
2410 LOAD waiter->list.next;
2411 --- OOPS ---
2418 LOAD waiter->list.next;
2419 LOAD waiter->task;
2421 STORE waiter->task;
2431 On a UP system - where this wouldn't be a problem - the smp_mb() is just a
2438 -----------------
2449 -----------------
2458 efficient to reorder, combine or merge accesses - something that would cause
2462 routines - such as inb() or writel() - which know how to make such accesses
2468 See Documentation/driver-api/device-io.rst for more information.
2472 ----------
2478 This may be alleviated - at least in part - by disabling local interrupts (a
2480 the interrupt-disabled section in the driver. While the driver's interrupt
2487 under interrupt-disablement and then the driver's interrupt handler is invoked:
2506 accesses performed in an interrupt - and vice versa - unless implicit or
2516 likely, then interrupt-disabling locks should be used to guarantee ordering.
2524 specific. Therefore, drivers which are inherently non-portable may rely on
2576 The ordering properties of __iomem pointers obtained with non-default
2586 bullets 2-5 above) but they are still guaranteed to be ordered with
2594 register-based, memory-mapped FIFOs residing on peripherals that are not
2600 The inX() and outX() accessors are intended to access legacy port-mapped
2611 Device drivers may expect outX() to emit a non-posted write transaction
2629 little-endian and will therefore perform byte-swapping operations on big-endian
2637 It has to be assumed that the conceptual CPU is weakly-ordered but that it will
2641 of arch-specific code.
2644 stream in any order it feels like - or even in parallel - provided that if an
2650 [*] Some instructions have more than one effect - such as changing the
2651 condition codes, changing registers or changing memory - and different
2677 <--- CPU ---> : <----------- Memory ----------->
2679 +--------+ +--------+ : +--------+ +-----------+
2680 | | | | : | | | | +--------+
2682 | Core |--->| Access |----->| Cache |<-->| | | |
2683 | | | Queue | : | | | |--->| Memory |
2685 +--------+ +--------+ : +--------+ | | | |
2686 : | Cache | +--------+
2688 : | Mechanism | +--------+
2689 +--------+ +--------+ : +--------+ | | | |
2691 | CPU | | Memory | : | CPU | | |--->| Device |
2692 | Core |--->| Access |----->| Cache |<-->| | | |
2694 | | | | : | | | | +--------+
2695 +--------+ +--------+ : +--------+ +-----------+
2726 ----------------------
2743 See Documentation/core-api/cachetlb.rst for more information on cache management.
2747 -----------------------
2803 (*) the CPU's data cache may affect the ordering, and while cache-coherency
2804 mechanisms may alleviate this - once the store has actually hit the cache
2805 - there's no guarantee that the coherency management will be propagated in
2816 However, it is guaranteed that a CPU will be self-consistent: it will see its
2843 are -not- optional in the above example, as there are architectures
2878 --------------------------
2882 two semantically-related cache lines updated at separate times. This is where
2893 ----------------------
2898 barriers for this use-case would be possible but is often suboptimal.
2900 To handle this case optimally, low-level virt_mb() etc macros are available.
2902 identical code for SMP and non-SMP systems. For example, virtual machine guests
2916 ----------------
2921 Documentation/core-api/circular-buffers.rst
2938 Chapter 7.1: Memory-Access Ordering
2941 ARM Architecture Reference Manual (ARMv8, for ARMv8-A architecture profile)
2944 IA-32 Intel Architecture Software Developer's Manual, Volume 3:
2959 Chapter 15: Sparc-V9 Memory Models
2975 Solaris Internals, Core Kernel Architecture, p63-68: