Locked Counters (aka ``QemuLockCnt``)
=====================================

QEMU often uses reference counts to track data structures that are being
accessed and should not be freed.  For example, a loop that invokes
callbacks like this is not safe::

    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->revents & G_IO_OUT) {
            ioh->fd_write(ioh->opaque);
        }
    }

``QLIST_FOREACH_SAFE`` protects against deletion of the current node (``ioh``)
by stashing away its ``next`` pointer.  However, ``ioh->fd_write`` could
actually delete the next node from the list.  The simplest way to
avoid this is to mark the node as deleted, and remove it from the
list in the above loop::

    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            QLIST_REMOVE(ioh, next);
            g_free(ioh);
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }

If, however, this loop must also be reentrant, i.e. it is possible that
``ioh->fd_write`` invokes the loop again, some kind of counting is needed::

    walking_handlers++;
    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            if (walking_handlers == 1) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }
    walking_handlers--;

One may think of using the RCU primitives, ``rcu_read_lock()`` and
``rcu_read_unlock()``; effectively, the RCU nesting count would take
the place of the ``walking_handlers`` global variable (a sketch of this
approach follows the list below).  Indeed, reference counting and RCU
have similar purposes, but their usage in general is complementary:

- reference counting is fine-grained and limited to a single data
  structure; RCU delays reclamation of *all* RCU-protected data
  structures;

- reference counting works even in the presence of code that keeps
  a reference for a long time; RCU critical sections in principle
  should be kept short;

- reference counting is often applied to code that is not thread-safe
  but is reentrant; in fact, usage of reference counting in QEMU predates
  the introduction of threads by many years.  RCU is generally used to
  protect readers from other threads freeing memory after concurrent
  modifications to a data structure.

- reclaiming data can be done by a separate thread in the case of RCU;
  this can improve performance, but also delay reclamation undesirably.
  With reference counting, reclamation is deterministic.
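As an illustration of the RCU alternative mentioned before the list, the
walk could run inside an RCU read-side critical section, with deletion
deferred until all readers have finished.  This is only a minimal sketch,
assuming each handler gains a ``struct rcu_head`` member named ``rcu``; it
is not how QEMU's handler lists are actually implemented::

    /* Visit: the RCU nesting count replaces walking_handlers. */
    rcu_read_lock();
    QLIST_FOREACH_RCU(ioh, &io_handlers, next) {
        if (ioh->revents & G_IO_OUT) {
            ioh->fd_write(ioh->opaque);
        }
    }
    rcu_read_unlock();

    /* Deletion, done under whatever lock protects writers; the memory
     * is reclaimed only after every reader has left its critical
     * section.  The "rcu" field is an assumed struct rcu_head member. */
    QLIST_REMOVE_RCU(ioh, next);
    g_free_rcu(ioh, rcu);

The trade-offs listed above still apply: reclamation of ``ioh`` is delayed
along with every other RCU-protected structure, and the read-side critical
section must stay short.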
This file documents ``QemuLockCnt``, an abstraction for using reference
counting in code that has to be both thread-safe and reentrant.


``QemuLockCnt`` concepts
------------------------

A ``QemuLockCnt`` comprises both a counter and a mutex; it has primitives
to increment and decrement the counter, and to take and release the
mutex.  The counter notes how many visits to the data structures are
taking place (the visits could be from different threads, or there could
be multiple reentrant visits from the same thread).  The basic rules
governing the counter/mutex pair then are the following:

- Data protected by the ``QemuLockCnt`` must not be freed unless the
  counter is zero and the mutex is taken.

- A new visit cannot be started while the counter is zero and the
  mutex is taken.

Most of the time, the mutex protects all writes to the data structure,
not just frees, though there could be cases where this is not necessary.

Reads, instead, can be done without taking the mutex, as long as the
readers and writers use the same macros that are used for RCU, for
example ``qatomic_rcu_read``, ``qatomic_rcu_set``, ``QLIST_FOREACH_RCU``,
etc.  This is because the reads are done outside a lock, and a set or
``QLIST_INSERT_HEAD`` can happen concurrently with the read.  The RCU API
ensures that the processor and the compiler see all required memory
barriers.
This could be implemented simply by protecting the counter with the
mutex, for example::

    // (1)
    qemu_mutex_lock(&walking_handlers_mutex);
    walking_handlers++;
    qemu_mutex_unlock(&walking_handlers_mutex);

    ...

    // (2)
    qemu_mutex_lock(&walking_handlers_mutex);
    if (--walking_handlers == 0) {
        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
            if (ioh->deleted) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        }
    }
    qemu_mutex_unlock(&walking_handlers_mutex);

Here, no frees can happen in the code represented by the ellipsis.
If another thread is executing critical section (2), that part of
the code cannot be entered, because the thread will not be able
to increment the ``walking_handlers`` variable.  And of course
during the visit any other thread will see a nonzero value for
``walking_handlers``, as in the single-threaded code.

Note that it is possible for multiple concurrent accesses to delay
the cleanup arbitrarily; in other words, for the ``walking_handlers``
counter to never become zero.  For this reason, this technique is
more easily applicable if concurrent access to the structure is rare.

However, critical sections are easy to forget since you have to do
them for each modification of the counter.  ``QemuLockCnt`` ensures that
all modifications of the counter take the lock appropriately, and it
can also be more efficient in two ways:

- it avoids taking the lock for many operations (for example
  incrementing the counter while it is non-zero);

- on some platforms, one can implement ``QemuLockCnt`` to hold the counter
  and the mutex in a single word, making the fast path no more expensive
  than simply managing a counter using atomic operations (see
  :doc:`atomics`).  This can be very helpful if concurrent access to
  the data structure is expected to be rare (a sketch of the single-word
  layout follows below).
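The single-word idea in the last bullet can be pictured as follows.  This
is only an illustrative sketch, not QEMU's actual ``util/lockcnt.c``: it
assumes bit 0 of the word is the "locked" flag, the remaining bits hold
the counter, and the helper name is made up::

    #define LOCKCNT_LOCKED    1u
    #define LOCKCNT_COUNT_ONE 2u      /* counter lives in the upper bits */

    /* Try to bump the counter without touching the mutex.  Returns
     * false if the lock bit is set, in which case the caller falls
     * back to a slow path that takes the mutex. */
    static bool lockcnt_fast_inc(unsigned *word)
    {
        unsigned old = qatomic_read(word);

        while (!(old & LOCKCNT_LOCKED)) {
            unsigned seen = qatomic_cmpxchg(word, old, old + LOCKCNT_COUNT_ONE);
            if (seen == old) {
                return true;          /* counter incremented lock-free */
            }
            old = seen;               /* lost a race, retry */
        }
        return false;
    }

While the lock bit is clear, incrementing the counter is a single
compare-and-swap, which matches the first bullet above; once somebody
holds the mutex, would-be visitors take the slow path instead.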
Using the same mutex for frees and writes can still incur some small
inefficiencies; for example, a visit can never start if the counter is
zero and the mutex is taken -- even if the mutex is taken by a write,
which in principle need not block a visit of the data structure.
However, these are usually not a problem if any of the following
assumptions are valid:

- concurrent access is possible but rare

- writes are rare

- writes are frequent, but this kind of write (e.g. appending to a
  list) has a very small critical section.

For example, QEMU uses ``QemuLockCnt`` to manage an ``AioContext``'s list of
bottom halves and file descriptor handlers.  Modifications to the list
of file descriptor handlers are rare.  Creation of a new bottom half is
frequent and can happen on a fast path; however: 1) it is almost never
concurrent with a visit to the list of bottom halves; 2) it only has
three instructions in the critical path, two assignments and a ``smp_wmb()``.


``QemuLockCnt`` API
-------------------

.. kernel-doc:: include/qemu/lockcnt.h


``QemuLockCnt`` usage
---------------------

This section explains the typical usage patterns for ``QemuLockCnt`` functions.
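The patterns below assume the ``QemuLockCnt`` has already been initialised.
A minimal sketch of setup and teardown, using the illustrative
``xyz_lockcnt`` and ``XYZ`` names from the examples that follow (the
``xyz_module_*`` function names are made up)::

    static QemuLockCnt xyz_lockcnt;
    static XYZ *xyz;

    void xyz_module_init(void)
    {
        qemu_lockcnt_init(&xyz_lockcnt);
    }

    void xyz_module_cleanup(void)
    {
        /* Only legal once no visits can be in flight. */
        qemu_lockcnt_destroy(&xyz_lockcnt);
    }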
Setting a variable to a non-NULL value can be done between
``qemu_lockcnt_lock`` and ``qemu_lockcnt_unlock``::

    qemu_lockcnt_lock(&xyz_lockcnt);
    if (!xyz) {
        new_xyz = g_new(XYZ, 1);
        ...
        qatomic_rcu_set(&xyz, new_xyz);
    }
    qemu_lockcnt_unlock(&xyz_lockcnt);

Accessing the value can be done between ``qemu_lockcnt_inc`` and
``qemu_lockcnt_dec``::

    qemu_lockcnt_inc(&xyz_lockcnt);
    if (xyz) {
        XYZ *p = qatomic_rcu_read(&xyz);
        ...
        /* Accesses can now be done through "p".  */
    }
    qemu_lockcnt_dec(&xyz_lockcnt);

Freeing the object can similarly use ``qemu_lockcnt_lock`` and
``qemu_lockcnt_unlock``, but you also need to ensure that the count
is zero (i.e. there is no concurrent visit).  Because ``qemu_lockcnt_inc``
takes the ``QemuLockCnt``'s lock, the count cannot become non-zero while
the object is being freed.  Freeing an object looks like this::

    qemu_lockcnt_lock(&xyz_lockcnt);
    if (!qemu_lockcnt_count(&xyz_lockcnt)) {
        g_free(xyz);
        xyz = NULL;
    }
    qemu_lockcnt_unlock(&xyz_lockcnt);

If an object has to be freed right after a visit, you can combine
the decrement, the locking and the check on count as follows::

    qemu_lockcnt_inc(&xyz_lockcnt);
    if (xyz) {
        XYZ *p = qatomic_rcu_read(&xyz);
        ...
        /* Accesses can now be done through "p".  */
    }
    if (qemu_lockcnt_dec_and_lock(&xyz_lockcnt)) {
        g_free(xyz);
        xyz = NULL;
        qemu_lockcnt_unlock(&xyz_lockcnt);
    }

``QemuLockCnt`` can also be used to access a list as follows::

    qemu_lockcnt_inc(&io_handlers_lockcnt);
    QLIST_FOREACH_RCU(ioh, &io_handlers, next) {
        if (ioh->revents & G_IO_OUT) {
            ioh->fd_write(ioh->opaque);
        }
    }

    if (qemu_lockcnt_dec_and_lock(&io_handlers_lockcnt)) {
        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
            if (ioh->deleted) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        }
        qemu_lockcnt_unlock(&io_handlers_lockcnt);
    }

Again, the RCU primitives are used because new items can be added to the
list during the walk.  ``QLIST_FOREACH_RCU`` ensures that the processor and
the compiler see the appropriate memory barriers.

An alternative pattern uses ``qemu_lockcnt_dec_if_lock``::

    qemu_lockcnt_inc(&io_handlers_lockcnt);
    QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            if (qemu_lockcnt_dec_if_lock(&io_handlers_lockcnt)) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
                qemu_lockcnt_inc_and_unlock(&io_handlers_lockcnt);
            }
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }
    qemu_lockcnt_dec(&io_handlers_lockcnt);

Here you can use ``qemu_lockcnt_dec`` instead of ``qemu_lockcnt_dec_and_lock``,
because there is no special task to do if the count goes from 1 to 0.
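Finally, note that pure insertions only need the ``QemuLockCnt``'s mutex;
the counter does not matter because nothing is freed.  A minimal sketch,
reusing the illustrative ``io_handlers`` names from above and assuming an
``IOHandler`` struct with ``next``, ``fd_write`` and ``opaque`` fields
(``my_fd_write`` is a made-up callback)::

    IOHandler *ioh = g_new0(IOHandler, 1);
    ioh->fd_write = my_fd_write;          /* hypothetical callback */
    ioh->opaque = opaque;

    qemu_lockcnt_lock(&io_handlers_lockcnt);
    QLIST_INSERT_HEAD_RCU(&io_handlers, ioh, next);
    qemu_lockcnt_unlock(&io_handlers_lockcnt);

This is the "very small critical section" kind of write mentioned earlier:
the mutex is held only around the pointer update, so it rarely delays a
visit.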