xref: /qemu/docs/devel/lockcnt.rst (revision 0ae50e8e1e3799d22ca3431f0f238608dc2a4d36)
1362dbb4fSPeter MaydellLocked Counters (aka ``QemuLockCnt``)
2362dbb4fSPeter Maydell=====================================
351dee5e4SPaolo Bonzini
451dee5e4SPaolo BonziniQEMU often uses reference counts to track data structures that are being
551dee5e4SPaolo Bonziniaccessed and should not be freed.  For example, a loop that invoke
6362dbb4fSPeter Maydellcallbacks like this is not safe::
751dee5e4SPaolo Bonzini
851dee5e4SPaolo Bonzini    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
951dee5e4SPaolo Bonzini        if (ioh->revents & G_IO_OUT) {
1051dee5e4SPaolo Bonzini            ioh->fd_write(ioh->opaque);
1151dee5e4SPaolo Bonzini        }
1251dee5e4SPaolo Bonzini    }
1351dee5e4SPaolo Bonzini
14362dbb4fSPeter Maydell``QLIST_FOREACH_SAFE`` protects against deletion of the current node (``ioh``)
15362dbb4fSPeter Maydellby stashing away its ``next`` pointer.  However, ``ioh->fd_write`` could
1651dee5e4SPaolo Bonziniactually delete the next node from the list.  The simplest way to
1751dee5e4SPaolo Bonziniavoid this is to mark the node as deleted, and remove it from the
18362dbb4fSPeter Maydelllist in the above loop::
1951dee5e4SPaolo Bonzini
2051dee5e4SPaolo Bonzini    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
2151dee5e4SPaolo Bonzini        if (ioh->deleted) {
2251dee5e4SPaolo Bonzini            QLIST_REMOVE(ioh, next);
2351dee5e4SPaolo Bonzini            g_free(ioh);
2451dee5e4SPaolo Bonzini        } else {
2551dee5e4SPaolo Bonzini            if (ioh->revents & G_IO_OUT) {
2651dee5e4SPaolo Bonzini                ioh->fd_write(ioh->opaque);
2751dee5e4SPaolo Bonzini            }
2851dee5e4SPaolo Bonzini        }
2951dee5e4SPaolo Bonzini    }
3051dee5e4SPaolo Bonzini
3151dee5e4SPaolo BonziniIf however this loop must also be reentrant, i.e. it is possible that
32362dbb4fSPeter Maydell``ioh->fd_write`` invokes the loop again, some kind of counting is needed::
3351dee5e4SPaolo Bonzini
3451dee5e4SPaolo Bonzini    walking_handlers++;
3551dee5e4SPaolo Bonzini    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
3651dee5e4SPaolo Bonzini        if (ioh->deleted) {
3751dee5e4SPaolo Bonzini            if (walking_handlers == 1) {
3851dee5e4SPaolo Bonzini                QLIST_REMOVE(ioh, next);
3951dee5e4SPaolo Bonzini                g_free(ioh);
4051dee5e4SPaolo Bonzini            }
4151dee5e4SPaolo Bonzini        } else {
4251dee5e4SPaolo Bonzini            if (ioh->revents & G_IO_OUT) {
4351dee5e4SPaolo Bonzini                ioh->fd_write(ioh->opaque);
4451dee5e4SPaolo Bonzini            }
4551dee5e4SPaolo Bonzini        }
4651dee5e4SPaolo Bonzini    }
4751dee5e4SPaolo Bonzini    walking_handlers--;
4851dee5e4SPaolo Bonzini
49362dbb4fSPeter MaydellOne may think of using the RCU primitives, ``rcu_read_lock()`` and
50362dbb4fSPeter Maydell``rcu_read_unlock()``; effectively, the RCU nesting count would take
5151dee5e4SPaolo Bonzinithe place of the walking_handlers global variable.  Indeed,
5251dee5e4SPaolo Bonzinireference counting and RCU have similar purposes, but their usage in
5351dee5e4SPaolo Bonzinigeneral is complementary:
5451dee5e4SPaolo Bonzini
5551dee5e4SPaolo Bonzini- reference counting is fine-grained and limited to a single data
5651dee5e4SPaolo Bonzini  structure; RCU delays reclamation of *all* RCU-protected data
5751dee5e4SPaolo Bonzini  structures;
5851dee5e4SPaolo Bonzini
5951dee5e4SPaolo Bonzini- reference counting works even in the presence of code that keeps
6051dee5e4SPaolo Bonzini  a reference for a long time; RCU critical sections in principle
6151dee5e4SPaolo Bonzini  should be kept short;
6251dee5e4SPaolo Bonzini
6351dee5e4SPaolo Bonzini- reference counting is often applied to code that is not thread-safe
6451dee5e4SPaolo Bonzini  but is reentrant; in fact, usage of reference counting in QEMU predates
6551dee5e4SPaolo Bonzini  the introduction of threads by many years.  RCU is generally used to
6651dee5e4SPaolo Bonzini  protect readers from other threads freeing memory after concurrent
6751dee5e4SPaolo Bonzini  modifications to a data structure.
6851dee5e4SPaolo Bonzini
6951dee5e4SPaolo Bonzini- reclaiming data can be done by a separate thread in the case of RCU;
7051dee5e4SPaolo Bonzini  this can improve performance, but also delay reclamation undesirably.
7151dee5e4SPaolo Bonzini  With reference counting, reclamation is deterministic.
7251dee5e4SPaolo Bonzini
73362dbb4fSPeter MaydellThis file documents ``QemuLockCnt``, an abstraction for using reference
7451dee5e4SPaolo Bonzinicounting in code that has to be both thread-safe and reentrant.
7551dee5e4SPaolo Bonzini
7651dee5e4SPaolo Bonzini
77362dbb4fSPeter Maydell``QemuLockCnt`` concepts
78362dbb4fSPeter Maydell------------------------
7951dee5e4SPaolo Bonzini
80362dbb4fSPeter MaydellA ``QemuLockCnt`` comprises both a counter and a mutex; it has primitives
8151dee5e4SPaolo Bonzinito increment and decrement the counter, and to take and release the
8251dee5e4SPaolo Bonzinimutex.  The counter notes how many visits to the data structures are
8351dee5e4SPaolo Bonzinitaking place (the visits could be from different threads, or there could
8451dee5e4SPaolo Bonzinibe multiple reentrant visits from the same thread).  The basic rules
8551dee5e4SPaolo Bonzinigoverning the counter/mutex pair then are the following:
8651dee5e4SPaolo Bonzini
8751dee5e4SPaolo Bonzini- Data protected by the QemuLockCnt must not be freed unless the
8851dee5e4SPaolo Bonzini  counter is zero and the mutex is taken.
8951dee5e4SPaolo Bonzini
9051dee5e4SPaolo Bonzini- A new visit cannot be started while the counter is zero and the
9151dee5e4SPaolo Bonzini  mutex is taken.
9251dee5e4SPaolo Bonzini
9351dee5e4SPaolo BonziniMost of the time, the mutex protects all writes to the data structure,
9451dee5e4SPaolo Bonzininot just frees, though there could be cases where this is not necessary.
9551dee5e4SPaolo Bonzini
9651dee5e4SPaolo BonziniReads, instead, can be done without taking the mutex, as long as the
9751dee5e4SPaolo Bonzinireaders and writers use the same macros that are used for RCU, for
98362dbb4fSPeter Maydellexample ``qatomic_rcu_read``, ``qatomic_rcu_set``, ``QLIST_FOREACH_RCU``,
99362dbb4fSPeter Maydelletc.  This is because the reads are done outside a lock and a set
100362dbb4fSPeter Maydellor ``QLIST_INSERT_HEAD``
10151dee5e4SPaolo Bonzinican happen concurrently with the read.  The RCU API ensures that the
10251dee5e4SPaolo Bonziniprocessor and the compiler see all required memory barriers.
10351dee5e4SPaolo Bonzini
10451dee5e4SPaolo BonziniThis could be implemented simply by protecting the counter with the
105362dbb4fSPeter Maydellmutex, for example::
10651dee5e4SPaolo Bonzini
10751dee5e4SPaolo Bonzini    // (1)
10851dee5e4SPaolo Bonzini    qemu_mutex_lock(&walking_handlers_mutex);
10951dee5e4SPaolo Bonzini    walking_handlers++;
11051dee5e4SPaolo Bonzini    qemu_mutex_unlock(&walking_handlers_mutex);
11151dee5e4SPaolo Bonzini
11251dee5e4SPaolo Bonzini    ...
11351dee5e4SPaolo Bonzini
11451dee5e4SPaolo Bonzini    // (2)
11551dee5e4SPaolo Bonzini    qemu_mutex_lock(&walking_handlers_mutex);
11651dee5e4SPaolo Bonzini    if (--walking_handlers == 0) {
11751dee5e4SPaolo Bonzini        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
11851dee5e4SPaolo Bonzini            if (ioh->deleted) {
11951dee5e4SPaolo Bonzini                QLIST_REMOVE(ioh, next);
12051dee5e4SPaolo Bonzini                g_free(ioh);
12151dee5e4SPaolo Bonzini            }
12251dee5e4SPaolo Bonzini        }
12351dee5e4SPaolo Bonzini    }
12451dee5e4SPaolo Bonzini    qemu_mutex_unlock(&walking_handlers_mutex);
12551dee5e4SPaolo Bonzini
12651dee5e4SPaolo BonziniHere, no frees can happen in the code represented by the ellipsis.
12751dee5e4SPaolo BonziniIf another thread is executing critical section (2), that part of
12851dee5e4SPaolo Bonzinithe code cannot be entered, because the thread will not be able
129362dbb4fSPeter Maydellto increment the ``walking_handlers`` variable.  And of course
13051dee5e4SPaolo Bonziniduring the visit any other thread will see a nonzero value for
131362dbb4fSPeter Maydell``walking_handlers``, as in the single-threaded code.
13251dee5e4SPaolo Bonzini
13351dee5e4SPaolo BonziniNote that it is possible for multiple concurrent accesses to delay
134362dbb4fSPeter Maydellthe cleanup arbitrarily; in other words, for the ``walking_handlers``
13551dee5e4SPaolo Bonzinicounter to never become zero.  For this reason, this technique is
13651dee5e4SPaolo Bonzinimore easily applicable if concurrent access to the structure is rare.
13751dee5e4SPaolo Bonzini
13851dee5e4SPaolo BonziniHowever, critical sections are easy to forget since you have to do
139362dbb4fSPeter Maydellthem for each modification of the counter.  ``QemuLockCnt`` ensures that
14051dee5e4SPaolo Bonziniall modifications of the counter take the lock appropriately, and it
14151dee5e4SPaolo Bonzinican also be more efficient in two ways:
14251dee5e4SPaolo Bonzini
14351dee5e4SPaolo Bonzini- it avoids taking the lock for many operations (for example
14451dee5e4SPaolo Bonzini  incrementing the counter while it is non-zero);
14551dee5e4SPaolo Bonzini
146362dbb4fSPeter Maydell- on some platforms, one can implement ``QemuLockCnt`` to hold the lock
147fbcc3e50SPaolo Bonzini  and the mutex in a single word, making the fast path no more expensive
14851dee5e4SPaolo Bonzini  than simply managing a counter using atomic operations (see
149362dbb4fSPeter Maydell  :doc:`atomics`).  This can be very helpful if concurrent access to
150fbcc3e50SPaolo Bonzini  the data structure is expected to be rare.
15151dee5e4SPaolo Bonzini
15251dee5e4SPaolo Bonzini
15351dee5e4SPaolo BonziniUsing the same mutex for frees and writes can still incur some small
15451dee5e4SPaolo Bonziniinefficiencies; for example, a visit can never start if the counter is
155362dbb4fSPeter Maydellzero and the mutex is taken -- even if the mutex is taken by a write,
15651dee5e4SPaolo Bonziniwhich in principle need not block a visit of the data structure.
15751dee5e4SPaolo BonziniHowever, these are usually not a problem if any of the following
15851dee5e4SPaolo Bonziniassumptions are valid:
15951dee5e4SPaolo Bonzini
16051dee5e4SPaolo Bonzini- concurrent access is possible but rare
16151dee5e4SPaolo Bonzini
16251dee5e4SPaolo Bonzini- writes are rare
16351dee5e4SPaolo Bonzini
16451dee5e4SPaolo Bonzini- writes are frequent, but this kind of write (e.g. appending to a
16551dee5e4SPaolo Bonzini  list) has a very small critical section.
16651dee5e4SPaolo Bonzini
167362dbb4fSPeter MaydellFor example, QEMU uses ``QemuLockCnt`` to manage an ``AioContext``'s list of
16851dee5e4SPaolo Bonzinibottom halves and file descriptor handlers.  Modifications to the list
16951dee5e4SPaolo Bonziniof file descriptor handlers are rare.  Creation of a new bottom half is
17051dee5e4SPaolo Bonzinifrequent and can happen on a fast path; however: 1) it is almost never
17151dee5e4SPaolo Bonziniconcurrent with a visit to the list of bottom halves; 2) it only has
172362dbb4fSPeter Maydellthree instructions in the critical path, two assignments and a ``smp_wmb()``.
17351dee5e4SPaolo Bonzini
17451dee5e4SPaolo Bonzini
175362dbb4fSPeter Maydell``QemuLockCnt`` API
176362dbb4fSPeter Maydell-------------------
17751dee5e4SPaolo Bonzini
178*0ae50e8eSPeter Maydell.. kernel-doc:: include/qemu/lockcnt.h
17951dee5e4SPaolo Bonzini
18051dee5e4SPaolo Bonzini
181362dbb4fSPeter Maydell``QemuLockCnt`` usage
182362dbb4fSPeter Maydell---------------------
18351dee5e4SPaolo Bonzini
184362dbb4fSPeter MaydellThis section explains the typical usage patterns for ``QemuLockCnt`` functions.
18551dee5e4SPaolo Bonzini
18651dee5e4SPaolo BonziniSetting a variable to a non-NULL value can be done between
187362dbb4fSPeter Maydell``qemu_lockcnt_lock`` and ``qemu_lockcnt_unlock``::
18851dee5e4SPaolo Bonzini
18951dee5e4SPaolo Bonzini    qemu_lockcnt_lock(&xyz_lockcnt);
19051dee5e4SPaolo Bonzini    if (!xyz) {
19151dee5e4SPaolo Bonzini        new_xyz = g_new(XYZ, 1);
19251dee5e4SPaolo Bonzini        ...
193d73415a3SStefan Hajnoczi        qatomic_rcu_set(&xyz, new_xyz);
19451dee5e4SPaolo Bonzini    }
19551dee5e4SPaolo Bonzini    qemu_lockcnt_unlock(&xyz_lockcnt);
19651dee5e4SPaolo Bonzini
197362dbb4fSPeter MaydellAccessing the value can be done between ``qemu_lockcnt_inc`` and
198362dbb4fSPeter Maydell``qemu_lockcnt_dec``::
19951dee5e4SPaolo Bonzini
20051dee5e4SPaolo Bonzini    qemu_lockcnt_inc(&xyz_lockcnt);
20151dee5e4SPaolo Bonzini    if (xyz) {
202d73415a3SStefan Hajnoczi        XYZ *p = qatomic_rcu_read(&xyz);
20351dee5e4SPaolo Bonzini        ...
20451dee5e4SPaolo Bonzini        /* Accesses can now be done through "p".  */
20551dee5e4SPaolo Bonzini    }
20651dee5e4SPaolo Bonzini    qemu_lockcnt_dec(&xyz_lockcnt);
20751dee5e4SPaolo Bonzini
208362dbb4fSPeter MaydellFreeing the object can similarly use ``qemu_lockcnt_lock`` and
209362dbb4fSPeter Maydell``qemu_lockcnt_unlock``, but you also need to ensure that the count
210362dbb4fSPeter Maydellis zero (i.e. there is no concurrent visit).  Because ``qemu_lockcnt_inc``
211362dbb4fSPeter Maydelltakes the ``QemuLockCnt``'s lock, the count cannot become non-zero while
212362dbb4fSPeter Maydellthe object is being freed.  Freeing an object looks like this::
21351dee5e4SPaolo Bonzini
21451dee5e4SPaolo Bonzini    qemu_lockcnt_lock(&xyz_lockcnt);
21551dee5e4SPaolo Bonzini    if (!qemu_lockcnt_count(&xyz_lockcnt)) {
21651dee5e4SPaolo Bonzini        g_free(xyz);
21751dee5e4SPaolo Bonzini        xyz = NULL;
21851dee5e4SPaolo Bonzini    }
21951dee5e4SPaolo Bonzini    qemu_lockcnt_unlock(&xyz_lockcnt);
22051dee5e4SPaolo Bonzini
22151dee5e4SPaolo BonziniIf an object has to be freed right after a visit, you can combine
222362dbb4fSPeter Maydellthe decrement, the locking and the check on count as follows::
22351dee5e4SPaolo Bonzini
22451dee5e4SPaolo Bonzini    qemu_lockcnt_inc(&xyz_lockcnt);
22551dee5e4SPaolo Bonzini    if (xyz) {
226d73415a3SStefan Hajnoczi        XYZ *p = qatomic_rcu_read(&xyz);
22751dee5e4SPaolo Bonzini        ...
22851dee5e4SPaolo Bonzini        /* Accesses can now be done through "p".  */
22951dee5e4SPaolo Bonzini    }
23051dee5e4SPaolo Bonzini    if (qemu_lockcnt_dec_and_lock(&xyz_lockcnt)) {
23151dee5e4SPaolo Bonzini        g_free(xyz);
23251dee5e4SPaolo Bonzini        xyz = NULL;
23351dee5e4SPaolo Bonzini        qemu_lockcnt_unlock(&xyz_lockcnt);
23451dee5e4SPaolo Bonzini    }
23551dee5e4SPaolo Bonzini
236362dbb4fSPeter Maydell``QemuLockCnt`` can also be used to access a list as follows::
23751dee5e4SPaolo Bonzini
23851dee5e4SPaolo Bonzini    qemu_lockcnt_inc(&io_handlers_lockcnt);
23951dee5e4SPaolo Bonzini    QLIST_FOREACH_RCU(ioh, &io_handlers, pioh) {
24051dee5e4SPaolo Bonzini        if (ioh->revents & G_IO_OUT) {
24151dee5e4SPaolo Bonzini            ioh->fd_write(ioh->opaque);
24251dee5e4SPaolo Bonzini        }
24351dee5e4SPaolo Bonzini    }
24451dee5e4SPaolo Bonzini
24551dee5e4SPaolo Bonzini    if (qemu_lockcnt_dec_and_lock(&io_handlers_lockcnt)) {
24651dee5e4SPaolo Bonzini        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
24751dee5e4SPaolo Bonzini            if (ioh->deleted) {
24851dee5e4SPaolo Bonzini                QLIST_REMOVE(ioh, next);
24951dee5e4SPaolo Bonzini                g_free(ioh);
25051dee5e4SPaolo Bonzini            }
25151dee5e4SPaolo Bonzini        }
25251dee5e4SPaolo Bonzini        qemu_lockcnt_unlock(&io_handlers_lockcnt);
25351dee5e4SPaolo Bonzini    }
25451dee5e4SPaolo Bonzini
25551dee5e4SPaolo BonziniAgain, the RCU primitives are used because new items can be added to the
256362dbb4fSPeter Maydelllist during the walk.  ``QLIST_FOREACH_RCU`` ensures that the processor and
25751dee5e4SPaolo Bonzinithe compiler see the appropriate memory barriers.
25851dee5e4SPaolo Bonzini
259362dbb4fSPeter MaydellAn alternative pattern uses ``qemu_lockcnt_dec_if_lock``::
26051dee5e4SPaolo Bonzini
26151dee5e4SPaolo Bonzini    qemu_lockcnt_inc(&io_handlers_lockcnt);
26251dee5e4SPaolo Bonzini    QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) {
26351dee5e4SPaolo Bonzini        if (ioh->deleted) {
26451dee5e4SPaolo Bonzini            if (qemu_lockcnt_dec_if_lock(&io_handlers_lockcnt)) {
26551dee5e4SPaolo Bonzini                QLIST_REMOVE(ioh, next);
26651dee5e4SPaolo Bonzini                g_free(ioh);
26751dee5e4SPaolo Bonzini                qemu_lockcnt_inc_and_unlock(&io_handlers_lockcnt);
26851dee5e4SPaolo Bonzini            }
26951dee5e4SPaolo Bonzini        } else {
27051dee5e4SPaolo Bonzini            if (ioh->revents & G_IO_OUT) {
27151dee5e4SPaolo Bonzini                ioh->fd_write(ioh->opaque);
27251dee5e4SPaolo Bonzini            }
27351dee5e4SPaolo Bonzini        }
27451dee5e4SPaolo Bonzini    }
27551dee5e4SPaolo Bonzini    qemu_lockcnt_dec(&io_handlers_lockcnt);
27651dee5e4SPaolo Bonzini
277362dbb4fSPeter MaydellHere you can use ``qemu_lockcnt_dec`` instead of ``qemu_lockcnt_dec_and_lock``,
27851dee5e4SPaolo Bonzinibecause there is no special task to do if the count goes from 1 to 0.
279