Searched hist:"2362 a28ea11c145e1a13ae79342d76dc118a72a6" (Results 1 – 3 of 3) sorted by relevance
/qemu/tests/unit/ |
H A D | iothread.c | 69de48445a0d6169f1e2a6c5bfab994e1c810e33 Thu Oct 03 10:01:03 UTC 2019 Stefan Hajnoczi <stefanha@redhat.com> test-bdrv-drain: fix iothread_join() hang
tests/test-bdrv-drain can hang in tests/iothread.c:iothread_run():
    while (!atomic_read(&iothread->stopping)) {
        aio_poll(iothread->ctx, true);
    }
The iothread_join() function works as follows:
    void iothread_join(IOThread *iothread)
    {
        iothread->stopping = true;
        aio_notify(iothread->ctx);
        qemu_thread_join(&iothread->thread);
    }
If iothread_run() checks iothread->stopping before the iothread_join() thread sets stopping to true, then aio_notify() may be optimized away and iothread_run() hangs forever in aio_poll().
The correct way to change iothread->stopping is from a BH that executes within iothread_run(). This ensures that iothread->stopping is checked after we set it to true.
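A minimal sketch of that approach for tests/iothread.c (condensed for illustration; it assumes the IOThread struct and fields from that file, and the exact helper name is hypothetical):

    /* Runs as a bottom half inside iothread_run(), so the next loop
     * iteration is guaranteed to observe stopping == true.
     */
    static void iothread_stop_bh(void *opaque)
    {
        IOThread *iothread = opaque;

        iothread->stopping = true;
    }

    void iothread_join(IOThread *iothread)
    {
        /* Unlike a bare aio_notify(), a scheduled BH cannot be skipped:
         * aio_poll() always executes pending bottom halves.
         */
        aio_bh_schedule_oneshot(iothread->ctx, iothread_stop_bh, iothread);
        qemu_thread_join(&iothread->thread);
    }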
This was already fixed for ./iothread.c (note this is a different source file!) by commit 2362a28ea11c145e1a13ae79342d76dc118a72a6 ("iothread: fix iothread_stop() race condition"), but not for tests/iothread.c.
Fixes: 0c330a734b51c177ab8488932ac3b0c4d63a718a ("aio: introduce aio_co_schedule and aio_co_wake")
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20191003100103.331-1-stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
/qemu/include/system/ |
H A D | iothread.h | 2362a28ea11c145e1a13ae79342d76dc118a72a6 Thu Dec 07 20:13:19 UTC 2017 Stefan Hajnoczi <stefanha@redhat.com> iothread: fix iothread_stop() race condition
There is a small chance that iothread_stop() hangs as follows:
    Thread 3 (Thread 0x7f63eba5f700 (LWP 16105)):
    #0  0x00007f64012c09b6 in ppoll () at /lib64/libc.so.6
    #1  0x000055959992eac9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
    #2  0x000055959992eac9 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
    #3  0x0000559599930711 in aio_poll (ctx=0x55959bdb83c0, blocking=blocking@entry=true) at util/aio-posix.c:629
    #4  0x00005595996806fe in iothread_run (opaque=0x55959bd78400) at iothread.c:59
    #5  0x00007f640159f609 in start_thread () at /lib64/libpthread.so.0
    #6  0x00007f64012cce6f in clone () at /lib64/libc.so.6
    Thread 1 (Thread 0x7f640b45b280 (LWP 16103)):
    #0  0x00007f64015a0b6d in pthread_join () at /lib64/libpthread.so.0
    #1  0x00005595999332ef in qemu_thread_join (thread=<optimized out>) at util/qemu-thread-posix.c:547
    #2  0x00005595996808ae in iothread_stop (iothread=<optimized out>) at iothread.c:91
    #3  0x000055959968094d in iothread_stop_iter (object=<optimized out>, opaque=<optimized out>) at iothread.c:102
    #4  0x0000559599857d97 in do_object_child_foreach (obj=obj@entry=0x55959bdb8100, fn=fn@entry=0x559599680930 <iothread_stop_iter>, opaque=opaque@entry=0x0, recurse=recurse@entry=false) at qom/object.c:852
    #5  0x0000559599859477 in object_child_foreach (obj=obj@entry=0x55959bdb8100, fn=fn@entry=0x559599680930 <iothread_stop_iter>, opaque=opaque@entry=0x0) at qom/object.c:867
    #6  0x0000559599680a6e in iothread_stop_all () at iothread.c:341
    #7  0x000055959955b1d5 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4913
The relevant code from iothread_run() is:
    while (!atomic_read(&iothread->stopping)) {
        aio_poll(iothread->ctx, true);
    }
and iothread_stop():
    iothread->stopping = true;
    aio_notify(iothread->ctx);
    ...
    qemu_thread_join(&iothread->thread);
The following scenario can occur:
1. IOThread: while (!atomic_read(&iothread->stopping)) -> stopping=false
2. Main loop: iothread->stopping = true; aio_notify(iothread->ctx);
3. IOThread: aio_poll(iothread->ctx, true); -> hang
The bug is explained by the AioContext->notify_me doc comments:
"If this field is 0, everything (file descriptors, bottom halves, timers) will be re-evaluated before the next blocking poll(), thus the event_notifier_set call can be skipped."
The problem is that "everything" does not include checking iothread->stopping. This means iothread_run() will block in aio_poll() if aio_notify() was called just before aio_poll().
This patch fixes the hang by replacing aio_notify() with aio_bh_schedule_oneshot(). Scheduling a bottom half forces aio_poll() or g_main_loop_run() to return.
Implementing this properly required a new bool 'running' flag. Reusing iothread->stopping for both roles would invite tricky races, so the responsibilities are now split: iothread->stopping belongs purely to iothread_stop(), and iothread->running belongs purely to the iothread_run() thread. A condensed sketch of that split follows.
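The sketch below is condensed for illustration; the real functions also handle GMainContext setup, RCU thread registration, and so on.

    /* Runs in the iothread_run() thread via aio_bh_schedule_oneshot() */
    static void iothread_stop_bh(void *opaque)
    {
        IOThread *iothread = opaque;

        iothread->running = false;   /* written only from the IOThread */
    }

    void iothread_stop(IOThread *iothread)
    {
        if (!iothread->ctx || iothread->stopping) {
            return;
        }
        iothread->stopping = true;   /* consulted only by iothread_stop() */
        aio_bh_schedule_oneshot(iothread->ctx, iothread_stop_bh, iothread);
        qemu_thread_join(&iothread->thread);
    }

    static void *iothread_run(void *opaque)
    {
        IOThread *iothread = opaque;

        while (iothread->running) {
            /* The oneshot BH always wakes this poll and clears running */
            aio_poll(iothread->ctx, true);
        }
        return NULL;
    }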
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20171207201320.19284-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
/qemu/ |
H A D | iothread.c | 2362a28ea11c145e1a13ae79342d76dc118a72a6 Thu Dec 07 20:13:19 UTC 2017 Stefan Hajnoczi <stefanha@redhat.com> iothread: fix iothread_stop() race condition
|