  • mirror: fix dead-lock · d12ade57
    Vladimir Sementsov-Ogievskiy authored
    Let's start from the beginning:
    
    Commit b9e413dd (in 2.9)
    "block: explicitly acquire aiocontext in aio callbacks that need it"
    added pairs of aio_context_acquire/release to mirror_write_complete and
    mirror_read_complete, when they were aio callbacks for blk_aio_* calls.
    
    Then, commit 2e1990b2 (in 3.0) "block/mirror: Convert to coroutines"
    dropped these blk_aio_* calls, so mirror_write_complete and
    mirror_read_complete are no longer callbacks and don't need the
    additional aiocontext acquiring. Furthermore, mirror_read_complete calls
    blk_co_pwritev inside this pair of aio_context_acquire/release, which
    leads to the following dead-lock with mirror:
    
     (gdb) info thr
       Id   Target Id         Frame
       3    Thread (LWP 145412) "qemu-system-x86" syscall ()
       2    Thread (LWP 145416) "qemu-system-x86" __lll_lock_wait ()
     * 1    Thread (LWP 145411) "qemu-system-x86" __lll_lock_wait ()
    
     (gdb) bt
     #0  __lll_lock_wait ()
     #1  _L_lock_812 ()
     #2  __GI___pthread_mutex_lock
     #3  qemu_mutex_lock_impl (mutex=0x561032dce420 <qemu_global_mutex>,
         file=0x5610327d8654 "util/main-loop.c", line=236) at
         util/qemu-thread-posix.c:66
     #4  qemu_mutex_lock_iothread_impl
     #5  os_host_main_loop_wait (timeout=480116000) at util/main-loop.c:236
     #6  main_loop_wait (nonblocking=0) at util/main-loop.c:497
     #7  main_loop () at vl.c:1892
     #8  main
    
    Printing contents of qemu_global_mutex, I see that "__owner = 145416",
    so, thr1 is main loop, and now it wants BQL, which is owned by thr2.
    
     (gdb) thr 2
     (gdb) bt
     #0  __lll_lock_wait ()
     #1  _L_lock_870 ()
     #2  __GI___pthread_mutex_lock
     #3  qemu_mutex_lock_impl (mutex=0x561034d25dc0, ...
     #4  aio_context_acquire (ctx=0x561034d25d60)
     #5  dma_blk_cb
     #6  dma_blk_io
     #7  dma_blk_read
     #8  ide_dma_cb
     #9  bmdma_cmd_writeb
     #10 bmdma_write
     #11 memory_region_write_accessor
     #12 access_with_adjusted_size
     #15 flatview_write
     #16 address_space_write
     #17 address_space_rw
     #18 kvm_handle_io
     #19 kvm_cpu_exec
     #20 qemu_kvm_cpu_thread_fn
     #21 qemu_thread_start
     #22 start_thread
     #23 clone ()
    
    Printing mutex in fr 2, I see "__owner = 145411", so thr2 wants aio
    context mutex, which is owned by thr1. Classic dead-lock.
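
The cycle seen in the two backtraces can be modelled outside QEMU. The
following is only a sketch: model_bql and model_ctx_lock are hypothetical
plain pthread mutexes standing in for the BQL and the aio context lock, and
both roles are played sequentially from one thread with
pthread_mutex_trylock, so the program reports the cycle instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not QEMU's real BQL or AioContext lock. */
static pthread_mutex_t model_bql = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t model_ctx_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct {
    bool vcpu_blocked;      /* thr2 role: has the BQL, wants the context lock */
    bool main_loop_blocked; /* thr1 role: has the context lock, wants the BQL */
} DeadlockProbe;

/* Recreate the lock state from the backtraces and probe both waiters. */
static DeadlockProbe probe_lock_cycle(void)
{
    DeadlockProbe p;
    int rc;

    /* thr1 role: the context lock is already held. */
    rc = pthread_mutex_lock(&model_ctx_lock);
    assert(rc == 0);
    /* thr2 role: the BQL is already held. */
    rc = pthread_mutex_lock(&model_bql);
    assert(rc == 0);
    (void)rc;

    /* Each side now asks for the lock the other side holds. With a
     * blocking lock call, neither request could ever be satisfied. */
    p.vcpu_blocked = pthread_mutex_trylock(&model_ctx_lock) == EBUSY;
    p.main_loop_blocked = pthread_mutex_trylock(&model_bql) == EBUSY;

    pthread_mutex_unlock(&model_bql);
    pthread_mutex_unlock(&model_ctx_lock);
    return p;
}
```

Both fields come back true in this model: each side waits for a lock the
other side will not release until its own wait finishes.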
    
    Then, let's check that the aio context is held by the mirror coroutine:
    just print the coroutine stack of the first tracked request in the
    mirror job target:
    
     (gdb) [...]
     (gdb) qemu coroutine 0x561035dd0860
     #0  qemu_coroutine_switch
     #1  qemu_coroutine_yield
     #2  qemu_co_mutex_lock_slowpath
     #3  qemu_co_mutex_lock
     #4  qcow2_co_pwritev
     #5  bdrv_driver_pwritev
     #6  bdrv_aligned_pwritev
     #7  bdrv_co_pwritev
     #8  blk_co_pwritev
     #9  mirror_read_complete () at block/mirror.c:232
     #10 mirror_co_read () at block/mirror.c:370
     #11 coroutine_trampoline
     #12 __start_context
    
    Yes, it is mirror_read_complete calling blk_co_pwritev after acquiring
    the aio context.
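
To illustrate why the extra acquire matters at the yield point, here is a
small model, again with a hypothetical plain pthread mutex (sketch_ctx_lock)
in place of the aio context lock. sketch_read_complete is not the QEMU
function; it only mimics the shape "optionally take the lock, reach a point
where the real code would yield, release". The probe at that point tells
whether anybody else could take the lock while the coroutine is suspended.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the aio context lock. */
static pthread_mutex_t sketch_ctx_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns true if the lock could be taken right now by another party. */
static bool sketch_lock_is_free(void)
{
    if (pthread_mutex_trylock(&sketch_ctx_lock) == 0) {
        pthread_mutex_unlock(&sketch_ctx_lock);
        return true;
    }
    return false;
}

/*
 * Model of a completion function that may wrap its body in an
 * acquire/release pair. The probe sits where the real code would yield
 * (in the backtrace above: inside blk_co_pwritev, waiting on a CoMutex).
 */
static bool sketch_read_complete(bool wrap_in_acquire)
{
    bool free_at_yield;

    if (wrap_in_acquire) {
        pthread_mutex_lock(&sketch_ctx_lock);
    }

    free_at_yield = sketch_lock_is_free();

    if (wrap_in_acquire) {
        pthread_mutex_unlock(&sketch_ctx_lock);
    }
    return free_at_yield;
}
```

In this model sketch_read_complete(true) reports the lock as busy at the
yield point, while sketch_read_complete(false) reports it as free, which
matches the observation that the pair is not needed once the functions are
no longer aio callbacks.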
    
    Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
    Reviewed-by: Max Reitz <mreitz@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>