    monitor: bind dispatch bh to iohandler context · 951702f3
    Peter Xu authored
    Eric Auger reported a few days ago that OOB broke ARM when running
    with libvirt:
    
    http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html
    
    
    
    The problem is that the monitor dispatcher bottom half is now bound to
    qemu_aio_context, which can be polled unexpectedly from block code.
    We should keep the dispatcher running in iohandler_ctx, just as it did
    before the Out-Of-Band series (chardev uses qio, and qio binds
    everything to iohandler_ctx).
    
    Without this change, the QMP dispatcher might run in the block I/O path
    even before the main loop is reached, for example in a stack like this
    (the ARM case, where the "cont" command handler runs during the machine
    init phase):
    
            #0  qmp_cont ()
            #1  0x00000000006bd210 in qmp_marshal_cont ()
            #2  0x0000000000ac05c4 in do_qmp_dispatch ()
            #3  0x0000000000ac07a0 in qmp_dispatch ()
            #4  0x0000000000472d60 in monitor_qmp_dispatch_one ()
            #5  0x000000000047302c in monitor_qmp_bh_dispatcher ()
            #6  0x0000000000acf374 in aio_bh_call ()
            #7  0x0000000000acf428 in aio_bh_poll ()
            #8  0x0000000000ad5110 in aio_poll ()
            #9  0x0000000000a08ab8 in blk_prw ()
            #10 0x0000000000a091c4 in blk_pread ()
            #11 0x0000000000734f94 in pflash_cfi01_realize ()
            #12 0x000000000075a3a4 in device_set_realized ()
            #13 0x00000000009a26cc in property_set_bool ()
            #14 0x00000000009a0a40 in object_property_set ()
            #15 0x00000000009a3a08 in object_property_set_qobject ()
            #16 0x00000000009a0c8c in object_property_set_bool ()
            #17 0x0000000000758f94 in qdev_init_nofail ()
            #18 0x000000000058e190 in create_one_flash ()
            #19 0x000000000058e2f4 in create_flash ()
            #20 0x00000000005902f0 in machvirt_init ()
            #21 0x00000000007635cc in machine_run_board_init ()
            #22 0x00000000006b135c in main ()
    
    Actually the problem is more severe than that.  Now that the dispatcher
    is bound to the QEMU AIO context, the monitor dispatcher code can even
    be called from a nested aio_poll(), that is, an explicit aio_poll()
    inside the main loop's own aio_poll().  That can be racy too, and it
    breaks code like TPM and 9p that uses nested event loops.
    
    Switch to use the iohandler_ctx for monitor dispatchers.
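
    The effect of the rebinding can be sketched with a toy model.  This is
    not QEMU code: ToyContext, toy_bh_schedule(), toy_poll(),
    toy_sync_read() and toy_run() are hypothetical names made up for
    illustration, and the only assumption carried over from the text above
    is that a synchronous block read polls the block-layer context alone,
    while the main loop services both contexts.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an AioContext: a small queue of pending bottom halves. */
typedef void (*ToyBHFunc)(void *opaque);

typedef struct ToyBH {
    ToyBHFunc cb;
    void *opaque;
} ToyBH;

#define TOY_MAX_BH 8

typedef struct ToyContext {
    ToyBH pending[TOY_MAX_BH];
    size_t n_pending;
} ToyContext;

typedef struct ToyResult {
    int during_init;      /* commands dispatched before the main loop ran */
    int after_main_loop;  /* commands dispatched after one main loop pass */
} ToyResult;

/* Queue a bottom half on one specific context; returns 0 on success. */
static int toy_bh_schedule(ToyContext *ctx, ToyBHFunc cb, void *opaque)
{
    if (ctx->n_pending == TOY_MAX_BH) {
        return -1;
    }
    ctx->pending[ctx->n_pending].cb = cb;
    ctx->pending[ctx->n_pending].opaque = opaque;
    ctx->n_pending++;
    return 0;
}

/* Run every bottom half pending on ctx, and only on ctx. */
static void toy_poll(ToyContext *ctx)
{
    ToyBH batch[TOY_MAX_BH];
    size_t n = ctx->n_pending;
    size_t i;

    for (i = 0; i < n; i++) {
        batch[i] = ctx->pending[i];
    }
    ctx->n_pending = 0;
    for (i = 0; i < n; i++) {
        batch[i].cb(batch[i].opaque);
    }
}

/* Stand-in for the monitor dispatcher: counts dispatched commands. */
static void toy_dispatch_command(void *opaque)
{
    int *dispatched = opaque;
    (*dispatched)++;
}

/* Stand-in for a synchronous read at device init: polls block_ctx only. */
static void toy_sync_read(ToyContext *block_ctx)
{
    toy_poll(block_ctx);
}

/*
 * Queue one command, perform an init-time synchronous read, then run one
 * main loop pass.  bind_to_iohandler selects which context the dispatcher
 * bottom half is bound to.
 */
ToyResult toy_run(int bind_to_iohandler)
{
    ToyContext block_ctx = { .n_pending = 0 };
    ToyContext iohandler_ctx = { .n_pending = 0 };
    ToyContext *target = bind_to_iohandler ? &iohandler_ctx : &block_ctx;
    ToyResult result;
    int dispatched = 0;

    toy_bh_schedule(target, toy_dispatch_command, &dispatched);

    toy_sync_read(&block_ctx);          /* machine init phase */
    result.during_init = dispatched;

    toy_poll(&block_ctx);               /* main loop services both */
    toy_poll(&iohandler_ctx);
    result.after_main_loop = dispatched;

    return result;
}
```

    In this model, toy_run(0) dispatches the queued command during the
    init-time read, which mirrors the backtrace above, whereas toy_run(1)
    defers it until the main loop pass.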
    
    My sincere thanks to Eric Auger, who offered great help in both
    debugging and verifying the problem.  The ARM test was carried out by
    applying this patch on top of QEMU 2.12.0-rc0, and the problem is gone
    with the patch applied.
    
    A quick test of mine shows that with this patch applied we can pass all
    raw iotests even with OOB on by default.
    
    CC: Eric Blake <eblake@redhat.com>
    CC: Markus Armbruster <armbru@redhat.com>
    CC: Stefan Hajnoczi <stefanha@redhat.com>
    CC: Fam Zheng <famz@redhat.com>
    Reported-by: Eric Auger <eric.auger@redhat.com>
    Tested-by: Eric Auger <eric.auger@redhat.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Message-Id: <20180410044942.17059-1-peterx@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Signed-off-by: Eric Blake <eblake@redhat.com>