Skip to content
  • Jack Morgenstein's avatar
    IB/mad: Fix use-after-free in ib mad completion handling · 770b7d96
    Jack Morgenstein authored
    We encountered a use-after-free bug when unloading the driver:
    
    [ 3562.116059] BUG: KASAN: use-after-free in ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
    [ 3562.117233] Read of size 4 at addr ffff8882ca5aa868 by task kworker/u13:2/23862
    [ 3562.118385]
    [ 3562.119519] CPU: 2 PID: 23862 Comm: kworker/u13:2 Tainted: G           OE     5.1.0-for-upstream-dbg-2019-05-19_16-44-30-13 #1
    [ 3562.121806] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
    [ 3562.123075] Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
    [ 3562.124383] Call Trace:
    [ 3562.125640]  dump_stack+0x9a/0xeb
    [ 3562.126911]  print_address_description+0xe3/0x2e0
    [ 3562.128223]  ? ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
    [ 3562.129545]  __kasan_report+0x15c/0x1df
    [ 3562.130866]  ? ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
    [ 3562.132174]  kasan_report+0xe/0x20
    [ 3562.133514]  ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
    [ 3562.134835]  ? find_mad_agent+0xa00/0xa00 [ib_core]
    [ 3562.136158]  ? qlist_free_all+0x51/0xb0
    [ 3562.137498]  ? mlx4_ib_sqp_comp_worker+0x1970/0x1970 [mlx4_ib]
    [ 3562.138833]  ? quarantine_reduce+0x1fa/0x270
    [ 3562.140171]  ? kasan_unpoison_shadow+0x30/0x40
    [ 3562.141522]  ib_mad_recv_done+0xdf6/0x3000 [ib_core]
    [ 3562.142880]  ? _raw_spin_unlock_irqrestore+0x46/0x70
    [ 3562.144277]  ? ib_mad_send_done+0x1810/0x1810 [ib_core]
    [ 3562.145649]  ? mlx4_ib_destroy_cq+0x2a0/0x2a0 [mlx4_ib]
    [ 3562.147008]  ? _raw_spin_unlock_irqrestore+0x46/0x70
    [ 3562.148380]  ? debug_object_deactivate+0x2b9/0x4a0
    [ 3562.149814]  __ib_process_cq+0xe2/0x1d0 [ib_core]
    [ 3562.151195]  ib_cq_poll_work+0x45/0xf0 [ib_core]
    [ 3562.152577]  process_one_work+0x90c/0x1860
    [ 3562.153959]  ? pwq_dec_nr_in_flight+0x320/0x320
    [ 3562.155320]  worker_thread+0x87/0xbb0
    [ 3562.156687]  ? __kthread_parkme+0xb6/0x180
    [ 3562.158058]  ? process_one_work+0x1860/0x1860
    [ 3562.159429]  kthread+0x320/0x3e0
    [ 3562.161391]  ? kthread_park+0x120/0x120
    [ 3562.162744]  ret_from_fork+0x24/0x30
    ...
    [ 3562.187615] Freed by task 31682:
    [ 3562.188602]  save_stack+0x19/0x80
    [ 3562.189586]  __kasan_slab_free+0x11d/0x160
    [ 3562.190571]  kfree+0xf5/0x2f0
    [ 3562.191552]  ib_mad_port_close+0x200/0x380 [ib_core]
    [ 3562.192538]  ib_mad_remove_device+0xf0/0x230 [ib_core]
    [ 3562.193538]  remove_client_context+0xa6/0xe0 [ib_core]
    [ 3562.194514]  disable_device+0x14e/0x260 [ib_core]
    [ 3562.195488]  __ib_unregister_device+0x79/0x150 [ib_core]
    [ 3562.196462]  ib_unregister_device+0x21/0x30 [ib_core]
    [ 3562.197439]  mlx4_ib_remove+0x162/0x690 [mlx4_ib]
    [ 3562.198408]  mlx4_remove_device+0x204/0x2c0 [mlx4_core]
    [ 3562.199381]  mlx4_unregister_interface+0x49/0x1d0 [mlx4_core]
    [ 3562.200356]  mlx4_ib_cleanup+0xc/0x1d [mlx4_ib]
    [ 3562.201329]  __x64_sys_delete_module+0x2d2/0x400
    [ 3562.202288]  do_syscall_64+0x95/0x470
    [ 3562.203277]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
    
    The problem was that the MAD PD was deallocated before the MAD CQ.
    There was completion work pending for the CQ when the PD got deallocated.
    When the mad completion handling reached procedure
    ib_mad_post_receive_mads(), we got a use-after-free bug in the following
    line of code in that procedure:
       sg_list.lkey = qp_info->port_priv->pd->local_dma_lkey;
    (the pd pointer in the above line is no longer valid, because the
    pd has been deallocated).
    
    We fix this by allocating the PD before the CQ in procedure
    ib_mad_port_open(), and deallocating the PD after freeing the CQ
    in procedure ib_mad_port_close().
    
    Since the CQ completion work queue is flushed during ib_free_cq(),
    no completions will be pending for that CQ when the PD is later
    deallocated.
    
    Note that freeing the CQ before deallocating the PD is the practice
    in the ULPs.
    
    Fixes: 4be90bc6
    
     ("IB/mad: Remove ib_get_dma_mr calls")
    Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
    Link: https://lore.kernel.org/r/20190801121449.24973-1-leon@kernel.org
    
    
    Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
    770b7d96