Skip to content
  • Håkon Bugge's avatar
    rds: ib: Fix NULL pointer dereference in debug code · 1cb483a5
    Håkon Bugge authored
    rds_ib_recv_refill() is a function that refills an IB receive
    queue. It can be called from both the CQE handler (tasklet) and a
    worker thread.
    
    Just after the call to ib_post_recv(), a debug message is printed with
    rdsdebug():
    
                ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
                rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
                         recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
                         (long) ib_sg_dma_address(
                                ic->i_cm_id->device,
                                &recv->r_frag->f_sg),
                        ret);
    
    Now consider an invocation of rds_ib_recv_refill() from the worker
    thread, which is preemptible. Further, assume that the worker thread
    is preempted between the ib_post_recv() and rdsdebug() statements.
    
    Then, if the preemption is due to a receive CQE event, the
    rds_ib_recv_cqe_handler() will be invoked. This function processes
    receive completions, including freeing up data structures, such as the
    recv->r_frag.
    
    In this scenario, rds_ib_recv_cqe_handler() will process the receive
    WR posted above. That implies, that the recv->r_frag has been freed
    before the above rdsdebug() statement has been executed. When it is
    later executed, we will have a NULL pointer dereference:
    
    [ 4088.068008] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
    [ 4088.076754] IP: rds_ib_recv_refill+0x87/0x620 [rds_rdma]
    [ 4088.082686] PGD 0 P4D 0
    [ 4088.085515] Oops: 0000 [#1] SMP
    [ 4088.089015] Modules linked in: rds_rdma(OE) rds(OE) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) mlx4_ib(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_core(E) binfmt_misc(E) sb_edac(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) pcbc(E) aesni_intel(E) crypto_simd(E) iTCO_wdt(E) glue_helper(E) iTCO_vendor_support(E) sg(E) cryptd(E) pcspkr(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) shpchp(E) ioatdma(E) i2c_i801(E) wmi(E) lpc_ich(E) mei_me(E) mei(E) mfd_core(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ip_tables(E) ext4(E) mbcache(E) jbd2(E) fscrypto(E) mgag200(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E)
    [ 4088.168486]  fb_sys_fops(E) ahci(E) ixgbe(E) libahci(E) ttm(E) mdio(E) ptp(E) pps_core(E) drm(E) sd_mod(E) libata(E) crc32c_intel(E) mlx4_core(E) i2c_core(E) dca(E) megaraid_sas(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [last unloaded: rds]
    [ 4088.193442] CPU: 20 PID: 1244 Comm: kworker/20:2 Tainted: G           OE   4.14.0-rc7.master.20171105.ol7.x86_64 #1
    
    
    [ 4088.205097] Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO TRAY,2U, BIOS 31110000 03/03/2017
    [ 4088.216074] Workqueue: ib_cm cm_work_handler [ib_cm]
    [ 4088.221614] task: ffff885fa11d0000 task.stack: ffffc9000e598000
    [ 4088.228224] RIP: 0010:rds_ib_recv_refill+0x87/0x620 [rds_rdma]
    [ 4088.234736] RSP: 0018:ffffc9000e59bb68 EFLAGS: 00010286
    [ 4088.240568] RAX: 0000000000000000 RBX: ffffc9002115d050 RCX: ffffc9002115d050
    [ 4088.248535] RDX: ffffffffa0521380 RSI: ffffffffa0522158 RDI: ffffffffa0525580
    [ 4088.256498] RBP: ffffc9000e59bbf8 R08: 0000000000000005 R09: 0000000000000000
    [ 4088.264465] R10: 0000000000000339 R11: 0000000000000001 R12: 0000000000000000
    [ 4088.272433] R13: ffff885f8c9d8000 R14: ffffffff81a0a060 R15: ffff884676268000
    [ 4088.280397] FS:  0000000000000000(0000) GS:ffff885fbec80000(0000) knlGS:0000000000000000
    [ 4088.289434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 4088.295846] CR2: 0000000000000020 CR3: 0000000001e09005 CR4: 00000000001606e0
    [ 4088.303816] Call Trace:
    [ 4088.306557]  rds_ib_cm_connect_complete+0xe0/0x220 [rds_rdma]
    [ 4088.312982]  ? __dynamic_pr_debug+0x8c/0xb0
    [ 4088.317664]  ? __queue_work+0x142/0x3c0
    [ 4088.321944]  rds_rdma_cm_event_handler+0x19e/0x250 [rds_rdma]
    [ 4088.328370]  cma_ib_handler+0xcd/0x280 [rdma_cm]
    [ 4088.333522]  cm_process_work+0x25/0x120 [ib_cm]
    [ 4088.338580]  cm_work_handler+0xd6b/0x17aa [ib_cm]
    [ 4088.343832]  process_one_work+0x149/0x360
    [ 4088.348307]  worker_thread+0x4d/0x3e0
    [ 4088.352397]  kthread+0x109/0x140
    [ 4088.355996]  ? rescuer_thread+0x380/0x380
    [ 4088.360467]  ? kthread_park+0x60/0x60
    [ 4088.364563]  ret_from_fork+0x25/0x30
    [ 4088.368548] Code: 48 89 45 90 48 89 45 98 eb 4d 0f 1f 44 00 00 48 8b 43 08 48 89 d9 48 c7 c2 80 13 52 a0 48 c7 c6 58 21 52 a0 48 c7 c7 80 55 52 a0 <4c> 8b 48 20 44 89 64 24 08 48 8b 40 30 49 83 e1 fc 48 89 04 24
    [ 4088.389612] RIP: rds_ib_recv_refill+0x87/0x620 [rds_rdma] RSP: ffffc9000e59bb68
    [ 4088.397772] CR2: 0000000000000020
    [ 4088.401505] ---[ end trace fe922e6ccf004431 ]---
    
    This bug was provoked by compiling rds out-of-tree with
    EXTRA_CFLAGS="-DRDS_DEBUG -DDEBUG" and inserting an artificial delay
    between the rdsdebug() and ib_ib_port_recv() statements:
    
       	       /* XXX when can this fail? */
    	       ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
    +		if (can_wait)
    +			usleep_range(1000, 5000);
    	       rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
    			recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
    			(long) ib_sg_dma_address(
    
    The fix is simply to move the rdsdebug() statement up before the
    ib_post_recv() and remove the printing of ret, which is taken care of
    anyway by the non-debug code.
    
    Signed-off-by: default avatarHåkon Bugge <haakon.bugge@oracle.com>
    Reviewed-by: default avatarKnut Omang <knut.omang@oracle.com>
    Reviewed-by: default avatarWei Lin Guay <wei.lin.guay@oracle.com>
    Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    1cb483a5