1. 20 Aug, 2019 3 commits
  2. 12 Aug, 2019 1 commit
    • Rafael J. Wysocki's avatar
      nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled · 4eaefe8c
      Rafael J. Wysocki authored
      One of the modifications made by commit d916b1be ("nvme-pci: use
      host managed power state for suspend") was adding a pci_save_state()
      call to nvme_suspend() so as to instruct the PCI bus type to leave
      devices handled by the nvme driver in D0 during suspend-to-idle.
      That was done with the assumption that ASPM would transition the
      device's PCIe link into a low-power state when the device became
      inactive.  However, if ASPM is disabled for the device, its PCIe
      link will stay in L0 and in that case commit d916b1be is likely
      to cause the energy used by the system while suspended to increase.
      
      Namely, if the device in question works in accordance with the PCIe
      specification, putting it into D3hot causes its PCIe link to go to
      L1 or L2/L3 Ready, which is lower-power than L0.  Since the energy
      used by the system while suspended depends on the state of its PCIe
      link (as a general rule, the lower-power the state of the link, the
      less energy the system will use), putting the device into D3hot
      during suspend-to-idle should be more energy-efficient that leaving
      it in D0 with disabled ASPM.
      
      For this reason, avoid leaving NVMe devices with disabled ASPM in D0
      during suspend-to-idle.  Instead, shut them down entirely and let
      the PCI bus type put them into D3.
      
      Fixes: d916b1be ("nvme-pci: use host managed power state for suspend")
      Link: https://lore.kernel.org/linux-pm/2763495.NmdaWeg79L@kreacher/T/#tSigned-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      4eaefe8c
  3. 01 Aug, 2019 8 commits
    • Keith Busch's avatar
      nvme-pci: Fix async probe remove race · bd46a906
      Keith Busch authored
      Ensure the controller is not in the NEW state when nvme_probe() exits.
      This will always allow a subsequent nvme_remove() to set the state to
      DELETING, fixing a potential race between the initial asynchronous probe
      and device removal.
      Reported-by: default avatarLi Zhong <lizhongfs@gmail.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
      bd46a906
    • Sagi Grimberg's avatar
      nvme: fix controller removal race with scan work · 0157ec8d
      Sagi Grimberg authored
      With multipath enabled, nvme_scan_work() can read from the device
      (through nvme_mpath_add_disk()) and hang [1]. However, with fabrics,
      once ctrl->state is set to NVME_CTRL_DELETING, the reads will hang
      (see nvmf_check_ready()) and the mpath stack device make_request
      will block if head->list is not empty. However, when the head->list
      consistst of only DELETING/DEAD controllers, we should actually not
      block, but rather fail immediately.
      
      In addition, before we go ahead and remove the namespaces, make sure
      to clear the current path and kick the requeue list so that the
      request will fast fail upon requeuing.
      
      [1]:
      --
        INFO: task kworker/u4:3:166 blocked for more than 120 seconds.
              Not tainted 5.2.0-rc6-vmlocalyes-00005-g808c8c2dc0cf #316
        "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        kworker/u4:3    D    0   166      2 0x80004000
        Workqueue: nvme-wq nvme_scan_work
        Call Trace:
         __schedule+0x851/0x1400
         schedule+0x99/0x210
         io_schedule+0x21/0x70
         do_read_cache_page+0xa57/0x1330
         read_cache_page+0x4a/0x70
         read_dev_sector+0xbf/0x380
         amiga_partition+0xc4/0x1230
         check_partition+0x30f/0x630
         rescan_partitions+0x19a/0x980
         __blkdev_get+0x85a/0x12f0
         blkdev_get+0x2a5/0x790
         __device_add_disk+0xe25/0x1250
         device_add_disk+0x13/0x20
         nvme_mpath_set_live+0x172/0x2b0
         nvme_update_ns_ana_state+0x130/0x180
         nvme_set_ns_ana_state+0x9a/0xb0
         nvme_parse_ana_log+0x1c3/0x4a0
         nvme_mpath_add_disk+0x157/0x290
         nvme_validate_ns+0x1017/0x1bd0
         nvme_scan_work+0x44d/0x6a0
         process_one_work+0x7d7/0x1240
         worker_thread+0x8e/0xff0
         kthread+0x2c3/0x3b0
         ret_from_fork+0x35/0x40
      
         INFO: task kworker/u4:1:1034 blocked for more than 120 seconds.
              Not tainted 5.2.0-rc6-vmlocalyes-00005-g808c8c2dc0cf #316
        "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        kworker/u4:1    D    0  1034      2 0x80004000
        Workqueue: nvme-delete-wq nvme_delete_ctrl_work
        Call Trace:
         __schedule+0x851/0x1400
         schedule+0x99/0x210
         schedule_timeout+0x390/0x830
         wait_for_completion+0x1a7/0x310
         __flush_work+0x241/0x5d0
         flush_work+0x10/0x20
         nvme_remove_namespaces+0x85/0x3d0
         nvme_do_delete_ctrl+0xb4/0x1e0
         nvme_delete_ctrl_work+0x15/0x20
         process_one_work+0x7d7/0x1240
         worker_thread+0x8e/0xff0
         kthread+0x2c3/0x3b0
         ret_from_fork+0x35/0x40
      --
      Reported-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Tested-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Reviewed-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
      0157ec8d
    • Sagi Grimberg's avatar
      nvme-rdma: fix possible use-after-free in connect error flow · d94211b8
      Sagi Grimberg authored
      When start_queue fails, we need to make sure to drain the
      queue cq before freeing the rdma resources because we might
      still race with the completion path. Have start_queue() error
      path safely stop the queue.
      
      --
      [30371.808111] nvme nvme1: Failed reconnect attempt 11
      [30371.808113] nvme nvme1: Reconnecting in 10 seconds...
      [...]
      [30382.069315] nvme nvme1: creating 4 I/O queues.
      [30382.257058] nvme nvme1: Connect Invalid SQE Parameter, qid 4
      [30382.257061] nvme nvme1: failed to connect queue: 4 ret=386
      [30382.305001] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      [30382.305022] IP: qedr_poll_cq+0x8a3/0x1170 [qedr]
      [30382.305028] PGD 0 P4D 0
      [30382.305037] Oops: 0000 [#1] SMP PTI
      [...]
      [30382.305153] Call Trace:
      [30382.305166]  ? __switch_to_asm+0x34/0x70
      [30382.305187]  __ib_process_cq+0x56/0xd0 [ib_core]
      [30382.305201]  ib_poll_handler+0x26/0x70 [ib_core]
      [30382.305213]  irq_poll_softirq+0x88/0x110
      [30382.305223]  ? sort_range+0x20/0x20
      [30382.305232]  __do_softirq+0xde/0x2c6
      [30382.305241]  ? sort_range+0x20/0x20
      [30382.305249]  run_ksoftirqd+0x1c/0x60
      [30382.305258]  smpboot_thread_fn+0xef/0x160
      [30382.305265]  kthread+0x113/0x130
      [30382.305273]  ? kthread_create_worker_on_cpu+0x50/0x50
      [30382.305281]  ret_from_fork+0x35/0x40
      --
      Reported-by: default avatarNicolas Morey-Chaisemartin <NMoreyChaisemartin@suse.com>
      Reviewed-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
      d94211b8
    • Sagi Grimberg's avatar
      nvme: fix a possible deadlock when passthru commands sent to a multipath device · b9156dae
      Sagi Grimberg authored
      When the user issues a command with side effects, we will end up freezing
      the namespace request queue when updating disk info (and the same for
      the corresponding mpath disk node).
      
      However, we are not freezing the mpath node request queue,
      which means that mpath I/O can still come in and block on blk_queue_enter
      (called from nvme_ns_head_make_request -> direct_make_request).
      
      This is a deadlock, because blk_queue_enter will block until the inner
      namespace request queue is unfroze, but that process is blocked because
      the namespace revalidation is trying to update the mpath disk info
      and freeze its request queue (which will never complete because
      of the I/O that is blocked on blk_queue_enter).
      
      Fix this by freezing all the subsystem nsheads request queues before
      executing the passthru command. Given that these commands are infrequent
      we should not worry about this temporary I/O freeze to keep things sane.
      
      Here is the matching hang traces:
      --
      [ 374.465002] INFO: task systemd-udevd:17994 blocked for more than 122 seconds.
      [ 374.472975] Not tainted 5.2.0-rc3-mpdebug+ #42
      [ 374.478522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 374.487274] systemd-udevd D 0 17994 1 0x00000000
      [ 374.493407] Call Trace:
      [ 374.496145] __schedule+0x2ef/0x620
      [ 374.500047] schedule+0x38/0xa0
      [ 374.503569] blk_queue_enter+0x139/0x220
      [ 374.507959] ? remove_wait_queue+0x60/0x60
      [ 374.512540] direct_make_request+0x60/0x130
      [ 374.517219] nvme_ns_head_make_request+0x11d/0x420 [nvme_core]
      [ 374.523740] ? generic_make_request_checks+0x307/0x6f0
      [ 374.529484] generic_make_request+0x10d/0x2e0
      [ 374.534356] submit_bio+0x75/0x140
      [ 374.538163] ? guard_bio_eod+0x32/0xe0
      [ 374.542361] submit_bh_wbc+0x171/0x1b0
      [ 374.546553] block_read_full_page+0x1ed/0x330
      [ 374.551426] ? check_disk_change+0x70/0x70
      [ 374.556008] ? scan_shadow_nodes+0x30/0x30
      [ 374.560588] blkdev_readpage+0x18/0x20
      [ 374.564783] do_read_cache_page+0x301/0x860
      [ 374.569463] ? blkdev_writepages+0x10/0x10
      [ 374.574037] ? prep_new_page+0x88/0x130
      [ 374.578329] ? get_page_from_freelist+0xa2f/0x1280
      [ 374.583688] ? __alloc_pages_nodemask+0x179/0x320
      [ 374.588947] read_cache_page+0x12/0x20
      [ 374.593142] read_dev_sector+0x2d/0xd0
      [ 374.597337] read_lba+0x104/0x1f0
      [ 374.601046] find_valid_gpt+0xfa/0x720
      [ 374.605243] ? string_nocheck+0x58/0x70
      [ 374.609534] ? find_valid_gpt+0x720/0x720
      [ 374.614016] efi_partition+0x89/0x430
      [ 374.618113] ? string+0x48/0x60
      [ 374.621632] ? snprintf+0x49/0x70
      [ 374.625339] ? find_valid_gpt+0x720/0x720
      [ 374.629828] check_partition+0x116/0x210
      [ 374.634214] rescan_partitions+0xb6/0x360
      [ 374.638699] __blkdev_reread_part+0x64/0x70
      [ 374.643377] blkdev_reread_part+0x23/0x40
      [ 374.647860] blkdev_ioctl+0x48c/0x990
      [ 374.651956] block_ioctl+0x41/0x50
      [ 374.655766] do_vfs_ioctl+0xa7/0x600
      [ 374.659766] ? locks_lock_inode_wait+0xb1/0x150
      [ 374.664832] ksys_ioctl+0x67/0x90
      [ 374.668539] __x64_sys_ioctl+0x1a/0x20
      [ 374.672732] do_syscall_64+0x5a/0x1c0
      [ 374.676828] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [ 374.738474] INFO: task nvmeadm:49141 blocked for more than 123 seconds.
      [ 374.745871] Not tainted 5.2.0-rc3-mpdebug+ #42
      [ 374.751419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 374.760170] nvmeadm D 0 49141 36333 0x00004080
      [ 374.766301] Call Trace:
      [ 374.769038] __schedule+0x2ef/0x620
      [ 374.772939] schedule+0x38/0xa0
      [ 374.776452] blk_mq_freeze_queue_wait+0x59/0x100
      [ 374.781614] ? remove_wait_queue+0x60/0x60
      [ 374.786192] blk_mq_freeze_queue+0x1a/0x20
      [ 374.790773] nvme_update_disk_info.isra.57+0x5f/0x350 [nvme_core]
      [ 374.797582] ? nvme_identify_ns.isra.50+0x71/0xc0 [nvme_core]
      [ 374.804006] __nvme_revalidate_disk+0xe5/0x110 [nvme_core]
      [ 374.810139] nvme_revalidate_disk+0xa6/0x120 [nvme_core]
      [ 374.816078] ? nvme_submit_user_cmd+0x11e/0x320 [nvme_core]
      [ 374.822299] nvme_user_cmd+0x264/0x370 [nvme_core]
      [ 374.827661] nvme_dev_ioctl+0x112/0x1d0 [nvme_core]
      [ 374.833114] do_vfs_ioctl+0xa7/0x600
      [ 374.837117] ? __audit_syscall_entry+0xdd/0x130
      [ 374.842184] ksys_ioctl+0x67/0x90
      [ 374.845891] __x64_sys_ioctl+0x1a/0x20
      [ 374.850082] do_syscall_64+0x5a/0x1c0
      [ 374.854178] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      --
      Reported-by: default avatarJames Puthukattukaran <james.puthukattukaran@oracle.com>
      Tested-by: default avatarJames Puthukattukaran <james.puthukattukaran@oracle.com>
      Reviewed-by: default avatarKeith Busch <kbusch@kernel.org>
      Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
      b9156dae
    • Logan Gunthorpe's avatar
      nvme-core: Fix extra device_put() call on error path · 8c36e66f
      Logan Gunthorpe authored
      In the error path for nvme_init_subsystem(), nvme_put_subsystem()
      will call device_put(), but it will get called again after the
      mutex_unlock().
      
      The device_put() only needs to be called if device_add() fails.
      
      This bug caused a KASAN use-after-free error when adding and
      removing subsytems in a loop:
      
        BUG: KASAN: use-after-free in device_del+0x8d9/0x9a0
        Read of size 8 at addr ffff8883cdaf7120 by task multipathd/329
      
        CPU: 0 PID: 329 Comm: multipathd Not tainted 5.2.0-rc6-vmlocalyes-00019-g70a2b39005fd-dirty #314
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        Call Trace:
         dump_stack+0x7b/0xb5
         print_address_description+0x6f/0x280
         ? device_del+0x8d9/0x9a0
         __kasan_report+0x148/0x199
         ? device_del+0x8d9/0x9a0
         ? class_release+0x100/0x130
         ? device_del+0x8d9/0x9a0
         kasan_report+0x12/0x20
         __asan_report_load8_noabort+0x14/0x20
         device_del+0x8d9/0x9a0
         ? device_platform_notify+0x70/0x70
         nvme_destroy_subsystem+0xf9/0x150
         nvme_free_ctrl+0x280/0x3a0
         device_release+0x72/0x1d0
         kobject_put+0x144/0x410
         put_device+0x13/0x20
         nvme_free_ns+0xc4/0x100
         nvme_release+0xb3/0xe0
         __blkdev_put+0x549/0x6e0
         ? kasan_check_write+0x14/0x20
         ? bd_set_size+0xb0/0xb0
         ? kasan_check_write+0x14/0x20
         ? mutex_lock+0x8f/0xe0
         ? __mutex_lock_slowpath+0x20/0x20
         ? locks_remove_file+0x239/0x370
         blkdev_put+0x72/0x2c0
         blkdev_close+0x8d/0xd0
         __fput+0x256/0x770
         ? _raw_read_lock_irq+0x40/0x40
         ____fput+0xe/0x10
         task_work_run+0x10c/0x180
         ? filp_close+0xf7/0x140
         exit_to_usermode_loop+0x151/0x170
         do_syscall_64+0x240/0x2e0
         ? prepare_exit_to_usermode+0xd5/0x190
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f5a79af05d7
        Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f c3 66 0f 1f 44 00 00 53 89 fb 48 83 ec 10 e8 c4 fb ff ff 89 df 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 89 d7 89 44 24 0c e8 06 fc ff ff 8b 44 24
        RSP: 002b:00007f5a7799c810 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
        RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007f5a79af05d7
        RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000008
        RBP: 00007f5a58000f98 R08: 0000000000000002 R09: 00007f5a7935ee80
        R10: 0000000000000000 R11: 0000000000000293 R12: 000055e432447240
        R13: 0000000000000000 R14: 0000000000000001 R15: 000055e4324a9cf0
      
        Allocated by task 1236:
         save_stack+0x21/0x80
         __kasan_kmalloc.constprop.6+0xab/0xe0
         kasan_kmalloc+0x9/0x10
         kmem_cache_alloc_trace+0x102/0x210
         nvme_init_identify+0x13c3/0x3820
         nvme_loop_configure_admin_queue+0x4fa/0x5e0
         nvme_loop_create_ctrl+0x469/0xf40
         nvmf_dev_write+0x19a3/0x21ab
         __vfs_write+0x66/0x120
         vfs_write+0x154/0x490
         ksys_write+0x104/0x240
         __x64_sys_write+0x73/0xb0
         do_syscall_64+0xa5/0x2e0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        Freed by task 329:
         save_stack+0x21/0x80
         __kasan_slab_free+0x129/0x190
         kasan_slab_free+0xe/0x10
         kfree+0xa7/0x200
         nvme_release_subsystem+0x49/0x60
         device_release+0x72/0x1d0
         kobject_put+0x144/0x410
         put_device+0x13/0x20
         klist_class_dev_put+0x31/0x40
         klist_put+0x8f/0xf0
         klist_del+0xe/0x10
         device_del+0x3a7/0x9a0
         nvme_destroy_subsystem+0xf9/0x150
         nvme_free_ctrl+0x280/0x3a0
         device_release+0x72/0x1d0
         kobject_put+0x144/0x410
         put_device+0x13/0x20
         nvme_free_ns+0xc4/0x100
         nvme_release+0xb3/0xe0
         __blkdev_put+0x549/0x6e0
         blkdev_put+0x72/0x2c0
         blkdev_close+0x8d/0xd0
         __fput+0x256/0x770
         ____fput+0xe/0x10
         task_work_run+0x10c/0x180
         exit_to_usermode_loop+0x151/0x170
         do_syscall_64+0x240/0x2e0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 32fd90c4 ("nvme: change locking for the per-subsystem controller list")
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Reviewed-by : Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
      8c36e66f
    • Logan Gunthorpe's avatar
      nvmet-file: fix nvmet_file_flush() always returning an error · cfc1a1af
      Logan Gunthorpe authored
      Presently, nvmet_file_flush() always returns a call to
      errno_to_nvme_status() but that helper doesn't take into account the
      case when errno=0. So nvmet_file_flush() always returns an error code.
      
      All other callers of errno_to_nvme_status() check for success before
      calling it.
      
      To fix this, ensure errno_to_nvme_status() returns success if the
      errno is zero. This should prevent future mistakes like this from
      happening.
      
      Fixes: c6aa3542 ("nvmet: add error log support for file backend")
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: default avatarChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
      cfc1a1af
    • Logan Gunthorpe's avatar
      nvmet-loop: Flush nvme_delete_wq when removing the port · 86b9a63e
      Logan Gunthorpe authored
      After calling nvme_loop_delete_ctrl(), the controllers will not
      yet be deleted because nvme_delete_ctrl() only schedules work
      to do the delete.
      
      This means a race can occur if a port is removed but there
      are still active controllers trying to access that memory.
      
      To fix this, flush the nvme_delete_wq before returning from
      nvme_loop_remove_port() so that any controllers that might
      be in the process of being deleted won't access a freed port.
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Reviewed-by : Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
      86b9a63e
    • Logan Gunthorpe's avatar
      nvmet: Fix use-after-free bug when a port is removed · 3aed8673
      Logan Gunthorpe authored
      When a port is removed through configfs, any connected controllers
      are still active and can still send commands. This causes a
      use-after-free bug which is detected by KASAN for any admin command
      that dereferences req->port (like in nvmet_execute_identify_ctrl).
      
      To fix this, disconnect all active controllers when a subsystem is
      removed from a port. This ensures there are no active controllers
      when the port is eventually removed.
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Reviewed-by : Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
      3aed8673
  4. 30 Jul, 2019 1 commit
  5. 23 Jul, 2019 4 commits
    • yangerkun's avatar
      Revert "nvme-pci: don't create a read hctx mapping without read queues" · 8fe34be1
      yangerkun authored
      This reverts commit 0298d543.
      
      With this patch, set 'poll_queues > hard queues' will lead to 'nr_read_queues = 0'
      in nvme_calc_irq_sets. Then poll_queues setting can fail since dev->tagset.nr_maps
      equals to 2 and nvme_pci_map_queues will not do map for poll queues.
      Signed-off-by: default avataryangerkun <yangerkun@huawei.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      8fe34be1
    • Marta Rybczynska's avatar
      nvme: fix multipath crash when ANA is deactivated · 66b20ac0
      Marta Rybczynska authored
      Fix a crash with multipath activated. It happends when ANA log
      page is larger than MDTS and because of that ANA is disabled.
      The driver then tries to access unallocated buffer when connecting
      to a nvme target. The signature is as follows:
      
      [  300.433586] nvme nvme0: ANA log page size (8208) larger than MDTS (8192).
      [  300.435387] nvme nvme0: disabling ANA support.
      [  300.437835] nvme nvme0: creating 4 I/O queues.
      [  300.459132] nvme nvme0: new ctrl: NQN "nqn.0.0.0", addr 10.91.0.1:8009
      [  300.464609] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [  300.466342] #PF error: [normal kernel read fault]
      [  300.467385] PGD 0 P4D 0
      [  300.467987] Oops: 0000 [#1] SMP PTI
      [  300.468787] CPU: 3 PID: 50 Comm: kworker/u8:1 Not tainted 5.0.20kalray+ #4
      [  300.470264] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  300.471532] Workqueue: nvme-wq nvme_scan_work [nvme_core]
      [  300.472724] RIP: 0010:nvme_parse_ana_log+0x21/0x140 [nvme_core]
      [  300.474038] Code: 45 01 d2 d8 48 98 c3 66 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b af 20 0a 00 00 48 89 34 24 <66> 83 7d 08 00 0f 84 c6 00 00 00 44 8b 7d 14 49 89 d5 8b 55 10 48
      [  300.477374] RSP: 0018:ffffa50e80fd7cb8 EFLAGS: 00010296
      [  300.478334] RAX: 0000000000000001 RBX: ffff9130f1872258 RCX: 0000000000000000
      [  300.479784] RDX: ffffffffc06c4c30 RSI: ffff9130edad4280 RDI: ffff9130f1872258
      [  300.481488] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
      [  300.483203] R10: 0000000000000220 R11: 0000000000000040 R12: ffff9130f18722c0
      [  300.484928] R13: ffff9130f18722d0 R14: ffff9130edad4280 R15: ffff9130f18722c0
      [  300.486626] FS:  0000000000000000(0000) GS:ffff9130f7b80000(0000) knlGS:0000000000000000
      [  300.488538] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  300.489907] CR2: 0000000000000008 CR3: 00000002365e6000 CR4: 00000000000006e0
      [  300.491612] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  300.493303] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  300.494991] Call Trace:
      [  300.495645]  nvme_mpath_add_disk+0x5c/0xb0 [nvme_core]
      [  300.496880]  nvme_validate_ns+0x2ef/0x550 [nvme_core]
      [  300.498105]  ? nvme_identify_ctrl.isra.45+0x6a/0xb0 [nvme_core]
      [  300.499539]  nvme_scan_work+0x2b4/0x370 [nvme_core]
      [  300.500717]  ? __switch_to_asm+0x35/0x70
      [  300.501663]  process_one_work+0x171/0x380
      [  300.502340]  worker_thread+0x49/0x3f0
      [  300.503079]  kthread+0xf8/0x130
      [  300.503795]  ? max_active_store+0x80/0x80
      [  300.504690]  ? kthread_bind+0x10/0x10
      [  300.505502]  ret_from_fork+0x35/0x40
      [  300.506280] Modules linked in: nvme_tcp nvme_rdma rdma_cm iw_cm ib_cm ib_core nvme_fabrics nvme_core xt_physdev ip6table_raw ip6table_mangle ip6table_filter ip6_tables xt_comment iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle iptable_filter veth ebtable_filter ebtable_nat ebtables iptable_raw vxlan ip6_udp_tunnel udp_tunnel sunrpc joydev pcspkr virtio_balloon br_netfilter bridge stp llc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_console net_failover virtio_blk failover ata_piix serio_raw libata virtio_pci virtio_ring virtio
      [  300.514984] CR2: 0000000000000008
      [  300.515569] ---[ end trace faa2eefad7e7f218 ]---
      [  300.516354] RIP: 0010:nvme_parse_ana_log+0x21/0x140 [nvme_core]
      [  300.517330] Code: 45 01 d2 d8 48 98 c3 66 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b af 20 0a 00 00 48 89 34 24 <66> 83 7d 08 00 0f 84 c6 00 00 00 44 8b 7d 14 49 89 d5 8b 55 10 48
      [  300.520353] RSP: 0018:ffffa50e80fd7cb8 EFLAGS: 00010296
      [  300.521229] RAX: 0000000000000001 RBX: ffff9130f1872258 RCX: 0000000000000000
      [  300.522399] RDX: ffffffffc06c4c30 RSI: ffff9130edad4280 RDI: ffff9130f1872258
      [  300.523560] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
      [  300.524734] R10: 0000000000000220 R11: 0000000000000040 R12: ffff9130f18722c0
      [  300.525915] R13: ffff9130f18722d0 R14: ffff9130edad4280 R15: ffff9130f18722c0
      [  300.527084] FS:  0000000000000000(0000) GS:ffff9130f7b80000(0000) knlGS:0000000000000000
      [  300.528396] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  300.529440] CR2: 0000000000000008 CR3: 00000002365e6000 CR4: 00000000000006e0
      [  300.530739] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  300.531989] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  300.533264] Kernel panic - not syncing: Fatal exception
      [  300.534338] Kernel Offset: 0x17c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [  300.536227] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      Condition check refactoring from Christoph Hellwig.
      Signed-off-by: default avatarMarta Rybczynska <marta.rybczynska@kalray.eu>
      Tested-by: default avatarJean-Baptiste Riaux <jbriaux@kalray.eu>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      66b20ac0
    • Logan Gunthorpe's avatar
      nvme: fix memory leak caused by incorrect subsystem free · e654dfd3
      Logan Gunthorpe authored
      When freeing the subsystem after finding another match with
      __nvme_find_get_subsystem(), use put_device() instead of
      __nvme_release_subsystem() which calls kfree() directly.
      
      Per the documentation, put_device() should always be used
      after device_initialization() is called. Otherwise, leaks
      like the one below which was detected by kmemleak may occur.
      
      Once the call of __nvme_release_subsystem() is removed it no
      longer makes sense to keep the helper, so fold it back
      into nvme_release_subsystem().
      
      unreferenced object 0xffff8883d12bfbc0 (size 16):
        comm "nvme", pid 2635, jiffies 4294933602 (age 739.952s)
        hex dump (first 16 bytes):
          6e 76 6d 65 2d 73 75 62 73 79 73 32 00 88 ff ff  nvme-subsys2....
        backtrace:
          [<000000007d8fc208>] __kmalloc_track_caller+0x16d/0x2a0
          [<0000000081169e5f>] kvasprintf+0xad/0x130
          [<0000000025626f25>] kvasprintf_const+0x47/0x120
          [<00000000fa66ad36>] kobject_set_name_vargs+0x44/0x120
          [<000000004881f8b3>] dev_set_name+0x98/0xc0
          [<000000007124dae3>] nvme_init_identify+0x1995/0x38e0
          [<000000009315020a>] nvme_loop_configure_admin_queue+0x4fa/0x5e0
          [<000000001a63e766>] nvme_loop_create_ctrl+0x489/0xf80
          [<00000000a46ecc23>] nvmf_dev_write+0x1a12/0x2220
          [<000000002259b3d5>] __vfs_write+0x66/0x120
          [<000000002f6df81e>] vfs_write+0x154/0x490
          [<000000007e8cfc19>] ksys_write+0x10a/0x240
          [<00000000ff5c7b85>] __x64_sys_write+0x73/0xb0
          [<00000000fee6d692>] do_syscall_64+0xaa/0x470
          [<00000000997e1ede>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: ab9e00cc ("nvme: track subsystems")
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      e654dfd3
    • Misha Nasledov's avatar
      nvme: ignore subnqn for ADATA SX6000LNP · 08b903b5
      Misha Nasledov authored
      The ADATA SX6000LNP NVMe SSDs have the same subnqn and, due to this, a
      system with more than one of these SSDs will only have one usable.
      
      [ 0.942706] nvme nvme1: ignoring ctrl due to duplicate subnqn (nqn.2018-05.com.example:nvme:nvm-subsystem-OUI00E04C).
      [ 0.943017] nvme nvme1: Removing after probe failure status: -22
      
      02:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01)
      71:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01)
      
      There are no firmware updates available from the vendor, unfortunately.
      Applying the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk for these SSDs resolves
      the issue, and they all work after this patch:
      
      /dev/nvme0n1     2J1120050420         ADATA SX6000LNP [...]
      /dev/nvme1n1     2J1120050540         ADATA SX6000LNP [...]
      Signed-off-by: default avatarMisha Nasledov <misha@nasledov.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      08b903b5
  6. 11 Jul, 2019 1 commit
    • Minwoo Im's avatar
      nvme: fix NULL deref for fabrics options · 7d30c81b
      Minwoo Im authored
      git://git.infradead.org/nvme.git nvme-5.3 branch now causes the
      following NULL deref oops.  Check the ctrl->opts first before the deref.
      
      [   16.337581] BUG: kernel NULL pointer dereference, address: 0000000000000056
      [   16.338551] #PF: supervisor read access in kernel mode
      [   16.338551] #PF: error_code(0x0000) - not-present page
      [   16.338551] PGD 0 P4D 0
      [   16.338551] Oops: 0000 [#1] SMP PTI
      [   16.338551] CPU: 2 PID: 1035 Comm: kworker/u16:5 Not tainted 5.2.0-rc6+ #1
      [   16.338551] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
      [   16.338551] Workqueue: nvme-wq nvme_scan_work [nvme_core]
      [   16.338551] RIP: 0010:nvme_validate_ns+0xc9/0x7e0 [nvme_core]
      [   16.338551] Code: c0 49 89 c5 0f 84 00 07 00 00 48 8b 7b 58 e8 be 48 39 c1 48 3d 00 f0 ff ff 49 89 45 18 0f 87 a4 06 00 00 48 8b 93 70 0a 00 00 <80> 7a 56 00 74 0c 48 8b 40 68 83 48 3c 08 49 8b 45 18 48 89 c6 bf
      [   16.338551] RSP: 0018:ffffc900024c7d10 EFLAGS: 00010283
      [   16.338551] RAX: ffff888135a30720 RBX: ffff88813a4fd1f8 RCX: 0000000000000007
      [   16.338551] RDX: 0000000000000000 RSI: ffffffff8256dd38 RDI: ffff888135a30720
      [   16.338551] RBP: 0000000000000001 R08: 0000000000000007 R09: ffff88813aa6a840
      [   16.338551] R10: 0000000000000001 R11: 000000000002d060 R12: ffff88813a4fd1f8
      [   16.338551] R13: ffff88813a77f800 R14: ffff88813aa35180 R15: 0000000000000001
      [   16.338551] FS:  0000000000000000(0000) GS:ffff88813ba80000(0000) knlGS:0000000000000000
      [   16.338551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   16.338551] CR2: 0000000000000056 CR3: 000000000240a002 CR4: 0000000000360ee0
      [   16.338551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   16.338551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   16.338551] Call Trace:
      [   16.338551]  nvme_scan_work+0x2c0/0x340 [nvme_core]
      [   16.338551]  ? __switch_to_asm+0x40/0x70
      [   16.338551]  ? _raw_spin_unlock_irqrestore+0x18/0x30
      [   16.338551]  ? try_to_wake_up+0x408/0x450
      [   16.338551]  process_one_work+0x20b/0x3e0
      [   16.338551]  worker_thread+0x1f9/0x3d0
      [   16.338551]  ? cancel_delayed_work+0xa0/0xa0
      [   16.338551]  kthread+0x117/0x120
      [   16.338551]  ? kthread_stop+0xf0/0xf0
      [   16.338551]  ret_from_fork+0x3a/0x50
      [   16.338551] Modules linked in: nvme nvme_core
      [   16.338551] CR2: 0000000000000056
      [   16.338551] ---[ end trace b9bf761a93e62d84 ]---
      [   16.338551] RIP: 0010:nvme_validate_ns+0xc9/0x7e0 [nvme_core]
      [   16.338551] Code: c0 49 89 c5 0f 84 00 07 00 00 48 8b 7b 58 e8 be 48 39 c1 48 3d 00 f0 ff ff 49 89 45 18 0f 87 a4 06 00 00 48 8b 93 70 0a 00 00 <80> 7a 56 00 74 0c 48 8b 40 68 83 48 3c 08 49 8b 45 18 48 89 c6 bf
      [   16.338551] RSP: 0018:ffffc900024c7d10 EFLAGS: 00010283
      [   16.338551] RAX: ffff888135a30720 RBX: ffff88813a4fd1f8 RCX: 0000000000000007
      [   16.338551] RDX: 0000000000000000 RSI: ffffffff8256dd38 RDI: ffff888135a30720
      [   16.338551] RBP: 0000000000000001 R08: 0000000000000007 R09: ffff88813aa6a840
      [   16.338551] R10: 0000000000000001 R11: 000000000002d060 R12: ffff88813a4fd1f8
      [   16.338551] R13: ffff88813a77f800 R14: ffff88813aa35180 R15: 0000000000000001
      [   16.338551] FS:  0000000000000000(0000) GS:ffff88813ba80000(0000) knlGS:0000000000000000
      [   16.338551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   16.338551] CR2: 0000000000000056 CR3: 000000000240a002 CR4: 0000000000360ee0
      [   16.338551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   16.338551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 958f2a0f ("nvme-tcp: set the STABLE_WRITES flag when data digests are enabled")
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Keith Busch <kbusch@kernel.org>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarMinwoo Im <minwoo.im.dev@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      7d30c81b
  7. 10 Jul, 2019 1 commit
  8. 09 Jul, 2019 18 commits
  9. 24 Jun, 2019 1 commit
  10. 21 Jun, 2019 2 commits