1. 08 Mar, 2018 1 commit
• nvme_fc: rework sqsize handling · d157e534
  James Smart authored
      Corrected four outstanding issues in the transport around sqsize.
      
1: The Create Connection LS is sending the 1's-based sqsize; it should be
sending the 0's-based value.

2: Allocation of the hw queue is using the 0's-based size; it should be
using the 1's-based value.

3: Normalization of ctrl.sqsize by MQES is using MQES+1 (the 1's-based
value); it should use MQES (the 0's-based value).

4: A clause is missing to ensure queue_count is not larger than
ctrl->sqsize.
      
Corrected by:
Cleaning up the routines that pass the queue size around. The queue size
is the actual count (1's-based) value, determined from ctrl->sqsize + 1.

Routines that send a 0's-based value adapt it from the queue size.

Setting ctrl->sqsize properly for MQES.

Adding a clause to ensure queue_count is not larger than ctrl->sqsize + 1.
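
As an illustration of the 0's-based vs 1's-based bookkeeping, a minimal C
sketch; the helper names here are hypothetical, but the arithmetic and the
clamp against the 0's-based MQES mirror the fix.

    #include "nvme.h"	/* drivers/nvme/host private header */

    /* hypothetical helpers; only the arithmetic mirrors the fix */
    static inline u32 nvme_fc_qsize(struct nvme_ctrl *ctrl)
    {
    	/* queue size is the actual (1's-based) element count */
    	return ctrl->sqsize + 1;
    }

    static inline u32 nvme_fc_ls_sqsize(struct nvme_ctrl *ctrl)
    {
    	/* the Create Connection LS carries the 0's-based value */
    	return nvme_fc_qsize(ctrl) - 1;
    }

    static void nvme_fc_clamp_sqsize(struct nvme_ctrl *ctrl)
    {
    	/* MQES is 0's-based too, so clamp sqsize against it directly */
    	ctrl->sqsize = min_t(u16, ctrl->sqsize, NVME_CAP_MQES(ctrl->cap));
    }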
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
  2. 11 Feb, 2018 2 commits
• nvme_fc: cleanup io completion · c3aedd22
  James Smart authored
There was some old code that dealt with complete_rq being called
prior to the lldd returning the io completion. This is garbage code.
The complete_rq routine was being called after eh_timeouts were
invoked, and that was due to eh_timeouts not being handled properly.
The timeouts were fixed in prior patches so that, in general, a
timeout initiates an abort and restarts the reset timer, as the
abort operation will take care of completing things. With the
reset timer restarted, the erroneous complete_rq calls were eliminated.
      
      So remove the work that was synchronizing complete_rq with io
      completion.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
• nvme_fc: correct abort race condition on resets · 3efd6e8e
  James Smart authored
      During reset handling, there is live io completing while the reset
      is taking place. The reset path attempts to abort all outstanding io,
      counting the number of ios that were reset. It then waits for those
      ios to be reclaimed from the lldd before continuing.
      
The transport's logic on io state and flag setting was poor, allowing
ios to complete simultaneously with the abort request. The completed
ios were counted, but as the completion had already occurred, the
completion never reduced the count. As the count never reaches zero,
the reset/delete never completes.
      
Tighten it up by unconditionally changing the op state to completed
when the io done handler is called.  The reset/abort path now changes
the op state to aborted, but the abort only continues if the op
state was previously live. If already complete, the abort is backed out.
      Thus proper counting of io aborts and their completions is working
      again.
      
      Also removed the TERMIO state on the op as it's redundant with the
      op's aborted state.
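
The handshake reads roughly as below; a hedged sketch using the kernel's
atomic_xchg(), with illustrative state and structure names rather than the
driver's exact ones.

    #include <linux/atomic.h>

    enum fcop_state { FCOP_LIVE, FCOP_ABORTED, FCOP_COMPLETE };

    struct fc_op {
    	atomic_t state;			/* illustrative op context */
    };

    /* io done handler: unconditionally mark the op complete */
    static void fc_op_done(struct fc_op *op)
    {
    	atomic_xchg(&op->state, FCOP_COMPLETE);
    	/* ... complete the blk request ... */
    }

    /* abort path: only count the abort if the op was still live */
    static bool fc_op_abort(struct fc_op *op)
    {
    	int prior = atomic_xchg(&op->state, FCOP_ABORTED);

    	if (prior != FCOP_LIVE) {
    		/* completion already ran: back the abort out */
    		atomic_set(&op->state, prior);
    		return false;
    	}
    	return true;	/* caller counts this abort and awaits its completion */
    }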
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
  3. 08 Feb, 2018 1 commit
  4. 31 Jan, 2018 1 commit
• blk-mq: introduce BLK_STS_DEV_RESOURCE · 86ff7c2a
  Ming Lei authored
This status is returned from the driver to the block layer if a
device-related resource is unavailable, but the driver can guarantee
that IO dispatch will be triggered in the future when the resource
becomes available.

Convert some drivers to return BLK_STS_DEV_RESOURCE.  Also, if the
driver returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun the
queue after a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls.
BLK_MQ_DELAY_QUEUE is 3 ms because both scsi-mq and nvmefc use that
magic value.
      
      If a driver can make sure there is in-flight IO, it is safe to return
      BLK_STS_DEV_RESOURCE because:
      
1) If all in-flight IOs complete before examining SCHED_RESTART in
blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so the queue
is run immediately in this case by blk_mq_dispatch_rq_list();

2) If there is any in-flight IO after/when examining SCHED_RESTART
in blk_mq_dispatch_rq_list():
- if SCHED_RESTART isn't set, the queue is run immediately as handled in 1)
- otherwise, this request will be dispatched after any in-flight IO is
  completed via blk_mq_sched_restart()
      
3) If SCHED_RESTART is set concurrently in context because of
BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
cases and make sure IO hang can be avoided.
      
One invariant is that the queue will be rerun if SCHED_RESTART is set.
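
From the driver side, the choice between the two statuses reads roughly as
below; a hedged sketch of a blk-mq .queue_rq implementation, where the
my_* helpers are illustrative, not any real driver's code.

    #include <linux/blk-mq.h>

    static bool my_device_resource_available(struct request_queue *q);	/* illustrative */
    static bool my_device_has_inflight_io(struct request_queue *q);	/* illustrative */

    static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
    				const struct blk_mq_queue_data *bd)
    {
    	if (!my_device_resource_available(hctx->queue)) {
    		if (my_device_has_inflight_io(hctx->queue))
    			/*
    			 * An in-flight completion is guaranteed to rerun
    			 * the queue, so no delayed rerun is needed.
    			 */
    			return BLK_STS_DEV_RESOURCE;
    		/*
    		 * No such guarantee: the block layer reruns the queue
    		 * after BLK_MQ_DELAY_QUEUE (3 ms) when SCHED_RESTART
    		 * is set.
    		 */
    		return BLK_STS_RESOURCE;
    	}
    	/* ... issue bd->rq to hardware ... */
    	return BLK_STS_OK;
    }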
Suggested-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 17 Jan, 2018 2 commits
• nvme-fc: correct hang in nvme_ns_remove() · 0fd997d3
  James Smart authored
      When connectivity is lost to a device, the association is terminated
      and the blk-mq queues are quiesced/stopped. When connectivity is
      re-established, they are resumed.
      
If connectivity is lost for a sufficient amount of time that the
controller is then deleted, the delete path starts tearing down queues
and eventually calls nvme_ns_remove(). It appears that pending
commands may cause blk_cleanup_queue() to never complete and the
teardown stalls.
      
Corrected by starting the ns queues after transitioning to a DELETING
state, allowing pending commands to be flushed with io failures. Thus
the delete path is clear when reached.
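
The ordering the fix implies reads roughly as below; a minimal sketch
assuming the nvme core helpers of this era (nvme_change_ctrl_state(),
nvme_start_queues()), with the surrounding fc delete path elided and the
wrapper name illustrative.

    #include "nvme.h"	/* drivers/nvme/host private header */

    static void nvme_fc_flush_ns_queues(struct nvme_ctrl *ctrl)	/* illustrative */
    {
    	/* transition first so flushed commands see a deleting controller */
    	if (nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING))
    		/* restarting the ns queues lets pending io fail fast */
    		nvme_start_queues(ctrl);
    }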
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
• nvme-fc: fix rogue admin cmds stalling teardown · d625d05e
  James Smart authored
      When connectivity is lost to a device, the association is terminated
      and the blk-mq queues are quiesced/stopped. When connectivity is
      re-established, they are resumed.
      
If an admin command is received while connectivity is lost, the ioctl
queues the command on the admin_q and the command stalls (the thread
issuing the ioctl hangs/waits). If connectivity is lost long
enough that the controller is then deleted, the delete code
makes its calls to initiate the delete, which then expects the core
layer to call the transport when all references are removed and the
controller can be freed.  Unfortunately, nothing in this path dequeues
the admin command, so a reference sits outstanding and things stop,
hanging the delete indefinitely.
      
Corrected by unquiescing the admin queue in the delete association. This
      means any admin command (which should only be from an ioctl) issued
      after connectivity is lost will detect the controller is in a
      reconnecting state and will (fast) fail the command. Thus, a pending
      reference can no longer be created.  Once connectivity is re-established,
      a new ioctl/admin command would see proper device state and function again.
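
A minimal sketch of the unquiesce, assuming the blk-mq helper of this era;
where it sits in the fc delete-association path, and the wrapper name, are
illustrative.

    #include <linux/blk-mq.h>
    #include "nvme.h"	/* drivers/nvme/host private header */

    static void nvme_fc_fail_pending_admin(struct nvme_ctrl *ctrl)	/* illustrative */
    {
    	/*
    	 * Unquiesce so queued admin commands (e.g. from ioctls) fail
    	 * fast against the reconnecting state instead of pinning a
    	 * controller reference.
    	 */
    	blk_mq_unquiesce_queue(ctrl->admin_q);
    }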
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
  6. 08 Jan, 2018 1 commit
  7. 15 Dec, 2017 1 commit
  8. 24 Nov, 2017 1 commit
  9. 20 Nov, 2017 1 commit
  10. 11 Nov, 2017 5 commits
  11. 01 Nov, 2017 9 commits
  12. 27 Oct, 2017 3 commits
  13. 20 Oct, 2017 2 commits
• nvme-fc: correct io timeout behavior · 134aedc9
  James Smart authored
The transport io timeout behavior wasn't quite correct. It ignored
that the io error handler is supposed to be synchronous, so it possibly
allowed the blk request to be restarted while the associated io was
still aborting. Timeouts on reserved commands, those used for
association create, were never timing out, so they hung forever.
      
      To correct:
If an io times out while a remoteport is not connected, just
      restart the io timer. The lack of connectivity will simultaneously
      be resetting the controller, so the reset path will abort and terminate
      the io.
      
If an io times out while it is marked for transport abort, just
      reset the io timer. The abort process is underway and will complete
      the io.
      
      Otherwise, if an io times out, abort the io. If the abort was
      unsuccessful (unlikely) give up and return not handled.
      
      If the abort was successful, as the abort process is underway it will
      terminate the io, so rather than synchronously waiting, just restart
      the io timer.
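
The decision tree reads roughly as below when written as a blk-mq .timeout
handler of this era (BLK_EH_RESET_TIMER / BLK_EH_NOT_HANDLED); the my_*
names are illustrative helpers, not the driver's exact code.

    #include <linux/blk-mq.h>

    struct my_fc_op;					/* illustrative per-rq context */
    static bool my_rport_connected(struct my_fc_op *op);	/* illustrative */
    static bool my_op_terminating(struct my_fc_op *op);	/* illustrative */
    static int my_abort_op(struct my_fc_op *op);		/* illustrative */

    static enum blk_eh_timer_return
    my_fc_timeout(struct request *rq, bool reserved)
    {
    	struct my_fc_op *op = blk_mq_rq_to_pdu(rq);

    	/* no connectivity: the controller reset will abort the io */
    	if (!my_rport_connected(op))
    		return BLK_EH_RESET_TIMER;

    	/* already marked for transport abort: it will complete the io */
    	if (my_op_terminating(op))
    		return BLK_EH_RESET_TIMER;

    	/* otherwise initiate an abort; if it succeeds it completes the io */
    	if (my_abort_op(op) == 0)
    		return BLK_EH_RESET_TIMER;

    	/* abort unsuccessful (unlikely): give up */
    	return BLK_EH_NOT_HANDLED;
    }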
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
• nvme-fc: correct io termination handling · 0a02e39f
  James Smart authored
The io completion handling for i/o's that are failing due to
a transport error or association termination had issues, causing
io failures (DNR set so retries didn't kick in) or long stalls.
      
      Change the io completion handler for the following items:
      
      When an io has been completed due to a transport abort (based on an
      exchange error) or when marked as aborted as part of an association
      termination (FCOP_FLAGS_TERMIO), set the NVME completion status to
      NVME_SC_ABORTED. By default, do not set DNR on the status so that a
      retry can be attempted after association recreate.
      
      In cases where an io is failed (non-successful nvme status including
      aborted), if the controller is being deleted (blk_queue_dying) or
      the io was part of the ios used for association creation (ctrl state
      is NEW or RECONNECTING), then additionally set the DNR bit so the io
      will not be retried. If the failed io was part of association creation,
the failure will tear down the partially completed association and
      typically restart a new reconnect attempt (another create association
      later).
      
      Rearranged code flow to remove a largely unneeded local variable.
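
The status mapping reads roughly as below; a hedged sketch using the
kernel's spelling of the codes (NVME_SC_ABORT_REQ for the text's
NVME_SC_ABORTED, plus the NVME_SC_DNR bit), with the surrounding
completion handler elided and the function name illustrative.

    #include <linux/nvme.h>
    #include "nvme.h"	/* drivers/nvme/host private header */

    static u16 my_map_status(struct nvme_ctrl *ctrl, struct request *rq)	/* illustrative */
    {
    	/* aborted, and retryable by default: no DNR bit */
    	u16 status = NVME_SC_ABORT_REQ;

    	/*
    	 * Don't retry if the controller is dying or the io belonged
    	 * to association creation (NEW/RECONNECTING).
    	 */
    	if (blk_queue_dying(rq->q) ||
    	    ctrl->state == NVME_CTRL_NEW ||
    	    ctrl->state == NVME_CTRL_RECONNECTING)
    		status |= NVME_SC_DNR;

    	return status;
    }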
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
  14. 19 Oct, 2017 1 commit
  15. 18 Oct, 2017 3 commits
  16. 05 Oct, 2017 1 commit
  17. 04 Oct, 2017 2 commits
• nvme-fc: create fc class and transport device · 5f568556
  James Smart authored
Added a new fc class and a device node for udev events under it.  I
expect the fc class will eventually be where FC SCSI and FC NVME
merge. Therefore the names are kept somewhat generic.
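
The class and node creation reads roughly as below; a minimal sketch using
the driver-core APIs of this era, with error unwinding elided and the init
hook name illustrative.

    #include <linux/device.h>

    static struct class *fc_class;
    static struct device *fc_udev_device;

    static int __init fc_class_init(void)	/* illustrative init hook */
    {
    	fc_class = class_create(THIS_MODULE, "fc");
    	if (IS_ERR(fc_class))
    		return PTR_ERR(fc_class);

    	/* single well-known node that uevents are posted against */
    	fc_udev_device = device_create(fc_class, NULL, MKDEV(0, 0), NULL,
    				       "fc_udev_device");
    	return PTR_ERR_OR_ZERO(fc_udev_device);
    }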
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
• nvme-fc: add uevent for auto-connect · eaefd5ab
  James Smart authored
      To support auto-connecting to FC-NVME devices upon their dynamic
      appearance, add a uevent that can kick off connection scripts.
      uevent is posted against the fc_udev device.
      
The patch set was tested with the following rule, which kicks off an
nvme-cli connect-all for the FC initiator and FC target ports. This is
just an example for testing and not intended for real-life use.
      
      ACTION=="change", SUBSYSTEM=="fc", ENV{FC_EVENT}=="nvmediscovery", \
              ENV{NVMEFC_HOST_TRADDR}=="*", ENV{NVMEFC_TRADDR}=="*", \
      	RUN+="/bin/sh -c '/usr/local/sbin/nvme connect-all --transport=fc --host-traddr=$env{NVMEFC_HOST_TRADDR} --traddr=$env{NVMEFC_TRADDR} >> /tmp/nvme_fc.log'"
      
      I will post proposed udev/systemd scripts for possible kernel support.
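
The kernel side of the event reads roughly as below; a hedged sketch of
posting the uevent with the environment the rule above matches on, where
the function name and buffer handling are illustrative.

    #include <linux/kobject.h>
    #include <linux/slab.h>

    static void my_post_nvmediscovery(struct device *fc_udev_device,
    				  const char *host_traddr, const char *traddr)
    {
    	char *envp[4];

    	envp[0] = "FC_EVENT=nvmediscovery";
    	envp[1] = kasprintf(GFP_KERNEL, "NVMEFC_HOST_TRADDR=%s", host_traddr);
    	envp[2] = kasprintf(GFP_KERNEL, "NVMEFC_TRADDR=%s", traddr);
    	envp[3] = NULL;

    	/* posted against the fc_udev device created for the fc class */
    	kobject_uevent_env(&fc_udev_device->kobj, KOBJ_CHANGE, envp);

    	kfree(envp[1]);
    	kfree(envp[2]);
    }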
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
  18. 25 Sep, 2017 2 commits
  19. 28 Aug, 2017 1 commit
• nvme-fc: Reattach to localports on re-registration · 5533d424
  James Smart authored
      If the LLDD resets or detaches from an fc port, the LLDD will
      deregister all remoteports seen by the fc port and deregister the
      localport associated with the fc port. The teardown of the localport
      structure will be held off due to reference counting until all the
      remoteports are removed (and they are held off until all
controllers/associations are terminated). Currently, if the fc port
      is reinit/reattached and registered again as a localport it is
      treated as an independent entity from the prior localport and all
      prior remoteports and controllers cannot be revived. They are
      created as new and separate entities.
      
      This patch changes the localport registration to look at the known
localports that are waiting to be torn down. If they are the same port
      based on wwn's, the local port is transitioned out of the teardown
      state.  This allows the remote ports and controller connections to
      be reestablished and resumed as long as the localport can also be
      reregistered within the timeout windows.
      
      The patch adds a new routine nvme_fc_attach_to_unreg_lport() with
      the functionality and moves the lport get/put routines to avoid
      forward references.
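
The wwn match at registration reads roughly as below; a hedged sketch of
what nvme_fc_attach_to_unreg_lport() checks, where my_lport and the list
and field names are illustrative stand-ins rather than the driver's exact
internals (only struct nvme_fc_local_port and struct nvme_fc_port_info
come from the public header).

    #include <linux/list.h>
    #include <linux/nvme-fc-driver.h>

    struct my_lport {				/* illustrative stand-in */
    	struct list_head port_list;
    	struct nvme_fc_local_port localport;	/* has node_name/port_name */
    	bool teardown_pending;
    };

    static LIST_HEAD(my_lport_list);

    static struct my_lport *
    my_attach_to_unreg_lport(struct nvme_fc_port_info *pinfo)
    {
    	struct my_lport *lport;

    	list_for_each_entry(lport, &my_lport_list, port_list) {
    		if (!lport->teardown_pending)
    			continue;
    		/* same physical port if both wwn's match */
    		if (lport->localport.node_name == pinfo->node_name &&
    		    lport->localport.port_name == pinfo->port_name) {
    			lport->teardown_pending = false;	/* revive it */
    			return lport;
    		}
    	}
    	return NULL;	/* no match: register as a new localport */
    }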
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>