Skip to content
  • James Smart's avatar
    nvme-fc: fix rogue admin cmds stalling teardown · d625d05e
    James Smart authored
    
    
    When connectivity is lost to a device, the association is terminated
    and the blk-mq queues are quiesced/stopped. When connectivity is
    re-established, they are resumed.
    
    If an admin command is received while connectivity is list, the ioctl
    queues the command on the admin_q and the command stalls (the thread
    issuing the ioctl hangs/waits). if the connectivity is lost long
    enough such that the controller is then deleted, the delete code
    makes its calls to initiate the delete, which then expects the core
    layer to call the transport when all references are removed and the
    controller can be freed.  Unfortunately, nothing in this path dequeued
    the admin command, so a reference sits outstanding and things stop,
    hanging the delete indefinitely.
    
    Correct by unquiescing the admin queue in the delete association. This
    means any admin command (which should only be from an ioctl) issued
    after connectivity is lost will detect the controller is in a
    reconnecting state and will (fast) fail the command. Thus, a pending
    reference can no longer be created.  Once connectivity is re-established,
    a new ioctl/admin command would see proper device state and function again.
    
    Signed-off-by: default avatarJames Smart <james.smart@broadcom.com>
    Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
    d625d05e