1. 17 Jan, 2018 1 commit
  2. 09 Jun, 2017 1 commit
      block: introduce new block status code type · 2a842aca
      Christoph Hellwig authored
      Currently we use normal Linux errno values in the block layer, and while
      we accept any error a few have overloaded magic meanings.  This patch
      instead introduces a new blk_status_t value that holds block layer specific
      status codes and explicitly explains their meaning.  Helpers to convert from
      and to the previous special meanings are provided for now, but I suspect
      we want to get rid of them in the long run - those drivers that have an
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return something that strictly speaking isn't correct
      for file system operations, but that's left as an exercise for later.
      For now the set of errors is a very limited set that closely corresponds
      to the previous overloaded errno values, but there is some low hanging
      fruit to improve it.
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
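      The sparse __bitwise trick this commit relies on can be sketched in plain C. The names mirror the kernel's (blk_status_t, errno_to_blk_status, blk_status_to_errno), but the fallback macro and the exact errno mapping here are a simplified illustration, not the kernel's definitions:

      ```c
      #include <assert.h>
      #include <errno.h>

      /* Under sparse (__CHECKER__), __bitwise makes blk_status_t a distinct
       * type, so passing a plain errno where a blk_status_t is expected is
       * flagged at check time. For normal compilers it expands to nothing. */
      #ifdef __CHECKER__
      #define __bitwise __attribute__((bitwise))
      #else
      #define __bitwise
      #endif

      typedef unsigned int __bitwise blk_status_t;

      /* A few status codes in the spirit of the commit (values illustrative). */
      #define BLK_STS_OK      ((blk_status_t)0)
      #define BLK_STS_NOTSUPP ((blk_status_t)1)
      #define BLK_STS_IOERR   ((blk_status_t)10)

      /* Conversion helpers between errnos and block status codes, as the
       * commit describes; this mapping is a reduced sketch. */
      static blk_status_t errno_to_blk_status(int errno_val)
      {
          switch (errno_val) {
          case 0:           return BLK_STS_OK;
          case -EOPNOTSUPP: return BLK_STS_NOTSUPP;
          default:          return BLK_STS_IOERR;
          }
      }

      static int blk_status_to_errno(blk_status_t status)
      {
          switch (status) {
          case BLK_STS_OK:      return 0;
          case BLK_STS_NOTSUPP: return -EOPNOTSUPP;
          default:              return -EIO;
          }
      }

      int main(void)
      {
          /* Round-tripping an overloaded errno preserves its meaning. */
          assert(blk_status_to_errno(errno_to_blk_status(-EOPNOTSUPP)) == -EOPNOTSUPP);
          assert(blk_status_to_errno(BLK_STS_OK) == 0);
          return 0;
      }
      ```

      With sparse active, a call like `blk_status_to_errno(-EIO)` would be reported, which is exactly the typechecking benefit the commit message claims.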
  3. 20 Apr, 2017 2 commits
  4. 31 Jan, 2017 1 commit
  5. 27 Jan, 2017 2 commits
  6. 17 Jan, 2017 1 commit
  7. 28 Oct, 2016 1 commit
  8. 21 Jul, 2016 1 commit
  9. 05 May, 2015 1 commit
  10. 22 Sep, 2014 1 commit
  11. 09 Jun, 2014 1 commit
  12. 21 Mar, 2014 1 commit
  13. 21 Feb, 2014 1 commit
  14. 07 Feb, 2014 1 commit
  15. 31 Dec, 2013 1 commit
  16. 25 Oct, 2013 2 commits
      blk-mq: new multi-queue block IO queueing mechanism · 320ae51f
      Jens Axboe authored
      Linux currently has two models for block devices:
      - The classic request_fn based approach, where drivers use struct
        request units for IO. The block layer provides various helper
        functionalities to let drivers share code, things like tag
        management, timeout handling, queueing, etc.
      - The "stacked" approach, where a driver squeezes in between the
        block layer and IO submitter. Since this bypasses the IO stack,
        driver generally have to manage everything themselves.
      With drivers being written for new high IOPS devices, the classic
      request_fn based driver doesn't work well enough. The design dates
      back to when both SMP and high IOPS were rare. It has problems with
      scaling to bigger machines, and runs into scaling issues even on
      smaller machines when you have IOPS in the hundreds of thousands
      per device.
      The stacked approach is then most often selected as the model
      for the driver. But this means that everybody has to re-invent
      everything, and along with that we get all the problems again
      that the shared approach solved.
      This commit introduces blk-mq, block multi queue support. The
      design is centered around per-cpu queues for queueing IO, which
      then funnel down into x number of hardware submission queues.
      We might have a 1:1 mapping between the two, or it might be
      an N:M mapping. That all depends on what the hardware supports.
      blk-mq provides various helper functions, which include:
      - Scalable support for request tagging. Most devices need to
        be able to uniquely identify a request both in the driver and
        to the hardware. The tagging uses per-cpu caches for freed
        tags, to enable cache hot reuse.
      - Timeout handling without tracking requests on a per-device
        basis. Basically the driver should be able to get a notification
        if a request happens to fail.
      - Optional support for non 1:1 mappings between issue and
        submission queues. blk-mq can redirect IO completions to the
        desired location.
      - Support for per-request payloads. Drivers almost always need
        to associate a request structure with some driver private
        command structure. Drivers can tell blk-mq this at init time,
        and then any request handed to the driver will have the
        required size of memory associated with it.
      - Support for merging of IO, and plugging. The stacked model
        gets neither of these. Even for high IOPS devices, merging
        sequential IO reduces per-command overhead and thus
        increases bandwidth.
      For now, this is provided as a potential 3rd queueing model, with
      the hope being that, as it matures, it can replace both the classic
      and stacked model. That would get us back to having just 1 real
      model for block devices, leaving the stacked approach to dm/md
      devices (as it was originally intended).
      Contributions in this patch from the following people:
      Shaohua Li <shli@fusionio.com>
      Alexander Gordeev <agordeev@redhat.com>
      Christoph Hellwig <hch@infradead.org>
      Mike Christie <michaelc@cs.wisc.edu>
      Matias Bjorling <m@bjorling.me>
      Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
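      The 1:1 versus N:M queue mapping described above can be sketched with a simple modulo spread. The real blk-mq map is topology-aware, so this is only a toy model, and the function name is hypothetical:

      ```c
      #include <assert.h>

      /* Toy model of blk-mq's queue mapping: each CPU's software queue is
       * assigned to one of nr_hw_queues hardware submission queues. With
       * nr_hw_queues equal to the CPU count this degenerates to the 1:1
       * case the commit mentions; otherwise several software queues funnel
       * into one hardware queue (N:M). A modulo spread is the simplest
       * such assignment. */
      static unsigned int map_swq_to_hwq(unsigned int cpu, unsigned int nr_hw_queues)
      {
          return cpu % nr_hw_queues;
      }

      int main(void)
      {
          /* 1:1 mapping: 4 CPUs, 4 hardware queues. */
          assert(map_swq_to_hwq(3, 4) == 3);
          /* N:M mapping: 8 CPUs funneling into 2 hardware queues. */
          assert(map_swq_to_hwq(5, 2) == 1);
          assert(map_swq_to_hwq(6, 2) == 0);
          return 0;
      }
      ```

      Because completions can arrive on a different hardware queue than the submitting CPU's software queue, blk-mq's optional completion redirection (the third helper in the list) exists precisely for such non-1:1 maps.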
      block: remove request ref_count · 71fe07d0
      Christoph Hellwig authored
      This reference count has been around since before git history, but the only
      place where it's used is in blk_execute_rq, and there it is entirely useless
      as it is incremented before submitting the request and decremented in the
      end_io handler before waking up the submitter thread.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  17. 18 Sep, 2013 1 commit
  18. 15 Feb, 2013 1 commit
      block: account iowait time when waiting for completion of IO request · 5577022f
      Vladimir Davydov authored
      Using wait_for_completion() for waiting for an IO request to be executed
      results in wrong iowait time accounting. For example, a system having
      the only task doing write() and fdatasync() on a block device can be
      reported being idle instead of iowaiting as it should because
      blkdev_issue_flush() calls wait_for_completion() which in turn calls
      schedule() that does not increment the iowait proc counter and thus does
      not turn on iowait time accounting.
      The patch makes the block layer use wait_for_completion_io() instead of
      wait_for_completion() where appropriate to account iowait time.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
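      The accounting difference the commit fixes can be modeled as a pair of stand-ins: a task blocked in wait_for_completion() sleeps via schedule(), which leaves the iowait counter alone, so the system looks idle, while io_schedule() (which backs wait_for_completion_io()) charges the sleep to iowait. The names and the return-a-tick convention here are illustrative, not the kernel's:

      ```c
      #include <assert.h>

      /* Each sketch returns how many iowait "ticks" the sleep contributed. */

      static int wait_for_completion_sketch(void)
      {
          /* schedule(): sleep is unaccounted, so the task appears idle,
           * which is the bug blkdev_issue_flush() exhibited. */
          return 0;
      }

      static int wait_for_completion_io_sketch(void)
      {
          /* io_schedule(): the sleep is charged to iowait. */
          return 1;
      }

      int main(void)
      {
          assert(wait_for_completion_sketch() == 0);
          assert(wait_for_completion_io_sketch() == 1);
          return 0;
      }
      ```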
  19. 07 Feb, 2013 1 commit
  20. 06 Dec, 2012 2 commits
      block: Avoid that request_fn is invoked on a dead queue · c246e80d
      Bart Van Assche authored
      A block driver may start cleaning up resources needed by its
      request_fn as soon as blk_cleanup_queue() has finished, so request_fn
      must not be invoked after draining has finished. This is important
      when blk_run_queue() is invoked without any requests in progress.
      As an example, if blk_drain_queue() and scsi_run_queue() run in
      parallel, blk_drain_queue() may have finished all requests after
      scsi_run_queue() has taken a SCSI device off the starved list but
      before that last function has had a chance to run the queue.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Chanho Min <chanho.min@lge.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      block: Rename queue dead flag · 3f3299d5
      Bart Van Assche authored
      QUEUE_FLAG_DEAD is used to indicate that queuing new requests must
      stop. After this flag has been set, queue draining starts. However,
      during the queue draining phase it is still safe to invoke the
      queue's request_fn, so QUEUE_FLAG_DYING is a better name for this flag.
      This patch has been generated by running the following command
      over the kernel source tree:
      git grep -lEw 'blk_queue_dead|QUEUE_FLAG_DEAD' |
          xargs sed -i.tmp -e 's/blk_queue_dead/blk_queue_dying/g'      \
              -e 's/QUEUE_FLAG_DEAD/QUEUE_FLAG_DYING/g';                \
      sed -i.tmp -e "s/QUEUE_FLAG_DYING$(printf \\t)*5/QUEUE_FLAG_DYING$(printf \\t)5/g" \
          include/linux/blkdev.h;                                       \
      sed -i.tmp -e 's/ DEAD/ DYING/g' -e 's/dead queue/a dying queue/' \
          -e 's/Dead queue/A dying queue/' block/blk-core.c
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Chanho Min <chanho.min@lge.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  21. 23 Nov, 2012 1 commit
      block: Don't access request after it might be freed · 893d290f
      Roland Dreier authored
      After we've done __elv_add_request() and __blk_run_queue() in
      blk_execute_rq_nowait(), the request might finish and be freed
      immediately.  Therefore checking if the type is REQ_TYPE_PM_RESUME
      isn't safe afterwards, because if it isn't, rq might be gone.
      Instead, check beforehand and stash the result in a temporary.
      This fixes crashes in blk_execute_rq_nowait() I get occasionally when
      running with lots of memory debugging options enabled -- I think this
      race is usually harmless because the window for rq to be reallocated
      is so small.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
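      The pattern behind this fix — read everything you need from the request before submitting it, because submission may complete and free it immediately — can be sketched with toy stand-ins. The struct, macro, and function names below are simplified placeholders for struct request, REQ_TYPE_PM_RESUME, and blk_execute_rq_nowait():

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdlib.h>
      #include <string.h>

      struct toy_request { int cmd_type; };
      #define TOY_REQ_TYPE_PM_RESUME 1

      static void submit_and_maybe_free(struct toy_request *rq)
      {
          /* After submission the completion path may run immediately;
           * poison and free to simulate the request being gone. */
          memset(rq, 0xAA, sizeof(*rq));
          free(rq);
      }

      static bool execute_nowait(struct toy_request *rq)
      {
          /* Stash the flag in a temporary *before* submission ... */
          bool is_pm_resume = (rq->cmd_type == TOY_REQ_TYPE_PM_RESUME);

          submit_and_maybe_free(rq);

          /* ... so rq is never touched afterwards. Checking
           * rq->cmd_type here would be the use-after-free the
           * commit fixes. */
          return is_pm_resume;
      }

      int main(void)
      {
          struct toy_request *rq = malloc(sizeof(*rq));
          rq->cmd_type = TOY_REQ_TYPE_PM_RESUME;
          assert(execute_nowait(rq) == true);
          return 0;
      }
      ```

      As the commit notes, the race window is tiny, which is why it only surfaced with memory debugging enabled; the temporary closes it entirely.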
  22. 20 Jul, 2012 1 commit
  23. 13 Dec, 2011 2 commits
  24. 21 Jul, 2011 1 commit
      [SCSI] fix crash in scsi_dispatch_cmd() · bfe159a5
      James Bottomley authored
      USB surprise removal of sr is triggering an oops in
      scsi_dispatch_command().  What seems to be happening is that USB is
      hanging on to a queue reference until the last close of the upper
      device, so the crash is caused by surprise remove of a mounted CD
      followed by attempted unmount.
      The problem is that USB doesn't issue its final commands as part of
      the SCSI teardown path, but on last close when the block queue is long
      gone.  The long term fix is probably to make sr do the teardown in the
      same way as sd (so remove all the lower bits on ejection, but keep the
      upper disk alive until last close of user space).  However, the
      current oops can be simply fixed by not allowing any commands to be
      sent to a dead queue.
      Cc: stable@kernel.org
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  25. 05 May, 2011 1 commit
  26. 18 Apr, 2011 1 commit
  27. 10 Mar, 2011 1 commit
  28. 24 Sep, 2010 1 commit
      block: Prevent hang_check firing during long I/O · 4b197769
      Mark Lord authored
      During long I/O operations, the hang_check timer may fire,
      triggering stack dumps that unnecessarily alarm the user.
      Eg.  hdparm --security-erase NULL /dev/sdb  ## can take *hours* to complete
      So, if hang_check is armed, we should wake up periodically
      to prevent it from triggering.  This patch uses a wake-up interval
      equal to half the hang_check timer period, which keeps overhead low enough.
      Signed-off-by: Mark Lord <mlord@pobox.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
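      The wake-up scheme can be sketched as a loop: instead of one unbounded wait, wait in slices of half the hang_check period, so the watchdog never observes a full period of uninterrupted sleep. In the sketch below, wait_slice() is a stand-in for a timed completion wait (in the kernel, wait_for_completion_timeout()); the names and the slice-counting are illustrative:

      ```c
      #include <assert.h>

      static unsigned int slices_needed; /* slices until the IO completes */

      /* Returns 1 if the IO completed within this slice, 0 on timeout. */
      static int wait_slice(void)
      {
          if (slices_needed == 0)
              return 1;
          slices_needed--;
          return 0;
      }

      /* Returns the number of intermediate wake-ups before completion. */
      static unsigned int wait_with_hang_check(unsigned int hang_check_period)
      {
          /* Half the watchdog period keeps overhead low while guaranteeing
           * the task wakes before hang_check can fire. */
          unsigned int timeout = hang_check_period / 2;
          unsigned int wakeups = 0;

          (void)timeout; /* would be passed to the timed wait */
          while (!wait_slice())
              wakeups++;
          return wakeups;
      }

      int main(void)
      {
          slices_needed = 5;                      /* a long-running erase */
          assert(wait_with_hang_check(120) == 5); /* 5 wake-ups, no false hang */
          return 0;
      }
      ```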
  29. 07 Aug, 2010 1 commit
  30. 28 Apr, 2009 1 commit
      block: don't set REQ_NOMERGE unnecessarily · e4025f6c
      Tejun Heo authored
      RQ_NOMERGE_FLAGS already defines which REQ flags aren't
      mergeable.  There is no reason to specify it superfluously.  It only
      adds to confusion.  Don't set REQ_NOMERGE for barriers and requests
      with specific queueing directive.  REQ_NOMERGE is now exclusively used
      by the merging code.
      [ Impact: cleanup ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
  31. 09 Oct, 2008 1 commit
  32. 15 Jul, 2008 2 commits
  33. 01 Feb, 2008 1 commit
  34. 29 Jan, 2008 1 commit