1. 19 Dec, 2018 1 commit
  2. 15 Nov, 2018 1 commit
  3. 07 Nov, 2018 4 commits
  4. 31 Oct, 2018 1 commit
  5. 24 Jul, 2018 1 commit
  6. 09 Jul, 2018 1 commit
  7. 08 Mar, 2018 2 commits
  8. 11 Nov, 2017 1 commit
  9. 23 Aug, 2017 1 commit
  10. 27 Jun, 2017 1 commit
  11. 08 Apr, 2017 1 commit
  12. 08 Feb, 2017 1 commit
  13. 02 Feb, 2017 1 commit
  14. 13 Dec, 2016 1 commit
  15. 01 Dec, 2016 1 commit
  16. 10 Nov, 2016 1 commit
    • block: hook up writeback throttling · 87760e5e
      Jens Axboe authored
      Enable throttling of buffered writeback to make it a lot
      smoother, with far less impact on other system activity.
      Background writeback should be, by definition, background
      activity. The fact that we flush huge bundles of it at a time
      means that it potentially has heavy impacts on foreground workloads,
      which isn't ideal. We can't easily limit the sizes of writes that
      we do, since that would impact file system layout in the presence
      of delayed allocation. So just throttle back buffered writeback,
      unless someone is waiting for it.
      
      The algorithm for when to throttle takes its inspiration from the
      CoDel networking scheduling algorithm. Like CoDel, blk-wb monitors
      the minimum latencies of requests over a window of time. In that
      window of time, if the minimum latency of any request exceeds a
      given target, then a scale count is incremented and the queue depth
      is shrunk. The next monitoring window is shrunk accordingly. Unlike
      CoDel, if we hit a window that exhibits good behavior, then we
      simply decrement the scale count and re-calculate the limits for that
      scale value. This prevents us from oscillating between a
      close-to-ideal value and max all the time, instead remaining in the
      windows where we get good behavior.
      
      Unlike CoDel, blk-wb allows the scale count to go negative. This
      happens if we primarily have writes going on. Unlike positive
      scale counts, this doesn't change the size of the monitoring window.
      When the heavy writers finish, blk-wb quickly snaps back to its
      stable state of a zero scale count.
      
      The patch registers a sysfs entry, 'wb_lat_usec'. This sets the latency
      target to be met. It defaults to 2 msec for non-rotational storage, and
      75 msec for rotational storage. Setting this value to '0' disables
      blk-wb. Generally, a user would not have to touch this setting.
      
      We don't enable WBT on devices that are managed with CFQ, and have
      a non-root block cgroup attached. If we have a proportional share setup
      on this particular disk, then the wbt throttling will interfere with
      that. We don't have a strong need for wbt for that case, since we will
      rely on CFQ doing that for us.
      Signed-off-by: Jens Axboe <axboe@fb.com>
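The scale-count rules in the message above can be sketched as a toy model. This is only an illustration under stated assumptions, not the kernel code: the class name, `base_depth`, `base_window_ms`, `min_scale`, the halving of the depth per step and the square-root window shrink are all hypothetical choices, and good windows are assumed to step the count back toward zero. Only the 2 msec target comes from the message.

```python
import math
from dataclasses import dataclass


@dataclass
class WritebackThrottleModel:
    """Toy model of the scale-count rules; not the kernel implementation."""
    target_ms: float = 2.0         # latency target (the non-rotational default above)
    base_depth: int = 64           # hypothetical write depth at scale count 0
    base_window_ms: float = 100.0  # hypothetical monitoring window at scale count 0
    min_scale: int = -2            # hypothetical floor for the negative range
    scale: int = 0

    def depth(self) -> int:
        """Allowed write depth for the current scale count."""
        if self.scale >= 0:
            # Each positive step halves the depth, but never below 1.
            return max(1, self.base_depth >> self.scale)
        # Negative steps temporarily allow a deeper queue.
        return self.base_depth << -self.scale

    def window_ms(self) -> float:
        """Monitoring window; only positive scale counts shrink it."""
        if self.scale > 0:
            return self.base_window_ms / math.sqrt(self.scale + 1)
        return self.base_window_ms

    def end_of_window(self, min_latency_ms: float, only_writes: bool = False) -> None:
        """Update the scale count from one window's minimum latency."""
        if min_latency_ms > self.target_ms:
            # Bad window: leave the write-only range, or throttle one step harder.
            self.scale = 0 if self.scale < 0 else self.scale + 1
        elif only_writes:
            # Write-dominated window: allow the count to go negative.
            self.scale = max(self.scale - 1, self.min_scale)
        elif self.scale > 0:
            # Good window: relax by one step instead of jumping back to max.
            self.scale -= 1
        else:
            # Heavy writers finished: snap back to the stable zero state.
            self.scale = 0
```

Feeding the model one window with a 5 msec minimum latency moves it to scale count 1 with half the depth and a shorter window; a following good window returns it to scale count 0.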
  17. 05 Nov, 2016 1 commit
    • block: add code to track actual device queue depth · d278d4a8
      Jens Axboe authored
      For blk-mq, ->nr_requests does track queue depth, at least at init
      time. But for the older queue paths, it's simply a soft setting.
      On top of that, it's generally larger than the hardware setting
      on purpose, to allow backup of requests for merging.
      
      Fill a hole in struct request_queue with a 'queue_depth' member, that
      drivers can set to more closely inform the block layer of the
      real queue depth.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
  18. 18 Oct, 2016 2 commits
  19. 13 Apr, 2016 1 commit
  20. 12 Apr, 2016 2 commits
  21. 04 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement page
      cache with bigger chunks than PAGE_SIZE.

      This promise never materialized, and it is unlikely that it ever will.

      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.

      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straightforward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.

      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.

      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. 11 Feb, 2016 2 commits
  23. 26 Nov, 2015 1 commit
    • block/sd: Fix device-imposed transfer length limits · ca369d51
      Martin K. Petersen authored
      Commit 4f258a46 ("sd: Fix maximum I/O size for BLOCK_PC requests")
      had the unfortunate side-effect of removing an implicit clamp to
      BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer
      code. This caused problems for some SMR drives.
      
      Debugging this issue revealed a few problems with the existing
      infrastructure since the block layer didn't know how to deal with
      device-imposed limits, only limits set by the I/O controller.
      
       - Introduce a new queue limit, max_dev_sectors, which is used by the
         ULD to signal the maximum sectors for a REQ_TYPE_FS request.
      
       - Ensure that max_dev_sectors is correctly stacked and taken into
         account when overriding max_sectors through sysfs.
      
       - Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer
         values for later processing.
      
       - In sd_revalidate() set the queue's max_dev_sectors based on the
         MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value
         is not reported, fall back to a cap based on the CDB TRANSFER LENGTH
         field size.
      
       - In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits
         VPD--if reported and sane--to signal the preferred device transfer
         size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS.
      
       - blk_limits_max_hw_sectors() is no longer used and can be removed.
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: sweeneygj@gmx.com
      Tested-by: Arzeets <anatol.pomozov@gmail.com>
      Tested-by: David Eisner <david.eisner@oriel.oxon.org>
      Tested-by: Mario Kicherer <dev@kicherer.org>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
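How the controller limit, the device limit and the optimal transfer length from the list above might combine can be sketched roughly as follows. The function names, the convention that 0 means "not reported", and the "sane" test (non-zero and not above the hard limit) are assumptions made for illustration; the actual checks in sd_revalidate() may differ.

```python
def min_not_zero(a: int, b: int) -> int:
    """Smaller of two limits, where 0 means 'no limit reported'."""
    if a == 0:
        return b
    if b == 0:
        return a
    return min(a, b)


def pick_max_sectors(max_hw_sectors: int, max_dev_sectors: int,
                     opt_xfer_sectors: int, default_cap: int) -> int:
    """Hypothetical choice of max_sectors for filesystem requests."""
    # The controller limit and the device-imposed limit both bound the request.
    hard_limit = min_not_zero(max_hw_sectors, max_dev_sectors)
    # Prefer the device's optimal transfer length when it is reported and sane.
    if 0 < opt_xfer_sectors <= hard_limit:
        return opt_xfer_sectors
    # Otherwise fall back to the default cap, still bounded by the hard limit.
    return min_not_zero(hard_limit, default_cap)
```

With a controller limit of 65535 sectors, a device limit of 4096 and no optimal length reported, the sketch returns the default cap whenever that cap is below 4096, and the device limit otherwise.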
  24. 19 Aug, 2015 1 commit
  25. 18 Aug, 2015 1 commit
    • Revert "block: remove artifical max_hw_sectors cap" · 30e2bc08
      Jeff Moyer authored
      This reverts commit 34b48db6.
      That commit caused performance regressions for streaming I/O
      workloads on a number of different storage devices, from
      SATA disks to external RAID arrays.  It also managed to
      trip up some buggy firmware in at least one drive, causing
      data corruption.
      
      The next patch will bump the default max_sectors_kb value to
      1280, which will accommodate a 10-data-disk stripe write
      with chunk size 128k.  In the testing I've done using iozone,
      fio, and aio-stress, a value of 1280 does not show a big
      performance difference from 512.  This will hopefully still
      help the software RAID setup that Christoph saw the original
      performance gains with while still not regressing other
      storage configurations.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
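The arithmetic behind the 1280 figure in the message above is a full stripe of 10 data disks at 128k per chunk. A small sketch (helper names are hypothetical) also shows how many requests such a stripe write would need under a given max_sectors_kb:

```python
def full_stripe_kb(data_disks: int, chunk_kb: int) -> int:
    """Size of one full stripe write across all data disks, in KB."""
    return data_disks * chunk_kb


def requests_per_stripe(stripe_kb: int, max_sectors_kb: int) -> int:
    """Number of requests needed to cover one stripe (ceiling division)."""
    return -(-stripe_kb // max_sectors_kb)


stripe = full_stripe_kb(data_disks=10, chunk_kb=128)
print(stripe)                             # 1280
print(requests_per_stripe(stripe, 1280))  # 1
print(requests_per_stripe(stripe, 512))   # 3
```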
  26. 13 Aug, 2015 1 commit
    • block: kill merge_bvec_fn() completely · 8ae12666
      Kent Overstreet authored
      As generic_make_request() is now able to handle arbitrarily sized bios,
      it's no longer necessary for each individual block driver to define its
      own ->merge_bvec_fn() callback. Remove every invocation completely.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: drbd-user@lists.linbit.com
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@kernel.org>
      Cc: ceph-devel@vger.kernel.org
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      [dpark: also remove ->merge_bvec_fn() in dm-thin as well as
       dm-era-target, and resolve merge conflicts]
      Signed-off-by: Dongsu Park <dpark@posteo.net>
      Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  27. 12 Aug, 2015 1 commit
  28. 17 Jul, 2015 1 commit
    • block: make /sys/block/<dev>/queue/discard_max_bytes writeable · 0034af03
      Jens Axboe authored
      Lots of devices support huge discard sizes these days. Depending
      on how the device handles them internally, huge discards can
      introduce massive latencies (hundreds of msec) on the device side.
      
      We have a sysfs file, discard_max_bytes, that advertises the max
      hardware supported discard size. Make this writeable, and split
      the settings into a soft and hard limit. The soft limit can be set
      anywhere from 'discard_granularity' up to the hardware limit.
      
      Add a new sysfs file, 'discard_max_hw_bytes', that shows the hw
      set limit.
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
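The soft/hard split described above can be modelled in a few lines. This is a sketch under assumptions: the class and method names are hypothetical, and the policy of rejecting values below the granularity while clamping values above the hardware limit is one plausible reading of the message, not a statement of what the sysfs store handler does.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiscardLimits:
    granularity: int   # models discard_granularity, in bytes
    max_hw_bytes: int  # models discard_max_hw_bytes (hard limit, read-only)
    max_bytes: int     # models discard_max_bytes (soft limit, writable)

    def set_max_bytes(self, requested: int) -> None:
        """Update the soft limit within [granularity, max_hw_bytes]."""
        if requested < self.granularity:
            raise ValueError("soft limit below discard granularity")
        # Assumption: values above the hardware limit are clamped to it.
        self.max_bytes = min(requested, self.max_hw_bytes)

    def split(self, length: int) -> List[int]:
        """Sizes of the discards a range of `length` bytes would be cut into."""
        full, rest = divmod(length, self.max_bytes)
        return [self.max_bytes] * full + ([rest] if rest else [])
```

Lowering the soft limit from 1 GiB to 64 MiB would turn a single 160 MiB discard into three smaller ones, which is the kind of latency control the message is after.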
  29. 31 Mar, 2015 1 commit
    • block: fix blk_stack_limits() regression due to lcm() change · e9637415
      Mike Snitzer authored
      Linux 3.19 commit 69c953c8 ("lib/lcm.c: lcm(n,0)=lcm(0,n) is 0, not n")
      caused blk_stack_limits() to not properly stack queue_limits for stacked
      devices (e.g. DM).
      
      Fix this regression by establishing lcm_not_zero() and switching
      blk_stack_limits() over to using it.
      
      DM uses blk_set_stacking_limits() to establish the initial top-level
      queue_limits that are then built up based on underlying devices' limits
      using blk_stack_limits().  In the case of optimal_io_size (io_opt)
      blk_set_stacking_limits() establishes a default value of 0.  With commit
      69c953c8, lcm(0, n) is no longer n, which compromises proper stacking of
      the underlying devices' io_opt.
      
      Test:
      $ modprobe scsi_debug dev_size_mb=10 num_tgts=1 opt_blks=1536
      $ cat /sys/block/sde/queue/optimal_io_size
      786432
      $ dmsetup create node --table "0 100 linear /dev/sde 0"
      
      Before this fix:
      $ cat /sys/block/dm-5/queue/optimal_io_size
      0
      
      After this fix:
      $ cat /sys/block/dm-5/queue/optimal_io_size
      786432
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.19+
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
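The difference between the two helpers named in the message can be illustrated with a short Python model. The function names mirror lcm() and lcm_not_zero(), but this is a sketch of the described behaviour rather than the kernel source: the zero-aware variant falls back to whichever argument is non-zero, so a default io_opt of 0 no longer wipes out the underlying device's value.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple; 0 if either argument is 0."""
    if a == 0 or b == 0:
        return 0
    return a // gcd(a, b) * b


def lcm_not_zero(a: int, b: int) -> int:
    """Like lcm(), but a zero argument is ignored instead of forcing 0."""
    result = lcm(a, b)
    if result:
        return result
    return b if b else a


top_level_io_opt = 0         # default set up for the stacked device
underlying_io_opt = 786432   # value from the test in the message

print(lcm(top_level_io_opt, underlying_io_opt))           # 0
print(lcm_not_zero(top_level_io_opt, underlying_io_opt))  # 786432
```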
  30. 21 Oct, 2014 1 commit
    • block: remove artifical max_hw_sectors cap · 34b48db6
      Christoph Hellwig authored
      Set max_sectors to the value the driver provides as hardware limit by
      default.  Linux has had proper I/O throttling for a long time and doesn't
      rely on an artificially small maximum I/O size anymore.  By not limiting
      the I/O size by default we remove an annoying tuning step required for
      most Linux installations.
      
      Note that both the user, and if absolutely required the driver can still
      impose a limit for FS requests below max_hw_sectors_kb.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  31. 09 Oct, 2014 1 commit
  32. 10 Jun, 2014 1 commit
  33. 05 Jun, 2014 1 commit
    • block: add notion of a chunk size for request merging · 762380ad
      Jens Axboe authored
      Some drivers have different limits on what size a request should
      optimally be, depending on the offset of the request, similar to
      dividing a device into chunks. Add a setting that allows the driver
      to inform the block layer of such a chunk size. The block layer will
      then prevent merging across the chunks.
      
      This is needed to optimally support NVMe with a non-zero stripe size.
      Signed-off-by: Jens Axboe <axboe@fb.com>
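Preventing merges across chunks amounts to limiting a request to the space left in the chunk that contains its starting offset. The sketch below assumes a power-of-two chunk size expressed in sectors and that 0 means "no chunking"; the function names are hypothetical.

```python
def max_size_at_offset(offset: int, max_sectors: int, chunk_sectors: int) -> int:
    """Largest request, in sectors, that may start at `offset`."""
    if chunk_sectors == 0:
        # No chunk size configured: only the global limit applies.
        return max_sectors
    if chunk_sectors & (chunk_sectors - 1):
        raise ValueError("chunk_sectors must be a power of two")
    # Sectors remaining before the next chunk boundary.
    left_in_chunk = chunk_sectors - (offset & (chunk_sectors - 1))
    return min(max_sectors, left_in_chunk)


def can_merge(offset: int, cur_sectors: int, add_sectors: int,
              max_sectors: int, chunk_sectors: int) -> bool:
    """Whether growing a request by `add_sectors` stays inside its chunk."""
    limit = max_size_at_offset(offset, max_sectors, chunk_sectors)
    return cur_sectors + add_sectors <= limit
```

For a 256-sector chunk, a request starting at sector 200 can grow to at most 56 sectors, so merging 8 more sectors into a 48-sector request is allowed while merging 16 is not.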