1. 10 Apr, 2018 1 commit
  2. 24 Jul, 2017 1 commit
  3. 29 Jun, 2017 1 commit
    • blk-mq: map all HWQ also in hyperthreaded system · fe631457
      Max Gurtovoy authored
      This patch performs sequential mapping between CPUs and queues.
      If the system has more CPUs than HWQs, some CPUs are still left
      to map to HWQs. On a hyperthreaded system, map the unmapped CPUs
      and their siblings to the same HWQ.
      This fixes a bug that left HWQs unmapped on a system with
      2 sockets, 18 cores per socket, 2 threads per core (total 72 CPUs)
      running NVMEoF (which opens up to a maximum of 64 HWQs).
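
The mapping described above can be sketched in a few lines. This is an illustrative model, not the kernel code: the names `first_sibling` and `map_queues` are hypothetical, and the sibling layout (CPU c and CPU c + 36 share a core) is an assumption about how the 72 CPUs are enumerated.

```python
NR_CPUS = 72     # 2 sockets x 18 cores x 2 threads, from the commit message
NR_QUEUES = 64   # maximum number of HWQs opened in the commit message
NR_CORES = 36    # assumption: CPU c and CPU c + NR_CORES are siblings


def first_sibling(cpu):
    """Lowest-numbered hardware thread on the same core (assumed layout)."""
    return cpu % NR_CORES


def map_queues(nr_cpus, nr_queues):
    """Map every CPU to a hardware queue index."""
    cpu_to_hwq = {}
    for cpu in range(nr_cpus):
        sibling = first_sibling(cpu)
        if cpu < nr_queues or sibling == cpu:
            # Sequential mapping; wraps around when CPUs outnumber queues.
            cpu_to_hwq[cpu] = cpu % nr_queues
        else:
            # Leftover hyperthread: share the queue of its first sibling.
            cpu_to_hwq[cpu] = cpu_to_hwq[sibling]
    return cpu_to_hwq


mapping = map_queues(NR_CPUS, NR_QUEUES)
unused = set(range(NR_QUEUES)) - set(mapping.values())
print("unmapped HWQs:", sorted(unused))
print("CPU 64 shares HWQ", mapping[64], "with CPU", first_sibling(64))
```

Under these assumptions every one of the 64 queues receives at least one CPU, and the eight CPUs beyond the sequential pass land on the queues of their siblings.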
      Performance results running fio (72 jobs, 128 iodepth)
      using null_blk (each cell is with/without the patch):
      bs     IOPS(read submit_queues=72)  IOPS(write submit_queues=72)  IOPS(read submit_queues=24)  IOPS(write submit_queues=24)
      -----  ---------------------------  ----------------------------  ---------------------------  ----------------------------
      512    4890.4K/4723.5K              4524.7K/4324.2K               4280.2K/4264.3K              3902.4K/3909.5K
      1k     4910.1K/4715.2K              4535.8K/4309.6K               4296.7K/4269.1K              3906.8K/3914.9K
      2k     4906.3K/4739.7K              4526.7K/4330.6K               4301.1K/4262.4K              3890.8K/3900.1K
      4k     4918.6K/4730.7K              4556.1K/4343.6K               4297.6K/4264.5K              3886.9K/3893.9K
      8k     4906.4K/4748.9K              4550.9K/4346.7K               4283.2K/4268.8K              3863.4K/3858.2K
      16k    4903.8K/4782.6K              4501.5K/4233.9K               4292.3K/4282.3K              3773.1K/3773.5K
      32k    4885.8K/4782.4K              4365.9K/4184.2K               4307.5K/4289.4K              3780.3K/3687.3K
      64k    4822.5K/4762.7K              2752.8K/2675.1K               4308.8K/4312.3K              2651.5K/2655.7K
      128k   2388.5K/2313.8K              1391.9K/1375.7K               2142.8K/2152.2K              1395.5K/1374.2K
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 28 Jun, 2017 1 commit
  5. 08 Nov, 2016 1 commit
  6. 15 Sep, 2016 1 commit
  7. 03 Dec, 2015 1 commit
  8. 29 Sep, 2015 1 commit
    • blk-mq: avoid inserting requests before establishing new mapping · 5778322e
      Akinobu Mita authored
      Notifier callbacks for the CPU_ONLINE action can run on a CPU other
      than the one that was just onlined.  So it is possible for a
      process running on the just-onlined CPU to insert a request and run
      the hw queue before the new mapping is established, which is done by
      blk_mq_queue_reinit_notify().
      This can cause a problem when the CPU has just been onlined for the
      first time since the request queue was initialized.  At this time
      ctx->index_hw for the CPU, which is the index in hctx->ctxs[] for this
      ctx, is still zero before blk_mq_queue_reinit_notify() is called by the
      notifier callbacks for the CPU_ONLINE action.
      For example, there is a single hw queue (hctx) and two CPU queues
      (ctx0 for CPU0, and ctx1 for CPU1).  CPU1 has just been onlined, a
      request is inserted into ctx1->rq_list, and bit0 is set in the pending
      bitmap because ctx1->index_hw is still zero.
      Then, while running the hw queue, flush_busy_ctxs() finds bit0 set
      in the pending bitmap and tries to retrieve requests from
      hctx->ctxs[0]->rq_list.  But hctx->ctxs[0] is a pointer to ctx0, so the
      request in ctx1->rq_list is ignored.
      Fix it by ensuring that the new mapping is established before the
      onlined CPU starts running.
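
The failure sequence above can be replayed with a toy model. This is a simplified sketch, not the kernel data structures: the classes `Ctx` and `Hctx` and the helpers `map_ctxs` and `insert_request` are hypothetical stand-ins, and only `index_hw`, `rq_list`, `ctxs`, the pending bitmap, and `flush_busy_ctxs` correspond to names used in the commit message.

```python
class Ctx:
    """Per-CPU software queue (toy model)."""
    def __init__(self, name):
        self.name = name
        self.index_hw = 0   # stays zero until a mapping assigns the real slot
        self.rq_list = []


class Hctx:
    """Hardware queue context (toy model)."""
    def __init__(self):
        self.ctxs = []      # software queues mapped to this hw queue
        self.pending = 0    # bitmap: bit i set means ctxs[i] has requests


def map_ctxs(hctx, ctxs):
    """Establish the mapping: fill hctx.ctxs and each ctx.index_hw."""
    hctx.ctxs = list(ctxs)
    for i, ctx in enumerate(hctx.ctxs):
        ctx.index_hw = i


def insert_request(hctx, ctx, rq):
    ctx.rq_list.append(rq)
    hctx.pending |= 1 << ctx.index_hw


def flush_busy_ctxs(hctx):
    """Collect requests from every ctx whose pending bit is set."""
    found = []
    for i, ctx in enumerate(hctx.ctxs):
        if hctx.pending & (1 << i):
            hctx.pending &= ~(1 << i)
            found.extend(ctx.rq_list)
            ctx.rq_list.clear()
    return found


ctx0, ctx1 = Ctx("ctx0"), Ctx("ctx1")
hctx = Hctx()
map_ctxs(hctx, [ctx0])                 # only CPU0 was online at init time

# Buggy order: CPU1 inserts before the new mapping exists.
insert_request(hctx, ctx1, "rq-A")     # sets bit0, since index_hw is still 0
lost = flush_busy_ctxs(hctx)           # inspects ctxs[0], which is ctx0
print("flushed before remap:", lost, "| stuck in ctx1:", ctx1.rq_list)

# Fixed order: establish the mapping first, then insert.
map_ctxs(hctx, [ctx0, ctx1])
insert_request(hctx, ctx1, "rq-B")     # now sets bit1
print("flushed after remap:", flush_busy_ctxs(hctx))
```

In this model the first flush returns nothing while the request stays behind in ctx1; once the mapping is established, the same flush picks up both requests.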
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Reviewed-by: Ming Lei <tom.leiming@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Ming Lei <tom.leiming@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  9. 27 May, 2015 1 commit
  10. 09 Dec, 2014 1 commit
    • blk-mq: Use all available hardware queues · 959f5f5b
      Bart Van Assche authored
      Suppose that a system has two CPU sockets, three cores per socket,
      that it does not support hyperthreading and that four hardware
      queues are provided by a block driver. With the current algorithm
      this will lead to the following assignment of CPU cores to hardware
      queues:
        HWQ 0: 0 1
        HWQ 1: 2 3
        HWQ 2: 4 5
        HWQ 3: (none)
      This patch changes the queue assignment into:
        HWQ 0: 0 1
        HWQ 1: 2
        HWQ 2: 3 4
        HWQ 3: 5
      In other words, this patch has the following three effects:
      - All four hardware queues are used instead of only three.
      - CPU cores are spread more evenly over hardware queues. For the
        above example the range of the number of CPU cores associated
        with a single HWQ is reduced from [0..2] to [1..2].
      - If the number of HWQs is a multiple of the number of CPU sockets,
        it is now guaranteed that all CPU cores associated with a single
        HWQ reside on the same CPU socket.
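
Both assignments above can be reproduced with two small index formulas. The formulas below were chosen because they yield exactly the two tables in this message for 6 cores and 4 queues; whether they match the patch line for line is an assumption, and the function names are hypothetical.

```python
def old_index(nr_cpus, nr_queues, cpu):
    """Fixed-size groups: round CPUs-per-queue up, then divide."""
    cpus_per_queue = (nr_cpus + nr_queues - 1) // nr_queues
    return cpu // cpus_per_queue


def new_index(nr_cpus, nr_queues, cpu):
    """Proportional spread: scale the CPU number into the queue range."""
    return cpu * nr_queues // nr_cpus


def assignment(index_fn, nr_cpus, nr_queues):
    """Return, for each hardware queue, the list of CPU cores mapped to it."""
    queues = [[] for _ in range(nr_queues)]
    for cpu in range(nr_cpus):
        queues[index_fn(nr_cpus, nr_queues, cpu)].append(cpu)
    return queues


NR_CPUS, NR_QUEUES = 6, 4   # two sockets x three cores, four hardware queues

for label, fn in (("before", old_index), ("after", new_index)):
    print(label)
    for hwq, cpus in enumerate(assignment(fn, NR_CPUS, NR_QUEUES)):
        print(f"  HWQ {hwq}: {' '.join(map(str, cpus)) or '(none)'}")
```

With cores 0 to 2 on the first socket and 3 to 5 on the second, the second assignment keeps each queue within one socket, which is the third effect listed above.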
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Alexander Gordeev <agordeev@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  11. 24 Nov, 2014 1 commit
  12. 28 May, 2014 1 commit
  13. 27 May, 2014 1 commit
  14. 15 Apr, 2014 1 commit
  15. 20 Mar, 2014 1 commit
  16. 25 Oct, 2013 1 commit
    • blk-mq: new multi-queue block IO queueing mechanism · 320ae51f
      Jens Axboe authored
      Linux currently has two models for block devices:
      - The classic request_fn based approach, where drivers use struct
        request units for IO. The block layer provides various helper
        functionalities to let drivers share code, things like tag
        management, timeout handling, queueing, etc.
      - The "stacked" approach, where a driver squeezes in between the
        block layer and IO submitter. Since this bypasses the IO stack,
        drivers generally have to manage everything themselves.
      With drivers being written for new high IOPS devices, the classic
      request_fn based driver doesn't work well enough. The design dates
      back to when both SMP and high IOPS were rare. It has problems with
      scaling to bigger machines, and runs into scaling issues even on
      smaller machines when you have IOPS in the hundreds of thousands
      per device.
      The stacked approach is then most often selected as the model
      for the driver. But this means that everybody has to re-invent
      everything, and along with that we get all the problems again
      that the shared approach solved.
      This commit introduces blk-mq, block multi queue support. The
      design is centered around per-cpu queues for queueing IO, which
      then funnel down into x number of hardware submission queues.
      We might have a 1:1 mapping between the two, or it might be
      an N:M mapping. That all depends on what the hardware supports.
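
The two-level design can be pictured with a toy model in which per-CPU software queues drain into a smaller or equal number of hardware queues. This is a sketch under stated assumptions only: the class `MultiQueue`, its methods, and the CPU-to-queue formula are all hypothetical and do not come from the blk-mq code.

```python
from collections import deque


class MultiQueue:
    """Toy model: one software queue per CPU, funnelled into hw queues."""

    def __init__(self, nr_cpus, nr_hw_queues):
        self.sw_queues = [deque() for _ in range(nr_cpus)]
        self.hw_queues = [[] for _ in range(nr_hw_queues)]
        # Assumed mapping: spread CPUs proportionally over hardware queues.
        self.cpu_to_hw = [cpu * nr_hw_queues // nr_cpus
                          for cpu in range(nr_cpus)]

    def submit(self, cpu, rq):
        """Queue IO on the submitting CPU's own software queue."""
        self.sw_queues[cpu].append(rq)

    def run_hw_queue(self, hwq):
        """Drain every software queue mapped to this hardware queue."""
        for cpu, target in enumerate(self.cpu_to_hw):
            if target == hwq:
                while self.sw_queues[cpu]:
                    self.hw_queues[hwq].append(self.sw_queues[cpu].popleft())


one_to_one = MultiQueue(nr_cpus=4, nr_hw_queues=4)
n_to_m = MultiQueue(nr_cpus=4, nr_hw_queues=2)

for mq in (one_to_one, n_to_m):
    for cpu in range(4):
        mq.submit(cpu, f"rq-from-cpu{cpu}")
    for hwq in range(len(mq.hw_queues)):
        mq.run_hw_queue(hwq)
    print(mq.cpu_to_hw, mq.hw_queues)
```

Submission touches only the local CPU's queue in this model; sharing happens only at the point where several software queues drain into one hardware queue.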
      blk-mq provides various helper functions, which include:
      - Scalable support for request tagging. Most devices need to
        be able to uniquely identify a request both in the driver and
        to the hardware. The tagging uses per-cpu caches for freed
        tags, to enable cache hot reuse.
      - Timeout handling without tracking requests on a per-device
        basis. Basically, the driver should be able to get a notification
        if a request happens to fail.
      - Optional support for non 1:1 mappings between issue and
        submission queues. blk-mq can redirect IO completions to the
        desired location.
      - Support for per-request payloads. Drivers almost always need
        to associate a request structure with some driver private
        command structure. Drivers can tell blk-mq this at init time,
        and then any request handed to the driver will have the
        required size of memory associated with it.
      - Support for merging of IO, and plugging. The stacked model
        gets neither of these. Even for high IOPS devices, merging
        sequential IO reduces per-command overhead and thus
        increases bandwidth.
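
The tagging helper in the list above mentions per-cpu caches for freed tags. A minimal sketch of that idea follows; the class `TagPool` and its `get`/`put` methods are hypothetical, and the policy for stealing from other CPUs is an assumption rather than a description of the actual blk-mq tag allocator.

```python
class TagPool:
    """Toy tag allocator with a per-CPU cache of recently freed tags."""

    def __init__(self, nr_tags, nr_cpus):
        self.free = list(range(nr_tags))            # shared pool of unused tags
        self.cache = [[] for _ in range(nr_cpus)]   # tags freed on each CPU

    def get(self, cpu):
        """Return a unique tag, preferring one this CPU freed recently."""
        if self.cache[cpu]:
            return self.cache[cpu].pop()            # cache-hot reuse
        if self.free:
            return self.free.pop()
        for other in self.cache:                    # assumed fallback: steal
            if other:
                return other.pop()
        return None                                 # all tags are in flight

    def put(self, cpu, tag):
        """Release a tag into the freeing CPU's local cache."""
        self.cache[cpu].append(tag)


pool = TagPool(nr_tags=4, nr_cpus=2)
first = pool.get(0)
pool.put(0, first)
again = pool.get(0)            # same CPU gets its freed tag back
in_flight = [again] + [pool.get(1) for _ in range(3)]
print("reused:", first == again, "| in flight:", sorted(in_flight))
print("pool exhausted:", pool.get(1) is None)
```

Every tag handed out is unique while it is in flight, and a CPU that frees a tag is the first to get it back, which is the reuse pattern the bullet describes.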
      For now, this is provided as a potential 3rd queueing model, with
      the hope being that, as it matures, it can replace both the classic
      and stacked model. That would get us back to having just 1 real
      model for block devices, leaving the stacked approach to dm/md
      devices (as it was originally intended).
      Contributions in this patch from the following people:
      Shaohua Li <shli@fusionio.com>
      Alexander Gordeev <agordeev@redhat.com>
      Christoph Hellwig <hch@infradead.org>
      Mike Christie <michaelc@cs.wisc.edu>
      Matias Bjorling <m@bjorling.me>
      Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>