1. 19 Apr, 2017 1 commit
    • block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler · aee69d78
      Paolo Valente authored
      We tag as v0 the version of BFQ containing only BFQ's engine plus
      hierarchical support. BFQ's engine is introduced by this commit, while
      hierarchical support is added by the next commit. We use the v0 tag to
      distinguish this minimal version of BFQ from the versions that also
      contain the features and improvements added by later commits. BFQ-v0
      coincides with the version of BFQ submitted a few years ago [1], apart
      from the introduction of preemption, described below.
      BFQ is a proportional-share I/O scheduler, whose general structure,
      plus a lot of code, are borrowed from CFQ.
      - Each process doing I/O on a device is associated with a weight and a
      - BFQ grants exclusive access to the device, for a while, to one queue
        (process) at a time, and implements this service model by
        associating every queue with a budget, measured in number of
        - After a queue is granted access to the device, the budget of the
          queue is decremented, on each request dispatch, by the size of the
        - The in-service queue is expired, i.e., its service is suspended,
          only if one of the following events occurs: 1) the queue finishes
          its budget, 2) the queue empties, 3) a "budget timeout" fires.
          - The budget timeout prevents processes doing random I/O from
            holding the device for too long and dramatically reducing
          - Actually, as in CFQ, a queue associated with a process issuing
            sync requests may not be expired immediately when it empties. In
            contrast, BFQ may idle the device for a short time interval,
            giving the process the chance to go on being served if it issues
            a new request in time. Device idling typically boosts the
            throughput on rotational devices, if processes do synchronous
            and sequential I/O. In addition, under BFQ, device idling is
            also instrumental in guaranteeing the desired throughput
            fraction to processes issuing sync requests (see [2] for
            - With respect to idling for service guarantees, if several
              processes are competing for the device at the same time, but
              all processes (and groups, after the following commit) have
              the same weight, then BFQ guarantees the expected throughput
              distribution without ever idling the device. Throughput is
              thus as high as possible in this common scenario.
        - Queues are scheduled according to a variant of WF2Q+, named
          B-WF2Q+, and implemented using an augmented rb-tree to preserve an
          O(log N) overall complexity.  See [2] for more details. B-WF2Q+ is
          also ready for hierarchical scheduling. However, for a cleaner
          logical breakdown, the code that enables and completes
          hierarchical support is provided in the next commit, which focuses
          exactly on this feature.
        - B-WF2Q+ guarantees a tight deviation with respect to an ideal,
          perfectly fair, and smooth service. In particular, B-WF2Q+
          guarantees that each queue receives a fraction of the device
          throughput proportional to its weight, even if the throughput
          fluctuates, and regardless of: the device parameters, the current
          workload and the budgets assigned to the queue.
        - The last property, budget independence (although probably
          counterintuitive at first), is definitely beneficial, for
          the following reasons:
          - First, with any proportional-share scheduler, the maximum
            deviation with respect to an ideal service is proportional to
            the maximum budget (slice) assigned to queues. As a consequence,
            BFQ can keep this deviation tight not only because of the
            accurate service of B-WF2Q+, but also because BFQ *does not*
            need to assign a larger budget to a queue to let the queue
            receive a higher fraction of the device throughput.
          - Second, BFQ is free to choose, for every process (queue), the
            budget that best fits the needs of the process, or best
            leverages the I/O pattern of the process. In particular, BFQ
            updates queue budgets with a simple feedback-loop algorithm that
            allows a high throughput to be achieved, while still providing
            tight latency guarantees to time-sensitive applications. When
            the in-service queue expires, this algorithm computes the next
            budget of the queue so as to:
            - Let large budgets be eventually assigned to the queues
              associated with I/O-bound applications performing sequential
              I/O: in fact, the longer these applications are served once
              they get access to the device, the higher the throughput is.
            - Let small budgets be eventually assigned to the queues
              associated with time-sensitive applications (which typically
              perform sporadic and short I/O), because, the smaller the
              budget assigned to a queue waiting for service is, the sooner
              B-WF2Q+ will serve that queue (Subsec 3.3 in [2]).
      - Weights can be assigned to processes only indirectly, through I/O
        priorities, and according to the relation:
        weight = 10 * (IOPRIO_BE_NR - ioprio).
        The next patch provides, instead, a cgroups interface through which
        weights can be assigned explicitly.
      - If several processes are competing for the device at the same time,
        but all processes and groups have the same weight, then BFQ
        guarantees the expected throughput distribution without ever idling
        the device. It uses preemption instead. Throughput is then much
        higher in this common scenario.
      - ioprio classes are served in strict priority order, i.e.,
        lower-priority queues are not served as long as there are
        higher-priority queues.  Among queues in the same class, the
        bandwidth is distributed in proportion to the weight of each
        queue. A very thin extra bandwidth is however guaranteed to the Idle
        class, to prevent it from starving.
      - If the strict_guarantees parameter is set (default: unset), then BFQ
           - always performs idling when the in-service queue becomes empty;
           - forces the device to serve one I/O request at a time, by
             dispatching a new request only if there is no outstanding
        In the presence of differentiated weights or I/O-request sizes,
        both the above conditions are needed to guarantee that every
        queue receives its allotted share of the bandwidth (see
        Documentation/block/bfq-iosched.txt for more details). Setting
        strict_guarantees may evidently affect throughput.
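      As a rough illustration of two points above (the ioprio-to-weight
      relation and budget-based expiration), here is a minimal,
      self-contained sketch. It is not the code added by this commit:
      the names weight_from_ioprio, toy_queue and serve_queue are
      hypothetical, IOPRIO_BE_NR is assumed to be 8 as in mainline Linux
      at the time, and budget timeouts, device idling and B-WF2Q+
      scheduling are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 8 best-effort priority levels, as in mainline Linux. */
#define IOPRIO_BE_NR 8

/* weight = 10 * (IOPRIO_BE_NR - ioprio), the relation quoted above. */
static int weight_from_ioprio(int ioprio)
{
    assert(ioprio >= 0 && ioprio < IOPRIO_BE_NR);
    return 10 * (IOPRIO_BE_NR - ioprio);
}

/* Hypothetical reasons for expiring the in-service queue. */
enum expire_reason { BUDGET_EXHAUSTED, QUEUE_EMPTY };

/* Toy queue: pending request sizes (in sectors) and a remaining budget. */
struct toy_queue {
    const int *req_sectors; /* sizes of the pending requests */
    size_t nr_reqs;         /* number of pending requests */
    size_t next;            /* index of the next request to dispatch */
    int budget;             /* remaining budget, in sectors */
};

/*
 * Dispatch requests from the in-service queue until it has to be expired.
 * On each dispatch the budget is decremented by the request size. The
 * queue is expired when the next request no longer fits in the remaining
 * budget, or when no requests are left. The number of sectors served is
 * stored in *served.
 */
static enum expire_reason serve_queue(struct toy_queue *q, int *served)
{
    *served = 0;
    while (q->next < q->nr_reqs) {
        int size = q->req_sectors[q->next];

        if (size > q->budget)
            return BUDGET_EXHAUSTED;
        q->budget -= size;
        *served += size;
        q->next++;
    }
    return QUEUE_EMPTY;
}
```

      With this sketch, ioprio 0 maps to weight 80, ioprio 4 to weight 40
      and ioprio 7 to weight 10. A queue holding requests of 8, 8, 16 and
      8 sectors with a budget of 30 sectors is expired for budget
      exhaustion after 16 sectors have been served.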
      [1] https://lkml.org/lkml/2008/4/1/234
      [2] P. Valente and M. Andreolini, "Improving Application
          Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of
          the 5th Annual International Systems and Storage Conference
          (SYSTOR '12), June 2012.
          Slightly extended version:
      Signed-off-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  2. 31 Mar, 2016 1 commit
  3. 11 Feb, 2014 1 commit
    • Documentation/: update 00-INDEX files · 3cf8ca1c
      Henrik Austad authored
      Some of the 00-INDEX files are somewhat outdated and some folders do
      not contain 00-INDEX at all.  Only outdated (with the notable exception
      of spi) indexes are touched here; the 169 folders without 00-INDEX have
      not been touched.
      New 00-INDEX
       - spi/* was added in a series of commits dating back to 2006
      Added files (missing in (*/)00-INDEX)
       - dmatest.txt was added by commit 851b7e16 ("dmatest: run test via
       - this_cpu_ops.txt was added by commit a1b2a555 ("percpu: add
         documentation on this_cpu operations")
       - ww-mutex-design.txt was added by commit 040a0a37 ("mutex: Add
         support for wound/wait style locks")
       - bcache.txt was added by commit cafe5635 ("bcache: A block layer
       - kernel-per-CPU-kthreads.txt was added by commit 49717cb4
         ("kthread: Document ways of reducing OS jitter due to per-CPU
       - phy.txt was added by commit ff764963 ("drivers: phy: add generic
         PHY framework")
       - block/null_blk was added by commit 12f8f4fc ("null_blk:
       - module-signing.txt was added by commit 3cafea30 ("Add
         Documentation/module-signing.txt file")
       - assoc_array.txt was added by commit 3cb98950 ("Add a generic
         associative array implementation.")
       - arm/IXP4xx was part of the initial repo
       - arm/cluster-pm-race-avoidance.txt was added by commit 7fe31d28
         ("ARM: mcpm: introduce helpers for platform coherency exit/setup")
       - arm/firmware.txt was added by commit 7366b92a ("ARM: Add
         interface for registering and calling firmware-specific operations")
       - arm/kernel_mode_neon.txt was added by commit 2afd0a05 ("ARM:
         7825/1: document the use of NEON in kernel mode")
       - arm/tcm.txt was added by commit bc581770 ("ARM: 5580/2: ARM TCM
         (Tightly-Coupled Memory) support v3")
       - arm/vlocks.txt was added by commit 9762f12d ("ARM: mcpm: Add
         baremetal voting mutexes")
       - blackfin/gptimers-example.c, Makefile was added by commit
         4b60779d ("Blackfin: add an example showing how to use the
         gptimers API")
       - devicetree/usage-model.txt was added by commit 31134efc ("dt:
         Linux DT usage model documentation")
       - fb/api.txt was added by commit fb21c2f4 ("fbdev: Add FOURCC-based
         format configuration API")
       - fb/sm501.txt was added by commit e6a04980 ("video, sm501: add
         edid and commandline support")
       - fb/udlfb.txt was added by commit 96f8d864 ("fbdev: move udlfb out
         of staging.")
       - filesystems/Makefile was added by commit 1e0051ae
         ("Documentation/fs/: split txt and source files")
       - filesystems/nfs/nfsd-admin-interfaces.txt was added by commit
         8a4c6e19 ("nfsd: document kernel interfaces for nfsd
       - ide/warm-plug-howto.txt was added by commit f74c9141 ("ide: add
         warm-plug support for IDE devices (take 2)")
       - laptops/Makefile was added by commit d49129ac
         ("Documentation/laptop/: split txt and source files")
       - leds/leds-blinkm.txt was added by commit b54cf35a ("LEDS: add
         BlinkM RGB LED driver, documentation and update MAINTAINERS")
       - leds/ledtrig-oneshot.txt was added by commit 5e417281 ("leds: add
         oneshot trigger")
       - leds/ledtrig-transient.txt was added by commit 44e1e9f8 ("leds:
         add new transient trigger for one shot timer activation")
       - m68k/README.buddha was part of the initial repo
       - networking/LICENSE.(qla3xxx|qlcnic|qlge) was added by commits
         40839129, c4e84bde, 5a4faa87
       - networking/Makefile was added by commit 3794f3e8 ("docsrc: build
         Documentation/ sources")
       - networking/i40evf.txt was added by commit 105bf2fe ("i40evf: add
         driver to kernel build system")
       - networking/ipsec.txt was added by commit b3c6efbc ("xfrm: Add
         file to document IPsec corner case")
       - networking/mac80211-auth-assoc-deauth.txt was added by commit
         3cd7920a ("mac80211: add auth/assoc/deauth flow diagram")
       - networking/netlink_mmap.txt was added by commit 5683264c
         ("netlink: add documentation for memory mapped I/O")
       - networking/nf_conntrack-sysctl.txt was added by commit c9f9e0e1
         ("netfilter: doc: add nf_conntrack sysctl api documentation")
       - networking/team.txt was added by commit 3d249d4c ("net: introduce
         ethernet teaming device")
       - networking/vxlan.txt was added by commit d342894c ("vxlan:
         virtual extensible lan")
       - power/runtime_pm.txt was added by commit 5e928f77 ("PM: Introduce
         core framework for run-time PM of I/O devices (rev.  17)")
       - power/charger-manager.txt was added by commit 3bb3dbbd
         ("power_supply: Add initial Charger-Manager driver")
       - RCU/lockdep-splat.txt was added by commit d7bd2d68 ("rcu:
         Document interpretation of RCU-lockdep splats")
       - s390/kvm.txt was added by commit 5ecee4ba ("KVM: s390: API
         documentation")
       - s390/qeth.txt was added by commit b4d72c08 ("qeth: bridgeport
         support - basic control")
       - scheduler/sched-bwc.txt was added by commit 88ebc08e ("sched: Add
         documentation for bandwidth control")
       - scsi/advansys.txt was added by commit 4bd6d7f3 ("[SCSI] advansys:
         Move documentation to Documentation/scsi")
       - scsi/bfa.txt was added by commit 1ec90174 ("[SCSI] bfa: add
         readme file")
       - scsi/bnx2fc.txt was added by commit 12b8fc10 ("[SCSI] bnx2fc: Add
         driver documentation")
       - scsi/cxgb3i.txt was added by commit c3673464 ("[SCSI] cxgb3i: Add
         cxgb3i iSCSI driver.")
       - scsi/hpsa.txt was added by commit 992ebcf1 ("[SCSI] hpsa: Add
         hpsa.txt to Documentation/scsi")
       - scsi/link_power_management_policy.txt was added by commit
         ca77329f ("[libata] Link power management infrastructure")
       - scsi/osd.txt was added by commit 78e0c621 ("[SCSI] osd:
         Documentation for OSD library")
       - scsi/scsi-parameter.txt was created/moved by commit 163475fb
         ("Documentation: move SCSI parameters to their own text file")
       - serial/driver was part of the initial repo
       - serial/n_gsm.txt was added by commit 323e8412 ("n_gsm: add a
       - timers/Makefile was added by commit 3794f3e8 ("docsrc: build
         Documentation/ sources")
       - virt/kvm/s390.txt was added by commit d9101fca ("KVM: s390:
         diagnose call documentation")
       - vm/split_page_table_lock was added by commit 49076ec2 ("mm:
         dynamically allocate page->ptl if it cannot be embedded to struct
       - w1/slaves/w1_ds28e04 was added by commit fbf7f7b4 ("w1: Add
         1-wire slave device driver for DS28E04-100")
       - w1/masters/omap-hdq was added by commit e0a29382 ("hdq:
         documentation for OMAP HDQ")
       - x86/early-microcode.txt was added by commit 0d91ea86 ("x86, doc:
         Documentation for early microcode loading")
       - x86/earlyprintk.txt was added by commit a1aade47 ("x86/doc:
         mini-howto for using earlyprintk=dbgp")
       - x86/entry_64.txt was added by commit 8b4777a4 ("x86-64: Document
         some of entry_64.S")
       - x86/pat.txt was added by commit d27554d8 ("x86: PAT
      Moved files
       - arm/kernel_user_helpers.txt was moved out of arch/arm/kernel by
         commit 37b83046 ("ARM: kuser: move interface documentation out of
         the source code")
       - efi-stub.txt was moved out of x86/ and down into Documentation/ in
         commit 4172fe2f ("EFI stub documentation updates")
       - laptops/hpfall.c was moved out of hwmon/ and into laptops/ in commit
         efcfed9b ("Move hp_accel to drivers/platform/x86")
       - commit 5616c23a ("x86: doc: move x86-generic documentation from
         * x86/usb-legacy-support.txt
         * x86/boot.txt
         * x86/zero_page.txt
       - power/video_extension.txt was moved to acpi in commit 70e66e4d
         ("ACPI / video: move video_extension.txt to Documentation/acpi")
      Removed files (left in 00-INDEX)
       - memory.txt was removed by commit 00ea8990 ("memory.txt: remove
         stray information")
       - gpio.txt was moved to gpio/ in commit fd8e198c ("Documentation:
         gpiolib: document new interface")
       - networking/DLINK.txt was removed by commit 168e06ae
         ("drivers/net: delete old parallel port de600/de620 drivers")
       - serial/hayes-esp.txt was removed by commit f53a2ade ("tty: esp:
         remove broken driver")
       - s390/TAPE was removed by commit 9e280f66 ("[S390] remove tape
         block docu")
       - vm/locking was removed by commit 57ea8171 ("mm: documentation:
         remove hopelessly out-of-date locking doc")
       - laptops/acer-wmi.txt was removed by commit 02003667 ("acer-wmi:
         Delete out-of-date documentation")
      Typos/misc issues
       - rpc-server-gss.txt was added as knfsd-rpcgss.txt in commit
         030d794b ("SUNRPC: Use gssproxy upcall for server RPCGSS
       - commit b88cf73d ("net: add missing entries to
         * generic-hdlc.txt was added as generic_hdlc.txt
         * spider_net.txt was added as spider-net.txt
       - w1/master/mxc-w1 was added as mxc_w1 by commit a5fd9139 ("w1: add
         1-wire master driver for i.MX27 / i.MX31")
       - s390/zfcpdump.txt was added as zfcpdump by commit 6920c12a
         ("[S390] Add Documentation/s390/00-INDEX.")
      Signed-off-by: Henrik Austad <henrik@austad.us>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	[rcu bits]
      Acked-by: Rob Landley <rob@landley.net>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: James Bottomley <JBottomley@parallels.com>
      Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 30 Sep, 2013 1 commit
    • block: change config option name for cmdline partition parsing · 080506ad
      Paul Gortmaker authored
      Recently commit bab55417 ("block: support embedded device command
      line partition") introduced CONFIG_CMDLINE_PARSER.  However, that name
      is too generic and sounds like it enables/disables generic kernel boot
      arg processing, when it really is block specific.
      Before this option becomes part of a full/final release, add the BLK_
      prefix to it so that it is clear, in the absence of any other context,
      that it is block specific.
      In addition, fix up the following less critical items:
       - make the help text actually helpful
       - update the index file for Documentation
       - add the new arg to Documentation/kernel-parameters.txt
       - clarify wording in source comments
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Cai Zhiyong <caizhiyong@huawei.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 09 Aug, 2012 1 commit
  6. 10 Sep, 2010 1 commit
  7. 18 Dec, 2009 1 commit
  8. 16 Oct, 2007 1 commit