1. 11 Dec, 2018 1 commit
  2. 16 Nov, 2018 1 commit
  3. 25 Oct, 2018 5 commits
    • Damien Le Moal's avatar
      block: Introduce blk_revalidate_disk_zones() · bf505456
      Damien Le Moal authored
      Drivers exposing zoned block devices have to initialize and maintain
      correctness (i.e. revalidate) of the device zone bitmaps attached to
      the device request queue (seq_zones_bitmap and seq_zones_wlock).
      
      To simplify coding this, introduce a generic helper function
      blk_revalidate_disk_zones() suitable for most (and likely all) cases.
      This new function always update the seq_zones_bitmap and seq_zones_wlock
      bitmaps as well as the queue nr_zones field when called for a disk
      using a request based queue. For a disk using a BIO based queue, only
      the number of zones is updated since these queues do not have
      schedulers and so do not need the zone bitmaps.
      
      With this change, the zone bitmap initialization code in sd_zbc.c can be
      replaced with a call to this function in sd_zbc_read_zones(), which is
      called from the disk revalidate block operation method.
      
      A call to blk_revalidate_disk_zones() is also added to the null_blk
      driver for devices created with the zoned mode enabled.
      
      Finally, to ensure that zoned devices created with dm-linear or
      dm-flakey expose the correct number of zones through sysfs, a call to
      blk_revalidate_disk_zones() is added to dm_table_set_restrictions().
      
      The zone bitmaps allocated and initialized with
      blk_revalidate_disk_zones() are freed automatically from
      __blk_release_queue() using the block internal function
      blk_queue_free_zone_bitmaps().
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      bf505456
    • Christoph Hellwig's avatar
      block: add a report_zones method · e76239a3
      Christoph Hellwig authored
      Dispatching a report zones command through the request queue is a major
      pain due to the command reply payload rewriting necessary. Given that
      blkdev_report_zones() is executing everything synchronously, implement
      report zones as a block device file operation instead, allowing major
      simplification of the code in many places.
      
      sd, null-blk, dm-linear and dm-flakey being the only block device
      drivers supporting exposing zoned block devices, these drivers are
      modified to provide the device side implementation of the
      report_zones() block device file operation.
      
      For device mappers, a new report_zones() target type operation is
      defined so that the upper block layer calls blkdev_report_zones() can
      be propagated down to the underlying devices of the dm targets.
      Implementation for this new operation is added to the dm-linear and
      dm-flakey targets.
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      [Damien]
      * Changed method block_device argument to gendisk
      * Various bug fixes and improvements
      * Added support for null_blk, dm-linear and dm-flakey.
      Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e76239a3
    • Damien Le Moal's avatar
      block: Improve zone reset execution · a2d6b3a2
      Damien Le Moal authored
      There is no need to synchronously execute all REQ_OP_ZONE_RESET BIOs
      necessary to reset a range of zones. Similarly to what is done for
      discard BIOs in blk-lib.c, all zone reset BIOs can be chained and
      executed asynchronously and a synchronous call done only for the last
      BIO of the chain.
      
      Modify blkdev_reset_zones() to operate similarly to
      blkdev_issue_discard() using the next_bio() helper for chaining BIOs. To
      avoid code duplication of that function in blk_zoned.c, rename
      next_bio() into blk_next_bio() and declare it as a block internal
      function in blk.h.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a2d6b3a2
    • Damien Le Moal's avatar
      block: Limit allocation of zone descriptors for report zones · 2e85fbaf
      Damien Le Moal authored
      There is no point in allocating more zone descriptors than the number of
      zones a block device has for doing a zone report. Avoid doing that in
      blkdev_report_zones_ioctl() by limiting the number of zone decriptors
      allocated internally to process the user request.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      2e85fbaf
    • Damien Le Moal's avatar
      block: Introduce blkdev_nr_zones() helper · a91e1380
      Damien Le Moal authored
      Introduce the blkdev_nr_zones() helper function to get the total
      number of zones of a zoned block device. This number is always 0 for a
      regular block device (q->limits.zoned == BLK_ZONED_NONE case).
      
      Replace hard-coded number of zones calculation in dmz_get_zoned_device()
      with a call to this helper.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a91e1380
  4. 09 Jul, 2018 1 commit
  5. 12 Jun, 2018 1 commit
    • Kees Cook's avatar
      treewide: kvmalloc() -> kvmalloc_array() · 344476e1
      Kees Cook authored
      The kvmalloc() function has a 2-factor argument form, kvmalloc_array(). This
      patch replaces cases of:
      
              kvmalloc(a * b, gfp)
      
      with:
              kvmalloc_array(a * b, gfp)
      
      as well as handling cases of:
      
              kvmalloc(a * b * c, gfp)
      
      with:
      
              kvmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kvmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kvmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kvmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kvmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kvmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kvmalloc
      + kvmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kvmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kvmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kvmalloc(C1 * C2 * C3, ...)
      |
        kvmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kvmalloc(sizeof(THING) * C2, ...)
      |
        kvmalloc(sizeof(TYPE) * C2, ...)
      |
        kvmalloc(C1 * C2 * C3, ...)
      |
        kvmalloc(C1 * C2, ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      344476e1
  6. 22 May, 2018 1 commit
    • Bart Van Assche's avatar
      blkdev_report_zones_ioctl(): Use vmalloc() to allocate large buffers · 327ea4ad
      Bart Van Assche authored
      Avoid that complaints similar to the following appear in the kernel log
      if the number of zones is sufficiently large:
      
        fio: page allocation failure: order:9, mode:0x140c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
        Call Trace:
        dump_stack+0x63/0x88
        warn_alloc+0xf5/0x190
        __alloc_pages_slowpath+0x8f0/0xb0d
        __alloc_pages_nodemask+0x242/0x260
        alloc_pages_current+0x6a/0xb0
        kmalloc_order+0x18/0x50
        kmalloc_order_trace+0x26/0xb0
        __kmalloc+0x20e/0x220
        blkdev_report_zones_ioctl+0xa5/0x1a0
        blkdev_ioctl+0x1ba/0x930
        block_ioctl+0x41/0x50
        do_vfs_ioctl+0xaa/0x610
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x79/0x1b0
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 3ed05a98 ("blk-zoned: implement ioctls")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Cc: Shaun Tancheff <shaun.tancheff@seagate.com>
      Cc: Damien Le Moal <damien.lemoal@hgst.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      327ea4ad
  7. 09 Mar, 2018 1 commit
  8. 05 Jan, 2018 1 commit
    • Christoph Hellwig's avatar
      block: introduce zoned block devices zone write locking · 6cc77e9c
      Christoph Hellwig authored
      Components relying only on the request_queue structure for accessing
      block devices (e.g. I/O schedulers) have a limited knowledged of the
      device characteristics. In particular, the device capacity cannot be
      easily discovered, which for a zoned block device also result in the
      inability to easily know the number of zones of the device (the zone
      size is indicated by the chunk_sectors field of the queue limits).
      
      Introduce the nr_zones field to the request_queue structure to simplify
      access to this information. Also, add the bitmap seq_zone_bitmap which
      indicates which zones of the device are sequential zones (write
      preferred or write required) and the bitmap seq_zones_wlock which
      indicates if a zone is write locked, that is, if a write request
      targeting a zone was dispatched to the device. These fields are
      initialized by the low level block device driver (sd.c for ZBC/ZAC
      disks). They are not initialized by stacking drivers (device mappers)
      handling zoned block devices (e.g. dm-linear).
      
      Using this, I/O schedulers can introduce zone write locking to control
      request dispatching to a zoned block device and avoid write request
      reordering by limiting to at most a single write request per zone
      outside of the scheduler at any time.
      
      Based on previous patches from Damien Le Moal.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      [Damien]
      * Fixed comments and identation in blkdev.h
      * Changed helper functions
      * Fixed this commit message
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      6cc77e9c
  9. 23 Aug, 2017 1 commit
    • Christoph Hellwig's avatar
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig authored
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      74d46992
  10. 12 Jan, 2017 1 commit
    • Damien Le Moal's avatar
      block: Rename blk_queue_zone_size and bdev_zone_size · f99e8648
      Damien Le Moal authored
      All block device data fields and functions returning a number of 512B
      sectors are by convention named xxx_sectors while names in the form
      xxx_size are generally used for a number of bytes. The blk_queue_zone_size
      and bdev_zone_size functions were not following this convention so rename
      them.
      
      No functional change is introduced by this patch.
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      
      Collapsed the two patches, they were nonsensically split and broke
      bisection.
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      f99e8648
  11. 25 Oct, 2016 1 commit
  12. 18 Oct, 2016 2 commits