- Nov 24, 2020
-
-
Peter Zijlstra authored
Get rid of the __call_single_node union and clean up the API a little so that external code relies less on the structure layout. Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Frederic Weisbecker <frederic@kernel.org>
-
- Nov 14, 2020
-
-
Christoph Hellwig authored
disk_get_part needs to be paired with a disk_put_part. Cc: stable@vger.kernel.org Fixes: ef45fe47 ("blk-cgroup: show global disk stats in root cgroup io.stat") Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 13, 2020
-
-
Ming Lei authored
To avoid a use-after-free on the flush request, we call its .end_io() from both the timeout code path and __blk_mq_end_request(). While the flush request's refcount has not dropped to zero, the request is still in use and we can't mark it as IDLE. Fix this by marking it IDLE only when its refcount actually drops to zero. Fixes: 65ff5cd0 ("blk-mq: mark flush request as IDLE in flush_end_io()") Signed-off-by:
Ming Lei <ming.lei@redhat.com> Cc: Yi Zhang <yi.zhang@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
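
The rule described above (mark the request IDLE only when the last reference is dropped) can be sketched in user space as follows. This is a minimal model, not the kernel code: `struct flush_rq`, `flush_rq_put()` and the state names are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>

/* Hypothetical request states, modelled loosely on the states named above. */
enum flush_state { FLUSH_IDLE, FLUSH_IN_FLIGHT, FLUSH_COMPLETE };

/* Hypothetical stand-in for a flush request with a reference count. */
struct flush_rq {
	int ref;                 /* outstanding references */
	enum flush_state state;  /* current request state */
};

/*
 * Drop one reference. The state is changed to IDLE only when the
 * last reference goes away; until then the request is still in use.
 * Returns 1 if this call released the final reference, 0 otherwise.
 */
static int flush_rq_put(struct flush_rq *rq)
{
	assert(rq->ref > 0);
	if (--rq->ref == 0) {
		rq->state = FLUSH_IDLE;
		return 1;
	}
	return 0;
}
```

With two holders (for example the timeout path and the completion path), the first `flush_rq_put()` leaves the state untouched and only the second one marks the request IDLE.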
-
- Nov 12, 2020
-
-
Christoph Hellwig authored
Return whether or not the function ended up sending a uevent. Cc: stable@vger.kernel.org # v5.9 Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Petr Vorel <pvorel@suse.cz> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 30, 2020
-
-
Ming Lei authored
Mark flush request as IDLE in its .end_io(), aligning it with how normal requests behave. The flush request stays in in-flight tags if we're not using an IO scheduler, so we need to change its state into IDLE. Otherwise, we will hang in blk_mq_tagset_wait_completed_request() during error recovery because the flush request state is kept as COMPLETED. Reported-by:
Yi Zhang <yi.zhang@redhat.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Tested-by:
Yi Zhang <yi.zhang@redhat.com> Cc: Chao Leng <lengchao@huawei.com> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 28, 2020
-
-
Naohiro Aota authored
When the bio's size reaches max_append_sectors, bio_add_hw_page returns 0 and then __bio_iov_append_get_pages returns -EINVAL. This is an expected result of building a bio small enough not to be split in the IO path. However, iov_iter is not advanced in this case, causing the same pages to be filled into the bio again and again. Fix the case by properly advancing the iov_iter for already processed pages. Fixes: 0512a75b ("block: Introduce REQ_OP_ZONE_APPEND") Cc: stable@vger.kernel.org # 5.8+ Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
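
The failure mode described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct page_cursor`, `fill_bio()` and `bios_needed()` are hypothetical names, and the "bio" is reduced to a page count with a fixed capacity. It shows why the cursor must be advanced for pages that were already added when the capacity limit is hit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an iterator over the caller's pages. */
struct page_cursor {
	size_t pos;    /* index of the next unprocessed page */
	size_t count;  /* total number of pages to submit */
};

/*
 * Add pages to one "bio" holding at most `cap` pages and return how
 * many were added. When the bio fills up, the fixed variant
 * (advance_on_full != 0) still advances the cursor past the pages it
 * consumed; the buggy variant returns without recording that progress.
 */
static size_t fill_bio(struct page_cursor *cur, size_t cap, int advance_on_full)
{
	size_t added = 0;

	assert(cap > 0);
	while (cur->pos + added < cur->count) {
		if (added == cap) {
			if (advance_on_full)
				cur->pos += added;
			return added;
		}
		added++;
	}
	cur->pos += added;
	return added;
}

/*
 * Count the bios needed to consume every page. Returns -1 if the
 * pages are still not consumed after `max_bios` bios, which is what
 * happens when the cursor never moves.
 */
static int bios_needed(size_t count, size_t cap, int advance_on_full, int max_bios)
{
	struct page_cursor cur = { 0, count };
	int n = 0;

	while (cur.pos < cur.count) {
		if (n == max_bios)
			return -1;
		fill_bio(&cur, cap, advance_on_full);
		n++;
	}
	return n;
}
```

In this model, ten pages with a capacity of four need three bios when the cursor is advanced, while the non-advancing variant keeps refilling the same four pages and never finishes.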
-
- Oct 26, 2020
-
-
Gabriel Krisman Bertazi authored
Similarly to commit 457e490f ("blkcg: allocate struct blkcg_gq outside request queue spinlock"), blkg_create can also trigger occasional -ENOMEM failures at the radix insertion because any allocation inside blkg_create has to be non-blocking, making it more likely to fail. This causes trouble for userspace tools that configure io weights and have to deal with this condition. This patch reduces the occurrence of -ENOMEMs on this path by preloading the radix tree element in a GFP_KERNEL context, such that we guarantee the later non-blocking insertion won't fail. A similar solution exists in blkcg_init_queue for the same situation. Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Gabriel Krisman Bertazi authored
If new_blkg allocation raced with blk_policy change and blkg_lookup_check fails, new_blkg is leaked. Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 23, 2020
-
-
Mauro Carvalho Chehab authored
Fix a typo: blk_mq_run_hw_queue -> blk_mq_run_hw_queues Signed-off-by:
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 20, 2020
-
-
Xianting Tian authored
We don't need to check whether the node is a memoryless NUMA node before calling the allocator interface. SLUB (and SLAB, SLOB) relies on the page allocator to pick a node. The page allocator should deal with memoryless nodes just fine: it has zonelists constructed for each possible node, and it will automatically fall back to the node closest to the requested node, as long as __GFP_THISNODE is not enforced. The code comment of kmem_cache_alloc_node() in SLAB also says this: * Fallback to other node is possible if __GFP_THISNODE is not set. blk-mq code doesn't set __GFP_THISNODE, so we can remove the call to local_memory_node(). Signed-off-by:
Xianting Tian <tian.xianting@h3c.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 15, 2020
-
-
Mauro Carvalho Chehab authored
Fix this warning: ./block/bio.c:1098: WARNING: Inline emphasis start-string without end-string. The problem is that *iter is not valid markup. That seems to be a typo: *iter -> @iter Signed-off-by:
Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
-
Mauro Carvalho Chehab authored
Using "@bio's parent" causes the following warning: ./block/bio.c:10: WARNING: Inline emphasis start-string without end-string. The main problem here is that this would be converted into: **bio**'s parent by kernel-doc, which is not a valid notation. It would be possible to use, instead, this kernel-doc markup: ``bio's`` parent Yet, here, it is probably simpler to just use alternative wording: the parent of @bio Signed-off-by:
Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
-
- Oct 13, 2020
-
-
Keith Busch authored
A zoned device with limited resources to open or activate zones may return an error when the host exceeds those limits. The same command may be successful if retried later, but the host needs to wait for specific zone states before it should expect a retry to succeed. Have the block layer provide an appropriate status for these conditions so applications can distinguish this error for special handling. Cc: linux-api@vger.kernel.org Cc: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Keith Busch <kbusch@kernel.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 09, 2020
-
-
Yang Yang authored
blk_exit_queue will free elevator_data, while blk_mq_run_work_fn will access it. Move cancel of hctx->run_work to the front of blk_exit_queue to avoid use-after-free. Fixes: 1b97871b ("blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release") Signed-off-by:
Yang Yang <yang.yang@vivo.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Yufen Yu authored
After commit 923218f6 ("blk-mq: don't allocate driver tag upfront for flush rq"), blk_mq_submit_bio() will call blk_insert_flush() directly to handle flush requests rather than blk_mq_sched_insert_request() in the elevator case. As a result, every flush request either already has the RQF_FLUSH_SEQ flag set when blk_mq_sched_insert_request() is called, or has been inserted into hctx->dispatch. So, remove the dead code path. Signed-off-by:
Yufen Yu <yuyufen@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Yufen Yu authored
Since the whole elevator registration is protected by sysfs_lock, we don't need the extra 'has_elevator'. Just use q->elevator directly. Signed-off-by:
Yufen Yu <yuyufen@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Yufen Yu authored
After commit b89f625e ("block: don't release queue's sysfs lock during switching elevator"), the whole elevator register and unregister functions are covered by sysfs_lock. So, remove the wrong comment and add a lockdep assert. Signed-off-by:
Yufen Yu <yuyufen@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Yufen Yu authored
We have introduced helper function blk_mq_hctx_stopped() to test BLK_MQ_S_STOPPED. Signed-off-by:
Yufen Yu <yuyufen@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Yufen Yu authored
We have defined common interface blk_queue_registered() to test QUEUE_FLAG_REGISTERED. Just use it. Signed-off-by:
Yufen Yu <yuyufen@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Yufen Yu authored
elv_support_iosched() will check queue_is_mq() for us. So, remove the redundant check to clean up the code. Signed-off-by:
Yufen Yu <yuyufen@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Yufen Yu authored
We register debugfs for the scheduler regardless of whether it has defined the callback function .exit_sched. So, blk_mq_exit_sched() is always needed to unregister debugfs. Also, q->elevator should be set to NULL after exiting the scheduler. For now, since all registered schedulers define .exit_sched, this does not cause any actual problem, but it is more reasonable to make this change. Signed-off-by:
Yufen Yu <yuyufen@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 08, 2020
-
-
Tetsuo Handa authored
syzbot is reporting an unkillable task [1], because the caller fails to handle a corrupted filesystem image which attempts to access beyond the end of the device. While we need to fix the caller, flooding the console with handle_bad_sector() messages is unlikely to be useful. [1] https://syzkaller.appspot.com/bug?id=f1f49fb971d7a3e01bd8ab8cff2ff4572ccf3092 Signed-off-by:
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
Re-use throtl_set_slice_end() to remove duplicate code. Signed-off-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
The __throtl_de/enqueue_tg() functions are only called by throtl_de/enqueue_tg(), thus we can just open code them to make the code more readable. Signed-off-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
The throtl_schedule_next_dispatch() will validate if the service queue is empty before calling update_min_dispatch_time(), and the update_min_dispatch_time() will call throtl_rb_first(), which will validate service queue again. Thus we can move the service queue validation out of the throtl_rb_first() to remove the redundant validation in the fast path. Signed-off-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
We should move the list operation after validation. Signed-off-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
throtl_adjusted_limit() cannot scale up if bps or iops is set to 1, which causes an IO hang when the low limit is enabled. Thus we should treat 1 as an illegal value to avoid this issue. Signed-off-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
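
Why a limit of 1 cannot scale can be shown with a little arithmetic. The sketch below assumes the adjusted limit has the form `low + (low >> 1) * scale`; that formula is an assumption made for illustration and is not stated in the commit message, and `adjusted_limit()` and `low_limit_can_scale()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed scaling rule: the limit grows by half of the configured
 * low value for each scale step.
 */
static uint64_t adjusted_limit(uint64_t low, uint64_t scale)
{
	return low + (low >> 1) * scale;
}

/* A low limit can scale only if one scale step actually raises it. */
static int low_limit_can_scale(uint64_t low)
{
	return adjusted_limit(low, 1) > low;
}
```

Under this assumption, `1 >> 1` is 0, so a limit of 1 stays at 1 no matter how large the scale becomes, whereas a limit of 2 grows by 1 per step.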
-
Baolin Wang authored
The IO latency tracking is only for LOW limit, so we should add a validation to avoid redundant latency tracking if the LOW limit is not valid. Signed-off-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
We only update tg->last_finish_time when the low limitation is enabled, so we can move the tg->last_finish_time validation a little forward to avoid fetching the current time stamp unnecessarily if the low limitation is not enabled. Signed-off-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
The throtl_downgrade_state() is always used to change to LIMIT_LOW limitation, thus remove the latter meaningless parameter which indicates the limitation index. Signed-off-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
Remove redundant 'return' statement for 'void' functions. Signed-off-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 07, 2020
-
-
Mike Snitzer authored
It is unnecessary to force request-based DM to call into bio-based dm_submit_bio (via indirect disk->fops->submit_bio) only to have it then call blk_mq_submit_bio(). Fix this by establishing a request-based DM block_device_operations (dm_rq_blk_dops, which doesn't have .submit_bio) and update dm_setup_md_queue() to set md->disk->fops to it for DM_TYPE_REQUEST_BASED. Remove DM_TYPE_REQUEST_BASED conditional in dm_submit_bio and unexport blk_mq_submit_bio. Fixes: c62b37d9 ("block: move ->make_request_fn to struct block_device_operations") Signed-off-by:
Mike Snitzer <snitzer@redhat.com>
-
Christoph Hellwig authored
Don't error out if the dasd_biodasdinfo symbol is not available. Cc: stable@vger.kernel.org Fixes: 26d7e28e ("s390/dasd: remove ioctl_by_bdev calls") Reported-by:
Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Tested-by:
Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by:
Stefan Haberland <sth@linux.ibm.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 06, 2020
-
-
Gabriel Krisman Bertazi authored
According to Documentation/block/stat.rst, inflight should not include I/O requests that are in the queue but not yet dispatched to the device, but blk-mq identifies as inflight any request that has a tag allocated, which, for queues without elevator, happens at request allocation time and before it is queued in the ctx (default case in blk_mq_submit_bio). In addition, current behavior is different for queues with elevator from queues without it, since for the former the driver tag is allocated at dispatch time. A more precise approach would be to only consider requests with state MQ_RQ_IN_FLIGHT. This effectively reverts commit 6131837b ("blk-mq: count allocated but not started requests in iostats inflight") to consolidate blk-mq behavior with itself (elevator case) and with original documentation, but it differs from the behavior used by the legacy path. This version differs from v1 by using blk_mq_rq_state to access the state attribute. Avoid using blk_mq_request_started, which was suggested, since we don't want to include MQ_RQ_COMPLETE. Signed-off-by:
Gabriel Krisman Bertazi <krisman@collabora.com> Cc: Omar Sandoval <osandov@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
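
The difference between counting requests in the in-flight state and counting "started" requests can be sketched as below. This is a user-space model only: the enum and helper names are hypothetical, and "started" is modelled as any state other than idle, which is the reason the commit message gives for not using that test (it would include completed requests).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request states, modelled on the states named above. */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

/* Assumed meaning of "started": anything that is no longer idle. */
static int rq_started(enum rq_state s)
{
	return s != RQ_IDLE;
}

/* Count only requests that are currently dispatched to the device. */
static size_t count_in_flight(const enum rq_state *states, size_t n)
{
	size_t i, total = 0;

	for (i = 0; i < n; i++)
		if (states[i] == RQ_IN_FLIGHT)
			total++;
	return total;
}

/* Count every started request, which also includes completed ones. */
static size_t count_started(const enum rq_state *states, size_t n)
{
	size_t i, total = 0;

	for (i = 0; i < n; i++)
		if (rq_started(states[i]))
			total++;
	return total;
}
```

For a set of requests in the states idle, in flight, complete and in flight, the first counter reports 2 and the second reports 3.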
-
Christoph Hellwig authored
Move blk_mq_sched_try_merge to blk-merge.c, which allows marking a lot of the merge infrastructure static there. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Also move the definition from the public blkdev.h to the private block/blk.h header. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Also move the definition from the public blkdev.h to the private block/blk.h header. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 05, 2020
-
-
Eric Biggers authored
bio_crypt_set_ctx() assumes its gfp_mask argument always includes __GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed. For now this assumption is still fine, since no callers violate it. Making bio_crypt_set_ctx() able to fail would add unneeded complexity. However, if a caller didn't use __GFP_DIRECT_RECLAIM, it would be very hard to notice the bug. Make it easier by adding a WARN_ON_ONCE(). Signed-off-by:
Eric Biggers <ebiggers@google.com> Reviewed-by:
Satya Tangirala <satyat@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Satya Tangirala <satyat@google.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Eric Biggers authored
blk_crypto_rq_bio_prep() assumes its gfp_mask argument always includes __GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed. However, blk_crypto_rq_bio_prep() might be called with GFP_ATOMIC via setup_clone() in drivers/md/dm-rq.c. This case isn't currently reachable with a bio that actually has an encryption context. However, it's fragile to rely on this. Just make blk_crypto_rq_bio_prep() able to fail. Suggested-by:
Satya Tangirala <satyat@google.com> Signed-off-by:
Eric Biggers <ebiggers@google.com> Reviewed-by:
Mike Snitzer <snitzer@redhat.com> Reviewed-by:
Satya Tangirala <satyat@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Eric Biggers authored
bio_crypt_clone() assumes its gfp_mask argument always includes __GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed. However, bio_crypt_clone() might be called with GFP_ATOMIC via setup_clone() in drivers/md/dm-rq.c, or with GFP_NOWAIT via kcryptd_io_read() in drivers/md/dm-crypt.c. Neither case is currently reachable with a bio that actually has an encryption context. However, it's fragile to rely on this. Just make bio_crypt_clone() able to fail, analogous to bio_integrity_clone(). Reported-by:
Miaohe Lin <linmiaohe@huawei.com> Signed-off-by:
Eric Biggers <ebiggers@google.com> Reviewed-by:
Mike Snitzer <snitzer@redhat.com> Reviewed-by:
Satya Tangirala <satyat@google.com> Cc: Satya Tangirala <satyat@google.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-