- Jul 01, 2021
-
-
Manfred Spraul authored
If semctl(), msgctl() and shmctl() are called with IPC_INFO, SEM_INFO, MSG_INFO or SHM_INFO, then the return value is the index of the highest used entry in the kernel's internal array recording information about all SysV objects of the requested type for the current namespace. (This information can be used with repeated ..._STAT or ..._STAT_ANY operations to obtain information about all SysV objects on the system.) There is a cache for this value. But when the cache needs to be updated, the highest used index is determined by looping over all possible values. With the introduction of IPCMNI_EXTEND_SHIFT, this could be a loop over 16 million entries. And due to /proc/sys/kernel/*next_id, the index values do not need to be consecutive. With <write 16000000 to msg_next_id> followed by msgget() and msgctl(,IPC_RMID) in a loop, I have observed a performance increase of around a factor of 13000. As there is no get_last() function for idr structures, implement a "get_last()" using a binary search. As far as I can see, ipc is the only user that needs get_last(), thus implement it in ipc/util.c and not in a central location. [akpm@linux-foundation.org: tweak comment, fix typo] Link: https://lkml.kernel.org/r/20210425075208.11777-2-manfred@colorfullife.com Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Acked-by:
Davidlohr Bueso <dbueso@suse.de> Cc: <1vier1@web.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
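A minimal sketch of the binary search described above (illustrative only, not the literal ipc/util.c implementation): probe the idr with idr_get_next() at a candidate index, so the highest used index is found in O(log(max_idx)) probes instead of a scan over up to 16 million slots.

  #include <linux/idr.h>

  /* Return the highest used index in @idr, or -1 if it is empty.
   * idr_get_next() returns the entry at the lowest index >= *probe
   * (updating *probe), or NULL if there is none. */
  static int sketch_idr_max_idx(struct idr *idr, int max_idx)
  {
      int lo = 0, hi = max_idx, last = -1;

      while (lo <= hi) {
          int mid = lo + (hi - lo) / 2;
          int probe = mid;

          if (idr_get_next(idr, &probe)) {
              last = probe;        /* an entry exists at index >= mid */
              lo = probe + 1;      /* look for something even higher */
          } else {
              hi = mid - 1;        /* nothing at or above mid */
          }
      }
      return last;
  }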
-
Manfred Spraul authored
The patch solves three weaknesses in ipc/sem.c: 1) The initial read of use_global_lock in sem_lock() is an intentional race. KCSAN detects these accesses and prints a warning. 2) The code assumes that plain C reads/writes are not mangled by the CPU or the compiler. 3) The comment in sysvipc_sem_proc_show() was hard to understand: the rest of the comments in ipc/sem.c speak about sem_perm.lock, and suddenly this function speaks about ipc_lock_object(). To solve 1) and 2), use READ_ONCE()/WRITE_ONCE(). Plain C reads are used in code that owns sma->sem_perm.lock. The comment is updated to solve 3). [manfred@colorfullife.com: use READ_ONCE()/WRITE_ONCE() for use_global_lock] Link: https://lkml.kernel.org/r/20210627161919.3196-3-manfred@colorfullife.com Link: https://lkml.kernel.org/r/20210514175319.12195-1-manfred@colorfullife.com Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Reviewed-by:
Paul E. McKenney <paulmck@kernel.org> Reviewed-by:
Davidlohr Bueso <dbueso@suse.de> Cc: <1vier1@web.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
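A hedged illustration of the resulting access pattern (trimmed-down stand-in types, not the upstream sem_lock() code): the lockless hint read stays an intentional race, but it is annotated so KCSAN stays quiet and the compiler cannot tear or re-read the value; updates pair with it via WRITE_ONCE(), while code holding sem_perm.lock may keep using plain accesses.

  #include <linux/compiler.h>
  #include <linux/spinlock.h>

  struct demo_sem {
      spinlock_t lock;
      int use_global_lock;    /* hint, read locklessly on the fast path */
  };

  /* Lockless fast-path probe: intentional data race, hence READ_ONCE(). */
  static bool demo_fine_grained_possible(struct demo_sem *s)
  {
      return READ_ONCE(s->use_global_lock) == 0;
  }

  /* Updates run under the lock but still pair with the lockless readers. */
  static void demo_force_global_mode(struct demo_sem *s)
  {
      spin_lock(&s->lock);
      WRITE_ONCE(s->use_global_lock, 1);
      spin_unlock(&s->lock);
  }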
-
Vasily Averin authored
msg_queue and shmid_kernel are quite small objects, no need to use kvmalloc for them. mhocko@: "Both of them are 256B on most 64b systems." Previously these objects were allocated via ipc_alloc/ipc_rcu_alloc(), a common function for several ipc objects which had a kvmalloc call inside. Later, this function went away and was finally replaced by a direct kvmalloc call, and now we can use the more suitable kmalloc/kfree for them. Link: https://lkml.kernel.org/r/0d0b6c9b-8af3-29d8-34e2-a565c53780f3@virtuozzo.com Reported-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Vasily Averin <vvs@virtuozzo.com> Acked-by:
Michal Hocko <mhocko@suse.com> Reviewed-by:
Shakeel Butt <shakeelb@google.com> Acked-by:
Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
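A hedged sketch of the resulting allocation pattern (assumes the ipc-internal struct msg_queue definition is in scope; not the literal diff): for a fixed-size object of a couple hundred bytes the vmalloc fallback buys nothing.

  #include <linux/slab.h>

  static struct msg_queue *demo_alloc_msg_queue(void)
  {
      struct msg_queue *msq;

      msq = kmalloc(sizeof(*msq), GFP_KERNEL);    /* was kvmalloc() */
      if (!msq)
          return NULL;
      /* ...initialize... */
      return msq;    /* freed with kfree(), no kvfree() needed */
  }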
-
Vasily Averin authored
Patch series "ipc: allocations cleanup", v2. Some ipc objects use the wrong allocation functions: small objects can use kmalloc(), and vice versa, potentially large objects can use kmalloc(). This patch (of 2): Size of sem_undo can exceed one page and with the maximum possible nsems = 32000 it can grow up to 64Kb. Let's switch its allocation to kvmalloc to avoid user-triggered disruptive actions like OOM killer in case of high-order memory shortage. User triggerable high order allocations are quite a problem on heavily fragmented systems. They can be a DoS vector. Link: https://lkml.kernel.org/r/ebc3ac79-3190-520d-81ce-22ad194986ec@virtuozzo.com Link: https://lkml.kernel.org/r/a6354fd9-2d55-2e63-dd4d-fa7dc1d11134@virtuozzo.com Signed-off-by:
Vasily Averin <vvs@virtuozzo.com> Acked-by:
Michal Hocko <mhocko@suse.com> Reviewed-by:
Shakeel Butt <shakeelb@google.com> Acked-by:
Roman Gushchin <guro@fb.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
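A hedged sketch of why kvmalloc fits here (field layout is illustrative, not the real sem_undo): the allocation size scales with nsems, so it can reach tens of kilobytes and may need the vmalloc fallback on fragmented systems.

  #include <linux/slab.h>
  #include <linux/mm.h>        /* kvzalloc(), kvfree() */
  #include <linux/overflow.h>  /* struct_size() */

  struct demo_sem_undo {
      int semid;
      short semadj[];          /* one adjustment value per semaphore */
  };

  static struct demo_sem_undo *demo_alloc_undo(int nsems)
  {
      struct demo_sem_undo *un;

      /* up to ~64Kb for nsems = 32000; kvzalloc falls back to vmalloc */
      un = kvzalloc(struct_size(un, semadj, nsems), GFP_KERNEL);
      return un;               /* pair with kvfree() */
  }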
-
- May 23, 2021
-
-
Varad Gautam authored
do_mq_timedreceive calls wq_sleep with a stack local address. The sender (do_mq_timedsend) uses this address to later call pipelined_send. This leads to a very hard to trigger race where a do_mq_timedreceive call might return and leave do_mq_timedsend to rely on an invalid address, causing the following crash: RIP: 0010:wake_q_add_safe+0x13/0x60 Call Trace: __x64_sys_mq_timedsend+0x2a9/0x490 do_syscall_64+0x80/0x680 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f5928e40343 The race occurs as: 1. do_mq_timedreceive calls wq_sleep with the address of `struct ext_wait_queue` on function stack (aliased as `ewq_addr` here) - it holds a valid `struct ext_wait_queue *` as long as the stack has not been overwritten. 2. `ewq_addr` gets added to info->e_wait_q[RECV].list in wq_add, and do_mq_timedsend receives it via wq_get_first_waiter(info, RECV) to call __pipelined_op. 3. Sender calls __pipelined_op::smp_store_release(&this->state, STATE_READY). Here is where the race window begins. (`this` is `ewq_addr`.) 4. If the receiver wakes up now in do_mq_timedreceive::wq_sleep, it will see `state == STATE_READY` and break. 5. do_mq_timedreceive returns, and `ewq_addr` is no longer guaranteed to be a `struct ext_wait_queue *` since it was on do_mq_timedreceive's stack. (Although the address may not get overwritten until another function happens to touch it, which means it can persist around for an indefinite time.) 6. do_mq_timedsend::__pipelined_op() still believes `ewq_addr` is a `struct ext_wait_queue *`, and uses it to find a task_struct to pass to the wake_q_add_safe call. In the lucky case where nothing has overwritten `ewq_addr` yet, `ewq_addr->task` is the right task_struct. In the unlucky case, __pipelined_op::wake_q_add_safe gets handed a bogus address as the receiver's task_struct causing the crash. do_mq_timedsend::__pipelined_op() should not dereference `this` after setting STATE_READY, as the receiver counterpart is now free to return. Change __pipelined_op to call wake_q_add_safe on the receiver's task_struct returned by get_task_struct, instead of dereferencing `this` which sits on the receiver's stack. As Manfred pointed out, the race potentially also exists in ipc/msg.c::expunge_all and ipc/sem.c::wake_up_sem_queue_prepare. Fix those in the same way. Link: https://lkml.kernel.org/r/20210510102950.12551-1-varad.gautam@suse.com Fixes: c5b2cbdb ("ipc/mqueue.c: update/document memory barriers") Fixes: 8116b54e ("ipc/sem.c: document and update memory barriers") Fixes: 0d97a82b ("ipc/msg.c: update and document memory barriers") Signed-off-by:
Varad Gautam <varad.gautam@suse.com> Reported-by:
Matthias von Faber <matthias.vonfaber@aox-tech.de> Acked-by:
Davidlohr Bueso <dbueso@suse.de> Acked-by:
Manfred Spraul <manfred@colorfullife.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
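A simplified sketch of the ordering the fix establishes (types as in ipc/mqueue.c, body abridged): take the task reference before publishing STATE_READY, because once STATE_READY is visible the receiver may return and its on-stack wait-queue entry becomes invalid.

  #include <linux/sched/task.h>      /* get_task_struct() */
  #include <linux/sched/wake_q.h>    /* wake_q_add_safe() */

  static void demo_pipelined_op(struct wake_q_head *wake_q,
                                struct ext_wait_queue *this)
  {
      struct task_struct *task;

      list_del(&this->list);
      task = get_task_struct(this->task);

      /* pairs with the acquire on the receiver's read of ->state */
      smp_store_release(&this->state, STATE_READY);

      /* 'this' must not be dereferenced past this point */
      wake_q_add_safe(wake_q, task);    /* consumes the reference */
  }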
-
- May 07, 2021
-
-
Bhaskar Chowdhury authored
s/purpuse/purpose/ Link: https://lkml.kernel.org/r/20210319221432.26631-1-unixbhaskar@gmail.com Signed-off-by:
Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by:
Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Bhaskar Chowdhury authored
s/runtine/runtime/ s/AQUIRE/ACQUIRE/ s/seperately/separately/ s/wont/won\'t/ s/succesfull/successful/ Link: https://lkml.kernel.org/r/20210326022240.26375-1-unixbhaskar@gmail.com Signed-off-by:
Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by:
Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Apr 30, 2021
-
-
Alexey Gladkov authored
The rlimit counter is tied to uid in the user_namespace. This allows rlimit values to be specified in userns even if they are already globally exceeded by the user. However, the value of the previous user_namespaces cannot be exceeded. Changelog v11: * Fix issue found by lkp robot. v8: * Fix issues found by lkp-tests project. v7: * Keep only ucounts for RLIMIT_MEMLOCK checks instead of struct cred. v6: * Fix bug in hugetlb_file_setup() detected by trinity. Reported-by:
kernel test robot <oliver.sang@intel.com> Reported-by:
kernel test robot <lkp@intel.com> Signed-off-by:
Alexey Gladkov <legion@kernel.org> Link: https://lkml.kernel.org/r/970d50c70c71bfd4496e0e8d2a0a32feebebb350.1619094428.git.legion@kernel.org Signed-off-by:
Eric W. Biederman <ebiederm@xmission.com>
-
Alexey Gladkov authored
The rlimit counter is tied to uid in the user_namespace. This allows rlimit values to be specified in userns even if they are already globally exceeded by the user. However, the value of the previous user_namespaces cannot be exceeded. Signed-off-by:
Alexey Gladkov <legion@kernel.org> Link: https://lkml.kernel.org/r/2531f42f7884bbfee56a978040b3e0d25cdf6cde.1619094428.git.legion@kernel.org Signed-off-by:
Eric W. Biederman <ebiederm@xmission.com>
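A hedged sketch of the hierarchical accounting both patches rely on (all names here are illustrative, not the real ucounts API): usage is charged against the owning user in its namespace and in every ancestor namespace, and the operation is rejected if any level exceeds its own limit.

  #include <linux/atomic.h>

  struct demo_ucounts {
      struct demo_ucounts *parent;    /* ucounts of the creator userns */
      atomic_long_t usage;
      long max;                       /* limit configured in this namespace */
  };

  static bool demo_charge(struct demo_ucounts *uc, long amount)
  {
      struct demo_ucounts *iter;
      bool over = false;

      for (iter = uc; iter; iter = iter->parent)
          if (atomic_long_add_return(amount, &iter->usage) > iter->max)
              over = true;

      if (over) {    /* roll back the charge on failure */
          for (iter = uc; iter; iter = iter->parent)
              atomic_long_sub(amount, &iter->usage);
          return false;
      }
      return true;
  }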
-
- Jan 24, 2021
-
-
Christian Brauner authored
Extend some inode methods with an additional user namespace argument. A filesystem that is aware of idmapped mounts will receive the user namespace the mount has been marked with. This can be used for additional permission checking and also to enable filesystems to translate between uids and gids if they need to. We have implemented all relevant helpers in earlier patches. As requested we simply extend the existing inode methods instead of introducing new ones. This is a little more code churn but it's mostly mechanical and doesn't leave us with additional inode methods. Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Christian Brauner <christian.brauner@ubuntu.com>
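As one hedged illustration of the shape of the change (the ->create() method; other methods differ in their remaining parameters):

  /* before (abridged): */
  int (*create)(struct inode *dir, struct dentry *dentry,
                umode_t mode, bool excl);

  /* after: the user namespace the mount has been marked with is passed
   * in; for non-idmapped mounts this is &init_user_ns, so behavior is
   * unchanged. */
  int (*create)(struct user_namespace *mnt_userns, struct inode *dir,
                struct dentry *dentry, umode_t mode, bool excl);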
-
Christian Brauner authored
The various vfs_*() helpers are called by filesystems or by the vfs itself to perform core operations such as create, link, mkdir, mknod, rename, rmdir, tmpfile and unlink. Enable them to handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace and pass it down. Afterwards the checks and operations are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-15-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Christian Brauner <christian.brauner@ubuntu.com>
-
Christian Brauner authored
The two helpers inode_permission() and generic_permission() are used by the vfs to perform basic permission checking by verifying that the caller is privileged over an inode. In order to handle idmapped mounts we extend the two helpers with an additional user namespace argument. On idmapped mounts the two helpers will make sure to map the inode according to the mount's user namespace and then perform identical permission checks to inode_permission() and generic_permission(). If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
James Morris <jamorris@linux.microsoft.com> Acked-by:
Serge Hallyn <serge@hallyn.com> Signed-off-by:
Christian Brauner <christian.brauner@ubuntu.com>
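A hedged sketch of the mapping step described above (the helper name is illustrative; the in-tree helpers differ): the inode owner is translated through the mount's user namespace before the ownership comparison.

  #include <linux/fs.h>
  #include <linux/uidgid.h>

  /* With mnt_userns == &init_user_ns this is an identity mapping, so
   * non-idmapped mounts see exactly the old behavior. */
  static inline kuid_t demo_i_uid_into_mnt(struct user_namespace *mnt_userns,
                                           const struct inode *inode)
  {
      return make_kuid(mnt_userns, __kuid_val(inode->i_uid));
  }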
-
- Dec 15, 2020
-
-
Dmitry Safonov authored
Rename the callback to reflect that it's not called *on* or *after* split, but rather some time before the splitting to check if it's possible. Link: https://lkml.kernel.org/r/20201013013416.390574-5-dima@arista.com Signed-off-by:
Dmitry Safonov <dima@arista.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Sep 05, 2020
-
-
Tobias Klauser authored
Commit 32927393 ("sysctl: pass kernel pointers to ->proc_handler") changed ctl_table.proc_handler to take a kernel pointer. Adjust the signature of proc_ipc_sem_dointvec to match ctl_table.proc_handler which fixes the following sparse error/warning: ipc/ipc_sysctl.c:94:47: warning: incorrect type in argument 3 (different address spaces) ipc/ipc_sysctl.c:94:47: expected void *buffer ipc/ipc_sysctl.c:94:47: got void [noderef] __user *buffer ipc/ipc_sysctl.c:194:35: warning: incorrect type in initializer (incompatible argument 3 (different address spaces)) ipc/ipc_sysctl.c:194:35: expected int ( [usertype] *proc_handler )( ... ) ipc/ipc_sysctl.c:194:35: got int ( * )( ... ) Fixes: 32927393 ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by:
Tobias Klauser <tklauser@distanz.ch> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/20200825105846.5193-1-tklauser@distanz.ch Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
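The adjusted prototype then looks roughly like this (body elided; the point is the plain kernel pointer for argument 3, matching ctl_table.proc_handler after commit 32927393):

  #include <linux/sysctl.h>

  static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
                                   void *buffer, size_t *lenp, loff_t *ppos)
  {
      /* ...forward to proc_dointvec() and refresh the semaphore limits... */
      return 0;
  }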
-
- Aug 23, 2020
-
-
Gustavo A. R. Silva authored
Replace the existing /* fall through */ comments and their variants with the new pseudo-keyword macro fallthrough[1]. Also, remove fall-through markings where they are not needed. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by:
Gustavo A. R. Silva <gustavoars@kernel.org>
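A hedged before/after illustration of the conversion (case values and helpers are made up for the example):

  #include <linux/compiler_attributes.h>    /* fallthrough */

  static int demo_handle(int cmd)
  {
      int ret = 0;

      switch (cmd) {
      case 0:
          ret += 1;
          fallthrough;    /* replaces the old "fall through" comment */
      case 1:
          ret += 2;
          break;
      default:
          break;
      }
      return ret;
  }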
-
- Aug 19, 2020
-
-
Kirill Tkhai authored
Switch over ipc namespaces to use the newly introduced common lifetime counter. Currently every namespace type has its own lifetime counter which is stored in the specific namespace struct. The lifetime counters are used identically for all namespace types. Namespaces may of course have additional unrelated counters and these are not altered. This introduces a common lifetime counter into struct ns_common. The ns_common struct encompasses information that all namespaces share. That should include the lifetime counter since it's common to all of them. It also allows us to unify the type of the counters across all namespaces. Most of them use refcount_t but one uses atomic_t and at least one uses kref. Especially the last one doesn't make much sense since it's just a wrapper around refcount_t since 2016 and actually complicates cleanup operations by having to use container_of() to cast the correct namespace struct out of struct ns_common. Having the lifetime counter for the namespaces in one place reduces maintenance cost. Not just because after switching all namespaces over we will have removed more code than we added but also because the logic is more easily understandable and we indicate to the user that the basic lifetime requirements for all namespaces are currently identical. Signed-off-by:
Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by:
Kees Cook <keescook@chromium.org> Acked-by:
Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/159644978697.604812.16592754423881032385.stgit@localhost.localdomain Signed-off-by:
Christian Brauner <christian.brauner@ubuntu.com>
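A hedged sketch of the consolidation (field and helper names approximate the idea, not necessarily the exact upstream names): the reference count lives in the shared struct ns_common, so all namespace types use the same get/put helpers instead of a private kref, atomic_t or refcount_t.

  #include <linux/refcount.h>

  struct demo_ns_common {
      /* ...ops, inum, stashed dentry... */
      refcount_t count;
  };

  static inline void demo_ns_get(struct demo_ns_common *ns)
  {
      refcount_inc(&ns->count);
  }

  static inline bool demo_ns_put(struct demo_ns_common *ns)
  {
      /* returns true when the caller must free the namespace */
      return refcount_dec_and_test(&ns->count);
  }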
-
- Aug 12, 2020
-
-
Liao Pingfang authored
Remove the superfluous break, as there is a 'return' before it. Signed-off-by:
Liao Pingfang <liao.pingfang@zte.com.cn> Signed-off-by:
Yi Wang <wang.yi59@zte.com.cn> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Reviewed-by:
Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1594724361-11525-1-git-send-email-wang.yi59@zte.com.cn Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
Two functions are only called via function pointers, don't bother inlining them. Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Link: http://lkml.kernel.org/r/20200710200312.GA960353@localhost.localdomain Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Aug 07, 2020
-
-
Peter Collingbourne authored
The current split between do_mmap() and do_mmap_pgoff() was introduced in commit 1fcfd8db ("mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff()") to support MPX. The wrapper function do_mmap_pgoff() always passed 0 as the value of the vm_flags argument to do_mmap(). However, MPX support has subsequently been removed from the kernel and there were no more direct callers of do_mmap(); all calls were going via do_mmap_pgoff(). Simplify the code by removing do_mmap_pgoff() and changing all callers to directly call do_mmap(), which now no longer takes a vm_flags argument. Signed-off-by:
Peter Collingbourne <pcc@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Reviewed-by:
Andrew Morton <akpm@linux-foundation.org> Reviewed-by:
David Hildenbrand <david@redhat.com> Link: http://lkml.kernel.org/r/20200727194109.1371462-1-pcc@google.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jun 09, 2020
-
-
Michel Lespinasse authored
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule:

  // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

  @@
  expression mm;
  @@
  (
  -init_rwsem
  +mmap_init_lock
  |
  -down_write
  +mmap_write_lock
  |
  -down_write_killable
  +mmap_write_lock_killable
  |
  -down_write_trylock
  +mmap_write_trylock
  |
  -up_write
  +mmap_write_unlock
  |
  -downgrade_write
  +mmap_write_downgrade
  |
  -down_read
  +mmap_read_lock
  |
  -down_read_killable
  +mmap_read_lock_killable
  |
  -down_read_trylock
  +mmap_read_trylock
  |
  -up_read
  +mmap_read_unlock
  )
  -(&mm->mmap_sem)
  +(mm)

Signed-off-by:
Michel Lespinasse <walken@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Reviewed-by:
Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by:
Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
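For ipc this boils down to call-site changes of the following shape (illustrative fragment, not a specific upstream hunk):

  #include <linux/mmap_lock.h>

  /* old style: down_write(&mm->mmap_sem); ... up_write(&mm->mmap_sem); */
  static int demo_with_mmap_write_lock(struct mm_struct *mm)
  {
      if (mmap_write_lock_killable(mm))
          return -EINTR;
      /* ...walk or modify the address space, e.g. in shmdt()... */
      mmap_write_unlock(mm);
      return 0;
  }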
-
- Jun 08, 2020
-
-
Giuseppe Scrivano authored
The reason is to avoid a delay caused by the synchronize_rcu() call in kern_unmount() when the mqueue mount is freed. The code:

  #define _GNU_SOURCE
  #include <sched.h>
  #include <error.h>
  #include <errno.h>
  #include <stdlib.h>

  int main()
  {
      int i;

      for (i = 0; i < 1000; i++)
          if (unshare(CLONE_NEWIPC) < 0)
              error(EXIT_FAILURE, errno, "unshare");
  }

goes from

  Command being timed: "./ipc-namespace"
  User time (seconds): 0.00
  System time (seconds): 0.06
  Percent of CPU this job got: 0%
  Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.05

to

  Command being timed: "./ipc-namespace"
  User time (seconds): 0.00
  System time (seconds): 0.02
  Percent of CPU this job got: 96%
  Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03

Signed-off-by:
Giuseppe Scrivano <gscrivan@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Reviewed-by:
Paul E. McKenney <paulmck@kernel.org> Reviewed-by:
Waiman Long <longman@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Link: http://lkml.kernel.org/r/20200225145419.527994-1-gscrivan@redhat.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Jules Irenge authored
Sparse reports a warning at freeque(): "warning: context imbalance in freeque() - unexpected unlock". The root cause is the missing annotation at freeque(). Add the missing __releases(RCU) annotation and the missing __releases(&msq->q_perm) annotation. Signed-off-by:
Jules Irenge <jbi.octave@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Reviewed-by:
Andrew Morton <akpm@linux-foundation.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Lu Shuaibing <shuaibinglu@126.com> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Link: http://lkml.kernel.org/r/20200403160505.2832-2-jbi.octave@gmail.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- May 14, 2020
-
-
Vasily Averin authored
Commit 89163f93 ("ipc/util.c: sysvipc_find_ipc() should increase position index") is causing this bug (seen on 5.6.8): # ipcs -q ------ Message Queues -------- key msqid owner perms used-bytes messages # ipcmk -Q Message queue id: 0 # ipcs -q ------ Message Queues -------- key msqid owner perms used-bytes messages 0x82db8127 0 root 644 0 0 # ipcmk -Q Message queue id: 1 # ipcs -q ------ Message Queues -------- key msqid owner perms used-bytes messages 0x82db8127 0 root 644 0 0 0x76d1fb2a 1 root 644 0 0 # ipcrm -q 0 # ipcs -q ------ Message Queues -------- key msqid owner perms used-bytes messages 0x76d1fb2a 1 root 644 0 0 0x76d1fb2a 1 root 644 0 0 # ipcmk -Q Message queue id: 2 # ipcrm -q 2 # ipcs -q ------ Message Queues -------- key msqid owner perms used-bytes messages 0x76d1fb2a 1 root 644 0 0 0x76d1fb2a 1 root 644 0 0 # ipcmk -Q Message queue id: 3 # ipcrm -q 1 # ipcs -q ------ Message Queues -------- key msqid owner perms used-bytes messages 0x7c982867 3 root 644 0 0 0x7c982867 3 root 644 0 0 0x7c982867 3 root 644 0 0 0x7c982867 3 root 644 0 0 Whenever an IPC item with a low id is deleted, the items with higher ids are duplicated, as if filling a hole. new_pos should jump through hole of unused ids, pos can be updated inside "for" cycle. Fixes: 89163f93 ("ipc/util.c: sysvipc_find_ipc() should increase position index") Reported-by:
Andreas Schwab <schwab@suse.de> Reported-by:
Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
Vasily Averin <vvs@virtuozzo.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Acked-by:
Waiman Long <longman@redhat.com> Cc: NeilBrown <neilb@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/4921fe9b-9385-a2b4-1dc4-1099be6d2e39@virtuozzo.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- May 09, 2020
-
-
Christian Brauner authored
Add a simple struct nsset. It holds all necessary pieces to switch to a new set of namespaces without leaving a task in a half-switched state, which we will make use of in the next patch. This patch switches the existing setns logic over without causing a change in setns() behavior. This brings setns() closer to how unshare() works. The prepare_ns() function is responsible for preparing all necessary information. This has two reasons. First, it minimizes dependencies between individual namespaces, i.e. all install handlers can expect that all fields are properly initialized independently of the order in which they are called. Second, this makes the code easier to maintain and easier to follow if it needs to be changed. The prepare_ns() helper will only be switched over to use a flags argument in the next patch. Here it will still use nstype as a simple integer argument, which was argued would be clearer. I'm not particularly opinionated about whether this really helps or not. The struct nsset itself already contains the flags field since its name already indicates that it can contain information required by different namespaces. None of this should have functional consequences. Signed-off-by:
Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by:
Serge Hallyn <serge@hallyn.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Jann Horn <jannh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/r/20200505140432.181565-2-christian.brauner@ubuntu.com
-
- May 08, 2020
-
-
Oleg Nesterov authored
Commit cc731525 ("signal: Remove kernel interal si_code magic") changed the value of SI_FROMUSER(SI_MESGQ), this means that mq_notify() no longer works if the sender doesn't have rights to send a signal. Change __do_notify() to use do_send_sig_info() instead of kill_pid_info() to avoid check_kill_permission(). This needs the additional notify.sigev_signo != 0 check, shouldn't we change do_mq_notify() to deny sigev_signo == 0 ? Test-case: #include <signal.h> #include <mqueue.h> #include <unistd.h> #include <sys/wait.h> #include <assert.h> static int notified; static void sigh(int sig) { notified = 1; } int main(void) { signal(SIGIO, sigh); int fd = mq_open("/mq", O_RDWR|O_CREAT, 0666, NULL); assert(fd >= 0); struct sigevent se = { .sigev_notify = SIGEV_SIGNAL, .sigev_signo = SIGIO, }; assert(mq_notify(fd, &se) == 0); if (!fork()) { assert(setuid(1) == 0); mq_send(fd, "",1,0); return 0; } wait(NULL); mq_unlink("/mq"); assert(notified); return 0; } [manfred@colorfullife.com: 1) Add self_exec_id evaluation so that the implementation matches do_notify_parent 2) use PIDTYPE_TGID everywhere] Fixes: cc731525 ("signal: Remove kernel interal si_code magic") Reported-by:
Yoji <yoji.fujihar.min@gmail.com> Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Acked-by:
"Eric W. Biederman" <ebiederm@xmission.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Markus Elfring <elfring@users.sourceforge.net> Cc: <1vier1@web.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/e2a782e4-eab9-4f5c-c749-c07a8f7a4e66@colorfullife.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Apr 27, 2020
-
-
Christoph Hellwig authored
Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handlers just pass the data through to one of the common handlers, a lot of the changes are mechanical. Signed-off-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Andrey Ignatov <rdna@fb.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Apr 10, 2020
-
-
Vasily Averin authored
If a seq_file .next function does not change the position index, a read after some lseek can generate unexpected output. https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by:
Vasily Averin <vvs@virtuozzo.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Acked-by:
Waiman Long <longman@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: NeilBrown <neilb@suse.com> Cc: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/b7a20945-e315-8bb0-21e6-3875c14a8494@virtuozzo.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
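A hedged sketch of the rule the fix enforces (the lookup helper is made up for the example): a seq_file ->next() handler must advance *pos even when it finds no further object, otherwise a read() after lseek() can repeat or drop records.

  #include <linux/seq_file.h>

  static void *demo_seq_next(struct seq_file *s, void *v, loff_t *pos)
  {
      ++*pos;    /* always advance the position index */
      return demo_find_next_object(s, *pos);    /* NULL ends the iteration */
  }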
-
- Apr 07, 2020
-
-
Jason Yan authored
Fix the following sparse warning: ipc/shm.c:1335:6: warning: symbol 'compat_ksys_shmctl' was not declared. Should it be static? Reported-by:
Hulk Robot <hulkci@huawei.com> Signed-off-by:
Jason Yan <yanaijie@huawei.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200403063933.24785-1-yanaijie@huawei.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Somala Swaraj authored
Signed-off-by:
somala swaraj <somalaswaraj@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200301135530.18340-1-somalaswaraj@gmail.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
Now that "struct proc_ops" exist we can start putting there stuff which could not fly with VFS "struct file_operations"... Most of fs/proc/inode.c file is dedicated to make open/read/.../close reliable in the event of disappearing /proc entries which usually happens if module is getting removed. Files like /proc/cpuinfo which never disappear simply do not need such protection. Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such "permanent" files. Enable "permanent" flag for /proc/cpuinfo /proc/kmsg /proc/modules /proc/slabinfo /proc/stat /proc/sysvipc/* /proc/swaps More will come once I figure out foolproof way to prevent out module authors from marking their stuff "permanent" for performance reasons when it is not. This should help with scalability: benchmark is "read /proc/cpuinfo R times by N threads scattered over the system". N R t, s (before) t, s (after) ----------------------------------------------------- 64 4096 1.582458 1.530502 -3.2% 256 4096 6.371926 6.125168 -3.9% 1024 4096 25.64888 24.47528 -4.6% Benchmark source: #include <chrono> #include <iostream> #include <thread> #include <vector> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> const int NR_CPUS = sysconf(_SC_NPROCESSORS_ONLN); int N; const char *filename; int R; int xxx = 0; int glue(int n) { cpu_set_t m; CPU_ZERO(&m); CPU_SET(n, &m); return sched_setaffinity(0, sizeof(cpu_set_t), &m); } void f(int n) { glue(n % NR_CPUS); while (*(volatile int *)&xxx == 0) { } for (int i = 0; i < R; i++) { int fd = open(filename, O_RDONLY); char buf[4096]; ssize_t rv = read(fd, buf, sizeof(buf)); asm volatile ("" :: "g" (rv)); close(fd); } } int main(int argc, char *argv[]) { if (argc < 4) { std::cerr << "usage: " << argv[0] << ' ' << "N /proc/filename R "; return 1; } N = atoi(argv[1]); filename = argv[2]; R = atoi(argv[3]); for (int i = 0; i < NR_CPUS; i++) { if (glue(i) == 0) break; } std::vector<std::thread> T; T.reserve(N); for (int i = 0; i < N; i++) { T.emplace_back(f, i); } auto t0 = std::chrono::system_clock::now(); { *(volatile int *)&xxx = 1; for (auto& t: T) { t.join(); } } auto t1 = std::chrono::system_clock::now(); std::chrono::duration<double> dt = t1 - t0; std::cout << dt.count() << ' '; return 0; } P.S.: Explicit randomization marker is added because adding non-function pointer will silently disable structure layout randomization. [akpm@linux-foundation.org: coding style fixes] Reported-by:
kbuild test robot <lkp@intel.com> Reported-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Link: http://lkml.kernel.org/r/20200222201539.GA22576@avx2 Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Feb 21, 2020
-
-
Ioanna Alifieraki authored
This reverts commit a9795584. Commit a9795584 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()") removes a lock that is needed. This leads to a process looping infinitely in exit_sem() and can also lead to a crash. There is a reproducer available in [1] and with the commit reverted the issue does not reproduce anymore. Using the reproducer found in [1] is fairly easy to reach a point where one of the child processes is looping infinitely in exit_sem between for(;;) and if (semid == -1) block, while it's trying to free its last sem_undo structure which has already been freed by freeary(). Each sem_undo struct is on two lists: one per semaphore set (list_id) and one per process (list_proc). The list_id list tracks undos by semaphore set, and the list_proc by process. Undo structures are removed either by freeary() or by exit_sem(). The freeary function is invoked when the user invokes a syscall to remove a semaphore set. During this operation freeary() traverses the list_id associated with the semaphore set and removes the undo structures from both the list_id and list_proc lists. For this case, exit_sem() is called at process exit. Each process contains a struct sem_undo_list (referred to as "ulp") which contains the head for the list_proc list. When the process exits, exit_sem() traverses this list to remove each sem_undo struct. As in freeary(), whenever a sem_undo struct is removed from list_proc, it is also removed from the list_id list. Removing elements from list_id is safe for both exit_sem() and freeary() due to sem_lock(). Removing elements from list_proc is not safe; freeary() locks &un->ulp->lock when it performs list_del_rcu(&un->list_proc) but exit_sem() does not (locking was removed by commit a9795584 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"). This can result in the following situation while executing the reproducer [1] : Consider a child process in exit_sem() and the parent in freeary() (because of semctl(sid[i], NSEM, IPC_RMID)). - The list_proc for the child contains the last two undo structs A and B (the rest have been removed either by exit_sem() or freeary()). - The semid for A is 1 and semid for B is 2. - exit_sem() removes A and at the same time freeary() removes B. - Since A and B have different semid sem_lock() will acquire different locks for each process and both can proceed. The bug is that they remove A and B from the same list_proc at the same time because only freeary() acquires the ulp lock. When exit_sem() removes A it makes ulp->list_proc.next to point at B and at the same time freeary() removes B setting B->semid=-1. At the next iteration of for(;;) loop exit_sem() will try to remove B. The only way to break from for(;;) is for (&un->list_proc == &ulp->list_proc) to be true which is not. Then exit_sem() will check if B->semid=-1 which is and will continue looping in for(;;) until the memory for B is reallocated and the value at B->semid is changed. At that point, exit_sem() will crash attempting to unlink B from the lists (this can be easily triggered by running the reproducer [1] a second time). To prove this scenario instrumentation was added to keep information about each sem_undo (un) struct that is removed per process and per semaphore set (sma). CPU0 CPU1 [caller holds sem_lock(sma for A)] ... freeary() exit_sem() ... ... ... sem_lock(sma for B) spin_lock(A->ulp->lock) ... list_del_rcu(un_A->list_proc) list_del_rcu(un_B->list_proc) Undo structures A and B have different semid and sem_lock() operations proceed. 
However they belong to the same list_proc list and they are removed at the same time. This results into ulp->list_proc.next pointing to the address of B which is already removed. After reverting commit a9795584 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()") the issue was no longer reproducible. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1694779 Link: http://lkml.kernel.org/r/20191211191318.11860-1-ioanna-maria.alifieraki@canonical.com Fixes: a9795584 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()") Signed-off-by:
Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com> Acked-by:
Manfred Spraul <manfred@colorfullife.com> Acked-by:
Herton R. Krzesinski <herton@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: <malat@debian.org> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jay Vosburgh <jay.vosburgh@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Feb 04, 2020
-
-
Alexey Dobriyan authored
The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in seq_file.h. Conversion rule is: llseek => proc_lseek unlocked_ioctl => proc_ioctl xxx => proc_xxx delete ".owner = THIS_MODULE" line [akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c] [sfr@canb.auug.org.au: fix kernel/sched/psi.c] Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2 Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Lu Shuaibing authored
A use of uninitialized memory in msgctl_down() because msqid64 in ksys_msgctl hasn't been initialized. The local | msqid64 | is created in ksys_msgctl() and then passed into msgctl_down(). Along the way msqid64 is never initialized before msgctl_down() checks msqid64->msg_qbytes. KUMSAN(KernelUninitializedMemorySantizer, a new error detection tool) reports: ================================================================== BUG: KUMSAN: use of uninitialized memory in msgctl_down+0x94/0x300 Read of size 8 at addr ffff88806bb97eb8 by task syz-executor707/2022 CPU: 0 PID: 2022 Comm: syz-executor707 Not tainted 5.2.0-rc4+ #63 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace: dump_stack+0x75/0xae __kumsan_report+0x17c/0x3e6 kumsan_report+0xe/0x20 msgctl_down+0x94/0x300 ksys_msgctl.constprop.14+0xef/0x260 do_syscall_64+0x7e/0x1f0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x4400e9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffd869e0598 EFLAGS: 00000246 ORIG_RAX: 0000000000000047 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004400e9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000401970 R13: 0000000000401a00 R14: 0000000000000000 R15: 0000000000000000 The buggy address belongs to the page: page:ffffea0001aee5c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x100000000000000() raw: 0100000000000000 0000000000000000 ffffffff01ae0101 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kumsan: bad access detected ================================================================== Syzkaller reproducer: msgctl$IPC_RMID(0x0, 0x0) C reproducer: // autogenerated by syzkaller (https://github.com/google/syzkaller) int main(void) { syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0); syscall(__NR_msgctl, 0, 0, 0); return 0; } [natechancellor@gmail.com: adjust indentation in ksys_msgctl] Link: https://github.com/ClangBuiltLinux/linux/issues/829 Link: http://lkml.kernel.org/r/20191218032932.37479-1-natechancellor@gmail.com Link: http://lkml.kernel.org/r/20190613014044.24234-1-shuaibinglu@126.com Signed-off-by:
Lu Shuaibing <shuaibinglu@126.com> Signed-off-by:
Nathan Chancellor <natechancellor@gmail.com> Suggested-by:
Arnd Bergmann <arnd@arndb.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: NeilBrown <neilb@suse.com> From: Andrew Morton <akpm@linux-foundation.org> Subject: drivers/block/null_blk_main.c: fix layout Each line here overflows 80 cols by exactly one character. Delete one tab per line to fix. Cc: Shaohua Li <shli@fb.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
Document and update the memory barriers in ipc/sem.c: - Add smp_store_release() to wake_up_sem_queue_prepare() and document why it is needed. - Read q->status using READ_ONCE+smp_acquire__after_ctrl_dep() as the pair for the barrier inside wake_up_sem_queue_prepare(). - Add comments to all barriers, and mention the rules in the block regarding locking. - Switch to using wake_q_add_safe(). Link: http://lkml.kernel.org/r/20191020123305.14715-6-manfred@colorfullife.com Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Cc: Waiman Long <longman@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: <1vier1@web.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
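A simplified sketch of the pairing described above (close to, but not literally, the upstream code): the waker publishes the wakeup result with release semantics and the sleeper reads it with READ_ONCE(), upgrading the read to acquire on the lockless fast path.

  /* waker, called with sem_perm.lock held */
  static void demo_wake_prepare(struct sem_queue *q, int error,
                                struct wake_q_head *wake_q)
  {
      get_task_struct(q->sleeper);
      /* pairs with the acquire on the sleeper side */
      smp_store_release(&q->status, error);
      wake_q_add_safe(wake_q, q->sleeper);
  }

  /* sleeper, lockless fast path after waking up */
  static int demo_wait_result(struct sem_queue *q)
  {
      int error = READ_ONCE(q->status);

      if (error != -EINTR) {
          /* control dependency + barrier acts as the acquire */
          smp_acquire__after_ctrl_dep();
          return error;
      }
      return -EINTR;    /* slow path: retake the lock and re-check */
  }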
-
Manfred Spraul authored
Transfer findings from ipc/mqueue.c: - A control barrier was missing for the lockless receive case. So in theory, not yet initialized data may have been copied to user space - obviously only for architectures where control barriers are not a NOP. - Use smp_store_release(). In theory, the refcount may have been decreased to 0 already when wake_q_add() tries to get a reference. Link: http://lkml.kernel.org/r/20191020123305.14715-5-manfred@colorfullife.com Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Cc: Waiman Long <longman@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: <1vier1@web.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
Update and document memory barriers for mqueue.c: - ewp->state is read without any locks, thus READ_ONCE is required. - Add smp_acquire__after_ctrl_dep() after the READ_ONCE; we need acquire semantics if the value is STATE_READY. - Use wake_q_add_safe(). - Document why __set_current_state() may be used: Reading task->state cannot happen before the wake_q_add() call, which happens while holding info->lock. Thus the spin_unlock() is the RELEASE, and the spin_lock() is the ACQUIRE. For completeness: there is also a 3 CPU scenario, if the to be woken up task is already on another wake_q. Then: - CPU1: spin_unlock() of the task that goes to sleep is the RELEASE - CPU2: the spin_lock() of the waker is the ACQUIRE - CPU2: smp_mb__before_atomic inside wake_q_add() is the RELEASE - CPU3: smp_mb__after_spinlock() inside try_to_wake_up() is the ACQUIRE Link: http://lkml.kernel.org/r/20191020123305.14715-4-manfred@colorfullife.com Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Reviewed-by:
Davidlohr Bueso <dbueso@suse.de> Cc: Waiman Long <longman@redhat.com> Cc: <1vier1@web.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
pipelined_send() and pipelined_receive() are identical, so merge them. [manfred@colorfullife.com: add changelog] Link: http://lkml.kernel.org/r/20191020123305.14715-3-manfred@colorfullife.com Signed-off-by:
Davidlohr Bueso <dave@stgolabs.net> Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Cc: <1vier1@web.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Dec 09, 2019
-
-
Pankaj Bharadiya authored
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by:
Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
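The mechanical substitution looks like this (illustrative struct):

  #include <linux/stddef.h>    /* sizeof_field() */

  struct demo { int a; char name[16]; };

  /* was: FIELD_SIZEOF(struct demo, name) */
  static const size_t demo_name_len = sizeof_field(struct demo, name);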
-
- Nov 15, 2019
-
-
Arnd Bergmann authored
The CONFIG_64BIT_TIME option is defined on all architectures, and can be removed for simplicity now. Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-
- Sep 26, 2019
-
-
Joel Fernandes (Google) authored
CONFIG_PROVE_RCU_LIST requires list_for_each_entry_rcu() to pass a lockdep expression if using srcu or locking for protection. It can only check regular RCU protection, all other protection needs to be passed as lockdep expression. Link: http://lkml.kernel.org/r/20190830231817.76862-2-joel@joelfernandes.org Signed-off-by:
Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Cc: Jonathan Derrick <jonathan.derrick@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
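A hedged, ipc-flavoured illustration of such a lockdep expression (fragment; names follow ipc/sem.c but are abbreviated): when the traversal is protected by a lock rather than rcu_read_lock(), the condition is passed as the optional fourth argument so CONFIG_PROVE_RCU_LIST can verify it.

  list_for_each_entry_rcu(un, &ulp->list_proc, list_proc,
                          lockdep_is_held(&ulp->lock)) {
      /* ...process each undo structure... */
  }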
-