- Dec 15, 2020
-
-
Thomas Gleixner authored
Most users of kstat_irqs_cpu() have the irq descriptor already. No point in calling into the core code and looking it up once more. Use it in per_cpu_count_show() to start with. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201210194043.362094758@linutronix.de
-
Thomas Gleixner authored
No more users outside the core code. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201210194043.268774449@linutronix.de
-
Thomas Gleixner authored
Both the per cpu stats and the accumulated count are accessed lockless and can be concurrently modified. That's intentional and the stats are a rough estimate anyway. Annotate them with data_race(). Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201210194043.067097663@linutronix.de
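A minimal sketch of the data_race() annotation pattern described above; this is illustrative kernel C, not the exact kstat code, and it assumes a __percpu counter array like the one irq descriptors carry:

    #include <linux/compiler.h>   /* data_race() */
    #include <linux/cpumask.h>
    #include <linux/percpu.h>

    /*
     * Sum lockless per-CPU interrupt counters. Concurrent updates are an
     * accepted (intentional) data race, so tell KCSAN about it explicitly.
     */
    static unsigned int sum_irq_counts(unsigned int __percpu *counts)
    {
            unsigned int cpu, sum = 0;

            for_each_possible_cpu(cpu)
                    sum += data_race(*per_cpu_ptr(counts, cpu));

            return sum;
    }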
-
Thomas Gleixner authored
irq_set_lockdep_class() is used from modules and requires irq_to_desc() to be exported. Move it into the core code which lifts another requirement for the export. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201210194042.860029489@linutronix.de
-
Thomas Gleixner authored
These checks are used by modules and prevent the removal of the export of irq_to_desc(). Move the accessor into the core. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201210194042.703779349@linutronix.de
-
Thomas Gleixner authored
This function uses irq_to_desc() and is going to be used by modules to replace the open coded irq_to_desc() (ab)usage. The final goal is to remove the export of irq_to_desc() so drivers cannot fiddle with it anymore. Move it into the core code and fix up the usage sites to include the proper header. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201210194042.548936472@linutronix.de
-
- Dec 11, 2020
-
-
John Garry authored
Add a function to allow the affinity of an interrupt to be switched to managed, such that interrupts allocated for platform devices may be managed. This new interface has certain limitations, and attempts to use it will fail in the following circumstances: the kernel is configured for generic IRQ reservation mode (config GENERIC_IRQ_RESERVATION_MODE), because that could conflict with managed vs. non-managed interrupt accounting; the interrupt is already started, which should not be the case during init; or the interrupt is already configured as managed, which would mean double init. Suggested-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
John Garry <john.garry@huawei.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1606905417-183214-2-git-send-email-john.garry@huawei.com
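The commit message does not name the new interface; assuming it is the irq_update_affinity_desc() helper added by this series (an assumption on the name), a driver-side usage sketch could look like this:

    #include <linux/cpumask.h>
    #include <linux/interrupt.h>

    /*
     * Mark an already-allocated (but not yet started) platform-device IRQ
     * as managed and give it an affinity mask.
     */
    static int make_irq_managed(unsigned int irq, const struct cpumask *mask)
    {
            struct irq_affinity_desc desc = { .is_managed = 1 };

            cpumask_copy(&desc.mask, mask);

            /*
             * Expected to fail when GENERIC_IRQ_RESERVATION_MODE is enabled,
             * when the interrupt is already started, or when it is already
             * managed, per the limitations listed above.
             */
            return irq_update_affinity_desc(irq, &desc);
    }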
-
Valentin Schneider authored
handle_percpu_devid_fasteoi_ipi() has no more users, and handle_percpu_devid_irq() can do all that it was supposed to do. Get rid of it. This reverts commit c5e5ec03. Signed-off-by:
Valentin Schneider <valentin.schneider@arm.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201109094121.29975-6-valentin.schneider@arm.com
-
- Dec 02, 2020
-
-
Frederic Weisbecker authored
Now that account_hardirq_enter() is called after HARDIRQ_OFFSET has been incremented, there is nothing left that prevents us from also moving tick_irq_enter() after HARDIRQ_OFFSET is incremented. The desired outcome is to remove the nasty hack that prevents softirqs from being raised through ksoftirqd instead of the hardirq bottom half. Also tick_irq_enter() then becomes appropriately covered by lockdep. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201202115732.27827-6-frederic@kernel.org
-
Frederic Weisbecker authored
IRQ time entry is currently accounted before HARDIRQ_OFFSET or SOFTIRQ_OFFSET are incremented. This is convenient to decide to which index the cputime to account is dispatched. Unfortunately it prevents tick_irq_enter() from being called under HARDIRQ_OFFSET because tick_irq_enter() has to be called before the IRQ entry accounting due to the necessary clock catch up. As a result we don't benefit from appropriate lockdep coverage on tick_irq_enter(). To prepare for fixing this, move the IRQ entry cputime accounting after the preempt offset is incremented. This requires the cputime dispatch code to handle the extra offset. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20201202115732.27827-5-frederic@kernel.org
-
Frederic Weisbecker authored
The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE all have their own version of irq time accounting that dispatches the cputime to the appropriate index: hardirq, softirq, system, idle, guest... from an all-in-one function. Instead of having these ad-hoc versions, move the cputime destination dispatch decision to the core code and leave only the actual per-index cputime accounting to the architecture. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201202115732.27827-4-frederic@kernel.org
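A rough sketch of the kind of dispatch decision being moved into the core code; this is simplified, keys only off the preempt counter, and ignores cases the real code handles (steal time, guest, idle):

    #include <linux/kernel_stat.h>
    #include <linux/preempt.h>
    #include <linux/sched.h>

    /* Decide which cputime index an irq-time delta is charged to. */
    static void account_irqtime_sketch(struct task_struct *p, u64 cputime)
    {
            if (hardirq_count())
                    account_system_index_time(p, cputime, CPUTIME_IRQ);
            else if (in_serving_softirq())
                    account_system_index_time(p, cputime, CPUTIME_SOFTIRQ);
            else
                    account_system_index_time(p, cputime, CPUTIME_SYSTEM);
    }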
-
Frederic Weisbecker authored
s390 has its own version of IRQ entry accounting because it doesn't account the idle time the same way the other architectures do. Only the actual idle sleep time is accounted as idle time, the rest of the idle task execution is accounted as system time. Make the generic IRQ entry accounting aware of architectures that have their own way of accounting idle time and convert s390 to use it. This prepares s390 to get involved in further consolidations of IRQ time accounting. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201202115732.27827-3-frederic@kernel.org
-
Frederic Weisbecker authored
account_irq_enter_time() and account_irq_exit_time() are not called from modules. EXPORT_SYMBOL_GPL() can be safely removed from the IRQ cputime accounting functions called from there. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201202115732.27827-2-frederic@kernel.org
-
- Nov 30, 2020
-
-
Marc Zyngier authored
When an interrupt allocation fails for N interrupts, it is pretty common for the error handling code to free the same number of interrupts, no matter how many interrupts have actually been allocated. This may result in the domain freeing code being unexpectedly called for interrupts that have no mapping in that domain. Things end pretty badly. Instead, add some checks to irq_domain_free_irqs_hierarchy() to make sure it does not follow the hierarchy if no mapping exists for a given interrupt. Fixes: 6a6544e5 ("genirq/irqdomain: Remove auto-recursive hierarchy support") Signed-off-by:
Marc Zyngier <maz@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201129135551.396777-1-maz@kernel.org
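A sketch of the kind of check described above, close in spirit to the fix but not claimed to be the verbatim patch: only call the domain's free callback for interrupts that actually have irq_data in that domain.

    #include <linux/irqdomain.h>

    static void free_irqs_hierarchy_sketch(struct irq_domain *domain,
                                           unsigned int irq_base,
                                           unsigned int nr_irqs)
    {
            unsigned int i;

            if (!domain->ops->free)
                    return;

            for (i = 0; i < nr_irqs; i++) {
                    /* Skip virqs that were never mapped in this domain. */
                    if (irq_domain_get_irq_data(domain, irq_base + i))
                            domain->ops->free(domain, irq_base + i, 1);
            }
    }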
-
- Nov 27, 2020
-
-
John Ogness authored
Any record with a trailing newline (LOG_NEWLINE flag) cannot be continued because the newline has been stripped and will not be visible if the message is appended. This was already handled correctly when committing in log_output() but was not handled correctly when committing in log_store(). Fixes: f5f022e5 ("printk: reimplement log_cont using record extension") Link: https://lore.kernel.org/r/20201126114836.14750-1-john.ogness@linutronix.de Reported-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by:
John Ogness <john.ogness@linutronix.de> Tested-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by:
Petr Mladek <pmladek@suse.com> Signed-off-by:
Petr Mladek <pmladek@suse.com>
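A sketch of the idea, assuming the fix mirrors what the log_output() path already did; the helper below is illustrative, while the ringbuffer commit functions are the printk-internal ones:

    #include "printk_ringbuffer.h"      /* kernel/printk internal: prb_commit() & co. */

    /* Commit a reserved record; finalize it when the text ended in a newline. */
    static void commit_record_sketch(struct prb_reserved_entry *e, unsigned int flags)
    {
            if (flags & LOG_NEWLINE)    /* printk-internal record flag */
                    prb_final_commit(e);        /* record can no longer be continued */
            else
                    prb_commit(e);              /* may still be extended by LOG_CONT */
    }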
-
- Nov 24, 2020
-
-
Peter Zijlstra authored
We call arch_cpu_idle() with RCU disabled, but then use local_irq_{en,dis}able(), which invokes tracing, which relies on RCU. Switch all arch_cpu_idle() implementations to use raw_local_irq_{en,dis}able() and carefully manage the lockdep,rcu,tracing state like we do in entry. (XXX: we really should change arch_cpu_idle() to not return with interrupts enabled) Reported-by:
Sven Schnelle <svens@linux.ibm.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Mark Rutland <mark.rutland@arm.com> Tested-by:
Mark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
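A minimal sketch of the pattern, not any particular architecture's code; cpu_do_wait_for_interrupt() is a hypothetical stand-in for the arch-specific wait instruction:

    #include <linux/irqflags.h>

    static inline void cpu_do_wait_for_interrupt(void) { }     /* placeholder */

    void arch_cpu_idle(void)
    {
            cpu_do_wait_for_interrupt();

            /* Raw variant: no irq-enable tracepoint fires while RCU is not watching. */
            raw_local_irq_enable();
    }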
-
- Nov 23, 2020
-
-
Thomas Gleixner authored
To prepare for adding a RT aware variant of softirq serialization and processing move related code into one section so the necessary #ifdeffery is reduced to one. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20201113141733.974214480@linutronix.de
-
- Nov 19, 2020
-
-
Luo Meng authored
Fix a mutex_unlock() issue: when copy_from_user() fails, the error path unlocks a mutex that was never locked, because mutex_lock() is not called before copy_from_user(). Fixes: 4b1a29a7 ("error-injection: Support fault injection framework") Reported-by:
Hulk Robot <hulkci@huawei.com> Signed-off-by:
Luo Meng <luomeng12@huawei.com> Signed-off-by:
Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Acked-by:
Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/bpf/160570737118.263807.8358435412898356284.stgit@devnote2
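A sketch of the general bug pattern being fixed, not the exact fei_write() code: if copy_from_user() fails before the mutex is taken, the error path must not go through the unlock.

    #include <linux/mutex.h>
    #include <linux/slab.h>
    #include <linux/uaccess.h>

    static DEFINE_MUTEX(example_lock);

    static ssize_t example_write(const char __user *ubuf, size_t count)
    {
            char *buf = kzalloc(count + 1, GFP_KERNEL);
            ssize_t ret = count;

            if (!buf)
                    return -ENOMEM;

            if (copy_from_user(buf, ubuf, count)) {
                    /*
                     * Before the fix, this jumped to a label that also did
                     * mutex_unlock(), although the mutex was never taken here.
                     */
                    kfree(buf);
                    return -EFAULT;
            }

            mutex_lock(&example_lock);
            /* ... operate on buf ... */
            mutex_unlock(&example_lock);
            kfree(buf);
            return ret;
    }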
-
Daniel Xu authored
do_strncpy_from_user() may copy some extra bytes after the NUL terminator into the destination buffer. This usually does not matter for normal string operations. However, when BPF programs key BPF maps with strings, this matters a lot. A BPF program may read strings from user memory by calling the bpf_probe_read_user_str() helper which eventually calls do_strncpy_from_user(). The program can then key a map with the destination buffer. BPF map keys are fixed-width and string-agnostic, meaning that map keys are treated as a set of bytes. The issue is when do_strncpy_from_user() overcopies bytes after the NUL terminator, it can result in seemingly identical strings occupying multiple slots in a BPF map. This behavior is subtle and totally unexpected by the user. This commit masks out the bytes following the NUL while preserving long-sized stride in the fast path. Fixes: 6ae08ae3 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers") Signed-off-by:
Daniel Xu <dxu@dxuuu.xyz> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/21efc982b3e9f2f7b0379eed642294caaa0c27a7.1605642949.git.dxu@dxuuu.xyz
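An illustrative sketch of the masking idea, assuming a little-endian byte layout and not the kernel's exact implementation: once the NUL has been located in the current long-sized word, clear every byte after it before the word is stored.

    #include <linux/bits.h>
    #include <linux/types.h>

    /*
     * Keep bytes 0..nul_index of 'word' and clear the rest, so the stale
     * user bytes after the terminator never reach the destination buffer.
     */
    static unsigned long mask_after_nul(unsigned long word, unsigned int nul_index)
    {
            unsigned int keep_bits = (nul_index + 1) * 8;

            if (keep_bits >= BITS_PER_LONG)
                    return word;

            return word & ((1UL << keep_bits) - 1);
    }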
-
- Nov 17, 2020
-
-
Mickaël Salaün authored
Replace the use of security_capable(current_cred(), ...) with ns_capable_noaudit(), which sets PF_SUPERPRIV. Since commit 98f368e9 ("kernel: Add noaudit variant of ns_capable()"), a new ns_capable_noaudit() helper is available. Let's use it! Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tyler Hicks <tyhicks@linux.microsoft.com> Cc: Will Drewry <wad@chromium.org> Cc: stable@vger.kernel.org Fixes: e2cfabdf ("seccomp: add system call filtering using BPF") Signed-off-by:
Mickaël Salaün <mic@linux.microsoft.com> Reviewed-by:
Jann Horn <jannh@google.com> Signed-off-by:
Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201030123849.770769-3-mic@digikod.net
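A simplified sketch of the capability-check pattern after the change (not the verbatim seccomp code); ns_capable_noaudit() both skips the audit record and sets PF_SUPERPRIV on success.

    #include <linux/capability.h>
    #include <linux/cred.h>
    #include <linux/sched.h>

    /* May this task install a filter without NO_NEW_PRIVS? */
    static bool may_install_filter_sketch(void)
    {
            return task_no_new_privs(current) ||
                   ns_capable_noaudit(current_user_ns(), CAP_SYS_ADMIN);
    }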
-
Mickaël Salaün authored
Commit 69f594a3 ("ptrace: do not audit capability check when outputing /proc/pid/stat") replaced the use of ns_capable() with has_ns_capability{,_noaudit}() which doesn't set PF_SUPERPRIV. Commit 6b3ad664 ("ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()") replaced has_ns_capability{,_noaudit}() with security_capable(), which doesn't set PF_SUPERPRIV either. Since commit 98f368e9 ("kernel: Add noaudit variant of ns_capable()"), a new ns_capable_noaudit() helper is available. Let's use it! As a result, the signature of ptrace_has_cap() is restored to its original one. Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Eric Paris <eparis@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: Tyler Hicks <tyhicks@linux.microsoft.com> Cc: stable@vger.kernel.org Fixes: 6b3ad664 ("ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()") Fixes: 69f594a3 ("ptrace: do not audit capability check when outputing /proc/pid/stat") Signed-off-by:
Mickaël Salaün <mic@linux.microsoft.com> Reviewed-by:
Jann Horn <jannh@google.com> Signed-off-by:
Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201030123849.770769-2-mic@digikod.net
-
Boqun Feng authored
A warning was hit when running xfstests/generic/068 in a Hyper-V guest: [...] ------------[ cut here ]------------ [...] DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled()) [...] WARNING: CPU: 2 PID: 1350 at kernel/locking/lockdep.c:5280 check_flags.part.0+0x165/0x170 [...] ... [...] Workqueue: events pwq_unbound_release_workfn [...] RIP: 0010:check_flags.part.0+0x165/0x170 [...] ... [...] Call Trace: [...] lock_is_held_type+0x72/0x150 [...] ? lock_acquire+0x16e/0x4a0 [...] rcu_read_lock_sched_held+0x3f/0x80 [...] __send_ipi_one+0x14d/0x1b0 [...] hv_send_ipi+0x12/0x30 [...] __pv_queued_spin_unlock_slowpath+0xd1/0x110 [...] __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20 [...] .slowpath+0x9/0xe [...] lockdep_unregister_key+0x128/0x180 [...] pwq_unbound_release_workfn+0xbb/0xf0 [...] process_one_work+0x227/0x5c0 [...] worker_thread+0x55/0x3c0 [...] ? process_one_work+0x5c0/0x5c0 [...] kthread+0x153/0x170 [...] ? __kthread_bind_mask+0x60/0x60 [...] ret_from_fork+0x1f/0x30 The cause of the problem is that we have the call chain lockdep_unregister_key() -> <irq disabled by raw_local_irq_save()> lockdep_unlock() -> arch_spin_unlock() -> __pv_queued_spin_unlock_slowpath() -> pv_kick() -> __send_ipi_one() -> trace_hyperv_send_ipi_one(). This particular warning is triggered because Hyper-V has a trace point in IPI sending, but in general arch_spin_unlock() may call another function having a trace point in it, so put the arch_spin_lock() and arch_spin_unlock() after lock_recursion protection to fix this problem and avoid similar problems. Signed-off-by:
Boqun Feng <boqun.feng@gmail.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201113110512.1056501-1-boqun.feng@gmail.com
-
Juri Lelli authored
Glenn reported that "an application [he developed produces] a BUG in deadline.c when a SCHED_DEADLINE task contends with CFS tasks on nested PTHREAD_PRIO_INHERIT mutexes. I believe the bug is triggered when a CFS task that was boosted by a SCHED_DEADLINE task boosts another CFS task (nested priority inheritance)." ------------[ cut here ]------------ kernel BUG at kernel/sched/deadline.c:1462! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 12 PID: 19171 Comm: dl_boost_bug Tainted: ... Hardware name: ... RIP: 0010:enqueue_task_dl+0x335/0x910 Code: ... RSP: 0018:ffffc9000c2bbc68 EFLAGS: 00010002 RAX: 0000000000000009 RBX: ffff888c0af94c00 RCX: ffffffff81e12500 RDX: 000000000000002e RSI: ffff888c0af94c00 RDI: ffff888c10b22600 RBP: ffffc9000c2bbd08 R08: 0000000000000009 R09: 0000000000000078 R10: ffffffff81e12440 R11: ffffffff81e1236c R12: ffff888bc8932600 R13: ffff888c0af94eb8 R14: ffff888c10b22600 R15: ffff888bc8932600 FS: 00007fa58ac55700(0000) GS:ffff888c10b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa58b523230 CR3: 0000000bf44ab003 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ? intel_pstate_update_util_hwp+0x13/0x170 rt_mutex_setprio+0x1cc/0x4b0 task_blocks_on_rt_mutex+0x225/0x260 rt_spin_lock_slowlock_locked+0xab/0x2d0 rt_spin_lock_slowlock+0x50/0x80 hrtimer_grab_expiry_lock+0x20/0x30 hrtimer_cancel+0x13/0x30 do_nanosleep+0xa0/0x150 hrtimer_nanosleep+0xe1/0x230 ? __hrtimer_init_sleeper+0x60/0x60 __x64_sys_nanosleep+0x8d/0xa0 do_syscall_64+0x4a/0x100 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fa58b52330d ... ---[ end trace 0000000000000002 ]--- He also provided a simple reproducer creating the situation below. The execution order of the locking steps is the following (N1 and N2 are non-deadline tasks, D1 is a deadline task, M1 and M2 are mutexes that are enabled with priority inheritance; time moves forward as the sequence goes down): N1 locks M1; N2 locks M2; D1 then locks M2 and blocks, boosting N2; finally N2 locks M1 and blocks on N1, and the bug triggers when the D1-boosted N2 in turn boosts N1. Daniel reported a similar situation as well, by just letting ksoftirqd run with DEADLINE (and eventually block on a mutex). The problem is that boosted entities (Priority Inheritance) use the static DEADLINE parameters of the top priority waiter. However, there might be cases where the top waiter could be a non-DEADLINE entity that is currently boosted by a DEADLINE entity from a different lock chain (i.e., nested priority chains involving entities of non-DEADLINE classes). In this case, the top waiter static DEADLINE parameters could be null (initialized to 0 at fork()) and replenish_dl_entity() would hit a BUG(). Fix this by keeping track of the original donor and using its parameters when a task is boosted. Reported-by:
Glenn Elliott <glenn@aurora.tech> Reported-by:
Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by:
Juri Lelli <juri.lelli@redhat.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by:
Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201117061432.517340-1-juri.lelli@redhat.com
-
Peter Zijlstra authored
There is a race between schedule() and ttwu(): schedule() calls deactivate_task() for the task going to sleep; a concurrent ttwu() then sees p->on_rq as false and decrements task_rq(p)->nr_iowait; only afterwards does schedule() observe prev->in_iowait and increment rq->nr_iowait. This allows nr_iowait to be decremented before it gets incremented, resulting in more dodgy IO-wait numbers than usual. Note that because we can now do ttwu_queue_wakelist() before p->on_cpu==0, we lose the natural ordering and have to further delay the decrement. Fixes: c6e7bd7a ("sched/core: Optimize ttwu() spinning on p->on_cpu") Reported-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by:
Mel Gorman <mgorman@techsingularity.net> Link: https://lkml.kernel.org/r/20201117093829.GD3121429@hirez.programming.kicks-ass.net
-
Quentin Perret authored
enqueue_task_fair() attempts to skip the overutilized update for new tasks as their util_avg is not accurate yet. However, the flag we check to do so is overwritten earlier on in the function, which makes the condition pretty much a nop. Fix this by saving the flag early on. Fixes: 2802bf3c ("sched/fair: Add over-utilization/tipping point indicator") Reported-by:
Rick Yiu <rickyiu@google.com> Signed-off-by:
Quentin Perret <qperret@google.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by:
Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20201112111201.2081902-1-qperret@google.com
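A sketch of the fix pattern, simplified from enqueue_task_fair(): latch the flag-derived state before anything in the function can rewrite flags.

    #include "sched.h"  /* kernel/sched internal: struct rq, ENQUEUE_WAKEUP */

    static void enqueue_sketch(struct rq *rq, struct task_struct *p, int flags)
    {
            /* Latch this before the body of the function rewrites 'flags'. */
            int task_new = !(flags & ENQUEUE_WAKEUP);

            /* ... the actual enqueue work, which may modify 'flags' ... */

            if (!task_new) {
                    /* only now is it safe to update the overutilized indicator */
            }
    }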
-
- Nov 16, 2020
-
-
Mauro Carvalho Chehab authored
Some identifiers have different names between their prototypes and the kernel-doc markup. Signed-off-by:
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/13a44f4f0c3135e14b16ae8fcce4af1eab27cb5f.1605521731.git.mchehab+huawei@kernel.org
-
- Nov 15, 2020
-
-
Thomas Gleixner authored
No users outside of the core code. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/87a6vja7mb.fsf@nanos.tec.linutronix.de
-
- Nov 14, 2020
-
-
Thomas Gleixner authored
Commit bb9d8126 ("arch: remove tile port") removed the last user of this cruft two years ago... Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/87eekvac06.fsf@nanos.tec.linutronix.de
-
Christophe Leroy authored
Before commit 3f388f28 ("panic: dump registers on panic_on_warn"), __warn() was calling show_regs() when regs was not NULL, and show_stack() otherwise. After that commit, show_stack() is called regardless of whether show_regs() has been called or not, leading to duplicated Call Trace: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at arch/powerpc/mm/nohash/8xx.c:186 mmu_mark_initmem_nx+0x24/0x94 CPU: 0 PID: 1 Comm: swapper Not tainted 5.10.0-rc2-s3k-dev-01375-gf46ec0d3ecbd-dirty #4092 NIP: c00128b4 LR: c0010228 CTR: 00000000 REGS: c9023e40 TRAP: 0700 Not tainted (5.10.0-rc2-s3k-dev-01375-gf46ec0d3ecbd-dirty) MSR: 00029032 <EE,ME,IR,DR,RI> CR: 24000424 XER: 00000000 GPR00: c0010228 c9023ef8 c2100000 0074c000 ffffffff 00000000 c2151000 c07b3880 GPR08: ff000900 0074c000 c8000000 c33b53a8 24000822 00000000 c0003a20 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 GPR24: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00800000 NIP [c00128b4] mmu_mark_initmem_nx+0x24/0x94 LR [c0010228] free_initmem+0x20/0x58 Call Trace: free_initmem+0x20/0x58 kernel_init+0x1c/0x114 ret_from_kernel_thread+0x14/0x1c Instruction dump: 7d291850 7d234b78 4e800020 9421ffe0 7c0802a6 bfc10018 3fe0c060 3bff0000 3fff4080 3bffffff 90010024 57ff0010 <0fe00000> 392001cd 7c3e0b78 953e0008 CPU: 0 PID: 1 Comm: swapper Not tainted 5.10.0-rc2-s3k-dev-01375-gf46ec0d3ecbd-dirty #4092 Call Trace: __warn+0x8c/0xd8 (unreliable) report_bug+0x11c/0x154 program_check_exception+0x1dc/0x6e0 ret_from_except_full+0x0/0x4 --- interrupt: 700 at mmu_mark_initmem_nx+0x24/0x94 LR = free_initmem+0x20/0x58 free_initmem+0x20/0x58 kernel_init+0x1c/0x114 ret_from_kernel_thread+0x14/0x1c ---[ end trace 31702cd2a9570752 ]--- Only call show_stack() when regs is NULL. Fixes: 3f388f28 ("panic: dump registers on panic_on_warn") Signed-off-by:
Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Link: https://lkml.kernel.org/r/e8c055458b080707f1bc1a98ff8bea79d0cec445.1604748361.git.christophe.leroy@csgroup.eu Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
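A sketch of the intended control flow, simplified; dump_stack() stands in here for whatever backtrace helper __warn() actually uses when no register set is available.

    #include <linux/printk.h>
    #include <linux/sched/debug.h>      /* show_regs() */

    static void warn_backtrace_sketch(struct pt_regs *regs)
    {
            if (regs)
                    show_regs(regs);    /* already includes a call trace */
            else
                    dump_stack();       /* only dump the stack when there are no regs */
    }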
-
Santosh Sivaraj authored
Define watchdog_allowed_mask only when SOFTLOCKUP_DETECTOR is enabled. Fixes: 7feeb9cd ("watchdog/sysctl: Clean up sysctl variable name space") Signed-off-by:
Santosh Sivaraj <santosh@fossix.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Reviewed-by:
Petr Mladek <pmladek@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20201106015025.1281561-1-santosh@fossix.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Matteo Croce authored
Limit the CPU number to num_possible_cpus(), because setting it to a value lower than INT_MAX but higher than NR_CPUS produces the following error on reboot and shutdown: BUG: unable to handle page fault for address: ffffffff90ab1bb0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1c09067 P4D 1c09067 PUD 1c0a063 PMD 0 Oops: 0000 [#1] SMP CPU: 1 PID: 1 Comm: systemd-shutdow Not tainted 5.9.0-rc8-kvm #110 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 RIP: 0010:migrate_to_reboot_cpu+0xe/0x60 Code: ea ea 00 48 89 fa 48 c7 c7 30 57 f1 81 e9 fa ef ff ff 66 2e 0f 1f 84 00 00 00 00 00 53 8b 1d d5 ea ea 00 e8 14 33 fe ff 89 da <48> 0f a3 15 ea fc bd 00 48 89 d0 73 29 89 c2 c1 e8 06 65 48 8b 3c RSP: 0018:ffffc90000013e08 EFLAGS: 00010246 RAX: ffff88801f0a0000 RBX: 0000000077359400 RCX: 0000000000000000 RDX: 0000000077359400 RSI: 0000000000000002 RDI: ffffffff81c199e0 RBP: ffffffff81c1e3c0 R08: ffff88801f41f000 R09: ffffffff81c1e348 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 00007f32bedf8830 R14: 00000000fee1dead R15: 0000000000000000 FS: 00007f32bedf8980(0000) GS:ffff88801f480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff90ab1bb0 CR3: 000000001d057000 CR4: 00000000000006a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __do_sys_reboot.cold+0x34/0x5b do_syscall_64+0x2d/0x40 Fixes: 1b3a5d02 ("reboot: move arch/x86 reboot= handling to generic kernel") Signed-off-by:
Matteo Croce <mcroce@microsoft.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fabian Frederick <fabf@skynet.be> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Kees Cook <keescook@chromium.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201103214025.116799-3-mcroce@linux.microsoft.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Matteo Croce authored
Patch series "fix parsing of reboot= cmdline", v3. The parsing of the reboot= cmdline has two major errors: - a missing bound check can crash the system on reboot - parsing of the cpu number only works if specified last Fix both. This patch (of 2): This reverts commit 616feab7. kstrtoint() and simple_strtoul() have a subtle difference which makes them non interchangeable: if a non digit character is found amid the parsing, the former will return an error, while the latter will just stop parsing, e.g. simple_strtoul("123xyx") = 123. The kernel cmdline reboot= argument allows to specify the CPU used for rebooting, with the syntax `s####` among the other flags, e.g. "reboot=warm,s31,force", so if this flag is not the last given, it's silently ignored as well as the subsequent ones. Fixes: 616feab7 ("kernel/reboot.c: convert simple_strtoul to kstrtoint") Signed-off-by:
Matteo Croce <mcroce@microsoft.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Petr Mladek <pmladek@suse.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Mike Rapoport <rppt@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201103214025.116799-2-mcroce@linux.microsoft.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
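The behavioural difference can be shown with a tiny example; the strings are illustrative, the two helpers are the real kernel APIs:

    #include <linux/kernel.h>
    #include <linux/printk.h>

    static void parse_example(void)
    {
            unsigned long v;
            char *end;
            int n = 0, ret;

            /* Stops at the first non-digit: v == 123, end points at "xyx". */
            v = simple_strtoul("123xyx", &end, 10);

            /* Rejects trailing garbage: ret == -EINVAL, n is left untouched. */
            ret = kstrtoint("123xyx", 10, &n);

            pr_info("simple_strtoul: %lu (rest: \"%s\"), kstrtoint: %d\n", v, end, ret);
    }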
-
Dmitrii Banshchikov authored
Currently verifier enforces return code checks for subprograms in the same manner as it does for program entry points. This prevents returning arbitrary scalar values from subprograms. Scalar type of returned values is checked by btf_prepare_func_args() and hence it should be safe to allow only scalars for now. Relax return code checks for subprograms and allow any correct scalar values. Fixes: 51c39bb1 (bpf: Introduce function-by-function verification) Signed-off-by:
Dmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Acked-by:
Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201113171756.90594-1-me@ubique.spb.ru
-
- Nov 11, 2020
-
-
Kaixu Xia authored
The unsigned variable datasec_id is assigned a return value from the call to check_pseudo_btf_id(), which may return a negative error code. This fixes the following coccicheck warning: ./kernel/bpf/verifier.c:9616:5-15: WARNING: Unsigned expression compared with zero: datasec_id > 0 Fixes: eaa6bcb7 ("bpf: Introduce bpf_per_cpu_ptr()") Reported-by:
Tosk Robot <tencent_os_robot@tencent.com> Signed-off-by:
Kaixu Xia <kaixuxia@tencent.com> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Acked-by:
Andrii Nakryiko <andrii@kernel.org> Acked-by:
John Fastabend <john.fastabend@gmail.com> Cc: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/1605071026-25906-1-git-send-email-kaixuxia@tencent.com
-
Paul E. McKenney authored
The try_invoke_on_locked_down_task() function requires that interrupts be enabled, but it is called with interrupts disabled from rcu_print_task_stall(), resulting in an "IRQs not enabled as expected" diagnostic. This commit therefore updates rcu_print_task_stall() to accumulate a list of the first few tasks while holding the current leaf rcu_node structure's ->lock, then releases that lock and only then uses try_invoke_on_locked_down_task() to attempt to obtain per-task detailed information. Of course, as soon as ->lock is released, the task might exit, so the get_task_struct() function is used to prevent the task structure from going away in the meantime. Link: https://lore.kernel.org/lkml/000000000000903d5805ab908fc4@google.com/ Fixes: 5bef8da6 ("rcu: Add per-task state to RCU CPU stall warnings") Reported-by:
<syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com> Reported-by:
<syzbot+f04854e1c5c9e913cc27@syzkaller.appspotmail.com> Tested-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Paul E. McKenney <paulmck@kernel.org>
-
- Nov 10, 2020
-
-
Colin Ian King authored
Reading /proc/sys/kernel/sched_domain/cpu*/domain0/flags multiple times with small reads causes oopses with slub corruption issues because the kfree is freeing an offset from a previous allocation. Fix this by adding a new pointer 'buf' for the allocation and kfree, and using the temporary pointer tmp to handle memory copies of the buf offsets. Fixes: 5b9f8ff7 ("sched/debug: Output SD flag names rather than their values") Reported-by:
Jeff Bastian <jbastian@redhat.com> Signed-off-by:
Colin Ian King <colin.king@canonical.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20201029151103.373410-1-colin.king@canonical.com
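A generic sketch of the allocation pattern described, not the exact sd_flags_show() code: keep the pointer returned by kmalloc() for kfree() and advance only a separate cursor.

    #include <linux/slab.h>
    #include <linux/string.h>

    static int build_string_sketch(const char *const *parts, int n, char *out, size_t len)
    {
            char *buf, *tmp;
            int i;

            buf = kmalloc(len, GFP_KERNEL);
            if (!buf)
                    return -ENOMEM;

            tmp = buf;                      /* the cursor may move ... */
            for (i = 0; i < n; i++)
                    tmp += scnprintf(tmp, len - (tmp - buf), "%s ", parts[i]);

            memcpy(out, buf, tmp - buf);
            kfree(buf);                     /* ... but always free the original pointer */
            return tmp - buf;
    }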
-
Vincent Guittot authored
In the fast wakeup path, the scheduler always checks whether the local or prev CPUs are good candidates for the task before looking for other CPUs in the domain. With commit b7a33161 ("sched/fair: Add asymmetric CPU capacity wakeup scan") the heterogeneous system gains a dedicated path but doesn't try to reuse the prev CPU whenever possible. If the previous CPU is idle and belongs to the LLC domain, we should check it first before looking for another CPU because it remains one of the best candidates and this also stabilizes task placement on the system. This change aligns the asymmetric path behavior with the symmetric one and reduces cases where the task migrates across all CPUs of the sd_asym_cpucapacity domains at wakeup. This change does not impact normal EAS mode but only the overloaded case or when EAS is not used. On hikey960 with the performance governor (EAS disabled), ./perf bench sched pipe -T -l 50000 shows 999364 migrations and 149313 ops/sec (+/-0.28%) on mainline vs. 0 migrations and 182587 ops/sec (+/- 0.40) with the patch, a +22% improvement. On hikey with the performance governor, the same benchmark shows 0 migrations and 47721 ops/sec (+/-0.76%) on mainline vs. 0 migrations and 47899 ops/sec (+/- 0.56) with the patch, +0.4%. According to the test on hikey, the patch doesn't impact symmetric systems compared to the current implementation (only tested on arm64). Also read the uclamped value of the task's utilization at most twice instead of each time we compare the task's utilization with the cpu's capacity. Fixes: b7a33161 ("sched/fair: Add asymmetric CPU capacity wakeup scan") Signed-off-by:
Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by:
Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by:
Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20201029161824.26389-1-vincent.guittot@linaro.org
-
Vincent Guittot authored
schbench shows latency increase for 95 percentile above since: commit 0b0695f2 ("sched/fair: Rework load_balance()") Align the behavior of the load balancer with the wake up path, which tries to select an idle CPU which belongs to the LLC for a waking task. calculate_imbalance() will use nr_running instead of the spare capacity when CPUs share resources (ie cache) at the domain level. This will ensure a better spread of tasks on idle CPUs. Running schbench on a hikey (8cores arm64) shows the problem: tip/sched/core : schbench -m 2 -t 4 -s 10000 -c 1000000 -r 10 Latency percentiles (usec) 50.0th: 33 75.0th: 45 90.0th: 51 95.0th: 4152 *99.0th: 14288 99.5th: 14288 99.9th: 14288 min=0, max=14276 tip/sched/core + patch : schbench -m 2 -t 4 -s 10000 -c 1000000 -r 10 Latency percentiles (usec) 50.0th: 34 75.0th: 47 90.0th: 52 95.0th: 78 *99.0th: 94 99.5th: 94 99.9th: 94 min=0, max=94 Fixes: 0b0695f2 ("sched/fair: Rework load_balance()") Reported-by:
Chris Mason <clm@fb.com> Suggested-by:
Rik van Riel <riel@surriel.com> Signed-off-by:
Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Rik van Riel <riel@surriel.com> Tested-by:
Rik van Riel <riel@surriel.com> Link: https://lkml.kernel.org/r/20201102102457.28808-1-vincent.guittot@linaro.org
-
Boqun Feng authored
Chris Wilson reported a problem spotted by check_chain_key(): a chain key got changed in validate_chain() because we modify the ->read in validate_chain() to skip checks for dependency adding, and ->read is taken into calculation for chain key since commit f611e8cf ("lockdep: Take read/write status in consideration when generate chainkey"). Fix this by avoiding to modify ->read in validate_chain() based on two facts: a) since we now support recursive read lock detection, there is no need to skip checks for dependency adding for recursive readers, b) since we have a), there is only one case left (nest_lock) where we want to skip checks in validate_chain(), we simply remove the modification for ->read and rely on the return value of check_deadlock() to skip the dependency adding. Reported-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Boqun Feng <boqun.feng@gmail.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201102053743.450459-1-boqun.feng@gmail.com
-
Rafael J. Wysocki authored
A new cpufreq governor flag will be added subsequently, so replace the bool dynamic_switching field in struct cpufreq_governor with a flags field and introduce CPUFREQ_GOV_DYNAMIC_SWITCHING for the "dynamic switching" governors to set instead of it. No intentional functional impact. Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by:
Viresh Kumar <viresh.kumar@linaro.org>
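A sketch of what a governor declaration looks like with the flag-based scheme; only the names flags and CPUFREQ_GOV_DYNAMIC_SWITCHING come from the message, the rest of the layout is assumed:

    #include <linux/cpufreq.h>

    static struct cpufreq_governor example_gov = {
            .name   = "example",
            /* replaces the old ".dynamic_switching = true" boolean */
            .flags  = CPUFREQ_GOV_DYNAMIC_SWITCHING,
            /* .init/.exit/.start/.stop/.limits callbacks omitted in this sketch */
    };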
-