Dmitry Safonov authored
The hung task detector has a single timeout with two associated actions:
- issuing warnings with the names and stacks of blocked tasks
- panic()

We want switches to panic (and reboot) if a task stays in uninterruptible sleep for some minutes: at that point something ugly has happened and the box needs a reboot. But we also want to detect conditions that are "out of range" or approaching the point of failure. Under such conditions we want to issue an "early warning" of an impending failure, minutes before the switch is going to panic.

These "early warnings" serve a purpose when monitoring the network infrastructure. They are also valuable for post-mortem analysis, when the logs from userspace applications aren't enough. Furthermore, we have a test pool of long-running DUTs that are constantly under close to real-world load for weeks, and such early warnings have allowed us to identify some bottlenecks without much engineering intervention.

There are also not-yet-upstream patches for other kinds of "early warnings", such as prints whenever a mutex/semaphore is released after being held for a long time, but those patches are much more intricate and have a runtime cost.

It seems rather easy to add printing of tasks and their stacks for notification and debugging purposes to the hung task detector without complicating the code or adding major cost (the prints use the KERN_INFO loglevel and so don't go to the console, only into the dmesg log).

Since a2e51445 ("kernel/hung_task.c: allow to set checking interval separately from timeout") it's possible to set the checking interval for the hung task detector with `hung_task_check_interval_secs`.

Provide a `hung_task_interval_warnings` sysctl that allows printing hung tasks every detection interval. It's not ratelimited, so root should be cautious when configuring it.
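
To illustrate the intended split between "early warning" and the existing timeout action, here is a minimal sketch of the decision made for one blocked task on each detector pass. It is a simplified model, not the code in kernel/hung_task.c: the `hung_cfg` struct, the `hung_action` enum and `classify_blocked_task()` are hypothetical names, and the rule that a task qualifies for an early warning once it has been blocked for at least one check interval is an assumption made for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of the detector's per-task decision; not kernel code. */
enum hung_action {
    HUNG_NONE,          /* task has not been blocked long enough to report */
    HUNG_INFO,          /* early warning: print task and stack at KERN_INFO */
    HUNG_WARN_OR_PANIC  /* full timeout reached: existing warning / panic() path */
};

struct hung_cfg {
    unsigned long timeout_secs;        /* models hung_task_timeout_secs */
    unsigned long check_interval_secs; /* models hung_task_check_interval_secs */
    bool interval_warnings;            /* models hung_task_interval_warnings being enabled */
};

/*
 * Classify a task that has been in uninterruptible sleep for blocked_secs.
 * Assumption: an early warning is due once the task has been blocked for
 * at least one check interval but has not yet reached the full timeout.
 */
static enum hung_action classify_blocked_task(const struct hung_cfg *cfg,
                                              unsigned long blocked_secs)
{
    if (blocked_secs >= cfg->timeout_secs)
        return HUNG_WARN_OR_PANIC;
    if (cfg->interval_warnings && blocked_secs >= cfg->check_interval_secs)
        return HUNG_INFO;
    return HUNG_NONE;
}
```

Under this model, with a 120 second timeout and a 30 second check interval, a task blocked for 45 seconds would produce an informational print when interval warnings are enabled and nothing otherwise, while a task blocked for 120 seconds takes the existing timeout path in both cases.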

Link: http://lkml.kernel.org/r/20190724170249.9644-1-dima@arista.com
Signed-off-by: Dmitry Safonov <dima@arista.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Vasiliy Khoruzhick <vasilykh@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
174eff39