    hung_task: allow printing warnings every check interval · 174eff39
    Dmitry Safonov authored
    The hung task detector has one timeout with two associated actions:
    
    - issuing warnings with names and stacks of blocked tasks
    - panic()
    
    We want our switches to panic (and reboot) if a task stays in
    uninterruptible sleep for several minutes: at that point something bad has
    happened and the box needs a reboot.  But we also want to detect
    conditions that are "out of range" or approaching the point of failure,
    and in those cases issue an "early warning" of the impending failure,
    minutes before the switch panics.
    
    These "early warnings" are useful for monitoring the network
    infrastructure.  They are also valuable in post-mortem analysis, when the
    logs from userspace applications aren't enough.  Furthermore, we have a
    test pool of long-running DUTs (devices under test) that run under close
    to real-world load for weeks, and such early warnings have allowed us to
    find some bottlenecks with little engineering intervention.
    
    There are also not-yet-upstream patches for other kinds of "early
    warnings", such as a print whenever a mutex/semaphore is released after
    being held for a long time, but those patches are much more intricate and
    have a runtime cost.
    
    It seems fairly easy to add printing of tasks and their stacks to the hung
    task detector, for notification and debugging purposes, without
    complicating the code or adding major cost (the prints use the KERN_INFO
    loglevel, so with console_loglevel at 6 or lower they go only into the
    dmesg log, not to the console).
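
    To illustrate the console vs. dmesg point, here is a small user-space
    model of the printk console filter. It only restates the documented rule
    that a message is printed to the console when its level is numerically
    lower than console_loglevel. The LOGLEVEL_* constants mirror the numeric
    values behind KERN_EMERG..KERN_DEBUG; reaches_console() is a hypothetical
    helper for illustration, not a kernel function.

```c
#include <stdbool.h>

/* Numeric printk log levels, mirroring KERN_EMERG ("0") .. KERN_DEBUG ("7"). */
enum {
	LOGLEVEL_EMERG   = 0,
	LOGLEVEL_ALERT   = 1,
	LOGLEVEL_CRIT    = 2,
	LOGLEVEL_ERR     = 3,
	LOGLEVEL_WARNING = 4,
	LOGLEVEL_NOTICE  = 5,
	LOGLEVEL_INFO    = 6,
	LOGLEVEL_DEBUG   = 7,
};

/*
 * Model of the console filter: a message is printed to the console only if
 * its level is numerically lower (more important) than console_loglevel.
 * Messages that fail this test still land in the kernel log buffer that
 * dmesg reads.
 */
static bool reaches_console(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}
```

    With console_loglevel at 4 or 6, a KERN_INFO (6) message stays off the
    console; at 7 it is printed there as well.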
    
    Since a2e51445 ("kernel/hung_task.c: allow to set checking interval
    separately from timeout") it's possible to set the checking interval for
    the hung task detector with `hung_task_check_interval_secs`.
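
    A rough sketch of how the effective check interval relates to the
    timeout, assuming that hung_task_check_interval_secs == 0 means "use
    hung_task_timeout_secs" and assuming the watchdog clamps the interval so
    it never exceeds the timeout. effective_check_interval() is a
    hypothetical helper, not part of the patch.

```c
/*
 * Hypothetical helper: effective hung task check interval in seconds.
 * Assumes 0 means "fall back to the timeout" and that the interval is
 * clamped so it is never longer than the timeout.
 */
static unsigned long effective_check_interval(unsigned long check_interval_secs,
					      unsigned long timeout_secs)
{
	unsigned long interval = check_interval_secs;

	if (interval == 0)
		interval = timeout_secs;	/* default: check once per timeout */
	return interval < timeout_secs ? interval : timeout_secs;
}
```

    For example, with a 120 s timeout, a 30 s interval gives four checks per
    timeout period, which is what makes warnings earlier than the timeout
    possible at all.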
    
    Provide a `hung_task_interval_warnings` sysctl that allows printing hung
    tasks at every detection interval.  It's not ratelimited, so root should
    be cautious when configuring it.
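
    A simplified, hypothetical model of what one check could decide for a
    single task that the caller has already found in uninterruptible sleep.
    It assumes (this is not the patch's code) that a task not scheduled
    since the previous check gets an early KERN_INFO-style warning, while
    only a task blocked for the full timeout gets the regular report and,
    with hung_task_panic set, a panic(). The struct, enum and function names
    are illustrative; the switch-count bookkeeping is loosely modelled on the
    detector's last_switch_count/last_switch_time, and interval_warnings is a
    plain on/off stand-in for the sysctl.

```c
#include <stdbool.h>

/* What one check decides for a single task (hypothetical model). */
enum hung_action {
	HUNG_NONE,       /* task ran since the last check, or nothing to report */
	HUNG_EARLY_INFO, /* early warning before the timeout expires */
	HUNG_REPORT,     /* regular hung task report after the timeout */
	HUNG_PANIC,      /* report followed by panic() */
};

/* Hypothetical per-task bookkeeping; times are in seconds. */
struct task_model {
	unsigned long switch_count;      /* context switches so far */
	unsigned long last_switch_count; /* value seen at the previous check */
	unsigned long last_switch_time;  /* check time when a switch was last seen */
};

struct hung_config {
	unsigned long timeout_secs; /* hung_task_timeout_secs, 0 disables */
	bool panic_on_hung;         /* hung_task_panic */
	bool interval_warnings;     /* on/off stand-in for hung_task_interval_warnings */
};

static enum hung_action check_task(struct task_model *t,
				   const struct hung_config *cfg,
				   unsigned long now)
{
	if (cfg->timeout_secs == 0)
		return HUNG_NONE; /* detector disabled */

	/* Scheduled since the last check: not hung, restart the clock. */
	if (t->switch_count != t->last_switch_count) {
		t->last_switch_count = t->switch_count;
		t->last_switch_time = now;
		return HUNG_NONE;
	}

	/* Blocked for the whole timeout: regular report, optionally panic. */
	if (now - t->last_switch_time >= cfg->timeout_secs)
		return cfg->panic_on_hung ? HUNG_PANIC : HUNG_REPORT;

	/* Blocked for at least one check interval, but not yet for the timeout. */
	return cfg->interval_warnings ? HUNG_EARLY_INFO : HUNG_NONE;
}
```

    For instance, with a 120 s timeout, a task whose last observed switch was
    at t=30 s yields HUNG_EARLY_INFO at a check at t=60 s and HUNG_REPORT at
    t=150 s; without interval warnings the check at t=60 s reports nothing.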
    
    Link: http://lkml.kernel.org/r/20190724170249.9644-1-dima@arista.com
    
    
    Signed-off-by: Dmitry Safonov <dima@arista.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
    Cc: Vasiliy Khoruzhick <vasilykh@arista.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>