Commit d283c640 authored by Ulrich Obergfell's avatar Ulrich Obergfell Committed by Linus Torvalds

watchdog: fix error handling in proc_watchdog_thresh()

The original watchdog_park_threads() function that was introduced by
commit 81a4beef ("watchdog: introduce watchdog_park_threads() and
watchdog_unpark_threads()") takes a very simple approach to handle
errors returned by kthread_park(): It attempts to roll back all watchdog
threads to the unparked state.  However, this may be undesired behaviour
from the perspective of the caller which may want to handle errors as
appropriate in its specific context.  Currently, there are two possible
call chains:

- watchdog suspend/resume interface

    lockup_detector_suspend
      watchdog_park_threads

- write to parameters in /proc/sys/kernel

    proc_watchdog_update
      watchdog_enable_all_cpus
        update_watchdog_all_cpus
          watchdog_park_threads

Instead of 'blindly' attempting to unpark the watchdog threads if a
kthread_park() call fails, the new approach is to disable the lockup
detectors in the above call chains.  Failure becomes visible to the user
as follows:

- error messages from lockup_detector_suspend()
                   or watchdog_enable_all_cpus()

- the state that can be read from /proc/sys/kernel/watchdog_enabled

- the 'write' system call in the latter call chain returns an error

I did not experience kthread_park() failures in practice, I used some
instrumentation to fake error returns from kthread_park() in order to test
the patches.

This patch (of 5):

Restore the previous value of watchdog_thresh _and_ sample_period if
proc_watchdog_update() returns an error.  The variables must be consistent
to avoid false positives of the lockup detectors.
Signed-off-by: default avatarUlrich Obergfell <uobergfe@redhat.com>
Reviewed-by: default avatarAaron Tomlin <atomlin@redhat.com>
Acked-by: default avatarDon Zickus <dzickus@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent 451637e4
......@@ -914,13 +914,14 @@ int proc_watchdog_thresh(struct ctl_table *table, int write,
goto out;
/*
* Update the sample period.
* Restore 'watchdog_thresh' on failure.
* Update the sample period. Restore on failure.
*/
set_sample_period();
err = proc_watchdog_update();
if (err)
if (err) {
watchdog_thresh = old;
set_sample_period();
}
out:
mutex_unlock(&watchdog_proc_mutex);
return err;
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment