    workqueue: Remove the warning in wq_worker_sleeping() · 62849a96
    Sebastian Andrzej Siewior authored
    The kernel test robot triggered a warning with the following race:
       task-ctx A                            interrupt-ctx B
     worker
      -> process_one_work()
        -> work_item()
          -> schedule();
             -> sched_submit_work()
               -> wq_worker_sleeping()
                 -> ->sleeping = 1
                   atomic_dec_and_test(nr_running)
             __schedule();                *interrupt*
                                           async_page_fault()
                                           -> local_irq_enable();
                                           -> schedule();
                                              -> sched_submit_work()
                                                -> wq_worker_sleeping()
                                                   -> if (WARN_ON(->sleeping)) return
                                              -> __schedule()
                                                ->  sched_update_worker()
                                                  -> wq_worker_running()
                                                     -> atomic_inc(nr_running);
                                                     -> ->sleeping = 0;
    
          ->  sched_update_worker()
            -> wq_worker_running()
              if (!->sleeping) return
    
    In this context the warning is pointless; everything is fine.
    An interrupt before wq_worker_sleeping() will perform the ->sleeping
    assignment (0 -> 1 -> 0) twice.
    An interrupt after wq_worker_sleeping() will trigger the warning, and
    nr_running will be decremented once (by A) and incremented once (only
    by B; A will skip it). This is the case until ->sleeping is zeroed
    again in wq_worker_running().
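    A minimal user-space model of both interleavings, assuming a
    hypothetical struct model_worker whose sleeping and nr_running fields
    stand in for the worker and pool fields; it is a sketch, not the kernel
    implementation. model_sleeping() mirrors wq_worker_sleeping() with the
    WARN_ON() replaced by a plain return, and interrupt_schedule() stands
    for interrupt B calling schedule(), which runs both hooks back to back:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space model of a workqueue worker; not kernel code. */
struct model_worker {
	int sleeping;           /* stands in for worker->sleeping */
	atomic_int *nr_running; /* stands in for pool->nr_running */
};

/* Models wq_worker_sleeping() with the WARN_ON() turned into a return. */
static void model_sleeping(struct model_worker *w)
{
	if (w->sleeping)
		return;         /* already accounted as sleeping */
	w->sleeping = 1;
	atomic_fetch_sub(w->nr_running, 1);
}

/* Models wq_worker_running(): undo the accounting only if it was done. */
static void model_running(struct model_worker *w)
{
	if (!w->sleeping)
		return;
	atomic_fetch_add(w->nr_running, 1);
	w->sleeping = 0;
}

/* Interrupt B: its nested schedule() runs both hooks back to back. */
static void interrupt_schedule(struct model_worker *w)
{
	model_sleeping(w);
	model_running(w);
}

/* Interrupt arrives before A reaches wq_worker_sleeping(). */
int run_interrupt_before(void)
{
	atomic_int nr = 1;
	struct model_worker w = { .sleeping = 0, .nr_running = &nr };

	interrupt_schedule(&w); /* B: sleeping 0 -> 1 -> 0, nr 1 -> 0 -> 1 */
	model_sleeping(&w);     /* A: sleeping 0 -> 1, nr 1 -> 0 */
	model_running(&w);      /* A: sleeping 1 -> 0, nr 0 -> 1 */
	assert(w.sleeping == 0);
	return atomic_load(&nr);
}

/* Interrupt arrives after A's wq_worker_sleeping(), as in the diagram. */
int run_interrupt_after(void)
{
	atomic_int nr = 1;
	struct model_worker w = { .sleeping = 0, .nr_running = &nr };

	model_sleeping(&w);     /* A: sleeping = 1, nr 1 -> 0 */
	interrupt_schedule(&w); /* B: skips the decrement, nr 0 -> 1, sleeping = 0 */
	model_running(&w);      /* A: sleeping is already 0, returns early */
	assert(w.sleeping == 0);
	return atomic_load(&nr);
}
```

    In both scenarios the model ends with nr_running back at its starting
    value of 1 and ->sleeping at 0, so the early return loses no accounting.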
    
    Remove the WARN statement because this condition may happen. Document
    that preemption around wq_worker_sleeping() needs to be disabled to
    protect ->sleeping and not just as an optimisation.
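    A sketch of the documented requirement, using hypothetical stand-ins
    (model_preempt_count, model_preempt_disable(), model_preempt_enable())
    rather than the real preempt_count() API: the caller brackets the
    ->sleeping test-and-set with a preemption-disabled section, and the
    callee asserts that precondition instead of relying on it silently.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-CPU preemption counter. */
static int model_preempt_count;

static void model_preempt_disable(void) { model_preempt_count++; }

static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Hypothetical worker model; not the kernel's struct worker. */
struct model_worker {
	int sleeping;
	int nr_running;
};

/* The ->sleeping test-and-set is only safe with preemption disabled. */
static void model_wq_worker_sleeping(struct model_worker *w)
{
	assert(model_preempt_count > 0); /* the documented requirement */
	if (w->sleeping)
		return;
	w->sleeping = 1;
	w->nr_running--;
}

/* Models the caller, sched_submit_work() in the diagram above. */
void model_submit_work(struct model_worker *w)
{
	model_preempt_disable();
	model_wq_worker_sleeping(w);
	model_preempt_enable();
}
```

    A second call while ->sleeping is still set returns early, so
    nr_running is decremented only once.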
    
    Fixes: 6d25be57 ("sched/core, workqueues: Distangle worker accounting from rq lock")
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Tejun Heo <tj@kernel.org>
    Link: https://lkml.kernel.org/r/20200327074308.GY11705@shao2-debian