    lib/spinlock_debug: avoid livelock in do_raw_spin_lock() · 214f766e
    Vikram Mulukutla authored
    
    
    The logic in do_raw_spin_lock() attempts to acquire a spinlock by invoking
    arch_spin_trylock() in a loop with a delay between each attempt.  Now
    consider the following situation in a 2 CPU system:
    
    1. CPU-0 continually acquires and releases a spinlock in a
       tight loop; it stays in this loop until some condition X
       is satisfied. X can only be satisfied by another CPU.
    
    2. CPU-1 tries to acquire the same spinlock, in an attempt
       to satisfy the aforementioned condition X. However, it
       never sees the unlocked value of the lock because the
       debug spinlock code uses trylock instead of just lock;
       every check lands at the wrong moment, while CPU-0 holds
       the lock.
    
    Now in the absence of debug spinlocks, the architecture specific spinlock
    code can correctly allow CPU-1 to wait in a "queue" (e.g., ticket
    spinlocks), ensuring that it acquires the lock at some point.  However,
    with the debug spinlock code, livelock can easily occur due to the use of
    arch_spin_trylock(), which cannot put the CPU in that "queue".  This
    queueing mechanism is implemented in both x86 and ARM spinlock code.
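
To illustrate why a trylock cannot queue, here is a minimal userspace sketch
of a ticket lock built on C11 atomics.  It is not the kernel's arch_spin_*
code: the names ticket_lock, ticket_take, ticket_wait, ticket_trylock and
ticket_unlock are invented for this illustration, and real arch
implementations differ in layout and memory ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a ticket lock; not the kernel's type. */
struct ticket_lock {
	atomic_uint next;   /* next ticket to hand out */
	atomic_uint owner;  /* ticket currently allowed to hold the lock */
};

static void ticket_init(struct ticket_lock *l)
{
	atomic_init(&l->next, 0);
	atomic_init(&l->owner, 0);
}

/* Join the queue: the returned ticket reserves a place in line. */
static unsigned int ticket_take(struct ticket_lock *l)
{
	return atomic_fetch_add(&l->next, 1);
}

/* Spin until the reserved ticket is being served. */
static void ticket_wait(struct ticket_lock *l, unsigned int ticket)
{
	while (atomic_load(&l->owner) != ticket)
		;
}

/* Blocking acquire: take a ticket, then wait for its turn. */
static void ticket_lock(struct ticket_lock *l)
{
	ticket_wait(l, ticket_take(l));
}

/*
 * Succeeds only if nobody holds the lock and nobody is queued.  On failure
 * no ticket is taken, so the caller gains no priority for its next attempt.
 */
static bool ticket_trylock(struct ticket_lock *l)
{
	unsigned int owner = atomic_load(&l->owner);
	unsigned int expected = owner;

	return atomic_compare_exchange_strong(&l->next, &expected, owner + 1);
}

static void ticket_unlock(struct ticket_lock *l)
{
	/* Unlocking a lock that is not held would corrupt the queue. */
	assert(atomic_load(&l->next) != atomic_load(&l->owner));
	atomic_fetch_add(&l->owner, 1);
}
```

In this model a waiter that called ticket_take() is served as soon as the
holder unlocks, whereas a caller that only retries ticket_trylock() has to
find the lock completely idle on each attempt and can lose that race
indefinitely.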
    
    Note that the situation mentioned above is not hypothetical.  A real
    problem was encountered where CPU-0 was running hrtimer_cancel with
    interrupts disabled, and CPU-1 was attempting to run the hrtimer that
    CPU-0 was trying to cancel.
    
    Address this by actually attempting arch_spin_lock once it is suspected
    that there is a spinlock lockup.  If we are in the situation described
    above, the arch_spin_lock should succeed; otherwise other timeout
    mechanisms (e.g., watchdog) should alert the system to a lockup.
    Therefore, if there is a genuine system problem and the spinlock can't be
    acquired, the end result (irrespective of this change being present) is
    the same.  If there is a livelock caused by the debug code, this change
    will allow the lock to be acquired, depending on the implementation of the
    lower level arch specific spinlock code.
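
The control flow described above can be sketched in a small single-threaded
model.  This is not the patched kernel function: model_lock, model_trylock,
model_lock_blocking, debug_lock and the hide_unlocked flag are hypothetical
names used only here, and the blocking path is assumed to always succeed,
standing in for a fair arch-level lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an arch-level lock; not a kernel type. */
struct model_lock {
	bool held;
	bool hide_unlocked;          /* trylock always samples at a bad moment */
	unsigned long trylock_calls;
	unsigned long lock_calls;
};

static bool model_trylock(struct model_lock *l)
{
	l->trylock_calls++;
	if (l->hide_unlocked || l->held)
		return false;
	l->held = true;
	return true;
}

/* Assumption: models a fair, queueing lock that eventually gets its turn. */
static void model_lock_blocking(struct model_lock *l)
{
	l->lock_calls++;
	l->held = true;
}

enum acquire_path { ACQUIRED_BY_TRYLOCK, ACQUIRED_BY_FALLBACK };

/* Bounded trylock loop, then fall back to the blocking lock. */
static enum acquire_path debug_lock(struct model_lock *l,
				    unsigned long max_tries)
{
	assert(l != NULL);

	for (unsigned long i = 0; i < max_tries; i++) {
		if (model_trylock(l))
			return ACQUIRED_BY_TRYLOCK;
		/* The debug code delays between attempts at this point. */
	}

	/* Lockup suspected: a real implementation would report it here. */
	model_lock_blocking(l);
	return ACQUIRED_BY_FALLBACK;
}
```

With hide_unlocked set, every trylock attempt fails and only the fallback
acquires the lock, which mirrors the livelock case; with it clear, the first
trylock succeeds and the fallback is never reached.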
    
    [akpm@linux-foundation.org: tweak comment]
    Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>