  • Btrfs: eliminate races in worker stopping code · 964fb15a
    Ilya Dryomov authored
    
    
    The current implementation of worker threads in Btrfs has races in the
    worker stopping code, which cause a variety of panics and lockups when
    running the btrfs/011 xfstest in a loop.  The problem is that
    btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
    check_busy_worker and __btrfs_start_workers.
    
    E.g., check_idle_worker race flow:
    
           btrfs_stop_workers():            check_idle_worker(aworker):
    - grabs the lock
    - splices the idle list into the
      working list
    - removes the first worker from the
      working list
    - releases the lock to wait for
      its kthread's completion
                                      - grabs the lock
                                      - if aworker is on the working list,
                                        moves aworker from the working list
                                        to the idle list
                                      - releases the lock
    - grabs the lock
    - puts the worker
    - removes the second worker from the
      working list
                                  ......
            btrfs_stop_workers returns, aworker is on the idle list
                     FS is umounted, memory is freed
                                  ......
                  aworker is woken up, fireworks ensue
    
    With this applied, I wasn't able to trigger the problem in 48 hours,
    whereas previously I could reliably reproduce at least one of these
    races within an hour.
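
The check_idle_worker interleaving above can be replayed deterministically in userspace. The following is a minimal single-threaded sketch, not the kernel code: the pool model, the `stopping` flag and the names `pop_working` and `replay_stop_race` are illustrative assumptions, and the guard shown is only one plausible way to close the window, not necessarily what this patch does. Each comment marks where the real code would hold or drop the lock.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_WORKERS 3

/* Hypothetical model: which list a worker is currently linked on. */
enum list_id { LIST_NONE, LIST_WORKING, LIST_IDLE };

struct pool {
    enum list_id on[NR_WORKERS];
    bool stopping;   /* assumed guard flag, set when teardown begins */
};

/* Worker side: move ourselves from the working list to the idle list. */
static void check_idle_worker(struct pool *p, int self, bool guarded)
{
    assert(self >= 0 && self < NR_WORKERS);
    /* the lock would be held here */
    if (p->on[self] != LIST_WORKING)
        return;
    if (guarded && p->stopping)
        return;      /* teardown in progress: stay on the working list */
    p->on[self] = LIST_IDLE;
}

/* Unlink and return the first worker on the working list, or -1 if empty. */
static int pop_working(struct pool *p)
{
    for (int i = 0; i < NR_WORKERS; i++) {
        if (p->on[i] == LIST_WORKING) {
            p->on[i] = LIST_NONE;
            return i;
        }
    }
    return -1;
}

/*
 * Replay the interleaving from the flow above in a single thread.
 * Returns how many workers are still on the idle list once the stop
 * loop has finished, i.e. workers that were never waited for.
 */
static int replay_stop_race(bool guarded)
{
    struct pool p = { .on = { LIST_IDLE, LIST_IDLE, LIST_IDLE } };
    int aworker = NR_WORKERS - 1;
    bool raced = false;
    int stranded = 0;

    /* stop side, lock held: mark teardown, splice idle into working */
    p.stopping = true;
    for (int i = 0; i < NR_WORKERS; i++)
        if (p.on[i] == LIST_IDLE)
            p.on[i] = LIST_WORKING;

    while (pop_working(&p) >= 0) {
        /* lock dropped to wait for the kthread; aworker runs once here */
        if (!raced) {
            check_idle_worker(&p, aworker, guarded);
            raced = true;
        }
    }

    for (int i = 0; i < NR_WORKERS; i++)
        if (p.on[i] == LIST_IDLE)
            stranded++;
    return stranded;
}
```

Calling `replay_stop_race(false)` leaves one worker stranded on the idle list after the stop loop has drained the working list, which is the state that precedes the use-after-free in the flow above. With `replay_stop_race(true)` the worker stays on the working list and is reaped like the others.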
    
    Reported-by: David Sterba <dsterba@suse.cz>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    Signed-off-by: Josef Bacik <jbacik@fusionio.com>