Skip to content
  • Hugh Dickins's avatar
    tmpfs: fix race between umount and writepage · b1dea800
    Hugh Dickins authored
    Konstanin Khlebnikov reports that a dangerous race between umount and
    shmem_writepage can be reproduced by this script:
    
      for i in {1..300} ; do
    	mkdir $i
    	while true ; do
    		mount -t tmpfs none $i
    		dd if=/dev/zero of=$i/test bs=1M count=$(($RANDOM % 100))
    		umount $i
    	done &
      done
    
    on a 6xCPU node with 8Gb RAM: kernel very unstable after this accident. =)
    
    Kernel log:
    
      VFS: Busy inodes after unmount of tmpfs.
                     Self-destruct in 5 seconds.  Have a nice day...
    
      WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
      list_del corruption. prev->next should be ffff880222fdaac8, but was (null)
      Pid: 11222, comm: mount.tmpfs Not tainted 2.6.39-rc2+ #4
      Call Trace:
       warn_slowpath_common+0x80/0x98
       warn_slowpath_fmt+0x41/0x43
       __list_del_entry+0x8d/0x98
       evict+0x50/0x113
       iput+0x138/0x141
      ...
      BUG: unable to handle kernel paging request at ffffffffffffffff
      IP: shmem_free_blocks+0x18/0x4c
      Pid: 10422, comm: dd Tainted: G        W   2.6.39-rc2+ #4
      Call Trace:
       shmem_recalc_inode+0x61/0x66
       shmem_writepage+0xba/0x1dc
       pageout+0x13c/0x24c
       shrink_page_list+0x28e/0x4be
       shrink_inactive_list+0x21f/0x382
      ...
    
    shmem_writepage() calls igrab() on the inode for the page which came from
    page reclaim, to add it later into shmem_swaplist for swapoff operation.
    
    This igrab() can race with super-block deactivating process:
    
      shrink_inactive_list()          deactivate_super()
      pageout()                       tmpfs_fs_type->kill_sb()
      shmem_writepage()               kill_litter_super()
                                      generic_shutdown_super()
                                       evict_inodes()
       igrab()
                                        atomic_read(&inode->i_count)
                                         skip-inode
       iput()
                                       if (!list_empty(&sb->s_inodes))
                                              printk("VFS: Busy inodes after...
    
    This igrap-iput pair was added in commit 1b1b32f2 "tmpfs: fix
    shmem_swaplist races" based on incorrect assumptions: igrab() protects the
    inode from concurrent eviction by deletion, but it does nothing to protect
    it from concurrent unmounting, which goes ahead despite the raised
    i_count.
    
    So this use of igrab() was wrong all along, but the race made much worse
    in 2.6.37 when commit 63997e98
    
     "split invalidate_inodes()" replaced
    two attempts at invalidate_inodes() by a single evict_inodes().
    
    Konstantin posted a plausible patch, raising sb->s_active too: I'm unsure
    whether it was correct or not; but burnt once by igrab(), I am sure that
    we don't want to rely more deeply upon externals here.
    
    Fix it by adding the inode to shmem_swaplist earlier, while the page lock
    on page in page cache still secures the inode against eviction, without
    artifically raising i_count.  It was originally added later because
    shmem_unuse_inode() is liable to remove an inode from the list while it's
    unswapped; but we can guard against that by taking spinlock before
    dropping mutex.
    
    Reported-by: default avatarKonstantin Khlebnikov <khlebnikov@openvz.org>
    Signed-off-by: default avatarHugh Dickins <hughd@google.com>
    Tested-by: default avatarKonstantin Khlebnikov <khlebnikov@openvz.org>
    Cc: <stable@kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    b1dea800