    dax: fix radix tree insertion race · e11f8b7b
    Ross Zwisler authored
    While running generic/340 in my test setup I hit the following race.  It
    can happen with kernels that support FS DAX PMDs, so v4.10 through
    v4.11-rc5.
    
    Thread 1				Thread 2
    --------				--------
    dax_iomap_pmd_fault()
      grab_mapping_entry()
        spin_lock_irq()
        get_unlocked_mapping_entry()
        'entry' is NULL, can't call lock_slot()
        spin_unlock_irq()
        radix_tree_preload()
    					dax_iomap_pmd_fault()
    					  grab_mapping_entry()
    					    spin_lock_irq()
    					    get_unlocked_mapping_entry()
    					    ...
    					    lock_slot()
    					    spin_unlock_irq()
    					  dax_pmd_insert_mapping()
    					    <inserts a PMD mapping>
        spin_lock_irq()
        __radix_tree_insert() fails with -EEXIST
        <fall back to 4k fault, and die horribly
         when inserting a 4k entry where a PMD exists>
    
    The issue is that we have to drop mapping->tree_lock while calling
    radix_tree_preload(), but since we didn't have a radix tree entry to
    lock (unlike in the pmd_downgrade case) we have no protection against
    Thread 2 coming along and inserting a PMD at the same index.  For 4k
    entries we handled this with a special-case response to -EEXIST coming
    from __radix_tree_insert().  That doesn't save us for PMDs, because
    there -EEXIST can also mean that we collided with a 4k entry at a
    different index in the radix tree, one that is still covered by our
    PMD range.
    
    So, correctly handle both the 4k and 2M collision cases by explicitly
    re-checking the radix tree for an entry at our index once we reacquire
    mapping->tree_lock.
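
The re-check can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel change itself: every `toy_*` and `racer_*` name is hypothetical, a flat array stands in for the radix tree, a plain flag stands in for mapping->tree_lock, and a callback plays the part of Thread 2 running while the lock is dropped. The 512-slot PMD range assumes 4k pages and 2M PMDs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_NSLOTS    1024
#define TOY_PMD_SLOTS 512   /* assumption: one 2M PMD spans 512 4k pages */

enum entry_kind { ENTRY_NONE = 0, ENTRY_PTE, ENTRY_PMD };

struct toy_mapping {
    int locked;                         /* stand-in for mapping->tree_lock */
    enum entry_kind slots[TOY_NSLOTS];  /* stand-in for the radix tree */
};

static void toy_mapping_init(struct toy_mapping *m) { memset(m, 0, sizeof(*m)); }
static void toy_lock(struct toy_mapping *m)   { assert(!m->locked); m->locked = 1; }
static void toy_unlock(struct toy_mapping *m) { assert(m->locked);  m->locked = 0; }

static size_t toy_pmd_base(size_t index)
{
    return index & ~(size_t)(TOY_PMD_SLOTS - 1);
}

/* A PMD entry is visible from every index in its range. */
static enum entry_kind toy_lookup(const struct toy_mapping *m, size_t index)
{
    assert(index < TOY_NSLOTS);
    if (m->slots[toy_pmd_base(index)] == ENTRY_PMD)
        return ENTRY_PMD;
    return m->slots[index];
}

/* Insert fails with -EEXIST if anything already occupies the target range. */
static int toy_insert(struct toy_mapping *m, size_t index, enum entry_kind kind)
{
    if (kind == ENTRY_PMD) {
        size_t base = toy_pmd_base(index);
        for (size_t i = 0; i < TOY_PMD_SLOTS; i++)
            if (m->slots[base + i] != ENTRY_NONE)
                return -EEXIST;
        m->slots[base] = ENTRY_PMD;
        return 0;
    }
    if (toy_lookup(m, index) != ENTRY_NONE)
        return -EEXIST;
    m->slots[index] = ENTRY_PTE;
    return 0;
}

/* Two hypothetical "Thread 2" behaviours for the unlocked window. */
static void racer_inserts_pmd(struct toy_mapping *m)
{
    toy_lock(m); toy_insert(m, 0, ENTRY_PMD); toy_unlock(m);
}

static void racer_inserts_pte_nearby(struct toy_mapping *m)
{
    toy_lock(m); toy_insert(m, 7, ENTRY_PTE); toy_unlock(m);
}

static int toy_grab_entry(struct toy_mapping *m, size_t index,
                          enum entry_kind want,
                          void (*while_unlocked)(struct toy_mapping *),
                          enum entry_kind *found)
{
    int err;

restart:
    toy_lock(m);
    *found = toy_lookup(m, index);
    if (*found != ENTRY_NONE) {         /* an entry exists: use it */
        toy_unlock(m);
        return 0;
    }

    toy_unlock(m);                      /* lock dropped, as for a preload */
    if (while_unlocked) {
        while_unlocked(m);              /* the other thread runs here */
        while_unlocked = NULL;          /* the race fires only once */
    }
    toy_lock(m);

    /* The re-check: did someone insert at our index while we were unlocked? */
    if (toy_lookup(m, index) != ENTRY_NONE) {
        toy_unlock(m);
        goto restart;
    }

    /* Still empty at our index; -EEXIST now means a collision elsewhere
     * in the PMD range, which is reported to the caller. */
    err = toy_insert(m, index, want);
    toy_unlock(m);
    if (err)
        return err;
    *found = want;
    return 0;
}
```

In this model, a racing PMD insert at the same range makes toy_grab_entry() restart and pick up the existing PMD entry, while a racing 4k insert at a different index inside the range leaves our index empty and surfaces as -EEXIST, which the caller can treat as a signal to fall back.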
    
    This patch has made it through a clean xfstests run with the current
    v4.11-rc5 based linux/master, and it also ran generic/340 500 times in a
    loop.  It used to fail within the first 10 iterations.
    
    Link: http://lkml.kernel.org/r/20170406212944.2866-1-ross.zwisler@linux.intel.com
    
    
    Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
    Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Matthew Wilcox <mawilcox@microsoft.com>
    Cc: <stable@vger.kernel.org>    [4.10+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>