    direct-io: fix direct write stale data exposure from concurrent buffered read · 9ecd10b7
    Eryu Guan authored
    Currently direct writes inside i_size on a DIO_SKIP_HOLES filesystem are
    not allowed to allocate blocks (get_more_blocks() sets 'create' to 0
    before calling the get_block() callback).  If the file is sparse, direct
    writes fall back to buffered writes to avoid stale data exposure from
    a concurrent buffered read.  But there are two cases that can result in
    stale data exposure and are not correctly detected.
    
    1. The detection for "writing inside i_size" is not sufficient, so
       writes can wrongly be treated as "extending writes".  For example,
       direct write 1FSB (file system block) to a 1FSB sparse file on
       ext2/3/4, starting from offset 0.  This write is inside i_size,
       but 'create' is non-zero, because 'block_in_file' and
       'i_size_read(inode) >> blkbits' are both zero.
    
    2. Direct writes starting from or beyond i_size (not inside i_size)
       could also trigger block allocation and expose stale data.  For
       example, consider a sparse file with i_size of 2k, and a write to
       offset 2k or 3k into the file, with a filesystem block size of 4k.
       (Thanks to Jeff Moyer for pointing this case out in his review.)
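
The two cases above can be checked with a little arithmetic. The following is a minimal sketch that models the pre-fix decision as described in this message; it is not the kernel source. The function name `old_may_allocate` and its parameters are hypothetical, and the comparison itself is an assumption inferred from the statement that 'block_in_file' and 'i_size_read(inode) >> blkbits' are both zero in case 1.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the pre-fix check (an assumption reconstructed
 * from the commit text, not copied from fs/direct-io.c).
 * Returns true when the get_block() callback would be allowed to
 * allocate blocks, i.e. when 'create' stays non-zero.
 */
static bool old_may_allocate(uint64_t write_offset, uint64_t i_size,
                             unsigned int blkbits)
{
    uint64_t block_in_file = write_offset >> blkbits;

    /* A write judged to be "inside i_size" must not allocate. */
    if (block_in_file < (i_size >> blkbits))
        return false;

    /* Everything else is treated as an extending write. */
    return true;
}
```

With a 4k block size (`blkbits` of 12) and an i_size of 2k, `i_size >> blkbits` is 0, so writes at offset 0, 2k, and 3k all compare `0 < 0` and are allowed to allocate, even though each one touches the block that already holds part of the file.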
    
    The first problem can be demonstrated by running ltp-aiodio test ADSP045
    many times.  When testing on extN filesystems, I see occasional test
    failures where a buffered read returns non-zero (stale) data.
    
    ADSP045: dio_sparse -a 4k -w 4k -s 2k -n 1
    
    dio_sparse    0  TINFO  :  Dirtying free blocks
    dio_sparse    0  TINFO  :  Starting I/O tests
    non zero buffer at buf[0] => 0xffffffaa,ffffffaa,ffffffaa,ffffffaa
    non-zero read at offset 0
    dio_sparse    0  TINFO  :  Killing childrens(s)
    dio_sparse    1  TFAIL  :  dio_sparse.c:191: 1 children(s) exited abnormally
    
    The second problem can also be reproduced easily with a hacked dio_sparse
    program that accepts an option to specify the write offset.
    
    What we should really do is to disable block allocation for writes that
    could result in filling holes inside i_size.
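
One way to express this rule is to compare the starting filesystem block of the write against the last block that i_size reaches into. The sketch below is an illustration under that assumption and is not necessarily the exact check the patch adds; `may_allocate`, `write_offset`, and `fs_blkbits` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: allow block allocation only when the write
 * starts in a filesystem block strictly beyond the last block that
 * contains file data (the block holding byte i_size - 1).
 */
static bool may_allocate(uint64_t write_offset, uint64_t i_size,
                         unsigned int fs_blkbits)
{
    uint64_t fs_startblk = write_offset >> fs_blkbits;

    /* An empty file has no holes inside i_size to fill. */
    if (i_size == 0)
        return true;

    /* Starting at or before the last block of the file: no allocation. */
    if (fs_startblk <= ((i_size - 1) >> fs_blkbits))
        return false;

    return true;
}
```

Using `i_size - 1` rather than `i_size` is what catches a file whose size is not block aligned: with a 4k block size and an i_size of 2k, writes at offset 0, 2k, and 3k all start in block 0 and are refused allocation, while a write at offset 4k starts in block 1 and may allocate.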
    
    Link: http://lkml.kernel.org/r/1463156728-13357-1-git-send-email-guaneryu@gmail.com
    
    
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Eryu Guan <guaneryu@gmail.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>