  • btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block() · 665d4953
    Qu Wenruo authored
    In commit ac0b4145 ("btrfs: scrub: Don't use inode pages for device
    replace") we removed the branch of copy_nocow_pages() to avoid
    corruption for compressed nodatasum extents.
    
    However, the above commit only solves the problem in scrub_extent(). If
    we fail to read some pages during scrub_pages(),
    sctx->no_io_error_seen will be non-zero and we go to the fixup function
    scrub_handle_errored_block().
    
    In scrub_handle_errored_block(), for sctx without csum (no matter if
    we're doing replace or scrub) we go to the scrub_fixup_nodatasum()
    routine, which does a similar thing to copy_nocow_pages(), but without
    the extra check that copy_nocow_pages() performs.
    
    So for test cases like btrfs/100, where we emulate read errors during
    replace/scrub, we could corrupt compressed extent data again.
    
    This patch fixes it by avoiding any "optimization" for nodatasum and
    simply falling back to the normal fixup routine, which tries to read
    from any good copy.
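
As a rough illustration of "read from any good copy", here is a minimal
userspace sketch. It is not the kernel code: struct sketch_mirror,
sketch_repair_from_good_copy() and SKETCH_BLOCK_SIZE are hypothetical names
invented for this example, and a "good" copy is assumed to mean only "read
back without an I/O error", since nodatasum data has no checksum to verify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 16	/* hypothetical block size for the sketch */

/* Hypothetical stand-in for one copy (mirror) of a data block. */
struct sketch_mirror {
	bool io_error;				/* simulated read failure */
	unsigned char data[SKETCH_BLOCK_SIZE];
};

/*
 * Repair the copy at index @bad from the first other mirror that reads
 * without an I/O error.  No page cache and no backref walk is involved,
 * only the on-disk copies are consulted.
 *
 * Returns the index of the mirror used, or -1 if no good copy exists.
 */
static int sketch_repair_from_good_copy(struct sketch_mirror *mirrors,
					size_t nr_mirrors, size_t bad)
{
	assert(mirrors != NULL && bad < nr_mirrors);

	for (size_t i = 0; i < nr_mirrors; i++) {
		if (i == bad || mirrors[i].io_error)
			continue;
		/* Found a readable copy: overwrite the bad one with it. */
		memcpy(mirrors[bad].data, mirrors[i].data, SKETCH_BLOCK_SIZE);
		mirrors[bad].io_error = false;
		return (int)i;
	}
	return -1;	/* every other copy also failed to read */
}
```

The point of the sketch is that the repair path never looks at inode pages,
so it cannot write uncompressed page cache contents over a compressed
extent.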
    
    This also solves the WARN_ON() or deadlock caused by the lame backref
    iteration in the scrub_fixup_nodatasum() routine.
    
    The deadlock or WARN_ON() couldn't be triggered before commit ac0b4145
    ("btrfs: scrub: Don't use inode pages for device replace") since
    copy_nocow_pages() has better locking and an extra check for data
    extents, and it already does the fixup work by trying to read data from
    any good copy, so we wouldn't go to scrub_fixup_nodatasum() anyway.
    
    This patch disables the faulty code, which will be removed completely
    in a followup patch.
    
    Fixes: ac0b4145 ("btrfs: scrub: Don't use inode pages for device replace")
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>