1. 17 May, 2017 1 commit
  2. 14 May, 2017 1 commit
    • dax, xfs, ext4: compile out iomap-dax paths in the FS_DAX=n case · f5705aa8
      Dan Williams authored
      Tetsuo reports:
      
        fs/built-in.o: In function `xfs_file_iomap_end':
        xfs_iomap.c:(.text+0xe0ef9): undefined reference to `put_dax'
        fs/built-in.o: In function `xfs_file_iomap_begin':
        xfs_iomap.c:(.text+0xe1a7f): undefined reference to `dax_get_by_host'
        make: *** [vmlinux] Error 1
        $ grep DAX .config
        CONFIG_DAX=m
        # CONFIG_DEV_DAX is not set
        # CONFIG_FS_DAX is not set
      
      When FS_DAX=n we can, and must, compile out the dax code in filesystems.
      Implement 'fs_' versions of dax_get_by_host() and put_dax() that are
      no-ops in the FS_DAX=n case.
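
The stub pattern described above can be sketched in standalone C. This is only an illustration under stated assumptions: the wrapper names fs_dax_get_by_host() and fs_put_dax() are inferred from the "'fs_' versions" wording, the plain #ifdef stands in for whatever config test the kernel header actually uses, and struct dax_device is left opaque.

```c
#include <stddef.h>

struct dax_device;	/* opaque in this sketch */

/* Provided by the DAX core, which may be modular or absent. */
struct dax_device *dax_get_by_host(const char *host);
void put_dax(struct dax_device *dax_dev);

#ifdef CONFIG_FS_DAX
/* FS_DAX enabled: forward to the real DAX core helpers. */
static inline struct dax_device *fs_dax_get_by_host(const char *host)
{
	return dax_get_by_host(host);
}

static inline void fs_put_dax(struct dax_device *dax_dev)
{
	put_dax(dax_dev);
}
#else
/* FS_DAX disabled: no-op stubs, so no DAX core symbol is referenced. */
static inline struct dax_device *fs_dax_get_by_host(const char *host)
{
	(void)host;
	return NULL;
}

static inline void fs_put_dax(struct dax_device *dax_dev)
{
	(void)dax_dev;
}
#endif
```

Because CONFIG_FS_DAX is not defined in this sketch, the callers never reference dax_get_by_host() or put_dax(), so the program links even though neither function is defined anywhere. That is the property the filesystems need when the DAX core is built as a module or not at all.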
      
      Cc: <linux-xfs@vger.kernel.org>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Jan Kara <jack@suse.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Tested-by: Tony Luck <tony.luck@intel.com>
      Fixes: ef510424 ("block, dax: move 'select DAX' from BLOCK to FS_DAX")
      Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  3. 13 May, 2017 4 commits
  4. 12 May, 2017 6 commits
    • dax: fix PMD data corruption when fault races with write · 876f2946
      Ross Zwisler authored
      This is based on a patch from Jan Kara that fixed the equivalent race in
      the DAX PTE fault path.
      
      Currently a DAX PMD read fault can race with write(2) in the following
      way:
      
      CPU1 - write(2)                 CPU2 - read fault
                                      dax_iomap_pmd_fault()
                                        ->iomap_begin() - sees hole
      
      dax_iomap_rw()
        iomap_apply()
          ->iomap_begin - allocates blocks
          dax_iomap_actor()
            invalidate_inode_pages2_range()
              - there's nothing to invalidate
      
                                        grab_mapping_entry()
                                        - we add huge zero page to the radix tree
                                          and map it to page tables
      
      The result is that the hole page is mapped into page tables (and thus
      zeros are seen in mmap) while the file has data written in that place.
      
      Fix the problem by locking the exceptional entry before mapping blocks
      for the fault.  That way the invalidate_inode_pages2_range() call for a
      racing write will either block on the entry lock waiting for the fault
      to finish (and unmap stale page tables after that), or the read fault
      will see the blocks already allocated by write(2).
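
The interleaving above can be replayed with a small sequential model. This is a toy sketch, not kernel code: every type and function name in it is hypothetical, the fault is reduced to a "lookup" step and an "install" step, and the entry lock is modelled only by the orderings it permits (the write runs entirely before or entirely after the fault).

```c
#include <stdbool.h>

enum mapped { MAPPED_NOTHING, MAPPED_ZERO_PAGE, MAPPED_DATA };

struct file_state {
	bool blocks_allocated;	/* filesystem has real blocks for the range */
	enum mapped pte;	/* what the page tables currently map */
};

/* Fault step 1: ask the filesystem what backs the range. */
static bool fault_lookup_sees_hole(const struct file_state *f)
{
	return !f->blocks_allocated;
}

/* Fault step 2: install a mapping based on the earlier lookup result. */
static void fault_install(struct file_state *f, bool saw_hole)
{
	f->pte = saw_hole ? MAPPED_ZERO_PAGE : MAPPED_DATA;
}

/* write(2): allocate blocks, then invalidate whatever is mapped. */
static void do_write(struct file_state *f)
{
	f->blocks_allocated = true;
	f->pte = MAPPED_NOTHING;
}

/* Stale means a zero page is mapped although blocks exist. */
static bool is_stale(const struct file_state *f)
{
	return f->blocks_allocated && f->pte == MAPPED_ZERO_PAGE;
}

/* Unlocked fault: the write slips in between lookup and install. */
static bool racy_interleaving_is_stale(void)
{
	struct file_state f = { false, MAPPED_NOTHING };
	bool saw_hole = fault_lookup_sees_hole(&f);

	do_write(&f);
	fault_install(&f, saw_hole);
	return is_stale(&f);
}

/* Locked fault: lookup and install are atomic with respect to the write. */
static bool locked_orderings_are_stale(void)
{
	struct file_state a = { false, MAPPED_NOTHING };
	struct file_state b = { false, MAPPED_NOTHING };

	/* Fault first, then write: the write invalidates the zero page. */
	fault_install(&a, fault_lookup_sees_hole(&a));
	do_write(&a);

	/* Write first, then fault: the fault sees allocated blocks. */
	do_write(&b);
	fault_install(&b, fault_lookup_sees_hole(&b));

	return is_stale(&a) || is_stale(&b);
}
```

In this model only the unlocked interleaving leaves a zero page mapped over allocated blocks; both orderings that the entry lock allows end in a consistent state.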
      
      Fixes: 9f141d6e ("dax: Call ->iomap_begin without entry lock during dax fault")
      Link: http://lkml.kernel.org/r/20170510172700.18991-1-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: fix data corruption when fault races with write · 13e451fd
      Jan Kara authored
      Currently a DAX read fault can race with write(2) in the following way:
      
      CPU1 - write(2)                 CPU2 - read fault
                                      dax_iomap_pte_fault()
                                        ->iomap_begin() - sees hole
      dax_iomap_rw()
        iomap_apply()
          ->iomap_begin - allocates blocks
          dax_iomap_actor()
            invalidate_inode_pages2_range()
              - there's nothing to invalidate
                                        grab_mapping_entry()
                                        - we add zero page in the radix tree
                                          and map it to page tables
      
      The result is that the hole page is mapped into page tables (and thus
      zeros are seen in mmap) while the file has data written in that place.
      
      Fix the problem by locking the exceptional entry before mapping blocks
      for the fault.  That way the invalidate_inode_pages2_range() call for a
      racing write will either block on the entry lock waiting for the fault
      to finish (and unmap stale page tables after that), or the read fault
      will see the blocks already allocated by write(2).
      
      Fixes: 9f141d6e
      Link: http://lkml.kernel.org/r/20170510085419.27601-5-jack@suse.cz
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ext4: return to starting transaction in ext4_dax_huge_fault() · fb26a1cb
      Jan Kara authored
      DAX will return to locking the exceptional entry before mapping blocks
      for a page fault, to fix possible races with concurrent writes.  To
      avoid lock inversion between the exceptional entry lock and transaction
      start, start the transaction already in ext4_dax_huge_fault().
      
      Fixes: 9f141d6e
      Link: http://lkml.kernel.org/r/20170510085419.27601-4-jack@suse.cz
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix data corruption due to stale mmap reads · cd656375
      Jan Kara authored
      Currently, we don't invalidate page tables during
      invalidate_inode_pages2() for DAX.  That can result in, e.g., a 2MiB
      zero page being mapped into page tables while underlying blocks are
      already allocated, so that data seen through mmap differs from data
      seen by read(2).  The following sequence reproduces the problem:
      
       - open an mmap over a 2MiB hole
      
       - read from a 2MiB hole, faulting in a 2MiB zero page
      
       - write to the hole with write(2). The write succeeds but we
         incorrectly leave the 2MiB zero page mapping intact.
      
       - via the mmap, read the data that was just written. Since the zero
         page mapping is still intact we read back zeroes instead of the new
         data.
      
      Fix the problem by unconditionally calling invalidate_inode_pages2_range()
      in dax_iomap_actor() for new block allocations and by properly
      invalidating page tables in invalidate_inode_pages2_range() for DAX
      mappings.
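
The reproduction sequence above can be sketched as a small userspace helper. This is not the t_mmap_stale program and makes several assumptions: the helper name mmap_sees_write() and its return convention are invented for illustration, the file is created, tested and unlinked at whatever path the caller passes, and nothing in the sketch forces a DAX mount or a huge-page fault.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define HOLE_SIZE (2UL * 1024 * 1024)	/* 2MiB, as in the description */

/*
 * Returns 1 if data written with pwrite(2) is visible through an existing
 * shared mapping, 0 if the mapping still shows other contents (such as
 * zeroes), and -1 if the setup fails.
 */
static int mmap_sees_write(const char *path)
{
	static const char msg[] = "new data";
	volatile char sink;
	int result = -1;
	char *map;
	int fd;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* Step 1: create a 2MiB hole and open an mmap over it. */
	if (ftruncate(fd, HOLE_SIZE) < 0)
		goto out_close;
	map = mmap(NULL, HOLE_SIZE, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	/* Step 2: read from the hole, which faults in a zero page. */
	sink = map[0];
	(void)sink;

	/* Step 3: write to the hole through the file descriptor. */
	if (pwrite(fd, msg, sizeof(msg), 0) != (ssize_t)sizeof(msg))
		goto out_unmap;

	/* Step 4: read the same bytes back through the mapping. */
	result = memcmp(map, msg, sizeof(msg)) == 0;

out_unmap:
	munmap(map, HOLE_SIZE);
out_close:
	close(fd);
	unlink(path);
	return result;
}
```

On a conventional page-cache-backed filesystem the helper should return 1. The failure described in this commit would show up as a 0 return, and only on an affected DAX mount.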
      
      Fixes: c6dcf52c
      Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.cz
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: prevent invalidation of mapped DAX entries · 4636e70b
      Ross Zwisler authored
      Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
      v4.
      
      This series fixes data corruption that can happen for DAX mounts when
      page faults race with write(2) and as a result page tables get out of
      sync with block mappings in the filesystem and thus data seen through
      mmap is different from data seen through read(2).
      
      The series passes testing with t_mmap_stale test program from Ross and
      also other mmap related tests on DAX filesystem.
      
      This patch (of 4):
      
      dax_invalidate_mapping_entry() currently removes DAX exceptional entries
      only if they are clean and unlocked.  This is done via:
      
        invalidate_mapping_pages()
          invalidate_exceptional_entry()
            dax_invalidate_mapping_entry()
      
      However, for page cache pages removed in invalidate_mapping_pages()
      there is an additional criterion, which is that the page must not be
      mapped.  This is noted in the comments above invalidate_mapping_pages()
      and is checked in invalidate_inode_page().
      
      For DAX entries this means that we can end up in a situation where a
      DAX exceptional entry, either a huge zero page or a regular DAX entry,
      is mapped but has no associated radix tree entry.  This is
      inconsistent with the rest of the DAX code and with what happens in the
      page cache case.
      
      We aren't able to unmap the DAX exceptional entry because according to
      its comments invalidate_mapping_pages() isn't allowed to block, and
      unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem.
      
      Since we essentially never have unmapped DAX entries to evict from the
      radix tree, just remove dax_invalidate_mapping_entry().
      
      Fixes: c6dcf52c ("mm: Invalidate DAX radix tree entries only if appropriate")
      Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reported-by: Jan Kara <jack@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>    [4.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Tigran has moved · cea58224
      Andrew Morton authored
      Cc: Tigran Aivazian <aivazian.tigran@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 11 May, 2017 1 commit
  6. 10 May, 2017 4 commits
    • nfsd: Fix up the "supattr_exclcreat" attributes · b26b78cb
      Trond Myklebust authored
      If an NFSv4 client asks us for the supattr_exclcreat, then we must
      not return attributes that are unsupported by this minor version.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Fixes: 75976de6 ("NFSD: Return word2 bitmask if setting security..,")
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: encoders mustn't use unitialized values in error cases · f961e3f2
      J. Bruce Fields authored
      In error cases, lgp->lg_layout_type may be out of bounds; so we
      shouldn't be using it until after the check of nfserr.
      
      This was seen to crash nfsd threads when the server receives a LAYOUTGET
      request with a large layout type.
      
      GETDEVICEINFO has the same problem.
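
The ordering rule described above (check nfserr before touching the layout type) can be sketched in standalone C. Everything here is hypothetical and simplified for illustration: the table, the ops structure and encode_layoutget() are stand-ins, not the nfsd encoder code.

```c
#include <stddef.h>

#define LAYOUT_TYPE_MAX 4	/* hypothetical number of known layout types */

struct layout_ops {
	int (*encode)(void);
};

static int encode_ok(void)
{
	return 0;
}

/* Hypothetical ops table, indexed by layout type. */
static const struct layout_ops ops_table[LAYOUT_TYPE_MAX] = {
	[1] = { .encode = encode_ok },
};

/*
 * In the error case layout_type may never have been validated, so the
 * table lookup must not happen until nfserr has been checked.
 */
static int encode_layoutget(int nfserr, unsigned int layout_type)
{
	const struct layout_ops *ops;

	if (nfserr)
		return nfserr;	/* layout_type is never used on this path */

	ops = &ops_table[layout_type];
	if (!ops->encode)
		return -1;
	return ops->encode();
}
```

With the check first, a request that already failed returns its error without indexing the table, whatever layout type value it carries.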
      Reported-by: Ari Kauppi <Ari.Kauppi@synopsys.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • Don't delay freeing mids when blocked on slow socket write of request · de1892b8
      Steve French authored
      When processing responses, and in particular when freeing mids
      (DeleteMidQEntry), which is very important since it also frees the
      associated buffers (cifs_buf_release), we can block for a long time if
      writes to the socket are slow due to low memory or networking issues.
      
      We can block in send (SMB request) waiting for memory while also being
      blocked in processing responses (which could free memory if we let it),
      since both paths grab server->srv_mutex.
      
      In practice, there is no reason to hold srv_mutex in the DeleteMidQEntry
      case, so remove the locking around DeleteMidQEntry; this allows us to
      free memory faster.
      Signed-off-by: Steve French <steve.french@primarydata.com>
      Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
    • CIFS: silence lockdep splat in cifs_relock_file() · 560d3889
      Rabin Vincent authored
      cifs_relock_file() can perform a down_read() on the inode's lock_sem even
      though it was already taken in cifs_strict_readv().  Lockdep complains
      about this.  AFAICS, there is no problem here, and lockdep just needs to be
      told that this nesting is OK.
      
       =============================================
       [ INFO: possible recursive locking detected ]
       4.11.0+ #20 Not tainted
       ---------------------------------------------
       cat/701 is trying to acquire lock:
        (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00
      
       but task is already holding lock:
        (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&cifsi->lock_sem);
         lock(&cifsi->lock_sem);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       1 lock held by cat/701:
        #0:  (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310
      
       stack backtrace:
       CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20
       Call Trace:
        dump_stack+0x85/0xc2
        __lock_acquire+0x17dd/0x2260
        ? trace_hardirqs_on_thunk+0x1a/0x1c
        ? preempt_schedule_irq+0x6b/0x80
        lock_acquire+0xcc/0x260
        ? lock_acquire+0xcc/0x260
        ? cifs_reopen_file+0x7a7/0xc00
        down_read+0x2d/0x70
        ? cifs_reopen_file+0x7a7/0xc00
        cifs_reopen_file+0x7a7/0xc00
        ? printk+0x43/0x4b
        cifs_readpage_worker+0x327/0x8a0
        cifs_readpage+0x8c/0x2a0
        generic_file_read_iter+0x692/0xd00
        cifs_strict_readv+0x29f/0x310
        generic_file_splice_read+0x11c/0x1c0
        do_splice_to+0xa5/0xc0
        splice_direct_to_actor+0xfa/0x350
        ? generic_pipe_buf_nosteal+0x10/0x10
        do_splice_direct+0xb5/0xe0
        do_sendfile+0x278/0x3a0
        SyS_sendfile64+0xc4/0xe0
        entry_SYSCALL_64_fastpath+0x1f/0xbe
      Signed-off-by: Rabin Vincent <rabinv@axis.com>
      Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
  7. 09 May, 2017 23 commits