1. 17 Jan, 2019 5 commits
    • Colin Ian King's avatar
      cifs: fix memory leak of an allocated cifs_ntsd structure · cb891e7a
      Colin Ian King authored
      The call to SMB2_queary_acl can allocate memory to pntsd and also
      return a failure via a call to SMB2_query_acl (and then query_info).
      This occurs when query_info allocates the structure and then in
      query_info the call to smb2_validate_and_copy_iov fails. Currently the
      failure just returns without kfree'ing pntsd hence causing a memory
      leak.
      
      Currently, *data is allocated if it's not already pointing to a buffer,
      so it needs to be kfree'd only if was allocated in query_info, so the
      fix adds an allocated flag to track this.  Also set *dlen to zero on
      an error just to be safe since *data is kfree'd.
      
      Also set errno to -ENOMEM if the allocation of *data fails.
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      Reviewed-by: default avatarDan Carpener <dan.carpenter@oracle.com>
      cb891e7a
    • David Howells's avatar
      afs: Fix race in async call refcounting · 34fa4761
      David Howells authored
      There's a race between afs_make_call() and afs_wake_up_async_call() in the
      case that an error is returned from rxrpc_kernel_send_data() after it has
      queued the final packet.
      
      afs_make_call() will try and clean up the mess, but the call state may have
      been moved on thereby causing afs_process_async_call() to also try and to
      delete the call.
      
      Fix this by:
      
       (1) Getting an extra ref for an asynchronous call for the call itself to
           hold.  This makes sure the call doesn't evaporate on us accidentally
           and will allow the call to be retained by the caller in a future
           patch.  The ref is released on leaving afs_make_call() or
           afs_wait_for_call_to_complete().
      
       (2) In the event of an error from rxrpc_kernel_send_data():
      
           (a) Don't set the call state to AFS_CALL_COMPLETE until *after* the
           	 call has been aborted and ended.  This prevents
           	 afs_deliver_to_call() from doing anything with any notifications
           	 it gets.
      
           (b) Explicitly end the call immediately to prevent further callbacks.
      
           (c) Cancel any queued async_work and wait for the work if it's
           	 executing.  This allows us to be sure the race won't recur when we
           	 change the state.  We put the work queue's ref on the call if we
           	 managed to cancel it.
      
           (d) Put the call's ref that we got in (1).  This belongs to us as long
           	 as the call is in state AFS_CALL_CL_REQUESTING.
      
      Fixes: 341f741f ("afs: Refcount the afs_call struct")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      34fa4761
    • David Howells's avatar
      afs: Provide a function to get a ref on a call · 7a75b007
      David Howells authored
      Provide a function to get a reference on an afs_call struct.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      7a75b007
    • David Howells's avatar
      afs: Fix key refcounting in file locking code · 59d49076
      David Howells authored
      Fix the refcounting of the authentication keys in the file locking code.
      The vnode->lock_key member points to a key on which it expects to be
      holding a ref, but it isn't always given an extra ref, however.
      
      Fixes: 0fafdc9f ("afs: Fix file locking")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      59d49076
    • Marc Dionne's avatar
      afs: Don't set vnode->cb_s_break in afs_validate() · 4882a27c
      Marc Dionne authored
      A cb_interest record is not necessarily attached to the vnode on entry to
      afs_validate(), which can cause an oops when we try to bring the vnode's
      cb_s_break up to date in the default case (ie. no current callback promise
      and the vnode has not been deleted).
      
      Fix this by simply removing the line, as vnode->cb_s_break will be set when
      needed by afs_register_server_cb_interest() when we next get a callback
      promise from RPC call.
      
      The oops looks something like:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
          ...
          RIP: 0010:afs_validate+0x66/0x250 [kafs]
          ...
          Call Trace:
           afs_d_revalidate+0x8d/0x340 [kafs]
           ? __d_lookup+0x61/0x150
           lookup_dcache+0x44/0x70
           ? lookup_dcache+0x44/0x70
           __lookup_hash+0x24/0xa0
           do_unlinkat+0x11d/0x2c0
           __x64_sys_unlink+0x23/0x30
           do_syscall_64+0x4d/0xf0
           entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: ae3b7361 ("afs: Fix validation/callback interaction")
      Signed-off-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      4882a27c
  2. 16 Jan, 2019 21 commits
  3. 15 Jan, 2019 4 commits
    • Gustavo A. R. Silva's avatar
      adfs: mark expected switch fall-throughs · 154f7390
      Gustavo A. R. Silva authored
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enabling
      -Wimplicit-fallthrough
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      154f7390
    • Gustavo A. R. Silva's avatar
      afs: Mark expected switch fall-throughs · b564edf0
      Gustavo A. R. Silva authored
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      Notice that in many cases I placed a /* Fall through */ comment
      at the bottom of the case, which what GCC is expecting to find.
      
      In other cases I had to tweak a bit the format of the comments.
      
      This patch suppresses ALL missing-break-in-switch false positives
      in fs/afs
      
      Addresses-Coverity-ID: 115042 ("Missing break in switch")
      Addresses-Coverity-ID: 115043 ("Missing break in switch")
      Addresses-Coverity-ID: 115045 ("Missing break in switch")
      Addresses-Coverity-ID: 1357430 ("Missing break in switch")
      Addresses-Coverity-ID: 115047 ("Missing break in switch")
      Addresses-Coverity-ID: 115050 ("Missing break in switch")
      Addresses-Coverity-ID: 115051 ("Missing break in switch")
      Addresses-Coverity-ID: 1467806 ("Missing break in switch")
      Addresses-Coverity-ID: 1467807 ("Missing break in switch")
      Addresses-Coverity-ID: 1467811 ("Missing break in switch")
      Addresses-Coverity-ID: 115041 ("Missing break in switch")
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      b564edf0
    • Olga Kornievskaia's avatar
      NFSv4.2 fix unnecessary retry in nfs4_copy_file_range · 45ac486e
      Olga Kornievskaia authored
      Currently nfs42_proc_copy_file_range() can not return EAGAIN.
      
      Fixes: e4648aa4 ("NFS recover from destination server reboot for copies")
      Signed-off-by: default avatarOlga Kornievskaia <kolga@netapp.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      45ac486e
    • Vivek Goyal's avatar
      overlayfs: During copy up, first copy up data and then xattrs · 2424e1c7
      Vivek Goyal authored
      If a file with capability set (and hence security.capability xattr) is
      written kernel clears security.capability xattr. For overlay, during file
      copy up if xattrs are copied up first and then data is, copied up. This
      means data copy up will result in clearing of security.capability xattr
      file on lower has. And this can result into surprises. If a lower file has
      CAP_SETUID, then it should not be cleared over copy up (if nothing was
      actually written to file).
      
      This also creates problems with chown logic where it first copies up file
      and then tries to clear setuid bit. But by that time security.capability
      xattr is already gone (due to data copy up), and caller gets -ENODATA.
      This has been reported by Giuseppe here.
      
      https://github.com/containers/libpod/issues/2015#issuecomment-447824842
      
      Fix this by copying up data first and then metadta. This is a regression
      which has been introduced by my commit as part of metadata only copy up
      patches.
      
      TODO: There will be some corner cases where a file is copied up metadata
      only and later data copy up happens and that will clear security.capability
      xattr. Something needs to be done about that too.
      
      Fixes: bd64e575 ("ovl: During copy up, first copy up metadata and then data")
      Cc: <stable@vger.kernel.org> # v4.19+
      Reported-by: default avatarGiuseppe Scrivano <gscrivan@redhat.com>
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      2424e1c7
  4. 14 Jan, 2019 8 commits
    • Josef Bacik's avatar
      btrfs: replace cleaner_delayed_iput_mutex with a waitqueue · 83d4056d
      Josef Bacik authored
      The throttle path doesn't take cleaner_delayed_iput_mutex, which means
      we could think we're done flushing iputs in the data space reservation
      path when we could have a throttler doing an iput.  There's no real
      reason to serialize the delayed iput flushing, so instead of taking the
      cleaner_delayed_iput_mutex whenever we flush the delayed iputs just
      replace it with an atomic counter and a waitqueue.  This removes the
      short (or long depending on how big the inode is) window where we think
      there are no more pending iputs when there really are some.
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      83d4056d
    • Josef Bacik's avatar
      btrfs: wakeup cleaner thread when adding delayed iput · 95b63865
      Josef Bacik authored
      The cleaner thread usually takes care of delayed iputs, with the
      exception of the btrfs_end_transaction_throttle path.  Delaying iputs
      means we are potentially delaying the eviction of an inode and it's
      respective space.  The cleaner thread only gets woken up every 30
      seconds, or when we require space.  If there are a lot of inodes that
      need to be deleted we could induce a serious amount of latency while we
      wait for these inodes to be evicted.  So instead wakeup the cleaner if
      it's not already awake to process any new delayed iputs we add to the
      list.  If we suddenly need space we will less likely be backed up
      behind a bunch of inodes that are waiting to be deleted, and we could
      possibly free space before we need to get into the flushing logic which
      will save us some latency.
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      95b63865
    • Josef Bacik's avatar
      btrfs: run delayed iputs before committing · e336338e
      Josef Bacik authored
      Delayed iputs means we can have final iputs of deleted inodes in the
      queue, which could potentially generate a lot of pinned space that could
      be free'd.  So before we decide to commit the transaction for ENOPSC
      reasons, run the delayed iputs so that any potential space is free'd up.
      If there is and we freed enough we can then commit the transaction and
      potentially be able to make our reservation.
      Reviewed-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      e336338e
    • Josef Bacik's avatar
      btrfs: wait on ordered extents on abort cleanup · 3926f97e
      Josef Bacik authored
      If we flip read-only before we initiate writeback on all dirty pages for
      ordered extents we've created then we'll have ordered extents left over
      on umount, which results in all sorts of bad things happening.  Fix this
      by making sure we wait on ordered extents if we have to do the aborted
      transaction cleanup stuff.
      
      CC: stable@vger.kernel.org
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      3926f97e
    • Josef Bacik's avatar
      btrfs: cleanup pending bgs on transaction abort · bbf00526
      Josef Bacik authored
      We may abort the transaction during a commit and not have a chance to
      run the pending bgs stuff, which will leave block groups on our list and
      cause us accounting issues and leaked memory.  Fix this by running the
      pending bgs when we cleanup a transaction.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      bbf00526
    • Josef Bacik's avatar
      btrfs: just delete pending bgs if we are aborted · 037e2716
      Josef Bacik authored
      We still need to do all of the accounting cleanup for pending block
      groups if we abort.  So set the ret to trans->aborted so if we aborted
      the cleanup happens and everybody is happy.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      037e2716
    • Josef Bacik's avatar
      btrfs: handle delayed ref head accounting cleanup in abort · 75330209
      Josef Bacik authored
      We weren't doing any of the accounting cleanup when we aborted
      transactions.  Fix this by making cleanup_ref_head_accounting global and
      calling it from the abort code, this fixes the issue where our
      accounting was all wrong after the fs aborts.
      
      The test generic/475 on a 2G VM can trigger the problems eg.:
      
        [ 8502.136957] WARNING: CPU: 0 PID: 11064 at fs/btrfs/extent-tree.c:5986 btrfs_free_block_grou +ps+0x3dc/0x410 [btrfs]
        [ 8502.148372] CPU: 0 PID: 11064 Comm: umount Not tainted 5.0.0-rc1-default+ #394
        [ 8502.150807] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626 +cc-prebuilt.qemu-project.org 04/01/2014
        [ 8502.154317] RIP: 0010:btrfs_free_block_groups+0x3dc/0x410 [btrfs]
        [ 8502.160623] RSP: 0018:ffffb1ab84b93de8 EFLAGS: 00010206
        [ 8502.161906] RAX: 0000000001000000 RBX: ffff9f34b1756400 RCX: 0000000000000000
        [ 8502.163448] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff9f34b1755400
        [ 8502.164906] RBP: ffff9f34b7e8c000 R08: 0000000000000001 R09: 0000000000000000
        [ 8502.166716] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9f34b7e8c108
        [ 8502.168498] R13: ffff9f34b7e8c158 R14: 0000000000000000 R15: dead000000000100
        [ 8502.170296] FS:  00007fb1cf15ffc0(0000) GS:ffff9f34bd400000(0000) knlGS:0000000000000000
        [ 8502.172439] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 8502.173669] CR2: 00007fb1ced507b0 CR3: 000000002f7a6000 CR4: 00000000000006f0
        [ 8502.175094] Call Trace:
        [ 8502.175759]  close_ctree+0x17f/0x350 [btrfs]
        [ 8502.176721]  generic_shutdown_super+0x64/0x100
        [ 8502.177702]  kill_anon_super+0x14/0x30
        [ 8502.178607]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [ 8502.179602]  deactivate_locked_super+0x29/0x60
        [ 8502.180595]  cleanup_mnt+0x3b/0x70
        [ 8502.181406]  task_work_run+0x98/0xc0
        [ 8502.182255]  exit_to_usermode_loop+0x83/0x90
        [ 8502.183113]  do_syscall_64+0x15b/0x180
        [ 8502.183919]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Corresponding to
      
        release_global_block_rsv() {
        ...
        WARN_ON(fs_info->delayed_refs_rsv.reserved > 0);
      
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      [ add log dump ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      75330209
    • David Sterba's avatar
      Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io" · 7e18e310
      David Sterba authored
      This reverts commit e73e81b6.
      
      This patch causes a few problems:
      
      - adds latency to btrfs_finish_ordered_io
      - as btrfs_finish_ordered_io is used for free space cache, generating
        more work from btrfs_btree_balance_dirty_nodelay could end up in the
        same workque, effectively deadlocking
      
      12260 kworker/u96:16+btrfs-freespace-write D
      [<0>] balance_dirty_pages+0x6e6/0x7ad
      [<0>] balance_dirty_pages_ratelimited+0x6bb/0xa90
      [<0>] btrfs_finish_ordered_io+0x3da/0x770
      [<0>] normal_work_helper+0x1c5/0x5a0
      [<0>] process_one_work+0x1ee/0x5a0
      [<0>] worker_thread+0x46/0x3d0
      [<0>] kthread+0xf5/0x130
      [<0>] ret_from_fork+0x24/0x30
      [<0>] 0xffffffffffffffff
      
      Transaction commit will wait on the freespace cache:
      
      838 btrfs-transacti D
      [<0>] btrfs_start_ordered_extent+0x154/0x1e0
      [<0>] btrfs_wait_ordered_range+0xbd/0x110
      [<0>] __btrfs_wait_cache_io+0x49/0x1a0
      [<0>] btrfs_write_dirty_block_groups+0x10b/0x3b0
      [<0>] commit_cowonly_roots+0x215/0x2b0
      [<0>] btrfs_commit_transaction+0x37e/0x910
      [<0>] transaction_kthread+0x14d/0x180
      [<0>] kthread+0xf5/0x130
      [<0>] ret_from_fork+0x24/0x30
      [<0>] 0xffffffffffffffff
      
      And then writepages ends up waiting on transaction commit:
      
      9520 kworker/u96:13+flush-btrfs-1 D
      [<0>] wait_current_trans+0xac/0xe0
      [<0>] start_transaction+0x21b/0x4b0
      [<0>] cow_file_range_inline+0x10b/0x6b0
      [<0>] cow_file_range.isra.69+0x329/0x4a0
      [<0>] run_delalloc_range+0x105/0x3c0
      [<0>] writepage_delalloc+0x119/0x180
      [<0>] __extent_writepage+0x10c/0x390
      [<0>] extent_write_cache_pages+0x26f/0x3d0
      [<0>] extent_writepages+0x4f/0x80
      [<0>] do_writepages+0x17/0x60
      [<0>] __writeback_single_inode+0x59/0x690
      [<0>] writeback_sb_inodes+0x291/0x4e0
      [<0>] __writeback_inodes_wb+0x87/0xb0
      [<0>] wb_writeback+0x3bb/0x500
      [<0>] wb_workfn+0x40d/0x610
      [<0>] process_one_work+0x1ee/0x5a0
      [<0>] worker_thread+0x1e0/0x3d0
      [<0>] kthread+0xf5/0x130
      [<0>] ret_from_fork+0x24/0x30
      [<0>] 0xffffffffffffffff
      
      Eventually, we have every process in the system waiting on
      balance_dirty_pages(), and nobody is able to make progress on page
      writeback.
      
      The original patch tried to fix an OOM condition, that happened on 4.4 but no
      success reproducing that on later kernels (4.19 and 4.20). This is more likely
      a problem in OOM itself.
      
      Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/Reported-by: default avatarChris Mason <clm@fb.com>
      CC: stable@vger.kernel.org # 4.18+
      CC: ethanlien <ethanlien@synology.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      7e18e310
  5. 11 Jan, 2019 2 commits