1. 22 Sep, 2016 2 commits
  2. 16 Sep, 2016 1 commit
    • Mike Galbraith's avatar
      reiserfs: Unlock superblock before calling reiserfs_quota_on_mount() · 420902c9
      Mike Galbraith authored
      If we hold the superblock lock while calling reiserfs_quota_on_mount(), we can
      deadlock our own worker - mount blocks kworker/3:2, sleeps forever more.
      
      crash> ps|grep UN
          715      2   3  ffff880220734d30  UN   0.0       0      0  [kworker/3:2]
         9369   9341   2  ffff88021ffb7560  UN   1.3  493404 123184  Xorg
         9665   9664   3  ffff880225b92ab0  UN   0.0   47368    812  udisks-daemon
        10635  10403   3  ffff880222f22c70  UN   0.0   14904    936  mount
      crash> bt ffff880220734d30
      PID: 715    TASK: ffff880220734d30  CPU: 3   COMMAND: "kworker/3:2"
       #0 [ffff8802244c3c20] schedule at ffffffff8144584b
       #1 [ffff8802244c3cc8] __rt_mutex_slowlock at ffffffff814472b3
       #2 [ffff8802244c3d28] rt_mutex_slowlock at ffffffff814473f5
       #3 [ffff8802244c3dc8] reiserfs_write_lock at ffffffffa05f28fd [reiserfs]
       #4 [ffff8802244c3de8] flush_async_commits at ffffffffa05ec91d [reiserfs]
       #5 [ffff8802244c3e08] process_one_work at ffffffff81073726
       #6 [ffff8802244c3e68] worker_thread at ffffffff81073eba
       #7 [ffff8802244c3ec8] kthread at ffffffff810782e0
       #8 [ffff8802244c3f48] kernel_thread_helper at ffffffff81450064
      crash> rd ffff8802244c3cc8 10
      ffff8802244c3cc8:  ffffffff814472b3 ffff880222f23250   .rD.....P2."....
      ffff8802244c3cd8:  0000000000000000 0000000000000286   ................
      ffff8802244c3ce8:  ffff8802244c3d30 ffff880220734d80   0=L$.....Ms ....
      ffff8802244c3cf8:  ffff880222e8f628 0000000000000000   (.."............
      ffff8802244c3d08:  0000000000000000 0000000000000002   ................
      crash> struct rt_mutex ffff880222e8f628
      struct rt_mutex {
        wait_lock = {
          raw_lock = {
            slock = 65537
          }
        },
        wait_list = {
          node_list = {
            next = 0xffff8802244c3d48,
            prev = 0xffff8802244c3d48
          }
        },
        owner = 0xffff880222f22c71,
        save_state = 0
      }
      crash> bt 0xffff880222f22c70
      PID: 10635  TASK: ffff880222f22c70  CPU: 3   COMMAND: "mount"
       #0 [ffff8802216a9868] schedule at ffffffff8144584b
       #1 [ffff8802216a9910] schedule_timeout at ffffffff81446865
       #2 [ffff8802216a99a0] wait_for_common at ffffffff81445f74
       #3 [ffff8802216a9a30] flush_work at ffffffff810712d3
       #4 [ffff8802216a9ab0] schedule_on_each_cpu at ffffffff81074463
       #5 [ffff8802216a9ae0] invalidate_bdev at ffffffff81178aba
       #6 [ffff8802216a9af0] vfs_load_quota_inode at ffffffff811a3632
       #7 [ffff8802216a9b50] dquot_quota_on_mount at ffffffff811a375c
       #8 [ffff8802216a9b80] finish_unfinished at ffffffffa05dd8b0 [reiserfs]
       #9 [ffff8802216a9cc0] reiserfs_fill_super at ffffffffa05de825 [reiserfs]
          RIP: 00007f7b9303997a  RSP: 00007ffff443c7a8  RFLAGS: 00010202
          RAX: 00000000000000a5  RBX: ffffffff8144ef12  RCX: 00007f7b932e9ee0
          RDX: 00007f7b93d9a400  RSI: 00007f7b93d9a3e0  RDI: 00007f7b93d9a3c0
          RBP: 00007f7b93d9a2c0   R8: 00007f7b93d9a550   R9: 0000000000000001
          R10: ffffffffc0ed040e  R11: 0000000000000202  R12: 000000000000040e
          R13: 0000000000000000  R14: 00000000c0ed040e  R15: 00007ffff443ca20
          ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b
      Signed-off-by: default avatarMike Galbraith <efault@gmx.de>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarMike Galbraith <mgalbraith@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      420902c9
  3. 02 Aug, 2016 1 commit
  4. 20 Jul, 2016 1 commit
  5. 07 Jun, 2016 2 commits
  6. 27 May, 2016 1 commit
  7. 25 May, 2016 1 commit
    • Mikulas Patocka's avatar
      reiserfs: check kstrdup failure · b9d8905e
      Mikulas Patocka authored
      Check out-of-memory failure of the kstrdup option. Note that the argument
      "arg" may be NULL (in that case kstrup returns NULL), so out of memory
      condition happened if arg was non-NULL and kstrdup returned NULL.
      
      The patch also changes the call to replace_mount_options - if we didn't
      pass any filesystem-specific options, we don't call replace_mount_options
      (thus we don't erase existing reported options).
      
      Note that to properly report options after remount, the reiserfs
      filesystem should implement the show_options method. Without the
      show_options method, options changed with remount replace existing
      options.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      b9d8905e
  8. 21 May, 2016 1 commit
  9. 09 May, 2016 1 commit
  10. 02 May, 2016 1 commit
  11. 01 May, 2016 1 commit
  12. 11 Apr, 2016 1 commit
  13. 10 Apr, 2016 1 commit
    • Al Viro's avatar
      reiserfs: switch to generic_{get,set,remove}xattr() · 79a628d1
      Al Viro authored
      reiserfs_xattr_[sg]et() will fail with -EOPNOTSUPP for V1 inodes anyway,
      and all reiserfs instances of ->[sg]et() call it and so does ->set_acl().
      
      Checks for name length in the instances had been bogus; they should've
      been "bugger off if it's _exactly_ the prefix" (as generic would
      do on its own) and not "bugger off if it's shorter than the prefix" -
      that can't happen.
      
      xattr_full_name() is needed to adjust for the fact that generic instances
      will skip the prefix in the name passed to ->[gs]et(); reiserfs homegrown
      analogues didn't.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      79a628d1
  14. 04 Apr, 2016 2 commits
    • Kirill A. Shutemov's avatar
      mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage · ea1754a0
      Kirill A. Shutemov authored
      Mostly direct substitution with occasional adjustment or removing
      outdated comments.
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ea1754a0
    • Kirill A. Shutemov's avatar
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  15. 31 Mar, 2016 1 commit
    • Andreas Gruenbacher's avatar
      posix_acl: Inode acl caching fixes · b8a7a3a6
      Andreas Gruenbacher authored
      When get_acl() is called for an inode whose ACL is not cached yet, the
      get_acl inode operation is called to fetch the ACL from the filesystem.
      The inode operation is responsible for updating the cached acl with
      set_cached_acl().  This is done without locking at the VFS level, so
      another task can call set_cached_acl() or forget_cached_acl() before the
      get_acl inode operation gets to calling set_cached_acl(), and then
      get_acl's call to set_cached_acl() results in caching an outdate ACL.
      
      Prevent this from happening by setting the cached ACL pointer to a
      task-specific sentinel value before calling the get_acl inode operation.
      Move the responsibility for updating the cached ACL from the get_acl
      inode operations to get_acl().  There, only set the cached ACL if the
      sentinel value hasn't changed.
      
      The sentinel values are chosen to have odd values.  Likewise, the value
      of ACL_NOT_CACHED is odd.  In contrast, ACL object pointers always have
      an even value (ACLs are aligned in memory).  This allows to distinguish
      uncached ACLs values from ACL objects.
      
      In addition, switch from guarding inode->i_acl and inode->i_default_acl
      upates by the inode->i_lock spinlock to using xchg() and cmpxchg().
      
      Filesystems that do not want ACLs returned from their get_acl inode
      operations to be cached must call forget_cached_acl() to prevent the VFS
      from doing so.
      
      (Patch written by Al Viro and Andreas Gruenbacher.)
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      b8a7a3a6
  16. 28 Mar, 2016 1 commit
  17. 09 Feb, 2016 1 commit
  18. 22 Jan, 2016 2 commits
  19. 15 Jan, 2016 1 commit
    • Vladimir Davydov's avatar
      kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov authored
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      fact).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5d097056
  20. 06 Jan, 2016 1 commit
  21. 30 Dec, 2015 1 commit
  22. 14 Dec, 2015 1 commit
  23. 09 Dec, 2015 2 commits
    • Al Viro's avatar
      replace ->follow_link() with new method that could stay in RCU mode · 6b255391
      Al Viro authored
      new method: ->get_link(); replacement of ->follow_link().  The differences
      are:
      	* inode and dentry are passed separately
      	* might be called both in RCU and non-RCU mode;
      the former is indicated by passing it a NULL dentry.
      	* when called that way it isn't allowed to block
      and should return ERR_PTR(-ECHILD) if it needs to be called
      in non-RCU mode.
      
      It's a flagday change - the old method is gone, all in-tree instances
      converted.  Conversion isn't hard; said that, so far very few instances
      do not immediately bail out when called in RCU mode.  That'll change
      in the next commits.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      6b255391
    • Al Viro's avatar
      don't put symlink bodies in pagecache into highmem · 21fc61c7
      Al Viro authored
      kmap() in page_follow_link_light() needed to go - allowing to hold
      an arbitrary number of kmaps for long is a great way to deadlocking
      the system.
      
      new helper (inode_nohighmem(inode)) needs to be used for pagecache
      symlinks inodes; done for all in-tree cases.  page_follow_link_light()
      instrumented to yell about anything missed.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      21fc61c7
  24. 07 Dec, 2015 2 commits
  25. 14 Nov, 2015 1 commit
  26. 09 Nov, 2015 1 commit
  27. 04 Sep, 2015 1 commit
    • Kees Cook's avatar
      fs: create and use seq_show_option for escaping · a068acf2
      Kees Cook authored
      Many file systems that implement the show_options hook fail to correctly
      escape their output which could lead to unescaped characters (e.g.  new
      lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
      could lead to confusion, spoofed entries (resulting in things like
      systemd issuing false d-bus "mount" notifications), and who knows what
      else.  This looks like it would only be the root user stepping on
      themselves, but it's possible weird things could happen in containers or
      in other situations with delegated mount privileges.
      
      Here's an example using overlay with setuid fusermount trusting the
      contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
      of "sudo" is something more sneaky:
      
        $ BASE="ovl"
        $ MNT="$BASE/mnt"
        $ LOW="$BASE/lower"
        $ UP="$BASE/upper"
        $ WORK="$BASE/work/ 0 0
        none /proc fuse.pwn user_id=1000"
        $ mkdir -p "$LOW" "$UP" "$WORK"
        $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
        $ cat /proc/mounts
        none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
        none /proc fuse.pwn user_id=1000 0 0
        $ fusermount -u /proc
        $ cat /proc/mounts
        cat: /proc/mounts: No such file or directory
      
      This fixes the problem by adding new seq_show_option and
      seq_show_option_n helpers, and updating the vulnerable show_option
      handlers to use them as needed.  Some, like SELinux, need to be open
      coded due to unusual existing escape mechanisms.
      
      [akpm@linux-foundation.org: add lost chunk, per Kees]
      [keescook@chromium.org: seq_show_option should be using const parameters]
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: default avatarJan Kara <jack@suse.com>
      Acked-by: default avatarPaul Moore <paul@paul-moore.com>
      Cc: J. R. Okajima <hooanon05g@gmail.com>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a068acf2
  28. 23 Jul, 2015 1 commit
  29. 26 Jun, 2015 1 commit
  30. 02 Jun, 2015 1 commit
    • Tejun Heo's avatar
      writeback: separate out include/linux/backing-dev-defs.h · 66114cad
      Tejun Heo authored
      With the planned cgroup writeback support, backing-dev related
      declarations will be more widely used across block and cgroup;
      unfortunately, including backing-dev.h from include/linux/blkdev.h
      makes cyclic include dependency quite likely.
      
      This patch separates out backing-dev-defs.h which only has the
      essential definitions and updates blkdev.h to include it.  c files
      which need access to more backing-dev details now include
      backing-dev.h directly.  This takes backing-dev.h off the common
      include dependency chain making it a lot easier to use it across block
      and cgroup.
      
      v2: fs/fat build failure fixed.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      66114cad
  31. 15 Apr, 2015 1 commit
  32. 12 Apr, 2015 3 commits