1. 28 Dec, 2018 2 commits
  2. 18 Jul, 2018 1 commit
    • vfs: make open_with_fake_path() not contribute to nr_files · d3b1084d
      Miklos Szeredi authored
      Stacking file operations in overlay will store an extra open file for each
      overlay file opened.
      The overhead is just that of "struct file" which is about 256 bytes, because
      overlay already pins an extra dentry and inode when the file is open, which
      add up to a much larger overhead.
      For fear of breaking working setups, don't start accounting the extra file.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  3. 12 Jul, 2018 8 commits
  4. 11 Jul, 2018 1 commit
  5. 07 Dec, 2017 1 commit
  6. 16 Nov, 2017 1 commit
    • fs, mm: account filp cache to kmemcg · f3f7c093
      Shakeel Butt authored
      The allocations from filp cache can be directly triggered by userspace
      applications.  A buggy application can consume a significant amount of
      unaccounted system memory.  Though we have not noticed such buggy
      applications in production, upon close inspection we found that
      a lot of machines spend a very significant amount of memory on these
      caches.
      One way to limit allocations from filp cache is to set system level
      limit of maximum number of open files.  However this limit is shared
      between different users on the system and one user can hog this
      resource.  To cater for that, we can charge filp to kmemcg and set the
      maximum limit very high and let the memory limit of each user limit the
      number of files they can open and indirectly limiting their allocations
      from filp cache.
      One side effect of this change is that it will allow _sysctl() to return
      ENOMEM and the man page of _sysctl() does not specify that.  However the
      man page also discourages using _sysctl() at all.
      Link: http://lkml.kernel.org/r/20171011190359.34926-1-shakeelb@google.com
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 08 Nov, 2017 1 commit
  8. 28 Aug, 2017 1 commit
  9. 06 Jul, 2017 1 commit
    • fs: new infrastructure for writeback error handling and reporting · 5660e13d
      Jeff Layton authored
      Most filesystems currently use mapping_set_error and
      filemap_check_errors for setting and reporting/clearing writeback errors
      at the mapping level. filemap_check_errors is indirectly called from
      most of the filemap_fdatawait_* functions and from
      filemap_write_and_wait*. These functions are called from all sorts of
      contexts to wait on writeback to finish -- e.g. mostly in fsync, but
      also in truncate calls, getattr, etc.
      The non-fsync callers are problematic. We should be reporting writeback
      errors during fsync, but many places spread over the tree clear out
      errors before they can be properly reported, or report errors at
      nonsensical times.
      If I get -EIO on a stat() call, there is no reason for me to assume that
      it is because some previous writeback failed. The fact that it also
      clears out the error such that a subsequent fsync returns 0 is a bug,
      and a nasty one since that's potentially silent data corruption.
      This patch adds a small bit of new infrastructure for setting and
      reporting errors during address_space writeback. While the above was my
      original impetus for adding this, I think it's also the case that
      current fsync semantics are just problematic for userland. Most
      applications that call fsync do so to ensure that the data they wrote
      has hit the backing store.
      In the case where there are multiple writers to the file at the same
      time, this is really hard to determine. The first one to call fsync will
      see any stored error, and the rest get back 0. The processes with open
      fds may not be associated with one another in any way. They could even
      be in different containers, so ensuring coordination between all fsync
      callers is not really an option.
      One way to remedy this would be to track what file descriptor was used
      to dirty the file, but that's rather cumbersome and would likely be
      slow. However, there is a simpler way to improve the semantics here
      without incurring too much overhead.
      This set adds an errseq_t to struct address_space, and a corresponding
      one is added to struct file. Writeback errors are recorded in the
      mapping's errseq_t, and the one in struct file is used as the "since"
      value.
      This changes the semantics of the Linux fsync implementation such that
      applications can now use it to determine whether there were any
      writeback errors since fsync(fd) was last called (or since the file was
      opened in the case of fsync having never been called).
      Note that those writeback errors may have occurred when writing data
      that was dirtied via an entirely different fd, but that's the case now
      with the current mapping_set_error/filemap_check_errors infrastructure.
      This will at least prevent you from getting a false report of success.
      The new behavior is still consistent with the POSIX spec, and is more
      reliable for application developers. This patch just adds some basic
      infrastructure for doing this, and ensures that the f_wb_err "cursor"
      is properly set when a file is opened. Later patches will change the
      existing code to use this new infrastructure for reporting errors at
      fsync time.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
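The "since" cursor scheme described in the commit message above can be sketched as a small userspace toy model. This is an illustration only, not the kernel's errseq_t implementation: every `toy_*` name is hypothetical, and the error code and sequence counter are kept as two plain fields here purely for readability. The sketch assumes each open file remembers the mapping's sequence value it last observed, and that fsync reports an error only if the sequence has advanced since then.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-mapping error state. */
struct toy_errseq {
    int err;        /* most recent writeback error (negative errno), 0 if none */
    uint32_t seq;   /* bumped every time an error is recorded */
};

/* Hypothetical stand-ins for struct address_space and struct file. */
struct toy_mapping { struct toy_errseq wb_err; };
struct toy_file    { struct toy_mapping *mapping; uint32_t since; };

/* At open time, sample the current sequence as this file's cursor. */
void toy_open(struct toy_file *f, struct toy_mapping *m)
{
    f->mapping = m;
    f->since = m->wb_err.seq;
}

/* Writeback failed: record the error in the mapping, not in any file. */
void toy_record_error(struct toy_mapping *m, int err)
{
    assert(err < 0);
    m->wb_err.err = err;
    m->wb_err.seq++;
}

/* Report an error only if one was recorded after this file's cursor,
 * then advance the cursor so the same error is not reported twice. */
int toy_fsync(struct toy_file *f)
{
    const struct toy_errseq *cur = &f->mapping->wb_err;

    if (f->since == cur->seq)
        return 0;
    f->since = cur->seq;
    return cur->err;
}
```

In this model, two files opened before a failed writeback both get the error from their next `toy_fsync()` call, instead of only the first caller seeing it; a file opened afterwards starts with a fresh cursor and sees 0.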
  10. 02 Mar, 2017 1 commit
  11. 06 Dec, 2016 1 commit
  12. 07 Aug, 2015 1 commit
    • fs, file table: reinit files_stat.max_files after deferred memory initialisation · 4248b0da
      Mel Gorman authored
      Dave Hansen reported the following:
      	My laptop has been behaving strangely with 4.2-rc2.  Once I log
      	in to my X session, I start getting all kinds of strange errors
      	from applications and see this in my dmesg:
              	VFS: file-max limit 8192 reached
      The problem is that the file-max is calculated before memory is fully
      initialised and miscalculates how much memory the kernel is using.  This
      patch recalculates file-max after deferred memory initialisation.  Note
      that using memory hotplug infrastructure would not have avoided this
      problem as the value is not recalculated after memory hot-add.
      4.1:             files_stat.max_files = 6582781
      4.2-rc2:         files_stat.max_files = 8192
      4.2-rc2 patched: files_stat.max_files = 6562467
      Small differences with the patch applied and 4.1 but not enough to matter.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reported-by: Dave Hansen <dave.hansen@intel.com>
      Cc: Nicolai Stange <nicstange@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Alex Ng <alexng@microsoft.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 23 Jun, 2015 1 commit
  14. 12 Apr, 2015 1 commit
  15. 12 Oct, 2014 1 commit
  16. 08 Sep, 2014 1 commit
    • percpu_counter: add @gfp to percpu_counter_init() · 908c7f19
      Tejun Heo authored
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_counters too.
      We could have left percpu_counter_init() alone and added
      percpu_counter_init_gfp(); however, the number of users isn't that
      high and introducing _gfp variants to all percpu data structures would
      be quite ugly, so let's just do the conversion.  This is the one with
      the most users.  Other percpu data structures are a lot easier to
      convert.
      This patch doesn't make any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: "David S. Miller" <davem@davemloft.net>
      Cc: x86@kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
  17. 06 Jun, 2014 1 commit
  18. 06 May, 2014 2 commits
    • new methods: ->read_iter() and ->write_iter() · 293bc982
      Al Viro authored
      Beginning to introduce those.  Just the callers for now, and it's
      clumsier than it'll eventually become; once we finish converting
      aio_read and aio_write instances, the things will get nicer.
      For now, these guys are in parallel to ->aio_read() and ->aio_write();
      they take iocb and iov_iter, with everything in iov_iter already
      validated.  File offset is passed in iocb->ki_pos, iov/nr_segs -
      in iov_iter.
      Main concerns in that series are stack footprint and ability to
      split the damn thing cleanly.
      [fix from Peter Ujfalusi <peter.ujfalusi@ti.com> folded]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • replace checking for ->read/->aio_read presence with check in ->f_mode · 7f7f25e8
      Al Viro authored
      Since we are about to introduce new methods (read_iter/write_iter), the
      tests in a bunch of places would have to grow inconveniently.  Check
      once (at open() time) and store results in ->f_mode as FMODE_CAN_READ
      and FMODE_CAN_WRITE resp.  It might end up being a temporary measure -
      once everything switches from ->aio_{read,write} to ->{read,write}_iter
      it might make sense to return to open-coded checks.  We'll see...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
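The "check once at open() time" idea from the commit above can be sketched roughly as follows. This is a userspace toy, not the kernel code: the `toy_*` types, the `TOY_FMODE_*` values, and the simplified method signatures are all assumptions made for illustration. Only the names FMODE_CAN_READ and FMODE_CAN_WRITE and the method names come from the commit messages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel's real FMODE_* bits differ. */
#define TOY_FMODE_READ      0x1u
#define TOY_FMODE_WRITE     0x2u
#define TOY_FMODE_CAN_READ  0x4u
#define TOY_FMODE_CAN_WRITE 0x8u

/* Simplified method table: real signatures are elided on purpose. */
struct toy_file_operations {
    long (*read)(void);
    long (*aio_read)(void);
    long (*read_iter)(void);
    long (*write)(void);
    long (*aio_write)(void);
    long (*write_iter)(void);
};

struct toy_file {
    const struct toy_file_operations *f_op;
    unsigned int f_mode;
};

/* Placeholder method used only to populate a table in examples. */
long toy_dummy_method(void) { return 0; }

/* Inspect the method table once, at open time, and cache the answer
 * in f_mode so later callers do not repeat the pointer checks. */
void toy_open(struct toy_file *f, const struct toy_file_operations *fop,
              unsigned int mode)
{
    assert(fop != NULL);
    f->f_op = fop;
    f->f_mode = mode & (TOY_FMODE_READ | TOY_FMODE_WRITE);

    if ((f->f_mode & TOY_FMODE_READ) &&
        (fop->read || fop->aio_read || fop->read_iter))
        f->f_mode |= TOY_FMODE_CAN_READ;
    if ((f->f_mode & TOY_FMODE_WRITE) &&
        (fop->write || fop->aio_write || fop->write_iter))
        f->f_mode |= TOY_FMODE_CAN_WRITE;
}

/* Callers test one cached bit instead of several method pointers. */
int toy_can_read(const struct toy_file *f)
{
    return (f->f_mode & TOY_FMODE_CAN_READ) != 0;
}

int toy_can_write(const struct toy_file *f)
{
    return (f->f_mode & TOY_FMODE_CAN_WRITE) != 0;
}
```

With this shape, adding a new method such as read_iter only touches the open-time check; a file whose table provides only `read_iter` still passes `toy_can_read()` but fails `toy_can_write()`.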
  19. 02 Apr, 2014 4 commits
  20. 31 Mar, 2014 1 commit
  21. 10 Mar, 2014 1 commit
    • vfs: atomic f_pos accesses as per POSIX · 9c225f26
      Linus Torvalds authored
      Our write() system call has always been atomic in the sense that you get
      the expected thread-safe contiguous write, but we haven't actually
      guaranteed that concurrent writes are serialized wrt f_pos accesses, so
      threads (or processes) that share a file descriptor and use "write()"
      concurrently would quite likely overwrite each other's data.
      This violates POSIX.1-2008/SUSv4 Section XSI 2.9.7 that says:
       "2.9.7 Thread Interactions with Regular File Operations
        All of the following functions shall be atomic with respect to each
        other in the effects specified in POSIX.1-2008 when they operate on
        regular files or symbolic links: [...]"
      and one of the effects is the file position update.
      This unprotected file position behavior is not new behavior, and nobody
      has ever cared.  Until now.  Yongzhi Pan reported unexpected behavior to
      Michael Kerrisk that was due to this.
      This resolves the issue with a f_pos-specific lock that is taken by
      read/write/lseek on file descriptors that may be shared across threads
      or processes.
      Reported-by: Yongzhi Pan <panyongzhi@gmail.com>
      Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
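The effect of a position-specific lock, as described in the commit above, can be illustrated with a userspace toy using pthreads. This is a sketch under stated assumptions, not the kernel mechanism: the `toy_*` names are hypothetical, no data is actually written, and the lock is taken unconditionally here, whereas the commit message says the kernel takes it only for descriptors that may be shared.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical open-file object: a position guarded by its own lock. */
struct toy_file {
    pthread_mutex_t pos_lock;
    long pos;
};

/* "Write" len bytes: claim the range [start, start + len) atomically
 * and return start. Without the lock, two threads could read the same
 * pos and claim overlapping ranges. */
long toy_write(struct toy_file *f, long len)
{
    long start;

    assert(len >= 0);
    pthread_mutex_lock(&f->pos_lock);
    start = f->pos;
    f->pos = start + len;
    pthread_mutex_unlock(&f->pos_lock);
    return start;
}

struct toy_job { struct toy_file *file; int writes; long len; };

void *toy_writer(void *arg)
{
    struct toy_job *job = arg;

    for (int i = 0; i < job->writes; i++)
        toy_write(job->file, job->len);
    return NULL;
}

/* Run up to 16 threads sharing one toy_file; return the final position. */
long toy_run_writers(int nthreads, int writes, long len)
{
    struct toy_file f = { PTHREAD_MUTEX_INITIALIZER, 0 };
    struct toy_job job = { &f, writes, len };
    pthread_t tids[16];
    int started = 0;

    if (nthreads > 16)
        nthreads = 16;
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, toy_writer, &job) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return f.pos;
}
```

Because every position update happens under `pos_lock`, the final position equals the total number of bytes claimed by all writers, which is the serialization property the commit restores for shared descriptors.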
  22. 09 Nov, 2013 1 commit
  23. 25 Oct, 2013 1 commit
  24. 20 Oct, 2013 1 commit
    • nfsd regression since delayed fput() · c7314d74
      Al Viro authored
      Background: nfsd v[23] had throughput regression since delayed fput
      went in; every read or write ends up doing fput() and we get a pair
      of extra context switches out of that (plus quite a bit of work
      in queue_work itself, apparently).  Use of schedule_delayed_work()
      gives it a chance to accumulate a bit before we do __fput() on all
      of them.  I'm not too happy about that solution, but... on at least
      one real-world setup it reverts about 10% throughput loss we got from
      switch to delayed fput.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  25. 11 Sep, 2013 1 commit
  26. 04 Sep, 2013 1 commit
  27. 13 Jul, 2013 2 commits