  1. 26 Jul, 2011 2 commits
  2. 28 Jun, 2011 2 commits
    • tmpfs: add shmem_read_mapping_page_gfp · d9d90e5e
      Hugh Dickins authored
      Although it is used (by i915) on nothing but tmpfs, read_cache_page_gfp()
      is unsuited to tmpfs, because it inserts a page into pagecache before
      calling the filesystem's ->readpage: tmpfs may have pages in swapcache
      which only it knows how to locate and switch to filecache.
      
      At present tmpfs provides a ->readpage method, and copes with this by
      copying pages; but soon we can simplify it by removing its ->readpage.
      Provide shmem_read_mapping_page_gfp() now, ready for that transition.
      
      Export shmem_read_mapping_page_gfp() and add it to list in shmem_fs.h,
      with shmem_read_mapping_page() inline for the common mapping_gfp case.
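
The relationship between the two helpers can be sketched in userspace roughly as follows. Only the two function names and the idea of an inline wrapper that passes the mapping's own gfp mask come from the message above; the struct layouts, the typedefs and the stub body are stand-ins invented for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types: hypothetical userspace mocks, not the kernel definitions. */
typedef unsigned int gfp_t;
typedef unsigned long pgoff_t;

struct address_space {
	gfp_t gfp_mask;               /* default allocation flags for this mapping */
};

struct page {
	pgoff_t index;
	gfp_t gfp_used;               /* recorded so the example can be inspected */
};

static gfp_t mapping_gfp_mask(const struct address_space *mapping)
{
	return mapping->gfp_mask;
}

/* Stub for the exported function; the real one locates the page in tmpfs. */
static struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
						pgoff_t index, gfp_t gfp)
{
	static struct page fake_page; /* one static page keeps the mock simple */

	assert(mapping != NULL);
	fake_page.index = index;
	fake_page.gfp_used = gfp;
	return &fake_page;
}

/* Common case: the caller is happy with the mapping's own gfp mask. */
static inline struct page *shmem_read_mapping_page(struct address_space *mapping,
						   pgoff_t index)
{
	return shmem_read_mapping_page_gfp(mapping, index,
					   mapping_gfp_mask(mapping));
}
```

A caller that needs special allocation flags would use the _gfp variant directly; everyone else gets the mapping's default through the inline.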
      
      (shmem_read_mapping_page_gfp or shmem_read_cache_page_gfp? Generally the
      read_mapping_page functions use the mapping's ->readpage, and the
      read_cache_page functions use the supplied filler, so I think
      read_cache_page_gfp was slightly misnamed.)
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: take control of its truncate_range · 94c1e62d
      Hugh Dickins authored
      2.6.35's new truncate convention gave tmpfs the opportunity to control
      its file truncation, no longer enforced from outside by vmtruncate().
      We shall want to build upon that, to handle pagecache and swap together.
      
      Slightly redefine the ->truncate_range interface: let it now be called
      between the unmap_mapping_range()s, with the filesystem responsible for
      doing the truncate_inode_pages_range() from it - just as the filesystem
      is nowadays responsible for doing that from its ->setattr.
      
      Let's rename shmem_notify_change() to shmem_setattr().  Instead of
      calling the generic truncate_setsize(), bring that code in so we can
      call shmem_truncate_range() - which will later be updated to perform its
      own variant of truncate_inode_pages_range().
      
      Remove the punch_hole unmap_mapping_range() from shmem_truncate_range():
      now that the COW's unmap_mapping_range() comes after ->truncate_range,
      there is no need to call it a third time.
      
      Export shmem_truncate_range() and add it to the list in shmem_fs.h, so
      that i915_gem_object_truncate() can call it explicitly in future; get
      this patch in first, then update drm/i915 once this is available (until
      then, i915 will just be doing the truncate_inode_pages() twice).
      
      Though introduced five years ago, no other filesystem is implementing
      ->truncate_range, and its only other user is madvise(,,MADV_REMOVE): we
      expect to convert it to fallocate(,FALLOC_FL_PUNCH_HOLE,,) shortly,
      whereupon ->truncate_range can be removed from inode_operations -
      shmem_truncate_range() will help i915 across that transition too.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 28 May, 2011 1 commit
    • tmpfs: fix race between truncate and writepage · 826267cf
      Hugh Dickins authored
      While running fsx on tmpfs with a memhog then swapoff, swapoff was hanging
      (interruptibly), repeatedly failing to locate the owner of a 0xff entry in
      the swap_map.
      
      Although shmem_writepage() does abandon when it sees incoming page index
      is beyond eof, there was still a window in which shmem_truncate_range()
      could come in between writepage's dropping lock and updating swap_map,
      find the half-completed swap_map entry, and in trying to free it,
      leave it in a state that swap_shmem_alloc() could not correct.
      
      Arguably a bug in __swap_duplicate()'s and swap_entry_free()'s handling
      of the different cases, but easiest to fix by moving swap_shmem_alloc()
      under cover of the lock.
      
      More interesting than the bug: it's been there since 2.6.33, why could
      I not see it with earlier kernels?  The mmotm of two weeks ago seems to
      have some magic for generating races, this is just one of three I found.
      
      With yesterday's git I first saw this in mainline, bisected in search of
      that magic, but the easy reproducibility evaporated.  Oh well, fix the bug.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 27 May, 2011 1 commit
    • memcg: add the pagefault count into memcg stats · 456f998e
      Ying Han authored
      Add two new stats to the per-memcg memory.stat file, tracking the number
      of page faults and the number of major page faults:
      
        "pgfault"
        "pgmajfault"
      
      They differ from the "pgpgin"/"pgpgout" stats, which count the number of
      pages charged to/uncharged from the cgroup and say nothing about pages
      being read from or written to disk.
      
      Tracking these two stats is valuable for measuring both an application's
      performance and the efficiency of the kernel page reclaim path.
      Counting pagefaults per process is useful, but we also need the aggregated
      value since processes are monitored and controlled on a per-cgroup basis
      in memcg.
      
      Functional test: check the total number of pgfault/pgmajfault of all
      memcgs and compare with global vmstat value:
      
        $ cat /proc/vmstat | grep fault
        pgfault 1070751
        pgmajfault 553
      
        $ cat /dev/cgroup/memory.stat | grep fault
        pgfault 1071138
        pgmajfault 553
        total_pgfault 1071142
        total_pgmajfault 553
      
        $ cat /dev/cgroup/A/memory.stat | grep fault
        pgfault 199
        pgmajfault 0
        total_pgfault 199
        total_pgmajfault 0
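
The comparison above boils down to pulling named counters out of memory.stat-style text. A minimal sketch of that lookup follows; the helper name stat_lookup is hypothetical, and it works on an in-memory string rather than opening the cgroup files shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find "<key> <value>" on its own line in a
 * memory.stat-style buffer. Returns 0 and stores the value on success,
 * -1 if the key is absent.
 */
static int stat_lookup(const char *buf, const char *key,
		       unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = buf;

	assert(buf && key && out);
	while (*line) {
		const char *eol = strchr(line, '\n');

		/* Match the whole key at line start, so that "pgfault"
		 * does not hit "total_pgfault" or "pgmajfault". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ' &&
		    sscanf(line + klen + 1, "%llu", out) == 1)
			return 0;
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}
```

With the root cgroup output above in a buffer, looking up "pgfault" and "total_pgfault" gives the two values that are compared against the global vmstat counter.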
      
      Performance test: run the page fault test (pft) with 16 threads, faulting
      in 15G of anon pages in a 16G container.  No regression was noticed in
      "flt/cpu/s".
      
      Sample output from pft:
      
        TAG pft:anon-sys-default:
          Gb  Thr CLine   User     System     Wall    flt/cpu/s fault/wsec
          15   16   1     0.67s   233.41s    14.76s   16798.546 266356.260
      
        +-------------------------------------------------------------------------+
            N           Min           Max        Median           Avg        Stddev
        x  10     16682.962     17344.027     16913.524     16928.812      166.5362
        +  10     16695.568     16923.896     16820.604     16824.652     84.816568
        No difference proven at 95.0% confidence
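
The "no difference" verdict can be checked by hand from the summary rows. The sketch below assumes the comparison is a two-sample Student's t-test with pooled variance (the message does not name the tool or the test used), and the function name means_differ is made up for the example. With N = 10 per run there are 18 degrees of freedom, for which the two-sided 95% critical value is 2.101; the rows above give a t statistic of roughly 1.76, which is below it.

```c
#include <assert.h>

/*
 * Two-sample Student's t-test with pooled variance (an assumption about
 * how the table was evaluated). Returns 1 if |t| exceeds t_crit, i.e.
 * the means differ at the chosen confidence level, else 0.
 */
static int means_differ(double mean_a, double sd_a, int n_a,
			double mean_b, double sd_b, int n_b, double t_crit)
{
	double df, pooled_var, se_sq, diff;

	assert(n_a > 1 && n_b > 1);
	df = n_a + n_b - 2;
	pooled_var = ((n_a - 1) * sd_a * sd_a +
		      (n_b - 1) * sd_b * sd_b) / df;
	se_sq = pooled_var * (1.0 / n_a + 1.0 / n_b);
	diff = mean_a - mean_b;

	/* Compare squared quantities so no sqrt() call is needed. */
	return diff * diff > t_crit * t_crit * se_sq;
}
```

Feeding in the Avg and Stddev columns of the two rows with t_crit = 2.101 returns 0, matching the verdict printed under the table.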
      
      [akpm@linux-foundation.org: fix build]
      [hughd@google.com: shmem fix]
      Signed-off-by: Ying Han <yinghan@google.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 25 May, 2011 1 commit
    • tmpfs: implement generic xattr support · b09e0fa4
      Eric Paris authored
      Implement generic xattrs for tmpfs filesystems.  The Fedora project, while
      trying to replace suid apps with file capabilities, realized that tmpfs,
      which is used on the build systems, does not support file capabilities and
      thus cannot be used to build packages which use file capabilities.  Xattrs
      are also needed for overlayfs.
      
      The xattr interface is a bit odd.  If a filesystem does not implement any
      {get,set,list}xattr functions the VFS will call into some random LSM hooks
      and the running LSM can then implement some method for handling xattrs.
      SELinux for example provides a method to support security.selinux but no
      other security.* xattrs.
      
      As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have
      xattr handler routines specifically to handle acls.  Because of this tmpfs
      would lose the VFS/LSM helpers to support the running LSM.  To make up
      for that tmpfs had stub functions that did nothing but call into the LSM
      hooks which implement the helpers.
      
      This new patch does not use the LSM fallback functions and instead just
      implements a native get/set/list xattr feature for the full security.* and
      trusted.* namespace like a normal filesystem.  This means that tmpfs can
      now support both security.selinux and security.capability, which was not
      previously possible.
      
      The basic implementation is that I attach a:
      
      struct shmem_xattr {
      	struct list_head list; /* anchored by shmem_inode_info->xattr_list */
      	char *name;
      	size_t size;
      	char value[0];
      };
      
      Into the struct shmem_inode_info for each xattr that is set.  This
      implementation could easily support the user.* namespace as well, except
      some care needs to be taken to prevent large amounts of unswappable memory
      being allocated for unprivileged users.
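
As a rough userspace illustration of the per-inode list described above: the sketch swaps the kernel's struct list_head for a plain singly linked list, and the names struct xattr, struct inode_info, xattr_set and xattr_get are stand-ins invented here, not the functions added by the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for struct shmem_xattr: a singly linked list
 * replaces the kernel's struct list_head. */
struct xattr {
	struct xattr *next;
	char *name;
	size_t size;
	char value[];                 /* C99 flexible array member */
};

struct inode_info {
	struct xattr *xattr_list;     /* head of this inode's xattr list */
};

/* Set (or replace) an attribute. Returns 0 on success, -1 on
 * allocation failure. */
static int xattr_set(struct inode_info *info, const char *name,
		     const void *value, size_t size)
{
	struct xattr **pp, *new_x;

	assert(info && name && (value || size == 0));
	new_x = malloc(sizeof(*new_x) + size);
	if (!new_x)
		return -1;
	new_x->name = malloc(strlen(name) + 1);
	if (!new_x->name) {
		free(new_x);
		return -1;
	}
	strcpy(new_x->name, name);
	new_x->size = size;
	if (size)
		memcpy(new_x->value, value, size);

	/* Replace an existing entry with the same name, else append. */
	for (pp = &info->xattr_list; *pp; pp = &(*pp)->next) {
		if (strcmp((*pp)->name, name) == 0) {
			struct xattr *old = *pp;

			new_x->next = old->next;
			*pp = new_x;
			free(old->name);
			free(old);
			return 0;
		}
	}
	new_x->next = NULL;
	*pp = new_x;
	return 0;
}

/* Look an attribute up by name; returns NULL if it is not set. */
static const struct xattr *xattr_get(const struct inode_info *info,
				     const char *name)
{
	const struct xattr *x;

	for (x = info->xattr_list; x; x = x->next)
		if (strcmp(x->name, name) == 0)
			return x;
	return NULL;
}
```

Because every value lives in ordinary heap memory hanging off the inode, the concern about unswappable allocations for unprivileged users follows directly from this layout.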
      
      [mszeredi@suse.cz: new config option, support trusted.*, support symlinks]
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
      Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Hugh Dickins <hughd@google.com>
      Tested-by: Jordi Pujol <jordipujolp@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 20 May, 2011 1 commit
  7. 14 May, 2011 1 commit
    • tmpfs: fix race between swapoff and writepage · 05bf86b4
      Hugh Dickins authored
      Shame on me!  Commit b1dea800 "tmpfs: fix race between umount and
      writepage" fixed the advertized race, but introduced another: as even
      its comment makes clear, we cannot safely rely on a peek at list_empty()
      while holding no lock - until info->swapped is set, shmem_unuse_inode()
      may delete any formerly-swapped inode from the shmem_swaplist, which
      in this case would leave a swap area impossible to swapoff.
      
      Although I don't relish taking the mutex every time, I don't care much
      for the alternatives either; and at least the peek at list_empty() in
      shmem_evict_inode() (a hotter path since most inodes would never have
      been swapped) remains safe, because we already truncated the whole file.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 12 May, 2011 3 commits
    • tmpfs: fix spurious ENOSPC when racing with unswap · 59a16ead
      Hugh Dickins authored
      Testing the shmem_swaplist replacements for igrab() revealed another bug:
      writes to /dev/loop0 on a tmpfs file which fills its filesystem were
      sometimes failing with "Buffer I/O error"s.
      
      These came from ENOSPC failures of shmem_getpage(), when racing with
      swapoff: the same could happen when racing with another shmem_getpage(),
      pulling the page in from swap in between our find_lock_page() and our
      taking the info->lock (though not in the single-threaded loop case).
      
      This is unacceptable, and surprising that I've not noticed it before:
      it dates back many years, but (presumably) was made a lot easier to
      reproduce in 2.6.36, which sited a page preallocation in the race window.
      
      Fix it by rechecking the page cache before settling on an ENOSPC error.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: fix race between umount and swapoff · 778dd893
      Hugh Dickins authored
      The use of igrab() in swapoff's shmem_unuse_inode() is just as vulnerable
      to umount as that in shmem_writepage().
      
      Fix this instance by extending the protection of shmem_swaplist_mutex
      right across shmem_unuse_inode(): while it's on the list, the inode cannot
      be evicted (and the filesystem cannot be unmounted) without
      shmem_evict_inode() taking that mutex to remove it from the list.
      
      But since shmem_writepage() might take that mutex, we should avoid making
      memory allocations or memcg charges while holding it: prepare them at the
      outer level in shmem_unuse().  When mem_cgroup_cache_charge() was
      originally placed, we didn't know until that point that the page from swap
      was actually a shmem page; but nowadays it's noted in the swap_map, so
      we're safe to charge upfront.  For the radix_tree, do as is done in
      shmem_getpage(): preload upfront, but don't pin to the cpu; so we make a
      habit of refreshing the node pool, but might dip into GFP_NOWAIT reserves
      on occasion if subsequently preempted.
      
      With the allocation and charge moved out from shmem_unuse_inode(),
      we can also hold index map and info->lock over from finding the entry.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: fix race between umount and writepage · b1dea800
      Hugh Dickins authored
      Konstantin Khlebnikov reports that a dangerous race between umount and
      shmem_writepage can be reproduced by this script:
      
        for i in {1..300} ; do
      	mkdir $i
      	while true ; do
      		mount -t tmpfs none $i
      		dd if=/dev/zero of=$i/test bs=1M count=$(($RANDOM % 100))
      		umount $i
      	done &
        done
      
      on a 6xCPU node with 8Gb RAM: kernel very unstable after this accident. =)
      
      Kernel log:
      
        VFS: Busy inodes after unmount of tmpfs.
                       Self-destruct in 5 seconds.  Have a nice day...
      
        WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
        list_del corruption. prev->next should be ffff880222fdaac8, but was (null)
        Pid: 11222, comm: mount.tmpfs Not tainted 2.6.39-rc2+ #4
        Call Trace:
         warn_slowpath_common+0x80/0x98
         warn_slowpath_fmt+0x41/0x43
         __list_del_entry+0x8d/0x98
         evict+0x50/0x113
         iput+0x138/0x141
        ...
        BUG: unable to handle kernel paging request at ffffffffffffffff
        IP: shmem_free_blocks+0x18/0x4c
        Pid: 10422, comm: dd Tainted: G        W   2.6.39-rc2+ #4
        Call Trace:
         shmem_recalc_inode+0x61/0x66
         shmem_writepage+0xba/0x1dc
         pageout+0x13c/0x24c
         shrink_page_list+0x28e/0x4be
         shrink_inactive_list+0x21f/0x382
        ...
      
      shmem_writepage() calls igrab() on the inode for the page which came from
      page reclaim, to add it later into shmem_swaplist for swapoff operation.
      
      This igrab() can race with super-block deactivating process:
      
        shrink_inactive_list()          deactivate_super()
        pageout()                       tmpfs_fs_type->kill_sb()
        shmem_writepage()               kill_litter_super()
                                        generic_shutdown_super()
                                         evict_inodes()
         igrab()
                                          atomic_read(&inode->i_count)
                                           skip-inode
         iput()
                                         if (!list_empty(&sb->s_inodes))
                                                printk("VFS: Busy inodes after...
      
      This igrab-iput pair was added in commit 1b1b32f2 "tmpfs: fix
      shmem_swaplist races" based on incorrect assumptions: igrab() protects the
      inode from concurrent eviction by deletion, but it does nothing to protect
      it from concurrent unmounting, which goes ahead despite the raised
      i_count.
      
      So this use of igrab() was wrong all along, but the race was made much worse
      in 2.6.37 when commit 63997e98 "split invalidate_inodes()" replaced
      two attempts at invalidate_inodes() by a single evict_inodes().
      
      Konstantin posted a plausible patch, raising sb->s_active too: I'm unsure
      whether it was correct or not; but burnt once by igrab(), I am sure that
      we don't want to rely more deeply upon externals here.
      
      Fix it by adding the inode to shmem_swaplist earlier, while the page lock
      on page in page cache still secures the inode against eviction, without
      artificially raising i_count.  It was originally added later because
      shmem_unuse_inode() is liable to remove an inode from the list while it's
      unswapped; but we can guard against that by taking spinlock before
      dropping mutex.
      Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Tested-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 14 Apr, 2011 1 commit
  10. 23 Mar, 2011 2 commits
  11. 14 Mar, 2011 1 commit
  12. 10 Mar, 2011 1 commit
  13. 01 Mar, 2011 1 commit
  14. 01 Feb, 2011 1 commit
    • fs/vfs/security: pass last path component to LSM on inode creation · 2a7dba39
      Eric Paris authored
      SELinux would like to implement a new labeling behavior of newly created
      inodes.  We currently label new inodes based on the parent and the creating
      process.  This new behavior would also take into account the name of the
      new object when deciding the new label.  This is not the (supposed) full path,
      just the last component of the path.
      
      This is very useful because creating /etc/shadow is different than creating
      /etc/passwd but the kernel hooks are unable to differentiate these
      operations.  We currently require that userspace realize it is doing some
      difficult operation like that, and then userspace jumps through SELinux hoops
      to get things set up correctly.  This patch does not implement the new
      behavior, that is obviously contained in a separate SELinux patch, but it
      does pass the needed name down to the correct LSM hook.  If no such name
      exists it is fine to pass NULL.
      Signed-off-by: Eric Paris <eparis@redhat.com>
  15. 07 Jan, 2011 1 commit
    • fs: icache RCU free inodes · fa0d7e3d
      Nick Piggin authored
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downside of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (ie. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
  16. 29 Oct, 2010 1 commit
  17. 26 Oct, 2010 3 commits
  18. 18 Aug, 2010 1 commit
  19. 10 Aug, 2010 2 commits
    • shmem: reduce pagefault lock contention · ff36b801
      Shaohua Li authored
      I'm running a shmem pagefault test case (see attached file) on a 64 CPU
      system.  The profile shows that shmem_inode_info->lock is heavily contended
      and 100% of CPU time is spent trying to get the lock.  In the pagefault (no
      swap) case, shmem_getpage takes the lock twice; the second acquisition is
      avoidable if we preallocate a page, saving one round of locking.  This is
      what the patch below does.
      
      The result of the test case:
      2.6.35-rc3: ~20s
      2.6.35-rc3 + patch: ~12s
      so this is a 40% improvement.
      
      One might ask whether we could have better locking for shmem.  But even if
      shmem were lockless, the pagefault path would soon find the pagecache lock
      heavily contended, because shmem must add the new page to pagecache.  So
      until we have better locking for pagecache, improving shmem locking will
      not bring much improvement.  I ran a similar pagefault test against a
      ramfs file; the test result is ~10.5s.
      
      [akpm@linux-foundation.org: fix comment, clean up code layout, eliminate code duplication]
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Zhang, Yanmin" <yanmin.zhang@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: make tmpfs scalable with percpu_counter for used blocks · 7e496299
      Tim Chen authored
      The current implementation of tmpfs is not scalable.  We found that
      stat_lock is contended by multiple threads when we need to get a new page,
      leading to useless spinning inside this spin lock.
      
      This patch makes use of the percpu_counter library to maintain a local
      count of used blocks, to speed up getting and returning of pages.  So the
      acquisition of stat_lock is unnecessary for getting and returning blocks,
      improving the performance of tmpfs on systems with a large number of cpus.
      On a 4 socket 32 core NHM-EX system, we saw an improvement of 270%.
      
      The implementation below has a slight chance of race between threads
      causing a slight overshoot of the maximum configured blocks.  However, any
      overshoot is small, and is bounded by the number of cpus.  This happens
      when the number of used blocks is slightly below the maximum configured
      blocks when a thread checks the used block count, and another thread
      allocates the last block before the current thread does.  This should not
      be a problem for tmpfs, as the overshoot is most likely to be a few blocks
      and bounded.  If a strict limit is really desired, then configure the max
      blocks to be the limit less the number of cpus in the system.
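
The overshoot described above comes from the check and the increment not being one atomic step. The sketch below is not the percpu_counter API; it is a deterministic, single-threaded simulation of the worst interleaving, with a hypothetical NR_CPUS of 4 and made-up names (struct blocks, may_alloc, account_block, worst_case_overshoot).

```c
#include <assert.h>

#define NR_CPUS 4   /* hypothetical CPU count for the simulation */

struct blocks {
	long used;
	long max;
};

/* Step 1 of an allocation: unlocked check against the limit. */
static int may_alloc(const struct blocks *b)
{
	return b->used < b->max;
}

/* Step 2: account the block. Nothing ties this to the earlier check. */
static void account_block(struct blocks *b)
{
	b->used++;
}

/*
 * Worst-case interleaving: every CPU runs step 1 before any CPU runs
 * step 2. Returns by how many blocks the limit was exceeded.
 */
static long worst_case_overshoot(struct blocks *b)
{
	int passed[NR_CPUS];
	int cpu;

	assert(b->max >= 0 && b->used >= 0);
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		passed[cpu] = may_alloc(b);
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (passed[cpu])
			account_block(b);
	return b->used > b->max ? b->used - b->max : 0;
}
```

Starting one block below the limit, all simulated CPUs pass the check and the count ends NR_CPUS - 1 above the limit, which stays within the bound stated above. Configuring max as the desired limit minus NR_CPUS keeps the final count at or below the desired limit even in this worst case.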
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 09 Aug, 2010 4 commits
  21. 04 Jun, 2010 1 commit
    • fix truncate inode time modification breakage · af5a30d8
      Nick Piggin authored
      mtime and ctime should be changed only if the file size has actually
      changed. Patches changing ext2 and tmpfs from vmtruncate to the new
      truncate sequence have caused regressions where they always update
      timestamps.
      
      There are some strange cases in POSIX where truncate(2) must not update
      times unless the size has actually changed, see 6e656be8.
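
The rule in the first sentence can be pictured with a tiny userspace model. struct mini_inode and mini_setsize are hypothetical names for this sketch and do not correspond to the actual ext2 or tmpfs code touched by the fix.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical minimal inode: only the fields this example needs. */
struct mini_inode {
	long long size;
	time_t mtime;
	time_t ctime;
};

/*
 * Set a new size; touch mtime/ctime only when the size really changes.
 * Returns 1 if the timestamps were updated, 0 otherwise.
 */
static int mini_setsize(struct mini_inode *inode, long long newsize,
			time_t now)
{
	assert(inode && newsize >= 0);
	if (newsize == inode->size)
		return 0;             /* same size: leave timestamps alone */
	inode->size = newsize;
	inode->mtime = inode->ctime = now;
	return 1;
}
```

The regression described above corresponds to dropping the early return, so that every truncate call stamps the inode whether or not the size moved.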
      
      This area is all still rather buggy in different ways in a lot of
      filesystems and needs a cleanup and audit (ideally the vfs will provide
      a simple attribute or call to direct all filesystems exactly which
      attributes to change). But coming up with the best solution will take a
      while and is not appropriate for rc anyway.
      
      So fix the recent regression for now.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  22. 28 May, 2010 2 commits
  23. 27 May, 2010 1 commit
    • memcg: move charge of file pages · 87946a72
      Daisuke Nishimura authored
      This patch adds support for moving the charge of file pages, which include
      normal files, tmpfs files and swap entries of tmpfs files.  It's enabled by
      setting bit 1 of <target cgroup>/memory.move_charge_at_immigrate.
      
      Unlike the case of anonymous pages, file pages (and swap entries) in the
      range mmapped by the task will be moved even if the task hasn't faulted
      them in, i.e. they might not be the task's "RSS", but the "RSS" of another
      task that maps the same file.  The mapcount of the page is also ignored
      (the page can be moved even if page_mapcount(page) > 1).  So the conditions
      a page or swap entry must meet to be moved are that it is in the range
      mmapped by the target task and that it is charged to the old cgroup.
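
The two conditions, plus the enabling bit, can be written down as a small predicate. This is a userspace sketch only: the struct layouts, the integer cgroup ids and the names MOVE_FILE and should_move_file_page are assumptions made for illustration.

```c
#include <assert.h>

#define MOVE_FILE (1u << 1)  /* bit 1 of memory.move_charge_at_immigrate */

struct range {
	unsigned long start, end;     /* [start, end) mmapped by the task */
};

struct file_page {
	unsigned long addr;           /* where the page sits in the task's mapping */
	int charged_to;               /* id of the cgroup currently charged */
	int mapcount;                 /* deliberately ignored below */
};

/* Returns 1 if the page's charge should follow the task, else 0. */
static int should_move_file_page(unsigned int move_flags,
				 const struct file_page *pg,
				 const struct range *mmapped,
				 int old_cgroup)
{
	assert(pg && mmapped && mmapped->start <= mmapped->end);
	if (!(move_flags & MOVE_FILE))
		return 0;             /* feature not enabled */
	if (pg->addr < mmapped->start || pg->addr >= mmapped->end)
		return 0;             /* outside the task's mmapped range */
	return pg->charged_to == old_cgroup;
}
```

Note that mapcount never enters the decision, which mirrors the statement that a page mapped by several tasks can still be moved.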
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 25 May, 2010 1 commit
  25. 21 May, 2010 2 commits
  26. 17 Dec, 2009 1 commit
  27. 16 Dec, 2009 1 commit