1. 22 Mar, 2012 1 commit
  2. 21 Mar, 2012 2 commits
  3. 20 Mar, 2012 1 commit
  4. 13 Feb, 2012 1 commit
  5. 23 Jan, 2012 2 commits
    • SHM_UNLOCK: fix Unevictable pages stranded after swap · 24513264
      Hugh Dickins authored
      Commit cc39c6a9 ("mm: account skipped entries to avoid looping in
      find_get_pages") correctly fixed an infinite loop; but left a problem
      that find_get_pages() on shmem would return 0 (appearing to callers to
      mean end of tree) when it meets a run of nr_pages swap entries.
      
      The only uses of find_get_pages() on shmem are via pagevec_lookup(),
      called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's
      scan_mapping_unevictable_pages().  The first is already commented, and
      not worth worrying about; but the second can leave pages on the
      Unevictable list after an unusual sequence of swapping and locking.
      
      Fix that by using shmem_find_get_pages_and_swap() (then ignoring the
      swap) instead of pagevec_lookup().
      
      But I don't want to contaminate vmscan.c with shmem internals, nor
      shmem.c with LRU locking.  So move scan_mapping_unevictable_pages() into
      shmem.c, renaming it shmem_unlock_mapping(); and rename
      check_move_unevictable_page() to check_move_unevictable_pages(), looping
      down an array of pages, oftentimes under the same lock.
      
      Leave out the "rotate unevictable list" block: that's a leftover from
      when this was used for /proc/sys/vm/scan_unevictable_pages, whose flawed
      handling involved looking at pages at tail of LRU.
      
      Was there significance to the sequence first ClearPageUnevictable, then
      test page_evictable, then SetPageUnevictable here? I think not: we're
      under the LRU lock, and have no barriers between those.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: <stable@vger.kernel.org> [back to 3.1 but will need respins]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
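
      The reworked helper described above takes roughly the following shape; this
      is a hedged sketch, with shmem_find_get_pages_and_swap() and
      shmem_deswap_pagevec() assumed to be the series' internal helper names:

        void shmem_unlock_mapping(struct address_space *mapping)
        {
                struct pagevec pvec;
                pgoff_t indices[PAGEVEC_SIZE];
                pgoff_t index = 0;

                pagevec_init(&pvec, 0);
                while (!mapping_unevictable(mapping)) {
                        /* gang lookup returns pages and swap entries together */
                        pvec.nr = shmem_find_get_pages_and_swap(mapping, index,
                                        PAGEVEC_SIZE, pvec.pages, indices);
                        if (!pvec.nr)
                                break;
                        index = indices[pvec.nr - 1] + 1;
                        /* drop the swap entries, keeping only the pages ... */
                        shmem_deswap_pagevec(&pvec);
                        /* ... then fix up the whole batch, often under one lru_lock */
                        check_move_unevictable_pages(pvec.pages, pvec.nr);
                        pagevec_release(&pvec);
                        cond_resched();
                }
        }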
    • SHM_UNLOCK: fix long unpreemptible section · 85046579
      Hugh Dickins authored
      scan_mapping_unevictable_pages() is used to make SysV SHM_LOCKed pages
      evictable again once the shared memory is unlocked.  It does this with
      pagevec_lookup()s across the whole object (which might occupy most of
      memory), and takes 300ms to unlock 7GB here.  A cond_resched() every
      PAGEVEC_SIZE pages would be good.
      
      However, KOSAKI-san points out that this is called under shmem.c's
      info->lock, and it's also under shm.c's shm_lock(), both spinlocks.
      There is no strong reason for that: we need to take these pages off the
      unevictable list soonish, but those locks are not required for it.
      
      So move the call to scan_mapping_unevictable_pages() from shmem.c's
      unlock handling up to shm.c's unlock handling.  Remove the recently
      added barrier, not needed now we have spin_unlock() before the scan.
      
      Use get_file(), with subsequent fput(), to make sure we have a reference
      to mapping throughout scan_mapping_unevictable_pages(): that's something
      that was previously guaranteed by the shm_lock().
      
      Remove shmctl's lru_add_drain_all(): we don't fault in pages at SHM_LOCK
      time, and we lazily discover them to be Unevictable later, so it serves
      no purpose for SHM_LOCK; and serves no purpose for SHM_UNLOCK, since
      pages still on pagevec are not marked Unevictable.
      
      The original code avoided redundant rescans by checking VM_LOCKED flag
      at its level: now avoid them by checking shp's SHM_LOCKED.
      
      The original code called scan_mapping_unevictable_pages() on a locked
      area at shm_destroy() time: perhaps we once had accounting cross-checks
      which required that, but not now, so skip the overhead and just let
      inode eviction deal with them.
      
      Put check_move_unevictable_page() and scan_mapping_unevictable_pages()
      under CONFIG_SHMEM (with a stub for the TINY case when ramfs is used),
      more as comment than to save space; comment them as used for SHM_UNLOCK.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
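
      A hedged sketch of shm.c's SHM_UNLOCK tail after this change (surrounding
      error handling and the SHM_LOCK branch omitted):

        /* SHM_UNLOCK: drop the locks before the potentially long scan */
        shmem_lock(shm_file, 0, shp->mlock_user);
        shp->shm_perm.mode &= ~SHM_LOCKED;
        shp->mlock_user = NULL;
        get_file(shm_file);             /* keep the mapping alive across the scan */
        shm_unlock(shp);                /* spinlock released before scanning */
        scan_mapping_unevictable_pages(shm_file->f_mapping);
        fput(shm_file);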
  6. 07 Jan, 2012 1 commit
  7. 04 Jan, 2012 5 commits
  8. 02 Nov, 2011 1 commit
  9. 01 Nov, 2011 1 commit
    • vmscan: add barrier to prevent evictable page in unevictable list · 21ee9f39
      Minchan Kim authored
      When a race between putback_lru_page() and shmem_lock with lock=0 happens,
      program execution order is as follows, but clear_bit in processor #1 could
      be reordered right before spin_unlock of processor #1.  Then, the page
      would be stranded on the unevictable list.
      
      Processor #0                          Processor #1
      spin_lock
      SetPageLRU
      spin_unlock
                                            clear_bit(AS_UNEVICTABLE)
                                            spin_lock
                                            if PageLRU()
                                                    if !test_bit(AS_UNEVICTABLE)
                                                            move evictable list
      smp_mb
      if !test_bit(AS_UNEVICTABLE)
              move evictable list
                                            spin_unlock
      
      But, pagevec_lookup() in scan_mapping_unevictable_pages() has
      rcu_read_[un]lock(), which happens to protect against reordering before
      reaching test_bit(AS_UNEVICTABLE) on processor #1, so this problem never
      actually happens.  But it's an unexpected side effect and we should solve
      this problem properly.
      
      This patch adds a barrier after mapping_clear_unevictable.
      
      I didn't hit this problem myself, but found it during review.
      Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Johannes Weiner <jweiner@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
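
      A sketch of shmem_lock()'s unlock path with the added barrier, per the
      description above:

        if (!lock && (info->flags & VM_LOCKED) && user) {
                user_shm_unlock(inode->i_size, user);
                info->flags &= ~VM_LOCKED;
                mapping_clear_unevictable(file->f_mapping);
                /*
                 * Make the cleared AS_UNEVICTABLE bit visible before a racing
                 * putback_lru_page() tests it, so the page cannot be stranded
                 * on the unevictable list.
                 */
                smp_mb__after_clear_bit();
                scan_mapping_unevictable_pages(file->f_mapping);
        }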
  10. 31 Oct, 2011 1 commit
  11. 04 Aug, 2011 11 commits
    • mm: clarify the radix_tree exceptional cases · 8079b1c8
      Hugh Dickins authored
      Make the radix_tree exceptional cases, mostly in filemap.c, clearer.
      
      It's hard to devise a suitable snappy name that illuminates the use by
      shmem/tmpfs for swap, while keeping filemap/pagecache/radix_tree
      generality.  And akpm points out that /* radix_tree_deref_retry(page) */
      comments look like calls that have been commented out for unknown
      reason.
      
      Skirt the naming difficulty by rearranging these blocks to handle the
      transient radix_tree_deref_retry(page) case first; then just explain the
      remaining shmem/tmpfs swap case in a comment.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
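
      The rearranged lookup pattern reads roughly like this (a sketch of the
      find_get_pages()-style loop body, not the exact diff):

        page = radix_tree_deref_slot(slot);
        if (unlikely(!page))
                continue;
        if (radix_tree_exception(page)) {
                if (radix_tree_deref_retry(page)) {
                        /* transient state: the entry moved, restart the lookup */
                        goto restart;
                }
                /*
                 * Otherwise this must be a shmem/tmpfs swap entry stored as an
                 * exceptional entry: skip over it.
                 */
                continue;
        }
        /* a real page: take a reference and recheck the slot as usual */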
    • tmpfs radix_tree: locate_item to speed up swapoff · e504f3fd
      Hugh Dickins authored
      We have already acknowledged that swapoff of a tmpfs file is slower than
      it was before conversion to the generic radix_tree: a little slower
      there will be acceptable, if the hotter paths are faster.
      
      But it was a shock to find swapoff of a 500MB file 20 times slower on my
      laptop, taking 10 minutes; and at that rate it significantly slows down
      my testing.
      
      Now, most of that turned out to be overhead from PROVE_LOCKING and
      PROVE_RCU: without those it was only 4 times slower than before; and
      more realistic tests on other machines don't fare as badly.
      
      I've tried a number of things to improve it, including tagging the swap
      entries, then doing lookup by tag: I'd expected that to halve the time,
      but in practice it's erratic, and often counter-productive.
      
      The only change I've so far found to make a consistent improvement is
      to short-circuit the way we go back and forth, gang lookup packing
      entries into the array supplied, then shmem scanning that array for the
      target entry.  Scanning in place doubles the speed, so it's now only
      twice as slow as before (or three times slower when the PROVEs are on).
      
      So, add radix_tree_locate_item() as an expedient, once-off,
      single-caller hack to do the lookup directly in place.  #ifdef it on
      CONFIG_SHMEM and CONFIG_SWAP, as much to document its limited
      applicability as save space in other configurations.  And, sadly,
      #include sched.h for cond_resched().
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
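
      Roughly how the new helper is used from shmem_unuse_inode(); a sketch,
      with swp_to_radix_entry() assumed from elsewhere in this series:

        /* find the file offset, if any, holding this swap entry */
        void *radswap = swp_to_radix_entry(swap);
        pgoff_t index = radix_tree_locate_item(&mapping->page_tree, radswap);

        if (index == -1)
                return 0;       /* not in this mapping */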
    • tmpfs: use kmemdup for short symlinks · 69f07ec9
      Hugh Dickins authored
      But we've not yet removed the old swp_entry_t i_direct[16] from
      shmem_inode_info.  That's because it was still being shared with the
      inline symlink.  Remove it now (saving 64 or 128 bytes from shmem inode
      size), and use kmemdup() for short symlinks, say, those up to 128 bytes.
      
      I wonder why mpol_free_shared_policy() is done in shmem_destroy_inode()
      rather than shmem_evict_inode(), where we usually do such freeing? I
      guess it doesn't matter, and I'm not into NUMA mpol testing right now.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
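
      A sketch of the short-symlink branch in shmem_symlink() after this change
      (SHORT_SYMLINK_LEN and info->symlink as introduced here):

        if (len <= SHORT_SYMLINK_LEN) {
                info->symlink = kmemdup(symname, len, GFP_KERNEL);
                if (!info->symlink) {
                        iput(inode);
                        return -ENOMEM;
                }
                inode->i_op = &shmem_short_symlink_operations;
        } else {
                /* long symlinks keep using a pagecache page, as before */
                error = shmem_getpage(inode, 0, &page, SGP_WRITE, NULL);
        }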
    • tmpfs: convert shmem_writepage and enable swap · 6922c0c7
      Hugh Dickins authored
      Convert shmem_writepage() to use shmem_delete_from_page_cache(), which uses
      shmem_radix_tree_replace() to substitute swap entry for page pointer
      atomically in the radix tree.
      
      As with shmem_add_to_page_cache(), it's not entirely satisfactory to be
      copying such code from delete_from_swap_cache, but again judged easier
      to sell than making its other callers go through the extras.
      
      Remove the toy implementation's shmem_put_swap() and shmem_get_swap(),
      now unreferenced, and the hack to disable swap: it's now good to go.
      
      The way things have worked out, info->lock no longer helps to guard the
      shmem_swaplist: we increment swapped under shmem_swaplist_mutex only.
      That global mutex exclusion between shmem_writepage() and shmem_unuse()
      is not pretty, and we ought to find another way; but it's been forced on
      us by recent race discoveries, not a consequence of this patchset.
      
      And what has become of the WARN_ON_ONCE(1) free_swap_and_cache() if a
      swap entry was found already present? That's no longer possible, the
      (unknown) one inserting this page into filecache would hit the swap
      entry occupying that slot.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
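
      The heart of the converted shmem_writepage(), sketched from the description
      above (shmem_swaplist handling and error paths omitted):

        if (add_to_swap_cache(page, swap, GFP_ATOMIC) == 0) {
                swap_shmem_alloc(swap);
                /* atomically substitute the swap entry for the page pointer */
                shmem_delete_from_page_cache(page, swp_to_radix_entry(swap));

                spin_lock(&info->lock);
                info->swapped++;
                shmem_recalc_inode(inode);
                spin_unlock(&info->lock);

                swap_writepage(page, wbc);
                return 0;
        }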
    • tmpfs: convert mem_cgroup shmem to radix-swap · aa3b1895
      Hugh Dickins authored
      Remove mem_cgroup_shmem_charge_fallback(): it was only required when we
      had to move swappage to filecache with GFP_NOWAIT.
      
      Remove the GFP_NOWAIT special case from mem_cgroup_cache_charge(), by
      moving its call out from shmem_add_to_page_cache() to two of its three
      callers.  But leave it doing mem_cgroup_uncharge_cache_page() on error:
      although asymmetrical, it's easier for all 3 callers to handle.
      
      These two changes would also be appropriate if anyone were to start
      using shmem_read_mapping_page_gfp() with GFP_NOWAIT.
      
      Remove mem_cgroup_get_shmem_target(): mc_handle_file_pte() can test
      radix_tree_exceptional_entry() to get what it needs for itself.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: convert shmem_getpage_gfp to radix-swap · 54af6042
      Hugh Dickins authored
      Convert shmem_getpage_gfp(), the engine-room of shmem, to expect page or
      swap entry returned from radix tree by find_lock_page().
      
      Whereas the repetitive old method proceeded mainly under info->lock,
      dropping and repeating whenever one of the conditions needed was not
      met, now we can proceed without it, leaving shmem_add_to_page_cache() to
      check for a race.
      
      This way there is no need to preallocate a page, no need for an early
      radix_tree_preload(), no need for mem_cgroup_shmem_charge_fallback().
      
      Move the error unwinding down to the bottom instead of repeating it
      throughout.  ENOSPC handling is a little different from before: there is
      no longer any race between find_lock_page() and finding swap, but we can
      arrive at ENOSPC before calling shmem_recalc_inode(), which might
      occasionally discover freed space.
      
      Be stricter to check i_size before returning.  info->lock is used for
      little but alloced, swapped, i_blocks updates.  Move i_blocks updates
      out from under the max_blocks check, so even an unlimited size=0 mount
      can show accurate du.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: convert shmem_unuse_inode to radix-swap · 46f65ec1
      Hugh Dickins authored
      Convert shmem_unuse_inode() to use a lockless gang lookup of the radix
      tree, searching for matching swap.
      
      This is somewhat slower than the old method: because of repeated radix
      tree descents, because of copying entries up, but probably most because
      the old method noted and skipped once a vector page was cleared of swap.
      Perhaps we can devise a use of radix tree tagging to achieve that later.
      
      shmem_add_to_page_cache() uses shmem_radix_tree_replace() to compensate
      for the lockless lookup by checking that the expected entry is in place,
      under lock.  It is not very satisfactory to be copying this much from
      add_to_page_cache_locked(), but I think easier to sell than insisting
      that every caller of add_to_page_cache*() go through the extras.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: convert shmem_truncate_range to radix-swap · 7a5d0fbb
      Hugh Dickins authored
      Disable the toy swapping implementation in shmem_writepage() - it's hard
      to support two schemes at once - and convert shmem_truncate_range() to a
      lockless gang lookup of swap entries along with pages, freeing both.
      
      Since the second loop tightens its noose until all entries of either
      kind have been squeezed out (and we shall make sure that there's not an
      instant when neither is visible), there is no longer a need for yet
      another pass below.
      
      shmem_radix_tree_replace() compensates for the lockless lookup by
      checking that the expected entry is in place, under lock, before
      replacing it.  Here it just deletes, but will be used in later patches
      to substitute swap entry for page or page for swap entry.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
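
      A sketch of shmem_radix_tree_replace() as described: verify the expected
      entry under tree_lock, then replace or delete it in place:

        static int shmem_radix_tree_replace(struct address_space *mapping,
                        pgoff_t index, void *expected, void *replacement)
        {
                void **pslot;
                void *item = NULL;

                pslot = radix_tree_lookup_slot(&mapping->page_tree, index);
                if (pslot)
                        item = radix_tree_deref_slot_protected(pslot,
                                                        &mapping->tree_lock);
                if (item != expected)
                        return -ENOENT;         /* raced: caller must cope */
                if (replacement)
                        radix_tree_replace_slot(pslot, replacement);
                else
                        radix_tree_delete(&mapping->page_tree, index);
                return 0;
        }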
    • tmpfs: copy truncate_inode_pages_range · bda97eab
      Hugh Dickins authored
      Bring truncate.c's code for truncate_inode_pages_range() inline into
      shmem_truncate_range(), replacing its first call (there's a followup
      call below, but leave that one, it will disappear next).
      
      Don't play with it yet, apart from leaving out the cleancache flush, and
      (importantly) the nrpages == 0 skip, and moving shmem_setattr()'s
      partial page preparation into its partial page handling.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: miscellaneous trivial cleanups · 41ffe5d5
      Hugh Dickins authored
      While it's at its least, make a number of boring nitpicky cleanups to
      shmem.c, mostly for consistency of variable naming.  Things like "swap"
      instead of "entry", "pgoff_t index" instead of "unsigned long idx".
      
      And since everything else here is prefixed "shmem_", better change
      init_tmpfs() to shmem_init().
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: demolish old swap vector support · 285b2c4f
      Hugh Dickins authored
      The maximum size of a shmem/tmpfs file has been limited by the maximum
      size of its triple-indirect swap vector.  With 4kB page size, maximum
      filesize was just over 2TB on a 32-bit kernel, but sadly one eighth of
      that on a 64-bit kernel.  (With 8kB page size, maximum filesize was just
      over 4TB on a 64-bit kernel, but 16TB on a 32-bit kernel,
      MAX_LFS_FILESIZE being then more restrictive than swap vector layout.)
      
      It's a shame that tmpfs should be more restrictive than ramfs, and this
      limitation has now been noticed.  Add another level to the swap vector?
      No, it became obscure and hard to maintain, once I complicated it to
      make use of highmem pages nine years ago: better choose another way.
      
      Surely, if 2.4 had had the radix tree pagecache introduced in 2.5, then
      tmpfs would never have invented its own peculiar radix tree: we would
      have fitted swap entries into the common radix tree instead, in much the
      same way as we fit swap entries into page tables.
      
      And why should each file have a separate radix tree for its pages and
      for its swap entries? The swap entries are required precisely where and
      when the pages are not.  We want to put them together in a single radix
      tree: which can then avoid much of the locking which was needed to
      prevent them from being exchanged underneath us.
      
      This also avoids the waste of memory devoted to swap vectors, first in
      the shmem_inode itself, then at least two more pages once a file grew
      beyond 16 data pages (pages accounted by df and du, but not by memcg).
      Allocated upfront, to avoid allocation when under swapping pressure, but
      pure waste when CONFIG_SWAP is not set - I have never spattered around
      the ifdefs to prevent that, preferring this move to sharing the common
      radix tree instead.
      
      There are three downsides to sharing the radix tree.  One, that it binds
      tmpfs more tightly to the rest of mm, either requiring knowledge of swap
      entries in radix tree there, or duplication of its code here in shmem.c.
      I believe that the simplifications and memory savings (and probable higher
      performance, not yet measured) justify that.
      
      Two, that on HIGHMEM systems with SWAP enabled, it's the lowmem radix
      nodes that cannot be freed under memory pressure - whereas before it was
      the less precious highmem swap vector pages that could not be freed.
      I'm hoping that 64-bit has now been accessible for long enough, that the
      highmem argument has grown much less persuasive.
      
      Three, that swapoff is slower than it used to be on tmpfs files, since
      it's using a simple generic mechanism not tailored to it: I find this
      noticeable, and shall want to improve, but maybe nobody else will
      notice.
      
      So...  now remove most of the old swap vector code from shmem.c.  But,
      for the moment, keep the simple i_direct vector of 16 pages, with simple
      accessors shmem_put_swap() and shmem_get_swap(), as a toy implementation
      to help mark where swap needs to be handled in subsequent patches.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
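
      The toy accessors kept for the transition look roughly like this (a sketch;
      SHMEM_NR_DIRECT is the 16-entry direct vector mentioned above):

        static void shmem_put_swap(struct shmem_inode_info *info,
                                   pgoff_t index, swp_entry_t swap)
        {
                if (index < SHMEM_NR_DIRECT)
                        info->i_direct[index] = swap;
        }

        static swp_entry_t shmem_get_swap(struct shmem_inode_info *info,
                                          pgoff_t index)
        {
                return (index < SHMEM_NR_DIRECT) ?
                        info->i_direct[index] : (swp_entry_t){0};
        }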
  12. 26 Jul, 2011 8 commits
    • tmpfs: simplify unuse and writepage · 48f170fb
      Hugh Dickins authored
      shmem_unuse_inode() and shmem_writepage() contain a little code to cope
      with pages inserted independently into the filecache, probably by a
      filesystem stacked on top of tmpfs, then fed to its ->readpage() or
      ->writepage().
      
      Unionfs was indeed experimenting with working in that way three years ago,
      but I find no current examples: nowadays the stacking filesystems use vfs
      interfaces to the lower filesystem.
      
      It's now illegal: remove most of that code, adding some WARN_ON_ONCEs.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Erez Zadok <ezk@fsl.cs.sunysb.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: simplify filepage/swappage · 27ab7006
      Hugh Dickins authored
      We can now simplify shmem_getpage_gfp(): there is no longer a dilemma of
      filepage passed in via shmem_readpage(), then swappage found, which must
      then be copied over to it.
      
      Although at first it's tempting to replace the **pagep arg by returning
      struct page *, that makes a mess of IS_ERR_OR_NULL(page)s in all the
      callers, so leave as is.
      
      Insert BUG_ON(!PageUptodate) when we find and lock page: some of the
      complication came from uninitialized pages inserted into filecache prior
      to readpage; but now we're in control, and only release pagelock on
      filecache once it's uptodate (if an error occurs in reading back from
      swap, the page remains in swapcache, never moved to filecache).
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: simplify prealloc_page · e83c32e8
      Hugh Dickins authored
      The prealloc_page handling in shmem_getpage_gfp() is unnecessarily
      complicated: first simplify that before going on to filepage/swappage.
      
      That's right, don't report ENOMEM when the preallocation fails: we may or
      may not need the page.  But simply report ENOMEM once we find we do need
      it, instead of dropping lock, repeating allocation, unwinding on failure
      etc.  And leave the out label on the fast path, don't goto.
      
      Fix something that looks like a bug but turns out not to be: set
      PageSwapBacked on prealloc_page before its mem_cgroup_cache_charge(), as
      the removed case was doing.  That's important before adding to LRU
      (determines which LRU the page goes on), and does affect which path it
      takes through memcontrol.c, but in the end MEM_CGROUP_CHANGE_TYPE_ SHMEM
      is handled no differently from CACHE.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Shaohua Li <shaohua.li@intel.com>
      Cc: "Zhang, Yanmin" <yanmin.zhang@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: remove_shmem_readpage · 9276aad6
      Hugh Dickins authored
      Remove that pernicious shmem_readpage() at last: the things we needed it
      for (splice, loop, sendfile, i915 GEM) are now fully taken care of by
      shmem_file_splice_read() and shmem_read_mapping_page_gfp().
      
      This removal clears the way for a simpler shmem_getpage_gfp(), since page
      is never passed in; but leave most of that cleanup until after.
      
      sys_readahead() and sys_fadvise(POSIX_FADV_WILLNEED) will now EINVAL,
      instead of unexpectedly trying to read ahead on tmpfs: if that proves to
      be an issue for someone, then we can either arrange for them to return
      success instead, or try to implement async readahead on tmpfs.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: pass gfp to shmem_getpage_gfp · 68da9f05
      Hugh Dickins authored
      Make shmem_getpage() a wrapper, passing mapping_gfp_mask() down to
      shmem_getpage_gfp(), which in turn passes gfp down to shmem_swp_alloc().
      
      Change shmem_read_mapping_page_gfp() to use shmem_getpage_gfp() in the
      CONFIG_SHMEM case; but leave tiny !SHMEM using read_cache_page_gfp().
      
      Add a BUG_ON() in case anyone happens to call this on a non-shmem mapping;
      though we might later want to let that case route to read_cache_page_gfp().
      
      It annoys me to have these two almost-redundant args, gfp and fault_type:
      I can't find a better way; but initialize fault_type only in shmem_fault().
      
      Note that before, read_cache_page_gfp() was allocating i915_gem's pages
      with __GFP_NORETRY as intended; but the corresponding swap vector pages
      got allocated without it, leaving a small possibility of OOM.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
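
      The wrapper described above, sketched (argument types as they stood at the
      time may differ slightly):

        static inline int shmem_getpage(struct inode *inode, pgoff_t index,
                        struct page **pagep, enum sgp_type sgp, int *fault_type)
        {
                return shmem_getpage_gfp(inode, index, pagep, sgp,
                                mapping_gfp_mask(inode->i_mapping), fault_type);
        }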
    • tmpfs: refine shmem_file_splice_read · 71f0e07a
      Hugh Dickins authored
      Tidy up shmem_file_splice_read():
      
      Remove readahead: okay, we could implement shmem readahead on swap,
      but have never done so before, swap being the slow exceptional path.
      
      Use shmem_getpage() instead of find_or_create_page() plus ->readpage().
      
      Remove several comments: sorry, I found them more distracting than
      helpful, and this will not be the reference version of splice_read().
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: clone shmem_file_splice_read() · 708e3508
      Hugh Dickins authored
      Copy __generic_file_splice_read() and generic_file_splice_read() from
      fs/splice.c to shmem_file_splice_read() in mm/shmem.c.  Make
      page_cache_pipe_buf_ops and spd_release_page() accessible to it.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: no need to use i_lock · d515afe8
      Hugh Dickins authored
      2.6.36's 7e496299 ("tmpfs: make tmpfs scalable with percpu_counter for
      used blocks") used inode->i_lock in place of sbinfo->stat_lock around
      i_blocks updates; but that was adverse to scalability, and unnecessary,
      since info->lock is already held there in the fast paths.
      
      Remove those uses of i_lock, and add info->lock in the three error paths
      where it's then needed across shmem_free_blocks().  It's not actually
      needed across shmem_unacct_blocks(), but they're so often paired that it
      looks wrong to split them apart.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 25 Jul, 2011 1 commit
  14. 18 Jul, 2011 1 commit
    • security: new security_inode_init_security API adds function callback · 9d8f13ba
      Mimi Zohar authored
      This patch changes the security_inode_init_security API by adding a
      filesystem specific callback to write security extended attributes.
      This change is in preparation for supporting the initialization of
      multiple LSM xattrs and the EVM xattr.  Initially the callback function
      walks an array of xattrs, writing each xattr separately, but could be
      optimized to write multiple xattrs at once.
      
      For existing security_inode_init_security() calls, which have not yet
      been converted to use the new callback function, such as those in
      reiserfs and ocfs2, this patch defines security_old_inode_init_security().
      Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
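
      A sketch of the callback shape the new API expects; the callback name here
      is illustrative, not taken from the patch:

        /* illustrative filesystem callback: write out each xattr in turn */
        static int example_initxattrs(struct inode *inode,
                        const struct xattr *xattr_array, void *fs_info)
        {
                const struct xattr *xattr;

                for (xattr = xattr_array; xattr->name != NULL; xattr++) {
                        /* write xattr->name / xattr->value (xattr->value_len
                           bytes) with the filesystem's own xattr setter */
                }
                return 0;
        }

        /* converted callers pass the callback; unconverted filesystems such as
           reiserfs and ocfs2 use security_old_inode_init_security() instead */
        err = security_inode_init_security(inode, dir, &dentry->d_name,
                                           example_initxattrs, NULL);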
  15. 28 Jun, 2011 2 commits
    • tmpfs: add shmem_read_mapping_page_gfp · d9d90e5e
      Hugh Dickins authored
      Although it is used (by i915) on nothing but tmpfs, read_cache_page_gfp()
      is unsuited to tmpfs, because it inserts a page into pagecache before
      calling the filesystem's ->readpage: tmpfs may have pages in swapcache
      which only it knows how to locate and switch to filecache.
      
      At present tmpfs provides a ->readpage method, and copes with this by
      copying pages; but soon we can simplify it by removing its ->readpage.
      Provide shmem_read_mapping_page_gfp() now, ready for that transition.
      
      Export shmem_read_mapping_page_gfp() and add it to the list in shmem_fs.h,
      with shmem_read_mapping_page() inline for the common mapping_gfp case.
      
      (shmem_read_mapping_page_gfp or shmem_read_cache_page_gfp? Generally the
      read_mapping_page functions use the mapping's ->readpage, and the
      read_cache_page functions use the supplied filler, so I think
      read_cache_page_gfp was slightly misnamed.)
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
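
      The common-case inline mentioned above is a thin wrapper, roughly:

        static inline struct page *shmem_read_mapping_page(
                        struct address_space *mapping, pgoff_t index)
        {
                return shmem_read_mapping_page_gfp(mapping, index,
                                        mapping_gfp_mask(mapping));
        }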
    • tmpfs: take control of its truncate_range · 94c1e62d
      Hugh Dickins authored
      2.6.35's new truncate convention gave tmpfs the opportunity to control
      its file truncation, no longer enforced from outside by vmtruncate().
      We shall want to build upon that, to handle pagecache and swap together.
      
      Slightly redefine the ->truncate_range interface: let it now be called
      between the unmap_mapping_range()s, with the filesystem responsible for
      doing the truncate_inode_pages_range() from it - just as the filesystem
      is nowadays responsible for doing that from its ->setattr.
      
      Let's rename shmem_notify_change() to shmem_setattr().  Instead of
      calling the generic truncate_setsize(), bring that code in so we can
      call shmem_truncate_range() - which will later be updated to perform its
      own variant of truncate_inode_pages_range().
      
      Remove the punch_hole unmap_mapping_range() from shmem_truncate_range():
      now that the COW's unmap_mapping_range() comes after ->truncate_range,
      there is no need to call it a third time.
      
      Export shmem_truncate_range() and add it to the list in shmem_fs.h, so
      that i915_gem_object_truncate() can call it explicitly in future; get
      this patch in first, then update drm/i915 once this is available (until
      then, i915 will just be doing the truncate_inode_pages() twice).
      
      Though introduced five years ago, no other filesystem is implementing
      ->truncate_range, and its only other user is madvise(,,MADV_REMOVE): we
      expect to convert it to fallocate(,FALLOC_FL_PUNCH_HOLE,,) shortly,
      whereupon ->truncate_range can be removed from inode_operations -
      shmem_truncate_range() will help i915 across that transition too.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
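
      The redefined calling convention, sketched (locking and range rounding
      elided):

        /* unmap first, so no pte still points into the range being punched */
        unmap_mapping_range(mapping, holebegin, holelen, 1);
        /* filesystem now does truncate_inode_pages_range() itself from here */
        inode->i_op->truncate_range(inode, lstart, lend);
        /* unmap again: catches pages COWed while the range was being punched */
        unmap_mapping_range(mapping, holebegin, holelen, 1);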
  16. 28 May, 2011 1 commit
    • tmpfs: fix race between truncate and writepage · 826267cf
      Hugh Dickins authored
      While running fsx on tmpfs with a memhog then swapoff, swapoff was hanging
      (interruptibly), repeatedly failing to locate the owner of a 0xff entry in
      the swap_map.
      
      Although shmem_writepage() does abandon when it sees incoming page index
      is beyond eof, there was still a window in which shmem_truncate_range()
      could come in between writepage's dropping lock and updating swap_map,
      find the half-completed swap_map entry, and in trying to free it,
      leave it in a state that swap_shmem_alloc() could not correct.
      
      Arguably a bug in __swap_duplicate()'s and swap_entry_free()'s handling
      of the different cases, but easiest to fix by moving swap_shmem_alloc()
      under cover of the lock.
      
      More interesting than the bug: it's been there since 2.6.33, so why could
      I not see it with earlier kernels?  The mmotm of two weeks ago seems to
      have some magic for generating races; this is just one of three I found.
      
      With yesterday's git I first saw this in mainline, bisected in search of
      that magic, but the easy reproducibility evaporated.  Oh well, fix the bug.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>