  1. Jun 19, 2019
  2. Jun 14, 2019
    • lib/genalloc: introduce chunk owners · 795ee306
      Dan Williams authored
      The p2pdma facility enables a provider to publish a pool of dma
      addresses for a consumer to allocate.  A genpool is used internally by
      p2pdma to collect dma resources, 'chunks', to be handed out to
      consumers.  Whenever a consumer allocates a resource it needs to pin the
      'struct dev_pagemap' instance that backs the chunk selected by
      pci_alloc_p2pmem().
      
      Currently that reference is taken globally on the entire provider
      device.  That sets up a lifetime mismatch whereby the p2pdma core needs
      to maintain hacks to make sure the percpu_ref is not released twice.
      
      This lifetime mismatch also stands in the way of a fix to
      devm_memremap_pages() whereby devm_memremap_pages_release() must wait for
      the percpu_ref ->release() callback to complete before it can proceed to
      tear down pages.
      
      So, towards fixing this situation, introduce the ability to store a 'chunk
      owner' at gen_pool_add() time, and a facility to retrieve the owner at
      gen_pool_{alloc,free}() time.  For p2pdma this will be used to store and
      recall individual dev_pagemap reference counter instances per-chunk.
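
The chunk-owner idea described above can be sketched in userspace C. This is a minimal model under stated assumptions, not the kernel's genalloc code: `struct pool`, `pool_add_owner()`, `pool_alloc_owner()` and `pool_free_owner()` are hypothetical names, the allocator is a simple bump allocator, and no chunk is assumed to start at address 0. It only shows an opaque owner pointer being stored when a chunk is added and handed back on allocation and free.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHUNKS 4

/* One contiguous address range plus the opaque owner recorded at add time. */
struct chunk {
	size_t start;  /* first address in the range */
	size_t size;   /* length of the range in bytes */
	size_t used;   /* bytes handed out so far (simple bump allocation) */
	size_t live;   /* allocations not yet freed */
	void *owner;   /* e.g. a per-chunk reference counter */
};

struct pool {
	struct chunk chunks[MAX_CHUNKS];
	size_t nr_chunks;
};

/* Add a chunk and remember its owner. Returns 0 on success, -1 if the pool is full. */
static int pool_add_owner(struct pool *p, size_t start, size_t size, void *owner)
{
	if (p->nr_chunks == MAX_CHUNKS)
		return -1;
	p->chunks[p->nr_chunks++] = (struct chunk){
		.start = start, .size = size, .owner = owner,
	};
	return 0;
}

/*
 * Allocate from the first chunk with room and report that chunk's owner.
 * Returns 0 when nothing fits (assumes no chunk starts at address 0).
 */
static size_t pool_alloc_owner(struct pool *p, size_t size, void **owner)
{
	for (size_t i = 0; i < p->nr_chunks; i++) {
		struct chunk *c = &p->chunks[i];

		if (c->size - c->used < size)
			continue;
		c->used += size;
		c->live++;
		if (owner)
			*owner = c->owner;
		return c->start + c->used - size;
	}
	if (owner)
		*owner = NULL;
	return 0;
}

/* Free an allocation and report the owner of the chunk it came from. */
static void pool_free_owner(struct pool *p, size_t addr, void **owner)
{
	for (size_t i = 0; i < p->nr_chunks; i++) {
		struct chunk *c = &p->chunks[i];

		if (addr < c->start || addr >= c->start + c->size)
			continue;
		assert(c->live > 0);
		if (--c->live == 0)
			c->used = 0;  /* chunk is idle again, reuse it from the start */
		if (owner)
			*owner = c->owner;
		return;
	}
	if (owner)
		*owner = NULL;
}
```

In this model a caller would take a reference on the returned owner after a successful allocation and drop it after the matching free, so each chunk's lifetime is tracked separately rather than through one global reference.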
      
      Link: http://lkml.kernel.org/r/155727338118.292046.13407378933221579644.stgit@dwillia2-desk3.amr.corp.intel.com
      
      
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Jun 07, 2019
  4. Jun 05, 2019
  5. Jun 03, 2019
    • XArray tests: Add check_insert · 12fd2aee
      Matthew Wilcox authored
      
      A simple test which just checks that inserting an entry into an empty
      array succeeds.  Try various different interesting indices.
      
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
    • idr: Fix idr_get_next race with idr_remove · 5c089fd0
      Matthew Wilcox (Oracle) authored
      
      If the entry is deleted from the IDR between the call to
      radix_tree_iter_find() and rcu_dereference_raw(), idr_get_next()
      will return NULL, which will end the iteration prematurely.  We should
      instead continue to the next entry in the IDR.  This only happens if the
      iteration is protected by the RCU lock.  Most IDR users use a spinlock
      or semaphore to exclude simultaneous modifications.  It was noticed once
      the PID allocator was converted to use the IDR, as it uses the RCU lock,
      but there may be other users elsewhere in the kernel.
      
      We can't use the normal pattern of calling radix_tree_deref_retry()
      (which catches both a retry entry in a leaf node and a node entry in
      the root) as the IDR supports storing entries which are unaligned,
      which will trigger an infinite loop if they are encountered.  Instead,
      we have to explicitly check whether the entry is a retry entry.
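
The "continue instead of stopping" behaviour can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel's IDR code: `struct table` and `table_get_next()` are hypothetical, the `present` array stands in for what the lookup step saw, and `slot` stands in for what the later load actually reads. The retry-entry check mentioned above is not modelled. An entry removed between the two steps shows up as `present` set but `slot` NULL.

```c
#include <stddef.h>

#define NSLOTS 8

struct table {
	unsigned char present[NSLOTS]; /* possibly stale view from the lookup step */
	void *slot[NSLOTS];            /* value observed by the later load */
};

/*
 * Return the next live entry at or after *idx and store its index in *idx.
 * NULL is returned only when the table is exhausted, never merely because
 * one entry vanished between lookup and load.
 */
static void *table_get_next(const struct table *t, size_t *idx)
{
	for (size_t i = *idx; i < NSLOTS; i++) {
		void *entry;

		if (!t->present[i])
			continue;
		entry = t->slot[i];
		if (!entry)
			continue; /* removed after being found: skip it, keep iterating */
		*idx = i;
		return entry;
	}
	return NULL;
}
```

A caller iterates by calling `table_get_next()`, processing the entry, incrementing the index, and repeating until NULL is returned.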
      
      Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree")
      Reported-by: Brendan Gregg <bgregg@netflix.com>
      Tested-by: Brendan Gregg <bgregg@netflix.com>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
  6. Jun 01, 2019
    • lib/sort.c: fix kernel-doc notation warnings · aa52619c
      Randy Dunlap authored
      Fix kernel-doc notation in lib/sort.c by using correct function parameter
      names.
      
        lib/sort.c:59: warning: Excess function parameter 'size' description in 'swap_words_32'
        lib/sort.c:83: warning: Excess function parameter 'size' description in 'swap_words_64'
        lib/sort.c:110: warning: Excess function parameter 'size' description in 'swap_bytes'
      
      Link: http://lkml.kernel.org/r/60e25d3d-68d1-bde2-3b39-e4baa0b14907@infradead.org
      
      
      Fixes: 37d0ec34 ("lib/sort: make swap functions more generic")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: George Spelvin <lkml@sdf.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • treewide: fix typos of SPDX-License-Identifier · 8e82fe2a
      Masahiro Yamada authored
      
      Prior to the adoption of SPDX, it was difficult for tools to determine
      the correct license due to incomplete or badly formatted license text.
      SPDX solves this issue, assuming people can correctly spell
      "SPDX-License-Identifier", although this assumption is broken in some
      places.
      
      Since scripts/spdxcheck.py parses only lines that exactly match
      the correct tag, it cannot (should not) detect this kind of error.
      
      If the correct tag is missing, scripts/checkpatch.pl warns like this:
      
       WARNING: Missing or malformed SPDX-License-Identifier tag in line *
      
      So, people should notice it before the patch submission, but in reality
      broken tags sometimes slip in. The checkpatch warning is not useful for
      checking the committed files globally since a large number of files still
      have no SPDX tag.
      
      Also, I am not sure about the legal effect when the SPDX tag is broken.
      
      Anyway, these typos are absolutely worth fixing. It is pretty easy to
      find suspicious lines by grep.
      
        $ git grep --not -e SPDX-License-Identifier --and -e SPDX- -- \
          :^LICENSES :^scripts/spdxcheck.py :^*/license-rules.rst
        arch/arm/kernel/bugs.c:// SPDX-Identifier: GPL-2.0
        drivers/phy/st/phy-stm32-usbphyc.c:// SPDX-Licence-Identifier: GPL-2.0
        drivers/pinctrl/sh-pfc/pfc-r8a77980.c:// SPDX-Lincense-Identifier: GPL 2.0
        lib/test_stackinit.c:// SPDX-Licenses: GPLv2
        sound/soc/codecs/max9759.c:// SPDX-Licence-Identifier: GPL-2.0
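
The same filter as the grep above (a line mentions "SPDX-" but not the exact tag) can be written as a small C predicate. This is an illustrative sketch only: `is_suspicious_spdx_line()` is a hypothetical helper, not part of scripts/spdxcheck.py or checkpatch.pl, and it does not implement the path exclusions from the grep command.

```c
#include <string.h>

/*
 * Return 1 if the line looks like a misspelled SPDX tag: it contains
 * "SPDX-" but not the exact string "SPDX-License-Identifier".
 * Return 0 otherwise.
 */
static int is_suspicious_spdx_line(const char *line)
{
	if (strstr(line, "SPDX-License-Identifier"))
		return 0; /* correctly spelled tag */
	return strstr(line, "SPDX-") != NULL;
}
```

Feeding each line of a source file to this predicate flags the same kind of lines as the grep output listed above.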
      
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  7. May 31, 2019
    • mm: fix page cache convergence regression · 7b785645
      Johannes Weiner authored
      
      Since a2833486 ("page cache: Finish XArray conversion"), on most
      major Linux distributions, the page cache doesn't correctly transition
      when the hot data set is changing, and leaves the new pages thrashing
      indefinitely instead of kicking out the cold ones.
      
      On a freshly booted, freshly ssh'd into virtual machine with 1G RAM
      running stock Arch Linux:
      
      [root@ham ~]# ./reclaimtest.sh
      + dd of=workingset-a bs=1M count=0 seek=600
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + ./mincore workingset-a
      153600/153600 workingset-a
      + dd of=workingset-b bs=1M count=0 seek=600
      + cat workingset-b
      + cat workingset-b
      + cat workingset-b
      + cat workingset-b
      + ./mincore workingset-a workingset-b
      104029/153600 workingset-a
      120086/153600 workingset-b
      + cat workingset-b
      + cat workingset-b
      + cat workingset-b
      + cat workingset-b
      + ./mincore workingset-a workingset-b
      104029/153600 workingset-a
      120268/153600 workingset-b
      
      workingset-b is a 600M file on a 1G host that is otherwise entirely
      idle. No matter how often it's being accessed, it won't get cached.
      
      While investigating, I noticed that the non-resident information gets
      aggressively reclaimed - /proc/vmstat::workingset_nodereclaim. This is
      a problem because a workingset transition like this relies on the
      non-resident information tracked in the page cache tree of evicted
      file ranges: when the cache faults are refaults of recently evicted
      cache, we challenge the existing active set, and that allows a new
      workingset to establish itself.
      
      Tracing the shrinker that maintains this memory revealed that all page
      cache tree nodes were allocated to the root cgroup. This is a problem,
      because 1) the shrinker sizes the amount of non-resident information
      it keeps to the size of the cgroup's other memory and 2) on most major
      Linux distributions, only kernel threads live in the root cgroup and
      everything else gets put into services or session groups:
      
      [root@ham ~]# cat /proc/self/cgroup
      0::/user.slice/user-0.slice/session-c1.scope
      
      As a result, we basically maintain no non-resident information for the
      workloads running on the system, thus breaking the caching algorithm.
      
      Looking through the code, I found the culprit in the above-mentioned
      patch: when switching from the radix tree to xarray, it dropped the
      __GFP_ACCOUNT flag from the tree node allocations - the flag that
      makes sure the allocated memory gets charged to and tracked by the
      cgroup of the calling process - in this case, the one doing the fault.
      
      To fix this, allow xarray users to specify a per-tree flag that makes
      xarray allocate nodes using __GFP_ACCOUNT. Then restore the page cache
      tree annotation to request such cgroup tracking for the cache nodes.
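
The shape of such a per-tree flag can be sketched as follows. This is a userspace model under stated assumptions, not the xarray implementation: `struct tree`, `TREE_FLAG_ACCOUNT`, `GFP_BASE`, `GFP_ACCOUNT` and `node_alloc_gfp()` are hypothetical stand-ins with made-up bit values. It only shows an opt-in flag on the tree being translated into an extra allocation flag when a node is allocated.

```c
/* Hypothetical allocation-flag bits (not the kernel's values). */
#define GFP_BASE          0x1u
#define GFP_ACCOUNT       0x2u  /* charge the allocation to the caller's cgroup */

/* Hypothetical per-tree flag requesting accounted node allocations. */
#define TREE_FLAG_ACCOUNT 0x1u

struct tree {
	unsigned int flags; /* set once when the tree is initialised */
};

/*
 * Compute the flags used for a node allocation on behalf of tree t.
 * Trees that opted in get GFP_ACCOUNT added; all others are unchanged.
 */
static unsigned int node_alloc_gfp(const struct tree *t, unsigned int gfp)
{
	if (t->flags & TREE_FLAG_ACCOUNT)
		gfp |= GFP_ACCOUNT;
	return gfp;
}
```

Keeping the choice on the tree rather than on each call means a user such as the page cache opts in once at initialisation, and every later node allocation inherits the accounting behaviour.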
      
      With this patch applied, the page cache correctly converges on new
      workingsets again after just a few iterations:
      
      [root@ham ~]# ./reclaimtest.sh
      + dd of=workingset-a bs=1M count=0 seek=600
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + cat workingset-a
      + ./mincore workingset-a
      153600/153600 workingset-a
      + dd of=workingset-b bs=1M count=0 seek=600
      + cat workingset-b
      + ./mincore workingset-a workingset-b
      124607/153600 workingset-a
      87876/153600 workingset-b
      + cat workingset-b
      + ./mincore workingset-a workingset-b
      81313/153600 workingset-a
      133321/153600 workingset-b
      + cat workingset-b
      + ./mincore workingset-a workingset-b
      63036/153600 workingset-a
      153600/153600 workingset-b
      
      Cc: stable@vger.kernel.org # 4.20+
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
  8. May 30, 2019
  9. May 24, 2019
  10. May 23, 2019
  11. May 21, 2019