1. 17 Jul, 2019 2 commits
    • Pavel Tatashin's avatar
      device-dax: "Hotremove" persistent memory that is used like normal RAM · 9f960da7
      Pavel Tatashin authored
      It is now allowed to use persistent memory like a regular RAM, but
      currently there is no way to remove this memory until machine is
      rebooted.
      
      This work expands the functionality to also allows hotremoving
      previously hotplugged persistent memory, and recover the device for use
      for other purposes.
      
      To hotremove persistent memory, the management software must first
      offline all memory blocks of dax region, and than unbind it from
      device-dax/kmem driver.  So, operations should look like this:
      
        echo offline > /sys/devices/system/memory/memoryN/state
        ...
        echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
      
      Note: if unbind is done without offlining memory beforehand, it won't be
      possible to do dax0.0 hotremove, and dax's memory is going to be part of
      System RAM until reboot.
      
      Link: http://lkml.kernel.org/r/20190517215438.6487-4-pasha.tatashin@soleen.comSigned-off-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9f960da7
    • Pavel Tatashin's avatar
      device-dax: fix memory and resource leak if hotplug fails · 31e4ca92
      Pavel Tatashin authored
      Patch series ""Hotremove" persistent memory", v6.
      
      Recently, adding a persistent memory to be used like a regular RAM was
      added to Linux.  This work extends this functionality to also allow hot
      removing persistent memory.
      
      We (Microsoft) have an important use case for this functionality.
      
      The requirement is for physical machines with small amount of RAM (~8G)
      to be able to reboot in a very short period of time (<1s).  Yet, there
      is a userland state that is expensive to recreate (~2G).
      
      The solution is to boot machines with 2G preserved for persistent
      memory.
      
      Copy the state, and hotadd the persistent memory so machine still has
      all 8G available for runtime.  Before reboot, offline and hotremove
      device-dax 2G, copy the memory that is needed to be preserved to pmem0
      device, and reboot.
      
      The series of operations look like this:
      
      1. After boot restore /dev/pmem0 to ramdisk to be consumed by apps.
         and free ramdisk.
      2. Convert raw pmem0 to devdax
         ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f
      3. Hotadd to System RAM
         echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
         echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
         echo online_movable > /sys/devices/system/memoryXXX/state
      4. Before reboot hotremove device-dax memory from System RAM
         echo offline > /sys/devices/system/memoryXXX/state
         echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
      5. Create raw pmem0 device
         ndctl create-namespace --mode raw  -e namespace0.0 -f
      6. Copy the state that was stored by apps to ramdisk to pmem device
      7. Do kexec reboot or reboot through firmware if firmware does not
         zero memory in pmem0 region (These machines have only regular
         volatile memory). So to have pmem0 device either memmap kernel
         parameter is used, or devices nodes in dtb are specified.
      
      This patch (of 3):
      
      When add_memory() fails, the resource and the memory should be freed.
      
      Link: http://lkml.kernel.org/r/20190517215438.6487-2-pasha.tatashin@soleen.com
      Fixes: c221c0b0 ("device-dax: "Hotplug" persistent memory for use like normal RAM")
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: default avatarDave Hansen <dave.hansen@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      31e4ca92
  2. 05 Jul, 2019 1 commit
  3. 02 Jul, 2019 4 commits
  4. 21 Jun, 2019 1 commit
    • Vishal Verma's avatar
      device-dax: Add a 'resource' attribute · 40cdc60a
      Vishal Verma authored
      device-dax based devices were missing a 'resource' attribute to indicate
      the physical address range contributed by the device in question. This
      information is desirable to userspace tooling that may want to use the
      dax device as system-ram, and wants to selectively hotplug and online
      the memory blocks associated with a given device.
      
      Without this, the tooling would have to parse /proc/iomem for the memory
      ranges contributed by dax devices, which can be a workaround, but it is
      far easier to provide this information in the sysfs hierarchy.
      
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      40cdc60a
  5. 14 Jun, 2019 1 commit
  6. 05 Jun, 2019 1 commit
  7. 25 May, 2019 2 commits
    • David Howells's avatar
      vfs: Convert dax to use the new mount API · 75d4e06f
      David Howells authored
      Convert the dax filesystem to the new internal mount API as the old
      one will be obsoleted and removed.  This allows greater flexibility in
      communication of mount parameters between userspace, the VFS and the
      filesystem.
      
      See Documentation/filesystems/mount_api.txt for more information.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Dan Williams <dan.j.williams@intel.com>
      cc: Vishal Verma <vishal.l.verma@intel.com>
      cc: Keith Busch <keith.busch@intel.com>
      cc: Dave Jiang <dave.jiang@intel.com>
      cc: linux-nvdimm@lists.01.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      75d4e06f
    • Al Viro's avatar
      mount_pseudo(): drop 'name' argument, switch to d_make_root() · 1f58bb18
      Al Viro authored
      Once upon a time we used to set ->d_name of e.g. pipefs root
      so that d_path() on pipes would work.  These days it's
      completely pointless - dentries of pipes are not even connected
      to pipefs root.  However, mount_pseudo() had set the root
      dentry name (passed as the second argument) and callers
      kept inventing names to pass to it.  Including those that
      didn't *have* any non-root dentries to start with...
      
      All of that had been pointless for about 8 years now; it's
      time to get rid of that cargo-culting...
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      1f58bb18
  8. 21 May, 2019 2 commits
  9. 20 May, 2019 1 commit
    • Dan Williams's avatar
      dax: Arrange for dax_supported check to span multiple devices · 7bf7eac8
      Dan Williams authored
      Pankaj reports that starting with commit ad428cdb "dax: Check the
      end of the block-device capacity with dax_direct_access()" device-mapper
      no longer allows dax operation. This results from the stricter checks in
      __bdev_dax_supported() that validate that the start and end of a
      block-device map to the same 'pagemap' instance.
      
      Teach the dax-core and device-mapper to validate the 'pagemap' on a
      per-target basis. This is accomplished by refactoring the
      bdev_dax_supported() internals into generic_fsdax_supported() which
      takes a sector range to validate. Consequently generic_fsdax_supported()
      is suitable to be used in a device-mapper ->iterate_devices() callback.
      A new ->dax_supported() operation is added to allow composite devices to
      split and route upper-level bdev_dax_supported() requests.
      
      Fixes: ad428cdb ("dax: Check the end of the block-device...")
      Cc: <stable@vger.kernel.org>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reported-by: default avatarPankaj Gupta <pagupta@redhat.com>
      Reviewed-by: default avatarPankaj Gupta <pagupta@redhat.com>
      Tested-by: default avatarPankaj Gupta <pagupta@redhat.com>
      Tested-by: default avatarVaibhav Jain <vaibhav@linux.ibm.com>
      Reviewed-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      7bf7eac8
  10. 14 May, 2019 1 commit
  11. 07 May, 2019 1 commit
  12. 02 May, 2019 1 commit
  13. 22 Apr, 2019 1 commit
  14. 28 Feb, 2019 2 commits
    • Dave Hansen's avatar
      device-dax: "Hotplug" persistent memory for use like normal RAM · c221c0b0
      Dave Hansen authored
      This is intended for use with NVDIMMs that are physically persistent
      (physically like flash) so that they can be used as a cost-effective
      RAM replacement.  Intel Optane DC persistent memory is one
      implementation of this kind of NVDIMM.
      
      Currently, a persistent memory region is "owned" by a device driver,
      either the "Direct DAX" or "Filesystem DAX" drivers.  These drivers
      allow applications to explicitly use persistent memory, generally
      by being modified to use special, new libraries. (DIMM-based
      persistent memory hardware/software is described in great detail
      here: Documentation/nvdimm/nvdimm.txt).
      
      However, this limits persistent memory use to applications which
      *have* been modified.  To make it more broadly usable, this driver
      "hotplugs" memory into the kernel, to be managed and used just like
      normal RAM would be.
      
      To make this work, management software must remove the device from
      being controlled by the "Device DAX" infrastructure:
      
      	echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
      
      and then tell the new driver that it can bind to the device:
      
      	echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
      
      After this, there will be a number of new memory sections visible
      in sysfs that can be onlined, or that may get onlined by existing
      udev-initiated memory hotplug rules.
      
      This rebinding procedure is currently a one-way trip.  Once memory
      is bound to "kmem", it's there permanently and can not be
      unbound and assigned back to device_dax.
      
      The kmem driver will never bind to a dax device unless the device
      is *explicitly* bound to the driver.  There are two reasons for
      this: One, since it is a one-way trip, it can not be undone if
      bound incorrectly.  Two, the kmem driver destroys data on the
      device.  Think of if you had good data on a pmem device.  It
      would be catastrophic if you compile-in "kmem", but leave out
      the "device_dax" driver.  kmem would take over the device and
      write volatile data all over your good data.
      
      This inherits any existing NUMA information for the newly-added
      memory from the persistent memory device that came from the
      firmware.  On Intel platforms, the firmware has guarantees that
      require each socket's persistent memory to be in a separate
      memory-only NUMA node.  That means that this patch is not expected
      to create NUMA nodes, but will simply hotplug memory into existing
      nodes.
      
      Because NUMA nodes are created, the existing NUMA APIs and tools
      are sufficient to create policies for applications or memory areas
      to have affinity for or an aversion to using this memory.
      
      There is currently some metadata at the beginning of pmem regions.
      The section-size memory hotplug restrictions, plus this small
      reserved area can cause the "loss" of a section or two of capacity.
      This should be fixable in follow-on patches.  But, as a first step,
      losing 256MB of memory (worst case) out of hundreds of gigabytes
      is a good tradeoff vs. the required code to fix this up precisely.
      This calculation is also the reason we export
      memory_block_size_bytes().
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: linux-nvdimm@lists.01.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Reviewed-by: default avatarVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      c221c0b0
    • Vishal Verma's avatar
      device-dax: Add a 'modalias' attribute to DAX 'bus' devices · c347bd71
      Vishal Verma authored
      Add a 'modalias' attribute to devices under the DAX bus so that userspace
      is able to dynamically load modules as needed.
      
      Normally, udev can get the modalias from 'uevent', and that is correctly
      set up by the DAX bus. However other tooling such as 'libndctl' for
      interacting with drivers/nvdimm/, and 'libdaxctl' for drivers/dax/ can
      also use the modalias to dynamically load modules via libkmod lookups.
      
      The 'nd' bus set up by the libnvdimm subsystem exports a modalias
      attribute. Imitate this to export the same for the 'dax' bus.
      
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: default avatarVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      c347bd71
  15. 21 Feb, 2019 1 commit
  16. 20 Feb, 2019 1 commit
    • Dan Williams's avatar
      device-dax: Add a 'target_node' attribute · 21c75763
      Dan Williams authored
      The target-node attribute is the Linux numa-node that a device-dax
      instance may create when it is online. Prior to being online the
      device's 'numa_node' property reflects the closest online cpu node which
      is the typical expectation of a device 'numa_node'. Once it is online it
      becomes its own distinct numa node, i.e. 'target_node'.
      
      Export the 'target_node' property to give userspace tooling the ability
      to predict the effective numa-node from a device-dax instance configured
      to provide 'System RAM' capacity.
      
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Reported-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      21c75763
  17. 24 Jan, 2019 1 commit
  18. 07 Jan, 2019 9 commits
    • Dan Williams's avatar
      acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node · 8fc5c735
      Dan Williams authored
      Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware
      Interface Table), is the first known instance of a memory range
      described by a unique "target" proximity domain. Where "initiator" and
      "target" proximity domains is an approach that the ACPI HMAT
      (Heterogeneous Memory Attributes Table) uses to described the unique
      performance properties of a memory range relative to a given initiator
      (e.g. CPU or DMA device).
      
      Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y
      char-device follows the traditional notion of 'numa-node' where the
      attribute conveys the closest online numa-node. That numa-node attribute
      is useful for cpu-binding and memory-binding processes *near* the
      device. However, when the memory range backing a 'pmem', or 'dax' device
      is onlined (memory hot-add) the memory-only-numa-node representing that
      address needs to be differentiated from the set of online nodes. In
      other words, the numa-node association of the device depends on whether
      you can bind processes *near* the cpu-numa-node in the offline
      device-case, or bind process *on* the memory-range directly after the
      backing address range is onlined.
      
      Allow for the case that platform firmware describes persistent memory
      with a unique proximity domain, i.e. when it is distinct from the
      proximity of DRAM and CPUs that are on the same socket. Plumb the Linux
      numa-node translation of that proximity through the libnvdimm region
      device to namespaces that are in device-dax mode. With this in place the
      proposed kmem driver [1] can optionally discover a unique numa-node
      number for the address range as it transitions the memory from an
      offline state managed by a device-driver to an online memory range
      managed by the core-mm.
      
      [1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.comReported-by: default avatarFan Du <fan.du@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Oliver O'Halloran" <oohall@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      8fc5c735
    • Dan Williams's avatar
      device-dax: Add /sys/class/dax backwards compatibility · 730926c3
      Dan Williams authored
      On the expectation that some environments may not upgrade libdaxctl
      (userspace component that depends on the /sys/class/dax hierarchy),
      provide a default / legacy dax_pmem_compat driver. The dax_pmem_compat
      driver implements the original /sys/class/dax sysfs layout rather than
      /sys/bus/dax. When userspace is upgraded it can blacklist this module
      and switch to the dax_pmem driver going forward.
      
      CONFIG_DEV_DAX_PMEM_COMPAT and supporting code will be deleted according
      to the dax_pmem entry in Documentation/ABI/obsolete/.
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      730926c3
    • Dan Williams's avatar
      device-dax: Add support for a dax override driver · d200781e
      Dan Williams authored
      Introduce the 'new_id' concept for enabling a custom device-driver attach
      policy for dax-bus drivers. The intended use is to have a mechanism for
      hot-plugging device-dax ranges into the page allocator on-demand. With
      this in place the default policy of using device-dax for performance
      differentiated memory can be overridden by user-space policy that can
      arrange for the memory range to be managed as 'System RAM' with
      user-defined NUMA and other performance attributes.
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      d200781e
    • Dan Williams's avatar
      device-dax: Move resource pinning+mapping into the common driver · 89ec9f2c
      Dan Williams authored
      Move the responsibility of calling devm_request_resource() and
      devm_memremap_pages() into the common device-dax driver. This is another
      preparatory step to allowing an alternate personality driver for a
      device-dax range.
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      89ec9f2c
    • Dan Williams's avatar
      device-dax: Introduce bus + driver model · 9567da0b
      Dan Williams authored
      In support of multiple device-dax instances per device-dax-region and
      allowing the 'kmem' driver to attach to dax-instances instead of the
      current device-node access, convert the dax sub-system from a class to a
      bus. Recall that the kmem driver takes reserved / special purpose
      memories and assigns them to be managed by the core-mm.
      
      Aside from the fact the device-dax instances are registered and probed
      on a bus, two other lifetime-management changes are made:
      
      1/ Delay attaching a cdev until driver probe time
      
      2/ A new run_dax() helper is introduced to allow restoring dax-operation
         after a kill_dax() event. So, at driver ->probe() time we run_dax()
         and at ->remove() time we kill_dax() and invalidate all mappings.
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      9567da0b
    • Dan Williams's avatar
      device-dax: Start defining a dax bus model · 51cf784c
      Dan Williams authored
      Towards eliminating the dax_class, move the dax-device-attribute
      enabling to a new bus.c file in the core. The amount of code
      thrash of sub-sequent patches is reduced as no logic changes are made,
      just pure code movement.
      
      A temporary export of unregister_dex_dax() and dax_attribute_groups is
      needed to preserve compilation, but those symbols become static again in
      a follow-on patch.
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      51cf784c
    • Dan Williams's avatar
      device-dax: Remove multi-resource infrastructure · 753a0850
      Dan Williams authored
      The multi-resource implementation anticipated discontiguous sub-division
      support. That has not yet materialized, delete the infrastructure and
      related code.
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      753a0850
    • Dan Williams's avatar
      device-dax: Kill dax_region base · 93694f96
      Dan Williams authored
      Nothing consumes this attribute of a region and devres otherwise
      remembers the value for de-allocation purposes.
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      93694f96
    • Dan Williams's avatar
      device-dax: Kill dax_region ida · 21b9e979
      Dan Williams authored
      Commit bbb3be17 "device-dax: fix sysfs duplicate warnings" arranged
      for passing a dax instance-id to devm_create_dax_dev(), rather than
      generating one internally. Remove the dax_region ida and related code.
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      21b9e979
  19. 28 Dec, 2018 1 commit
    • Dan Williams's avatar
      mm, devm_memremap_pages: fix shutdown handling · a95c90f1
      Dan Williams authored
      The last step before devm_memremap_pages() returns success is to allocate
      a release action, devm_memremap_pages_release(), to tear the entire setup
      down.  However, the result from devm_add_action() is not checked.
      
      Checking the error from devm_add_action() is not enough.  The api
      currently relies on the fact that the percpu_ref it is using is killed by
      the time the devm_memremap_pages_release() is run.  Rather than continue
      this awkward situation, offload the responsibility of killing the
      percpu_ref to devm_memremap_pages_release() directly.  This allows
      devm_memremap_pages() to do the right thing relative to init failures and
      shutdown.
      
      Without this change we could fail to register the teardown of
      devm_memremap_pages().  The likelihood of hitting this failure is tiny as
      small memory allocations almost always succeed.  However, the impact of
      the failure is large given any future reconfiguration, or disable/enable,
      of an nvdimm namespace will fail forever as subsequent calls to
      devm_memremap_pages() will fail to setup the pgmap_radix since there will
      be stale entries for the physical address range.
      
      An argument could be made to require that the ->kill() operation be set in
      the @pgmap arg rather than passed in separately.  However, it helps code
      readability, tracking the lifetime of a given instance, to be able to grep
      the kill routine directly at the devm_memremap_pages() call site.
      
      Link: http://lkml.kernel.org/r/154275558526.76910.7535251937849268605.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Fixes: e8d51348 ("memremap: change devm_memremap_pages interface...")
      Reviewed-by: default avatar"Jérôme Glisse" <jglisse@redhat.com>
      Reported-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Reviewed-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a95c90f1
  20. 22 Sep, 2018 1 commit
  21. 04 Sep, 2018 1 commit
  22. 17 Aug, 2018 1 commit
  23. 01 Aug, 2018 1 commit
    • Stefan Hajnoczi's avatar
      device-dax: avoid hang on error before devm_memremap_pages() · b7751410
      Stefan Hajnoczi authored
      dax_pmem_percpu_exit() waits for dax_pmem_percpu_release() to invoke the
      dax_pmem->cmp completion.  Unfortunately this approach to cleaning up
      the percpu_ref only works after devm_memremap_pages() was successful.
      
      If devm_add_action_or_reset() or devm_memremap_pages() fails,
      dax_pmem_percpu_release() is not invoked.  Therefore
      dax_pmem_percpu_exit() hangs waiting for the completion:
      
        rc = devm_add_action_or_reset(dev, dax_pmem_percpu_exit,
        				&dax_pmem->ref);
        if (rc)
        	return rc;
      
        dax_pmem->pgmap.ref = &dax_pmem->ref;
        addr = devm_memremap_pages(dev, &dax_pmem->pgmap);
      
      Avoid the hang by calling percpu_ref_exit() in the error paths instead
      of going through dax_pmem_percpu_exit().
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      b7751410
  24. 30 Jul, 2018 1 commit
  25. 20 Jul, 2018 1 commit
    • Dan Williams's avatar
      device-dax: Set page->index · 35de2995
      Dan Williams authored
      In support of enabling memory_failure() handling for device-dax
      mappings, set ->index to the pgoff of the page. The rmap implementation
      requires ->index to bound the search through the vma interval tree.
      
      The ->index value is never cleared. There is no possibility for the
      page to become associated with another pgoff while the device is
      enabled. When the device is disabled the 'struct page' array for the
      device is destroyed and ->index is reinitialized to zero.
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      35de2995