1. 03 Nov, 2018 4 commits
    • Michal Hocko's avatar
      memory_hotplug: cond_resched in __remove_pages · dd33ad7b
      Michal Hocko authored
      We have received a bug report that unbinding a large pmem (>1TB) can
      result in a soft lockup:
      
        NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [ndctl:4365]
        [...]
        Supported: Yes
        CPU: 9 PID: 4365 Comm: ndctl Not tainted 4.12.14-94.40-default #1 SLE12-SP4
        Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018
        task: ffff9cce7d4410c0 task.stack: ffffbe9eb1bc4000
        RIP: 0010:__put_page+0x62/0x80
        Call Trace:
         devm_memremap_pages_release+0x152/0x260
         release_nodes+0x18d/0x1d0
         device_release_driver_internal+0x160/0x210
         unbind_store+0xb3/0xe0
         kernfs_fop_write+0x102/0x180
         __vfs_write+0x26/0x150
         vfs_write+0xad/0x1a0
         SyS_write+0x42/0x90
         do_syscall_64+0x74/0x150
         entry_SYSCALL_64_after_hwframe+0x3d/0xa2
        RIP: 0033:0x7fd13166b3d0
      
      It has been reported on an older (4.12) kernel but the current upstream
      code doesn't cond_resched in the hot remove code at all and the given
      range to remove might be really large.  Fix the issue by calling
      cond_resched once per memory section.
      
      Link: http://lkml.kernel.org/r/20181031125840.23982-1-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Cc: Dan Williams <dan.j.williams@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dd33ad7b
    • Michal Hocko's avatar
      mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask · 89c83fb5
      Michal Hocko authored
      THP allocation mode is quite complex and it depends on the defrag mode.
      This complexity is hidden in alloc_hugepage_direct_gfpmask from a large
      part currently. The NUMA special casing (namely __GFP_THISNODE) is
      however independent and placed in alloc_pages_vma currently. This both
      adds an unnecessary branch to all vma based page allocation requests and
      it makes the code more complex unnecessarily as well. Not to mention
      that e.g. shmem THP used to do the node reclaiming unconditionally
      regardless of the defrag mode until recently. This was not only
      unexpected behavior but it was also hardly a good default behavior and I
      strongly suspect it was just a side effect of the code sharing more than
      a deliberate decision which suggests that such a layering is wrong.
      
      Get rid of the thp special casing from alloc_pages_vma and move the
      logic to alloc_hugepage_direct_gfpmask. __GFP_THISNODE is applied to the
      resulting gfp mask only when the direct reclaim is not requested and
      when there is no explicit numa binding to preserve the current logic.
      
      Please note that there's also a slight difference wrt MPOL_BIND now. The
      previous code would avoid using __GFP_THISNODE if the local node was
      outside of policy_nodemask(). After this patch __GFP_THISNODE is avoided
      for all MPOL_BIND policies. So there's a difference that if local node
      is actually allowed by the bind policy's nodemask, previously
      __GFP_THISNODE would be added, but now it won't be. From the behavior
      POV this is still correct because the policy nodemask is used.
      
      Link: http://lkml.kernel.org/r/20180925120326.24392-3-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      89c83fb5
    • Andrea Arcangeli's avatar
      mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings · ac5b2c18
      Andrea Arcangeli authored
      THP allocation might be really disruptive when allocated on NUMA system
      with the local node full or hard to reclaim.  Stefan has posted an
      allocation stall report on 4.12 based SLES kernel which suggests the
      same issue:
      
        kvm: page allocation stalls for 194572ms, order:9, mode:0x4740ca(__GFP_HIGHMEM|__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE|__GFP_MOVABLE|__GFP_DIRECT_RECLAIM), nodemask=(null)
        kvm cpuset=/ mems_allowed=0-1
        CPU: 10 PID: 84752 Comm: kvm Tainted: G        W 4.12.0+98-ph <a href="/view.php?id=1" title="[geschlossen] Integration Ramdisk" class="resolved">0000001</a> SLE15 (unreleased)
        Hardware name: Supermicro SYS-1029P-WTRT/X11DDW-NT, BIOS 2.0 12/05/2017
        Call Trace:
         dump_stack+0x5c/0x84
         warn_alloc+0xe0/0x180
         __alloc_pages_slowpath+0x820/0xc90
         __alloc_pages_nodemask+0x1cc/0x210
         alloc_pages_vma+0x1e5/0x280
         do_huge_pmd_wp_page+0x83f/0xf00
         __handle_mm_fault+0x93d/0x1060
         handle_mm_fault+0xc6/0x1b0
         __do_page_fault+0x230/0x430
         do_page_fault+0x2a/0x70
         page_fault+0x7b/0x80
         [...]
        Mem-Info:
        active_anon:126315487 inactive_anon:1612476 isolated_anon:5
         active_file:60183 inactive_file:245285 isolated_file:0
         unevictable:15657 dirty:286 writeback:1 unstable:0
         slab_reclaimable:75543 slab_unreclaimable:2509111
         mapped:81814 shmem:31764 pagetables:370616 bounce:0
         free:32294031 free_pcp:6233 free_cma:0
        Node 0 active_anon:254680388kB inactive_anon:1112760kB active_file:240648kB inactive_file:981168kB unevictable:13368kB isolated(anon):0kB isolated(file):0kB mapped:280240kB dirty:1144kB writeback:0kB shmem:95832kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 81225728kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
        Node 1 active_anon:250583072kB inactive_anon:5337144kB active_file:84kB inactive_file:0kB unevictable:49260kB isolated(anon):20kB isolated(file):0kB mapped:47016kB dirty:0kB writeback:4kB shmem:31224kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 31897600kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
      
      The defrag mode is "madvise" and from the above report it is clear that
      the THP has been allocated for MADV_HUGEPAGA vma.
      
      Andrea has identified that the main source of the problem is
      __GFP_THISNODE usage:
      
      : The problem is that direct compaction combined with the NUMA
      : __GFP_THISNODE logic in mempolicy.c is telling reclaim to swap very
      : hard the local node, instead of failing the allocation if there's no
      : THP available in the local node.
      :
      : Such logic was ok until __GFP_THISNODE was added to the THP allocation
      : path even with MPOL_DEFAULT.
      :
      : The idea behind the __GFP_THISNODE addition, is that it is better to
      : provide local memory in PAGE_SIZE units than to use remote NUMA THP
      : backed memory. That largely depends on the remote latency though, on
      : threadrippers for example the overhead is relatively low in my
      : experience.
      :
      : The combination of __GFP_THISNODE and __GFP_DIRECT_RECLAIM results in
      : extremely slow qemu startup with vfio, if the VM is larger than the
      : size of one host NUMA node. This is because it will try very hard to
      : unsuccessfully swapout get_user_pages pinned pages as result of the
      : __GFP_THISNODE being set, instead of falling back to PAGE_SIZE
      : allocations and instead of trying to allocate THP on other nodes (it
      : would be even worse without vfio type1 GUP pins of course, except it'd
      : be swapping heavily instead).
      
      Fix this by removing __GFP_THISNODE for THP requests which are
      requesting the direct reclaim.  This effectivelly reverts 5265047a
      on the grounds that the zone/node reclaim was known to be disruptive due
      to premature reclaim when there was memory free.  While it made sense at
      the time for HPC workloads without NUMA awareness on rare machines, it
      was ultimately harmful in the majority of cases.  The existing behaviour
      is similar, if not as widespare as it applies to a corner case but
      crucially, it cannot be tuned around like zone_reclaim_mode can.  The
      default behaviour should always be to cause the least harm for the
      common case.
      
      If there are specialised use cases out there that want zone_reclaim_mode
      in specific cases, then it can be built on top.  Longterm we should
      consider a memory policy which allows for the node reclaim like behavior
      for the specific memory ranges which would allow a
      
      [1] http://lkml.kernel.org/r/20180820032204.9591-1-aarcange@redhat.com
      
      Mel said:
      
      : Both patches look correct to me but I'm responding to this one because
      : it's the fix.  The change makes sense and moves further away from the
      : severe stalling behaviour we used to see with both THP and zone reclaim
      : mode.
      :
      : I put together a basic experiment with usemem configured to reference a
      : buffer multiple times that is 80% the size of main memory on a 2-socket
      : box with symmetric node sizes and defrag set to "always".  The defrag
      : setting is not the default but it would be functionally similar to
      : accessing a buffer with madvise(MADV_HUGEPAGE).  Usemem is configured to
      : reference the buffer multiple times and while it's not an interesting
      : workload, it would be expected to complete reasonably quickly as it fits
      : within memory.  The results were;
      :
      : usemem
      :                                   vanilla           noreclaim-v1
      : Amean     Elapsd-1       42.78 (   0.00%)       26.87 (  37.18%)
      : Amean     Elapsd-3       27.55 (   0.00%)        7.44 (  73.00%)
      : Amean     Elapsd-4        5.72 (   0.00%)        5.69 (   0.45%)
      :
      : This shows the elapsed time in seconds for 1 thread, 3 threads and 4
      : threads referencing buffers 80% the size of memory.  With the patches
      : applied, it's 37.18% faster for the single thread and 73% faster with two
      : threads.  Note that 4 threads showing little difference does not indicate
      : the problem is related to thread counts.  It's simply the case that 4
      : threads gets spread so their workload mostly fits in one node.
      :
      : The overall view from /proc/vmstats is more startling
      :
      :                          4.19.0-rc1  4.19.0-rc1
      :                             vanillanoreclaim-v1r1
      : Minor Faults               35593425      708164
      : Major Faults                 484088          36
      : Swap Ins                    3772837           0
      : Swap Outs                   3932295           0
      :
      : Massive amounts of swap in/out without the patch
      :
      : Direct pages scanned        6013214           0
      : Kswapd pages scanned              0           0
      : Kswapd pages reclaimed            0           0
      : Direct pages reclaimed      4033009           0
      :
      : Lots of reclaim activity without the patch
      :
      : Kswapd efficiency              100%        100%
      : Kswapd velocity               0.000       0.000
      : Direct efficiency               67%        100%
      : Direct velocity           11191.956       0.000
      :
      : Mostly from direct reclaim context as you'd expect without the patch.
      :
      : Page writes by reclaim  3932314.000       0.000
      : Page writes file                 19           0
      : Page writes anon            3932295           0
      : Page reclaim immediate        42336           0
      :
      : Writes from reclaim context is never good but the patch eliminates it.
      :
      : We should never have default behaviour to thrash the system for such a
      : basic workload.  If zone reclaim mode behaviour is ever desired but on a
      : single task instead of a global basis then the sensible option is to build
      : a mempolicy that enforces that behaviour.
      
      This was a severe regression compared to previous kernels that made
      important workloads unusable and it starts when __GFP_THISNODE was
      added to THP allocations under MADV_HUGEPAGE.  It is not a significant
      risk to go to the previous behavior before __GFP_THISNODE was added, it
      worked like that for years.
      
      This was simply an optimization to some lucky workloads that can fit in
      a single node, but it ended up breaking the VM for others that can't
      possibly fit in a single node, so going back is safe.
      
      [mhocko@suse.com: rewrote the changelog based on the one from Andrea]
      Link: http://lkml.kernel.org/r/20180925120326.24392-2-mhocko@kernel.org
      Fixes: 5265047a ("mm, thp: really limit transparent hugepage allocation to local node")
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarStefan Priebe <s.priebe@profihost.ag>
      Debugged-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Reviewed-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Tested-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: <stable@vger.kernel.org>	[4.1+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ac5b2c18
    • Roman Gushchin's avatar
      mm: handle no memcg case in memcg_kmem_charge() properly · e68599a3
      Roman Gushchin authored
      Mike Galbraith reported a regression caused by the commit 9b6f7e16
      ("mm: rework memcg kernel stack accounting") on a system with
      "cgroup_disable=memory" boot option: the system panics with the following
      stack trace:
      
        BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
        PGD 0 P4D 0
        Oops: 0002 [#1] PREEMPT SMP PTI
        CPU: 0 PID: 1 Comm: systemd Not tainted 4.19.0-preempt+ #410
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fed4
        RIP: 0010:page_counter_try_charge+0x22/0xc0
        Code: 41 5d c3 c3 0f 1f 40 00 0f 1f 44 00 00 48 85 ff 0f 84 a7 00 00 00 41 56 48 89 f8 49 89 fe 49
        Call Trace:
         try_charge+0xcb/0x780
         memcg_kmem_charge_memcg+0x28/0x80
         memcg_kmem_charge+0x8b/0x1d0
         copy_process.part.41+0x1ca/0x2070
         _do_fork+0xd7/0x3d0
         do_syscall_64+0x5a/0x180
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The problem occurs because get_mem_cgroup_from_current() returns the NULL
      pointer if memory controller is disabled.  Let's check if this is a case
      at the beginning of memcg_kmem_charge() and just return 0 if
      mem_cgroup_disabled() returns true.  This is how we handle this case in
      many other places in the memory controller code.
      
      Link: http://lkml.kernel.org/r/20181029215123.17830-1-guro@fb.com
      Fixes: 9b6f7e16 ("mm: rework memcg kernel stack accounting")
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Reported-by: default avatarMike Galbraith <efault@gmx.de>
      Acked-by: default avatarRik van Riel <riel@surriel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e68599a3
  2. 02 Nov, 2018 1 commit
  3. 31 Oct, 2018 29 commits
    • Fengguang Wu's avatar
      mm/gup.c: fix __get_user_pages_fast() comment · 2ebe8228
      Fengguang Wu authored
      mmu_gather_tlb() no longer exists.  Replace with mmu_table_batch().
      
      Link: http://lkml.kernel.org/r/20180928053441.rpzwafzlsnp74mkl@wfg-t540p.sh.intel.comSigned-off-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Jiri Kosina <trivial@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2ebe8228
    • Jan Kara's avatar
      mm: Fix warning in insert_pfn() · f2c57d91
      Jan Kara authored
      In DAX mode a write pagefault can race with write(2) in the following
      way:
      
      CPU0                            CPU1
                                      write fault for mapped zero page (hole)
      dax_iomap_rw()
        iomap_apply()
          xfs_file_iomap_begin()
            - allocates blocks
          dax_iomap_actor()
            invalidate_inode_pages2_range()
              - invalidates radix tree entries in given range
                                      dax_iomap_pte_fault()
                                        grab_mapping_entry()
                                          - no entry found, creates empty
                                        ...
                                        xfs_file_iomap_begin()
                                          - finds already allocated block
                                        ...
                                        vmf_insert_mixed_mkwrite()
                                          - WARNs and does nothing because there
                                            is still zero page mapped in PTE
              unmap_mapping_pages()
      
      This race results in WARN_ON from insert_pfn() and is occasionally
      triggered by fstest generic/344. Note that the race is otherwise
      harmless as before write(2) on CPU0 is finished, we will invalidate page
      tables properly and thus user of mmap will see modified data from
      write(2) from that point on. So just restrict the warning only to the
      case when the PFN in PTE is not zero page.
      
      Link: http://lkml.kernel.org/r/20180824154542.26872-1-jack@suse.czSigned-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f2c57d91
    • David Hildenbrand's avatar
      mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock · 381eab4a
      David Hildenbrand authored
      There seem to be some problems as result of 30467e0b ("mm, hotplug:
      fix concurrent memory hot-add deadlock"), which tried to fix a possible
      lock inversion reported and discussed in [1] due to the two locks
      	a) device_lock()
      	b) mem_hotplug_lock
      
      While add_memory() first takes b), followed by a) during
      bus_probe_device(), onlining of memory from user space first took a),
      followed by b), exposing a possible deadlock.
      
      In [1], and it was decided to not make use of device_hotplug_lock, but
      rather to enforce a locking order.
      
      The problems I spotted related to this:
      
      1. Memory block device attributes: While .state first calls
         mem_hotplug_begin() and the calls device_online() - which takes
         device_lock() - .online does no longer call mem_hotplug_begin(), so
         effectively calls online_pages() without mem_hotplug_lock.
      
      2. device_online() should be called under device_hotplug_lock, however
         onlining memory during add_memory() does not take care of that.
      
      In addition, I think there is also something wrong about the locking in
      
      3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
         without locks. This was introduced after 30467e0b. And skimming over
         the code, I assume it could need some more care in regards to locking
         (e.g. device_online() called without device_hotplug_lock. This will
         be addressed in the following patches.
      
      Now that we hold the device_hotplug_lock when
      - adding memory (e.g. via add_memory()/add_memory_resource())
      - removing memory (e.g. via remove_memory())
      - device_online()/device_offline()
      
      We can move mem_hotplug_lock usage back into
      online_pages()/offline_pages().
      
      Why is mem_hotplug_lock still needed? Essentially to make
      get_online_mems()/put_online_mems() be very fast (relying on
      device_hotplug_lock would be very slow), and to serialize against
      addition of memory that does not create memory block devices (hmm).
      
      [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
          2015-February/065324.html
      
      This patch is partly based on a patch by Vitaly Kuznetsov.
      
      Link: http://lkml.kernel.org/r/20180925091457.28651-4-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarPavel Tatashin <pavel.tatashin@microsoft.com>
      Reviewed-by: default avatarRashmica Gupta <rashmica.g@gmail.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Rashmica Gupta <rashmica.g@gmail.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: John Allen <jallen@linux.vnet.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      381eab4a
    • David Hildenbrand's avatar
      mm/memory_hotplug: make add_memory() take the device_hotplug_lock · 8df1d0e4
      David Hildenbrand authored
      add_memory() currently does not take the device_hotplug_lock, however
      is aleady called under the lock from
      	arch/powerpc/platforms/pseries/hotplug-memory.c
      	drivers/acpi/acpi_memhotplug.c
      to synchronize against CPU hot-remove and similar.
      
      In general, we should hold the device_hotplug_lock when adding memory to
      synchronize against online/offline request (e.g.  from user space) - which
      already resulted in lock inversions due to device_lock() and
      mem_hotplug_lock - see 30467e0b ("mm, hotplug: fix concurrent memory
      hot-add deadlock").  add_memory()/add_memory_resource() will create memory
      block devices, so this really feels like the right thing to do.
      
      Holding the device_hotplug_lock makes sure that a memory block device
      can really only be accessed (e.g. via .online/.state) from user space,
      once the memory has been fully added to the system.
      
      The lock is not held yet in
      	drivers/xen/balloon.c
      	arch/powerpc/platforms/powernv/memtrace.c
      	drivers/s390/char/sclp_cmd.c
      	drivers/hv/hv_balloon.c
      So, let's either use the locked variants or take the lock.
      
      Don't export add_memory_resource(), as it once was exported to be used by
      XEN, which is never built as a module.  If somebody requires it, we also
      have to export a locked variant (as device_hotplug_lock is never
      exported).
      
      Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarPavel Tatashin <pavel.tatashin@microsoft.com>
      Reviewed-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: default avatarRashmica Gupta <rashmica.g@gmail.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: John Allen <jallen@linux.vnet.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8df1d0e4
    • David Hildenbrand's avatar
      mm/memory_hotplug: make remove_memory() take the device_hotplug_lock · d15e5926
      David Hildenbrand authored
      Patch series "mm: online/offline_pages called w.o. mem_hotplug_lock", v3.
      
      Reading through the code and studying how mem_hotplug_lock is to be used,
      I noticed that there are two places where we can end up calling
      device_online()/device_offline() - online_pages()/offline_pages() without
      the mem_hotplug_lock.  And there are other places where we call
      device_online()/device_offline() without the device_hotplug_lock.
      
      While e.g.
      	echo "online" > /sys/devices/system/memory/memory9/state
      is fine, e.g.
      	echo 1 > /sys/devices/system/memory/memory9/online
      Will not take the mem_hotplug_lock. However the device_lock() and
      device_hotplug_lock.
      
      E.g.  via memory_probe_store(), we can end up calling
      add_memory()->online_pages() without the device_hotplug_lock.  So we can
      have concurrent callers in online_pages().  We e.g.  touch in
      online_pages() basically unprotected zone->present_pages then.
      
      Looks like there is a longer history to that (see Patch #2 for details),
      and fixing it to work the way it was intended is not really possible.  We
      would e.g.  have to take the mem_hotplug_lock in device/base/core.c, which
      sounds wrong.
      
      Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
      More details can be found in patch 3 and patch 6.
      
      I propose the general rules (documentation added in patch 6):
      
      1. add_memory/add_memory_resource() must only be called with
         device_hotplug_lock.
      2. remove_memory() must only be called with device_hotplug_lock. This is
         already documented and holds for all callers.
      3. device_online()/device_offline() must only be called with
         device_hotplug_lock. This is already documented and true for now in core
         code. Other callers (related to memory hotplug) have to be fixed up.
      4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
         online_pages/offline_pages.
      
      To me, this looks way cleaner than what we have right now (and easier to
      verify).  And looking at the documentation of remove_memory, using
      lock_device_hotplug also for add_memory() feels natural.
      
      This patch (of 6):
      
      remove_memory() is exported right now but requires the
      device_hotplug_lock, which is not exported.  So let's provide a variant
      that takes the lock and only export that one.
      
      The lock is already held in
      	arch/powerpc/platforms/pseries/hotplug-memory.c
      	drivers/acpi/acpi_memhotplug.c
      	arch/powerpc/platforms/powernv/memtrace.c
      
      Apart from that, there are not other users in the tree.
      
      Link: http://lkml.kernel.org/r/20180925091457.28651-2-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarPavel Tatashin <pavel.tatashin@microsoft.com>
      Reviewed-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: default avatarRashmica Gupta <rashmica.g@gmail.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Rashmica Gupta <rashmica.g@gmail.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: John Allen <jallen@linux.vnet.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d15e5926
    • Mike Rapoport's avatar
      mm/memblock.c: warn if zero alignment was requested · 2f770806
      Mike Rapoport authored
      After updating all memblock users to explicitly specify SMP_CACHE_BYTES
      alignment rather than use 0, it is still possible that uncovered users may
      sneak in.  Add a WARN_ON_ONCE for such cases.
      
      [sfr@canb.auug.org.au: use dump_stack() instead of WARN_ON_ONCE for the alignment checks]
        Link: http://lkml.kernel.org/r/20181016131927.6ceba6ab@canb.auug.org.au
      [akpm@linux-foundation.org: add apologetic comment]
      Link: http://lkml.kernel.org/r/20181011060850.GA19822@rapoport-lnxSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2f770806
    • Mike Rapoport's avatar
      memblock: stop using implicit alignment to SMP_CACHE_BYTES · 7e1c4e27
      Mike Rapoport authored
      When a memblock allocation APIs are called with align = 0, the alignment
      is implicitly set to SMP_CACHE_BYTES.
      
      Implicit alignment is done deep in the memblock allocator and it can
      come as a surprise.  Not that such an alignment would be wrong even
      when used incorrectly but it is better to be explicit for the sake of
      clarity and the prinicple of the least surprise.
      
      Replace all such uses of memblock APIs with the 'align' parameter
      explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
      in the memblock internal allocation functions.
      
      For the case when memblock APIs are used via helper functions, e.g.  like
      iommu_arena_new_node() in Alpha, the helper functions were detected with
      Coccinelle's help and then manually examined and updated where
      appropriate.
      
      The direct memblock APIs users were updated using the semantic patch below:
      
      @@
      expression size, min_addr, max_addr, nid;
      @@
      (
      |
      - memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
      + memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr,
      nid)
      |
      - memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
      + memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr,
      nid)
      |
      - memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
      + memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
      |
      - memblock_alloc(size, 0)
      + memblock_alloc(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_raw(size, 0)
      + memblock_alloc_raw(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_from(size, 0, min_addr)
      + memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
      |
      - memblock_alloc_nopanic(size, 0)
      + memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_low(size, 0)
      + memblock_alloc_low(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_low_nopanic(size, 0)
      + memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_from_nopanic(size, 0, min_addr)
      + memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
      |
      - memblock_alloc_node(size, 0, nid)
      + memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
      )
      
      [mhocko@suse.com: changelog update]
      [akpm@linux-foundation.org: coding-style fixes]
      [rppt@linux.ibm.com: fix missed uses of implicit alignment]
        Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
      Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Suggested-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: Paul Burton <paul.burton@mips.com>	[MIPS]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7e1c4e27
    • Mike Rapoport's avatar
      mm: remove include/linux/bootmem.h · 57c8a661
      Mike Rapoport authored
      Move remaining definitions and declarations from include/linux/bootmem.h
      into include/linux/memblock.h and remove the redundant header.
      
      The includes were replaced with the semantic patch below and then
      semi-automated removal of duplicated '#include <linux/memblock.h>
      
      @@
      @@
      - #include <linux/bootmem.h>
      + #include <linux/memblock.h>
      
      [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
      [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
      [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
        Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
      Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      57c8a661
    • Mike Rapoport's avatar
      memblock: replace BOOTMEM_ALLOC_* with MEMBLOCK variants · 97ad1087
      Mike Rapoport authored
      Drop BOOTMEM_ALLOC_ACCESSIBLE and BOOTMEM_ALLOC_ANYWHERE in favor of
      identical MEMBLOCK definitions.
      
      Link: http://lkml.kernel.org/r/1536927045-23536-29-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      97ad1087
    • Mike Rapoport's avatar
      mm: remove nobootmem · bda49a81
      Mike Rapoport authored
      Move a few remaining functions from nobootmem.c to memblock.c and remove
      nobootmem
      
      Link: http://lkml.kernel.org/r/1536927045-23536-28-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bda49a81
    • Mike Rapoport's avatar
      memblock: rename __free_pages_bootmem to memblock_free_pages · 7c2ee349
      Mike Rapoport authored
      The conversion is done using
      
      sed -i 's@__free_pages_bootmem@memblock_free_pages@' \
          $(git grep -l __free_pages_bootmem)
      
      Link: http://lkml.kernel.org/r/1536927045-23536-27-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7c2ee349
    • Mike Rapoport's avatar
      memblock: rename free_all_bootmem to memblock_free_all · c6ffc5ca
      Mike Rapoport authored
      The conversion is done using
      
      sed -i 's@free_all_bootmem@memblock_free_all@' \
          $(git grep -l free_all_bootmem)
      
      Link: http://lkml.kernel.org/r/1536927045-23536-26-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c6ffc5ca
    • Mike Rapoport's avatar
      memblock: replace free_bootmem_late with memblock_free_late · 53ab85eb
      Mike Rapoport authored
      The free_bootmem_late and memblock_free_late do exactly the same thing:
      they iterate over a range and give pages to the page allocator.
      
      Replace calls to free_bootmem_late with calls to memblock_free_late and
      remove the bootmem variant.
      
      Link: http://lkml.kernel.org/r/1536927045-23536-25-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      53ab85eb
    • Mike Rapoport's avatar
      memblock: replace free_bootmem{_node} with memblock_free · 2013288f
      Mike Rapoport authored
      The free_bootmem and free_bootmem_node are merely wrappers for
      memblock_free. Replace their usage with a call to memblock_free using the
      following semantic patch:
      
      @@
      expression e1, e2, e3;
      @@
      (
      - free_bootmem(e1, e2)
      + memblock_free(e1, e2)
      |
      - free_bootmem_node(e1, e2, e3)
      + memblock_free(e2, e3)
      )
      
      Link: http://lkml.kernel.org/r/1536927045-23536-24-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2013288f
    • Mike Rapoport's avatar
      mm: nobootmem: remove bootmem allocation APIs · 6c7835f8
      Mike Rapoport authored
      The bootmem compatibility APIs are not used and can be removed.
      
      Link: http://lkml.kernel.org/r/1536927045-23536-23-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6c7835f8
    • Mike Rapoport's avatar
      memblock: add align parameter to memblock_alloc_node() · 3913c8f9
      Mike Rapoport authored
      With the align parameter memblock_alloc_node() can be used as drop in
      replacement for alloc_bootmem_pages_node() and __alloc_bootmem_node(),
      which is done in the following patches.
      
      Link: http://lkml.kernel.org/r/1536927045-23536-15-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3913c8f9
    • Mike Rapoport's avatar
      memblock: remove _virt from APIs returning virtual address · eb31d559
      Mike Rapoport authored
      The conversion is done using
      
      sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
      	$(git grep -l memblock_virt_alloc)
      
      Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      eb31d559
    • Mike Rapoport's avatar
      memblock: rename memblock_alloc{_nid,_try_nid} to memblock_phys_alloc* · 9a8dd708
      Mike Rapoport authored
      Make it explicit that the caller gets a physical address rather than a
      virtual one.
      
      This will also allow using meblock_alloc prefix for memblock allocations
      returning virtual address, which is done in the following patches.
      
      The conversion is done using the following semantic patch:
      
      @@
      expression e1, e2, e3;
      @@
      (
      - memblock_alloc(e1, e2)
      + memblock_phys_alloc(e1, e2)
      |
      - memblock_alloc_nid(e1, e2, e3)
      + memblock_phys_alloc_nid(e1, e2, e3)
      |
      - memblock_alloc_try_nid(e1, e2, e3)
      + memblock_phys_alloc_try_nid(e1, e2, e3)
      )
      
      Link: http://lkml.kernel.org/r/1536927045-23536-7-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9a8dd708
    • Mike Rapoport's avatar
      mm: nobootmem: remove dead code · b146ada2
      Mike Rapoport authored
      Several bootmem functions and macros are not used. Remove them.
      
      Link: http://lkml.kernel.org/r/1536927045-23536-6-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b146ada2
    • Mike Rapoport's avatar
      mm: remove bootmem allocator implementation. · 355c45af
      Mike Rapoport authored
      All architectures have been converted to use MEMBLOCK + NO_BOOTMEM. The
      bootmem allocator implementation can be removed.
      
      Link: http://lkml.kernel.org/r/1536927045-23536-5-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      355c45af
    • Mike Rapoport's avatar
      mm: remove CONFIG_HAVE_MEMBLOCK · aca52c39
      Mike Rapoport authored
      All architecures use memblock for early memory management. There is no need
      for the CONFIG_HAVE_MEMBLOCK configuration option.
      
      [rppt@linux.vnet.ibm.com: of/fdt: fixup #ifdefs]
        Link: http://lkml.kernel.org/r/20180919103457.GA20545@rapoport-lnx
      [rppt@linux.vnet.ibm.com: csky: fixups after bootmem removal]
        Link: http://lkml.kernel.org/r/20180926112744.GC4628@rapoport-lnx
      [rppt@linux.vnet.ibm.com: remove stale #else and the code it protects]
        Link: http://lkml.kernel.org/r/1538067825-24835-1-git-send-email-rppt@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/1536927045-23536-4-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Tested-by: default avatarJonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      aca52c39
    • Mike Rapoport's avatar
      mm: remove CONFIG_NO_BOOTMEM · b4a991ec
      Mike Rapoport authored
      All achitectures select NO_BOOTMEM which essentially becomes 'Y' for any
      kernel configuration and therefore it can be removed.
      
      [alexander.h.duyck@linux.intel.com: remove now defunct NO_BOOTMEM from depends list for deferred init]
        Link: http://lkml.kernel.org/r/20180925201814.3576.15105.stgit@localhost.localdomain
      Link: http://lkml.kernel.org/r/1536927045-23536-3-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b4a991ec
    • Dan Carpenter's avatar
      mm/gup_benchmark.c: prevent integer overflow in ioctl · 4b408c74
      Dan Carpenter authored
      The concern here is that "gup->size" is a u64 and "nr_pages" is unsigned
      long.  On 32 bit systems we could trick the kernel into allocating fewer
      pages than expected.
      
      Link: http://lkml.kernel.org/r/20181025061546.hnhkv33diogf2uis@kili.mountain
      Fixes: 64c349f4 ("mm: add infrastructure for get_user_pages_fast() benchmarking")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4b408c74
    • Jérôme Glisse's avatar
      mm/hmm: invalidate device page table at start of invalidation · ec131b2d
      Jérôme Glisse authored
      Invalidate device page table at start of invalidation and invalidate in
      progress CPU page table snapshooting at both start and end of any
      invalidation.
      
      This is helpful when device need to dirty page because the device page
      table report the page as dirty.  Dirtying page must happen in the start
      mmu notifier callback and not in the end one.
      
      Link: http://lkml.kernel.org/r/20181019160442.18723-7-jglisse@redhat.comSigned-off-by: default avatarJérôme Glisse <jglisse@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ec131b2d
    • Jérôme Glisse's avatar
      mm/hmm: use a structure for update callback parameters · 44532d4c
      Jérôme Glisse authored
      Use a structure to gather all the parameters for the update callback.
      This make it easier when adding new parameters by avoiding having to
      update all callback function signature.
      
      The hmm_update structure is always associated with a mmu_notifier
      callbacks so we are not planing on grouping multiple updates together.
      Nor do we care about page size for the range as range will over fully
      cover the page being invalidated (this is a mmu_notifier property).
      
      Link: http://lkml.kernel.org/r/20181019160442.18723-6-jglisse@redhat.comSigned-off-by: default avatarJérôme Glisse <jglisse@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      44532d4c
    • Jérôme Glisse's avatar
      mm/hmm: properly handle migration pmd · d08faca0
      Jérôme Glisse authored
      Before this patch migration pmd entry (!pmd_present()) would have been
      treated as a bad entry (pmd_bad() returns true on migration pmd entry).
      The outcome was that device driver would believe that the range covered by
      the pmd was bad and would either SIGBUS or simply kill all the device's
      threads (each device driver decide how to react when the device tries to
      access poisonnous or invalid range of memory).
      
      This patch explicitly handle the case of migration pmd entry which are non
      present pmd entry and either wait for the migration to finish or report
      empty range (when device is just trying to pre- fill a range of virtual
      address and thus do not want to wait or trigger page fault).
      
      Link: http://lkml.kernel.org/r/20181019160442.18723-5-jglisse@redhat.comSigned-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: default avatarJérôme Glisse <jglisse@redhat.com>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d08faca0
    • Ralph Campbell's avatar
      mm/hmm: fix race between hmm_mirror_unregister() and mmu_notifier callback · 86a2d598
      Ralph Campbell authored
      In hmm_mirror_unregister(), mm->hmm is set to NULL and then
      mmu_notifier_unregister_no_release() is called.  That creates a small
      window where mmu_notifier can call mmu_notifier_ops with mm->hmm equal to
      NULL.  Fix this by first unregistering mmu notifier callbacks and then
      setting mm->hmm to NULL.
      
      Similarly in hmm_register(), set mm->hmm before registering mmu_notifier
      callbacks so callback functions always see mm->hmm set.
      
      Link: http://lkml.kernel.org/r/20181019160442.18723-4-jglisse@redhat.comSigned-off-by: default avatarRalph Campbell <rcampbell@nvidia.com>
      Signed-off-by: default avatarJérôme Glisse <jglisse@redhat.com>
      Reviewed-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Reviewed-by: default avatarJérôme Glisse <jglisse@redhat.com>
      Reviewed-by: default avatarBalbir Singh <bsingharora@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      86a2d598
    • Ralph Campbell's avatar
      mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly · aab8d052
      Ralph Campbell authored
      Private ZONE_DEVICE pages use a special pte entry and thus are not
      present.  Properly handle this case in map_pte(), it is already handled in
      check_pte(), the map_pte() part was lost in some rebase most probably.
      
      Without this patch the slow migration path can not migrate back to any
      private ZONE_DEVICE memory to regular memory.  This was found after stress
      testing migration back to system memory.  This ultimatly can lead to the
      CPU constantly page fault looping on the special swap entry.
      
      Link: http://lkml.kernel.org/r/20181019160442.18723-3-jglisse@redhat.comSigned-off-by: default avatarRalph Campbell <rcampbell@nvidia.com>
      Signed-off-by: default avatarJérôme Glisse <jglisse@redhat.com>
      Reviewed-by: default avatarBalbir Singh <bsingharora@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      aab8d052
    • Jérôme Glisse's avatar
      mm/hmm: fix utf8 ... · f813f219
      Jérôme Glisse authored
      Patch series "HMM updates, improvements and fixes", v2
      
      Few fixes that only affect HMM users.  Improve the synchronization call
      back so that we match was other mmu_notifier listener do and add proper
      support to the new blockable flags in the process.
      
      For curious folks here are branches to leverage HMM in various existing
      device drivers:
      
      https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-nouveau-v01
      https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-radeon-v00
      https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-intel-v00
      
      More to come (amd gpu, Mellanox, ...)
      
      I expect more of the preparatory work for nouveau will be merge in 4.20
      (like we have been doing since 4.16) and i will wait until this patchset
      is upstream before pushing the patches that actualy make use of HMM (to
      avoid complex tree inter-dependency).
      
      This patch (of 6):
      
      Somehow utf=8 must have been broken.
      
      Link: http://lkml.kernel.org/r/20181019160442.18723-2-jglisse@redhat.comSigned-off-by: default avatarJérôme Glisse <jglisse@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f813f219
  4. 29 Oct, 2018 5 commits
  5. 26 Oct, 2018 1 commit
    • Mike Kravetz's avatar
      hugetlbfs: dirty pages as they are added to pagecache · 22146c3c
      Mike Kravetz authored
      Some test systems were experiencing negative huge page reserve counts and
      incorrect file block counts.  This was traced to /proc/sys/vm/drop_caches
      removing clean pages from hugetlbfs file pagecaches.  When non-hugetlbfs
      explicit code removes the pages, the appropriate accounting is not
      performed.
      
      This can be recreated as follows:
       fallocate -l 2M /dev/hugepages/foo
       echo 1 > /proc/sys/vm/drop_caches
       fallocate -l 2M /dev/hugepages/foo
       grep -i huge /proc/meminfo
         AnonHugePages:         0 kB
         ShmemHugePages:        0 kB
         HugePages_Total:    2048
         HugePages_Free:     2047
         HugePages_Rsvd:    18446744073709551615
         HugePages_Surp:        0
         Hugepagesize:       2048 kB
         Hugetlb:         4194304 kB
       ls -lsh /dev/hugepages/foo
         4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
      
      To address this issue, dirty pages as they are added to pagecache.  This
      can easily be reproduced with fallocate as shown above.  Read faulted
      pages will eventually end up being marked dirty.  But there is a window
      where they are clean and could be impacted by code such as drop_caches.
      So, just dirty them all as they are added to the pagecache.
      
      Link: http://lkml.kernel.org/r/b5be45b8-5afe-56cd-9482-28384699a049@oracle.com
      Fixes: 6bda666a ("hugepages: fold find_or_alloc_pages into huge_no_page()")
      Signed-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Acked-by: default avatarMihcla Hocko <mhocko@suse.com>
      Reviewed-by: default avatarKhalid Aziz <khalid.aziz@oracle.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      22146c3c