Commit 02bf9bc1 authored by Stephen Rothwell's avatar Stephen Rothwell

Merge branch 'akpm-current/current'

parents 3c1c6ab5 f531bf6f
......@@ -1813,6 +1813,9 @@
ip= [IP_PNP]
See Documentation/filesystems/nfs/nfsroot.txt.
ipcmni_extend [KNL] Extend the maximum number of unique System V
IPC identifiers from 32,768 to 8,388,608.
irqaffinity= [SMP] Set the default irq affinity mask
The argument is a cpu list, as described above.
......
......@@ -75,9 +75,10 @@ number of times a page is mapped.
20. NOPAGE
21. KSM
22. THP
23. BALLOON
23. OFFLINE
24. ZERO_PAGE
25. IDLE
26. PGTABLE
* ``/proc/kpagecgroup``. This file contains a 64-bit inode number of the
memory cgroup each page is charged to, indexed by PFN. Only available when
......@@ -118,8 +119,8 @@ Short descriptions to the page flags
identical memory pages dynamically shared between one or more processes
22 - THP
contiguous pages which construct transparent hugepages
23 - BALLOON
balloon compaction page
23 - OFFLINE
page is logically offline
24 - ZERO_PAGE
zero page for pfn_zero or huge_zero page
25 - IDLE
......@@ -128,6 +129,8 @@ Short descriptions to the page flags
Note that this flag may be stale in case the page was accessed via
a PTE. To make sure the flag is up-to-date one has to read
``/sys/kernel/mm/page_idle/bitmap`` first.
26 - PGTABLE
page is in use as a page table
IO related page flags
---------------------
......
===================================
Using flexible arrays in the kernel
===================================
Large contiguous memory allocations can be unreliable in the Linux kernel.
Kernel programmers will sometimes respond to this problem by allocating
pages with :c:func:`vmalloc()`. This solution not ideal, though. On 32-bit
systems, memory from vmalloc() must be mapped into a relatively small address
space; it's easy to run out. On SMP systems, the page table changes required
by vmalloc() allocations can require expensive cross-processor interrupts on
all CPUs. And, on all systems, use of space in the vmalloc() range increases
pressure on the translation lookaside buffer (TLB), reducing the performance
of the system.
In many cases, the need for memory from vmalloc() can be eliminated by piecing
together an array from smaller parts; the flexible array library exists to make
this task easier.
A flexible array holds an arbitrary (within limits) number of fixed-sized
objects, accessed via an integer index. Sparse arrays are handled
reasonably well. Only single-page allocations are made, so memory
allocation failures should be relatively rare. The down sides are that the
arrays cannot be indexed directly, individual object size cannot exceed the
system page size, and putting data into a flexible array requires a copy
operation. It's also worth noting that flexible arrays do no internal
locking at all; if concurrent access to an array is possible, then the
caller must arrange for appropriate mutual exclusion.
The creation of a flexible array is done with :c:func:`flex_array_alloc()`::
#include <linux/flex_array.h>
struct flex_array *flex_array_alloc(int element_size,
unsigned int total,
gfp_t flags);
The individual object size is provided by ``element_size``, while total is the
maximum number of objects which can be stored in the array. The flags
argument is passed directly to the internal memory allocation calls. With
the current code, using flags to ask for high memory is likely to lead to
notably unpleasant side effects.
It is also possible to define flexible arrays at compile time with::
DEFINE_FLEX_ARRAY(name, element_size, total);
This macro will result in a definition of an array with the given name; the
element size and total will be checked for validity at compile time.
Storing data into a flexible array is accomplished with a call to
:c:func:`flex_array_put()`::
int flex_array_put(struct flex_array *array, unsigned int element_nr,
void *src, gfp_t flags);
This call will copy the data from src into the array, in the position
indicated by ``element_nr`` (which must be less than the maximum specified when
the array was created). If any memory allocations must be performed, flags
will be used. The return value is zero on success, a negative error code
otherwise.
There might possibly be a need to store data into a flexible array while
running in some sort of atomic context; in this situation, sleeping in the
memory allocator would be a bad thing. That can be avoided by using
``GFP_ATOMIC`` for the flags value, but, often, there is a better way. The
trick is to ensure that any needed memory allocations are done before
entering atomic context, using :c:func:`flex_array_prealloc()`::
int flex_array_prealloc(struct flex_array *array, unsigned int start,
unsigned int nr_elements, gfp_t flags);
This function will ensure that memory for the elements indexed in the range
defined by ``start`` and ``nr_elements`` has been allocated. Thereafter, a
``flex_array_put()`` call on an element in that range is guaranteed not to
block.
Getting data back out of the array is done with :c:func:`flex_array_get()`::
void *flex_array_get(struct flex_array *fa, unsigned int element_nr);
The return value is a pointer to the data element, or NULL if that
particular element has never been allocated.
Note that it is possible to get back a valid pointer for an element which
has never been stored in the array. Memory for array elements is allocated
one page at a time; a single allocation could provide memory for several
adjacent elements. Flexible array elements are normally initialized to the
value ``FLEX_ARRAY_FREE`` (defined as 0x6c in <linux/poison.h>), so errors
involving that number probably result from use of unstored array entries.
Note that, if array elements are allocated with ``__GFP_ZERO``, they will be
initialized to zero and this poisoning will not happen.
Individual elements in the array can be cleared with
:c:func:`flex_array_clear()`::
int flex_array_clear(struct flex_array *array, unsigned int element_nr);
This function will set the given element to ``FLEX_ARRAY_FREE`` and return
zero. If storage for the indicated element is not allocated for the array,
``flex_array_clear()`` will return ``-EINVAL`` instead. Note that clearing an
element does not release the storage associated with it; to reduce the
allocated size of an array, call :c:func:`flex_array_shrink()`::
int flex_array_shrink(struct flex_array *array);
The return value will be the number of pages of memory actually freed.
This function works by scanning the array for pages containing nothing but
``FLEX_ARRAY_FREE`` bytes, so (1) it can be expensive, and (2) it will not work
if the array's pages are allocated with ``__GFP_ZERO``.
It is possible to remove all elements of an array with a call to
:c:func:`flex_array_free_parts()`::
void flex_array_free_parts(struct flex_array *array);
This call frees all elements, but leaves the array itself in place.
Freeing the entire array is done with :c:func:`flex_array_free()`::
void flex_array_free(struct flex_array *array);
As of this writing, there are no users of flexible arrays in the mainline
kernel. The functions described here are also not exported to modules;
that will probably be fixed when somebody comes up with a need for it.
Flexible array functions
------------------------
.. kernel-doc:: include/linux/flex_array.h
......@@ -129,7 +129,7 @@ writing of special-purpose memory allocators in the future.
:functions: gen_pool_for_each_chunk
.. kernel-doc:: lib/genalloc.c
:functions: addr_in_gen_pool
:functions: gen_pool_has_addr
.. kernel-doc:: lib/genalloc.c
:functions: gen_pool_avail
......
=================================
Generic radix trees/sparse arrays
=================================
.. kernel-doc:: include/linux/generic-radix-tree.h
:doc: Generic radix trees/sparse arrays
generic radix tree functions
----------------------------
.. kernel-doc:: include/linux/generic-radix-tree.h
:functions:
......@@ -28,6 +28,7 @@ Core utilities
errseq
printk-formats
circular-buffers
generic-radix-tree
memory-allocation
mm-api
gfp_mask-from-fs-io
......
......@@ -508,9 +508,14 @@ manner. The codes are the following:
sd - soft-dirty flag
mm - mixed map area
hg - huge page advise flag
nh - no-huge page advise flag
nh - no-huge page advise flag [*]
mg - mergable advise flag
[*] A process mapping may be advised to not be backed by transparent hugepages
by either madvise(MADV_NOHUGEPAGE) or prctl(PR_SET_THP_DISABLE). See
Documentation/admin-guide/mm/transhuge.rst for system-wide and process
mapping policies.
Note that there is no guarantee that every flag and associated mnemonic will
be present in all further kernel releases. Things get changed, the flags may
be vanished or the reverse -- new added. Interpretation of their meaning
......
===================================
Using flexible arrays in the kernel
===================================
:Updated: Last updated for 2.6.32
:Author: Jonathan Corbet <corbet@lwn.net>
Large contiguous memory allocations can be unreliable in the Linux kernel.
Kernel programmers will sometimes respond to this problem by allocating
pages with vmalloc(). This solution not ideal, though. On 32-bit systems,
memory from vmalloc() must be mapped into a relatively small address space;
it's easy to run out. On SMP systems, the page table changes required by
vmalloc() allocations can require expensive cross-processor interrupts on
all CPUs. And, on all systems, use of space in the vmalloc() range
increases pressure on the translation lookaside buffer (TLB), reducing the
performance of the system.
In many cases, the need for memory from vmalloc() can be eliminated by
piecing together an array from smaller parts; the flexible array library
exists to make this task easier.
A flexible array holds an arbitrary (within limits) number of fixed-sized
objects, accessed via an integer index. Sparse arrays are handled
reasonably well. Only single-page allocations are made, so memory
allocation failures should be relatively rare. The down sides are that the
arrays cannot be indexed directly, individual object size cannot exceed the
system page size, and putting data into a flexible array requires a copy
operation. It's also worth noting that flexible arrays do no internal
locking at all; if concurrent access to an array is possible, then the
caller must arrange for appropriate mutual exclusion.
The creation of a flexible array is done with::
#include <linux/flex_array.h>
struct flex_array *flex_array_alloc(int element_size,
unsigned int total,
gfp_t flags);
The individual object size is provided by element_size, while total is the
maximum number of objects which can be stored in the array. The flags
argument is passed directly to the internal memory allocation calls. With
the current code, using flags to ask for high memory is likely to lead to
notably unpleasant side effects.
It is also possible to define flexible arrays at compile time with::
DEFINE_FLEX_ARRAY(name, element_size, total);
This macro will result in a definition of an array with the given name; the
element size and total will be checked for validity at compile time.
Storing data into a flexible array is accomplished with a call to::
int flex_array_put(struct flex_array *array, unsigned int element_nr,
void *src, gfp_t flags);
This call will copy the data from src into the array, in the position
indicated by element_nr (which must be less than the maximum specified when
the array was created). If any memory allocations must be performed, flags
will be used. The return value is zero on success, a negative error code
otherwise.
There might possibly be a need to store data into a flexible array while
running in some sort of atomic context; in this situation, sleeping in the
memory allocator would be a bad thing. That can be avoided by using
GFP_ATOMIC for the flags value, but, often, there is a better way. The
trick is to ensure that any needed memory allocations are done before
entering atomic context, using::
int flex_array_prealloc(struct flex_array *array, unsigned int start,
unsigned int nr_elements, gfp_t flags);
This function will ensure that memory for the elements indexed in the range
defined by start and nr_elements has been allocated. Thereafter, a
flex_array_put() call on an element in that range is guaranteed not to
block.
Getting data back out of the array is done with::
void *flex_array_get(struct flex_array *fa, unsigned int element_nr);
The return value is a pointer to the data element, or NULL if that
particular element has never been allocated.
Note that it is possible to get back a valid pointer for an element which
has never been stored in the array. Memory for array elements is allocated
one page at a time; a single allocation could provide memory for several
adjacent elements. Flexible array elements are normally initialized to the
value FLEX_ARRAY_FREE (defined as 0x6c in <linux/poison.h>), so errors
involving that number probably result from use of unstored array entries.
Note that, if array elements are allocated with __GFP_ZERO, they will be
initialized to zero and this poisoning will not happen.
Individual elements in the array can be cleared with::
int flex_array_clear(struct flex_array *array, unsigned int element_nr);
This function will set the given element to FLEX_ARRAY_FREE and return
zero. If storage for the indicated element is not allocated for the array,
flex_array_clear() will return -EINVAL instead. Note that clearing an
element does not release the storage associated with it; to reduce the
allocated size of an array, call::
int flex_array_shrink(struct flex_array *array);
The return value will be the number of pages of memory actually freed.
This function works by scanning the array for pages containing nothing but
FLEX_ARRAY_FREE bytes, so (1) it can be expensive, and (2) it will not work
if the array's pages are allocated with __GFP_ZERO.
It is possible to remove all elements of an array with a call to::
void flex_array_free_parts(struct flex_array *array);
This call frees all elements, but leaves the array itself in place.
Freeing the entire array is done with::
void flex_array_free(struct flex_array *array);
As of this writing, there are no users of flexible arrays in the mainline
kernel. The functions described here are also not exported to modules;
that will probably be fixed when somebody comes up with a need for it.
......@@ -4,6 +4,7 @@
#include <linux/smp.h>
#include <linux/threads.h>
#include <linux/numa.h>
#include <asm/machvec.h>
#ifdef CONFIG_NUMA
......@@ -29,7 +30,7 @@ static const struct cpumask *cpumask_of_node(int node)
{
int cpu;
if (node == -1)
if (node == NUMA_NO_NODE)
return cpu_all_mask;
cpumask_clear(&node_to_cpumask_map[node]);
......
......@@ -563,7 +563,7 @@ static void *__alloc_from_pool(size_t size, struct page **ret_page)
static bool __in_atomic_pool(void *start, size_t size)
{
return addr_in_gen_pool(atomic_pool, (unsigned long)start, size);
return gen_pool_has_addr(atomic_pool, (unsigned long)start, size);
}
static int __free_from_pool(void *start, size_t size)
......
......@@ -719,16 +719,9 @@ EXPORT_SYMBOL(phys_mem_access_prot);
#define vectors_base() (vectors_high() ? 0xffff0000 : 0)
static void __init *early_alloc_aligned(unsigned long sz, unsigned long align)
{
void *ptr = __va(memblock_phys_alloc(sz, align));
memset(ptr, 0, sz);
return ptr;
}
static void __init *early_alloc(unsigned long sz)
{
return early_alloc_aligned(sz, sz);
return memblock_alloc(sz, sz);
}
static void *__init late_alloc(unsigned long sz)
......@@ -1000,7 +993,7 @@ void __init iotable_init(struct map_desc *io_desc, int nr)
if (!nr)
return;
svm = early_alloc_aligned(sizeof(*svm) * nr, __alignof__(*svm));
svm = memblock_alloc(sizeof(*svm) * nr, __alignof__(*svm));
for (md = io_desc; nr; md++, nr--) {
create_mapping(md);
......@@ -1022,7 +1015,7 @@ void __init vm_reserve_area_early(unsigned long addr, unsigned long size,
struct vm_struct *vm;
struct static_vm *svm;
svm = early_alloc_aligned(sizeof(*svm), __alignof__(*svm));
svm = memblock_alloc(sizeof(*svm), __alignof__(*svm));
vm = &svm->vm;
vm->addr = (void *)addr;
......
......@@ -1467,6 +1467,10 @@ config SYSVIPC_COMPAT
def_bool y
depends on COMPAT && SYSVIPC
config ARCH_ENABLE_HUGEPAGE_MIGRATION
def_bool y
depends on HUGETLB_PAGE && MIGRATION
menu "Power management options"
source "kernel/power/Kconfig"
......
......@@ -20,6 +20,11 @@
#include <asm/page.h>
#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
#define arch_hugetlb_migration_supported arch_hugetlb_migration_supported
extern bool arch_hugetlb_migration_supported(struct hstate *h);
#endif
#define __HAVE_ARCH_HUGE_PTEP_GET
static inline pte_t huge_ptep_get(pte_t *ptep)
{
......
......@@ -27,6 +27,26 @@
#include <asm/tlbflush.h>
#include <asm/pgalloc.h>
#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
bool arch_hugetlb_migration_supported(struct hstate *h)
{
size_t pagesize = huge_page_size(h);
switch (pagesize) {
#ifdef CONFIG_ARM64_4K_PAGES
case PUD_SIZE:
#endif
case PMD_SIZE:
case CONT_PMD_SIZE:
case CONT_PTE_SIZE:
return true;
}
pr_warn("%s: unrecognized huge page size 0x%lx\n",
__func__, pagesize);
return false;
}
#endif
int pmd_huge(pmd_t pmd)
{
return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
......
......@@ -30,6 +30,7 @@ generic-y += pgalloc.h
generic-y += preempt.h
generic-y += segment.h
generic-y += serial.h
generic-y += shmparam.h
generic-y += tlbflush.h
generic-y += topology.h
generic-y += trace_clock.h
......
include include/uapi/asm-generic/Kbuild.asm
generic-y += kvm_para.h
generic-y += shmparam.h
generic-y += ucontext.h
......@@ -121,8 +121,6 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
*/
void __init coherent_mem_init(phys_addr_t start, u32 size)
{
phys_addr_t bitmap_phys;
if (!size)
return;
......@@ -138,11 +136,8 @@ void __init coherent_mem_init(phys_addr_t start, u32 size)
if (dma_size & (PAGE_SIZE - 1))
++dma_pages;
bitmap_phys = memblock_phys_alloc(BITS_TO_LONGS(dma_pages) * sizeof(long),
sizeof(long));
dma_bitmap = phys_to_virt(bitmap_phys);
memset(dma_bitmap, 0, dma_pages * PAGE_SIZE);
dma_bitmap = memblock_alloc(BITS_TO_LONGS(dma_pages) * sizeof(long),
sizeof(long));
}
static void c6x_dma_sync(struct device *dev, phys_addr_t paddr, size_t size,
......
......@@ -40,6 +40,7 @@ generic-y += preempt.h
generic-y += scatterlist.h
generic-y += sections.h
generic-y += serial.h
generic-y += shmparam.h
generic-y += sizes.h
generic-y += spinlock.h
generic-y += timex.h
......
include include/uapi/asm-generic/Kbuild.asm
generic-y += kvm_para.h
generic-y += shmparam.h
generic-y += ucontext.h
......@@ -30,6 +30,7 @@ generic-y += rwsem.h
generic-y += sections.h
generic-y += segment.h
generic-y += serial.h
generic-y += shmparam.h
generic-y += sizes.h
generic-y += topology.h
generic-y += trace_clock.h
......
include include/uapi/asm-generic/Kbuild.asm
generic-y += shmparam.h
generic-y += ucontext.h
......@@ -74,7 +74,7 @@ void __init build_cpu_to_node_map(void)
cpumask_clear(&node_to_cpu_mask[node]);
for_each_possible_early_cpu(cpu) {
node = -1;
node = NUMA_NO_NODE;
for (i = 0; i < NR_CPUS; ++i)
if (cpu_physical_id(cpu) == node_cpuid[i].phys_id) {
node = node_cpuid[i].nid;
......
......@@ -227,7 +227,7 @@ void __init setup_per_cpu_areas(void)
* CPUs are put into groups according to node. Walk cpu_map
* and create new groups at node boundaries.
*/
prev_node = -1;
prev_node = NUMA_NO_NODE;
ai->nr_groups = 0;
for (unit = 0; unit < nr_units; unit++) {
cpu = cpu_map[unit];
......@@ -435,7 +435,7 @@ static void __init *memory_less_node_alloc(int nid, unsigned long pernodesize)
{
void *ptr = NULL;
u8 best = 0xff;
int bestnode = -1, node, anynode = 0;
int bestnode = NUMA_NO_NODE, node, anynode = 0;
for_each_online_node(node) {
if (node_isset(node, memory_less_mask))
......@@ -447,7 +447,7 @@ static void __init *memory_less_node_alloc(int nid, unsigned long pernodesize)
anynode = node;
}
if (bestnode == -1)
if (bestnode == NUMA_NO_NODE)
bestnode = anynode;
ptr = memblock_alloc_try_nid(pernodesize, PERCPU_PAGE_SIZE,
......
......@@ -20,6 +20,7 @@ generic-y += mm-arch-hooks.h
generic-y += percpu.h
generic-y += preempt.h
generic-y += sections.h
generic-y += shmparam.h
generic-y += spinlock.h
generic-y += topology.h
generic-y += trace_clock.h
......
......@@ -2,4 +2,3 @@ include include/uapi/asm-generic/Kbuild.asm
generated-y += unistd_32.h
generic-y += kvm_para.h
generic-y += shmparam.h
......@@ -26,6 +26,7 @@ generic-y += parport.h
generic-y += percpu.h
generic-y += preempt.h
generic-y += serial.h
generic-y += shmparam.h
generic-y += syscalls.h
generic-y += topology.h
generic-y += trace_clock.h
......
......@@ -2,5 +2,4 @@ include include/uapi/asm-generic/Kbuild.asm
generated-y += unistd_32.h
generic-y += kvm_para.h
generic-y += shmparam.h
generic-y += ucontext.h
......@@ -363,8 +363,9 @@ void __init *early_get_page(void)
* Mem start + kernel_tlb -> here is limit
* because of mem mapping from head.S
*/
return __va(memblock_alloc_base(PAGE_SIZE, PAGE_SIZE,
memory_start + kernel_tlb));
return memblock_alloc_try_nid_raw(PAGE_SIZE, PAGE_SIZE,
MEMBLOCK_LOW_LIMIT, memory_start + kernel_tlb,
NUMA_NO_NODE);
}
#endif /* CONFIG_MMU */
......
......@@ -78,8 +78,7 @@ static void __init map_ram(void)
}
/* Alloc one page for holding PTE's... */
pte = (pte_t *) __va(memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE));
memset(pte, 0, PAGE_SIZE);
pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
set_pmd(pme, __pmd(__pa(pte) + _PAGE_KERNEL_TABLE));
/* Fill the newly allocated page with PTE'S */
......@@ -111,8 +110,7 @@ static void __init fixedrange_init(void)
pgd = swapper_pg_dir + pgd_index(vaddr);
pud = pud_offset(pgd, vaddr);
pmd = pmd_offset(pud, vaddr);
fixmap_pmd_p = (pmd_t *) __va(memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE));
memset(fixmap_pmd_p, 0, PAGE_SIZE);
fixmap_pmd_p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
set_pmd(pmd, __pmd(__pa(fixmap_pmd_p) + _PAGE_KERNEL_TABLE));
#ifdef CONFIG_HIGHMEM
......@@ -124,8 +122,7 @@ static void __init fixedrange_init(void)
pgd = swapper_pg_dir + pgd_index(vaddr);
pud = pud_offset(pgd, vaddr);
pmd = pmd_offset(pud, vaddr);
pte = (pte_t *) __va(memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE));
memset(pte, 0, PAGE_SIZE);
pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
set_pmd(pmd, __pmd(__pa(pte) + _PAGE_KERNEL_TABLE));
pkmap_page_table = pte;
#endif /* CONFIG_HIGHMEM */
......@@ -150,8 +147,7 @@ void __init paging_init(void)
fixedrange_init();
/* allocate space for empty_zero_page */
zero_page = __va(memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE));
memset(zero_page, 0, PAGE_SIZE);
zero_page = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
zone_sizes_init();
empty_zero_page = virt_to_page(zero_page);
......
......@@ -34,6 +34,7 @@ generic-y += qrwlock_types.h
generic-y += qrwlock.h
generic-y += sections.h
generic-y += segment.h
generic-y += shmparam.h
generic-y += string.h
generic-y += switch_to.h
generic-y += topology.h
......
include include/uapi/asm-generic/Kbuild.asm
generic-y += kvm_para.h
generic-y += shmparam.h
generic-y += ucontext.h
......@@ -122,13 +122,10 @@ pte_t __ref *pte_alloc_one_kernel(struct mm_struct *mm)
{