Commit 01aa9d51 authored by Linus Torvalds

Merge tag 'docs-4.20' of git://

Pull documentation updates from Jonathan Corbet:
 "This is a fairly typical cycle for documentation. There's some welcome
  readability improvements for the formatted output, some LICENSES
  updates including the addition of the ISC license, the removal of the
  unloved and unmaintained 00-INDEX files, the deprecated APIs document
  from Kees, more MM docs from Mike Rapoport, and the usual pile of typo
  fixes and corrections"

* tag 'docs-4.20' of git:// (41 commits)
  docs: Fix typos in histogram.rst
  docs: Introduce deprecated APIs list
  kernel-doc: fix declaration type determination
  doc: fix a typo in adding-syscalls.rst
  docs/admin-guide: memory-hotplug: remove table of contents
  doc: printk-formats: Remove bogus kobject references for device nodes
  Documentation: preempt-locking: Use better example
  dm flakey: Document "error_writes" feature
  docs/completion.txt: Fix a couple of punctuation nits
  LICENSES: Add ISC license text
  LICENSES: Add note to CDDL-1.0 license that it should not be used
  docs/core-api: memory-hotplug: add some details about locking internals
  docs/core-api: rename memory-hotplug-notifier to memory-hotplug
  docs: improve readability for people with poorer eyesight
  yama: clarify ptrace_scope=2 in Yama documentation
  docs/vm: split memory hotplug notifier description to Documentation/core-api
  docs: move memory hotplug description into admin-guide/mm
  doc: Fix acronym "FEKEK" in ecryptfs
  docs: fix some broken documentation references
  iommu: Fix passthrough option documentation
parents 5993692f aea74de4
- this file
- info on how PCI host bridges are represented in ACPI
- the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ.
- a guide describing the PCI Express Port Bus driver
- info on PCI error recovery
- the PCI Express I/O Virtualization HOWTO
- info on the PCI subsystem for device driver authors
- the PCI Express Advanced Error Reporting Driver Guide HOWTO
- guide to add endpoint controller driver and endpoint function driver.
- guide to use configfs to configure the PCI endpoint function.
- specification of *PCI test* function device.
- userguide for PCI endpoint test function.
- binding documentation for PCI endpoint function
- This file
- Using RCU to Protect Read-Mostly Arrays
- Review Checklist for RCU Patches
- Using RCU to Protect Read-Mostly Linked Lists
- RCU and lockdep checking
- RCU Lockdep splats explained.
- Using RCU to Protect Dynamic NMI Handlers
- Proper care and feeding of return values from rcu_dereference()
- RCU and Unloadable Modules
- RCU list primitives for use with SLAB_TYPESAFE_BY_RCU
- Reference-count design for elements of lists/arrays protected by RCU
- RCU Concepts
- List of RCU papers (bibliography) going back to 1980.
- RCU CPU stall warnings (module parameter rcu_cpu_stall_suppress)
- RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST)
- RCU on Uniprocessor Systems
- What is RCU?
......@@ -87,7 +87,3 @@ o Where can I find more information on RCU?
See the RTFP.txt file in this directory.
Or point your browser at
o What are all these files in this directory?
See 00-INDEX for the list.
......@@ -64,8 +64,8 @@ The sysctl settings (writable only with ``CAP_SYS_PTRACE``) are:
Using ``PTRACE_TRACEME`` is unchanged.
2 - admin-only attach:
only processes with ``CAP_SYS_PTRACE`` may use ptrace
with ``PTRACE_ATTACH``, or through children calling ``PTRACE_TRACEME``.
only processes with ``CAP_SYS_PTRACE`` may use ptrace, either with
``PTRACE_ATTACH`` or through children calling ``PTRACE_TRACEME``.
3 - no attach:
no processes may use ptrace with ``PTRACE_ATTACH`` nor via
......@@ -51,8 +51,7 @@ Documentation
- There are various README files in the Documentation/ subdirectory:
these typically contain kernel-specific installation notes for some
drivers for example. See Documentation/00-INDEX for a list of what
is contained in each file. Please read the
drivers for example. Please read the
:ref:`Documentation/process/changes.rst <changes>` file, as it
contains information about the problems, which may result by upgrading
your kernel.
......@@ -1764,7 +1764,7 @@
Format: { "0" | "1" }
0 - Use IOMMU translation for DMA.
1 - Bypass the IOMMU for DMA.
unset - Use IOMMU translation for DMA.
io7= [HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
......@@ -553,7 +553,7 @@ When nested virtualization is in use, three operating systems are involved:
the bare metal hypervisor, the nested hypervisor and the nested virtual
machine. VMENTER operations from the nested hypervisor into the nested
guest will always be processed by the bare metal hypervisor. If KVM is the
bare metal hypervisor it wiil:
bare metal hypervisor it will:
- Flush the L1D cache on every switch from the nested hypervisor to the
nested virtual machine, so that the nested hypervisor's secrets are not
......@@ -29,6 +29,7 @@ the Linux memory management.
- this file
- requirements for booting
- Cache Coherent Network ring-bus and perf PMU driver.
- ARM Interrupt subsystem documentation
- Intel IXP4xx Network processor.
- Netwinder specific documentation
- Symbol definitions for porting Linux to a new ARM machine.
- Kernel initialization parameters on ARM Linux
- General ARM documentation
- SA1100 documentation
- S3C24XX ARM Linux Overview
- ST SPEAr platform Linux Overview
- Release notes for Linux Kernel Vector Floating Point support code
- Algorithm for CPU and Cluster setup/teardown
- Ltd's Empeg MP3 Car Audio Player
- Secure firmware registration and calling.
- How to use NEON instructions in kernel mode
- Helper functions in kernel space made available for userspace.
- alignment abort handler documentation
- description of the virtual memory layout
- NWFPE floating point emulator documentation
- SWP/SWPB emulation handler/logging description
- ARM Tightly Coupled Memory
- [U]EFI configuration and runtime services documentation
- Voting locks, low-level mechanism relying on memory system atomic writes.
- This file
- BFQ IO scheduler and its tunables
- Notes on the Generic Block Layer Rewrite in Linux 2.5
- Immutable biovecs and biovec iterators
- Generic Block Device Capability (/sys/block/<device>/capability)
- CFQ IO scheduler tunables
- how to specify block device partitions on kernel command line
- Block data integrity
- Deadline IO scheduler tunables
- Block io priorities (in CFQ scheduler)
- Block layer support for Persistent Reservations
- Null block for block-layer benchmarking.
- Queue's sysfs entries
- The members of struct request (in include/linux/blkdev.h)
- Block layer statistics in /sys/block/<device>/stat
- Switching I/O schedulers at runtime
- Control of volatile write back caches
- this file
- info on Mylex DAC960/DAC1100 PCI RAID Controller Driver for Linux.
- info, major/minor #'s for Compaq's SMART Array Controllers.
- info on using Compaq's SMART2 Intelligent Disk Array Controllers.
- notes and driver options for the floppy disk driver.
- info on mGine m(g)flash driver for linux.
- info on a TCP implementation of a network block device.
- information about the parallel port IDE subsystem.
- short guide on how to set up and use the RAM disk.
- this file (info on CD-ROMs and Linux)
- only used to generate TeX output from the documentation.
- LaTeX document on standardizing the CD-ROM programming interface.
- info on setting up and using ATAPI (aka IDE) CD-ROMs.
- Info on the CDRW packet writing module
- this file
- Description for Block IO Controller, implementation and usage details.
- Control Groups definition, implementation details, examples and API.
- CPU Accounting Controller; account CPU usage for groups of tasks.
- documents the cpusets feature; assign CPUs and Mem to a set of tasks.
- Device Whitelist Controller; description, interface and security.
- checkpointing; rationale to not use signals, interface.
- HugeTLB Controller implementation and usage details.
- Memory Resource Controller; implementation details.
- Memory Resource Controller; design, accounting, interface, testing.
- Network classifier cgroups details and usages.
- Network priority cgroups details and usages.
- Process number cgroups details and usages.
......@@ -259,7 +259,7 @@ latex_elements = {
'papersize': 'a4paper',
# The font size ('10pt', '11pt' or '12pt').
'pointsize': '8pt',
'pointsize': '11pt',
# Latex figure (float) alignment
#'figure_align': 'htbp',
......@@ -272,8 +272,8 @@ latex_elements = {
'preamble': '''
% Use some font with UTF-8 support with XeLaTeX
\\setsansfont{DejaVu Serif}
\\setromanfont{DejaVu Sans}
\\setsansfont{DejaVu Sans}
\\setromanfont{DejaVu Serif}
\\setmonofont{DejaVu Sans Mono}
......@@ -76,7 +76,7 @@ These interfaces available only with bootmem, i.e when ``CONFIG_NO_BOOTMEM=n``
.. kernel-doc:: include/linux/bootmem.h
.. kernel-doc:: mm/bootmem.c
Memblock specific API
......@@ -89,4 +89,4 @@ really happens under the hood.
.. kernel-doc:: include/linux/memblock.h
.. kernel-doc:: mm/memblock.c
.. _gfp_mask_from_fs_io:
GFP masks used from FS/IO context
......@@ -27,10 +27,13 @@ Core utilities
Interfaces for kernel debugging
Memory Allocation Guide
Linux provides a variety of APIs for memory allocation. You can
allocate small chunks using `kmalloc` or `kmem_cache_alloc` families,
large virtually contiguous areas using `vmalloc` and its derivatives,
or you can directly request pages from the page allocator with
`alloc_pages`. It is also possible to use more specialized allocators,
for instance `cma_alloc` or `zs_malloc`.
Most of the memory allocation APIs use GFP flags to express how that
memory should be allocated. The GFP acronym stands for "get free
pages", the underlying memory allocation function.
Diversity of the allocation APIs combined with the numerous GFP flags
makes the question "How should I allocate memory?" not that easy to
answer, although very likely you should use
kzalloc(<size>, GFP_KERNEL);
Of course there are cases when other allocation APIs and different GFP
flags must be used.
Get Free Page flags
The GFP flags control the allocator's behavior. They specify which memory
zones can be used, how hard the allocator should try to find free
memory, whether the memory can be accessed by userspace, etc. The
:ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>` provides
reference documentation for the GFP flags and their combinations and
here we briefly outline their recommended usage:
* Most of the time ``GFP_KERNEL`` is what you need. Memory for the
kernel data structures, DMAable memory, inode cache, all these and
many other allocation types can use ``GFP_KERNEL``. Note that
using ``GFP_KERNEL`` implies ``__GFP_RECLAIM``, which means that
direct reclaim may be triggered under memory pressure; the calling
context must be allowed to sleep.
* If the allocation is performed from an atomic context, e.g. an interrupt
handler, use ``GFP_NOWAIT``. This flag prevents direct reclaim and
IO or filesystem operations. Consequently, under memory pressure a
``GFP_NOWAIT`` allocation is likely to fail. Allocations which
have a reasonable fallback should be using ``__GFP_NOWARN``.
* If you think that accessing memory reserves is justified and the kernel
will be stressed unless allocation succeeds, you may use ``GFP_ATOMIC``.
* Untrusted allocations triggered from userspace should be subject
to kmem accounting and must have the ``__GFP_ACCOUNT`` bit set. There
is the handy ``GFP_KERNEL_ACCOUNT`` shortcut for ``GFP_KERNEL``
allocations that should be accounted.
* Userspace allocations should use one of the ``GFP_USER``,
``GFP_HIGHUSER`` or ``GFP_HIGHUSER_MOVABLE`` flags. The longer
the flag name, the less restrictive it is.
``GFP_HIGHUSER_MOVABLE`` does not require that allocated memory
will be directly accessible by the kernel and implies that the
data is movable.
``GFP_HIGHUSER`` means that the allocated memory is not movable,
but it is not required to be directly accessible by the kernel. An
example may be a hardware allocation that maps data directly into
userspace but has no addressing limitations.
``GFP_USER`` means that the allocated memory is not movable and it
must be directly accessible by the kernel.
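The kernel-internal part of the list above can be read as a small decision table. The following is a minimal user-space sketch of that table, not kernel code: ``struct alloc_site``, ``enum gfp_choice`` and ``suggest_gfp()`` are hypothetical names invented for illustration, and the function returns a label rather than a real ``gfp_t`` value. The userspace-page flags (``GFP_USER`` and friends) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of an allocation site; not a kernel type. */
struct alloc_site {
    bool atomic_context;      /* caller cannot sleep, e.g. interrupt handler */
    bool reserves_justified;  /* access to memory reserves is warranted */
    bool userspace_triggered; /* untrusted request that should be accounted */
};

/* Labels standing in for the GFP flag sets named in the list above. */
enum gfp_choice {
    USE_GFP_KERNEL,
    USE_GFP_KERNEL_ACCOUNT,
    USE_GFP_NOWAIT,
    USE_GFP_ATOMIC
};

/* Encodes the guidance above: atomic context first, then accounting. */
static enum gfp_choice suggest_gfp(const struct alloc_site *site)
{
    assert(site != NULL);

    if (site->atomic_context)
        return site->reserves_justified ? USE_GFP_ATOMIC : USE_GFP_NOWAIT;
    if (site->userspace_triggered)
        return USE_GFP_KERNEL_ACCOUNT;
    return USE_GFP_KERNEL;
}
```

In this sketch, a site with only ``atomic_context`` set maps to ``USE_GFP_NOWAIT``, and an otherwise unremarkable site falls through to ``USE_GFP_KERNEL``, matching the "most of the time" advice.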
You may notice that quite a few allocations in the existing code
specify ``GFP_NOIO`` or ``GFP_NOFS``. Historically, they were used to
prevent recursion deadlocks caused by direct memory reclaim calling
back into the FS or IO paths and blocking on already held
resources. Since 4.12 the preferred way to address this issue is to
use new scope APIs described in
:ref:`Documentation/core-api/gfp_mask-from-fs-io.rst <gfp_mask_from_fs_io>`.
Other legacy GFP flags are ``GFP_DMA`` and ``GFP_DMA32``. They are
used to ensure that the allocated memory is accessible by hardware
with limited addressing capabilities. So unless you are writing a
driver for a device with such restrictions, avoid using these flags.
And even with hardware with restrictions it is preferable to use
`dma_alloc*` APIs.
Selecting memory allocator
The most straightforward way to allocate memory is to use a function
from the :c:func:`kmalloc` family. And, to be on the safe side, it's
best to use routines that set memory to zero, like
:c:func:`kzalloc`. If you need to allocate memory for an array, there
are :c:func:`kmalloc_array` and :c:func:`kcalloc` helpers.
The maximal size of a chunk that can be allocated with `kmalloc` is
limited. The actual limit depends on the hardware and the kernel
configuration, but it is a good practice to use `kmalloc` for objects
smaller than page size.
For large allocations you can use :c:func:`vmalloc` and
:c:func:`vzalloc`, or directly request pages from the page
allocator. The memory allocated by `vmalloc` and related functions is
not physically contiguous.
If you are not sure whether the allocation size is too large for
`kmalloc`, it is possible to use :c:func:`kvmalloc` and its
derivatives. It will try to allocate memory with `kmalloc` and if the
allocation fails it will be retried with `vmalloc`. There are
restrictions on which GFP flags can be used with `kvmalloc`; please
see :c:func:`kvmalloc_node` reference documentation. Note that
`kvmalloc` may return memory that is not physically contiguous.
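The try-then-fall-back behaviour described for `kvmalloc` can be modelled in user space. This is only a sketch under stated assumptions: every ``model_*`` name is hypothetical, ``MODEL_KMALLOC_LIMIT`` is an arbitrary stand-in for the real (hardware- and configuration-dependent) `kmalloc` limit, and both paths are backed by ``malloc()``, so physical contiguity is not modelled at all.

```c
#include <assert.h>
#include <stdlib.h>

/* Arbitrary stand-in; the real kmalloc limit depends on hardware and config. */
#define MODEL_KMALLOC_LIMIT 4096

enum model_backend { MODEL_NONE, MODEL_KMALLOC, MODEL_VMALLOC };

struct model_buf {
    void *ptr;
    enum model_backend backend; /* which path satisfied the request */
};

/* Models kmalloc: refuses requests above the stand-in limit. */
static void *model_kmalloc(size_t size)
{
    if (size == 0 || size > MODEL_KMALLOC_LIMIT)
        return NULL;
    return malloc(size);
}

/* Models vmalloc: accepts large requests. */
static void *model_vmalloc(size_t size)
{
    return size ? malloc(size) : NULL;
}

/* Models kvmalloc: try the kmalloc path first, then retry with vmalloc. */
static struct model_buf model_kvmalloc(size_t size)
{
    struct model_buf buf = { model_kmalloc(size), MODEL_KMALLOC };

    if (!buf.ptr) {
        buf.ptr = model_vmalloc(size);
        buf.backend = buf.ptr ? MODEL_VMALLOC : MODEL_NONE;
    }
    return buf;
}

/* Models kvfree: one release routine, whichever path allocated. */
static void model_kvfree(struct model_buf *buf)
{
    assert(buf != NULL);
    free(buf->ptr);
    buf->ptr = NULL;
    buf->backend = MODEL_NONE;
}
```

The caller never needs to know which path succeeded; the ``backend`` field exists here only so the fallback is observable in the model.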
If you need to allocate many identical objects you can use the slab
cache allocator. The cache should be set up with
:c:func:`kmem_cache_create` before it can be used. Afterwards
:c:func:`kmem_cache_alloc` and its convenience wrappers can allocate
memory from that cache.
When the allocated memory is no longer needed it must be freed. You
can use :c:func:`kvfree` for the memory allocated with `kmalloc`,
`vmalloc` and `kvmalloc`. The slab caches should be freed with
:c:func:`kmem_cache_free`. And don't forget to destroy the cache with
.. _memory_hotplug:
Memory hotplug
Memory hotplug event notifier
Hotplugging events are sent to a notification queue.
There are six types of notification defined in ``include/linux/memory.h``:
Generated before new memory becomes available in order to be able to
prepare subsystems to handle memory. The page allocator is still unable
to allocate from the new memory.
Generated if MEM_GOING_ONLINE fails.
Generated when memory has been successfully brought online. The callback may
allocate pages from the new memory.
Generated to begin the process of offlining memory. Allocations are no
longer possible from the memory but some of the memory to be offlined
is still in use. The callback can be used to free memory known to a
subsystem from the indicated memory block.
Generated if MEM_GOING_OFFLINE fails. Memory is available again from
the memory block that we attempted to offline.
Generated after offlining memory is complete.
A callback routine can be registered by calling::
hotplug_memory_notifier(callback_func, priority)
Callback functions with higher values of priority are called before callback
functions with lower values.
A callback function must have the following prototype::
int callback_func(
struct notifier_block *self, unsigned long action, void *arg);
The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) passes a pointer to struct memory_notify::
struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
int status_change_nid_normal;
int status_change_nid_high;
int status_change_nid;
};
- start_pfn is the first page frame number of the memory being onlined/offlined.
- nr_pages is the number of pages of the memory being onlined/offlined.
- status_change_nid_normal is set to the node id when the N_NORMAL_MEMORY bit
of the nodemask is (or will be) set or cleared. If it is -1, the nodemask
status is not changed.
- status_change_nid_high is set to the node id when the N_HIGH_MEMORY bit of
the nodemask is (or will be) set or cleared. If it is -1, the nodemask status
is not changed.
- status_change_nid is set to the node id when the N_MEMORY bit of the nodemask
is (or will be) set or cleared. This means that a new (memoryless) node gets
memory by onlining, or that a node loses all of its memory. If it is -1, the
nodemask status is not changed.
If status_change_nid* >= 0, the callback should create/discard structures for the
node if necessary.
The callback routine shall return one of the values
defined in ``include/linux/notifier.h``
NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.
NOTIFY_STOP stops further processing of the notification queue.
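The callback contract above can be sketched as a user-space model. The event and return-code names are the ones defined in ``include/linux/memory.h`` and ``include/linux/notifier.h``, but the numeric values below are local placeholders, ``struct notifier_block`` is reduced to a stub, and ``node_state_ready`` is a hypothetical per-node structure; nothing here registers with a real notifier chain.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values so the sketch builds outside the kernel. */
enum { MEM_GOING_ONLINE, MEM_CANCEL_ONLINE, MEM_ONLINE,
       MEM_GOING_OFFLINE, MEM_CANCEL_OFFLINE, MEM_OFFLINE };
enum { NOTIFY_DONE, NOTIFY_OK };

struct notifier_block { int priority; }; /* stub, not the kernel layout */

struct memory_notify {
    unsigned long start_pfn;
    unsigned long nr_pages;
    int status_change_nid_normal;
    int status_change_nid_high;
    int status_change_nid;
};

#define MODEL_MAX_NODES 4
static int node_state_ready[MODEL_MAX_NODES]; /* hypothetical per-node data */

static int model_mem_callback(struct notifier_block *self,
                              unsigned long action, void *arg)
{
    struct memory_notify *mn = arg;
    int nid;

    (void)self;
    assert(mn != NULL);
    nid = mn->status_change_nid;

    /* -1 means the nodemask status does not change: nothing to do. */
    if (nid < 0 || nid >= MODEL_MAX_NODES)
        return NOTIFY_DONE;

    switch (action) {
    case MEM_GOING_ONLINE:
        node_state_ready[nid] = 1; /* create structures before memory is usable */
        break;
    case MEM_CANCEL_ONLINE:
    case MEM_OFFLINE:
        node_state_ready[nid] = 0; /* discard structures */
        break;
    default:
        return NOTIFY_DONE;        /* event not handled by this model */
    }
    return NOTIFY_OK;
}
```

Both return codes used here let the notification queue continue; a callback that needed to veto an operation would return a different code, which this model does not attempt to show.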
Locking Internals
When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:
- synchronize against online/offline requests (e.g. via sysfs). This way, memory
block devices can only be accessed (.online/.state attributes) by user
space once memory has been fully added. And when removing memory, we
know nobody is in critical sections.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)
Especially, there is a possible lock inversion that is avoided using
device_hotplug_lock when adding memory and user space tries to online that
memory faster than expected:
- device_online() will first take the device_lock(), followed by
- add_memory_resource() will first take the mem_hotplug_lock, followed by
the device_lock() (while creating the devices, during bus_add_device()).
As the device is visible to user space before taking the device_lock(), this
can result in a lock inversion.
Onlining/offlining of memory should be done via device_online()/
device_offline() to make sure it is properly synchronized with actions
via sysfs. Holding device_hotplug_lock is advised (e.g. to protect online_type).
When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialise memory hotplug (e.g. access to global/zone
In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a quite efficient get_online_mems/put_online_mems
implementation, so code accessing memory can protect from that memory
......@@ -14,6 +14,8 @@ User Space Memory Access
.. kernel-doc:: mm/util.c
:functions: get_user_pages_fast
.. _mm-api-gfp-flags:
Memory Allocation Controls
......@@ -376,15 +376,15 @@ correctness of the format string and va_list arguments.
Passed by reference.
Device tree nodes
For printing kobject based structs (device nodes). Default behaviour is
For printing device tree node structures. Default behaviour is
equivalent to %pOFf.
- f - device node full_name
......@@ -30,18 +30,29 @@ of many distributions, e.g. :
- NetBSD
- FreeBSD
You can get the latest version released from the Coccinelle homepage at
Some distribution packages are obsolete and it is recommended
to use the latest version released from the Coccinelle homepage at
Once you have it, run the following command::
Or from GitHub at:
Once you have it, run the following commands::
as a regular user, and install it with::
sudo make install
More detailed installation instructions to build from source can be
found at:
Supplemental documentation
......@@ -51,6 +62,10 @@
The wiki documentation always refers to the linux-next version of the script.
For Semantic Patch Language (SmPL) grammar documentation refer to:
Using Coccinelle on the Linux kernel
......@@ -223,7 +238,7 @@ Since coccicheck runs through make, it naturally runs from the kernel
proper dir, as such the second rule above would be implied for picking up a
.cocciconfig when using ``make coccicheck``.
``make coccicheck`` also supports using M= targets.If you do not supply
``make coccicheck`` also supports using M= targets. If you do not supply
any M= target, it is assumed you want to target the entire kernel.
The kernel coccicheck script has::
......@@ -159,7 +159,7 @@ Contributing new tests (details)
* If a test needs specific kernel config options enabled, add a config file in
the test directory to enable them.
e.g: tools/testing/selftests/android/ion/config
e.g: tools/testing/selftests/android/config
Test Harness
......@@ -33,6 +33,10 @@ Optional feature parameters:
All write I/O is silently ignored.
Read I/O is handled correctly.
All write I/O is failed with an error signalled.
Read I/O is handled correctly.
corrupt_bio_byte <Nth_byte> <direction> <value> <flags>:
During <down interval>, replace <Nth_byte> of the data of
each matching bio with <value>.
Documentation for device trees, a data structure by which bootloaders pass
hardware layout to Linux in a device-independent manner, simplifying hardware
probing. This subsystem is maintained by Grant Likely
<> and has a mailing list at
- this file
- Booting Linux without Open Firmware, describes history and format of device trees.
- How Linux uses DT and what DT aims to solve.
\ No newline at end of file
......@@ -121,6 +121,9 @@ Kernel utility functions
.. kernel-doc:: kernel/rcu/update.c
.. kernel-doc:: include/linux/overflow.h
Device Resource Management
Firewire (IEEE 1394) driver Interface Guide