1. 07 Apr, 2018 14 commits
    • kbuild: rename *-asn1.[ch] to *.asn1.[ch] · 4fa8bc94
      Masahiro Yamada authored
      Our convention is to distinguish file types by suffixes with a period
      as a separator.
      *-asn1.[ch] is a different pattern from other generated sources such
      as *.lex.c, *.tab.[ch], *.dtb.S, etc.  More confusingly, files with
      '-asn1.[ch]' are generated files, but '_asn1.[ch]' are checked-in files.
      Rename generated files to *.asn1.[ch] for consistency.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • kbuild: clean up *-asn1.[ch] patterns from top-level Makefile · 3ca3273e
      Masahiro Yamada authored
      Clean up these patterns from the top Makefile to omit 'clean-files'
      in each Makefile.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • .gitignore: move *-asn1.[ch] patterns to the top-level .gitignore · 9ce285cf
      Masahiro Yamada authored
      These are common patterns where source files are parsed by the
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • kbuild: add %.dtb.S and %.dtb to 'targets' automatically · a7f92419
      Masahiro Yamada authored
      Another common pattern that consists of chained commands is to compile
      a DTB as binary data into the kernel image or a module.  It is used in
      several places in the source tree.  Support it in the core Makefile.
      $(call if_changed,dt_S_dtb) is more suitable than $(call cmd,dt_S_dtb)
      in case cmd_dt_S_dtb is changed in the future.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Frank Rowand <frowand.list@gmail.com>
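      A simplified model of why if_changed is preferred: with
      $(call cmd,...) the rule runs only when Make's timestamp check finds
      the target stale, so editing cmd_dt_S_dtb alone would rebuild
      nothing.  if_changed also reruns the command when the command line
      differs from the one recorded in the target's .cmd file.  The C
      sketch below only illustrates that decision and is not kbuild code.
      struct target_state and both functions are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model of one kbuild target. */
struct target_state {
    bool exists;            /* target file is present on disk */
    long mtime;             /* target modification time */
    long newest_prereq;     /* newest prerequisite modification time */
    const char *saved_cmd;  /* command recorded in the .cmd file, or NULL */
};

/* $(call cmd,...): only timestamps decide, so changing the command
 * definition alone never triggers a rebuild. */
bool cmd_would_run(const struct target_state *t)
{
    return !t->exists || t->newest_prereq > t->mtime;
}

/* $(call if_changed,...): additionally rebuilds when the command line
 * differs from the recorded one, or when no record was loaded. */
bool if_changed_would_run(const struct target_state *t, const char *new_cmd)
{
    if (cmd_would_run(t))
        return true;
    return t->saved_cmd == NULL || strcmp(t->saved_cmd, new_cmd) != 0;
}
```

      In this model a missing record (saved_cmd == NULL) always forces a
      rebuild.  That corresponds to a generated file whose .cmd file is
      never included because the file is not listed in 'targets'.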
    • kbuild: add %.lex.c and %.tab.[ch] to 'targets' automatically · b23d1a24
      Masahiro Yamada authored
      Files generated by if_changed* must be added to 'targets' to include
      *.cmd files.  Otherwise, they would be regenerated every time.
      The build system automatically adds objects to 'targets' where
      appropriate, such as obj-y, extra-y, etc., but does nothing for
      intermediate files.  So, each Makefile needs to add them by itself.
      There are some common cases where objects are generated by chained
      rules.  Lexers and parsers are compiled as follows:
         %.lex.o <- %.lex.c <- %.l
         %.tab.o <- %.tab.c <- %.y
      They are common patterns, so it is reasonable to take care of them
      in the core Makefile instead of requiring each Makefile to do so.
      At this moment, you cannot delete 'targets += zconf.lex.c' in the
      Kconfig Makefile because zconf.lex.c is included from zconf.tab.c
      instead of being compiled separately.  It should be deleted once
      Kconfig has been refactored further.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Frank Rowand <frowand.list@gmail.com>
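      The chained-rule pattern above means the C intermediate of a lexer
      or parser object can be derived mechanically from the object's name.
      The sketch below is a C model of that naming rule and not the actual
      kbuild change.  It covers only the .c intermediate (not %.tab.h),
      and chained_intermediate() is a hypothetical helper.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if s ends with suffix. */
static bool ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* For an object built by a chained rule (%.lex.o or %.tab.o), write the
 * generated C intermediate (%.lex.c or %.tab.c) into out, so that it can
 * be added to 'targets'.  Returns false for objects without one. */
bool chained_intermediate(const char *obj, char *out, size_t outlen)
{
    if (!ends_with(obj, ".lex.o") && !ends_with(obj, ".tab.o"))
        return false;
    size_t n = strlen(obj);
    if (n + 1 > outlen)
        return false;          /* buffer too small for name + NUL */
    memcpy(out, obj, n + 1);
    out[n - 1] = 'c';          /* "foo.lex.o" -> "foo.lex.c" */
    return true;
}
```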
    • genksyms: generate lexer and parser during build instead of shipping · 833e6224
      Masahiro Yamada authored
      Now that the kernel build supports flex and bison, remove the _shipped
      files and generate them during the build instead.
      There are no more shipped lexers and parsers, so I removed the rules
      in scripts/Makefile.lib that were used for REGENERATE_PARSERS.
      The genksyms parser has an ambiguous grammar, which would emit warnings:
       scripts/genksyms/parse.y: warning: 9 shift/reduce conflicts [-Wconflicts-sr]
       scripts/genksyms/parse.y: warning: 5 reduce/reduce conflicts [-Wconflicts-rr]
      They are normally suppressed, but displayed when W=1 is given.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • kbuild: clean up *.lex.c and *.tab.[ch] patterns from top-level Makefile · 9a8dfb39
      Masahiro Yamada authored
      Files suffixed with .lex.c and .tab.[ch] are generated lexers and
      parsers, respectively.  Clean them up globally from the top Makefile.
      Some of the host programs that these lexers and parsers are linked
      into are necessary for building external modules, but the
      intermediates are not.  They can be cleaned away by 'make clean'
      instead of 'make mrproper'.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Frank Rowand <frowand.list@gmail.com>
    • .gitignore: move *.lex.c *.tab.[ch] patterns to the top-level .gitignore · 59889300
      Masahiro Yamada authored
      These patterns are common to host programs that require lexer and parser.
      Move them to the top .gitignore.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Frank Rowand <frowand.list@gmail.com>
    • kbuild: use HOSTLDFLAGS for single .c executables · 63185b46
      Robin Jarry authored
      When compiling executables from a single .c file, the linker is also
      invoked. Pass HOSTLDFLAGS to it, as is done for other linker commands.
      Signed-off-by: Robin Jarry <robin.jarry@6wind.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • Merge tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio · f605ba97
      Linus Torvalds authored
      Pull VFIO updates from Alex Williamson:
       - Adopt iommu_unmap_fast() interface to type1 backend
         (Suravee Suthikulpanit)
       - mdev sample driver fixup (Shunyong Yang)
       - More efficient PFN mapping handling in type1 backend
         (Jason Cai)
       - VFIO device ioeventfd interface (Alex Williamson)
       - Tag new vfio-platform sub-maintainer (Alex Williamson)
      * tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio:
        MAINTAINERS: vfio/platform: Update sub-maintainer
        vfio/pci: Add ioeventfd support
        vfio/pci: Use endian neutral helpers
        vfio/pci: Pull BAR mapping setup from read-write path
        vfio/type1: Improve memory pinning process for raw PFN mapping
        vfio-mdev/samples: change RDI interrupt condition
        vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
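      The fast IOTLB flush interface lets the type1 backend unmap several
      ranges and then invalidate the IOTLB once, instead of flushing after
      every unmap.  In the kernel, iommu_unmap_fast() is paired with a
      later sync call.  The model below uses hypothetical names and only
      counts flushes.  It assumes that pages unmapped on the fast path are
      not released until after the sync.

```c
#include <stddef.h>

/* Hypothetical model of a domain whose unmaps can defer invalidation. */
struct fake_domain {
    size_t pending_pages;   /* unmapped but not yet flushed */
    unsigned flushes;       /* IOTLB invalidations issued */
};

/* Slow path: unmap and flush immediately (one flush per call). */
void unmap_sync(struct fake_domain *d, size_t npages)
{
    (void)npages;
    d->flushes++;
}

/* Fast path: unmap now, defer the flush.  The caller must not reuse or
 * unpin these pages until tlb_sync() has run. */
void unmap_fast(struct fake_domain *d, size_t npages)
{
    d->pending_pages += npages;
}

/* Flush everything deferred so far with a single invalidation. */
void tlb_sync(struct fake_domain *d)
{
    if (d->pending_pages) {
        d->flushes++;
        d->pending_pages = 0;
    }
}
```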
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 016c6f25
      Linus Torvalds authored
      Pull fw_cfg, vhost updates from Michael Tsirkin:
       "This cleans up the qemu fw cfg device driver.
        On top of this, vmcore is dumped there on crash to help debugging
        with kASLR enabled.
        Also included are some fixes in vhost"
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        vhost: add vsock compat ioctl
        vhost: fix vhost ioctl signature to build with clang
        fw_cfg: write vmcoreinfo details
        crash: export paddr_vmcoreinfo_note()
        fw_cfg: add DMA register
        fw_cfg: add a public uapi header
        fw_cfg: handle fw_cfg_read_blob() error
        fw_cfg: remove inline from fw_cfg_read_blob()
        fw_cfg: fix sparse warnings around FW_CFG_FILE_DIR read
        fw_cfg: fix sparse warning reading FW_CFG_ID
        fw_cfg: fix sparse warnings with fw_cfg_file
        fw_cfg: fix sparse warnings in fw_cfg_sel_endianness()
        ptr_ring: fix build
    • Merge tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 3c0d551e
      Linus Torvalds authored
      Pull PCI updates from Bjorn Helgaas:
       - move pci_uevent_ers() out of pci.h (Michael Ellerman)
       - skip ASPM common clock warning if BIOS already configured it (Sinan
         Kaya)
       - fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva)
       - remove last user of pci_get_bus_and_slot() and the function itself
         (Sinan Kaya)
       - add decoding for 16 GT/s link speed (Jay Fang)
       - add interfaces to get max link speed and width (Tal Gilboa)
       - add pcie_bandwidth_capable() to compute max supported link bandwidth
         (Tal Gilboa)
       - add pcie_bandwidth_available() to compute bandwidth available to
         device (Tal Gilboa)
       - add pcie_print_link_status() to log link speed and whether it's
         limited (Tal Gilboa)
       - use PCI core interfaces to report when device performance may be
         limited by its slot instead of doing it in each driver (Tal Gilboa)
       - fix possible cpqphp NULL pointer dereference (Shawn Lin)
       - rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI
         hotplug (Mika Westerberg)
       - add support for PCI I/O port space that's neither directly accessible
         via CPU in/out instructions nor directly mapped into CPU physical
         memory space. This is fairly intrusive and includes minor changes to
         interfaces used for I/O space on most platforms (Zhichang Yuan, John
         Garry)
       - add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan,
         John Garry)
       - use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas)
       - remove possible NULL pointer dereference in of_pci_bus_find_domain_nr()
         (Shawn Lin)
       - report quirk timings with dev_info (Bjorn Helgaas)
       - report quirks that take longer than 10ms (Bjorn Helgaas)
       - add and use Altera Vendor ID (Johannes Thumshirn)
       - tidy Makefiles and comments (Bjorn Helgaas)
       - don't set up INTx if MSI or MSI-X is enabled to align cris, frv,
         ia64, and mn10300 with x86 (Bjorn Helgaas)
       - move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick
         Lawler)
       - merge pcieport_if.h into portdrv.h (Bjorn Helgaas)
       - move workaround for BIOS PME issue from portdrv to PCI core (Bjorn
         Helgaas)
       - completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas)
       - remove portdrv link order dependency (Bjorn Helgaas)
       - remove support for unused VC portdrv service (Bjorn Helgaas)
       - simplify portdrv feature permission checking (Bjorn Helgaas)
       - remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn
         Helgaas)
       - remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas)
       - use cached AER capability offset (Frederick Lawler)
       - don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg)
       - rename pcie-dpc.c to dpc.c (Bjorn Helgaas)
       - use generic pci_mmap_resource_range() instead of powerpc and xtensa
         arch-specific versions (David Woodhouse)
       - support arbitrary PCI host bridge offsets on sparc (Yinghai Lu)
       - remove System and Video ROM reservations on sparc (Bjorn Helgaas)
       - probe for device reset support during enumeration instead of runtime
         (Bjorn Helgaas)
       - add ACS quirk for Ampere (née APM) root ports (Feng Kan)
       - add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas
       - protect device restore with device lock (Sinan Kaya)
       - handle failure of FLR gracefully (Sinan Kaya)
       - handle CRS (config retry status) after device resets (Sinan Kaya)
       - skip various config reads for SR-IOV VFs as an optimization
         (KarimAllah Ahmed)
       - consolidate VPD code in vpd.c (Bjorn Helgaas)
       - add Tegra dependency on PCI_MSI_IRQ_DOMAIN (Arnd Bergmann)
       - add DT support for R-Car r8a7743 (Biju Das)
       - fix a PCI_EJECT vs PCI_BUS_RELATIONS race condition in Hyper-V host
         bridge driver that causes a general protection fault (Dexuan Cui)
       - fix Hyper-V host bridge hang in MSI setup on 1-vCPU VMs with SR-IOV
         (Dexuan Cui)
       - fix Hyper-V host bridge hang when ejecting a VF before setting up MSI
         (Dexuan Cui)
       - make several structures static (Fengguang Wu)
       - increase number of MSI IRQs supported by Synopsys DesignWare bridges
         from 32 to 256 (Gustavo Pimentel)
       - implement multiplexed IRQ domain API and remove obsolete MSI IRQ
         API from DesignWare drivers (Gustavo Pimentel)
       - add Tegra power management support (Manikanta Maddireddy)
       - add Tegra loadable module support (Manikanta Maddireddy)
       - handle 64-bit BARs correctly in endpoint support (Niklas Cassel)
       - support optional regulator for HiSilicon STB (Shawn Guo)
       - use regulator bulk API for Qualcomm apq8064 (Srinivas Kandagatla)
       - support power supplies for Qualcomm msm8996 (Srinivas Kandagatla)
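      The bandwidth helpers listed above reduce to simple arithmetic.  A
      link's usable rate is its per-lane rate after line encoding (8b/10b
      at 2.5 and 5 GT/s, 128b/130b at 8 and 16 GT/s) times its width.  The
      bandwidth available to a device is the minimum over every link
      between it and the root port.  The sketch below models that
      calculation and is not the code of pcie_bandwidth_capable() or
      pcie_bandwidth_available().  Its integer division truncates, which
      the kernel's own constants may round differently.

```c
#include <stddef.h>

/* Usable Mb/s per lane for a link speed in MT/s, after encoding
 * overhead.  Returns 0 for unknown speeds. */
unsigned lane_mbps(unsigned mts)
{
    switch (mts) {
    case 2500:
    case 5000:
        return mts * 8 / 10;      /* 8b/10b */
    case 8000:
    case 16000:
        return mts * 128 / 130;   /* 128b/130b */
    default:
        return 0;
    }
}

struct link {
    unsigned mts;     /* negotiated speed in MT/s */
    unsigned width;   /* number of lanes */
};

/* links[0] is the device's own link; later entries walk toward the root.
 * The slowest link limits the whole path. */
unsigned bandwidth_available(const struct link *links, size_t n)
{
    unsigned best = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned bw = lane_mbps(links[i].mts) * links[i].width;
        if (i == 0 || bw < best)
            best = bw;
    }
    return best;
}
```

      For example, an x16 8 GT/s device in an x8 8 GT/s slot is limited to
      the x8 figure.  pcie_print_link_status() is meant to report exactly
      this situation.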
      * tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (123 commits)
        MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver
        HISI LPC: Add ACPI support
        ACPI / scan: Do not enumerate Indirect IO host children
        ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use
        HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings
        of: Add missing I/O range exception for indirect-IO devices
        PCI: Apply the new generic I/O management on PCI IO hosts
        PCI: Add fwnode handler as input param of pci_register_io_range()
        PCI: Remove __weak tag from pci_register_io_range()
        MAINTAINERS: Add missing /drivers/pci/cadence directory entry
        fm10k: Report PCIe link properties with pcie_print_link_status()
        net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth
        net/mlx5: Report PCIe link properties with pcie_print_link_status()
        net/mlx4_core: Report PCIe link properties with pcie_print_link_status()
        PCI: Add pcie_print_link_status() to log link speed and whether it's limited
        PCI: Add pcie_bandwidth_available() to compute bandwidth available to device
        misc: pci_endpoint_test: Handle 64-bit BARs properly
        PCI: designware-ep: Make dw_pcie_ep_reset_bar() handle 64-bit BARs properly
        PCI: endpoint: Make sure that BAR_5 does not have 64-bit flag set when clearing
        PCI: endpoint: Make epc->ops->clear_bar()/pci_epc_clear_bar() take struct *epf_bar
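      Raising the DesignWare MSI limit from 32 to 256 vectors can be
      pictured as several 32-bit MSI status registers in place of one.
      The sketch below assumes 8 controllers of 32 vectors each.  It shows
      how a vector number splits into a controller index and a bit, and
      how a demux loop would collect pending vectors.  The constants and
      names are illustrative assumptions and are not taken from the
      driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define IRQS_PER_CTRL 32u   /* assumed: one 32-bit status word each */
#define NUM_CTRLS      8u   /* assumed: 8 * 32 = 256 vectors */

struct msi_pos {
    unsigned ctrl;
    unsigned bit;
};

/* Split a hardware vector number into (controller, bit). */
bool msi_locate(unsigned hwirq, struct msi_pos *pos)
{
    if (hwirq >= IRQS_PER_CTRL * NUM_CTRLS)
        return false;
    pos->ctrl = hwirq / IRQS_PER_CTRL;
    pos->bit = hwirq % IRQS_PER_CTRL;
    return true;
}

/* Collect pending vectors from per-controller status words, lowest
 * first.  Writes at most max entries and returns how many were written. */
unsigned msi_pending(const uint32_t status[NUM_CTRLS], unsigned *out,
                     unsigned max)
{
    unsigned n = 0;
    for (unsigned c = 0; c < NUM_CTRLS; c++)
        for (unsigned b = 0; b < IRQS_PER_CTRL; b++) {
            if (!((status[c] >> b) & 1u))
                continue;
            if (n == max)
                return n;
            out[n++] = c * IRQS_PER_CTRL + b;
        }
    return n;
}
```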
    • Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 19fd08b8
      Linus Torvalds authored
      Pull rdma updates from Jason Gunthorpe:
       "Doug and I are at a conference next week so if another PR is sent I
        expect it to only be bug fixes. Parav noted yesterday that there are
        some fringe case behavior changes in his work that he would like to
        fix, and I see that Intel has a number of rc looking patches for HFI1
        they posted yesterday.
        Parav is again the biggest contributor by patch count with his ongoing
        work to enable container support in the RDMA stack, followed by Leon
        doing syzkaller inspired cleanups, though most of the actual fixing
        went to RC.
        There is one uncomfortable series here fixing the user ABI to actually
        work as intended in 32 bit mode. There are lots of notes in the commit
        messages, but the basic summary is we don't think there is an actual
        32 bit kernel user of drivers/infiniband for several good reasons.
        However we are seeing people want to use a 32 bit user space with 64
        bit kernel, which didn't completely work today. So in fixing it we
        required a 32 bit rxe user to upgrade their userspace. rxe users are
        still already quite rare and we think a 32 bit one is non-existing.
         - Fix RDMA uapi headers to actually compile in userspace and be more
         - Three shared with netdev pull requests from Mellanox:
            * 7 patches, mostly to net, with 1 IB-related one at the end.
              This series addresses an IRQ performance issue (patch 1),
              cleanups related to the fix for the IRQ performance problem
              (patches 2-6), and then extends the fragmented completion queue
              support that already exists in the net side of the driver to the
              ib side of the driver (patch 7).
            * Mostly IB, with 5 patches to net that are needed to support the
              remaining 10 patches to the IB subsystem. This series extends
              the current 'representor' framework when the mlx5 driver is in
              switchdev mode from being a netdev only construct to being a
              netdev/IB dev construct. The IB dev is limited to raw Eth queue
              pairs only, but by having an IB dev of this type attached to the
              representor for a switchdev port, it enables DPDK to work on the
              switchdev device.
            * All net related, but needed as infrastructure for the rdma
         - Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4 drivers
         - SRP performance updates
         - IB uverbs write path cleanup patch series from Leon
         - Add RDMA_CM support to ib_srpt. This is disabled by default. Users
           need to set the port for ib_srpt to listen on in configfs in order
           for it to be enabled
         - TSO and Scatter FCS support in mlx4
         - Refactor of modify_qp routine to resolve problems seen while
           working on new code that is forthcoming
         - More refactoring and updates of RDMA CM for containers support from
         - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device
           memory' user API features
         - Infrastructure updates for the new IOCTL interface, based on
           increased usage
         - ABI compatibility bug fixes to fully support 32 bit userspace on 64
           bit kernel as was originally intended. See the commit messages for
           extensive details
         - Syzkaller bugs and code cleanups motivated by them"
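      The 32-bit userspace problem described above usually comes down to
      struct layout.  On i386 a 64-bit integer is only 4-byte aligned
      inside a struct, so a struct that mixes __u32 and __u64 fields can
      have a different size and field offsets in a 32-bit process than in
      a 64-bit kernel.  One common fix is to force 8-byte alignment
      explicitly, which is what the kernel's __aligned_u64 type provides.
      The sketch uses the GCC/Clang aligned attribute on plain uint64_t,
      and the struct names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout depends on the ABI: 12 bytes on i386 (u64 is 4-byte aligned in
 * structs there), 16 bytes on x86-64. */
struct cmd_unaligned {
    uint32_t handle;
    uint64_t user_addr;
};

/* Forcing 8-byte alignment gives the same 16-byte layout on both. */
struct cmd_aligned {
    uint32_t handle;
    uint64_t user_addr __attribute__((aligned(8)));
};

_Static_assert(sizeof(struct cmd_aligned) == 16, "same size everywhere");
_Static_assert(offsetof(struct cmd_aligned, user_addr) == 8,
               "same field offset everywhere");
```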
      * tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits)
        IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
        IB/mlx5: Device memory mr registration support
        net/mlx5: Mkey creation command adjustments
        IB/mlx5: Device memory support in mlx5_ib
        net/mlx5: Query device memory capabilities
        IB/uverbs: Add device memory registration ioctl support
        IB/uverbs: Add alloc/free dm uverbs ioctl support
        IB/uverbs: Add device memory capabilities reporting
        IB/uverbs: Expose device memory capabilities to user
        RDMA/qedr: Fix wmb usage in qedr
        IB/rxe: Removed GID add/del dummy routines
        RDMA/qedr: Zero stack memory before copying to user space
        IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
        IB/mlx5: Add information for querying IPsec capabilities
        IB/mlx5: Add IPsec support for egress and ingress
        {net,IB}/mlx5: Add ipsec helper
        IB/mlx5: Add modify_flow_action_esp verb
        IB/mlx5: Add implementation for create and destroy action_xfrm
        IB/uverbs: Introduce ESP steering match filter
        IB/uverbs: Add modify ESP flow_action
    • Merge tag 'mailbox-v4.17' of git://git.linaro.org/landing-teams/working/fujitsu/integration · 28da7be5
      Linus Torvalds authored
      Pull mailbox updates from Jassi Brar:
       - New Hi3660 mailbox driver
       - Fix TEGRA Kconfig warning
       - Broadcom: use dma_pool_zalloc instead of dma_pool_alloc+memset
      * tag 'mailbox-v4.17' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
        mailbox: Add support for Hi3660 mailbox
        dt-bindings: mailbox: Introduce Hi3660 controller binding
        mailbox: tegra: relax TEGRA_HSP_MBOX Kconfig dependencies
        maillbox: bcm-flexrm-mailbox: Use dma_pool_zalloc()
  2. 06 Apr, 2018 26 commits
    • Merge tag 'selinux-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 9eda2d2d
      Linus Torvalds authored
      Pull SELinux updates from Paul Moore:
       "A bigger than usual pull request for SELinux, 13 patches (lucky!)
        along with a scary looking diffstat.
        Although if you look a bit closer, excluding the usual minor
        tweaks/fixes, there are really only two significant changes in this
        pull request: the addition of proper SELinux access controls for SCTP
        and the encapsulation of a lot of internal SELinux state.
        The SCTP changes are the result of a multi-month effort (maybe even a
        year or longer?) between the SELinux folks and the SCTP folks to add
        proper SELinux controls. A special thanks go to Richard for seeing
        this through and keeping the effort moving forward.
        The state encapsulation work is a bit of janitorial work that came out
        of some early work on SELinux namespacing. The question of namespacing
        is still an open one, but I believe there is some real value in the
        encapsulation work so we've split that out and are now sending that up
        to you"
      * tag 'selinux-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: wrap AVC state
        selinux: wrap selinuxfs state
        selinux: fix handling of uninitialized selinux state in get_bools/classes
        selinux: Update SELinux SCTP documentation
        selinux: Fix ltp test connect-syscall failure
        selinux: rename the {is,set}_enforcing() functions
        selinux: wrap global selinux state
        selinux: fix typo in selinux_netlbl_sctp_sk_clone declaration
        selinux: Add SCTP support
        sctp: Add LSM hooks
        sctp: Add ip option support
        security: Add support for SCTP security hooks
        netlabel: If PF_INET6, check sk_buff ip header version
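      The state encapsulation work follows a familiar pattern.  Instead of
      file-scope globals, the state lives in one struct that is passed
      explicitly, so that several instances (for example, future
      namespaces) could coexist.  The sketch below is a deliberately tiny
      illustration of that pattern.  struct lsm_state and its helpers are
      hypothetical and do not mirror SELinux's actual fields or function
      names.

```c
#include <errno.h>
#include <stdbool.h>

/* All formerly-global state gathered in one struct. */
struct lsm_state {
    bool initialized;
    bool enforcing;
};

static inline bool state_enforcing(const struct lsm_state *s)
{
    return s->enforcing;
}

static inline void state_set_enforcing(struct lsm_state *s, bool value)
{
    s->enforcing = value;
}

/* Deny only if the policy denies AND this particular state is
 * initialized and enforcing; other instances are unaffected. */
int check_access(const struct lsm_state *s, bool allowed_by_policy)
{
    if (allowed_by_policy || !s->initialized)
        return 0;
    return state_enforcing(s) ? -EACCES : 0;
}
```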
    • Merge tag 'audit-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit · 6ad11bdd
      Linus Torvalds authored
      Pull audit updates from Paul Moore:
       "We didn't have anything to send for v4.16, but we're back with a
        little more than usual for v4.17.
        Eleven patches in total, most fall into the small fix category, but
        there are three non-trivial changes worth calling out:
         - the audit entry filter is being removed after deprecating it for
           quite a while (years of no one really using it because it turns out
           to be not very practical)
         - created our own version of "__mutex_owner()" because the locking
           folks were upset we were using theirs
         - improved our handling of kernel command line parameters to make
           them more forgiving
         - we fixed auditing of symlink operations
        Everything passes the audit-testsuite and as of a few minutes ago it
        merges well with your tree"
      * tag 'audit-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
        audit: add refused symlink to audit_names
        audit: remove path param from link denied function
        audit: link denied should not directly generate PATH record
        audit: make ANOM_LINK obey audit_enabled and audit_dummy_context
        audit: do not panic on invalid boot parameter
        audit: track the owner of the command mutex ourselves
        audit: return on memory error to avoid null pointer dereference
        audit: bail before bug check if audit disabled
        audit: deprecate the AUDIT_FILTER_ENTRY filter
        audit: session ID should not set arch quick field pointer
        audit: update bugtracker and source URIs
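      "Track the owner of the command mutex ourselves" means recording the
      holder next to the lock instead of reading the locking code's
      private owner field.  Below is a userspace sketch of the idea using
      POSIX threads and C11 atomics.  The per-thread token (the address of
      a thread-local byte) and every name are assumptions chosen for
      illustration, not audit code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Each thread's token is the address of its own thread-local byte:
 * unique among live threads and never 0. */
static _Thread_local char thread_token;

static uintptr_t self_token(void)
{
    return (uintptr_t)&thread_token;
}

struct owned_mutex {
    pthread_mutex_t lock;
    _Atomic uintptr_t owner;   /* holder's token, 0 when unlocked */
};

void owned_mutex_init(struct owned_mutex *m)
{
    pthread_mutex_init(&m->lock, NULL);
    atomic_store(&m->owner, 0);
}

void owned_mutex_lock(struct owned_mutex *m)
{
    pthread_mutex_lock(&m->lock);
    atomic_store(&m->owner, self_token());
}

void owned_mutex_unlock(struct owned_mutex *m)
{
    atomic_store(&m->owner, 0);
    pthread_mutex_unlock(&m->lock);
}

/* True only if the calling thread currently holds the mutex. */
bool owned_mutex_held_by_me(struct owned_mutex *m)
{
    return atomic_load(&m->owner) == self_token();
}
```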
    • Merge tag 'pstore-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 69824bcc
      Linus Torvalds authored
      Pull pstore updates from Kees Cook:
       "This cycle was almost entirely improvements to the pstore compression
        options, noted below:
         - Add lz4hc and 842 to pstore compression options (Geliang Tang)
         - Refactor to use crypto compression API (Geliang Tang)
         - Fix up Kconfig dependencies for compression (Arnd Bergmann)
         - Allow for run-time compression selection
         - Remove stack VLA usage"
      * tag 'pstore-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        pstore: fix crypto dependencies
        pstore: Use crypto compress API
        pstore/ram: Do not use stack VLA for parity workspace
        pstore: Select compression at runtime
        pstore: Avoid size casts for 842 compression
        pstore: Add lz4hc and 842 compression support
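      Run-time compression selection turns a build-time choice into a
      lookup by name, for example from a module parameter.  The sketch
      below assumes a table of backend names.  lz4hc and 842 come from the
      list above, and deflate, lzo and lz4 are assumed to be the
      pre-existing options.  struct zbackend and select_zbackend() are
      hypothetical and not pstore's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical backend descriptor; a real one would carry the
 * compression context and callbacks. */
struct zbackend {
    const char *name;
};

static const struct zbackend zbackends[] = {
    { "deflate" }, { "lzo" }, { "lz4" }, { "lz4hc" }, { "842" },
};

/* Pick a backend by name at run time.  Returns NULL if unknown, so the
 * caller can fall back to a default or disable compression. */
const struct zbackend *select_zbackend(const char *name)
{
    for (size_t i = 0; i < sizeof(zbackends) / sizeof(zbackends[0]); i++)
        if (strcmp(zbackends[i].name, name) == 0)
            return &zbackends[i];
    return NULL;
}
```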
    • Merge branch 'akpm' (patches from Andrew) · 3b54765c
      Linus Torvalds authored
      Merge updates from Andrew Morton:
       - a few misc things
       - ocfs2 updates
       - the v9fs maintainers have been missing for a long time. I've taken
         over v9fs patch slinging.
       - most of MM
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (116 commits)
        mm,oom_reaper: check for MMF_OOM_SKIP before complaining
        mm/ksm: fix interaction with THP
        mm/memblock.c: cast constant ULLONG_MAX to phys_addr_t
        headers: untangle kmemleak.h from mm.h
        include/linux/mmdebug.h: make VM_WARN* non-rvals
        mm/page_isolation.c: make start_isolate_page_range() fail if already isolated
        mm: change return type to vm_fault_t
        mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes
        mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memory
        kernel/fork.c: detect early free of a live mm
        mm: make counting of list_lru_one::nr_items lockless
        mm/swap_state.c: make bool enable_vma_readahead and swap_vma_readahead() static
        block_invalidatepage(): only release page if the full page was invalidated
        mm: kernel-doc: add missing parameter descriptions
        mm/swap.c: remove @cold parameter description for release_pages()
        mm/nommu: remove description of alloc_vm_area
        zram: drop max_zpage_size and use zs_huge_class_size()
        zsmalloc: introduce zs_huge_class_size()
        mm: fix races between swapoff and flush dcache
        fs/direct-io.c: minor cleanups in do_blockdev_direct_IO
    • Merge tag 'mtd/for-4.17' of git://git.infradead.org/linux-mtd · 3fd14cdc
      Linus Torvalds authored
      Pull MTD updates from Boris Brezillon:
       "MTD Core:
         - Remove support for asynchronous erase (not implemented by any of
           the existing drivers anyway)
         - Remove Cyrille from the list of SPI NOR and MTD maintainers
         - Fix kernel doc headers
         - Allow users to define the partitions parsers they want to test
           through a DT property (compatible of the partitions subnode)
         - Remove the bfin-async-flash driver (the only architecture using it
           has been removed)
         - Fix pagetest test
         - Add extra checks in mtd_erase()
         - Simplify the MTD partition creation logic and get rid of
        MTD Drivers:
         - Add endianness information to the physmap DT binding
         - Add Eon EN29LV400A IDs to JEDEC probe logic
         - Use %*ph where appropriate
        SPI NOR Drivers:
         - Make fsl-quadspi assign different names to MTD devices connected to
           the same QSPI controller
         - Remove an unneeded driver.bus assigned in the fsl-qspi driver
        NAND Core:
         - Prepare arrival of the SPI NAND subsystem by implementing a generic
           (interface-agnostic) layer to ease manipulation of NAND devices
         - Move onenand code base to the drivers/mtd/nand/ dir
         - Rework timing mode selection
         - Provide a generic way for NAND chip drivers to flag a specific
           GET/SET FEATURE operation as supported/unsupported
         - Stop embedding ONFI/JEDEC param page in nand_chip
        NAND Drivers:
         - Rework/cleanup of the mxc driver
         - Various cleanups in the vf610 driver
         - Migrate the fsmc and vf610 to ->exec_op()
         - Get rid of the pxa driver (replaced by marvell_nand)
         - Support ->setup_data_interface() in the GPMI driver
         - Fix probe error path in several drivers
         - Remove support for unused hw_syndrome mode in sunxi_nand
         - Various minor improvements"
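      The exact checks added to mtd_erase() are not spelled out in this
      message.  The sketch below shows the kind of request validation such
      a function typically performs: bounds written so that addr + len
      cannot overflow, a read-only check, a no-op for zero length, and
      alignment to whole erase blocks.  struct flash_dev and
      check_erase() are hypothetical stand-ins, not the kernel's
      mtd_info/erase_info.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct flash_dev {
    uint64_t size;        /* total device size in bytes */
    uint32_t erasesize;   /* erase block size in bytes */
    bool writeable;
};

/* Validate an erase of [addr, addr + len).  Returns 0 if the request may
 * proceed (or is a no-op), otherwise a negative errno. */
int check_erase(const struct flash_dev *d, uint64_t addr, uint64_t len)
{
    /* len > size - addr avoids overflow in addr + len. */
    if (addr >= d->size || len > d->size - addr)
        return -EINVAL;
    if (!d->writeable)
        return -EROFS;
    if (len == 0)
        return 0;                         /* nothing to erase */
    if (addr % d->erasesize || len % d->erasesize)
        return -EINVAL;                   /* partial erase block */
    return 0;
}
```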
      * tag 'mtd/for-4.17' of git://git.infradead.org/linux-mtd: (89 commits)
        dt-bindings: fsl-quadspi: Add the example of two SPI NOR
        mtd: fsl-quadspi: Distinguish the mtd device names
        mtd: nand: Fix some function description mismatches in core.c
        mtd: fsl-quadspi: Remove unneeded driver.bus assignment
        mtd: rawnand: marvell: Rename ->ecc_clk into ->core_clk
        mtd: rawnand: s3c2410: enhance the probe function error path
        mtd: rawnand: tango: fix probe function error path
        mtd: rawnand: sh_flctl: fix the probe function error path
        mtd: rawnand: omap2: fix the probe function error path
        mtd: rawnand: mxc: fix probe function error path
        mtd: rawnand: denali: fix probe function error path
        mtd: rawnand: davinci: fix probe function error path
        mtd: rawnand: cafe: fix probe function error path
        mtd: rawnand: brcmnand: fix probe function error path
        mtd: rawnand: sunxi: Stop supporting ECC_HW_SYNDROME mode
        mtd: rawnand: marvell: Fix clock resource by adding a register clock
        mtd: ftl: Use DIV_ROUND_UP()
        mtd: Fix some function description mismatches in mtdcore.c
        mtd: physmap_of: update struct map_info's swap as per map requirement
        dt-bindings: mtd-physmap: Add endianness supports
    • Merge tag 'for-4.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · 83c7c18b
      Linus Torvalds authored
      Pull device mapper updates from Mike Snitzer:
       - DM core passthrough ioctl fix to retain reference to DM table, and
         that table's block devices, while issuing the ioctl to one of those
         block devices.
       - DM core passthrough ioctl fix to _not_ override the fmode_t used to
         issue the ioctl. Overriding by using the fmode_t that the block
         device was originally open with during DM table load is a liability.
       - Add DM core support for secure erase forwarding and update the DM
         linear and DM striped targets to support them.
       - A DM core 4.16 stable fix to allow abnormal IO (e.g. discard, write
         same, write zeroes) for targets that make use of the non-splitting IO
         variant (as is done for multipath or thinp when layered directly on
       - Allow DM targets to return a payload in response to a DM message that
         they are sent. This is useful for DM targets that would like to
         provide statistics data in response to DM messages.
       - Update DM bufio to support non-power-of-2 block sizes. Numerous other
         related changes prepare the DM bufio code for this support.
       - Fix DM crypt to use a bounded amount of memory across the entire
         system. This is to avoid OOM that can otherwise occur in response to
         certain pathological IO workloads (e.g. discarding a large DM crypt
       - Add a 'check_at_most_once' feature to the DM verity target to allow
         verity to be used on mobile devices that have very limited resources.
       - Fix the DM integrity target to fail early if a keyed algorithm (e.g.
         HMAC) is to be used but the key isn't set.
       - Add non-power-of-2 support to the DM unstripe target.
       - Eliminate the use of a Variable Length Array in the DM stripe target.
       - Update the DM log-writes target to record metadata (REQ_META flag).
       - DM raid fixes for its nosync status and some variable range issues.
      * tag 'for-4.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (28 commits)
        dm: remove fmode_t argument from .prepare_ioctl hook
        dm: hold DM table for duration of ioctl rather than use blkdev_get
        dm raid: fix parse_raid_params() variable range issue
        dm verity: make verity_for_io_block static
        dm verity: add 'check_at_most_once' option to only validate hashes once
        dm bufio: don't embed a bio in the dm_buffer structure
        dm bufio: support non-power-of-two block sizes
        dm bufio: use slab cache for dm_buffer structure allocations
        dm bufio: reorder fields in dm_buffer structure
        dm bufio: relax alignment constraint on slab cache
        dm bufio: remove code that merges slab caches
        dm bufio: get rid of slab cache name allocations
        dm bufio: move dm-bufio.h to include/linux/
        dm bufio: delete outdated comment
        dm: add support for secure erase forwarding
        dm: backfill abnormal IO support to non-splitting IO submission
        dm raid: fix nosync status
        dm mpath: use DM_MAPIO_SUBMITTED instead of magic number 0 in process_queued_bios()
        dm stripe: get rid of a Variable Length Array (VLA)
        dm log writes: record metadata flag for better flags record
    • Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 9022ca6b
      Linus Torvalds authored
      Pull misc vfs updates from Al Viro:
       "Assorted stuff, including Christoph's I_DIRTY patches"
      * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs: move I_DIRTY_INODE to fs.h
        ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
        ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
        gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) calls
        fs: fold open_check_o_direct into do_dentry_open
        vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalents
        vfs: make sure struct filename->iname is word-aligned
        get rid of pointless includes of fs_struct.h
        [poll] annotate SAA6588_CMD_POLL users
    • Merge remote-tracking branch 'lorenzo/pci/cadence' into next · 5f764419
      Bjorn Helgaas authored
      * lorenzo/pci/cadence:
        MAINTAINERS: Add missing /drivers/pci/cadence directory entry
    • mm,oom_reaper: check for MMF_OOM_SKIP before complaining · 97b1255c
      Tetsuo Handa authored
      I got "oom_reaper: unable to reap pid:" messages when the victim thread
      was blocked inside free_pgtables() (which occurred after returning from
      unmap_vmas() and setting MMF_OOM_SKIP).  We don't need to complain when
      exit_mmap() already set MMF_OOM_SKIP.
        Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
        oom_reaper: unable to reap pid:7558 (a.out)
        a.out           D13272  7558   6931 0x00100084
        Call Trace:
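A minimal userspace model of the check described above, as we read it: the reaper only prints "unable to reap" when the victim's mm does not already carry MMF_OOM_SKIP (which exit_mmap() sets). The bit index, `struct mm_model`, `model_exit_mmap()` and `should_complain()` are hypothetical stand-ins, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical bit index; the kernel defines MMF_OOM_SKIP in its own headers. */
#define MODEL_MMF_OOM_SKIP 21

struct mm_model {
	unsigned long flags; /* stand-in for mm_struct::flags */
};

/* The exit path marks the mm once it no longer needs reaping. */
static void model_exit_mmap(struct mm_model *mm)
{
	mm->flags |= 1UL << MODEL_MMF_OOM_SKIP;
}

/*
 * Called after the reaper gave up on a victim: complain only if
 * nobody has already flagged the mm as done.
 */
static bool should_complain(const struct mm_model *mm)
{
	return !(mm->flags & (1UL << MODEL_MMF_OOM_SKIP));
}
```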
      Link: http://lkml.kernel.org/r/201803221946.DHG65638.VFJHFtOSQLOMOF@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/ksm: fix interaction with THP · 77da2ba0
      Claudio Imbrenda authored
      This patch fixes a corner case for KSM.  When two pages belong or
      belonged to the same transparent hugepage, and they should be merged,
      KSM fails to split the page, and therefore no merging happens.
      This bug can be reproduced by:
      * making sure ksm is running (disabling ksmtuned if necessary)
      * enabling transparent hugepages
      * allocating a THP-aligned 1-THP-sized buffer
        e.g. on amd64: posix_memalign(&p, 1<<21, 1<<21)
      * filling it with the same values
        e.g. memset(p, 42, 1<<21)
      * performing madvise to make it mergeable
        e.g. madvise(p, 1<<21, MADV_MERGEABLE)
      * waiting for KSM to perform a few scans
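The userspace part of the reproduction steps above might look roughly like this in C. Starting ksmd and enabling THP are system settings (for example through /sys/kernel/mm/ksm/run and /sys/kernel/mm/transparent_hugepage/enabled) and are not shown. The helper name `make_mergeable_thp_buffer()` is hypothetical, and the 2 MiB THP size assumes amd64.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define THP_SIZE (1UL << 21) /* 2 MiB transparent hugepage on amd64 */

/*
 * Allocate a THP-aligned, THP-sized buffer, fill it with identical bytes
 * and mark it mergeable. Returns NULL if the allocation fails. *madv_ret
 * receives madvise()'s result, which is -1 on kernels built without KSM.
 */
static void *make_mergeable_thp_buffer(int *madv_ret)
{
	void *p = NULL;

	if (posix_memalign(&p, THP_SIZE, THP_SIZE) != 0)
		return NULL;
	memset(p, 42, THP_SIZE);                        /* same value everywhere */
	*madv_ret = madvise(p, THP_SIZE, MADV_MERGEABLE); /* hand it to ksmd */
	return p;
}
```

After this call, the buffer has to stay alive while ksmd runs a few scans. Merging can then be observed through the counters under /sys/kernel/mm/ksm/.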
      The expected outcome is that all the pages get merged (1 shared and
      the rest sharing); the actual outcome is that no pages get merged (1
      unshared and the rest volatile).
      The reason for this behaviour is that we increase the reference count
      once for both pages we want to merge, but if they belong to the same
      hugepage (or compound page), the reference counter used in both cases is
      the one of the head of the compound page.  This means that
      split_huge_page will find a value of the reference counter too high and
      will fail.
      This patch solves this problem by testing if the two pages to merge
      belong to the same hugepage when attempting to merge them.  If so, the
      hugepage is split safely.  This means that the hugepage is not split if
      not necessary.
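The refcount problem described above can be illustrated with a toy model: every subpage of a compound page resolves to the same head, so pinning two subpages bumps one counter twice. The split precondition used here ("exactly one pin above the baseline") is a simplification we chose for illustration. All names are hypothetical, and none of this reflects the kernel's real refcounting rules.

```c
#include <stdbool.h>

/* Toy compound page: a single refcount shared by all subpages. */
struct head_model {
	int refcount;
};

struct subpage_model {
	struct head_model *head; /* every subpage resolves to its head */
};

static void get_subpage(struct subpage_model *p) { p->head->refcount++; }
static void put_subpage(struct subpage_model *p) { p->head->refcount--; }

/* Simplified split precondition: exactly one pin beyond the baseline. */
static bool can_split(const struct head_model *h, int baseline)
{
	return h->refcount == baseline + 1;
}

/* The fix's test: do both pages belong to the same compound page? */
static bool same_compound(const struct subpage_model *a,
			  const struct subpage_model *b)
{
	return a->head == b->head;
}
```

With both subpages pinned, `can_split()` fails. Once the two pages are recognized as sharing a head and one pin is dropped, the split can proceed.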
      Link: http://lkml.kernel.org/r/1521548069-24758-1-git-send-email-imbrenda@linux.vnet.ibm.com
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Co-authored-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memblock.c: cast constant ULLONG_MAX to phys_addr_t · 644d87dc
      Stefan Agner authored
      This fixes a warning shown when compiling with clang on configurations
      where phys_addr_t is a 32-bit integer:
        mm/memblock.c:927:15: warning: implicit conversion from 'unsigned long long'
              to 'phys_addr_t' (aka 'unsigned int') changes value from
              18446744073709551615 to 4294967295 [-Wconstant-conversion]
                                        r->base : ULLONG_MAX;
        ./include/linux/kernel.h:30:21: note: expanded from macro 'ULLONG_MAX'
        #define ULLONG_MAX      (~0ULL)
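The fix is an explicit cast. The sketch below models a 32-bit `phys_addr_t` with `uint32_t` and uses its own `MODEL_ULLONG_MAX` macro so it does not clash with `<limits.h>`. The helper `region_base_or_max()` is hypothetical and only mirrors the shape of the flagged expression.

```c
#include <stdint.h>

/* Model of phys_addr_t on a configuration without 64-bit physical addresses. */
typedef uint32_t phys_addr_t;

#define MODEL_ULLONG_MAX (~0ULL) /* same definition as in the warning above */

/* Return the region base, or an all-ones sentinel when there is no region. */
static phys_addr_t region_base_or_max(int have_region, phys_addr_t base)
{
	/*
	 * Without the cast, clang's -Wconstant-conversion reports the
	 * implicit truncation of ~0ULL to 32 bits. The cast makes the
	 * truncation explicit; the resulting value is unchanged.
	 */
	return have_region ? base : (phys_addr_t)MODEL_ULLONG_MAX;
}
```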
      Link: http://lkml.kernel.org/r/20180319005645.29051-1-stefan@agner.ch
      Signed-off-by: Stefan Agner <stefan@agner.ch>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • headers: untangle kmemleak.h from mm.h · 514c6032
      Randy Dunlap authored
      Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
      reason.  It looks like it's only a convenience, so remove kmemleak.h
      from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that
      don't already #include it.  Also remove <linux/kmemleak.h> from source
      files that do not use it.
      This is tested on i386 allmodconfig and x86_64 allmodconfig.  It would
      be good to run it through the 0day bot for other $ARCHes.  I have
      neither the horsepower nor the storage space for the other $ARCHes.
      Update: This patch has been extensively build-tested by both the 0day
      bot & kisskb/ozlabs build farms.  Both of them reported 2 build failures
      for which patches are included here (in v2).
      [ slab.h is the second most used header file after module.h; kernel.h is
        right there with slab.h. There could be some minor error in the
        counting due to some #includes having comments after them and I didn't
        combine all of those. ]
      [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
      Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
      Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>	[2 build failures]
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>	[2 build failures]
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Wei Yongjun <weiyongjun1@huawei.com>
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: John Johansen <john.johansen@canonical.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/linux/mmdebug.h: make VM_WARN* non-rvals · 91241681
      Michal Hocko authored
      At present the construct
      	if (VM_WARN(...))
      will compile OK with CONFIG_DEBUG_VM=y and will fail with
      CONFIG_DEBUG_VM=n.  The reason is that VM_{WARN,BUG}* have always been
      special wrt.  {WARN/BUG}* and never generate any code when DEBUG_VM is
      disabled.  So we cannot really use it in conditionals.
      We considered changing things so that this construct works in both cases
      but that might cause unwanted code generation with CONFIG_DEBUG_VM=n.
      It is safer and simpler to make the build fail in both cases.
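One way to make a warning macro a non-rvalue in both configurations is to give it type `void` either way. The sketch below casts the result to `(void)` when debugging is on, and uses an unevaluated `sizeof` (type-checks only, generates no code) when it is off. `MODEL_DEBUG_VM`, `model_warn_on()` and `MODEL_VM_WARN_ON()` are hypothetical stand-ins for `CONFIG_DEBUG_VM`, `WARN_ON()` and `VM_WARN_ON()`. This is an illustration of the idea, not the patch itself.

```c
#include <stdio.h>

#ifndef MODEL_DEBUG_VM
#define MODEL_DEBUG_VM 1 /* hypothetical stand-in for CONFIG_DEBUG_VM */
#endif

static int model_warn_count;

/* Like WARN_ON(): reports, then returns the condition's truth value. */
static int model_warn_on(int cond, const char *text)
{
	if (cond) {
		model_warn_count++;
		fprintf(stderr, "warning: %s\n", text);
	}
	return cond;
}

#if MODEL_DEBUG_VM
/* The (void) cast discards the result, so the macro is not an rvalue. */
#define MODEL_VM_WARN_ON(cond) ((void)model_warn_on(!!(cond), #cond))
#else
/* No code is generated; sizeof only type-checks the expression. */
#define MODEL_VM_WARN_ON(cond) ((void)sizeof(!!(cond)))
#endif
```

With either setting, `if (MODEL_VM_WARN_ON(x))` is rejected by the compiler because a `void` expression cannot be used as a condition. This is the "fail in both cases" behaviour the changelog asks for.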
      [akpm@linux-foundation.org: changelog]
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_isolation.c: make start_isolate_page_range() fail if already isolated · 2c7452a0
      Mike Kravetz authored
      start_isolate_page_range() is used to set the migrate type of a set of
      pageblocks to MIGRATE_ISOLATE while attempting to start a migration
      operation.  It assumes that only one thread is calling it for the
      specified range.  This routine is used by CMA, memory hotplug and
      gigantic huge pages.  Each of these users synchronizes access to the
      range within their subsystem.  However, two subsystems (CMA and gigantic
      huge pages for example) could attempt operations on the same range.  If
      this happens, one thread may 'undo' the work another thread is doing.
      This can result in pageblocks being incorrectly left marked as
      MIGRATE_ISOLATE and therefore not available for page allocation.
      What is ideally needed is a way to synchronize access to a set of
      pageblocks that are undergoing isolation and migration.  The only thing
      we know about these pageblocks is that they are all in the same zone.  A
      per-node mutex is too coarse as we want to allow multiple operations on
      different ranges within the same zone concurrently.  Instead, we will
      use the migration type of the pageblocks themselves as a form of
      synchronization.
      start_isolate_page_range sets the migration type on a set of
      pageblocks going in order from the one associated with the smallest pfn
      to the largest pfn.  The zone lock is acquired to check and set the
      migration type.  When going through the list of pageblocks, check if
      MIGRATE_ISOLATE is already set.  If so, this indicates another thread is
      working on this pageblock.  We know exactly which pageblocks we set, so
      clean up by undoing those and return -EBUSY.
      This allows start_isolate_page_range to serve as a synchronization
      mechanism and will allow for more general use of callers making use of
      these interfaces.  Update comments in alloc_contig_range to reflect this
      new functionality.
      Each CPU holds the associated zone lock to modify or examine the
      migration type of a pageblock.  And, it will only examine/update a
      single pageblock per lock acquire/release cycle.
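The scheme above can be sketched as a userspace model: walk the pageblocks in ascending order, and take the lock once per block to check and set its type. If a block is already isolated, undo exactly the blocks this call changed and return -EBUSY. A pthread mutex stands in for the zone lock. `struct model_zone`, the `MODEL_MIGRATE_*` values and `model_start_isolate_range()` are hypothetical, not the kernel's data structures.

```c
#include <errno.h>
#include <pthread.h>

enum model_migratetype { MODEL_MIGRATE_MOVABLE, MODEL_MIGRATE_ISOLATE };

struct model_zone {
	pthread_mutex_t lock;            /* stand-in for zone->lock */
	enum model_migratetype *blocks;  /* one entry per pageblock */
	enum model_migratetype *saved;   /* previous type, used for undo */
};

/*
 * Isolate pageblocks [start, end). Each block is checked and updated in
 * its own lock/unlock cycle. Finding MIGRATE_ISOLATE already set means
 * another thread owns that block: roll back our blocks, return -EBUSY.
 */
static int model_start_isolate_range(struct model_zone *z, int start, int end)
{
	int i;

	for (i = start; i < end; i++) {
		pthread_mutex_lock(&z->lock);
		if (z->blocks[i] == MODEL_MIGRATE_ISOLATE) {
			pthread_mutex_unlock(&z->lock);
			goto undo;
		}
		z->saved[i] = z->blocks[i];
		z->blocks[i] = MODEL_MIGRATE_ISOLATE;
		pthread_mutex_unlock(&z->lock);
	}
	return 0;

undo:
	/* Only blocks start .. i-1 were set by this call. */
	while (--i >= start) {
		pthread_mutex_lock(&z->lock);
		z->blocks[i] = z->saved[i];
		pthread_mutex_unlock(&z->lock);
	}
	return -EBUSY;
}
```

For example, if one caller isolates blocks 3..4 and a second caller then asks for 0..3, the second caller sets 0..2, hits block 3, restores 0..2 and gets -EBUSY.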
      Link: http://lkml.kernel.org/r/20180309224731.16978-1-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: change return type to vm_fault_t · 1c8f4220
      Souptick Joarder authored
      The plan for these patches is to introduce the typedef, initially just
      as documentation ("These functions should return a VM_FAULT_ status").
      We'll trickle the patches to individual drivers/filesystems in through
      the maintainers, as far as possible.  Then we'll change the typedef to
      an unsigned int and break the compilation of any unconverted
      vmf_insert_page(), vmf_insert_mixed() and vmf_insert_pfn() are three
      newly added functions.  The drivers/filesystems whose fault(),
      huge_fault(), page_mkwrite() and pfn_mkwrite() handlers get converted
      will need them.  These functions return the correct VM_FAULT_ code
      based on the errno value.
      We've had bugs before where drivers returned -EFOO.  And we have this
      silly inefficiency where vm_insert_xxx() return an errno which (afaict)
      every driver then converts into a VM_FAULT code.  In many cases drivers
      failed to return the correct VM_FAULT code even when vm_insert_xxx()
      failed.  We have identified and cleaned up all those existing bugs and
      silly inefficiencies in drivers/filesystems by adding these three new
      inline wrappers.  As mentioned above, we will trickle those patches to
      individual drivers/filesystems in through maintainers after these three
      wrapper functions are merged.
      Eventually we can convert vm_insert_xxx() into vmf_insert_xxx() and
      remove these inline wrappers, but these are a good intermediate step.
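The errno-to-VM_FAULT translation that such a wrapper centralizes could look like the sketch below. The mapping is our assumption of a sensible policy: -ENOMEM becomes OOM, -EBUSY (entry already present) counts as success, and any other error becomes SIGBUS. The `MODEL_VM_FAULT_*` values and `model_errno_to_vm_fault()` are illustrative stand-ins, not the kernel's definitions.

```c
#include <errno.h>

/* Illustrative stand-ins; the kernel defines its own VM_FAULT_* values. */
typedef unsigned int model_vm_fault_t;
#define MODEL_VM_FAULT_OOM    0x0001u
#define MODEL_VM_FAULT_SIGBUS 0x0002u
#define MODEL_VM_FAULT_NOPAGE 0x0100u

/*
 * Translate the errno returned by a vm_insert_xxx()-style call into a
 * fault status, so each driver does not open-code (or forget) it.
 */
static model_vm_fault_t model_errno_to_vm_fault(int err)
{
	if (err == -ENOMEM)
		return MODEL_VM_FAULT_OOM;
	if (err < 0 && err != -EBUSY)
		return MODEL_VM_FAULT_SIGBUS;
	/* Success, or the entry was already there: the page is mapped. */
	return MODEL_VM_FAULT_NOPAGE;
}
```

A converted fault handler would then return `vmf_insert_xxx(...)` directly, instead of calling `vm_insert_xxx()` and translating the result itself.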
      Link: http://lkml.kernel.org/r/20180310162351.GA7422@jordon-HP-15-Notebook-PC
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes · d46078b2
      David Rientjes authored
      Since the 2.6 kernel, the oom killer has slightly biased away from
      CAP_SYS_ADMIN processes by discounting some of its memory usage in
      comparison to other processes.
      This has always been implicit and nothing exactly relies on the
      Gaurav noticed that __task_cred() can dereference a potentially freed
      pointer if the task under consideration is exiting because a reference
      to the task_struct is not held.
      Remove the CAP_SYS_ADMIN bias so that all processes are treated equally.
      If any CAP_SYS_ADMIN process would like to be biased against, it is
      always allowed to adjust /proc/pid/oom_score_adj.
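The scoring change can be shown as before/after arithmetic. The 3% figure comes from the commit title. Computing it as `points * 3 / 100` with integer division is our assumption about how "discounting some of its memory usage" was done. Both function names are hypothetical.

```c
#include <stdbool.h>

/* Before: CAP_SYS_ADMIN tasks had 3% knocked off their badness points. */
static unsigned long model_points_before(unsigned long points, bool sys_admin)
{
	if (sys_admin)
		points -= (points * 3) / 100;
	return points;
}

/* After: the capability is ignored; only oom_score_adj can bias a task. */
static unsigned long model_points_after(unsigned long points, bool sys_admin)
{
	(void)sys_admin;
	return points;
}
```

For a task scoring 1000 points, the old rule reported 970 for a CAP_SYS_ADMIN process. The new rule reports 1000 regardless of capability.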
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803071548510.6996@chino.kir.corp.google.com
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reported-by: Gaurav Kohli <gkohli@codeaurora.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memory · 5ecd9d40
      David Rientjes authored
      Kswapd will not wake up if per-zone watermarks are not failing or if too
      many previous attempts at background reclaim have failed.
      This can be true if there is a lot of free memory available.  For high-
      order allocations, kswapd is responsible for waking up kcompactd for
      background compaction.  If the zone is not below its watermarks or
      reclaim has recently failed (lots of free memory, nothing left to
      reclaim), kcompactd does not get woken up.
      When __GFP_DIRECT_RECLAIM is not allowed, allow kcompactd to still be
      woken up even if kswapd will not reclaim.  This allows high-order
      allocations, such as thp, to still trigger background compaction even
      when the zone has an abundance of free memory.
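The wakeup decision described above can be modeled as a small function. `MODEL_GFP_DIRECT_RECLAIM`, the enum and `model_wakeup_decision()` are hypothetical. The "order > 0" test is our reading of "high-order allocations", and the real kernel's conditions are more involved.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for __GFP_DIRECT_RECLAIM. */
#define MODEL_GFP_DIRECT_RECLAIM 0x1u

enum model_wakeup { MODEL_WAKE_NONE, MODEL_WAKE_KSWAPD, MODEL_WAKE_KCOMPACTD };

/*
 * reclaim_hopeless: too many previous background reclaim attempts failed.
 * balanced: the node is not below its watermarks (plenty of free memory).
 */
static enum model_wakeup model_wakeup_decision(unsigned int gfp, int order,
					       bool reclaim_hopeless,
					       bool balanced)
{
	if (reclaim_hopeless || balanced) {
		/*
		 * kswapd has nothing useful to do. A high-order request that
		 * may not compact directly still gets background compaction.
		 */
		if (order > 0 && !(gfp & MODEL_GFP_DIRECT_RECLAIM))
			return MODEL_WAKE_KCOMPACTD;
		return MODEL_WAKE_NONE;
	}
	/* Normal path: kswapd reclaims and may wake kcompactd itself. */
	return MODEL_WAKE_KSWAPD;
}
```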
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803111659420.209721@chino.kir.corp.google.com
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/fork.c: detect early free of a live mm · 3eda69c9
      Mark Rutland authored
      KASAN splats indicate that in some cases we free a live mm, then
      continue to access it, with potentially disastrous results.  This is
      likely due to a mismatched mmdrop() somewhere in the kernel, but so far
      the culprit remains elusive.
      Let's have __mmdrop() verify that the mm isn't live for the current
      task, similar to the existing check for init_mm.  This way, we can catch
      this class of issue earlier, and without requiring KASAN.
      Currently, idle_task_exit() leaves active_mm stale after it switches to
      init_mm.  This isn't harmful, but will trigger the new assertions, so we
      must adjust idle_task_exit() to update active_mm.
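A userspace model of the new sanity check and of why idle_task_exit() must update active_mm first. `model_mmdrop_allowed()` returns false wherever the new assertions would fire. All types and names here are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_mm { int users; };

struct model_task {
	struct model_mm *mm;        /* owned address space (NULL for idle) */
	struct model_mm *active_mm; /* address space currently loaded */
};

static struct model_mm model_init_mm;

/* False where the kernel would now BUG: freeing init_mm or a live mm. */
static bool model_mmdrop_allowed(const struct model_task *current_task,
				 const struct model_mm *mm)
{
	if (mm == &model_init_mm)
		return false;
	if (mm == current_task->mm || mm == current_task->active_mm)
		return false;
	return true;
}

/* Switch to init_mm and update active_mm so it is no longer stale. */
static void model_idle_task_exit(struct model_task *t)
{
	t->active_mm = &model_init_mm;
}
```

Dropping the old mm while active_mm still points at it would trip the check. After `model_idle_task_exit()` updates active_mm, the drop is allowed.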
      Link: http://lkml.kernel.org/r/20180312140103.19235-1-mark.rutland@arm.com
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: make counting of list_lru_one::nr_items lockless · 0c7c1bed
      Kirill Tkhai authored
      While reclaiming slab from a memcg, shrink_slab iterates over all
      registered shrinkers in the system, and tries to count and consume
      objects related to the cgroup.  Under memory pressure this behaves
      badly: I observe high system time and time spent in list_lru_count_one()
      for many processes on a RHEL7 kernel.
      This patch makes list_lru_node::memcg_lrus RCU-protected, which allows
      list_lru_count_one() to skip taking the spinlock.
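The read side of the change can be sketched with C11 atomics. An acquire load of the published array pointer stands in for `rcu_dereference()`, and the count is then read without any spinlock. The model omits the RCU grace period that the real code needs before freeing a replaced array. All names are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct model_lru_one {
	atomic_long nr_items; /* modified by writers under the node lock */
};

struct model_memcg_lrus {
	int nr;
	struct model_lru_one *lru[]; /* one entry per memcg index */
};

struct model_lru_node {
	/* Release-published / acquire-read: stand-in for an RCU pointer. */
	_Atomic(struct model_memcg_lrus *) memcg_lrus;
};

/* Lockless count: one consistent pointer snapshot, then an atomic read. */
static long model_count_one(struct model_lru_node *node, int memcg_idx)
{
	struct model_memcg_lrus *lrus =
		atomic_load_explicit(&node->memcg_lrus, memory_order_acquire);

	if (!lrus || memcg_idx >= lrus->nr)
		return 0;
	return atomic_load_explicit(&lrus->lru[memcg_idx]->nr_items,
				    memory_order_relaxed);
}
```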
      Shakeel Butt with the patch observes significant perf graph change.  He
      Setup: running a fork-bomb in a memcg of 200MiB on an 8GiB and 4 vcpu
      VM and recording the trace with 'perf record -g -a'.
      The trace without the patch:
      +  34.19%     fb.sh  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
      +  30.77%     fb.sh  [kernel.kallsyms]  [k] _raw_spin_lock
      +   3.53%     fb.sh  [kernel.kallsyms]  [k] list_lru_count_one
      +   2.26%     fb.sh  [kernel.kallsyms]  [k] super_cache_count
      +   1.68%     fb.sh  [kernel.kallsyms]  [k] shrink_slab
      +   0.59%     fb.sh  [kernel.kallsyms]  [k] down_read_trylock
      +   0.48%     fb.sh  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
      +   0.38%     fb.sh  [kernel.kallsyms]  [k] shrink_node_memcg
      +   0.32%     fb.sh  [kernel.kallsyms]  [k] queue_work_on
      +   0.26%     fb.sh  [kernel.kallsyms]  [k] count_shadow_nodes
      With the patch:
      +   0.16%     swapper  [kernel.kallsyms]    [k] default_idle
      +   0.13%     oom_reaper  [kernel.kallsyms]    [k] mutex_spin_on_owner
      +   0.05%     perf  [kernel.kallsyms]    [k] copy_user_generic_string
      +   0.05%     init.real  [kernel.kallsyms]    [k] wait_consider_task
      +   0.05%     kworker/0:0  [kernel.kallsyms]    [k] finish_task_switch
      +   0.04%     kworker/2:1  [kernel.kallsyms]    [k] finish_task_switch
      +   0.04%     kworker/3:1  [kernel.kallsyms]    [k] finish_task_switch
      +   0.04%     kworker/1:0  [kernel.kallsyms]    [k] finish_task_switch
      +   0.03%     binary  [kernel.kallsyms]    [k] copy_page
      Thanks Shakeel for the testing.
      [ktkhai@virtuozzo.com: v2]
        Link: http://lkml.kernel.org/r/151203869520.3915.2587549826865799173.stgit@localhost.localdomain
      Link: http://lkml.kernel.org/r/150583358557.26700.8490036563698102569.stgit@localhost.localdomain
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Tested-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/swap_state.c: make bool enable_vma_readahead and swap_vma_readahead() static · f5c754d6
      Colin Ian King authored
      The bool enable_vma_readahead and swap_vma_readahead() are local to the
      source and do not need to be in global scope, so make them static.
      Cleans up sparse warnings:
        mm/swap_state.c:41:6: warning: symbol 'enable_vma_readahead' was not declared. Should it be static?
        mm/swap_state.c:742:13: warning: symbol 'swap_vma_readahead' was not declared. Should it be static?
      Link: http://lkml.kernel.org/r/20180223164852.5159-1-colin.king@canonical.com
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: "Huang, Ying" <ying.huang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • block_invalidatepage(): only release page if the full page was invalidated · 3172485f
      Jeff Moyer authored
      Prior to commit d47992f8 ("mm: change invalidatepage prototype to
      accept length"), an offset of 0 meant that the full page was being
      invalidated.  After that commit, we need to instead check the length.
      Jan said:
      : The only possible issue is that try_to_release_page() was called more
      : often than necessary.  Otherwise the issue is harmless but still it's good
      : to have this fixed.
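The changed test can be sketched as two predicates. The old one treats offset 0 as "whole page". The new one requires the invalidated range to cover the whole page. A 4 KiB page size is assumed, and both function names are hypothetical.

```c
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u /* assumed 4 KiB pages */

/* Before: offset 0 was taken to mean the full page was invalidated. */
static bool model_release_before_fix(unsigned int offset, unsigned int length)
{
	(void)length;
	return offset == 0;
}

/* After: release only when [offset, offset + length) covers the page. */
static bool model_release_after_fix(unsigned int offset, unsigned int length)
{
	return offset == 0 && length == MODEL_PAGE_SIZE;
}
```

Invalidating only the first 512 bytes used to trigger a release attempt. This matches Jan's note that try_to_release_page() was called more often than necessary.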
      Link: http://lkml.kernel.org/r/x49fu5rtnzs.fsf@segfault.boston.devel.redhat.com
      Fixes: d47992f8 ("mm: change invalidatepage prototype to accept length")
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/swap.c: remove @cold parameter description for release_pages() · 002843de
      Mike Rapoport authored
      The 'cold' parameter was removed from release_pages function by commit
      c6f92f9f ("mm: remove cold parameter for release_pages").
      Update the description to match the code.
      Link: http://lkml.kernel.org/r/1519585191-10180-3-git-send-email-rppt@linux.vnet.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/nommu: remove description of alloc_vm_area · e48e3c59
      Mike Rapoport authored
      The alloc_vm_area in nommu is a stub, but its description states it
      allocates kernel address space.  Remove the description to make the code
      and the documentation agree.
      Link: http://lkml.kernel.org/r/1519585191-10180-2-git-send-email-rppt@linux.vnet.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: drop max_zpage_size and use zs_huge_class_size() · 60f5921a
      Sergey Senozhatsky authored
      Remove ZRAM's enforced "huge object" value and use zsmalloc huge-class
      watermark instead, which makes more sense.
      - I used a 1G zram device, LZO compression back-end, original
        data set size was 444MB. Looking at zsmalloc classes stats, the
        test ended up being pretty fair.
      Before the patch:
      zram mm_stat
      498978816 191482495 199831552        0 199831552    15634        0
      zsmalloc classes
       class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
         151  2448           0            0          1240       1240        744                3        0
         168  2720           0            0          4200       4200       2800                2        0
         190  3072           0            0         10100      10100       7575                3        0
         202  3264           0            0           380        380        304                4        0
         254  4096           0            0         10620      10620      10620                1        0
       Total                 7           46        106982     106187      48787                         0
      After the patch:
      zram mm_stat
      498978816 182579184 194248704        0 194248704    15628        0
      zsmalloc classes
       class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
         151  2448           0            0          1240       1240        744                3        0
         168  2720           0            0          4200       4200       2800                2        0
         190  3072           0            0         10100      10100       7575                3        0
         202  3264           0            0          7180       7180       5744                4        0
         254  4096           0            0          3820       3820       3820                1        0
       Total                 8           45        106959     106193      47424                         0
      As we can see, we reduced the number of objects stored in class-4096,
      because a huge number of objects which we previously forcibly stored in
      class-4096 are now stored in non-huge class-3264.  This results in lower
      memory consumption:
      - zsmalloc now uses 47424 physical pages, which is less than 48787 pages
        zsmalloc used before.
      - objects that we store in class-3264 share zspages.  That's why overall
        the number of pages that both class-4096 and class-3264 consumed went
        down from 10924 to 9564.
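The page counts quoted above can be checked against the two tables. This assumes 4 KiB pages, and assumes that the third mm_stat field is mem_used_total in bytes (our reading of the zram documentation, not stated in this changelog). The second line also shows why class-3264 objects share zspages: five of them fit in one 4-page zspage.

```latex
\begin{aligned}
\text{class-3264} + \text{class-4096 pages:}\quad & 304 + 10620 = 10924 \;\longrightarrow\; 5744 + 3820 = 9564\\
\text{class-3264 objects per 4-page zspage:}\quad & \lfloor 4 \cdot 4096 / 3264 \rfloor = 5, \qquad 7180 / 5 \cdot 4 = 5744\\
\text{total saving:}\quad & 48787 - 47424 = 1363 \text{ pages}, \qquad 1363 \cdot 4096 = 5582848 \text{ bytes}\\
& = 199831552 - 194248704
\end{aligned}
```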
      [sergey.senozhatsky.work@gmail.com: add pool param to zs_huge_class_size()]
        Link: http://lkml.kernel.org/r/20180314081833.1096-3-sergey.senozhatsky@gmail.com
      Link: http://lkml.kernel.org/r/20180306070639.7389-3-sergey.senozhatsky@gmail.com
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zsmalloc: introduce zs_huge_class_size() · 010b495e
      Sergey Senozhatsky authored
      Patch series "zsmalloc/zram: drop zram's max_zpage_size", v3.
      ZRAM's max_zpage_size is a bad thing.  It forces zsmalloc to store
      normal objects as huge ones, which results in bigger zsmalloc memory
      usage.  Drop it and use the actual zsmalloc huge-class value when
      deciding whether the object is huge or not.
      This patch (of 2):
      Not every object can share its zspage with other objects, e.g.  when
      the object is as big as a zspage or nearly as big as a zspage.  For such
      objects zsmalloc has a so called huge class - every object which belongs
      to the huge class consumes the entire zspage (which consists of a single
      physical page).  On an x86_64, PAGE_SHIFT 12 box, the first non-huge
      class size is 3264, so starting down from size 3264, objects can share
      page(-s) and thus minimize memory wastage.
      ZRAM, however, has its own statically defined watermark for huge
      objects, namely "3 * PAGE_SIZE / 4 = 3072", and forcibly stores every
      object larger than this watermark (3072) as a PAGE_SIZE object, in other
      words, to a huge class, while zsmalloc can keep some of those objects in
      non-huge classes.  This results in increased memory consumption.
      zsmalloc knows better if the object is huge or not.  Introduce
      zs_huge_class_size() function which tells if the given object can be
      stored in one of non-huge classes or not.  This will let us drop
      ZRAM's huge object watermark and fully rely on zsmalloc when we decide
      if the object is huge.
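The difference between the two rules can be sketched as two predicates. The old rule is zram's fixed 3 * PAGE_SIZE / 4 watermark. The new rule compares against a threshold reported by the allocator; 3264 is the x86_64 figure quoted above and is passed in as a parameter. The function names are hypothetical. The real zs_huge_class_size() takes a pool argument, as the note below mentions.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* PAGE_SHIFT 12 */

/* Old zram rule: anything above 3 * PAGE_SIZE / 4 (3072) is stored as huge. */
static bool model_is_huge_old(size_t comp_len)
{
	return comp_len > 3 * MODEL_PAGE_SIZE / 4;
}

/*
 * New rule: defer to the allocator. largest_shared_class stands in for
 * what zsmalloc reports: objects up to this size can share zspages.
 */
static bool model_is_huge_new(size_t comp_len, size_t largest_shared_class)
{
	return comp_len > largest_shared_class;
}
```

A 3200-byte compressed object was forced into the huge class by the old watermark. Under the allocator's own threshold, the same object fits in the shared class-3264.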
      [sergey.senozhatsky.work@gmail.com: add pool param to zs_huge_class_size()]
        Link: http://lkml.kernel.org/r/20180314081833.1096-2-sergey.senozhatsky@gmail.com
      Link: http://lkml.kernel.org/r/20180306070639.7389-2-sergey.senozhatsky@gmail.com
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>