1. 21 Feb, 2019 1 commit
  2. 13 Feb, 2019 1 commit
    • Qian Cai's avatar
      Revert "mm: use early_pfn_to_nid in page_ext_init" · 2f1ee091
      Qian Cai authored
      This reverts commit fe53ca54 ("mm: use early_pfn_to_nid in
      page_ext_init").
      
      When booting a system with "page_owner=on",
      
      start_kernel
        page_ext_init
          invoke_init_callbacks
            init_section_page_ext
              init_page_owner
                init_early_allocated_pages
                  init_zones_in_node
                    init_pages_in_zone
                      lookup_page_ext
                        page_to_nid
      
      The issue here is that page_to_nid() will not work since some page flags
      have no node information until later in page_alloc_init_late() due to
      DEFERRED_STRUCT_PAGE_INIT.  Hence, it could trigger an out-of-bounds
      access with an invalid nid.
      
        UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50
        index 7 is out of range for type 'zone [5]'
      
      Also, kernel will panic since flags were poisoned earlier with,
      
      CONFIG_DEBUG_VM_PGFLAGS=y
      CONFIG_NODE_NOT_IN_PAGE_FLAGS=n
      
      start_kernel
        setup_arch
          pagetable_init
            paging_init
              sparse_init
                sparse_init_nid
                  memblock_alloc_try_nid_raw
      
      It did not handle it well in init_pages_in_zone() which ends up calling
      page_to_nid().
      
        page:ffffea0004200000 is uninitialized and poisoned
        raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
        raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
        page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
        page_owner info is not active (free page?)
        kernel BUG at include/linux/mm.h:990!
        RIP: 0010:init_page_owner+0x486/0x520
      
      This means that assumptions behind commit fe53ca54 ("mm: use
      early_pfn_to_nid in page_ext_init") are incomplete.  Therefore, revert
      the commit for now.  A proper way to move the page_owner initialization
      to sooner is to hook into memmap initialization.
      
      Link: http://lkml.kernel.org/r/20190115202812.75820-1-cai@lca.pwSigned-off-by: default avatarQian Cai <cai@lca.pw>
      Acked-by: default avatarMichal Hocko <mhocko@kernel.org>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2f1ee091
  3. 01 Feb, 2019 2 commits
  4. 14 Jan, 2019 1 commit
    • Paul Burton's avatar
      kbuild: Disable LD_DEAD_CODE_DATA_ELIMINATION with ftrace & GCC <= 4.7 · 16fd20aa
      Paul Burton authored
      When building using GCC 4.7 or older, -ffunction-sections & the -pg flag
      used by ftrace are incompatible. This causes warnings or build failures
      (where -Werror applies) such as the following:
      
        arch/mips/generic/init.c:
          error: -ffunction-sections disabled; it makes profiling impossible
      
      This used to be taken into account by the ordering of calls to cc-option
      from within the top-level Makefile, which was introduced by commit
      90ad4052 ("kbuild: avoid conflict between -ffunction-sections and
      -pg on gcc-4.7"). Unfortunately this was broken when the
      CONFIG_LD_DEAD_CODE_DATA_ELIMINATION cc-option check was moved to
      Kconfig in commit e85d1d65 ("kbuild: test dead code/data elimination
      support in Kconfig"), because the flags used by this check no longer
      include -pg.
      
      Fix this by not allowing CONFIG_LD_DEAD_CODE_DATA_ELIMINATION to be
      enabled at the same time as ftrace/CONFIG_FUNCTION_TRACER when building
      using GCC 4.7 or older.
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Fixes: e85d1d65 ("kbuild: test dead code/data elimination support in Kconfig")
      Reported-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      16fd20aa
  5. 06 Jan, 2019 1 commit
    • Masahiro Yamada's avatar
      jump_label: move 'asm goto' support test to Kconfig · e9666d10
      Masahiro Yamada authored
      Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
      
      The jump label is controlled by HAVE_JUMP_LABEL, which is defined
      like this:
      
        #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
        # define HAVE_JUMP_LABEL
        #endif
      
      We can improve this by testing 'asm goto' support in Kconfig, then
      make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
      
      Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
      match to the real kernel capability.
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      e9666d10
  6. 04 Jan, 2019 3 commits
  7. 28 Dec, 2018 1 commit
    • Qian Cai's avatar
      debugobjects: call debug_objects_mem_init eariler · a9ee3a63
      Qian Cai authored
      The current value of the early boot static pool size, 1024 is not big
      enough for systems with large number of CPUs with timer or/and workqueue
      objects selected.  As the results, systems have 60+ CPUs with both timer
      and workqueue objects enabled could trigger "ODEBUG: Out of memory.
      ODEBUG disabled".
      
      Some debug objects are allocated during the early boot.  Enabling some
      options like timers or workqueue objects may increase the size required
      significantly with large number of CPUs.  For example,
      
      CONFIG_DEBUG_OBJECTS_TIMERS:
      No. CPUs x 2 (worker pool) objects:
      start_kernel
        workqueue_init_early
          init_worker_pool
            init_timer_key
              debug_object_init
      
      plus No. CPUs objects (CONFIG_HIGH_RES_TIMERS):
      sched_init
        hrtick_rq_init
          hrtimer_init
      
      CONFIG_DEBUG_OBJECTS_WORK:
      No. CPUs objects:
      vmalloc_init
        __init_work
      
      plus No. CPUs x 6 (workqueue) objects:
      workqueue_init_early
        alloc_workqueue
          __alloc_workqueue_key
            alloc_and_link_pwqs
              init_pwq
      
      Also, plus No. CPUs objects:
      perf_event_init
        __init_srcu_struct
          init_srcu_struct_fields
            init_srcu_struct_nodes
              __init_work
      
      However, none of the things are actually used or required before
      debug_objects_mem_init() is invoked, so just move the call right before
      vmalloc_init().
      
      According to tglx, "the reason why the call is at this place in
      start_kernel() is historical.  It's because back in the days when
      debugobjects were added the memory allocator was enabled way later than
      today."
      
      Link: http://lkml.kernel.org/r/20181126102407.1836-1-cai@gmx.usSigned-off-by: default avatarQian Cai <cai@gmx.us>
      Suggested-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a9ee3a63
  8. 20 Dec, 2018 1 commit
  9. 14 Dec, 2018 1 commit
  10. 30 Nov, 2018 3 commits
    • Li Zhijian's avatar
      initramfs: clean old path before creating a hardlink · 7c0950d4
      Li Zhijian authored
      sys_link() can fail due to the new path already existing.  This case
      ofen occurs when we use a concated initrd, for example:
      
      1) prepare a basic rootfs, it contains a regular files rc.local
      lizhijian@:~/yocto-tiny-i386-2016-04-22$ cat etc/rc.local
       #!/bin/sh
       echo "Running /etc/rc.local..."
      yocto-tiny-i386-2016-04-22$ find . | sed 's,^\./,,' | cpio -o -H newc | gzip -n -9 >../rootfs.cgz
      
      2) create a extra initrd which also includes a etc/rc.local
      lizhijian@:~/lkp-x86_64/etc$ echo "append initrd" >rc.local
      lizhijian@:~/lkp/lkp-x86_64/etc$ cat rc.local
      append initrd
      lizhijian@:~/lkp/lkp-x86_64/etc$ ln rc.local rc.local.hardlink
      append initrd
      lizhijian@:~/lkp/lkp-x86_64/etc$ stat rc.local rc.local.hardlink
        File: 'rc.local'
        Size: 14        	Blocks: 8          IO Block: 4096   regular file
      Device: 801h/2049d	Inode: 11296086    Links: 2
      Access: (0664/-rw-rw-r--)  Uid: ( 1002/lizhijian)   Gid: ( 1002/lizhijian)
      Access: 2018-11-15 16:08:28.654464815 +0800
      Modify: 2018-11-15 16:07:57.514903210 +0800
      Change: 2018-11-15 16:08:24.180228872 +0800
       Birth: -
        File: 'rc.local.hardlink'
        Size: 14        	Blocks: 8          IO Block: 4096   regular file
      Device: 801h/2049d	Inode: 11296086    Links: 2
      Access: (0664/-rw-rw-r--)  Uid: ( 1002/lizhijian)   Gid: ( 1002/lizhijian)
      Access: 2018-11-15 16:08:28.654464815 +0800
      Modify: 2018-11-15 16:07:57.514903210 +0800
      Change: 2018-11-15 16:08:24.180228872 +0800
       Birth: -
      
      lizhijian@:~/lkp/lkp-x86_64$ find . | sed 's,^\./,,' | cpio -o -H newc | gzip -n -9 >../rc-local.cgz
      lizhijian@:~/lkp/lkp-x86_64$ gzip -dc ../rc-local.cgz | cpio -t
      .
      etc
      etc/rc.local.hardlink <<< it will be extracted first at this initrd
      etc/rc.local
      
      3) concate 2 initrds and boot
      lizhijian@:~/lkp$ cat rootfs.cgz rc-local.cgz >concate-initrd.cgz
      lizhijian@:~/lkp$ qemu-system-x86_64 -nographic -enable-kvm -cpu host -smp 1 -m 1024 -kernel ~/lkp/linux/arch/x86/boot/bzImage -append "console=ttyS0 earlyprint=ttyS0 ignore_loglevel" -initrd ./concate-initr.cgz -serial stdio -nodefaults
      
      In this case, sys_link(2) will fail and return -EEXIST, so we can only get
      the rc.local at rootfs.cgz instead of rc-local.cgz
      
      [akpm@linux-foundation.org: move code to avoid forward declaration]
      Link: http://lkml.kernel.org/r/1542352368-13299-1-git-send-email-lizhijian@cn.fujitsu.comSigned-off-by: default avatarLi Zhijian <lizhijian@cn.fujitsu.com>
      Cc: Philip Li <philip.li@intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Li Zhijian <zhijianx.li@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7c0950d4
    • Johannes Weiner's avatar
      psi: make disabling/enabling easier for vendor kernels · e0c27447
      Johannes Weiner authored
      Mel Gorman reports a hackbench regression with psi that would prohibit
      shipping the suse kernel with it default-enabled, but he'd still like
      users to be able to opt in at little to no cost to others.
      
      With the current combination of CONFIG_PSI and the psi_disabled bool set
      from the commandline, this is a challenge.  Do the following things to
      make it easier:
      
      1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros
         to enable CONFIG_PSI in their kernel but leave the feature disabled
         unless a user requests it at boot-time.
      
         To avoid double negatives, rename psi_disabled= to psi=.
      
      2. Make psi_disabled a static branch to eliminate any branch costs
         when the feature is disabled.
      
      In terms of numbers before and after this patch, Mel says:
      
      : The following is a comparision using CONFIG_PSI=n as a baseline against
      : your patch and a vanilla kernel
      :
      :                          4.20.0-rc4             4.20.0-rc4             4.20.0-rc4
      :                 kconfigdisable-v1r1                vanilla        psidisable-v1r1
      : Amean     1       1.3100 (   0.00%)      1.3923 (  -6.28%)      1.3427 (  -2.49%)
      : Amean     3       3.8860 (   0.00%)      4.1230 *  -6.10%*      3.8860 (  -0.00%)
      : Amean     5       6.8847 (   0.00%)      8.0390 * -16.77%*      6.7727 (   1.63%)
      : Amean     7       9.9310 (   0.00%)     10.8367 *  -9.12%*      9.9910 (  -0.60%)
      : Amean     12     16.6577 (   0.00%)     18.2363 *  -9.48%*     17.1083 (  -2.71%)
      : Amean     18     26.5133 (   0.00%)     27.8833 *  -5.17%*     25.7663 (   2.82%)
      : Amean     24     34.3003 (   0.00%)     34.6830 (  -1.12%)     32.0450 (   6.58%)
      : Amean     30     40.0063 (   0.00%)     40.5800 (  -1.43%)     41.5087 (  -3.76%)
      : Amean     32     40.1407 (   0.00%)     41.2273 (  -2.71%)     39.9417 (   0.50%)
      :
      : It's showing that the vanilla kernel takes a hit (as the bisection
      : indicated it would) and that disabling PSI by default is reasonably
      : close in terms of performance for this particular workload on this
      : particular machine so;
      
      Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.orgSigned-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Tested-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reported-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e0c27447
    • Sai Praneeth Prakhya's avatar
      x86/efi: Move efi_<reserve/free>_boot_services() to arch/x86 · 47c33a09
      Sai Praneeth Prakhya authored
      efi_<reserve/free>_boot_services() are x86 specific quirks and as such
      should be in asm/efi.h, so move them from linux/efi.h. Also, call
      efi_free_boot_services() from __efi_enter_virtual_mode() as it is x86
      specific call and ideally shouldn't be part of init/main.c
      Signed-off-by: default avatarSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arend van Spriel <arend.vanspriel@broadcom.com>
      Cc: Bhupesh Sharma <bhsharma@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Eric Snowberg <eric.snowberg@oracle.com>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Jon Hunter <jonathanh@nvidia.com>
      Cc: Julien Thierry <julien.thierry@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Nathan Chancellor <natechancellor@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: YiFei Zhu <zhuyifei1999@gmail.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181129171230.18699-7-ard.biesheuvel@linaro.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      47c33a09
  11. 27 Nov, 2018 1 commit
  12. 26 Nov, 2018 2 commits
  13. 19 Nov, 2018 1 commit
  14. 07 Nov, 2018 1 commit
    • Jens Axboe's avatar
      block: remove dead elevator code · a1ce35fa
      Jens Axboe authored
      This removes a bunch of core and elevator related code. On the core
      front, we remove anything related to queue running, draining,
      initialization, plugging, and congestions. We also kill anything
      related to request allocation, merging, retrieval, and completion.
      
      Remove any checking for single queue IO schedulers, as they no
      longer exist. This means we can also delete a bunch of code related
      to request issue, adding, completion, etc - and all the SQ related
      ops and helpers.
      
      Also kill the load_default_modules(), as all that did was provide
      for a way to load the default single queue elevator.
      Tested-by: default avatarMing Lei <ming.lei@redhat.com>
      Reviewed-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a1ce35fa
  15. 31 Oct, 2018 5 commits
    • Mike Rapoport's avatar
      memblock: stop using implicit alignment to SMP_CACHE_BYTES · 7e1c4e27
      Mike Rapoport authored
      When a memblock allocation APIs are called with align = 0, the alignment
      is implicitly set to SMP_CACHE_BYTES.
      
      Implicit alignment is done deep in the memblock allocator and it can
      come as a surprise.  Not that such an alignment would be wrong even
      when used incorrectly but it is better to be explicit for the sake of
      clarity and the prinicple of the least surprise.
      
      Replace all such uses of memblock APIs with the 'align' parameter
      explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
      in the memblock internal allocation functions.
      
      For the case when memblock APIs are used via helper functions, e.g.  like
      iommu_arena_new_node() in Alpha, the helper functions were detected with
      Coccinelle's help and then manually examined and updated where
      appropriate.
      
      The direct memblock APIs users were updated using the semantic patch below:
      
      @@
      expression size, min_addr, max_addr, nid;
      @@
      (
      |
      - memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
      + memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr,
      nid)
      |
      - memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
      + memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr,
      nid)
      |
      - memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
      + memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
      |
      - memblock_alloc(size, 0)
      + memblock_alloc(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_raw(size, 0)
      + memblock_alloc_raw(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_from(size, 0, min_addr)
      + memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
      |
      - memblock_alloc_nopanic(size, 0)
      + memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_low(size, 0)
      + memblock_alloc_low(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_low_nopanic(size, 0)
      + memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_from_nopanic(size, 0, min_addr)
      + memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
      |
      - memblock_alloc_node(size, 0, nid)
      + memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
      )
      
      [mhocko@suse.com: changelog update]
      [akpm@linux-foundation.org: coding-style fixes]
      [rppt@linux.ibm.com: fix missed uses of implicit alignment]
        Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
      Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Suggested-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: Paul Burton <paul.burton@mips.com>	[MIPS]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7e1c4e27
    • Mike Rapoport's avatar
      mm: remove include/linux/bootmem.h · 57c8a661
      Mike Rapoport authored
      Move remaining definitions and declarations from include/linux/bootmem.h
      into include/linux/memblock.h and remove the redundant header.
      
      The includes were replaced with the semantic patch below and then
      semi-automated removal of duplicated '#include <linux/memblock.h>
      
      @@
      @@
      - #include <linux/bootmem.h>
      + #include <linux/memblock.h>
      
      [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
      [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
      [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
        Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
      Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      57c8a661
    • Mike Rapoport's avatar
      memblock: replace alloc_bootmem with memblock_alloc · 2a5bda5a
      Mike Rapoport authored
      The alloc_bootmem(size) is a shortcut for allocation of SMP_CACHE_BYTES
      aligned memory. When the align parameter of memblock_alloc() is 0, the
      alignment is implicitly set to SMP_CACHE_BYTES and thus alloc_bootmem(size)
      and memblock_alloc(size, 0) are equivalent.
      
      The conversion is done using the following semantic patch:
      
      @@
      expression size;
      @@
      - alloc_bootmem(size)
      + memblock_alloc(size, 0)
      
      Link: http://lkml.kernel.org/r/1536927045-23536-22-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2a5bda5a
    • Mike Rapoport's avatar
      memblock: remove _virt from APIs returning virtual address · eb31d559
      Mike Rapoport authored
      The conversion is done using
      
      sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
      	$(git grep -l memblock_virt_alloc)
      
      Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      eb31d559
    • Nikolaus Voss's avatar
      init/do_mounts.c: add root=PARTLABEL=<name> support · f027c34d
      Nikolaus Voss authored
      Support referencing the root partition label from GPT as argument
      to the root= option on the kernel command line in analogy to
      referencing the partition uuid as root=PARTUUID=<uuid>.
      
      Specifying the partition label instead of the uuid is often much
      easier, e.g. in embedded environments when there is an
      A/B rootfs partition scheme for interruptible firmware updates
      (i.e. rootfsA/ rootfsB).
      
      The partition label can be queried with the blkid command.
      
      Link: http://lkml.kernel.org/r/20180822060904.828E510665E@pc-niv.weinmann.comSigned-off-by: default avatarNikolaus Voss <nikolaus.voss@loewensteinmedical.de>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Sasha Levin <Alexander.Levin@microsoft.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f027c34d
  16. 26 Oct, 2018 2 commits
  17. 09 Oct, 2018 1 commit
    • Martin Schwidefsky's avatar
      init: add arch_call_rest_init to allow stack switching · 53c99bd6
      Martin Schwidefsky authored
      With CONFIG_VMAP_STACK=y the kernel stack of all tasks should be
      allocated in the vmalloc space. The initial stack used for all
      the early init code is in the init_thread_union. To be able to
      switch from this early stack to a properly allocated stack
      from vmalloc the architecture needs a switch-over point.
      
      Introduce the arch_call_rest_init() function with a weak definition
      in init/main.c with the only purpose to call rest_init() from the
      end of start_kernel(). The architecture override can then do the
      necessary magic to switch to the new vmalloc'ed stack.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      53c99bd6
  18. 02 Oct, 2018 1 commit
  19. 27 Sep, 2018 1 commit
    • Ard Biesheuvel's avatar
      jump_label: Annotate entries that operate on __init code earlier · 19483677
      Ard Biesheuvel authored
      Jump table entries are mostly read-only, with the exception of the
      init and module loader code that defuses entries that point into init
      code when the code being referred to is freed.
      
      For robustness, it would be better to move these entries into the
      ro_after_init section, but clearing the 'code' member of each jump
      table entry referring to init code at module load time races with the
      module_enable_ro() call that remaps the ro_after_init section read
      only, so we'd like to do it earlier.
      
      So given that whether such an entry refers to init code can be decided
      much earlier, we can pull this check forward. Since we may still need
      the code entry at this point, let's switch to setting a low bit in the
      'key' member just like we do to annotate the default state of a jump
      table entry.
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-s390@vger.kernel.org
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Link: https://lkml.kernel.org/r/20180919065144.25010-8-ard.biesheuvel@linaro.org
      19483677
  20. 23 Aug, 2018 1 commit
  21. 22 Aug, 2018 5 commits
  22. 17 Aug, 2018 1 commit
  23. 12 Aug, 2018 1 commit
    • Linus Torvalds's avatar
      init: rename and re-order boot_cpu_state_init() · b5b1404d
      Linus Torvalds authored
      This is purely a preparatory patch for upcoming changes during the 4.19
      merge window.
      
      We have a function called "boot_cpu_state_init()" that isn't really
      about the bootup cpu state: that is done much earlier by the similarly
      named "boot_cpu_init()" (note lack of "state" in name).
      
      This function initializes some hotplug CPU state, and needs to run after
      the percpu data has been properly initialized.  It even has a comment to
      that effect.
      
      Except it _doesn't_ actually run after the percpu data has been properly
      initialized.  On x86 it happens to do that, but on at least arm and
      arm64, the percpu base pointers are initialized by the arch-specific
      'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
      
      This had some unexpected results, and in particular we have a patch
      pending for the merge window that did the obvious cleanup of using
      'this_cpu_write()' in the cpu hotplug init code:
      
        -       per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
        +       this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
      
      which is obviously the right thing to do.  Except because of the
      ordering issue, it actually failed miserably and unexpectedly on arm64.
      
      So this just fixes the ordering, and changes the name of the function to
      be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
      hotplug state, because the core CPU state was supposed to have already
      been done earlier.
      
      Marked for stable, since the (not yet merged) patch that will show this
      problem is marked for stable.
      Reported-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reported-by: default avatarMian Yousaf Kaukab <yousaf.kaukab@suse.com>
      Suggested-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b5b1404d
  24. 10 Aug, 2018 1 commit
  25. 09 Aug, 2018 1 commit
    • Eric W. Biederman's avatar
      signal: Don't restart fork when signals come in. · c3ad2c3b
      Eric W. Biederman authored
      Wen Yang <wen.yang99@zte.com.cn> and majiang <ma.jiang@zte.com.cn>
      report that a periodic signal received during fork can cause fork to
      continually restart preventing an application from making progress.
      
      The code was being overly pessimistic.  Fork needs to guarantee that a
      signal sent to multiple processes is logically delivered before the
      fork and just to the forking process or logically delivered after the
      fork to both the forking process and it's newly spawned child.  For
      signals like periodic timers that are always delivered to a single
      process fork can safely complete and let them appear to logically
      delivered after the fork().
      
      While examining this issue I also discovered that fork today will miss
      signals delivered to multiple processes during the fork and handled by
      another thread.  Similarly the current code will also miss blocked
      signals that are delivered to multiple process, as those signals will
      not appear pending during fork.
      
      Add a list of each thread that is currently forking, and keep on that
      list a signal set that records all of the signals sent to multiple
      processes.  When fork completes initialize the new processes
      shared_pending signal set with it.  The calculate_sigpending function
      will see those signals and set TIF_SIGPENDING causing the new task to
      take the slow path to userspace to handle those signals.  Making it
      appear as if those signals were received immediately after the fork.
      
      It is not possible to send real time signals to multiple processes and
      exceptions don't go to multiple processes, which means that that are
      no signals sent to multiple processes that require siginfo.  This
      means it is safe to not bother collecting siginfo on signals sent
      during fork.
      
      The sigaction of a child of fork is initially the same as the
      sigaction of the parent process.  So a signal the parent ignores the
      child will also initially ignore.  Therefore it is safe to ignore
      signals sent to multiple processes and ignored by the forking process.
      
      Signals sent to only a single process or only a single thread and delivered
      during fork are treated as if they are received after the fork, and generally
      not dealt with.  They won't cause any problems.
      
      V2: Added removal from the multiprocess list on failure.
      V3: Use -ERESTARTNOINTR directly
      V4: - Don't queue both SIGCONT and SIGSTOP
          - Initialize signal_struct.multiprocess in init_task
          - Move setting of shared_pending to before the new task
            is visible to signals.  This prevents signals from comming
            in before shared_pending.signal is set to delayed.signal
            and being lost.
      V5: - rework list add and delete to account for idle threads
      v6: - Use sigdelsetmask when removing stop signals
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200447
      Reported-by: Wen Yang <wen.yang99@zte.com.cn> and
      Reported-by: default avatarmajiang <ma.jiang@zte.com.cn>
      Fixes: 4a2c7a78 ("[PATCH] make fork() atomic wrt pgrp/session signals")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      c3ad2c3b