  3. Jul 12, 2019
    • mm: init: report memory auto-initialization features at boot time · 23a5c8cb
      Alexander Potapenko authored
      Print the currently enabled stack and heap initialization modes.
      
      Stack initialization is enabled by a config flag, while heap
      initialization is configured at boot time with defaults being set in the
      config.  It's more convenient for the user to have all information about
      these hardening measures in one place at boot, so the user can reason
      about the expected behavior of the running system.
      
      The possible options for stack are:
       - "all" for CONFIG_INIT_STACK_ALL;
       - "byref_all" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL;
       - "byref" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF;
       - "__user" for CONFIG_GCC_PLUGIN_STRUCTLEAK_USER;
       - "off" otherwise.
      
      Depending on the values of the init_on_alloc and init_on_free boot-time
      options, we also report "heap alloc" and "heap free" as "on"/"off".
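
      A minimal sketch of such a report function (the helper names
      want_init_on_alloc()/want_init_on_free() are assumed from the companion
      init_on_alloc/init_on_free patch; the exact code that lands in mm/ may
      differ):

        #include <linux/mm.h>
        #include <linux/printk.h>

        static void __init report_meminit(void)
        {
                const char *stack;

                if (IS_ENABLED(CONFIG_INIT_STACK_ALL))
                        stack = "all";
                else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL))
                        stack = "byref_all";
                else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF))
                        stack = "byref";
                else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_USER))
                        stack = "__user";
                else
                        stack = "off";

                pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s\n",
                        stack,
                        want_init_on_alloc(GFP_KERNEL) ? "on" : "off",
                        want_init_on_free() ? "on" : "off");
                if (want_init_on_free())
                        pr_info("mem auto-init: clearing system memory may take some time...\n");
        }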
      
      In init_on_free mode, initializing pages at boot time may take a while,
      so print a notice about that as well.  How long depends on how much
      memory is installed, the memory bandwidth, etc.  On a relatively modern
      x86 system, it takes about 0.75s/GB to wipe all memory:
      
        [    0.418722] mem auto-init: stack:byref_all, heap alloc:off, heap free:on
        [    0.419765] mem auto-init: clearing system memory may take some time...
        [   12.376605] Memory: 16408564K/16776672K available (14339K kernel code, 1397K rwdata, 3756K rodata, 1636K init, 11460K bss, 368108K reserved, 0K cma-reserved)
      
      Link: http://lkml.kernel.org/r/20190617151050.92663-3-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Suggested-by: Kees Cook <keescook@chromium.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Sandeep Patil <sspatil@android.com>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Kaiwan N Billimoria <kaiwan@kaiwantech.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. Jul 08, 2019
    • kbuild: compile-test exported headers to ensure they are self-contained · d6fc9fcb
      Masahiro Yamada authored
      Multiple people have suggested compile-testing UAPI headers to ensure
      they can really be included from user-space. "make headers_check" is
      obviously not enough to catch bugs, and we often leak unresolved
      references to user-space.
      
      Use the new header-test-y syntax to implement it. Please note exported
      headers are compile-tested with a completely different set of compiler
      flags. The header search path is set to $(objtree)/usr/include since
      exported headers should not include unexported ones.
      
      We use -std=gnu89 for kernel space since the kernel code depends heavily
      on GNU extensions. On the other hand, UAPI headers should be written in
      more standardized C, so they are compiled with -std=c90. This will emit
      errors if C++ style comments, the keyword 'inline', etc. are used.
      Please use C style comments (/* ... */), '__inline__', etc. in UAPI
      headers.
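
      For illustration, a hypothetical exported header written in the
      C90-clean style compiles under the new test, while the gnu89-only
      constructs noted in the comment would be rejected:

        /* example_uapi.h - illustrative only, not a real kernel header */
        #ifndef _UAPI_EXAMPLE_H
        #define _UAPI_EXAMPLE_H

        /*
         * Rejected by -std=c90:
         *   // a C++ style comment
         *   static inline int uapi_flag(int x)   ('inline' is not a C90 keyword)
         */

        /* C90-clean equivalent: */
        static __inline__ int uapi_flag(int x)
        {
                return x & 0x1;
        }

        #endif /* _UAPI_EXAMPLE_H */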
      
      There is an additional compiler requirement to enable this test because
      many UAPI headers include <stdlib.h>, <sys/ioctl.h>, <sys/time.h>,
      etc. directly or indirectly. You cannot use the kernel.org pre-built
      toolchains [1] since they lack <stdlib.h>.
      
      I reused CONFIG_CC_CAN_LINK to check system header availability.
      The intention is slightly different, but a compiler that can link
      userspace programs provides system headers.
      
      For now, a lot of headers need to be excluded because they cannot
      be compiled standalone, but this is a good starting point.
      
      [1] https://mirrors.edge.kernel.org/pub/tools/crosstool/index.html
      
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
  7. Jul 05, 2019
    • mnt_init(): call shmem_init() unconditionally · 037f11b4
      Al Viro authored
      
      There is no point in having two call sites (an earlier one in
      init_rootfs(), called from mnt_init() in case we are going to use a
      shmem-style rootfs, and a later unconditional one from
      do_basic_setup()), along with the logic in shmem_init() itself to make
      the second call a no-op...
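
      A rough sketch of the resulting shape of mnt_init() (heavily abridged;
      the real code is in fs/namespace.c):

        void __init mnt_init(void)
        {
                /* ... hash tables, kernfs/sysfs setup, etc. ... */
                shmem_init();           /* single, unconditional call site */
                init_rootfs();
                init_mount_tree();
        }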
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • don't bother with registering rootfs · fd3e007f
      Al Viro authored
      
      init_mount_tree() can get to rootfs_fs_type directly and that simplifies
      a lot of things.  We don't need to register it, we don't need to look
      it up *and* we don't need to bother with preventing subsequent userland
      mounts.  That's the way we should've done it from the very beginning.
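
      An illustrative sketch of the simplification (not the literal diff; the
      surrounding error handling is abridged):

        static void __init init_mount_tree(void)
        {
                struct vfsmount *mnt;

                /* previously: vfs_kern_mount(get_fs_type("rootfs"), ...) */
                mnt = vfs_kern_mount(&rootfs_fs_type, 0, "rootfs", NULL);
                if (IS_ERR(mnt))
                        panic("Can't create rootfs");
                /* ... attach it as the root of the initial namespace ... */
        }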
      
      There is a user-visible change, namely the disappearance of "rootfs"
      from /proc/filesystems.  Note that it's been unmountable all along
      and it didn't show up in /proc/mounts; however, it *is* a user-visible
      change and theoretically some script might've been using its presence
      in /proc/filesystems to tell 2.4.11+ from earlier kernels.
      
      *IF* any complaints about behaviour change do show up, we could fake
      it in /proc/filesystems.  I very much doubt we'll have to, though.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • init_rootfs(): don't bother with init_ramfs_fs() · 14a253ce
      Al Viro authored
      
      The only thing done by the latter is making ramfs visible
      to mount(2); we don't need it there, since rootfs is separate
      and, in fact, is made visible to mount(2) in the same init_rootfs().
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  8. Jun 29, 2019
    • initramfs: fix populate_initrd_image() section mismatch · 4ada1e81
      Geert Uytterhoeven authored
      With gcc-4.6.3:
      
          WARNING: vmlinux.o(.text.unlikely+0x140): Section mismatch in reference from the function populate_initrd_image() to the variable .init.ramfs.info:__initramfs_size
          The function populate_initrd_image() references
          the variable __init __initramfs_size.
          This is often because populate_initrd_image lacks a __init
          annotation or the annotation of __initramfs_size is wrong.
      
          WARNING: vmlinux.o(.text.unlikely+0x14c): Section mismatch in reference from the function populate_initrd_image() to the function .init.text:unpack_to_rootfs()
          The function populate_initrd_image() references
          the function __init unpack_to_rootfs().
          This is often because populate_initrd_image lacks a __init
          annotation or the annotation of unpack_to_rootfs is wrong.
      
          WARNING: vmlinux.o(.text.unlikely+0x198): Section mismatch in reference from the function populate_initrd_image() to the function .init.text:xwrite()
          The function populate_initrd_image() references
          the function __init xwrite().
          This is often because populate_initrd_image lacks a __init
          annotation or the annotation of xwrite is wrong.
      
      Indeed, if the compiler decides not to inline populate_initrd_image(), a
      warning is generated.
      
      Fix this by adding the missing __init annotations.
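
      The fix is a one-word change; a sketch of the resulting declaration
      (the parameter shown is an assumption, see init/initramfs.c for the
      real signature):

        /*
         * Previously: static void populate_initrd_image(char *err)
         * Without __init the function stays in .text while its callees
         * (unpack_to_rootfs(), xwrite()) and __initramfs_size live in init
         * sections, hence the modpost warnings when it is not inlined.
         */
        static void __init populate_initrd_image(char *err);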
      
      Link: http://lkml.kernel.org/r/20190617074340.12779-1-geert@linux-m68k.org
      Fixes: 7c184ecd ("initramfs: factor out a helper to populate the initrd image")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. Jun 24, 2019
    • sched/uclamp: Add CPU's clamp buckets refcounting · 69842cba
      Patrick Bellasi authored
      
      Utilization clamping allows clamping a CPU's utilization to a
      [util_min, util_max] range, depending on the set of RUNNABLE tasks on
      that CPU. Each task references two "clamp buckets" defining its minimum
      and maximum (util_{min,max}) utilization "clamp values". A CPU's clamp
      bucket is active if there is at least one RUNNABLE task enqueued on
      that CPU that refcounts that bucket.
      
      When a task is {en,de}queued {on,from} a rq, the set of active clamp
      buckets on that CPU can change. If the set of active clamp buckets
      changes for a CPU, a new "aggregated" clamp value is computed for that
      CPU. This is because each clamp bucket enforces a different utilization
      clamp value.
      
      Clamp values are always MAX aggregated for both util_min and util_max.
      This ensures that no task can affect the performance of other
      co-scheduled tasks which are more boosted (i.e. with higher util_min
      clamp) or less capped (i.e. with higher util_max clamp).
      
      A task has:
         task_struct::uclamp[clamp_id]::bucket_id
      to track the "bucket index" of the CPU's clamp bucket it refcounts while
      enqueued, for each clamp index (clamp_id).
      
      A runqueue has:
         rq::uclamp[clamp_id]::bucket[bucket_id].tasks
      to track how many RUNNABLE tasks on that CPU refcount each
      clamp bucket (bucket_id) of a clamp index (clamp_id).
      It also has a:
         rq::uclamp[clamp_id]::bucket[bucket_id].value
      to track the clamp value of each clamp bucket (bucket_id) of a clamp
      index (clamp_id).
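
      A simplified sketch of this bookkeeping (the real structures use packed
      bitfields and a config-derived bucket count; the 5-bucket figure below
      is only an assumption for illustration):

        #define UCLAMP_BUCKETS 5        /* illustrative bucket count */

        enum uclamp_id { UCLAMP_MIN, UCLAMP_MAX, UCLAMP_CNT };

        /* per-task side: which bucket of each clamp index it refcounts */
        struct uclamp_se {
                unsigned int value;
                unsigned int bucket_id;
        };

        /* per-rq side: one bucket array per clamp index */
        struct uclamp_bucket {
                unsigned long value;    /* clamp value this bucket represents   */
                unsigned long tasks;    /* RUNNABLE tasks currently refcounting */
        };

        struct uclamp_rq {
                unsigned int value;                     /* MAX-aggregated clamp */
                struct uclamp_bucket bucket[UCLAMP_BUCKETS];
        };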
      
      The rq::uclamp::bucket[clamp_id][] array is scanned every time we need
      to find a new MAX aggregated clamp value for a clamp_id. This
      operation is required only when the last task of the clamp bucket
      tracking the current MAX aggregated clamp value is dequeued. In this
      case, the CPU is either entering IDLE or going to schedule a less
      boosted or more clamped task.
      The expected number of different clamp values configured at build time
      is small enough to fit the full unordered array into a single cache
      line, for configurations of up to 7 buckets.
      
      Add to struct rq the basic data structures required to refcount the
      number of RUNNABLE tasks for each clamp bucket. Add also the max
      aggregation required to update the rq's clamp value at each
      enqueue/dequeue event.
      
      Use a simple linear mapping of clamp values into clamp buckets.
      Pre-compute and cache bucket_id to avoid integer divisions at
      enqueue/dequeue time.
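
      A sketch of that mapping (self-contained; the constants are
      illustrative, not the exact kernel definitions):

        #define SCHED_CAPACITY_SCALE    1024
        #define UCLAMP_BUCKETS          5
        #define UCLAMP_BUCKET_DELTA     (SCHED_CAPACITY_SCALE / UCLAMP_BUCKETS + 1)

        /* Linear clamp-value -> bucket-index mapping; the division happens
         * once when the clamp value is set, and the cached bucket_id is what
         * the enqueue/dequeue fast path uses. */
        static inline unsigned int uclamp_bucket_id(unsigned int clamp_value)
        {
                unsigned int id = clamp_value / UCLAMP_BUCKET_DELTA;

                return id < UCLAMP_BUCKETS ? id : UCLAMP_BUCKETS - 1;
        }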
      
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alessio Balsini <balsini@android.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: https://lkml.kernel.org/r/20190621084217.8167-2-patrick.bellasi@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  18. May 15, 2019
    • mm: shuffle initial free memory to improve memory-side-cache utilization · e900a918
      Dan Williams authored
      Patch series "mm: Randomize free memory", v10.
      
      This patch (of 3):
      
      Randomization of the page allocator improves the average utilization of
      a direct-mapped memory-side-cache.  Memory side caching is a platform
      capability that Linux has been previously exposed to in HPC
      (high-performance computing) environments on specialty platforms.  In
      that instance it was a smaller pool of high-bandwidth-memory relative to
      higher-capacity / lower-bandwidth DRAM.  Now, this capability is going
      to be found on general purpose server platforms where DRAM is a cache in
      front of higher latency persistent memory [1].
      
      Robert offered an explanation of the state of the art of Linux
      interactions with memory-side-caches [2], and I copy it here:
      
          It's been a problem in the HPC space:
          http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/
      
          A kernel module called zonesort is available to try to help:
          https://software.intel.com/en-us/articles/xeon-phi-software
      
          and this abandoned patch series proposed that for the kernel:
          https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.daniluk@intel.com
      
          Dan's patch series doesn't attempt to ensure buffers won't conflict, but
          also reduces the chance that the buffers will. This will make performance
          more consistent, albeit slower than "optimal" (which is near impossible
          to attain in a general-purpose kernel).  That's better than forcing
          users to deploy remedies like:
              "To eliminate this gradual degradation, we have added a Stream
               measurement to the Node Health Check that follows each job;
               nodes are rebooted whenever their measured memory bandwidth
               falls below 300 GB/s."
      
      A replacement for zonesort was merged upstream in commit cc9aec03
      ("x86/numa_emulation: Introduce uniform split capability").  With this
      numa_emulation capability, memory can be split into cache sized
      ("near-memory" sized) numa nodes.  A bind operation to such a node, and
      disabling workloads on other nodes, enables full cache performance.
      However, once the workload exceeds the cache size then cache conflicts
      are unavoidable.  While HPC environments might be able to tolerate
      time-scheduling of cache sized workloads, for general purpose server
      platforms, the oversubscribed cache case will be the common case.
      
      The worst case scenario is that a server system owner benchmarks a
      workload at boot with an un-contended cache only to see that performance
      degrade over time, even below the average cache performance due to
      excessive conflicts.  Randomization clips the peaks and fills in the
      valleys of cache utilization to yield steady average performance.
      
      Here are some performance impact details of the patches:
      
      1/ An Intel internal synthetic memory bandwidth measurement tool saw a
         3X speedup in a contrived case that tries to force cache conflicts.
         The contrived case used the numa_emulation capability to force an
         instance of the benchmark to be run in two of the near-memory sized
         numa nodes.  If both instances were placed on the same emulated node
         they would fit and cause zero conflicts.  On separate emulated nodes
         without randomization, they underutilized the cache and conflicted
         unnecessarily due to the in-order allocation per node.
      
      2/ A well known Java server application benchmark was run with a heap
         size that exceeded cache size by 3X.  The cache conflict rate was 8%
         for the first run and degraded to 21% after page allocator aging.  With
         randomization enabled the rate levelled out at 11%.
      
      3/ A MongoDB workload did not observe measurable difference in
         cache-conflict rates, but the overall throughput dropped by 7% with
         randomization in one case.
      
      4/ Mel Gorman ran his suite of performance workloads with randomization
         enabled on platforms without a memory-side-cache and saw a mix of some
         improvements and some losses [3].
      
      While there is potentially significant improvement for applications that
      depend on low latency access across a wide working-set, the performance
      may be negligible to negative for other workloads.  For this reason the
      shuffle capability defaults to off unless a direct-mapped
      memory-side-cache is detected.  Even then, the page_alloc.shuffle=0
      parameter can be specified to disable the randomization on those systems.
      
      Outside of memory-side-cache utilization concerns, there is potentially
      a security benefit from randomization.  Some data exfiltration and
      return-oriented-programming attacks rely on the ability to infer the
      location of sensitive data objects.  The kernel page allocator, especially
      early in system boot, has predictable first-in-first-out behavior for
      physical pages.  Pages are freed in physical address order when first
      onlined.
      
      Quoting Kees:
          "While we already have a base-address randomization
           (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
           memory layouts would certainly be using the predictability of
           allocation ordering (i.e. for attacks where the base address isn't
           important: only the relative positions between allocated memory).
           This is common in lots of heap-style attacks. They try to gain
           control over ordering by spraying allocations, etc.
      
           I'd really like to see this because it gives us something similar
           to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator."
      
      While SLAB_FREELIST_RANDOM reduces the predictability of some local slab
      caches, it leaves the vast bulk of memory to be predictably allocated in
      order.  However, it should be noted that the concrete security benefits
      are hard to quantify, and no known CVE is mitigated by this
      randomization.
      
      Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform
      a Fisher-Yates shuffle of the page allocator 'free_area' lists when they
      are initially populated with free memory at boot and at hotplug time.  Do
      this based on either the presence of a page_alloc.shuffle=Y command line
      parameter, or autodetection of a memory-side-cache (to be added in a
      follow-on patch).
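
      The kernel-side shuffle is a Fisher-Yates pass over the order
      MAX_ORDER-1 pages of each zone; the algorithm in miniature, written as
      plain user-space C rather than the actual shuffle_zone() code:

        #include <stdlib.h>

        /* Walk the array backwards, swapping each slot with a uniformly
         * chosen slot at or before it (the kernel draws randomness from its
         * own RNG rather than rand()). */
        static void fisher_yates(unsigned long *pfn, size_t n)
        {
                size_t i, j;
                unsigned long tmp;

                if (n < 2)
                        return;
                for (i = n - 1; i > 0; i--) {
                        j = (size_t)rand() % (i + 1);
                        tmp = pfn[i];
                        pfn[i] = pfn[j];
                        pfn[j] = tmp;
                }
        }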
      
      The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free
      pages, where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1, i.e.
      order 10 (4MB); this trades off randomization granularity for time spent
      shuffling.  MAX_ORDER-1 was chosen to be minimally invasive to the page
      allocator while still showing memory-side-cache behavior improvements,
      with the expectation that the security implications of finer granularity
      randomization are mitigated by CONFIG_SLAB_FREELIST_RANDOM.  The
      performance impact of the shuffling appears to be in the noise compared
      to other memory initialization work.
      
      This initial randomization can be undone over time, so a follow-on patch
      is introduced to inject entropy on page free decisions.  It is reasonable
      to ask whether the page free entropy alone is sufficient, but it is not,
      due to the in-order initial freeing of pages.  At the start of that
      process, putting page1 in front of or behind page0 still keeps them close
      together; page2 is still near page1 and has a high chance of being
      adjacent.  As more pages are added, ordering diversity improves, but
      there is still high page locality for the low-address pages, and this
      leads to no significant impact on the cache-conflict rate.
      
      [1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
      [2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM
      [3]: https://lkml.org/lkml/2018/10/12/309
      
      [dan.j.williams@intel.com: fix shuffle enable]
        Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com
      [cai@lca.pw: fix SHUFFLE_PAGE_ALLOCATOR help texts]
        Link: http://lkml.kernel.org/r/20190425201300.75650-1-cai@lca.pw
      Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Qian Cai <cai@lca.pw>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Robert Elliott <elliott@hpe.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. Apr 29, 2019
    • init/config: Do not select BUILD_BIN2C for IKCONFIG · bc0c6045
      Joel Fernandes (Google) authored
      
      Since commit 13610aa9 ("kernel/configs: use .incbin directive to
      embed config_data.gz"), IKCONFIG no longer uses BUILD_BIN2C so prevent
      it from being selected in Kconfig.
      
      Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Provide in-kernel headers to make extending kernel easier · 43d8ce9d
      Joel Fernandes (Google) authored
      Introduce in-kernel headers which are made available as an archive
      through proc (/proc/kheaders.tar.xz file). This archive makes it
      possible to run eBPF and other tracing programs that need to extend the
      kernel for tracing purposes without any dependency on the file system
      having headers.
      
      A GitHub PR has been sent for the corresponding BCC patch:
      https://github.com/iovisor/bcc/pull/2312
      
      On Android and embedded systems, it is common to switch kernels but not
      have kernel headers available on the file system. Further, once a
      different kernel is booted, any headers stored on the file system will
      no longer be useful. This is an issue well known even to distros.
      By storing the headers as a compressed archive within the kernel, we can
      avoid these issues that have been a hindrance for a long time.
      
      The best way to use this feature is by building it in. Several users
      have a need for this: when they switch debug kernels, they do not want
      to update the filesystem or worry about where to store the headers on
      it. However, the feature is also buildable as a module in case the user
      prefers it not be part of the kernel image. This makes it possible to
      load and unload the headers from memory on demand. A tracing program can
      load the module, do its operations, and then unload the module to save
      kernel memory. The total memory needed is 3.3MB.
      
      By having the archive available at a fixed location independent of
      filesystem dependencies and conventions, all debugging tools can refer
      directly to the fixed location for the archive, without concerning
      themselves with where the headers live on a typical filesystem, which
      significantly simplifies tooling that needs kernel headers.
      
      The code to read the headers is based on /proc/config.gz code and uses
      the same technique to embed the headers.
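
      A sketch of that technique (symbol and function names here are
      illustrative assumptions, not necessarily those in the real code): the
      compressed archive is emitted with an .incbin directive between
      start/end symbols, and the read handler copies the requested window out
      of it:

        #include <linux/string.h>
        #include <linux/types.h>

        extern char kernel_headers_data[];
        extern char kernel_headers_data_end[];

        static ssize_t kheaders_read(char *buf, loff_t off, size_t len)
        {
                size_t size = kernel_headers_data_end - kernel_headers_data;

                if (off < 0 || off >= size)
                        return 0;
                if (len > size - off)
                        len = size - off;
                memcpy(buf, kernel_headers_data + off, len);
                return len;
        }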
      
      Other approaches were discussed, such as having an in-memory mountable
      filesystem, but that has drawbacks such as requiring an in-kernel xz
      decompressor, which we don't have today, and requiring 42 MB of kernel
      memory to host the decompressed headers at any time. This approach is
      also simpler than those alternatives.
      
      Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  23. Apr 20, 2019
    • random: move rand_initialize() earlier · d5553523
      Kees Cook authored
      
      Right now rand_initialize() is run as an early_initcall(), but it only
      depends on timekeeping_init() (for mixing ktime_get_real() into the
      pools). However, the call to boot_init_stack_canary() for stack canary
      initialization runs earlier, which triggers a warning at boot:
      
      random: get_random_bytes called from start_kernel+0x357/0x548 with crng_init=0
      
      Instead, this moves rand_initialize() to after timekeeping_init(), and moves
      canary initialization here as well.
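
      A rough sketch of the relevant slice of start_kernel() after the change
      (heavily abridged; see init/main.c for the real ordering):

        asmlinkage __visible void __init start_kernel(void)
        {
                /* ... */
                timekeeping_init();
                rand_initialize();              /* no longer an early_initcall */
                /* ... */
                boot_init_stack_canary();       /* canary seeded after RNG init */
                /* ... */
        }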
      
      Note that this warning may still remain for machines that do not have
      UEFI RNG support (which initializes the RNG pools during setup_arch()),
      or for x86 machines without RDRAND (or booting without
      "random.trust_cpu=on" or CONFIG_RANDOM_TRUST_CPU=y).
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  24. Apr 19, 2019
    • init: initialize jump labels before command line option parsing · 6041186a
      Dan Williams authored
      When a module option or core kernel argument toggles a static key, it
      requires jump labels to be initialized early.  While x86, PowerPC, and
      ARM64 arrange for jump_label_init() to be called before parse_args(),
      ARM does not.
      
        Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
        page_alloc_shuffle+0x12c/0x1ac
        static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
        before call to jump_label_init()
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper Not tainted
        5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
        Hardware name: ARM Integrator/CP (Device Tree)
        [<c0011c68>] (unwind_backtrace) from [<c000ec48>] (show_stack+0x10/0x18)
        [<c000ec48>] (show_stack) from [<c07e9710>] (dump_stack+0x18/0x24)
        [<c07e9710>] (dump_stack) from [<c001bb1c>] (__warn+0xe0/0x108)
        [<c001bb1c>] (__warn) from [<c001bb88>] (warn_slowpath_fmt+0x44/0x6c)
        [<c001bb88>] (warn_slowpath_fmt) from [<c0b0c4a8>]
        (page_alloc_shuffle+0x12c/0x1ac)
        [<c0b0c4a8>] (page_alloc_shuffle) from [<c0b0c550>] (shuffle_store+0x28/0x48)
        [<c0b0c550>] (shuffle_store) from [<c003e6a0>] (parse_args+0x1f4/0x350)
        [<c003e6a0>] (parse_args) from [<c0ac3c00>] (start_kernel+0x1c0/0x488)
      
      Move the fallback call to jump_label_init() to occur before
      parse_args().
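
      A rough sketch of the resulting ordering in start_kernel() (abridged;
      not a compilable excerpt of init/main.c):

        asmlinkage __visible void __init start_kernel(void)
        {
                /* ... */
                /* parameters may set static keys */
                jump_label_init();
                parse_early_param();
                /* parse_args("Booting kernel", ...) follows and may now
                 * safely toggle keys such as page_alloc.shuffle */
                /* ... */
        }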
      
      The redundant calls to jump_label_init() in other archs are left intact
      in case they have static key toggling use cases that are even earlier
      than option parsing.
      
      Link: http://lkml.kernel.org/r/155544804466.1032396.13418949511615676665.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Guenter Roeck <groeck@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Russell King <rmk@armlinux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>