  1. Aug 31, 2019
  2. Aug 30, 2019
    • MIPS: SGI-IP27: restructure ioc3 register access · cbe7d517
      Thomas Bogendoerfer authored
      
      Break up the big ioc3 register struct into functional pieces to
      make use in sub-function drivers more straightforward. And while
      doing that get rid of all volatile access by using readX/writeX.
      
      Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cbe7d517
    • MIPS: SGI-IP27: remove ioc3 ethernet init · 688125a6
      Thomas Bogendoerfer authored
      
      Remove the unneeded disabling of Ethernet interrupts in the IP27 platform code.
      
      Acked-by: Paul Burton <paul.burton@mips.com>
      Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      688125a6
    • perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops · 0f4cd769
      Kim Phillips authored
      
      When counting dispatched micro-ops with cnt_ctl=1, in order to prevent
      sample bias, IBS hardware preloads the least significant 7 bits of
      current count (IbsOpCurCnt) with random values, such that, after the
      interrupt is handled and counting resumes, the next sample taken
      will be slightly perturbed.
      
      The current count bitfield is in the IBS execution control h/w register,
      alongside the maximum count field.
      
      Currently, the IBS driver writes that register with the maximum count,
      leaving zeroes to fill the current count field, thereby overwriting
      the random bits the hardware preloaded for itself.
      
      Fix the driver to actually retain and carry those random bits from the
      read of the IBS control register, through to its write, instead of
      overwriting the lower current count bits with zeroes.
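
The fix can be pictured as a mask-and-carry on the control register value. The following is a minimal sketch in plain C, not the driver's code: the mask names, the field positions (maximum count in the low 16 bits, current count starting at bit 32) and both helper functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define SKETCH_MAX_CNT_MASK      0xFFFFULL        /* maximum count field */
#define SKETCH_CUR_CNT_RAND_MASK (0x7FULL << 32)  /* low 7 bits of current count */

/* Behaviour before the fix: only the maximum count is written back, so the
 * current count field, including the randomized bits, ends up as zero. */
uint64_t sketch_ctl_without_fix(uint64_t hw_ctl, uint64_t max_cnt)
{
    (void)hw_ctl; /* the value read from hardware is ignored */
    assert(max_cnt <= SKETCH_MAX_CNT_MASK);
    return max_cnt;
}

/* Behaviour after the fix: the 7 randomized bits from the value that was
 * read are carried through to the value that is written. */
uint64_t sketch_ctl_with_fix(uint64_t hw_ctl, uint64_t max_cnt)
{
    assert(max_cnt <= SKETCH_MAX_CNT_MASK);
    return max_cnt | (hw_ctl & SKETCH_CUR_CNT_RAND_MASK);
}
```

Only the seven low bits of the current count are carried over in this sketch; the rest of the field is still written as zero.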
      
      Tested with:
      
      perf record -c 100001 -e ibs_op/cnt_ctl=1/pp -a -C 0 taskset -c 0 <workload>
      
      'perf annotate' output before:
      
       15.70  65:   addsd     %xmm0,%xmm1
       17.30        add       $0x1,%rax
       15.88        cmp       %rdx,%rax
                    je        82
       17.32  72:   test      $0x1,%al
                    jne       7c
        7.52        movapd    %xmm1,%xmm0
        5.90        jmp       65
        8.23  7c:   sqrtsd    %xmm1,%xmm0
       12.15        jmp       65
      
      'perf annotate' output after:
      
       16.63  65:   addsd     %xmm0,%xmm1
       16.82        add       $0x1,%rax
       16.81        cmp       %rdx,%rax
                    je        82
       16.69  72:   test      $0x1,%al
                    jne       7c
        8.30        movapd    %xmm1,%xmm0
        8.13        jmp       65
        8.24  7c:   sqrtsd    %xmm1,%xmm0
        8.39        jmp       65
      
      Tested on Family 15h and 17h machines.
      
      Machines prior to family 10h Rev. C don't have the RDWROPCNT capability,
      and have the IbsOpCurCnt bitfield reserved, so this patch shouldn't
      affect their operation.
      
      It is unknown why commit db98c5fa ("perf/x86: Implement 64-bit
      counter support for IBS") ignored the lower 4 bits of the IbsOpCurCnt
      field; the number of preloaded random bits has always been 7, AFAICT.
      
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Arnaldo Carvalho de Melo" <acme@kernel.org>
      Cc: <x86@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: "Namhyung Kim" <namhyung@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/20190826195730.30614-1-kim.phillips@amd.com
      0f4cd769
    • perf/x86/intel: Restrict period on Nehalem · 44d3bbb6
      Josh Hunt authored
      
      We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in
      some cases when using perf:
      
      perfevents: irq loop stuck!
      WARNING: CPU: 0 PID: 3485 at arch/x86/events/intel/core.c:2282 intel_pmu_handle_irq+0x37b/0x530
      ...
      RIP: 0010:intel_pmu_handle_irq+0x37b/0x530
      ...
      Call Trace:
      <NMI>
      ? perf_event_nmi_handler+0x2e/0x50
      ? intel_pmu_save_and_restart+0x50/0x50
      perf_event_nmi_handler+0x2e/0x50
      nmi_handle+0x6e/0x120
      default_do_nmi+0x3e/0x100
      do_nmi+0x102/0x160
      end_repeat_nmi+0x16/0x50
      ...
      ? native_write_msr+0x6/0x20
      ? native_write_msr+0x6/0x20
      </NMI>
      intel_pmu_enable_event+0x1ce/0x1f0
      x86_pmu_start+0x78/0xa0
      x86_pmu_enable+0x252/0x310
      __perf_event_task_sched_in+0x181/0x190
      ? __switch_to_asm+0x41/0x70
      ? __switch_to_asm+0x35/0x70
      ? __switch_to_asm+0x41/0x70
      ? __switch_to_asm+0x35/0x70
      finish_task_switch+0x158/0x260
      __schedule+0x2f6/0x840
      ? hrtimer_start_range_ns+0x153/0x210
      schedule+0x32/0x80
      schedule_hrtimeout_range_clock+0x8a/0x100
      ? hrtimer_init+0x120/0x120
      ep_poll+0x2f7/0x3a0
      ? wake_up_q+0x60/0x60
      do_epoll_wait+0xa9/0xc0
      __x64_sys_epoll_wait+0x1a/0x20
      do_syscall_64+0x4e/0x110
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fdeb1e96c03
      ...
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: acme@kernel.org
      Cc: Josh Hunt <johunt@akamai.com>
      Cc: bpuranda@akamai.com
      Cc: mingo@redhat.com
      Cc: jolsa@redhat.com
      Cc: tglx@linutronix.de
      Cc: namhyung@kernel.org
      Cc: alexander.shishkin@linux.intel.com
      Link: https://lkml.kernel.org/r/1566256411-18820-1-git-send-email-johunt@akamai.com
      44d3bbb6
  3. Aug 29, 2019
    • x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text · 7af01450
      Thomas Gleixner authored
      
      ftrace does not use text_poke() for enabling trace functionality. It uses
      its own mechanism and flips the whole kernel text to RW and back to RO.
      
      The CPA rework removed a loop based check of 4k pages which tried to
      preserve a large page by checking each 4k page whether the change would
      actually cover all pages in the large page.
      
      This resulted in endless loops for nothing, as testing showed that it
      never actually preserved anything. Of course, testing failed to include
      ftrace, which is the one and only case that benefited from the 4k loop.
      
      As a consequence enabling function tracing or ftrace based kprobes results
      in a full 4k split of the kernel text, which affects iTLB performance.
      
      The kernel RO protection is the only valid case where this can actually
      preserve large pages.
      
      All other static protections (RO data, data NX, PCI, BIOS) are truly
      static.  So a conflict with those protections which results in a split
      should only ever happen when a change of memory next to a protected region
      is attempted. But these conflicts are rightfully splitting the large page
      to preserve the protected regions. In fact, a change to the protected
      regions themselves is a bug and is warned about.
      
      Add an exception to the static protection check for kernel text RO when
      the region to be changed spans a full large page, which allows the large
      mappings to be preserved. This also prevents the syslog from being spammed
      with CPA violations when ftrace is used.
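
The condition for the exception can be sketched as a small predicate. This is a simplified model rather than the kernel's implementation: the function names, the fixed 4k page size and the parameter layout are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

/* True when [start, start + npages * 4k) is exactly one aligned large page. */
bool sketch_covers_full_large_page(uint64_t start, uint64_t npages,
                                   uint64_t lpsize)
{
    /* The large page size must be a power of two for the alignment test. */
    assert(lpsize != 0 && (lpsize & (lpsize - 1)) == 0);
    return npages * SKETCH_PAGE_SIZE == lpsize && (start & (lpsize - 1)) == 0;
}

/* The kernel text RO check is skipped only for a full large page change;
 * any partial change still goes through the check and may cause a split. */
bool sketch_needs_text_ro_check(uint64_t start, uint64_t npages,
                                uint64_t lpsize)
{
    return !sketch_covers_full_large_page(start, npages, lpsize);
}
```

With a 2M large page, a change of 512 aligned 4k pages skips the check, while a single-page change or a misaligned range still goes through it.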
      
      The exception needs to be removed once ftrace has switched over to
      text_poke(), which avoids the whole issue.
      
      Fixes: 585948f4 ("x86/mm/cpa: Avoid the 4k pages check completely")
      Reported-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Song Liu <songliubraving@fb.com>
      Reviewed-by: Song Liu <songliubraving@fb.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908282355340.1938@nanos.tec.linutronix.de
      7af01450
    • nds32: Mark expected switch fall-throughs · 7c9eb2db
      Gustavo A. R. Silva authored
      
      Mark switch cases where we are expecting to fall through.
      
      This patch fixes the following warnings (Building: allmodconfig nds32):
      
      include/math-emu/soft-fp.h:124:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/nds32/kernel/signal.c:362:20: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/nds32/kernel/signal.c:315:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:417:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:430:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/soft-fp.h:124:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:417:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:430:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
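
For readers unfamiliar with the warning, here is a minimal sketch of what marking an expected fall-through looks like. The function is hypothetical and unrelated to the nds32 sources. It relies on GCC accepting a `/* fall through */` comment placed directly before the next case label at the default -Wimplicit-fallthrough level.

```c
#include <assert.h>

/* Hypothetical helper: sums the first len bytes of p, for len in 0..3. */
int sketch_sum_tail(const unsigned char *p, int len)
{
    int sum = 0;

    assert(len >= 0 && len <= 3);
    switch (len) {
    case 3:
        sum += p[2];
        /* fall through */
    case 2:
        sum += p[1];
        /* fall through */
    case 1:
        sum += p[0];
        break;
    default:
        break;
    }
    return sum;
}
```

Without the two comments, each case that runs into the next would trigger the "this statement may fall through" warning.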
      
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      7c9eb2db
    • ARC: unwind: Mark expected switch fall-through · 00a0c845
      Gustavo A. R. Silva authored
      
      Mark switch cases where we are expecting to fall through.
      
      This patch fixes the following warnings (Building: haps_hs_defconfig arc):
      
      arch/arc/kernel/unwind.c: In function ‘read_pointer’:
      ./include/linux/compiler.h:328:5: warning: this statement may fall through [-Wimplicit-fallthrough=]
        do {        \
           ^
      ./include/linux/compiler.h:338:2: note: in expansion of macro ‘__compiletime_assert’
        __compiletime_assert(condition, msg, prefix, suffix)
        ^~~~~~~~~~~~~~~~~~~~
      ./include/linux/compiler.h:350:2: note: in expansion of macro ‘_compiletime_assert’
        _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
        ^~~~~~~~~~~~~~~~~~~
      ./include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
       #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                           ^~~~~~~~~~~~~~~~~~
      ./include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
        BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
        ^~~~~~~~~~~~~~~~
      arch/arc/kernel/unwind.c:573:3: note: in expansion of macro ‘BUILD_BUG_ON’
         BUILD_BUG_ON(sizeof(u32) != sizeof(value));
         ^~~~~~~~~~~~
      arch/arc/kernel/unwind.c:575:2: note: here
        case DW_EH_PE_native:
        ^~~~
      
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      00a0c845
  4. Aug 28, 2019
    • ARM: 8901/1: add a criteria for pfn_valid of arm · 5b3efa4f
      zhaoyang authored
      
      pfn_valid can be wrong when parsing an invalid pfn whose phys address
      exceeds BITS_PER_LONG as the MSB will be trimmed when shifted.
      
      The issue originally arose from the call stack below, which corresponds to
      an access of /proc/kpageflags from userspace with an invalid pfn parameter
      and leads to a kernel panic.
      
      [46886.723249] c7 [<c031ff98>] (stable_page_flags) from [<c03203f8>]
      [46886.723264] c7 [<c0320368>] (kpageflags_read) from [<c0312030>]
      [46886.723280] c7 [<c0311fb0>] (proc_reg_read) from [<c02a6e6c>]
      [46886.723290] c7 [<c02a6e24>] (__vfs_read) from [<c02a7018>]
      [46886.723301] c7 [<c02a6f74>] (vfs_read) from [<c02a778c>]
      [46886.723315] c7 [<c02a770c>] (SyS_pread64) from [<c0108620>]
      (ret_fast_syscall+0x0/0x28)
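
The truncation can be reproduced outside the kernel with fixed-width types. The sketch below models a 32-bit pfn and a 32-bit physical address. The type names, helper names and the round-trip check are assumptions for illustration and are not taken from the patch.

```c
#include <assert.h> /* only needed when checking the sketch with assert() */
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12

typedef uint32_t sketch_pfn_t;  /* models a 32-bit unsigned long */
typedef uint32_t sketch_phys_t; /* models a 32-bit physical address */

static sketch_phys_t sketch_pfn_to_phys(sketch_pfn_t pfn)
{
    /* Bits shifted past bit 31 are lost here. */
    return (sketch_phys_t)(pfn << SKETCH_PAGE_SHIFT);
}

static sketch_pfn_t sketch_phys_to_pfn(sketch_phys_t addr)
{
    return addr >> SKETCH_PAGE_SHIFT;
}

/* A pfn whose address does not fit cannot survive the round trip, so it
 * can be rejected before any memory lookup is attempted. */
bool sketch_pfn_survives_shift(sketch_pfn_t pfn)
{
    return sketch_phys_to_pfn(sketch_pfn_to_phys(pfn)) == pfn;
}
```

In this model pfn 0x80000 maps to address 0x80000000 and round-trips, whereas pfn 0x100000 would need bit 32 and collapses to address 0.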
      
      Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      5b3efa4f
    • RISC-V: Fix FIXMAP area corruption on RV32 systems · a256f2e3
      Anup Patel authored
      
      Currently, the various virtual memory areas of Linux RISC-V are organized
      in increasing order of their virtual addresses as follows:
      1. User space area (This is the lowest area and starts at 0x0)
      2. FIXMAP area
      3. VMALLOC area
      4. Kernel area (This is the highest area and starts at PAGE_OFFSET)
      
      The maximum size of the user space area is represented by TASK_SIZE.
      
      On RV32 systems, TASK_SIZE is defined as VMALLOC_START which causes the
      user space area to overlap the FIXMAP area. This allows user space apps
      to potentially corrupt the FIXMAP area and kernel OF APIs will crash
      whenever they access corrupted FDT in the FIXMAP area.
      
      On RV64 systems, TASK_SIZE is set to fixed 256GB and no other areas
      happen to overlap so we don't see any FIXMAP area corruptions.
      
      This patch fixes FIXMAP area corruption on RV32 systems by setting
      TASK_SIZE to FIXADDR_START. We also move FIXADDR_TOP, FIXADDR_SIZE,
      and FIXADDR_START defines to asm/pgtable.h so that we can avoid cyclic
      header includes.
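
The overlap described above reduces to a single comparison. The following sketch uses a hypothetical layout structure. It assumes, for illustration only, that the FIXMAP area ends where the VMALLOC area begins, and any concrete addresses passed to it are the caller's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical RV32 layout, lowest address first. */
struct sketch_layout {
    uint32_t fixaddr_start; /* start of the FIXMAP area */
    uint32_t vmalloc_start; /* start of the VMALLOC area (assumed FIXMAP top) */
    uint32_t page_offset;   /* start of the kernel area */
};

/* User space occupies [0, task_size). It overlaps the FIXMAP area as soon
 * as it extends past the start of that area. */
bool sketch_user_overlaps_fixmap(const struct sketch_layout *l,
                                 uint32_t task_size)
{
    assert(l->fixaddr_start < l->vmalloc_start);
    assert(l->vmalloc_start < l->page_offset);
    return task_size > l->fixaddr_start;
}
```

Passing the VMALLOC start as the task size reports an overlap, matching the situation before the patch, while passing the FIXMAP start does not.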
      
      Signed-off-by: Anup Patel <anup.patel@wdc.com>
      Tested-by: Alistair Francis <alistair.francis@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      a256f2e3
    • x86/build: Add -Wnoaddress-of-packed-member to REALMODE_CFLAGS, to silence GCC9 build warning · 42e0e954
      Linus Torvalds authored
      
      One of the very few warnings I have in the current build comes from
      arch/x86/boot/edd.c, where I get the following with a gcc9 build:
      
         arch/x86/boot/edd.c: In function ‘query_edd’:
         arch/x86/boot/edd.c:148:11: warning: taking address of packed member of ‘struct boot_params’ may result in an unaligned pointer value [-Waddress-of-packed-member]
           148 |  mbrptr = boot_params.edd_mbr_sig_buffer;
               |           ^~~~~~~~~~~
      
      This warning triggers because we throw away all the CFLAGS and then make
      a new set for REALMODE_CFLAGS, so the -Wno-address-of-packed-member we
      added in the following commit is not present:
      
        6f303d60 ("gcc-9: silence 'address-of-packed-member' warning")
      
      The simplest solution for now is to adjust the warning for this version
      of CFLAGS as well, but it would definitely make sense to examine whether
      REALMODE_CFLAGS could be derived from CFLAGS, so that it picks up changes
      in the compiler flags environment automatically.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      42e0e954
    • dt-bindings: net: ethernet: Update mt7622 docs and dts to reflect the new phylink API · bd69baaa
      René van Dorst authored
      
      This patch removes the recently added mediatek,physpeed property.
      Use the fixed-link property speed = <2500> to set the PHY to 2.5Gbit.
      See mt7622-bananapi-bpi-r64.dts for a working example.
      
      Signed-off-by: René van Dorst <opensource@vdorst.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd69baaa
  5. Aug 27, 2019
  6. Aug 26, 2019
    • x86/apic: Include the LDR when clearing out APIC registers · 558682b5
      Bandan Das authored
      
      Although APIC initialization will typically clear out the LDR before
      setting it, the APIC cleanup code should reset the LDR.
      
      This was discovered with a 32-bit KVM guest jumping into a kdump
      kernel. The stale bits in the LDR triggered a bug in the KVM APIC
      implementation which caused the destination mapping for VCPUs to be
      corrupted.
      
      Note that this isn't intended to paper over the KVM APIC bug. The kernel
      has to clear the LDR when resetting the APIC registers except when X2APIC
      is enabled.
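
The rule stated above can be sketched as a pure function on the register value. This is not the kernel's cleanup routine: the function name and the mask name are hypothetical, and the mask assumes that the logical APIC ID occupies the top byte of the LDR in xAPIC mode.

```c
#include <assert.h> /* only needed when checking the sketch with assert() */
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the logical APIC ID within the LDR (xAPIC mode). */
#define SKETCH_LDR_MASK 0xFF000000u

/* Returns the LDR value that the cleanup path would leave behind. */
uint32_t sketch_cleanup_ldr(uint32_t ldr, bool x2apic_on)
{
    if (x2apic_on)
        return ldr;                /* left untouched when x2APIC is enabled */
    return ldr & ~SKETCH_LDR_MASK; /* stale logical ID bits are cleared */
}
```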
      
      This lacks a Fixes tag because the failure to clear the LDR goes way back
      into pre-git history.
      
      [ tglx: Made x2apic_enabled a function call as required ]
      
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190826101513.5080-3-bsd@redhat.com
      558682b5
    • x86/apic: Do not initialize LDR and DFR for bigsmp · bae3a8d3
      Bandan Das authored
      
      Legacy apic init uses bigsmp for smp systems with 8 and more CPUs. The
      bigsmp APIC implementation uses physical destination mode, but it
      nevertheless initializes LDR and DFR. The LDR even ends up incorrectly with
      multiple bits being set.
      
      This does not cause a functional problem because LDR and DFR are ignored
      when physical destination mode is active, but it triggered a problem on a
      32-bit KVM guest which jumps into a kdump kernel.
      
      The multiple bits set unearthed a bug in the KVM APIC implementation. The
      code which creates the logical destination map for VCPUs ignores the
      disabled state of the APIC and ends up overwriting an existing valid entry
      and as a result, APIC calibration hangs in the guest during kdump
      initialization.
      
      Remove the bogus LDR/DFR initialization.
      
      This is not intended to work around the KVM APIC bug. The LDR/DFR
      initialization is wrong on its own.
      
      The issue goes back into the pre git history. The fixes tag is the commit
      in the bitkeeper import which introduced bigsmp support in 2003.
      
        git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      
      Fixes: db7b9e9f26b8 ("[PATCH] Clustered APIC setup for >8 CPU systems")
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190826101513.5080-2-bsd@redhat.com
      
      bae3a8d3
    • ARCv2: IDU-intc: Add support for edge-triggered interrupts · 174ae4e9
      Mischa Jonker authored
      
      This adds support for an optional extra interrupt cell to specify edge
      vs level triggered. It is backward compatible with dts files with only
      one cell, and will default to level-triggered in such a case.
      
      Note that I had to make a change to idu_irq_set_affinity as well, as
      this function was setting the interrupt type to "level" unconditionally,
      since this was the only type supported previously.
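
The backward-compatible handling of one-cell and two-cell specifiers can be sketched as follows. The decoder, its return convention and the encoding of the second cell (0 for level, 1 for edge) are hypothetical and chosen only for illustration. The real binding may encode the trigger type differently.

```c
#include <assert.h> /* only needed when checking the sketch with assert() */
#include <stddef.h>
#include <stdint.h>

enum sketch_trigger { SKETCH_LEVEL, SKETCH_EDGE };

/* Hypothetical encoding of the optional second cell. */
#define SKETCH_CELL_LEVEL 0u
#define SKETCH_CELL_EDGE  1u

/* Decodes an interrupt specifier of one or two cells.
 * Returns 0 on success and -1 on a malformed specifier. */
int sketch_decode_irq(const uint32_t *cells, size_t ncells,
                      uint32_t *irq, enum sketch_trigger *trig)
{
    if (cells == NULL || irq == NULL || trig == NULL)
        return -1;
    if (ncells < 1 || ncells > 2)
        return -1;

    *irq = cells[0];
    if (ncells == 1) {
        *trig = SKETCH_LEVEL; /* old one-cell specifiers default to level */
        return 0;
    }
    if (cells[1] == SKETCH_CELL_EDGE)
        *trig = SKETCH_EDGE;
    else if (cells[1] == SKETCH_CELL_LEVEL)
        *trig = SKETCH_LEVEL;
    else
        return -1;
    return 0;
}
```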
      
      Signed-off-by: Mischa Jonker <mischa.jonker@synopsys.com>
      Reviewed-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      174ae4e9
    • uprobes/x86: Fix detection of 32-bit user mode · 9212ec7d
      Sebastian Mayr authored
      
      32-bit processes running on a 64-bit kernel are not always detected
      correctly, causing the process to crash when uretprobes are installed.
      
      The reason for the crash is that in_ia32_syscall() is used to determine the
      process's mode, which only works correctly when called from a syscall.
      
      In the case of uretprobes, however, the function is called from an exception
      and always returns 'false' on a 64-bit kernel. As a consequence, this leads
      to corruption of the process's return address.
      
      Fix this by using user_64bit_mode() instead of in_ia32_syscall(), which
      is correct in any situation.
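
The difference between the two checks can be modelled with a small context structure. Everything in this sketch is hypothetical (the structure, its fields and both helpers). It only illustrates why a syscall-scoped flag gives the wrong word size when consulted from an exception.

```c
#include <assert.h> /* only needed when checking the sketch with assert() */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state each check looks at. */
struct sketch_ctx {
    bool in_compat_syscall; /* only set while a 32-bit syscall is handled */
    bool user_frame_is_64bit; /* property of the interrupted user-mode frame */
};

/* Old check: derives the word size from the syscall-scoped flag. */
size_t sketch_sizeof_long_old(const struct sketch_ctx *c)
{
    return c->in_compat_syscall ? 4 : 8;
}

/* New check: derives the word size from the user-mode frame itself. */
size_t sketch_sizeof_long_new(const struct sketch_ctx *c)
{
    return c->user_frame_is_64bit ? 8 : 4;
}
```

For a 32-bit process that hits a probe from an exception, neither flag is set: the old helper reports 8 bytes and the new one reports 4.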
      
      [ tglx: Add a comment and the following historical info ]
      
      This should have been detected by the rename which happened in commit
      
        abfb9498 ("x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()")
      
      which states in the changelog:
      
          The is_ia32_task()/is_x32_task() function names are a big misnomer: they
          suggests that the compat-ness of a system call is a task property, which
          is not true, the compatness of a system call purely depends on how it
          was invoked through the system call layer.
          .....
      
      and then it went and blindly renamed every call site.
      
      Sadly enough this was already mentioned here:
      
         8faaed1b ("uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()")
      
      where the changelog says:
      
          TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
          not necessarily mean 32bit. Fortunately syscall-like insns can't be
          probed so it actually works, but it would be better to rename and
          use is_ia32_frame().
      
      and goes all the way back to:
      
          0326f5a9 ("uprobes/core: Handle breakpoint and singlestep exceptions")
      
      Oh well. 7+ years until someone actually tried a uretprobe on a 32bit
      process on a 64bit kernel....
      
      Fixes: 0326f5a9 ("uprobes/core: Handle breakpoint and singlestep exceptions")
      Signed-off-by: Sebastian Mayr <me@sam.st>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190728152617.7308-1-me@sam.st
      9212ec7d
    • x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines · 3e5bedc2
      Thomas Gleixner authored
      
      Rahul Tanwar reported the following bug on DT systems:
      
      > 'ioapic_dynirq_base' contains the virtual IRQ base number. Presently, it is
      > updated to the end of hardware IRQ numbers but this is done only when IOAPIC
      > configuration type is IOAPIC_DOMAIN_LEGACY or IOAPIC_DOMAIN_STRICT. There is
      > a third type IOAPIC_DOMAIN_DYNAMIC which applies when IOAPIC configuration
      > comes from devicetree.
      >
      > See dtb_add_ioapic() in arch/x86/kernel/devicetree.c
      >
      > In case of IOAPIC_DOMAIN_DYNAMIC (DT/OF based system), 'ioapic_dynirq_base'
      > remains to zero initialized value. This means that for OF based systems,
      > virtual IRQ base will get set to zero.
      
      Such systems will very likely not even boot.
      
      For DT enabled machines ioapic_dynirq_base is irrelevant and not
      updated, so simply map the IRQ base 1:1 instead.
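
The 1:1 fallback amounts to returning the requested base unchanged when no dynamic base was ever recorded. A minimal sketch, in which the function name and the idea of passing the base as a parameter are assumptions for illustration:

```c
#include <assert.h> /* only needed when checking the sketch with assert() */

/* dynirq_base == 0 models the DT case, where the base is never updated. */
unsigned int sketch_dynirq_lower_bound(unsigned int dynirq_base,
                                       unsigned int from)
{
    return dynirq_base ? dynirq_base : from;
}
```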
      
      Reported-by: Rahul Tanwar <rahul.tanwar@linux.intel.com>
      Tested-by: Rahul Tanwar <rahul.tanwar@linux.intel.com>
      Tested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: alan@linux.intel.com
      Cc: bp@alien8.de
      Cc: cheol.yong.kim@intel.com
      Cc: qi-ming.wu@intel.com
      Cc: rahul.tanwar@intel.com
      Cc: rppt@linux.ibm.com
      Cc: tony.luck@intel.com
      Link: http://lkml.kernel.org/r/20190821081330.1187-1-rahul.tanwar@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3e5bedc2
  7. Aug 25, 2019
  8. Aug 24, 2019
    • s390/qeth: add TX NAPI support for IQD devices · e53edf74
      Julian Wiedmann authored
      
      Due to their large MTU and potentially low utilization of TX buffers,
      IQD devices in particular require fast TX recycling. This makes them
      a prime candidate for a TX NAPI path in qeth.
      
      qeth_tx_poll() uses the recently introduced qdio_inspect_queue() helper
      to poll the TX queue for completed buffers. To avoid hogging the CPU for
      too long, we yield to the stack after completing an entire queue's worth
      of buffers.
      While IQD is expected to transfer its buffers synchronously (and thus
      doesn't support TX interrupts), a timer covers for the odd case where a
      TX buffer doesn't complete synchronously. Currently this timer should
      only ever fire for
      (1) the mcast queue,
      (2) the occasional race, where the NAPI poll code observes an update to
          queue->used_buffers while the TX doorbell hasn't been issued yet.
      
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e53edf74
    • s390/qdio: let drivers opt-out from Output Queue scanning · 313dc689
      Julian Wiedmann authored
      
      If a driver wants to use the new Output Queue poll code, then the qdio
      layer must disable its internal Queue scanning. Let the driver select
      this mode by passing a special scan_threshold of 0.
      
      As the scan_threshold is the same for all Output Queues, also move it
      into the main qdio_irq struct. This allows for fast opt-out checking, as
      a driver is expected to operate either _all_ or none of its Output Queues
      in polling mode.
      
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Acked-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      313dc689
    • s390/qdio: enable drivers to poll for Output completions · 7c47f5af
      Julian Wiedmann authored
      
      While commit d36deae7 ("qdio: extend API to allow polling") enhanced
      the qdio layer so that drivers can poll their Input Queues, we don't
      have the corresponding infrastructure for Output Queues yet.
      
      Factor out a helper that scans a single QDIO Queue, so that qeth can
      implement TX NAPI on top of it.
      While doing so, remove the duplicated tracking of the next-to-scan index
      (q->first_to_check vs q->first_to_kick) in this code path.
      
      qdio_handle_aobs() needs to move slightly upwards in the code hierarchy,
      so that it's still called from the polling path.
      
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Acked-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c47f5af
  9. Aug 23, 2019
    • x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386 · b63f20a7
      Sean Christopherson authored
      
      Use 'lea' instead of 'add' when adjusting %rsp in CALL_NOSPEC so as to
      avoid clobbering flags.
      
      KVM's emulator makes indirect calls into a jump table of sorts, where
      the destination of the CALL_NOSPEC is a small blob of code that performs
      fast emulation by executing the target instruction with fixed operands.
      
        adcb_al_dl:
           0x000339f8 <+0>:   adc    %dl,%al
           0x000339fa <+2>:   ret
      
      A major motivation for doing fast emulation is to leverage the CPU to
      handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is
      both an input and output to the target of CALL_NOSPEC.  Clobbering flags
      results in all sorts of incorrect emulation, e.g. Jcc instructions often
      take the wrong path.  Sans the nops...
      
        asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
           0x0003595a <+58>:  mov    0xc0(%ebx),%eax
           0x00035960 <+64>:  mov    0x60(%ebx),%edx
           0x00035963 <+67>:  mov    0x90(%ebx),%ecx
           0x00035969 <+73>:  push   %edi
           0x0003596a <+74>:  popf
           0x0003596b <+75>:  call   *%esi
           0x000359a0 <+128>: pushf
           0x000359a1 <+129>: pop    %edi
           0x000359a2 <+130>: mov    %eax,0xc0(%ebx)
           0x000359b1 <+145>: mov    %edx,0x60(%ebx)
      
        ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
           0x000359a8 <+136>: mov    -0x10(%ebp),%eax
           0x000359ab <+139>: and    $0x8d5,%edi
           0x000359b4 <+148>: and    $0xfffff72a,%eax
           0x000359b9 <+153>: or     %eax,%edi
           0x000359bd <+157>: mov    %edi,0x4(%ebx)
      
      For the most part this has gone unnoticed as emulation of guest code
      that can trigger fast emulation is effectively limited to MMIO when
      running on modern hardware, and MMIO is rarely, if ever, accessed by
      instructions that affect or consume flags.
      
      Breakage is almost instantaneous when running with unrestricted guest
      disabled, in which case KVM must emulate all instructions when the guest
      has invalid state, e.g. when the guest is in Big Real Mode during early
      BIOS.
      
      Fixes: 776b043848fd2 ("x86/retpoline: Add initial retpoline support")
      Fixes: 1a29b5b7 ("KVM: x86: Make indirect calls in emulator speculation safe")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190822211122.27579-1-sean.j.christopherson@intel.com
      b63f20a7
    • ARM: 8897/1: check stmfd instruction using right shift · 69389837
      Lvqiang Huang authored
      
      Commit ef41b5c9 ("ARM: make kernel oops easier to read") changed the
      opcode constant as follows:
      -               .word   0xe92d0000 >> 10        @ stmfd sp!, {}
      +               .word   0xe92d0000 >> 11        @ stmfd sp!, {}
      so the shift used when checking the instruction needs to change to 11
      as well.
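
The effect of the shift amount can be sketched in C. In the ARM store-multiple encoding the low 16 bits hold the register list, so shifting right by 11 discards the bits for r0 through r10, while a shift of 10 still compares the r10 bit. The helper name is_stmfd_sp and the macro STMFD_SP_EMPTY are hypothetical; the kernel performs this comparison in assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Encoding of "stmfd sp!, {}" with an empty register list. */
#define STMFD_SP_EMPTY 0xe92d0000u

/*
 * Compare an instruction word against the stmfd pattern after
 * discarding the low `shift` bits of both values.
 */
static int is_stmfd_sp(uint32_t insn, unsigned int shift)
{
	assert(shift < 32);
	return (insn >> shift) == (STMFD_SP_EMPTY >> shift);
}
```

With this sketch, 0xe92d0400 (stmfd sp!, {r10}) matches at a shift of 11 but not at a shift of 10, which is the kind of mismatch that arises when the constant and the check use different shifts.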
      
      Signed-off-by: Lvqiang Huang <Lvqiang.Huang@unisoc.com>
      Signed-off-by: Chunyan Zhang <zhang.lyra@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      69389837
    • ARM: 8874/1: mm: only adjust sections of valid mm structures · c51bc12d
      Doug Berger authored
      
      A timing hazard exists when an early fork/exec thread begins
      exiting and sets its mm pointer to NULL while a separate core
      tries to update the section information.
      
      This commit ensures that the mm pointer is not NULL before
      setting its section parameters. The reasoning given in
      commit 11ce4b33 ("ARM: 8672/1: mm: remove tasklist locking
      from update_sections_early()") applies equally here: the
      task_lock does not need to be taken around this check.
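
The shape of the check can be sketched in plain C. This is a single-threaded user-space illustration only: the struct and function names are hypothetical stand-ins, and the sketch does not model the race itself or any kernel locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's mm and task structures. */
struct mm {
	int section_perms;
};

struct task {
	struct mm *mm; /* NULL once the task has started exiting */
};

/*
 * Apply `perms` to every task that still has an mm and skip the rest.
 * Returns the number of tasks that were updated.
 */
static size_t update_sections(struct task *tasks, size_t n, int perms)
{
	size_t updated = 0;

	for (size_t i = 0; i < n; i++) {
		struct mm *mm = tasks[i].mm;

		if (!mm)
			continue; /* nothing left to adjust for this task */
		mm->section_perms = perms;
		updated++;
	}
	return updated;
}
```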
      
      Fixes: 08925c2f ("ARM: 8464/1: Update all mm structures with section adjustments")
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Acked-by: Laura Abbott <labbott@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
      Cc: Peng Fan <peng.fan@nxp.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      c51bc12d
  10. Aug 22, 2019
    • um: fix time travel mode · e0917f87
      Johannes Berg authored
      
      Unfortunately, my build fix for when time travel mode isn't
      enabled broke time travel mode, because I forgot that we need
      to use the timer time after the timer has been marked disabled,
      and thus need to leave the time stored instead of zeroing it.
      
      Fix that by splitting the inline into two, so we can call only
      the _mode() one in the relevant code path.
      
      Fixes: b482e48d ("um: fix build without CONFIG_UML_TIME_TRAVEL_SUPPORT")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      e0917f87
  11. Aug 21, 2019
  12. Aug 20, 2019
  13. Aug 19, 2019
    • x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h · c49a0a80
      Tom Lendacky authored
      
      There have been reports of RDRAND issues after resuming from suspend on
      some AMD family 15h and family 16h systems. This issue stems from a BIOS
      not performing the proper steps during resume to ensure RDRAND continues
      to function properly.
      
      RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be
      reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND
      support using CPUID, including the kernel, will believe that RDRAND is
      not supported.
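
The two bit positions named above can be sketched as plain bit operations. This user-space illustration works on values that are passed in; it does not execute CPUID or access any MSR, and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the commit message. */
#define CPUID_1_ECX_RDRAND_BIT 30 /* CPUID Fn00000001_ECX[30] */
#define MSR_RDRAND_CPUID_BIT   62 /* MSR C001_1004[62] */

/* Report whether a CPUID leaf 1 ECX value advertises RDRAND. */
static int cpuid_has_rdrand(uint32_t ecx)
{
	return (ecx >> CPUID_1_ECX_RDRAND_BIT) & 1u;
}

/* Return the MSR value with bit 62 cleared and all other bits intact. */
static uint64_t msr_clear_rdrand(uint64_t msr_value)
{
	return msr_value & ~(UINT64_C(1) << MSR_RDRAND_CPUID_BIT);
}
```

A read-modify-write of this form leaves the remaining MSR bits untouched.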
      
      Update the CPU initialization to clear the RDRAND CPUID bit for any family
      15h and 16h processor that supports RDRAND. If it is known that the family
      15h or family 16h system does not have an RDRAND resume issue or that the
      system will not be placed in suspend, the "rdrand=force" kernel parameter
      can be used to stop the clearing of the RDRAND CPUID bit.
      
      Additionally, update the suspend and resume path to save and restore the
      MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in
      place after resuming from suspend.
      
      Note that clearing the RDRAND CPUID bit does not prevent a processor
      that normally supports the RDRAND instruction from executing it. So any
      code that determined the support based on family and model won't #UD.
      
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chen Yu <yu.c.chen@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>
      Cc: "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>
      Cc: Nathan Chancellor <natechancellor@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: <stable@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "x86@kernel.org" <x86@kernel.org>
      Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.1566229943.git.thomas.lendacky@amd.com
      c49a0a80
    • x86/boot/compressed/64: Fix boot on machines with broken E820 table · 0a46fff2
      Kirill A. Shutemov authored
      
      The BIOS on the Samsung 500C Chromebook reports a very rudimentary E820
      table that consists of 2 entries:
      
        BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] usable
        BIOS-e820: [mem 0x00000000fffff000-0x00000000ffffffff] reserved
      
      This breaks the logic in find_trampoline_placement(): bios_start lands
      on the end of the first 4k page and the trampoline start gets placed
      below 0.
      
      Detect the underflow and don't touch bios_start in such cases. This makes
      the kernel ignore the E820 table on machines that don't have two usable
      pages below BIOS_START_MAX.
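
The underflow can be illustrated with a simplified sketch. It assumes 4k pages and a trampoline that needs two pages, as the message describes; the function adjust_bios_start and the constants are hypothetical and do not reproduce the actual body of find_trampoline_placement(). The sketch also assumes that the entry's end address does not wrap.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_4K         UINT64_C(0x1000)
#define TRAMPOLINE_SIZE (2 * PAGE_4K) /* two usable pages */

/*
 * Given the current bios_start and one usable E820 entry, return the
 * new bios_start, or the old one if the entry cannot hold the trampoline.
 */
static uint64_t adjust_bios_start(uint64_t bios_start,
				  uint64_t entry_addr, uint64_t entry_size)
{
	uint64_t entry_end = entry_addr + entry_size;
	uint64_t candidate = bios_start;

	assert(entry_end >= entry_addr); /* sketch assumes no wrap-around */

	/* Clamp to the end of the entry and keep the result page aligned. */
	if (candidate > entry_end)
		candidate = entry_end;
	candidate &= ~(PAGE_4K - 1);

	/* Subtracting the trampoline size would wrap below zero. */
	if (candidate < TRAMPOLINE_SIZE)
		return bios_start;

	/* The trampoline would start before the entry does. */
	if (candidate - TRAMPOLINE_SIZE < entry_addr)
		return bios_start;

	return candidate;
}
```

With the single usable entry from the table above (0x0 to 0xfff), the clamped value is 0x1000, which is smaller than the trampoline size, so the function returns the original bios_start unchanged.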
      
      Fixes: 1b3a6264 ("x86/boot/compressed/64: Validate trampoline placement against E820")
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=203463
      Link: https://lkml.kernel.org/r/20190813131654.24378-1-kirill.shutemov@linux.intel.com
      0a46fff2
    • x86/apic: Handle missing global clockevent gracefully · f897e60a
      Thomas Gleixner authored
      
      Some newer machines do not advertise legacy timers. The kernel can handle
      that situation if the TSC and the CPU frequency are enumerated by CPUID or
      MSRs and the CPU supports TSC deadline timer. If the CPU does not support
      TSC deadline timer the local APIC timer frequency has to be known as well.
      
      Some Ryzen machines do not advertise legacy timers, but there is no
      reliable way to determine the bus frequency which feeds the local APIC
      timer when the machine allows overclocking of that frequency.
      
      As there is no legacy timer, the local APIC timer calibration crashes due
      to a NULL pointer dereference when accessing the global clock event
      device, which is not installed.
      
      Switch the calibration loop to a non-interrupt-based one, which polls
      either the TSC (if its frequency is known) or jiffies. The latter requires
      a global clockevent. As the machines which do not have a global clockevent
      installed have a known TSC frequency, this is a non-issue. For older
      machines where the TSC frequency is not known, there is no known case where
      the legacy timers do not exist, as that would have been reported long ago.
      
      Reported-by: Daniel Drake <drake@endlessm.com>
      Reported-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Daniel Drake <drake@endlessm.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908091443030.21433@nanos.tec.linutronix.de
      Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1142926#c12
      f897e60a
    • perf/x86: Fix typo in comment · 77d76032
      Su Yanjun authored
      
      No functional change.
      
      Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1565945001-4413-1-git-send-email-suyj.fnst@cn.fujitsu.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      77d76032
  14. Aug 17, 2019