1. 21 May, 2019 1 commit
  2. 17 Apr, 2019 1 commit
    • x86/exceptions: Split debug IST stack · 2a594d4c
      Thomas Gleixner authored
      
      
      The debug IST stack is actually two separate debug stacks to handle #DB
      recursion. This is required because the CPU always starts at the top of the
      stack on exception entry, which means that on #DB recursion the second #DB would
      overwrite the stack of the first.
      
      The low level entry code therefore adjusts the top of stack on entry so a
      secondary #DB starts from a different stack page. But the stack pages are
      adjacent without a guard page between them.
      
      Split the debug stack into 3 stacks which are separated by guard pages. The
      3rd stack is never mapped into the cpu_entry_area and is only there to
      catch triple #DB nesting:
      
            --- top of DB_stack	<- Initial stack
            --- end of DB_stack
            	  guard page
      
            --- top of DB1_stack	<- Top of stack after entering first #DB
            --- end of DB1_stack
            	  guard page
      
            --- top of DB2_stack	<- Top of stack after entering second #DB
            --- end of DB2_stack
            	  guard page
      
      If DB2 did not act as the final guard hole, a second #DB would point the
      top of the #DB stack to the stack below DB1, which would be valid and would
      not catch the undesired triple nesting.
      
      The backing store does not allocate any memory for DB2 and its guard page
      as it is not going to be mapped into the cpu_entry_area.
      
       - Adjust the low level entry code so it adjusts top of #DB with the offset
         between the stacks instead of exception stack size.
      
       - Make the dumpstack code aware of the new stacks.
      
       - Adjust the in_debug_stack() implementation and move it into the NMI code
         where it belongs. As this is NMI hotpath code, it just checks the full
         area between top of DB_stack and bottom of DB1_stack without checking
         for the guard page. That's correct because the NMI cannot hit a stack
         pointer pointing to the guard page between the DB and DB1 stacks. Even
         if it did, the NMI operation itself would still be unaffected, but the
         resumed debug exception on the topmost DB stack would crash by touching
         the guard page.
      
        [ bp: Make exception_stack_names static const char * const ]
      Suggested-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: linux-doc@vger.kernel.org
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160145.439944544@linutronix.de
      2a594d4c
  3. 06 Mar, 2019 1 commit
  4. 25 Oct, 2017 1 commit
    • locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() · 6aa7de05
      Mark Rutland authored
      
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
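      For illustration, a hypothetical before/after of the trivial pattern the
      script rewrites (the structure and field names below are made up, not taken
      from this patch):

        /* Before: ACCESS_ONCE() used for both the store and the load. */
        ACCESS_ONCE(shared->flag) = 1;
        while (!ACCESS_ONCE(shared->done))
                cpu_relax();

        /* After: the write/read intent is explicit. */
        WRITE_ONCE(shared->flag, 1);
        while (!READ_ONCE(shared->done))
                cpu_relax();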
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6aa7de05
  5. 16 Aug, 2017 1 commit
  6. 10 Apr, 2017 1 commit
    • x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI · db47d5f8
      Borislav Petkov authored
      Apparently, some machines used to report DRAM errors through a PCI SERR
      NMI. This is why we have a call into EDAC in the NMI handler. See
      
        c0d12172 ("drivers/edac: add new nmi rescan").
      
      From looking at the patch above, that's two drivers: e752x_edac.c and
      e7xxx_edac.c. Now, I wanna say those are old machines which are probably
      decommissioned already.
      
      Tony says that "[t]the newest CPU supported by either of those drivers
      is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some
      folks are still using these ... but people that hold onto h/w for 7
      years generally cling to old s/w too ... so I'd guess it unlikely that
      we will get complaints for breaking these in upstream."
      
      So even if there is a small number still in use, we did load EDAC with
      edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact)
      which means a default EDAC setup without any parameters supplied on the
      command line or otherwise would never even log the error in the NMI
      handler because we're polling by default:
      
        inline int edac_handler_set(void)
        {
               if (edac_op_state == EDAC_OPSTATE_POLL)
                       return 0;
      
               return atomic_read(&edac_handlers);
        }
      
      So, long story short, I'd like to get rid of that nastiness called
      edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If
      we ever have to do stuff like that again, it should be notifiers we're
      using and not some insanity like this one.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      db47d5f8
  7. 13 Mar, 2017 1 commit
  8. 02 Mar, 2017 2 commits
  9. 06 Jun, 2016 1 commit
    • x86: include linux/ratelimit.h in nmi.c · c361db5c
      Arnd Bergmann authored
      
      
      When building random configurations, we now occasionally get a new
      build error:
      
         In file included from include/linux/kernel.h:13:0,
                          from include/linux/list.h:8,
                          from include/linux/preempt.h:10,
                          from include/linux/spinlock.h:50,
                          from arch/x86/kernel/nmi.c:13:
         arch/x86/kernel/nmi.c: In function 'nmi_max_handler':
         include/linux/printk.h:375:9: error: type defaults to 'int' in declaration of 'DEFINE_RATELIMIT_STATE' [-Werror=implicit-int]
           static DEFINE_RATELIMIT_STATE(_rs,    \
                  ^
         arch/x86/kernel/nmi.c:110:2: note: in expansion of macro 'printk_ratelimited'
           printk_ratelimited(KERN_INFO
           ^~~~~~~~~~~~~~~~~~
      
      This was working before the rtc rework series because linux/ratelimit.h
      was included implicitly through asm/mach_traps.h -> asm/mc146818rtc.h
      -> linux/mc146818rtc.h -> linux/rtc.h -> linux/device.h.
      
      We clearly shouldn't rely on this indirect inclusion, so this adds
      an explicit #include in the file that needs it.
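      The fix itself amounts to one explicit include, roughly:

        /* arch/x86/kernel/nmi.c */
        #include <linux/ratelimit.h>    /* printk_ratelimited() / DEFINE_RATELIMIT_STATE() */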
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 5ab788d7 ("rtc: cmos: move mc146818rtc code out of asm-generic/rtc.h")
      Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      c361db5c
  10. 08 Mar, 2016 1 commit
    • x86/nmi: Mark 'ignore_nmis' as __read_mostly · 8e2a7f5b
      Kostenzer Felix authored
      
      
      ignore_nmis is used in two distinct places:
      
       1. modified through {stop,restart}_nmi by alternative_instructions
       2. read by do_nmi to determine if default_do_nmi should be called or not
      
      thus the access pattern conforms to __read_mostly and do_nmi() is a fastpath.
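      The change itself is a one-line annotation, along the lines of:

        /* arch/x86/kernel/nmi.c */
        static int ignore_nmis __read_mostly;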
      Signed-off-by: Kostenzer Felix <fkostenzer@live.at>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8e2a7f5b
  11. 19 Dec, 2015 3 commits
    • x86/nmi: Save regs in crash dump on external NMI · b279d67d
      Hidehiro Kawai authored
      
      
      Now, multiple CPUs can receive an external NMI simultaneously by
      specifying the "apic_extnmi=all" command line parameter. When we take
      a crash dump by using external NMI with this option, we fail to save
      registers into the crash dump. This happens as follows:
      
        CPU 0                              CPU 1
        ================================   =============================
        receive an external NMI
        default_do_nmi()                   receive an external NMI
          spin_lock(&nmi_reason_lock)      default_do_nmi()
          io_check_error()                   spin_lock(&nmi_reason_lock)
            panic()                            busy loop
            ...
              kdump_nmi_shootdown_cpus()
                issue NMI IPI -----------> blocked until IRET
                                               busy loop...
      
      Here, since CPU 1 is in NMI context, an additional NMI from CPU 0
      remains unhandled until CPU 1 IRETs. However, CPU 1 will never execute
      IRET so the NMI is not handled and the callback function to save
      registers is never called.
      
      To solve this issue, we check if the IPI for crash dumping was issued
      while waiting for nmi_reason_lock to be released, and if so, call its
      callback function directly. If the IPI is not issued (e.g. kdump is
      disabled), the actual behavior doesn't change.
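      A hedged sketch of the resulting lock acquisition in the NMI path (the helper
      name run_crash_ipi_callback() follows the related kexec code and should be
      treated as illustrative; it does nothing unless the crash IPI has been issued):

        while (!raw_spin_trylock(&nmi_reason_lock)) {
                /* Save this CPU's registers if the crash-dump IPI was issued. */
                run_crash_ipi_callback(regs);
                cpu_relax();
        }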
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: kexec@lists.infradead.org
      Cc: linux-doc@vger.kernel.org
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: x86-ml <x86@kernel.org>
      Link: http://lkml.kernel.org/r/20151210065245.4587.39316.stgit@softrs
      
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      b279d67d
    • panic, x86: Allow CPUs to save registers even if looping in NMI context · 58c5661f
      Hidehiro Kawai authored
      
      
      Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(),
      sends an NMI IPI to CPUs which haven't called panic() to stop them,
      save their register information and do some cleanups for crash dumping.
      However, if such a CPU is infinitely looping in NMI context, we fail to
      save its register information into the crash dump.
      
      For example, this can happen when unknown NMIs are broadcast to all
      CPUs as follows:
      
        CPU 0                             CPU 1
        ===========================       ==========================
        receive an unknown NMI
        unknown_nmi_error()
          panic()                         receive an unknown NMI
            spin_trylock(&panic_lock)     unknown_nmi_error()
            crash_kexec()                   panic()
                                              spin_trylock(&panic_lock)
                                              panic_smp_self_stop()
                                                infinite loop
              kdump_nmi_shootdown_cpus()
                issue NMI IPI -----------> blocked until IRET
                                                infinite loop...
      
      Here, since CPU 1 is in NMI context, the second NMI from CPU 0 is
      blocked until CPU 1 executes IRET. However, CPU 1 never executes IRET,
      so the NMI is not handled and the callback function to save registers is
      never called.
      
      In practice, this can happen on some servers which broadcast NMIs to all
      CPUs when the NMI button is pushed.
      
      To save registers in this case, we need to:
      
        a) Return from NMI handler instead of looping infinitely
        or
        b) Call the callback function directly from the infinite loop
      
      Inherently, a) is risky because NMI is also used to prevent corrupted
      data from being propagated to devices.  So, we chose b).
      
      This patch does the following:
      
      1. Move the infinite looping of CPUs which haven't called panic() in NMI
         context (actually done by panic_smp_self_stop()) outside of panic() to
         enable us to refer to pt_regs. Please note that panic_smp_self_stop() is
         still used for normal context.
      
      2. Call a callback of kdump_nmi_shootdown_cpus() directly to save
         registers and do some cleanups after setting waiting_for_crash_ipi, which
         is used for counting down the number of CPUs which handled the callback;
         see the sketch below.
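      A hedged sketch of the waiting loop referred to in point 2 (the names
      nmi_panic_self_stop(), crash_ipi_issued and crash_nmi_callback() follow the
      x86 kexec/panic code of that era; treat this as illustrative, not a verbatim
      copy of the patch):

        /* CPUs that lose the panic race while in NMI context spin here
         * instead of in panic_smp_self_stop(), with pt_regs available. */
        void nmi_panic_self_stop(struct pt_regs *regs)
        {
                while (1) {
                        /* Once the crash-dump IPI has been issued, run its
                         * callback directly to save this CPU's registers. */
                        if (READ_ONCE(crash_ipi_issued))
                                crash_nmi_callback(0, regs);    /* does not return */
                        cpu_relax();
                }
        }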
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: kexec@lists.infradead.org
      Cc: linux-doc@vger.kernel.org
      Cc: lkml <linux-kernel@vger.kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Link: http://lkml.kernel.org/r/20151210014628.25437.75256.stgit@softrs
      
      
      [ Cleanup comments, fixup formatting. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      58c5661f
    • panic, x86: Fix re-entrance problem due to panic on NMI · 1717f209
      Hidehiro Kawai authored
      
      
      If panic on NMI happens just after panic() on the same CPU, panic() is
      recursively called. Kernel stalls, as a result, after failing to acquire
      panic_lock.
      
      To avoid this problem, don't call panic() in NMI context if we've
      already entered panic().
      
      For that, introduce nmi_panic() macro to reduce code duplication. In
      the case of panic on NMI, don't return from NMI handlers if another CPU
      already panicked.
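      A hedged sketch of what such an nmi_panic() helper looks like (panic_cpu and
      PANIC_CPU_INVALID follow the panic-on-NMI design described here; the exact
      final form in the tree may differ):

        /* Only the first CPU to get here calls panic(); any other CPU that
         * reaches nmi_panic() while a panic is already in progress simply
         * skips the recursive panic() call. */
        #define nmi_panic(fmt, ...)                                             \
        do {                                                                    \
                int old_cpu;                                                    \
                old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID,         \
                                         raw_smp_processor_id());               \
                if (old_cpu == PANIC_CPU_INVALID)                               \
                        panic(fmt, ##__VA_ARGS__);                              \
        } while (0)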
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: kexec@lists.infradead.org
      Cc: linux-doc@vger.kernel.org
      Cc: lkml <linux-kernel@vger.kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
      
      
      [ Cleanup comments, fixup formatting. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      1717f209
  12. 21 Jul, 2015 1 commit
  13. 17 Jul, 2015 2 commits
  14. 24 Apr, 2014 1 commit
    • kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation · 9326638c
      Masami Hiramatsu authored
      
      
      Use NOKPROBE_SYMBOL macro for protecting functions
      from kprobes instead of __kprobes annotation under
      arch/x86.
      
      This applies nokprobe_inline annotation for some cases,
      because NOKPROBE_SYMBOL() will inhibit inlining by
      referring the symbol address.
      
      This just folds a bunch of previous NOKPROBE_SYMBOL()
      cleanup patches for x86 to one patch.
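      A representative example of the conversion, using do_nmi() as one of the
      affected functions (the exact prototype shown is illustrative):

        /* Before: __kprobes places the function in the .kprobes.text section. */
        dotraplinkage notrace __kprobes void
        do_nmi(struct pt_regs *regs, long error_code) { /* ... */ }

        /* After: the annotation is dropped and the symbol is blacklisted
         * explicitly instead; referring to its address this way inhibits
         * inlining, hence nokprobe_inline for small static helpers. */
        dotraplinkage notrace void
        do_nmi(struct pt_regs *regs, long error_code) { /* ... */ }
        NOKPROBE_SYMBOL(do_nmi);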
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Link: http://lkml.kernel.org/r/20140417081814.26341.51656.stgit@ltc230.yrl.intra.hitachi.co.jp
      
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Jonathan Lebon <jlebon@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9326638c
  15. 09 Feb, 2014 1 commit
  16. 29 Oct, 2013 1 commit
    • perf/x86: Fix NMI measurements · e8a923cc
      Peter Zijlstra authored
      
      
      OK, so what I'm actually seeing on my WSM is that sched/clock.c is
      'broken' for the purpose we're using it for.
      
      What triggered it is that my WSM-EP is broken :-(
      
        [    0.001000] tsc: Fast TSC calibration using PIT
        [    0.002000] tsc: Detected 2533.715 MHz processor
        [    0.500180] TSC synchronization [CPU#0 -> CPU#6]:
        [    0.505197] Measured 3 cycles TSC warp between CPUs, turning off TSC clock.
        [    0.004000] tsc: Marking TSC unstable due to check_tsc_sync_source failed
      
      For some reason it consistently detects TSC skew, even though NHM+
      should have a single clock domain for 'reasonable' systems.
      
      This marks sched_clock_stable=0, which means that we do fancy stuff to
      try and get a 'sane' clock. Part of this fancy stuff relies on the tick,
      clearly that's gone when NOHZ=y. So for idle cpus time gets stuck, until
      it either wakes up or gets kicked by another cpu.
      
      This is perfectly fine for the scheduler -- it only cares about
      actually running stuff, and when we're running stuff we're obviously not
      idle. It does somewhat break down for perf, though, which can trigger
      events just fine on an otherwise idle cpu.
      
      So I've got NMIs getting 'measured' as taking ~1ms, which actually
      don't last nearly that long:
      
                <idle>-0     [013] d.h.   886.311970: rcu_nmi_enter <-do_nmi
        ...
                <idle>-0     [013] d.h.   886.311997: perf_sample_event_took: HERE!!! : 1040990
      
      So ftrace (which uses sched_clock(), not the fancy bits) only sees
      ~27us, but we measure ~1ms !!
      
      Now since all this measurement stuff lives in x86 code, we can actually
      fix it.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: mingo@kernel.org
      Cc: dave.hansen@linux.intel.com
      Cc: eranian@google.com
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: jmario@redhat.com
      Cc: acme@infradead.org
      Link: http://lkml.kernel.org/r/20131017133350.GG3364@laptop.programming.kicks-ass.net
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e8a923cc
  17. 12 Jul, 2013 1 commit
  18. 23 Jun, 2013 2 commits
    • x86: Add NMI duration tracepoints · 0c4df02d
      Dave Hansen authored
      
      
      This patch has been invaluable in my adventures finding
      issues in the perf NMI handler.  I'm as big a fan of
      printk() as anybody is, but using printk() in NMIs is
      deadly when they're happening frequently.
      
      Even hacking in trace_printk() ended up eating enough
      CPU to throw off some of the measurements I was making.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: acme@ghostprotocols.net
      Cc: Dave Hansen <dave@sr71.net>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0c4df02d
    • x86: Warn when NMI handlers take large amounts of time · 2ab00456
      Dave Hansen authored
      
      
      I have a system which is causing all kinds of problems.  It has
      8 NUMA nodes, and lots of cores that can fight over cachelines.
      If things are not working _perfectly_, then NMIs can take longer
      than expected.
      
      If we get too many of them backed up to each other, we can
      easily end up in a situation where we are doing nothing *but*
      running NMIs.  The biggest problem, though, is that this happens
      _silently_.  You might be lucky to get an hrtimer warning, but
      most of the time the system simply hangs.
      
      This patch should at least give us some warning before we fall
      off the cliff.  The warnings look like this:
      
      	nmi_handle: perf_event_nmi_handler() took: 26095071 ns
      
      The message is triggered whenever we notice the longest NMI
      we've seen to date.  You can always view and reset this value
      via the debugfs interface if you like.
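      A hedged sketch of the measurement around each handler in nmi_handle() (the
      clock source, variable names and exact message format here are illustrative;
      the real warning is the one quoted above):

        u64 before, delta;

        before = local_clock();
        handled = a->handler(type, regs);
        delta = local_clock() - before;

        /* Only warn when a new longest NMI is observed; the maximum is also
         * exposed (and resettable) via debugfs. */
        if (delta > nmi_longest_ns) {
                nmi_longest_ns = delta;
                printk_ratelimited(KERN_INFO
                        "INFO: NMI handler took too long to run: %llu ns\n", delta);
        }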
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: acme@ghostprotocols.net
      Cc: Dave Hansen <dave@sr71.net>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2ab00456
  19. 17 Jan, 2013 1 commit
  20. 08 Jun, 2012 2 commits
    • x86: Save cr2 in NMI in case NMIs take a page fault (for i386) · 70fb74a5
      Steven Rostedt authored
      Avi Kivity reported that page faults in NMIs could cause havoc if
      the NMI preempted another page fault handler:
      
         The recent changes to NMI allow exceptions to take place in NMI
         handlers, but I think that a #PF (say, due to access to vmalloc space)
         is still problematic.  Consider the sequence
      
          #PF  (cr2 set by processor)
            NMI
              ...
              #PF (cr2 clobbered)
                do_page_fault()
                IRET
              ...
              IRET
            do_page_fault()
              address = read_cr2()
      
         The last line reads the overwritten cr2 value.
      
      This is the i386 version, which has the luxury of doing the work
      in C code.
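      A minimal sketch of the C-side idea (where exactly this hooks into the i386
      nesting code is a detail of the patch; read_cr2()/write_cr2() are the usual
      accessors):

        /* Save %cr2 before the NMI body runs, and restore it afterwards if a
         * page fault taken inside the NMI clobbered it. */
        unsigned long cr2 = read_cr2();

        /* ... handle the NMI, which may itself fault (e.g. on vmalloc space) ... */

        if (unlikely(cr2 != read_cr2()))
                write_cr2(cr2);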
      
      Link: http://lkml.kernel.org/r/4FBB8C40.6080304@redhat.com
      
      Reported-by: Avi Kivity <avi@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      70fb74a5
    • x86: Remove cmpxchg from i386 NMI nesting code · c7d65a78
      Steven Rostedt authored
      I've been informed by someone on LWN called 'slashdot' that
      some i386 machines do not support a true cmpxchg. The cmpxchg
      used by the i386 NMI nesting code must be a true cmpxchg as
      disabling interrupts will not work for NMIs (which is the work
      around for i386s that do not have a true cmpxchg).
      
      This 'slashdot' character also suggested a fix to the issue.
      As the state of the nesting NMIs goes as follows:
      
        NOT_RUNNING -> EXECUTING
        EXECUTING   -> NOT_RUNNING
        EXECUTING   -> LATCHED
        LATCHED     -> EXECUTING
      
      Having these states as enum values of:
      
        NOT_RUNNING = 0
        EXECUTING   = 1
        LATCHED     = 2
      
      Instead of a cmpxchg to make EXECUTING -> NOT_RUNNING a
      dec_and_test() would work as well. If the dec_and_test brings
      the state to NOT_RUNNING, that is the same as a cmpxchg
      succeeding to change EXECUTING to NOT_RUNNING. If a nested NMI
      were to come in and change it to LATCHED, the dec_and_test() would
      convert the state to EXECUTING (what we want it to be in such a
      case anyway).
      
      I asked 'slashdot' to post this as a patch, but it never came to
      be. I decided to do the work instead.
      
      Thanks to H. Peter Anvin for suggesting to use this_cpu_dec_and_return()
      instead of local_dec_and_test(&__get_cpu_var()).
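      A hedged sketch of the exit path with that suggestion applied (the state
      values are the ones listed above; the per-CPU variable name is illustrative):

        enum nmi_states { NMI_NOT_RUNNING = 0, NMI_EXECUTING, NMI_LATCHED };
        static DEFINE_PER_CPU(enum nmi_states, nmi_state);

                /* On exit: EXECUTING(1) -> NOT_RUNNING(0).  If a nested NMI set
                 * LATCHED(2) in the meantime, the decrement lands on EXECUTING
                 * instead and the handler is simply run again. */
                if (this_cpu_dec_return(nmi_state))
                        goto nmi_restart;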
      
      Link: http://lwn.net/Articles/484932/
      
      
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      c7d65a78
  21. 01 Jun, 2012 1 commit
    • x86: Reset the debug_stack update counter · c0525a69
      Steven Rostedt authored
      
      
      When an NMI goes off and it sees that it preempted the debug stack,
      to keep the debug stack safe, it changes the IDT to point to one that
      does not modify the stack on breakpoint (to allow breakpoints in NMIs).
      
      But the variable that gets set to know to undo it on exit never gets
      cleared. Thus, once it has been set the first time, every subsequent NMI
      will reset the IDT on exit even if it does not need to be reset.
      
      [ Added H. Peter Anvin's suggestion to use this_cpu_read/write ]
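      A hedged sketch of the fix with that suggestion applied (the per-CPU flag
      name follows the related NMI nesting code; treat it as illustrative):

        static DEFINE_PER_CPU(int, update_debug_stack);

                /* On NMI exit: only undo the IDT switch if this NMI actually did
                 * it, and clear the flag so later NMIs don't repeat the reset. */
                if (unlikely(this_cpu_read(update_debug_stack))) {
                        debug_stack_reset();
                        this_cpu_write(update_debug_stack, 0);
                }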
      
      Cc: <stable@vger.kernel.org> # v3.3
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      c0525a69
  22. 17 May, 2012 1 commit
    • MCA: delete all remaining traces of microchannel bus support. · bb8187d3
      Paul Gortmaker authored
      
      
      Hardware with MCA bus is limited to 386 and 486 class machines
      that are now 20+ years old and typically with less than 32MB
      of memory.  A quick search on the internet, and you see that
      even the MCA hobbyist/enthusiast community has lost interest
      in the early 2000 era and never really even moved ahead from
      the 2.4 kernels to the 2.6 series.
      
      This deletes anything remaining related to CONFIG_MCA from core
      kernel code and from the x86 architecture.  There is no point in
      carrying this any further into the future.
      
      One complication to watch for is inadvertently scooping up
      stuff relating to machine check, since there is overlap in
      the TLA name space (e.g. arch/x86/boot/mca.c).
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Cc: x86@kernel.org
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      bb8187d3
  23. 09 May, 2012 1 commit
  24. 28 Apr, 2012 1 commit
    • ftrace/x86: Remove the complex ftrace NMI handling code · 4a6d70c9
      Steven Rostedt authored
      
      
      As ftrace function tracing would require modifying code that could
      be executed in NMI context, which is not stopped with stop_machine(),
      ftrace had to do a complex algorithm with various stages of setup
      and memory barriers to make it work.
      
      With the new breakpoint method, this is no longer required. The changes
      to the code can be done without any problem in NMI context, as well as
      without stop machine altogether. Remove the complex code as it is
      no longer needed.
      
      Also, a lot of the notrace annotations could be removed from the
      NMI code as it is now safe to trace them. With the exception of
      do_nmi itself, which does some special work to handle running in
      the debug stack. The breakpoint method can cause NMIs to double
      nest the debug stack if it's not setup properly, and that is done
      in do_nmi(), thus that function must not be traced.
      
      (Note the arch sh may want to do the same)
      
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      4a6d70c9
  25. 25 Apr, 2012 2 commits
    • x86/nmi: Fix page faults by nmiaction if kmemcheck is enabled · 72b3fb24
      Li Zhong authored
      
      
      This patch tries to fix the problem of page fault exception
      caused by accessing nmiaction structure in nmi if kmemcheck
      is enabled.
      
      If kmemcheck is enabled, the memory allocated through slab are
      in pages that are marked non-present, so that some checks could
      be done in the page fault handling code (e.g. whether the
      memory is read before being written to).
      
      As nmiaction is allocated in this way, it resides in a
      non-present page. A page fault is then taken while the nmi code
      accesses the nmiaction structure, which would then cause a
      warning by WARN_ON_ONCE(in_nmi()) in kmemcheck_fault(), called
      by do_page_fault().
      
      This significantly simplifies the code as well, as the whole
      dynamic allocation dance goes away.
      
      v2: as Peter suggested, changed the nmiaction to use static
          storage.
      
      v3: as Peter suggested, use macro to shorten the codes. Also
          keep the original usage of register_nmi_handler, so users of
          this call don't need to change.
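      A hedged sketch of the macro approach from v2/v3 (field names follow
      struct nmiaction; the real wrapper may differ in detail):

        /* Keep the register_nmi_handler() call signature, but back it with
         * static storage so no slab allocation (and thus no kmemcheck
         * non-present page) is involved. */
        #define register_nmi_handler(type, fn, fg, n)           \
        ({                                                      \
                static struct nmiaction fn##_na = {             \
                        .handler = (fn),                        \
                        .flags   = (fg),                        \
                        .name    = (n),                         \
                };                                              \
                __register_nmi_handler((type), &fn##_na);       \
        })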
      Tested-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Fixes: https://lkml.org/lkml/2012/3/2/356
      
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      [ simplified the wrappers ]
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: thomas.mingarelli@hp.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1333051877-15755-4-git-send-email-dzickus@redhat.com
      
      
      [ tidied the patch a bit ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      72b3fb24
    • x86/nmi: Add new NMI queues to deal with IO_CHK and SERR · 553222f3
      Don Zickus authored
      
      
      In discussions with Thomas Mingarelli about hpwdt, he explained
      to me some issues they were seeing when using their virtual NMI
      button to test the hpwdt driver.
      
      It turns out the virtual NMI button used on HP's machines does not
      send unknown NMIs but instead sends IO_CHK NMIs.  The way the
      kernel code is written, the hpwdt driver can not register itself
      against that type of NMI and therefore can not successfully
      capture system information before panic'ing.
      
      To solve this I created two new NMI queues to allow drivers to
      register against the IO_CHK and SERR NMIs.  Or, in the hpwdt case,
      all three (if you include unknown NMIs too).
      
      The change is straightforward and just mimics what the unknown
      NMI does.
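      Illustratively, the set of NMI types a handler can now register against, and
      the kind of registration this enables for hpwdt (the enum spelling and the
      hpwdt handler name should be treated as a sketch):

        enum {
                NMI_LOCAL = 0,  /* CPU-local NMIs, e.g. perf    */
                NMI_UNKNOWN,    /* NMIs nobody claimed          */
                NMI_SERR,       /* PCI SERR NMIs (new queue)    */
                NMI_IO_CHECK,   /* IO_CHK NMIs (new queue)      */
                NMI_MAX
        };

        /* hpwdt can now hook the NMI type its virtual NMI button raises: */
        register_nmi_handler(NMI_IO_CHECK, hpwdt_pretimeout, 0, "hpwdt");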
      Reported-and-tested-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1333051877-15755-3-git-send-email-dzickus@redhat.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      553222f3
  26. 21 Dec, 2011 2 commits
    • x86: Allow NMIs to hit breakpoints in i386 · ccd49c23
      Steven Rostedt authored
      
      
      With i386, NMIs and breakpoints use the current stack and they
      do not reset the stack pointer to a fixed point that might corrupt
      a previous NMI or breakpoint (as it does in x86_64). But NMIs are
      still not re-entrant, and we need to handle the case where an NMI
      hits a breakpoint (whose iret re-enables NMIs), allowing another
      NMI to come in before the first one has finished.
      
      The fix is to let the NMI be in 3 different states:
      
      1) not running
      2) executing
      3) latched
      
      When no NMI is executing on a given CPU, the state is "not running".
      When the first NMI comes in, the state is switched to "executing".
      On exit of that NMI, a cmpxchg is performed to switch the state
      back to "not running" and if that fails, the NMI is restarted.
      
      If a breakpoint is hit and does an iret, which re-enables NMIs,
      and another NMI comes in before the first NMI finished, it will
      detect that the state is not in the "not running" state and the
      current NMI is nested. In this case, the state is switched to "latched"
      to let the interrupted NMI know to restart the NMI handler, and
      the nested NMI exits without doing anything.
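      A hedged sketch of that exit-time check (state names as listed above; this is
      the cmpxchg form that the "Remove cmpxchg from i386 NMI nesting code" patch
      above later replaces with a decrement):

                /* Leaving the first NMI: if nobody nested, go back to "not
                 * running".  If a nested NMI latched in the meantime, the
                 * cmpxchg fails and the handler is restarted. */
                if (cmpxchg(&__get_cpu_var(nmi_state),
                            NMI_EXECUTING, NMI_NOT_RUNNING) != NMI_EXECUTING)
                        goto nmi_restart;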
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul Turner <pjt@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      ccd49c23
    • x86: Keep current stack in NMI breakpoints · 228bdaa9
      Steven Rostedt authored
      
      
      We want to allow NMI handlers to have breakpoints to be able to
      remove stop_machine from ftrace, kprobes and jump_labels. But if
      an NMI interrupts a current breakpoint, and then it triggers a
      breakpoint itself, it will switch to the breakpoint stack and
      corrupt the data on it for the breakpoint processing that it
      interrupted.
      
      Instead, have the NMI check if it interrupted breakpoint processing
      by checking if the stack that is currently used is a breakpoint
      stack. If it is, then load a special IDT that changes the IST
      for the debug exception to keep the same stack in kernel context.
      When the NMI is done, it puts it back.
      
      This way, if the NMI does trigger a breakpoint, it will keep
      using the same stack and not stomp on the breakpoint data for
      the breakpoint it interrupted.
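      A hedged sketch of the entry-side check described here (the helper names
      follow the related debug-stack code; treat them as illustrative):

                /* On NMI entry: if the interrupted stack pointer lies within a
                 * debug stack, load an IDT whose #DB entry uses IST 0 so that a
                 * breakpoint hit inside the NMI keeps using the current stack. */
                if (unlikely(is_debug_stack(regs->sp))) {
                        debug_stack_set_zero();
                        this_cpu_write(update_debug_stack, 1);
                }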
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      228bdaa9
  27. 10 Nov, 2011 2 commits
  28. 31 Oct, 2011 1 commit
    • x86: Fix files explicitly requiring export.h for EXPORT_SYMBOL/THIS_MODULE · 69c60c88
      Paul Gortmaker authored
      
      
      These files were implicitly getting EXPORT_SYMBOL via device.h
      which was including module.h, but that will be fixed up shortly.
      
      By fixing these now, we can avoid seeing things like:
      
      arch/x86/kernel/rtc.c:29: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
      arch/x86/kernel/pci-dma.c:20: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
      arch/x86/kernel/e820.c:69: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL_GPL’
      
      [ with input from Randy Dunlap <rdunlap@xenotime.net> and also
        from Stephen Rothwell <sfr@canb.auug.org.au> ]
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      69c60c88
  29. 10 Oct, 2011 3 commits
    • x86, nmi, drivers: Fix nmi splitup build bug · d48b0e17
      Ingo Molnar authored
      
      
      nmi.c needs an #include <linux/mca.h>:
      
       arch/x86/kernel/nmi.c: In function ‘unknown_nmi_error’:
       arch/x86/kernel/nmi.c:286:6: error: ‘MCA_bus’ undeclared (first use in this function)
       arch/x86/kernel/nmi.c:286:6: note: each undeclared identifier is reported only once for each function it appears in
      
      Another one is the hpwdt driver:
      
       drivers/watchdog/hpwdt.c:507:9: error: ‘NMI_DONE’ undeclared (first use in this function)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d48b0e17
    • x86, nmi: Track NMI usage stats · efc3aac5
      Don Zickus authored
      
      
      Now that the NMI handler are broken into lists, increment the appropriate
      stats for each list.  This allows us to see what is going on when they
      get printed out in the next patch.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1317409584-23662-6-git-send-email-dzickus@redhat.com
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      efc3aac5
    • x86, nmi: Add in logic to handle multiple events and unknown NMIs · b227e233
      Don Zickus authored
      
      
      Previous patches allow the NMI subsystem to process multiple NMI events
      in one NMI.  As previously discussed, this can cause issues when an event
      triggers another NMI but is already processed in the current NMI.  This causes the
      next NMI to go unprocessed and become an 'unknown' NMI.
      
      To handle this, we first have to flag whether the NMI handler handled
      more than one event.  If it did, then there exists a chance that
      the next NMI might be already processed.  Once the NMI is flagged as a
      candidate to be swallowed, we next look for a back-to-back NMI condition.
      
      This is determined by looking at the %rip from pt_regs.  If it is the same
      as the previous NMI, it is assumed the cpu did not have a chance to jump
      back into a non-NMI context and execute code and instead handled another NMI.
      
      If both of those conditions are true then we will swallow any unknown NMI.
      
      There still exists a chance that we accidentally swallow a real unknown NMI,
      but for now things seem better.
      
      An optimization has also been added to the nmi notifier routine.  Because x86
      can latch up to one NMI while currently processing an NMI, we don't have to
      worry about executing _all_ the handlers in a standalone NMI.  The idea is
      if multiple NMIs come in, the second NMI will represent them.  For those
      back-to-back NMI cases, we have the potential to drop NMIs.  Therefore only
      execute all the handlers in the second half of a detected back-to-back NMI.
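      A hedged sketch of the back-to-back detection described above (the per-CPU
      variable names are illustrative):

        static DEFINE_PER_CPU(bool, swallow_nmi);
        static DEFINE_PER_CPU(unsigned long, last_nmi_rip);

                bool b2b = false;

                /* If this NMI hit at exactly the same %rip as the previous one,
                 * the CPU never returned to non-NMI context in between, so treat
                 * it as back-to-back; otherwise forget any pending "swallow". */
                if (regs->ip == __this_cpu_read(last_nmi_rip))
                        b2b = true;
                else
                        __this_cpu_write(swallow_nmi, false);

                __this_cpu_write(last_nmi_rip, regs->ip);

                /* ... handle events; if more than one was handled, set
                 * swallow_nmi so a following unknown NMI in a b2b pair
                 * can be dropped ... */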
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1317409584-23662-5-git-send-email-dzickus@redhat.com
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b227e233