  • x86/mce: Add CMCI poll mode · 55babd8f
    Chen Gong authored
    
    
    On Intel systems corrected machine check interrupts (CMCI) may be sent to
    multiple logical processors; possibly to all processors on the affected
    socket (SDM Volume 3B "15.5.1 CMCI Local APIC Interface").  This means
    that a persistent error (such as a stuck bit in ECC memory) may cause
    a storm of interrupts that greatly hinders or prevents forward progress
    (probably on many processors).
    
    To solve this we keep track of the rate at which each processor sees
    CMCI. If we exceed a threshold, we disable CMCI delivery and switch to
    polling the machine check banks. If the storm subsides (none of the
    affected processors see any more errors for a complete poll interval) we
    re-enable CMCI.
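
The switching logic described above can be pictured with a small userspace sketch. This is not the code from this commit: every name here (`cmci_seen`, `poll_done`, `WINDOW_TICKS`, `NCPUS`, the three-mode enum) is hypothetical, the rate window length is an assumed value, and only the threshold of 15 comes from the commit message. It assumes each CPU counts interrupts in a fixed window, moves to polling once the count exceeds the threshold, and returns to interrupt mode only when no CPU is still seeing errors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NCPUS           2    /* hypothetical number of logical CPUs */
#define STORM_THRESHOLD 15   /* threshold named in the commit message */
#define WINDOW_TICKS    100  /* assumed length of the rate window */

/* Hypothetical per-CPU modes; these are not the kernel's names. */
enum cmci_mode { CMCI_INTERRUPT, CMCI_POLL_ACTIVE, CMCI_POLL_QUIET };

struct cpu {
    enum cmci_mode mode;
    unsigned long window_start; /* tick at which the current window began */
    unsigned int count;         /* CMCIs seen in the current window */
};

static struct cpu cpus[NCPUS];
static int active_storms;       /* CPUs that are polling and still see errors */

/* Account for one CMCI on a CPU. Returns true if that CPU is in poll mode. */
static bool cmci_seen(int id, unsigned long now)
{
    assert(id >= 0 && id < NCPUS);
    struct cpu *c = &cpus[id];

    if (c->mode != CMCI_INTERRUPT)
        return true;
    if (now - c->window_start >= WINDOW_TICKS) {
        c->window_start = now;  /* window expired: start counting afresh */
        c->count = 0;
    }
    if (++c->count <= STORM_THRESHOLD)
        return false;
    c->mode = CMCI_POLL_ACTIVE; /* rate exceeded: stop taking interrupts */
    active_storms++;
    printf("cpu %d: CMCI storm detected, switching to poll mode\n", id);
    return true;
}

/* Called after each poll interval; saw_error says whether the banks logged anything. */
static void poll_done(int id, bool saw_error)
{
    assert(id >= 0 && id < NCPUS);
    struct cpu *c = &cpus[id];

    if (c->mode == CMCI_INTERRUPT)
        return;
    if (saw_error) {
        if (c->mode == CMCI_POLL_QUIET) { /* errors came back on this CPU */
            c->mode = CMCI_POLL_ACTIVE;
            active_storms++;
        }
        return;
    }
    if (c->mode == CMCI_POLL_ACTIVE) {    /* one full interval without errors */
        c->mode = CMCI_POLL_QUIET;
        active_storms--;
    }
    if (active_storms == 0) {             /* no CPU still sees errors */
        c->mode = CMCI_INTERRUPT;
        c->count = 0;
        printf("cpu %d: CMCI storm subsided, back to interrupt mode\n", id);
    }
}
```

In this sketch a CPU that goes quiet keeps polling until the shared `active_storms` counter reaches zero, which is one way to model "none of the affected processors see any more errors for a complete poll interval". A real implementation would also need per-CPU timers and atomic updates, which are left out here.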
    
    [Tony: Added console messages when storm begins/ends and increased storm
    threshold from 5 to 15 so we have a few more logged entries before we
    disable interrupts and start dropping reports]
    
    Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Chen Gong <gong.chen@linux.intel.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>