Commit 243d657

ashok-raj authored and Ingo Molnar committed
x86/mce: Handle Local MCE events
Add the necessary changes to do_machine_check() to be able to process MCEs signaled as local MCEs (LMCE). Typically, only recoverable errors (SRAR type) will be signaled as LMCE. The architecture does not restrict LMCE to only those errors, however. When errors are signaled as LMCE, there is no need for the MCE handler to perform rendezvous with other logical processors, unlike earlier processors that would broadcast machine check errors.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1433436928-31903-17-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
1 parent 88d5386 commit 243d657
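
The decision described in the commit message can be sketched in isolation. This is a minimal user-space illustration, not kernel code: `needs_rendezvous()` is a hypothetical helper introduced here, and the bit position of `MCG_STATUS_LMCES` (bit 3 of IA32_MCG_STATUS) is stated from the kernel headers rather than from this diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the kernel headers: bit 3 of IA32_MCG_STATUS means
 * the machine check was signaled to this logical processor only. */
#define MCG_STATUS_LMCES (1ULL << 3)

/*
 * Hypothetical helper (not part of the kernel): returns true when the
 * handler has to rendezvous with the other CPUs, i.e. when the MCE was
 * broadcast rather than delivered locally.
 */
static bool needs_rendezvous(uint64_t mcgstatus)
{
	return !(mcgstatus & MCG_STATUS_LMCES);
}
```

With the LMCE bit clear the sketch reports that a rendezvous is needed, which corresponds to the path that still calls mce_start(); with the bit set it reports that none is needed.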

2 files changed: 27 additions, 6 deletions

arch/x86/kernel/cpu/mcheck/mce.c

Lines changed: 26 additions & 6 deletions
@@ -1047,6 +1047,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	char *msg = "Unknown";
 	u64 recover_paddr = ~0ull;
 	int flags = MF_ACTION_REQUIRED;
+	int lmce = 0;
 
 	prev_state = ist_enter(regs);
 
@@ -1074,11 +1075,20 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		kill_it = 1;
 
 	/*
-	 * Go through all the banks in exclusion of the other CPUs.
-	 * This way we don't report duplicated events on shared banks
-	 * because the first one to see it will clear it.
+	 * Check if this MCE is signaled to only this logical processor
 	 */
-	order = mce_start(&no_way_out);
+	if (m.mcgstatus & MCG_STATUS_LMCES)
+		lmce = 1;
+	else {
+		/*
+		 * Go through all the banks in exclusion of the other CPUs.
+		 * This way we don't report duplicated events on shared banks
+		 * because the first one to see it will clear it.
+		 * If this is a Local MCE, then no need to perform rendezvous.
+		 */
+		order = mce_start(&no_way_out);
+	}
+
 	for (i = 0; i < cfg->banks; i++) {
 		__clear_bit(i, toclear);
 		if (!test_bit(i, valid_banks))
@@ -1155,8 +1165,18 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	 * Do most of the synchronization with other CPUs.
 	 * When there's any problem use only local no_way_out state.
 	 */
-	if (mce_end(order) < 0)
-		no_way_out = worst >= MCE_PANIC_SEVERITY;
+	if (!lmce) {
+		if (mce_end(order) < 0)
+			no_way_out = worst >= MCE_PANIC_SEVERITY;
+	} else {
+		/*
+		 * Local MCE skipped calling mce_reign()
+		 * If we found a fatal error, we need to panic here.
+		 */
+		if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
+			mce_panic("Machine check from unknown source",
+				NULL, NULL);
+	}
 
 	/*
 	 * At insane "tolerant" levels we take no action. Otherwise

arch/x86/kernel/cpu/mcheck/mce_intel.c

Lines changed: 1 addition & 0 deletions
@@ -452,4 +452,5 @@ void mce_intel_feature_init(struct cpuinfo_x86 *c)
 {
 	intel_init_thermal(c);
 	intel_init_cmci();
+	intel_init_lmce();
 }
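
The local-MCE branch added to do_machine_check() in mce.c panics only when the worst severity seen is fatal and the "tolerant" level is below 3. A minimal sketch of that predicate follows. The enum values and the `lmce_should_panic()` name are assumptions made for illustration; the kernel's real severity constants are defined elsewhere and are not shown in this diff.

```c
#include <stdbool.h>

/* Illustrative severity levels only; the ordering is what matters here,
 * not the specific values the kernel uses. */
enum sketch_severity {
	SKETCH_SEV_NONE = 0,
	SKETCH_SEV_RECOVERABLE = 1,
	SKETCH_SEV_PANIC = 2
};

/*
 * Hypothetical helper mirroring the condition in the LMCE branch:
 * panic when a fatal error was found and tolerance is below level 3.
 */
static bool lmce_should_panic(int worst, int tolerant)
{
	return worst >= SKETCH_SEV_PANIC && tolerant < 3;
}
```

Because the local path skips the cross-CPU synchronization, nothing else would act on a fatal error, so the check has to happen on the CPU that received the LMCE.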
