MACHINE-CHECK ARCHITECTURE
15.6.1 Detection of Software Error Recovery Support
Software must use bit 24 of IA32_MCG_CAP (MCG_SER_P) to detect the
presence of software error recovery support (see Figure 15-2). When
IA32_MCG_CAP[24] is set, the processor supports software error recovery.
When this bit is clear, the processor provides no support for error
recovery, and the primary responsibility of the machine check handler is
logging the machine check error information and shutting down the system.
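The capability check described above can be sketched in C as follows. This
is a minimal illustration, not a definitive implementation: the helper name
mcg_ser_supported is hypothetical, the MSR address 0x179 for IA32_MCG_CAP
comes from the architectural MSR list rather than from this section, and the
raw register value is assumed to have been read elsewhere by privileged code
(for example with RDMSR).

```c
#include <stdbool.h>
#include <stdint.h>

/* Address of IA32_MCG_CAP; taken from the architectural MSR list,
 * not from this section. Shown for reference only. */
#define MSR_IA32_MCG_CAP 0x179u

/* MCG_SER_P is bit 24 of IA32_MCG_CAP. */
#define MCG_SER_P (UINT64_C(1) << 24)

/*
 * Hypothetical helper: returns true when the raw IA32_MCG_CAP value
 * reports software error recovery support. Reading the MSR itself
 * needs privileged code and is outside this sketch.
 */
static inline bool mcg_ser_supported(uint64_t mcg_cap)
{
    return (mcg_cap & MCG_SER_P) != 0;
}
```

A handler would typically evaluate this once at initialization and, when it
returns false, restrict itself to logging and shutting down the system.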
The new class of architectural MCA errors from which system software can
attempt recovery is called Uncorrected Recoverable (UCR) Errors. UCR
errors are uncorrected errors that have been detected and signaled but have
not corrupted the processor context. For certain UCR errors, this means that
once system software has performed a certain recovery action, it is possible
to continue execution on this processor. UCR error reporting provides an
error containment mechanism for data poisoning. The machine check
handler will use the error log information from the error reporting registers
to analyze and implement specific error recovery actions for UCR errors.
15.6.2 UCR Error Reporting and Logging
The IA32_MCi_STATUS MSR is used for reporting UCR errors and existing
corrected or uncorrected errors. The definition of IA32_MCi_STATUS,
including the bit fields that identify UCR errors, is shown in Figure 15-5.
UCR errors can be signaled through either the corrected machine check
interrupt (CMCI) or machine check exception (MCE) path, depending on the
type of the UCR error.
When IA32_MCG_CAP[24] is set, a UCR error is indicated by the following
bit settings in the IA32_MCi_STATUS register:
Valid (bit 63) = 1
UC (bit 61) = 1
PCC (bit 57) = 0
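The bit test listed above can be sketched in C as follows. The function and
macro names are hypothetical, and both register values are assumed to have
been read by privileged code before the call. The sketch only mirrors the
three conditions given here together with the IA32_MCG_CAP[24] prerequisite.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_SER_P       (UINT64_C(1) << 24)  /* IA32_MCG_CAP[24]          */
#define MCI_STATUS_VAL  (UINT64_C(1) << 63)  /* Valid                     */
#define MCI_STATUS_UC   (UINT64_C(1) << 61)  /* Uncorrected error         */
#define MCI_STATUS_PCC  (UINT64_C(1) << 57)  /* Processor context corrupt */

/*
 * Hypothetical helper: returns true when a raw IA32_MCi_STATUS value
 * describes a UCR error, that is Valid = 1, UC = 1 and PCC = 0, on a
 * processor whose IA32_MCG_CAP has MCG_SER_P set.
 */
static inline bool mci_status_is_ucr(uint64_t mcg_cap, uint64_t status)
{
    const uint64_t mask = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_PCC;

    if ((mcg_cap & MCG_SER_P) == 0)
        return false;  /* no software error recovery support */

    /* Valid and UC must be set while PCC must be clear. */
    return (status & mask) == (MCI_STATUS_VAL | MCI_STATUS_UC);
}
```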
Additional information about the UCR error is available from the
IA32_MCi_MISC and IA32_MCi_ADDR registers when the MISCV and ADDRV flags in
the IA32_MCi_STATUS register are set (see Section 15.3.2.4). The MCA error
code field of the IA32_MCi_STATUS register indicates the type of UCR error.
System software can interpret the MCA error code field to analyze and
identify the necessary recovery action for the given UCR error.
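A handler might gather this information along the following lines. This is
a sketch under stated assumptions: the bit positions for MISCV (bit 59) and
ADDRV (bit 58) and the 16-bit MCA error code field (bits 15:0) follow the
architectural IA32_MCi_STATUS layout shown in Figure 15-5 and are not stated
in this section, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the architectural IA32_MCi_STATUS layout (Figure 15-5). */
#define MCI_STATUS_MISCV   (UINT64_C(1) << 59)  /* IA32_MCi_MISC is valid */
#define MCI_STATUS_ADDRV   (UINT64_C(1) << 58)  /* IA32_MCi_ADDR is valid */
#define MCI_STATUS_MCACOD  UINT64_C(0xFFFF)     /* MCA error code, 15:0   */

/* Hypothetical record of what the handler may trust for one bank. */
struct ucr_log {
    bool     has_addr;
    uint64_t addr;            /* meaningful only when has_addr is true */
    bool     has_misc;
    uint64_t misc;            /* meaningful only when has_misc is true */
    uint16_t mca_error_code;  /* type of the UCR error */
};

/*
 * Builds a log record from raw register values. The addr and misc
 * arguments are kept only when the matching valid flag is set; an
 * actual handler would skip reading those MSRs when the flag is clear.
 */
static inline struct ucr_log ucr_collect_log(uint64_t status,
                                             uint64_t addr,
                                             uint64_t misc)
{
    struct ucr_log log;

    log.has_addr = (status & MCI_STATUS_ADDRV) != 0;
    log.addr = log.has_addr ? addr : 0;
    log.has_misc = (status & MCI_STATUS_MISCV) != 0;
    log.misc = log.has_misc ? misc : 0;
    log.mca_error_code = (uint16_t)(status & MCI_STATUS_MCACOD);
    return log;
}
```

The returned mca_error_code is the value system software would then decode
to select a recovery action.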
In addition, bits 56:55 of the IA32_MCi_STATUS register are defined (see
Figure 15-5) to provide additional information that helps system software
properly identify the necessary recovery action for the UCR error: