The Complete Debugging Guide to Stop 0x124 – Part 1

Introduction:

The Stop 0x124 is mostly caused by hardware, and in some exceptional cases it can be caused by buggy device drivers. There isn’t much of a methodology for debugging a Stop 0x124. There is, however, plenty of background information that is useful for understanding the terminology seen in a Stop 0x124 bugcheck.

If a Stop 0x124 fails to be created successfully, a Stop 0x122 is usually produced instead. A debugging tutorial for Stop 0x122 can be found here – Debugging Stop 0x122 – WHEA_INTERNAL_ERROR

Background:

WHEA (Windows Hardware Error Architecture) was introduced in Windows Vista and Windows Server 2008. Its purpose was to provide a more effective error reporting system that makes debugging easier, and to supersede MCA (Machine Check Architecture) as the primary error reporting architecture for hardware devices. MCA and MCE (Machine Check Exception) still exist on Windows Vista and later operating systems, but their errors are now delivered through WHEA.

Structure of WHEA:

WHEA consists of a number of different components. The main concepts are LLHEHs (Low-Level Hardware Error Handlers), the PSHED (Platform-Specific Hardware Error Driver) and WHEA error records. The following diagram, obtained from the Microsoft documentation, provides an overview of how these components interact with the rest of the operating system:

[Diagram: WHEA component overview, from the Microsoft documentation]

The LLHEH is the first component to handle an error discovered by an error source. Error sources will be discussed later in this guide. For now, note that the error source is the hardware component which discovered the hardware error, and this is not necessarily where the error originated. The following flow diagram will hopefully help to illustrate the entire WHEA process.

Hardware Error -> Error Source Alerts OS -> LLHEH for Corresponding Error Source Is Invoked -> Error Packet Is Created -> Error Packet Is Processed into an Error Record -> Error Record Is Processed by PSHED -> Bugcheck Is Produced

The above flow diagram is rather crude, and it doesn’t show every detail of each step in the WHEA bugcheck process. It also only illustrates what happens with a fatal hardware error, which is an error that leads to a bugcheck.

I will now discuss error sources and their purpose within a WHEA bugcheck. To begin, note that the first parameter of a Stop 0x124 is the error source type.

2: kd> .bugcheck
Bugcheck code 00000124
Arguments 00000000`00000000 fffffa80`04ba6028 00000000`be000000 00000000`00800400

All error source types are defined in an enumeration called WHEA_ERROR_SOURCE_TYPE, which can be used to look up the name of the error source. On this system the enumeration has 13 entries. The last entry, WheaErrSrcTypeMax, marks the end of the range and is not a real error source. The most common error sources are MCE (0x0) and PCIe (0x4).

2: kd> dt nt!_WHEA_ERROR_SOURCE_TYPE
   WheaErrSrcTypeMCE = 0n0
   WheaErrSrcTypeCMC = 0n1
   WheaErrSrcTypeCPE = 0n2
   WheaErrSrcTypeNMI = 0n3
   WheaErrSrcTypePCIe = 0n4
   WheaErrSrcTypeGeneric = 0n5
   WheaErrSrcTypeINIT = 0n6
   WheaErrSrcTypeBOOT = 0n7
   WheaErrSrcTypeSCIGeneric = 0n8
   WheaErrSrcTypeIPFMCA = 0n9
   WheaErrSrcTypeIPFCMC = 0n10
   WheaErrSrcTypeIPFCPE = 0n11
   WheaErrSrcTypeMax = 0n12

Our current error source type is the Machine Check Exception (0x0). The error source alerts the operating system to a hardware error, and when it does so, the corresponding LLHEH is run to handle that error condition. The LLHEH isn’t actually a separate entity. It is simply a category of handlers, so an LLHEH can be any of a range of handlers, including interrupt handlers, exception handlers or callback functions. The LLHEH processes the error condition into an error packet, and then alerts the operating system to the hardware condition.

2: kd> .frame /r 3
03 fffff880`02f6db00 fffff800`02c26052 hal!HalpMcaReportError+0x4c
rax=0000000000000000 rbx=fffffa8004c17ea0 rcx=0000000000000124
rdx=0000000000000000 rsi=fffff88002f6de00 rdi=fffffa8004c17ef0
rip=fffff80002c26700 rsp=fffff88002f6db00 rbp=fffff88002f6de30
 r8=fffffa8004ba6028  r9=00000000be000000 r10=0000000000800400
r11=0000000000000002 r12=00000000ffffff02 r13=0000000000000000
r14=0000000000000000 r15=0000000000000001
iopl=0         ov up ei pl nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000a06
hal!HalpMcaReportError+0x4c:
fffff800`02c26700 488b8c2430010000 mov     rcx,qword ptr [rsp+130h] ss:0018:fffff880`02f6dc30=ffff00906cfd8774

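hal!HalpMcaReportError is the LLHEH for the current bugcheck. Notice that the bugcheck code and arguments are visible in the registers for this stack frame:
- rcx holds the bugcheck code, 0x124.
- rdx holds the first argument, 0x0.
- r8 holds the second argument, fffffa80`04ba6028.
- r9 holds the third argument, be000000.
- r10 holds the fourth argument, 800400.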

As mentioned previously, an LLHEH produces an error packet, which can then be investigated with the debugger.

Each error packet is represented by a WHEA_ERROR_PACKET structure, which currently comes in two versions: WHEA_ERROR_PACKET_V1 and WHEA_ERROR_PACKET_V2. The V1 type is supported by Windows Vista SP1 and Windows Server 2008. The V2 type is supported from Windows 7 onwards.

Version 2:

The Signature member identifies which version a packet uses. It takes the value WHEA_ERROR_PACKET_V2_SIGNATURE for Version 2 or WHEA_ERROR_PACKET_V1_SIGNATURE for Version 1. Windows Vista systems are pretty much obsolete now, so there isn’t any real reason to examine the Version 1 structure.

2: kd> dt _WHEA_ERROR_PACKET_V2
nt!_WHEA_ERROR_PACKET_V2
   +0x000 Signature        : Uint4B
   +0x004 Version          : Uint4B
   +0x008 Length           : Uint4B
   +0x00c Flags            : _WHEA_ERROR_PACKET_FLAGS
   +0x010 ErrorType        : _WHEA_ERROR_TYPE
   +0x014 ErrorSeverity    : _WHEA_ERROR_SEVERITY
   +0x018 ErrorSourceId    : Uint4B
   +0x01c ErrorSourceType  : _WHEA_ERROR_SOURCE_TYPE
   +0x020 NotifyType       : _GUID
   +0x030 Context          : Uint8B
   +0x038 DataFormat       : _WHEA_ERROR_PACKET_DATA_FORMAT
   +0x03c Reserved1        : Uint4B
   +0x040 DataOffset       : Uint4B
   +0x044 DataLength       : Uint4B
   +0x048 PshedDataOffset  : Uint4B
   +0x04c PshedDataLength  : Uint4B

The most important members of the structure are ErrorType, ErrorSourceType and NotifyType.

The ErrorType field contains a WHEA_ERROR_TYPE enumeration value, which describes the type of hardware that reported the error.

2: kd> dt nt!_WHEA_ERROR_TYPE
   WheaErrTypeProcessor = 0n0
   WheaErrTypeMemory = 0n1
   WheaErrTypePCIExpress = 0n2
   WheaErrTypeNMI = 0n3
   WheaErrTypePCIXBus = 0n4
   WheaErrTypePCIXDevice = 0n5
   WheaErrTypeGeneric = 0n6

ErrorSourceType was explained earlier in this post. NotifyType is a GUID that identifies the mechanism which reported the error to the operating system, for example MCE or BOOT. It takes one of the following values:

  • CMC_NOTIFY_TYPE_GUID
  • CPE_NOTIFY_TYPE_GUID
  • MCE_NOTIFY_TYPE_GUID
  • PCIe_NOTIFY_TYPE_GUID
  • INIT_NOTIFY_TYPE_GUID
  • NMI_NOTIFY_TYPE_GUID
  • BOOT_NOTIFY_TYPE_GUID

We can examine WHEA error packets with the !errpkt extension. Unfortunately, it requires a WHEA error record containing an error record section named Error Packet/Hardware Error Packet. I started debugging in 2012, and I still haven’t seen a BSOD where !errpkt has worked.


About 0x14c

I'm a Computer Science student and writer. My primary interests are Graph Theory, Number Theory, Programming Language Theory, Logic and Windows Debugging.

One Response to The Complete Debugging Guide to Stop 0x124 – Part 1

  1. Alessandro says:

    In a 0x124 bugcheck, the second argument is the address of the WHEA_ERROR_RECORD, so you can use the !errrec extension to check the hardware device which caused the 0x124.
    https://smartwindows.wordpress.com/2014/07/24/bugcheck-0x124hardware-error-has-occurred/

    Keep up the good work !!! Nice post.
