Some Thoughts on Debugging AutoBoost Locks

There are three different bugchecks related to Autoboost: Stop 0x153, Stop 0x162 and Stop 0x192. All of which are caused by slightly different error conditions.

Before we begin diving into these bugchecks, it’s imperative that you understand what AutoBoost is and how it aims to resolve common thread scheduling problems. AutoBoost was designed to overcome the priority inversion problem, which occurs when a higher priority level thread is blocked from running because a lower priority thread is holding a resource which the higher priority thread requires. This is a classic deadlock situation whereby two threads are preventing each other from processing. This is sometimes also known as CPU starvation.

The following excerpt from Windows Internals 7th Edition describes this problem:

Imagine the following situation: A priority 7 thread is running, preventing a
priority 4 thread from ever receiving CPU time. However, a priority 11 thread is
waiting for some resource that the priority 4 thread has locked. But because the
priority 7 thread in the middle is eating up all the CPU time, the priority 4 thread
will never run long enough to finish whatever it’s doing and release the resource
blocking the priority 11 thread. This scenario is known as priority inversion.

Windows Internals 7th Edition Part 1

AutoBoost partly decides which threads to boost the priority for based on the locks which they hold and any locks which they may be waiting on too. This information is maintained in an array (currently maximum of 6) of _KLOCK_ENTRY objects. Each object describes the locks which the thread currently holds and any locks which the thread may be waiting upon itself.

8: kd> dt _KTHREAD -y LockEntries
nt!_KTHREAD
   +0x320 LockEntries : [6] _KLOCK_ENTRY

The locks which are currently held by the thread are described by a red-black tree structure, this can be found in the lock entry object:

8: kd> dt _KLOCK_ENTRY -y OwnerTree
nt!_KLOCK_ENTRY
   +0x030 OwnerTree : _RTL_RB_TREE

Additionally, the locks which the thread is waiting upon are held in another red-black tree:

8: kd> dt _KLOCK_ENTRY -y WaiterTree
nt!_KLOCK_ENTRY
   +0x040 WaiterTree : _RTL_RB_TREE

The tree is then transversed to determine which priority level the thread should be boosted to. The owner thread’s priority is then boosted to the same priority as the thread which is waiting on the lock to be released. Please note that the thread’s priority will never be boosted beyond priority level 15. Additionally, the AutoBoost framework is currently only implemented for push locks and guarded mutexes.

You can dump push locks and mutexes respectively using the following structure:

3: kd> dt _EX_PUSH_LOCK
nt!_EX_PUSH_LOCK
   +0x000 Locked           : Pos 0, 1 Bit
   +0x000 Waiting          : Pos 1, 1 Bit
   +0x000 Waking           : Pos 2, 1 Bit
   +0x000 MultipleShared   : Pos 3, 1 Bit
   +0x000 Shared           : Pos 4, 60 Bits
   +0x000 Value            : Uint8B
   +0x000 Ptr              : Ptr64 Void

Guarded mutexes use the KGUARDED_MUTEX structure which isn’t available with public symbols.

Since a majority of these bugchecks appear to be caused by push locks, and this form of synchronisation mechanism is sparely documented, I thought it would be wise to add some information regarding their acquisition and release.

When you attempt to acquire a push lock, ExAcquirePushLockExclusiveEx will set-up a _KLOCK_ENTRY structure for the calling thread, it will then check if the push lock has been acquired by checking that the acquired bit has been set, and if so, then the lock entry will be added to the waiting tree of the corresponding lock entry. Otherwise, it will be added to the owner tree of the lock entry.

The LockState property of the _KLOCK_STATE_ENTRY structure is then set to the push lock which we’re acquiring.

9: kd> dt _KLOCK_ENTRY_LOCK_STATE -y LockState
nt!_KLOCK_ENTRY_LOCK_STATE
   +0x000 LockState : Ptr64 Void


The ExfAcquirePushLockExclusiveEx function will be called if the push lock is already held by another thread; a _EX_PUSH_LOCK_WAIT_BLOCK structure is then constructed, which is the wait block for the push lock which we attempted to acquire. This wait block contains a doubly linked list of all the waiting threads which are waiting to acquire exclusive access to the push lock. It should be noted that push locks can not be acquired at DISPATCH_LEVEL (IRQL level 2) or above, attempting to do so, will likely end up in a Stop 0x192. However, please note, that for the thread to call ExAcquirePushLockExclusiveEx and attempt to grant exclusive access to the push lock, then the thread must enter a critical region which will raise the IRQL Level to 1 or APC_LEVEL, this then disables user-mode and normal kernel APCs.

To release exclusive access to a push lock, the ExReleasePushLockExclusive macro will be called, this will release the push lock and enable it to be acquired by another thread. If the 2nd bit of the push lock is set, then the ExfTryToWakePushLock is called which will set the gate object associated to the wait block to the signaled state, which will then cause any waiting thread to be released from it’s wait state, the push lock value will then be updated by using a lock cmpxchg instruction.

The lock entries will be checked for the thread which was holding the push lock by checking that the acquired bit has been set to true, and the lock state contains the same pointer as the push lock. Remember that push locks are merely pointers. If the push lock is unable to be found in any of the thread’s lock entries and the AutoBoostEntriesExhausted flag for the thread is false, then the system will bugcheck with a Stop 0x162.

The following video also provides some background on the acquisition and release of push locks:

The accompanying slides can be found in the references at the end of this post.


Now, we’ve gained some understanding of how Autoboost is implemented and the problem it aims to help to resolve. Let’s investigate the bugchecks which are associated to the AutoBoost framework. Firstly, we’ll examine the Stop 0x153, which indicates that a thread holding a lock was prematurely terminated.

KERNEL_LOCK_ENTRY_LEAKED_ON_THREAD_TERMINATION (153)
A thread was terminated before it had freed all its AutoBoost lock entries.
This is typically caused when a thread never released a lock it previously
acquired (e.g. by relying on another thread to release it), or if the thread
did not supply a consistent set of flags to lock package APIs.
Arguments:
Arg1: ffffd902d2aa7080, The address of the thread << Terminated Thread
Arg2: ffffd902d2aa73a0, The address of the entry that was not freed << _KLOCK_ENTRY
Arg3: 0000000000000001, Lock pointer was not NULL
Arg4: ffffae08901abc50, The address of the lock associated with the entry << _EX_PUSH_LOCK

If we dump the thread object from the first parameter, we can indeed see, that the thread was terminated prematurely.

8: kd> !thread ffffd902d2aa7080
THREAD ffffd902d2aa7080  Cid 0f0c.1e64  Teb: 0000000000000000 Win32Thread: 0000000000000000 TERMINATED
Not impersonating
GetUlongFromAddress: unable to read from fffff8053162ba04
Owning Process            ffffd902d2d6d080       Image:         msiexec.exe
Attached Process          N/A            Image:         N/A
fffff78000000000: Unable to get shared data
Wait Start TickCount      444115       
Context Switch Count      148757         IdealProcessor: 11             
ReadMemory error: Cannot get nt!KeMaximumIncrement value.
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0x00000000778063b0
Stack Init 0000000000000000 Current ffffec8b4552c490
Base ffffec8b4552d000 Limit ffffec8b45527000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5

It should be mentioned, that there is a number of AutoBoost fields on the _KTHREAD structure. I’ll mention the ones which are of most interest.

8: kd> dt _KTHREAD -y AutoBoostActive ffffd902d2aa7080
nt!_KTHREAD
   +0x074 AutoBoostActive : 0y1

The AutoBoostActive field either indicates if AutoBoost was enabled for the thread or it was AutoBoost tracking was enabled for the lock which the thread is holding; in either case, we can clearly see that it was enabled.

Additionally, there is another field called AutoBoostEntriesExhausted, while this was mentioned earlier, I’m not exactly sure of it’s purpose, but I presume it suggests if the thread has the maximum amount of lock entries?

8: kd> dt _KTHREAD -y AutoBoostEntriesExhausted ffffd902d2aa7080
nt!_KTHREAD
   +0x078 AutoBoostEntriesExhausted : 0y0

Lastly, there is the AbOwnedEntryCount which indicates the number of lock entries the thread currently owns. In this case there is only one and is this the one mentioned by the bugcheck description.

8: kd> dt _KTHREAD -y AbOwnedEntryCount ffffd902d2aa7080
nt!_KTHREAD
   +0x58f AbOwnedEntryCount : 0x1 ''

We know that the thread was still holding a lock when it was terminated. To investigate this further, let’s dump the associated lock entry object.

8: kd> dt _KLOCK_ENTRY ffffd902d2aa73a0
nt!_KLOCK_ENTRY
   +0x000 TreeNode         : _RTL_BALANCED_NODE
   +0x000 FreeListEntry    : _SINGLE_LIST_ENTRY
   +0x018 EntryFlags       : 0x1010032
   +0x018 EntryOffset      : 0x32 '2'
   +0x019 ThreadLocalFlags : 0 ''
   +0x019 WaitingBit       : 0y0
   +0x019 Spare0           : 0y0000000 (0)
   +0x01a AcquiredByte     : 0x1 ''
   +0x01a AcquiredBit      : 0y1
   +0x01b CrossThreadFlags : 0x1 ''
   +0x01b HeadNodeBit      : 0y1
   +0x01b IoPriorityBit    : 0y0
   +0x01b IoQoSWaiter      : 0y0
   +0x01b Spare1           : 0y00000 (0)
   +0x018 StaticState      : 0y00110010 (0x32)
   +0x018 AllFlags         : 0y000000010000000100000000 (0x10100)
   +0x01c SpareFlags       : 0
   +0x020 LockState        : _KLOCK_ENTRY_LOCK_STATE
   +0x020 LockUnsafe       : 0xffffae08`901abc50 Void
   +0x020 CrossThreadReleasableAndBusyByte : 0x50 'P'
   +0x021 Reserved         : [6]  "???"
   +0x027 InTreeByte       : 0xff ''
   +0x028 SessionState     : 0x00000000`ffffffff Void
   +0x028 SessionId        : 0xffffffff
   +0x02c SessionPad       : 0
   +0x030 OwnerTree        : _RTL_RB_TREE
   +0x040 WaiterTree       : _RTL_RB_TREE
   +0x030 CpuPriorityKey   : 0 ''
   +0x050 EntryLock        : 0
   +0x058 BoostBitmap      : _KLOCK_ENTRY_BOOST_BITMAP
   +0x05c SparePad         : 0

The AcquiredBit field is set, so we’re holding a lock and the LockUnsafe field points to the address of the push lock which we were holding at the time of the thread termination. Notice that it matches the address shown in the fourth parameter of the bugcheck?

8: kd> dt _EX_PUSH_LOCK ffffae08`901abc50
win32k!_EX_PUSH_LOCK
   +0x000 Locked           : 0y1
   +0x000 Waiting          : 0y0
   +0x000 Waking           : 0y0
   +0x000 MultipleShared   : 0y0
   +0x000 Shared           : 0y000000000000000000000000000000000000000000000000000000000000 (0)
   +0x000 Value            : 1
   +0x000 Ptr              : 0x00000000`00000001 Void

The Locked field is set, therefore we hold that a thread is currently holding this lock. As we know from examining the lock entry structure, this lock was held by our terminated thread. There isn’t much you can do to debug this bugcheck unfortunately, although, if we dump the call stack, we can see that the process which the thread was associated to had been terminated.

8: kd> knL
 # Child-SP          RetAddr           Call Site
00 ffffec8b`46c99588 fffff805`31431813 nt!KeBugCheckEx
01 ffffec8b`46c99590 fffff805`3185ec0e nt!KeCleanupThreadState+0x165887
02 ffffec8b`46c995d0 fffff805`317f8410 nt!PspThreadDelete+0x1e
03 ffffec8b`46c99640 fffff805`3123a969 nt!ObpRemoveObjectRoutine+0x80
04 ffffec8b`46c996a0 fffff805`317eb1b7 nt!ObfDereferenceObjectWithTag+0xc9
05 ffffec8b`46c996e0 fffff805`318366d5 nt!ObCloseHandleTableEntry+0x2c7
06 ffffec8b`46c99820 fffff805`31836371 nt!ExSweepHandleTable+0xd5
07 ffffec8b`46c998d0 fffff805`31836899 nt!ObKillProcess+0x35
08 ffffec8b`46c99900 fffff805`3185ceb3 nt!PspRundownSingleProcess+0x131
09 ffffec8b`46c99980 fffff805`31837f1b nt!PspExitThread+0x60b
0a ffffec8b`46c99a90 fffff805`313d2918 nt!NtTerminateProcess+0xeb
0b ffffec8b`46c99b00 00000000`777a1cbc nt!KiSystemServiceCopyEnd+0x28
0c 00000000`00f8ec48 00000000`00000000 0x777a1cbc

My suggestion would be to find out who is calling NtTerminateProcess. In our case, the process was msiexec.exe, which is a Windows installer used for installing .msi packages. The next bugcheck we’ll investigate is Stop 0x162, which when a thread has attempted to release a lock which it does not own.


KERNEL_AUTO_BOOST_INVALID_LOCK_RELEASE (162)
A lock tracked by AutoBoost was released by a thread that did not own the lock.
This is typically caused when some thread releases a lock on behalf of another
thread (which is not legal with AutoBoost tracking enabled) or when some thread
tries to release a lock it no longer owns.
Arguments:
Arg1: ffffdb8d223cb080, The address of the thread
Arg2: ffffcb81844bab50, The lock address
Arg3: 00000000ffffffff, The session ID of the thread
Arg4: 0000000000000000, Reserved

The first parameter contains the address of the thread which attempted to release the lock. The first parameter contains the address of the thread which attempted to release the lock. The second parameter is the lock object in question. This appears to have been freed at the time of the crash. Let’s examine the call stack to see what may have happened. As we can see, it does certainly seem that a lock was released inappropriately. I’m not sure what exactly KeAbPostRelease does, however, judging from the function name, I would presume it’s some form of callback function which is called after a lock tracked by AutoBoost is released. I also noticed that the AbPostRelease function takes a parameter with the address of ffffcb81`844bab30, which is very close to the address of our push lock.

9: kd> knL
 # Child-SP          RetAddr           Call Site
00 ffff880e`4fb4f898 fffff804`281de4b5 nt!KeBugCheckEx
01 ffff880e`4fb4f8a0 fffff804`2867fb30 nt!KeAbPostRelease+0xcb5b5 << Crash here!
02 ffff880e`4fb4f8f0 fffff804`2867f355 nt!AlpcpReceiveMessage+0x620 << Lock released
03 ffff880e`4fb4f9d0 fffff804`281d3c15 nt!NtAlpcSendWaitReceivePort+0x105
04 ffff880e`4fb4fa90 00007fff`fb35d1a4 nt!KiSystemServiceCopyEnd+0x25
05 00000070`e20ff148 00000000`00000000 0x00007fff`fb35d1a4

We can find more information about what the lock is associated to by using the !pool command. It appears to be related to an ALPC message. This makes perfect sense since we were receiving an ALPC message at the time of the crash.

9: kd> !pool ffffcb81844bab50
Pool page ffffcb81844bab50 region is Paged pool
 ffffcb81844ba000 size:  580 previous size:    0  (Allocated)  Ntff
 ffffcb81844ba590 size:  580 previous size:    0  (Allocated)  Ntff
*ffffcb81844bab20 size:  360 previous size:    0  (Allocated) *AlMs
        Pooltag AlMs : ALPC message, Binary : nt!alpc
 ffffcb81844bae80 size:  160 previous size:    0  (Free)       ...}

The ALPC message object is stored at 0x40 bytes from the pool allocation start address. Let’s use the !alpc command with the message switch to dump the associated message.

9: kd> !alpc /m ffffcb81`844bab60

Message ffffcb81844bab60
  MessageID             : 0x19AC (6572)
  CallbackID            : 0x40A40A (4236298)
  SequenceNumber        : 0x0000016D (365)
  Type                  : LPC_REQUEST
  DataLength            : 0x0054 (84)
  TotalLength           : 0x007C (124)
  Canceled              : No
  Release               : No
  ReplyWaitReply        : No
  Continuation          : Yes
  OwnerPort             : ffffdb8d14b8d880 [ALPC_CLIENT_COMMUNICATION_PORT]
  WaitingThread         : 0000000000000000
  QueueType             : ALPC_MSGQUEUE_PENDING
  QueuePort             : ffffdb8d14bb2c90 [ALPC_CONNECTION_PORT]
  QueuePortOwnerProcess : ffffdb8d14b6e440 (svchost.exe)
  ServerThread          : ffffdb8d223cb080 << Same thread shown in the first parameter
  QuotaCharged          : No
  CancelQueuePort       : 0000000000000000
  CancelSequencePort    : 0000000000000000
  CancelSequenceNumber  : 0x00000000 (0)
  ClientContext         : 0000017d1764ede0
  ServerContext         : 0000000000000000
  PortContext           : 00000265e5274040
  CancelPortContext     : 0000000000000000
  SecurityData          : ffffcb81803d9d70
  View                  : 0000000000000000
  HandleData            : 0000000000000000

The actual lock object is stored as part of a larger structure called a blob. Blobs are special objects used by the ALPC library for managing objects related to ALPC such as messages. Blobs have similar attributes to objects exposed by the Object Manager: they can be garbage collected, have their references tracked and locked through synchronisation mechanisms such as push locks. Blobs are allocated using paged pool. We can dump a blob object by using the _BLOB structure.

9: kd> dt _BLOB ffffcb81844bab30
nt!_BLOB
   +0x000 ResourceList     : _LIST_ENTRY [ 0xffffcb81`844bab30 - 0xffffcb81`844bab30 ]
   +0x000 FreeListEntry    : _SLIST_ENTRY
   +0x010 u1               : <anonymous-tag>
   +0x011 ResourceId       : 0x2 ''
   +0x012 CachedReferences : 0n0
   +0x018 ReferenceCount   : 0n1
   +0x020 Lock             : _EX_PUSH_LOCK << Second parameter

I believe the ResourceId is used to index into the _BLOB_ID enumeration; 0x2 points to a message type, which makes perfect sense since we’re dealing with a ALPC message.

9: kd> dt _BLOB_ID
nt!_BLOB_ID
   BLOB_TYPE_UNKNOWN = 0n0
   BLOB_TYPE_CONNECTION_INFO = 0n1
   BLOB_TYPE_MESSAGE = 0n2
   BLOB_TYPE_SECURITY_CONTEXT = 0n3
   BLOB_TYPE_SECTION = 0n4
   BLOB_TYPE_REGION = 0n5
   BLOB_TYPE_VIEW = 0n6
   BLOB_TYPE_RESERVE = 0n7
   BLOB_TYPE_DIRECT_TRANSFER = 0n8
   BLOB_TYPE_HANDLE_DATA = 0n9
   BLOB_TYPE_MAX_ID = 0n10

Unfortunately, I haven’t been able to find much information at all about KeAbPostRelease. However, from reading some threads regarding this crash, it appears that this is commonly caused by drivers not implementing synchronization mechanisms correctly. I would strongly suggest running Driver Verifier with the deadlock verification checks enabled.


Lastly, there is the Stop 0x192, which indicates that a thread acquired a AutoBoost lock at an IRQL at DISPATCH_LEVEL or above.

KERNEL_AUTO_BOOST_LOCK_ACQUISITION_WITH_RAISED_IRQL (192)
A lock tracked by AutoBoost was acquired while executing at DISPATCH_LEVEL or
above.
Arguments:
Arg1: ffff990ec984a080, The address of the thread.
Arg2: ffffa92601302f90, The lock address.
Arg3: 0000000000000000, The IRQL at which the lock was acquired.
Arg4: 0000000000000000, Reserved.
0: kd> dt nt!_KPRCB -y DebuggerSavedIrql fffff8074b809180
   +0x5b98 DebuggerSavedIRQL : 0 ''

You may have noticed that the IRQL in the third parameter is at PASSIVE_LEVEL. I believe that this bugcheck has a generic umbrella description for all crashes which are related to threads attempting to acquire push locks. And the reason why the system crashes, is because the thread has already exhausted the number of locks which it can wait upon, the AutoBoostEntriesExhausted field is set to true in this instance, which is unlike the other Autoboost bugchecks which I’ve examined. I could be wrong, although, that is what makes the most sense to me in this circumstance.

0: kd> dt _KTHREAD -y AutoBoostEntriesExhausted ffff990ec984a080
nt!_KTHREAD
   +0x078 AutoBoostEntriesExhausted : 0y1

The win32kbase!GdiHandleEntryDirectory::AcquireEntryLock function does appear to enter and leave a critical region whilst attempting to obtain the push lock, this would only raise the IRQL Level to 1.

0: kd> !stack -p
Call Stack : 12 frames
## Stack-Pointer    Return-Address   Call-Site       
00 ffffd70155e14238 fffff8074ec39c35 nt!KeBugCheckEx+0 
	Parameter[0] = 0000000000000192
	Parameter[1] = ffff990ec984a080
	Parameter[2] = ffffa92601302f90
	Parameter[3] = 0000000000000000
01 ffffd70155e14240 ffffa9528a5ea377 nt!ExAcquirePushLockExclusiveEx+145 (perf)
	Parameter[0] = ffffa92601302f90 << Our push lock
	Parameter[1] = 0000000000000000
	Parameter[2] = (unknown)       
	Parameter[3] = (unknown)       
02 ffffd70155e14280 ffffa9528a5e82c4 win32kbase!GdiHandleEntryDirectory::AcquireEntryLock+c7 (perf)
	Parameter[0] = (unknown)       
	Parameter[1] = ffff990ec984a080 << Thread attempting to acquire the lock
	Parameter[2] = 0000000000000000
	Parameter[3] = (unknown) 

Again unfortunately, I haven’t been able to gather much information from the dump file, and I would advise that Driver Verifier be run, it would be particularly interesting to see if the IRQL verification checks are able to catch anything.

References:

https://census-labs.com/media/windows_10_rs2_rs3_exploitation_primitives.pdf
Windows 10 Anniversary Update: GDI handle management and vulnerabilities exploitation
ALPC monitoring
Pwning Windows Kernel – Unkillable User-Mode Process – Part 2
https://recon.cx/2008/a/thomas_garnier/LPC-ALPC-slides.pdf
avalon1610/ALPC
https://recon.cx/2008/a/thomas_garnier/LPC-ALPC-paper.pdf
ReactOS: ntoskrnl/ex/pushlock.c File Reference
Windows Internals 6th Edition Part 1

Posted in Windows Internals, Debugging, WinDbg, AutoBoost, Stop 0x153, Stop 0x162, Stop 0x192 | Leave a comment

Debugging Stop 0x189 – BAD_OBJECT_HEADER

There isn’t much you can do nor say about this bugcheck. It is fairly uncommon and not very well documented by MSDN. I thought I would explain what the bugcheck means since it is actually very simple to understand.

BAD_OBJECT_HEADER (189)
The OBJECT_HEADER has been corrupted
Arguments:
Arg1: ffffe00fcb46a730, Pointer to bad OBJECT_HEADER
Arg2: 0000000000000000, Pointer to the resulting OBJECT_TYPE based on the TypeIndex in the OBJECT_HEADER
Arg3: 0000000000000000, The type index is corrupt.
Arg4: 0000000000000000, Reserved.

The first parameter contains the address of our corrupted object header. Every object managed by the Object Manager will have an object header, this structure maintains important metadata about the object, such the reference count and number of open handles for the object.

0: kd> dt _OBJECT_HEADER -y TypeIndex ffffe00fcb46a730
nt!_OBJECT_HEADER
   +0x018 TypeIndex : 0 ''

Notice how the pointer doesn’t have any value? It does not contain a valid address for the object type structure. So why does the system crash? Let’s examine the call stack and we’ll see.

0: kd> knL
 # Child-SP          RetAddr           Call Site
00 ffffde02`7c2689a8 fffff804`09fe09b8 nt!KeBugCheckEx
01 ffffde02`7c2689b0 fffff804`09df5bee nt!ObpReferenceObjectByHandleWithTag+0x1eadb8 << Crash here!
02 ffffde02`7c268a40 fffff804`09ee0020 nt!ObReferenceObjectByHandle+0x2e
03 ffffde02`7c268a90 fffff804`09c085b8 nt!NtSetEvent+0x90
04 ffffde02`7c268b00 00007ffa`5602cfe4 nt!KiSystemServiceCopyEnd+0x28
05 000000c5`cc03ece8 00000000`00000000 0x00007ffa`5602cfe4

The call to ObReferenceObjectByHandleWithTag actually validates the handle passed to it, and ensures that the object type referenced by the handle is same as the object type passed to the function. In order to verify this, let’s dump the third parameter for the function call.

0: kd> !stack -p
Call Stack : 6 frames
## Stack-Pointer    Return-Address   Call-Site      
00 ffffde027c2689a8 fffff80409fe09b8 nt!KeBugCheckEx+0
    Parameter[0] = 0000000000000189
    Parameter[1] = ffffe00fcb46a730
    Parameter[2] = 0000000000000000
    Parameter[3] = 0000000000000000
01 ffffde027c2689b0 fffff80409df5bee nt!ObpReferenceObjectByHandleWithTag+1eadb8 (perf)
    Parameter[0] = 00000000000005d0 << Handle value
    Parameter[1] = fffff80400000002
    Parameter[2] = ffffe00fbf2a1400 << Pointer to _OBJECT_TYPE
    Parameter[3] = aaaaaaaaaaaaaa01
02 ffffde027c268a40 fffff80409ee0020 nt!ObReferenceObjectByHandle+2e
    Parameter[0] = e00fcb46a730b4cb
    Parameter[1] = 0000000000000002
    Parameter[2] = (unknown)      
    Parameter[3] = 0000000000000001
03 ffffde027c268a90 fffff80409c085b8 nt!NtSetEvent+90
    Parameter[0] = (unknown)      
    Parameter[1] = (unknown)      
    Parameter[2] = (unknown)      
    Parameter[3] = (unknown)      
04 ffffde027c268b00 00007ffa5602cfe4 nt!KiSystemServiceCopyEnd+28
    Parameter[0] = 00000000000005d0
    Parameter[1] = 0000000000000000
    Parameter[2] = 000000c5cc03f148
    Parameter[3] = 0000000000000000

Let’s dump the pointer for the object type structure and we can see that it belongs to an event object.

0: kd> dt _OBJECT_TYPE ffffe00fbf2a1400
win32k!_OBJECT_TYPE
   +0x000 TypeList         : _LIST_ENTRY [ 0xffffe00f`bf2a1400 - 0xffffe00f`bf2a1400 ]
   +0x010 Name             : _UNICODE_STRING "Event"
   +0x020 DefaultObject    : (null)
   +0x028 Index            : 0x10 ''
   +0x02c TotalNumberOfObjects : 0x5016
   +0x030 TotalNumberOfHandles : 0x5639
   +0x034 HighWaterNumberOfObjects : 0x559e
   +0x038 HighWaterNumberOfHandles : 0x5c1f
   +0x040 TypeInfo         : _OBJECT_TYPE_INITIALIZER
   +0x0b8 TypeLock         : _EX_PUSH_LOCK
   +0x0c0 Key              : 0x6e657645
   +0x0c8 CallbackList     : _LIST_ENTRY [ 0xffffe00f`bf2a14c8 - 0xffffe00f`bf2a14c8 ]

Notice how we’re referencing an event object in our function call, although, the handle to our object doesn’t contain a valid pointer to an event object?

The documentation specifies the following possible error:

Return codeDescription
STATUS_OBJECT_TYPE_MISMATCHThe ObjectType parameter specifies the wrong object type for the object that is identified by the Handle parameter.
Posted in Debugging, WinDbg | Leave a comment

Debugging Stop 0x10D – Buy Two Power Policy Owners And Get A Free BSOD

A Stop 0x10D, also known as a Windows driver framework violation, is a bugcheck which is produced when the framework detects one of the rules has been broken. You can think of this bugcheck as akin to a Stop 0xC4 or any other Driver Verifier produced bugcheck. There is a number of variations of this bugcheck and therefore you will need to examine the parameters for each crash, otherwise the call stack will have no context.

The Windows Driver Framework is an open-source framework used to allow driver developers to develop software more efficiently and with less errors, by allowing the framework to handle many of the common tasks which the Windows Driver Model would expose to the developer. As a general debugging tip, if you notice any Wdf1000!* calls on a call stack, then it would be advisable to load wdfkd.dll and start investigating.

Let’s examine the first parameter of the bugcheck.

WDF_VIOLATION (10d)
The Kernel-Mode Driver Framework was notified that Windows detected an error
in a framework-based driver. In general, the dump file will yield additional
information about the driver that caused this bug check.
Arguments:
Arg1: 000000000000000d, A power irp was received for the device but the irp was not
    requested by the device (the power policy owner). Possibly there
    are multiple power policy owners. Only one driver in the stack can
    be the power policy owner. A KMDF driver can change power policy
    ownership by calling the DDI WdfDeviceInitSetPowerPolicyOwnership.
Arg2: ffff9888d3710a70, Device object pointer.
Arg3: ffff9888d4753010, Power Irp.
Arg4: ffff9888d37e2a20, Reserved. << Pointer to PFX_DRIVER_GLOBALS

It states that a power IRP was received by a device which did not request the IRP to be sent. That is to say, the device in question is the power policy owner, however, it never requested the power operation. You may be wondering what is a power policy owner? The bugcheck description states that there should only be one power policy owner per a device stack, yet there is the possibility of there being multiple power policy owners on the same device stack. Let’s explore what a power policy owner is and how it is used.

For each device stack, one of the device objects must be assigned as the power policy owner. It is the responsibility of the power policy owner to manage the power state of the device, and generate power requests for when the device power state should be changed. It is then the responsibility of the framework itself to ensure that the device state is changed by requesting the bus driver of the device to change the power state. Remember, only bus drivers are able to complete power-related IRPs, thus why the framework is reliant upon the device’s bus driver. It may be useful to think of the framework as the middle-man between the device’s function driver and the device’s bus driver.

Moreover, it is possible for the power policy owner of a device to be changed, and in order to do so, the driver must call WdfDeviceInitSetPowerPolicyOwnership. The function takes two parameters: a pointer to the WDFDEVICE_INIT structure which is essentially the device object and a boolean which represents if the driver is the power policy owner. For the power policy owner to be successfully changed, then two drivers must call the above function; the default power policy owner must set the boolean value to false to state that it’s no longer the owner, on the other hand, the device which will become the owner must call the function with the boolean set to true.

This leads to two possible reasons for the bugcheck: a driver hasn’t stated that it is no longer the power policy owner, therefore leading to two power policy owners in the same device stack. Secondly, the developer hasn’t realised that the framework will automatically set the power policy owner to the function driver, unless other stated, thereby again, leading to two possible power policy owners. We can use the second parameter which contains the address of the device object which received the power IRP to find the device stack.

3: kd> !devstack ffff9888d3710a70
  !DevObj           !DrvObj            !DevExt           ObjectName
  ffff9888d37e9dd0  \Driver\WudfRd     ffff9888d37e9f20
> ffff9888d3710a70  \Driver\esif_lf    ffff9888d3721c20  esif_lf
  ffff9888d37e2c60  \Driver\dptf_cpu   ffff9888d210ba20
  ffff9888c465ddc0  \Driver\ACPI       ffff9888c46b3010
  ffff9888c67ef360  \Driver\pci        ffff9888c67ef4b0  NTPNP_PCI0002
!DevNode ffff9888c6817ca0 :
  DeviceInst is "PCI\VEN_8086&DEV_1903&SUBSYS_22B117AA&REV_0C\3&11583659&0&20"
  ServiceName is "dptf_cpu"

As we can see, there are two third-party drivers involved in this stack, both of which are related to Intel Thermal Power Management framework. The third parameter of the bugcheck contains the address of the power IRP which was sent to the power policy owner.

3: kd> !irp ffff9888`d4753010
Irp is active with 6 stacks 4 is current (= 0xffff9888d47531b8)
 No Mdl: No System Buffer: Thread 00000000:  Irp stack trace.
     cmd  flg cl Device   File     Completion-Context
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000

            Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000

            Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000

            Args: 00000000 00000000 00000000 00000000
>[IRP_MJ_POWER(16), IRP_MN_SET_POWER(2)]
            0  0 ffff9888d3710a70 00000000 00000000-00000000
           \Driver\esif_lf
            Args: 00016600 00000001 00000004 00000005
 [IRP_MJ_POWER(16), IRP_MN_SET_POWER(2)]
            0 e1 ffff9888d37e9dd0 00000000 fffff80410b77bc0-ffff9888d39fc380 Success Error Cancel pending
           \Driver\WudfRd    nt!PopRequestCompletion
            Args: 00016600 00000001 00000004 00000005
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-ffff9888d39fc380

            Args: 00000000 00000000 00000000 00000000

If we look closely, the current I/O stack location for the power request is associated to one of the Intel drivers. Notice how the framework driver has a completion routine set to complete the IRP? Remember it is the responsibility of the framework to inform the bus driver of the requested power state change.

We’ve gained some understanding behind what has possibly happened and why the bugcheck is called. Let’s begin to examine the call stack and see where we crash.

3: kd> knL
 # Child-SP          RetAddr           Call Site
00 ffff870c`e7172658 fffff804`15b08920 nt!KeBugCheckEx
01 (Inline Function) --------`-------- Wdf01000!Mx::MxBugCheckEx+0x19
02 ffff870c`e7172660 fffff804`15ad1842 Wdf01000!FxVerifierBugCheckWorker+0x24
03 ffff870c`e71726a0 fffff804`15ac10b5 Wdf01000!FxPkgFdo::DispatchDeviceSetPower+0x10782 << Crash here!
04 ffff870c`e71726f0 fffff804`15abcc60 Wdf01000!FxPkgFdo::_DispatchSetPower+0x25
05 ffff870c`e7172720 fffff804`15aba867 Wdf01000!FxPkgPnp::Dispatch+0xb0 << Called by framework
06 (Inline Function) --------`-------- Wdf01000!DispatchWorker+0x6b
07 (Inline Function) --------`-------- Wdf01000!FxDevice::Dispatch+0x89
08 ffff870c`e7172790 fffff804`10b9516f Wdf01000!FxDevice::DispatchWithLock+0x157
09 ffff870c`e71727f0 fffff804`10a52f6d nt!IopPoHandleIrp+0x3b
0a ffff870c`e7172820 fffff804`10b973f9 nt!IofCallDriver+0x6d
0b ffff870c`e7172860 fffff804`1b777efe nt!IoCallDriver+0x9
0c ffff870c`e7172890 fffff804`1b777be4 WUDFRd!RdPnpTracker::RdPnpProcessor+0x47e
0d ffff870c`e7172930 fffff804`1b777a61 WUDFRd!RdPnpTracker::RdPnpProcessor+0x164
0e ffff870c`e71729d0 fffff804`10b5a4c5 WUDFRd!RdPnpTracker::RdPnpCallbackAtPassiveInSystemProcess+0x11
0f ffff870c`e7172a00 fffff804`10a25975 nt!IopProcessWorkItem+0x135
10 ffff870c`e7172a70 fffff804`10b17e85 nt!ExpWorkerThread+0x105
11 ffff870c`e7172b10 fffff804`10bfd2a8 nt!PspSystemThreadStartup+0x55
12 ffff870c`e7172b60 00000000`00000000 nt!KiStartSystemThread+0x28

The function in which we appear to have crashed at is DispatchDeviceSetPower, this dispatches the bugcheck exception if the device in question is the power policy owner and has not requested for a power state change. This can be seen in the following code excerpt:

if (IsPowerPolicyOwner()) {
        if (m_PowerPolicyMachine.m_Owner->m_RequestedPowerUpIrp == FALSE &&
            m_PowerPolicyMachine.m_Owner->m_RequestedPowerDownIrp == FALSE) {
            //
            // A power irp arrived, but we did not request it.  log and bugcheck
            //
            DoTraceLevelMessage(
                GetDriverGlobals(), TRACE_LEVEL_ERROR, TRACINGPNP,
                "Received set device power irp 0x%p on WDFDEVICE 0x%p !devobj 0x%p, "
                "but the irp was not requested by the device (the power policy owner)",
                Irp->GetIrp(), m_Device->GetHandle(),
                m_Device->GetDeviceObject());

            FxVerifierBugCheck(GetDriverGlobals(),  // globals
                   WDF_POWER_MULTIPLE_PPO, // specific type
                   (ULONG_PTR)m_Device->GetDeviceObject(), //parm 2
                   (ULONG_PTR)Irp->GetIrp());  // parm 3

            /* NOTREACHED */
}

The power policy owner is determined by checking if the m_Owner field for the FxPowerPolicyMachine object is null, if it isn’t, then the device is said to be the power policy owner of the device stack. The power policy machine object is a property of a larger object called FxPkgFdo. The function then checks if the device requested the power state change by checking two fields within the FxPowerPolicySettings object (m_owner), if both of these are false then the framework will throw the bugcheck shown at the beginning of this post.

3: kd> dt FxPkgFdo -y m_PowerPolicyMachine ffff9888d37e5600
Wdf01000!FxPkgFdo
   +0x2b8 m_PowerPolicyMachine : FxPowerPolicyMachine
3: kd> dt FxPowerPolicyMachine -y m_Owner ffff9888`d37e58b8
Wdf01000!FxPowerPolicyMachine
   +0x0c0 m_Owner : 0xffff9888`d37d9a30 FxPowerPolicyOwnerSettings
3: kd> dt FxPowerPolicyOwnerSettings ffff9888`d37d9a30
Wdf01000!FxPowerPolicyOwnerSettings
   +0x000 m_PowerIdleMachine : FxPowerIdleMachine
   +0x128 m_PoxInterface   : FxPoxInterface
   +0x168 m_DeviceArmWakeFromS0 : FxPowerDeviceArmWakeFromS0
   +0x190 m_DeviceArmWakeFromSx : FxPowerDeviceArmWakeFromSx
   +0x1c8 m_DeviceDisarmWakeFromS0 : FxPowerDeviceDisarmWakeFromS0
   +0x1f0 m_DeviceDisarmWakeFromSx : FxPowerDeviceDisarmWakeFromSx
   +0x218 m_DeviceWakeFromS0Triggered : FxPowerDeviceWakeFromS0Triggered
   +0x240 m_DeviceWakeFromSxTriggered : FxPowerDeviceWakeFromSxTriggered
   +0x268 m_UsbIdle        : (null)
   +0x270 m_PkgPnp         : 0xffff9888`d37e5600 FxPkgPnp
   +0x278 m_WakeSettings   : WakePolicySettings
   +0x298 m_IdleSettings   : IdlePolicySettings
   +0x2d0 m_DevicePowerIrpTracker : FxDevicePowerIrpTracker
   +0x360 m_SystemToDeviceStateMap : 0x4440010
   +0x364 m_ChildrenPoweredOnCount : 0
   +0x368 m_ChildrenArmedCount : 0n0
   +0x36c m_WaitWakeStatus : 0n-1073741637
   +0x370 m_IdealDxStateForSx : 0x4 ''
   +0x371 m_RequestedPowerUpIrp : 0 '' << FALSE
   +0x372 m_RequestedPowerDownIrp : 0 '' << FALSE
   +0x373 m_RequestedWaitWakeIrp : 0 ''
   +0x374 m_WakeCompletionEventDropped : 0 ''
   +0x375 m_PowerFailed    : 0 ''
   +0x376 m_CanSaveState   : 0 ''
   +0x377 m_ChildrenCanPowerUp : 0x1 ''
   +0x378 m_SystemWakeSource : 0 ''
   +0x380 m_PowerCallbackObject : 0xffff9888`c46a1780 _CALLBACK_OBJECT
   +0x388 m_PowerCallbackRegistration : 0xffff9888`d3758b10 Void
   +0x390 m_WaitWakeCancelCompletionOwnership : 0n0


The address of the FxPkgFdo object can be found from the FxPkgPnp’s Dispatch function as the first parameter.

3: kd> !stack -p
Call Stack : 16 frames
## Stack-Pointer    Return-Address   Call-Site
00 ffff870ce7172658 fffff80415b08920 nt!KeBugCheckEx+0
    Parameter[0] = 000000000000010d
    Parameter[1] = 000000000000000d
    Parameter[2] = ffff9888d3710a70
    Parameter[3] = ffff9888d4753010
01 ffff870ce7172660 fffff80415ad1842 Wdf01000!FxVerifierBugCheckWorker+24
    Parameter[0] = ffff9888d37e2a20
    Parameter[1] = 000000000000000d
    Parameter[2] = (unknown)
    Parameter[3] = (unknown)
02 ffff870ce71726a0 fffff80415ac10b5 Wdf01000!FxPkgFdo::DispatchDeviceSetPower+10782 (perf)
    Parameter[0] = ffff9888d37e5600 <<FxPkgFdo
    Parameter[1] = ffff870ce7172790 << FxIrp
    Parameter[2] = (unknown)
    Parameter[3] = (unknown)
03 ffff870ce71726f0 fffff80415abcc60 Wdf01000!FxPkgFdo::_DispatchSetPower+25
    Parameter[0] = (unknown)
    Parameter[1] = ffff870ce7172790
    Parameter[2] = (unknown)
    Parameter[3] = (unknown)
04 ffff870ce7172720 fffff80415aba867 Wdf01000!FxPkgPnp::Dispatch+b0 (perf)
    Parameter[0] = ffff9888d37e5600 << FxPkgFdo
    Parameter[1] = ffff9888d4753010 << IRP
    Parameter[2] = (unknown)
    Parameter[3] = (unknown)
05 ffff870ce7172790 fffff80410b9516f Wdf01000!FxDevice::DispatchWithLock+157 (perf)
    Parameter[0] = 0000000000000002
    Parameter[1] = ffff9888d4753010
    Parameter[2] = (unknown)
    Parameter[3] = (unknown)
[...]

You may also notice that the function logs the error message into the WDF IFR log for the driver and dispatches the bugcheck exception by calling FxVerifierBugCheckWorker which is wrapper function for the standard KeBugcheckEx function.

VOID
__declspec(noreturn)
FxVerifierBugCheckWorker(
    __in PFX_DRIVER_GLOBALS FxDriverGlobals,
    __in WDF_BUGCHECK_CODES WdfBugCheckCode,
    __in_opt ULONG_PTR BugCheckParameter2,
    __in_opt ULONG_PTR BugCheckParameter3
)

The second parameter for FxVerifierBugcheckWorker is actually an enumeration called WDF_BUGCHECK_CODES. This enumeration contains all the possible reasons for the crash; in our case, it appears to be due to multiple power policy owners within the same device stack as mentioned earlier.

3: kd> dt WDF_BUGCHECK_CODES
Wdf01000!WDF_BUGCHECK_CODES
   WDF_POWER_ROUTINE_TIMED_OUT = 0n1
   WDF_RECURSIVE_LOCK = 0n2
   WDF_VERIFIER_FATAL_ERROR = 0n3
   WDF_REQUIRED_PARAMETER_IS_NULL = 0n4
   WDF_INVALID_HANDLE = 0n5
   WDF_REQUEST_FATAL_ERROR = 0n6
   WDF_OBJECT_ERROR = 0n7
   WDF_DMA_FATAL_ERROR = 0n8
   WDF_INVALID_INTERRUPT = 0n9
   WDF_QUEUE_FATAL_ERROR = 0n10
   WDF_INVALID_LOCK_OPERATION = 0n11
   WDF_PNP_FATAL_ERROR = 0n12
   WDF_POWER_MULTIPLE_PPO = 0n13
   WDF_VERIFIER_IRQL_MISMATCH = 0n14
   WDF_VERIFIER_CRITICAL_REGION_MISMATCH = 0n15
   WDF_API_UNAVAILABLE = 0n16

So, we understand why the bugcheck was caused and the bugcheck description suggests that there may be two power policy owners in the same device stack. However, let’s confirm this fact, by dumping the IFR logs for both drivers and their subsequent WDF device objects.

3: kd> !wdflogdump esif_lf
Trace searchpath is:

Trace format prefix is: %7!u!: %!FUNC! -
Trying to extract TMF information from - C:\ProgramData\dbg\sym\Wdf01000.pdb\6D6E9C327875C47856AC7DA750AB1C5E1\Wdf01000.pdb
Gather log: Please wait, this may take a moment (reading 4024 bytes).
% read so far ... 10, 20, 30, 40, 50, 60, 70, 100
There are 58 log entries
--- start of log ---
1: FxIFRStart - FxIFR logging started
2: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering PnP State WdfDevStatePnpInit from WdfDevStatePnpObjectCreated
3: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000000(IRP_MN_START_DEVICE) IRP 0xFFFF9888CF6A5AF0
4: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering PnP State WdfDevStatePnpInitStarting from WdfDevStatePnpInit
5: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering PnP State WdfDevStatePnpHardwareAvailable from WdfDevStatePnpInitStarting
6: FxPkgPnp::PnpMatchResources - Not enough interrupt objects created by WDFDEVICE 0x000067772C8DE6C8
7: FxPkgPnp::PowerPolicyEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering power policy state WdfDevStatePwrPolStarting from WdfDevStatePwrPolObjectCreated
8: FxPowerIdleMachine::ProcessEventLocked - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering power idle state FxIdleStarted from FxIdleStopped
9: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering Power State WdfDevStatePowerStartingCheckDeviceType from WdfDevStatePowerObjectCreated
10: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering Power State WdfDevStatePowerD0Starting from WdfDevStatePowerStartingCheckDeviceType
11: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering Power State WdfDevStatePowerD0StartingConnectInterrupt from WdfDevStatePowerD0Starting
12: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering Power State WdfDevStatePowerD0StartingDmaEnable from WdfDevStatePowerD0StartingConnectInterrupt
13: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering Power State WdfDevStatePowerD0StartingPostHardwareEnabled from WdfDevStatePowerD0StartingDmaEnable
14: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering Power State WdfDevStatePowerD0StartingStartSelfManagedIo from WdfDevStatePowerD0StartingPostHardwareEnabled
15: FxPkgIo::ResumeProcessingForPower - Power resume all queues of WDFDEVICE 0x000067772C8DE6C8
16: FxSelfManagedIoMachine::ProcessEvent - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering self managed io state FxSelfManagedIoInit from FxSelfManagedIoCreated
17: FxIoTargetRemote::OpenTargetHandle - ZwCreateFile for WDFIOTARGET 000067772C667368 returned status STATUS_SUCCESS, info 0x0
18: FxIoTargetRemote::ClearTargetPointers - WDFIOTARGET 000067772C667368 cleared pointers FFFF870CE7181B30 state WdfIoTargetClosed, open state 1, pdo FFFF9888D0B396F0, fileobj FFFF9888D384F1E0, handle FFFFFFFF80000F9C
19: FxIoTargetRemote::Close - WDFIOTARGET 000067772C667368 derefing PDO FFFF9888D0B396F0 on close
20: FxIoTargetRemote::Close - WDFIOTARGET 000067772C667368 derefing FileObj FFFF9888D384F1E0 on close
21: FxIoTargetRemote::Close - WDFIOTARGET 000067772C667368 closing handle FFFFFFFF80000F9C on close
22: FxIoTargetRemote::ClearTargetPointers - WDFIOTARGET 000067772C667368 cleared pointers FFFF870CE7181A40 state WdfIoTargetDeleted, open state 1, pdo 0000000000000000, fileobj 0000000000000000, handle 0000000000000000
23: FxIoTarget::WaitForDisposeEvent - WDFIOTARGET 000067772C667368, Waiting on Dispose event FFFF870CE71819B0
24: FxIoTargetRemote::OpenTargetHandle - ZwCreateFile for WDFIOTARGET 000067772C47A748 returned status STATUS_SUCCESS, info 0x0
25: FxSelfManagedIoMachine::ProcessEvent - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering self managed io state FxSelfManagedIoStarted from FxSelfManagedIoInit
26: FxPowerIdleMachine::ProcessEventLocked - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering power idle state FxIdleStartedPowerUp from FxIdleStarted
27: FxPowerIdleMachine::ProcessEventLocked - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering power idle state FxIdleDisabled from FxIdleStartedPowerUp
28: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering Power State WdfDevStatePowerDecideD0State from WdfDevStatePowerD0StartingStartSelfManagedIo
29: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering Power State WdfDevStatePowerD0 from WdfDevStatePowerDecideD0State
30: FxPkgPnp::PowerPolicyEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering power policy state WdfDevStatePwrPolStartingPoweredUp from WdfDevStatePwrPolStarting
31: FxPkgPnp::PowerPolicyEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering power policy state WdfDevStatePwrPolStartingSucceeded from WdfDevStatePwrPolStartingPoweredUp
32: FxPkgPnp::PowerPolicyEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering power policy state WdfDevStatePwrPolStartingDecideS0Wake from WdfDevStatePwrPolStartingSucceeded
33: FxPkgPnp::PowerPolicyEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering power policy state WdfDevStatePwrPolStarted from WdfDevStatePwrPolStartingDecideS0Wake
34: FxPowerIdleMachine::ProcessEventLocked - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering power idle state FxIdleDisabled from FxIdleDisabled
35: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering PnP State WdfDevStatePnpEnableInterfaces from WdfDevStatePnpHardwareAvailable
36: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 entering PnP State WdfDevStatePnpStarted from WdfDevStatePnpEnableInterfaces
37: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888D2F20A20
38: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888D2F20A20
39: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000014(IRP_MN_QUERY_PNP_DEVICE_STATE) IRP 0xFFFF9888D2DE7B70
40: FxPkgFdo::HandleQueryPnpDeviceStateCompletion - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 returning PNP_DEVICE_STATE 0x32 IRP 0xFFFF9888D2DE7B70
41: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type BusRelations IRP 0xFFFF9888CF6F3940
42: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888CF6F3940
43: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888D2DE7B70
44: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888D2DE7B70
45: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888CBF59460
46: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888CBF59460
47: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888CBF59460
48: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888D2F21A20
49: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888D2F21A20
50: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888D2F21A20
51: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888D0ED55E0
52: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888CBF59460
53: FxIoTargetRemote::OpenTargetHandle - ZwCreateFile for WDFIOTARGET 000067772C245458 returned status STATUS_SUCCESS, info 0x0
54: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888D2EFAA20
55: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFF9888D2EFAA20
56: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 IRP_MJ_POWER, 0x00000002(IRP_MN_SET_POWER) IRP 0xFFFF9888DCD99560 for PowerSystemShutdown (S5)
57: FxPkgPnp::Dispatch - WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70 IRP_MJ_POWER, 0x00000002(IRP_MN_SET_POWER) IRP 0xFFFF9888D4753010 for PowerDeviceD3
58: FxPkgFdo::DispatchDeviceSetPower - Received set device power irp 0xFFFF9888D4753010 on WDFDEVICE 0x000067772C8DE6C8 !devobj 0xFFFF9888D3710A70, but the irp was not requested by the device (the power policy owner)

Notice the same logging statement from the code sample shown earlier?

3: kd> !wdfdevice 0x000067772C8DE6C8
Treating handle as a KMDF handle!

Dumping WDFDEVICE 0x000067772c8de6c8
=================================

WDM PDEVICE_OBJECTs:  self ffff9888d3710a70, attached ffff9888d37e2c60, pdo ffff9888c67ef360

Pnp state:  119 ( WdfDevStatePnpStarted )
Power state:  307 ( WdfDevStatePowerD0 )
Power Pol state:  565 ( WdfDevStatePwrPolStarted )

Default WDFIOTARGET: 000067772d0742c8

Device is the power policy owner for the stack
No pended pnp, wait-wake irps
Pended system power !irp 0xffff9888dcd99560 (S5)

We’ll dump the other driver which was in the device stack by dumping the IFR log and then using handle address for the WDF Device object.

3: kd> !wdfdevice 0x000067772DEF48C8
Treating handle as a KMDF handle!

Dumping WDFDEVICE 0x000067772def48c8
=================================

WDM PDEVICE_OBJECTs:  self ffff9888d37e2c60, attached ffff9888c465ddc0, pdo ffff9888c67ef360

Pnp state:  119 ( WdfDevStatePnpStarted )
Power state:  307 ( WdfDevStatePowerD0 )
Power Pol state:  527 ( WdfDevStatePwrPolSleepingNoWakePowerDown )

Default WDFIOTARGET: 000067772d074628

Device is the power policy owner for the stack
No pended pnp, wait-wake irps
Pended system power !irp 0xffff9888dcd99560 (S5)

Ah! It seems that we do in fact have two power policy owners in the same device stack, hence why the system crashes with the bugcheck. Although, how do we confirm that dptf_cpu.sys is the correct power policy owner for the stack? In order to do so, we need to obtain the address of the FxDevice object and to do that, we can use the !wdfhandle command along with the address of the WDF device handle we obtained from the IFR log.

3: kd> !wdfhandle 0x000067772DEF48C8
Treating handle as a KMDF handle!

Dumping WDFHANDLE 0x000067772def48c8
=============================
Handle type is WDFDEVICE
Refcount: 2
Contexts:
    context:  dt 0xffff9888d210ba20 DEVICE_EXTENSION (size is 0x130 bytes)
    EvtCleanupCallback fffff8041c1e4670 dptf_cpu

Parent: !wdfhandle 0x000067772f521958, type is WDFDRIVER
Owning device: !wdfdevice 0x000067772def48c8

!wdfobject 0xffff9888d210b730 << FxDevice
3: kd> dt FxDevice -y m_PkgPnp ffff9888d210b730
Wdf01000!FxDevice
   +0x288 m_PkgPnp : 0xffff9888`d3720840 FxPkgPnp


The FxPkgPnp object contains a reference to the power policy settings object mentioned earlier.

3: kd> dt FxPkgPnp -y m_PowerPolicyMachine ffff9888`d3720840
Wdf01000!FxPkgPnp
   +0x2b8 m_PowerPolicyMachine : FxPowerPolicyMachine
3: kd> dt FxPowerPolicyMachine -y m_Owner ffff9888`d3720af8
Wdf01000!FxPowerPolicyMachine
   +0x0c0 m_Owner : 0xffff9888`d37e3a30 FxPowerPolicyOwnerSettings
3: kd> dt FxPowerPolicyOwnerSettings 0xffff9888`d37e3a30
Wdf01000!FxPowerPolicyOwnerSettings
   +0x000 m_PowerIdleMachine : FxPowerIdleMachine
   +0x128 m_PoxInterface   : FxPoxInterface
   +0x168 m_DeviceArmWakeFromS0 : FxPowerDeviceArmWakeFromS0
   +0x190 m_DeviceArmWakeFromSx : FxPowerDeviceArmWakeFromSx
   +0x1c8 m_DeviceDisarmWakeFromS0 : FxPowerDeviceDisarmWakeFromS0
   +0x1f0 m_DeviceDisarmWakeFromSx : FxPowerDeviceDisarmWakeFromSx
   +0x218 m_DeviceWakeFromS0Triggered : FxPowerDeviceWakeFromS0Triggered
   +0x240 m_DeviceWakeFromSxTriggered : FxPowerDeviceWakeFromSxTriggered
   +0x268 m_UsbIdle        : (null)
   +0x270 m_PkgPnp         : 0xffff9888`d3720840 FxPkgPnp
   +0x278 m_WakeSettings   : WakePolicySettings
   +0x298 m_IdleSettings   : IdlePolicySettings
   +0x2d0 m_DevicePowerIrpTracker : FxDevicePowerIrpTracker
   +0x360 m_SystemToDeviceStateMap : 0x4440010
   +0x364 m_ChildrenPoweredOnCount : 0
   +0x368 m_ChildrenArmedCount : 0n0
   +0x36c m_WaitWakeStatus : 0n-1073741637
   +0x370 m_IdealDxStateForSx : 0x4 ''
   +0x371 m_RequestedPowerUpIrp : 0 ''
   +0x372 m_RequestedPowerDownIrp : 0x1 '' << True
   +0x373 m_RequestedWaitWakeIrp : 0 ''
   +0x374 m_WakeCompletionEventDropped : 0 ''
   +0x375 m_PowerFailed    : 0 ''
   +0x376 m_CanSaveState   : 0 ''
   +0x377 m_ChildrenCanPowerUp : 0 ''
   +0x378 m_SystemWakeSource : 0 ''
   +0x380 m_PowerCallbackObject : 0xffff9888`c46a1780 _CALLBACK_OBJECT
   +0x388 m_PowerCallbackRegistration : 0xffff9888`d3758f90 Void
   +0x390 m_WaitWakeCancelCompletionOwnership : 0n0

As we can see, it was dptf_cpu.sys which requested the device to be powered down and is the legitimate power policy owner for the device stack. It would seem that esif_lf.sys would need to be updated or the source code be investigated, especially since both drivers are part of the same package which is the Intel thermal management software. As explained earlier, if we have multiple power policy owners in the same device stack; if the illegitimate power policy owner is higher up in the stack, it will receive the power IRP before the legitimate power policy owner, thus leading to the framework throwing the bugcheck exception.

3: kd> lmvm esif_lf
Browse full module list
start             end                 module name
fffff804`1c9d0000 fffff804`1ca2f000   esif_lf    (deferred)            
    Image path: \SystemRoot\System32\drivers\esif_lf.sys
    Image name: esif_lf.sys
    Browse all global symbols  functions  data
    Timestamp:        Thu Nov  2 21:28:12 2017 (59FB8DEC)
    CheckSum:         00068C4F
    ImageSize:        0005F000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:

References:

microsoft/Windows-Driver-Frameworks
microsomicrosoft/Windows-Driver-Frameworks
microsoft/Windows-Driver-Frameworks
microsoft/Windows-Driver-Frameworks
One Reference to WDFDEVICE
Power Policy Ownership – Windows drivers

Posted in Debugging, WinDbg | Leave a comment

Windows Address Translation Deep Dive – Part 1

This is one of the most fundamental topics which you should read about while you’re debugging. The topic will be far too long to write into one post so I’m planning to split it into at least two separate posts; possibly three posts if I deem that it’s needed. Please note that this is a technical topic and I’m going to be discussing as much technical content as possible, and even then, we’ll only really be scratching at the surface of memory address translation. We’ll be looking at segmentation and its role in x86 address translation and how it is ignored in x86-x64. Afterwards, we’ll be prepared to take a look at paging and why it was introduced.

First of all, we need to go back to the past – the 16-bit era – and take a look at memory segmentation. A feature which still exists today on modern processors but is thankfully ignored on x64 processors when operating in long mode. Although, before we take a look at that, it’s important to recognise that there are three fundamental memory models: physical, flat (sometimes called linear) and segmented. Along with this, there are three modes of operation which the processor can be in: real mode, protected mode and system management mode (SMM).

Differences in Memory Models:

As the name suggests, the physical memory model is just that; physical memory. We’ve able to write directly to RAM and there is no absolutely no protection mechanisms in place. Once we’ve addressed all of the available physical memory, then the system will simply crash and we’ll be unable to continue.

The flat memory model adds the concept of virtual memory. Now the address space is a single contiguous sequential region of bytes, which each byte directly corresponding to a byte in physical memory. Each address in the flat memory model is known as a linear memory address. The address space is commonly referred to as the linear address space.

Now, the segmented memory model builds upon the flat memory model by dividing the address space into distinct regions called segments. Each address is determined by combining the effective address and the base address of the segment. This is known as the logical address, and in some circumstances, it it is also referred to as a far pointer.
The base address is stored in a special register called the segment selector. The effective address is an offset into the segment referenced by the segment selector.

Now, you could consider the paging model as a separate memory model, however, since it builds upon the flat and segmented memory models, I think it would be easier to refer to it as more of an extension of those memory models. It is also important to note, despite Windows using a segmented memory mode, the details of the process address space are abstracted away from the developer, and effectively the process address space of a process is flat. This is especially true when the processor is operating in long mode. We’ll discuss the different operating modes of the processor in the next section.

Differences in Modes:

Real Mode

Real Mode is first mode in which the processor is in when the computer first boots. This is before the operating system has been loaded. It implements the segmented memory model and there is very limited memory protections in place and therefore it is very easy to corrupt the system in this mode. The address space is limited to 20 bits and thus only 1MB of RAM is addressable. Now, each virtual (logical) address is split into two distinct parts: a 16-bit segment selector and a 16-bit offset. As mentioned previously, the segment selector refers to the base address of the segment and the offset is the effective address which is used to index into that segment. This is then used to map to a physical memory address in RAM.

While operating in Real Mode, the size of each register is limited to 16-bits. The registers are mostly a shortened version of their x64 counterparts, but please note that many registers which are available in its successor, are not available in Real Mode. There a few key registers which we’ll need to keep in mind, the segment registers are especially important, since they are key when using the segmented memory model.

The four segment registers are as follows:

CS – known as the code segment – contains the address of the segment selector for currently executing code block.

DS – known as the data segment – contains the address of the segment selector for the program’s data.

SS – known as the stack segment – contains the address of the segment selector for the program’s stack.

ES – known as the extra segment – acts as an additional data or code segment if the program requires it.

There is also the GS and FS segments which have no special meaning and are used by operating system as it so wishes. On Windows x86-x64, the GS segment is used to store the base address of the thread environment block (TEB).

These segment registers can be found in WinDbg using the standard r command.

0: kd> r
rax=fffff8041cdd4e80 rbx=ffff9888c9996000 rcx=ffff9888c9996000
rdx=ffff9888d4332010 rsi=0000000000000000 rdi=ffff9888d2cd0000
rip=fffff8041cdd4e80 rsp=fffff80417474518 rbp=fffff80417474800
r8=0000000000000000  r9=0000000000000000 r10=0000000000000000
r11=0000000000000000 r12=ffff9888d2cd0000 r13=fffff80417474cf8
r14=0000000000000000 r15=ffff9888d2018000
iopl=0         nv up ei pl nz na po nc
cs=0010  ss=0000  ds=0000  es=0000  fs=0000  gs=0000             efl=00000206
igdkmd64+0x334e80:
fffff804`1cdd4e80 48895c2408      mov     qword ptr [rsp+8],rbx ss:fffff804`17474520=ffff9888c9996000

In addition to the segment registers which store the segment selectors, there are the pointer registers, which are used to store the effective address (offset) for a particular segment. These are as follows:

IP – the instruction pointer – contains the address of the currently executing instruction and is used in conjunction with the code segment.

SP – the stack pointer – contains the address of the for the top of the current stack frame and is thus used with the data segment.

BP – the base pointer – points to the bottom of the current stack frame and is used with the stack segment.

Protected Mode

Now, Protected Mode is where things tend to become slightly more complicated. The segmented memory model is still used, however, paging has been added on top of this to provide additional more granular memory protection. On x64 systems, Protected Mode is extended further with the introduction of Long Mode. The key difference to note at this point in time, is that Long Mode sets the base address of all the segment registers (excluding GS and FS) to 0, which effectively, means that the address space is actually more of a flat memory model with the addition of paging. For now, we’ll simply focus on the segmented aspect of Protected Mode, and will pretend that paging has not been enabled.

The same segment selectors exist as stated beforehand, however, they no longer are used to map to a physical address space. Instead, x86 has introduced two registers which are the global descriptor table register (GDTR) and the local descriptor table register (LDTR). The local descriptor table is not used on Windows and therefore we’ll ignore it and pretend it does not exist. The GDTR contains the base address for the global descriptor table, each entry in the table is known as a segment descriptor and is used to describe a particular segment in the linear address space. The segment selector registers are used to index into the global descriptor table (GDT).

The segment selector is still 16-bits, although, the effective address has now been extended to 32-bits instead. The linear address is then constructed in the same manner as before, with one key difference, the base address is take from the segment descriptor and then combined with the effective address. Of course, in this case, the linear address is now 32-bits and the reason why x86 operating systems are able to address up to 4GB of RAM.

The following diagram describes the translation process from a logical address into the linear address space. With paging, the linear address would then be translated through the use of page tables to a physical address. We’ll cover this process in a later part of this tutorial series.

Before we continue, let’s explore some of the concepts which we’ve described using WinDbg. The GDTR can be dumped with the r command like any other register.

0: kd> r @gdtr
gdtr=fffff80417462fb0

Alternatively, the base address of the GDT can be found within the PCR structure for each processor.

0: kd> dt _KPCR -y GdtBase fffff8040defa000
nt!_KPCR
   +0x000 GdtBase : 0xfffff804`17462fb0 _KGDTENTRY64

The size of the table can be found by dumping the GDTL register as follows:

0: kd> r @gdtl
gdtl=0057

To dump a particular segment descriptor, then use the dg command along with the name of the segment selector which you wish to dump. Alternatively, you can provide the address of the segment selector.

2: kd> dg 0x33
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0033 00000000`00000000 00000000`00000000 Code RE Ac 3 Nb By P  Lo 000002fb

Let’s examine of the fields shown by the command. Firstly, the Base Address is 0, which is what we would expect on x64 systems which are running on Long Mode. The base address is like this for almost all segments as described previously. The Type field describes the type of segment descriptor and in this example, the segment descriptor is for a code segment register i.e. the cs register. The other two important fields are the Privilege Level and the Present field. The Privilege Level describes the Descriptor Privilege Level (DPL) which corresponds which ring level the segment is able to be accessed at. The Present field indicates if the segment is resident within physical memory or not.

In fact, the segment selector can be dumped with the use of WinDbg by using the .formats command along with the value of the segment selector. A segment selector follows a particular format which we’ll describe in a moment.

2: kd> .formats 0x33
Evaluate expression:
  Hex:     00000000`00000033
  Decimal: 51
  Octal:   0000000000000000000063
  Binary:  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00110011
  Chars:   .......3
  Time:    Thu Jan  1 00:00:51 1970
  Float:   low 7.14662e-044 high 0
  Double:  2.51973e-322

The first two highlighted bits indicate the requested privilege level (RPL), in this case, the requested privilege level was ring 3 or user-mode. The third bit is a flag which indicates if the selector corresponds to the GDT (0) or the LDT (1). Lastly, the final 13 bits are reserved as an index. The index is taken by the processor and multiplied by 16 (8 on x86 systems), this is then added to the base of address of the GDT or LDT respectively; depending of if the table indicator bit is set or not i.e. the selector belongs to the GDT or the LDT.

The providing expression will provide the address of the corresponding descriptor for that selector.

? (index * 0n16) + base

The segment selector can be visualised in the following diagram:

Privilege Levels:

While this is not strictly related to address translation, I thought it would be worth mentioning, that the privilege levels of DPL and RPL are compared against the current privilege level (CPL) with instructions which involve CALL and JMP. This is to prevent user-mode applications from directly accessing hardware and/or executing instructions which are considered as “privileged”. The CPL is stored within the code segment selector, therefore instead of an RPL, we have a CPL for segments which are code. There are more nuances and other things to be aware of, however, they don’t merit discussion within this tutorial.

Global Descriptor Table:

We’ve now described some of the differences between protected mode and real mode, in addition to, the differences between the various memory models. Let’s have a look at the GDT and how we’re able to view it within WinDbg.

The first entry of the GDT always points to a null segment descriptor. This is a special entry and is used to raise an exception if a segment register is set to it and then subsequently used to access a region of memory. There is a good discussion thread which describes this behaviour.

0: kd> dg 0x00
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0000 00000000`00000000 00000000`00000000 <Reserved> 0 Nb By Np Nl 00000000

As previously mentioned, each entry within the GDT corresponds to a segment descriptor. Each descriptor is represented by a union type called _KGDTENTRY64. This is what the dg command uses to parse the segment selector passed to it. The structure is used by the !ms_gdt command as well.

0: kd> dt _KGDTENTRY64
nt!_KGDTENTRY64
   +0x000 LimitLow         : Uint2B
   +0x002 BaseLow          : Uint2B
   +0x004 Bytes            : <anonymous-tag>
   +0x004 Bits             : <anonymous-tag>
   +0x008 BaseUpper        : Uint4B
   +0x00c MustBeZero       : Uint4B
   +0x000 DataLow          : Int8B
   +0x008 DataHigh         : Int8B

Each processor will have it’s own GDT and the easiest method of dumping the GDT for each processor is by using the !ms_gdt command, which is available by loading the third-party SwishDbgExt extension library. Here is an sample of the output.

3: kd> !ms_gdt
    |-----|-----|--------------------|--------------------------------------------------------|
    | Cre | Idx | Type                             | Address            | Name                                                   |
    |-----|-----|--------------------|--------------------------------------------------------|
    |   0 |   0 | Data RO                          | 0x0000000000000000 | None                                                   |
    |   0 |   1 | TSS32 Busy                       | 0x0000000000000000 | None                                                   |
    |   0 |   2 | TSS32 Busy                       | 0x0000FFFF00000000 | None                                                   |
    |   0 |   3 | TSS32 Busy                       | 0x0000000000000000 | None                                                   |
    |   0 |   4 | Code RE Ac                       | 0xFFFFF80417461000 | None                                                   |
    |   0 |   5 | TSS16 Busy                       | 0x0000000000000000 | None                                                   |
    |   0 |   6 | Data RO                          | 0x0000000000000000 | None                                                   |
    |   0 |   7 | Data RO                          | 0x0000000000000000 | None                                                   |
    |   0 |   8 | Data RO                          | 0x0000000000000000 | None                                                   |
    |   0 |   9 | Data RO                          | 0x0000000000000000 | None                                                   |
    |   0 |   a | Data RO                          | 0x0000000000000000 | None                                                   |
    |   0 |   b | Data RO                          | 0x0000000000000000 | None                                                   |
    |   0 |   c | Data RO                          | 0x0000000000000000 | None                                                   |
    |   0 |   d | Data RO                          | 0x0000000000000000 | None                                                   |
    |   0 |   e | Data RO                          | 0x0000000000000000 | None                                                   |
    |   0 |   f | Data RO                          | 0x0000000000000000 | None                                                   |
    |   0 |  10 | Data RO                          | 0x0000000000000000 | None                                                   |
    |   0 |  11 | Data RO                          | 0x0000000000000000 | None                                                   |
    |   0 |  12 | Data RO                          | 0x0000000000000000 | None                                                   |

The first column indicates the processor number (core index), with the second column referring to the index into the GDT. It’s important to also note, that on x64 systems, any attempts to modify or “hook” the GDT will result in a Stop 0x109 bugcheck. This is because the GDT is a protected structure on x64 systems.

Please note that the GDT can have up to 8,192 entries; the !ms_gdt command stops at 256 entries, therefore if you wish to see the entire table, then I suggest generating a simple script which will loop through the entire table and then dump the subsequent descriptors.

Paging and Segmentation:

Now, we’ve discussed the use of segmentation within Windows and how it is used in address translation. Please refer to the following diagram which will illustrate how paging and segmentation are utilised together. In the next part, we’ll be looking at paging and how it is managed on Windows.

References:

Translating Virtual to Physical Address on Windows: Segmentation – Infosec Resources
CPU Rings, Privilege, and Protection
Operating Systems Development Series
The 0x33 Segment Selector (Heavens Gate) – MalwareTech
Intel Processor Architecture Manual – Volume 3, Chapter 3
The Rootkit Arsenal 2nd Edition

Posted in Windows Internals, Debugging, WinDbg | Leave a comment

An Introduction to IRPs and Device Stacks

Introduction:

IRPs (I/O request packets) are the probably one of most important concepts to understand since they form the foundation of I/O. Please note that the I/O Manager and IRPs are a very lengthy topic, there are many nuances to learn and most of the I/O Manager’s functionality is undocumented. In this article, we’ll be covering the general structure of IRPs and how they are related to the device stack. This will help you understand the purpose of I/O stack locations and how they relate to IRP completion. With this knowledge, you’ll be able to understand and solve a number of more difficult bugchecks, especially crashes which involve a Stop 0x9F, Stop 0x133 or Stop 0x44.

With almost every I/O operation, wherever it be a disk read or a network request, an IRP is usually involved at some point during the lifetime of the operation. IRPs are created during the initiation of a I/O operation and disposed of once the I/O operation has been completed. They are used to enable drivers to communicate with each other during I/O, and the IRP is merely an abstraction of a I/O operation being performed. They are represented by a structure named _IRP. Here the word “opaque” refers to the fact that the structure is partially undocumented.

Typically, an IRP is intialised by the I/O Manager or one of its children: PnP Manager and the Power Manager. Please note that I said “typically” because drivers can create their own IRPs, and as you can imagine, the number of issues which can come with it. In fact, the I/O Manager is simply a set of APIs, there is no process or service which runs in the background.

IRPs are always allocated using non-paged pool. When an IRP is initialised it will specify a number of I/O stack locations. Since most IRPs typically use a common number of stack locations, lookaside lists are used for efficiency. The small lookaside list is used for IRPs which have one stack location; the medium lookaside list is used for requests which have between 2 and 4 stack locations and the large lookaside list is used for requests which have more than 4 stack locations. A non-paged lookaside list is typically represented by an opaque structure called NPAGED_LOOKASIDE_LIST.

The I/O Manager will initialise the first I/O stack location, however, it is the responsibility of all the subsequent drivers to initialise the other required I/O stack locations. A driver will only initialise the next I/O stack location directly below it and not the others. If not enough I/O stack locations are created with the IRP, then eventually the system will crash with a Stop 0x35.

I/O Stack Locations:

Let’s examine the general structure of an IRP and see where I/O stack locations come into play.

8: kd> dt _IRP
nt!_IRP
   +0x000 Type             : Int2B
   +0x002 Size             : Uint2B
   +0x004 AllocationProcessorNumber : Uint2B
   +0x006 Reserved         : Uint2B
   +0x008 MdlAddress       : Ptr64 _MDL
   +0x010 Flags            : Uint4B
   +0x018 AssociatedIrp    : <anonymous-tag>
   +0x020 ThreadListEntry  : _LIST_ENTRY
   +0x030 IoStatus         : _IO_STATUS_BLOCK
   +0x040 RequestorMode    : Char
   +0x041 PendingReturned  : UChar
   +0x042 StackCount       : Char
   +0x043 CurrentLocation  : Char
   +0x044 Cancel           : UChar
   +0x045 CancelIrql       : UChar
   +0x046 ApcEnvironment   : Char
   +0x047 AllocationFlags  : UChar
   +0x048 UserIosb         : Ptr64 _IO_STATUS_BLOCK
   +0x050 UserEvent        : Ptr64 _KEVENT
   +0x058 Overlay          : <anonymous-tag>
   +0x068 CancelRoutine    : Ptr64     void
   +0x070 UserBuffer       : Ptr64 Void
   +0x078 Tail             : <anonymous-tag>


The StackCount member indicates the number of stack locations which have been allocated to the IRP. The CurrentLocation member indicates which I/O stack location is currently active. As you can see, both members are char types and thus only take up 1 byte.

In fact, the I/O stack locations are connected to the end of the IRP as an array of I/O stack location structures. You can find the first I/O stack location by using the following technique. Firstly, dump the size of the IRP structure using the following expression.

0: kd> ?? sizeof(_IRP)
unsigned int64 0xd0

Next, find the size of the I/O stack location structure.

0: kd> ?? sizeof(_IO_STACK_LOCATION)
unsigned int64 0x48

Now, we have two offsets to work with and we’ll be using these with the address of a known IRP. I’m going to dump all the stack locations using the !irp command to demonstrate where we are in the stack.

0: kd> !irp ffff870aba01f8a0
Irp is active with 6 stacks 4 is current (= 0xffff870aba01fa48)
No Mdl: No System Buffer: Thread 00000000:  Irp stack trace.
     cmd  flg cl Device   File     Completion-Context
[N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000

            Args: 00000000 00000000 00000000 00000000
[N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000

            Args: 00000000 00000000 00000000 00000000
[IRP_MJ_POWER(16), IRP_MN_WAIT_WAKE(0)]
            0  0 ffff870ab6e72050 00000000 fffff80486559190-ffff870aba05a5c0
           \Driver\storahci    CLASSPNP!ClasspPowerUpCompletion
            Args: 00000000 00000000 00000000 00000000
>[IRP_MJ_POWER(16), IRP_MN_SET_POWER(2)]
            0 e1 ffff870aba05a080 00000000 fffff804855f4730-00000000 Success Error Cancel pending
           \Driver\disk    partmgr!PmPowerCompletion
            Args: 00000000 00000001 00000001 00000000
[IRP_MJ_POWER(16), IRP_MN_SET_POWER(2)]
            0 e1 ffff870ab6fa58d0 00000000 fffff80482177aa0-ffff870aba0721e8 Success Error Cancel pending
           \Driver\partmgr    nt!PopRequestCompletion
            Args: 00000000 00000001 00000001 00000000
[N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-ffff870aba0721e8

            Args: 00000000 00000000 00000000 00000000

We’re got the base address of the structure and we know the size, so let’s apply that together to find the first I/O stack location. Please note that it will directly follow the stack shown above.

0: kd> ? ffff870aba01f8a0+d0
Evaluate expression: -132994836596368 = ffff870a`ba01f970

Let’s dump it like so.

0: kd> dt _IO_STACK_LOCATION ffff870a`ba01f970
win32k!_IO_STACK_LOCATION
   +0x000 MajorFunction    : 0 ''
   +0x001 MinorFunction    : 0 ''
   +0x002 Flags            : 0 ''
   +0x003 Control          : 0 ''
   +0x008 Parameters       : _IO_STACK_LOCATION::<unnamed-type-Parameters>
   +0x028 DeviceObject     : (null)
   +0x030 FileObject       : (null)
   +0x038 CompletionRoutine : (null)
   +0x040 Context          : (null)

It’s empty as shown in the stack produced by the !irp command. This is to be expected. Let’s jump to the third frame since we know that has data.

0: kd> ? ffff870a`ba01f970+(48*2)
Evaluate expression: -132994836596224 = ffff870a`ba01fa00

Let’s break down this expression quickly. I’ve used the address of the first I/O stack location and then added an offset equivalent to the size of two I/O stack locations since we’re jumping to the third one. Remember the size of a single I/O stack location is 0x48 bytes.

0: kd> dt _IO_STACK_LOCATION ffff870a`ba01fa00
win32k!_IO_STACK_LOCATION
   +0x000 MajorFunction    : 0x16 ''
   +0x001 MinorFunction    : 0 ''
   +0x002 Flags            : 0 ''
   +0x003 Control          : 0 ''
   +0x008 Parameters       : _IO_STACK_LOCATION::<unnamed-type-Parameters>
   +0x028 DeviceObject     : 0xffff870a`b6e72050 _DEVICE_OBJECT
   +0x030 FileObject       : (null)
   +0x038 CompletionRoutine : 0xfffff804`86559190     long  CLASSPNP!ClasspPowerUpCompletion+0
   +0x040 Context          : 0xffff870a`ba05a5c0 Void

As we can see, it’s the same as the third I/O stack location. For completeness, let’s dump the next I/O stack location.

0: kd> ? ffff870a`ba01fa00+48
Evaluate expression: -132994836596152 = ffff870a`ba01fa48
0: kd> dt _IO_STACK_LOCATION ffff870aba01fa48
win32k!_IO_STACK_LOCATION
   +0x000 MajorFunction    : 0x16 ''
   +0x001 MinorFunction    : 0x2 ''
   +0x002 Flags            : 0 ''
   +0x003 Control          : 0xe1 ''
   +0x008 Parameters       : _IO_STACK_LOCATION::<unnamed-type-Parameters>
   +0x028 DeviceObject     : 0xffff870a`ba05a080 _DEVICE_OBJECT
   +0x030 FileObject       : (null)
   +0x038 CompletionRoutine : 0xfffff804`855f4730     long  partmgr!PmPowerCompletion+0
   +0x040 Context          : (null)

Since we had known the address of the active I/O stack location, we could have also simply used +- 0x48 with the known address to see the next and previous I/O stack location.

The number of I/O stack locations should match the number of devices in the device stack. Please note that an IRP may pass through several different device stacks, so as a simplification, when I refer to the device stack, I’ll be talking about all the different device stacks merged together. In our example above, there would have been three different device stacks or three different device nodes. A single device stack represents a device node. Bear in mind, when I mention the word “device” I’m referring to the device node.

Device nodes are located within the device tree and represent an entire device. The device could be a SSD for example. Before we delve into device stacks, it should be noted that there are three different types of device object in the WDM (Windows Driver Model): physical device objects, functional device objects and filter device objects.

Physical device objects aren’t strictly “physical” and merely represent the fact that a device exists within the bus slot. For example, a graphics card driver would generate a form of physical device object. Physical device objects are created by bus drivers such as pci.sys. Upon boot, the PnP Manager will sent out an IRP with the Minor Code (we’ll discuss these later) QUERY_DEVICE_RELATIONS, this is what will be used to determine if a device exists on that particular bus, and if so, a physical device object will generated. You’ll actually end up with a Stop 0xCA crash if this isn’t handled properly.

Functional device objects are the objects which are the actually ‘heavy-duty’ devices which complete the work for the device. These are created by the functional device driver (nvlddmkm.sys) and provide the actual functionality.

Filter device objects are optional and a device may have several filter objects associated to it. These are created by device filter drivers.

Device Stacks:

Now that we’ve discussed device objects and I/O stack locations, let’s see how they relate to each other. To begin, let’s dump the device stack shown in the IRP above. We can do this with a known device object and the !devstack command.

Notice how the device objects correspond to the same device objects shown in the IRP? The first device object to be created at the bottom and the last device object to be created is shown at the top. Each device stack will have one physical device object which is always created first, followed by one functional device object and then a set of optional filter drivers. From the example below, we can see that storahci.sys is the bus driver; disk.sys is the functional device driver and partmgr.sys is the upper filter driver. Any filter drivers which are created before the functional device object are known as lower filter drivers.

Each device object is “attached” to the device object below it through the use of a pointer. This pointer is stored in the the AttachedDevice member of the device object structure shown earlier. The member shows the device object which was attaches above it. If no device object exists above itself, then the member will be null.

0: kd> dt _DEVICE_OBJECT -y AttachedDevice ffff870ab6e72050
win32k!_DEVICE_OBJECT
   +0x018 AttachedDevice : 0xffff870a`ba05a080 _DEVICE_OBJECT

Each device object attaches itself to the device object below it by calling IoAttachDeviceToDeviceStack API function. Alternatively, in some cases, the IoAttachDevice API function may be used instead.

0: kd> !devstack ffff870ab6e72050
  !DevObj           !DrvObj            !DevExt           ObjectName
  ffff870ab6fa58d0  \Driver\partmgr    ffff870ab6fa5a20  InfoMask field not found for _OBJECT_HEADER at ffff870ab6fa58a0

  ffff870aba05a080  \Driver\disk       ffff870aba05a1d0  InfoMask field not found for _OBJECT_HEADER at ffff870aba05a050

> ffff870ab6e72050  \Driver\storahci   ffff870ab6e721a0  Cannot read info offset from nt!ObpInfoMaskToOffset

!DevNode ffff870ab6e9f420 :
  DeviceInst is "SCSI\Disk&Ven_WDC&Prod_WD2500BEVT-22ZCT\5&399a7c98&0&040000"
  ServiceName is "disk"

The command also provides us with the corresponding device node. However, as we can already see from the instance path, it is likely we are dealing with a Western Digital hard-disk. Nevertheless, let’s dump the device node and see what we can find.

0: kd> !devnode ffff870ab6e9f420
DevNode 0xffff870ab6e9f420 for PDO 0xffff870ab6e72050
  Parent 0xffff870ab6dae9a0   Sibling 0xffff870ab6e9f750   Child 0000000000
  InstancePath is "SCSI\Disk&Ven_WDC&Prod_WD2500BEVT-22ZCT\5&399a7c98&0&040000"
  ServiceName is "disk"
  State = DeviceNodeStarted (0x308)
  Previous State = DeviceNodeEnumerateCompletion (0x30d)
  StateHistory[09] = DeviceNodeEnumerateCompletion (0x30d)
  StateHistory[08] = DeviceNodeEnumeratePending (0x30c)
  StateHistory[07] = DeviceNodeStarted (0x308)
  StateHistory[06] = DeviceNodeStartPostWork (0x307)
  StateHistory[05] = DeviceNodeStartCompletion (0x306)
  StateHistory[04] = DeviceNodeStartPending (0x305)
  StateHistory[03] = DeviceNodeResourcesAssigned (0x304)
  StateHistory[02] = DeviceNodeDriversAdded (0x303)
  StateHistory[01] = DeviceNodeInitialized (0x302)
  StateHistory[00] = DeviceNodeUninitialized (0x301)
  StateHistory[19] = Unknown State (0x0)
  StateHistory[18] = Unknown State (0x0)
  StateHistory[17] = Unknown State (0x0)
  StateHistory[16] = Unknown State (0x0)
  StateHistory[15] = Unknown State (0x0)
  StateHistory[14] = Unknown State (0x0)
  StateHistory[13] = Unknown State (0x0)
  StateHistory[12] = Unknown State (0x0)
  StateHistory[11] = Unknown State (0x0)
  StateHistory[10] = Unknown State (0x0)
  Flags (0x24000130)  DNF_ENUMERATED, DNF_IDS_QUERIED,
                      DNF_NO_RESOURCE_REQUIRED, DNF_NO_LOWER_DEVICE_FILTERS,
                      DNF_NO_UPPER_DEVICE_FILTERS
  CapabilityFlags (0x00002180)  SilentInstall, RawDeviceOK,
                                WakeFromD3

As mentioned earlier, device nodes are part of a larger structure known as a device tree. From the output shown above, we can see that the device node has one sibling node and one parent node. We can use these addresses to transverse through the device tree if we so wish.

IRP Major Codes & Minor Codes:

Now that we have a general idea of how I/O stack locations of an IRP and device stacks are related to each other. Let’s discuss what happens when a device driver receives an IRP. An IRP is always sent to the uppermost driver of a device stack, or put another way, the first I/O stack location associated to the IRP. The device driver receives the IRP via it’s dispatch routine. To illustrate this, I’ll dump the dispatch table for the storahci.sys driver.

0: kd> !drvobj \Driver\storahci 7
Driver object (ffffe3845e4ec410) is for:
\Driver\storahci

Driver Extension List: (id , addr)
(fffff8003ad03b60 ffffe3845d6f5ba0)
Device Object list:
ffffe3845e576050

DriverEntry:   fffff8003acbe010    storahci!GsDriverEntry
DriverStartIo: 00000000
DriverUnload:  fffff8003ad02e30    storport!RaDriverUnload
AddDevice:     fffff8003ad02b60    storport!RaDriverAddDevice

Dispatch routines:
[00] IRP_MJ_CREATE                      fffff8003ad41780    storport!RaDriverCreateIrp
[01] IRP_MJ_CREATE_NAMED_PIPE           fffff80038706e00    nt!IopInvalidDeviceRequest
[02] IRP_MJ_CLOSE                       fffff8003ad41830    storport!RaDriverCloseIrp
[03] IRP_MJ_READ                        fffff80038706e00    nt!IopInvalidDeviceRequest
[04] IRP_MJ_WRITE                       fffff80038706e00    nt!IopInvalidDeviceRequest
[05] IRP_MJ_QUERY_INFORMATION           fffff80038706e00    nt!IopInvalidDeviceRequest
[06] IRP_MJ_SET_INFORMATION             fffff80038706e00    nt!IopInvalidDeviceRequest
[07] IRP_MJ_QUERY_EA                    fffff80038706e00    nt!IopInvalidDeviceRequest
[08] IRP_MJ_SET_EA                      fffff80038706e00    nt!IopInvalidDeviceRequest
[09] IRP_MJ_FLUSH_BUFFERS               fffff80038706e00    nt!IopInvalidDeviceRequest
[0a] IRP_MJ_QUERY_VOLUME_INFORMATION    fffff80038706e00    nt!IopInvalidDeviceRequest
[0b] IRP_MJ_SET_VOLUME_INFORMATION      fffff80038706e00    nt!IopInvalidDeviceRequest
[0c] IRP_MJ_DIRECTORY_CONTROL           fffff80038706e00    nt!IopInvalidDeviceRequest
[0d] IRP_MJ_FILE_SYSTEM_CONTROL         fffff80038706e00    nt!IopInvalidDeviceRequest
[0e] IRP_MJ_DEVICE_CONTROL              fffff8003ad27230    storport!SrbShimHookDeviceControl
[0f] IRP_MJ_INTERNAL_DEVICE_CONTROL     fffff8003acda230    storport!RaDriverScsiIrp
[10] IRP_MJ_SHUTDOWN                    fffff80038706e00    nt!IopInvalidDeviceRequest
[11] IRP_MJ_LOCK_CONTROL                fffff80038706e00    nt!IopInvalidDeviceRequest
[12] IRP_MJ_CLEANUP                     fffff80038706e00    nt!IopInvalidDeviceRequest
[13] IRP_MJ_CREATE_MAILSLOT             fffff80038706e00    nt!IopInvalidDeviceRequest
[14] IRP_MJ_QUERY_SECURITY              fffff80038706e00    nt!IopInvalidDeviceRequest
[15] IRP_MJ_SET_SECURITY                fffff80038706e00    nt!IopInvalidDeviceRequest
[16] IRP_MJ_POWER                       fffff8003acdee00    storport!RaDriverPowerIrp
[17] IRP_MJ_SYSTEM_CONTROL              fffff8003ad43dd0    storport!RaDriverSystemControlIrp
[18] IRP_MJ_DEVICE_CHANGE               fffff80038706e00    nt!IopInvalidDeviceRequest
[19] IRP_MJ_QUERY_QUOTA                 fffff80038706e00    nt!IopInvalidDeviceRequest
[1a] IRP_MJ_SET_QUOTA                   fffff80038706e00    nt!IopInvalidDeviceRequest
[1b] IRP_MJ_PNP                         fffff8003acd7d30    storport!RaDriverPnpIrp

If a driver supports a particular IRP then it will have associated dispatch routine in its dispatch table. In this case, when the driver receives an IRP with a major code IRP_MJ_POWER, the storport!RaDriverPowerIrp function will be called. Each major code will then have a set of minor codes associated with it. When the driver receives the IRP via its dispatch routine, it will check what the minor code of the request is and then respond accordingly. In our case, we can see that the device from resuming from a sleep state.

The dispatch table is simply an array of function pointers which point to a dispatch routine. On x64 systems, each entry in the table is 8 bytes. For those interested, we can dump the dispatch table without the assistance of !drvobj using the following method.

0: kd> dt _DRIVER_OBJECT ffffe3845e4ec410
nt!_DRIVER_OBJECT
   +0x000 Type             : 0n4
   +0x002 Size             : 0n336
   +0x008 DeviceObject     : 0xffffe384`5e576050 _DEVICE_OBJECT
   +0x010 Flags            : 0x412
   +0x018 DriverStart      : 0xfffff800`3ac90000 Void
   +0x020 DriverSize       : 0x32000
   +0x028 DriverSection    : 0xffffe384`5c2806d0 Void
   +0x030 DriverExtension  : 0xffffe384`5e4ec560 _DRIVER_EXTENSION
   +0x038 DriverName       : _UNICODE_STRING "\Driver\storahci"
   +0x048 HardwareDatabase : 0xfffff800`3912d990 _UNICODE_STRING "\REGISTRY\MACHINE\HARDWARE\DESCRIPTION\SYSTEM"
   +0x050 FastIoDispatch   : (null)
   +0x058 DriverInit       : 0xfffff800`3acbe010     long  storahci!GsDriverEntry+0
   +0x060 DriverStartIo    : (null)
   +0x068 DriverUnload     : 0xfffff800`3ad02e30     void  storport!RaDriverUnload+0
   +0x070 MajorFunction    : [28] 0xfffff800`3ad41780     long  storport!RaDriverCreateIrp+0
0: kd> ? ffffe3845e4ec410+70
Evaluate expression: -31317319302016 = ffffe384`5e4ec480

This is the base address of the array, from here, we can use the dps command and dump all the function pointers.

0: kd> dps ffffe384`5e4ec480 L20
ffffe384`5e4ec480  fffff800`3ad41780 storport!RaDriverCreateIrp << Each offset is 8 bytes; second address is the dispatch routine address
ffffe384`5e4ec488  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec490  fffff800`3ad41830 storport!RaDriverCloseIrp
ffffe384`5e4ec498  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec4a0  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec4a8  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec4b0  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec4b8  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec4c0  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec4c8  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec4d0  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec4d8  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec4e0  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec4e8  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec4f0  fffff800`3ad27230 storport!SrbShimHookDeviceControl
ffffe384`5e4ec4f8  fffff800`3acda230 storport!RaDriverScsiIrp
ffffe384`5e4ec500  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec508  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec510  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec518  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec520  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec528  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec530  fffff800`3acdee00 storport!RaDriverPowerIrp
ffffe384`5e4ec538  fffff800`3ad43dd0 storport!RaDriverSystemControlIrp
ffffe384`5e4ec540  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec548  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec550  fffff800`38706e00 nt!IopInvalidDeviceRequest
ffffe384`5e4ec558  fffff800`3acd7d30 storport!RaDriverPnpIrp
[...]
0: kd> ? ffffe384`5e4ec480+0n8
Evaluate expression: -31317319302008 = ffffe384`5e4ec488

This produces the following:

0: kd> dps ffffe384`5e4ec488 L1
ffffe384`5e4ec488  fffff800`38706e00 nt!IopInvalidDeviceRequest

And, again:

0: kd> ? ffffe384`5e4ec480+0n16
Evaluate expression: -31317319302000 = ffffe384`5e4ec490
0: kd> dps ffffe384`5e4ec490 L1
ffffe384`5e4ec490  fffff800`3ad41830 storport!RaDriverCloseIrp

Before we move onto describing how an IRP is completed, I should mention the difference between a Major Code (MJ) and Minor Code (MN) for an IRP is. An MJ provides a category for the type of I/O operation, there is a MJ for every I/O stack location for the associated IRP, and it will be the exact same for each location. A device driver must provide an appropriate dispatch routine for every MJ it wishes to support. For example, if a device driver wishes to support read requests, then it must provide a routine for the IRP_MJ_READ major code.

In some cases, a MJ might be accompanied by a MN, this is used to describe a particular kind of I/O request for the corresponding MJ. For instance, let’s take a look at the IRP_MJ_POWER IRP. It has four different minor codes associated with it, each of which represents a slightly different kind of power request. It wouldn’t be wise to have one general dispatch routine for handling a device powering up and a device powering down. Instead, a handler method is provided as the dispatch routine, and smaller functions are assigned for each individual minor code. One of the MNs you’ll frequently see for a IRP_MJ_POWER request is the IRP_MN_SET_POWER minor code; this is used to set the power state of a particular device.

IRP Completion:

Let’s take a brief look at the completion of an IRP and how it transverses through the device stack. Please note that we’ll assume that the IRP will pass through the device stack with no issues and completes successfully for the sake of simplicity. This is usually not the case unfortunately, hence why you’ll see many Stop 0x9Fs and Stop 0x133s.

As noted earlier, the IRP will always be sent to the top of the device stack, from which the driver will examine through its dispatch routine. When the driver successfully receives the IRP, it has a few different decisions it can make. Please note that the scenarios described are general cases, please bear in mind, there are particular nauances which must be followed for Power and PnP IRPs.

If the IRP is uninteresting to the driver and it has no further input, it can simply pass the IRP onto the next device in the stack by calling IoSkipCurrentIrpLocation and then subsequently calling IoCallDriver. Remember it is the responsibility of the driver to properly initialise the next I/O stack location before passing the IRP on. Additionally, it can add a completion routine to the stack location by calling IoSetCompletionRoutine and then pass the request to the next driver in the stack.

Alternatively, it can complete the IRP immediately and the lower-level drivers will never see the IRP. This is because once an IRP has been completed, it transverses up the device stack, with each stack location being examined to see if it has a registered completion routine, and if so, this is called as well. This process continues until the request reaches the original caller or the I/O Manager.

Lastly, it could do what we all love, and mark the IRP as pending by calling IoMarkIrpPending and returning a status code of STATUS_PENDING. This successfully pends the IRP with the idea that it will be completed at a later time.

I hope that this article helps to clarify how I/O stack locations and device stacks are related, along with their role in IRP completion. If you have any questions, then please let me know.

References:

Windows Internals 7th Edition Part 1

Programming the Windows Kernel

Programming the Windows Driver Model 2nd Edition

Posted in Windows Internals, Debugging, WinDbg, Stop 0x9F, Stop 0x133 | Leave a comment

WinDbg Cheat Sheet – Revised Edition

I made a WinDbg cheat sheet which I thematically categorised a considerable time ago now, and since then, there’s lots of new commands which I’ve learnt and thought would be useful to others as well. I’ve now revised the cheat sheet and will include a .doc version which you’re free to edit and amend as you so wish.

The WinDbg cheat sheet (.zip folder) can be found on my OneDrive here. Please let me know if there is any issues or some of the documentation is incorrect. The latest version will always be available on my OneDrive.

Posted in Debugging, WinDbg, Miscellaneous | Leave a comment

Debugging Stop 0xCA – dptf_acpi.sys & dptf_cpu.sys

Background:

Within the past week or so, I’ve noticed a number of Stop 0xCA crashes caused by two drivers in particular: dptf_acpi.sys and dptf_cpu.sys. Both of which, appear to belong to same driver package which is Intel’s Dynamic Platform Thermal Framework (dptf). Despite the DRT (driver reference table) stating that the driver is available from Intel, there doesn’t appear to be any updates and from what I understand, this driver appears to be exclusively delivered through Windows Update. The Dell website appeared to have a driver package from 2015?

One user has mentioned that the update was named Intel DPTF 8.10600.150 in their case, however, I’m not certain if this is the same for everyone. The latest update on the Windows Update Catalog appears to be from 2017?

One of my spare laptops has been offered an updated called Intel – System – 2040.100.0.1029, however, this update was for the Intel Management Engine Interface instead.

I’ve been looking at the processors of the affected machines and there doesn’t appear to be any discernible pattern, other than they all belong to the same processor family which is 0x06. This processor family includes a wide variety of different processors. If you’re able to, then please check the processor information using the !sysinfo cpuinfo command.

The main purpose of this post to provide some information on what is happening with this crash, at the moment, I’m not aware of any definitive solution other than waiting for the driver to be patched by Intel. Most users have reported that using a system restore point has managed to resolve the issue for them.

PNP_DETECTED_FATAL_ERROR (ca)
PnP encountered a severe error, either as a result of a problem in a driver or
a problem in PnP itself.  The first argument describes the nature of the
problem, the second argument is the address of the PDO.  The other arguments
vary depending on argument 1.
Arguments:
Arg1: 000000000000000a, Incorrect notify callback behavior
    Driver failed to preserve IRQL or combined APC disable across
    a PlugPlay notification.
Arg2: ffffd68b8f2e6a70, Driver Object.
Arg3: 0000000000000000, IRQL after returning from driver callback.
Arg4: 000000000000ffff, Combined APC disable count after returning from driver callback.

As we can see from the bugcheck description and first parameter, a driver has failed to maintain a legal value within the combined APC disable field for the associated thread. We’re discuss this a little later on, but I would recommend reading about APCs before continuing with this post. They’re quite a lengthy topic and therefore I won’t discuss them here.

Firstly, let’s dump the driver object.

1: kd> !drvobj ffffd68b8f2e6a70
Driver object (ffffd68b8f2e6a70) is for:
\Driver\dptf_acpi

Driver Extension List: (id , addr)
(fffff80582304d50 ffffd68b90a34950) 
Device Object list:
ffffd68b90b4ad00  ffffd68b90b4cd00  ffffd68b90b4dd00  ffffd68b90b4ed00

Looks like the issue is with the Intel ACPI driver. Each of the associated device and their stacks appear to be related to a slightly different thermal control device e.g. fans.

1: kd> !devstack ffffd68b90b4ad00
  !DevObj           !DrvObj            !DevExt           ObjectName
> ffffd68b90b4ad00  \Driver\dptf_acpi  ffffd68b90f62b30 
  ffffd68b901b7cf0  \Driver\ACPI       ffffd68b9012ebe0  00000033
!DevNode ffffd68b90184310 :
  DeviceInst is "ACPI\INT3402\TMEM"
  ServiceName is "dptf_acpi"

So, we know which driver is responsible, however, we don’t know what exactly how it was caused. Let’s look at the call stack for the thread.

1: kd> knL
# Child-SP          RetAddr           Call Site
00 fffff900`78aa7998 fffff805`7e64bf58 nt!KeBugCheckEx
01 fffff900`78aa79a0 fffff805`7e522f26 nt!PnpNotifyDriverCallback+0x13ee78 << Crash here!
02 fffff900`78aa7a50 fffff805`7e50a400 nt!PnpNotifyDeviceClassChange+0x18e << The event category which the callback belongs to
03 fffff900`78aa7af0 fffff805`7e025975 nt!PnpDeviceEventWorker+0x290
04 fffff900`78aa7b70 fffff805`7e117e85 nt!ExpWorkerThread+0x105
05 fffff900`78aa7c10 fffff805`7e1fd2a8 nt!PspSystemThreadStartup+0x55
06 fffff900`78aa7c60 00000000`00000000 nt!KiStartSystemThread+0x28

I’ve highlighted the two most important parts of the call stack, but before we delve into those a little further, it would be best I briefly explained what PnP notifications are. Much like registry filter callbacks, the PnP Manager provides a number of different PnP events which drivers are able to subscribe to and then perform an action accordingly. For example, you may wish to listen to a particular event for debugging purposes and then log to a file each time it happens. Each PnP event belongs to a particular event category, these are stored in an enumeration called IO_NOTIFICATION_EVENT_CATEGORY.

1: kd> dt IO_NOTIFICATION_EVENT_CATEGORY
Wdf01000!IO_NOTIFICATION_EVENT_CATEGORY
   EventCategoryReserved = 0n0
   EventCategoryHardwareProfileChange = 0n1
   EventCategoryDeviceInterfaceChange = 0n2
   EventCategoryTargetDeviceChange = 0n3
   EventCategoryKernelSoftRestart = 0n4

A driver registers it’s callback routine (the function called when the event happens) by calling IoRegisterPlugPlayNotification. Once the event which the driver has subscribed to has fired, then the driver’s registered callback routine will be executed. There is a number of different rules which a notification routine must follow and this includes ensuring that the notification routine is only called at PASSIVE_LEVEL (IRQL Level 0). These rules are described here.

We can find the event and the event category by examining the second parameter passed to the PnpNotifyDriverCallback function. The callback routine takes a notification structure object which describes which event the callback is registered for.

1: kd> !stack -p
Call Stack : 7 frames
## Stack-Pointer    Return-Address   Call-Site      
00 fffff90078aa7998 fffff8057e64bf58 nt!KeBugCheckEx+0
    Parameter[0] = 00000000000000ca
    Parameter[1] = 000000000000000a
    Parameter[2] = ffffd68b8f2e6a70
    Parameter[3] = 0000000000000000
01 fffff90078aa79a0 fffff8057e522f26 nt!PnpNotifyDriverCallback+13ee78 (perf)
    Parameter[0] = ffffe688ce3d71f0
    Parameter[1] = fffff90078aa7a78 << Notification structure
    Parameter[2] = fffff90078aa7a70
    Parameter[3] = (unknown)

[...]
1: kd> dt _DEVICE_INTERFACE_CHANGE_NOTIFICATION fffff90078aa7a78
Wdf01000!_DEVICE_INTERFACE_CHANGE_NOTIFICATION
   +0x000 Version          : 1
   +0x002 Size             : 0x30
   +0x004 Event            : _GUID {cb3a4004-46f0-11d0-b08f-00609713053f} << GUID_DEVICE_INTERFACE_ARRIVAL
   +0x014 InterfaceClassGuid : _GUID {ee27098e-1b22-472a-89d8-5ccce16b1356}
   +0x028 SymbolicLinkName : 0xfffff900`78aa7b28 _UNICODE_STRING "\??\ACPI#INT3400#2&daba3ff&0#{ee27098e-1b22-472a-89d8-5ccce16b1356}"

As we can see, the notification callback routine was registered for the device interface change event category, which is responsible for checking when a device interface has been enabled or disabled. A device interface is a broad identifier for a number of different devices. For example, all of your disk storage devices would have a device interface identifier which indicated that it was a storage device.

The event in question can be found in the Event field. In this case, we were listening to an interface arrival event, which is the enabling of a new device with that particular device interface. For instance, imagine that you inserted a USB device, this would likely trigger a device interface arrival event as shown above.

We can look up the SymbolicLinkName using the !object command to reveal which device object the device interface is related to:

1: kd> !object \Global??\ACPI#INT3400#2&daba3ff&0#{09a5554b-da25-4461-8d80-5bc6c96dd932}
Object: ffffe688cef960d0  Type: (ffffd68b8f2acce0) SymbolicLink
    ObjectHeader: ffffe688cef960a0 (new version)
    HandleCount: 1  PointerCount: 2
    Directory Object: ffffe688c9c19ad0  Name: ACPI#INT3400#2&daba3ff&0#{09a5554b-da25-4461-8d80-5bc6c96dd932}
    Flags: 00000000 ( Local )
    Target String is '\Device\0000002f'
1: kd> !object \Device\0000002f
Object: ffffd68b901f18f0  Type: (ffffd68b8f2fb980) Device
    ObjectHeader: ffffd68b901f18c0 (new version)
    HandleCount: 0  PointerCount: 9
    Directory Object: ffffe688c9c65060  Name: 0000002f
1: kd> !devobj ffffd68b901f18f0
Device object (ffffd68b901f18f0) is for:
 0000002f \Driver\ACPI DriverObject ffffd68b8f5fb4e0
Current Irp 00000000 RefCount 1 Type 00000032 Flags 00001040
SecurityDescriptor ffffe688ca0bf060 DevExt ffffd68b9012e010 DevObjExt ffffd68b901f1a40 DevNode ffffd68b8f2e9050 
ExtensionFlags (0000000000)  
Characteristics (0x00000180)  FILE_AUTOGENERATED_DEVICE_NAME, FILE_DEVICE_SECURE_OPEN
AttachedDevice (Upper) ffffd68b984a2de0 \Driver\esif_lf
Device queue is not busy.

The device in question is related to ACPI which is then part of the device stack for the Intel Dynamic Platform Thermal Framework.


I would just like to point out, if you don’t know the type of notification structure the callback routine takes, then you can use the _PLUGPLAY_NOTIFICATION_HEADER structure first and then determine the appropriate structure from the Event field, since notification header structure is cast to the more specific structure based on that field value.

1: kd> dt _PLUGPLAY_NOTIFICATION_HEADER fffff90078aa7a78
Wdf01000!_PLUGPLAY_NOTIFICATION_HEADER
   +0x000 Version          : 1
   +0x002 Size             : 0x30
   +0x004 Event            : _GUID {cb3a4004-46f0-11d0-b08f-00609713053f} << GUID_DEVICE_INTERFACE_ARRIVAL

In fact, the same information can be obtained by using the !pnpevent command.

1: kd> !pnpevent

********************************************************************************
Dumping PnP DeviceEvent Queue @ 0xffffd68b8f2a9eb0
********************************************************************************

List = 0xffffe688cf13f280, 0xffffe688cf13f280

Dumping DeviceEventEntry @ 0xffffe688cf13f280
  ListEntry = 0xffffd68b8f2a9f28, 0xffffd68b8f2a9f28, Argument = 0x00000000
  CallerEvent = 0x00000000, Callback = 0x00000000, Context = 0x00000000
  VetoType = 0x00000000, VetoName = 0x00000000

  Dumping PlugPlayEventBlock @ 0xCF13F2F0
    EventGuid = GUID_DEVICE_INTERFACE_ARRIVAL
    Category = DeviceClassChangeEvent
    Result = 0x00000000, Flags = 0x00000000, TotalSize = 214
    DeviceObject = 0x00000000
      ClassGuid = 09A5554B-DA25-4461-8D80-5BC6C96DD932
      SymbolicLinkName = \??\ACPI#INT3400#2&daba3ff&0#{09a5554b-da25-4461-8d80-5bc6c96dd932}

  Total events in the list: 1

You may have noticed that the event category is a slightly different value, this is because the _PLUGPLAY_EVENT_BLOCK structure uses the the _PLUGPLAY_EVENT_CATEGORY enumeration instead.

1: kd> dt _PLUGPLAY_EVENT_CATEGORY
nt!_PLUGPLAY_EVENT_CATEGORY
   HardwareProfileChangeEvent = 0n0
   TargetDeviceChangeEvent = 0n1
   DeviceClassChangeEvent = 0n2
   CustomDeviceEvent = 0n3
   DeviceInstallEvent = 0n4
   DeviceArrivalEvent = 0n5
   VetoEvent = 0n6
   BlockedDriverEvent = 0n7
   InvalidIDEvent = 0n8
   DevicePropertyChangeEvent = 0n9
   DeviceInstanceRemovalEvent = 0n10
   DeviceInstanceStartedEvent = 0n11
   MaxPlugEventCategory = 0n12

I mentioned earlier about the combined APC disable count and the fact that it can be obtained from the thread itself. In order to do so, you simply have to dump the CombinedApcDisable field as so:

1: kd> dt _KTHREAD CombinedApcDisable ffffd68b90eb5040
nt!_KTHREAD
   +0x1e4 CombinedApcDisable : 0xffff

As you can see, the field has an invalid value of 0xFFFF. This value is actually the combination of two fields: KernelApcDisable and SpecialApcDisable. Both of which are part of the _KTHREAD structure.

1: kd> dt _KTHREAD SpecialApcDisable ffffd68b90eb5040
nt!_KTHREAD
   +0x1e6 SpecialApcDisable : 0n0
1: kd> dt _KTHREAD KernelApcDisable ffffd68b90eb5040
nt!_KTHREAD
   +0x1e4 KernelApcDisable : 0n-1

The CombinedApcDisable is then calculated using the following expression:

1: kd> ? 0n0 << 16 | 0n-1
Evaluate expression: -1 = ffffffff`ffffffff

As we can see, the KernelApcDisable field has a value of -1 which isn’t valid; typically this means a driver has left a critical region more times than it has entered one (if it entered one), which leads to the value of -1. This is a critical error and therefore the system crashes with the bugcheck shown. We know that the IRQL level was correct since the callback routine must be called at PASSIVE_LEVEL.

This crash seems to commonly occur during boot, whereupon the PnP Manager will be asking for which devices are currently connected to the system.

References:

Handling Device Interface Change Events – Windows drivers
Guidelines for Writing PnP Notification Callback Routines – Windows drivers
Using PnP Notification – Windows drivers

Posted in Windows Internals, Debugging, WinDbg, Stop 0xCA | 2 Comments

Debugging Stop 0x119 – VIDEO_SCHEDULER_INTERNAL_ERROR

In this post, we’re going to be looking at the bugcheck parameters of a Stop 0x119 which are documented, but not on the MSDN page for the bugcheck code itself for some reason?

Before we begin with examining the bugcheck, we’ll need to go some of the basic principles of the WDDM (Windows Display Driver Model). There is two drivers which an application will use: a user-mode display driver and a kernel-mode miniport display driver. The miniport driver is what does most of the “heavy-work”. Both drivers interact with the Direct X Kernel subsystem which is known as DXGKRNL.sys.

As we can see, the video scheduler is part of the Direct X kernel. This is where our crash occurs. The video scheduler has four different levels, each level differs on what functions are able to be run and the level of re-entrancy which is permitted. The level of re-entrancy is based upon distinct classes, in our example below, we were operating first level synchronization and within the GPU Scheduler class. At this level, no re-entrancy is permitted, and therefore only one thread will be running a function from the scheduler class.

The function which would have been called was DxgkDdiSubmitCommand. This builds a DMA buffer and then inserts it into the execution queue of a particular GPU engine. For example, the 3D rendering engine is a GPU engine. Each buffer which has been queued will have a unique identifier assigned to it called a fence identifier. Essentially, this ensures that the buffers are executed in the correct order. If a buffer is executed out of order or the fence identifier isn’t valid, then the system will bugcheck with another variant of a Stop 0x119.

There is actually two types of buffers available: command and DMA. The command buffers are used by the user-mode display driver and then sent to the graphics kernel, from there, the command buffer is validated and then converted into a DMA buffer. The DMA buffer contains the instructions for the GPU engine to execute. Both of the buffers are allocated from pageable memory. If a DMA buffer is currently paged out, then it must be paged into video memory.

Each process will have it’s own execution queue and the scheduler will provide a certain period of time for each task in the queue. If a task takes too long to process, then the system will bugcheck with either a Stop 0x116 or a Stop 0x117.

VIDEO_SCHEDULER_INTERNAL_ERROR (119)
The video scheduler has detected that fatal violation has occurred. This resulted
in a condition that video scheduler can no longer progress. Any other values after
parameter 1 must be individually examined according to the subtype.
Arguments:
Arg1: 0000000000000002, The driver failed upon the submission of a command.
Arg2: ffffffffc000000d << NTStatus Error Code
Arg3: ffff878c651fa860 << Pointer to DXGKARG_SUBMITCOMMAND structure
Arg4: ffffa1889c542570 << Parameter to dxgmms2!VidSchiSendToExecutionQueue+0x14d48

Anyhow, let’s begin with examining the bugcheck and checking why the system crashed. As you can see from the first parameter, the driver has failed to submit a paging command to the GPU’s execution queue. This is evident in the call stack.

6: kd> knL
# Child-SP RetAddr Call Site
00 ffff878c`651fa788 fffff806`98804210 nt!KeBugCheckEx
01 ffff878c`651fa790 fffff806`9a0faf38 watchdog!WdLogEvent5_WdCriticalError+0xe0
02 ffff878c`651fa7d0 fffff806`9a158d54 dxgmms2!VidSchiSendToExecutionQueue+0x14d48 << Crash here!
03 ffff878c`651fa900 fffff806`9a16a315 dxgmms2!VidSchiSubmitPagingCommand+0x2f4 << Command to be submitted
04 ffff878c`651faa80 fffff806`9a16a18a dxgmms2!VidSchiRun_PriorityTable+0x175
05 ffff878c`651faad0 fffff806`82d33585 dxgmms2!VidSchiWorkerThread+0xca
06 ffff878c`651fab10 fffff806`82dcb128 nt!PspSystemThreadStartup+0x55
07 ffff878c`651fab60 00000000`00000000 nt!KiStartSystemThread+0x28

Now, the second parameter fortunately provides an NTSTATUS code which indicates that an invalid parameter was passed to a function, presumably this is a parameter for the execution queue function.

6: kd> !error ffffffff`c000000d
Error code: (NTSTATUS) 0xc000000d (3221225485) - An invalid parameter was passed to a service or function.

If we examine the fourth parameter, then we can see that it’s the same parameter which is passed to the function in which we crashed at. The third parameter is a pointer to the DXGARG_SUBMITCOMMAND structure which is sent to the execution queue. So it seems that the driver has incorrectly set this and therefore has led to the crash you’ve experienced.

6: kd> !stack -p
Call Stack : 8 frames
## Stack-Pointer    Return-Address   Call-Site       
00 ffff878c651fa788 fffff80698804210 nt!KeBugCheckEx+0 
	Parameter[0] = 0000000000000119
	Parameter[1] = 0000000000000002
	Parameter[2] = ffffffffc000000d
	Parameter[3] = ffff878c651fa860
01 ffff878c651fa790 fffff8069a0faf38 watchdog!WdLogEvent5_WdCriticalError+e0 
	Parameter[0] = ffffa188982ea2b0
	Parameter[1] = (unknown)       
	Parameter[2] = (unknown)       
	Parameter[3] = (unknown)       
02 ffff878c651fa7d0 fffff8069a158d54 dxgmms2!VidSchiSendToExecutionQueue+14d48 (perf)
	Parameter[0] = ffffa1889c542570
	Parameter[1] = 0000000000000000
	Parameter[2] = (unknown)       
	Parameter[3] = (unknown)       
03 ffff878c651fa900 fffff8069a16a315 dxgmms2!VidSchiSubmitPagingCommand+2f4 (perf)
	Parameter[0] = ffffa1889f22d010
	Parameter[1] = (unknown)       
	Parameter[2] = (unknown)       
	Parameter[3] = (unknown)

If we check the raw stack for the thread, we can see what action might have been about to be undertaken. It appears that the graphics miniport driver was attempting to gain access to the hardware’s power settings. The Component Power Management model allows developers to have finer grained control of the individual components which make up a graphics card.

6: kd> !dpx
Start memory scan  : 0xffff878c651fa788 ($csp)
End memory scan    : 0xffff878c651fb000 (Kernel Stack Base)

               rsp : 0xffff878c651fa788 : 0xfffff80698804210 : watchdog!WdLogEvent5_WdCriticalError+0xe0
Unable to load image \SystemRoot\System32\DriverStore\FileRepository\nvlt.inf_amd64_e7444925b6f55a93\nvlddmkm.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for nvlddmkm.sys
0xffff878c651fa788 : 0xfffff80698804210 : watchdog!WdLogEvent5_WdCriticalError+0xe0
0xffff878c651fa858 : 0xfffff806990a47c8 : dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBInternal+0x70
0xffff878c651fa8f8 : 0xfffff8069a158d54 : dxgmms2!VidSchiSubmitPagingCommand+0x2f4
0xffff878c651fa948 : 0xfffff8069a15ab73 : dxgmms2!VidSchiCheckHwProgress+0x163
0xffff878c651fa9b8 : 0xfffff8069a0e8732 : dxgmms2!VidSchiScheduleCommandToRun+0x272
0xffff878c651faa28 : 0xfffff8069a0f1400 : dxgmms2!VidSchiClearFlipDevice+0x5c
0xffff878c651faa78 : 0xfffff8069a16a315 : dxgmms2!VidSchiRun_PriorityTable+0x175
0xffff878c651faa98 : 0xfffff8069a16a0c0 : dxgmms2!VidSchiWorkerThread
0xffff878c651faac8 : 0xfffff8069a16a18a : dxgmms2!VidSchiWorkerThread+0xca
0xffff878c651faad8 : 0xfffff8069a16a0c0 : dxgmms2!VidSchiWorkerThread
0xffff878c651fab08 : 0xfffff80682d33585 : nt!PspSystemThreadStartup+0x55
0xffff878c651fab58 : 0xfffff80682dcb128 : nt!KiStartSystemThread+0x28
0xffff878c651fab70 : 0xfffff80682d33530 : nt!PspSystemThreadStartup
6: kd> lmvm nvlddmkm
Browse full module list
start             end                 module name
fffff806`9ae40000 fffff806`9bc63000   nvlddmkm T (no symbols)           
    Loaded symbol image file: nvlddmkm.sys
    Image path: \SystemRoot\System32\DriverStore\FileRepository\nvlt.inf_amd64_e7444925b6f55a93\nvlddmkm.sys
    Image name: nvlddmkm.sys
    Browse all global symbols  functions  data
    Timestamp:        Mon Jun 19 02:28:31 2017 (594728BF)
    CheckSum:         00DD9945
    ImageSize:        00E23000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:

So, as we can see, for some reason the display miniport driver has failed to submit the command to the execution queue. Unfortunately, like most graphics-related bugchecks, to really understand what is going on, we’re going to need to have some extensive knowledge of the DXGKRNL (most of this is quite secretive and undocumented to the public) along with the appropriate checked symbols. However, it does appear to be an issue with the graphics card driver in this case. In some circumstances, these bugchecks have been caused by either a faulty PSU or a faulty graphics card as well.

As a closing note, the general architecture for the GPU under WDDM can be found below:

References:

DXGKCB_SETPOWERCOMPONENTACTIVE callback function (d3dkmddi.h) DXGKDDI_SUBMITCOMMAND callback function (d3dkmddi.h) GPU power management of idle states and active power Enumerating GPU engine capabilities
Posted in Debugging, WinDbg, Stop 0x119 | Leave a comment

Debugging Stop 0xCA – PNP_DETECTED_FATAL_ERROR

Stop 0xCA’s are relatively rare but fortunately they’re usually quite easy to debug. It’s also one of those bugchecks where there isn’t a strict debugging methodology and therefore each individual case will need to be examined. In the example crash below, I hope to provide you with some insights in where you should look if you do happen to stumble upon this bugcheck.

PNP_DETECTED_FATAL_ERROR (ca)
PnP encountered a severe error, either as a result of a problem in a driver or
a problem in PnP itself.  The first argument describes the nature of the
problem, the second argument is the address of the PDO.  The other arguments
vary depending on argument 1.
Arguments:
Arg1: 0000000000000002, Invalid PDO
    An API which requires a PDO has been called with either an FDO,
    a PDO which hasn't been initialized yet (returned to PnP in a
    QueryDeviceRelation/BusRelations), or some random piece of
    memory.
Arg2: ffff800e9fce5060, Purported PDO.
Arg3: 0000000000000000, Driver object.
Arg4: 0000000000000000

The first parameter of the bugcheck reveals why we crashed. We appear to have passed an invalid physical device object to an API (function) which requires one. The description then provides us with two hints to why the physical device object may be invalid.

7: kd> knL
# Child-SP          RetAddr           Call Site
00 ffffe90e`7c4411f8 fffff805`536327d5 nt!KeBugCheckEx
01 ffffe90e`7c441200 fffff805`57962d1e nt!IoGetDevicePropertyData+0x144f45 << API in question
02 ffffe90e`7c441250 fffff805`57962cc0 partmgr!PmGetDevicePropertyData+0x52
03 ffffe90e`7c4412c0 fffff805`57949b4d partmgr!PM_DISK::GetPnpProperty+0x10
04 ffffe90e`7c4412f0 fffff805`57949a17 partmgr!SC_DEVICE::Initialize+0x4d
05 ffffe90e`7c441350 fffff805`57962135 partmgr!SC_DISK::Initialize+0x17
06 ffffe90e`7c4413a0 fffff805`57961a17 partmgr!PM_DISK::Initialize+0x3d
07 ffffe90e`7c4413d0 fffff805`57960289 partmgr!PmCollectTelemetry+0x177
08 ffffe90e`7c4415e0 fffff805`57944097 partmgr!PmSendTelemetry+0x51
09 ffffe90e`7c441a50 fffff805`5315a435 partmgr!PmNotificationWorkItem+0x337
0a ffffe90e`7c441b00 fffff805`53025975 nt!IopProcessWorkItem+0x135
0b ffffe90e`7c441b70 fffff805`53117e25 nt!ExpWorkerThread+0x105
0c ffffe90e`7c441c10 fffff805`531fcdd8 nt!PspSystemThreadStartup+0x55
0d ffffe90e`7c441c60 00000000`00000000 nt!KiStartSystemThread+0x28

The call stack reveals the API which was excepting a valid physical device object. If we check the MSDN documentation, we can see that it takes a physical device object as one of its parameters. Let’s dump the stack frame and examine the registers.

7: kd> .frame /r 01
01 ffffe90e`7c441200 fffff805`57962d1e nt!IoGetDevicePropertyData+0x144f45
rax=ffff800e9fce5a68 rbx=ffff800e9fce5060 rcx=00000000000000ca
rdx=0000000000000002 rsi=0000000000000000 rdi=ffffe90e7c441420
rip=fffff805536327d5 rsp=ffffe90e7c441200 rbp=0000000000000002
r8=ffff800e9fce5060  r9=0000000000000000 r10=ffff800e8c181000
r11=fffff0783c1e0000 r12=0000000000000000 r13=0000000000000200
r14=ffff800e9fce5060 r15=ffffe90e7c441880
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0000  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
nt!IoGetDevicePropertyData+0x144f45:
fffff805`536327d5 cc              int     3

As we can see, the device object has been saved in a few different registers which is likely due to the calling convention and compiler optimisations. Nevertheless, we can see that it is passed to the function which caused the crash. Let’s examine the device object and see what we can find.

7: kd> !devobj ffff800e9fce5060
Device object (ffff800e9fce5060) is for:
Cannot read info offset from nt!ObpInfoMaskToOffset
\Driver\tcesd DriverObject ffff800e8c1f0e00
Current Irp 00000000 RefCount 0 Type 00000007 Flags 00000050
Vpb ffff800ea36d7ac0 SecurityDescriptor ffffb9864f15e5e0 DevExt ffff800e9fce51b0 DevObjExt ffff800e9fce5a68
ExtensionFlags (0x00000002)  DOE_DELETE_PENDING
Characteristics (0x00000100)  FILE_DEVICE_SECURE_OPEN
AttachedDevice (Upper) ffff800ea1705960 \Driver\partmgr
Device queue is not busy.

I’ve highlighted the driver responsible for this device object. Let’s keep a note of that. Other than that, the !devobj command doesn’t reveal much useful information. Let’s dump the device object extension structure.

7: kd> dt _DEVOBJ_EXTENSION ffff800e`9fce5a68
nt!_DEVOBJ_EXTENSION
   +0x000 Type             : 0n13
   +0x002 Size             : 0
   +0x008 DeviceObject     : 0xffff800e`9fce5060 _DEVICE_OBJECT
   +0x010 PowerFlags       : 0
   +0x018 Dope             : (null)
   +0x020 ExtensionFlags   : 2
   +0x028 DeviceNode       : (null) << No device node created
   +0x030 AttachedTo       : (null)
   +0x038 StartIoCount     : 0n0
   +0x03c StartIoKey       : 0n0
   +0x040 StartIoFlags     : 0
   +0x048 Vpb              : (null)
   +0x050 DependencyNode   : (null)
   +0x058 InterruptContext : (null)
   +0x060 InterruptCount   : 0n0
   +0x068 VerifierContext  : (null)

Ah! We’ve found the issue! All physical device objects which have initialised correctly should have a device node associated with them, and as we can see, there doesn’t appear to be any device node.

7: kd> lmvm tcesd
Browse full module list
start             end                 module name
fffff805`578b0000 fffff805`5790c000   tcesd    T (no symbols)         
    Loaded symbol image file: tcesd.sys
    Image path: \SystemRoot\system32\drivers\tcesd.sys
    Image name: tcesd.sys
    Browse all global symbols  functions  data
    Timestamp:        Wed Jul  6 23:37:22 2016 (577D8822)
    CheckSum:         0003A81D
    ImageSize:        0005C000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:

When a system boots, the PnP Manager will send out a PnP IRP with the minor code of IRP_MN_QUERY_DEVICE_RELATIONS, to check which devices are connected to the all the buses on the motherboard. This enables the PnP Manager to build a device tree of all the device nodes in the system and keep track of who is present and who isn’t. For example, take a USB mouse, when you plug the device into the USB port, the aforementioned IRP will be generated by the PnP Manager to check which device has been connected. And when you eject or remove the mouse, a similar IRP will be sent to state to the PnP Manager that the device has now been removed.

It is the responsibility of the associated device driver to respond appropriately to the request from the PnP Manager. Otherwise, you’ll end up with a Stop 0xCA bugcheck as shown above. The MSDN documentation also states the following:

Warning A device object cannot be passed to any routine that takes a PDO as an argument until the PnP manager creates a device node (devnode) for that object. (If the driver does pass a device object, the system will bug check with Bug Check 0xCA: PNP_DETECTED_FATAL_ERROR.) The PnP manager creates the devnode in response to the IRP_MN_QUERY_DEVICE_RELATIONS request. The driver can safely assume that the PDO’s devnode has been created when it receives an IRP_MN_QUERY_RESOURCE_REQUIREMENTS request.

The Disk Keeper driver is responsible for passing the appropriate PDO pointers back to the PnP Manager, and therefore removing the driver appeared to resolve the issue.

Posted in Windows Internals, Debugging, WinDbg, Stop 0xCA | Leave a comment

Wdf01000!FxVerifierDbgBreakPoint and !wdfkd.wdflogdump

This one of those obscure commands you’ll probably only ever use in very specific circumstances and I happen to only stumble upon this command a few weeks ago. It’s really useful when you want to see why driver has crashed the system, in this case it was quite obvious and the driver had been causing several different bugchecks.

KMODE_EXCEPTION_NOT_HANDLED (1e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffff80000003, The exception code that was not handled
Arg2: fffff80112872950, The address that the exception occurred at
Arg3: 0000000000000000, Parameter 0 of the exception
Arg4: 7efefefefefeff63, Parameter 1 of the exception

As we can see we have a pretty generic Stop 0x1E bugcheck. These are due to exceptions being thrown which either have no handler written for them or are unable to be caught using structured exception handling. For example, a page fault is an example of a exception which can’t be handled by a try-catch statement.

Let’s dump the exception which wasn’t handled:

1: kd> !error ffffffff80000003
Error code: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

Interesting?

Let’s see which function caused the exception.

1: kd> ln fffff80112872950
Browse module
Set bu breakpoint

(fffff801`12872950)   nt!DbgBreakPoint   |  (fffff801`12872960)   nt!DbgUserBreakPoint

If we read the documentation, it states the following:

In kernel mode, a break exception that is not handled will cause a bug check. You can, however, connect a kernel-mode debugger to a target computer that has stopped responding and has kernel debugging enabled.

This explains why we’ve received our bugcheck error, however, this isn’t very useful in understanding why the crash was caused.

Let’s examine the call stack and see what we can find.

1: kd> knL
 # Child-SP          RetAddr           Call Site
00 fffff587`5a22ce58 fffff801`1294cdde nt!KeBugCheckEx
01 fffff587`5a22ce60 fffff801`1287390f nt!KiFatalExceptionHandler+0x22
02 fffff587`5a22cea0 fffff801`126be885 nt!RtlpExecuteHandlerForException+0xf
03 fffff587`5a22ced0 fffff801`126bcf1e nt!RtlDispatchException+0x4a5
04 fffff587`5a22d620 fffff801`1287ca5d nt!KiDispatchException+0x16e
05 fffff587`5a22dcd0 fffff801`12876416 nt!KiExceptionDispatch+0x11d
06 fffff587`5a22deb0 fffff801`12872951 nt!KiBreakpointTrap+0x316 << Actually crash here!
07 fffff587`5a22e048 fffff801`13a2f89b nt!DbgBreakPoint+0x1
08 (Inline Function) --------`-------- Wdf01000!Mx::MxDbgBreakPoint+0x7
09 fffff587`5a22e050 fffff801`13a29e92 Wdf01000!FxVerifierDbgBreakPoint+0x4b
0a fffff587`5a22e090 fffff801`13a6e300 Wdf01000!FxVerifierCheckIrqlLevel+0x1e35a << Verification check
0b fffff587`5a22e0e0 fffff808`344c2e13 Wdf01000!imp_WdfUsbTargetDeviceSendControlTransferSynchronously+0xf0
0c fffff587`5a22e410 fffff808`344c226d sshid+0x2e13 << Problematic driver
0d fffff587`5a22e4d0 fffff801`13a0944d sshid+0x226d
0e (Inline Function) --------`-------- Wdf01000!FxIoQueueIoInternalDeviceControl::Invoke+0x42
0f fffff587`5a22e510 fffff801`13a08eb7 Wdf01000!FxIoQueue::DispatchRequestToDriver+0x16d
10 fffff587`5a22e5b0 fffff801`13a07076 Wdf01000!FxIoQueue::DispatchEvents+0x657
11 (Inline Function) --------`-------- Wdf01000!FxIoQueue::QueueRequest+0x92
12 (Inline Function) --------`-------- Wdf01000!FxPkgIo::DispatchStep2+0x4b6
13 fffff587`5a22e690 fffff801`13a06b2d Wdf01000!FxPkgIo::DispatchStep1+0x536
14 fffff587`5a22e750 fffff801`13a0ac83 Wdf01000!FxPkgIo::Dispatch+0x5d
15 (Inline Function) --------`-------- Wdf01000!DispatchWorker+0x9e
16 (Inline Function) --------`-------- Wdf01000!FxDevice::Dispatch+0xbc
17 fffff587`5a22e7b0 fffff801`128277ba Wdf01000!FxDevice::DispatchWithLock+0x113
18 fffff587`5a22e810 fffff801`1300a0a9 nt!IopfCallDriver+0x56
19 fffff587`5a22e850 fffff801`1289ca25 nt!IovCallDriver+0x275
1a fffff587`5a22e890 fffff808`344d104f nt!IofCallDriver+0x1be925
1b fffff587`5a22e8d0 fffff808`308595df mshidkmdf!HidKmdfPassThrough+0x3f
1c fffff587`5a22e900 fffff808`3085a9ec HIDCLASS!HidpSubmitInterruptRead+0x1af
1d fffff587`5a22e980 fffff801`1300a9d4 HIDCLASS!HidpInterruptReadComplete+0x57c
1e fffff587`5a22ea20 fffff801`1270da09 nt!IovpLocalCompletionRoutine+0x174
1f fffff587`5a22ea80 fffff801`1300a315 nt!IopfCompleteRequest+0x119
20 fffff587`5a22eb90 fffff801`128ae611 nt!IovCompleteRequest+0x1e1
21 fffff587`5a22ec80 fffff801`13a084a8 nt!IofCompleteRequest+0x1a0d51
22 (Inline Function) --------`-------- Wdf01000!FxIrp::CompleteRequest+0xe
23 fffff587`5a22ecb0 fffff801`13a05fec Wdf01000!FxRequest::CompleteInternal+0x228
24 (Inline Function) --------`-------- Wdf01000!FxRequest::Complete+0x31
25 (Inline Function) --------`-------- Wdf01000!FxRequest::CompleteWithInformation+0x3c
26 fffff587`5a22ed40 fffff808`344c52eb Wdf01000!imp_WdfRequestCompleteWithInformation+0x9c
27 fffff587`5a22eda0 fffff808`344c537b sshid+0x52eb
28 fffff587`5a22ede0 fffff808`344c2397 sshid+0x537b
29 fffff587`5a22ee40 fffff801`13a70a68 sshid+0x2397
2a fffff587`5a22eef0 fffff801`13a074d6 Wdf01000!FxUsbPipeContinuousReader::_FxUsbPipeRequestComplete+0x68
2b fffff587`5a22ef70 fffff801`13a0b098 Wdf01000!FxRequestBase::CompleteSubmitted+0xe6
2c (Inline Function) --------`-------- Wdf01000!FxIoTarget::CompleteRequest+0x8
2d (Inline Function) --------`-------- Wdf01000!FxIoTarget::RequestCompletionRoutine+0xa5
2e fffff587`5a22f000 fffff801`127db076 Wdf01000!FxIoTarget::_RequestCompletionRoutine+0xe8
2f fffff587`5a22f070 fffff801`1300a9d4 nt!IopUnloadSafeCompletion+0x56
30 fffff587`5a22f0a0 fffff801`1270da09 nt!IovpLocalCompletionRoutine+0x174
31 fffff587`5a22f100 fffff801`1300a315 nt!IopfCompleteRequest+0x119
32 fffff587`5a22f210 fffff801`128ae611 nt!IovCompleteRequest+0x1e1
33 fffff587`5a22f300 fffff801`13a084a8 nt!IofCompleteRequest+0x1a0d51
34 (Inline Function) --------`-------- Wdf01000!FxIrp::CompleteRequest+0xe
35 fffff587`5a22f330 fffff801`13a07f7b Wdf01000!FxRequest::CompleteInternal+0x228
36 (Inline Function) --------`-------- Wdf01000!FxRequest::Complete+0x31
37 fffff587`5a22f3c0 fffff808`314b7e80 Wdf01000!imp_WdfRequestComplete+0x8b
38 fffff587`5a22f420 fffff808`314b5cf4 USBXHCI!Bulk_Transfer_CompleteCancelable+0x19c
39 fffff587`5a22f480 fffff808`314b4460 USBXHCI!Bulk_ProcessTransferEventWithED1+0x464
3a fffff587`5a22f540 fffff808`314a98d3 USBXHCI!Bulk_EP_TransferEventHandler+0x10
3b fffff587`5a22f570 fffff808`3149ba4f USBXHCI!TR_TransferEventHandler+0x17
3c fffff587`5a22f5a0 fffff808`314bc2f2 USBXHCI!Endpoint_TransferEventHandler+0x167
3d fffff587`5a22f620 fffff808`3149ea3c USBXHCI!UsbDevice_TransferEventHandler+0xa6
3e fffff587`5a22f680 fffff808`3149f55c USBXHCI!Interrupter_DeferredWorkProcessor+0x578
3f fffff587`5a22f780 fffff801`13a026ad USBXHCI!Interrupter_WdfEvtInterruptDpc+0xc
40 (Inline Function) --------`-------- Wdf01000!FxInterrupt::DpcHandler+0x70
41 fffff587`5a22f7b0 fffff801`1276996a Wdf01000!FxInterrupt::_InterruptDpcThunk+0x9d
42 fffff587`5a22f7f0 fffff801`12768fbf nt!KiExecuteAllDpcs+0x30a
43 fffff587`5a22f930 fffff801`1286e64e nt!KiRetireDpcList+0x1ef
44 fffff587`5a22fb60 00000000`00000000 nt!KiIdleLoop+0x7e

We can see where we crashed and also a problematic driver which was called beforehand. The most interesting calls in the stack are Wdf01000!FxVerifierCheckIrqlLevel and Wdf01000!FxVerifierDbgBreakPoint. So, we now know why the breakpoint was called, it’s used by the FxVerifierCheckIrqlLevel function to enable a debugger to break in and allow the developer to see why their driver has crashed. Obviously, no debugger was attached at the time of the crash hence why the breakpoint exception is unhandled.

We can assume that the driver has failed the verification check hence why the breakpoint being called, although, we can confirm this for definite by using !wdflogdump extension command.

1: kd> !wdflogdump sshid
Trace searchpath is: 

Trace format prefix is: %7!u!: %!FUNC! - 
Trying to extract TMF information from - C:\ProgramData\dbg\sym\Wdf01000.pdb\066683F62DF745C1C6CD389F00F53E281\Wdf01000.pdb
Gather log: Please wait, this may take a moment (reading 4024 bytes).
% read so far ... 10, 20, 30, 40, 50, 100
There are 39 log entries
--- start of log ---
1: FxIFRStart - FxIFR logging started
2: LockVerifierSection - Increment Lock counter (3) for Verifier Paged Memory from  \REGISTRY\MACHINE\SYSTEM\ControlSet001\Services\sshid from driver globals FFFFE406F770EDE0
3: FxVerifierLock::InitializeLockOrder - Object Type 0x1036 does not have a lock order defined in fx\inc\FxVerifierLock.hpp
4: FxVerifierLock::InitializeLockOrder - Object Type 0x1036 does not have a lock order defined in fx\inc\FxVerifierLock.hpp
5: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering PnP State WdfDevStatePnpInit from WdfDevStatePnpObjectCreated
6: FxPkgPnp::Dispatch - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810, IRP_MJ_PNP, 0x00000000(IRP_MN_START_DEVICE) IRP 0xFFFFE406F708C010
7: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering PnP State WdfDevStatePnpInitStarting from WdfDevStatePnpInit
8: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering PnP State WdfDevStatePnpHardwareAvailable from WdfDevStatePnpInitStarting
9: FxVerifierLock::InitializeLockOrder - Object Type 0x1204 does not have a lock order defined in fx\inc\FxVerifierLock.hpp
10: FxIoTarget::SubmitLocked - ignoring WDFIOTARGET 00001BF908E50288 state, sending WDFREQUEST FFFFF5875A4F6AF0, state WdfIoTargetStarted
11: FxPkgPnp::NotPowerPolicyOwnerEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering not power policy owner state WdfDevStatePwrPolStarting from WdfDevStatePwrPolObjectCreated
12: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering Power State WdfDevStatePowerStartingCheckDeviceType from WdfDevStatePowerObjectCreated
13: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering Power State WdfDevStatePowerD0Starting from WdfDevStatePowerStartingCheckDeviceType
14: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering Power State WdfDevStatePowerD0StartingConnectInterrupt from WdfDevStatePowerD0Starting
15: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering Power State WdfDevStatePowerD0StartingDmaEnable from WdfDevStatePowerD0StartingConnectInterrupt
16: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering Power State WdfDevStatePowerD0StartingStartSelfManagedIo from WdfDevStatePowerD0StartingDmaEnable
17: FxPkgIo::ResumeProcessingForPower - Power resume all queues of WDFDEVICE 0x00001BF908FB2F68
18: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering Power State WdfDevStatePowerDecideD0State from WdfDevStatePowerD0StartingStartSelfManagedIo
19: FxPkgPnp::PowerEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering Power State WdfDevStatePowerD0 from WdfDevStatePowerDecideD0State
20: FxPkgPnp::NotPowerPolicyOwnerEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering not power policy owner state WdfDevStatePwrPolStarted from WdfDevStatePwrPolStarting
21: FxPkgPnp::NotPowerPolicyOwnerEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering not power policy owner state WdfDevStatePwrPolStartingSucceeded from WdfDevStatePwrPolStarted
22: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering PnP State WdfDevStatePnpEnableInterfaces from WdfDevStatePnpHardwareAvailable
23: FxPkgPnp::PnpEnterNewState - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 entering PnP State WdfDevStatePnpStarted from WdfDevStatePnpEnableInterfaces
24: FxPkgPnp::Dispatch - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810, IRP_MJ_PNP, 0x00000014(IRP_MN_QUERY_PNP_DEVICE_STATE) IRP 0xFFFFE406F7081010
25: FxPkgFdo::HandleQueryPnpDeviceStateCompletion - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810 returning PNP_DEVICE_STATE 0x0 IRP 0xFFFFE406F7081010
26: FxPkgPnp::Dispatch - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type 0xFFFFFFFF IRP 0xFFFFE406F7052010
27: FxPkgPnp::Dispatch - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type 0xFFFFFFFF IRP 0xFFFFE406F7051010
28: FxPkgPnp::Dispatch - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type TargetDeviceRelation IRP 0xFFFFE406F704A010
29: FxPkgPnp::Dispatch - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type BusRelations IRP 0xFFFFE406F79D8010
30: FxPkgPnp::HandleQueryBusRelations - WDFDEVICE 00001BF908FB2F68 returning 3 devices in relations FFFF820D1A8B3690
31: FxPkgPnp::Dispatch - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type BusRelations IRP 0xFFFFE406F708C010
32: FxPkgPnp::HandleQueryBusRelations - WDFDEVICE 00001BF908FB2F68 returning 3 devices in relations FFFF820D1A8B4610
33: FxPkgPnp::Dispatch - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type BusRelations IRP 0xFFFFE406F7090010
34: FxPkgPnp::HandleQueryBusRelations - WDFDEVICE 00001BF908FB2F68 returning 3 devices in relations FFFF820D1551BAD0
35: FxPkgPnp::Dispatch - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type BusRelations IRP 0xFFFFE406F710B7D0
36: FxPkgPnp::HandleQueryBusRelations - WDFDEVICE 00001BF908FB2F68 returning 3 devices in relations FFFF820D1551BC10
37: FxPkgPnp::Dispatch - WDFDEVICE 0x00001BF908FB2F68 !devobj 0xFFFFE406F7071810, IRP_MJ_PNP, 0x00000007(IRP_MN_QUERY_DEVICE_RELATIONS) type BusRelations IRP 0xFFFFE406F7088560
38: FxPkgPnp::HandleQueryBusRelations - WDFDEVICE 00001BF908FB2F68 returning 3 devices in relations FFFF820D1551C5D0
39: FxVerifierCheckIrqlLevel - Called at wrong IRQL; at level 2, should be at level 0

As we can see, the HandleQueryBusRelations function was called at IRQL Level 2 rather than Level 0, which means that the driver has failed the IRQL verification check.

1: kd> !irql
Debugger saved IRQL for processor 0x1 -- 2 (DISPATCH_LEVEL)

The solution to the issue is to update or remove the Steel Series keyboard driver:

1: kd> lmvm sshid
Browse full module list
start             end                 module name
fffff808`344c0000 fffff808`344cd000   sshid      (no symbols)           
    Loaded symbol image file: sshid.sys
    Image path: \SystemRoot\System32\drivers\sshid.sys
    Image name: sshid.sys
    Browse all global symbols  functions  data
    Timestamp:        Fri Jul 24 22:14:14 2020 (5F1B4F26)
    CheckSum:         000177F9
    ImageSize:        0000D000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:

References:

verifier & wdf
DbgBreakPoint function (wdm.h) – Windows drivers
https://github.com/microsoft/Window…framework/shared/irphandlers/pnp/fxpkgpnp.cpp
microsoft/Windows-Driver-Frameworks

Posted in Windows Internals, Debugging, WinDbg, Computer Science | Leave a comment