Windows Address Translation Deep Dive – Part 3

The previous two parts discussed address translation at an operating system-agnostic hardware level. This part focuses on the translation mechanisms which are used almost exclusively by the Windows operating system.

Since processes will generally reserve large portions of memory which may never be touched, it doesn’t make sense to create page tables and consume physical memory for address ranges which are not being used. Instead, the Memory Manager records each range in an AVL tree structure called the VAD (virtual address descriptor) tree. A tree is defined for each process, with a node being constructed whenever the process reserves or commits memory; generally, this is achieved using VirtualAlloc.

How are VADs used in conjunction with address translation then? When a thread attempts to access a linear address which doesn’t yet have a valid PTE, a page fault occurs and the Memory Manager consults the VAD tree of the process which the thread belongs to. It finds the corresponding VAD by checking the address range of each node; if the linear address falls within the range of a node, the protection bits of that node are examined and the PTE control bits are constructed from them. This then enables the page fault to be successfully resolved.
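The range lookup described above can be sketched as an ordinary binary search tree walk. This is a minimal illustration rather than the Memory Manager’s actual code: the VadNode class and find_vad function are hypothetical names, 4KB pages are assumed, and the three ranges are borrowed from the Start and End columns of the !vad output in this post.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SHIFT = 12  # assumption: 4 KB pages


@dataclass
class VadNode:
    """Hypothetical stand-in for a VAD tree node (not the real _MMVAD layout)."""
    start_vpn: int
    end_vpn: int
    protection: str
    left: Optional["VadNode"] = None
    right: Optional["VadNode"] = None


def find_vad(root: Optional[VadNode], address: int) -> Optional[VadNode]:
    """Walk the tree and return the node whose VPN range contains the address."""
    vpn = address >> PAGE_SHIFT
    node = root
    while node is not None:
        if vpn < node.start_vpn:
            node = node.left
        elif vpn > node.end_vpn:
            node = node.right
        else:
            return node
    return None  # no VAD covers the address, so the fault cannot be resolved


# Three ranges taken from the !vad output (values are virtual page numbers).
root = VadNode(0x50, 0x14f, "READWRITE",
               left=VadNode(0x10, 0x1f, "READWRITE"),
               right=VadNode(0x150, 0x153, "READONLY"))
```

Calling find_vad(root, 0x51000) returns the private READWRITE node, whereas an address outside every range returns None, which corresponds to an access violation rather than a resolvable fault.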

The !vad debugger extension command can be used to dump the VAD tree for a given process:

0: kd> !vad
VAD             Level         Start             End              Commit
ffff838108005280  4              10              1f               0 Mapped       READWRITE          Pagefile section, shared commit 0x10
ffff83810800c9e0  5              20              20               0 Mapped       READONLY           Pagefile section, shared commit 0x1
ffff838108006d60  3              30              4c               0 Mapped       READONLY           Pagefile section, shared commit 0x1d
ffff83810c5a5050  4              50             14f               7 Private      READWRITE          
ffff838104845a60  2             150             153               0 Mapped       READONLY           Pagefile section, shared commit 0x4
ffff838104847400  5             160             160               0 Mapped       READONLY           Pagefile section, shared commit 0x1
ffff83810c5a2710  4             170             171               2 Private      READWRITE          
ffff83810800cee0  5             180             180               0 Mapped       READONLY           Pagefile section, shared commit 0x1
ffff83810800c620  3             190             190               0 Mapped       READONLY           Pagefile section, shared commit 0x1
ffff83810800e9c0  6             1a0             1a7               0 Mapped       READONLY           Pagefile section, shared commit 0x21
[...]

Alternatively, you can find the VAD root through the _EPROCESS structure; the root is stored in the VadRoot field.

0: kd> dt _EPROCESS -y Vad ffff8381083dc0c0
nt!_EPROCESS
   +0x7d8 VadRoot : _RTL_AVL_TREE
   +0x7e0 VadHint : 0xffff8381`0c27d230 Void
   +0x7e8 VadCount : 0xb2
   +0x7f0 VadPhysicalPages : 0
   +0x7f8 VadPhysicalPagesLimit : 0
   +0x87c VadTrackingDisabled : 0y0

The VadCount field contains the number of nodes in the VAD tree. The VadRoot field is an _RTL_AVL_TREE structure, which acts as a wrapper around a pointer to the root _RTL_BALANCED_NODE; this node can then be used to traverse the rest of the tree.

Each VAD is represented by the _MMVAD structure as shown below:

0: kd> dt _MMVAD
nt!_MMVAD
   +0x000 Core             : _MMVAD_SHORT
   +0x040 u2               : <anonymous-tag>
   +0x048 Subsection       : Ptr64 _SUBSECTION
   +0x050 FirstPrototypePte : Ptr64 _MMPTE
   +0x058 LastContiguousPte : Ptr64 _MMPTE
   +0x060 ViewLinks        : _LIST_ENTRY
   +0x070 VadsProcess      : Ptr64 _EPROCESS
   +0x078 u4               : <anonymous-tag>
   +0x080 FileObject       : Ptr64 _FILE_OBJECT

The Core field is an embedded structure called _MMVAD_SHORT, which contains the address range of the VAD in the StartingVpn and EndingVpn fields. The range is actually given as virtual page numbers, which are then used to calculate the start and end virtual addresses. This is what the Start and End columns of the !vad output are referring to.

0: kd> dt _MMVAD_SHORT
nt!_MMVAD_SHORT
   +0x000 NextVad          : Ptr64 _MMVAD_SHORT
   +0x008 ExtraCreateInfo  : Ptr64 Void
   +0x000 VadNode          : _RTL_BALANCED_NODE
   +0x018 StartingVpn      : Uint4B
   +0x01c EndingVpn        : Uint4B
   +0x020 StartingVpnHigh  : UChar
   +0x021 EndingVpnHigh    : UChar
   +0x022 CommitChargeHigh : UChar
   +0x023 SpareNT64VadUChar : UChar
   +0x024 ReferenceCount   : Int4B
   +0x028 PushLock         : _EX_PUSH_LOCK
   +0x030 u                : <anonymous-tag>
   +0x034 u1               : <anonymous-tag>
   +0x038 EventList        : Ptr64 _MI_VAD_EVENT_BLOCK
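The calculation from virtual page numbers to virtual addresses can be sketched as follows. This is a minimal example which assumes 4KB pages and assumes that the StartingVpnHigh and EndingVpnHigh bytes supply the bits above the 32-bit StartingVpn and EndingVpn fields; the function name is only illustrative.

```python
PAGE_SHIFT = 12              # assumption: 4 KB pages
PAGE_SIZE = 1 << PAGE_SHIFT


def vad_address_range(starting_vpn, ending_vpn,
                      starting_vpn_high=0, ending_vpn_high=0):
    """Return the (first byte, last byte) virtual addresses covered by a VAD."""
    # Assumption: the *High byte extends the 32-bit VPN field upwards.
    start_vpn = (starting_vpn_high << 32) | starting_vpn
    end_vpn = (ending_vpn_high << 32) | ending_vpn
    start = start_vpn << PAGE_SHIFT
    end = (end_vpn << PAGE_SHIFT) | (PAGE_SIZE - 1)  # last byte of the last page
    return start, end
```

For the first VAD in the !vad output (Start 10, End 1f), this gives the range 0x10000 to 0x1ffff.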

The VadNode field refers to the AVL tree node; the ReferenceCount field is the number of references to this VAD and the PushLock field contains the push lock which controls write access to the VAD structure. Moreover, the union at offset 0x30 in the above structure provides the protection flags of the VAD. Using the first VAD from the earlier !vad output, we can see:

0: kd> dt _MMVAD_FLAGS ffff838108005280+0x30
nt!_MMVAD_FLAGS
   +0x000 Lock             : 0y0
   +0x000 LockContended    : 0y0
   +0x000 DeleteInProgress : 0y0
   +0x000 NoChange         : 0y0
   +0x000 VadType          : 0y000
   +0x000 Protection       : 0y00100 (0x4)
   +0x000 PreferredNode    : 0y000000 (0)
   +0x000 PageSize         : 0y00
   +0x000 PrivateMemory    : 0y0

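A Protection value of 0x4 corresponds to PAGE_READWRITE, which means that the pages can be both read from and written to. Note that these are the Memory Manager’s internal encodings rather than the values of the Win32 PAGE_* constants (for example, the Win32 PAGE_READONLY constant is 0x02). The most common protection values are listed below: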

  • PAGE_READONLY (0x1)
  • PAGE_EXECUTE_READ (0x3)
  • PAGE_READWRITE (0x4)
  • PAGE_EXECUTE_READWRITE (0x6)
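To show how the Protection value is pulled out of the flags, here is a small sketch which unpacks a raw 32-bit _MMVAD_FLAGS value. It assumes that the fields are packed from bit 0 upwards in the order and with the widths which dt displays above; that layout is inferred from the output and may differ between Windows builds. The lookup table only contains the four values listed above.

```python
# (field name, width in bits) in the order dt lists them.
# Assumption: fields are packed consecutively starting at bit 0.
MMVAD_FLAGS_LAYOUT = [
    ("Lock", 1), ("LockContended", 1), ("DeleteInProgress", 1),
    ("NoChange", 1), ("VadType", 3), ("Protection", 5),
    ("PreferredNode", 6), ("PageSize", 2), ("PrivateMemory", 1),
]

PROTECTION_NAMES = {
    0x1: "PAGE_READONLY",
    0x3: "PAGE_EXECUTE_READ",
    0x4: "PAGE_READWRITE",
    0x6: "PAGE_EXECUTE_READWRITE",
}


def decode_vad_flags(raw):
    """Split a raw flags value into a dictionary of named fields."""
    fields = {}
    shift = 0
    for name, width in MMVAD_FLAGS_LAYOUT:
        fields[name] = (raw >> shift) & ((1 << width) - 1)
        shift += width
    return fields


def protection_name(raw):
    """Return the name of the protection encoded in a raw flags value."""
    value = decode_vad_flags(raw)["Protection"]
    return PROTECTION_NAMES.get(value, "unlisted (0x%x)" % value)
```

Under that assumed layout, the Protection field occupies bits 7 to 11, so a raw value of 0x200 decodes to PAGE_READWRITE with every other field clear, matching the example VAD.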

The PrivateMemory field indicates the allocation type associated with the address range: this can be either private to the given process or an address range which is shared between multiple processes, including memory-mapped files. This brings us to the other purpose of using VADs: to allow the management of memory-mapped files and shared memory.

Memory-mapped files enable a given process to access a file stored on the disk as if it were a region of memory by mapping the bytes of the file into the address space of the said process. This in turn reduces disk I/O and simplifies access to the file by allowing the process to use pointers to reference parts of the file (file-mapped I/O). To manage the regions of memory owned by a process, the Memory Manager will create a VAD for the given address range when the MapViewOfFile function is called. As mentioned previously, physical memory and PTEs will only be allocated once a page fault has occurred on an address within the specified address range.

Moreover, it isn’t uncommon for mapped files to be shared between multiple processes, which enables those processes to access the same physical file. In Windows, shared mappings are represented by a structure known as _SECTION (the section object). Each process then creates a view of that mapping which – depending on the permissions – enables the process to read from or write to that mapping.

When a process writes to this shared page of memory, the changes aren’t immediately flushed to the disk; instead, they are buffered and flushed when the view is closed or when a process calls FlushViewOfFile. It should also be noted that a mapped view does not always refer to a file on the disk but can instead be backed by a portion of the page file. This distinction leads to what are known as page-file backed views and file-backed views. There is an important caveat to mention here: any changes made to a page-file backed view are discarded once the underlying section object has been destroyed.
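As a user-mode illustration of a file-backed view, the sketch below uses Python’s standard mmap module, which CPython implements on Windows with CreateFileMapping and MapViewOfFile. The helper name and the temporary demo file are made up for the example; the write goes through ordinary memory access and is then explicitly flushed back to the file.

```python
import mmap
import os
import tempfile


def update_through_mapping(path, offset, data):
    """Overwrite bytes of a file through a mapped view, then flush the view."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as view:       # length 0 maps the whole file
            view[offset:offset + len(data)] = data   # a plain memory write
            view.flush()  # write the dirty pages back to the file


# Demonstration with a throwaway file.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"hello mapped world")
os.close(fd)

update_through_mapping(demo_path, 0, b"HELLO")
with open(demo_path, "rb") as f:
    contents = f.read()
os.remove(demo_path)
```

Passing -1 instead of a file descriptor to mmap.mmap creates an anonymous mapping, which is the closest user-mode counterpart to a page-file backed view: its contents disappear once the mapping is gone.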

It’s important to mention that despite the same physical page being shared between multiple processes, each process’ view may be mapped at an entirely different virtual address range; therefore, you shouldn’t expect each view of the same section object to start and end at the same address. Furthermore, each view must be independent of the others, in that they aren’t able to overlap; thus, the offset and size passed to the MapViewOfFile function must correspond to a free memory region.

Let’s examine a section object:

0: kd> dt nt!_SECTION
   +0x000 SectionNode      : _RTL_BALANCED_NODE
   +0x018 StartingVpn      : Uint8B
   +0x020 EndingVpn        : Uint8B
   +0x028 u1               : <anonymous-tag>
   +0x030 SizeOfSection    : Uint8B
   +0x038 u                : <anonymous-tag>
   +0x03c InitialPageProtection : Pos 0, 12 Bits
   +0x03c SessionId        : Pos 12, 19 Bits
   +0x03c NoValidationNeeded : Pos 31, 1 Bit

The SectionNode, StartingVpn and EndingVpn describe the address range which this mapping corresponds to. The SizeOfSection is the size of the mapping in bytes and the InitialPageProtection simply describes what access is permitted with that mapping much like what was described with VADs. The union at offset 0x38 is a special structure called _MMSECTION_FLAGS which describes particular features about the mapping.

0: kd> dt _MMSECTION_FLAGS
nt!_MMSECTION_FLAGS
   +0x000 BeingDeleted     : Pos 0, 1 Bit
   +0x000 BeingCreated     : Pos 1, 1 Bit
   +0x000 BeingPurged      : Pos 2, 1 Bit
   +0x000 NoModifiedWriting : Pos 3, 1 Bit
   +0x000 FailAllIo        : Pos 4, 1 Bit
   +0x000 Image            : Pos 5, 1 Bit
   +0x000 Based            : Pos 6, 1 Bit
   +0x000 File             : Pos 7, 1 Bit
   +0x000 AttemptingDelete : Pos 8, 1 Bit
   +0x000 PrefetchCreated  : Pos 9, 1 Bit
   +0x000 PhysicalMemory   : Pos 10, 1 Bit
   +0x000 ImageControlAreaOnRemovableMedia : Pos 11, 1 Bit
   +0x000 Reserve          : Pos 12, 1 Bit
   +0x000 Commit           : Pos 13, 1 Bit
   +0x000 NoChange         : Pos 14, 1 Bit
   +0x000 WasPurged        : Pos 15, 1 Bit
   +0x000 UserReference    : Pos 16, 1 Bit
   +0x000 GlobalMemory     : Pos 17, 1 Bit
   +0x000 DeleteOnClose    : Pos 18, 1 Bit
   +0x000 FilePointerNull  : Pos 19, 1 Bit
   +0x000 PreferredNode    : Pos 20, 6 Bits
   +0x000 GlobalOnlyPerSession : Pos 26, 1 Bit
   +0x000 UserWritable     : Pos 27, 1 Bit
   +0x000 SystemVaAllocated : Pos 28, 1 Bit
   +0x000 PreferredFsCompressionBoundary : Pos 29, 1 Bit
   +0x000 UsingFileExtents : Pos 30, 1 Bit
   +0x000 PageSize64K      : Pos 31, 1 Bit

There are a few flags of interest here: Image, File and Based. The Based flag indicates whether the section object has a direct correspondence with the offsets of the file stored on the disk. To clarify, if I accessed the file through its section object, then I should expect to find the same data at offset n in memory as I would at offset n of the file on the physical disk. This rule is followed by file views which are backed by data files.

The first two flags indicate whether a file-backed mapping is for an ordinary data file or an executable image (SEC_IMAGE is set when calling CreateFileMapping). This distinction is important because it leads to the introduction of another structure which hasn’t been discussed yet: _SECTION_OBJECT_POINTERS. This structure is referenced from the _FILE_OBJECT structure (through its SectionObjectPointer field) and is used to indicate which type of control area is being used for the mapping.

0: kd> dt _SECTION_OBJECT_POINTERS
nt!_SECTION_OBJECT_POINTERS
   +0x000 DataSectionObject : Ptr64 Void
   +0x008 SharedCacheMap   : Ptr64 Void
   +0x010 ImageSectionObject : Ptr64 Void

The simplest method to find a set of control areas is to use the !ca debugger command, like so:

0: kd> !ca

Scanning large pool allocation table for tag 0x61436d4d (MmCa) (ffff808f91010000 : ffff808f91190000)

ffff808f8dd43000 0000000000000000 1182     0 Pagefile-backed section
ffff808fa55bc000 0000000000000000 0        0 Pagefile-backed section

Searching nonpaged pool (ffff808000000000 : ffff908000000000) for tag 0x61436d4d (MmCa)

[...]

Scanning large pool allocation table for tag 0x69436d4d (MmCi) (ffff808f91010000 : ffff808f91190000)

ffff808fb16f0d40 00007ffff1e00000 0        0 Image: \Windows\SystemApps\MicrosoftWindows.Client.CBS_cw5n1h2txyewy\InputApp.dll
ffff808fb3cebaf0 00007fffe21b0000 0        0 Image: \Riot Games\VALORANT\live\Engine\Binaries\ThirdParty\CEF3\Win64\libGLESv2.dll
ffff808fb0065b50 0000000072ca0000 0        0 Image: \Windows\SysWOW64\sppc.dll
ffff808facbd1200 fffff80358a40000 0        0 Image: \Windows\System32\drivers\rassstp.sys
ffff808facbd1580 fffff8034b950000 0        0 Image: \Windows\System32\drivers\ndproxy.sys
ffff808facbd1900 fffff80348f80000 0        0 Image: \Windows\System32\drivers\mrxsmb20.sys
ffff808fad4d7c10 00007ff821b90000 0        0 Image: \Windows\System32\dwmredir.dll
ffff808fa59990b0 00007ff817940000 0        0 Image: \Windows\System32\ProximityServicePal.dll
0: kd> !ca ffff808fb3cebaf0

ControlArea  @ ffff808fb3cebaf0
  Segment      ffffbe84dc30add0  Flink      ffff808fb9d79900  Blink        ffff808fabbebcd0
  Section Ref                 0  Pfn Ref                 122  Mapped Views                2
  User Ref                    2  WaitForDel                0  Flush Count              bd68
  File Object  ffff808fb3b874e0  ModWriteCount             0  System Views             1da7
  WritableRefs           40003d  PartitionId                0  
  Flags (a0) Image File 

      \Riot Games\VALORANT\live\Engine\Binaries\ThirdParty\CEF3\Win64\libGLESv2.dll

Segment @ ffffbe84dc30add0
  ControlArea       ffff808fb3cebaf0  BasedAddress  00007fffe21b0000
  Total Ptes                     3cb
  Segment Size                3cb000  Committed                    0
  Image Commit                     f  Image Info    ffffbe84dc30ae18
  ProtoPtes         ffffbe84db2df000
  Flags (8e0000) ProtectionMask 

Subsection 1 @ ffff808fb3cebb70
  ControlArea  ffff808fb3cebaf0  Starting Sector        0  Number Of Sectors    2
  Base Pte     ffffbe84db2df000  Ptes In Subsect        1  Unused Ptes          0
  Flags                       2  Sector Offset          0  Protection           1
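The Flags (a0) value reported for the control area above can be decoded by hand using the bit positions from the _MMSECTION_FLAGS output shown earlier. The sketch below does exactly that for the single-bit flags; the multi-bit PreferredNode field is left out, and the function name is only illustrative.

```python
# Single-bit flags and their positions, copied from the dt _MMSECTION_FLAGS output.
MMSECTION_FLAG_BITS = {
    0: "BeingDeleted", 1: "BeingCreated", 2: "BeingPurged",
    3: "NoModifiedWriting", 4: "FailAllIo", 5: "Image", 6: "Based",
    7: "File", 8: "AttemptingDelete", 9: "PrefetchCreated",
    10: "PhysicalMemory", 11: "ImageControlAreaOnRemovableMedia",
    12: "Reserve", 13: "Commit", 14: "NoChange", 15: "WasPurged",
    16: "UserReference", 17: "GlobalMemory", 18: "DeleteOnClose",
    19: "FilePointerNull", 26: "GlobalOnlyPerSession", 27: "UserWritable",
    28: "SystemVaAllocated", 29: "PreferredFsCompressionBoundary",
    30: "UsingFileExtents", 31: "PageSize64K",
}


def decode_section_flags(flags):
    """Return the names of the flags which are set, lowest bit first."""
    return [name for bit, name in sorted(MMSECTION_FLAG_BITS.items())
            if flags & (1 << bit)]
```

decode_section_flags(0xa0) yields Image and File, which agrees with what !ca printed next to the raw value.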

By examining the _CONTROL_AREA structure, we can see a few other important structures which we haven’t covered thus far. One of these is the _MAPPED_FILE_SEGMENT structure (page-file backed sections use _SEGMENT), which is allocated from paged pool and is used to keep track of the prototype PTEs which describe this mapping. These can be found as an array through the ProtoPtes field. So, what are prototype PTEs and why are they necessary?

Since multiple processes can be sharing the same page, it’s possible for the state of that page in the working set of each process to differ – we’ll cover the working set in the next post of this series – and this could cause a plethora of problems. For example, if a single PTE were to be used amongst multiple processes and the page were removed from one working set, then it would be removed from the working sets of all the processes which are using that page, thus invalidating the page in physical memory as well.

A prototype PTE is very much the same as a usual hardware PTE, with one key difference: a prototype PTE is not used directly by the hardware in address translation. Instead, it holds the shared, authoritative state of the page, and each process’ own PTE is either filled in from it or, while invalid, points to it through the ProtoAddress field shown below.

0: kd> dt _MMPTE_PROTOTYPE
nt!_MMPTE_PROTOTYPE
   +0x000 Valid            : Pos 0, 1 Bit
   +0x000 DemandFillProto  : Pos 1, 1 Bit
   +0x000 HiberVerifyConverted : Pos 2, 1 Bit
   +0x000 ReadOnly         : Pos 3, 1 Bit
   +0x000 SwizzleBit       : Pos 4, 1 Bit
   +0x000 Protection       : Pos 5, 5 Bits
   +0x000 Prototype        : Pos 10, 1 Bit
   +0x000 Combined         : Pos 11, 1 Bit
   +0x000 Unused1          : Pos 12, 4 Bits
   +0x000 ProtoAddress     : Pos 16, 48 Bits
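The sharing problem described above can be pictured with a toy model. This is a deliberately simplified sketch and not how the Memory Manager is implemented: the class names, the share count kept on the prototype entry, and the fault_in and trim helpers are all hypothetical. It assumes that each process keeps its own PTE while the shared state of the page lives in one prototype entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrototypePte:
    """Shared state for one page of a section (toy model)."""
    valid: bool = False
    pfn: Optional[int] = None
    share_count: int = 0  # in this model: number of process PTEs mapping the page


@dataclass
class ProcessPte:
    """One process's own PTE for its view of the shared page."""
    proto: PrototypePte          # plays the role of the ProtoAddress field
    valid: bool = False
    pfn: Optional[int] = None


def fault_in(pte: ProcessPte, free_pfn: int) -> None:
    """Resolve a fault by taking the page location from the prototype PTE."""
    proto = pte.proto
    if not proto.valid:          # the first toucher brings the page into memory
        proto.valid, proto.pfn = True, free_pfn
    pte.valid, pte.pfn = True, proto.pfn
    proto.share_count += 1


def trim(pte: ProcessPte) -> None:
    """Remove the page from one working set without touching the others."""
    pte.valid, pte.pfn = False, None
    pte.proto.share_count -= 1


shared = PrototypePte()
pte_a, pte_b = ProcessPte(shared), ProcessPte(shared)
fault_in(pte_a, free_pfn=0x1234)
fault_in(pte_b, free_pfn=0x5678)  # page is already resident, so 0x5678 is unused
trim(pte_a)
```

After the trim, only the PTE of the first process is invalid; the second process still maps the same physical page, and the prototype entry still records where that page is.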

You may have noticed that each control area consists of a number of _SUBSECTION structures. There is typically only one subsection for page-file backed sections and for file-backed sections which are data files. If the file were to grow beyond the initial subsection size, then additional subsections will be automatically allocated to that section object in order to accommodate this. On the other hand, with executable images, a subsection is created for each section of the PE format.

0: kd> dt _SUBSECTION
nt!_SUBSECTION
   +0x000 ControlArea      : Ptr64 _CONTROL_AREA
   +0x008 SubsectionBase   : Ptr64 _MMPTE
   +0x010 NextSubsection   : Ptr64 _SUBSECTION
   +0x018 GlobalPerSessionHead : _RTL_AVL_TREE
   +0x018 CreationWaitList : Ptr64 _MI_CONTROL_AREA_WAIT_BLOCK
   +0x018 SessionDriverProtos : Ptr64 _MI_PER_SESSION_PROTOS
   +0x020 u                : <anonymous-tag>
   +0x024 StartingSector   : Uint4B
   +0x028 NumberOfFullSectors : Uint4B
   +0x02c PtesInSubsection : Uint4B
   +0x030 u1               : <anonymous-tag>
   +0x034 UnusedPtes       : Pos 0, 30 Bits
   +0x034 ExtentQueryNeeded : Pos 30, 1 Bit
   +0x034 DirtyPages       : Pos 31, 1 Bit

The ControlArea field merely points back to the control area which this subsection belongs to; the NextSubsection field points to the next sibling subsection and the SubsectionBase field points to the first prototype PTE for the subsection, which is technically an _MMPTE_SUBSECTION, although there is no real difference between that and _MMPTE_PROTOTYPE. The DirtyPages flag indicates whether the subsection contains any data which needs to be discarded or flushed back to the disk.

Now that we’ve covered what subsections are, you may have noticed that the VAD structure contains a pointer to a subsection structure too. The Subsection field in _MMVAD is the first subsection which the VAD range corresponds to, if the VAD is being used for a memory-mapped file.

In the next post of this series, we’ll start looking at working sets and page states.

References:

https://flylib.com/books/en/3.169.1.64/1

https://www.codemachine.com/articles/prototype_ptes.html

https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/section-objects-and-views

What Makes It Page? The Windows 7 (x64) Virtual Memory Manager


Debugging Stop 0x157 – KERNEL_THREAD_PRIORITY_FLOOR_VIOLATION

KERNEL_THREAD_PRIORITY_FLOOR_VIOLATION (157)
An illegal operation was attempted on the priority floor of a particular
thread.
Arguments:
Arg1: ffffdd87dcb020c0, The address of the thread
Arg2: 000000000000000b, The target priority value (11)
Arg3: 0000000000000002, The priority counter for the target priority underflowed
Arg4: 0000000000000000, Reserved

There isn’t particularly much to be said about this bugcheck other than that it is related to the scheduler not handling the priority level of a given thread properly. There seem to be three different conditions under which it can be triggered: the priority floor count underflows, the priority floor count overflows, or the priority floor count is some invalid value.

I haven’t been able to find much information on what the PriorityFloorCounts and PriorityFloorSummary fields exactly correspond to; however, it is worth noting that the PriorityFloorCounts array appears to be the exact same length as the number of priority levels available, which is 32 (0 to 31).

1: kd> dt _KTHREAD -y Priority ffffdd87dcb020c0
nt!_KTHREAD
   +0x0c3 Priority : 8 ''
   +0x234 PriorityDecrement : 0 ''
   +0x338 PriorityFloorCounts : [32]  ""
   +0x358 PriorityFloorSummary : 0

If we examine the call stack, we can see that at one point the thread’s priority was boosted using AutoBoost, likely due to the acquisition of a lock during a registry operation; this boost was then being removed in order to allow the thread to return to its normal priority level: the base priority. It’s important to note here that a thread’s priority isn’t allowed to drop below the base priority level, which is a combination of the process priority class and the thread priority level.

1: kd> !dpx
Start memory scan  : 0xffff808b31f96ad8 ($csp)
End memory scan    : 0xffff808b31f98000 (Kernel Stack Base)

               rsp : 0xffff808b31f96ad8 : 0xfffff807198bc0e2 : nt!KiAbThreadUnboostCpuPriority+0x99a
0xffff808b31f96ad8 : 0xfffff807198bc0e2 : nt!KiAbThreadUnboostCpuPriority+0x99a
0xffff808b31f96b18 : 0xfffff8071982dc70 : nt!SepAccessCheck+0x330
0xffff808b31f96bf8 : 0xfffff807199251ea : nt!SepMandatoryIntegrityCheck+0x55a
0xffff808b31f96c98 : 0xfffff8071982c9e8 : nt!SeAccessCheckWithHint+0x688
0xffff808b31f96d38 : 0xfffff8071992f51f : nt!ExpAllocatePoolWithTagFromNode+0x5f
0xffff808b31f96e58 : 0xfffff807198bb701 : nt!KiAbThreadRemoveBoostsSlow+0x39
0xffff808b31f96e88 : 0xfffff8071982c0f3 : nt!KeAbPostRelease+0x1d3
0xffff808b31f96ec8 : 0xfffff8071a0f35ac : nt!CmpUnlockKcb+0x5c
0xffff808b31f972f8 : 0xfffff8071982aa00 : nt!CmpIsRegistryLockAcquired+0x40
0xffff808b31f97348 : 0xfffff80719cc4c25 : nt!CmpParseKey+0x2e5
0xffff808b31f97528 : 0xfffff80719cc4940 : nt!CmpParseKey
0xffff808b31f97540 : 0xfffff80719cc4901 : nt!ExpLookupHandleTableEntry+0x11
0xffff808b31f975e8 : 0xfffff8071992e889 : nt!RtlpHpFreeHeap+0x159
0xffff808b31f97618 : 0xfffff80719ccd0b3 : nt!ObpCaptureObjectCreateInformation+0x1e3
0xffff808b31f976c8 : 0xfffff80719ccb192 : nt!ObOpenObjectByNameEx+0x1f2
0xffff808b31f97758 : 0xfffff8071a2143e0 : nt!CallbackListHead
0xffff808b31f977f8 : 0xfffff80719cc0c94 : nt!CmOpenKey+0x2c4
0xffff808b31f97a48 : 0xfffff80719d18d18 : nt!NtOpenKeyEx+0x48
0xffff808b31f97a98 : 0xfffff80719a2d938 : nt!KiSystemServiceCopyEnd+0x28
0xffff808b31f97aa0 : 0xffffdd87dcb020c0 :  Trap @ ffff808b31f97aa0
1: kd> dt _EPROCESS -y Priority ffffdd87`d5fe20c0
nt!_EPROCESS
   +0x5b7 PriorityClass : 0x2 ''

As we can see, the process has a NORMAL_PRIORITY_CLASS set and the thread has a priority level of 8, therefore we can conclude that the base priority level is 8, as also shown here:

1: kd> dt _KTHREAD -y Base ffffdd87dcb020c0
nt!_KTHREAD
   +0x233 BasePriority : 8 ''
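The way the base priority is derived from the priority class and the thread priority level can be sketched using the table from the scheduling priorities documentation listed in the references. The function and dictionary names are only illustrative, and the mapping of the kernel’s PriorityClass values (where 2 means normal) is included on the assumption that it matches the PROCESS_PRIORITY_CLASS_* definitions.

```python
# Kernel PriorityClass values as stored in _EPROCESS (assumed to follow the
# PROCESS_PRIORITY_CLASS_* definitions).
KERNEL_PRIORITY_CLASS = {
    1: "IDLE", 2: "NORMAL", 3: "HIGH", 4: "REALTIME",
    5: "BELOW_NORMAL", 6: "ABOVE_NORMAL",
}

# Base priority of a THREAD_PRIORITY_NORMAL thread in each priority class.
CLASS_BASE = {
    "IDLE": 4, "BELOW_NORMAL": 6, "NORMAL": 8,
    "ABOVE_NORMAL": 10, "HIGH": 13, "REALTIME": 24,
}

# Relative thread priority levels applied on top of the class base.
THREAD_OFFSET = {
    "LOWEST": -2, "BELOW_NORMAL": -1, "NORMAL": 0,
    "ABOVE_NORMAL": 1, "HIGHEST": 2,
}


def base_priority(priority_class, thread_priority="NORMAL"):
    """Combine a priority class and a thread priority level into a base priority."""
    realtime = priority_class == "REALTIME"
    if thread_priority == "IDLE":            # saturates at the bottom of the range
        return 16 if realtime else 1
    if thread_priority == "TIME_CRITICAL":   # saturates at the top of the range
        return 31 if realtime else 15
    return CLASS_BASE[priority_class] + THREAD_OFFSET[thread_priority]
```

With the PriorityClass of 0x2 from the _EPROCESS output, base_priority(KERNEL_PRIORITY_CLASS[2]) evaluates to 8, which agrees with the BasePriority field of the thread.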

In addition to the aforementioned fields, there are two other priority floor fields in the _KTHREAD structure which appear to be undocumented:

1: kd> dt _KTHREAD -y SchedulerAssistPriorityFloor ffffdd87dcb020c0
nt!_KTHREAD
   +0x400 SchedulerAssistPriorityFloor : 0n0

1: kd> dt _KTHREAD -y RealtimePriorityFloor ffffdd87dcb020c0
nt!_KTHREAD
   +0x404 RealtimePriorityFloor : 0n32

In conclusion, from the limited findings, I can only suggest that an internal counter was somehow decremented too many times while the priority boost managed by AutoBoost was being removed. I doubt that this is a driver issue; it seems more likely to be either some form of hardware malfunction or a bug in the AutoBoost library.

References:

https://learn.microsoft.com/en-us/windows/win32/procthread/priority-inversion

https://learn.microsoft.com/en-us/windows/win32/procthread/priority-boosts

https://learn.microsoft.com/en-gb/windows/win32/procthread/scheduling-priorities


Debugging Stop 0xC9 – DRIVER_VERIFIER_IOMANAGER_VIOLATION Part 2

DRIVER_VERIFIER_IOMANAGER_VIOLATION (c9)
The IO manager has caught a misbehaving driver.
Arguments:
Arg1: 0000000000000007, IRP passed to IoCompleteRequest still has cancel routine set
Arg2: fffff801007923a0, the cancel routine pointer
Arg3: ffffad07eff109a0, the IRP
Arg4: 0000000000000000

This bugcheck is a simple one to solve – all Driver Verifier bugchecks are – and simply means that a driver has attempted to complete an IRP which still has a cancellation routine set. The bugcheck will be thrown if I/O Verification is enabled in Driver Verifier and a driver is found to do the following:

Passing of an IRP to IoCompleteRequest that contains invalid status or that still has a cancel routine set

You can find the cancellation routine by either passing the second bugcheck parameter to the ln command or by examining the CancelRoutine field of the _IRP structure as shown below:

0: kd> dt nt!_IRP -y CancelRoutine ffffad07eff109a0
   +0x068 CancelRoutine : 0xfffff801`007923a0     void  xusb22!WaitAndUnfilteredRequestCancelRoutine+0

This CancelRoutine field is set by using the IoSetCancelRoutine function. However, before an IRP is completed, the field must be cleared by setting it to NULL, which indicates that the IRP is no longer cancellable. Failure to do so can lead to race conditions and other fatal errors which can be difficult to debug. Moreover, as already shown, Driver Verifier will “fail” your driver if it does this while being verified.

0: kd> knL
 # Child-SP          RetAddr               Call Site
00 fffff802`6da0fc28 fffff802`6b8dd3d1     nt!KeBugCheckEx
01 fffff802`6da0fc30 fffff802`6b8d154c     nt!VerifierBugCheckIfAppropriate+0x14d
02 fffff802`6da0fcd0 fffff802`6b261eb5     nt!IovCompleteRequest+0xc0 (3)
03 fffff802`6da0fdc0 fffff801`00792697     nt!IofCompleteRequest+0x11cb65 (2)
04 fffff802`6da0fdf0 fffff801`0078fa0e     xusb22!XInputControllerDevice::cleanupWaitingRequests+0xcf (1)
05 fffff802`6da0fe30 fffff801`00786848     xusb22!XInputControllerDevice::~XInputControllerDevice+0x96
06 fffff802`6da0fe70 fffff801`007872ed     xusb22!GamepadInformation::GamepadDisconnected+0x64
07 fffff802`6da0fea0 fffff801`00787135     xusb22!GamepadInformation::ProcessStatusReport+0x85
08 fffff802`6da0ff30 fffff801`00786424     xusb22!GamepadInformation::ProcessReport+0x6d
09 fffff802`6da0ff70 fffff802`6c21b28c     xusb22!`anonymous namespace'::BusReadComplete+0x44
0a fffff802`6da0ffa0 fffff802`6c1c6e26     Wdf01000!FxUsbPipeContinuousReader::_FxUsbPipeRequestComplete+0x5c
0b fffff802`6da10000 fffff802`6c1c6b5a     Wdf01000!FxRequestBase::CompleteSubmitted+0xba
0c (Inline Function) --------`--------     Wdf01000!FxIoTarget::CompleteRequest+0x8
0d fffff802`6da10040 fffff802`6c1c7285     Wdf01000!FxIoTarget::RequestCompletionRoutine+0xba
0e fffff802`6da100a0 fffff802`6b0bbad6     Wdf01000!FxIoTarget::_RequestCompletionRoutine+0x35
[...]

If we examine the call stack, we can see that the xusb22.sys driver (the Xbox 360 Controller driver) calls cleanupWaitingRequests (1), which subsequently leads to IofCompleteRequest being called (2). Since Driver Verifier is running with I/O Verification enabled, IovCompleteRequest is then called (3), which performs the check and throws the Stop 0xC9 bugcheck.

0: kd> ub fffff801`00792697
xusb22!XInputControllerDevice::cleanupWaitingRequests+0xac:
fffff801`00792674 eb26            jmp     xusb22!XInputControllerDevice::cleanupWaitingRequests+0xd4 (fffff801`0079269c)
fffff801`00792676 488b4718        mov     rax,qword ptr [rdi+18h]
fffff801`0079267a 33d2            xor     edx,edx
fffff801`0079267c c74030b60200c0  mov     dword ptr [rax+30h],0C00002B6h
fffff801`00792683 488b4718        mov     rax,qword ptr [rdi+18h]
fffff801`00792687 4883603800      and     qword ptr [rax+38h],0
fffff801`0079268c 488b4f18        mov     rcx,qword ptr [rdi+18h]
fffff801`00792690 48ff15d9c90000  call    qword ptr [xusb22!_imp_IofCompleteRequest (fffff801`0079f070)]

The IRP itself was related to an IOCTL request. If we inspect some of the disassembly for the responsible function, we can see the IO_STATUS_BLOCK structure (which is embedded in the IRP as the IoStatus field at offset 0x30) being set immediately before the call. The status code suggests that the controller was removed at the time of the bugcheck.

0: kd> dt nt!_IO_STATUS_BLOCK ffffad07eff109a0+0x30
   +0x000 Status           : 0n-1073741130
   +0x000 Pointer          : 0x00000000`c00002b6 Void
   +0x008 Information      : 0
0: kd> !error c00002b6
Error code: (NTSTATUS) 0xc00002b6 (3221226166) - The device has been removed.
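The Status field is displayed as a signed decimal (0n-1073741130), whereas !error expects the familiar hexadecimal form. The conversion between the two is a simple mask to 32 bits, sketched below; the function names are only illustrative. The top two bits of an NTSTATUS value encode the severity.

```python
def ntstatus_to_unsigned(status):
    """Reinterpret a signed 32-bit NTSTATUS as its unsigned value."""
    return status & 0xFFFFFFFF


def ntstatus_severity(status):
    """Return the severity encoded in the top two bits of an NTSTATUS."""
    names = ["SUCCESS", "INFORMATIONAL", "WARNING", "ERROR"]
    return names[ntstatus_to_unsigned(status) >> 30]


status = -1073741130                      # value from the Status field above
as_hex = "%08x" % ntstatus_to_unsigned(status)
```

Here as_hex is c00002b6, the same value that was passed to !error, and the severity is ERROR.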

References:

https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/introduction-to-cancel-routines

https://learn.microsoft.com/en-us/windows-hardware/drivers/devtest/i-o-verification


Debugging Stop 0x76 – PROCESS_HAS_LOCKED_PAGES

PROCESS_HAS_LOCKED_PAGES (76)
Caused by a driver not cleaning up correctly after an I/O.
Arguments:
Arg1: 0000000000000000, Locked memory pages found in process being terminated.
Arg2: fffffa800b1a4060, Process address.
Arg3: 0000000000000004, Number of locked pages.
Arg4: 0000000000000000, Pointer to driver stacks (if enabled) or 0 if not.
	Issue a !search over all of physical memory for the current process pointer.
	This will yield at least one MDL which points to it.  Then do another !search
	for each MDL found, this will yield the IRP(s) that point to it, revealing
	which driver is leaking the pages.
	Otherwise, set HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory
	Management\TrackLockedPages to a DWORD 1 value and reboot.  Then the system
	will save stack traces so the guilty driver can be easily identified.
	When you enable this flag, if the driver commits the error again you will
	see a different BugCheck - DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS (0xCB) -
	which can identify the offending driver(s).

The documentation does actually point out what you should do with this bugcheck, which is to enable locked page tracking from the registry. This should hopefully lead to a Stop 0xCB bugcheck being produced or at least allow you to use the !lockedpages debugger extension with the _EPROCESS address. You can enable locked page tracking using the following from a command prompt:

reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v TrackLockedPages /t REG_DWORD /d 0x1 /f

The system will need to be rebooted for the change to take effect. The call stacks of these bugchecks all seem to be near identical, with a process being terminated while locked pages are still being held:

4: kd> knL
 # Child-SP          RetAddr               Call Site
00 fffff880`03354b78 fffff800`02b1ac25     nt!KeBugCheckEx
01 fffff880`03354b80 fffff800`02b35d27     nt! ?? ::NNGAKEGL::`string'+0x18126
02 fffff880`03354bc0 fffff800`02b616b8     nt!PspProcessDelete+0x177
03 fffff880`03354c20 fffff800`02b33ddf     nt!ObpRemoveObjectRoutine+0x8c
04 fffff880`03354c80 fffff800`02885961     nt!ObpProcessRemoveObjectQueue+0x37
05 fffff880`03354cb0 fffff800`02b1cc06     nt!ExpWorkerThread+0x111
06 fffff880`03354d40 fffff800`02856c26     nt!PspSystemThreadStartup+0x5a
07 fffff880`03354d80 00000000`00000000     nt!KiStartSystemThread+0x16

Each MDL is associated with a process through a pointer to the _EPROCESS structure, as shown below:

4: kd> dt _MDL
nt!_MDL
   +0x000 Next             : Ptr64 _MDL
   +0x008 Size             : Int2B
   +0x00a MdlFlags         : Int2B
   +0x010 Process          : Ptr64 _EPROCESS
   +0x018 MappedSystemVa   : Ptr64 Void
   +0x020 StartVa          : Ptr64 Void
   +0x028 ByteCount        : Uint4B
   +0x02c ByteOffset       : Uint4B

Unfortunately, you can’t do much else with this bugcheck other than enable locked page tracking and hope that the system crashes again. You can find the process which has been terminated, but it is usually of little use unless you know that the process has its own set of drivers.

4: kd> dt _EPROCESS fffffa800b1a4060 -y ImageFileName
nt!_EPROCESS
   +0x2e0 ImageFileName : [15]  "BFBC2Game.exe"

The process in the above example belonged to Battlefield: Bad Company 2.

References:

Using MDLs – Windows drivers | Microsoft Learn

Bug Check 0x76 PROCESS_HAS_LOCKED_PAGES – Windows drivers | Microsoft Learn


Windows Address Translation Deep Dive – Part 2

In the first part of this post series, we looked at how segmentation works and how a virtual address (linear address) is constructed. In this part, we will be exploring how our linear address is translated by the memory management unit (MMU) to a physical address and the structures which Windows uses to manage this process.

The linear address on x86 systems is divided into three distinct parts: the page directory index, the page table index and the byte offset. This gives a two-level paging structure consisting of the page directory and the page tables. Each page on an x86 system is either 4KB or a large page of 4MB; large pages are only available if the page size extension (PSE) bit is set in the CR4 register. The only difference between the two is that a large page address does not have a page table index and its offset is extended from 12 bits to 22 bits.

0: kd> .formats cr4
Evaluate expression:
Hex: 00000000`00370e78
Decimal: 3608184
Octal: 0000000000000015607170
Binary: 00000000 00000000 00000000 00000000 00000000 00110111 00001110 01111000

The same applies to x86-x64 systems, but since physical address extension (PAE) is enabled by default in order to increase the addressable amount of memory, the linear address is extended, although it is usually limited to 48 bits to make paging simpler to manage. This provides a paging hierarchy of four levels.

On an x86 system, the page directory index is 10 bits, the page table index is 10 bits and the byte offset is 12 bits. The page directory index, as the name implies, is an index into the page directory, which is an array of page directory entries (PDEs). The page directory differs between processes, but the physical address of the current one can always be found in the CR3 register. It can also be found in the _KPROCESS structure.

0: kd> r @cr3
cr3=000000019e5db002
0: kd> dt _KPROCESS -y Directory ffff8381083dc0c0
nt!_KPROCESS
+0x028 DirectoryTableBase : 0x00000001`9e5db002

On x86 systems, the page directory consists of 1024 page directory entries, all of which are 32 bits (4 bytes) in length. Each page directory entry will point to a page table which in turn consists of 1024 4-byte entries known as page table entries (PTEs). This PTE is very similar to a PDE but instead of referring to a page table, will point to the base physical address of the associated page; the page offset is then applied to this base address to find the physical address which the PTE corresponds to.
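The 10/10/12 split described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are my own, and the table base addresses passed to the second function are hypothetical values rather than anything read from a real system.

```python
def split_x86_linear_address(va, large_page=False):
    """Split a 32-bit (non-PAE) linear address into its paging fields."""
    va &= 0xFFFFFFFF
    pd_index = (va >> 22) & 0x3FF              # top 10 bits: page directory index
    if large_page:
        # 4MB page: no page table index, the offset widens to 22 bits
        return pd_index, None, va & 0x3FFFFF
    pt_index = (va >> 12) & 0x3FF              # middle 10 bits: page table index
    return pd_index, pt_index, va & 0xFFF      # low 12 bits: byte offset


def x86_entry_addresses(page_directory_base, page_table_base, va):
    """Physical addresses of the PDE and PTE, given (hypothetical) table bases."""
    pd_index, pt_index, _ = split_x86_linear_address(va)
    # Each entry is 4 bytes, so the index is scaled by 4
    return page_directory_base + pd_index * 4, page_table_base + pt_index * 4


pd_index, pt_index, offset = split_x86_linear_address(0x7FFE0300)
print(pd_index, pt_index, hex(offset))
```

In a real walk the page table base would come from the PDE itself; it is passed in separately here only to keep the sketch short.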

However, you may be wondering how to determine whether paging has been enabled on a system and which paging mode is in use. The answer lies with two control registers: the paging (PG) bit, which is bit 31 of the CR0 register, and the physical address extension (PAE) bit, which is bit 5 of the CR4 register.

0: kd> .formats cr0
Evaluate expression:
Hex: 00000000`80050033
Decimal: 2147811379
Octal: 0000000000020001200063
Binary: 00000000 00000000 00000000 00000000 10000000 00000101 00000000 00110011

0: kd> .formats cr4
Evaluate expression:
Hex: 00000000`00370e78
Decimal: 3608184
Octal: 0000000000000015607170
Binary: 00000000 00000000 00000000 00000000 00000000 00110111 00001110 01111000

In addition to the aforementioned control bits, it is worth noting bit 0 of the CR0 register, the protection enable (PE) bit, as this indicates whether the system is operating in real mode or protected mode, and of course, the system must be operating in protected mode in order for paging to be enabled.
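Rather than counting binary digits by eye, the bits can be tested directly. The sketch below decodes the two register values from the .formats output above; the bit positions are the architectural ones (PE = bit 0, WP = bit 16 and PG = bit 31 of CR0; PSE = bit 4 and PAE = bit 5 of CR4), while the helper name is my own.

```python
CR0_BITS = {"PE": 0, "WP": 16, "PG": 31}
CR4_BITS = {"PSE": 4, "PAE": 5}


def decode_bits(value, layout):
    """Return {flag name: True/False} for each named bit position."""
    return {name: bool((value >> bit) & 1) for name, bit in layout.items()}


cr0 = decode_bits(0x80050033, CR0_BITS)   # value from ".formats cr0" above
cr4 = decode_bits(0x00370E78, CR4_BITS)   # value from ".formats cr4" above
print(cr0)
print(cr4)
```

For these particular values every listed flag comes back set: the processor is in protected mode with paging, write protection, PSE and PAE all enabled.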

Given we’ve had a quick overview of how the x86 paging hierarchy is structured, let’s turn our attention to x86-x64 paging, which is largely the same; afterwards we’ll take a closer look at the PDE and PTE structure, since there is very little difference between them on x86 and x86-x64 systems. The linear address, as you may expect, is now extended to 64 bits. However, there is an important caveat here which you will very likely come across when troubleshooting: despite the linear address being 64 bits in length, only 48 bits of it are actually used for translation! This restriction means that all addresses on x86-x64 systems must be canonical: the upper 16 bits (bits 48 to 63) must be either all 0s or all 1s. The 48th bit (bit 47) determines what the upper bits must be set to: if it is 1 then they must all be 1s, and if it is 0 then they must all be 0s.
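The canonical check is easy to express in code. A minimal sketch, assuming 48-bit linear addresses as described above (the function name and the va_bits parameter are my own):

```python
def is_canonical(va, va_bits=48):
    """True if bits 63 down to (va_bits - 1) are all 0s or all 1s."""
    va &= 0xFFFFFFFFFFFFFFFF
    upper = va >> (va_bits - 1)                  # bit 47 plus the 16 bits above it
    all_ones = (1 << (64 - va_bits + 1)) - 1     # 17 set bits when va_bits is 48
    return upper == 0 or upper == all_ones


print(is_canonical(0xFFFFE68B04C1B6B0))   # a kernel address with bit 47 set
print(is_canonical(0x0000800000000000))   # bit 47 set but the upper bits clear
```

An address which fails this check will raise a general protection fault when it is dereferenced, which is why non-canonical pointers are a common sight in crash dumps involving corrupted memory.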

The linear address is then broken into the following parts: a 16-bit sign extension; a 9-bit page-map level-4 index (PML4); a 9-bit page directory pointer index (PDPT); a 9-bit page directory index (PDE); a 9-bit page table index (PTE) and a 12-bit byte offset. This gives a paging hierarchy like below:

You may have now noticed that the CR3 register no longer references the physical address of a page directory but rather the physical address of the PML4 table; specifically, bits 12 to 51 of the CR3 register are used. Each paging table now stores 512 entries, with each entry extended to 64 bits (8 bytes), although only 40 of those bits are actually used to locate the next table. As an aside, the DirectoryTableBase field of the _KPROCESS structure now refers to the PML4 table rather than the page directory.
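The 9/9/9/9/12 breakdown of a 64-bit linear address can be sketched as follows. The function name and dictionary keys are my own; the address is the one passed to !pte elsewhere in this post.

```python
def split_x64_linear_address(va):
    """Split a 48-bit canonical linear address into its four indices and offset."""
    return {
        "pml4_index": (va >> 39) & 0x1FF,   # bits 39-47
        "pdpt_index": (va >> 30) & 0x1FF,   # bits 30-38
        "pd_index": (va >> 21) & 0x1FF,     # bits 21-29
        "pt_index": (va >> 12) & 0x1FF,     # bits 12-20
        "offset": va & 0xFFF,               # bits 0-11
    }


fields = split_x64_linear_address(0xFFFFE68B04C1B6B0)
print(fields)
```

Multiplying each index by 8 gives the byte offset of the entry within its table. For this address those offsets are 0xE68, 0x160, 0x130 and 0x0D8, which line up with the low 12 bits of the PXE, PPE, PDE and PTE addresses that !pte reports for the same address.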

The PML4 table consists of 512 PML4 entries. The PML4 index from the linear address is multiplied by 8 bytes (remember each PML4E is 8 bytes in length) and then added to the base address found in the CR3 register. Each PML4E covers 512GB of the linear address space. Similarly, bits 12-51 of the PML4E contain the physical address of a PDPT, and we use the same process as before to find the PDPTE: take the PDPT index from the linear address and multiply it by 8 bytes, which when added to the physical address of the PDPT locates the corresponding PDPTE. Each PDPTE covers 1GB.

To find the address of the page directory entry, we take the PDE index, multiply it by 8 bytes as we did on the previous two occasions, and then add it to the physical address held in bits 12-51 of the PDPTE from the previous step. This is assuming that the PS flag (bit 7) of the PDPTE has been set to 0. If the flag were to be set to 1, then we would combine bits 30-51 of the PDPTE with bits 0-29 of the linear address in order to find the physical address; this only applies to 1GB pages. In the more common case, the PS flag of the PDPTE will be set to 0, and the PDE will be used to find the corresponding PTE using a similar process. If the PS flag of the PDE were to be set to 1, then the PDE would point to a large page (2MB) directly.
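Putting those steps together, a software version of the table walk might look like the sketch below. It is a simplification under stated assumptions: physical memory is modelled as a plain dictionary, the PML4 base of 0x1AD000 is a made-up value, and the four entry values are borrowed from the !pte output in this post purely to have realistic numbers to work with.

```python
ADDR_MASK = 0x000FFFFFFFFFF000   # bits 12-51 of a paging-structure entry
PRESENT = 1 << 0
PAGE_SIZE = 1 << 7               # PS flag, meaningful in PDPTEs and PDEs


def translate(read_qword, pml4_base, va):
    """Walk PML4 -> PDPT -> PD -> PT and return the physical address."""
    table = pml4_base & ADDR_MASK
    for shift in (39, 30, 21, 12):
        index = (va >> shift) & 0x1FF
        entry = read_qword(table + index * 8)     # each entry is 8 bytes
        if not entry & PRESENT:
            raise LookupError(f"entry at level shift {shift} is not present")
        if shift in (30, 21) and entry & PAGE_SIZE:
            # 1GB (shift 30) or 2MB (shift 21) page: the rest of the
            # linear address becomes the offset into the large page
            offset_mask = (1 << shift) - 1
            return (entry & ADDR_MASK & ~offset_mask) | (va & offset_mask)
        table = entry & ADDR_MASK
    return table | (va & 0xFFF)


# Hypothetical physical memory holding one entry per level
memory = {
    0x1AD000 + 461 * 8: 0x0A000008BC060863,      # PML4E
    0x8BC060000 + 44 * 8: 0x0A000002A547D863,    # PDPTE
    0x2A547D000 + 38 * 8: 0x0A000005A66D2863,    # PDE
    0x5A66D2000 + 27 * 8: 0x810000047EFB3863,    # PTE
}
print(hex(translate(memory.__getitem__, 0x1AD000, 0xFFFFE68B04C1B6B0)))
```

The result is simply the page frame from the final PTE with the 12-bit byte offset from the linear address appended to it.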

Fortunately, we don’t have to calculate each paging structure entry by hand since the !pte command conveniently handles this for us.

0: kd> !pte ffffe68b04c1b6b0 
                                           VA ffffe68b04c1b6b0
PXE at FFFFFA7D3E9F4E68    PPE at FFFFFA7D3E9CD160    PDE at FFFFFA7D39A2C130    PTE at FFFFFA73458260D8
contains 0A000008BC060863  contains 0A000002A547D863  contains 0A000005A66D2863  contains 810000047EFB3863
pfn 8bc060    ---DA--KWEV  pfn 2a547d    ---DA--KWEV  pfn 5a66d2    ---DA--KWEV  pfn 47efb3    ---DA--KW-V

Each dash corresponds to a control bit which is clear, and the flags shown are decoded from the value in the contains field. PXE refers to the PML4 entry for that linear address and PPE refers to the PDPT entry. I’m not sure why the developers of WinDbg decided to use a different naming convention here.

The control bits on a hardware level are the following:

  1. The Present (P) bit determines whether the page is resident in physical memory. If not, a page fault is raised and the operating system’s page fault handler will attempt to “page” the contents into RAM.
  2. The Read/Write (R/W) bit determines if the page can be written to or not. If this bit is cleared then the page is read-only. The Write Protect (WP) bit (bit 16) of the CR0 register determines whether this restriction also applies to kernel-mode accesses; otherwise it is only enforced for user-mode accesses.
  3. The User/Supervisor (U/S) bit signifies whether the corresponding pages are accessible from user-mode. If this bit is set, then the page is accessible by both user-mode and kernel-mode, otherwise it is kernel-mode only. An important caveat here is that the bit is evaluated at every level of the hierarchy: if it is clear in a PDE, then all the pages under that PDE are kernel-mode only, regardless of how it is set in the individual PTEs.
  4. The Page Write-Through (PWT) bit controls the caching mode of the page and whether it uses write-through or write-back caching. If this bit is set then write-through caching is used.
  5. The Page Cache Disable (PCD) bit, as the name implies, is used to enable or disable caching of the page. If this bit is set then caching is disabled.
  6. The Accessed (A) bit does not necessarily have a purpose other than to show whether a page has been accessed as part of a page table walk.
  7. The Dirty (D) bit indicates a page which has been written to; this is quite an important control bit for most operating systems since a dirty page usually has to be written back to its backing store before the physical page can be reused.
  8. The Page Size (PS) bit is used with PDPTEs and PDEs to determine the size of the page, which is either a huge page (1GB) or a large page (2MB, or 4MB on x86 systems without PAE). On x86 systems this bit is only honoured if the page size extension (PSE) bit has been set in the CR4 register. The PSE bit itself is ignored on x86-x64 systems operating in long mode, where the PS bit is always honoured.
  9. The Global (G) bit is used to determine whether the TLB entry for the page will be flushed when the CR3 register is reloaded. This control bit is only applicable when the page global enable (PGE) bit has been set in the CR4 register. It is used for translations which are shared by every address space, so that they survive a switch between processes.
  10. The Page Attribute Table (PAT) bit is used in conjunction with the PCD and PWT bits to determine the caching type. The PAT is used to establish per-page caching behaviour which can be used to make caching very efficient for particular scenarios.
  11. The Execute Disable (XD) bit is only applicable when the processor is using PAE paging or operating in long mode, and determines whether instructions are able to execute within the specified page. This is only supported if the NXE bit is set in the EFER register.
  12. The Protection Key (PK) is 4 bits in length and used to set access rights for a set of pages. There is a separate set of access rights for kernel-mode and user-mode, and therefore the PKE bit and PKS bit in the CR4 register are evaluated to determine this distinction. If the PKE bit is set, then the PKRU register is used in conjunction with the PK to evaluate the accessibility of the page for user-mode. On the other hand, if the PKS bit is set, then the PKRS register is similarly used with the PK to check the access rights in the context of kernel-mode. This field is only applicable when the processor is running in long mode.
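Most of the single-bit flags above can be pulled straight out of the 64-bit entry values which !pte prints. A minimal sketch follows; the flag names and the function are my own, and bit 7 is deliberately left out because it means PS in a PDPTE or PDE but PAT in a PTE.

```python
FLAG_BITS = {
    "present": 0,
    "writable": 1,
    "user": 2,
    "write_through": 3,
    "cache_disable": 4,
    "accessed": 5,
    "dirty": 6,
    "global": 8,
    "no_execute": 63,
}
PFN_MASK = 0x000FFFFFFFFFF000   # bits 12-51 hold the page frame address


def decode_entry(entry):
    """Decode the common control bits and page frame number of an entry."""
    decoded = {name: bool((entry >> bit) & 1) for name, bit in FLAG_BITS.items()}
    decoded["pfn"] = (entry & PFN_MASK) >> 12
    return decoded


pte = decode_entry(0x810000047EFB3863)   # the PTE value from the !pte output
pde = decode_entry(0x0A000005A66D2863)   # the PDE value from the !pte output
print(hex(pte["pfn"]), pte["no_execute"], pde["no_execute"])
```

This agrees with the debugger’s own decoding: the PTE is shown as ---DA--KW-V (dirty, accessed, kernel-only, writable, not executable, valid) whereas the PDE additionally carries the E flag because its XD bit is clear.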

An overview of the aforementioned structures can be found below:

References:

https://connormcgarr.github.io/paging

https://wiki.osdev.org/Paging

https://blog.xenoscr.net/2021/09/06/Exploring-Virtual-Memory-and-Page-Structures.html

Posted in Computer Science, Windows Internals

Debugging Stop 0x7C – BUGCODE_NDIS_DRIVER

BUGCODE_NDIS_DRIVER (7c)
The operating system detected an error in a networking driver.
The BUGCODE_NDIS_DRIVER BugCheck identifies problems in network drivers.
Often, the defect is caused by a NDIS miniport driver. You can get a complete
list of NDIS miniport drivers using !ndiskd.netadapter.  You can get a
big-picture overview of the network stack with !ndiskd.netreport.
Arguments:
Arg1: 0000000000000014, NDIS_BUGCHECK_WAIT_EVENT_HIGH_IRQL
	A network driver called NdisWaitEvent at an illegal
	IRQL.
Arg2: 0000000000000002, The actual IRQL
Arg3: 0000000000000000, Zero.
Arg4: 0000000000000000, Zero.
3: kd> knL
 # Child-SP          RetAddr               Call Site
00 fffffe0b`21a263c8 fffff806`70f18e48     nt!KeBugCheckEx
01 fffffe0b`21a263d0 fffff806`92824a20     ndis!NdisWaitEvent+0x1fe88
02 fffffe0b`21a26410 fffffe0b`21a264d9     rtwlanu6+0x194a20
03 fffffe0b`21a26418 fffff806`927e83f5     0xfffffe0b`21a264d9
04 fffffe0b`21a26420 00000000`00000000     rtwlanu6+0x1583f5

There isn’t anything particularly interesting to mention about this bugcheck other than that the IRQL is higher than permitted for the NdisWaitEvent function, which must be called at IRQL 0 (PASSIVE_LEVEL); here the actual IRQL was 2 (DISPATCH_LEVEL). These issues are typically due to inappropriate use of spinlocks, which raise the IRQL to DISPATCH_LEVEL, or not checking that the IRQL has returned to 0 before attempting to wait.

The issue does appear to be due to a Realtek network adapter driver; however, there is evidence that the user has AVG installed. I’ve witnessed nothing but issues with most third-party AV programs and I would strongly suggest that the user removes the product using the official removal tool.

Reference:

https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ndis/nf-ndis-ndiswaitevent

Posted in Debugging, Stop 0x7C, WinDbg

Debugging Stop 0x139 – KERNEL_SECURITY_CHECK_FAILURE Part 4

KERNEL_SECURITY_CHECK_FAILURE (139)
A kernel component has corrupted a critical data structure. The corruption
could potentially allow a malicious user to gain control of this machine.
Arguments:
Arg1: 000000000000001e, Type of memory safety violation
Arg2: fffff9848b6ade10, Address of the trap frame for the exception that caused the BugCheck
Arg3: fffff9848b6add68, Address of the exception record for the exception that caused the BugCheck
Arg4: 0000000000000000, Reserved
3: kd> .exr 0xfffff9848b6add68
ExceptionAddress: fffff80252e97571 (nt!KeQueryValuesThread+0x000000000013bfa1)
ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
ExceptionFlags: 00000001
NumberParameters: 1
Parameter[0]: 000000000000001e
Subcode: 0x1e FAST_FAIL_INVALID_NEXT_THREAD

If we look at the exception subcode, we can see that the crash is due to an invalid next thread. Unfortunately, I haven’t been able to find much information on the KeQueryValuesThread function, but I should imagine that it checks which processor the thread is scheduled to run on and then performs a validation check to ensure that this is correct.

3: kd> !stack -p
Call Stack : 8 frames
## Stack-Pointer Return-Address Call-Site
00 fffff9848b6adfa0 fffff80253108701 nt!KeQueryValuesThread+13bfa1 (perf)
Parameter[0] = ffffc68a5acee300 << _KTHREAD?
Parameter[1] = fffff9848b6ae040
Parameter[2] = (unknown)
Parameter[3] = (unknown)
01 fffff9848b6ae020 fffff80253108392 nt!PsQueryStatisticsProcess+111
Parameter[0] = ffffc68a5cedd080 << _EPROCESS
Parameter[1] = (unknown)
Parameter[2] = (unknown)
Parameter[3] = (unknown)
02 fffff9848b6ae0a0 fffff80252ff4cc1 nt!ExpCopyProcessInfo+42
Parameter[0] = 00000000066ebc10
Parameter[1] = ffffc68a5cedd080 << _EPROCESS
Parameter[2] = ffffc68a5d81e300
Parameter[3] = fffff9848b6ae330
03 fffff9848b6ae120 fffff802530fcc87 nt!ExpGetProcessInformation+9f1
Parameter[0] = (unknown)
Parameter[1] = (unknown)
Parameter[2] = (unknown)
Parameter[3] = (unknown)
04 fffff9848b6ae780 fffff802530fbe37 nt!ExpQuerySystemInformation+d07
Parameter[0] = 0000000000000001
Parameter[1] = 0000000000000000
Parameter[2] = 0000000000000000
Parameter[3] = 00000000066a80b0
05 fffff9848b6aeac0 fffff80252e086b8 nt!NtQuerySystemInformation+37
Parameter[0] = (unknown)
Parameter[1] = (unknown)
Parameter[2] = (unknown)
Parameter[3] = (unknown)
06 fffff9848b6aeb00 00007ff9d19ad4e4 nt!KiSystemServiceCopyEnd+28
Parameter[0] = 0000000000000005
Parameter[1] = 00000000066a80b0
Parameter[2] = 0000000000010400
Parameter[3] = 0000000008d8e340

From what I gather from this OSR Online article, the argument passed to PsQueryStatisticsProcess is indeed an _EPROCESS structure. This is likely being used to make a scheduling decision for the next thread to be scheduled, which in fact belongs to this process.

3: kd> !thread ffffc68a5acee300
THREAD ffffc68a5acee300 Cid 2110.1c98 Teb: 000000000038e000 Win32Thread: 0000000000000000 STANDBY
Not impersonating
GetUlongFromAddress: unable to read from fffff8025361151c
Owning Process ffffc68a5cedd080 Image:
Attached Process N/A Image: N/A
fffff78000000000: Unable to get shared data
Wait Start TickCount 20457
Context Switch Count 15301 IdealProcessor: 6
ReadMemory error: Cannot get nt!KeMaximumIncrement value.
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address 0x000000005984d8a8
Stack Init fffff98486f6bc90 Current fffff98486f6b6a0
Base fffff98486f6c000 Limit fffff98486f66000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP RetAddr : Args to Child : Call Site
fffff984`86f6b6e0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSwapContext+0x76

The definition of a standby thread is the following:

Standby A thread in the standby state has been selected to run next on a particular processor. When the correct conditions exist, the dispatcher performs a context switch to this thread. Only one thread can be in the standby state for each processor on the system. Note that a thread can be preempted out of the standby state before it ever executes (if, for example, a higher priority thread becomes runnable before the standby thread begins execution).

Windows Internals 5th Edition

We can check which processor the standby thread has been scheduled to run on next by examining the _KTHREAD structure:

3: kd> dt _KTHREAD -y Next* ffffc68a5acee300
nt!_KTHREAD
+0x218 NextProcessor : 7
+0x218 NextProcessorNumber : 0y0000000000000000000000000000111 (0x7)

The processor number allocated to this thread is seven, which differs from the one the scheduler was attempting to run the thread on. If we check the processor control region (PCR) for the processor which we were running on, then we’ll also notice that no thread has been scheduled to run on it next.

3: kd> !pcr
KPCR for Processor 3 at ffffaf8134c40000:
Major 1 Minor 1
NtTib.ExceptionList: ffffaf8134c4ffb0
NtTib.StackBase: ffffaf8134c4e000
NtTib.StackLimit: 0000000008d8e318
NtTib.SubSystemTib: ffffaf8134c40000
NtTib.Version: 0000000034c40180
NtTib.UserPointer: ffffaf8134c40870
NtTib.SelfTib: 00000000009f5000

SelfPcr: 0000000000000000
Prcb: ffffaf8134c40180
Irql: 0000000000000000
IRR: 0000000000000000
IDR: 0000000000000000
InterruptMode: 0000000000000000
IDT: 0000000000000000
GDT: 0000000000000000
TSS: 0000000000000000

CurrentThread: ffffc68a5f306080
NextThread: 0000000000000000
IdleThread: ffffaf8134c4b340

Posted in Debugging, Stop 0x139, WinDbg

Debugging Stop 0x96 – INVALID_WORK_QUEUE_ITEM

INVALID_WORK_QUEUE_ITEM (96)

This message occurs when KeRemoveQueue removes a queue entry whose flink
or blink field is null. This is almost always called by code misusing
worker thread work items, but any queue misuse can cause this. The rule
is that an entry on a queue may only be inserted on the list once. When an
item is removed from a queue, it's flink field is set to NULL. This bugcheck
occurs when remove queue attempts to remove an entry, but the flink or blink
field is NULL. In order to debug this problem, you need to know the queue being
referenced.
In an attempt to help identify the guilty driver, this bugcheck assumes the
queue is a worker queue (ExWorkerQueue) and prints the worker routine as
parameter 4 below.
Arguments:
Arg1: ffffbf824cfac910, The address of the queue entry whose flink/blink field is NULL
Arg2: ffffbf824d95ad80, The address of the queue being references. Usually this is one
of the ExWorkerQueues.
Arg3: ffffbf823a088cc0, The base address of the ExWorkerQueue array. This will help determine
if the queue in question is an ExWorkerQueue and if so, the offset from
this parameter will isolate the queue.
Arg4: 0000000000000502, If this is an ExWorkerQueue (which it usually is), this is the address
of the worker routine that would have been called if the work item was
valid. This can be used to isolate the driver that is misusing the work
queue.

The first thing you must establish when debugging this bugcheck is the queue type. You can typically work this out by examining the call stack. In this case, it appears that we’re dealing with an I/O completion port. These rely upon I/O completion packets (_IO_MINI_COMPLETION_PACKET_USER) being added to a particular port, which is just a queue, and then waiting for an I/O worker thread to come along and process the I/O requests which have been queued up.

2: kd> knL

# Child-SP RetAddr Call Site
00 ffff8e0d`f0a9fad8 fffff807`5141aec1 nt!KeBugCheckEx
01 ffff8e0d`f0a9fae0 fffff807`5120f208 nt!KeRemoveQueueEx+0x20b951 << This is where we crash!
02 ffff8e0d`f0a9fb80 fffff807`515f77ce nt!IoRemoveIoCompletion+0x98 << I/O Completion Port API
03 ffff8e0d`f0a9fcb0 fffff807`514086b8 nt!NtRemoveIoCompletionEx+0xfe
04 ffff8e0d`f0a9fdf0 fffff807`513faaf0 nt!KiSystemServiceCopyEnd+0x28
05 ffff8e0d`f0a9fff8 fffffb03`7f8c3aac nt!KiServiceLinkage
06 ffff8e0d`f0aa0000 fffffb03`7f9e586a win32kfull!xxxRemoveQueueCompletion+0x5c
07 ffff8e0d`f0aa0070 fffffb03`7f942bbe win32kfull!xxxMsgWaitForMultipleObjectsEx+0x126
08 ffff8e0d`f0aa0120 fffffb03`7f3b6fcf win32kfull!NtUserMsgWaitForMultipleObjectsEx+0x3fe
09 ffff8e0d`f0aa0a50 fffff807`514086b8 win32k!NtUserMsgWaitForMultipleObjectsEx+0x1f
0a ffff8e0d`f0aa0a90 00007ffc`0dbaa104 nt!KiSystemServiceCopyEnd+0x28
0b 0000009a`7f9fe848 00000000`00000000 0x00007ffc`0dbaa104

When a server thread invokes GetQueuedCompletionStatus, the system service NtRemoveIoCompletion is executed. After validating parameters and translating the completion port handle to a pointer to the port, NtRemoveIoCompletion calls IoRemoveIoCompletion, which eventually calls KeRemoveQueueEx. For high-performance scenarios, it’s possible that multiple I/Os may have been completed, and although the thread will not block, it will still call into the kernel each time to get one item. The GetQueuedCompletionStatus or GetQueuedCompletionStatusEx API allows applications to retrieve more than one I/O completion status at the same time, reducing the number of user-to-kernel roundtrips and maintaining peak efficiency. Internally, this is implemented through the NtRemoveIoCompletionEx function, which calls IoRemoveIoCompletion with a count of queued items, which is passed on to KeRemoveQueueEx.

Windows Internals 6th Edition

This information helps us to understand the parameters being passed to the function in which we crash.

2: kd> !stack -p

Call Stack : 12 frames
## Stack-Pointer Return-Address Call-Site
00 ffff8e0df0a9fad8 fffff8075141aec1 nt!KeBugCheckEx+0
Parameter[0] = (unknown)
Parameter[1] = (unknown)
Parameter[2] = (unknown)
Parameter[3] = (unknown)
01 ffff8e0df0a9fae0 fffff8075120f208 nt!KeRemoveQueueEx+20b951 (perf)
Parameter[0] = ffffbf824d95ad80
Parameter[1] = 0000000000000000
Parameter[2] = 0000000000000000
Parameter[3] = ffff8e0df0aa0078
02 ffff8e0df0a9fb80 fffff807515f77ce nt!IoRemoveIoCompletion+98
Parameter[0] = ffffbf824d95ad80 << Our queue object
Parameter[1] = ffff8e0df0aa0040
Parameter[2] = ffff8e0df0a9fd20
Parameter[3] = 0000000000000001 << Number of queued items

We can dump a queue using the _KQUEUE structure like so.

2: kd> dt _KQUEUE ffffbf824d95ad80

nt!_KQUEUE
+0x000 Header : _DISPATCHER_HEADER
+0x018 EntryListHead : _LIST_ENTRY [ 0xffffbf82`4cfac910 - 0xffffbf82`4cfac910 ]
+0x028 CurrentCount : 1
+0x02c MaximumCount : 0xc << 12
+0x030 ThreadListHead : _LIST_ENTRY [ 0xffffbf82`4d99d288 - 0xffffbf82`4d99d288 ]

Notice how the Flink value is null as mentioned in the first parameter?

2: kd> dt _LIST_ENTRY ffffbf824cfac910

nt!_LIST_ENTRY
[ 0x00000000`00000000 - 0xffffbf82`4d95ad98 ]
+0x000 Flink : (null)
+0x008 Blink : 0xffffbf82`4d95ad98 _LIST_ENTRY [ 0xffffbf82`4cfac910 - 0xffffbf82`4cfac910 ]
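The kind of consistency check which this entry fails can be sketched as follows. This is my own illustration of doubly-linked list validation and is not how KeRemoveQueueEx is actually implemented; memory is modelled as a dictionary populated with the pointer values from the two dumps above, and EntryListHead is taken to be at the queue address plus 0x18.

```python
def entry_is_linked(read_ptr, entry):
    """True if the entry has non-null links and both neighbours point back at it."""
    flink = read_ptr(entry)        # +0x000 Flink
    blink = read_ptr(entry + 8)    # +0x008 Blink
    if flink == 0 or blink == 0:
        return False               # the condition which Stop 0x96 reports
    # A healthy entry satisfies entry->Flink->Blink == entry
    # and entry->Blink->Flink == entry
    return read_ptr(flink + 8) == entry and read_ptr(blink) == entry


LIST_HEAD = 0xFFFFBF824D95AD98     # _KQUEUE.EntryListHead (queue + 0x18)
ENTRY = 0xFFFFBF824CFAC910         # the entry named in Arg1

memory = {
    LIST_HEAD: ENTRY,              # EntryListHead.Flink
    LIST_HEAD + 8: ENTRY,          # EntryListHead.Blink
    ENTRY: 0,                      # entry Flink is null
    ENTRY + 8: LIST_HEAD,          # entry Blink still points at the list head
}
print(entry_is_linked(memory.__getitem__, ENTRY))
```

The Blink pointer is still intact, which is what you would expect if the entry had already been removed once (clearing its Flink) while the list head was left pointing at it.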

So, it seems that someone has attempted to call GetQueuedCompletionStatus twice or provided an invalid completion port to the function. The best course of action here is to run Driver Verifier or see if the problem can be isolated to a particular user application.

Posted in Debugging, Stop 0x96, WinDbg

Debugging Stop 0x139 – KERNEL_SECURITY_CHECK_FAILURE Part 3

KERNEL_SECURITY_CHECK_FAILURE (139)
A kernel component has corrupted a critical data structure. The corruption
could potentially allow a malicious user to gain control of this machine.
Arguments:
Arg1: 000000000000001d, An RTL_BALANCED_NODE RBTree entry has been corrupted.
Arg2: ffff8d0be5d5d000, Address of the trap frame for the exception that caused the BugCheck
Arg3: ffff8d0be5d5cf58, Address of the exception record for the exception that caused the BugCheck
Arg4: 0000000000000000, Reserved

This is one of the least common variations of a Stop 0x139 bugcheck which you’ll see and it isn’t immediately obvious why the operating system would care about an unbalanced red-black tree other than for efficiency reasons, until you understand where this important data structure is being implemented.

To begin with, I’ll give a brief overview of what a red-black tree is and how it is considered to be balanced, as that is what the “corruption” mentioned in the first parameter is referring to. Red-black trees are similar to other self-balancing binary search trees, in that there is a strict ordering imposed on the children of any given node, with the left child having a “value” less than its parent and the right child having a value greater than or equal to its parent.

The red-black tree introduces another concept, which is colour: a node is either red or black, with NULL nodes being implicitly black and the root node usually considered black as well. A red node cannot have a red child node, therefore any children of a red node will always be black. This colour restriction introduces a new form of balancing which is based on the height of the tree: the path from the root to the furthest leaf (node without children) is no longer than twice the length of the path from the root to the nearest leaf.
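These colour rules can be verified with a short recursive check. The sketch below relies on one further standard red-black property which isn’t spelled out above: every path from a node down to a NULL leaf must pass through the same number of black nodes, and it is this rule combined with the no-red-red rule which yields the “no longer than twice” bound. The Node class and function name are my own and are unrelated to the kernel’s _RTL_BALANCED_NODE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    red: bool
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def black_height(node):
    """Return the black height of a subtree, or raise ValueError if a rule is broken."""
    if node is None:
        return 1                                  # NULL leaves count as black
    if node.red:
        for child in (node.left, node.right):
            if child is not None and child.red:
                raise ValueError(f"red node {node.key} has a red child")
    left = black_height(node.left)
    right = black_height(node.right)
    if left != right:
        raise ValueError(f"black heights differ below node {node.key}")
    return left + (0 if node.red else 1)


tree = Node(10, red=False, left=Node(5, red=True), right=Node(15, red=True))
print(black_height(tree))
```

A tree which raises ValueError here is “unbalanced” in the sense which the bugcheck description uses.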

1: kd> dt _RTL_BALANCED_NODE
win32k!_RTL_BALANCED_NODE
+0x000 Children : [2] Ptr64 _RTL_BALANCED_NODE
+0x000 Left : Ptr64 _RTL_BALANCED_NODE
+0x008 Right : Ptr64 _RTL_BALANCED_NODE
+0x010 Red : Pos 0, 1 Bit
+0x010 Balance : Pos 0, 2 Bits
+0x010 ParentValue : Uint8B

The _RTL_BALANCED_NODE structure is used to represent both AVL and red-black trees. The Balance field is ignored with red-black trees and the Red field is used instead, which is set to either 1 or 0 to denote whether that node is red or black.

1: kd> .exr 0xffff8d0be5d5cf58
ExceptionAddress: fffff8007bc3397d (nt!RtlRbRemoveNode+0x00000000001f3d7d)
ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
ExceptionFlags: 00000001
NumberParameters: 1
Parameter[0]: 000000000000001d
Subcode: 0x1d FAST_FAIL_INVALID_BALANCED_TREE

If we examine the exception record, then we can see that the fast fail exception was thrown while we were attempting to remove a red-black tree node. Why would such validation exist on this API call? The reason is that the segment heap internally uses red-black trees in order to keep track of free heap blocks for both the low fragmentation heap (LFH) and the variable size (VS) allocators. There are many security exploits built around corrupted node structures which allow an attacker to write to arbitrary memory, hence why this validation check was added. The tree in question is known as the FreeChunkTree.

The validation check involves checking that the children still match their parent. If we examine the trap frame and then check the rdx register:

1: kd> .trap ffff8d0b`e5d5d000
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=ff3fa80cae465038 rbx=0000000000000000 rcx=000000000000001d
rdx=ffffa80cae886038 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8007bc3397d rsp=ffff8d0be5d5d198 rbp=0000000000000516
r8=0000000000000000 r9=ffffa80cae765038 r10=0000000000000000
r11=ffffa80cae123038 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na po cy
nt!RtlRbRemoveNode+0x1f3d7d:
fffff800`7bc3397d cd29 int 29h

If we then dump the red-black node:

1: kd> dt _RTL_BALANCED_NODE ffffa80cae886038
win32k!_RTL_BALANCED_NODE
+0x000 Children : [2] (null)
+0x000 Left : (null)
+0x008 Right : 0xffffa80c`ae765038 _RTL_BALANCED_NODE
+0x010 Red : 0y0
+0x010 Balance : 0y00
+0x010 ParentValue : 0xffffa80c`ae123038

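We can see that it has one child, and this child should point back to the above node. Its ParentValue ends in 9 rather than 8, but bear in mind that ParentValue shares its storage with the Red flag (0y1 here), so the lowest bit has to be masked off before the two addresses are compared: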

1: kd> dt _RTL_BALANCED_NODE 0xffffa80c`ae765038
win32k!_RTL_BALANCED_NODE
+0x000 Children : [2] (null)
+0x000 Left : (null)
+0x008 Right : (null)
+0x010 Red : 0y1
+0x010 Balance : 0y01
+0x010 ParentValue : 0xffffa80c`ae886039
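Since Red, Balance and ParentValue all sit at offset 0x10, the parent pointer and the colour are packed into the same 64-bit value. A minimal sketch of unpacking it, assuming (as the structure layout suggests) that the low two bits are flag bits and the remainder is the parent’s address; the function name is my own:

```python
def unpack_parent_value(parent_value):
    """Split _RTL_BALANCED_NODE.ParentValue into (parent address, red flag)."""
    parent = parent_value & 0xFFFFFFFFFFFFFFFC   # clear the two low flag bits
    red = bool(parent_value & 0x1)               # bit 0 is the Red flag
    return parent, red


parent, red = unpack_parent_value(0xFFFFA80CAE886039)
print(hex(parent), red)
```

For the child node above, this yields a parent of 0xffffa80c`ae886038 with the Red flag set.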

Corruption of this kind, single bit flips in particular, is usually due to some form of hardware failure, and in fact the crashes were due to faulty RAM.

1: kd> knL
# Child-SP RetAddr Call Site
00 ffff8d0b`e5d5ccd8 fffff800`7bc169a9 nt!KeBugCheckEx
01 ffff8d0b`e5d5cce0 fffff800`7bc16f10 nt!KiBugCheckDispatch+0x69
02 ffff8d0b`e5d5ce20 fffff800`7bc14e9d nt!KiFastFailDispatch+0xd0
03 ffff8d0b`e5d5d000 fffff800`7bc3397d nt!KiRaiseSecurityCheckFailure+0x31d
04 ffff8d0b`e5d5d198 fffff800`7ba3f9c0 nt!RtlRbRemoveNode+0x1f3d7d
05 ffff8d0b`e5d5d1b0 fffff800`7ba3f408 nt!RtlpHpVsChunkCoalesce+0xb0
06 ffff8d0b`e5d5d210 fffff800`7ba41524 nt!RtlpHpVsContextFree+0x188
07 ffff8d0b`e5d5d2b0 fffff800`7c1bc0b9 nt!ExFreeHeapPool+0x4d4
08 ffff8d0b`e5d5d390 fffff800`7bbf6442 nt!ExFreePool+0x9
09 ffff8d0b`e5d5d3c0 fffff800`7ba42103 nt!IopProcessBufferedIoCompletion+0x7a
0a ffff8d0b`e5d5d400 fffff800`7ba62916 nt!IopCompleteRequest+0xd3
0b ffff8d0b`e5d5d4d0 fffff800`7c1cd17f nt!IopfCompleteRequest+0x816
0c ffff8d0b`e5d5d5b0 fffff800`7bc3f20d nt!IovCompleteRequest+0x1cf
0d ffff8d0b`e5d5d6a0 fffff800`7bea4539 nt!IofCompleteRequest+0x1dd13d
0e ffff8d0b`e5d5d6d0 fffff800`7bea4486 nt!PiUEventHandleIoctl+0x4d
0f ffff8d0b`e5d5d710 fffff800`7bf0850d nt!PiUEventDispatch+0x36
10 ffff8d0b`e5d5d740 fffff800`7bb75867 nt!PiDaDispatch+0x4d
11 ffff8d0b`e5d5d770 fffff800`7c1ccf2a nt!IopfCallDriver+0x53
12 ffff8d0b`e5d5d7b0 fffff800`7bc32089 nt!IovCallDriver+0x266
13 ffff8d0b`e5d5d7f0 fffff800`7be4a1dc nt!IofCallDriver+0x1f73e9
14 ffff8d0b`e5d5d830 fffff800`7be49e33 nt!IopSynchronousServiceTail+0x34c
15 ffff8d0b`e5d5d8d0 fffff800`7be49106 nt!IopXxxControlFile+0xd13
16 ffff8d0b`e5d5da20 fffff800`7bc16138 nt!NtDeviceIoControlFile+0x56
17 ffff8d0b`e5d5da90 00007fff`7ed8d0c4 nt!KiSystemServiceCopyEnd+0x28
18 00000089`51cff518 00000000`00000000 0x00007fff`7ed8d0c4

By looking at the call stack, we can see that the crash really begins when we attempt to release a pool allocation using ExFreePool, which frees a chunk of memory. However, there is a slight optimisation which the operating system makes here: if a freed chunk is contiguous in memory with other freed chunks, these chunks are merged together to form one chunk through the RtlpHpVsChunkCoalesce function. Merging means that the neighbouring free chunks must first be removed from the FreeChunkTree, which is where RtlRbRemoveNode comes in, and the merged chunk is then inserted into the tree.
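The idea behind coalescing can be illustrated with a toy allocator. To be clear, this is not how RtlpHpVsChunkCoalesce is implemented: it uses a plain sorted list instead of a red-black tree, and the function name and (start, size) tuples are purely hypothetical.

```python
def free_chunk(free_chunks, start, size):
    """Add [start, start + size) to a list of free (start, size) chunks,
    merging it with any free chunk which is directly adjacent to it."""
    end = start + size
    remaining = []
    for chunk_start, chunk_size in free_chunks:
        if chunk_start + chunk_size == start:
            start = chunk_start                 # absorb the neighbour before us
        elif chunk_start == end:
            end = chunk_start + chunk_size      # absorb the neighbour after us
        else:
            remaining.append((chunk_start, chunk_size))
    remaining.append((start, end - start))
    return sorted(remaining)


free_list = [(0x1000, 0x100), (0x1200, 0x100)]
print(free_chunk(free_list, 0x1100, 0x100))
```

Freeing the chunk which sits between two free neighbours leaves a single 0x300-byte chunk; both neighbours were taken out of the free structure along the way, which corresponds to the node removal visible in the call stack.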

References:

https://en.wikipedia.org/wiki/Red%E2%80%93black_tree

https://www.geoffchappell.com/studies/windows/km/ntoskrnl/inc/shared/ntdef/rtl_balanced_node.htm

Posted in Debugging, Stop 0x139, WinDbg, Windows Internals

Debugging Stop 0xCE – DRIVER_UNLOADED_WITHOUT_CANCELLING_PENDING_OPERATIONS

DRIVER_UNLOADED_WITHOUT_CANCELLING_PENDING_OPERATIONS (ce)
A driver unloaded without cancelling timers, DPCs, worker threads, etc.
The broken driver's name is displayed on the screen and saved in
KiBugCheckDriver.
Arguments:
Arg1: fffff805c5bc8597, memory referenced
Arg2: 0000000000000010, value 0 = read operation, 1 = write operation
Arg3: fffff805c5bc8597, If non-zero, the instruction address which referenced the bad memory
address.
Arg4: 0000000000000000, Mm internal code.

The bugcheck parameters have the same meaning across all instances of a Stop 0xCE: the first and third parameters provide the address inside the problematic driver which was referenced after it had been unloaded, while the second describes the type of access. You can use the ln command to go directly to it.

17: kd> knL
# Child-SP RetAddr Call Site
00 ffffa981`44cd6f38 fffff805`2f082123 nt!KeBugCheckEx
01 ffffa981`44cd6f40 fffff805`2ee5766c nt!MiSystemFault+0x22d9d3
02 ffffa981`44cd7040 fffff805`2f03a029 nt!MmAccessFault+0x29c
03 ffffa981`44cd7160 fffff805`c5bc8597 nt!KiPageFault+0x369
04 ffffa981`44cd72f0 ffffe102`098866a0 <Unloaded_BEDaisy.sys>+0x328597
05 ffffa981`44cd72f8 ffffa981`44cd7379 0xffffe102`098866a0
06 ffffa981`44cd7300 ffffe102`098866a0 0xffffa981`44cd7379
07 ffffa981`44cd7308 ffffa981`44cd1000 0xffffe102`098866a0
08 ffffa981`44cd7310 ffffe101`df2c3080 0xffffa981`44cd1000
09 ffffa981`44cd7318 fffff805`2ee1b236 0xffffe101`df2c3080
0a ffffa981`44cd7320 fffff805`2eecbaea nt!KiDeliverApc+0x186
0b ffffa981`44cd73e0 fffff805`2eeccae7 nt!KiSwapThread+0xf2a
0c ffffa981`44cd7530 fffff805`2eecf106 nt!KiCommitThreadWait+0x137
0d ffffa981`44cd75e0 fffff805`2f2d915c nt!KeWaitForSingleObject+0x256
0e ffffa981`44cd7980 fffff805`2f2d907b nt!ObWaitForSingleObject+0xcc
0f ffffa981`44cd79e0 fffff805`2f03e1e5 nt!NtWaitForSingleObject+0x6b
10 ffffa981`44cd7a20 00007ffa`b9faecd4 nt!KiSystemServiceCopyEnd+0x25
11 0000001f`ef9ff638 00000000`00000000 0x00007ffa`b9faecd4

As the driver has been unloaded, its symbols are typically unavailable at the time of the crash as well. You can therefore use the .reload /unl <drivername.sys> command to reload the symbol information for the given driver module.

17: kd> .reload /unl BEDaisy.sys

This should then resolve any addresses which lack function names and make your troubleshooting efforts much easier. However, this bugcheck is typically caused by drivers simply not cancelling certain pending operations, or by mismanaged reference accounting which causes the driver to prematurely unload itself. In the above example, we can see that the BattlEye anti-cheat driver is causing the crash. This isn’t a surprise to be honest, as it frequently crashes for a multitude of different reasons.

Posted in Debugging, Stop 0xCE, WinDbg