Debugging Stop 0x119 – VIDEO_SCHEDULER_INTERNAL_ERROR

In this post, we’re going to look at the bugcheck parameters of a Stop 0x119. These parameters are documented, but for some reason not on the MSDN page for the bugcheck code itself.

Before we examine the bugcheck, we’ll need to go over some of the basic principles of the WDDM (Windows Display Driver Model). There are two drivers which an application will use: a user-mode display driver and a kernel-mode display miniport driver. The miniport driver is what does most of the heavy lifting. Both drivers interact with the DirectX graphics kernel subsystem, which is known as dxgkrnl.sys.

The video scheduler is part of the DirectX graphics kernel, and this is where our crash occurs. The scheduler works with four different synchronization levels; the levels differ in which functions are able to run and in how much re-entrancy is permitted. Re-entrancy is governed by distinct classes of functions. In our example below, we were operating at first-level synchronization, within the GPU scheduler class. At this level, no re-entrancy is permitted within a class, and therefore only one thread at a time will be running a function from the scheduler class.

The function which would have been called was DxgkDdiSubmitCommand. This takes a DMA buffer and submits it to the execution queue of a particular GPU engine; the 3D rendering engine is one example of a GPU engine. Each buffer which has been queued has a unique identifier assigned to it, called a fence identifier. Essentially, this ensures that the buffers are executed in the correct order. If a buffer is executed out of order, or the fence identifier isn’t valid, then the system will bugcheck with another variant of a Stop 0x119.
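To make the fence identifier idea a little more concrete, here is a minimal Python sketch of a single engine’s queue. Everything in it (the EngineQueue class, its method names, the use of an exception) is a hypothetical model for illustration; it is not how dxgmms2 is actually implemented, and the real scheduler would bugcheck rather than raise an error.

```python
from collections import deque


class EngineQueue:
    """Toy model of one GPU engine's execution queue (hypothetical, not dxgmms2)."""

    def __init__(self):
        self.pending = deque()    # buffers submitted but not yet completed
        self.last_submitted = 0   # highest fence ID handed out so far
        self.last_completed = 0   # highest fence ID reported as finished

    def submit(self, buffer_name):
        """Queue a buffer and stamp it with the next fence ID."""
        self.last_submitted += 1
        self.pending.append((self.last_submitted, buffer_name))
        return self.last_submitted

    def complete(self, fence_id):
        """Retire the oldest buffer; reject fences that arrive out of order."""
        if not self.pending or self.pending[0][0] != fence_id:
            raise ValueError(f"unexpected fence ID {fence_id}")
        self.pending.popleft()
        self.last_completed = fence_id


queue = EngineQueue()
first = queue.submit("clear")
second = queue.submit("draw")
queue.complete(first)   # fine: oldest buffer finishes first
queue.complete(second)  # fine: next in order
```

Completing the second buffer before the first would raise the ValueError, which stands in for the ordering violation described above.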

There are actually two types of buffers available: command and DMA. The command buffers are used by the user-mode display driver and then sent to the graphics kernel. From there, the command buffer is validated and then converted into a DMA buffer. The DMA buffer contains the instructions for the GPU engine to execute. Both kinds of buffer are allocated from pageable memory. If a DMA buffer is currently paged out, then it must be paged into video memory.

Each process will have its own execution queue, and the scheduler will provide a certain period of time for each task in the queue. If a task takes too long to process, then the system will bugcheck with either a Stop 0x116 or a Stop 0x117.

VIDEO_SCHEDULER_INTERNAL_ERROR (119)
The video scheduler has detected that fatal violation has occurred. This resulted
in a condition that video scheduler can no longer progress. Any other values after
parameter 1 must be individually examined according to the subtype.
Arguments:
Arg1: 0000000000000002, The driver failed upon the submission of a command.
Arg2: ffffffffc000000d << NTStatus Error Code
Arg3: ffff878c651fa860 << Pointer to DXGKARG_SUBMITCOMMAND structure
Arg4: ffffa1889c542570 << Parameter to dxgmms2!VidSchiSendToExecutionQueue+0x14d48

Anyhow, let’s begin examining the bugcheck to see why the system crashed. The first parameter tells us that the driver failed while a command was being submitted, and the call stack shows that it was a paging command on its way to the GPU’s execution queue.

6: kd> knL
# Child-SP RetAddr Call Site
00 ffff878c`651fa788 fffff806`98804210 nt!KeBugCheckEx
01 ffff878c`651fa790 fffff806`9a0faf38 watchdog!WdLogEvent5_WdCriticalError+0xe0
02 ffff878c`651fa7d0 fffff806`9a158d54 dxgmms2!VidSchiSendToExecutionQueue+0x14d48 << Crash here!
03 ffff878c`651fa900 fffff806`9a16a315 dxgmms2!VidSchiSubmitPagingCommand+0x2f4 << Command to be submitted
04 ffff878c`651faa80 fffff806`9a16a18a dxgmms2!VidSchiRun_PriorityTable+0x175
05 ffff878c`651faad0 fffff806`82d33585 dxgmms2!VidSchiWorkerThread+0xca
06 ffff878c`651fab10 fffff806`82dcb128 nt!PspSystemThreadStartup+0x55
07 ffff878c`651fab60 00000000`00000000 nt!KiStartSystemThread+0x28

Fortunately, the second parameter provides an NTSTATUS code, which indicates that an invalid parameter was passed to a function. Presumably this was a parameter for the execution queue function.

6: kd> !error ffffffff`c000000d
Error code: (NTSTATUS) 0xc000000d (3221225485) - An invalid parameter was passed to a service or function.
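If you don’t have a debugger to hand, the same value can be picked apart by hand. The sketch below assumes the standard NTSTATUS bit layout (2 severity bits, a customer bit, a reserved bit, 12 facility bits and 16 code bits); the function name decode_ntstatus is my own. Note that the bugcheck parameter is sign-extended to 64 bits, so it is masked back down to 32 bits first.

```python
def decode_ntstatus(value):
    """Split an NTSTATUS value into its fields (assumes the standard layout)."""
    status = value & 0xFFFFFFFF  # drop the 64-bit sign extension
    severity_names = ("SUCCESS", "INFORMATIONAL", "WARNING", "ERROR")
    return {
        "status": status,
        "severity": severity_names[status >> 30],   # top two bits
        "customer": bool(status & 0x20000000),      # set for non-Microsoft codes
        "facility": (status >> 16) & 0x0FFF,        # 12-bit facility
        "code": status & 0xFFFF,                    # 16-bit status code
    }


fields = decode_ntstatus(0xFFFFFFFFC000000D)
print(hex(fields["status"]), fields["severity"], fields["facility"], hex(fields["code"]))
```

For Arg2 this gives an error-severity status with facility 0 and code 0xd, which is the value !error resolves to the invalid parameter message.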

If we examine the fourth parameter, we can see that it’s the same value which is passed as the first parameter to the function in which we crashed. The third parameter is a pointer to the DXGKARG_SUBMITCOMMAND structure which is sent to the execution queue. So it seems that the driver has set this up incorrectly, and that has led to the crash you’ve experienced.

6: kd> !stack -p
Call Stack : 8 frames
## Stack-Pointer    Return-Address   Call-Site       
00 ffff878c651fa788 fffff80698804210 nt!KeBugCheckEx+0 
	Parameter[0] = 0000000000000119
	Parameter[1] = 0000000000000002
	Parameter[2] = ffffffffc000000d
	Parameter[3] = ffff878c651fa860
01 ffff878c651fa790 fffff8069a0faf38 watchdog!WdLogEvent5_WdCriticalError+e0 
	Parameter[0] = ffffa188982ea2b0
	Parameter[1] = (unknown)       
	Parameter[2] = (unknown)       
	Parameter[3] = (unknown)       
02 ffff878c651fa7d0 fffff8069a158d54 dxgmms2!VidSchiSendToExecutionQueue+14d48 (perf)
	Parameter[0] = ffffa1889c542570
	Parameter[1] = 0000000000000000
	Parameter[2] = (unknown)       
	Parameter[3] = (unknown)       
03 ffff878c651fa900 fffff8069a16a315 dxgmms2!VidSchiSubmitPagingCommand+2f4 (perf)
	Parameter[0] = ffffa1889f22d010
	Parameter[1] = (unknown)       
	Parameter[2] = (unknown)       
	Parameter[3] = (unknown)
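The comparison between Arg4 and the first parameter of the crashing frame can also be done mechanically if you save the debugger output to a text file. The following is a rough sketch which assumes the output layout shown above; the helper names parse_stack_p and first_param are hypothetical, and the sample text is a trimmed copy of two frames from this dump.

```python
import re

# Frame lines: two-digit frame number, stack pointer, return address, call site.
FRAME = re.compile(r"^([0-9a-f]{2}) [0-9a-f]{16} [0-9a-f]{16} (\S+)")
# Parameter lines with a known value; "(unknown)" entries are skipped.
PARAM = re.compile(r"^\s*Parameter\[(\d)\] = ([0-9a-f]{16})")

SAMPLE = """\
00 ffff878c651fa788 fffff80698804210 nt!KeBugCheckEx+0
    Parameter[0] = 0000000000000119
    Parameter[1] = 0000000000000002
    Parameter[2] = ffffffffc000000d
    Parameter[3] = ffff878c651fa860
02 ffff878c651fa7d0 fffff8069a158d54 dxgmms2!VidSchiSendToExecutionQueue+14d48 (perf)
    Parameter[0] = ffffa1889c542570
    Parameter[1] = 0000000000000000
    Parameter[2] = (unknown)
"""


def parse_stack_p(text):
    """Return a list of frames, each with its call site and known parameters."""
    frames = []
    for line in text.splitlines():
        match = FRAME.match(line)
        if match:
            frames.append({"site": match.group(2), "params": {}})
            continue
        match = PARAM.match(line)
        if match and frames:
            frames[-1]["params"][int(match.group(1))] = int(match.group(2), 16)
    return frames


def first_param(frames, function_name):
    """First parameter of the first frame whose call site mentions function_name."""
    for frame in frames:
        if function_name in frame["site"]:
            return frame["params"].get(0)
    return None


BUGCHECK_ARG4 = 0xFFFFA1889C542570  # Arg4 from the bugcheck analysis
frames = parse_stack_p(SAMPLE)
print(first_param(frames, "VidSchiSendToExecutionQueue") == BUGCHECK_ARG4)
```

The regular expressions are deliberately strict, so header lines and parameters reported as (unknown) are simply ignored rather than guessed at.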

If we check the raw stack for the thread, we can see what might have been about to happen. It appears that the graphics miniport driver was attempting to access the hardware’s power settings. The Component Power Management model gives developers finer-grained control over the individual components which make up a graphics card.

6: kd> !dpx
Start memory scan  : 0xffff878c651fa788 ($csp)
End memory scan    : 0xffff878c651fb000 (Kernel Stack Base)

               rsp : 0xffff878c651fa788 : 0xfffff80698804210 : watchdog!WdLogEvent5_WdCriticalError+0xe0
Unable to load image \SystemRoot\System32\DriverStore\FileRepository\nvlt.inf_amd64_e7444925b6f55a93\nvlddmkm.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for nvlddmkm.sys
0xffff878c651fa788 : 0xfffff80698804210 : watchdog!WdLogEvent5_WdCriticalError+0xe0
0xffff878c651fa858 : 0xfffff806990a47c8 : dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBInternal+0x70
0xffff878c651fa8f8 : 0xfffff8069a158d54 : dxgmms2!VidSchiSubmitPagingCommand+0x2f4
0xffff878c651fa948 : 0xfffff8069a15ab73 : dxgmms2!VidSchiCheckHwProgress+0x163
0xffff878c651fa9b8 : 0xfffff8069a0e8732 : dxgmms2!VidSchiScheduleCommandToRun+0x272
0xffff878c651faa28 : 0xfffff8069a0f1400 : dxgmms2!VidSchiClearFlipDevice+0x5c
0xffff878c651faa78 : 0xfffff8069a16a315 : dxgmms2!VidSchiRun_PriorityTable+0x175
0xffff878c651faa98 : 0xfffff8069a16a0c0 : dxgmms2!VidSchiWorkerThread
0xffff878c651faac8 : 0xfffff8069a16a18a : dxgmms2!VidSchiWorkerThread+0xca
0xffff878c651faad8 : 0xfffff8069a16a0c0 : dxgmms2!VidSchiWorkerThread
0xffff878c651fab08 : 0xfffff80682d33585 : nt!PspSystemThreadStartup+0x55
0xffff878c651fab58 : 0xfffff80682dcb128 : nt!KiStartSystemThread+0x28
0xffff878c651fab70 : 0xfffff80682d33530 : nt!PspSystemThreadStartup
6: kd> lmvm nvlddmkm
Browse full module list
start             end                 module name
fffff806`9ae40000 fffff806`9bc63000   nvlddmkm T (no symbols)           
    Loaded symbol image file: nvlddmkm.sys
    Image path: \SystemRoot\System32\DriverStore\FileRepository\nvlt.inf_amd64_e7444925b6f55a93\nvlddmkm.sys
    Image name: nvlddmkm.sys
    Browse all global symbols  functions  data
    Timestamp:        Mon Jun 19 02:28:31 2017 (594728BF)
    CheckSum:         00DD9945
    ImageSize:        00E23000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:
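Two of the fields in the lmvm output are easy to sanity-check yourself. The sketch below treats the hexadecimal timestamp as seconds since 1 January 1970 (an assumption which holds for this driver, but not necessarily for every image) and recomputes the image size from the start and end addresses. The function names are my own.

```python
from datetime import datetime, timezone


def link_time(timestamp_hex):
    """Interpret the image timestamp as Unix seconds and return a UTC datetime."""
    return datetime.fromtimestamp(int(timestamp_hex, 16), tz=timezone.utc)


def image_size(start_hex, end_hex):
    """Size of the module in bytes, given lmvm-style start and end addresses."""
    def to_int(address):
        return int(address.replace("`", ""), 16)  # strip WinDbg's backtick separator
    return to_int(end_hex) - to_int(start_hex)


print(link_time("594728BF").isoformat())
print(hex(image_size("fffff806`9ae40000", "fffff806`9bc63000")))
```

The decoded time is one hour behind the 02:28:31 shown by the debugger, presumably because the debugger formats the timestamp in the local time zone of the machine it runs on. The computed size matches the ImageSize field. Either way, the driver dates from June 2017.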

So, as we can see, for some reason the display miniport driver has failed to submit the command to the execution queue. Unfortunately, as with most graphics-related bugchecks, to really understand what is going on we would need extensive knowledge of dxgkrnl (most of which is undocumented and not available to the public), along with the appropriate checked symbols. However, it does appear to be an issue with the graphics card driver in this case. In some circumstances, these bugchecks have also been caused by a faulty PSU or a faulty graphics card.

As a closing note, the general architecture for the GPU under WDDM can be found below:

References:

DXGKCB_SETPOWERCOMPONENTACTIVE callback function (d3dkmddi.h)
DXGKDDI_SUBMITCOMMAND callback function (d3dkmddi.h)
GPU power management of idle states and active power
Enumerating GPU engine capabilities

About 0x14c

I'm currently a Software Developer. My primary interests are Mathematics, Programming and Windows Internals.