The previous two parts discussed address translation at an operating-system-agnostic hardware level. This part focuses on the translation mechanisms used almost exclusively by the Windows operating system.
Since processes generally reserve large portions of memory which may never be touched, it doesn’t make sense to create page tables and consume physical memory for address ranges which are not being used. Instead, the Memory Manager consults an AVL tree structure called the VAD (virtual address descriptor) tree. A tree is defined for each process, and a node is constructed whenever the process reserves or commits memory, which is generally achieved using VirtualAlloc.
How are VADs used in conjunction with address translation then? When a thread attempts to access a linear address for which no valid PTE exists yet, a page fault is raised and the Memory Manager consults the VAD tree of the process which the thread belongs to. It finds the corresponding VAD by checking the address range of each node; if the linear address falls within the range of a node, the protection bits for that node are examined and the PTE control bits are constructed from them. This enables the page fault to be successfully resolved.
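The range check described above can be sketched as an ordinary binary search tree lookup. This is only an illustrative model in Python, not the Memory Manager's actual code: the VadNode class, the find_vad function and the shape of the tree are hypothetical, although the three ranges are taken from the !vad output shown later. A page size of 4 KB is assumed.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SHIFT = 12  # assumption: 4 KB pages


@dataclass
class VadNode:
    """Hypothetical stand-in for a VAD: an inclusive range of page numbers."""
    start_vpn: int
    end_vpn: int
    protection: str
    left: Optional["VadNode"] = None
    right: Optional["VadNode"] = None


def find_vad(root: Optional[VadNode], address: int) -> Optional[VadNode]:
    """Return the node whose range contains the address, or None."""
    vpn = address >> PAGE_SHIFT  # convert the address to a page number
    node = root
    while node is not None:
        if vpn < node.start_vpn:
            node = node.left      # address lies below this range
        elif vpn > node.end_vpn:
            node = node.right     # address lies above this range
        else:
            return node           # start_vpn <= vpn <= end_vpn
    return None                   # no VAD covers the address


# Three ranges from the !vad output; the tree shape itself is made up.
root = VadNode(0x150, 0x153, "READONLY",
               left=VadNode(0x50, 0x14f, "READWRITE"),
               right=VadNode(0x170, 0x171, "READWRITE"))

hit = find_vad(root, 0x60000)     # page 0x60 falls inside 0x50-0x14f
miss = find_vad(root, 0x160000)   # page 0x160 is not in this small tree
```

An address with no matching node corresponds to an access to memory which was never reserved, which is the case where the fault cannot be resolved.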
The !vad debugger extension command can be used to dump the VAD tree for a given process:
0: kd> !vad
VAD Level Start End Commit
ffff838108005280 4 10 1f 0 Mapped READWRITE Pagefile section, shared commit 0x10
ffff83810800c9e0 5 20 20 0 Mapped READONLY Pagefile section, shared commit 0x1
ffff838108006d60 3 30 4c 0 Mapped READONLY Pagefile section, shared commit 0x1d
ffff83810c5a5050 4 50 14f 7 Private READWRITE
ffff838104845a60 2 150 153 0 Mapped READONLY Pagefile section, shared commit 0x4
ffff838104847400 5 160 160 0 Mapped READONLY Pagefile section, shared commit 0x1
ffff83810c5a2710 4 170 171 2 Private READWRITE
ffff83810800cee0 5 180 180 0 Mapped READONLY Pagefile section, shared commit 0x1
ffff83810800c620 3 190 190 0 Mapped READONLY Pagefile section, shared commit 0x1
ffff83810800e9c0 6 1a0 1a7 0 Mapped READONLY Pagefile section, shared commit 0x21
[...]
Alternatively, you can find the VAD root through the _EPROCESS structure; the root is stored in the VadRoot field:
0: kd> dt _EPROCESS -y Vad ffff8381083dc0c0
nt!_EPROCESS
+0x7d8 VadRoot : _RTL_AVL_TREE
+0x7e0 VadHint : 0xffff8381`0c27d230 Void
+0x7e8 VadCount : 0xb2
+0x7f0 VadPhysicalPages : 0
+0x7f8 VadPhysicalPagesLimit : 0
+0x87c VadTrackingDisabled : 0y0
The VadCount field contains the number of nodes in the VAD tree. The VadRoot field is an _RTL_AVL_TREE structure, which acts as a wrapper around a pointer to the root _RTL_BALANCED_NODE, which in turn can be used to traverse the rest of the tree.
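As a rough illustration of what traversing the tree from its root looks like, the sketch below performs an in-order walk, which visits the nodes in ascending address order (the same order in which !vad lists them). The BalancedNode class and its fields are hypothetical simplifications of _RTL_BALANCED_NODE, the tree shape is invented, and numbering the root as level 0 is an assumption rather than something taken from the debugger.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass
class BalancedNode:
    """Hypothetical simplified tree node keyed by its starting page number."""
    start_vpn: int
    left: Optional["BalancedNode"] = None
    right: Optional["BalancedNode"] = None


def walk_in_order(node: Optional[BalancedNode],
                  level: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield (start_vpn, level) pairs in ascending start_vpn order."""
    if node is None:
        return
    yield from walk_in_order(node.left, level + 1)
    yield node.start_vpn, level
    yield from walk_in_order(node.right, level + 1)


# A made-up tree; only the starting page numbers come from the !vad output.
root = BalancedNode(
    0x150,
    left=BalancedNode(0x30, left=BalancedNode(0x10), right=BalancedNode(0x50)),
    right=BalancedNode(0x190, left=BalancedNode(0x170)),
)

visited = list(walk_in_order(root))
node_count = len(visited)  # plays the role of VadCount for this toy tree
```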
Each VAD is represented by the _MMVAD structure as shown below:
0: kd> dt _MMVAD
nt!_MMVAD
+0x000 Core : _MMVAD_SHORT
+0x040 u2 : <anonymous-tag>
+0x048 Subsection : Ptr64 _SUBSECTION
+0x050 FirstPrototypePte : Ptr64 _MMPTE
+0x058 LastContiguousPte : Ptr64 _MMPTE
+0x060 ViewLinks : _LIST_ENTRY
+0x070 VadsProcess : Ptr64 _EPROCESS
+0x078 u4 : <anonymous-tag>
+0x080 FileObject : Ptr64 _FILE_OBJECT
The Core field is an embedded structure called _MMVAD_SHORT, which contains the address range of the VAD in its StartingVpn and EndingVpn fields. The range is given as virtual page numbers, which are then used to calculate the start and end virtual addresses. This is what the Start and End columns of the !vad output refer to.
0: kd> dt _MMVAD_SHORT
nt!_MMVAD_SHORT
+0x000 NextVad : Ptr64 _MMVAD_SHORT
+0x008 ExtraCreateInfo : Ptr64 Void
+0x000 VadNode : _RTL_BALANCED_NODE
+0x018 StartingVpn : Uint4B
+0x01c EndingVpn : Uint4B
+0x020 StartingVpnHigh : UChar
+0x021 EndingVpnHigh : UChar
+0x022 CommitChargeHigh : UChar
+0x023 SpareNT64VadUChar : UChar
+0x024 ReferenceCount : Int4B
+0x028 PushLock : _EX_PUSH_LOCK
+0x030 u : <anonymous-tag>
+0x034 u1 : <anonymous-tag>
+0x038 EventList : Ptr64 _MI_VAD_EVENT_BLOCK
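The conversion from virtual page numbers to virtual addresses can be sketched as follows. The sketch assumes 4 KB pages and assumes that StartingVpnHigh and EndingVpnHigh hold the bits of the page number above the 32-bit StartingVpn and EndingVpn fields; the function names are hypothetical.

```python
PAGE_SHIFT = 12  # assumption: 4 KB pages


def full_vpn(low: int, high: int = 0) -> int:
    """Combine the 32-bit Vpn field with the assumed 8-bit VpnHigh field."""
    return (high << 32) | low


def vad_address_range(starting_vpn: int, ending_vpn: int,
                      starting_high: int = 0, ending_high: int = 0):
    """Return the first and last byte address covered by a VAD."""
    start = full_vpn(starting_vpn, starting_high) << PAGE_SHIFT
    # EndingVpn is inclusive, so the range runs to the last byte of that page.
    end = ((full_vpn(ending_vpn, ending_high) + 1) << PAGE_SHIFT) - 1
    return start, end


# First VAD in the !vad output above: Start 10, End 1f.
first_range = vad_address_range(0x10, 0x1f)
```

For the first VAD in the earlier output this gives the range 0x10000 to 0x1ffff, that is, sixteen pages.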
The VadNode field is the AVL tree node; ReferenceCount is the number of references to this VAD; and the PushLock field contains the push lock which controls write access to the VAD structure. Moreover, the union at offset 0x30 in the above structure provides the protection flags of the VAD. Using the example from earlier, we can see:
0: kd> dt _MMVAD_FLAGS ffff838108005280+0x30
nt!_MMVAD_FLAGS
+0x000 Lock : 0y0
+0x000 LockContended : 0y0
+0x000 DeleteInProgress : 0y0
+0x000 NoChange : 0y0
+0x000 VadType : 0y000
+0x000 Protection : 0y00100 (0x4)
+0x000 PreferredNode : 0y000000 (0)
+0x000 PageSize : 0y00
+0x000 PrivateMemory : 0y0
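This corresponds to PAGE_READWRITE, which means that the page can be both read from and written to. Note that the numbers shown here are the Memory Manager’s internal protection encodings rather than the numeric values of the Win32 PAGE_* constants of the same name. The most common protection values are listed below: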
- PAGE_READONLY (0x1)
- PAGE_EXECUTE_READ (0x3)
- PAGE_READWRITE (0x4)
- PAGE_EXECUTE_READWRITE (0x6)
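To make the bitfield concrete, here is a small sketch which unpacks a raw 32-bit flags value into the fields from the _MMVAD_FLAGS dump. The bit positions are an assumption: they are inferred from the order and widths of the fields as printed above, not from a Pos column, and may differ between Windows builds. The function name is hypothetical, and the protection names are just the four values listed above.

```python
# (name, width in bits), in the order the fields appear in the dump above.
# Assumption: fields are packed from bit 0 upwards in this order.
MMVAD_FLAGS_LAYOUT = [
    ("Lock", 1), ("LockContended", 1), ("DeleteInProgress", 1),
    ("NoChange", 1), ("VadType", 3), ("Protection", 5),
    ("PreferredNode", 6), ("PageSize", 2), ("PrivateMemory", 1),
]

# Only the protection values listed in the text.
PROTECTION_NAMES = {
    0x1: "PAGE_READONLY",
    0x3: "PAGE_EXECUTE_READ",
    0x4: "PAGE_READWRITE",
    0x6: "PAGE_EXECUTE_READWRITE",
}


def decode_vad_flags(value: int) -> dict:
    """Split a raw flags value into named fields."""
    fields = {}
    shift = 0
    for name, width in MMVAD_FLAGS_LAYOUT:
        fields[name] = (value >> shift) & ((1 << width) - 1)
        shift += width
    return fields


# Protection = 4 sits at bits 7..11 under the assumed layout, i.e. 4 << 7.
mapped_rw = decode_vad_flags(0x200)
private_rw = decode_vad_flags((1 << 20) | 0x200)
protection_name = PROTECTION_NAMES.get(mapped_rw["Protection"], "unknown")
```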
The PrivateMemory field indicates the allocation type associated with the address range: it can be either private to the given process or shared between multiple processes, which includes memory-mapped files. This brings us to the other purpose of VADs: to allow the management of memory-mapped files and shared memory.
Memory-mapped files enable a given process to access a file stored on the disk as if it were a region of memory by mapping the bytes of the file into the address space of that process. This in turn reduces disk I/O and simplifies access to the file by allowing the process to use pointers to reference parts of the file (file-mapped I/O). To manage the regions of memory owned by a process, the Memory Manager creates a VAD for the mapped address range when the MapViewOfFile function is called. As mentioned previously, physical memory and PTEs are only allocated once a page fault has occurred on an address within that range.
Moreover, it isn’t uncommon for mapped files to be shared between multiple processes, which enables those processes to access the same physical file. In Windows, shared mappings are represented by a structure known as _SECTION (the section object). Each process then creates a view of that mapping which, depending on the permissions, enables the process to read from or write to it.
When a process writes to this shared page of memory, the changes aren’t immediately flushed to the disk; instead, they are buffered and flushed when the view is closed or when a process calls FlushViewOfFile. It should also be noted that a mapped view does not always refer to a file on disk but can instead be backed by a portion of the page file. This distinction gives us page-file backed views and file-backed views. There is an important caveat to mention here: any changes made to a page-file backed view are automatically discarded once the underlying section object has been destroyed.
It’s important to mention that despite the same physical page being shared between multiple processes, each process’ view may be mapped at an entirely different virtual address range; therefore, you shouldn’t expect each view of the same section object to start and end at the same address. Furthermore, views must be independent of each other in that they aren’t able to overlap within an address space, thus the virtual address range chosen for a view by the MapViewOfFile function must be a free memory region.
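The idea of treating a file as a region of memory can be tried out from user mode with Python's standard mmap module, which on Windows is built on top of the file mapping API. The sketch below is only a usage illustration: the helper name patch_file_through_mapping and the temporary file are made up for the example.

```python
import mmap
import os
import tempfile


def patch_file_through_mapping(path: str, offset: int, data: bytes) -> None:
    """Overwrite part of a file by writing into a mapped view of it."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the view then behaves like a bytearray.
        with mmap.mmap(f.fileno(), 0) as view:
            view[offset:offset + len(data)] = data  # plain memory write
            view.flush()  # ask for the modified pages to be written back


# Create a small scratch file, patch it through the mapping, read it back.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello mapped world")
os.close(fd)

patch_file_through_mapping(path, 6, b"MAPPED")

with open(path, "rb") as f:
    result = f.read()
os.remove(path)
```

No explicit read or write call is issued against the file while it is mapped; the slice assignment is an ordinary memory access into the view.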
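A view which is not backed by a file on disk can be illustrated with the same mmap module by passing -1 instead of a file descriptor, which requests an anonymous mapping. As far as I am aware, on Windows this is implemented as a page-file backed mapping, but treat that as an assumption about the Python runtime rather than something shown in this post.

```python
import mmap

PAGE_SIZE = 4096  # assumption: one 4 KB page is enough for the demo

# -1 means "no file": the mapping is anonymous rather than file-backed.
view = mmap.mmap(-1, PAGE_SIZE)
view[:5] = b"hello"
written = bytes(view[:5])
view.close()  # once the mapping is gone, nothing is left on disk to reopen

# A fresh anonymous mapping does not see the old data; it starts zero-filled.
fresh = mmap.mmap(-1, PAGE_SIZE)
fresh_prefix = bytes(fresh[:5])
fresh.close()
```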
Let’s examine a section object:
0: kd> dt nt!_SECTION
+0x000 SectionNode : _RTL_BALANCED_NODE
+0x018 StartingVpn : Uint8B
+0x020 EndingVpn : Uint8B
+0x028 u1 : <anonymous-tag>
+0x030 SizeOfSection : Uint8B
+0x038 u : <anonymous-tag>
+0x03c InitialPageProtection : Pos 0, 12 Bits
+0x03c SessionId : Pos 12, 19 Bits
+0x03c NoValidationNeeded : Pos 31, 1 Bit
The SectionNode, StartingVpn and EndingVpn describe the address range which this mapping corresponds to. The SizeOfSection is the size of the mapping in bytes and the InitialPageProtection simply describes what access is permitted with that mapping much like what was described with VADs. The union at offset 0x38 is a special structure called _MMSECTION_FLAGS which describes particular features about the mapping.
0: kd> dt _MMSECTION_FLAGS
nt!_MMSECTION_FLAGS
+0x000 BeingDeleted : Pos 0, 1 Bit
+0x000 BeingCreated : Pos 1, 1 Bit
+0x000 BeingPurged : Pos 2, 1 Bit
+0x000 NoModifiedWriting : Pos 3, 1 Bit
+0x000 FailAllIo : Pos 4, 1 Bit
+0x000 Image : Pos 5, 1 Bit
+0x000 Based : Pos 6, 1 Bit
+0x000 File : Pos 7, 1 Bit
+0x000 AttemptingDelete : Pos 8, 1 Bit
+0x000 PrefetchCreated : Pos 9, 1 Bit
+0x000 PhysicalMemory : Pos 10, 1 Bit
+0x000 ImageControlAreaOnRemovableMedia : Pos 11, 1 Bit
+0x000 Reserve : Pos 12, 1 Bit
+0x000 Commit : Pos 13, 1 Bit
+0x000 NoChange : Pos 14, 1 Bit
+0x000 WasPurged : Pos 15, 1 Bit
+0x000 UserReference : Pos 16, 1 Bit
+0x000 GlobalMemory : Pos 17, 1 Bit
+0x000 DeleteOnClose : Pos 18, 1 Bit
+0x000 FilePointerNull : Pos 19, 1 Bit
+0x000 PreferredNode : Pos 20, 6 Bits
+0x000 GlobalOnlyPerSession : Pos 26, 1 Bit
+0x000 UserWritable : Pos 27, 1 Bit
+0x000 SystemVaAllocated : Pos 28, 1 Bit
+0x000 PreferredFsCompressionBoundary : Pos 29, 1 Bit
+0x000 UsingFileExtents : Pos 30, 1 Bit
+0x000 PageSize64K : Pos 31, 1 Bit
There are a few flags of interest here: Image, File and Based. The Based field indicates whether the section object has a direct correspondence with the offsets of the file stored on the disk. To clarify, if I accessed the file through its section object, then I should expect to find the same data at offset n in memory as I would at offset n of the file on the physical disk. This rule is followed by file views which are backed by data files.
The first two flags indicate whether a file-backed mapping is for an ordinary data file or an executable image (SEC_IMAGE is set when calling CreateFileMapping). This distinction is important because it introduces another structure which hasn’t been discussed yet: _SECTION_OBJECT_POINTERS. This structure is part of the _FILE_OBJECT structure and is used to indicate which type of control area is being used for the mapping.
0: kd> dt _SECTION_OBJECT_POINTERS
nt!_SECTION_OBJECT_POINTERS
+0x000 DataSectionObject : Ptr64 Void
+0x008 SharedCacheMap : Ptr64 Void
+0x010 ImageSectionObject : Ptr64 Void
The simplest method to find a set of control areas is to use the !ca debugger command, like so:
0: kd> !ca
Scanning large pool allocation table for tag 0x61436d4d (MmCa) (ffff808f91010000 : ffff808f91190000)
ffff808f8dd43000 0000000000000000 1182 0 Pagefile-backed section
ffff808fa55bc000 0000000000000000 0 0 Pagefile-backed section
Searching nonpaged pool (ffff808000000000 : ffff908000000000) for tag 0x61436d4d (MmCa)
[...]
Scanning large pool allocation table for tag 0x69436d4d (MmCi) (ffff808f91010000 : ffff808f91190000)
ffff808fb16f0d40 00007ffff1e00000 0 0 Image: \Windows\SystemApps\MicrosoftWindows.Client.CBS_cw5n1h2txyewy\InputApp.dll
ffff808fb3cebaf0 00007fffe21b0000 0 0 Image: \Riot Games\VALORANT\live\Engine\Binaries\ThirdParty\CEF3\Win64\libGLESv2.dll
ffff808fb0065b50 0000000072ca0000 0 0 Image: \Windows\SysWOW64\sppc.dll
ffff808facbd1200 fffff80358a40000 0 0 Image: \Windows\System32\drivers\rassstp.sys
ffff808facbd1580 fffff8034b950000 0 0 Image: \Windows\System32\drivers\ndproxy.sys
ffff808facbd1900 fffff80348f80000 0 0 Image: \Windows\System32\drivers\mrxsmb20.sys
ffff808fad4d7c10 00007ff821b90000 0 0 Image: \Windows\System32\dwmredir.dll
ffff808fa59990b0 00007ff817940000 0 0 Image: \Windows\System32\ProximityServicePal.dll
0: kd> !ca ffff808fb3cebaf0
ControlArea @ ffff808fb3cebaf0
Segment ffffbe84dc30add0 Flink ffff808fb9d79900 Blink ffff808fabbebcd0
Section Ref 0 Pfn Ref 122 Mapped Views 2
User Ref 2 WaitForDel 0 Flush Count bd68
File Object ffff808fb3b874e0 ModWriteCount 0 System Views 1da7
WritableRefs 40003d PartitionId 0
Flags (a0) Image File
\Riot Games\VALORANT\live\Engine\Binaries\ThirdParty\CEF3\Win64\libGLESv2.dll
Segment @ ffffbe84dc30add0
ControlArea ffff808fb3cebaf0 BasedAddress 00007fffe21b0000
Total Ptes 3cb
Segment Size 3cb000 Committed 0
Image Commit f Image Info ffffbe84dc30ae18
ProtoPtes ffffbe84db2df000
Flags (8e0000) ProtectionMask
Subsection 1 @ ffff808fb3cebb70
ControlArea ffff808fb3cebaf0 Starting Sector 0 Number Of Sectors 2
Base Pte ffffbe84db2df000 Ptes In Subsect 1 Unused Ptes 0
Flags 2 Sector Offset 0 Protection 1
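The Flags (a0) value in the control area output above can be decoded by hand using the bit positions from the _MMSECTION_FLAGS dump. The sketch below does that, and also checks that the Segment Size reported by !ca matches Total Ptes multiplied by an assumed 4 KB page size. The function name is hypothetical.

```python
# Single-bit flags at positions 0..19, in the order given by the dump.
LOW_FLAGS = [
    "BeingDeleted", "BeingCreated", "BeingPurged", "NoModifiedWriting",
    "FailAllIo", "Image", "Based", "File", "AttemptingDelete",
    "PrefetchCreated", "PhysicalMemory", "ImageControlAreaOnRemovableMedia",
    "Reserve", "Commit", "NoChange", "WasPurged", "UserReference",
    "GlobalMemory", "DeleteOnClose", "FilePointerNull",
]
# Single-bit flags at positions 26..31 (PreferredNode occupies bits 20..25).
HIGH_FLAGS = [
    "GlobalOnlyPerSession", "UserWritable", "SystemVaAllocated",
    "PreferredFsCompressionBoundary", "UsingFileExtents", "PageSize64K",
]


def decode_section_flags(value: int):
    """Return (names of set single-bit flags, PreferredNode value)."""
    names = [n for bit, n in enumerate(LOW_FLAGS) if (value >> bit) & 1]
    names += [n for bit, n in enumerate(HIGH_FLAGS, start=26)
              if (value >> bit) & 1]
    preferred_node = (value >> 20) & 0x3f
    return names, preferred_node


flag_names, preferred_node = decode_section_flags(0xa0)

# Total Ptes 3cb and Segment Size 3cb000 from the same output.
PAGE_SIZE = 0x1000  # assumption: 4 KB pages
segment_size = 0x3cb * PAGE_SIZE
```

Bits 5 and 7 are set in 0xa0, which matches the "Image File" text printed by the debugger.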
By examining the _CONTROL_AREA structure we can see a few other important structures which we haven’t covered thus far. One of these is the _MAPPED_FILE_SEGMENT structure (page-file backed sections use _SEGMENT), which is allocated from paged pool and is used to keep track of the prototype PTEs which describe this mapping. These can be found as an array through the ProtoPtes field. So, what are prototype PTEs and why are they necessary?
Since multiple processes can be sharing the same page, it’s possible for the state of that page in the working set of each process to differ (we’ll cover the working set in the next post of this series), which would otherwise cause a plethora of problems. For example, if a single PTE were to be used amongst multiple processes and the page were removed from one working set, then it would be removed from the working sets of all the processes which are using that page, thus invalidating the page in physical memory as well.
A prototype PTE is very much the same as a usual hardware PTE, with one key difference: a prototype PTE is not used directly in address translation. Instead, it lives in the segment and holds the actual state of the shared page, and while the page is not valid in a given process, that process’ own PTE takes the format shown below and points to the prototype PTE through its ProtoAddress field.
0: kd> dt _MMPTE_PROTOTYPE
nt!_MMPTE_PROTOTYPE
+0x000 Valid : Pos 0, 1 Bit
+0x000 DemandFillProto : Pos 1, 1 Bit
+0x000 HiberVerifyConverted : Pos 2, 1 Bit
+0x000 ReadOnly : Pos 3, 1 Bit
+0x000 SwizzleBit : Pos 4, 1 Bit
+0x000 Protection : Pos 5, 5 Bits
+0x000 Prototype : Pos 10, 1 Bit
+0x000 Combined : Pos 11, 1 Bit
+0x000 Unused1 : Pos 12, 4 Bits
+0x000 ProtoAddress : Pos 16, 48 Bits
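The bit positions in the dump above are enough to pull the fields out of a raw 64-bit value. The sketch below does so for a hand-built example value; the function name is hypothetical, and sign-extending the 48-bit ProtoAddress to a canonical 64-bit kernel address is my assumption about how the field is meant to be read. The address used is the ProtoPtes value from the earlier !ca output.

```python
def decode_prototype_format(pte: int) -> dict:
    """Extract selected _MMPTE_PROTOTYPE fields from a raw 64-bit value."""
    proto = (pte >> 16) & ((1 << 48) - 1)   # ProtoAddress: Pos 16, 48 bits
    if proto & (1 << 47):
        proto |= 0xffff << 48               # assumed sign extension
    return {
        "Valid": pte & 1,                   # Pos 0, 1 bit
        "Protection": (pte >> 5) & 0x1f,    # Pos 5, 5 bits
        "Prototype": (pte >> 10) & 1,       # Pos 10, 1 bit
        "ProtoAddress": proto,
    }


# Hand-built value: not valid, Protection = 4, Prototype bit set, and the
# low 48 bits of ffffbe84db2df000 placed in the ProtoAddress field.
raw = (0xbe84db2df000 << 16) | (1 << 10) | (4 << 5)
decoded = decode_prototype_format(raw)
```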
You may have noticed that each control area consists of a number of _SUBSECTION structures. There is typically only one subsection for page-file backed sections and for file-backed sections which are data files. If the file were to grow beyond the initial subsection size, additional subsections are automatically allocated to that section object in order to accommodate this. On the other hand, with executable images, a subsection is created for each section of the PE file.
0: kd> dt _SUBSECTION
nt!_SUBSECTION
+0x000 ControlArea : Ptr64 _CONTROL_AREA
+0x008 SubsectionBase : Ptr64 _MMPTE
+0x010 NextSubsection : Ptr64 _SUBSECTION
+0x018 GlobalPerSessionHead : _RTL_AVL_TREE
+0x018 CreationWaitList : Ptr64 _MI_CONTROL_AREA_WAIT_BLOCK
+0x018 SessionDriverProtos : Ptr64 _MI_PER_SESSION_PROTOS
+0x020 u : <anonymous-tag>
+0x024 StartingSector : Uint4B
+0x028 NumberOfFullSectors : Uint4B
+0x02c PtesInSubsection : Uint4B
+0x030 u1 : <anonymous-tag>
+0x034 UnusedPtes : Pos 0, 30 Bits
+0x034 ExtentQueryNeeded : Pos 30, 1 Bit
+0x034 DirtyPages : Pos 31, 1 Bit
The ControlArea field merely points back to the control area which this subsection belongs to; the NextSubsection field points to the next sibling subsection; and SubsectionBase points to the first prototype PTE for the subsection, which is technically an _MMPTE_SUBSECTION, although there is no real difference between that and _MMPTE_PROTOTYPE. The DirtyPages flag indicates whether the subsection contains any data which needs to be discarded or flushed back to the disk.
Now that we’ve covered what subsections are, you may have noticed that the VAD structure contains a pointer to a subsection structure too. The Subsection field in _MMVAD points to the first subsection which the VAD range corresponds to, if the VAD describes a memory-mapped file.
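To show how the pieces fit together, the following sketch walks a chain of subsections to find the address of the prototype PTE which describes a given page of a mapping. It is a deliberately simplified model: the Subsection class and the function are hypothetical, an 8-byte PTE is assumed, and details such as unused PTEs are ignored. The first subsection uses the Base Pte and Ptes In Subsect values from the !ca output; the second one is invented.

```python
from dataclasses import dataclass
from typing import Optional

PTE_SIZE = 8  # assumption: 64-bit PTEs


@dataclass
class Subsection:
    """Hypothetical model of the fields used from _SUBSECTION."""
    subsection_base: int        # address of the first prototype PTE
    ptes_in_subsection: int
    next_subsection: Optional["Subsection"] = None


def find_prototype_pte(first: Optional[Subsection],
                       page_index: int) -> Optional[int]:
    """Return the prototype PTE address for the page_index-th page."""
    sub = first
    while sub is not None:
        if page_index < sub.ptes_in_subsection:
            return sub.subsection_base + page_index * PTE_SIZE
        page_index -= sub.ptes_in_subsection  # skip this subsection's pages
        sub = sub.next_subsection
    return None  # the index lies beyond the last subsection


# Subsection 1 from the !ca output, followed by a made-up second subsection.
second = Subsection(0xffffbe84db2df008, 0x10)
first = Subsection(0xffffbe84db2df000, 1, next_subsection=second)

page0 = find_prototype_pte(first, 0)
page3 = find_prototype_pte(first, 3)
```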
In the next post of this series, we’ll start looking at working sets and page states.
References:
https://flylib.com/books/en/3.169.1.64/1
https://www.codemachine.com/articles/prototype_ptes.html
https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/section-objects-and-views
What Makes It Page? The Windows 7 (x64) Virtual Memory Manager