The TLB (Translation Lookaside Buffer) Cache is a key part of making Virtual to Physical Address Translation perform well. Its main purpose is to cache recent translations so that they do not have to be looked up again. All modern CPUs and their MMUs (Memory Management Units) support the use of the TLB.
An important aspect to understand is the difference between a TLB Hit and a TLB Miss. When a Virtual Address is accessed, the TLB Cache is checked first to see if the Virtual-Physical Address mapping is present; if it is, a TLB Hit occurs. On the other hand, if the mapping isn’t present, a TLB Miss occurs and the MMU is forced to perform a Page Walk, which is the process of looking through the Page Tables as discussed in my previous blog posts. Once the Page Walk has completed and the physical address has been found, the mapping is loaded into the TLB Cache.
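To make the hit and miss paths a little more concrete, here is a minimal user-mode sketch in C. It is only a toy model: the direct-mapped layout, the 16 entry size and the page_walk stand-in (which simply adds 0x100 to the page number) are my own assumptions for illustration, and real hardware TLBs are organised quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12   /* 4 KB pages */
#define TLB_ENTRIES 16   /* toy size, chosen only for this sketch */

typedef struct {
    bool     valid;
    uint64_t vpn;        /* virtual page number (the lookup key) */
    uint64_t pfn;        /* physical frame number (the cached result) */
} TlbEntry;

typedef struct {
    TlbEntry entries[TLB_ENTRIES];
    unsigned hits;
    unsigned misses;
} Tlb;

/* Hypothetical stand-in for the Page Walk: a fixed VPN -> PFN rule. */
static uint64_t page_walk(uint64_t vpn)
{
    return vpn + 0x100;
}

/* Translate a virtual address, checking the TLB before walking. */
static uint64_t translate(Tlb *tlb, uint64_t vaddr)
{
    assert(tlb != NULL);

    uint64_t  vpn    = vaddr >> PAGE_SHIFT;
    uint64_t  offset = vaddr & ((1u << PAGE_SHIFT) - 1);
    TlbEntry *e      = &tlb->entries[vpn % TLB_ENTRIES]; /* direct-mapped slot */

    if (e->valid && e->vpn == vpn) {
        tlb->hits++;                 /* TLB Hit: mapping already cached */
    } else {
        tlb->misses++;               /* TLB Miss: fall back to the Page Walk */
        e->valid = true;
        e->vpn   = vpn;
        e->pfn   = page_walk(vpn);   /* load the result into the TLB */
    }
    return (e->pfn << PAGE_SHIFT) | offset;
}
```

Translating two addresses on the same page should count one miss followed by one hit, since the second lookup finds the mapping that the first one loaded.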
If the Page Walk is unsuccessful, then a Page Fault is raised and, if the access is valid, the Virtual to Physical Address Mapping is created in the Page Table. Generally, any changes to the Page Tables and the Paging Structures will result in the TLB being flushed, with an Inter Processor Interrupt (IPI) used to tell the other processors to flush theirs. This is one of the reasons why you will tend to notice TLB flushing and IPIs with Stop 0x101 bugchecks.
The TLB Cache can be flushed by reloading CR3 (the Page Directory Base Register). There is also an easier method, which I will explain too.
Here is a small segment of Assembly code from OSDev Wiki:
However, if the G flag has been set for a PTE or PDE, then that entry will not be flushed from the TLB Cache.
The other method would be to use the invlpg (Invalidate TLB Entry) instruction. This is a privileged instruction, and therefore the CPL (Current Privilege Level) must be 0. The instruction invalidates the entry for a specific page, and is therefore much better suited if you only wish to flush a certain entry. In some circumstances it may flush multiple entries or even the entire TLB, but it is guaranteed to flush the entries for that page which are associated with the current PCID (Process Context Identifier). See Volume 3, Section 4.10 of the Intel Developer's Manual.
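The difference between the two flushing methods can be modelled in a few lines of user-mode C. This is only a sketch of the behaviour described above, not real kernel code: the TlbEntry layout and the function names flush_non_global and invalidate_page are made up for the illustration. The first function mimics a CR3 reload, which leaves entries with the G flag alone, and the second mimics invlpg for a single page, which removes the entry for that page even if it is global.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     valid;
    bool     global;   /* models the G flag of the PTE or PDE */
    uint64_t vpn;      /* virtual page number */
    uint64_t pfn;      /* physical frame number */
} TlbEntry;

/* Models a CR3 reload: every entry is dropped except global ones. */
static void flush_non_global(TlbEntry *tlb, size_t count)
{
    assert(tlb != NULL);
    for (size_t i = 0; i < count; i++) {
        if (!tlb[i].global) {
            tlb[i].valid = false;
        }
    }
}

/* Models invlpg: only the entry for one page is dropped,
   whether or not that entry is global. */
static void invalidate_page(TlbEntry *tlb, size_t count, uint64_t vpn)
{
    assert(tlb != NULL);
    for (size_t i = 0; i < count; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            tlb[i].valid = false;
        }
    }
}
```

The model leaves out PCIDs altogether, and it never flushes more than the one page, even though the real instruction is allowed to.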
You can check if PCIDs are enabled by checking bit 17 (PCIDE) of the CR4 register.
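Separately from the debugger, the bit test itself is simple enough to sketch in C. The helper names bit_is_set and pcid_enabled are my own, and the function only tests a CR4 value which you pass in (for example one copied from a register dump), since reading CR4 directly is only possible from kernel mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_PCIDE_BIT 17u   /* CR4.PCIDE, counting from bit 0 */

/* Returns true if the given bit (0..63) is set in value. */
static bool bit_is_set(uint64_t value, unsigned bit)
{
    assert(bit < 64);       /* shifting by 64 or more is undefined in C */
    return ((value >> bit) & 1u) != 0;
}

/* Returns true if a saved CR4 value has PCIDs enabled. */
static bool pcid_enabled(uint64_t cr4)
{
    return bit_is_set(cr4, CR4_PCIDE_BIT);
}
```

Bit 17 corresponds to the mask 0x20000, so any CR4 value with that bit clear reports PCIDs as disabled.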
The debugger example above is from an AMD CPU, and I don’t think AMD supports PCIDs yet. We could also use the j command together with the .echo command, as seen below:
Getting back on topic to the TLB Cache, each entry is associated with a tag. The tag contains important information such as part of the virtual address, the physical page number, protection bits, a valid bit and a dirty bit. When a Virtual Address is being checked, it is matched against the tags within the TLB Cache. The 8 bit ASID (Address Space Identifier) is used to form part of the tag. The ASID part is matched between the TLB Entry and the Virtual Address (PTE) being looked up.
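The tag comparison can be sketched in C as well. The TaggedTlbEntry layout and the tag_matches function are hypothetical names for this illustration; the only things taken from the text are the valid bit, the virtual page part of the tag and the 8 bit ASID. I have also assumed that an entry with the G flag matches regardless of ASID, which is the usual behaviour for global pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     valid;
    bool     global;  /* G flag: assumed to match in every address space */
    uint8_t  asid;    /* 8 bit Address Space Identifier */
    uint64_t vpn;     /* virtual page number part of the tag */
    uint64_t pfn;     /* physical page number */
} TaggedTlbEntry;

/* Returns true if the entry may be used for this lookup. */
static bool tag_matches(const TaggedTlbEntry *e, uint64_t vpn,
                        uint8_t current_asid)
{
    assert(e != NULL);
    if (!e->valid || e->vpn != vpn) {
        return false;            /* wrong page, or entry not in use */
    }
    return e->global || e->asid == current_asid;
}
```

With a check like this, an entry left behind by another address space simply fails to match instead of returning a stale translation.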
Context Switches and Task Switches can invalidate TLB Entries, since the mappings will be different.
Look Aside Lists
The second part of my blog post concerns Look Aside Lists. Look Aside Lists are a type of pool allocation algorithm; the difference is that Look Aside Lists hold fixed-size blocks and do not use spinlocks. They are also based around Singly Linked Lists, used in LIFO order.
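The LIFO behaviour of a singly linked list of free blocks can be shown with a short C sketch. The names FreeList, fl_push and fl_pop are made up for this example, and the code is single threaded; the real lists avoid spinlocks by using interlocked operations, which this sketch does not attempt to model.

```c
#include <assert.h>
#include <stddef.h>

/* A free block doubles as a list node while it sits on the list. */
typedef struct Entry {
    struct Entry *next;
} Entry;

typedef struct {
    Entry   *head;    /* most recently freed block */
    unsigned count;   /* number of blocks on the list */
} FreeList;

/* Put a block on the front of the list. */
static void fl_push(FreeList *list, void *block)
{
    assert(list != NULL && block != NULL);
    Entry *e = block;
    e->next = list->head;
    list->head = e;
    list->count++;
}

/* Take the most recently pushed block, or NULL if the list is empty. */
static void *fl_pop(FreeList *list)
{
    assert(list != NULL);
    Entry *e = list->head;
    if (e == NULL) {
        return NULL;
    }
    list->head = e->next;
    list->count--;
    return e;
}
```

Because blocks are pushed and popped at the head, the block freed last is the one handed out next, which is what LIFO order means here.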
Device Drivers and parts of the operating system (I/O Manager, Cache Manager and Object Manager) can build their own Look Aside Lists. The Executive versions of the Look Aside Lists are managed per processor (see _KPRCB), and these Look Aside Lists are managed by the operating system. Each Look Aside List can be allocated from either Paged Pool or Non-Paged Pool. The operating system will increase the number of entries a Look Aside List may hold if demand is high, and decrease it if demand is low.
Furthermore, Look Aside Lists are adjusted automatically by the Balance Set Manager every second through a call to ExAdjustLookasideDepth (reserved for System Use). This only happens if no allocations have happened recently. The Look Aside List depth will never drop below 4.
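The idea of growing and shrinking the depth with demand can be sketched as follows. Please treat this purely as an illustration: the real rule inside ExAdjustLookasideDepth is not covered in this post, so the LookasideStats structure, the 5% miss threshold and the step size of one are all placeholders of mine. The only value taken from the text is the floor of 4.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_DEPTH 4u   /* the depth never drops below 4 */

typedef struct {
    unsigned depth;      /* current depth of the list */
    unsigned max_depth;  /* hypothetical upper limit */
    unsigned allocs;     /* allocations since the last adjustment */
    unsigned misses;     /* allocations the list could not satisfy */
} LookasideStats;

/* Periodic adjustment: grow on a high miss rate, otherwise shrink. */
static void adjust_depth(LookasideStats *s)
{
    assert(s != NULL && s->max_depth >= MIN_DEPTH);

    /* Placeholder rule: more than 5% misses counts as high demand. */
    if (s->allocs != 0 && s->misses * 100u > s->allocs * 5u) {
        if (s->depth < s->max_depth) {
            s->depth++;
        }
    } else if (s->depth > MIN_DEPTH) {
        s->depth--;      /* idle or well served: give a block back */
    }

    /* Start a fresh measurement interval. */
    s->allocs = 0;
    s->misses = 0;
}
```

Calling adjust_depth repeatedly on an idle list walks the depth down one step at a time until it reaches 4 and then leaves it there.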
The main purpose of Look Aside Lists is to serve a device driver which is going to be using pool blocks of a specific size frequently. Each allocation is known as an entry. Depending on the pool allocation type, the data structure used will be either _PAGED_LOOKASIDE_LIST or _NPAGED_LOOKASIDE_LIST.
Firstly, a Look Aside List is created by calling ExInitializeLookasideListEx. Just a side note before I continue: all of the functions mentioned should be fully documented in the WDK. This function initializes a data structure called _LOOKASIDE_LIST_EX.
The data structure wraps a larger data structure called _GENERAL_LOOKASIDE_POOL (embedded as its only member, rather than referenced through a pointer), which retains the information about the current Look Aside List.
The type of pool used for the Look Aside List depends on whether the driver needs to access the entries at IRQL Level 2 (DISPATCH_LEVEL) or only up to IRQL Level 1 (APC_LEVEL). Non-Paged Pool must be used if the driver is going to use the entries at IRQL Level 2, and Paged Pool can be used if the driver is not going to access the entries at any level above IRQL Level 1.
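The rule boils down to a single comparison, which can be written as a small C helper. The enum values and the choose_pool function are hypothetical names used only for this sketch and are not WDK definitions. The reasoning behind the rule is that a page fault cannot be serviced at IRQL Level 2, so anything touched at that level has to stay resident.

```c
#include <assert.h>

/* IRQL values named in the text; only these three are modelled. */
enum { IRQL_PASSIVE = 0, IRQL_APC = 1, IRQL_DISPATCH = 2 };

/* Stand-ins for the two pool types, not the real POOL_TYPE enum. */
typedef enum { MODEL_PAGED_POOL, MODEL_NONPAGED_POOL } ModelPoolType;

/* Pick a pool from the highest IRQL at which entries are touched. */
static ModelPoolType choose_pool(int highest_irql)
{
    assert(highest_irql >= IRQL_PASSIVE);
    return (highest_irql >= IRQL_DISPATCH) ? MODEL_NONPAGED_POOL
                                           : MODEL_PAGED_POOL;
}
```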
The Allocate and Free fields are function pointers to the functions which you wish to use to allocate and free the entries in the list. The TotalAllocates and TotalFrees fields show the total number of allocations and frees.
The Depth field contains the maximum number of entries the list will currently hold; the number of entries actually on the list is kept in the list head.
The SingleListHead.Next field is used to point to the next free pool chunk.
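To show how these fields work together, here is a small user-mode model in C. It is a sketch under several assumptions and not the kernel's implementation: LookasideModel, la_init, la_alloc and la_free are invented names, malloc and free stand in for the pool routines, Depth is treated as the cap on cached blocks, and the Count, AllocateMisses and FreeMisses counters are included so that the hit rate can be worked out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Node {
    struct Node *Next;
} Node;

typedef struct {
    Node     SingleListHead;   /* .Next points to the next free block */
    unsigned Count;            /* blocks currently cached (model only) */
    unsigned Depth;            /* cap on cached blocks (assumed meaning) */
    size_t   Size;             /* fixed size of every block */
    unsigned TotalAllocates;
    unsigned AllocateMisses;   /* allocations which had to call Allocate */
    unsigned TotalFrees;
    unsigned FreeMisses;       /* frees which had to call Free */
    void  *(*Allocate)(size_t size);
    void   (*Free)(void *block);
} LookasideModel;

static void la_init(LookasideModel *l, size_t size, unsigned depth)
{
    assert(l != NULL && size >= sizeof(Node));
    l->SingleListHead.Next = NULL;
    l->Count = 0;
    l->Depth = depth;
    l->Size = size;
    l->TotalAllocates = l->AllocateMisses = 0;
    l->TotalFrees = l->FreeMisses = 0;
    l->Allocate = malloc;      /* stand-in for a pool allocation routine */
    l->Free = free;            /* stand-in for a pool free routine */
}

/* Take a cached block if there is one, otherwise call Allocate. */
static void *la_alloc(LookasideModel *l)
{
    Node *n = l->SingleListHead.Next;
    l->TotalAllocates++;
    if (n != NULL) {
        l->SingleListHead.Next = n->Next;
        l->Count--;
        return n;
    }
    l->AllocateMisses++;
    return l->Allocate(l->Size);
}

/* Cache the block if there is room, otherwise call Free. */
static void la_free(LookasideModel *l, void *block)
{
    l->TotalFrees++;
    if (l->Count < l->Depth) {
        Node *n = block;
        n->Next = l->SingleListHead.Next;
        l->SingleListHead.Next = n;
        l->Count++;
    } else {
        l->FreeMisses++;
        l->Free(block);
    }
}
```

In this model the hit rate is simply (TotalAllocates - AllocateMisses) divided by TotalAllocates, so a list which keeps missing is a sign that its depth is too small for the demand.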
We can use the !lookaside extension to see the efficiency of, and other information about, the system Look Aside Lists.