Some Notes on the Goldbach Conjecture

The Goldbach Conjecture is a very simple conjecture to understand providing you have some basic knowledge about primes. It is one of those conjectures which is very easy to comprehend, but fiendishly difficult to solve. It was first conceived during a letter written by Goldbach to Euler in 1742. There is currently a strong and weak version of the Goldbach Conjecture, with the weak conjecture actually being the original conjecture.

goldbach_conjectures

xkcd: Goldbach Conjectures

The original conjecture asks to prove that all positive integers greater than 2 are equivalent to the sum of three primes, the only problem with the original conjecture, is that Goldbach considered the unit “1” to be prime, which has now been dismissed in Mathematics. However, the conjecture has been restated by Euler to the following modern equivalent, which is known as the strong Goldbach Conjecture (sometimes referred to as the binary Goldbach Conjecture):

Strong Goldbach Conjecture: Every positive even integer greater than or equal to 4, can be expressed as the sum of two primes.

The conjecture can also be expressed using the totient function:

\phi (p) + \phi (q) = 2m,

where m is a positive integer and p and q are both prime numbers. The totient function can be defined as the following (p is prime):

\phi(p) = p - 1

The weaker version of the Goldbach conjecture asks a slightly different question, and is as follows:

Weak Goldbach Conjecture: Every odd integer greater than or equal to 9, can be expressed as the sum of three odd primes.

Similarly, Levy’s Conjecture claims that all odd integers greater than or equal to 7, can be expressed the sum of prime and twice a prime.

p + 2q = n, where n is a positive integer greater than or equal to 7, and p and q are both primes.

Goldbach Number:

Goldbach numbers are any positive even integers which are the sum of two odd primes.

Goldbach Partition:

The Goldbach Partition is the number of ways in which a positive even integer n, can be written as the sum of two primes. The function which produces these partitions is called the Goldbach Partition function G(n). It is defined in the following manner:

G(n) = #{(p, q) | n = p + q, p \le q}

 Methods for proving the Goldbach Conjecture:

The two most common methods which have been used for solving the Goldbach Conjecture involve the Hardy-Littlewood Method along with a number of different sieving methods which have been used to produce greater asymptotic bounds on the distribution of which natural numbers can be written as Goldbach Partitions.

References:

Goldbach Conjecture – Wolfram MathWorld

Goldbach Partition – Wolfram MathWorld

Goldbach Number – Wolfram MathWorld

On Partitions of Goldbach’s Conjecture (Woon)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Posted in Uncategorized | 1 Comment

Project Euler: Problem 29

Problem 29 is actually a very easy problem to solve, but I will not provide the direct code, however, instead will provide the answer to the problem along with some insight into how to solve the problem using Java or a similar programming language.

The solution for the problem is 9183.

There are three key constructs which you will need to include in your code, and these are:

  • Nested For-loop
  • Power function
  • A collection to hold the sequence of numbers generated from the power function.

If you use the above programming constructs within your problem, then you’ll the solution to the problem almost instantly. A important point to remember is to consider the size of the numbers which will be generated, and which data types will be able to hold such large numbers (integers won’t work!).

 

 

 

Posted in Uncategorized | Leave a comment

Understanding Atom Tables

Atom Tables have been a structure on Windows which I wanted to investigate for a while, but only have managed to find the time to write about now.

Atom Tables enable strings to be encoded by with a 16-bit integer (an atom) which can then be shared among different processes. The encoded strings stored within the table are then referred to as atom names. There is two forms of Atom Table: Global Atom Table defined for all processes on the system and Local Atom Table which is defined privately for process which created the table.

Following from this, there are two types of Atom; String Atoms and Integer Atoms. The type which I had mentioned in the first paragraph refers to String Atoms.

For a String Atom, a string is passed to GlobalAddAtom or AddAtom, either function then returns a 16-bit integer which can be used to access the string stored within the atom table. We can use the GlobalFindAtom or FindAtom functions to do so.

Properties of String Atoms:

  • The values returned are within the range of 0xC000 to 0xFFFF.
  • The string can be no longer than 255 bytes in size.
  • The reference count associated with each atom name, is incremented when the string is added to the table, and decremented when a string is removed from the table. The atom is only removed from the atom table when the atom name reference count has decremented to 0.

Properties of Integer Atoms:

  • The values given are within the range of 0x0001 to 0xBFFF.
  • There is no reference count or storage limitations associated.

The Atom Table has the following structure:

0: kd> dt _RTL_ATOM_TABLE
nt!_RTL_ATOM_TABLE
+0x000 Signature : Uint4B
+0x004 ReferenceCount : Int4B
+0x008 PushLock : _EX_PUSH_LOCK
+0x010 ExHandleTable : Ptr64 _HANDLE_TABLE
+0x018 Flags : Uint4B
+0x01c NumberOfBuckets : Uint4B
+0x020 Buckets : [1] Ptr64 _RTL_ATOM_TABLE_ENTRY

Each entry within the atom table is then represented with the following structure:

0: kd> dt _RTL_ATOM_TABLE_ENTRY
nt!_RTL_ATOM_TABLE_ENTRY
   +0x000 HashLink         : Ptr64 _RTL_ATOM_TABLE_ENTRY
   +0x008 HandleIndex      : Uint2B
   +0x00a Atom             : Uint2B
   +0x010 Reference        : _RTL_ATOM_TABLE_REFERENCE
   +0x028 NameLength       : UChar
   +0x02a Name             : [1] Wchar

You can dump a Local Atom Table with the !atom extension, and dump a Global Atom Table with the !gatom extension. Here is an example of !gatom:

6: kd> !gatom

Global atom table c00c( 1) = Protocols (18) pinned
c031( 1) = Button (12)
c042( 1) = msctls_updown32 (30)
c00d( 1) = Topics (12) pinned
c00e( 1) = Formats (14) pinned
c024( 1) = TaskbandHWND (24)
c04b( 1) = SysIPAddress32 (28)
c023(47) = OleEndPointID (26)
c04a( 1) = RichEdit20W (22)
c02b( 1) = _IDH_ALTTAB_STICKY_PREV (46)
c032( 1) = Static (12)
c02d( 1) = _IDH_WINALTTAB_PREV (38)
c045( 1) = SysTreeView32 (26)
c018( 1) = UxSubclassInfo (28) pinned
c03c( 1) = msctls_hotkey32 (30)
c044( 1) = tooltips_class32 (32)
c046( 1) = SysMonthCal32 (26)
c047( 1) = SysDateTimePick32 (34)
c007( 1) = StdShowItem (22) pinned
c053( 1) = ClipboardDataObjectInterface (56)
c011( 1) = True (8) pinned
c03f( 1) = SysListView32 (26)
c04c( 1) = UIAE{46a56456-04b6-4a08-a2bd-9a764fc58246} (84)
c010( 1) = EditEnvItems (24) pinned
c01c( 1) = PROGMAN (14)
c02a( 1) = _IDH_ALTTAB_STICKY_NEXT (46)
c04e( 1) = MozillaPluginWindowPropertyAssociation (76)
c012( 1) = False (10) pinned
c02c( 1) = _IDH_WINALTTAB_NEXT (38)
c03e( 1) = msctls_trackbar32 (34)
c015( 1) = Close (10) pinned
c021( 4) = AllowConsentToStealFocus (48)
c038( 1) = ToolbarWindow32 (30)
c04d( 2) = MSAA_*FCFFFFFF00000000 (44)
[...]

The first column is the entry within the atom table, the number represents the unique 16-bit decimal representation of the string assigned to that atom; I believe the second column is the the number entries within that bucket and the last column is the number of references. Atom Tables are based upon the Hash Table data structure.

References:

Posted in Computer Science, WinDbg, Windows Internals | Leave a comment

Learning the Importance of Mathematics

Tonight, I thought I would write about how I came to become so interested in Mathematics; and the importance of Mathematics in scienfic disciplines; and more importantly the importance of Mathematics in our everyday life.

Mathematics, in my opinion, is a universal language which can be understood by everyone providing they know how to write and read Mathematics; it is very much like how English has become the universal language within Business and Latin has become the universal language when classifying new species; Mathematics is the fundamental language of scienfic disciplines and the language which we speak everyday.

I would not consider myself to be a Mathematician of any sorts, since quite honestly, I do not like every area of Maths. I loathe Differential Equations and Calculus. However, the main areas of interest to me are the following: Graph Theory, Number Theory, Set Theory/Logic and Abstract Algebra. There is something about these areas in which I have a particular interest with.

Alan Turing, one of my heroes and favorite Mathematicians.

Alan Turing, one of my heroes and favorite Mathematicians.

I never really appreciated or learned Maths until after leaving secondary school at the age of 15. The culture of the school at the time, was that Maths is one of those subjects which is useless and abstract and a pass in the subject would be suffice. Ironically, I had a deep interest in the Natural Sciences (Biology, Chemistry and Physics), achieving two A*s and an A. I only bothered to pass my Maths class since I adopted the culture which was a popular at my school, a serious mistake which I regret and wish I could reverse.

I believe Mathematics is poorly taught at schools – it certainly was at my school – and the importance of Mathematics isn’t fully expressed by teachers, ask any person who doesn’t have an interest in Maths, and I guarantee that if you asked them, what do you think Maths is about? They will immediately answer with either Numbers or Equations. This is not what Mathematics is about, it is about rigorous and rational thinking; an application of intuition and abstract though to find solutions to complex puzzles which provide important applications to our understanding of our beautiful and weird Universe.

Once we overcome the stereotypical perception that Maths can only be exploited by intellectuals and academics, we will finally begin to overcome the bottleneck created by ourselves, to enable everyone learn, appreciate and apply Mathematics to their everyday life. Mathematics can be learned and appreciated by anyone, it is finding that dedication, enthusiasm and motivation to begin exploring a very interesting subject!

 

 

Posted in Mathematics | Leave a comment

The Fundamental Laws of Logic

The Fundamental Laws of Logic or the Classic Laws of Thought may be redundant in some respects; the introduction of propositional logic and fuzzy logic has meant that some of the axioms (some disrupted to be axioms) these laws are not always applicable when discussing situations regarding logic like the construction of mathematical proofs. The three laws are still used in most systems of logic, and consist of the following:

Law of Contradiction:

Let P be any proposition, then the law of contradiction states that:

\forall P, \sim (P \wedge \sim P)

The above statement states that it is impossible for both P and the negation of P to both be true at the same time. For example, an even number can’t also be odd (or non-even), it is either even or odd.

Law of Excluded Middle:

Let P be any proposition, then the law of excluded middle states that:

P \vee \sim P

The mentioned statement asserts that the proposition P is either true or it is false, it is not possible for both P and the negation of P to be both true.

Principle of Identity:

Let X be some proposition or variable, then the principle of identity asserts that:

\forall X, X = X

The principle of identity implies that for all instances of the object named X, each instance is exactly the same as each other.

Extensions to the Fundamental Laws of Logic

There have been some additions by Logicians to the traditional laws of thought, Leibniz added two of these laws, which are as follows:

Principle of Sufficient Reason:

The principle of sufficient reason is simple and self explanatory within the name; it states that nothing is without reason, and thus every event has a cause. This forms the basis of our rational thinking and of scientific method.

In some aspects, the principle of sufficient reason can be used to contest the notion of free will; the paradox of free will indicates that while we are given free will, we are not truly free since God created us and the rules which are used to control the game known as life.

The principle additionally shows us that for something to not exist, then there must be a reason for that object’s non existence; the principle also has given rise to the philosophical concept of Necessitarianism.

Identity of Indiscernibles:

This is slightly more interesting than Leibniz’s other principle, it implies that no two distinct objects are exactly identical, even though that it may appear to be that way. This statement can be summarised in the following formula:

\forall P (Px \leftrightarrow Py) \rightarrow x = y, where P denotes the property.

The above principle can be counter intuitive in some circumstances, especially when considering hypothetical situations. For instance, what if the property which distinguishes the two objects is shared by both of the objects?

————————————————

In the future, I’m intending to write some more philosophical posts regarding Logic and Mathematics, however, due to other commitments it may be a while before anything else is written on this blog.

 

 

 

 

 

 

 

 

 

 

 

Posted in Uncategorized | Leave a comment

The Rise of Fake BSODs

There has recently been an increasingly number of occurrences of fake BSODs, all of which appear to be malware related. The BSODs are usually poorly designed and easy to spot if you have any experience with Windows or debugging BSODs.

blue screenHowever, the issue becomes rather alarming since many users will not understand the difference between a true BSOD and a fake BSOD, this can lead to several problems for the user.

Furthermore, it has also been found that these fake BSODs do not create any form of dump file in the following two directories:

C:\Windows\MEMORY.DMP

For Minidumps, the directory path is:

C:\Windows\Minidump

It is important for users to understand that a true BSOD will never ask you to contact a Microsoft technician or provide any guidance on how to contact that said technician. The latest BSOD screen designed by Microsoft can be found below:

bsodFortunately, most of these BSODs do not actually commit any malicious actions with regards to the system, but are mostly intended as poor phishing attempts to scare users into contacting a certain telephone number.

The main purpose of this post was simply to increase the awareness of these fake BSODs, and thus to conclude, please find a list of some example of these scams:

Malwarebytes have provided a blog post which explains the internals of one of the fake BSODs – TechSupportScams And The Blue Screen of Death

 

 

Posted in Uncategorized | Leave a comment

The Complete Debugging Guide to Stop 0x124 – Part 3

In the previous two parts, we examined error packets and error records, now we will begin to discuss the debugging methodology involved with a Stop 0x124 bugcheck, and how to gather useful debugging information from the dump file using WinDbg. I’ve split this final part into two sections: processor type errors and PCIe type errors, since these are both the most common errors you’ll experience when debugging a Stop 0x124 bugcheck.

Processor Type Errors:

Upon loading the dump file into WinDbg, you will be greeted with the following parameters:

BugCheck 124, {0, fffffa80080d2028, f6000d80, 40150}

Probably caused by : GenuineIntel

The first parameter is the type of error source as discussed in Part 1. As mentioned previously, it is stored within the enumeration called _WHEA_ERROR_SOURCE_TYPE, and from looking at the value of the parameter we know that the error source type was a Machine Check Exception (MCE). A MCE is a troubleshooting mechanism used by the processor to report hardware errors to the operating system. It can be used to report a wide variety of errors, including cache errors, bus errors and memory errors. The most common from my experience is the that MCE reports cache errors.

The second parameter is the address of the error record, which as explained in Part 2, is represented by _WHEA_ERROR_RECORD. This is the most important parameter of both of the bugcheck types. We will be using the !errrec extension to dump this structure and examine the sections which were also discussed in Part 2.

The third and fourth parameters are the higher and lower bits of the MCi_STATUS registers, which do not have any significant additional debugging value, apart from self interest of the CPU architecture. If you wish, you can dump the contents in WinDbg using the following:

0: kd> dt hal!_MCi_STATUS
   +0x000 McaErrorCode     : Uint2B
   +0x002 ModelErrorCode   : Uint2B
   +0x004 OtherInformation : Pos 0, 23 Bits
   +0x004 ActionRequired   : Pos 23, 1 Bit
   +0x004 Signalling       : Pos 24, 1 Bit
   +0x004 ContextCorrupt   : Pos 25, 1 Bit
   +0x004 AddressValid     : Pos 26, 1 Bit
   +0x004 MiscValid        : Pos 27, 1 Bit
   +0x004 ErrorEnabled     : Pos 28, 1 Bit
   +0x004 UncorrectedError : Pos 29, 1 Bit
   +0x004 StatusOverFlow   : Pos 30, 1 Bit
   +0x004 Valid            : Pos 31, 1 Bit
   +0x000 QuadPart         : Uint8B

You’ll have to dump the other values using the .formats command and then comparing the MCi_STATUS structure to the bit values.

I’ve highlighted the GenuineIntel string since some users make the mistake of assuming automatically that the processor is at fault, and this is simply not true. The string is used to identify if the system is using a real Intel processor. For informational purposes, the string can be found in the following structure:

0: kd> dt nt!_KPRCB -y VendorString
   +0x4bb8 VendorString : [13] UChar

From this particular example, we can find the address of the PRCB by using the !prcb extension and then using the given address on the above mentioned structure.

0: kd> !prcb
PRCB for Processor 0 at fffff780ffff0000:
Current IRQL -- 15
Threads--  Current fffffa8007618b50 Next 0000000000000000 Idle fffff8000345fcc0
Processor Index 0 Number (0, 0) GroupSetMember 1
Interrupt Count -- 000b8afa
Times -- Dpc    00000100 Interrupt 00000023 
         Kernel 0002724c User      00000698
0: kd> dt nt!_KPRCB -y VendorString fffff780ffff0000
   +0x4bb8 VendorString : [13]  "GenuineIntel"

Okay, we have now established the meaning of the parameters, and have discovered that error was reported the processor through MCE. We will now need to dump the error record.

0: kd> !errrec fffffa80080d2028
===============================================================================
Common Platform Error Record @ fffffa80080d2028
-------------------------------------------------------------------------------
Record Id     : 01d090ac66cc4cb7
Severity      : Fatal (1)
Length        : 928
Creator       : Microsoft
Notify Type   : Machine Check Exception
Timestamp     : 5/17/2015 15:10:51 (UTC)
Flags         : 0x00000000

===============================================================================
Section 0     : Processor Generic
-------------------------------------------------------------------------------
Descriptor    @ fffffa80080d20a8
Section       @ fffffa80080d2180
Offset        : 344
Length        : 192
Flags         : 0x00000001 Primary
Severity      : Fatal

Proc. Type    : x86/x64
Instr. Set    : x64
Error Type    : Cache error
Operation     : Instruction Execute
Flags         : 0x00
Level         : 0
CPU Version   : 0x00000000000106e5
Processor ID  : 0x0000000000000000

===============================================================================
Section 1     : x86/x64 Processor Specific
-------------------------------------------------------------------------------
Descriptor    @ fffffa80080d20f0
Section       @ fffffa80080d2240
Offset        : 536
Length        : 128
Flags         : 0x00000000
Severity      : Fatal

Local APIC Id : 0x0000000000000000
CPU Id        : e5 06 01 00 00 08 10 00 - fd e3 98 00 ff fb eb bf
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

Proc. Info 0  @ fffffa80080d2240

===============================================================================
Section 2     : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor    @ fffffa80080d2138
Section       @ fffffa80080d22c0
Offset        : 664
Length        : 264
Flags         : 0x00000000
Severity      : Fatal

Error         : ICACHEL0_IRD_ERR (Proc 0 Bank 2)
  Status      : 0xf6000d8000040150
  Address     : 0x00000000004b6990
  Misc.       : 0x0000000000000000

The most important section is Section 2: x86/x64 MCA, since this section contains data specific to the data which the MCE would have reported to WHEA. We that error severity was fatal, thus leading to the creation of the bugcheck to begin with. I’ll go back to Section 0 in a moment. In Section 2, the Error field contains a mnemonic to type of error which is shown in Section 0. The mnemonic can be deciphered using the Intel processor documentation from following page 2352.

There is 4 different error classifications if you don’t consider the generic processor error type. These error classifications come to form what is known as a compound error code. In our example I’ve highlighted the sections which can vary and take different values depending upon the situation.

The current compound error classifications are:

Type Interpretation
Generic Cache Hierarchy Generic Cache Hierarchy Error
TLB Errors {TT}TLB{LL}_ERR
Memory Controller Errors {MMM}_CHANNEL_{CCCC}_ERR
Cache Hierarchy Errors {TT}CACHE{LL}_{RRRR}_ERR
Bus and Interconnect Errors BUS{LL}_{PP}_{RRRR}_{II}_{T}_ERR

I’ve taken Table 15-9 (Section 15.9.2) from the Intel documentation and cut it down into two sections, for our example we will be looking at the Cache Hierarchy Errors. Just to clarify, when dumping the error record and looking at the CPU mnemonic, attempt to decipher which error interpretation should be applied so your able to investigate the cause of the error in greater depth. The following mnemonics have been copied from the Intel documentation, I’ve added the section numbers if you wish to check yourself.

The {TT} variable is known as the Transaction Type (Section 15.9.2.2) and can take the following values:

  • I = Instruction
  • D = Data
  • G = Generic

In our particular type, the transaction type was an instruction, so we know that the error was based around the execution of some instruction.

The {LL} variable is known as the Level (Level of the Memory Hierarchy (Section 15.9.2.3)) and points to the type of cache which has experienced the error condition. It can take the following values:

  • L0 = Level 0
  • L1 = Level 1
  • L2 = Level 2
  • LG = Level Generic

The {RRRR} is known as the Request Type (Section 15.9.2.4) field and indicates the type of instruction or action which was being carried out at the time of the error. The variable can take the following values:

  • ERR = Generic Error
  • RD = Generic Read
  • WR = Generic Write
  • DRD = Data Read
  • DWR = Data Write
  • IRD = Instruction Fetch
  • PREFETCH = Prefetch
  • EVICT = Eviction
  • SNOOP = Snoop

I will quickly provide the meanings of the other sub-fields to save having to trawl through the Intel documentation. The {MMM} and {CCCC} fields primarily apply to Memory Controller errors. {MMM} is a 3-bit field called the Memory Transaction Type, whereas, {CCCC} is a 4-bit field for Channels. The memory controller error mnemonics can be found in Section 15.9.2.5.

The {MMM} field has the meanings of:

  • GEN = Generic undefined request
  • RD = Memory Read Error
  • WR = Memory Write Error
  • AC = Address/Command Error
  • MS = Memory Scrubbing Error

The {CCCC} field has one meaning which is CHN corresponds to the Channel Number.

Bus and Interconnect errors have three additional fields called {PP} for Participation; {T} for Timeout and {II} for I/O or Memory. The bus and interconnect error mnemonics can be found in Section 15.9.2.6.

{PP} defines how the processor participated within the request, and thus:

  • SRC = Local processor originated request
  • RES = Local processor responded to the request
  • OBS = Local processor observed the error as a third party

{T} defines if the processor requested for a timeout of the error or not:

  • TIMEOUT = Request timed out
  • NOTIMEOUT = Request didn’t time out

{II} defines the processor bus asked for memory access or I/O access.

  • M = Memory Access
  • I/O = IO

We have examined the error condition and have a good understanding of what the error is pertains to, however, we will need to gather some general hardware information to either check for patches or to provide greater troubleshooting information to the hardware manufacturer. There are several WinDbg extensions which enable us to achieve this.

2: kd> !sysinfo machineid
Machine ID Information [From Smbios 2.7, DMIVersion 39, Size=3456]
BiosMajorRelease = 4
BiosMinorRelease = 6
BiosVendor = American Megatrends Inc.
BiosVersion = 1005
BiosReleaseDate = 10/11/2012
SystemManufacturer = System manufacturer
SystemProductName = System Product Name
SystemFamily = To be filled by O.E.M.
SystemVersion = System Version
SystemSKU = SKU
BaseBoardManufacturer = ASUSTeK COMPUTER INC.
BaseBoardProduct = P8H77-M LE
BaseBoardVersion = Rev X.0x

The !sysinfo machineid extension can give motherboard and BIOS information, from here I would be able to check the motherboard documentation to ensure that the hardware is compatible and if any patches have been released to resolve any issues the user may be experiencing.

2: kd> !sysinfo cpuspeed
CPUID:        "Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz"
MaxSpeed:     3200
CurrentSpeed: 3192

The !sysinfo cpuspeed extension enables us to iinvestigate the clockspeed of the processor and if the user has been overclocking their processor. As commonly stated, overclocking can use system instablity and produce excessive heat production which could be affecting the normal operation of the system.

2: kd> !sysinfo cpuinfo
[CPU Information]
~MHz = REG_DWORD 3192
Component Information = REG_BINARY 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Configuration Data = REG_FULL_RESOURCE_DESCRIPTOR ff,ff,ff,ff,ff,ff,ff,ff,0,0,0,0,0,0,0,0
Identifier = REG_SZ Intel64 Family 6 Model 58 Stepping 9
ProcessorNameString = REG_SZ Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
Update Signature = REG_BINARY 0,0,0,0,12,0,0,0
Update Status = REG_DWORD 2
VendorIdentifier = REG_SZ GenuineIntel
MSR8B = REG_QWORD 1200000000

The !sysinfo cpuinfo extension provides some greater depth into the processor family and model, which is great when reporting a bug to Intel or AMD, since they will be to investigate the issue further and provide any insight into if the error is specific to a certain processor type. You may also noticed that I’m using a different dump file from previous part of this tutorial too!

The !sysinfo microcode provides similar information to the previous extension:

2: kd> !sysinfo cpumicrocode
Initial Microcode Version: 00000012:00000000
 Cached Microcode Version: 00000012:00000000
         Processor Family: 06
          Processor Model: 3a
       Processor Stepping: 09

Again, simply as personal preference, you will wish to dump the model information about all the processors on the system with !cpuinfo:

2: kd> !cpuinfo
CP  F/M/S Manufacturer  MHz PRCB Signature    MSR 8B Signature Features
 0  6,58,9 GenuineIntel 3192 0000001200000000                   21193ffe
 1  6,58,9 GenuineIntel 3192 0000001200000000                   21193ffe
 2  6,58,9 GenuineIntel 3192 0000001200000000                   21193ffe
 3  6,58,9 GenuineIntel 3192 0000001200000000                   21193ffe
                      Cached Update Signature 0000001200000000
                     Initial Update Signature 0000001200000000

Alternatively, you could use !cpuid which provides the exact same information:

2: kd> !cpuid
CP  F/M/S  Manufacturer     MHz
 0  6,58,9  GenuineIntel    3192
 1  6,58,9  GenuineIntel    3192
 2  6,58,9  GenuineIntel    3192
 3  6,58,9  GenuineIntel    3192

You can gather temperature information about the system through the use !tz and !tzinfo extensions, I won’t directly discuss the purpose of thermal zones and how they work in this tutorial since it would needlessly go out of scope and produce another page of writing. You can find more information about thermal zones in the ACPI documentation or through this discussion thread created by myself and Patrick when we first discovered the extensions.

2: kd> !tz
0 - ThermalZone @ 0xfffffa8004073310
  State:         Read                Flags:              0x00000002 Initialized
  Mode:          Active              PendingMode:        Active  
  ActivePoint:   0x00000002          PendingActivePoint: 0x00000002
  Throttle:      0x00000064
  SampleRate:    0x00000000          ThrottleReasons:    0
  LastTime:      0x0000000000000000  LastTemp:           0x00000000 (0.0K)
  PassiveTimer:  0xfffffa8004073340
  PassiveDpc:    0xfffffa8004073380
  OverThrottled: 0xfffffa80040733c0
  Irp:           0xfffffa8004680c80
  Device:        0x00000000
  Thermal Info:  0xfffffa80040733e0
1 - ThermalZone @ 0xfffffa8003679310
  State:         Read                Flags:              0x00000002 Initialized
  Mode:          Active              PendingMode:        Active  
  ActivePoint:   0x00000000          PendingActivePoint: 0x00000000
  Throttle:      0x00000064
  SampleRate:    0x00000000          ThrottleReasons:    0
  LastTime:      0x0000000000000000  LastTemp:           0x00000000 (0.0K)
  PassiveTimer:  0xfffffa8003679340
  PassiveDpc:    0xfffffa8003679380
  OverThrottled: 0xfffffa80036793c0
  Irp:           0xfffffa8004074310
  Device:        0x00000000
  Thermal Info:  0xfffffa80036793e0

The !tzinfo extension provides information about a specific thermal zone:

2: kd> !tzinfo 0xfffffa80036793e0
ThermalInfo @ 0xfffffa80036793e0
  Stamp:         0x00000007  Constant1:  0x00000001  Constant2:   0x00000005
  Period:        0x0000000a  ActiveCnt:  0x00000000  AffinityEx:  0xfffffa80036793f0
  Current Temperature:                   0x00000bd6 (303.0K)
  Passive TripPoint Temperature:         0x00000ed0 (379.2K)
  Hibernate TripPoint Temperature:       0x00000000 (0.0K)
  Critical TripPoint Temperature:        0x00000ed0 (379.2K)

PCIe Type Errors:

As shown before, we are going to dump the error record and then examine the relevant sections. The sections displayed with PCIe crashes are generally more lengthy and complex to understand. I would advise the full use of the PCIe documentation if available.

 

3: kd> !errrec 869348d4
===============================================================================
Common Platform Error Record @ 869348d4
-------------------------------------------------------------------------------
Record Id     : 01cd07d8bce4740f
Severity      : Fatal (1)
Length        : 672
Creator       : Microsoft
Notify Type   : PCI Express Error
Timestamp     : 3/22/2012 3:06:44 (UTC)
Flags         : 0x00000000

===============================================================================
Section 0     : PCI Express
-------------------------------------------------------------------------------
Descriptor    @ 86934954
Section       @ 869349e4
Offset        : 272
Length        : 208
Flags         : 0x00000001 Primary
Severity      : Recoverable

Port Type     : Root Port
Version       : 1.1
Command/Status: 0x4010/0x0507
Device Id     :
  VenId:DevId : 8086:340a
  Class code  : 030400
  Function No : 0x00
  Device No   : 0x03
  Segment     : 0x0000
  Primary Bus : 0x00
  Second. Bus : 0x00
  Slot        : 0x0000
Dev. Serial # : 0000000000000000
Express Capability Information @ 86934a18
  Device Caps : 00008021 Role-Based Error Reporting: 1
  Device Ctl  : 0107 ur FE NF CE
  Dev Status  : 0003 ur fe NF CE
   Root Ctl   : 0008 fs nfs cs

AER Information @ ffffffff86934a54
  Uncorrectable Error Status    : 00000020 ur ecrc mtlp rof uc ca cto fcp ptlp SD dlp und
  Uncorrectable Error Mask      : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
  Uncorrectable Error Severity  : 00062010 ur ecrc MTLP ROF uc ca cto FCP ptlp sd DLP und
  Correctable Error Status      : 00000000 adv rtto rnro dllp tlp re
  Correctable Error Mask        : 00000000 adv rtto rnro dllp tlp re
  Caps & Control                : 00000005 ecrcchken ecrcchkcap ecrcgenen ecrcgencap FEP
  Header Log                    : 00000000 00000000 00000000 00000000
  Root Error Command            : 00000000 fen nfen cen
  Root Error Status             : 00000000 MSG# 00 fer nfer fuf mur ur mcr cer
  Correctable Error Source ID   : 00,00,00
  Correctable Error Source ID   : 00,00,00

===============================================================================
Section 1     : Processor Generic
-------------------------------------------------------------------------------
Descriptor    @ 8693499c
Section       @ 86934ab4
Offset        : 480
Length        : 192
Flags         : 0x00000000
Severity      : Informational

Proc. Type    : x86/x64
Instr. Set    : x86
CPU Version   : 0x00000000000106a5
Processor ID  : 0x0000000000000006

Section 0 of the error record is the only important section of the error record with this form of bugcheck. I’ll start with identifying the device of the bugcheck using the Vendor ID and Device ID fields. The bugcheck indicates the error occurred at the Root Port, and from the PCI Database, we know that the device was the Intel I/O Hub PCIe Root Port. Unfortunately, this information is far too generic and doesn’t point to the exact cause of the bugcheck. From here, we would need to begin checking the PCIe devices which are connected to the system and how they interact with the other components.

To gather more information regarding the error, we need to investigate the meanings of the error codes shown in the AER; it is important to remember that we will be using the _PCI_EXPRESS_ROOTPORT_AER_CAPABILITY structure. The Uncorrectable Error Status is presented by the _PCI_EXPRESS_UNCORRECTABLE_ERROR_STATUS structure, which contains the bitfields for the error codes shown in the register. If we dump the this structure, then we can see that the captialised letters correspond to the bitfields which have been set to true.

3: kd> dt pci!_PCI_EXPRESS_UNCORRECTABLE_ERROR_STATUS
   +0x000 Undefined        : Pos 0, 1 Bit
   +0x000 Reserved1        : Pos 1, 3 Bits
   +0x000 DataLinkProtocolError : Pos 4, 1 Bit
   +0x000 SurpriseDownError : Pos 5, 1 Bit
   +0x000 Reserved2        : Pos 6, 6 Bits
   +0x000 PoisonedTLP      : Pos 12, 1 Bit
   +0x000 FlowControlProtocolError : Pos 13, 1 Bit
   +0x000 CompletionTimeout : Pos 14, 1 Bit
   +0x000 CompleterAbort   : Pos 15, 1 Bit
   +0x000 UnexpectedCompletion : Pos 16, 1 Bit
   +0x000 ReceiverOverflow : Pos 17, 1 Bit
   +0x000 MalformedTLP     : Pos 18, 1 Bit
   +0x000 ECRCError        : Pos 19, 1 Bit
   +0x000 UnsupportedRequestError : Pos 20, 1 Bit
   +0x000 Reserved3        : Pos 21, 11 Bits
   +0x000 AsULONG          : Uint4B

Since the Suprise Down (SD) error bitfield is the also one which has been set, then we can investigate further into what a exactly a Surprise Down error is. In short, it indicates a loss of connection between two devices, although, I will give a slightly more detailed defintion with the use of the PCIe documentation. I’ve added the section numbers for reference.

A Surprise Down error occurs when a TLP (Transaction Layer Protocol) request packet is sent numerous times to a device across a link, and then device doesn’t respond positively. TLP’s are similar to IRPs and are present within the Transaction Layer (Section 2) of the PCIe topology, which is responsible for issuing and responding to TLPs.

For those experienced with debugging, you can imagine this situation as a Stop 0x9F, a IRP is sent but becomes stuck for some unknown reason. Once a threshold can be met, then the link is considered to be inactive or malfunctioning and thus a bugcheck is raised to alert the operating system of this error. The best methodology for this type of error would be to investigate the connections of the devices on the motherboard; check for any loosely seated cards and dust which may have built up inside the slots.

Moreover, simply as a matter of interest, the Bus Number, Device Number and Function Number are used to map a device into the PCI Configuration Space. We use can the !pci extension to view such information, but please note that you will require a live debugging session with a x86 computer.

lkd> !pci
PCI Segment 0 Bus 0
00:0  1022:1510.00  Cmd[0006:.mb...]  Sts[0220:.6...]  AMD Host Bridge  SubID:1022:1510
01:0  1002:9806.00  Cmd[0407:imb...]  Sts[0010:c....]  ATI VGA Compatible Controller  SubID:103c:3387
01:1  1002:1314.00  Cmd[0006:.mb...]  Sts[0010:c....]  ATI Class:4:3:0  SubID:103c:3387
04:0  1022:1512.00  Cmd[0004:..b...]  Sts[0010:c....]  AMD PCI-PCI Bridge 0->0x1-0x1
11:0  1002:4394.00  Cmd[0007:imb...]  Sts[0230:c6...]  ATI Class:1:6:1  SubID:103c:3387
12:0  1002:4397.00  Cmd[0016:.mb...]  Sts[02a0:.6...]  ATI USB Controller  SubID:103c:3387
12:2  1002:4396.00  Cmd[0016:.mb...]  Sts[02b0:c6...]  ATI USB2 Controller  SubID:103c:3387
13:0  1002:4397.00  Cmd[0016:.mb...]  Sts[02a0:.6...]  ATI USB Controller  SubID:103c:3387
13:2  1002:4396.00  Cmd[0016:.mb...]  Sts[02b0:c6...]  ATI USB2 Controller  SubID:103c:3387
14:0  1002:4385.42  Cmd[0403:im....]  Sts[0220:.6...]  ATI SMBus Controller  SubID:103c:3387
14:2  1002:4383.40  Cmd[0006:.mb...]  Sts[0410:c....]  ATI Class:4:3:0  SubID:103c:3387
14:3  1002:439d.40  Cmd[000f:imb...]  Sts[0220:.6...]  ATI ISA Bridge  SubID:103c:3387
14:4  1002:4384.40  Cmd[0407:imb...]  Sts[02a0:.6...]  ATI PCI-PCI Bridge 0->0x2-0x2
15:0  1002:43a0.00  Cmd[0007:imb...]  Sts[0810:c..A.]  ATI PCI-PCI Bridge 0->0x3-0x6
15:1  1002:43a1.00  Cmd[0007:imb...]  Sts[0010:c....]  ATI PCI-PCI Bridge 0->0x7-0x7
16:0  1002:4397.00  Cmd[0016:.mb...]  Sts[02a0:.6...]  ATI USB Controller  SubID:103c:3387
16:2  1002:4396.00  Cmd[0016:.mb...]  Sts[02b0:c6...]  ATI USB2 Controller  SubID:103c:3387
18:0  1022:1700.43  Cmd[0000:......]  Sts[0010:c....]  AMD Host Bridge
18:1  1022:1701.00  Cmd[0000:......]  Sts[0000:.....]  AMD Host Bridge
18:2  1022:1702.00  Cmd[0000:......]  Sts[0000:.....]  AMD Host Bridge
18:3  1022:1703.00  Cmd[0000:......]  Sts[0010:c....]  AMD Host Bridge
18:4  1022:1704.00  Cmd[0000:......]  Sts[0000:.....]  AMD Host Bridge
18:5  1022:1718.00  Cmd[0000:......]  Sts[0000:.....]  AMD Host Bridge
18:6  1022:1716.00  Cmd[0000:......]  Sts[0000:.....]  AMD Host Bridge
18:7  1022:1719.00  Cmd[0000:......]  Sts[0000:.....]  AMD Host Bridge

The first column indicates the device number and the second column shows the function number of that particular device. We can gather even further information using the !pcitree extension which shows all the devices which have enumerated on the bus:

 

lkd> !pcitree
Bus 0x0 (FDO Ext 86688ae0)
  (d=0,  f=0) 10221510 devext 0x8665cc10 devstack 0x8665cb58 0600 Bridge/HOST to PCI
  (d=1,  f=0) 10029806 devext 0x8665c738 devstack 0x8665c680 0300 Display Controller/VGA
  (d=1,  f=1) 10021314 devext 0x866720e8 devstack 0x86672030 0403 Multimedia Device/Unknown Sub Class
  (d=4,  f=0) 10221512 devext 0x86672c10 devstack 0x86672b58 0604 Bridge/PCI to PCI
  Bus 0x1 (FDO Ext 86680d18)
    No devices have been enumerated on this bus.
  (d=11, f=0) 10024394 devext 0x86672738 devstack 0x86672680 0106 Mass Storage Controller/Unknown Sub Class
  (d=12, f=0) 10024397 devext 0x866730e8 devstack 0x86673030 0c03 Serial Bus Controller/USB
  (d=12, f=2) 10024396 devext 0x86673c10 devstack 0x86673b58 0c03 Serial Bus Controller/USB
  (d=13, f=0) 10024397 devext 0x86673738 devstack 0x86673680 0c03 Serial Bus Controller/USB
  (d=13, f=2) 10024396 devext 0x866740e8 devstack 0x86674030 0c03 Serial Bus Controller/USB
  (d=14, f=0) 10024385 devext 0x86674c10 devstack 0x86674b58 0c05 Serial Bus Controller/Unknown Sub Class
  (d=14, f=2) 10024383 devext 0x86674738 devstack 0x86674680 0403 Multimedia Device/Unknown Sub Class
  (d=14, f=3) 1002439d devext 0x866750e8 devstack 0x86675030 0601 Bridge/PCI to ISA
  (d=14, f=4) 10024384 devext 0x86675c10 devstack 0x86675b58 0604 Bridge/PCI to PCI
  Bus 0x2 (FDO Ext 86685888)
    No devices have been enumerated on this bus.
  (d=15, f=0) 100243a0 devext 0x86675738 devstack 0x86675680 0604 Bridge/PCI to PCI
  Bus 0x3 (FDO Ext 866853e0)
    (d=0,  f=0) 14e44727 devext 0x86a787c8 devstack 0x86a78710 0280 Network Controller/'Other'
  (d=15, f=1) 100243a1 devext 0x8667c0e8 devstack 0x8667c030 0604 Bridge/PCI to PCI
  Bus 0x7 (FDO Ext 8668fea8)
    (d=0,  f=0) 10ec8168 devext 0x86a7dc10 devstack 0x86a7db58 0200 Network Controller/Ethernet
  (d=16, f=0) 10024397 devext 0x8667cc10 devstack 0x8667cb58 0c03 Serial Bus Controller/USB
  (d=16, f=2) 10024396 devext 0x8667c738 devstack 0x8667c680 0c03 Serial Bus Controller/USB
  (d=18, f=0) 10221700 devext 0x8667d0e8 devstack 0x8667d030 0600 Bridge/HOST to PCI
  (d=18, f=1) 10221701 devext 0x8667dc10 devstack 0x8667db58 0600 Bridge/HOST to PCI
  (d=18, f=2) 10221702 devext 0x8667d738 devstack 0x8667d680 0600 Bridge/HOST to PCI
  (d=18, f=3) 10221703 devext 0x8667e0e8 devstack 0x8667e030 0600 Bridge/HOST to PCI
  (d=18, f=4) 10221704 devext 0x8667ec10 devstack 0x8667eb58 0600 Bridge/HOST to PCI
  (d=18, f=5) 10221718 devext 0x8667e738 devstack 0x8667e680 0600 Bridge/HOST to PCI
  (d=18, f=6) 10221716 devext 0x8667f0e8 devstack 0x8667f030 0600 Bridge/HOST to PCI
  (d=18, f=7) 10221719 devext 0x8667fc10 devstack 0x8667fb58 0600 Bridge/HOST to PCI
Total PCI Root busses processed = 1
Total PCI Segments processed = 1

The D represents the device number, the F represents the function number, and the first block of highlighted charachters indicates the Device ID with the subsequent block of characters being used to show the Vendor ID of the device.

lkd> !devext 0x8665cc10
PDO Extension, Bus 0x0, Device 0, Function 0.
  DevObj 0x8665cb58  Parent FDO DevExt 0x86688ae0
  Device State = PciStarted
  Vendor ID 1022 (ADVANCED MICRO DEVICES)  Device ID 1510
  Subsystem Vendor ID 1022 (ADVANCED MICRO DEVICES)  Subsystem ID 1510
  Header Type 0, Class Base/Sub 06/00  (Bridge/HOST to PCI)
  Programming Interface: 00, Revision: 00, IntPin: 00, RawLine 00
  Possible Decodes ((cmd & 7) = 7): BMI   Capabilities: Ptr = <none>
  Logical Device Power State: D0
  Device Wake Level:          Unspecified
  WaitWakeIrp:                <none>
  Device Requirements structure has changed size.  Update extension.
  Device Resources structure has changed size.  Update extension.
  Interrupt Requirement: <none>
  Interrupt Resource: <none>

The !devext extension can provide some additional information about the device, which is useful for debugging purposes. There another field within the error record which I had forgotten to mention, and that is the Class Code register.

The register is very useful for identifying the type of device which could be causing the problem. The register is divided into three different parts: Class, Sub-Class and Prog. I/F. From our example, we can see that the class is 0x3, the sub-class is 0x4 and the Prog. I/F is 0x0. If we were to check the meanings of these values, then we would reach the conclusion a display controller had reported the error to the operating system.

0x3 is the class number for display controllers, and the sub-class points to general category of display controllers. This makes sense in the context of this dump since the issue lied with a TV tuner card (the dump was previously debugged by Vir Gnarus). A complete list of PCI Class codes can be found here – http://wiki.xomb.org/index.php?title=PCI_Class_Codes

I hope this tutorial series given a in-depth insight into the internals of a Stop 0x124 and some of the debugging methodologies we could use to debug such bugchecks. I didn’t wish to delve too deeply into the technical details of PCIe and x86/x64 architectures since it would leave the scope of this tutorial and generate too much ‘fluff’ and ‘filler’.

I hope you enjoyed this tutorial, and if you wish to suggest any amendments or corrections then please comment/post below. Moreover, please note that I’ve created a list of reference material regarding PCI-e and CPU architecture in this thread – Hardware Architecture Documentation Links

Posted in Computer Science, Debugging, Stop 0x124, WinDbg, Windows Internals | 1 Comment