This topic has more to do with the CPU than with the operating system. CPUs are able to work much faster when data is aligned to a boundary of 2, 4, 8, 16 or 32 bytes. These values are called the memory access granularity. Granularity refers to the size of the chunks in which the CPU accesses memory, and a piece of data is aligned when its address is evenly divisible by that size.
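As a rough illustration of the divisibility rule, the check can be written in a few lines of C. This is only a sketch: the function name is_aligned is made up for this example, and it assumes the alignment is a power of two, which holds for the sizes listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of any library.
   Returns true when addr is a multiple of alignment.
   Assumption: alignment is a non-zero power of two (2, 4, 8, 16, 32, ...). */
bool is_aligned(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* For a power of two, "addr % alignment == 0" is the same as
       "the low bits of addr are all zero". */
    return (addr & (alignment - 1)) == 0;
}
```

With this helper, is_aligned(0, 4) is true and is_aligned(1, 4) is false. A real pointer can be tested by casting it first, for example is_aligned((uintptr_t)ptr, 8).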
If an x86 processor accesses an unaligned piece of data, it simply splits the access into aligned chunks that together cover the unaligned data. This works, but it hurts performance because it causes additional memory reads.
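On the other hand, with x64 processors there are two possible outcomes, depending on the AC (Alignment Check) flag in the EFLAGS register. If the AC flag is set to 0, the processor behaves as described above for x86. If it is set to 1, and alignment checking is also enabled through the AM bit in CR0 with the code running in user mode, the processor raises an alignment check exception (#AC) for the misaligned access. The same flag has existed in 32-bit mode since the 486, so this is not strictly an x64-only feature. The exception uses interrupt vector 17 in decimal, which is 11h. Do not confuse it with INT 17h, the BIOS interrupt used with the parallel printer port.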
Looking at the above image, we can see that Address 0 is correctly aligned to a 4-byte boundary. Address 1 is clearly not aligned to this boundary, so accessing it is an unaligned memory access that requires an extra read.
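The extra read can be modelled with a little arithmetic. The sketch below counts how many aligned chunks an access touches. It is a simplified model and not a description of how any particular CPU is implemented, and the name chunks_touched is invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of aligned chunks of `granularity` bytes
   that an access of `size` bytes starting at `addr` overlaps.
   Assumptions: size > 0 and granularity > 0. */
size_t chunks_touched(uintptr_t addr, size_t size, size_t granularity)
{
    assert(size > 0);
    assert(granularity > 0);

    uintptr_t first = addr / granularity;              /* chunk holding the first byte */
    uintptr_t last  = (addr + size - 1) / granularity; /* chunk holding the last byte  */

    return (size_t)(last - first + 1);
}
```

In this model a 4-byte read at address 0 with a granularity of 4 gives chunks_touched(0, 4, 4) == 1, while the same read at address 1 gives chunks_touched(1, 4, 4) == 2, which is the extra read.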
We have looked at data alignment, but stacks, data structures and instructions are aligned as well. Let's start with stack alignment. On x86 processors, the stack frame is aligned to a 4-byte boundary, whereas on x64 processors, the stack frame is aligned to a 16-byte boundary. Some x86 configurations may need to use 8-byte or 16-byte alignment boundaries instead.
For data structures, the structure as a whole is aligned according to the largest member data type within it. For example, if a data structure contains two int values followed by one double value, the structure is aligned according to the size of the double. This is inter-structure alignment. For intra-structure alignment, each member is aligned to its own boundary, so 4 bytes for each int and 8 bytes for the double in this example, and padding bytes are inserted between members where needed to keep them aligned.
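The layout can be inspected directly with offsetof, sizeof and alignof. The following is a sketch that assumes a C11 compiler. The struct and function names are invented for this example, and the exact numbers depend on the compiler and target. On typical x64 targets the second struct is larger than the first because padding is needed both before the double and at the end.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

/* The example from the text: two ints followed by one double. */
struct two_ints_double {
    int a;
    int b;
    double c;
};

/* Same members in a different order, so the double needs padding before it. */
struct int_double_int {
    int a;
    double c;
    int b;
};

/* The total size is always a multiple of the struct's own alignment. */
static_assert(sizeof(struct int_double_int) % alignof(struct int_double_int) == 0,
              "struct size must be a multiple of its alignment");

/* Hypothetical helper that prints the layout chosen by the compiler. */
void print_layouts(void)
{
    printf("two_ints_double: a=%zu b=%zu c=%zu size=%zu align=%zu\n",
           offsetof(struct two_ints_double, a),
           offsetof(struct two_ints_double, b),
           offsetof(struct two_ints_double, c),
           sizeof(struct two_ints_double),
           alignof(struct two_ints_double));

    printf("int_double_int:  a=%zu c=%zu b=%zu size=%zu align=%zu\n",
           offsetof(struct int_double_int, a),
           offsetof(struct int_double_int, c),
           offsetof(struct int_double_int, b),
           sizeof(struct int_double_int),
           alignof(struct int_double_int));
}
```

Calling print_layouts() from main shows where the compiler placed each member. Ordering members from largest to smallest is a common way to keep the padding down.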
If your program does access a misaligned address, Windows will raise the EXCEPTION_DATATYPE_MISALIGNMENT exception. For BSOD debugging, you will most likely encounter an access violation exception code, or the failure bucket ID will say something along the lines of x64 unaligned IP.