Contents
Loop Unrolling ............................................................................................................... 2-24
Compiler Support for Branch Prediction......................................................................... 2-26
Memory Accesses................................................................................................................. 2-27
Alignment ....................................................................................................................... 2-27
Store Forwarding ............................................................................................................ 2-30
Store-to-Load-Forwarding Restriction on Size and Alignment.................................. 2-31
Store-forwarding Restriction on Data Availability...................................................... 2-36
Data Layout Optimizations ............................................................................................. 2-37
Stack Alignment.............................................................................................................. 2-40
Capacity Limits and Aliasing in Caches.......................................................................... 2-41
Capacity Limits in Set-Associative Caches............................................................... 2-42
Aliasing Cases in the Pentium 4 and Intel® Xeon™ Processors .............................. 2-43
Aliasing Cases in the Pentium M Processor............................................................. 2-43
Mixing Code and Data.................................................................................................... 2-44
Self-modifying Code ................................................................................................. 2-45
Write Combining ............................................................................................................. 2-46
Locality Enhancement .................................................................................................... 2-48
Minimizing Bus Latency.................................................................................................. 2-48
Non-Temporal Store Bus Traffic ..................................................................................... 2-49
Prefetching ..................................................................................................................... 2-50
Hardware Instruction Fetching.................................................................................. 2-51
Software and Hardware Cache Line Fetching .......................................................... 2-51
Cacheability Instructions ................................................................................................ 2-52
Code Alignment.............................................................................................................. 2-52
Improving the Performance of Floating-point Applications.................................................... 2-53
Guidelines for Optimizing Floating-point Code............................................................... 2-53
Floating-point Modes and Exceptions ............................................................................ 2-55
Floating-point Exceptions ......................................................................................... 2-55
Floating-point Modes ................................................................................................ 2-58
Improving Parallelism and the Use of FXCH.................................................................. 2-63
x87 vs. Scalar SIMD Floating-point Trade-offs............................................................... 2-64
Memory Operands.......................................................................................................... 2-65
Floating-Point Stalls........................................................................................................ 2-65
x87 Floating-point Operations with Integer Operands .............................................. 2-66
x87 Floating-point Comparison Instructions ............................................................. 2-66
Transcendental Functions ........................................................................................ 2-66
Instruction Selection.............................................................................................................. 2-67
Complex Instructions...................................................................................................... 2-67
Use of the lea Instruction................................................................................................ 2-68
Use of the inc and dec Instructions ................................................................................ 2-68