Contents

1 Introduction
    1.1 About this manual
    1.2 Microprocessor versions covered by this manual
2 Out-of-order execution (all processors except P1, PMMX)
    2.1 Instructions are split into uops
    2.2 Register renaming
3 Branch prediction (all processors)
    3.1 Prediction methods for conditional jumps
    3.2 Branch prediction in P1
    3.3 Branch prediction in PMMX, PPro, P2, and P3
    3.4 Branch prediction in P4 and P4E
    3.5 Branch prediction in PM and Core2
    3.6 Branch prediction in AMD64
    3.7 Indirect jumps (all processors except PM and Core2)
    3.8 Returns (all processors except P1)
    3.9 Static prediction
    3.10 Close jumps
4 Pentium 1 and Pentium MMX pipeline
    4.1 Pairing integer instructions
    4.2 Address generation interlock
    4.3 Splitting complex instructions into simpler ones
    4.4 Prefixes
    4.5 Scheduling floating point code
5 Pentium Pro, II and III pipeline
    5.1 The pipeline in PPro, P2 and P3
    5.2 Instruction fetch
    5.3 Instruction decoding
    5.4 Register renaming
    5.5 ROB read
    5.6 Out of order execution
    5.7 Retirement
    5.8 Partial register stalls
    5.9 Partial memory stalls
    5.10 Bottlenecks in PPro, P2, P3
6 Pentium M pipeline
    6.1 The pipeline in PM
    6.2 The pipeline in Core Solo and Duo
    6.3 Instruction fetch
    6.4 Instruction decoding
    6.5 Loop buffer
    6.6 Micro-op fusion
    6.7 Stack engine
    6.8 Register renaming
    6.9 Register read stalls
    6.10 Execution units
    6.11 Execution units that are connected to both port 0 and 1
    6.12 Retirement
    6.13 Partial register access
    6.14 Partial memory stalls
    6.15 Bottlenecks in PM
7 Core 2 pipeline
    7.1 Pipeline
    7.2 Instruction fetch and predecoding
    7.3 Instruction decoding
    7.4 Micro-op fusion
    7.5 Macro-op fusion
    7.6 Stack engine
    7.7 Register renaming
    7.8 Register read stalls
    7.9 Execution units
    7.10 Retirement
    7.11 Partial register access
    7.12 Partial memory stalls
    7.13 Cache and memory access
    7.14 Breaking dependence chains
    7.15 Bottlenecks in Core2
8 Pentium 4 (NetBurst) pipeline
    8.1 Data cache
    8.2 Trace cache
    8.3 Instruction decoding
    8.4 Execution units
    8.5 Do the floating point and MMX units run at half speed?
    8.6 Transfer of data between execution units
    8.7 Retirement
    8.8 Partial registers and partial flags
    8.9 Partial memory access
    8.10 Memory intermediates in dependence chains
    8.11 Breaking dependence chains
    8.12 Choosing the optimal instructions
    8.13 Bottlenecks in P4 and P4E
9 AMD64 pipeline
    9.1 The pipeline in AMD64
    9.2 Instruction fetch
    9.3 Predecoding and instruction length decoding
    9.4 Single, double and vector path instructions
    9.5 Integer execution pipes
    9.6 Floating point execution pipes
    9.7 Mixing instructions with different latency
    9.8 64 bit versus 128 bit instructions
    9.9 Data delay between differently typed instructions
    9.10 Partial register access
    9.11 Partial flag access
    9.12 Partial memory stalls
    9.13 Loops
    9.14 Cache
    9.15 Bottlenecks in AMD64
10 Comparison of microarchitectures
    10.1 The AMD kernel
    10.2 The Pentium 4 kernel
    10.3 The Pentium M kernel
    10.4 Intel Core 2 microarchitecture
    10.5 Conclusion
    10.6 Future trends
11 Literature