Intel® Pentium® 4 Processor Optimization Reference Manual
331 pages
The Intel® Pentium® 4 Processor Optimization Reference Manual describes how to optimize software to take advantage of the performance characteristics of the newest Intel Pentium 4 processor. The optimizations described for the Pentium 4 processor will also apply to the future IA-32 processors based on the Intel® NetBurst™ micro-architecture.
Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any
intellectual property rights is granted by this document. Except as provided in Intel’s Terms and Conditions of Sale for such products, Intel
assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including
liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other
intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications.
This Intel Pentium 4 Processor Optimization Reference Manual as well as the software described in it is furnished under license and may
only be used or copied in accordance with the terms of the license. The information in this manual is furnished for informational use only,
is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no
responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in
association with this document.
Except as permitted by such license, no part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or
by any means without the express written consent of Intel Corporation.
Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves
these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
The Pentium 4 processor may contain design defects or errors known as errata which may cause the product to deviate from published
specifications. Current characterized errata are available on request.
* Third-party brands and names are the property of their respective owners.
Copyright © Intel Corporation 1999-2001.
Contents
Introduction
  About This Manual
  Related Documentation
  Notational Conventions
Chapter 1 Intel® Pentium® 4 Processor Overview
  SIMD Technology and Streaming SIMD Extensions 2
    Summary of SIMD Technologies
      MMX Technology
      Streaming SIMD Extensions
      Streaming SIMD Extensions 2
  Intel® NetBurst™ Micro-architecture
    The Design Considerations of the Intel NetBurst Micro-architecture
    Overview of the Intel NetBurst Micro-architecture Pipeline
      The Front End
      The Out-of-order Core
      Retirement
    Front End Pipeline Detail
      Prefetching
      Decoder
      Execution Trace Cache
      Branch Prediction
      Branch Hints
    Execution Core Detail
      Instruction Latency and Throughput
      Execution Units and Issue Ports
      Caches
      Data Prefetch
      Loads and Stores
      Store Forwarding
Chapter 2 General Optimization Guidelines
  Tuning to Achieve Optimum Performance
  Tuning to Prevent Known Coding Pitfalls
  General Practices and Coding Guidelines
    Use Available Performance Tools
    Optimize Performance Across Processor Generations
    Optimize Branch Predictability
    Optimize Memory Access
    Optimize Floating-point Performance
    Optimize Instruction Selection
    Optimize Instruction Scheduling
    Enable Vectorization
  Coding Rules, Suggestions and Tuning Hints
  Performance Tools
    Intel® C++ Compiler
    General Compiler Recommendations
    VTune™ Performance Analyzer
  Processor Generations Perspective
    The CPUID Dispatch Strategy and Compatible Code Strategy
  Branch Prediction
    Eliminating Branches
    Spin-Wait and Idle Loops
    Static Prediction
    Branch Hints
    Inlining, Calls and Returns
    Branch Type Selection
    Loop Unrolling
    Compiler Support for Branch Prediction
  Memory Accesses
    Alignment
    Store Forwarding
      Store-forwarding Restriction on Size and Alignment
      Store-forwarding Restriction on Data Availability
    Data Layout Optimizations
    Stack Alignment
    Aliasing Cases
    Mixing Code and Data
    Write Combining
    Locality Enhancement
    Prefetching
      Hardware Instruction Fetching
      Software and Hardware Cache Line Fetching
    Cacheability Instructions
    Code
  Improving the Performance of Floating-point Applications
    Guidelines for Optimizing Floating-point Code
    Floating-point Modes and Exceptions
      Floating-point Exceptions
      Floating-point Modes
    Improving Parallelism and the Use of FXCH
    x87 vs. SIMD Floating-point Trade-offs
    Memory Operands
    Floating-point Stalls
      x87 Floating-point Operations with Integer Operands
      x87 Floating-point Comparison Instructions
      Transcendental Functions
  Instruction Selection
    Complex Instructions