Optimizing software in C++
An optimization guide for Windows, Linux and Mac platforms
By Agner Fog. Technical University of Denmark.
Copyright © 2004 - 2017. Last updated 2017-05-02.
Contents
1 Introduction ....................................................................................................................... 3
1.1 The costs of optimizing ............................................................................................... 4
2 Choosing the optimal platform ........................................................................................... 5
2.1 Choice of hardware platform ....................................................................................... 5
2.2 Choice of microprocessor ........................................................................................... 6
2.3 Choice of operating system ......................................................................................... 6
2.4 Choice of programming language ............................................................................... 8
2.5 Choice of compiler .................................................................................................... 10
2.6 Choice of function libraries ........................................................................................ 12
2.7 Choice of user interface framework ........................................................................... 14
2.8 Overcoming the drawbacks of the C++ language ...................................................... 14
3 Finding the biggest time consumers ................................................................................ 16
3.1 How much is a clock cycle? ...................................................................................... 16
3.2 Use a profiler to find hot spots .................................................................................. 16
3.3 Program installation .................................................................................................. 18
3.4 Automatic updates .................................................................................................... 19
3.5 Program loading ....................................................................................................... 19
3.6 Dynamic linking and position-independent code ....................................................... 20
3.7 File access ................................................................................................................ 20
3.8 System database ...................................................................................................... 20
3.9 Other databases ....................................................................................................... 21
3.10 Graphics ................................................................................................................. 21
3.11 Other system resources .......................................................................................... 21
3.12 Network access ...................................................................................................... 21
3.13 Memory access ....................................................................................................... 22
3.14 Context switches ..................................................................................................... 22
3.15 Dependency chains ................................................................................................ 22
3.16 Execution unit throughput ....................................................................................... 22
4 Performance and usability ............................................................................................... 23
5 Choosing the optimal algorithm ....................................................................................... 24
6 Development process ...................................................................................................... 25
7 The efficiency of different C++ constructs ........................................................................ 26
7.1 Different kinds of variable storage ............................................................................. 26
7.2 Integer variables and operators ............................................................................... 29
7.3 Floating point variables and operators ...................................................................... 32
7.4 Enums ...................................................................................................................... 33
7.5 Booleans ................................................................................................................... 34
7.6 Pointers and references ............................................................................................ 36
7.7 Function pointers ...................................................................................................... 37
7.8 Member pointers ....................................................................................................... 38
7.9 Smart pointers .......................................................................................................... 38
7.10 Arrays ..................................................................................................................... 39
7.11 Type conversions .................................................................................................... 40
7.12 Branches and switch statements ............................................................................. 44
7.13 Loops ...................................................................................................................... 45
7.14 Functions ................................................................................................................ 48
7.15 Function parameters ............................................................................................... 50
7.16 Function return types .............................................................................................. 50
7.17 Function tail calls .................................................................................................... 51
7.18 Recursive functions ................................................................................................. 52
7.19 Structures and classes ............................................................................................ 52
7.20 Class data members (instance variables) ............................................................... 53
7.21 Class member functions (methods) ......................................................................... 54
7.22 Virtual member functions ........................................................................................ 55
7.23 Runtime type identification (RTTI) ........................................................................... 55
7.24 Inheritance .............................................................................................................. 55
7.25 Constructors and destructors .................................................................................. 56
7.26 Unions .................................................................................................................... 57
7.27 Bitfields ................................................................................................................... 57
7.28 Overloaded functions .............................................................................................. 58
7.29 Overloaded operators ............................................................................................. 58
7.30 Templates ............................................................................................................... 58
7.31 Threads .................................................................................................................. 61
7.32 Exceptions and error handling ................................................................................ 62
7.33 Other cases of stack unwinding .............................................................................. 66
7.34 Preprocessing directives ......................................................................................... 66
7.35 Namespaces ........................................................................................................... 67
8 Optimizations in the compiler .......................................................................................... 67
8.1 How compilers optimize ............................................................................................ 67
8.2 Comparison of different compilers ............................................................................. 75
8.3 Obstacles to optimization by compiler ....................................................................... 78
8.4 Obstacles to optimization by CPU ............................................................................. 82
8.5 Compiler optimization options ................................................................................... 82
8.6 Optimization directives .............................................................................................. 84
8.7 Checking what the compiler does ............................................................................. 85
9 Optimizing memory access ............................................................................................. 88
9.1 Caching of code and data ......................................................................................... 88
9.2 Cache organization ................................................................................................... 88
9.3 Functions that are used together should be stored together ...................................... 89
9.4 Variables that are used together should be stored together ...................................... 90
9.5 Alignment of data ...................................................................................................... 91
9.6 Dynamic memory allocation ...................................................................................... 91
9.7 Container classes ..................................................................................................... 94
9.8 Strings ...................................................................................................................... 97
9.9 Access data sequentially .......................................................................................... 97
9.10 Cache contentions in large data structures ............................................................. 98
9.11 Explicit cache control ............................................................................................ 100
10 Multithreading .............................................................................................................. 102
10.1 Simultaneous multithreading ................................................................................. 104
11 Out of order execution ................................................................................................. 105
12 Using vector operations ............................................................................................... 107
12.1 AVX instruction set and YMM registers ................................................................. 109
12.2 AVX512 instruction set and ZMM registers ........................................................... 109
12.3 Automatic vectorization ......................................................................................... 110
12.4 Using intrinsic functions ........................................................................................ 112
12.5 Using vector classes ............................................................................................. 116
12.6 Transforming serial code for vectorization ............................................................. 120
12.7 Mathematical functions for vectors ........................................................................ 122
12.8 Aligning dynamically allocated memory ................................................................. 123
12.9 Aligning RGB video or 3-dimensional vectors ....................................................... 123
12.10 Conclusion .......................................................................................................... 123
13 Making critical code in multiple versions for different instruction sets ........................... 125
13.1 CPU dispatch strategies........................................................................................ 125
13.2 Model-specific dispatching .................................................................................... 127
13.3 Difficult cases ........................................................................................................ 128
13.4 Test and maintenance .......................................................................................... 129
13.5 Implementation ..................................................................................................... 129
13.6 CPU dispatching in Gnu compiler ......................................................................... 131
13.7 CPU dispatching in Intel compiler ......................................................................... 133
14 Specific optimization topics ......................................................................................... 135
14.1 Use lookup tables ................................................................................................. 135
14.2 Bounds checking .................................................................................................. 137
14.3 Use bitwise operators for checking multiple values at once ................................... 138
14.4 Integer multiplication ............................................................................................. 139
14.5 Integer division ...................................................................................................... 140
14.6 Floating point division ........................................................................................... 142
14.7 Don't mix float and double ..................................................................................... 143
14.8 Conversions between floating point numbers and integers ................................... 144
14.9 Using integer operations for manipulating floating point variables ......................... 145
14.10 Mathematical functions ....................................................................................... 149
14.11 Static versus dynamic libraries ............................................................................ 149
14.12 Position-independent code .................................................................................. 151
14.13 System programming .......................................................................................... 153
15 Metaprogramming ....................................................................................................... 154
16 Testing speed .............................................................................................................. 157
16.1 Using performance monitor counters .................................................................... 159
16.2 The pitfalls of unit-testing ...................................................................................... 159
16.3 Worst-case testing ................................................................................................ 160
17 Optimization in embedded systems ............................................................................. 162
18 Overview of compiler options....................................................................................... 164
19 Literature ..................................................................................................................... 167
20 Copyright notice .......................................................................................................... 168
1 Introduction
This manual is for advanced programmers and software developers who want to make their
software faster. It is assumed that the reader has a good knowledge of the C++
programming language and a basic understanding of how compilers work. The C++
language is chosen as the basis for this manual for reasons explained on page 8 below.
This manual is based mainly on my study of how compilers and microprocessors work. The
recommendations are based on the x86 family of microprocessors from Intel, AMD and VIA
including the 64-bit versions. The x86 processors are used in the most common platforms
with Windows, Linux, BSD and Mac OS X operating systems, though these operating
systems can also be used with other microprocessors. Much of the advice may also apply to
other platforms and other compiled programming languages.
This is the first in a series of five manuals:
1. Optimizing software in C++: An optimization guide for Windows, Linux and Mac
platforms.
2. Optimizing subroutines in assembly language: An optimization guide for x86
platforms.
3. The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for
assembly programmers and compiler makers.
4. Instruction tables: Lists of instruction latencies, throughputs and micro-operation
breakdowns for Intel, AMD and VIA CPUs.
5. Calling conventions for different C++ compilers and operating systems.
The latest versions of these manuals are always available from www.agner.org/optimize.
Copyright conditions are listed on page 168 below.
Those who are satisfied with making software in a high-level language need only read this
first manual. The subsequent manuals are for those who want to go deeper into the
technical details of instruction timing, assembly language programming, compiler
technology, and microprocessor microarchitecture. A higher level of optimization can
sometimes be obtained by the use of assembly language for CPU-intensive code, as
described in the subsequent manuals.
Please note that my optimization manuals are used by thousands of people. I simply don't
have the time to answer questions from everybody. So please don't send your programming
questions to me. You will not get any answer. Beginners are advised to seek information
elsewhere and get a good deal of programming experience before trying the techniques in
the present manual. There are various discussion forums on the Internet where you can get
answers to your programming questions if you cannot find the answers in the relevant
books and manuals.
I want to thank the many people who have sent me corrections and suggestions for my
optimization manuals. I am always happy to receive new relevant information.
1.1 The costs of optimizing
University courses in programming nowadays stress the importance of structured and
object-oriented programming, modularity, reusability and systematization of the software
development process. These requirements often conflict with the requirements of
optimizing the software for speed or size.
Today, it is not uncommon for software teachers to recommend that no function or method
should be longer than a few lines. A few decades ago, the recommendation was the
opposite: Don't put something in a separate subroutine if it is only called once. The reasons
for this shift in software writing style are that software projects have become bigger and
more complex, that there is more focus on the costs of software development, and that
computers have become more powerful.
The high priority of structured software development and the low priority of program
efficiency are reflected, first and foremost, in the choice of programming language and
interface frameworks. This is often a disadvantage for the end user who has to invest in
ever more powerful computers to keep up with the ever bigger software packages and who
is still frustrated by unacceptably long response times, even for simple tasks.
Sometimes it is necessary to compromise on the advanced principles of software
development in order to make software packages faster and smaller. This manual discusses
how to strike a sensible balance between these considerations. It explains how to identify
and isolate the most critical part of a program and concentrate the optimization effort on
that particular part. It explains how to overcome the dangers of a relatively primitive
programming style that doesn't automatically check for array bounds violations, invalid
pointers, etc. And it discusses which of the advanced programming constructs are costly
and which are cheap in terms of execution time.
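The trade-off around array bounds checks mentioned above can be illustrated with a small sketch. It is not taken from the manual, which treats bounds checking in its own section 14.2. The function names get_unchecked, get_debug_checked and get_checked are hypothetical, and std::vector is used only as a convenient container for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: unchecked access. No test is made on the index,
// so an out-of-range index is undefined behavior.
inline int get_unchecked(const std::vector<int>& v, std::size_t i) {
    return v[i];
}

// Hypothetical helper: debug-only check. assert() expands to nothing when
// NDEBUG is defined, so a release build pays no cost for the test.
inline int get_debug_checked(const std::vector<int>& v, std::size_t i) {
    assert(i < v.size());
    return v[i];
}

// Hypothetical helper: always-checked access. std::vector::at throws
// std::out_of_range on a bad index; here the error is mapped to a
// caller-supplied fallback value instead of being propagated.
inline int get_checked(const std::vector<int>& v, std::size_t i, int fallback) {
    try {
        return v.at(i);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

A plausible way to use these variants is to keep the always-checked form wherever indexes come from outside the program, and to reserve the unchecked or debug-checked forms for the critical inner part where the index is known to be valid by construction.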
2 Choosing the optimal platform
2.1 Choice of hardware platform
The choice of hardware platform has become less important than it used to be. The
distinctions between RISC and CISC processors, between PCs and mainframes, and
between simple processors and vector processors are becoming increasingly blurred as the
standard PC processors with CISC instruction sets have acquired RISC cores, vector
processing instructions, multiple cores, and a processing speed exceeding that of
yesterday's big mainframe computers.
Today, the choice of hardware platform for a given task is often determined by
considerations such as price, compatibility, second source, and the availability of good
development tools, rather than by the processing power. Connecting several standard PCs
in a network may be both cheaper and more efficient than investing in a big mainframe
computer. Big supercomputers with massively parallel vector processing capabilities still
have a niche in scientific computing, but for most purposes the standard PC processors are
preferred because of their superior performance/price ratio.
The CISC instruction set (called x86) of the standard PC processors is not optimal from a
technological point of view. This instruction set is maintained for the sake of backwards
compatibility with a lineage of software that dates back to around 1980, when RAM
and disk space were scarce resources. However, the CISC instruction set is better than its
reputation. The compactness of the code makes caching more efficient today, when cache
size is a limited resource. The CISC instruction set may actually be better than RISC in
situations where code caching is critical. The worst problem of the x86 instruction set is the
scarcity of registers. This problem has been alleviated in the 64-bit extension to the x86
instruction set where the number of registers has been doubled.
Thin clients that depend on network resources are not recommended for critical applications
because the response times for network resources cannot be controlled.
Small hand-held devices are becoming more popular and are used for an increasing number
of purposes such as email and web browsing that previously required a PC. Similarly, we are
seeing an increasing number of devices and machines with embedded microcontrollers. I
am not making any specific recommendation about which platforms and operating systems
are most efficient for such applications, but it is important to realize that such devices
typically have much less memory and computing power than PCs. Therefore, it is even
more important to economize on resource use on such systems than it is on a PC platform.
However, with a well optimized software design, it is possible to get good performance for
many applications even on such small devices, as discussed on page 162.
This manual is based on the standard PC platform with an Intel, AMD or VIA processor and
a Windows, Linux, BSD or Mac operating system running in 32-bit or 64-bit mode. Much of
the advice given here may apply to other platforms as well, but the examples have been
tested only on PC platforms.
Graphics accelerators
The choice of platform is obviously influenced by the requirements of the task in question.
For example, a heavy graphics application is preferably implemented on a platform with a
graphics coprocessor or graphics accelerator card. Some systems also have a dedicated
physics processor for calculating the physical movements of objects in a computer game or
animation.
It is possible in some cases to use the high processing power of the processors on a
graphics accelerator card for purposes other than rendering graphics on the screen.
However, such applications are highly system dependent and therefore not recommended if
portability is important. This manual does not cover graphics processors.