6.20 Constructors and destructors .................................................................................. 42
6.21 Unions .................................................................................................................... 42
6.22 Bitfields ................................................................................................................... 43
6.23 Overloaded functions .............................................................................................. 43
6.24 Overloaded operators ............................................................................................. 43
6.25 Templates............................................................................................................... 44
6.26 Threads .................................................................................................................. 47
6.27 Exception handling.................................................................................................. 48
6.28 Other cases of stack unwinding .............................................................................. 51
6.29 Preprocessing directives ......................................................................................... 51
7 Optimizations in the compiler .......................................................................................... 52
7.1 How compilers optimize ............................................................................................ 52
7.2 Comparison of different compilers............................................................................. 59
7.3 Obstacles to optimization by compiler....................................................................... 62
7.4 Obstacles to optimization by CPU............................................................................. 66
7.5 Compiler optimization options ................................................................................... 66
7.6 Optimization directives.............................................................................................. 67
7.7 Checking what the compiler does ............................................................................. 69
8 Optimizing memory access ............................................................................................. 72
8.1 Caching of code and data ......................................................................................... 72
8.2 Cache organization................................................................................................... 72
8.3 Functions that are used together should be stored together...................................... 73
8.4 Variables that are used together should be stored together ...................................... 73
8.5 Alignment of data...................................................................................................... 75
8.6 Dynamic memory allocation ...................................................................................... 75
8.7 Access data sequentially .......................................................................................... 78
8.8 Cache contentions in large data structures ............................................................... 78
8.9 Explicit cache control ................................................................................................ 81
9 Using multiple CPU kernels............................................................................................. 83
10 Out of order execution................................................................................................... 84
11 Using vector operations................................................................................................. 86
11.1 Automatic vectorization........................................................................................... 87
11.2 Explicit vectorization ............................................................................................... 89
11.3 Mathematical functions ......................................................................................... 102
11.4 Conclusion............................................................................................................ 104
12 Make critical code in multiple versions for different CPU's ........................................... 105
13 Specific optimization advices....................................................................................... 107
13.1 Bounds checking .................................................................................................. 107
13.2 Use lookup tables ................................................................................................. 108
13.3 Integer multiplication............................................................................................. 110
13.4 Integer division...................................................................................................... 112
13.5 Floating point division ........................................................................................... 113
13.6 Don't mix float and double..................................................................................... 114
13.7 Conversions between floating point numbers and integers ................................... 115
13.8 Using integer operations for manipulating floating point variables ......................... 116
13.9 Mathematical functions ......................................................................................... 119
14 Testing speed.............................................................................................................. 119
15 Some useful templates................................................................................................ 121
15.1 Array with bounds checking .................................................................................. 121
15.2 FIFO list ................................................................................................................ 122
15.3 LIFO list ................................................................................................................ 123
15.4 Searchable list ...................................................................................................... 124
16 Overview of compiler options....................................................................................... 129
17 Literature..................................................................................................................... 132