Performance Analysis and Tuning on Modern CPUs
Notices
Responsibility. Knowledge and best practice in the field of engineering and software
development are constantly changing. Practitioners and researchers must always rely on their
own experience and knowledge in evaluating and using any information, methods, compounds,
or experiments described herein. In using such information or methods, they should be mindful
of their own safety and the safety of others, including parties for whom they have a professional
responsibility.
To the fullest extent of the law, neither the author nor contributors, or editors, assume any
liability for any injury and/or damage to persons or property as a matter of products liability,
negligence or otherwise, or from any use or operations of any methods, products, instructions,
or ideas contained in the material herein.
Trademarks. Designations used by companies to distinguish their products are often claimed
as trademarks or registered trademarks. Intel, Intel Core, Intel Xeon, Intel Pentium, Intel
VTune, and Intel Advisor are trademarks of Intel Corporation in the U.S. and/or other
countries. AMD is a trademark of Advanced Micro Devices Corporation in the U.S. and/or
other countries. ARM is a trademark of Arm Limited (or its subsidiaries) in the U.S.
and/or elsewhere. Readers, however, should contact the appropriate companies for complete
information regarding trademarks and registration.
Affiliation. At the time of writing, the book’s primary author (Denis Bakhvalov) is an
employee of Intel Corporation. The information presented in the book is not an official position
of the aforementioned company but rather reflects the individual knowledge and opinions of the
author. The primary author did not receive any financial sponsorship from Intel Corporation
for writing this book.
Advertisement. This book does not advertise any software, hardware, or any other product.
Copyright
Copyright © 2020 by Denis Bakhvalov under Creative Commons license (CC BY 4.0).
Preface
About The Author
Denis Bakhvalov is a senior developer at Intel, where he works on C++ compiler projects that
aim at generating optimal code for a variety of different architectures. Performance engineering
and compilers have always been among his primary interests. Denis started his career
as a software developer in 2008 and has since worked in multiple areas, including desktop
applications, embedded development, performance analysis, and compiler development. In 2016,
Denis started his easyperf.net blog, where he writes about performance analysis and tuning,
C/C++ compilers, and CPU microarchitecture. Denis is a big proponent of an active lifestyle,
which he practices in his free time. You can find him playing soccer or tennis, running, or
playing chess. Besides that, Denis is a father of two beautiful daughters.
Contacts:
• Email: dendibakh@gmail.com
• Twitter: @dendibakh
• LinkedIn: @dendibakh
From The Author
I started this book with a simple goal: to help software developers better understand their
applications’ performance on modern hardware. I know how confusing this topic can be for
a beginner or even for an experienced developer. This confusion mostly affects developers
who have had no prior occasion to work on performance-related tasks. And that’s fine, since
every expert was once a beginner.
I remember the days when I was starting with performance analysis. I was staring at unfamiliar
metrics trying to match the data that didn’t match. And I was baffled. It took me years
until it finally “clicked”, and all pieces of the puzzle came together. At the time, the only
good sources of information were software developer manuals, which are not what mainstream
developers like to read. So I decided to write this book, which will hopefully make it easier for
developers to learn performance analysis concepts.
Developers who consider themselves beginners in performance analysis can start from the
beginning of the book and read sequentially, chapter by chapter. Chapters 2-4 give developers
a minimal set of knowledge required by later chapters. Readers already familiar with these
concepts may choose to skip those. Additionally, this book can be used as a reference or a
checklist for optimizing software applications. Developers can use chapters 7-11 as a source of ideas
for tuning their code.
Target Audience
This book will be primarily useful for software developers who work with performance-critical
applications and do low-level optimizations. To name just a few areas: High-Performance
Computing (HPC), game development, data-center applications (like those run by Facebook,
Google, etc.), and High-Frequency Trading. But the scope of the book is not limited to these industries.
This book will be useful for any developer who wants to understand the performance of their
application better and know how it can be diagnosed and improved. The author hopes that
the material presented in this book will help readers develop new skills that can be applied in
their daily work.
Readers are expected to have a minimal background in the C/C++ programming languages to
understand the book’s examples. The ability to read basic x86 assembly is desirable but not
a strict requirement. The author also expects familiarity with basic concepts of computer
architecture and operating systems, such as the central processor, memory, process, thread,
virtual and physical memory, context switch, etc. If any of these terms are new to you, I
suggest studying this material first.
Acknowledgments
Huge thanks to Mark E. Dawson, Jr. for his help writing several sections of this book:
“Optimizing For DTLB” (section 8.1.3), “Optimizing for ITLB” (section 7.8), “Cache Warming”
(section 10.3), “System Tuning” (section 10.5), section 11.1 about performance scaling and
overhead of multithreaded applications, section 11.5 about using the COZ profiler, section 11.6
about eBPF, and “Detecting Coherence Issues” (section 11.7). Mark is a recognized expert in the
High-Frequency Trading industry. Mark was kind enough to share his expertise and feedback
at different stages of this book’s writing.
Next, I want to thank Sridhar Lakshmanamurthy, who authored the major part of chapter 3
about CPU microarchitecture. Sridhar has spent decades working at Intel, and he is a veteran
of the semiconductor industry.
Big thanks to Nadav Rotem, the original author of the vectorization framework in the LLVM
compiler, who helped me write section 8.2.3 about vectorization.
Clément Grégoire authored section 8.2.3.7 about the ISPC compiler. Clément has an extensive
background in the game development industry. His comments and feedback helped the book
address some of the challenges in the game development industry.
This book wouldn’t have come out of the draft without its reviewers: Dick Sites, Wojciech
Muła, Thomas Dullien, Matt Fleming, Daniel Lemire, Ahmad Yasin, Michele Adduci, Clément
Grégoire, Arun S. Kumar, Surya Narayanan, Alex Blewitt, Nadav Rotem, Alexander
Yermolovich, Suchakrapani Datt Sharma, Renat Idrisov, Sean Heelan, Jumana Mundichipparakkal,
Todd Lipcon, Rajiv Chauhan, Shay Morag, and others.
Also, I would like to thank the whole performance community for countless blog articles and
papers. I was able to learn a lot from reading blogs by Travis Downs, Daniel Lemire, Andi
Kleen, Agner Fog, Bruce Dawson, Brendan Gregg, and many others. I stand on the shoulders
of giants, and the success of this book should not be attributed to me alone. This book is
my way to thank and give back to the whole community.
Last but not least, thanks to my family, who were patient enough to tolerate me missing
weekend trips and evening walks. Without their support, I wouldn’t have finished this book.
Table Of Contents
1 Introduction
1.1 Why Do We Still Need Performance Tuning?
1.2 Who Needs Performance Tuning?
1.3 What Is Performance Analysis?
1.4 What is discussed in this book?
1.5 What is not in this book?
1.6 Chapter Summary
Part 1. Performance analysis on a modern CPU
2 Measuring Performance
2.1 Noise In Modern Systems
2.2 Measuring Performance In Production
2.3 Automated Detection of Performance Regressions
2.4 Manual Performance Testing
2.5 Software and Hardware Timers
2.6 Microbenchmarks
2.7 Chapter Summary
3 CPU Microarchitecture
3.1 Instruction Set Architecture
3.2 Pipelining
3.3 Exploiting Instruction Level Parallelism (ILP)
3.3.1 OOO Execution
3.3.2 Superscalar Engines and VLIW
3.3.3 Speculative Execution
3.4 Exploiting Thread Level Parallelism
3.4.1 Simultaneous Multithreading
3.5 Memory Hierarchy
3.5.1 Cache Hierarchy
3.5.1.1 Placement of data within the cache
3.5.1.2 Finding data in the cache
3.5.1.3 Managing misses
3.5.1.4 Managing writes
3.5.1.5 Other cache optimization techniques
3.5.2 Main Memory
3.6 Virtual Memory
3.7 SIMD Multiprocessors
3.8 Modern CPU design
3.8.1 CPU Front-End
3.8.2 CPU Back-End