没有合适的资源?快使用搜索试试~ 我知道了~
深入理解并行编程 v2023.06.11a 英文文字版 非扫描版 带标签 超清晰
需积分: 3 1 下载量 56 浏览量
2024-10-14
11:17:06
上传
评论
收藏 8.57MB PDF 举报
温馨提示
深入理解并行编程 v2023.06.11a 英文文字版 非扫描版 带标签 超清晰
资源推荐
资源详情
资源评论
v2023.06.11a
ii
Legal Statement
This work represents the views of the editor and the authors and does not necessarily
represent the view of their respective employers.
Trademarks:
•
IBM, z Systems, and PowerPC are trademarks or registered trademarks of Inter-
national Business Machines Corporation in the United States, other countries, or
both.
• Linux is a registered trademark of Linus Torvalds.
•
Intel, Itanium, Intel Core, and Intel Xeon are trademarks of Intel Corporation or
its subsidiaries in the United States, other countries, or both.
•
Arm is a registered trademark of Arm Limited (or its subsidiaries) in the US and/or
elsewhere.
•
SPARC is a registered trademark of SPARC International, Inc. Products bearing
SPARC trademarks are based on an architecture developed by Sun Microsystems,
Inc.
•
Other company, product, and service names may be trademarks or service marks
of such companies.
The non-source-code text and images in this document are provided under the terms
of the Creative Commons Attribution-Share Alike 3.0 United States license.
1
In brief,
you may use the contents of this document for any purpose, personal, commercial, or
otherwise, so long as attribution to the authors is maintained. Likewise, the document
may be modified, and derivative works and translations made available, so long as
such modifications and derivations are offered to the public on equal terms as the
non-source-code text and images in the original document.
Source code is covered by various versions of the GPL.
2
Some of this code is
GPLv2-only, as it derives from the Linux kernel, while other code is GPLv2-or-later. See
the comment headers of the individual source files within the CodeSamples directory in
the git archive
3
for the exact licenses. If you are unsure of the license for a given code
fragment, you should assume GPLv2-only.
Combined work © 2005–2023 by Paul E. McKenney. Each individual contribution
is copyright by its contributor at the time of contribution, as recorded in the git archive.
1
https://creativecommons.org/licenses/by-sa/3.0/us/
2
https://www.gnu.org/licenses/gpl-2.0.html
3
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git
v2023.06.11a
Contents
1 How To Use This Book 1
1.1 Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Quick Quizzes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Alternatives to This Book . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Sample Source Code . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Whose Book Is This? . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Introduction 7
2.1 Historic Parallel Programming Difficulties . . . . . . . . . . . . . . . 7
2.2 Parallel Programming Goals . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.2 Productivity . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.3 Generality . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Alternatives to Parallel Programming . . . . . . . . . . . . . . . . . . 12
2.3.1 Multiple Instances of a Sequential Application . . . . . . . . 12
2.3.2 Use Existing Parallel Software . . . . . . . . . . . . . . . . . 12
2.3.3 Performance Optimization . . . . . . . . . . . . . . . . . . . 13
2.4 What Makes Parallel Programming Hard? . . . . . . . . . . . . . . . 13
2.4.1 Work Partitioning . . . . . . . . . . . . . . . . . . . . . . . 14
2.4.2 Parallel Access Control . . . . . . . . . . . . . . . . . . . . 14
2.4.3 Resource Partitioning and Replication . . . . . . . . . . . . . 15
2.4.4 Interacting With Hardware . . . . . . . . . . . . . . . . . . . 15
2.4.5 Composite Capabilities . . . . . . . . . . . . . . . . . . . . 15
2.4.6
How Do Languages and Environments Assist With These Tasks?
16
2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3 Hardware and its Habits 17
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1.1 Pipelined CPUs . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1.2 Memory References . . . . . . . . . . . . . . . . . . . . . . 19
3.1.3 Atomic Operations . . . . . . . . . . . . . . . . . . . . . . . 19
3.1.4 Memory Barriers . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1.5 Thermal Throttling . . . . . . . . . . . . . . . . . . . . . . . 20
3.1.6 Cache Misses . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1.7 I/O Operations . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Overheads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.1 Hardware System Architecture . . . . . . . . . . . . . . . . 22
3.2.2 Costs of Operations . . . . . . . . . . . . . . . . . . . . . . 23
iii
v2023.06.11a
iv CONTENTS
3.2.3 Hardware Optimizations . . . . . . . . . . . . . . . . . . . . 24
3.3 Hardware Free Lunch? . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3.1 3D Integration . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3.2 Novel Materials and Processes . . . . . . . . . . . . . . . . . 26
3.3.3 Light, Not Electrons . . . . . . . . . . . . . . . . . . . . . . 27
3.3.4 Special-Purpose Accelerators . . . . . . . . . . . . . . . . . 27
3.3.5 Existing Parallel Software . . . . . . . . . . . . . . . . . . . 27
3.4 Software Design Implications . . . . . . . . . . . . . . . . . . . . . . 28
4 Tools of the Trade 29
4.1 Scripting Languages . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 POSIX Multiprocessing . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2.1 POSIX Process Creation and Destruction . . . . . . . . . . . 30
4.2.2 POSIX Thread Creation and Destruction . . . . . . . . . . . 31
4.2.3 POSIX Locking . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.4 POSIX Reader-Writer Locking . . . . . . . . . . . . . . . . 34
4.2.5 Atomic Operations (GCC Classic) . . . . . . . . . . . . . . . 36
4.2.6 Atomic Operations (C11) . . . . . . . . . . . . . . . . . . . 36
4.2.7 Atomic Operations (Modern GCC) . . . . . . . . . . . . . . 37
4.2.8 Per-Thread Variables . . . . . . . . . . . . . . . . . . . . . . 37
4.3 Alternatives to POSIX Operations . . . . . . . . . . . . . . . . . . . 37
4.3.1 Organization and Initialization . . . . . . . . . . . . . . . . . 37
4.3.2 Thread Creation, Destruction, and Control . . . . . . . . . . 38
4.3.3 Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.4 Accessing Shared Variables . . . . . . . . . . . . . . . . . . 40
4.3.5 Atomic Operations . . . . . . . . . . . . . . . . . . . . . . . 46
4.3.6 Per-CPU Variables . . . . . . . . . . . . . . . . . . . . . . . 46
4.4 The Right Tool for the Job: How to Choose? . . . . . . . . . . . . . . 47
5 Counting 49
5.1 Why Isn’t Concurrent Counting Trivial? . . . . . . . . . . . . . . . . 49
5.2 Statistical Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2.1 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2.2 Array-Based Implementation . . . . . . . . . . . . . . . . . 51
5.2.3 Per-Thread-Variable-Based Implementation . . . . . . . . . . 52
5.2.4 Eventually Consistent Implementation . . . . . . . . . . . . 54
5.2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.3 Approximate Limit Counters . . . . . . . . . . . . . . . . . . . . . . 55
5.3.1 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.3.2 Simple Limit Counter Implementation . . . . . . . . . . . . 56
5.3.3 Simple Limit Counter Discussion . . . . . . . . . . . . . . . 59
5.3.4 Approximate Limit Counter Implementation . . . . . . . . . 60
5.3.5 Approximate Limit Counter Discussion . . . . . . . . . . . . 61
5.4 Exact Limit Counters . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.4.1 Atomic Limit Counter Implementation . . . . . . . . . . . . 61
5.4.2 Atomic Limit Counter Discussion . . . . . . . . . . . . . . . 64
5.4.3 Signal-Theft Limit Counter Design . . . . . . . . . . . . . . 64
5.4.4 Signal-Theft Limit Counter Implementation . . . . . . . . . . 65
5.4.5 Signal-Theft Limit Counter Discussion . . . . . . . . . . . . 67
5.4.6 Applying Exact Limit Counters . . . . . . . . . . . . . . . . 68
v2023.06.11a
CONTENTS v
5.5 Parallel Counting Discussion . . . . . . . . . . . . . . . . . . . . . . 68
5.5.1 Parallel Counting Validation . . . . . . . . . . . . . . . . . . 69
5.5.2 Parallel Counting Performance . . . . . . . . . . . . . . . . 69
5.5.3 Parallel Counting Specializations . . . . . . . . . . . . . . . 70
5.5.4 Parallel Counting Lessons . . . . . . . . . . . . . . . . . . . 70
6 Partitioning and Synchronization Design 73
6.1 Partitioning Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.1.1 Dining Philosophers Problem . . . . . . . . . . . . . . . . . 73
6.1.2 Double-Ended Queue . . . . . . . . . . . . . . . . . . . . . 74
6.1.3 Partitioning Example Discussion . . . . . . . . . . . . . . . 80
6.2 Design Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.3 Synchronization Granularity . . . . . . . . . . . . . . . . . . . . . . 83
6.3.1 Sequential Program . . . . . . . . . . . . . . . . . . . . . . 83
6.3.2 Code Locking . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.3.3 Data Locking . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.3.4 Data Ownership . . . . . . . . . . . . . . . . . . . . . . . . 86
6.3.5 Locking Granularity and Performance . . . . . . . . . . . . . 87
6.4 Parallel Fastpath . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.4.1 Reader/Writer Locking . . . . . . . . . . . . . . . . . . . . . 89
6.4.2 Hierarchical Locking . . . . . . . . . . . . . . . . . . . . . . 89
6.4.3 Resource Allocator Caches . . . . . . . . . . . . . . . . . . 90
6.5 Beyond Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.5.1 Work-Queue Parallel Maze Solver . . . . . . . . . . . . . . . 94
6.5.2 Alternative Parallel Maze Solver . . . . . . . . . . . . . . . . 95
6.5.3 Maze Validation . . . . . . . . . . . . . . . . . . . . . . . . 96
6.5.4 Performance Comparison I . . . . . . . . . . . . . . . . . . 97
6.5.5 Alternative Sequential Maze Solver . . . . . . . . . . . . . . 98
6.5.6 Performance Comparison II . . . . . . . . . . . . . . . . . . 99
6.5.7 Future Directions and Conclusions . . . . . . . . . . . . . . 99
6.6 Partitioning, Parallelism, and Optimization . . . . . . . . . . . . . . . 100
7 Locking 101
7.1 Staying Alive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.1.1 Deadlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.1.2 Livelock and Starvation . . . . . . . . . . . . . . . . . . . . 109
7.1.3 Unfairness . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7.1.4 Inefficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7.2 Types of Locks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.2.1 Exclusive Locks . . . . . . . . . . . . . . . . . . . . . . . . 111
7.2.2 Reader-Writer Locks . . . . . . . . . . . . . . . . . . . . . . 111
7.2.3 Beyond Reader-Writer Locks . . . . . . . . . . . . . . . . . 112
7.2.4 Scoped Locking . . . . . . . . . . . . . . . . . . . . . . . . 113
7.3 Locking Implementation Issues . . . . . . . . . . . . . . . . . . . . . 115
7.3.1
Sample Exclusive-Locking Implementation Based on Atomic
Exchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.3.2 Other Exclusive-Locking Implementations . . . . . . . . . . 115
7.4 Lock-Based Existence Guarantees . . . . . . . . . . . . . . . . . . . 117
7.5 Locking: Hero or Villain? . . . . . . . . . . . . . . . . . . . . . . . 118
7.5.1 Locking For Applications: Hero! . . . . . . . . . . . . . . . 119
剩余661页未读,继续阅读
资源评论
dongsong1117
- 粉丝: 28
- 资源: 49
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
- (源码)基于Spring Boot和Vue的后台管理系统.zip
- 用于将 Power BI 嵌入到您的应用中的 JavaScript 库 查看文档网站和 Wiki 了解更多信息 .zip
- (源码)基于Arduino、Python和Web技术的太阳能监控数据管理系统.zip
- (源码)基于Arduino的CAN总线传感器与执行器通信系统.zip
- (源码)基于C++的智能电力系统通信协议实现.zip
- 用于 Java 的 JSON-RPC.zip
- 用 JavaScript 重新实现计算机科学.zip
- (源码)基于PythonOpenCVYOLOv5DeepSort的猕猴桃自动计数系统.zip
- 用 JavaScript 编写的贪吃蛇游戏 .zip
- (源码)基于ASP.NET Core的美术课程管理系统.zip
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功