Building a Hardware-Based Performance Monitoring Service in Virtualization Scenarios
Like Xu
Tencent Cloud Virtualization Open Source Team
CLK 2021
Agenda
1. A basic mindset of CPU performance analysis
2. Perf profiling modes and its subcommands roadmap
3. Current publicly available hardware capabilities on x86
   1. The PMC workflow and some devilish details
   2. Virtualizing PMC and the basic KVM framework
   3. Virtualizing Branch Sampling Facilities
   4. Virtualizing Instruction Trace Facilities
4. Challenges in the ongoing hybrid scenarios
5. (Virtualized) PMU Use Cases at Tencent Cloud
1. A basic mindset of CPU performance analysis
• Optimization is driven by careful performance analysis, not intuition.
• Full stack, weakest link (bottleneck), jitter, spikes, long tail
• Telemetry/APM agent → Please fix other issues first
• How can we further accelerate our code?
• HW utilization and saturation, and also errors.
• even if the CPU occupancy (not real utilization) reaches 99%
• Raise instructions per cycle (IPC) ↑ as far as possible → optimizable
• elapsed time ↓, retired instructions per functional block ↓
• actual core cycles elapsed (not crystal clock/TSC) before the next branch ↓
• cache-references ↑ + cache-misses ↓
• cache (I, D, TLB) effectiveness ↑ and latency ↓ across the memory hierarchy
• branch-instructions ↓ + branch-misses ↓
• frontend stalled cycles ↓
• utilization of execution units ↑
• backend stalled cycles ↓
• execution units and types (Int, FP, Scalar, Vector) ↑
• More CPU HW details
• Out of order Window, Scheduler Entries …
• Register Files, Allocation Queue …
• Smarter speculation/prefetch … algorithms
• C-state changes, uncore IIO/IMC …
• check https://perfmon-events.intel.com/
• Workload Classification :: Top-down Microarchitecture Analysis (not for CLK)
• http://www.cs.technion.ac.il/~erangi/TMA_using_Linux_perf__Ahmad_Yasin.pdf
• keep performance in mind as you make early design and architectural decisions
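The ratios named in the bullets above (IPC, cache and branch miss ratios, frontend and backend stall fractions) can be derived from raw event counts. The following is a minimal sketch, not part of the original slides: the `CounterSnapshot` container, the `derive_metrics` helper, and the sample numbers are all hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CounterSnapshot:
    """Raw event counts for one measurement interval (hypothetical container)."""
    cycles: int
    instructions: int
    cache_references: int
    cache_misses: int
    branch_instructions: int
    branch_misses: int
    frontend_stalled_cycles: int
    backend_stalled_cycles: int


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derive_metrics(s: CounterSnapshot) -> Dict[str, float]:
    """Turn raw counts into the ratios the slide asks us to push up or down."""
    return {
        "ipc": ratio(s.instructions, s.cycles),                        # want higher
        "cache_miss_ratio": ratio(s.cache_misses, s.cache_references),  # want lower
        "branch_miss_ratio": ratio(s.branch_misses, s.branch_instructions),
        "frontend_stall_fraction": ratio(s.frontend_stalled_cycles, s.cycles),
        "backend_stall_fraction": ratio(s.backend_stalled_cycles, s.cycles),
    }


# Made-up numbers purely for illustration; they are not measurements.
sample = CounterSnapshot(
    cycles=1_000_000,
    instructions=1_500_000,
    cache_references=200_000,
    cache_misses=10_000,
    branch_instructions=300_000,
    branch_misses=3_000,
    frontend_stalled_cycles=100_000,
    backend_stalled_cycles=250_000,
)

if __name__ == "__main__":
    for name, value in derive_metrics(sample).items():
        print(f"{name}: {value:.3f}")
```

Two workloads can both show 99% CPU occupancy while differing widely in `ipc`, which is why the slide distinguishes occupancy from real utilization.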
Performance bottlenecks (layer diagram, top to bottom):
• Distributed Services
• Components and language
• Runtime System (Lib, OS, VM, MM)
• Data structures, algorithms
• ISA (Architecture)
• CPU Microarchitecture
• Vendor Logic/Devices/Electrons
Diagram annotations: Scalable Application Code; Instructions, Branch, Blocks; HW Platform Configurations (dies, mem chips, firmware, power); Observable: architectural HW events and model-specific HW events
Next page: perf, the Linux kernel's official performance analysis tool
• FE → I-Cache, Decode, Branch, μop cache
• BE → Mem load/store, Execution, D-Cache
• Is the μop queued and retired?
• Monitoring hardware with hardware (PMU)
• Monitoring interesting events with counters (PMC)
• How the instruction flow reaches this event
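Once counters are monitoring events, their values still have to be read out and checked. The sketch below is an illustration that is not taken from the slides. It assumes the machine-readable output of `perf stat -x,`, whose fields (without interval or per-CPU options) are documented in the perf-stat man page as counter value, unit, event name, counter run time, and percentage of time the counter was running. The `parse_perf_stat_csv` helper is hypothetical, and the sample text is hand-written in that field order rather than captured from a real run. The `<not supported>` case matters in guests where no virtual PMU is exposed.

```python
from typing import Dict, Optional

# Hand-written sample in the documented `perf stat -x,` field order
# (value, unit, event, run time, percent running); NOT captured from a real run.
SAMPLE = """\
1000000,,cycles,500000,100.00,,
1500000,,instructions,500000,100.00,,
<not supported>,,branch-misses,0,100.00,,
"""


def parse_perf_stat_csv(text: str, sep: str = ",") -> Dict[str, Optional[int]]:
    """Map event name -> count, or None when perf reported no usable value."""
    counts: Dict[str, Optional[int]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split(sep)
        if len(fields) < 3:
            continue  # not a counter line in the assumed layout
        value, event = fields[0], fields[2]
        # Values such as "<not supported>" or "<not counted>" are not digits.
        counts[event] = int(value) if value.isdigit() else None
    return counts


if __name__ == "__main__":
    counts = parse_perf_stat_csv(SAMPLE)
    cycles, insns = counts.get("cycles"), counts.get("instructions")
    if cycles and insns:
        print(f"IPC = {insns / cycles:.2f}")
    missing = [name for name, value in counts.items() if value is None]
    print("events without a value:", missing)
```

Treating a missing value as `None` rather than zero keeps an unavailable event from being mistaken for an event that never fired.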