SCIENTIFIC PROGRAMMING
Editors: Konstantin Läufer, laufer@cs.luc.edu
Konrad Hinsen, hinsen@cnrs-orleans.fr

Why Modern CPUs Are Starving and What Can Be Done about It

By Francesc Alted

CPUs spend most of their time waiting for data to arrive. Identifying low-level bottlenecks, and how to ameliorate them, can save hours of frustration over poor performance in apparently well-written programs.

A well-documented trend shows that CPU speeds are increasing at a faster rate than memory speeds [1,2]. Indeed, CPU performance has now outstripped memory performance to the point that current CPUs are starved for data, as memory I/O becomes the performance bottleneck.
This hasn't always been the case. Once upon a time, processor and memory speeds evolved in parallel. For example, memory clock access in the early 1980s was at approximately 1 MHz, and memory and CPU speeds increased in tandem to reach speeds of 16 MHz by decade's end. By the early 1990s, however, CPU and memory speeds began to drift apart: memory speed increases began to level off, while CPU clock rates continued to skyrocket to 100 MHz and beyond.

It wasn't too long before CPU capabilities began to substantially outstrip memory performance. Consider this: a 100 MHz processor consumes a word from memory every 10 nanoseconds in a single clock tick. This rate is impossible to sustain even with present-day RAM, let alone with the RAM available when 100 MHz processors were state of the art. To address this mismatch, commodity chipmakers introduced the first on-chip cache.
But CPUs didn't stop at 100 MHz; by the start of the new millennium, processor speeds reached unparalleled extremes, hitting the magic 1 GHz figure. As a consequence, a huge abyss opened between the processors and the memory subsystem: CPUs had to wait up to 50 clock ticks for each memory read or write operation.

During the early and middle 2000s, the strong competition between Intel and AMD continued to drive CPU clock rates faster and faster (up to 4 GHz). Again, the increased impedance mismatch with memory speeds forced vendors to introduce a second-level cache in CPUs. In the past five years, the size of this second-level cache grew rapidly, reaching 12 Mbytes in some instances.

Vendors started to realize that they couldn't keep raising the frequency forever, however, and thus dawned the multicore age. Programmers began scratching their heads, wondering how to take advantage of those shiny new and apparently innovative multicore machines. Today, the arrival of the Intel Core i7 and AMD Phenom makes four-core on-chip CPUs the most common configuration. Of course, more processors means more demand for data, and vendors thus introduced a third-level cache.

So, here we are today: memory latency is still much greater than the processor clock period (around 150 times greater or more) and has become an essential bottleneck over the past 20 years. Memory throughput is improving at a better rate than its latency, but it's also lagging behind processors (about 25 times slower). The result is that current CPUs are suffering from serious starvation: they're capable of consuming (much!) more data than the system can possibly deliver.
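This starvation is easy to observe directly. The following minimal sketch (an illustration of mine, not code from the article) sums a large array twice: once with one floating-point operation per element loaded, and once with four. If the CPU were the bottleneck, the second pass would take roughly four times longer; in practice the two timings are close, because the memory system cannot feed the core fast enough either way.

/* starve.c -- illustrative sketch, not from the article.
 * Shows that a simple array reduction is memory-bound: adding
 * more arithmetic per element barely changes the runtime. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024)  /* 64M doubles = 512 MB, far larger than any cache */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    double *a = malloc(N * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < N; i++)
        a[i] = 1.0;

    /* Pass 1: one addition per element loaded -- memory-bound. */
    double t0 = now(), sum1 = 0.0;
    for (size_t i = 0; i < N; i++)
        sum1 += a[i];
    double t1 = now();

    /* Pass 2: four floating-point operations per element loaded.
     * If compute were the limit, this would be ~4x slower. */
    double t2 = now(), sum2 = 0.0;
    for (size_t i = 0; i < N; i++)
        sum2 += a[i] * 1.000001 + a[i] * 0.999999;
    double t3 = now();

    printf("1 op/element:  %.3f s (sum=%g)\n", t1 - t0, sum1);
    printf("4 ops/element: %.3f s (sum=%g)\n", t3 - t2, sum2);
    printf("effective bandwidth, pass 1: %.1f GB/s\n",
           N * sizeof(double) / (t1 - t0) / 1e9);
    free(a);
    return 0;
}

Compiled with optimizations (for example, cc -O2 starve.c), the two passes typically differ far less than the fourfold gap in operation counts would suggest, and the reported bandwidth sits near the machine's DRAM limit rather than near its arithmetic peak.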
The Hierarchical Memory Model
Why, exactly, can't we improve memory latency and bandwidth to keep up with CPUs? The main reason is cost: it's prohibitively expensive to manufacture commodity SDRAM that can keep up with a modern processor. To make memory faster, we need motherboards with more wire layers, more complex ancillary logic, and (most importantly) the ability to run at higher frequencies. This additional complexity represents a much higher cost, which few are willing to pay. Moreover, raising the frequency implies pushing more voltage through the circuits. This causes the energy consumption to quickly skyrocket and more heat to be generated, which requires huge coolers in user machines. That's not practical.
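The physics behind that last point can be made concrete with the standard dynamic-power approximation for CMOS circuits (a textbook relation, not a formula from this article):

    P_dyn ≈ α · C · V² · f,

where α is the fraction of gates switching each cycle, C is the switched capacitance, V is the supply voltage, and f is the clock frequency. Because raising f generally requires raising V as well to keep transistors switching reliably, dissipated power climbs roughly with the cube of frequency, which is why "just clock the memory bus faster" quickly stops being an option.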
To cope with memory bus limitations, computer architects introduced a hierarchy of CPU memory caches [3]. Such caches are useful because they're closer to the processor (normally on the same die), which improves both latency and bandwidth. The faster they run, however, the smaller they must be, due mainly to energy dissipation problems. In response, the industry built several memory layers with different trade-offs: small but fast caches close to the core, backed by larger, slower levels farther away.
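The payoff of this hierarchy is equally easy to demonstrate. The sketch below (again my illustration, assuming the common 64-byte cache line) walks the same array with increasing strides. The work per access is constant, but once the stride exceeds one cache line, every access misses and must be served from farther down the hierarchy, so the cost per access jumps even though the instruction stream is unchanged.

/* stride.c -- illustrative sketch, not from the article.
 * Shows that access pattern, not instruction count, dominates
 * runtime once the working set exceeds the caches. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (32 * 1024 * 1024)  /* 32M ints = 128 MB, larger than L3 */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    int *a = calloc(N, sizeof *a);
    if (!a) return 1;

    /* Identical work per access; only the stride changes. */
    for (int stride = 1; stride <= 64; stride *= 2) {
        long sum = 0, touched = 0;
        double t0 = now();
        for (long i = 0; i < N; i += stride) {
            sum += a[i];   /* sum is printed so the loop isn't optimized away */
            touched++;
        }
        double dt = now() - t0;
        printf("stride %2d: %8.2f ns/access (sum=%ld)\n",
               stride, dt * 1e9 / touched, sum);
    }
    free(a);
    return 0;
}

On typical hardware the per-access time rises sharply around stride 16 (16 ints of 4 bytes each is exactly one 64-byte line, so every access fetches a fresh line): that jump is the cache hierarchy, not the arithmetic, showing through.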