SCIENTIFIC PROGRAMMING
Editors: Konstantin Läufer, laufer@cs.luc.edu
Konrad Hinsen, hinsen@cnrs-orleans.fr

Why Modern CPUs Are Starving and What Can Be Done about It

By Francesc Alted

CPUs spend most of their time waiting for data to arrive. Identifying low-level bottlenecks, and how to ameliorate them, can save hours of frustration over poor performance in apparently well-written programs.

A well-documented trend shows that CPU speeds are increasing at a faster rate than memory speeds [1,2]. Indeed, CPU performance has now outstripped memory performance to the point that current CPUs are starved for data, as memory I/O becomes the performance bottleneck.
This hasn't always been the case. Once upon a time, processor and memory speeds evolved in parallel. For example, memory clock access in the early 1980s was at approximately 1 MHz, and memory and CPU speeds increased in tandem to reach speeds of 16 MHz by decade's end. By the early 1990s, however, CPU and memory speeds began to drift apart: memory speed increases began to level off, while CPU clock rates continued to skyrocket to 100 MHz and beyond.

It wasn't too long before CPU capabilities began to substantially outstrip memory performance. Consider this: a 100 MHz processor consumes a word from memory every 10 nanoseconds in a single clock tick. This rate is impossible to sustain even with present-day RAM, let alone with the RAM available when 100 MHz processors were state of the art. To address this mismatch, commodity chipmakers introduced the first on-chip cache.
But CPUs didn't stop at 100 MHz; by the start of the new millennium, processor speeds reached unparalleled extremes, hitting the magic 1 GHz figure. As a consequence, a huge abyss opened between the processors and the memory subsystem: CPUs had to wait up to 50 clock ticks for each memory read or write operation.

During the early and middle 2000s, the strong competition between Intel and AMD continued to drive CPU clock rates faster and faster (up to 4 GHz). Again, the increased impedance mismatch with memory speeds forced vendors to introduce a second-level cache in CPUs. In the past five years, the size of this second-level cache grew rapidly, reaching 12 Mbytes in some instances.

Vendors started to realize that they couldn't keep raising the frequency forever, however, and thus dawned the multicore age. Programmers began scratching their heads, wondering how to take advantage of those shiny new and apparently innovative multicore machines. Today, the arrival of the Intel Core i7 and AMD Phenom makes four-core on-chip CPUs the most common configuration. Of course, more processors means more demand for data, and vendors thus introduced a third-level cache.

So, here we are today: memory latency is still much greater than the processor clock period (around 150 times greater or more) and has become an essential bottleneck over the past 20 years. Memory throughput is improving at a better rate than its latency, but it's also lagging behind processors (about 25 times slower). The result is that current CPUs are suffering from serious starvation: they're capable of consuming (much!) more data than the system can possibly deliver.
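This starvation is easy to observe directly. The following minimal sketch (an illustration of mine, not code from the article) sums a large array twice: once with one floating-point operation per element loaded, and once with four. If the CPU were the bottleneck, the second pass would take roughly four times longer; in practice the two timings are close, because the memory system cannot feed the core fast enough either way.

/* starve.c -- illustrative sketch, not from the article.
 * Shows that a simple array reduction is memory-bound: adding
 * more arithmetic per element barely changes the runtime. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024)  /* 64M doubles = 512 MB, far larger than any cache */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    double *a = malloc(N * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < N; i++)
        a[i] = 1.0;

    /* Pass 1: one addition per element loaded -- memory-bound. */
    double t0 = now(), sum1 = 0.0;
    for (size_t i = 0; i < N; i++)
        sum1 += a[i];
    double t1 = now();

    /* Pass 2: four floating-point operations per element loaded.
     * If compute were the limit, this would be ~4x slower. */
    double t2 = now(), sum2 = 0.0;
    for (size_t i = 0; i < N; i++)
        sum2 += a[i] * 1.000001 + a[i] * 0.999999;
    double t3 = now();

    printf("1 op/element:  %.3f s (sum=%g)\n", t1 - t0, sum1);
    printf("4 ops/element: %.3f s (sum=%g)\n", t3 - t2, sum2);
    printf("effective bandwidth, pass 1: %.1f GB/s\n",
           N * sizeof(double) / (t1 - t0) / 1e9);
    free(a);
    return 0;
}

Compiled with optimizations (for example, cc -O2 starve.c), the two passes typically differ far less than the fourfold gap in operation counts would suggest, and the reported bandwidth sits near the machine's DRAM limit rather than near its arithmetic peak.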
The Hierarchical Memory Model
Why, exactly, can't we improve memory latency and bandwidth to keep up with CPUs? The main reason is cost: it's prohibitively expensive to manufacture commodity SDRAM that can keep up with a modern processor. To make memory faster, we need motherboards with more wire layers, more complex ancillary logic, and (most importantly) the ability to run at higher frequencies. This additional complexity represents a much higher cost, which few are willing to pay. Moreover, raising the frequency implies pushing more voltage through the circuits. This causes the energy consumption to quickly skyrocket and more heat to be generated, which requires huge coolers in user machines. That's not practical.
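The physics behind that last point can be made concrete with the standard dynamic-power approximation for CMOS circuits (a textbook relation, not a formula from this article):

    P_dyn ≈ α · C · V² · f,

where α is the fraction of gates switching each cycle, C is the switched capacitance, V is the supply voltage, and f is the clock frequency. Because raising f generally requires raising V as well to keep transistors switching reliably, dissipated power climbs roughly with the cube of frequency, which is why "just clock the memory bus faster" quickly stops being an option.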
To cope with memory bus limitations, computer architects introduced a hierarchy of CPU memory caches [3]. Such caches are useful because they're closer to the processor (normally on the same die), which improves both latency and bandwidth. The faster they run, however, the smaller they must be, due mainly to energy dissipation problems. In response, the industry built several memory layers with different trade-offs: small but fast caches close to the core, backed by larger, slower levels farther away.
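The payoff of this hierarchy is equally easy to demonstrate. The sketch below (again my illustration, assuming the common 64-byte cache line) walks the same array with increasing strides. The work per access is constant, but once the stride exceeds one cache line, every access misses and must be served from farther down the hierarchy, so the cost per access jumps even though the instruction stream is unchanged.

/* stride.c -- illustrative sketch, not from the article.
 * Shows that access pattern, not instruction count, dominates
 * runtime once the working set exceeds the caches. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (32 * 1024 * 1024)  /* 32M ints = 128 MB, larger than L3 */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    int *a = calloc(N, sizeof *a);
    if (!a) return 1;

    /* Identical work per access; only the stride changes. */
    for (int stride = 1; stride <= 64; stride *= 2) {
        long sum = 0, touched = 0;
        double t0 = now();
        for (long i = 0; i < N; i += stride) {
            sum += a[i];   /* sum is printed so the loop isn't optimized away */
            touched++;
        }
        double dt = now() - t0;
        printf("stride %2d: %8.2f ns/access (sum=%ld)\n",
               stride, dt * 1e9 / touched, sum);
    }
    free(a);
    return 0;
}

On typical hardware the per-access time rises sharply around stride 16 (16 ints of 4 bytes each is exactly one 64-byte line, so every access fetches a fresh line): that jump is the cache hierarchy, not the arithmetic, showing through.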