Computer Architecture: A Quantitative Approach, 5th Edition: Solutions


Chapter 3 Solutions ■ 13

Case Study 1: Exploring the Impact of Microarchitectural Techniques

3.1 The baseline performance (in cycles, per loop iteration) of the code sequence in Figure 3.48, if no new instruction’s execution could be initiated until the previous instruction’s execution had completed, is 40. See Figure S.2. Each instruction requires one clock cycle of execution (a clock cycle in which that instruction, and only that instruction, is occupying the execution units; since every instruction must execute, the loop will take at least that many clock cycles). To that base number, we add the extra latency cycles. Don’t forget the branch shadow cycle.
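The baseline count can be checked by summing one execution cycle plus the extra latency cycles for each instruction. The sketch below is only an illustration: the latencies are copied from Figure S.2, while the list name `LOOP` and the function `baseline_cycles` are made up for this example.

```python
# (instruction, extra latency cycles), as listed in Figure S.2.
LOOP = [
    ("LD    F2,0(Rx)", 4),
    ("DIVD  F8,F2,F0", 12),
    ("MULTD F2,F6,F2", 5),
    ("LD    F4,0(Ry)", 4),
    ("ADDD  F4,F0,F4", 1),
    ("ADDD  F10,F8,F2", 1),
    ("ADDI  Rx,Rx,#8", 0),
    ("ADDI  Ry,Ry,#8", 0),
    ("SD    F4,0(Ry)", 1),
    ("SUB   R20,R4,Rx", 0),
    ("BNZ   R20,Loop", 1),  # the extra cycle here is the branch shadow
]


def baseline_cycles(loop):
    """Cycles per iteration when each instruction must fully complete
    before the next one may start: 1 execution cycle + extra latency."""
    return sum(1 + extra for _, extra in loop)


print(baseline_cycles(LOOP))  # 40
```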

3.2 How many cycles would the loop body in the code sequence in Figure 3.48 require if the pipeline detected true data dependencies and only stalled on those, rather than blindly stalling everything just because one functional unit is busy? The answer is 25, as shown in Figure S.3. Remember, the point of the extra latency cycles is to allow an instruction to complete whatever actions it needs, in order to produce its correct output. Until that output is ready, no dependent instructions can be executed. So the first LD must stall the next instruction for four clock cycles. The MULTD produces a result needed by the second ADDD, which therefore cannot start until the MULTD’s 5 extra latency cycles have elapsed, and so on.
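The count of 25 can be reproduced with a small simulation. The sketch below rests on assumptions that are not spelled out in the text: a single-issue, in-order pipeline; a result becomes usable 1 + latency cycles after its producer issues; only true (read-after-write) dependencies cause stalls; and the branch shadow adds one cycle after the BNZ. The latencies come from Figure S.2; the tuple encoding and the name `stalled_cycles` are illustrative.

```python
# (name, destination register or None, source registers, extra latency)
LOOP_DEPS = [
    ("LD",    "F2",  ["Rx"],        4),
    ("DIVD",  "F8",  ["F2", "F0"],  12),
    ("MULTD", "F2",  ["F6", "F2"],  5),
    ("LD",    "F4",  ["Ry"],        4),
    ("ADDD",  "F4",  ["F0", "F4"],  1),
    ("ADDD",  "F10", ["F8", "F2"],  1),
    ("ADDI",  "Rx",  ["Rx"],        0),
    ("ADDI",  "Ry",  ["Ry"],        0),
    ("SD",    None,  ["F4", "Ry"],  1),
    ("SUB",   "R20", ["R4", "Rx"],  0),
    ("BNZ",   None,  ["R20"],       1),  # extra cycle = branch shadow
]


def stalled_cycles(loop):
    """Cycles per iteration when an instruction stalls only until
    all of its source registers have been produced."""
    ready = {}   # register -> first cycle in which its value can be read
    issue = 0    # issue cycle of the previous instruction
    for _name, dest, srcs, extra in loop:
        # In-order issue: no earlier than the cycle after the previous
        # instruction, and no earlier than the readiness of every source.
        issue = max([issue + 1] + [ready.get(r, 1) for r in srcs])
        if dest is not None:
            ready[dest] = issue + 1 + extra
    last_extra = loop[-1][3]
    return issue + last_extra  # add the branch shadow after the BNZ


print(stalled_cycles(LOOP_DEPS))  # 25
```

Under these assumptions the DIVD issues in cycle 6, the first ADDD in cycle 13, the second ADDD in cycle 19, and the BNZ in cycle 24, so the iteration ends in cycle 25.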

Loop: LD     F2,0(Rx)      1 + 4
      DIVD   F8,F2,F0      1 + 12
      MULTD  F2,F6,F2      1 + 5
      LD     F4,0(Ry)      1 + 4
      ADDD   F4,F0,F4      1 + 1
      ADDD   F10,F8,F2     1 + 1
      ADDI   Rx,Rx,#8      1
      ADDI   Ry,Ry,#8      1
      SD     F4,0(Ry)      1 + 1
      SUB    R20,R4,Rx     1
      BNZ    R20,Loop      1 + 1
                           ____
      cycles per loop iter 40

Figure S.2 Baseline performance (in cycles, per loop iteration) of the code sequence in Figure 3.48.

14 ■ Solutions to Case Studies and Exercises

Loop: LD     F2,0(Rx)      1 + 4
      <stall>
      DIVD   F8,F2,F0      1 + 12
      MULTD  F2,F6,F2      1 + 5
      LD     F4,0(Ry)      1 + 4
      ADDD   F4,F0,F4      1 + 1
      ADDD   F10,F8,F2     1 + 1
      ADDI   Rx,Rx,#8      1
      ADDI   Ry,Ry,#8      1
      SD     F4,0(Ry)      1 + 1
      SUB    R20,R4,Rx     1
      BNZ    R20,Loop      1 + 1
                           ------
      cycles per loop iter 25

Figure S.3 Number of cycles required by the loop body in the code sequence in Figure 3.48.

3.3 Consider a multiple-issue design. Suppose you have two execution pipelines, each capable of beginning execution of one instruction per cycle, and enough fetch/decode bandwidth in the front end so that it will not stall your execution. Assume results can be immediately forwarded from one execution unit to another, or to itself. Further assume that the only reason an execution pipeline would stall is to observe a true data dependency. Now how many cycles does the loop require? The answer is 22, as shown in Figure S.4. The LD goes first, as before, and the DIVD must wait for it through 4 extra latency cycles. After the DIVD comes the MULTD, which can run in the second pipe along with the DIVD, since there’s no dependency between them. (Note that they both need the same input, F2, and they must both wait on F2’s readiness, but there is no constraint between them.) The LD following the MULTD depends on neither the DIVD nor the MULTD, so had this been a superscalar-order-3 machine, that LD could conceivably have been executed concurrently with the DIVD and the MULTD. Since this problem posited a two-execution-pipe machine, the LD executes in the cycle following the DIVD/MULTD. The loop overhead instructions at the loop’s bottom also exhibit some potential for concurrency because they do not depend on any long-latency instructions.

      Execution pipe 0        Execution pipe 1
Loop: LD    F2,0(Rx)      ;   <nop>
      DIVD  F8,F2,F0      ;   MULTD F2,F6,F2
      LD    F4,0(Ry)      ;   <nop>
      ADDD  F4,F0,F4      ;   <nop>
      ADDD  F10,F8,F2     ;   ADDI  Rx,Rx,#8
      ADDI  Ry,Ry,#8      ;   SD    F4,0(Ry)
      SUB   R20,R4,Rx     ;   BNZ   R20,Loop
      cycles per loop iter 22

Figure S.4 Number of cycles required per loop.

3.4 Possible answers:

1. If an interrupt occurs between N and N + 1, then N + 1 must not have been allowed to write its results to any permanent architectural state. Alternatively, it might be permissible to delay the interrupt until N + 1 completes.

2. If N and N + 1 happen to target the same register or architectural state (say, memory), then allowing N to overwrite what N + 1 wrote would be wrong.

3. N might be a long floating-point op that eventually traps. N + 1 cannot be allowed to change architectural state in case N is to be retried.

Long-latency ops are at highest risk of being passed by a subsequent op. The DIVD instruction will complete long after the LD F4,0(Ry), for example.

3.5 Figure S.5 demonstrates one possible way to reorder the instructions to improve the performance of the code in Figure 3.48. The number of cycles that this reordered code takes is 20.

3.6 a. Fraction of all cycles, counting both pipes, wasted in the reordered code shown in Figure S.5:

11 ops out of 2 × 20 opportunities.

1 – 11/40 = 1 – 0.275 = 0.725

b. The results of hand-unrolling two iterations of the loop are shown in Figure S.6.

c. Speedup = (exec time w/o enhancement) / (exec time with enhancement)

Speedup = 20 / (22/2)

Speedup = 1.82

      Execution pipe 0                  Execution pipe 1
Loop: LD    F2,0(Rx)                ;   LD    F4,0(Ry)
      DIVD  F8,F2,F0                ;   ADDD  F4,F0,F4
      MULTD F2,F6,F2                ;   <stall due to ADDD latency>
      <stall due to DIVD latency>   ;   SD    F4,0(Ry)
      <stall due to DIVD latency>   ;   <nop>
      <stall due to DIVD latency>   ;   <nop>
      <stall due to DIVD latency>   ;   ADDI  Rx,Rx,#8
      <stall due to DIVD latency>   ;   ADDI  Ry,Ry,#8
      <stall due to DIVD latency>   ;   SUB   R20,R4,Rx
      ADDD  F10,F8,F2               ;   BNZ   R20,Loop
      cycles per loop iter 20

      #ops: 11
      #nops: (20 × 2) – 11 = 29

Figure S.5 Number of cycles taken by reordered code.
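The arithmetic in 3.6(a) and 3.6(c) can be replayed in a few lines. This is only a sketch: the inputs (11 ops, 20 cycles, two pipes, and 22 cycles for two unrolled iterations) are taken from the solution text, and the helper names `wasted_fraction` and `speedup` are invented for the illustration.

```python
def wasted_fraction(ops, cycles, pipes=2):
    """Share of issue slots (cycles x pipes) that hold no useful op."""
    slots = cycles * pipes
    return 1 - ops / slots


def speedup(time_without, time_with):
    """Execution time without the enhancement over time with it."""
    return time_without / time_with


# 3.6(a): 11 useful ops in 2 x 20 issue slots of the reordered loop.
wasted = wasted_fraction(ops=11, cycles=20)

# 3.6(c): 20 cycles per iteration before unrolling versus
# 22 cycles for two iterations (11 per iteration) after unrolling.
unrolled = speedup(20, 22 / 2)

print(f"{wasted:.3f}")    # 0.725
print(f"{unrolled:.2f}")  # 1.82
```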

      Execution pipe 0                  Execution pipe 1
Loop: LD    F2,0(Rx)                ;   LD    F4,0(Ry)
      LD    F2,0(Rx)                ;   LD    F4,0(Ry)
      DIVD  F8,F2,F0                ;   ADDD  F4,F0,F4
      MULTD F2,F0,F2                ;   SD    F4,0(Ry)
      MULTD F2,F6,F2                ;   SD    F4,0(Ry)
      <stall due to DIVD latency>   ;   ADDI  Rx,Rx,#16
      <stall due to DIVD latency>   ;   ADDI  Ry,Ry,#16
      ADDD  F10,F8,F2               ;   SUB   R20,R4,Rx
      ADDD  F10,F8,F2               ;   BNZ   R20,Loop
      cycles per loop iter 22

Figure S.6 Hand-unrolling two iterations of the loop from code shown in Figure S.5.

3.7 Consider the code sequence in Figure 3.49. Every time you see a destination register in the code, substitute the next available T, beginning with T9. Then update all the src (source) registers accordingly, so that true data dependencies are maintained. Show the resulting code. (Hint: See Figure 3.50.)

Loop: LD    T9,0(Rx)
I0:   MULTD T10,F0,T2
I1:   DIVD  T11,T9,T10
I2:   LD    T12,0(Ry)
I3:   ADDD  T13,F0,T12
I4:   SUBD  T14,T11,T13
I5:   SD    T14,0(Ry)

Figure S.7 Register renaming.
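The renaming rule from 3.7 (give every destination the next free T register, starting at T9, and redirect later sources to the newest name) can be sketched as follows. The input sequence is a hypothetical four-instruction example, not the code of Figure 3.49, and the tuple encoding and the function `rename` are assumptions made for this illustration. Memory operands are reduced to their base register.

```python
def rename(instrs, first_t=9):
    """Rename each destination to the next free T register and rewrite
    later sources so that true data dependencies are preserved."""
    mapping = {}       # architectural register -> most recent T name
    next_t = first_t
    renamed = []
    for op, dest, srcs in instrs:
        # Sources are looked up first, so an instruction that reads and
        # writes the same register still reads the older value.
        new_srcs = [mapping.get(s, s) for s in srcs]
        new_dest = None
        if dest is not None:
            new_dest = f"T{next_t}"
            next_t += 1
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


# Hypothetical input: (opcode, destination or None, source registers).
example = [
    ("LD",    "F4", ["Rx"]),
    ("MULTD", "F2", ["F0", "F4"]),
    ("ADDD",  "F4", ["F2", "F4"]),
    ("SD",    None, ["F4", "Ry"]),
]

for op, dest, srcs in rename(example):
    print(op, dest, srcs)
```

Registers that are never written inside the sequence, such as F0, Rx, and Ry here, keep their original names, and the second write to F4 receives a fresh T register so that it no longer collides with the first.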
