Chapter 5 Solutions
5.1
5.1.1 2.
5.1.2 I, J, and B[I][0].
5.1.3 A[I][J].
5.1.4 I, J, and B[I][0].
5.1.5 A(J, I) and B[I][0].
5.1.6 32,004 with Matlab. 32,008 with C.
The code references 8 × 8000 = 64,000 integers from matrix A. At two integers
per 16-byte block, we need 32,000 blocks.
The code also references the first element in each of eight rows of matrix B.
Matlab stores matrix data in column-major order; therefore, all eight
integers are contiguous and fit in four blocks. C stores matrix data in row-
major order; therefore, the first element of each row is in a different block.
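The block counts above can be checked with a small sketch. It assumes 8-byte integers (two per 16-byte block, as in the solution) and models only the address arithmetic, not the exact loop from the exercise:

```python
# Count distinct 16-byte blocks touched by the accesses described above:
# all of an 8 x 8000 matrix A, plus the first element of eight rows of B.
ROWS, COLS, INT_BYTES, BLOCK_BYTES = 8, 8000, 8, 16

def blocks_touched(addresses):
    """Number of distinct cache blocks covering the given byte addresses."""
    return len({a // BLOCK_BYTES for a in addresses})

# Row-major (C): element (i, j) lives at byte (i*COLS + j) * INT_BYTES.
c_addrs = [(i * COLS + 0) * INT_BYTES for i in range(ROWS)]

# Column-major (Matlab): element (i, j) lives at byte (j*ROWS + i) * INT_BYTES.
matlab_addrs = [(0 * ROWS + i) * INT_BYTES for i in range(ROWS)]

a_blocks = (ROWS * COLS * INT_BYTES) // BLOCK_BYTES  # 32,000 blocks for A

print(a_blocks + blocks_touched(matlab_addrs))  # Matlab: 32,004
print(a_blocks + blocks_touched(c_addrs))       # C: 32,008
```

The column-major addresses are contiguous (0, 8, 16, …, 56) and fall into four blocks, while the row-major addresses are 64,000 bytes apart and each occupy a separate block.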
5.2
5.2.1
5.2.2
5.2.3
Cache 1 miss rate = 100%
Cache 1 total cycles = 12 × 25 + 12 × 2 = 324
Cache 2 miss rate = 10/12 = 83%
Cache 2 total cycles = 10 × 25 + 12 × 3 = 286
Cache 3 miss rate = 11/12 = 92%
Cache 3 total cycles = 11 × 25 + 12 × 5 = 335
Cache 2 provides the best performance.
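The cycle totals above follow one formula: misses × miss penalty + accesses × cycles per access. A quick sketch, using the 12 references and 25-cycle miss penalty from the solution:

```python
# Total cycles = misses * miss_penalty + accesses * cycles_per_access,
# with 12 memory references and a 25-cycle miss penalty.
ACCESSES, MISS_PENALTY = 12, 25

def total_cycles(misses, cycles_per_access):
    return misses * MISS_PENALTY + ACCESSES * cycles_per_access

print(total_cycles(12, 2))  # Cache 1: 324
print(total_cycles(10, 3))  # Cache 2: 286
print(total_cycles(11, 5))  # Cache 3: 335
```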
5.3
5.3.1 Total size is 364,544 bits = 45,568 bytes.
Each word is 8 bytes; each block contains two words; thus, each block contains
16 = 2^4 bytes.
The cache contains 32 KiB = 2^15 bytes of data. Thus, it has 2^15/2^4 = 2^11
lines of data.
Each 64-bit address is divided into: (1) a 3-bit word offset, (2) a 1-bit block
offset, (3) an 11-bit index (because there are 2^11 lines), and (4) a 49-bit tag
(64 − 3 − 1 − 11 = 49).
The cache is composed of: 2^15 × 8 bits of data + 2^11 × 49 bits of tag + 2^11 × 1
valid bits = 364,544 bits.
5.3.2 549,376 bits = 68,672 bytes. This is a 51% increase.
Each word is 8 bytes; each block contains 16 words; thus, each block contains
128 = 2^7 bytes.
The cache contains 64 KiB = 2^16 bytes of data. Thus, it has 2^16/2^7 = 2^9 lines
of data.
Each 64-bit address is divided into: (1) a 3-bit word offset, (2) a 4-bit block offset,
(3) a 9-bit index (because there are 2^9 lines), and (4) a 48-bit tag (64 − 3 − 4 − 9
= 48).
The cache is composed of: 2^16 × 8 bits of data + 2^9 × 48 bits of tag + 2^9 × 1 valid
bits = 549,376 bits.
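Both storage totals follow the same recipe: lines × (block data bits + tag bits + valid bit). A sketch that derives the field widths from the capacity and block size, as done in 5.3.1 and 5.3.2:

```python
# Total storage for a direct-mapped cache with 64-bit addresses:
# data bits + (tag + valid) bits per line.
ADDR_BITS = 64

def cache_bits(data_bytes, block_bytes):
    lines = data_bytes // block_bytes
    index_bits = lines.bit_length() - 1         # log2(lines)
    offset_bits = block_bytes.bit_length() - 1  # byte offset within a block
    tag_bits = ADDR_BITS - index_bits - offset_bits
    return data_bytes * 8 + lines * (tag_bits + 1)  # data + tag + valid

print(cache_bits(2**15, 16))   # 5.3.1: 364,544 bits
print(cache_bits(2**16, 128))  # 5.3.2: 549,376 bits
```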
5.3.3 The larger block size may require a greater hit time and a greater miss
penalty than the original cache. The smaller number of blocks may cause a higher
conflict miss rate than the original cache.
5.3.4 Associative caches are designed to reduce the rate of conflict misses. As such,
a sequence of read requests with the same 11-bit index field but a different tag field
will generate many misses. For the cache described in 5.3.1, the sequence 0, 32768, 0,
32768, 0, 32768, …, would miss on every access, while a two-way set-associative cache
with LRU replacement, even one with a significantly smaller overall capacity, would
hit on every access after the first two.
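This behavior can be demonstrated with a small simulation. The direct-mapped parameters (16-byte blocks, 2^11 lines) come from the 5.3.1 cache; the two-way cache is given just one set of two ways, i.e., a deliberately tiny capacity, to show that associativity alone fixes the pathology:

```python
# Replay the sequence 0, 32768, 0, 32768, ... against (a) the direct-mapped
# cache of 5.3.1 and (b) a tiny two-way LRU cache with only two blocks total.
BLOCK = 16

def direct_mapped_misses(addrs, lines=2**11):
    tags = {}  # index -> tag currently resident
    misses = 0
    for a in addrs:
        idx = (a // BLOCK) % lines
        tag = a // BLOCK // lines
        if tags.get(idx) != tag:
            misses += 1
            tags[idx] = tag
    return misses

def two_way_lru_misses(addrs, sets=1):
    ways = {s: [] for s in range(sets)}  # per set: tags, most recent last
    misses = 0
    for a in addrs:
        s = (a // BLOCK) % sets
        tag = a // BLOCK // sets
        if tag in ways[s]:
            ways[s].remove(tag)          # hit: refresh LRU position
        else:
            misses += 1
            if len(ways[s]) == 2:
                ways[s].pop(0)           # evict least recently used
        ways[s].append(tag)
    return misses

seq = [0, 32768] * 6
print(direct_mapped_misses(seq))  # 12: every access misses
print(two_way_lru_misses(seq))    # 2: only the first two accesses miss
```

Addresses 0 and 32768 map to the same direct-mapped line with different tags, so they keep evicting each other; two ways are enough to hold both.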
5.4 Yes, it is possible. To implement a direct-mapped cache, we need only a
function that will take an address as input and produce a 10-bit output. Although
it is possible to implement a cache in this manner, it is not clear that such an
implementation will be beneficial: (1) the cache would require a larger tag, and (2)
there would likely be more conflict misses.
5.5
5.5.1 Each cache block consists of four 8-byte words. The total offset is 5 bits.
Three of those 5 bits are the word offset (the offset into an 8-byte word). The
remaining two bits are the block offset. Two bits allow us to enumerate 2^2 = 4
words.
5.5.2 There are five index bits. This tells us there are 2^5 = 32 lines in the cache.
5.5.3 The ratio is 1.21. The cache stores a total of 32 lines × 4 words/block × 8 bytes/
word = 1024 bytes = 8192 bits.
In addition to the data, each line contains 54 tag bits and 1 valid bit. Thus,
the total bits required = 8192 + 54 × 32 + 1 × 32 = 9952 bits.
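The arithmetic above can be verified directly:

```python
# Bit total and overhead ratio for the 5.5 cache:
# 32 lines, 4-word blocks, 8-byte words, 54 tag bits + 1 valid bit per line.
LINES, WORDS_PER_BLOCK, WORD_BYTES = 32, 4, 8
TAG_BITS, VALID_BITS = 54, 1

data_bits = LINES * WORDS_PER_BLOCK * WORD_BYTES * 8
total_bits = data_bits + LINES * (TAG_BITS + VALID_BITS)

print(data_bits, total_bits, round(total_bits / data_bits, 2))  # 8192 9952 1.21
```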
5.5.4
5.5.5 4/12 = 33%.
5.5.6 <index, tag, data>
<0, 3, Mem[0xC00]-Mem[0xC1F]>
<4, 2, Mem[0x880]-Mem[0x89F]>
<5, 0, Mem[0x0A0]-Mem[0x0BF]>
<7, 0, Mem[0x0E0]-Mem[0x0FF]>
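The entries above can be reproduced by splitting each byte address into tag, index, and offset using the field widths from 5.5.1 and 5.5.2 (5 offset bits, 5 index bits):

```python
# Split an address into <tag, index, offset> for the 5.5 cache and print
# the resulting <index, tag, data-range> entries.
OFFSET_BITS, INDEX_BITS = 5, 5

def decompose(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

for addr in (0xC00, 0x880, 0x0A0, 0x0E0):
    tag, index, _ = decompose(addr)
    # Each block spans 32 bytes, so the range ends at addr + 31.
    print(f"<{index}, {tag}, Mem[0x{addr:03X}]-Mem[0x{addr + 31:03X}]>")
```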
5.6
5.6.1 The L1 cache has a low write miss penalty while the L2 cache has a high write
miss penalty. A write buffer between the L1 and L2 cache would hide the write miss
latency of the L2 cache. The L2 cache would benefit from write buffers when
replacing a dirty block, since the new block would be read in before the dirty block
is physically written to memory.
5.6.2 On an L1 write miss, the word is written directly to L2 without bringing its
block into the L1 cache. If this results in an L2 miss, its block must be brought into
the L2 cache, possibly replacing a dirty block, which must first be written to memory.
5.6.3 After an L1 write miss, the block will reside in L2 but not in L1. A subsequent
read miss on the same block will require that the block in L2 be written back to
memory, transferred to L1, and invalidated in L2.