Computer Science 246
Computer Architecture
Spring 2009
Harvard University
Instructor: Prof. David Brooks
dbrooks@eecs.harvard.edu
Memory Hierarchy and Caches (Part 2)
Caches
• Monday lecture
– Review of cache basics, direct-mapped, set-associative
caches
• Today
– More on cache performance, write strategies
Summary of Set Associativity
• Direct Mapped
– One place in cache, One Comparator, No Muxes
• Set Associative Caches
– Restricted set of places
– N-way set associativity
– Number of comparators = number of blocks per set
– N:1 mux
• Fully Associative
– Anywhere in cache
– Number of comparators = number of blocks in cache
– N:1 mux needed
More Detailed Questions
• Block placement policy?
– Where does a block go when it is fetched?
• Block identification policy?
– How do we find a block in the cache?
• Block replacement policy?
– When fetching a block into a full cache, how do we
decide what other block gets kicked out?
• Write strategy?
– Does any of this differ for reads vs. writes?
Block Placement + ID
• Placement
– Invariant: block always goes in exactly one set
– Fully-Associative: Cache is one set, block goes anywhere
– Direct-Mapped: Block goes in exactly one frame
– Set-Associative: Block goes in one of a few frames
• Identification
– Find Set
– Search ways in parallel (compare tags, check valid bits)
Block Replacement
• Cache miss requires a replacement
• No decision needed in direct mapped cache
• More than one place for memory blocks in set-associative caches
• Replacement Strategies
– Optimal
• Replace Block used furthest ahead in time (oracle)
– Least Recently Used (LRU)
• Optimized for temporal locality
– (Pseudo) Random
• Nearly as good as LRU, simpler
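To make the LRU policy concrete, here is a minimal sketch in C of one set of a 4-way cache: every access stamps its way with a counter, and the victim on a miss is the way with the oldest stamp. All names and sizes are illustrative, not from a real design.

#include <stdio.h>

#define WAYS 4

/* Hypothetical per-set LRU state: one timestamp per way. */
typedef struct {
    unsigned long tag[WAYS];
    unsigned long last_used[WAYS];  /* larger = used more recently */
    int valid[WAYS];
} set_t;

static unsigned long now = 0;       /* global access counter */

/* Returns the way that hit, or the victim way chosen by LRU. */
int access_set(set_t *s, unsigned long tag)
{
    now++;
    /* Hit: refresh this way's timestamp. */
    for (int w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            s->last_used[w] = now;
            return w;
        }
    }
    /* Miss: prefer an invalid way, else evict the least recently used. */
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!s->valid[w]) { victim = w; break; }
        if (s->last_used[w] < s->last_used[victim]) victim = w;
    }
    s->tag[victim] = tag;
    s->valid[victim] = 1;
    s->last_used[victim] = now;
    return victim;
}

int main(void)
{
    set_t s = {0};
    unsigned long refs[] = {1, 2, 3, 4, 1, 5};  /* 5 evicts tag 2, the LRU block */
    for (int i = 0; i < 6; i++)
        printf("tag %lu -> way %d\n", refs[i], access_set(&s, refs[i]));
    return 0;
}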
Write Policies
• Writes are only about 21% of data cache traffic
• Optimize cache for reads, do writes “on the side”
– Reads can do tag check/data read in parallel
– Writes must be sure we are updating the correct data
and the correct amount of data (1-8 byte writes)
– Serial process => slow
• What to do on a write hit?
• What to do on a write miss?
Write Hit Policies
• Q1: When to propagate new values to memory?
• Write back – Information is only written to the cache.
– Next lower level only updated when block is evicted (dirty bits
say when data has been modified)
– Can write at speed of cache
– Caches become temporarily inconsistent with lower levels of the
hierarchy.
– Uses less memory bandwidth/power (multiple consecutive
writes may require only 1 final write)
– Multiple writes within a block can be merged into one write
– Evictions are longer latency now (must write back)
Write Hit Policies
• Q1: When to propagate new values to memory?
• Write through – Information is written to cache
and to the lower-level memory
– Main memory is always “consistent/coherent”
– Easier to implement – no dirty bits
– Reads never result in writes to lower levels (cheaper)
– Higher bandwidth needed
– Write buffers used to avoid write stalls
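To make the contrast between the two hit policies concrete, here is a minimal sketch in C (hypothetical names, one cache line): write-back just sets a dirty bit and defers the memory update until eviction, so consecutive stores merge into one memory write; write-through propagates every store immediately and needs no dirty bit.

#include <stdio.h>

enum policy { WRITE_BACK, WRITE_THROUGH };

typedef struct {
    unsigned long tag;
    int valid, dirty;
} line_t;

/* Stand-in for a write to the next lower level of the hierarchy. */
static void write_to_memory(unsigned long tag)
{
    printf("  memory write for block tag %#lx\n", tag);
}

static void store_hit(line_t *l, enum policy p)
{
    /* ... update the block's data here ... */
    if (p == WRITE_BACK)
        l->dirty = 1;             /* defer: memory updated only on eviction */
    else
        write_to_memory(l->tag);  /* propagate immediately; no dirty bit */
}

static void evict(line_t *l, enum policy p)
{
    if (p == WRITE_BACK && l->dirty)
        write_to_memory(l->tag);  /* eviction is longer latency now */
    l->valid = l->dirty = 0;
}

int main(void)
{
    line_t l = { .tag = 0x40, .valid = 1 };
    printf("write-back: three stores, one memory write at eviction\n");
    store_hit(&l, WRITE_BACK); store_hit(&l, WRITE_BACK); store_hit(&l, WRITE_BACK);
    evict(&l, WRITE_BACK);

    l = (line_t){ .tag = 0x40, .valid = 1 };
    printf("write-through: every store reaches memory\n");
    store_hit(&l, WRITE_THROUGH); store_hit(&l, WRITE_THROUGH);
    evict(&l, WRITE_THROUGH);
    return 0;
}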
Write buffers
[Diagram: CPU → Cache → Write Buffer → Lower Levels of Memory]
• Small chunks of memory to buffer outgoing writes
• Processor can continue when data written to buffer
• Allows overlap of processor execution with memory update
• Write buffers are essential for write-through caches
Write buffers
• Writes can now be pipelined (rather than serial)
• Check tag + Write store data into Write Buffer
• Write data from Write buffer to L2 cache (tags ok)
• Loads must check write buffer for pending stores to same address
[Diagram: a store op places an address|data pair into a write buffer entry; loads check the write buffer, the data cache (tag|data), and then subsequent levels of memory]
Write buffer policies:
Performance/Complexity Tradeoffs
[Diagram: stores and loads flowing between the write buffer and the L2 cache]
• Allow merging of multiple stores? (“coalescing”)
• “Flush Policy” – How to do flushing of entries?
• “Load Servicing Policy” – What happens when a
load occurs to data currently in write buffer?
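A minimal sketch in C of these tradeoffs (hypothetical 4-entry buffer): stores coalesce with a pending store to the same address, and loads search the buffer for pending stores before going on to the L2 cache.

#include <stdio.h>

#define ENTRIES 4

typedef struct {
    int valid;
    unsigned long addr;
    unsigned long data;
} wb_entry_t;

static wb_entry_t wb[ENTRIES];

/* Insert a store; coalesce with a pending store to the same address. */
void wb_store(unsigned long addr, unsigned long data)
{
    for (int i = 0; i < ENTRIES; i++)
        if (wb[i].valid && wb[i].addr == addr) {
            wb[i].data = data;           /* coalescing: overwrite in place */
            return;
        }
    for (int i = 0; i < ENTRIES; i++)
        if (!wb[i].valid) {
            wb[i] = (wb_entry_t){1, addr, data};
            return;
        }
    /* Buffer full: a real design would stall or flush an entry to L2. */
    printf("write buffer full, stall\n");
}

/* Load servicing policy: forward from the buffer when the address matches. */
int wb_load(unsigned long addr, unsigned long *data)
{
    for (int i = 0; i < ENTRIES; i++)
        if (wb[i].valid && wb[i].addr == addr) {
            *data = wb[i].data;          /* hit on a pending store */
            return 1;
        }
    return 0;                            /* miss: go to cache / L2 */
}

int main(void)
{
    unsigned long v;
    wb_store(0x100, 7);
    wb_store(0x100, 9);                  /* coalesces with the first store */
    if (wb_load(0x100, &v))
        printf("load 0x100 serviced from write buffer: %lu\n", v);
    return 0;
}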
Write misses?
• Write Allocate
– Block is allocated on a write miss
– Standard write hit actions follow the block allocation
– Write misses = Read Misses
– Goes well with write-back
• No-write Allocate
– Write misses do not allocate a block
– Only update lower-level memory
– Blocks only allocate on Read misses!
– Goes well with write-through
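A minimal sketch in C of the write-miss decision (all helper names are hypothetical stand-ins for the real cache machinery):

#include <stdio.h>

enum miss_policy { WRITE_ALLOCATE, NO_WRITE_ALLOCATE };

static void fetch_block(unsigned long addr) { printf("  fetch block %#lx into cache\n", addr); }
static void write_lower(unsigned long addr) { printf("  write %#lx to lower-level memory\n", addr); }
static void write_cache(unsigned long addr) { printf("  write %#lx in cache (normal hit path)\n", addr); }

void store_miss(unsigned long addr, enum miss_policy p)
{
    if (p == WRITE_ALLOCATE) {
        fetch_block(addr);   /* write miss handled like a read miss */
        write_cache(addr);   /* standard write-hit actions follow */
    } else {
        write_lower(addr);   /* block is never brought into the cache */
    }
}

int main(void)
{
    printf("write-allocate (pairs well with write-back):\n");
    store_miss(0x200, WRITE_ALLOCATE);
    printf("no-write-allocate (pairs well with write-through):\n");
    store_miss(0x200, NO_WRITE_ALLOCATE);
    return 0;
}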
Summary of Write Policies
Write Policy Hit/Miss Writes to
WriteBack/Allocate Both L1 Cache
WriteBack/NoAllocate Hit L1 Cache
WriteBack/NoAllocate Miss L2 Cache
WriteThrough/Allocate Both Both
WriteThrough/NoAllocate Hit Both
WriteThrough/NoAllocate Miss L2 Cache
Cache Performance
CPU time = (CPU execution cycles + Memory Stall
Cycles)*Clock Cycle Time
AMAT = Hit Time + Miss Rate * Miss Penalty
• Reducing these three parameters can have a big
impact on performance
• Out-of-order processors can hide some of the miss
penalty
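To make the formula concrete, here is a tiny worked example in C; the hit time, miss rate, and miss penalty values below are made up for illustration.

#include <stdio.h>

int main(void)
{
    /* Illustrative parameters, not from the slides. */
    double hit_time  = 1.0;    /* cycles */
    double miss_rate = 0.05;   /* 5% of accesses miss */
    double penalty   = 20.0;   /* cycles to fetch from the next level */

    /* AMAT = Hit Time + Miss Rate * Miss Penalty */
    double amat = hit_time + miss_rate * penalty;
    printf("AMAT = %.1f + %.2f * %.0f = %.1f cycles\n",
           hit_time, miss_rate, penalty, amat);   /* 1 + 0.05*20 = 2.0 */
    return 0;
}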
Reducing Miss Penalty
• Have already seen two examples of techniques to
reduce miss penalty
– Write buffers give priority to read misses over writes
– Merging write buffers
• Multiword writes are faster than many single word writes
• Now we consider several more
– Victim Caches
– Critical Word First/Early Restart
– Multilevel caches
Reducing Miss Penalty:
Victim Caches
• Direct mapped caches => many conflict misses
• Solution 1: More associativity (expensive)
• Solution 2: Victim Cache
• Victim Cache
– Small (4 to 8-entry), fully-associative cache between L1
cache and refill path
– Holds blocks discarded from cache because of evictions
– Checked on a miss before going to L2 cache
– Hit in victim cache => swap victim block with cache block
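A minimal sketch in C of the victim-cache lookup path, assuming a hypothetical 4-entry fully-associative buffer and a single conflicting L1 frame:

#include <stdio.h>

#define VC_ENTRIES 4

typedef struct { int valid; unsigned long tag; } block_t;

static block_t l1_frame;           /* the one conflicting L1 frame */
static block_t vc[VC_ENTRIES];     /* small fully-associative victim cache */

/* Called on an L1 miss, before going to L2. Returns 1 on victim-cache hit. */
int victim_lookup(unsigned long tag)
{
    for (int i = 0; i < VC_ENTRIES; i++) {
        if (vc[i].valid && vc[i].tag == tag) {
            /* Swap: requested block moves to L1, evicted L1 block to VC. */
            block_t tmp = l1_frame;
            l1_frame = vc[i];
            vc[i] = tmp;
            return 1;
        }
    }
    return 0;   /* miss: fall through to L2; the L1 victim goes to the VC */
}

int main(void)
{
    l1_frame = (block_t){1, 0xA};
    vc[0] = (block_t){1, 0xB};     /* 0xB was evicted earlier */
    if (victim_lookup(0xB))
        printf("victim-cache hit: L1 now holds %#lx, VC holds %#lx\n",
               l1_frame.tag, vc[0].tag);
    return 0;
}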
Reducing Miss Penalty:
Victim Caches
• Even one entry helps
some benchmarks!
• Helps more for smaller
caches, larger block
sizes
Reducing Miss Penalty:
Critical Word First/Early Restart
• CPU normally just needs one word at a time
• Large cache blocks have long transfer times
• Don’t wait for the full block to be loaded before
sending requested data word to the CPU
• Critical Word First
– Request the missed word first from memory and send it
to the CPU and continue execution
• Early Restart
– Fetch in order, but as soon as the requested block
arrives send it to the CPU and continue execution
Review: Improving Cache
Performance
• How to improve cache performance?
– Reducing Cache Miss Penalty
– Reducing Miss Rate
– Reducing Miss Penalty/Rate via parallelism
– Reducing Hit Time
Non-blocking Caches to reduce stalls
on misses
• Non-blocking cache or lockup-free cache allows the data cache to
continue to supply cache hits during a miss
– requires out-of-order execution
– requires multi-bank memories
• “hit under miss” reduces the effective miss penalty by
working during miss vs. ignoring CPU requests
• “hit under multiple miss” or “miss under miss” may further
lower the effective miss penalty by overlapping multiple
misses
– Significantly increases the complexity of the cache controller as there
can be multiple outstanding memory accesses
– Requires multiple memory banks (otherwise cannot support multiple outstanding accesses)
– Pentium Pro allows 4 outstanding memory misses
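One common way to track the outstanding misses is a small table of miss status holding registers (MSHRs); the sketch below in C is a hypothetical 4-entry version that merges a second miss to an in-flight block instead of issuing another memory request.

#include <stdio.h>

#define MSHRS 4

typedef struct { int valid; unsigned long block_addr; } mshr_t;

static mshr_t mshr[MSHRS];

/* Register a miss. Returns 0 if merged with an outstanding miss,
 * 1 if a new memory request was issued, -1 if the cache must block. */
int handle_miss(unsigned long block_addr)
{
    for (int i = 0; i < MSHRS; i++)
        if (mshr[i].valid && mshr[i].block_addr == block_addr)
            return 0;                      /* miss under miss: merge */
    for (int i = 0; i < MSHRS; i++)
        if (!mshr[i].valid) {
            mshr[i] = (mshr_t){1, block_addr};
            return 1;                      /* new outstanding request */
        }
    return -1;                             /* all MSHRs busy: stall */
}

int main(void)
{
    printf("%d\n", handle_miss(0x1000));   /* 1: new request */
    printf("%d\n", handle_miss(0x1000));   /* 0: merged */
    printf("%d\n", handle_miss(0x2000));   /* 1: overlapped second miss */
    return 0;
}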
Value of Hit Under Miss for SPEC
[Figure: Percentage of Memory Stall Time of a Blocking Cache]
• FP programs on average: AMAT = 0.68 -> 0.52 -> 0.34 -> 0.26
• Int programs on average: AMAT = 0.24 -> 0.20 -> 0.19 -> 0.19
• 8 KB Data Cache, Direct Mapped, 32B block, 16-cycle miss penalty
Reducing Misses by Hardware
Prefetching of Instructions & Data
• Instruction Prefetching
– Alpha 21064 fetches 2 blocks on a miss
– Extra block placed in “stream buffer” not the cache
– On Access: check both cache and stream buffer
– On SB Hit: move line into cache
– On SB Miss: Clear and refill SB with successive lines
• Works with data blocks too:
– Jouppi [1990] 1 data stream buffer got 25% misses from 4KB cache; 4
streams got 43%
– Palacharla & Kessler [1994]: for scientific programs, 8 streams got
50% to 70% of misses from two 64KB, 4-way set-associative caches
• Prefetching relies on having extra memory bandwidth that can
be used without penalty
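A minimal sketch in C of the stream-buffer behavior described above (hypothetical 4-entry FIFO of block addresses): a stream-buffer hit hands the line to the cache, while a miss clears the buffer and refills it with the successive lines.

#include <stdio.h>

#define SB_DEPTH 4

static unsigned long sb[SB_DEPTH];   /* block addresses in the stream buffer */
static int sb_valid[SB_DEPTH];

static void sb_refill(unsigned long miss_block)
{
    /* Clear and refill with the lines after the missing block. */
    for (int i = 0; i < SB_DEPTH; i++) {
        sb[i] = miss_block + 1 + i;
        sb_valid[i] = 1;
    }
}

/* Called on a cache miss. Returns 1 on stream-buffer hit. */
int sb_lookup(unsigned long block)
{
    for (int i = 0; i < SB_DEPTH; i++)
        if (sb_valid[i] && sb[i] == block) {
            sb_valid[i] = 0;         /* line moves into the cache */
            return 1;
        }
    sb_refill(block);                /* SB miss: clear and refill */
    return 0;
}

int main(void)
{
    printf("%d\n", sb_lookup(100));  /* 0: miss, buffer now holds 101..104 */
    printf("%d\n", sb_lookup(101));  /* 1: sequential access hits the buffer */
    return 0;
}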
Hardware Prefetching
• What to prefetch?
– One block ahead (spatially)
• What will this work well for?
– Address prediction for non-sequential data
• Correlated predictors (store miss, next_miss pairs in table)
• Jump-pointers (augment data structures with prefetch pointers)
• When to prefetch?
– On every reference
– On a miss (basically doubles block size!)
– When resident data becomes “dead” -- how do we know?
• No one will use it anymore, so it can be kicked out
Reducing Misses by
Software Prefetching Data
• Data Prefetch
– Load data into register (HP PA-RISC loads)
– Cache Prefetch: load into cache (MIPS IV, PowerPC, SPARC v. 9)
– Special prefetching instructions cannot cause faults; a form of speculative
execution
• Prefetching comes in two flavors:
– Binding prefetch: Requests load directly into register.
• Must be correct address and register!
– Non-Binding prefetch: Load into cache.
• Can be incorrect. Faults?
• Issuing Prefetch Instructions takes time
– Is cost of prefetch issues < savings in reduced misses?
– Higher superscalar reduces difficulty of issue bandwidth
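As a concrete example of a non-binding cache prefetch, GCC and Clang provide the __builtin_prefetch intrinsic; the sketch below prefetches a few iterations ahead of a summation loop. The prefetch distance of 8 is an illustrative guess, not a tuned value.

#include <stdio.h>

#define N 1024
#define DIST 8                       /* prefetch distance: illustrative */

int main(void)
{
    static double a[N];
    for (int i = 0; i < N; i++) a[i] = i;

    double sum = 0.0;
    for (int i = 0; i < N; i++) {
        if (i + DIST < N)
            /* Non-binding prefetch: a hint only, cannot cause a fault. */
            __builtin_prefetch(&a[i + DIST], 0 /* read */, 3 /* high locality */);
        sum += a[i];
    }
    printf("sum = %f\n", sum);
    return 0;
}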
Reducing Hit Times
• Some common techniques/trends
– Small and simple caches
• Pentium III – 16KB L1
• Pentium 4 – 8KB L1
– Pipelined Caches (actually bandwidth increase)
• Pentium – 1 clock cycle I-Cache
• Pentium III – 2 clock cycle I-Cache
• Pentium 4 – 4 clock cycle I-Cache
– Trace Caches
• Beyond spatial locality
• Dynamic sequences of instructions (including taken branches)
Cache Bandwidth
• Superscalars need multiple memory accesses per cycle
• Parallel cache access: more difficult than parallel ALUs
– Caches have state so multiple accesses will affect each other
• “True Multiporting”
– Multiple decoders, read/write wordlines per SRAM cell
– Pipeline a single port by “double pumping” (Alpha 21264)
– Multiple cache copies, like a clustered register file (POWER4)
• Interleaved Multiporting
– Cache divides into banks – two accesses to same bank =>
conflict
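A minimal sketch in C of the interleaving idea (block size and bank count are illustrative): blocks map to banks by low-order block-address bits, and two same-cycle accesses conflict when they land in the same bank.

#include <stdio.h>

#define BLOCK_SIZE 64   /* bytes; illustrative */
#define NUM_BANKS  4    /* illustrative */

static int bank_of(unsigned long addr)
{
    return (addr / BLOCK_SIZE) % NUM_BANKS;   /* interleave by block address */
}

int main(void)
{
    unsigned long a = 0x1000, b = 0x1100;     /* 4 blocks apart: same bank */
    if (bank_of(a) == bank_of(b))
        printf("bank conflict: both map to bank %d\n", bank_of(a));
    else
        printf("no conflict: banks %d and %d\n", bank_of(a), bank_of(b));
    return 0;
}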