ASA Chapter4

The document discusses the growing performance gap between CPUs and memory (DRAM) over time. It describes how caches were introduced to bridge this gap by exploiting the principles of temporal and spatial locality in memory access patterns. Caches provide smaller, faster memory between the CPU and main memory in a memory hierarchy. The goal is to create the illusion of large, fast, and cheap memory for programs.


ADVANCED SYSTEM ARCHITECTURES
Memory Hierarchy Design

Faculty of Computer Science and Engineering (Khoa Khoa học và Kỹ thuật Máy tính)
Department of Computer Engineering (BM Kỹ thuật Máy tính)
BK TP.HCM

Trần Ngọc Thịnh
http://www.cse.hcmut.edu.vn/~tnthinh

©2019, dce

Since 1980, CPU has outpaced DRAM ...

[Figure: processor vs. DRAM performance (1/latency) over time. CPU performance grows ~60% per year (2X in 1.5 years), DRAM only ~9% per year (2X in 10 years); the gap grows ~50% per year.]

A four-issue 2 GHz superscalar accessing 100 ns DRAM could execute 800 instructions during the time for one memory access!

Processor-DRAM Performance Gap Impact

• To illustrate the performance impact, assume a single-issue pipelined CPU with CPI = 1 using non-ideal memory.
• Ignoring other factors, the minimum cost of a full memory access in terms of the number of wasted CPU cycles:

  Year   CPU speed (MHz)   CPU cycle (ns)   Memory access (ns)   Minimum CPU memory stall cycles or instructions wasted
  1986         8               125                190            190/125  - 1 = 0.5
  1989        33                30                165            165/30   - 1 = 4.5
  1992        60                16.6               120            120/16.6 - 1 = 6.2
  1996       200                 5                 110            110/5    - 1 = 21
  1998       300                 3.33              100            100/3.33 - 1 = 29
  2000      1000                 1                  90            90/1     - 1 = 89
  2002      2000                 0.5                80            80/0.5   - 1 = 159
  2004      3000                 0.333              60            60/0.333 - 1 = 179
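The stall-cycle column is simply the memory access time divided by the CPU cycle time, minus one. A minimal sketch of that arithmetic, using the values of the 2004 row from the table above:

    #include <stdio.h>

    /* Wasted CPU cycles per memory access = memory latency / CPU cycle time - 1 */
    int main(void) {
        double cycle_ns = 0.333;        /* 3000 MHz CPU, 2004 row of the table */
        double mem_access_ns = 60.0;
        double stall_cycles = mem_access_ns / cycle_ns - 1.0;
        printf("stall cycles per access: %.0f\n", stall_cycles);  /* ~179 */
        return 0;
    }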
Addressing the Processor-Memory Performance Gap

• Goal: the illusion of large, fast, cheap memory. Let programs address a memory space that scales to the disk size, at a speed that is usually as fast as register access.
• Solution: put smaller, faster "cache" memories between the CPU and DRAM. Create a "memory hierarchy".

Levels of the Memory Hierarchy

  Level (upper → lower)   Capacity        Access time             Cost                       Staging/transfer unit (managed by)
  CPU registers           100s of bytes   < 10s of ns             -                          instr. operands, 1-8 bytes (prog./compiler)
  Cache                   K bytes         10-100 ns               1-0.1 cents/bit            blocks, 8-128 bytes (cache controller)
  Main memory             M bytes         200-500 ns              0.0001-0.00001 cents/bit   pages, 512 bytes-4 KB (OS)
  Disk                    G bytes         10 ms (10,000,000 ns)   10^-5 - 10^-6 cents/bit    files, M bytes (user/operator)
  Tape                    infinite        sec-min                 10^-8 cents/bit            -

  Upper levels are smaller and faster; lower levels are larger and cheaper per bit.

Common Predictable Patterns

Two predictable properties of memory references:
• Temporal Locality: if a location is referenced, it is likely to be referenced again in the near future (e.g., loops, reuse).
• Spatial Locality: if a location is referenced, it is likely that locations near it will be referenced in the near future (e.g., straight-line code, array access).

Caches

Caches exploit both types of predictability:
– Exploit temporal locality by remembering the contents of recently accessed locations.
– Exploit spatial locality by fetching blocks of data around recently accessed locations.
Simple view of cache

  Processor <-- Address/Data --> CACHE <-- Address/Data --> Main Memory

• The processor accesses the cache first.
• Cache hit: just use the data.
• Cache miss: replace a block in the cache with a block from main memory, then use the data.
• Data is transferred between the cache and main memory in blocks, controlled by independent hardware.

• Hit rate: the fraction of accesses that hit in the cache.
• Miss rate: 1 – hit rate.
• Miss penalty: the time to replace a block + the time to deliver the data to the processor.
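These definitions combine into the standard average memory access time (AMAT) formula from the Hennessy & Patterson reference; it is not stated on the slide itself, so the sketch below is illustrative:

    /* AMAT = hit time + miss rate * miss penalty (all in the same unit).
     * Standard formula; not stated explicitly on this slide. */
    double amat(double hit_time, double miss_rate, double miss_penalty) {
        return hit_time + miss_rate * miss_penalty;
    }
    /* Example: 1-cycle hit, 5% miss rate, 100-cycle penalty -> 1 + 0.05 * 100 = 6 cycles. */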

Simple view of cache

• Example: for (i = 0; i < 10; i++) S = S + A[i];
• No cache: at least 12 accesses to main memory (10 for A[i], plus a read of S and a write of S).
• With a cache: if A[i] and S fit in a single block (e.g., 32 bytes), 1 access to load the block into the cache and 1 access to write the block back to main memory.
• Access to S: temporal locality.
• Access to A[i]: spatial locality.

Replacement

[Figure: main memory holds blocks 0-31, but the cache holds only 4 blocks; the CPU needs a block that is not currently in the cache.]

• The cache cannot hold all blocks.
• Replace a block with another block that is currently needed by the CPU.
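Written out as a complete fragment (the array contents are assumed), the slide's loop shows both kinds of locality:

    int A[10];            /* assume A is initialized elsewhere */
    int S = 0;
    for (int i = 0; i < 10; i++)
        S = S + A[i];     /* A[i]: spatial locality (consecutive addresses); S: temporal locality (reused every iteration) */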
Basic Cache Design & Operation Issues

• Q1: Where can a block be placed in the cache?
  (Block placement strategy & cache organization)
  – Fully Associative, Set Associative, Direct Mapped.
• Q2: How is a block found if it is in the cache?
  (Block identification)
  – Tag/Block.
• Q3: Which block should be replaced on a miss?
  (Block replacement)
  – Random, LRU, FIFO.
• Q4: What happens on a write?
  (Cache write policy)
  – Write through, write back.

Q1: Where can a block be placed?

[Figure: memory holds blocks 0-31; the cache holds 8 blocks (4 sets of 2 in the 2-way case). Block 12 can be placed:]
• Fully associative: anywhere in the cache.
• (2-way) set associative: anywhere in set 0 (12 mod 4).
• Direct mapped: only into block 4 (12 mod 8).

Direct-Mapped Cache

[Figure: the address is split into Tag (t bits), Index (k bits), and Block Offset (b bits). The index selects one of the 2^k lines (valid bit, tag, data block); the stored tag is compared with the address tag, and a match with V = 1 is a HIT that returns the selected data word or byte.]

• Address: N bits (2^N bytes of memory).
• The cache has 2^k lines (blocks).
• Each line holds 2^b bytes.
• Block M is mapped to line M % 2^k.
• Need t = N - k - b tag bits to identify the memory block.
• Advantage: simple.
• Disadvantage: high miss rate.
• What if the CPU accesses blocks N0 and N1, where N0 % 2^k = N1 % 2^k?
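A minimal sketch of the tag/index/offset split described above, assuming 32-bit addresses; the macro values (32-byte lines, 1024 lines) and names are illustrative, not from this slide:

    #include <stdint.h>

    #define OFFSET_BITS 5    /* b: 2^5 = 32-byte lines (assumed) */
    #define INDEX_BITS  10   /* k: 2^10 = 1024 lines (assumed)   */

    static inline uint32_t block_offset(uint32_t addr) {
        return addr & ((1u << OFFSET_BITS) - 1);
    }
    static inline uint32_t cache_index(uint32_t addr) {   /* line M % 2^k */
        return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    }
    static inline uint32_t cache_tag(uint32_t addr) {     /* t = N - k - b bits */
        return addr >> (OFFSET_BITS + INDEX_BITS);
    }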

Direct-mapped Cache

[Figure: memory holds blocks 0-31; the direct-mapped cache holds 8 blocks.]
• Access blocks N0 and N1 where N0 % 2^k = N1 % 2^k, i.e., they share the same index field (the low bits of the block address).
• A block gets replaced even while many other lines are still free!
• Hit or miss?

4KB Direct Mapped Cache Example

• 1K = 1024 blocks; each block = one word (4 bytes).
• Can cache up to 2^32 bytes = 4 GB of memory.
• Address (32 bits): Tag = 20 bits, Index = 10 bits, Byte offset = 2 bits; Block address = 30 bits.
  (With 4-byte blocks the byte offset is log2(4) = 2 bits, with 1024 lines the index is log2(1024) = 10 bits, and the tag is the remaining 32 - 10 - 2 = 20 bits.)
• Mapping function: cache block frame number = (Block address) MOD (1024), i.e., the index field, the 10 low bits of the block address.
[Figure: the 10-bit index selects one of 1024 entries (valid bit, 20-bit tag, 32-bit data word); a tag match with V = 1 signals a hit.]

64KB Direct Mapped Cache Example

• 4K = 4096 blocks; each block = four words = 16 bytes.
• Can cache up to 2^32 bytes = 4 GB of memory.
• Address (32 bits): Tag = 16 bits, Index = 12 bits, Block offset = 4 bits (word and byte select); Block address = 28 bits.
• Mapping function: cache block frame number = (Block address) MOD (4096), i.e., the index field, the 12 low bits of the block address.
• Larger cache blocks take better advantage of spatial locality and thus may result in a lower miss rate.
• Hit or miss?
[Figure: the 12-bit index selects one of 4K entries (valid bit, 16-bit tag, 128-bit data block); on a hit, a multiplexer uses the block offset to select the requested word.]

Fully Associative Cache

[Figure: every line (valid bit, tag, data block) is compared in parallel against the address tag; the matching line drives HIT, and a multiplexer uses the block offset to select the requested word or byte.]
• CAM: Content Addressable Memory.
• Each block can be mapped to any line in the cache.
• Tag bits: t = N - b, compared against the tags of all lines.
• Advantage: replacement occurs only when no free lines are available.
• Disadvantage: resource consumption and delay from comparing against many entries.

Set-Associative Cache

[Figure: the address is split into Tag (t bits), Index (k bits), and Block Offset (b bits); the index selects a set, the tag is compared in parallel against every way in that set, and the matching way supplies the requested word or byte (HIT).]

W-way Set-associative Cache

• A balance between a direct-mapped cache and a fully associative cache.
• The cache has 2^k sets.
• Each set has w lines.
• Block M is mapped to one of the w lines in set M % 2^k.
• Tag bits: t = N - k - b.
• A set-associative cache requires parallel tag matching and more complex hit logic, which may increase hit time.
• Currently widely used (Intel, AMD, ...).

4K Four-Way Set Associative Cache: MIPS Implementation Example

• 1024 block frames; each block = one word.
• 4-way set associative: 1024 / 4 = 256 sets.
• Can cache up to 2^32 bytes = 4 GB of memory.
• Address (32 bits): Tag = 22 bits, Index = 8 bits, Block offset = 2 bits; Block address = 30 bits.
• Mapping function: cache set number = index = (Block address) MOD (256).
[Figure: the 8-bit index selects a set; the 22-bit tag is compared against all four ways in parallel, and a 4-to-1 multiplexer selects the data from the hitting way.]
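A minimal lookup sketch for the set-associative organization above, using the 4-way, 256-set, one-word-block parameters of the MIPS example; the structure and function names are illustrative, not from the slides:

    #include <stdbool.h>
    #include <stdint.h>

    #define WAYS        4
    #define SETS        256   /* 2^k sets                  */
    #define OFFSET_BITS 2     /* b: 4-byte (1-word) blocks */
    #define INDEX_BITS  8     /* k */

    struct line { bool valid; uint32_t tag; uint32_t data; };
    static struct line cache[SETS][WAYS];

    /* Returns true on a hit and copies the word into *out. */
    bool cache_lookup(uint32_t addr, uint32_t *out) {
        uint32_t set = (addr >> OFFSET_BITS) & (SETS - 1);
        uint32_t tag = addr >> (OFFSET_BITS + INDEX_BITS);
        for (int w = 0; w < WAYS; w++) {      /* hardware compares all ways in parallel */
            if (cache[set][w].valid && cache[set][w].tag == tag) {
                *out = cache[set][w].data;
                return true;                  /* hit */
            }
        }
        return false;                         /* miss: block must be fetched from memory */
    }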

Q2: How is a block found?

• The index selects which set to look in.
• Compare the tag to find the block.
• Increasing associativity shrinks the index and expands the tag; fully associative caches have no index field.
• Direct-mapped: 1-way set associative?
• Fully associative: 1 set?

  Memory address:  | Tag | Index | Block Offset |
                   |<- Block Address ->|

What causes a MISS?

• Three major categories of cache misses:
  – Compulsory misses: the first access to a block.
  – Capacity misses: the cache cannot contain all the blocks needed to execute the program.
  – Conflict misses: a block is replaced by another block and then retrieved again later (affects set-associative or direct-mapped caches).
• Nightmare scenario: the ping-pong effect!

Block Size and Spatial Locality

• A block is the unit of transfer between the cache and memory.
  [Figure: a 4-word block (b = 2) stores Tag | Word0 | Word1 | Word2 | Word3.]
• Split the CPU address into a block address (32 - b bits) and a block offset (b bits); 2^b = block size, a.k.a. line size (in bytes).
• Larger block sizes have distinct hardware advantages:
  – less tag overhead
  – exploit fast burst transfers from DRAM
  – exploit fast burst transfers over wide buses
• What are the disadvantages of increasing block size?
  – Fewer blocks => more conflicts. Can waste bandwidth.

Q3: Which block should be replaced on a miss?

• Easy for direct-mapped caches.
• Set associative or fully associative:
  – Random
  – Least Recently Used (LRU) (a minimal sketch follows this list)
    • LRU cache state must be updated on every access.
    • A true implementation is only feasible for small sets (2-way, 4-way).
    • A pseudo-LRU binary tree is often used for 4-8 ways.
  – First In, First Out (FIFO), a.k.a. Round-Robin
    • Used in highly associative caches.
• The replacement policy has only a second-order effect, since replacement happens only on misses.
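The "true LRU for small sets" point above can be sketched directly; the per-set age counters and function names below are illustrative, not from the slides:

    #include <stdint.h>

    #define WAYS 4

    /* True-LRU bookkeeping for one set: age[w] = 0 means most recently used
     * (initialize age[w] = w so the ages start out distinct). */
    struct set_lru { uint8_t age[WAYS]; };

    /* Call on every access that hits (or fills) way 'used'. */
    void lru_touch(struct set_lru *s, int used) {
        for (int w = 0; w < WAYS; w++)
            if (s->age[w] < s->age[used])
                s->age[w]++;          /* ways younger than 'used' age by one */
        s->age[used] = 0;             /* 'used' becomes most recently used */
    }

    /* On a miss, the way with the largest age is the victim. */
    int lru_victim(const struct set_lru *s) {
        int victim = 0;
        for (int w = 1; w < WAYS; w++)
            if (s->age[w] > s->age[victim])
                victim = w;
        return victim;
    }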

Q4: What happens on a write?

• Cache hit:
  – Write through: write both the cache and memory.
    • Generally higher traffic, but simplifies cache coherence.
  – Write back: write the cache only (memory is written only when the entry is evicted); see the sketch after the reading list below.
    • A dirty bit per block can further reduce the traffic.
• Cache miss:
  – No write allocate: only write to main memory.
  – Write allocate (a.k.a. fetch on write): fetch the block into the cache, then write.
• Common combinations:
  – write through and no write allocate
  – write back with write allocate

Reading Assignment 1

• Cache performance
  – Replacement policy (algorithms)
  – Optimization (miss rate, penalty, ...)
• References
  – Hennessy & Patterson, Computer Architecture: A Quantitative Approach
  – www2.lns.mit.edu/~avinatan/research/cache.pdf
  – More on the internet
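The write-through vs. write-back distinction above, sketched for a single cache line; the structure fields and the mem_write interface are assumed for illustration:

    #include <stdbool.h>
    #include <stdint.h>

    struct line { bool valid, dirty; uint32_t tag, data; };

    void mem_write(uint32_t addr, uint32_t value);   /* assumed memory interface */

    /* Write hit, write-through: update both the cache line and memory. */
    void write_hit_through(struct line *l, uint32_t addr, uint32_t value) {
        l->data = value;
        mem_write(addr, value);
    }

    /* Write hit, write-back: update only the cache line and mark it dirty;
     * memory is updated later, when the line is evicted. */
    void write_hit_back(struct line *l, uint32_t value) {
        l->data  = value;
        l->dirty = true;
    }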

Reading Log 1

• List some replacement policy algorithms. Can they affect cache performance?
• What is cache performance?
• What criteria can affect cache performance?
• Explain an optimization technique for each criterion.
• Multi-level caches
