Comparch Fall2020 Lecture11a Memory Controllers
Cai+, “Flash Correct-and-Refresh: Retention-Aware Error Management for Increased Flash Memory Lifetime”, ICCD 2012.
Another View of the SSD Controller
Cai+, “Error Characterization, Mitigation, and Recovery in Flash Memory Based Solid State Drives,” Proc. IEEE 2017
https://arxiv.org/pdf/1711.11427.pdf
On Modern SSD Controllers (I)
Proceedings of the IEEE, Sept. 2017
https://arxiv.org/pdf/1706.08642
Many Errors and Their Mitigation [PIEEE’17]
“Error Characterization, Mitigation, and Recovery in Flash Memory Based Solid State Drives,” Proc. IEEE 2017
More Up-to-date Version
Yu Cai, Saugata Ghose, Erich F. Haratsch, Yixin Luo, and Onur Mutlu,
"Errors in Flash-Memory-Based Solid-State Drives: Analysis, Mitigation, and Recovery"
On Modern SSD Controllers (II)
Arash Tavakkol, Juan Gomez-Luna, Mohammad Sadrosadati,
Saugata Ghose, and Onur Mutlu,
"MQSim: A Framework for Enabling Realistic Studies of
Modern Multi-Queue SSD Devices"
Proceedings of the
16th USENIX Conference on File and Storage Technologies
(FAST), Oakland, CA, USA, February 2018.
[Slides (pptx) (pdf)]
[Source Code]
On Modern SSD Controllers (III)
Arash Tavakkol, Mohammad Sadrosadati, Saugata Ghose,
Jeremie Kim, Yixin Luo, Yaohua Wang, Nika Mansouri Ghiasi,
Lois Orosa, Juan G. Luna and Onur Mutlu,
"FLIN: Enabling Fairness and Enhancing Performance in
Modern NVMe Solid State Drives"
Proceedings of the
45th International Symposium on Computer Architecture
(ISCA), Los Angeles, CA, USA, June 2018.
[Slides (pptx) (pdf)] [Lightning Talk Slides (pptx) (pdf)]
[Lightning Talk Video]
DRAM Types
DRAM has different types with different interfaces
optimized for different purposes
Commodity: DDR, DDR2, DDR3, DDR4, …
Low power (for mobile): LPDDR1, …, LPDDR5, …
High bandwidth (for graphics): GDDR2, …, GDDR5, …
Low latency: eDRAM, RLDRAM, …
3D stacked: WIO, HBM, HMC, …
…
Underlying microarchitecture is fundamentally the same
A flexible memory controller can support various DRAM types
This complicates the memory controller
Difficult to support all types (and upgrades)
DRAM Types (circa 2015)
DRAM Types vs. Workloads
Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, and Onur Mutlu,
"Demystifying Workload–DRAM Interactions: An Experimental Study"
Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), Phoenix, AZ, USA, June 2019.
[Preliminary arXiv Version]
[Abstract]
[Slides (pptx) (pdf)]
[MemBen Benchmark Suite]
[Source Code for GPGPUSim-Ramulator]
DRAM Controller: Functions
Ensure correct operation of DRAM (refresh and timing)
A Modern DRAM Controller
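[Figure: DRAM bank with rows, a row decoder driven by the row address, a row buffer, and a column mux driven by the column address that outputs data. Example accesses such as (Row 1, Column 0) show an empty row buffer, a row hit, and a row conflict.]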
DRAM Scheduling Policies (II)
A scheduling policy is a request prioritization order
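To make "prioritization order" concrete, here is a minimal sketch of one well-known policy, FR-FCFS (row hits first, then oldest first), written as a sort key. The `Request` type, the `open_rows` mapping, and the function name are hypothetical names chosen for this illustration; a real controller also has to check timing constraints before issuing anything.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: int  # cycle at which the request entered the request buffer
    bank: int
    row: int


def fr_fcfs_order(requests, open_rows):
    """Return requests sorted by an FR-FCFS style priority.

    open_rows maps bank -> currently open row (banks with no open row are absent).
    """
    def priority(req):
        is_row_hit = open_rows.get(req.bank) == req.row
        # False sorts before True, so row hits come first; ties go to the oldest.
        return (not is_row_hit, req.arrival)

    return sorted(requests, key=priority)


pending = [Request(0, 0, 5), Request(1, 0, 3), Request(2, 0, 3)]
ordered = fr_fcfs_order(pending, open_rows={0: 3})
print([r.arrival for r in ordered])
```

Swapping in a different `priority` function changes the policy without touching the rest of the scheduler, which is the sense in which a policy is "just" an ordering.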
Row Buffer Management Policies
Open row
Keep the row open after an access
+ Next access might need the same row → row hit
-- Next access might need a different row → row conflict, wasted energy
Closed row
Close the row after an access (if no other requests already in the request buffer need the same row)
+ Next access might need a different row → avoid a row conflict
-- Next access might need the same row → extra activate latency
Adaptive policies
Predict whether or not the next access to the bank will be to the same row and act accordingly
Open vs. Closed Row Policies
Policy     | First access | Next access                                        | Commands needed for next access
-----------|--------------|----------------------------------------------------|-----------------------------------
Open row   | Row 0        | Row 0 (row hit)                                    | Read
Open row   | Row 0        | Row 1 (row conflict)                               | Precharge + Activate Row 1 + Read
Closed row | Row 0        | Row 0 – access in request buffer (row hit)         | Read
Closed row | Row 0        | Row 0 – access not in request buffer (row closed)  | Activate Row 0 + Read + Precharge
Closed row | Row 0        | Row 1 (row closed)                                 | Activate Row 1 + Read + Precharge
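The open-row versus closed-row comparison can be restated as a small function. This is only a sketch of the five cases listed for this slide; the function name, the policy strings, and the command strings are hypothetical choices for illustration.

```python
def commands_for_next_access(policy, first_row, next_row, next_in_request_buffer=False):
    """Return the DRAM commands needed for the next access to a bank.

    policy: "open" or "closed" (hypothetical labels for the two policies).
    next_in_request_buffer: whether the next access was already waiting in the
    request buffer when the first access completed (only matters for "closed").
    """
    if policy == "open":
        if next_row == first_row:
            return ["READ"]  # row hit
        # Row conflict: close the old row, open the new one, then read.
        return ["PRECHARGE", f"ACTIVATE row {next_row}", "READ"]
    if policy == "closed":
        if next_row == first_row and next_in_request_buffer:
            return ["READ"]  # row was kept open for the waiting request
        # Row was already closed: activate, read, then close it again.
        return [f"ACTIVATE row {next_row}", "READ", "PRECHARGE"]
    raise ValueError(f"unknown policy: {policy}")


print(commands_for_next_access("open", 0, 1))
print(commands_for_next_access("closed", 0, 0, next_in_request_buffer=False))
```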
DRAM Power Management
DRAM chips have power modes
Idea: When not accessing a chip, power it down
Power states
Active (highest power)
All banks idle
Power-down
Self-refresh (lowest power)
Difficulty of DRAM Control
Why Are DRAM Controllers Difficult to Design?
Need to obey DRAM timing constraints for correctness
There are many (50+) timing constraints in DRAM
tWTR: Minimum number of cycles to wait before issuing a read command after a write command is issued
tRC: Minimum number of cycles between the issuing of two consecutive activate commands to the same bank
…
Need to keep track of many resources to prevent conflicts
Channels, banks, ranks, data bus, address bus, row buffers
Need to handle DRAM refresh
Need to manage power consumption
Need to optimize performance & QoS (in the presence of constraints)
Reordering is not simple
Fairness and QoS needs complicate the scheduling problem
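As a rough illustration of what obeying timing constraints involves, the sketch below tracks only the two constraints named on this slide, following the slide's wording for each. The cycle counts `T_WTR` and `T_RC` and the `TimingChecker` class are hypothetical, chosen for illustration; real values come from the DRAM datasheet, and a real controller tracks many more constraints and resources.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

T_WTR = 6   # hypothetical: cycles from a write command to the next read command
T_RC = 40   # hypothetical: cycles between two activates to the same bank


@dataclass
class TimingChecker:
    last_write: Optional[int] = None                  # cycle of the most recent write
    last_activate: Dict[int, int] = field(default_factory=dict)  # bank -> cycle

    def can_issue(self, cmd, bank, cycle):
        """Check whether cmd may be issued to bank at the given cycle."""
        if cmd == "READ" and self.last_write is not None:
            if cycle - self.last_write < T_WTR:
                return False
        if cmd == "ACTIVATE" and bank in self.last_activate:
            if cycle - self.last_activate[bank] < T_RC:
                return False
        return True

    def issue(self, cmd, bank, cycle):
        """Record an issued command, rejecting it if it violates a constraint."""
        if not self.can_issue(cmd, bank, cycle):
            raise ValueError(f"{cmd} to bank {bank} at cycle {cycle} violates timing")
        if cmd == "WRITE":
            self.last_write = cycle
        elif cmd == "ACTIVATE":
            self.last_activate[bank] = cycle


checker = TimingChecker()
checker.issue("ACTIVATE", bank=0, cycle=0)
print(checker.can_issue("ACTIVATE", bank=0, cycle=39))  # too soon for the same bank
print(checker.can_issue("ACTIVATE", bank=1, cycle=1))   # a different bank is unaffected
```

Even with two constraints, the scheduler must consult per-bank and per-channel history before every command, which hints at why reordering requests is not simple.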
Many DRAM Timing Constraints
More on DRAM Operation
Kim et al., “A Case for Exploiting Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012.
Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013.
Why So Many Timing Constraints? (I)
DRAM Controller Design Is Becoming More Difficult
[Figure: system with four CPUs and a GPU]
Memory Controller: Performance Function
Resolves memory contention by scheduling requests
[Figure: four cores connected through the memory controller to memory]
Self-Optimizing DRAM Controllers
Problem: DRAM controllers are difficult to design
It is difficult for human designers to design a policy that can adapt itself very well to different workloads and different system conditions
Goal: Learn to choose actions to maximize $r_0 + \gamma r_1 + \gamma^2 r_2 + \dots$ ($0 \le \gamma < 1$)
Proceedings of the 35th International Symposium on Computer Architecture (ISCA), pages 39-50, Beijing, China, June 2008.
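The quantity being maximized is a discounted sum of rewards. A minimal sketch of how such a sum is evaluated for a finite reward sequence follows; the function name and the example values are illustrative only.

```python
def discounted_return(rewards, gamma):
    """Compute r0 + gamma*r1 + gamma^2*r2 + ... for a finite reward list."""
    if not 0 <= gamma < 1:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    total = 0.0
    # Work backwards so each step is total = r + gamma * (return of the rest).
    for r in reversed(rewards):
        total = r + gamma * total
    return total


print(discounted_return([1, 1, 1], gamma=0.5))  # 1 + 0.5 + 0.25
```

A smaller discount factor makes the controller care more about immediate rewards; a value closer to 1 weights long-term consequences more heavily.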
Self-Optimizing DRAM Controllers
Dynamically adapt the memory scheduling policy via interaction with the system at runtime
Associate system states and actions (commands) with long-term reward values: each action at a given state leads to a learned reward
Schedule command with highest estimated long-term reward value in each state
Continuously update reward values for <state, action> pairs based on feedback from system
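The loop described on this slide (keep a value per <state, action> pair, pick the highest-valued command, update from feedback) can be sketched with a generic tabular, SARSA-style update. This is a textbook reinforcement-learning sketch, not the hardware mechanism of the cited paper; the class name, the learning rate `alpha`, the exploration rate `epsilon`, and the state labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class RLScheduler:
    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.05, seed=0):
        self.q = defaultdict(float)  # (state, action) -> estimated long-term reward
        self.alpha = alpha           # learning rate (assumed value)
        self.gamma = gamma           # discount factor, 0 <= gamma < 1
        self.epsilon = epsilon       # probability of trying a random legal command
        self.rng = random.Random(seed)

    def choose(self, state, legal_actions):
        """Pick the legal command with the highest estimated long-term reward."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(legal_actions)  # occasional exploration
        return max(legal_actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_action):
        """Move Q(state, action) toward reward + gamma * Q(next_state, next_action)."""
        target = reward + self.gamma * self.q[(next_state, next_action)]
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


sched = RLScheduler(epsilon=0.0)
sched.update("s0", "READ", 1.0, "s1", "NOP")
print(sched.choose("s0", ["NOP", "READ"]))
```

A table indexed by exact state is only workable for tiny state spaces; any realistic design would need some form of function approximation, which is part of the hardware-complexity question.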
Self-Optimizing DRAM Controllers
Engin Ipek, Onur Mutlu, José F. Martínez, and Rich Caruana,
"Self Optimizing Memory Controllers: A Reinforcement Learning Approach"
Proceedings of the 35th International Symposium on Computer Architecture (ISCA), pages 39-50, Beijing, China, June 2008.
States, Actions, Rewards
❖ Reward function
• +1 for scheduling Read and Write commands
• 0 at all other times
Goal is to maximize long-term data bus utilization
❖ State attributes
• Number of reads, writes, and load misses in transaction queue
• Number of pending writes and ROB heads waiting for referenced row
• Request’s relative ROB order
❖ Actions
• Activate
• Write
• Read - load miss
• Read - store miss
• Precharge - pending
• Precharge - preemptive
• NOP
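The reward function on this slide is simple enough to write out directly. In the sketch below the action identifiers are hypothetical spellings of the seven listed actions; only the +1/0 rule itself comes from the slide.

```python
ACTIONS = (
    "ACTIVATE",
    "WRITE",
    "READ_LOAD_MISS",
    "READ_STORE_MISS",
    "PRECHARGE_PENDING",
    "PRECHARGE_PREEMPTIVE",
    "NOP",
)

# Commands that move data over the data bus.
DATA_COMMANDS = {"WRITE", "READ_LOAD_MISS", "READ_STORE_MISS"}


def reward(action):
    """+1 for scheduling a Read or Write command, 0 at all other times."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return 1 if action in DATA_COMMANDS else 0


trace = ["ACTIVATE", "READ_LOAD_MISS", "NOP", "WRITE", "PRECHARGE_PENDING"]
print(sum(reward(a) for a in trace))  # counts the cycles that used the data bus
```

Because only data-transferring commands earn reward, maximizing the long-term sum corresponds to maximizing data bus utilization.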
Performance Results
Self-Optimizing DRAM Controllers
+ Continuous learning in the presence of changing environment
-- Hardware complexity?
Challenge and Opportunity for Future
Self-Optimizing (Data-Driven) Computing Architectures
System Architecture Design Today
Human-driven
Humans design the policies (how to do things)
Data-driven
Data-aware
Source: http://spectrum.ieee.org/image/MjYzMzAyMg.jpeg
We Need to Think Across the Entire Stack
Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons
Inter-Thread/Application Interference
Problem: Threads share the memory system, but memory system does not distinguish between threads’ requests
Uncontrolled Interference: An Example
[Figure: multi-core chip with two cores running stream and random, each with its own L2 cache, connected through an interconnect to the DRAM memory controller and shared DRAM memory system; unfairness is marked in the figure.]
A Memory Performance Hog
STREAM
// initialize large arrays A, B
- Sequential memory access
- Very high row buffer locality (96% hit rate)
- Memory intensive
RANDOM
// initialize large arrays A, B
- Random memory access
- Very low row buffer locality (3% hit rate)
- Similarly memory intensive
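The difference in row buffer locality between a sequential and a random access pattern can be seen with a toy single-bank model. This sketch does not reproduce the measured 96% and 3% figures; the row size, line size, address span, and function names are all assumptions made for illustration.

```python
import random

LINE = 64     # assumed cache line size in bytes
ROW = 8192    # assumed DRAM row size in bytes


def row_hit_rate(addresses):
    """Fraction of accesses that find their row already open (single bank)."""
    open_row = None
    hits = 0
    for addr in addresses:
        row = addr // ROW
        if row == open_row:
            hits += 1
        open_row = row  # the accessed row is left open for the next access
    return hits / len(addresses)


def stream_addresses(n):
    """Sequential pattern: consecutive cache lines."""
    return [j * LINE for j in range(n)]


def random_addresses(n, span=1 << 30, seed=0):
    """Random pattern: uniformly chosen cache lines within span bytes."""
    rng = random.Random(seed)
    return [rng.randrange(span // LINE) * LINE for _ in range(n)]


print(f"stream: {row_hit_rate(stream_addresses(1280)):.3f}")
print(f"random: {row_hit_rate(random_addresses(1000)):.3f}")
```

In this model the sequential pattern misses only once per row, while the random pattern almost never finds its row open, which is the property a row-hit-first scheduler ends up rewarding.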
What Does the Memory Hog Do?
[Figure: DRAM bank with row decoder and row buffer holding Row 0. The memory request buffer contains several T0 requests to Row 0 and T1 requests to other rows (Row 5, Row 111, Row 16).]
Unfair Slowdowns due to Interference
[Figure: bar charts of slowdown for co-running applications on different cores; labels include STREAM, RANDOM, matlab, gcc, and Virtual PC; y-axis: Slowdown.]
Greater Problem with More Cores
More on Memory Performance Attacks
Thomas Moscibroda and Onur Mutlu,
"Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems"
How Do We Solve The Problem?
Inter-thread interference is uncontrolled in all memory resources
Memory controller
Interconnect
Caches
We need to control it
i.e., design an interference-aware (QoS-aware) memory system
QoS-Aware Memory Scheduling
Resolves memory contention by scheduling requests
[Figure: four cores connected through the memory controller to memory]
QoS-Aware Memory: Readings (I)
Onur Mutlu and Thomas Moscibroda,
Proceedings of the 40th International Symposium on Microarchitecture (MICRO), pages 146-158, Chicago, IL, December 2007. [Summary] [Slides (ppt)]
QoS-Aware Memory: Readings (II)
Onur Mutlu and Thomas Moscibroda,
"Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems"
Proceedings of the 35th International Symposium on Computer Architecture (ISCA), pages 63-74, Beijing, China, June 2008. [Summary] [Slides (ppt)]
QoS-Aware Memory: Readings (III)
Yoongu Kim, Dongsu Han, Onur Mutlu, and Mor Harchol-Balter,
"ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers"
Proceedings of the 16th International Symposium on High-Performance Computer Architecture (HPCA), Bangalore, India, January 2010. Slides (pptx)
QoS-Aware Memory: Readings (IV)
Yoongu Kim, Michael Papamichael, Onur Mutlu, and Mor Harchol-Balter,
"Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior"
Proceedings of the 43rd International Symposium on Microarchitecture (MICRO), pages 65-76, Atlanta, GA, December 2010. Slides (pptx) (pdf)
QoS-Aware Memory: Readings (V)
Lavanya Subramanian, Donghyuk Lee, Vivek Seshadri, Harsha Rastogi, and Onur Mutlu,
"The Blacklisting Memory Scheduler: Achieving High Performance and Fairness at Low Cost"
Proceedings of the 32nd IEEE International Conference on Computer Design (ICCD), Seoul, South Korea, October 2014. [Slides (pptx) (pdf)]
QoS-Aware Memory: Readings (VI)
Lavanya Subramanian, Donghyuk Lee, Vivek Seshadri, Harsha Rastogi, and Onur Mutlu,
"BLISS: Balancing Performance, Fairness and Complexity in Memory Access Scheduling"
QoS-Aware Memory: Readings (VII)
Rachata Ausavarungnirun, Kevin Chang, Lavanya Subramanian, Gabriel Loh, and Onur Mutlu,
"Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems"
Proceedings of the 39th International Symposium on Computer Architecture (ISCA), Portland, OR, June 2012. Slides (pptx)
QoS-Aware Memory: Readings (VIII)
Hiroyuki Usui, Lavanya Subramanian, Kevin Kai-Wei Chang, and Onur Mutlu,
"DASH: Deadline-Aware High-Performance Memory Scheduler for Heterogeneous Systems with Hardware Accelerators"
QoS-Aware Memory: Readings (IX)
Lavanya Subramanian, Vivek Seshadri, Yoongu Kim, Ben Jaiyen, and Onur Mutlu,
"MISE: Providing Performance Predictability and Improving Fairness in Shared Main Memory Systems"
Proceedings of the 19th International Symposium on High-Performance Computer Architecture (HPCA), Shenzhen, China, February 2013. Slides (pptx)
QoS-Aware Memory: Readings (X)
Lavanya Subramanian, Vivek Seshadri, Arnab Ghosh, Samira Khan, and Onur Mutlu,
"The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory"
Proceedings of the 48th International Symposium on Microarchitecture (MICRO), Waikiki, Hawaii, USA, December 2015.
[Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)] [Poster (pptx) (pdf)]
[Source Code]