
Computer Architecture
Lecture 11a: Memory Controllers

Prof. Onur Mutlu
ETH Zürich
Fall 2020
29 October 2020
Memory Controllers
DRAM versus Other Types of Memories
 Long-latency memories have similar characteristics that need to be controlled.
 The following discussion will use DRAM as an example, but many scheduling and control issues are similar in the design of controllers for other types of memories
  Flash memory
  Other emerging memory technologies
   Phase Change Memory
   Spin-Transfer Torque Magnetic Memory
 These other technologies can also place other demands on the controller
Flash Memory (SSD) Controllers
 Similar to DRAM memory controllers, except:
  They are flash memory specific
  They do much more: complex error correction, wear leveling, voltage optimization, garbage collection, page remapping, …

Cai+, “Flash Correct-and-Refresh: Retention-Aware Error Management for Increased Flash Memory Lifetime,” ICCD 2012.
Another View of the SSD Controller

Cai+, “Error Characterization, Mitigation, and Recovery in Flash Memory Based Solid State Drives,” Proc. IEEE 2017.
https://arxiv.org/pdf/1711.11427.pdf
On Modern SSD Controllers (I)
Proceedings of the IEEE, Sept. 2017
https://arxiv.org/pdf/1706.08642
Many Errors and Their Mitigation [PIEEE’17]

“Error Characterization, Mitigation, and Recovery in Flash Memory Based Solid State Drives,” Proc. IEEE 2017.
More Up-to-date Version
 Yu Cai, Saugata Ghose, Erich F. Haratsch, Yixin Luo, and Onur Mutlu,
"Errors in Flash-Memory-Based Solid-State Drives: Analysis, Mitigation, and Recovery"
Invited Book Chapter in Inside Solid State Drives, 2018.
[Preliminary arxiv.org version]
On Modern SSD Controllers (II)
 Arash Tavakkol, Juan Gomez-Luna, Mohammad Sadrosadati, Saugata Ghose, and Onur Mutlu,
"MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices"
Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST), Oakland, CA, USA, February 2018.
[Slides (pptx) (pdf)]
[Source Code]
On Modern SSD Controllers (III)
 Arash Tavakkol, Mohammad Sadrosadati, Saugata Ghose, Jeremie Kim, Yixin Luo, Yaohua Wang, Nika Mansouri Ghiasi, Lois Orosa, Juan G. Luna and Onur Mutlu,
"FLIN: Enabling Fairness and Enhancing Performance in Modern NVMe Solid State Drives"
Proceedings of the 45th International Symposium on Computer Architecture (ISCA), Los Angeles, CA, USA, June 2018.
[Slides (pptx) (pdf)] [Lightning Talk Slides (pptx) (pdf)]
[Lightning Talk Video]
DRAM Types
 DRAM has different types with different interfaces optimized for different purposes
  Commodity: DDR, DDR2, DDR3, DDR4, …
  Low power (for mobile): LPDDR1, …, LPDDR5, …
  High bandwidth (for graphics): GDDR2, …, GDDR5, …
  Low latency: eDRAM, RLDRAM, …
  3D stacked: WIO, HBM, HMC, …
  …
 Underlying microarchitecture is fundamentally the same
 A flexible memory controller can support various DRAM types
  This complicates the memory controller
  Difficult to support all types (and upgrades)
DRAM Types (circa 2015)

Kim+, “Ramulator: A Flexible and Extensible DRAM Simulator,” IEEE CAL 2015.
Modern DRAM Types: Comparison to DDR3

DRAM Type                         Banks per Rank   Low-Power   Bank Groups   3D-Stacked
DDR3                              8
DDR4                              16                           ✓
GDDR5                             16                           ✓
HBM (High-Bandwidth Memory)       16                           ✓             ✓
HMC (Hybrid Memory Cube)          256                                        ✓
Wide I/O                          4                ✓                         ✓
Wide I/O 2                        8                ✓                         ✓
LPDDR3                            8                ✓

 Bank groups → increased latency, increased area/power
 3D-stacked → high bandwidth with narrower rows, higher latency; DRAM layers stacked over a dedicated logic layer, connected by Through-Silicon Vias (TSVs) to the memory channel
Ramulator Paper and Source Code
 Yoongu Kim, Weikun Yang, and Onur Mutlu,
"Ramulator: A Fast and Extensible DRAM Simulator"
IEEE Computer Architecture Letters (CAL), March 2015.
[Source Code]

 Source code is released under the liberal MIT License
  https://github.com/CMU-SAFARI/ramulator
DRAM Types vs. Workloads
 Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, and Onur Mutlu,
"Demystifying Workload–DRAM Interactions: An Experimental Study"
Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), Phoenix, AZ, USA, June 2019.
[Preliminary arXiv Version]
[Abstract]
[Slides (pptx) (pdf)]
[MemBen Benchmark Suite]
[Source Code for GPGPUSim-Ramulator]
DRAM Controller: Functions
 Ensure correct operation of DRAM (refresh and timing)
 Service DRAM requests while obeying timing constraints of DRAM chips
  Constraints: resource conflicts (bank, bus, channel), minimum write-to-read delays
 Translate requests to DRAM command sequences
 Buffer and schedule requests for high performance + QoS
  Reordering, row-buffer, bank, rank, bus management
 Manage power consumption and thermals in DRAM

A Modern DRAM Controller (I)

A Modern DRAM Controller

Mutlu+, “Stall-Time Fair Memory Scheduling,” MICRO 2007.


DRAM Scheduling Policies (I)
 FCFS (first come first served)
  Oldest request first

 FR-FCFS (first ready, first come first served)
  1. Row-hit first
  2. Oldest first
  Goal: Maximize row buffer hit rate → maximize DRAM throughput

 Actually, scheduling is done at the command level
  Column commands (read/write) prioritized over row commands (activate/precharge)
  Within each group, older commands prioritized over younger ones
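The FR-FCFS ordering above can be captured in a few lines. The following is a minimal sketch at the request level (the `Request` fields and the `open_rows` map are illustrative simplifications, not the real controller's per-command state):

```python
from dataclasses import dataclass

@dataclass
class Request:
    arrival: int  # arrival order/time, used for oldest-first
    bank: int
    row: int

def fr_fcfs(queue, open_rows):
    """FR-FCFS: (1) row-hit requests first, (2) oldest first within each group.

    open_rows maps bank -> row currently held in that bank's row buffer.
    """
    if not queue:
        return None
    row_hits = [r for r in queue if open_rows.get(r.bank) == r.row]
    candidates = row_hits if row_hits else queue  # fall back to all requests
    return min(candidates, key=lambda r: r.arrival)  # oldest-first tie-break
```

With row 5 open in bank 0, a younger row-hit request is scheduled ahead of an older row-conflict request, which is exactly what maximizes row buffer hit rate.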
Review: DRAM Bank Operation
[Figure: a DRAM bank with a row decoder, rows of cells, a row buffer, and a column mux. Access sequence: (Row 0, Column 0) finds the row buffer empty → activate Row 0; (Row 0, Column 1) and (Row 0, Column 85) are row HITs; (Row 1, Column 0) is a row CONFLICT → precharge, then activate Row 1.]
DRAM Scheduling Policies (II)
 A scheduling policy is a request prioritization order

 Prioritization can be based on
  Request age
  Row buffer hit/miss status
  Request type (prefetch, read, write)
  Requestor type (load miss or store miss)
  Request criticality
   Oldest miss in the core?
   How many instructions in core are dependent on it?
   Will it stall the processor?
  Interference caused to other cores
  …
Row Buffer Management Policies
 Open row
  Keep the row open after an access
  + Next access might need the same row → row hit
  -- Next access might need a different row → row conflict, wasted energy

 Closed row
  Close the row after an access (if no other requests already in the request buffer need the same row)
  + Next access might need a different row → avoid a row conflict
  -- Next access might need the same row → extra activate latency

 Adaptive policies
  Predict whether or not the next access to the bank will be to the same row and act accordingly
Open vs. Closed Row Policies

Policy       First access   Next access                                         Commands needed for next access
Open row     Row 0          Row 0 (row hit)                                     Read
Open row     Row 0          Row 1 (row conflict)                                Precharge + Activate Row 1 + Read
Closed row   Row 0          Row 0 – access in request buffer (row hit)          Read
Closed row   Row 0          Row 0 – access not in request buffer (row closed)   Activate Row 0 + Read + Precharge
Closed row   Row 0          Row 1 (row closed)                                  Activate Row 1 + Read + Precharge
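The table can be expressed as a small function that emits the command sequence for the next access. This is a sketch with deliberately simplified state (a single bank, and a boolean standing in for "other row-hit requests already in the request buffer"):

```python
def commands_for_access(policy, buffer_row, req_row, more_hits_pending=False):
    """Return the DRAM command sequence needed to service an access to req_row.

    buffer_row: row currently in the row buffer (None if the bank is precharged).
    policy: "open" keeps the row open after the access; "closed" precharges
            eagerly unless more row-hit requests are already waiting.
    """
    cmds = []
    if buffer_row is not None and buffer_row == req_row:
        cmds.append("Read")                                        # row hit
    elif buffer_row is None:
        cmds += [f"Activate Row {req_row}", "Read"]                # row closed
    else:
        cmds += ["Precharge", f"Activate Row {req_row}", "Read"]   # row conflict
    if policy == "closed" and not more_hits_pending:
        cmds.append("Precharge")  # close eagerly, hoping the next access differs
    return cmds
```

Each row of the table above corresponds to one call: for example, the open-row policy turns a row conflict into Precharge + Activate + Read, while the closed-row policy pays an extra Activate even when the same row returns.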
DRAM Power Management
 DRAM chips have power modes
 Idea: When not accessing a chip, power it down

 Power states
  Active (highest power)
  All banks idle
  Power-down
  Self-refresh (lowest power)

 Tradeoff: State transitions incur latency during which the chip cannot be accessed
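The tradeoff can be made concrete with a break-even calculation: entering a low-power state pays off only if the chip stays idle long enough for the energy saved to outweigh the transition cost. A hedged sketch (all power and energy numbers below are illustrative placeholders, not datasheet values):

```python
def breakeven_idle_ns(active_mw=100.0, powerdown_mw=20.0, transition_energy_nj=50.0):
    """Minimum idle time (ns) for a power-down transition to pay off.

    Power in mW times time in ns gives pJ, so divide by 1000 to get nJ.
    """
    return transition_energy_nj * 1000.0 / (active_mw - powerdown_mw)

def should_power_down(predicted_idle_ns,
                      active_mw=100.0, powerdown_mw=20.0,
                      transition_energy_nj=50.0):
    """Power down iff the energy saved during the predicted idle period
    exceeds the energy cost of the power-down/power-up transitions."""
    saved_nj = (active_mw - powerdown_mw) * predicted_idle_ns / 1000.0
    return saved_nj > transition_energy_nj
```

With these placeholder numbers the break-even idle time is 625 ns; a controller would compare that against a prediction of the upcoming idle period, and the transition latency additionally delays any request that arrives early.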
Difficulty of DRAM Control

Why Are DRAM Controllers Difficult to Design?
 Need to obey DRAM timing constraints for correctness
  There are many (50+) timing constraints in DRAM
   tWTR: Minimum number of cycles to wait before issuing a read command after a write command is issued
   tRC: Minimum number of cycles between the issuing of two consecutive activate commands to the same bank
   …
 Need to keep track of many resources to prevent conflicts
  Channels, banks, ranks, data bus, address bus, row buffers
 Need to handle DRAM refresh
 Need to manage power consumption
 Need to optimize performance & QoS (in the presence of constraints)
  Reordering is not simple
  Fairness and QoS needs complicate the scheduling problem
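In practice the controller tracks, per resource, the earliest cycle at which each command type may legally issue. A minimal sketch covering only the two constraints named above, for a single bank (the cycle counts are illustrative defaults, and real parts have 50+ such constraints across banks, ranks, and buses):

```python
class TimingTracker:
    """Tracks earliest legal issue cycles for one bank (tWTR and tRC only)."""

    def __init__(self, tWTR=8, tRC=50):
        self.tWTR, self.tRC = tWTR, tRC
        self.earliest_read = 0       # pushed out by tWTR after each write
        self.earliest_activate = 0   # pushed out by tRC after each activate

    def can_issue(self, cmd, now):
        """True if issuing cmd at cycle `now` violates no tracked constraint."""
        if cmd == "read":
            return now >= self.earliest_read
        if cmd == "activate":
            return now >= self.earliest_activate
        return True

    def issued(self, cmd, now):
        """Record an issued command and update downstream constraints."""
        if cmd == "write":
            self.earliest_read = max(self.earliest_read, now + self.tWTR)
        elif cmd == "activate":
            self.earliest_activate = max(self.earliest_activate, now + self.tRC)
```

The scheduler consults `can_issue` before selecting a command, which is one reason reordering is not simple: the highest-priority request may not be issuable yet.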
Many DRAM Timing Constraints
 From Lee et al., “DRAM-Aware Last-Level Cache Writeback: Reducing Write-Caused Interference in Memory Systems,” HPS Technical Report, April 2010.
More on DRAM Operation
 Kim et al., “A Case for Exploiting Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012.
 Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013.
Why So Many Timing Constraints? (I)
Kim et al., “A Case for Exploiting Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012.

Why So Many Timing Constraints? (II)
Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013.
DRAM Controller Design Is Becoming More Difficult
[Figure: CPUs, a GPU, and hardware accelerators (HWAs) behind a shared cache, all served by DRAM and hybrid memory controllers in front of DRAM and hybrid memories.]
 Heterogeneous agents: CPUs, GPUs, and HWAs
 Main memory interference between CPUs, GPUs, HWAs
 Many timing constraints for various memory types
 Many goals at the same time: performance, fairness, QoS, energy efficiency, …
Reality and Dream
 Reality: It is difficult to design a policy that maximizes performance, QoS, energy-efficiency, …
  Too many things to think about
  Continuously changing workload and system behavior

 Dream: Wouldn’t it be nice if the DRAM controller automatically found a good scheduling policy on its own?
Memory Controller: Performance Function
[Figure: multiple cores sending memory requests to a shared memory controller, which resolves memory contention by scheduling requests.]
 How to schedule requests to maximize system performance?
Self-Optimizing DRAM Controllers
 Problem: DRAM controllers are difficult to design
  It is difficult for human designers to design a policy that can adapt itself very well to different workloads and different system conditions

 Idea: A memory controller that adapts its scheduling policy to workload behavior and system conditions using machine learning.

 Observation: Reinforcement learning maps nicely to memory control.

 Design: Memory controller is a reinforcement learning agent
  It dynamically and continuously learns and employs the best scheduling policy to maximize long-term performance.

Ipek+, “Self Optimizing Memory Controllers: A Reinforcement Learning Approach,” ISCA 2008.
Self-Optimizing DRAM Controllers
 Engin Ipek, Onur Mutlu, José F. Martínez, and Rich Caruana,
"Self Optimizing Memory Controllers: A Reinforcement Learning Approach"
Proceedings of the 35th International Symposium on Computer Architecture (ISCA), pages 39-50, Beijing, China, June 2008.

 Goal: Learn to choose actions to maximize r0 + γr1 + γ²r2 + … (0 ≤ γ < 1)
Self-Optimizing DRAM Controllers
 Dynamically adapt the memory scheduling policy via interaction with the system at runtime
  Associate system states and actions (commands) with long-term reward values: each action at a given state leads to a learned reward
  Schedule command with highest estimated long-term reward value in each state
  Continuously update reward values for <state, action> pairs based on feedback from system
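The loop just described is essentially tabular Q-learning: pick the highest-valued legal command, observe the reward, and update the <state, action> value from system feedback. A minimal sketch (the state/action encodings and the constants are illustrative; the actual controller uses CMAC-style function approximation over many state attributes rather than a plain table):

```python
import random
from collections import defaultdict

class RLScheduler:
    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.05):
        self.q = defaultdict(float)  # (state, action) -> learned long-term reward
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state, legal_actions):
        """Schedule the legal command with the highest estimated long-term
        reward, with a small exploration probability epsilon."""
        if random.random() < self.epsilon:
            return random.choice(legal_actions)
        return max(legal_actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_legal):
        """Q-learning update for the <state, action> pair from system feedback."""
        best_next = max((self.q[(next_state, a)] for a in next_legal), default=0.0)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

After a few feedback updates, commands that led to scheduled reads/writes (reward +1 in the paper's formulation) accumulate higher values and get chosen in preference to NOPs.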
Self-Optimizing DRAM Controllers
 Engin Ipek, Onur Mutlu, José F. Martínez, and Rich Caruana,
"Self Optimizing Memory Controllers: A Reinforcement Learning Approach"
Proceedings of the 35th International Symposium on Computer Architecture (ISCA), pages 39-50, Beijing, China, June 2008.
States, Actions, Rewards

❖ State attributes
 • Number of reads, writes, and load misses in transaction queue
 • Number of pending writes and ROB heads waiting for referenced row
 • Request’s relative ROB order

❖ Actions
 • Activate
 • Write
 • Read - load miss
 • Read - store miss
 • Precharge - pending
 • Precharge - preemptive
 • NOP

❖ Reward function
 • +1 for scheduling Read and Write commands
 • 0 at all other times
 • Goal is to maximize long-term data bus utilization
Performance Results
 Large, robust performance improvements over many human-designed policies
Self-Optimizing DRAM Controllers
+ Continuous learning in the presence of changing environment
+ Reduced designer burden in finding a good scheduling policy. Designer specifies:
  1) What system variables might be useful
  2) What target to optimize, but not how to optimize it

-- How to specify different objectives? (e.g., fairness, QoS, …)
-- Hardware complexity?
More on Self-Optimizing DRAM Controllers
 Engin Ipek, Onur Mutlu, José F. Martínez, and Rich Caruana,
"Self Optimizing Memory Controllers: A Reinforcement Learning Approach"
Proceedings of the 35th International Symposium on Computer Architecture (ISCA), pages 39-50, Beijing, China, June 2008.
Challenge and Opportunity for Future

Self-Optimizing
(Data-Driven)
Computing Architectures
System Architecture Design Today
 Human-driven
  Humans design the policies (how to do things)
 Many (too) simple, short-sighted policies all over the system
 No automatic data-driven policy learning
 (Almost) no learning: cannot take lessons from past actions

Can we design fundamentally intelligent architectures?
An Intelligent Architecture
 Data-driven
  Machine learns the “best” policies (how to do things)
 Sophisticated, workload-driven, changing, far-sighted policies
 Automatic data-driven policy learning
 All controllers are intelligent data-driven agents

We need to rethink design (of all controllers)
Architectures for Intelligent Machines
Data-centric
Data-driven
Data-aware

Source: http://spectrum.ieee.org/image/MjYzMzAyMg.jpeg
We Need to Think Across the Entire Stack

Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons

 We can get there step by step
Computer Architecture
Lecture 11a: Memory Controllers

Prof. Onur Mutlu
ETH Zürich
Fall 2020
29 October 2020
Memory Interference

Inter-Thread/Application Interference
 Problem: Threads share the memory system, but the memory system does not distinguish between threads’ requests

 Existing memory systems
  Free-for-all, shared based on demand
  Control algorithms thread-unaware and thread-unfair
  Aggressive threads can deny service to others
  Do not try to reduce or control inter-thread interference
Uncontrolled Interference: An Example
[Figure: a multi-core chip with two cores running stream and random, each with a private L2 cache, connected over a shared interconnect to a shared DRAM memory controller in front of DRAM Banks 0-3; unfairness arises in the shared DRAM memory system.]
A Memory Performance Hog

STREAM (streaming):
// initialize large arrays A, B
for (j=0; j<N; j++) {
  index = j*linesize;
  A[index] = B[index];
  …
}
- Sequential memory access
- Very high row buffer locality (96% hit rate)
- Memory intensive

RANDOM (random):
// initialize large arrays A, B
for (j=0; j<N; j++) {
  index = rand();
  A[index] = B[index];
  …
}
- Random memory access
- Very low row buffer locality (3% hit rate)
- Similarly memory intensive

Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007.
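The interaction between FR-FCFS and these two access patterns can be seen in a tiny simulation. This is a sketch with a single bank and a handful of requests (the model and request counts are illustrative; the thread with row-buffer locality monopolizes service regardless of arrival order):

```python
def fr_fcfs_order(requests, open_row=None):
    """Drain a single-bank queue under FR-FCFS; return the service order
    as a list of thread ids. requests: list of (arrival, thread, row)."""
    queue = list(requests)
    order = []
    while queue:
        hits = [r for r in queue if r[2] == open_row]
        pick = min(hits or queue)   # row-hit first; oldest (lowest arrival) within group
        queue.remove(pick)
        order.append(pick[1])
        open_row = pick[2]          # the accessed row is now in the row buffer
    return order

# STREAM (thread 0) always touches row 0; RANDOM (thread 1) touches distinct rows.
# Arrivals are perfectly interleaved, yet FR-FCFS serves all of STREAM first:
reqs = [(t, 0, 0) for t in range(0, 8, 2)] + [(t, 1, 100 + t) for t in range(1, 8, 2)]
```

Once row 0 is open, every STREAM request is a row hit and jumps ahead of RANDOM's older row-conflict requests, which is the denial-of-service mechanism the attack paper exploits.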
What Does the Memory Hog Do?
[Figure: a memory request buffer feeding a bank through the row decoder. T0 (STREAM) fills the buffer with requests to Row 0, which stays in the row buffer; T1’s (RANDOM) requests to Rows 5, 111, and 16 are row conflicts.]
 Row size: 8KB, cache block size: 64B
 128 (8KB/64B) requests of T0 serviced before T1

Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007.
Unfair Slowdowns due to Interference
[Figure: matlab and gcc running on different cores of a multi-core system, suffering unfair slowdowns from shared-memory interference.]

Moscibroda and Mutlu, “Memory performance attacks: Denial of memory service in multi-core systems,” USENIX Security 2007.
DRAM Controllers
 A row-conflict memory access takes significantly longer than a row-hit access

 Current controllers take advantage of the row buffer

 Commonly used scheduling policy (FR-FCFS) [Rixner 2000]*
  (1) Row-hit first: Service row-hit memory accesses first
  (2) Oldest-first: Then service older accesses first

 This scheduling policy aims to maximize DRAM throughput
  But, it is unfair when multiple threads share the DRAM system

*Rixner et al., “Memory Access Scheduling,” ISCA 2000.
*Zuravleff and Robinson, “Controller for a synchronous DRAM …,” US Patent 5,630,096, May 1997.
Effect of the Memory Performance Hog
[Bar chart: slowdown of the co-running application when paired with STREAM vs. RANDOM — 2.82X slowdown with STREAM, 1.18X slowdown with RANDOM.]
 Results on Intel Pentium D running Windows XP (similar results for Intel Core Duo and AMD Turion, and on Fedora Linux)

Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007.
Greater Problem with More Cores
 Vulnerable to denial of service (DoS)
 Unable to enforce priorities or SLAs
 Low system performance
 Uncontrollable, unpredictable system
More on Memory Performance Attacks
 Thomas Moscibroda and Onur Mutlu,
"Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems"
Proceedings of the 16th USENIX Security Symposium (USENIX SECURITY), pages 257-274, Boston, MA, August 2007. Slides (ppt)
How Do We Solve The Problem?
 Inter-thread interference is uncontrolled in all memory resources
  Memory controller
  Interconnect
  Caches

 We need to control it
  i.e., design an interference-aware (QoS-aware) memory system
QoS-Aware Memory Scheduling
[Figure: multiple cores sending memory requests to a shared memory controller, which resolves memory contention by scheduling requests.]
 How to schedule requests to provide
  High system performance
  High fairness to applications
  Configurability to system software

 Memory controller needs to be aware of threads
QoS-Aware Memory: Readings (I)
 Onur Mutlu and Thomas Moscibroda,
"Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors"
Proceedings of the 40th International Symposium on Microarchitecture (MICRO), pages 146-158, Chicago, IL, December 2007. [Summary] [Slides (ppt)]
QoS-Aware Memory: Readings (II)
 Onur Mutlu and Thomas Moscibroda,
"Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems"
Proceedings of the 35th International Symposium on Computer Architecture (ISCA), pages 63-74, Beijing, China, June 2008. [Summary] [Slides (ppt)]
QoS-Aware Memory: Readings (III)
 Yoongu Kim, Dongsu Han, Onur Mutlu, and Mor Harchol-Balter,
"ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers"
Proceedings of the 16th International Symposium on High-Performance Computer Architecture (HPCA), Bangalore, India, January 2010. Slides (pptx)
QoS-Aware Memory: Readings (IV)
 Yoongu Kim, Michael Papamichael, Onur Mutlu, and Mor Harchol-Balter,
"Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior"
Proceedings of the 43rd International Symposium on Microarchitecture (MICRO), pages 65-76, Atlanta, GA, December 2010. Slides (pptx) (pdf)
QoS-Aware Memory: Readings (V)
 Lavanya Subramanian, Donghyuk Lee, Vivek Seshadri, Harsha Rastogi, and Onur Mutlu,
"The Blacklisting Memory Scheduler: Achieving High Performance and Fairness at Low Cost"
Proceedings of the 32nd IEEE International Conference on Computer Design (ICCD), Seoul, South Korea, October 2014. [Slides (pptx) (pdf)]
QoS-Aware Memory: Readings (VI)
 Lavanya Subramanian, Donghyuk Lee, Vivek Seshadri, Harsha Rastogi, and Onur Mutlu,
"BLISS: Balancing Performance, Fairness and Complexity in Memory Access Scheduling"
IEEE Transactions on Parallel and Distributed Systems (TPDS), to appear in 2016. arXiv.org version, April 2015.
An earlier version as SAFARI Technical Report, TR-SAFARI-2015-004, Carnegie Mellon University, March 2015.
[Source Code]
QoS-Aware Memory: Readings (VII)
 Rachata Ausavarungnirun, Kevin Chang, Lavanya Subramanian, Gabriel Loh, and Onur Mutlu,
"Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems"
Proceedings of the 39th International Symposium on Computer Architecture (ISCA), Portland, OR, June 2012. Slides (pptx)
QoS-Aware Memory: Readings (VIII)
 Hiroyuki Usui, Lavanya Subramanian, Kevin Kai-Wei Chang, and Onur Mutlu,
"DASH: Deadline-Aware High-Performance Memory Scheduler for Heterogeneous Systems with Hardware Accelerators"
ACM Transactions on Architecture and Code Optimization (TACO), Vol. 12, January 2016.
Presented at the 11th HiPEAC Conference, Prague, Czech Republic, January 2016.
[Slides (pptx) (pdf)]
[Source Code]
QoS-Aware Memory: Readings (IX)
 Lavanya Subramanian, Vivek Seshadri, Yoongu Kim, Ben Jaiyen, and Onur Mutlu,
"MISE: Providing Performance Predictability and Improving Fairness in Shared Main Memory Systems"
Proceedings of the 19th International Symposium on High-Performance Computer Architecture (HPCA), Shenzhen, China, February 2013. Slides (pptx)
QoS-Aware Memory: Readings (X)
 Lavanya Subramanian, Vivek Seshadri, Arnab Ghosh, Samira Khan, and Onur Mutlu,
"The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory"
Proceedings of the 48th International Symposium on Microarchitecture (MICRO), Waikiki, Hawaii, USA, December 2015.
[Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)] [Poster (pptx) (pdf)]
[Source Code]
