
CS683 Project Presentation (Helloki)

The document describes Orinoco, an out-of-order processor that uses ordered issue and unordered commit with non-collapsible queues. It uses matrix structures such as age matrices, commit dependency matrices and wakeup matrices to track dependencies between instructions and enable efficient scheduling. These matrices are implemented with processing-in-memory techniques, which reduce complexity and ease timing constraints compared with traditional implementations.

Uploaded by Avaneesh S.Subramanyam


CS-683 Project Checkpoint 1 Presentation

Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues
Team Helloki

Atharva Kulkarni (210070047@iitb.ac.in)
Avaneesh Sai (210070015@iitb.ac.in)
Puja Kudupudi (210070046@iitb.ac.in)
Challenges for Modern O3 Processors
● Moore's law has slowed down but still holds
● Microarchitectures are becoming wider and deeper
● Their full potential is not harnessed because resources are used inefficiently
● Survey of Google and Facebook services:
○ Only 20-40% of instructions retire without stalling
○ Achieved bandwidth is only 1/3 of the theoretical bandwidth

Solutions?
● Need aggressive instruction scheduling
● This is where out-of-order (O3) commit comes in
● Scheduling is an NP-hard problem
● Because scheduling lies on the critical path of the pipeline, it must be done with minimal latency and without complex hardware
Qs in O3
● Issue Q (IQ)
● Re-Order Buffer (ROB)
● Load/Store Q (LSQ)
● Together, they determine the temporal order of instructions
Existing O3 Commit Techniques
● Existing methods fail to use the capacity of scheduling structures efficiently while preserving ideal instruction ordering

Existing O3 Commit Techniques
● Prioritize certain instructions at commit
● But they struggle to reclaim the gaps left in the queues
● Main challenge: instructions are physically ordered within the queues

Orinoco
● Ordered issue and unordered commit
● Non-collapsible IQ, ROB and LSQ
● Decouples the temporal order of instructions from their positions in the queues

[Figure: ordered issue, unordered commit]
Age Matrices
● Track relative age in the ROB and IQ
● # rows = # cols = # entries in the respective queue
● After an instruction is decoded, renamed and dispatched to the queue, the valid (VLD) vector tracks its entry
● Operation: when an instruction is dispatched, set its row vector (marking all currently valid entries as older) and clear its column vector
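A minimal Python sketch of the dispatch operation, assuming each row is stored as an integer bitmask in which bit j of row i means "entry j is older than entry i". The class and method names (`AgeMatrix`, `dispatch`, `release`) are hypothetical. The example also shows why the column must be cleared: in a non-collapsible queue a freed slot is reused, and surviving entries may still record its previous occupant as older.

```python
class AgeMatrix:
    """Hypothetical age matrix for a non-collapsible queue of n entries.

    rows[i] is a bitmask: bit j set means entry j is older than entry i.
    vld is the VLD bitmask of occupied entries.
    """

    def __init__(self, n):
        self.n = n
        self.rows = [0] * n
        self.vld = 0

    def dispatch(self, i):
        """Insert a new (youngest) instruction into free slot i."""
        assert not (self.vld >> i) & 1, "slot must be free"
        # Clear column i: a previous occupant of slot i may still be
        # recorded as older than some surviving entry.
        for k in range(self.n):
            self.rows[k] &= ~(1 << i)
        # Set row i: every currently valid entry is older than the newcomer.
        self.rows[i] = self.vld
        self.vld |= 1 << i

    def release(self, i):
        """Free slot i after its instruction commits (in any order)."""
        self.vld &= ~(1 << i)


# Program order uses slots 2, 0, 1; then slot 2 commits and is reused.
m = AgeMatrix(4)
for slot in (2, 0, 1):
    m.dispatch(slot)
m.release(2)
m.dispatch(2)
print(m.rows)  # [0, 1, 3, 0]
```

After reuse, slot 2 (now the youngest) sees slots 0 and 1 as older, and neither of them still treats slot 2 as older.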
Age Matrix for IQ
● After an instruction in the IQ wakes up (details in later sections) and is ready to be scheduled, it sets its bit in the BID vector

Age Matrix for IQ
count = zeros(#entries)
for each instruction in ready_instructions:
    temp = bitwise_AND(row_vec[instruction], BID)
    count[instruction] = #ones in temp
● The nth oldest ready instruction has exactly (n-1) older ready instructions, i.e. count = n-1
● Thus, to select the IW (issue width) oldest ready instructions, grant the ones with count <= IW - 1
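A minimal Python sketch of the selection step, assuming bitmask rows as in the age matrix (bit j of row i set when entry j is older) and a BID bitmask of ready entries. The function name `select_oldest` is hypothetical. Granting every bidder whose count is at most IW - 1 picks exactly the IW oldest ready instructions, or all of them if fewer than IW are ready.

```python
def popcount(x):
    """Number of set bits (works on Python versions without int.bit_count)."""
    return bin(x).count("1")


def select_oldest(rows, bid, iw):
    """Grant issue to at most iw of the bidding entries, oldest first.

    rows[i]: bitmask of entries older than entry i (age-matrix row).
    bid:     BID bitmask of ready (woken-up) entries.
    """
    grants = []
    for i, row in enumerate(rows):
        if (bid >> i) & 1:
            count = popcount(row & bid)  # older entries that also bid
            if count <= iw - 1:
                grants.append(i)
    return grants


# Program order (oldest first) occupies slots 3, 0, 2, 1.
rows = [0b1000, 0b1101, 0b1001, 0b0000]
print(select_oldest(rows, bid=0b1111, iw=2))  # [0, 3]
print(select_oldest(rows, bid=0b1110, iw=2))  # [2, 3]
```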

Age Matrix for ROB
● When speculation resolves, clear the corresponding bit in the SPEC vector
for every completed instruction:
    if bitwise_AND(row_vec, SPEC) == all_zeros:
        commit
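A minimal Python sketch of this commit check, assuming integer bitmasks for the age-matrix rows, the SPEC vector and a hypothetical `completed` vector of finished entries. The function name `commit_grants` is illustrative only. A completed instruction commits as soon as none of its older instructions is still speculative, regardless of its position in the ROB.

```python
def commit_grants(rows, spec, completed):
    """Entries that may commit this cycle (unordered commit).

    rows[i]:   age-matrix row, bit j set if entry j is older than entry i.
    spec:      SPEC bitmask of entries whose speculation is unresolved.
    completed: bitmask of entries that have finished executing.
    """
    return [i for i, row in enumerate(rows)
            if (completed >> i) & 1 and (row & spec) == 0]


# Program order: slot 1 (unresolved branch), slot 3, slot 0; slot 2 is empty.
rows = [0b1010, 0b0000, 0b0000, 0b0010]
spec = 0b0010        # the branch in slot 1 is still speculative
completed = 0b1001   # slots 0 and 3 have finished
print(commit_grants(rows, spec, completed))  # []

spec &= ~(1 << 1)    # the branch resolves: clear its SPEC bit
completed |= 1 << 1  # and it has completed
print(commit_grants(rows, spec, completed))  # [0, 1, 3]
```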

Age Matrix for ROB
● Exceptions and misspeculations must be handled
● More importantly, the oldest exception/misspeculation must be identified

Age Matrix for ROB
result = zeros(#entries, #entries)
for every in-flight instruction:
    result[instr] = bitwise_AND(row_vec[instr], VLD)
one_hot = bitwise_AND(reduction_NOR_along_rows(result), VLD)
● Only the oldest in-flight instruction has no older valid entries, so only its result row is all zeros
● After the reduction NOR (masked with VLD so that empty entries are ignored), we have a one-hot vector pointing to the instruction which raised the exception/misspeculation
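A minimal Python sketch of the oldest-instruction search, assuming the same bitmask rows and VLD vector as above. The function name `oldest_in_flight` is hypothetical. The final mask with VLD matters because empty slots have all-zero rows and would otherwise also reduce to 1.

```python
def oldest_in_flight(rows, vld):
    """One-hot bitmask of the oldest in-flight entry.

    result row i = rows[i] AND VLD; its reduction NOR is 1 only when no
    valid entry is older than i. Masking with VLD drops empty slots.
    """
    one_hot = 0
    for i, row in enumerate(rows):
        if (row & vld) == 0:  # reduction NOR of the result row
            one_hot |= 1 << i
    return one_hot & vld


# Program order: slots 1, 3, 0 (slot 2 is empty).
rows = [0b1010, 0b0000, 0b0000, 0b0010]
print(bin(oldest_in_flight(rows, vld=0b1011)))  # 0b10 -> slot 1
print(bin(oldest_in_flight(rows, vld=0b1001)))  # 0b1000 -> slot 3 once slot 1 leaves
```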

Commit Dependency Matrix
● Local state of an instruction:
○ Completed
○ Raised exception
○ Flushed by some other event
● Global state of an instruction:
○ Completed
○ Misspeculation
○ Exception
● Commit depends on the local state of the instruction and the global state of older instructions
Commit Dependency Matrix
● Structure that identifies the global state earlier in the pipeline
● # rows = # cols = # in-flight instructions
● A set bit at the intersection of row i and column j indicates that instruction i has a commit dependency on instruction j

Commit Dependency Matrix
● On every new dispatch to the ROB (i.e. a new row entry), set the bits in the columns corresponding to older instructions that may raise exceptions
● E.g.: memory operations, branches and synchronization barriers
● When an older instruction completes, clear its column vector

Commit Dependency Matrix
● Before commit, every instruction waits for its commit dependencies to resolve, i.e. for the older ones to become non-speculative

// check global state of completed instruction
if reduction_NOR(row_vec) == 1:
    commit if resources available
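A minimal Python sketch of the commit dependency matrix lifecycle (dispatch, completion, commit check), assuming bitmask rows. The class name, its methods and the internal `risky` vector of in-flight instructions that may raise exceptions are hypothetical. Note that `may_commit` only checks the global state; the instruction itself must still have completed.

```python
class CommitDependencyMatrix:
    """Hypothetical commit dependency matrix over n ROB entries.

    rows[i] bit j set: entry i must wait for older entry j, which may
    still raise an exception or misspeculate.
    """

    def __init__(self, n):
        self.n = n
        self.rows = [0] * n
        self.risky = 0  # in-flight entries that may raise exceptions

    def dispatch(self, i, may_raise):
        self.rows[i] = self.risky  # depend on all older risky entries
        if may_raise:              # e.g. memory ops, branches, barriers
            self.risky |= 1 << i

    def complete(self, j):
        """Entry j completed without an exception: clear its column."""
        for k in range(self.n):
            self.rows[k] &= ~(1 << j)
        self.risky &= ~(1 << j)

    def may_commit(self, i):
        return self.rows[i] == 0   # reduction NOR of row i


cdm = CommitDependencyMatrix(4)
for slot, risky in [(0, True), (1, False), (2, True), (3, False)]:
    cdm.dispatch(slot, risky)      # load, add, branch, add
cdm.complete(0)                    # the load completes
print([cdm.may_commit(i) for i in range(4)])  # [True, True, True, False]
```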

Precise Exception Handling
● Delinquent instruction = instruction causing the exception
● The delinquent and all subsequent instructions (w.r.t. program order) are squashed
● The subsequent ones are located using the column vector of the delinquent instruction
● When no instruction is committing, the delinquent (unresolved or exception-causing) instruction is determined using the age matrix
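A minimal Python sketch of building the squash set, assuming the column vector is read from the ROB age matrix (bitmask rows, bit j of row i set when j is older than i). The function name `squash_mask` is hypothetical. Column d is set exactly in the rows of instructions younger than d, so OR-ing in d itself yields everything to squash, while older instructions are left untouched.

```python
def squash_mask(rows, vld, delinquent):
    """Bitmask of entries to squash: the delinquent entry plus every
    younger in-flight entry.

    Column `delinquent` of the age matrix is set in exactly the rows of
    entries that are younger than it.
    """
    younger = 0
    for k, row in enumerate(rows):
        if (row >> delinquent) & 1:
            younger |= 1 << k
    return (younger | (1 << delinquent)) & vld


# Program order: slots 1, 3, 0, 2; the instruction in slot 3 faults.
rows = [0b1010, 0b0000, 0b1011, 0b0010]
print(bin(squash_mask(rows, vld=0b1111, delinquent=3)))  # 0b1101
```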

Matrix Merging
● The global state is the same for all instructions in the ROB
● Thus, merge the commit dependency matrix and the age matrix for the ROB (saves O(n²) space)
● A SPEC vector now tracks speculative instructions, so ANDing an age-matrix row with SPEC yields that instruction's commit dependencies
// at commit time
for a completed instruction:
    result = bitwise_AND(row_vec, SPEC)
    if reduction_NOR(result) == 1:
        grant commit
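A minimal Python sketch checking that the merge loses no information, assuming a freshly filled queue (no slot reuse) and bitmask rows. The helper names `dispatch_all` and `merged_matches` are hypothetical. It builds a separate commit dependency matrix only for comparison and confirms that "age row AND SPEC" reproduces every row, before and after a speculative instruction resolves.

```python
def dispatch_all(order, may_raise, n):
    """Dispatch entries in program order and build both structures.

    Returns (age_rows, spec, cdm_rows): the ROB age matrix, the SPEC vector
    and, for comparison only, a separate commit dependency matrix.
    """
    age, cdm = [0] * n, [0] * n
    vld = spec = 0
    for i in order:
        age[i] = vld       # all valid entries are older
        cdm[i] = spec      # older entries that may raise exceptions
        vld |= 1 << i
        if may_raise[i]:
            spec |= 1 << i
    return age, spec, cdm


def merged_matches(age, spec, cdm):
    """True if age row AND SPEC reproduces every commit dependency row."""
    return all((a & spec) == c for a, c in zip(age, cdm))


age, spec, cdm = dispatch_all(order=[2, 0, 3, 1],
                              may_raise=[False, True, True, False], n=4)
print(merged_matches(age, spec, cdm))  # True

# Resolving entry 2 clears one SPEC bit instead of a whole CDM column.
spec &= ~(1 << 2)
cdm = [row & ~(1 << 2) for row in cdm]
print(merged_matches(age, spec, cdm))  # True
```

The merged form stores n SPEC bits instead of n x n commit dependency bits.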
Memory Disambiguation matrix
● The order and dependencies between loads and stores need to be tracked explicitly
● This allows the absence of conflicts to be determined as early as possible

Memory Disambiguation Matrix
// when a load issues to the LQ
row_vec[load] = bits set for older unresolved stores in the SQ
// when a store's address resolves
for each load with a bit set in the store's column:
    if store_addr != load_addr:
        clear that bit
if reduction_NOR(row_vec[load]) == 1:
    load entry becomes non-speculative
● # rows = # loads in LQ, # columns = # stores in SQ
● A load in the LQ sets its row according to older unresolved stores in the SQ at issue, and a store clears the bits in its column for loads whose addresses do not conflict
● A load becomes non-speculative when all the bits in its row vector are zero
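A minimal Python sketch of the memory disambiguation matrix, assuming bitmask rows (one per LQ entry, one bit per SQ entry) and that the caller supplies the mask of older unresolved stores. The class and method names are hypothetical, and store-to-load forwarding or replay after a real conflict is deliberately not modelled.

```python
class DisambiguationMatrix:
    """Hypothetical memory disambiguation matrix.

    rows[l] bit s set: load l (LQ entry) still depends on older store s
    (SQ entry) whose address is unresolved or conflicts with the load.
    """

    def __init__(self, n_loads):
        self.rows = [0] * n_loads
        self.load_addr = [None] * n_loads

    def load_issue(self, l, addr, older_unresolved_stores):
        self.rows[l] = older_unresolved_stores
        self.load_addr[l] = addr

    def store_resolve(self, s, addr):
        """Store s computed its address: clear its column for non-conflicting loads."""
        for l in range(len(self.rows)):
            if (self.rows[l] >> s) & 1 and self.load_addr[l] != addr:
                self.rows[l] &= ~(1 << s)
        # Conflicting loads keep the bit (forwarding/replay not modelled).

    def non_speculative(self, l):
        return self.rows[l] == 0  # reduction NOR of row l


mdm = DisambiguationMatrix(n_loads=2)
mdm.load_issue(0, addr=0x100, older_unresolved_stores=0b11)
mdm.load_issue(1, addr=0x300, older_unresolved_stores=0b10)
mdm.store_resolve(0, addr=0x200)
mdm.store_resolve(1, addr=0x100)  # conflicts with load 0 only
print(mdm.non_speculative(0), mdm.non_speculative(1))  # False True
```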
Lockdown Matrix
● We use a lockdown matrix to track, for each speculative load, the older loads that have not yet performed
Lockdown Matrix
// at commit of a load
row_vec[load] = bits set for all older unperformed loads in the LQ
// when a load performs
clear its column
if reduction_NOR(row_vec[load]) == 1:
    load is ordered
● # rows = # committed loads, # columns = # loads in LQ
● A committed load sets its row according to older unperformed loads in the LQ, and a performed load clears its column
● A lockdown on the address of a committed load is lifted when all the bits in its row vector are zero
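A minimal Python sketch of the lockdown matrix, assuming bitmask rows (one per committed load, one bit per LQ entry) and that the caller supplies the mask of older unperformed loads. The class and method names are hypothetical; how the locked-down address is actually enforced is outside the sketch.

```python
class LockdownMatrix:
    """Hypothetical lockdown matrix.

    rows[c] bit l set: committed load c is still waiting for older LQ
    entry l, which has not performed yet.
    """

    def __init__(self, n_committed):
        self.rows = [0] * n_committed

    def commit_load(self, c, older_unperformed):
        self.rows[c] = older_unperformed  # address of load c is locked down

    def perform(self, l):
        for c in range(len(self.rows)):
            self.rows[c] &= ~(1 << l)     # clear column l

    def lockdown_lifted(self, c):
        return self.rows[c] == 0          # reduction NOR of row c


ldm = LockdownMatrix(n_committed=2)
ldm.commit_load(0, older_unperformed=0b101)  # LQ entries 0 and 2 are older
ldm.perform(0)
print(ldm.lockdown_lifted(0))  # False
ldm.perform(2)
print(ldm.lockdown_lifted(0))  # True
```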
Wakeup Matrix
● Producers: the instructions that generate an instruction's source operands
● An instruction sets its row according to its producers in the IQ at dispatch, and clears its column at issue
Wakeup Matrix
// at dispatch
row_vec[instr] = bits set for its producers in the IQ
// when an instruction issues
clear its column
if reduction_NOR(row_vec[instr]) == 1:
    instr wakes up
● # rows = # columns = # entries in IQ
● An instruction sets its row according to its producers in the IQ at dispatch, and clears its column at issue
● An instruction is woken up when all the bits in its row vector are zero
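A minimal Python sketch of the wakeup matrix on a three-instruction dependency chain, assuming bitmask rows and, as described above, that a producer clears its column when it issues. The class and method names are hypothetical, and the instruction sequence in the comment is illustrative.

```python
class WakeupMatrix:
    """Hypothetical wakeup matrix over n IQ entries.

    rows[i] bit j set: entry i still waits for producer j to issue.
    """

    def __init__(self, n):
        self.n = n
        self.rows = [0] * n

    def dispatch(self, i, producers_in_iq):
        self.rows[i] = producers_in_iq

    def issue(self, j):
        for k in range(self.n):
            self.rows[k] &= ~(1 << j)  # clear column j

    def awake(self, i):
        return self.rows[i] == 0       # reduction NOR of row i


# i0: ld r1;  i1: add r2 <- r1;  i2: mul r3 <- r1, r2
wm = WakeupMatrix(3)
wm.dispatch(0, 0b000)
wm.dispatch(1, 0b001)
wm.dispatch(2, 0b011)
print([wm.awake(i) for i in range(3)])  # [True, False, False]
wm.issue(0)
print([wm.awake(i) for i in range(3)])  # [True, True, False]
wm.issue(1)
print([wm.awake(i) for i in range(3)])  # [True, True, True]
```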
Matrix Scheduler Challenges
● Increased logic complexity
○ The price of O(1)-time scheduling
● Timing and area constraints
● 12T SRAM arrays: 8 for dependency storage
● E.g.: the age matrix performs n² AND and n n-bit OR operations per cycle

Processing-In-Memory Implementation
● Implements efficient matrix schedulers
● Reduces matrix scheduling to element-wise logic operations
● 8T SRAM arrays
● PIM supports vertical read and write mechanisms
Bit Line Computing
● The matrix scheduler stores the dependencies of instructions
● Precharge the read bit line (RBL) beforehand
● Activate the read word lines (RWL)
● The bit count is encoded by the voltage drop on the bit lines
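A minimal behavioural sketch of bit-line computing in Python. It is an idealized digital model under stated assumptions: which matrix bits sit on which word line and bit line is assumed, and analog effects, sensing thresholds and the actual 8T cell circuit are ignored. The function name `bitline_compute` is hypothetical. Each precharged RBL is pulled down by every activated cell that stores 1; the number of such cells stands in for the voltage drop, and "RBL stays high" is the reduction NOR used throughout the matrix schedulers.

```python
def bitline_compute(cells, wordlines):
    """Behavioural model of bit-line computing on an SRAM array.

    cells[r][c]: bit stored at row r on read bit line c.
    wordlines:   which read word lines (RWLs) are activated.
    Returns per bit line (discharge_count, rbl_stays_high); the count stands
    in for the voltage drop, and rbl_stays_high is the reduction NOR.
    """
    out = []
    for c in range(len(cells[0])):
        count = sum(1 for r, on in enumerate(wordlines) if on and cells[r][c])
        out.append((count, count == 0))
    return out


cells = [[1, 0, 1],
         [0, 0, 1],
         [1, 0, 0]]
print(bitline_compute(cells, wordlines=[True, True, False]))
# [(1, False), (0, True), (2, False)]
```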

Vertical Access
● Standard SRAM arrays only support horizontal access
● Update matrix schedulers
○ Column-wise write
● Memory disambiguation matrix
○ Column-wise read
Multibanking
● Multibanking for parallel processing
● Horizontally partitioned SRAM arrays
○ Matrix schedulers are divided into n small single-ported arrays (banks), where n = dispatch width
The Complete Picture


References
● Dibei Chen, Tairan Zhang, et al., "Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues"
● Prof. Biswa's slides
Questions?
