Vector Code Example

This document presents C code for a vector multiply-add loop (z[i] = a*x[i] + y[i]) and shows how to implement it in scalar MIPS and vector VMIPS assembly code. It asks the student to determine the number of cycles each implementation takes and to calculate the speedup of the vectorized version. For a vector length of 16, the speedup of VMIPS over MIPS works out to about 15.35x (1412/92). For a vector length of 32, the speedup would be higher, since each loop iteration processes more elements.

Uploaded by

mdkamal

EENG/CSCI 641 Computer Architecture 1

Vector Code Example

Name:

Grade:

Example:
Consider this piece of C code:
for (i = 0; i < 128; i++)
{
    z[i] = a*x[i] + y[i];
}
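For checking an assembly translation against known-good results, the loop can be wrapped in a small C function. This is only a sketch: the name axpy_ref and the length parameter n are assumptions added here, and the double type is inferred from the .D (double-precision) instructions and the 8-byte pointer increments used in the solutions.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version of the loop: z[i] = a*x[i] + y[i] for i in [0, n).
 * axpy_ref and n are illustrative names; the handout fixes n at 128. */
void axpy_ref(size_t n, double a, const double *x, const double *y, double *z)
{
    assert(x != NULL && y != NULL && z != NULL);
    for (size_t i = 0; i < n; i++) {
        z[i] = a * x[i] + y[i];
    }
}
```

Calling axpy_ref(128, a, x, y, z) reproduces the handout's loop; smaller values of n are convenient for hand-checking a few elements.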
a) Develop the MIPS scalar assembly code for this C code.
        L.D     F0,a            ; load scalar a
        DADDUI  R10,R0,128      ; 128 elements to process
        DADDUI  R1,R0,1000      ; load address of array x
        DADDUI  R2,R0,2000      ; load address of array y
        DADDUI  R3,R0,3000      ; load address of array z
Loop:   L.D     F1,[R1]         ; load x[i]
        MUL.D   F2,F1,F0        ; scalar-scalar multiply: a*x[i]
        L.D     F3,[R2]         ; load y[i]
        ADD.D   F4,F2,F3        ; add: a*x[i] + y[i]
        S.D     [R3],F4         ; store the result in z[i]
        DADDUI  R1,R1,8         ; increment array pointer for x[]
        DADDUI  R2,R2,8         ; increment array pointer for y[]
        DADDUI  R3,R3,8         ; increment array pointer for z[]
        DADDUI  R10,R10,-1      ; decrement loop counter
        BNEZ    R10,Loop        ; branch if R10 != zero
b) Develop the VMIPS assembly code for this C code.

        L.D     F0,a            ; load scalar a
        DADDUI  R10,R0,128      ; 128 elements to process
        DADDUI  R1,R0,1000      ; load address of array x
        DADDUI  R2,R0,2000      ; load address of array y
        DADDUI  R3,R0,3000      ; load address of array z
Loop:   LV      V1,R1           ; load vector X
        MULVS.D V2,V1,F0        ; vector-scalar multiply
        LV      V3,R2           ; load vector Y
        ADDVV.D V4,V2,V3        ; vector-vector add
        SV      R3,V4           ; store the result
        DADDUI  R1,R1,16*8      ; increment array pointer for x[] (16 elements * 8 bytes)
        DADDUI  R2,R2,16*8      ; increment array pointer for y[]
        DADDUI  R3,R3,16*8      ; increment array pointer for z[]
        DADDUI  R10,R10,-16     ; decrement loop counter
        BNEZ    R10,Loop        ; branch if R10 != zero

c) How many cycles does it take to execute the scalar code, assuming no memory latencies?

Instruction                                                          Number of times executed
        L.D     F0,a            ; load scalar a                      1
        DADDUI  R10,R0,128      ; 128 elements to process            1
        DADDUI  R1,R0,1000      ; load address of array x            1
        DADDUI  R2,R0,2000      ; load address of array y            1
        DADDUI  R3,R0,3000      ; load address of array z            1
Loop:   L.D     F1,[R1]         ; load x[i]                          128
        MUL.D   F2,F1,F0        ; scalar-scalar multiply             128
        L.D     F3,[R2]         ; load y[i]                          128
        ADD.D   F4,F2,F3        ; add                                128
        S.D     [R3],F4         ; store the result                   128
        DADDUI  R1,R1,8         ; increment array pointer for x[]    128
        DADDUI  R2,R2,8         ; increment array pointer for y[]    128
        DADDUI  R3,R3,8         ; increment array pointer for z[]    128
        DADDUI  R10,R10,-1      ; decrement loop counter             128
        BNEZ    R10,Loop        ; branch if R10 != zero              127*2 + 1*1 (taken 127 times at 2 cycles, not taken once at 1 cycle)

Total number of instruction cycles = 1(1+1+1+1+1) + 128(9) + 127*2 + 1 = 1412
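The total above follows a simple pattern: setup instructions run once, the non-branch loop body runs once per iteration, and the branch appears to be charged two cycles when taken and one cycle on the final fall-through. A minimal sketch of that cost model in C is given here; the function name and parameter names are hypothetical, and the branch costs are read off the 127*2 + 1 term rather than stated in the handout.

```c
#include <assert.h>

/* Cycle count for a counted loop under the handout's simple model:
 *   - every non-branch instruction costs 1 cycle,
 *   - the loop-closing branch costs 2 cycles when taken and 1 cycle
 *     when it falls through (assumption inferred from "127*2 + 1"),
 *   - no memory latencies or stalls.
 * setup      = instructions executed once before the loop
 * body       = non-branch instructions per iteration
 * iterations = number of times the loop body runs (at least 1) */
long loop_cycles(long setup, long body, long iterations)
{
    assert(setup >= 0 && body >= 0 && iterations >= 1);
    long branch = (iterations - 1) * 2 + 1;  /* taken on all but the last pass */
    return setup + iterations * body + branch;
}
```

With 5 setup instructions, 9 non-branch body instructions, and 128 iterations, loop_cycles(5, 9, 128) evaluates to 1412, the scalar total.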

d) How many cycles does it take to execute the vector code, assuming no memory latencies?

Instruction                                                          Number of times executed
        L.D     F0,a            ; load scalar a                      1
        DADDUI  R10,R0,128      ; 128 elements to process            1
        DADDUI  R1,R0,1000      ; load address of array x            1
        DADDUI  R2,R0,2000      ; load address of array y            1
        DADDUI  R3,R0,3000      ; load address of array z            1
Loop:   LV      V1,R1           ; load vector X                      8
        MULVS.D V2,V1,F0        ; vector-scalar multiply             8
        LV      V3,R2           ; load vector Y                      8
        ADDVV.D V4,V2,V3        ; vector-vector add                  8
        SV      R3,V4           ; store the result                   8
        DADDUI  R1,R1,16*8      ; increment array pointer for x[]    8
        DADDUI  R2,R2,16*8      ; increment array pointer for y[]    8
        DADDUI  R3,R3,16*8      ; increment array pointer for z[]    8
        DADDUI  R10,R10,-16     ; decrement loop counter             8
        BNEZ    R10,Loop        ; branch if R10 != zero              7*2 + 1*1

Total number of instruction cycles = 1(1+1+1+1+1) + 8(9) + 7*2 + 1 = 92

e) What is the speedup?

1412/92 ≈ 15.35

f) What would be the speedup if the vector length is 32?

I leave it for you to figure this out. Remember that each loop iteration now calculates 32 elements, as opposed to 16.
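The effect of the vector length on the loop trip count can be seen by strip-mining the C loop, where each outer iteration stands in for one pass through the VMIPS loop. The sketch below is an illustration only: axpy_strips and the parameter vl are hypothetical names, and it assumes the element count is an exact multiple of the vector length, as it is for 128 elements with a length of 16 or 32.

```c
#include <assert.h>
#include <stddef.h>

/* Strip-mined form of z[i] = a*x[i] + y[i]. Each outer iteration handles
 * one strip of vl elements, mirroring one pass through the VMIPS loop
 * (LV, MULVS.D, LV, ADDVV.D, SV). Returns the number of strips executed.
 * Assumes n is a multiple of vl; names are illustrative. */
size_t axpy_strips(size_t n, size_t vl, double a,
                   const double *x, const double *y, double *z)
{
    assert(vl > 0 && n % vl == 0);
    size_t strips = 0;
    for (size_t base = 0; base < n; base += vl) {  /* one vector loop iteration */
        for (size_t j = 0; j < vl; j++) {          /* work done by the vector unit */
            z[base + j] = a * x[base + j] + y[base + j];
        }
        strips++;
    }
    return strips;
}
```

For n = 128, a vector length of 16 gives 8 strips, which is the execution count used for the vector loop in part d; a longer vector length reduces the strip count in proportion, and with it the number of times the loop overhead instructions execute.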


Exercise:
Repeat the previous example for this piece of C code:
for (i = 0; i < 128; i++)
{
    z[i] = (x[i] + y[i]) * w[i];
}
a) Develop the MIPS scalar assembly code for this C code.
        DADDUI  R10,R0,128      ; 128 elements to process
        DADDUI  R1,R0,1000      ; load address of array x
        DADDUI  R2,R0,2000      ; load address of array y
        DADDUI  R3,R0,3000      ; load address of array z
        DADDUI  R4,R0,4000      ; load address of array w
Loop:   L.D     F1,[R1]         ; load x[i]
        L.D     F2,[R2]         ; load y[i]
        L.D     F4,[R4]         ; load w[i]
        ADD.D   F3,F1,F2        ; add x[i] + y[i]
        MUL.D   F5,F3,F4        ; z[i] = (x[i] + y[i]) * w[i]
        S.D     [R3],F5         ; store the result
        DADDUI  R1,R1,8         ; increment array pointer for x[]
        DADDUI  R2,R2,8         ; increment array pointer for y[]
        DADDUI  R3,R3,8         ; increment array pointer for z[]
        DADDUI  R4,R4,8         ; increment array pointer for w[]
        DADDUI  R10,R10,-1      ; decrement loop counter
        BNEZ    R10,Loop        ; branch if R10 != zero

b) Develop the VMIPS assembly code for this C code.

        DADDUI  R10,R0,128      ; 128 elements to process
        DADDUI  R1,R0,1000      ; load address of array x
        DADDUI  R2,R0,2000      ; load address of array y
        DADDUI  R3,R0,3000      ; load address of array z
        DADDUI  R4,R0,4000      ; load address of array w
Loop:   LV      V1,R1           ; load vector X
        LV      V2,R2           ; load vector Y
        LV      V4,R4           ; load vector W
        ADDVV.D V3,V1,V2        ; vector-vector add
        MULVV.D V5,V3,V4        ; vector-vector multiply
        SV      R3,V5           ; store the result
        DADDUI  R1,R1,16*8      ; increment array pointer for x[]
        DADDUI  R2,R2,16*8      ; increment array pointer for y[]
        DADDUI  R3,R3,16*8      ; increment array pointer for z[]
        DADDUI  R4,R4,16*8      ; increment array pointer for w[]
        DADDUI  R10,R10,-16     ; decrement loop counter
        BNEZ    R10,Loop        ; branch if R10 != zero

c) How many cycles does it take to execute the scalar code, assuming no memory latencies?

Instruction                                                          Number of times executed

Total number of instruction cycles =

d) How many cycles does it take to execute the vector code, assuming no memory latencies?

Instruction                                                          Number of times executed

Total number of instruction cycles =

e) What is the speedup?

f) What would be the speedup if the vector length is 32?
