Lecture 06 - Slides - Computer Technology and Instructions

Computer Technology
and Instructions
Sam Amiri
The Computer Revolution
 Progress in computer technology

 Underpinned by Moore’s Law
 Makes novel applications feasible

 Computers in automobiles
 Cell phones
 Human genome project
 World Wide Web
 Search Engines
 Computers are pervasive
2
Classes of Computers
 Desktop computers
 General purpose, variety of software
 Subject to cost/performance tradeoff
 Server computers
 Network based
 High capacity, performance, reliability
 Range from small servers to building sized
 Embedded computers
 Hidden as components of systems
 Stringent power/performance/cost constraints
3
The Processor Market
4
Understanding Performance
 Algorithm
 Determines number of operations executed
 Programming language, compiler, architecture

 Determine number of machine instructions executed per operation
 Processor and memory system

 Determine how fast instructions are executed
 I/O system (including OS)

 Determines how fast I/O operations are executed
5
Below Your Program
 Application software
 Written in high-level language
 System software
 Compiler: translates HLL code to machine code
 Operating System: service code
 Handling input/output
 Managing memory and storage
 Scheduling tasks & sharing resources
 Hardware
 Processor, memory, I/O controllers
6
Levels of Program Code
 High-level language
 Level of abstraction closer to problem domain
 Provides for productivity and portability
 Assembly language
 Textual representation of instructions
 Hardware representation
 Binary digits (bits)
 Encoded instructions and data
7
Components of a Computer
 Same components for all kinds

of computer
 Desktop, server, embedded
 Input/output includes
 User-interface devices
 Display, keyboard, mouse
 Storage devices
 Hard disk, CD/DVD, flash
 Network adapters
 For communicating with other computers
8
Inside the Processor (CPU)
 Datapath: performs operations on data
 Control: sequences datapath, memory, ...
 Cache memory
 Small fast SRAM memory for immediate access to data
9
Abstractions
 Abstraction helps us deal with complexity

 Hide lower-level detail
 Instruction set architecture (ISA)

 The hardware/software interface
 Application binary interface (ABI)

 The ISA plus system software interface
 Implementation
 The details underlying and interface
10
Defining Performance
 Response time
 How long it takes to do a task
 Throughput
 Total work done per unit time
 e.g., tasks/transactions/… per hour
 How are response time and throughput affected by

 Replacing the processor with a faster version?
 Adding more processors?
 We’ll focus on response time for now…

11
Measuring Execution Time
 Elapsed time
 Total response time, including all aspects
 Processing, I/O, OS overhead, idle time
 Determines system performance
 CPU time
 Time spent processing a given job
 Discounts I/O time, other jobs’ shares
 Comprises user CPU time and system CPU time
 Different programs are affected differently by CPU and system performance
12
CPU Clocking
 Operation of digital hardware governed by a constant-rate clock
 Clock period: duration of a clock cycle

 e.g., 250ps = 0.25ns = 250×10–12s
 Clock frequency (rate): cycles per second

 e.g., 4.0GHz = 4000MHz = 4.0×109Hz
13
CPU Time
CPU Time = CPU Clock Cycles × Clock Cycle Time

CPU Clock Cycles
=
Clock Rate
 Performance improved by
 Reducing number of clock cycles
 Increasing clock rate
 Hardware designer must often trade off clock rate against cycle count
14
Instruction Count and CPI
Clock Cycles = Instruction Count × Cycles per Instruction

CPU Time = Instruction Count × CPI × Clock Cycle Time
Instruction Count × CPI
=
Clock Rate
 Instruction Count for a program
 Determined by program, ISA and compiler
 Average cycles per instruction

 Determined by CPU hardware
 If different instructions have different CPI
 Average CPI affected by instruction mix
15
CPI in More Detail
 If different instruction classes take different numbers of cycles
n
Clock Cycles = ∑ (CPIi × Instruction Count i )
i=1
 Weighted average CPI
Clock Cycles n
 Instruction Count i 
CPI = = ∑  CPIi × 
Instruction Count i=1  Instruction Count 
Relative frequency
16
CPI Example
 Alternative compiled code sequences using instructions in classes A,

B, C
 Sequence 1: IC = 5  Sequence 2: IC = 6
 Clock Cycles  Clock Cycles
= 2×1 + 1×2 + 2×3 = 4×1 + 1×2 + 1×3
= 10 =9
 Avg. CPI = 10/5 = 2.0  Avg. CPI = 9/6 = 1.5
17
Performance Summary
Instructions Clock cycles Seconds

CPU Time = × ×
Program Instruction Clock cycle
 Performance depends on
 Algorithm: affects IC, possibly CPI
 Programming language: affects IC, CPI
 Compiler: affects IC, CPI
 Instruction set architecture: affects IC, CPI, Tc
18
Power Trends
 The power wall

 We can’t reduce voltage
further
 We can’t remove more
heat
 How else can we

improve performance?
 In CMOS IC technology
Power = Capacitive load × Voltage 2 × Frequency

×30 5V → 1V ×1000 19
Uniprocessor Performance
Constrained by power, instruction-level parallelism, memory latency

20
Multiprocessors
 Multicore microprocessors
 More than one processor per chip
 Requires explicitly parallel programming

 Compare with instruction level parallelism
 Hardware executes multiple instructions at once
 Hidden from the programmer
 Hard to do
 Programming for performance
 Load balancing
 Optimizing communication and synchronization
21
Instruction Set
 The repertoire of instructions of a computer

 Different computers have different instruction sets
 But with many aspects in common
 Early computers had very simple instruction sets
 Simplified implementation
 Many modern computers also have simple instruction sets
22
The MIPS Instruction Set
 Used as the example throughout this module
 Stanford MIPS commercialized by MIPS Technologies

(www.mips.com)
 Large share of embedded core market

 Applications in consumer electronics, network/storage equipment, cameras,
printers, …
 Typical of many modern ISAs
23
Arithmetic Operations
 Add and subtract, three operands

 Two sources and one destination
add a, b, c # a gets b + c
 All arithmetic operations have this form
 Design Principle 1: Simplicity favours regularity

 Regularity makes implementation simpler
 Simplicity enables higher performance at lower cost
24
Arithmetic Example
 C code:
f = (g + h) - (i + j);
 Compiled MIPS code:
add t0, g, h # temp t0 = g + h

add t1, i, j # temp t1 = i + j
sub f, t0, t1 # f = t0 - t1
25
Register Operands
 Arithmetic instructions use register operands

 MIPS has a 32 × 32-bit register file
 Use for frequently accessed data
 Numbered 0 to 31
 32-bit data called a “word”
 Assembler names
 $t0, $t1, …, $t9 for temporary values
 $s0, $s1, …, $s7 for saved variables
 Design Principle 2: Smaller is faster
 c.f. main memory: millions of locations
26
Register Operand Example
 C code:
f = (g + h) - (i + j);
 f, …, j in $s0, …, $s4
add $t0, $s1, $s2

add $t1, $s3, $s4
sub $s0, $t0, $t1
27
Memory Operands
 Main memory used for composite data

 Arrays, structures, dynamic data
 To apply arithmetic operations
 Load values from memory into registers
 Store result from register to memory
 Memory is byte addressed
 Each address identifies an 8-bit byte
 Words are aligned in memory
 Address must be a multiple of 4
 MIPS is Big Endian
 Most-significant byte at least address of a word
 c.f. Little Endian: least-significant byte at least address
28
Memory Operand Example 1
 C code:
g = h + A[8];
 g in $s1, h in $s2, base address of A in $s3

 Index 8 requires offset of 32
 4 bytes per word
lw $t0, 32($s3) # load word

add $s1, $s2, $t0
offset base register 29
Memory Operand Example 2
 C code:
A[12] = h + A[8];
 h in $s2, base address of A in $s3

 Index 8 requires offset of 32
lw $t0, 32($s3) # load word

add $t0, $s2, $t0
sw $t0, 48($s3) # store word
30
Registers vs. Memory
 Registers are faster to access than memory

 Operating on memory data requires loads and stores
 More instructions to be executed
 Compiler must use registers for variables as much as possible
 Only spill to memory for less frequently used variables
 Register optimization is important!
31
Immediate Operands
 Constant data specified in an instruction
addi $s3, $s3, 4
 No subtract immediate instruction

 Just use a negative constant
addi $s2, $s1, -1
 Design Principle 3: Make the common case fast

 Small constants are common
 Immediate operand avoids a load instruction
32
The Constant Zero
 MIPS register 0 ($zero) is the constant 0

 Cannot be overwritten
 Useful for common operations

 E.g., move between registers
add $t2, $s1, $zero
33
MIPS R-format Instructions
op rs rt rd shamt funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
 Instruction fields
 op: operation code (opcode)
 rs: first source register number
 rt: second source register number
 rd: destination register number
 shamt: shift amount (00000 for now)
 funct: function code (extends opcode)
34
R-format Example
add $t0, $s1, $s2
special $s1 $s2 $t0 0 add
0 17 18 8 0 32
000000 10001 10010 01000 00000 100000
000000100011001001000000001000002 = 0232402016 35
MIPS I-format Instructions
op rs rt constant or address
6 bits 5 bits 5 bits 16 bits
 Immediate arithmetic and load/store instructions

 rt: destination or source register number
 Constant: –215 to +215 – 1
 Address: offset added to base address in rs
 Design Principle 4: Good design demands good compromises
 Different formats complicate decoding, but allow 32-bit instructions uniformly
 Keep formats as similar as possible
36
Logical Operations
 Instructions for bitwise manipulation
 Useful for extracting and inserting groups of bits in a word
37
Shift Operations
 shamt: how many positions to shift

 Shift left logical
 Shift left and fill with 0 bits
 sll by i bits multiplies by 2i
 Shift right logical
 Shift right and fill with 0 bits
 srl by i bits divides by 2i (unsigned only)
38
NOT Operations
 Useful to invert bits in a word

 Change 0 to 1, and 1 to 0
 MIPS has NOR 3-operand instruction

 a NOR b == NOT ( a OR b )
Register 0: always read as
zero
nor $t0, $t1, $zero
$t1 0000 0000 0000 0000 0011 1100 0000 0000
$t0 1111 1111 1111 1111 1100 0011 1111 1111
39
Conditional Operations
 Branch to a labeled instruction if a condition is true

 Otherwise, continue sequentially
 beq rs, rt, L1
 if (rs == rt) branch to instruction labeled L1;
 bne rs, rt, L1
 if (rs != rt) branch to instruction labeled L1;
 j L1
 unconditional jump to instruction labeled L1
40
Compiling If Statements
 C code:
if (i==j) f = g+h;
else f = g-h;
 f, g, … in $s0, $s1, …
bne $s3, $s4, Else
add $s0, $s1, $s2
j Exit
Else: sub $s0, $s1, $s2
Exit: …
Assembler calculates addresses
41
More Conditional Operations
 Set result to 1 if a condition is true

 Otherwise, set to 0
 slt rd, rs, rt

 if (rs < rt) rd = 1; else rd = 0;
 slti rt, rs, constant

 if (rs < constant) rt = 1; else rt = 0;
 Use in combination with beq, bne

slt $t0, $s1, $s2 # if ($s1 < $s2)
bne $t0, $zero, L # branch to L
42
Branch Addressing
 Branch instructions specify

 Opcode, two registers, target address
 Most branch targets are near branch

 Forward or backward
op rs rt constant or address
6 bits 5 bits 5 bits 16 bits
 PC-relative addressing
 Target address = PC + offset × 4
 PC already incremented by 4 by this time
43
Jump Addressing
 Jump (j and jal) targets could be anywhere in text segment

 Encode full address in instruction
op address
6 bits 26 bits
 (Pseudo)Direct jump addressing

 Target address = PC31…28 : (address × 4)
44
Addressing Mode Summary
45
Thank You!
46

Lecture 06 - Slides - Computer Technology and Instructions

Uploaded by

Copyright:

Available Formats

Lecture 06 - Slides - Computer Technology and Instructions

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lecture 06 - Slides - Computer Technology and Instructions

Uploaded by

Copyright:

Available Formats

Computer Technology

 Progress in computer technology

 Makes novel applications feasible

 Computers are pervasive

 Programming language, compiler, architecture

 Processor and memory system

 I/O system (including OS)

 Same components for all kinds

 Datapath: performs operations on data

 Control: sequences datapath, memory, ...

 Abstraction helps us deal with complexity

 Instruction set architecture (ISA)

 Application binary interface (ABI)

 How are response time and throughput affected by

 We’ll focus on response time for now…

 Operation of digital hardware governed by a constant-rate clock

 Clock period: duration of a clock cycle

 Clock frequency (rate): cycles per second

CPU Time = CPU Clock Cycles × Clock Cycle Time

Clock Cycles = Instruction Count × Cycles per Instruction

 Average cycles per instruction

 If different instruction classes take different numbers of cycles

 Weighted average CPI

 Alternative compiled code sequences using instructions in classes A,

Instructions Clock cycles Seconds

 The power wall

 How else can we

Power = Capacitive load × Voltage 2 × Frequency

Constrained by power, instruction-level parallelism, memory latency

 Requires explicitly parallel programming

 The repertoire of instructions of a computer

 Used as the example throughout this module

 Stanford MIPS commercialized by MIPS Technologies

 Large share of embedded core market

 Typical of many modern ISAs

 Add and subtract, three operands

 All arithmetic operations have this form

 Design Principle 1: Simplicity favours regularity

 Compiled MIPS code:

add t0, g, h # temp t0 = g + h

 Arithmetic instructions use register operands

 Compiled MIPS code:

add $t0, $s1, $s2

 Main memory used for composite data

 Compiled MIPS code:

lw $t0, 32($s3) # load word

 Compiled MIPS code:

lw $t0, 32($s3) # load word

 Registers are faster to access than memory

 Constant data specified in an instruction

addi $s3, $s3, 4

 No subtract immediate instruction

 Design Principle 3: Make the common case fast

 MIPS register 0 ($zero) is the constant 0

 Useful for common operations

add $t0, $s1, $s2

special $s1 $s2 $t0 0 add

000000 10001 10010 01000 00000 100000

 Immediate arithmetic and load/store instructions