Embedded Systems Design: A Unified Hardware/Software Introduction
Introduction
General-Purpose Processor
Processor designed for a variety of computation tasks
Low unit cost, in part because manufacturer spreads NRE over large numbers of units
Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
a.k.a. "microprocessor": "micro" used when they were implemented on one or a few chips rather than entire rooms
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
Basic Architecture
Control unit and datapath
Note similarity to single-purpose processor
Key differences
Datapath is general
Control unit doesn't store the algorithm; the algorithm is "programmed" into the memory
[Figure: processor with a control unit (controller, PC, IR, control/status) and a datapath (ALU, registers), connected to I/O and memory]
Datapath Operations
Load
Read memory location into register
ALU operation
Input certain registers through ALU, store back in register
Store
Write register to memory location
[Figure: datapath example in which the value 10 is read from memory into a register, incremented (+1) to 11 by the ALU, and written back to memory]
Control Unit
Control unit: configures the datapath operations
Sequence of desired operations ("instructions") stored in memory: the "program"
Instruction cycle broken into several sub-operations, each one clock cycle, e.g.:
Fetch: Get next instruction into IR
Decode: Determine what the instruction means
Fetch operands: Move data from memory to datapath register
Execute: Move data through the ALU
Store results: Write data from register to memory
[Figure: processor (control unit with PC and IR; datapath with ALU and registers R0, R1) connected to I/O and memory. Program in memory: 100: load R0, M[500]; 101: inc R1, R0; 102: store M[501], R1. Memory location 500 holds 10.]
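The five sub-operations above can be sketched as a small Python model that runs the three-instruction example program (load, inc, store). This is only an illustration: the tuple encoding of instructions and the names run_cycles, memory, and trace are assumptions made for the sketch, and every instruction is simply given all five sub-operations, one clock cycle each.

```python
# Illustrative model only: the tuple encoding and all names below are
# assumptions for this sketch, not part of the slides.
def run_cycles(memory, pc, count):
    """Run `count` instructions starting at `pc`; return (registers, trace)."""
    regs = {"R0": 0, "R1": 0}
    trace = []  # (instruction address, sub-operation), one entry per clock cycle
    for _ in range(count):
        addr = pc
        ir = memory[pc]                  # Fetch: get next instruction into IR
        pc += 1
        trace.append((addr, "fetch"))
        op, dst, src = ir                # Decode: determine what it means
        trace.append((addr, "decode"))
        # Fetch operands: from memory for a load, from a register otherwise
        operand = memory[src] if op == "load" else regs[src]
        trace.append((addr, "fetch operands"))
        # Execute: only inc changes the value on its way through the ALU
        result = operand + 1 if op == "inc" else operand
        trace.append((addr, "execute"))
        # Store results: to memory for a store, to a register otherwise
        if op == "store":
            memory[dst] = result
        else:
            regs[dst] = result
        trace.append((addr, "store results"))
    return regs, trace


memory = {
    100: ("load", "R0", 500),    # load R0, M[500]
    101: ("inc", "R1", "R0"),    # inc R1, R0
    102: ("store", 501, "R1"),   # store M[501], R1
    500: 10,
}
regs, trace = run_cycles(memory, 100, 3)
print(regs, memory[501], len(trace))
```

With memory location 500 holding 10, the model ends with R0 = 10, R1 = 11, and 11 written to location 501, after 15 modeled clock cycles.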
Instruction Cycles
[Figure: instruction cycles for the example program, one instruction per value of the PC, each spanning several clk cycles. PC=100: load R0, M[500] brings 10 from memory location 500 into R0. PC=101: inc R1, R0 passes R0 through the ALU (+1), giving R1 = 11. PC=102: store M[501], R1 writes R1 to memory location 501.]
Architectural Considerations
N-bit processor
N-bit ALU, registers, buses, memory data interface
Embedded: 8-bit, 16-bit, 32-bit common
Desktop/servers: 32-bit, even 64
Architectural Considerations
Clock frequency
Inverse of clock period
Clock period must be longer than longest register-to-register delay in entire processor
Memory access is often the longest
[Figure: pipelining. Dish-washing analogy (wash, dry) for dishes 1 to 8, and pipelined execution of instructions 1 to 8 over time.]
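The two ideas above, clock period versus frequency and the benefit of pipelining, can be checked with a few lines of arithmetic. The sketch below is a minimal illustration: the function names and the delay values are hypothetical, and the pipelining formula assumes every stage takes one time unit with no stalls.

```python
# Hypothetical helper names; delay values below are made up for illustration.
def min_clock_period_s(register_to_register_delays_s):
    """The clock period must cover the longest register-to-register delay."""
    return max(register_to_register_delays_s)


def clock_frequency_hz(period_s):
    """Clock frequency is the inverse of the clock period."""
    return 1.0 / period_s


def total_time_units(n_items, n_stages, pipelined):
    """Time to process n_items through n_stages, one time unit per stage.

    Non-pipelined: each item finishes all stages before the next starts.
    Pipelined (idealized, no stalls): after the first item fills the
    pipeline, one item completes per time unit.
    """
    if pipelined:
        return n_stages + n_items - 1
    return n_items * n_stages


delays = [2e-9, 5e-9, 10e-9]   # e.g. ALU path, register path, memory access
period = min_clock_period_s(delays)
print(period, clock_frequency_hz(period))
print(total_time_units(8, 2, False), total_time_units(8, 2, True))
```

For 8 dishes and 2 stages (wash, dry) this gives 16 time units without pipelining and 9 with it.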
Princeton
Fewer memory wires
Harvard
Simultaneous program and data memory access
[Figure: Harvard: processor connected to separate program memory and data memory. Princeton: processor connected to a single memory (program and data).]
Cache Memory
Memory access may be slow
Cache is small but fast memory close to processor
Holds copy of part of memory
Hits and misses
Fast/expensive technology, usually on the same chip
[Figure: processor, cache, memory]
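Hits and misses can be illustrated with a toy direct-mapped cache that holds a copy of part of memory. The slides do not say how the cache is organized, so the direct-mapped layout, the one-word line size, and the class name DirectMappedCache are all assumptions of this sketch.

```python
# Toy model; the direct-mapped organization and all names are assumptions.
class DirectMappedCache:
    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.memory = memory               # the slower backing memory
        self.lines = [None] * num_lines    # each entry: (address, value)
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines   # the one line this address can use
        entry = self.lines[index]
        if entry is not None and entry[0] == address:
            self.hits += 1                 # hit: copy already in the cache
            return entry[1]
        self.misses += 1                   # miss: go to memory, keep a copy
        value = self.memory[address]
        self.lines[index] = (address, value)
        return value


memory = {addr: addr * 2 for addr in range(16)}
cache = DirectMappedCache(4, memory)
values = [cache.read(a) for a in [0, 1, 0, 4, 0]]
print(values, cache.hits, cache.misses)
```

In this access sequence only the second read of address 0 hits. Address 4 maps to the same line as address 0 and evicts it, so the final read of address 0 misses again.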
Programmer's View
Programmer doesn't need detailed understanding of architecture
Instead, needs to know what instructions can be executed
Assembly-Level Instructions
[Figure: a program as a sequence of instructions (Instruction 1, Instruction 2, Instruction 3, Instruction 4, ...), each consisting of an opcode followed by operand1 and operand2]
Instruction Set
Defines the legal set of instructions for that processor
Data transfer: memory/register, register/register, I/O, etc.
Arithmetic/logical: move register through ALU and back
Branches: determine next PC value when not just PC+1
Example instructions: MOV direct, Rn; MOV @Rn, Rm; MOV Rn, #immed.; ADD Rn, Rm
Addressing Modes
Addressing mode     Operand field      Register-file contents   Memory contents
Immediate           Data
Register-direct     Register address   Data
Register indirect   Register address   Memory address           Data
Direct              Memory address                              Data
Indirect            Memory address                              Memory address, then Data
Sample Programs
C program
int total = 0;
for (int i=10; i!=0; i--)
    total += i;
// next instructions...
Equivalent assembly program
0       MOV R0, #0;     // total = 0
1       MOV R1, #10;    // i = 10
2       MOV R2, #1;     // constant 1
3       MOV R3, #0;     // constant 0
Loop:   JZ R1, Next;    // Done if i=0
5       ADD R0, R1;     // total += i
6       SUB R1, R2;     // i--
7       JZ R3, Loop;    // Jump always
Next:   // next instructions...
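The five addressing modes in the table above differ only in how many lookups separate the operand field from the data. A minimal sketch, assuming a register file and a memory modeled as Python dictionaries (the function name operand_value and the mode strings are hypothetical):

```python
# Hypothetical names; regs and mem are plain dictionaries in this sketch.
def operand_value(mode, field, regs, mem):
    """Return the data an operand field refers to under a given addressing mode."""
    if mode == "immediate":
        return field                 # the field is the data itself
    if mode == "register-direct":
        return regs[field]           # the field names a register holding the data
    if mode == "register-indirect":
        return mem[regs[field]]      # the register holds the memory address
    if mode == "direct":
        return mem[field]            # the field is the memory address
    if mode == "indirect":
        return mem[mem[field]]       # memory holds the address of the data
    raise ValueError("unknown addressing mode: " + mode)


regs = {1: 20}
mem = {5: 20, 20: 99}
for mode, field in [("immediate", 5), ("register-direct", 1),
                    ("register-indirect", 1), ("direct", 5), ("indirect", 5)]:
    print(mode, operand_value(mode, field, regs, mem))
```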
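To check that the sample assembly program computes the same result as the C loop, the eight instructions can be run through a tiny interpreter. This is a sketch under stated assumptions: instructions are written as Python tuples, the labels Loop and Next are resolved by hand to addresses 4 and 8, and the name run_sample is hypothetical.

```python
# Instructions as (mnemonic, first operand, second operand); this tuple form
# and the hand-resolved label addresses are assumptions for the sketch.
PROGRAM = [
    ("MOV", 0, 0),    # 0: MOV R0, #0    total = 0
    ("MOV", 1, 10),   # 1: MOV R1, #10   i = 10
    ("MOV", 2, 1),    # 2: MOV R2, #1    constant 1
    ("MOV", 3, 0),    # 3: MOV R3, #0    constant 0
    ("JZ", 1, 8),     # 4: Loop: JZ R1, Next   done if i = 0
    ("ADD", 0, 1),    # 5: ADD R0, R1    total += i
    ("SUB", 1, 2),    # 6: SUB R1, R2    i--
    ("JZ", 3, 4),     # 7: JZ R3, Loop   jump always (R3 is 0)
]


def run_sample(program):
    regs = [0, 0, 0, 0]
    pc = 0
    while pc < len(program):        # falling off the end reaches "Next"
        op, a, b = program[pc]
        pc += 1                     # default next PC is PC+1
        if op == "MOV":
            regs[a] = b             # immediate operand
        elif op == "ADD":
            regs[a] += regs[b]
        elif op == "SUB":
            regs[a] -= regs[b]
        elif op == "JZ":
            if regs[a] == 0:
                pc = b              # branch taken
        else:
            raise ValueError("unknown instruction: " + op)
    return regs


final_regs = run_sample(PROGRAM)
print(final_regs[0])   # total, i.e. 10 + 9 + ... + 1
```

The "jump always" at address 7 works because R3 is loaded with 0 and never changed, so a jump-if-zero on R3 is unconditional.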
Programmer Considerations
Program and data memory space
Embedded processors often very limited
e.g., 64 Kbytes program, 256 bytes of RAM (expandable)
I/O
How to communicate with external signals?
Interrupts
Using assembly language programming we can configure a PC parallel port to perform digital I/O
Write and read to three special registers to accomplish this
Table provides list of parallel port connector pins and corresponding register location
Example: parallel port monitors the input switch and turns the LED on/off accordingly
[Figure: PC parallel port with pin 13 connected to a switch and pin 2 connected to an LED]
Register Address
0th bit of register #2
0th bit of register #2
6,7,5,4,3th bit of register #1
1,2,3th bit of register #2
        push  ax            ; save the content
        push  dx            ; save the content
        mov   dx, 3BCh + 1  ; base + 1 for register #1
        in    al, dx        ; read register #1
        and   al, 10h       ; mask out all but bit #4
        cmp   al, 0         ; is it 0?
        jne   SwitchOn      ; if not, we need to turn the LED on
        mov   dx, 3BCh + 0  ; base + 0 for register #0
        in    al, dx        ; read the current state of the port
        and   al, 0FEh      ; clear first bit (masking)
        out   dx, al        ; write it out to the port
        jmp   Done          ; we are done
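The masking logic of the parallel-port example can be tried without hardware by treating the two registers as plain 8-bit integers. In this sketch the switch is taken to be bit 4 of register #1 (the 10h mask in the listing) and the LED bit 0 of register #0; the function name check_port and the constant names are hypothetical, and no real port I/O is performed.

```python
# Simulation only: registers are plain integers, no real port access.
SWITCH_MASK = 0x10   # bit 4 of register #1 (the 10h mask in the listing)
LED_MASK = 0x01      # bit 0 of register #0, assumed here to drive the LED


def check_port(reg0, reg1):
    """Return the new value of register #0 given the current register values."""
    if reg1 & SWITCH_MASK:                 # switch bit set: turn the LED on
        return reg0 | LED_MASK
    return reg0 & ~LED_MASK & 0xFF         # switch bit clear: turn the LED off


print(hex(check_port(0xA0, 0x10)))   # switch on, other bits preserved
print(hex(check_port(0xFF, 0x00)))   # switch off, only the LED bit cleared
```

Reading the register, changing one bit with a mask, and writing it back leaves the other seven bits of the port untouched.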
Operating System
Optional software layer providing low-level services to a program (application).
File management, disk access
Keyboard/display interfacing
Scheduling multiple programs for execution
Or even just multiple threads from one program
        DB file_name "out.txt"   -- store file name
        MOV R0, 1324             -- system call "open" id
        MOV R1, file_name        -- address of file-name
        INT 34                   -- cause a system call
        JZ R0, L1                -- if zero -> error
        . . . read the file
        JMP L2                   -- bypass error cond.
L1:
        . . . handle the error
L2:
Development Environment
Development processor
The processor on which we write and debug our programs
Usually a PC
Target processor
The processor that the program will run on in our embedded system
Often different from the development processor
[Figure: development processor and target processor]
Running a Program
If the development processor is different from the target, how can we run our compiled code? Two options:
Download to target processor
Simulate
Simulation
One method: hardware description language
But slow, not always available
Cross compiler
Runs on one processor, but generates code for another
ISS (instruction set simulator)
Gives us control over time: set breakpoints, look at register values, set values, step-by-step execution, ...
But doesn't interact with the real environment
Download to board
Use device programmer
Runs in real environment, but not controllable
Compromise: emulator
Runs in real environment, at speed or near
Supports some controllability from the PC
[Figure: implementation phase and verification phase. Tools shown: development processor, debugger / ISS, profiler, emulator, programmer, external tools.]
Microcontroller features
On-chip peripherals
Timers, analog-digital converters, serial communication, etc.
Tightly integrated for programmer, typically part of register space
On-chip program and data memory
Direct programmer access to many of the chip's pins
Specialized instructions for bit-manipulation and other low-level operations
Still programmable
DSP features
Several instruction execution units
Multiply-accumulate single-cycle instruction, other instrs.
Efficient vector operations, e.g., add two arrays
Vector ALUs, loop buffers, etc.
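The two DSP operations named above are easy to state in software, which also shows why hardware support pays off: a dot product is one multiply-accumulate per element pair, and a vector add is one addition per element pair. The function names below are hypothetical, and the Python loops only describe the arithmetic, not single-cycle or parallel execution.

```python
# Hypothetical names; this describes the arithmetic, not DSP timing.
def multiply_accumulate(acc, a, b):
    """One multiply-accumulate step: acc + a * b."""
    return acc + a * b


def dot_product(xs, ys):
    """Dot product as a chain of multiply-accumulate steps."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = multiply_accumulate(acc, x, y)
    return acc


def vector_add(xs, ys):
    """Element-wise sum of two arrays of equal length."""
    return [x + y for x, y in zip(xs, ys)]


print(dot_product([1, 2, 3], [4, 5, 6]))
print(vector_add([1, 2, 3], [10, 20, 30]))
```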
Opportunity to add custom datapath hardware and a few custom instructions, or delete a few instructions
Can have significant performance, power and size impacts
Problem: need compiler/debugger for customized ASIP
Remember, most development uses structured languages
One solution: automatic compiler/debugger generation
e.g., www.tensillica.com
Selecting a Microprocessor
Issues
Technical: speed, power, size, cost
Other: development environment, prior expertise, licensing, etc.
SPEC: set of more realistic benchmarks, but oriented to desktops
EEMBC: EDN Embedded Benchmark Consortium, www.eembc.org
Suites of benchmarks: automotive, consumer electronics, networking, office automation, telecommunications
Processor          Clock speed  Peripherals                                  Bus width  MIPS   Power   Transistors  Price
Intel PIII         1GHz         ...                                          ...        ...    ...     ...          ...
IBM PowerPC 750X   550 MHz      ...                                          32/64      ~1300  5W      ~7M          $900
MIPS R5000         ...          ...                                          32/64      NA     NA      3.6M         NA
StrongARM SA-110   ...          ...                                          32         268    1W      2.1M         NA
Microcontroller
Intel 8051         12 MHz       4K ROM, 128 RAM, 32 I/O, Timer, UART         8          ~1     ~0.2W   ~10K         $7
Motorola 68HC811   3 MHz        4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI     8          ~.5    ~0.1W   ~10K         $5
TI C5416           160 MHz      128K, SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC   ...        ...    NA      NA           $34
Lucent DSP32C      80 MHz       16K Inst., 2K Data, Serial Ports, DMA        32         40     NA      NA           $75
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998
FSMD states and operations
Reset: PC=0;
Fetch
Decode
Mov1 (op = 0000): RF[rn] = M[dir]; to Fetch
Mov2 (0001): M[dir] = RF[rn]; to Fetch
Mov3 (0010): M[rn] = RF[rm]; to Fetch
Mov4 (0011): RF[rn] = imm; to Fetch
Add (0100): RF[rn] = RF[rn]+RF[rm]; to Fetch
Sub (0101): RF[rn] = RF[rn]-RF[rm]; to Fetch
Jz (0110): PC = (RF[rn]=0) ? rel : PC; to Fetch
Aliases: dir IR[7..0], imm IR[7..0], rel IR[7..0]
Datapath
Connections added among the components' ports corresponding to the operations required by the FSM
Unique identifiers created for every control signal
[Figure: datapath containing PC, IR, a register file RF (16) with signals RFw, RFwa, RFwe, RFr1a, RFr1e, RFr1, RFr2a, RFr2e, RFr2, an ALU with ALUs and ALUz, a 2x1 mux, a 3x1 mux, select signal RFs, and memory with Mre and Mwe]
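The FSMD above can be exercised in software by following its states literally: Reset, then repeated Fetch, Decode, and one operation state chosen by the opcode. The sketch below makes several assumptions that the surviving slide text does not confirm: op is taken from IR[15..12], rn from IR[11..8], and rm from IR[7..4]; registers are 16 bits wide; Mov3 is read as a register-indirect store (the register rn holds the address); and Jz loads rel into the PC as written in the FSMD. The names encode and run_fsmd are hypothetical.

```python
# Field positions for op, rn, rm are assumptions; only dir/imm/rel = IR[7..0]
# appear in the source. All names here are hypothetical.
def encode(op, rn=0, rm=0, low8=None):
    """Pack fields into a 16-bit word; low8 (dir/imm/rel) replaces rm if given."""
    word = (op << 12) | (rn << 8)
    return word | (low8 if low8 is not None else (rm << 4))


def run_fsmd(mem, max_instructions):
    rf = [0] * 16
    pc = 0                                   # Reset: PC=0
    for _ in range(max_instructions):
        ir = mem[pc]                         # Fetch: IR=M[PC]; PC=PC+1
        pc += 1
        op = (ir >> 12) & 0xF                # Decode
        rn = (ir >> 8) & 0xF
        rm = (ir >> 4) & 0xF
        low8 = ir & 0xFF                     # dir, imm, or rel
        if op == 0b0000:                     # Mov1: RF[rn] = M[dir]
            rf[rn] = mem[low8]
        elif op == 0b0001:                   # Mov2: M[dir] = RF[rn]
            mem[low8] = rf[rn]
        elif op == 0b0010:                   # Mov3: register-indirect store
            mem[rf[rn]] = rf[rm]
        elif op == 0b0011:                   # Mov4: RF[rn] = imm
            rf[rn] = low8
        elif op == 0b0100:                   # Add
            rf[rn] = (rf[rn] + rf[rm]) & 0xFFFF
        elif op == 0b0101:                   # Sub
            rf[rn] = (rf[rn] - rf[rm]) & 0xFFFF
        elif op == 0b0110:                   # Jz: PC = (RF[rn]=0) ? rel : PC
            pc = low8 if rf[rn] == 0 else pc
        else:
            raise ValueError("undefined opcode")
    return rf, pc


mem = [0] * 256
mem[0] = encode(0b0011, rn=0, low8=7)    # Mov4: RF[0] = 7
mem[1] = encode(0b0011, rn=1, low8=5)    # Mov4: RF[1] = 5
mem[2] = encode(0b0100, rn=0, rm=1)      # Add:  RF[0] = RF[0] + RF[1]
mem[3] = encode(0b0001, rn=0, low8=200)  # Mov2: M[200] = RF[0]
rf, pc = run_fsmd(mem, 4)
print(rf[0], mem[200], pc)
```

Running the four encoded instructions leaves 12 in RF[0] and in memory location 200, with the PC at 4.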
A Simple Microprocessor
FSM operations that replace the FSMD operations after a datapath is created
State   op     FSMD operation                FSM operations
Reset          PC=0;                         PCclr=1;
Fetch          IR=M[PC]; PC=PC+1             Ms=10; Irld=1; Mre=1; PCinc=1;
Decode
Mov1    0000   RF[rn] = M[dir]               RFwa=rn; RFwe=1; RFs=01; Ms=01; Mre=1;
Mov2    0001   M[dir] = RF[rn]               RFr1a=rn; RFr1e=1; Ms=01; Mwe=1;
Mov3    0010   M[rn] = RF[rm]                RFr1a=rn; RFr1e=1; Ms=10; Mwe=1;
Mov4    0011   RF[rn] = imm                  RFwa=rn; RFwe=1; RFs=10;
Add     0100   RF[rn] = RF[rn]+RF[rm]        RFwa=rn; RFwe=1; RFs=00; RFr1a=rn; RFr1e=1; RFr2a=rm; RFr2e=1; ALUs=00
Sub     0101   RF[rn] = RF[rn]-RF[rm]        RFwa=rn; RFwe=1; RFs=00; RFr1a=rn; RFr1e=1; RFr2a=rm; RFr2e=1; ALUs=01
Jz      0110   PC = (RF[rn]=0) ? rel : PC    PCld=ALUz; RFr1a=rn; RFr1e=1;
Each operation state returns to Fetch
[Figure: control unit (controller, PC with PCld, PCinc, PCclr, and IR with Irld; "To all input control signals", "From all output control signals") and datapath (RF (16), ALU, 2x1 mux, 3x1 mux with select Ms) connected to memory (Mre, Mwe)]
Chapter Summary
General-purpose processors
ASIPs
Microcontrollers, DSPs, network processors, more customized ASIPs
Choosing among processors is an important step
Designing a general-purpose processor is conceptually the same as designing a single-purpose processor