Lec24 Exploration
for Wireless Applications
Jan M. Rabaey
BWRC
University of California @ Berkeley
http://www.eecs.berkeley.edu/~jan
The Smart Home
Security
Environment monitoring and control
Dense network of sensor and monitor nodes
Object tagging
Identification
The Changing Metrics
Power
Cost
Flexibility
Some interesting numbers
• Energy cost of digital computation
– 1999 (0.25µm): 1pJ/op (custom) … 1nJ/op (µproc)
– 2004 (0.1µm): 0.1pJ/op (custom) … 100pJ/op (µproc)
• Factor 1.6 per year; Factor 10 over 5 years
• Assuming reconfigurable implementation: 1 pJ/op
• Energy cost of communication
– 1999 Bluetooth (2.4 GHz band, 10m distance)
• 1 nJ/bit transmission energy (thermal limit 30 pJ/bit)
• Overall energy: 170 nJ/bit reception / 150 nJ/bit transmission (!)
• Standby power: 300 µW
– 2004 Radio (10 m)
• Only minor reduction in transmission energy
• Reduce transceiver energy by at least a factor of 10-50
• Trade-off
– @10m: 5000 operations / transmitted bit
– @ 1m: 0.5 operations / transmitted bit
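The trade-off figures above are consistent with a fourth-power path-loss law: 5000 operations per transmitted bit at 10 m falls to 0.5 at 1 m, a factor of 10^4. The yearly factor of 1.6 likewise compounds to roughly 10 over 5 years. A minimal sketch of both scalings; the function names and the default exponent of 4 are assumptions made for illustration, not something the slide states as a formula.

```python
def energy_per_op_pj(start_pj, years, yearly_factor=1.6):
    """Energy per operation after `years` of improvement at `yearly_factor` per year."""
    return start_pj / yearly_factor ** years


def ops_per_transmitted_bit(distance_m, ref_ops=5000.0, ref_distance_m=10.0,
                            path_loss_exponent=4.0):
    """Operations that cost as much energy as transmitting one bit over distance_m.

    Scales the slide's 10 m reference point with an assumed R^4 path-loss law.
    """
    return ref_ops * (distance_m / ref_distance_m) ** path_loss_exponent


if __name__ == "__main__":
    # 1 pJ/op in 1999, five years later (compare with the 0.1 pJ/op on the slide)
    print(energy_per_op_pj(1.0, 5))
    # Break-even operation counts at a few distances
    for d in (1.0, 3.0, 10.0):
        print(d, ops_per_transmitted_bit(d))
```

At short range, local computation stops being cheap relative to sending the bit, which is the point of the trade-off bullet.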
Figure labels: Multi-Spectral Imager; Preprocessing; 64 SIMD Processor Array + SRAM; Image Conditioning; Recognition; µC system + 2 Gbit DRAM; RAM + 1 Gbit DRAM; 100 GOPS
• … performance, and energy are the real issues!
• DSP and control intensive
• Mixed-mode
• Combines programmable and application-specific modules
The System-on-a-Chip Nightmare
“Femme se coiffant”, Pablo Ruiz Picasso, 1940
The “Board-on-a-Chip” Approach
Figure labels: Mem Ctrl.; Bridge; MPEG; I; O; O; C; Custom Interfaces (courtesy of Sonics, Inc)
The Wireless Challenge
Control: UI, Call Setup, Slot Allocation, Synchronization
Data: Application, Network, MAC/Data Link, Physical + RF
The Mostly Digital Radio
Figure: receiver block diagram. RF input (fc = 2 GHz) → RF filter → LNA → mixers driven by cos[2π(2GHz)t] and sin[2π(2GHz)t] → A/D on I and on Q (50 MS/s each) → digital baseband receiver. Labels: analog, digital, chip boundary.
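The receiver above mixes the RF input with cos and sin of the 2 GHz carrier to produce the I and Q branches. A minimal sketch of that quadrature mixing on a single test tone; the simulation sample rate, the averaging used as a crude low-pass filter, and all function names are assumptions for illustration (the slide samples I and Q at 50 MS/s after analog mixing).

```python
import math


def make_tone(amplitude, phase_rad, carrier_hz, sample_rate_hz, n):
    """Real-valued carrier A*cos(2*pi*fc*t + phase), sampled n times."""
    return [amplitude * math.cos(2.0 * math.pi * carrier_hz * k / sample_rate_hz + phase_rad)
            for k in range(n)]


def iq_downconvert(samples, carrier_hz, sample_rate_hz):
    """Mix with cos/sin of the carrier, then average (stand-in for the low-pass filter)."""
    i_sum = 0.0
    q_sum = 0.0
    for k, s in enumerate(samples):
        angle = 2.0 * math.pi * carrier_hz * k / sample_rate_hz
        i_sum += s * math.cos(angle)   # I branch: multiply by cos[2*pi*fc*t]
        q_sum += s * math.sin(angle)   # Q branch: multiply by sin[2*pi*fc*t]
    n = len(samples)
    return 2.0 * i_sum / n, 2.0 * q_sum / n


CARRIER_HZ = 2.0e9
SAMPLE_RATE_HZ = 16.0e9  # assumption: 8 samples per carrier period, simulation only

rf = make_tone(1.5, 0.7, CARRIER_HZ, SAMPLE_RATE_HZ, 800)  # 100 full carrier periods
i_val, q_val = iq_downconvert(rf, CARRIER_HZ, SAMPLE_RATE_HZ)

# I = A*cos(phase), Q = -A*sin(phase) for this sign convention
recovered_amplitude = math.hypot(i_val, q_val)
recovered_phase = math.atan2(-q_val, i_val)
```

Amplitude and phase of the tone are recovered from the I/Q pair, which is the information the digital baseband receiver works on.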
Architectural Choices
Figure: flexibility versus 1/efficiency. Labels: Prog Mem; µP; Satellite Processor; MAC Unit; Addr Gen; General Purpose µP; Direct Mapped Hardware
The Energy-Flexibility Gap
Figure: energy efficiency in MOPS/mW (or MIPS/mW), axis ticks at 100 and 1000. Labels: Dedicated HW; Reconfigurable (Pleiades)
Figure labels (title lost): level; Performance analysis; Constraints
The fully programmable approach
• Flexible platform for experimentation on networking and protocol strategies
• Size: 3”x4”x2”
• Power dissipation < 2 W (peak)
• Multiple radio modules: Bluetooth, Proxim, …
• Collection of sensor and monitor cards
• Fully operational by late spring (including software support system)!
Two-Chip Intercom
Implementation styles: custom analog circuitry; mixed analog/digital; fixed logic; programmable logic; software running on processor
Figure labels: Chip 1; Chip 2; analog; digital; Σ-∆ ADC; Physical; Accelerators; DSP core; Programmable Hardware; Protocol; Appl.; phone book; Keypad, Display
Digital Baseband
Simulink example: Matched filter correlator
Stateflow example: Receiver controller
Tool: Microsoft Excel

Parameters:
  Chips per Symbol         31
  Bits per Symbol          2
  Pilot sequence length    15
  PD (# of symbols)        10
  DAT (# of symbols)       3800 (OK)
  Min distance             5 meters (16.40 feet)
  Max distance             10 meters (32.81 feet)

Rates:
                  Hz          MHz
  Chip            2.50E+07    25.00
  Symbol          8.06E+05    0.81
  Bit             1.61E+06    1.61

Durations:
                    s            us
  Chip              4.00E-08     0.04
  Symbol            1.24E-06     1.24
  Bit               6.20E-07     0.62
  Pilot symbol      1.24E-06     1.24
  Pilot sequence    1.86E-05     18.60
  PD                1.24E-05     12.40
  DAT               4.712E-03    4712.00
  Time from RX to TX transition until first DAT clock on transmitter (1):    4.96E-05    49.60
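The spreadsheet figures above all follow from the chip rate and the per-symbol parameters. A minimal sketch that recomputes them; the constant and key names are assumptions chosen for readability, while the numeric inputs are the ones shown on the slide.

```python
CHIP_RATE_HZ = 25.0e6
CHIPS_PER_SYMBOL = 31
BITS_PER_SYMBOL = 2
PILOT_SEQUENCE_SYMBOLS = 15
PD_SYMBOLS = 10
DAT_SYMBOLS = 3800


def timing_budget(chip_rate_hz=CHIP_RATE_HZ,
                  chips_per_symbol=CHIPS_PER_SYMBOL,
                  bits_per_symbol=BITS_PER_SYMBOL):
    """Derive symbol/bit rates and frame-field durations from the chip rate."""
    symbol_rate_hz = chip_rate_hz / chips_per_symbol
    symbol_s = 1.0 / symbol_rate_hz
    return {
        "symbol_rate_hz": symbol_rate_hz,
        "bit_rate_hz": symbol_rate_hz * bits_per_symbol,
        "chip_s": 1.0 / chip_rate_hz,
        "symbol_s": symbol_s,
        "pilot_sequence_s": PILOT_SEQUENCE_SYMBOLS * symbol_s,
        "pd_s": PD_SYMBOLS * symbol_s,
        "dat_s": DAT_SYMBOLS * symbol_s,
    }


if __name__ == "__main__":
    for name, value in timing_budget().items():
        print(f"{name:18s} {value:.4g}")
```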
Physical Layer Design
Physical Layer Protocol
The Intercom Protocol Stack
Figure: protocol stack. Inputs: Service Requests, Voice samples. Transport Layer: Transport. MAC Layer: Filter, MAC. Below the MAC: Transmit, Receive, Synchronization.
Figure labels: CFSM model; Simulation; Refinement; Formal Verification
Co-design Finite State Machines
• Three-level hierarchy
– top level: asynchronous, partially ordered
(bounded-buffer, non-blocking, single-read communication)
– middle level: synchronous FSM
(atomic event- and condition-based transition)
– bottom level: Synchronous DataFlow-like
(FSM provides tokens and selects active sub-network)
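The top two levels of the hierarchy above can be illustrated with a small sketch: machines communicate through buffers whose writes never block and whose events are consumed by a single read, and each machine takes one atomic transition per activation. The buffer depth of one (a new write overwrites an unread event), the class names, and the toggle behavior are assumptions for illustration, not the POLIS implementation.

```python
class EventBuffer:
    """Bounded (one-place) buffer: writes never block, a read consumes the event."""

    def __init__(self):
        self._value = None
        self._present = False

    def emit(self, value=True):
        # Non-blocking write: an unread event is simply overwritten.
        self._value = value
        self._present = True

    def present(self):
        return self._present

    def consume(self):
        # Single read: the event is gone once it has been read.
        value = self._value
        self._value = None
        self._present = False
        return value


class ToggleCfsm:
    """Synchronous FSM: one atomic, event-triggered transition per activation."""

    def __init__(self, inp, out):
        self.inp = inp
        self.out = out
        self.state = "off"

    def react(self):
        if not self.inp.present():
            return False            # no enabling event, no transition
        self.inp.consume()
        self.state = "on" if self.state == "off" else "off"
        self.out.emit(self.state)   # output event for a downstream machine
        return True
```

Because the sender never waits, a slow receiver can miss events; that loss is part of the model and is one reason the formal verification step matters.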
POLIS/VCC Design Flow
Mulaw 100
Transport 300
MAC 270 42
Transmit 120 16
Receive 140 2
Synchronization 17
• CFSM
• VCC, Polis
Formal Verification
• Does the system satisfy certain properties?
– System described in a formal mathematical language (e.g. Esterel)
– Properties written in a formal logic (e.g. LTL) or as a formal model (e.g. Esterel)
• Property Verification
– Invariant (only one remote can send voice data in any time slot)
– Response (if a remote sends a request to the base station, then eventually there is an acknowledgement)
– Deadlock freedom
• Refinement Checking
– Does the (low-level) implementation conform with the (high-level) specification? (Do the mapped CFSMs function the same as the specification?)
• Mocha (Henzinger): Modularity in Model Checking
Figure: verification result ✖ NOT OK (label: Base station)
Why Does It Fail?
• Remote accepts Disc from the user even if it is not connected
• The remote then sends DiscReq and waits for an acknowledgement
• However, the base station ignores DiscReq if the remote is not registered, so the acknowledgement never arrives
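The failure described above can be reproduced with a tiny explicit-state search. This is a minimal sketch, not the Mocha model: the state encoding (remote state, base-station registration flag, one message in flight), names such as `wait_ack`, and the collapse of the connect handshake into a single step are all assumptions made for illustration.

```python
from collections import deque


def successors(state, accept_disc_when_idle):
    """Enabled transitions of the (remote, base station, channel) system."""
    remote, registered, channel = state
    nxt = []
    if remote == "idle" and channel is None:
        # User connects; the registration handshake is abstracted into one step.
        nxt.append(("connected", True, None))
        if accept_disc_when_idle:
            # The reported flaw: Disc is accepted although not connected.
            nxt.append(("wait_ack", registered, "DiscReq"))
    if remote == "connected" and channel is None:
        nxt.append(("wait_ack", registered, "DiscReq"))
    if channel == "DiscReq":
        if registered:
            nxt.append((remote, False, "DiscAck"))   # base acknowledges
        else:
            nxt.append((remote, registered, None))   # base ignores the request
    if channel == "DiscAck" and remote == "wait_ack":
        nxt.append(("idle", registered, None))
    return nxt


def find_deadlocks(accept_disc_when_idle):
    """Breadth-first reachability; returns reachable states with no successor."""
    start = ("idle", False, None)
    seen = {start}
    queue = deque([start])
    deadlocks = []
    while queue:
        state = queue.popleft()
        succ = successors(state, accept_disc_when_idle)
        if not succ:
            deadlocks.append(state)
        for nxt in succ:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return deadlocks
```

With `accept_disc_when_idle=True` the search reaches a state where the remote waits for an acknowledgement with nothing in flight; with the guard in place (`False`) no such state is reachable.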
Figure labels: Interconnect Network; Configurable Logic (Physical Layer); Baseband Processing; Programmable Protocol Stack
Describing the Architecture
• Xtensa embedded CPU (Tensilica, Inc)
– Configurability allows designer to keep “minimal” hardware overhead
– ISA (compatible with 32 bit RISC) can be extended for software optimizations
– Fully synthesizable
– Complete HW/SW suite
• VCC modeling for exploration
– Requires mapping of “fuzzy” instructions of VCC processor model to real ISA
– Requires multiple models depending on memory configuration
– ISS simulation to validate accuracy of model
◆ Tensilica model in VCC
inst,LD,2      inst,MUL.c,9     inst,DIV.i,118
inst,LI,1      inst,MUL.s,10    inst,DIV.l,122
inst,ST,2      inst,MUL.i,18    inst,DIV.f,145
inst,OP.c,2    inst,MUL.l,22    inst,DIV.d,155
inst,OP.s,3    inst,MUL.f,45    inst,IF,5
inst,OP.i,1    inst,MUL.d,55    inst,GOTO,2
inst,OP.l,1    inst,DIV.c,19    inst,SUB,19
inst,OP.f,1    inst,DIV.s,110   inst,RET,21
inst,OP.d,6
Example: “The Silicon Backplane” (Sonics, Inc)
Figure labels: SiliconBackplane Agent™; I; O; C; MEM; Guaranteed Bandwidth Arbitration
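A processor model of this kind can be used to estimate software run time by weighting counts of the “fuzzy” instructions with their per-instruction costs. A minimal sketch; it assumes the third field of each `inst,NAME,N` entry is a cycle count, and the instruction profile in the usage lines is a hypothetical placeholder, not measured data.

```python
# Entries copied from the Tensilica model listing on the slide.
BASIS_LINES = """
inst,LD,2 inst,LI,1 inst,ST,2
inst,OP.c,2 inst,OP.s,3 inst,OP.i,1 inst,OP.l,1 inst,OP.f,1 inst,OP.d,6
inst,MUL.c,9 inst,MUL.s,10 inst,MUL.i,18 inst,MUL.l,22 inst,MUL.f,45 inst,MUL.d,55
inst,DIV.c,19 inst,DIV.s,110 inst,DIV.i,118 inst,DIV.l,122 inst,DIV.f,145 inst,DIV.d,155
inst,IF,5 inst,GOTO,2 inst,SUB,19 inst,RET,21
""".split()


def parse_basis(lines):
    """Turn 'inst,NAME,COST' entries into a {name: cost} table."""
    costs = {}
    for line in lines:
        kind, name, cost = line.split(",")
        if kind == "inst":
            costs[name] = int(cost)  # assumption: cost is in processor cycles
    return costs


def estimate_cycles(costs, counts):
    """Weighted sum of instruction counts; raises KeyError for unknown names."""
    return sum(costs[name] * n for name, n in counts.items())


COSTS = parse_basis(BASIS_LINES)

# Hypothetical profile of one code block: instruction name -> executions.
example_profile = {"LD": 10, "ST": 5, "MUL.i": 2, "IF": 3}
example_cycles = estimate_cycles(COSTS, example_profile)
```

Comparing such estimates against ISS runs is the validation step the slide lists.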
Describing the Architecture
◆ SONICS model in VCC
Figure labels: Arbiter; OCP; Target Agent; Target Core; TCI Architecture
Exploring Architectural Mappings
Figure: mapping of functions to implementation fabrics. Labels: Software, Processor; ASIC, Accelerators; Application; Transport; Mu-law; MAC; Rest
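Exploring mappings amounts to enumerating assignments of functions to fabrics and scoring each one with a performance model. A minimal sketch of that loop; the function names come from the slide, but every cost number below is a made-up placeholder and the two-fabric choice ("sw" on the processor, "hw" as an ASIC accelerator) is a simplifying assumption.

```python
from itertools import product

FUNCTIONS = ["Application", "Transport", "Mu-law", "MAC"]

# HYPOTHETICAL (energy, area) per function and fabric, in arbitrary units.
# These values are placeholders for illustration, not data from the slides.
COSTS = {
    "Application": {"sw": (8.0, 0.0), "hw": (1.0, 6.0)},
    "Transport":   {"sw": (4.0, 0.0), "hw": (1.0, 3.0)},
    "Mu-law":      {"sw": (6.0, 0.0), "hw": (0.5, 1.0)},
    "MAC":         {"sw": (5.0, 0.0), "hw": (0.5, 2.0)},
}


def explore(area_budget):
    """Exhaustively search all sw/hw mappings; return the lowest-energy feasible one."""
    best = None
    for choice in product(("sw", "hw"), repeat=len(FUNCTIONS)):
        mapping = dict(zip(FUNCTIONS, choice))
        energy = sum(COSTS[f][fabric][0] for f, fabric in mapping.items())
        area = sum(COSTS[f][fabric][1] for f, fabric in mapping.items())
        if area <= area_budget and (best is None or energy < best[0]):
            best = (energy, area, mapping)
    return best
```

Exhaustive search is fine for a handful of functions; because the function list and the cost table are separate, the same loop works unchanged when the architecture model is swapped.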
Implementation Fabrics for Protocols
A protocol = Extended FSM
Figure: extended FSM with state and transition labels idle, RACH req, RACH akn, slotset, read, write, update; memory labels Slot_Set_Tbl (2x16), addr, R_ENA, W_ENA, BUF
HW Mapping Experiment: STD to Std. Cell
Area Comparison – Manual versus Automated
Figure: # Gates for PhySend, Manual + Design Compiler versus SF2VHD + Design Compiler (axis 0 to 3500)
Figure: # Gates for PhySend, Manual + FPGA Express versus SF2VHD + FPGA Express (axis 0 to 7000)
HW Mapping Experiment: STD to Flexible Imp.
Area Comparison - FPGA x PLD (Manual)
Figure: # Gates for TCI CRC, TCI CRC+FSM, and PhysSend on Xilinx FPGA, Altera PLD, and CoolRunner (axis 0 to 1400)
Figure: second chart over TCI CRC, TCI CRC+FSM, and PhysSend (axis 0 to 1200; quantity and legend not recoverable)
HW Mapping Experiment: Power Consumption
FPGA versus PLD
Figure: power consumption (units not recoverable, axis 0 to 70) for TCI CRC, TCI CRC+FSM, and PhysSend FSM on LCA versus MAX7000
Figure labels (title lost): level; Performance analysis; Constraints
The Applications and Specs
The Obvious Choice - The Smart Home and Network Appliances
Security
Environment monitoring and control
Dense network of sensor and monitor nodes
Object tagging
Identification
The Software-Defined Radio
Figure labels: FPGA; Embedded µP; Dedicated FSM; Dedicated DSP; Reconfigurable DataPath
Communication Request (Data type, BW, latency, BER), Source (xs,ys), Dest (xd,yd)
Network layer (Point-to-Point, multi-hop, star)
Media Access Layer (T-C-F-DMA)
Physical Layer (Band, Modulation)
• Based on well-defined abstraction layers
• Step-wise refinement (partitioning, resource mapping and sharing) enables correctness verification
• Automatic synthesis of adaptive protocols in hardware and software
PicoRadio Energy Optimization
The Cost of Communication
Figure: transceiver power and transmit power (-70 dBm to 90 dBm) versus distance (1 m to 10 km), with a partly garbled bit-rate label (apparently 100 Kbps). Assumes R^-4 loss due to ground wave (@ 1 GHz).
Example:
• 1 hop over 50 m: 1.25 nJ/bit
• 5 hops of 10 m each: 5 × 2 pJ/bit = 10 pJ/bit
• Multi-hop reduces transmission energy by a factor of 125! (assuming path loss exponent of 4)
Figure: optimal number of hops needed for free space path loss (label: log(β/α))
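The example above follows from scaling the single-hop energy with distance raised to the path-loss exponent. A minimal sketch that reproduces the numbers; the function name is an assumption, and the model counts transmission energy only, ignoring the fixed transceiver overhead per hop that the figure shows separately.

```python
def multihop_energy_per_bit(single_hop_energy_j, total_distance_m, hops,
                            path_loss_exponent=4.0):
    """Total transmission energy per bit when the link is split into equal hops."""
    hop_distance_m = total_distance_m / hops
    # Each hop needs (d_hop / d_total)^n of the single-hop transmission energy.
    per_hop_j = single_hop_energy_j * (hop_distance_m / total_distance_m) ** path_loss_exponent
    return hops * per_hop_j


single_hop_j = 1.25e-9                                   # 1 hop over 50 m
five_hops_j = multihop_energy_per_bit(single_hop_j, 50.0, 5)
reduction = single_hop_j / five_hops_j                   # hops ** (n - 1) = 5 ** 3
```

With N equal hops and exponent n the saving is N^(n-1), so the benefit shrinks quickly for smaller exponents; once per-hop transceiver overhead is included, the optimal hop count is finite.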
OPNET Network Simulator
Figure: OPNET tool views: Network Model, Node Model, Process Model, Analysis Viewer
Figure: two plots comparing DSDV and AODV versus Number of Nodes (20, 33, 56); y-axis quantities not recoverable (ranges 0 to 4000 and 0 to 16000)
Summary
• Low-energy design ascends to prime time, forced mainly by the last-meter problem
• The System-on-a-Chip approach enables and demands heterogeneous implementation strategies, sometimes involving non-intuitive and innovative design platforms
• Design exploration over various fabrics and partitions has a dramatic impact on dominant metrics, such as energy and cost
• It requires orthogonalization of function and architecture, supplemented with performance models (cost, time, energy)
• This methodology holds at all levels of the system hierarchy