Tutorial 09 DSP IO Transceivers
Tutorial 9
Michal Kubíček
Department of Radio Electronics, FEEC BUT Brno
Created with the support of the OP VVV project Moderní a otevřené studium techniky, CZ.02.2.69/0.0/0.0/16_015/0002430.
FPGAs in detail
❑ DSP in FPGA
❑ IO cells
❑ Signal integrity
❑ Synchronous data interfaces
❑ High speed transceivers
page 2 kubicek@vutbr.cz
DSP in FPGA
Basic DSP blocks
FFT
DSP basics
Binary number representation
Alternative calculation: signed integer part + fractions
Signed integer part (1 sign bit + 2 value bits), fraction bit weights 1/2, 1/4, 1/8, 1/16, 1/32
-4 + 1/2 + 1/16 + 1/32 = -3.40625
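The calculation above can be checked with a short sketch. It assumes the bit pattern "100" for the integer part (-4) and "10011" for the fraction bits (1/2, 1/16 and 1/32 set). The function name is hypothetical.

```python
def fixed_to_float(bits: str, frac_bits: int) -> float:
    """Interpret a two's-complement bit string that has `frac_bits` fractional bits."""
    raw = int(bits, 2)
    if bits[0] == "1":              # sign bit set: subtract 2**width
        raw -= 1 << len(bits)
    return raw / (1 << frac_bits)   # move the binary point

# Assumed pattern: integer part "100" (= -4), fraction part "10011"
print(fixed_to_float("10010011", frac_bits=5))  # -3.40625
```

The same value follows from reading the whole word as a signed integer (-109) and dividing by 2^5 = 32.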
❑ The problem
• Very limited support in CAD tools (both synthesis and simulation tools)
• Xilinx ISE: no support (neither synthesis, nor simulation)
• Xilinx Vivado: only a limited subset of VHDL-2008 is supported (see ug900 and ug901 for
the version of your tool, e.g. 2019.2)
VHDL-2008 DSP function support using VHDL-93 language syntax (via a free VHDL package).
Warning: only limited support in XST (Xilinx ISE synthesis)!

use ieee.fixed_pkg.all;
....
signal a, b : sfixed (7 downto -6);
signal c : sfixed (8 downto -6);  -- one extra integer bit for the sum a + b
begin
....
a <= to_sfixed (-3.125, 7, -6);
b <= to_sfixed (inp1, b'high, b'low);
c <= a + b;
-- The binary point is assumed to be between the "0" and "-1" index.
-- With "signal y : ufixed (4 downto -5)" as the data type (unsigned fixed point,
-- 10 bits wide, 5 fractional bits), y = 6.5 = "00110.10000", or
-- simply: y <= "0011010000";
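As a cross-check of the bit patterns, here is a minimal sketch that converts a real value to the bit string of a fixed-point signal with a given index range. It only illustrates the number format. The helper name is hypothetical, and unlike the package's conversion functions it raises an error on overflow instead of saturating.

```python
def to_fixed_bits(value: float, high: int, low: int, signed: bool) -> str:
    """Bit string of `value` for the index range (high downto low)."""
    width = high - low + 1
    raw = round(value * 2 ** (-low))          # scale so the LSB weight 2**low becomes 1
    lo_lim = -(1 << (width - 1)) if signed else 0
    hi_lim = (1 << (width - 1)) - 1 if signed else (1 << width) - 1
    if not lo_lim <= raw <= hi_lim:
        raise OverflowError(f"{value} does not fit into {width} bits")
    return format(raw & ((1 << width) - 1), f"0{width}b")   # two's complement

print(to_fixed_bits(6.5, 4, -5, signed=False))    # 0011010000
print(to_fixed_bits(-3.125, 7, -6, signed=True))  # 11111100111000
```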
Bit width reduction, rounding
Add 8b + 8b (unsigned)
254 + 254 = 508 = 1_1111_1100 (binary)
Mathematics: The result is one bit wider than the wider of the two operands.
VHDL: The result is only as wide as the wider of the two operands, so the carry is lost.
Solution: Extend at least one operand to the required result width to prevent overflow (this can be
described as a dummy addition of a zero constant, at no HW cost).
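The effect of the result width can be reproduced numerically. The sketch below models an adder whose result is cut to a given number of bits. It is a plain integer model, not VHDL, and the function name is hypothetical.

```python
def add_unsigned(a: int, b: int, result_bits: int) -> int:
    """Add two unsigned values and keep only `result_bits` bits (wrap-around)."""
    return (a + b) & ((1 << result_bits) - 1)

print(add_unsigned(254, 254, 8))  # 252: the carry bit is lost
print(add_unsigned(254, 254, 9))  # 508: one extra bit holds the full sum
```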
Multiply 8b x 8b (unsigned)
254 * 254 = 64516 = 1111_1100_0000_0100 (binary)
Mathematics: The result size equals the sum of the lengths of both operands.
VHDL: The result size equals the sum of the lengths of both operands.
Solution: No action needed.
Mathematics (signed 8b x 8b): Both numbers have 7 bits + 1 sign bit ➔ the size of the result
should be 7+7+1(sign) = 15 bits (there is no need for two sign bits in the result). The only
exception is the multiplication of the two most negative numbers (-FS = negative full-scale
values). This is the only operation that requires the 16th bit.
VHDL: The result size equals the sum of the lengths of both operands.
Solution: To save some hardware resources it is possible to drop the MSB of the result.
The operation -FS * -FS then produces a numerical error, which can be avoided by checking the
input values (a -FS at the multiplier input is handled as an overflow exception).
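The claim that only -FS * -FS needs the 16th bit can be checked exhaustively for signed 8-bit operands. This is a brute-force sketch, and the helper name is hypothetical.

```python
def signed_bits_needed(value: int) -> int:
    """Minimum two's-complement width that can hold `value`."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1

operands = range(-128, 128)   # all signed 8-bit values
too_wide = [(a, b) for a in operands for b in operands
            if signed_bits_needed(a * b) > 15]
print(too_wide)  # [(-128, -128)]
```

Every other product fits into 15 bits, so dropping the MSB is safe as long as -128 is excluded at one of the inputs.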
Motivation: In some DSP blocks (like FIR filters) there are many multiplication stages.
The extra bits (when kept) accumulate through the block structure, which either
requires large bit widths throughout the DSP block (much higher HW
requirements) or degrades the output dynamic range (only a small portion of the result
represents a useful signal).
Bit-width reduction
Why?
❑ To reduce hardware requirements (number of utilized FFs, LUTs, DSPs, BRAMs...)
❑ Adjust bit-width to the required output width
Problems:
❑ A small error appears at the output (noise and/or DC offset)
❑ The bit-width analysis and optimization is often a challenging task even for an
experienced engineer
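The noise and DC offset mentioned above depend on how the bits are dropped. The sketch below compares plain truncation with rounding (adding half an output LSB first) over a hypothetical test ramp. The number of dropped bits and the sample range are assumptions chosen for illustration.

```python
DROP = 8                                   # number of LSBs removed (assumed)
samples = range(-(1 << 15), 1 << 15)       # hypothetical input ramp

def truncate(x: int, drop: int) -> int:
    """Drop `drop` LSBs; the arithmetic shift rounds toward minus infinity."""
    return x >> drop

def round_half_up(x: int, drop: int) -> int:
    """Add half an output LSB before dropping the bits."""
    return (x + (1 << (drop - 1))) >> drop

def mean_error(reducer) -> float:
    """Average error in output LSBs, i.e. the DC offset introduced by the reduction."""
    total = sum(reducer(x, DROP) - x / (1 << DROP) for x in samples)
    return total / len(samples)

print(mean_error(truncate))        # close to -0.5 output LSB
print(mean_error(round_half_up))   # close to 0
```

Truncation costs no hardware but shifts the output by about half an LSB. Rounding needs one extra addition and nearly removes the offset.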
[Figure: signal path annotated with bit widths 16b, 24b and 31b]
CIC filter
[Figure: CIC filter structure built from z^-1 delay elements]
DSP functions:
maximum operating frequency
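[Figure: timing diagram of CLK and Data between two registers]
The clock period must cover TCKO (clock-to-output delay) + TLOG (logic delay) + TROUTE (routing delay) + TSU (setup time), so FMAX = 1 / (TCKO + TLOG + TROUTE + TSU).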
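As a worked example of the timing path, the maximum clock frequency is the reciprocal of the sum of the four delays along the register-to-register path. The delay values below are hypothetical and are not taken from any datasheet.

```python
def fmax_mhz(t_cko_ns: float, t_log_ns: float, t_route_ns: float, t_su_ns: float) -> float:
    """Maximum clock frequency in MHz for one register-to-register path."""
    period_ns = t_cko_ns + t_log_ns + t_route_ns + t_su_ns
    return 1000.0 / period_ns

# Hypothetical delays: 0.5 ns clock-to-out, 2.0 ns logic, 1.2 ns routing, 0.3 ns setup
print(fmax_mhz(0.5, 2.0, 1.2, 0.3))   # about 250 MHz
```

The slowest path in the design sets FMAX, which is why long combinational chains in filters are usually broken up with pipeline registers.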
CIC filter
FMAX ...?
FIR filter
FMAX ...?
Implementation of elementary arithmetic
functions in FPGA
DSP in FPGA (implementation)
Integer
19 x LUT (<1%)
Integer
33 x LUT (<1%)
Floating Point
DSP in FPGA – dedicated blocks for DSP
1 x MULT18X18 (1 of 20)
Maximum clock frequency: 240 MHz
Integer
Floating Point
Floating-Point DSP
Resource sharing
Try to share resource-hungry components
(like multipliers) to reduce HW requirements
FIR filter example:
• Symmetric impulse response with 256 samples
• Sampling frequency 16 MS/s
• Clock frequency 128 MHz
• 16 MS/s at 128 MHz clock ➔ 8 clock cycles for each data sample
• Symmetric impulse response ➔ only 128 multiplications needed (255 additions)
• 8 clock cycles for 128 multiplications ➔ at least 16 multipliers are required
FIR filter example:
• Symmetric impulse response with 128 samples
• Sampling frequency 122.88 MS/s
• Clock frequency 250 MHz
• 122.88 MS/s at 250 MHz clock ➔ 2 (integer) clock cycles for each data sample
• Symmetric impulse response ➔ only 64 multiplications needed (127 additions)
• 2 clock cycles for 64 multiplications ➔ at least 32 multipliers are required
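The sizing steps in both FIR examples follow the same arithmetic, sketched below. The function name and the handling of odd tap counts are assumptions. The two calls use the numbers from the examples.

```python
import math

def multipliers_needed(taps: int, fs_msps: float, fclk_mhz: float,
                       symmetric: bool = True) -> int:
    """Minimum number of shared multipliers for a FIR filter."""
    cycles_per_sample = math.floor(fclk_mhz / fs_msps)      # whole clock cycles only
    mults_per_sample = math.ceil(taps / 2) if symmetric else taps
    return math.ceil(mults_per_sample / cycles_per_sample)

print(multipliers_needed(256, 16, 128))       # 16
print(multipliers_needed(128, 122.88, 250))   # 32
```

In the second example the clock is only slightly more than twice the sample rate, so just 2 of the 2.03 available cycles can be used.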
FIR filter
FPGA versus Digital Signal Processor
Alan Gatherer
CTO Communications Infrastructure Group, Texas Instruments
DSP to FPGA
An example of simple DSP module implementation
and verification
Input / Output Cells
(Tiles, Blocks)
IO cells
Requirements on an IO cell
❑ Support of multiple logic standards (different voltage levels)
❑ Support of differential pairs
❑ Parallel bus synchronization support
❑ Fast data transmission (over 1 Gbps)
❑ Integrated termination resistors
❑ ESD protection, pull-up / pull-down resistors...
IO cell structure – development
❑ Support for both SDR and DDR interfaces (dedicated SDR/DDR flip-flops in the IO cell)
❑ Support for many different logic standards (single-ended and differential), programmable output driver (output current, slew rate)
❑ Internal termination (Digitally Controlled Impedance; DCI)
❑ Internal pull-up / pull-down resistor
❑ Integrated ESD protection
Logic standards
Single-ended logic standards
Input and output logic levels.
There are many standards supported by modern FPGAs.
LVDS
FPGA banks
FPGA: IO Banks
FPGA bank
❑ IO pins of an FPGA are grouped into so-called banks
❑ Each bank has its own power supply input
❑ Depending on its power supply voltage, each bank supports only certain logic standards (for example, with a 3.3V power supply the bank can use 3.3V LVCMOS or LVTTL logic but not 2.5V LVCMOS logic)
❑ Very useful for interfacing with chips whose interfaces use different voltage standards
Synchronous data interfaces
Tbit = 2 000 ps
1 mm of PCB trace ≈ 7 ps
10 cm of PCB trace ≈ 700 ps
Timing requirements
Tbit = 2 000 ps
tSUmin = 600 ps
tHmin = 330 ps
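A rough budget can be derived from these numbers. The sketch below treats whatever is left of the bit period after setup and hold as the total allowance for skew, and converts it to trace length using the approximate 7 ps per mm figure. This is an idealized estimate that assumes no jitter and no other timing uncertainty. The function name is hypothetical.

```python
T_BIT_PS = 2000     # bit period
T_SU_MIN_PS = 600   # minimum setup time
T_H_MIN_PS = 330    # minimum hold time
PS_PER_MM = 7       # approximate PCB trace delay

def skew_budget_ps(t_bit: int, t_su: int, t_h: int) -> int:
    """Part of the bit period not consumed by the setup and hold window."""
    return t_bit - t_su - t_h

budget = skew_budget_ps(T_BIT_PS, T_SU_MIN_PS, T_H_MIN_PS)
print(budget)               # 1070 (ps)
print(budget / PS_PER_MM)   # roughly 153 (mm of trace length mismatch)
```

In practice the usable budget is smaller, because clock jitter and driver output timing variation also consume part of it.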
Each data line must have the same propagation delay so that the data are valid at the same
moment at the receiver ➔ the electrical lengths of all the data paths must be matched.
Meanders are used to lengthen the PCB traces so that they all match the longest one.
FPGA interface clock input
ADC FPGA
DDR interface
Native support of DDR functionality directly in IO cells (DDR cannot be implemented
without such components!)
Spartan-3
7-series
Rule of thumb:
Treat a wire as a transmission line whenever its propagation
delay is more than about 1/6 of the edge time of the
transmitted signal.
Using a transmission line means using a wire with a defined
characteristic impedance and terminating it with this characteristic
impedance at both ends.
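The rule of thumb can be turned into a quick check. The sketch assumes the common form of the rule, in which a trace counts as a transmission line once its one-way delay exceeds one sixth of the edge time, and it reuses the approximate trace delay of 7 ps per mm. The function name and the example values are hypothetical.

```python
PS_PER_MM = 7.0   # approximate PCB trace delay

def is_transmission_line(length_mm: float, edge_time_ps: float,
                         ratio: float = 6.0) -> bool:
    """True when the trace delay exceeds edge_time / ratio."""
    delay_ps = length_mm * PS_PER_MM
    return delay_ps > edge_time_ps / ratio

# Hypothetical signal with a 1 ns edge (critical delay about 167 ps)
print(is_transmission_line(10, 1000))    # False: 70 ps of delay
print(is_transmission_line(100, 1000))   # True: 700 ps of delay
```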
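❑ Stripline
$$Z_0\,(\Omega) = \frac{60}{\sqrt{\varepsilon_r}} \ln\frac{1.9\,(2H + T)}{0.8\,W + T}$$
(H: dielectric height between the trace and each reference plane, T: trace thickness, W: trace width, εr: relative permittivity)
❑ Asymmetric Stripline
❑ Differential Stripline
❑ Differential Microstrip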
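The stripline approximation Z0 = 60/sqrt(εr) · ln(1.9(2H + T) / (0.8W + T)) is easy to evaluate numerically. In the sketch below all dimensions are hypothetical example values in millimetres. The formula is an approximation, so a field solver or the PCB manufacturer's stack-up calculator should be used for a real design.

```python
import math

def stripline_z0(er: float, h: float, w: float, t: float) -> float:
    """Approximate characteristic impedance (ohms) of a symmetric stripline.

    er: relative permittivity, h: dielectric height to each plane,
    w: trace width, t: trace thickness (h, w, t in the same unit).
    """
    return 60.0 / math.sqrt(er) * math.log(1.9 * (2 * h + t) / (0.8 * w + t))

# Hypothetical geometry: er = 4.3, h = 0.2 mm, w = 0.15 mm, t = 0.035 mm (1 oz copper)
print(round(stripline_z0(4.3, 0.2, 0.15, 0.035), 1))   # roughly 48 ohms
```

Widening the trace or raising the permittivity lowers the impedance. Increasing the dielectric height raises it.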
Trace impedance
Test patterns
Er ~ 6
Er ~ 3
Signal Integrity (SI)
❑ Measurement
• S-parameters – vector network analyzer
• Time Domain Reflectometry – dedicated measurement instrument
• Eye diagram – oscilloscopes (either real-time or sampling can be used)
❑ Simulation: CST
S-parameters
What's next?
Today we are able to communicate at about 50 Gb/s over short distances (a few centimeters) using differential pairs on FR4-based Cu-plated PCBs.
Optical waveguides on a PCB can significantly increase bandwidth and enable longer paths.
Problem: coupling the optical signal to/from the waveguide.
[Figure: synchronous and asynchronous links; channels CH0 and CH1, each with Tx and Rx]
High speed transceivers
Logic standard
Current Mode Logic (CML)
Source-Coupled Logic (SCL)
8B/10B encoding
11110000 10010110
PCB attenuation
Skin effect and proximity effect: the higher the signal frequency, the higher the wire
resistance (for a 10 GHz signal a copper trace has a resistance of about 1 Ω per inch)
Dielectric attenuation: FR4 has a large dissipation factor (0.02-0.03). For demanding
applications a high-quality dielectric can be used (dissipation factor 0.001 or lower)
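Skin depth: $$\delta = \sqrt{\frac{2\rho}{2\pi f \mu}}$$
(ρ: conductor resistivity, f: signal frequency, μ: permeability)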
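To get a feeling for the skin effect, the skin depth δ = sqrt(2ρ / (2πfμ)) can be evaluated for copper. The resistivity used below (1.68e-8 Ω·m) and the use of the vacuum permeability for copper are assumptions of this sketch.

```python
import math

MU_0 = 4e-7 * math.pi      # vacuum permeability in H/m (copper assumed non-magnetic)
RHO_CU = 1.68e-8           # assumed copper resistivity in ohm*m

def skin_depth_m(f_hz: float, rho: float = RHO_CU, mu: float = MU_0) -> float:
    """Skin depth in metres: sqrt(2*rho / (2*pi*f*mu))."""
    return math.sqrt(2 * rho / (2 * math.pi * f_hz * mu))

print(skin_depth_m(10e9) * 1e6)   # about 0.65 (micrometres) at 10 GHz
```

At 10 GHz the current therefore flows in a layer far thinner than typical copper foil, which is why the trace resistance rises with frequency.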
Copper foil thickness:
1 oz = 35 µm
½ oz = 18 µm
Equalization
❑ Signal attenuation: degradation of signal quality, namely of the edge slope. The quality of the
signal is measured at the receiver using the eye diagram: a wide-open eye (both vertically
and horizontally) is required for reliable data transmission.
Eye diagram
❑ Eye opening is directly related to a bit error rate
(BER)
❑ Even a relatively low error rate of 10^-12 can be
unacceptable for very high speed data transmissions
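A short calculation shows why such a low BER can still matter. The sketch below computes the average time between bit errors for a given line rate. The 50 Gb/s rate is only an example value, and errors are assumed to be independent.

```python
def seconds_between_errors(bit_rate_bps: float, ber: float) -> float:
    """Average time between bit errors, assuming independent errors."""
    return 1.0 / (bit_rate_bps * ber)

print(seconds_between_errors(50e9, 1e-12))   # about 20 s
print(seconds_between_errors(1e9, 1e-12))    # about 1000 s
```

At tens of Gb/s a BER of 10^-12 means several errors per minute, so links at these rates typically need error detection or correction on top.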
What's next?
Current transceivers are capable of 30 Gbps per differential pair with NRZ encoding, or 56
Gbps with PAM-4 encoding. This is close to the physical limits of PCB traces ➔ alternative
solutions (like optical links) are needed.