Floating Point: Adders and Multipliers

Concordia University
Lecture #4
In this lecture we will go over the following concepts:
1) Floating Point Number Representation
2) Accuracy and Dynamic Range; IEEE Standard
3) Floating Point Addition
4) Rounding Techniques
5) Floating Point Multiplication
6) Architectures for FP Addition
7) Architectures for FP Multiplication
8) Comparison of two FP Architectures
9) Barrel Shifters
- Single and double precision data formats of the IEEE 754 standard
(a) IEEE single precision data format
    | S (sign, 1 bit) | Exponent E (8 bits, biased) | unsigned fraction (23 bits) |
(b) IEEE double precision data format
    | S (sign, 1 bit) | Exponent E (11 bits, biased) | unsigned fraction (52 bits) |
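The field layout in (a) can be illustrated with a minimal sketch that splits a 32-bit word into its three fields using Python's standard struct module. The helper name decode_single is an assumption made for this illustration and does not come from the lecture.

```python
import struct

def decode_single(x):
    """Return (sign, biased exponent, fraction) of x rounded to single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = word >> 31                 # 1 bit
    exponent = (word >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = word & 0x7FFFFF        # 23 bits, the hidden 1 is not stored
    return sign, exponent, fraction

print(decode_single(1.0))    # (0, 127, 0)
print(decode_single(-2.5))   # (1, 128, 2097152)
```

The double precision format can be handled the same way with the ">Q" and ">d" format codes and the 11-bit and 52-bit field widths.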
Format parameters of IEEE 754 Floating Point Standard
Parameter                                    Single Precision   Double Precision
Format width in bits                         32                 64
Precision (p) = fraction + hidden bit        23 + 1             52 + 1
Exponent width in bits                       8                  11
Maximum value of exponent                    +127               +1023
Minimum value of exponent                    -126               -1022
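The rows of this table follow from the format width and the exponent width alone. The sketch below assumes the usual IEEE 754 relations (bias = 2^(w-1) - 1 for a w-bit exponent, maximum exponent = bias, minimum exponent = 1 - bias); the function name format_parameters is hypothetical.

```python
def format_parameters(width, exponent_bits):
    """Derive the table entries from the total width and the exponent width."""
    fraction_bits = width - 1 - exponent_bits      # one bit is the sign
    bias = (1 << (exponent_bits - 1)) - 1
    return {
        "precision": fraction_bits + 1,            # fraction + hidden bit
        "bias": bias,
        "max_exponent": bias,
        "min_exponent": 1 - bias,
    }

print(format_parameters(32, 8))
print(format_parameters(64, 11))
```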
- Range of floating point numbers
[Figure: number line centred on 0. From left to right: overflow, negative numbers within range, underflow region around 0 (denormalized numbers), positive numbers within range, overflow.]
Exceptions in IEEE 754
Exception        Remarks
Overflow         Result can be ±∞ or default maximum value
Underflow        Result can be 0 or denormal
Divide by Zero   Result can be ±∞
Invalid          Result is NaN
Inexact          System specified rounding may be required
• Operations that can generate Invalid Results
Operation               Remarks
Addition/Subtraction    An operation of the type ∞ − ∞ (an effective subtraction of infinities, such as (+∞) + (−∞))
Multiplication          An operation of the type 0 × ∞
Division                Operations of the type 0/0 and ∞/∞
Remainder               Operations of the type x REM 0 and ∞ REM y
Square Root             Square root of a negative number
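Some of these cases can be observed directly in Python's double precision floats, as the hedged sketch below shows. Python does not expose every IEEE 754 exception in the same way as hardware does: for example math.sqrt of a negative number raises ValueError instead of returning NaN, and division by zero raises ZeroDivisionError. The variable and helper names are chosen for this illustration only.

```python
import math

inf = math.inf

# Invalid operations: each of these yields NaN.
invalid = {
    "inf - inf": inf - inf,
    "0 * inf": 0.0 * inf,
    "inf / inf": inf / inf,
}

overflow = 1e308 * 10.0                     # too large for double precision
underflow = 5e-324 / 2.0                    # half of the smallest denormal
denormal = 2.2250738585072014e-308 / 2.0    # half of the smallest normal number

def sqrt_is_trapped(x):
    """Report whether Python signals sqrt(x) as an error instead of returning NaN."""
    try:
        math.sqrt(x)
    except ValueError:
        return True
    return False

for name, value in invalid.items():
    print(name, "->", value)
print(overflow, underflow, denormal, sqrt_is_trapped(-1.0))
```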
IEEE compatible floating point multipliers
• Algorithm
Step 1
Calculate the tentative exponent of the product by adding the biased exponents of the two numbers and subtracting the bias (E1 + E2 − bias). The bias is 127 for the single precision and 1023 for the double precision IEEE data format.
Step 2
If the signs of the two floating point numbers are the same, set the sign of the product to ‘+’, else set it to ‘-’.
Step 3
Multiply the two significands. For p-bit significands the product is 2p bits wide (p, the width of the significand data field, includes the leading hidden bit (1)). The product of the significands falls within the range [1, 4).
Step 4
Normalize the product if the MSB of the product is 1 (i.e. the product is ≥ 2), by shifting the product right by 1 bit position and incrementing the tentative exponent.
Evaluate exception conditions, if any.
Step 5
Round the product if R(M0 + S) is true, where M0 and R represent the pth and (p+1)st bits from the left end of the normalized product and the sticky bit (S) is the logical OR of all the bits to the right of the R bit. If the rounding condition is true, a 1 is added at the pth bit (from the left side) of the normalized product. If all p MSBs of the normalized product are 1’s, rounding can generate a carry-out. In that case normalization (step 4) has to be done again.
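The five steps can be sketched in software for single precision. This is an illustration under stated assumptions, not the hardware described in the lecture: it works on raw 32-bit patterns, assumes normal operands and a normal result, and skips the exception evaluation of step 4. The names fp32_mul and f32_bits are hypothetical.

```python
import struct

BIAS, P = 127, 24   # single precision bias and significand width (with hidden bit)

def f32_bits(x):
    """Raw 32-bit pattern of x rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def fp32_mul(a, b):
    """Multiply two normal float32 bit patterns (no exception handling)."""
    sa, ea, fa = a >> 31, (a >> 23) & 0xFF, a & 0x7FFFFF
    sb, eb, fb = b >> 31, (b >> 23) & 0xFF, b & 0x7FFFFF
    exp = ea + eb - BIAS                            # step 1: tentative exponent
    sign = sa ^ sb                                  # step 2: '+' if signs agree
    prod = (fa | (1 << 23)) * (fb | (1 << 23))      # step 3: 2p-bit product
    if prod >> (2 * P - 1):                         # step 4: MSB set, product >= 2
        exp += 1
        shift = P                                   # keep bits 2p-1 .. p
    else:
        shift = P - 1                               # keep bits 2p-2 .. p-1
    sig = prod >> shift                             # the p most significant bits
    m0 = sig & 1
    r = (prod >> (shift - 1)) & 1
    s = (prod & ((1 << (shift - 1)) - 1)) != 0      # sticky: OR of the remaining bits
    if r and (m0 or s):                             # step 5: R(M0 + S)
        sig += 1
        if sig >> P:                                # carry-out: normalize again
            sig >>= 1
            exp += 1
    return (sign << 31) | (exp << 23) | (sig & 0x7FFFFF)

print(hex(fp32_mul(f32_bits(1.5), f32_bits(2.5))), hex(f32_bits(3.75)))
```

Choosing which p bits to keep is equivalent to the 1-bit right shift of step 4; the software version simply avoids moving the product.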
Operands Multiplication and Rounding
[Figure 2.4 - Significand multiplication, normalization and rounding. Input significands before multiplication: two p-bit significand fields, each a leading 1 followed by p - 1 lower order bits. Result of significand multiplication before the normalization shift: 2p bits, with Cout at the left end. Normalized product before rounding: p-bit significand field (p - 1 higher order bits and M0) followed by R and S.]
Architecture Consideration
What’s the best architecture?
A Simple FP Multiplier
[Block diagram. Inputs: Sign1, Sign2, Exp1, Exp2, Significand1, Significand2. Blocks: Exponent & Sign Logic; Significand Multiplier; Normalization Logic; Rounding Logic; Correction Shift; Result Flags Logic; Result Selector. Outputs: Flags, IEEE Product.]
A Dual Path FP Multiplier
[Block diagram, drawn in three stages. Inputs: floating point numbers (exponents and significands). 1st stage: Exponent Logic; Control / Sign Logic. 2nd stage: Significand Multiplier (Partial Product Processing); Bypass Logic. 3rd stage: CPA / Rounding Logic (marked as the critical path); Sticky Logic; Exponent Incrementer; Result Selector / Normalization Logic; a second path is labelled Path 2. Final block: Result Integration / Flag Logic. Outputs: Flag bits, IEEE product.]
Case                      Value      S   Exponent   Significand
Case-1 (normal number)    Operand1   0   10000001   00000000101000111101011
                          Operand2   0   10000000   10101100110011001100110
                          Result     0   10000010   10101101110111110011100
Case-2 (normal number)    Operand1   0   10000000   00001100110011001100110
                          Operand2   0   10000000   00001100110011001100110
                          Result     0   10000001   00011010001111010110111
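The two cases above can be checked with a short script. It is a sketch that relies on the host's floating point: the operands are decoded with the standard struct module, multiplied in double precision (the 48-bit significand product fits exactly), and rounded back to single precision with round to nearest. The helper names are assumptions of this illustration.

```python
import struct

def pattern(sign, exponent, fraction):
    """Assemble a 32-bit word from the three binary strings of the table."""
    return int(sign + exponent + fraction, 2)

def to_float(word):
    return struct.unpack(">f", struct.pack(">I", word))[0]

def to_word(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

cases = {
    "Case-1": (
        pattern("0", "10000001", "00000000101000111101011"),
        pattern("0", "10000000", "10101100110011001100110"),
        pattern("0", "10000010", "10101101110111110011100"),
    ),
    "Case-2": (
        pattern("0", "10000000", "00001100110011001100110"),
        pattern("0", "10000000", "00001100110011001100110"),
        pattern("0", "10000001", "00011010001111010110111"),
    ),
}

for name, (op1, op2, expected) in cases.items():
    product = to_float(op1) * to_float(op2)   # exact in double precision
    print(name, to_float(op1), to_float(op2), product, to_word(product) == expected)
```

In decimal the operands are close to 4.01 and 3.35 in Case-1 and 2.1 and 2.1 in Case-2.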
Comparison of 3 types of FP Multipliers using 0.22 micron CMOS technology
                                  Area (cell)   Power (mW)   Delay (ns)
Single Data Path FPM              2288.5        204.5        69.2
Double Data Path FPM              2997          94.5         68.81
Pipelined Double Data Path FPM    3173          105          42.26
IEEE compatible floating point adders
• Algorithm
Step 1
Compare the exponents of the two numbers to find the larger one and calculate the absolute value of the difference between the two exponents (|E1 − E2|). Take the larger exponent as the tentative exponent of the result.
Step 2
Shift the significand of the number with the smaller exponent right by a number of bit positions equal to the exponent difference. Two of the shifted out bits of the aligned significand are retained as the guard (G) and round (R) bits. So for p-bit significands, the effective width of the aligned significand must be p + 2 bits. Append a third bit, namely the sticky bit (S), at the right end of the aligned significand. The sticky bit is the logical OR of all shifted out bits.
Step 3
Add/subtract the two signed-magnitude significands using a p + 3 bit adder. Let the result be SUM.
Step 4
Check SUM for a carry out (Cout) from the MSB position during addition. Shift SUM right by one bit position if a carry out is detected and increment the tentative exponent by 1. During subtraction, check SUM for leading zeros. Shift SUM left until the MSB of the shifted result is a 1. Subtract the leading zero count from the tentative exponent.
Evaluate exception conditions, if any.
Step 5
Round the result if the logical condition R''(M0 + S'') is true, where M0 and R'' represent the pth and (p + 1)st bits from the left end of the normalized significand. The new sticky bit (S'') is the logical OR of all bits to the right of the R'' bit. If the rounding condition is true, a 1 is added at the pth bit (from the left side) of the normalized significand. If all p MSBs of the normalized significand are 1’s, rounding can generate a carry-out. In that case normalization (step 4) has to be done again.
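The adder steps can be sketched in the same spirit. The code below is an illustration only: it assumes normal single precision operands and a normal (or exactly zero) result, skips exception evaluation, and orders the operands by magnitude so that the subtraction in step 3 never goes negative. The names fp32_add and f32_bits are hypothetical.

```python
import struct

def f32_bits(x):
    """Raw 32-bit pattern of x rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def fp32_add(a, b):
    """Add two normal float32 bit patterns (no exception handling)."""
    sa, ea, ma = a >> 31, (a >> 23) & 0xFF, (a & 0x7FFFFF) | (1 << 23)
    sb, eb, mb = b >> 31, (b >> 23) & 0xFF, (b & 0x7FFFFF) | (1 << 23)
    # Step 1: keep the operand with the larger magnitude in (sa, ea, ma).
    if (ea, ma) < (eb, mb):
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    d = ea - eb                       # exponent difference
    exp = ea                          # tentative exponent
    # Step 2: append G, R, S positions, align the smaller operand.
    xa = ma << 3
    xb = mb << 3
    lost = xb & ((1 << d) - 1)        # bits shifted out below the S position
    xb = (xb >> d) | (1 if lost else 0)
    # Step 3: p + 3 bit add or subtract of the magnitudes.
    total = xa + xb if sa == sb else xa - xb
    if total == 0:
        return 0                      # exact cancellation, returned as +0
    # Step 4: normalize.
    if total >> 27:                   # carry out: shift right, keep sticky
        total = (total >> 1) | (total & 1)
        exp += 1
    while not (total >> 26) & 1:      # leading zeros after subtraction
        total <<= 1
        exp -= 1
    # Step 5: round if R''(M0 + S'').
    sig = total >> 3
    m0, r, s = sig & 1, (total >> 2) & 1, (total & 3) != 0
    if r and (m0 or s):
        sig += 1
        if sig >> 24:                 # rounding carry-out: normalize again
            sig >>= 1
            exp += 1
    return (sa << 31) | (exp << 23) | (sig & 0x7FFFFF)

print(hex(fp32_add(f32_bits(1.5), f32_bits(2.25))), hex(f32_bits(3.75)))
```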
Floating Point Addition of Operands with Rounding
[Fig 2.6 - Significand addition, normalization and rounding. Aligned significands before addition (p-bit significand field plus three extra bits): one operand is p - 1 higher order bits, a0, then 0 0 0; the other is p - 1 higher order bits, b0, then G R S. Result of significand addition before the normalization shift: Cout, the sum bits, then G’ R’ S’. Normalized significand before rounding: p - 1 higher order bits, M0, then R” S”.]
IEEE Rounding
• IEEE default rounding mode: round to nearest, ties to even
Significand   Rounded Result   Error
X0.00         X0.              0
X0.01         X0.              - 1/4
X0.10         X0.              - 1/2
X0.11         X1.              + 1/4
X1.00         X1.              0
X1.01         X1.              - 1/4
X1.10         X1. + 1          + 1/2
X1.11         X1. + 1          + 1/4
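The table can be reproduced with a few lines of code. The sketch treats each significand as a small fixed-point integer with two fractional bits, so the three low bits of the input stand for the last integer bit and the two bits after the point. The function name round_nearest_even is an assumption of this example.

```python
from fractions import Fraction

def round_nearest_even(value, frac_bits=2):
    """Round a fixed-point integer with frac_bits fractional bits to an integer."""
    integer = value >> frac_bits
    remainder = value & ((1 << frac_bits) - 1)
    half = 1 << (frac_bits - 1)
    # Round up above the halfway point, or at the halfway point when the
    # integer part is odd (ties go to the even neighbour).
    if remainder > half or (remainder == half and integer & 1):
        integer += 1
    return integer

for value in range(8):                      # patterns X0.00 through X1.11
    rounded = round_nearest_even(value)
    error = Fraction(rounded) - Fraction(value, 4)
    print(format(value, "03b"), "->", rounded, "error", error)
```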
Floating Point Adder Architecture
What’s the best architecture?
Triple Path Floating Point Adder
[Fig 4.2 - Block diagram of the TDPFADD. Common front end: Exponent Logic; Control Logic. First path: Data Selector; Pre-alignment (Right Barrel Shifter) / Complementer; Adder / Rounding Logic; Exponent Incr/Decr; Result Selector; Normalization (1 bit Right/Left Shifter). Second path: Data Selector / Pre-align (0/1 Bit Right Shifter); Adder / Rounding Logic; Leading Zero Counting Logic; Exponent Subtractor; Result Selector; Normalization (Left Barrel Shifter). Third path: Bypass Logic. Common back end: Result Integration / Flag Logic. Outputs: Flags, IEEE Sum.]
Pipelined Triple Paths Floating Point Adder TPFADD
[Block diagram, drawn in five pipeline stages. 1st stage: Exponent Logic; Control Logic. 2nd stage: Data Selector with Pre-alignment (Barrel Shifter Right) / Complementer; Data Selector / Pre-align (0/1 Bit Right Shifter). 3rd stage (critical path): two Adder / Rounding Logic blocks; Bypass Logic. 4th stage: Exponent Incr/Decr; two Result Selectors; Exponent Subtractor; Leading Zero Counter. 5th stage: Normalization (1 bit Right/Left Shifter); Normalization (Barrel Shifter Left). Final block: Result Integration / Flag Logic. Outputs: Flag, IEEE Sum.]
[Figure: transition diagram between three states I, J and K, with transitions labelled BP (bypass), LZA and LZB.]
FP Adder with Leading Zero Anticipation Logic
[Block diagram. Inputs: exponents e1, e2; significands s1, s2; sign1, sign2. Blocks: Control; exponent difference; compare; 0/1 selectors; right shifter; sign control; two bit inverters; 56b adder; LZA logic; LZA counter; rounding control; exponent subtract; left shift; incrementer; selector; exponent incrementer; compensation shifter. Outputs: sign, exponent, significand.]
Comparison of Synthesis results for IEEE 754 Single Precision FP addition Using Xilinx 4052XL-1 FPGA
Parameters                              SIMPLE        TDPFADD       PIPE/TDPFADD
Maximum delay, D (ns)                   327.6         213.8         101.11
Average Power, P (mW) @ 2.38 MHz        1836          1024          382.4
Area A, Total number of CLBs (#)        664           1035          1324
Power Delay Product (ns · 10 mW)        7.7 × 10^4    4.31 × 10^4   3.82 × 10^4
Area Delay Product (10 # · ns)          2.18 × 10^4   2.21 × 10^4   1.34 × 10^4
Area-Delay^2 Product (10 # · ns^2)      7.13 × 10^6   4.73 × 10^6   1.35 × 10^6
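The two area-based rows of the table can be recomputed from the delay and CLB counts, which makes the unit convention visible: the products are quoted in tens of CLB·ns and tens of CLB·ns^2. The sketch below only covers those two rows; the dictionary layout and function names are chosen for this illustration.

```python
designs = {
    # name: (maximum delay in ns, area in CLBs)
    "SIMPLE": (327.6, 664),
    "TDPFADD": (213.8, 1035),
    "PIPE/TDPFADD": (101.11, 1324),
}

def area_delay(delay_ns, clbs):
    """Area-delay product in units of 10 CLB * ns."""
    return clbs * delay_ns / 10

def area_delay2(delay_ns, clbs):
    """Area-delay^2 product in units of 10 CLB * ns^2."""
    return clbs * delay_ns ** 2 / 10

for name, (delay_ns, clbs) in designs.items():
    print(f"{name:13s} {area_delay(delay_ns, clbs):.2e} {area_delay2(delay_ns, clbs):.2e}")
```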
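What about shifting? How to shift several bits at once?
Barrel Shifters
Right Shift Barrel Shifter
[Figure: 4-bit right shift barrel shifter. Inputs X3 X2 X1 X0. Each output bit is produced by a 4-to-1 MUX with select lines S1 S0 (inputs 00, 01, 10, 11); MUX inputs that would come from beyond X3 are tied to 0.]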
Shift and Rotate Barrel Shifter
[Figure: inputs D3 D2 D1 D0 feed four 4-to-1 MUXes (data inputs 3 2 1 0, select inputs S1 S0) that produce outputs Y3 Y2 Y1 Y0.]
Select    Output            Operation
S1 S0     Y3 Y2 Y1 Y0
0  0      D3 D2 D1 D0       No Shift
0  1      D2 D1 D0 D3       Rotate Once
1  0      D1 D0 D3 D2       Rotate Twice
1  1      D0 D3 D2 D1       Rotate 3 times
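The truth table corresponds to a left rotation of the 4-bit word by the amount encoded on S1 S0. A minimal software model, with the hypothetical name rotate4, is:

```python
def rotate4(d, select):
    """Rotate the 4-bit word d left by select (0..3), as in the truth table."""
    select &= 3
    return ((d << select) | (d >> (4 - select))) & 0xF

# D3 D2 D1 D0 = 1 0 0 0: one rotation moves D3 into Y0.
for select in range(4):
    print(format(select, "02b"), format(rotate4(0b1000, select), "04b"))
```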
Distributed Barrel Shifter
[Figure: three rows of eight 2-to-1 MUXes.
 Row 1 (select S0): MUX i takes inputs x(i+1) and x(i); outputs k7 ... k0.
 Row 2 (select S1): MUX i takes inputs k(i+2) and k(i); outputs w7 ... w0.
 Row 3 (select S2): MUX i takes inputs w(i+4) and w(i); outputs y7 ... y0.]
Paths of the distributed Barrel Shifter
[Figure: the same three MUX rows, with the paths taken for S0=1, S1=0, S2=1 highlighted.]
Please note that in this case, if we have 8 bits of data, then inputs to MUXes with an index greater than 7 should be set to a desired value.
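The three MUX rows amount to a right shift by S0 + 2·S1 + 4·S2 positions. The sketch below models each row as a list of 2-to-1 selections and uses a fill parameter for the inputs with an index greater than 7, assuming that a select value of 1 picks the shifted input. The function name and the fill parameter are assumptions of this illustration.

```python
def distributed_right_shift(x, select, width=8, fill=0):
    """Model three rows of 2-to-1 MUXes that shift right by 1, 2 and 4 positions."""
    bits = [(x >> i) & 1 for i in range(width)]      # bits[0] is x0
    for stage in range(3):                           # rows driven by S0, S1, S2
        if (select >> stage) & 1:
            distance = 1 << stage
            # MUX i picks input i + distance; indices past the top get the fill bit.
            bits = [bits[i + distance] if i + distance < width else fill
                    for i in range(width)]
    return sum(bit << i for i, bit in enumerate(bits))

# S0=1, S1=0, S2=1 selects a shift by 5 positions.
print(format(distributed_right_shift(0b11100000, 0b101), "08b"))
```

With fill=1 the same structure shifts in ones, which is one way to pick the "desired value" mentioned in the note.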
A Normalization Shifter for FP Arithmetic
Block Diagram of the Right Shifter & GRS-bit Generation Component
The end
Thank you for your attendance
Appendix 2
For Information
Improvements to previous Designs
[Block diagram: the FP adder with LZA logic (Control, e1, e2, s1, s2, sign1, sign2, sign control, LZA logic, LZA counter, 56b adder, rounding control, exponent subtract, left shift, incrementer, selector, exponent incrementer, compensation shifter), with several blocks crossed out (X) and several blocks annotated S&M.]
Improvements in FADD from Previous Designs
[Block diagram: the same FP adder (Control, e1, e2, s1, s2, sign1, sign2, sign control, LZA logic, 56b adder, LZA counter, rounding control, exponent subtract, left shift, incrementer, selector, exponent incrementer, compensation shifter), with two blocks marked ABSENT.]
Straightforward IEEE Rounding
[Figure: two block diagrams of floating-point adders operating on opA and opB (each S, EXP., SIGNIF.). Blocks shown: exponent difference d with Sign(d) selectors; comparator; right shifter; bit inverters; sign and add/sub control; 2's complement adder (2'COMP Adder); LOP; MSB and Cout; 1-bit shifter; complement; normalization (NORMAL); rounding; MUX. The stages of one diagram are numbered 1 to 7.]
Floating-point addition algorithm
1. Exponent subtraction.
2. Alignment.
3. Significand addition.
4. Conversion.
5. Leading-one detection.
6. Normalization.
7. Rounding.
Advantages:
1. Positive result, Eliminate Complement
2. Comparison // Alignment
3. Full Normal // Rounding
Architecture Consideration Cont.
[Figure: two double-path adder block diagrams operating on opA and opB (each S, EXP., SIGNIF.). In both, a FAR path and a CLOSE path feed an output MUX. Blocks shown: exponent difference d with Sign(d) selectors; 1-bit shifters; right shifter; bit inverters; sign and add/sub control; LOP; complement; normalization (NORMAL); rounding. The first diagram uses 2's complement adders (2'COMP Adder); the second uses compound adders.]
FAR path: effective addition, and effective subtraction with d > 1
CLOSE path: effective subtraction with d = 0, 1
First diagram (compared to the single path):
Reduce latency
--eliminate Comparator
FAR data-path:
--No Conversion
--No Full normalization
--No LOP
CLOSE data-path:
--No Full Alignment
Increase area
--two 2’s COMP ADDER
Second diagram:
Reduce total path delay
The latency of the floating-point addition can be improved if the rounding is combined with the addition/subtraction.
Main Blocks
What blocks are considered?
• Compound Adder with Flagged Prefix Adder (New)
• LOP with Concurrent Position Correction (New)
• Alignment Shifter
• Normalization Shifter
How can a compound adder compute fastest?
Compound Adder
The compound adder computes simultaneously the sum and the sum plus one, and then the correct rounded result is obtained by selecting according to the requirements of the rounding.
In the expressions below, ~X denotes the bitwise complement of X.
Effective Addition
A + B
A + B + 1
Effective Subtraction
A + ~B + 1 = A - B
A + ~B = A - B - 1
~(A + ~B + 1) = B - A - 1
~(A + ~B) = B - A
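The effective subtraction identities follow from two's complement arithmetic, where the bitwise complement of an n-bit value X equals -X - 1 modulo 2^n. The sketch below checks them exhaustively for 8-bit operands; the word length N and the helper names are assumptions of this illustration.

```python
N = 8
MASK = (1 << N) - 1

def inv(x):
    """Bitwise complement restricted to N bits."""
    return ~x & MASK

def identities_hold(a, b):
    """Check the four effective-subtraction identities modulo 2**N."""
    return (
        (a + inv(b) + 1) & MASK == (a - b) & MASK
        and (a + inv(b)) & MASK == (a - b - 1) & MASK
        and inv(a + inv(b) + 1) == (b - a - 1) & MASK
        and inv(a + inv(b)) == (b - a) & MASK
    )

print(all(identities_hold(a, b) for a in range(1 << N) for b in range(1 << N)))
```

This is why a compound adder that produces both A + ~B and A + ~B + 1 can deliver A - B or B - A without a separate complement step on the result path.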
Compound Adder Cont.
• Round to nearest (uses Sum, Sum+1)
    if g = 1
        if (LSB = 1) OR ((r OR s) = 1)
            Add 1 to the result
        else
            Truncate at LSB
    else
        Truncate at LSB
• Round toward zero (uses Sum)
    Truncate
• Round toward +Infinity (uses Sum, Sum+1 and Sum+2)
    if sign = positive
        if any bit to the right of the result LSB = 1
            Add 1 to the result
        else
            Truncate at LSB
    if sign = negative
        Truncate at LSB
• Round toward -Infinity (uses Sum, Sum+1 and Sum+2)
    if sign = negative
        if any bit to the right of the result LSB = 1
            Add 1 to the result
        else
            Truncate at LSB
    if sign = positive
        Truncate at LSB
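The selection rules above can be sketched as a small function that picks between the two outputs of a compound adder. This is a simplified model: it only chooses between Sum and Sum+1 and leaves out the Sum+2 case that the directed modes list. The mode strings and the function name are assumptions of this example.

```python
def select_rounded(sum0, sum1, mode, sign, g, r, s):
    """Pick Sum (sum0) or Sum+1 (sum1) from the sign and the g, r, s bits."""
    lsb = sum0 & 1
    inexact = bool(g or r or s)            # any bit to the right of the LSB is 1
    if mode == "nearest":
        increment = bool(g and (lsb or r or s))
    elif mode == "zero":
        increment = False
    elif mode == "+inf":
        increment = sign == 0 and inexact
    elif mode == "-inf":
        increment = sign == 1 and inexact
    else:
        raise ValueError("unknown rounding mode")
    return sum1 if increment else sum0

# A tie (g=1, r=s=0) goes to the even neighbour in round to nearest.
print(select_rounded(10, 11, "nearest", 0, 1, 0, 0))
print(select_rounded(11, 12, "nearest", 0, 1, 0, 0))
```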
Rounding Block
