Floating Points
$x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$
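Bit 31: sign S | bits 30-23: exponent E (8 bits) | bits 22-0: fraction F (23 bits)
Normalized value: $(-1)^{S} \times (1 + F) \times 2^{E-127}$
Exponent E    Fraction F    Meaning
0             non-zero      denormalized
1-254         anything      FP number
255           0             ± infinity
255           non-zero      NaN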
Why biased exponent?
• For faster comparisons (for sorting, etc.), allow integer
comparisons of floating point numbers:
• Unbiased exponent:
• Biased exponent:
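The effect of the bias can be sketched in Python: because the biased exponent is an unsigned field sitting above the fraction, the bit patterns of non-negative floats sort in the same order as their values. The helper name `float_bits` is an assumption made for this illustration; negative numbers need extra handling because the format is sign-magnitude.

```python
import struct

def float_bits(x: float) -> int:
    """Return the 32-bit single-precision pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

values = [0.5, 1.0, 1.5, 2.0, 36.5625, 1.0e10]
patterns = [float_bits(v) for v in values]

# The values are listed in increasing order; their bit patterns, compared
# as plain integers, come out in increasing order as well.
assert patterns == sorted(patterns)

for v, p in zip(values, patterns):
    print(f"{v:>14} -> 0x{p:08X}")
```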
• The mantissa can be computed much as one converts decimal
whole numbers to binary.
• Take the decimal fraction and repeatedly multiply the fractional
component by 2. The whole-number portion of each product is
the next binary bit.
• For the whole-number part, place its binary form in front of
the fraction bits and adjust the exponent until the mantissa is
in normalized form.
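The repeated-doubling procedure above can be sketched as follows. The function name `fraction_to_binary` and the `max_bits` cut-off are assumptions for illustration; the cut-off is needed because fractions such as 0.1 never terminate in binary.

```python
def fraction_to_binary(frac: float, max_bits: int = 23) -> str:
    """Convert a fraction in [0, 1) to a string of binary digits."""
    bits = []
    while frac and len(bits) < max_bits:
        frac *= 2                # multiply the fractional component by 2
        whole = int(frac)        # the whole-number portion is the next bit
        bits.append(str(whole))
        frac -= whole            # keep only the fractional component
    return "".join(bits)

print(fraction_to_binary(0.5625))  # 1001
print(fraction_to_binary(0.75))    # 11
```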
Floating-Point Example
• Represent –0.75
– $-0.75 = (-1)^{1} \times 1.1_2 \times 2^{-1}$
– S = 1
– Fraction = $1000\ldots00_2$
– Exponent = –1 + Bias
• Single: $-1 + 127 = 126 = 01111110_2$
• Double: $-1 + 1023 = 1022 = 01111111110_2$
• Single: 1 01111110 1000…00
• Double: 1 01111111110 1000…00
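As a quick check of this example, Python's `struct` module can expose the stored bit fields. The helper names `single_fields` and `double_fields` are hypothetical; each returns the sign, exponent, and fraction as bit strings.

```python
import struct

def single_fields(x: float):
    """Split the 32-bit pattern of x into (sign, exponent, fraction) strings."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = f"{bits:032b}"
    return s[0], s[1:9], s[9:]

def double_fields(x: float):
    """Split the 64-bit pattern of x into (sign, exponent, fraction) strings."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    s = f"{bits:064b}"
    return s[0], s[1:12], s[12:]

print(single_fields(-0.75))
print(double_fields(-0.75))
```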
Floating-Point Example
• What number is represented by the single-precision float
1 10000001 01000…00
– S = 1
– Fraction = $01000\ldots00_2$
– Exponent = $10000001_2 = 129$
• $x = (-1)^{1} \times (1 + .01_2) \times 2^{(129 - 127)}$
$= (-1) \times 1.25 \times 2^{2}$
$= -5.0$
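The same decoding can be written directly from the formula $(-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - 127)}$. This is a minimal sketch: the name `decode_single` is an assumption, and it handles only normalized numbers (exponent field 1 to 254).

```python
def decode_single(bits: int) -> float:
    """Decode a normalized single-precision bit pattern by hand."""
    sign = bits >> 31                       # 1 bit
    exponent = (bits >> 23) & 0xFF          # 8 bits, biased by 127
    fraction = (bits & 0x7FFFFF) / 2**23    # 23 bits, read as 0.xxx
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)

# 1 10000001 01000...00 written as hexadecimal
print(decode_single(0xC0A00000))  # -5.0
```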
Converting to Floating Point
• E.g., express $36.5625_{10}$ as a 32-bit floating
point number (in hexadecimal)
• Step 1
– Express original value in binary
$36.5625_{10} = 100100.1001_2$
• Step 2
– Normalize
$100100.1001_2 = 1.001001001_2 \times 2^{5}$
• Step 3
– Determine S, E, and M
$+1.001001001_2 \times 2^{5}$
S = 0 (the sign is +), M = 001001001 (the bits after the binary point), n = 5
E = n + 127
= 5 + 127
= 132
= $10000100_2$
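$0\;10000100\;00100100100000000000000_2$
= 0100 0010 0001 0010 0100 0000 0000 0000 (regrouped into 4-bit groups)
= 4 2 1 2 4 0 0 0 (one hexadecimal digit per group)
Answer: $42124000_{16}$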
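Steps 1 to 3 can be mirrored in a small hand-rolled encoder. This is a sketch under stated assumptions: `encode_single` is a hypothetical name, the input must be finite and non-zero, the result must be a normal number, and extra fraction bits are truncated rather than rounded.

```python
import math

def encode_single(x: float) -> int:
    """Encode x as a single-precision bit pattern by normalizing by hand."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("sketch handles finite, non-zero values only")
    sign = 1 if x < 0 else 0
    m, n = abs(x), 0
    while m >= 2.0:          # normalize: move the binary point left
        m /= 2.0
        n += 1
    while m < 1.0:           # or right, until m is 1.xxx
        m *= 2.0
        n -= 1
    exponent = n + 127                   # E = n + 127
    fraction = int((m - 1.0) * 2**23)    # drop the implied leading 1
    return (sign << 31) | (exponent << 23) | fraction

print(f"{encode_single(36.5625):08X}")  # 42124000
```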
Converting from Floating Point
• E.g., What decimal value is represented by the
following 32-bit floating point number?
$\text{C17B0000}_{16}$
• Step 1
– Express in binary and find S, E, and M
$\text{C17B0000}_{16} = 1\;10000010\;11110110000000000000000_2$
S = 1, E = 10000010, M = 11110110000000000000000
(S: 1 = negative, 0 = positive)
• Step 2
– Find “real” exponent, n
– n = E – 127
= $10000010_2 - 127$
= 130 – 127
= 3
• Step 3
– Put S, M, and n together to form binary result
– (Don’t forget the implied “1.” on the left of the
mantissa.)
$-1.1111011_2 \times 2^{n} = -1.1111011_2 \times 2^{3} = -1111.1011_2$
• Step 4
– Express result in decimal
$-1111.1011_2$
Whole part: $1111_2 = 15$
Fraction bits: $2^{-1} = 0.5$, $2^{-3} = 0.125$, $2^{-4} = 0.0625$
0.5 + 0.125 + 0.0625 = 0.6875
Answer: -15.6875
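Step 4 amounts to summing the place values of the binary digits. A minimal sketch, assuming a hypothetical helper `binary_to_decimal` that takes the binary result as a string:

```python
def binary_to_decimal(text: str) -> float:
    """Convert a binary string such as '-1111.1011' to its decimal value."""
    negative = text.startswith("-")
    whole, _, frac = text.lstrip("+-").partition(".")
    value = float(int(whole, 2)) if whole else 0.0
    # Each bit after the binary point is worth 2**-1, 2**-2, 2**-3, ...
    for position, bit in enumerate(frac, start=1):
        if bit == "1":
            value += 2.0 ** -position
    return -value if negative else value

print(binary_to_decimal("-1111.1011"))  # -15.6875
```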
Denormal Numbers
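• Exponent = 000...0 ⇒ hidden bit is 0
$x = (-1)^{S} \times (0 + \text{Fraction}) \times 2^{1 - \text{Bias}}$
(IEEE 754 gives denormals the exponent $1 - \text{Bias}$, the same as
the smallest normal number: $-126$ in single precision.)
• Smaller than normal numbers
• Fraction = 000...0:
$x = (-1)^{S} \times (0 + 0) \times 2^{1 - \text{Bias}} = \pm 0.0$
• Two representations of 0.0!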
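These properties can be checked on real single-precision bit patterns. The sketch below assumes the IEEE 754 convention that denormals use an exponent of $1 - \text{Bias}$ (so $-126$ in single precision); the helper name `bits_to_single` is hypothetical.

```python
import math
import struct

def bits_to_single(bits: int) -> float:
    """Interpret a 32-bit integer as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest_normal = bits_to_single(0x00800000)    # exponent field 1, fraction 0
largest_denormal = bits_to_single(0x007FFFFF)   # exponent field 0, fraction all ones
smallest_denormal = bits_to_single(0x00000001)  # exponent field 0, fraction 00...01

assert smallest_normal == 2.0 ** -126
assert smallest_denormal == 2.0 ** -23 * 2.0 ** -126
assert largest_denormal < smallest_normal

# The two zeros differ only in the sign bit and compare as equal.
pos_zero = bits_to_single(0x00000000)
neg_zero = bits_to_single(0x80000000)
assert pos_zero == neg_zero
assert math.copysign(1.0, neg_zero) == -1.0
```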
Infinities and NaNs
• Exponent = 111...1, Fraction = 000...0
– ±Infinity
– Can be used in subsequent calculations, avoiding
need for overflow check
• Exponent = 111...1, Fraction ≠ 000...0
– Not-a-Number (NaN)
– Indicates illegal or undefined result
• e.g., 0.0 / 0.0
– Can be used in subsequent calculations
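A short Python illustration of how infinities and NaNs propagate through later calculations. One language-specific caveat: Python raises `ZeroDivisionError` for `0.0 / 0.0` rather than returning NaN, so this sketch produces a NaN from `inf - inf` instead.

```python
import math

inf = math.inf
overflowed = 1e308 * 10.0   # too large for a double, so the result is +infinity
nan = inf - inf             # undefined result, so NaN

print(overflowed == inf)        # True
print(1.0 / inf)                # 0.0
print(math.isnan(nan + 1.0))    # True: NaN propagates through arithmetic
print(nan == nan)               # False: NaN compares unequal to everything
```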
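Representation of Floating Point Numbers
• IEEE 754 double precision
Upper 32-bit word: bit 31: sign S | bits 30-20: exponent E (11 bits) | bits 19-0: upper 20 bits of fraction F
(a second 32-bit word holds the remaining 32 fraction bits)
Normalized value: $(-1)^{S} \times (1 + F) \times 2^{E-1023}$
Exponent E    Fraction F    Meaning
0             non-zero      denormalized
1-2046        anything      FP number
2047          0             ± infinity
2047          non-zero      NaN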
Is FP addition associative?
• Associativity law for addition: a + (b + c) = (a + b) + c
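A small experiment suggests the answer is no. The values below are chosen for illustration only: `c` is far smaller than the spacing between representable numbers near `b`, so it is lost when added to `b` first.

```python
a, b, c = -1.5e38, 1.5e38, 1.0

left = (a + b) + c     # a + b is exactly 0.0, so c survives
right = a + (b + c)    # b + c rounds back to b, so c is lost

print(left, right)     # 1.0 0.0
```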
Floating-point addition flowchart:
1. Compare the exponents of the two numbers. Shift the smaller number to the right until its
exponent would match the larger exponent
2. Add the significands
3. Normalize the sum
Overflow or underflow? Yes → Exception. No → continue with step 4
4. Round the significand to the appropriate number of bits
Still normalized? No → normalize again (step 3). Yes → Done
Floating-Point Addition
• Now consider a 4-digit binary example
– $1.000_2 \times 2^{-1} + (-1.110_2 \times 2^{-2})$ (0.5 + –0.4375)
• 1. Align binary points
– Shift number with smaller exponent
– $1.000_2 \times 2^{-1} + (-0.111_2 \times 2^{-1})$
• 2. Add significands
– $1.000_2 \times 2^{-1} + (-0.111_2 \times 2^{-1}) = 0.001_2 \times 2^{-1}$
• 3. Normalize result & check for over/underflow
– $1.000_2 \times 2^{-4}$, with no over/underflow
• 4. Round and renormalize if necessary
– $1.000_2 \times 2^{-4}$ (no change) = 0.0625
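The four steps can be traced with integers standing in for the 4-bit significands. This is a sketch, not a model of real hardware: the `(sign, magnitude, exponent)` tuples and the names `fp_add` and `to_float` are assumptions, shifts simply truncate (no guard or round bits), and there is no overflow or underflow check.

```python
def to_float(num, frac_bits=3):
    """Value of a (sign, magnitude, exponent) triple; magnitude 0b1000 means 1.000."""
    sign, mag, exp = num
    return (-1) ** sign * (mag / 2**frac_bits) * 2.0 ** exp

def fp_add(a, b, frac_bits=3):
    (sign_a, mag_a, exp_a), (sign_b, mag_b, exp_b) = a, b
    # 1. Align: shift the operand with the smaller exponent to the right.
    if exp_a < exp_b:
        mag_a >>= exp_b - exp_a
        exp_a = exp_b
    else:
        mag_b >>= exp_a - exp_b
        exp_b = exp_a
    # 2. Add the signed significands.
    total = (-mag_a if sign_a else mag_a) + (-mag_b if sign_b else mag_b)
    sign, mag, exp = (1 if total < 0 else 0), abs(total), exp_a
    if mag == 0:
        return (0, 0, 0)
    # 3. Normalize so the magnitude reads 1.xxx again.
    one = 1 << frac_bits
    while mag >= 2 * one:
        mag >>= 1
        exp += 1
    while mag < one:
        mag <<= 1
        exp -= 1
    # 4. Rounding is truncation in this sketch, so nothing further to do.
    return (sign, mag, exp)

result = fp_add((0, 0b1000, -1), (1, 0b1110, -2))
print(result, to_float(result))  # (0, 8, -4) 0.0625
```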
FP Adder Hardware
• Much more complex than integer adder
• Doing it in one clock cycle would take too long
– Much longer than integer operations
• FP adder usually takes several cycles
– Can be pipelined
Floating Point Multiplication Algorithm
Floating-Point Multiplication
• Now consider a 4-digit binary example
– $(1.000_2 \times 2^{-1}) \times (-1.110_2 \times 2^{-2})$ (0.5 × –0.4375)
• 1. Add exponents
– Unbiased: –1 + –2 = –3
– Biased: (–1 + 127) + (–2 + 127) – 127 = –3 + 254 – 127 = –3 + 127
• 2. Multiply significands
– $1.000_2 \times 1.110_2 = 1.110_2$ ⇒ $1.110_2 \times 2^{-3}$
• 3. Normalize result & check for over/underflow
– $1.110_2 \times 2^{-3}$ (no change) with no over/underflow
• 4. Round and renormalize if necessary
– $1.110_2 \times 2^{-3}$ (no change)
• 5. Determine sign: +ve × –ve ⇒ –ve
– $-1.110_2 \times 2^{-3} = -0.21875$
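The five steps can be traced the same way with integer significands. As before, this is only a sketch: `fp_mul`, `to_float`, and the `(sign, magnitude, exponent)` tuples are assumptions, inputs are taken to be normalized, rounding is plain truncation, and overflow or underflow is not checked.

```python
def to_float(num, frac_bits=3):
    """Value of a (sign, magnitude, exponent) triple; magnitude 0b1000 means 1.000."""
    sign, mag, exp = num
    return (-1) ** sign * (mag / 2**frac_bits) * 2.0 ** exp

def fp_mul(a, b, frac_bits=3, bias=127):
    (sign_a, mag_a, exp_a), (sign_b, mag_b, exp_b) = a, b
    # 1. Add exponents. With biased fields, one bias must be subtracted.
    biased = (exp_a + bias) + (exp_b + bias) - bias
    exp = biased - bias
    # 2. Multiply significands; the product has 2 * frac_bits fraction bits.
    product = mag_a * mag_b
    # 3. Normalize: the product of two 1.xxx values lies in [1, 4).
    one = 1 << (2 * frac_bits)
    if product >= 2 * one:
        product >>= 1
        exp += 1
    # 4. Round (here: truncate) back to frac_bits fraction bits.
    mag = product >> frac_bits
    # 5. The sign is negative when exactly one operand is negative.
    sign = sign_a ^ sign_b
    return (sign, mag, exp)

result = fp_mul((0, 0b1000, -1), (1, 0b1110, -2))
print(result, to_float(result))  # (1, 14, -3) -0.21875
```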