Floating Point Alu
Chapter 1: Introduction
Floating Point ALU
When a CPU executes a program that calls for a floating-point (FP) operation, there
are three ways in which it can carry out the operation. First, it may call a floating-point unit
emulator, a floating-point library that performs the operation as a series of simple fixed-point
arithmetic operations on the integer ALU. Emulators save the added hardware cost of an FPU
but are significantly slower. Second, it may use an add-on FPU that is entirely separate from
the CPU; such units are typically sold as optional add-ons and are purchased only when they
are needed to speed up math-intensive operations. Third, it may use an integrated FPU
present in the system. The FPU we designed is a single-precision, IEEE 754-compliant
integrated unit. It handles not only the basic floating-point operations (addition, subtraction,
multiplication and division) but also operations such as shifting and logical operations.
Selector    Operation
00          Addition
01          Subtraction
10          Multiplication
11          Division
Table of operations
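The selector encoding in the table can be modelled in software as a simple lookup. The sketch below is only an illustration, not the hardware design: the names `to_single`, `OPERATIONS` and `fp_alu` are hypothetical, and the model computes in Python's double precision and then rounds to single precision. It does not model the special cases of the hardware (Python raises an error on division by zero or on a result too large for single precision).

```python
import operator
import struct

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Selector encoding taken from the table of operations.
OPERATIONS = {
    0b00: operator.add,      # Addition
    0b01: operator.sub,      # Subtraction
    0b10: operator.mul,      # Multiplication
    0b11: operator.truediv,  # Division
}

def fp_alu(selector: int, a: float, b: float) -> float:
    """Hypothetical software model: apply the selected operation to two
    single-precision operands and round the result to single precision."""
    result = OPERATIONS[selector](to_single(a), to_single(b))
    return to_single(result)

print(fp_alu(0b00, 29.625, 94.5))  # addition
print(fp_alu(0b01, 94.5, 29.625))  # subtraction
```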
Single Precision IEEE 754 Format
Every floating-point number is composed of three components:
Sign: indicates the sign of the number (0 for positive, 1 for negative)
Mantissa: sets the value of the number
Exponent: holds the (biased) power of the base; in single precision the biased exponent of a
normal number ranges from 1 to 254
In single-precision format the 32 bits are divided as follows:
The first bit (bit 31) sets the sign (S) of the number (0 for positive, 1 for negative).
The next 8 bits (bits 30 to 23) represent the exponent (E).
The remaining 23 bits (bits 22 to 0) store the mantissa.
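The bit layout above can be checked with a few shifts and masks. The following is a minimal sketch; the function names `decode_fields` and `normal_value` are hypothetical, and `normal_value` assumes a normal number (biased exponent between 1 and 254) with the standard IEEE 754 single-precision bias of 127.

```python
def decode_fields(word: int) -> tuple:
    """Split a 32-bit word into (sign, biased exponent, mantissa field)."""
    sign = (word >> 31) & 0x1        # bit 31
    exponent = (word >> 23) & 0xFF   # bits 30 to 23
    mantissa = word & 0x7FFFFF       # bits 22 to 0
    return sign, exponent, mantissa

def normal_value(word: int) -> float:
    """Value of a normal number: (-1)^S * 1.M * 2^(E - 127).
    Assumes 1 <= E <= 254; special cases are not handled here."""
    sign, exponent, mantissa = decode_fields(word)
    significand = 1 + mantissa / 2**23   # restore the hidden leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

a = int("01000001111011010000000000000000", 2)
print(decode_fields(a))
print(normal_value(a))
```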
Exceptions in single-precision floating point
The first case: All exponent bits are 1 (11111111 in binary = 255) and all mantissa bits are 0.
The exponent of a normal single-precision number ranges from 1 to 254, so this exponent is
out of range: the number is larger than a 32-bit floating-point value can represent, and
this case represents infinity.
The second case: All exponent bits are 1 and the mantissa is not zero. We consider this
case not a number (NaN).
The third case: All exponent bits are 0 and all mantissa bits are also 0. Read literally this
would be 1 × 2^-127, but we treat it as 0.
The fourth case: All exponent bits are 0 and the mantissa bits are not all 0. The
number is still representable, but it is very small (a denormalized number).
The largest finite number single precision can represent is
01111111011111111111111111111111, and the smallest nonzero magnitude, shown here
with a negative sign bit, is 10000000000000000000000000000001. When adding two
numbers produces a result larger than a 32-bit floating-point value can hold, we call it
overflow; conversely, when the result is too small to be represented, we call it underflow.
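The four special cases can be distinguished by inspecting only the exponent and mantissa fields. The sketch below illustrates that classification; the function name `classify` and the returned labels are hypothetical and are not taken from the design.

```python
def classify(word: int) -> str:
    """Classify a 32-bit single-precision word by its exponent and mantissa fields."""
    exponent = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF
    if exponent == 0xFF:
        # All exponent bits are 1: infinity if the mantissa is zero, otherwise NaN.
        return "infinity" if mantissa == 0 else "nan"
    if exponent == 0:
        # All exponent bits are 0: zero if the mantissa is zero, otherwise denormalized.
        return "zero" if mantissa == 0 else "denormal"
    return "normal"

largest = int("01111111011111111111111111111111", 2)
smallest = int("10000000000000000000000000000001", 2)
print(classify(largest), classify(smallest))
```

The sign bit is deliberately ignored, so positive and negative infinity (and positive and negative zero) fall into the same class.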
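Example: We need to add the two numbers 01000001111011010000000000000000
and 01000010101111010000000000000000.
For A = 01000001111011010000000000000000, converting to decimal gives 29.625.
For B = 01000010101111010000000000000000, converting to decimal gives 94.5.
To add A and B, we first make their exponents equal by shifting the mantissa of the number
with the smaller exponent to the right until its exponent matches the larger exponent.
We have (the exponent fields are 10000011 = 131 and 10000101 = 133, and the bias is 127)
A = 1.11011010000000000000000 × 2^(131 - 127) = 1.11011010000000000000000 × 2^4
B = 1.01111010000000000000000 × 2^(133 - 127) = 1.01111010000000000000000 × 2^6
Shifting A right by two places gives A = 0.01110110100000000000000 × 2^6, so
A + B = 1.11110000100000000000000 × 2^6 = 124.125.
Comparing this with the result produced by the reviewed ALU, we have the correct answer.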
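The alignment, addition and normalisation steps of this example can be reproduced on the raw bit patterns. This is a minimal sketch under stated assumptions, not the ALU's actual datapath: the function name `fp_add_positive` is hypothetical, both operands are assumed to be positive normal numbers, bits shifted out during alignment are truncated rather than rounded, and exponent overflow is not handled.

```python
def fp_add_positive(a: int, b: int) -> int:
    """Add two positive, normal single-precision words (truncation, no rounding)."""
    exp_a, exp_b = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    # Restore the hidden leading 1 to obtain 24-bit significands.
    sig_a = (a & 0x7FFFFF) | 0x800000
    sig_b = (b & 0x7FFFFF) | 0x800000
    # Align: shift the significand with the smaller exponent to the right.
    if exp_a < exp_b:
        sig_a >>= exp_b - exp_a
        exp = exp_b
    else:
        sig_b >>= exp_a - exp_b
        exp = exp_a
    # Add the aligned significands.
    total = sig_a + sig_b
    # Normalise: a carry into bit 24 means the sum is 2 or more, so shift right once.
    if total & 0x1000000:
        total >>= 1
        exp += 1
    # Drop the hidden 1 again and reassemble the word (sign bit is 0).
    return (exp << 23) | (total & 0x7FFFFF)

A = int("01000001111011010000000000000000", 2)  # 29.625
B = int("01000010101111010000000000000000", 2)  # 94.5
print(format(fp_add_positive(A, B), "032b"))
```

For these operands no carry occurs, so the result keeps the exponent of B and the printed word decodes to 124.125.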
Floating point binary addition
Floating point binary Subtraction
Two 32-bit floating-point numbers can be subtracted by executing the following
steps:
Step 1: Make the smaller exponent match the larger exponent.
Step 2: If we are subtracting, take the two's complement of the mantissa being subtracted
(invert its bits and add 1).
Step 3: Add the mantissas together.
Step 4: Normalise the result if necessary.
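For example: We need to subtract 01000001111011010000000000000000
from 01000010101111010000000000000000.
For A = 01000010101111010000000000000000, converting to decimal gives 94.5.
For B = 01000001111011010000000000000000, converting to decimal gives 29.625.
To subtract B from A, we first make their exponents equal by shifting the mantissa of the
number with the smaller exponent to the right until its exponent matches the larger exponent.
We have (the exponent fields are 10000101 = 133 and 10000011 = 131, and the bias is 127)
A = 1.01111010000000000000000 × 2^(133 - 127) = 1.01111010000000000000000 × 2^6
B = 1.11011010000000000000000 × 2^(131 - 127) = 1.11011010000000000000000 × 2^4
Shifting B right by two places gives B = 0.01110110100000000000000 × 2^6, so
A - B = 1.00000011100000000000000 × 2^6 = 64.875.
Comparing this with the result produced by the reviewed ALU, we have the correct answer.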
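The four subtraction steps can be traced on the raw bit patterns of this example. The sketch below is an illustration under stated assumptions rather than the ALU's actual implementation: the function name `fp_sub_positive` is hypothetical, both operands are assumed to be positive normal numbers with a >= b, shifted-out bits are truncated, and results that would fall into the denormalized range are not handled.

```python
def fp_sub_positive(a: int, b: int) -> int:
    """Compute a - b for positive, normal single-precision words with a >= b."""
    exp_a, exp_b = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    # Restore the hidden leading 1 to obtain 24-bit significands.
    sig_a = (a & 0x7FFFFF) | 0x800000
    sig_b = (b & 0x7FFFFF) | 0x800000
    # Step 1: make the smaller exponent match the larger one (a >= b, so exp_a >= exp_b).
    sig_b >>= exp_a - exp_b
    exp = exp_a
    # Step 2: two's complement of the subtrahend within 24 bits (invert and add 1).
    neg_b = (~sig_b + 1) & 0xFFFFFF
    # Step 3: add the significands and discard the carry out of bit 23.
    diff = (sig_a + neg_b) & 0xFFFFFF
    if diff == 0:
        return 0  # equal operands give +0
    # Step 4: normalise by shifting left until the leading 1 is back in bit 23.
    while not diff & 0x800000:
        diff <<= 1
        exp -= 1
    return (exp << 23) | (diff & 0x7FFFFF)

A = int("01000010101111010000000000000000", 2)  # 94.5
B = int("01000001111011010000000000000000", 2)  # 29.625
print(format(fp_sub_positive(A, B), "032b"))
```

For these operands the leading 1 is already in place after the addition, so no normalising shift is needed and the printed word decodes to 64.875.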