This document is confidential and intended solely for the educational purpose of RMK
Group of Educational Institutions. If you have received this document through email in
error, please notify the system manager. This document contains proprietary
information and is intended only to the respective group / learning community as
intended. If you are not the addressee you should not disseminate, distribute or copy
through e-mail. Please notify the sender immediately by e-mail if you have received
this document by mistake and delete this document from your system. If you are not
the intended recipient you are notified that disclosing, copying, distributing or taking
any action in reliance on the contents of this information is strictly prohibited.
COMPUTER
ARCHITECTURE
22EE911
1. Contents
2. Course Objectives
3. Pre Requisites (Course Name with Code)
4. Syllabus (With Subject Code, Name, LTPC details)
5. Course Outcomes (6)
6. CO-PO/PSO Mapping
7. Lecture Plan (S.No., Topic, No. of Periods, Proposed Date, Actual Lecture Date, Pertaining CO, Taxonomy Level, Mode of Delivery)
8. Activity Based Learning
9. Lecture Notes (with Links to Videos, e-book Reference, PPTs, Quiz and any other Learning Materials)
10. Assignments (For Higher Level Learning and Evaluation - Examples: Case Study, Comprehensive Design, etc.)
11. Part A Q & A (with K Level and CO)
12. Part B Qs (with K Level and CO)
13. Supportive Online Certification Courses (NPTEL, Swayam, Coursera, Udemy, etc.)
14. Real Time Applications in Day to Day Life and to Industry
15. Contents Beyond the Syllabus (COE Related Value Added Courses)
16. Assessment Schedule (Proposed Date & Actual Date)
17. Prescribed Text Books & Reference Books
18. Mini Project
COURSE OBJECTIVES
❖ To design the Arithmetic and Logic Unit for various fixed-point and floating-point operations
OBJECTIVES
Computer Types - Functional Units - Basic Operational Concepts - RISC & CISC Design Philosophy - Number Representation and Arithmetic Operations - Performance Measurement - Instruction Set Architecture - Memory Locations and Addresses - Instructions and Instruction Sequencing - Addressing Modes.
TOTAL: 45 PERIODS
COURSE OUTCOMES
K6 Evaluation
K5 Synthesis
K4 Analysis
K3 Application
K2 Comprehension
K1 Knowledge
CO - PO/PSO Mapping Matrix
LECTURE NOTES
UNIT 2
Contents
1. Introduction to Arithmetic
2. Addition and Subtraction
   2.1 Carry Look Ahead Adder
3. Multiplication
   3.1 Sequential Multiplication Hardware
   3.2 Booth's Algorithm
   3.3 Fast Multiplication
4. Division
   4.1 Restoring Division
   4.2 Non-Restoring Division
5. Introduction to Floating Point
   5.1 Floating-point Representation
   5.2 IEEE 754 Format
6. Floating Point Operations
   6.1 Floating Point Addition
UNIT II – ARITHMETIC FOR COMPUTERS
Addition and subtraction of two numbers are basic operations at the machine-
instruction level in all computers. These operations, as well as other arithmetic and logic
operations, are implemented in the arithmetic and logic unit (ALU) of the processor. In this
unit, we present the logic circuits used to implement arithmetic operations. The time needed
to perform addition or subtraction affects the processor's performance. Multiply and divide
operations, which require more complex circuitry than either addition or subtraction
operations, also affect performance. Here we described the representation of signed binary
numbers, and showed that 2's-complement is the best representation from the standpoint of
performing addition and subtraction operations.
The truth table for the sum and carry-out functions for adding equally weighted bits xi and yi in two numbers X and Y is given below.
Each stage of the addition process must accommodate a carry-in bit. We use ci to represent the carry-in to stage i, which is the same as the carry-out from stage (i - 1).
The logic expression for si is si = xi ⊕ yi ⊕ ci, which can be implemented with a 3-input XOR gate; the carry-out is given by ci+1 = xi yi + xi ci + yi ci. A convenient symbol for the complete circuit for a single stage of addition, called a full adder (FA), is used to build an n-bit adder by cascading n such stages.
Since the carries must propagate, or ripple, through this cascade, the configuration is called a ripple-carry adder. The n-bit adder can be used to add 2's-complement numbers X and Y, where the xn-1 and yn-1 bits are the sign bits.
The carry-in, c0, into the least-significant-bit (LSB) position provides a convenient means of adding 1 to a number.
For example, the 2's-complement of a number can be formed by adding 1 to the
1's-complement of the number. The carry signals are also useful for interconnecting k
adders to form an adder capable of handling input numbers that are kn bits long, as shown
in Figure.
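To make the ripple-carry structure concrete, here is a minimal Python sketch (not from the source; the function names are illustrative) of a single full-adder stage and an n-bit ripple-carry adder built by cascading such stages, with bits supplied LSB first.

def full_adder(x, y, c_in):
    # One stage: s = x XOR y XOR c_in, c_out = x*y + x*c_in + y*c_in
    s = x ^ y ^ c_in
    c_out = (x & y) | (x & c_in) | (y & c_in)
    return s, c_out

def ripple_carry_add(x_bits, y_bits, c0=0):
    # Add two n-bit numbers given as lists of bits, LSB first.
    s_bits, c = [], c0
    for x, y in zip(x_bits, y_bits):
        s, c = full_adder(x, y, c)
        s_bits.append(s)
    return s_bits, c      # sum bits (LSB first) and the final carry-out

# Example: 0110 + 0011 = 1001 (6 + 3 = 9), carry-out 0
print(ripple_carry_add([0, 1, 1, 0], [1, 1, 0, 0]))   # ([1, 0, 0, 1], 0)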
The following logic circuit can be used to perform either addition or subtraction.
For addition:
The Add/Sub control line is set to 0, and Y is applied unchanged to one of the adder inputs, so the circuit computes X + Y.
For subtraction:
The Add/Sub control line is set to 1, so the Y number is 1's-complemented (that is, bit-complemented) by the XOR gates, and setting the carry-in c0 to 1 completes the 2's-complementation of Y; the circuit then computes X - Y.
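As a quick illustration of the Add/Sub control line just described, the following Python sketch (an assumption for illustration, not the source's circuit) models the XOR gates and the carry-in c0 for an assumed 8-bit word length.

N = 8                        # assumed word length
MASK = (1 << N) - 1

def add_sub(x, y, ctrl):
    # ctrl = 0 computes X + Y; ctrl = 1 computes X - Y in 2's complement.
    y_in = y ^ (MASK if ctrl else 0)    # XOR gates complement Y when ctrl = 1
    return (x + y_in + ctrl) & MASK     # carry-in c0 = ctrl completes the 2's complement

print(add_sub(9, 5, 0))   # 14
print(add_sub(9, 5, 1))   # 4
print(add_sub(5, 9, 1))   # 252, the 8-bit 2's-complement pattern for -4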
2. Carry-Lookahead Addition
A widely used approach to reduce the delay in adders is to use a logic gate network called a carry-lookahead network. A fast adder circuit must speed up the generation of the carry signals. The carry-out of stage i can be written as ci+1 = Gi + Pi ci, where Gi = xi yi and Pi = xi + yi.
Gi is called the generate function for stage i. If the generate function for stage i
is equal to 1, then ci+1 = 1, independent of the input carry, ci. This occurs when both xi
and yi are 1.
Pi is called the propagate function for stage i. The propagate function means
that an input carry will produce an output carry when either xi is 1 or yi is 1.
All Gi and Pi functions can be formed independently and in parallel in one logic
gate delay after the X and Y operands are applied to the inputs of an n-bit adder.
The propagate function can alternatively be implemented as Pi = xi ⊕ yi, because this signal is needed anyway to realize si; an adder built this way uses two 2-input XOR gates to realize the 3-input XOR function for si.
For a 4-bit adder the carries can be formed independently and in parallel as follows:
c1 = G0 + P0 c0
c2 = G1 + P1 G0 + P1 P0 c0
c3 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 c0
c4 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 c0
The complete 4-bit adder is shown in the diagram. The carries are produced in the block labeled Carry-lookahead logic.
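The carry expressions above can be checked with a short Python sketch (illustrative only; it uses the OR form of the propagate function).

def cla_carries(x_bits, y_bits, c0=0):
    # x_bits, y_bits: 4 bits each, LSB first. Returns [c1, c2, c3, c4].
    G = [x & y for x, y in zip(x_bits, y_bits)]    # generate functions Gi = xi yi
    P = [x | y for x, y in zip(x_bits, y_bits)]    # propagate functions Pi = xi + yi
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    return [c1, c2, c3, c4]

# Example: X = 1011 (11), Y = 0110 (6), given LSB first; 11 + 6 = 17 = 10001.
print(cla_carries([1, 1, 0, 1], [0, 1, 1, 0]))   # [0, 1, 1, 1]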
2.3 Multiplication of Unsigned Numbers
The first operand is called the multiplicand and the second the multiplier. The final result
is called the product.
Ignoring the sign bits, the multiplication of an n-bit multiplicand by an m-bit multiplier gives a product that is n + m bits long.
The main component in each cell is a full adder, FA.
The AND gate in each cell determines whether a multiplicand bit, mj, is added to the incoming partial-product bit, based on the value of the multiplier bit, qi. Each row i, where 0 ≤ i ≤ 3, adds the multiplicand (appropriately shifted) to the incoming partial product, PPi, to generate the outgoing partial product, PP(i + 1), if qi = 1. If qi = 0, PPi is passed vertically downward unchanged. PP0 is all 0s, and PP4 is the desired product. The
multiplicand is shifted left one position per row by the diagonal signal path.
Registers A and Q are shift registers. Together, they hold partial product PPi
while multiplier bit qi generates the signal Add/Noadd. This signal causes the multiplexer
MUX to select 0 when qi = 0, or to select the multiplicand M when qi = 1 , to be added to
PPi to generate PP(i + 1). The product is computed in n cycles.
The block diagram in Figure shows the hardware arrangement for sequential multiplication.
The partial product grows in length by one bit per cycle from the initial vector, PP0, of n 0s
in register A. The carry-out from the adder is stored in flip-flop C, shown at the left end of register A.
At the start, the multiplier is loaded into register Q, the multiplicand into register M, and C and A are cleared to 0. At the end of each cycle, C, A, and Q are shifted right one bit
position to allow for growth of the partial product as the multiplier is shifted out of
register Q.
Because of this shifting, multiplier bit 0 appears at the LSB position of Q to generate the
Add/Noadd signal at the correct time, starting with q0 during the first cycle, q1 during the second cycle, and so on. After they are used, the multiplier bits are discarded by the right-
shift operation.
Note that the carry-out from the adder is the leftmost bit of PP(i + 1) and it must be held in
the C flip-flop to be shifted right with the contents of A and Q. After n cycles, the high-
order half of the product is held in register A and the low-order half is in register Q.
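The register-level behaviour described above can be modelled in Python as follows (a simplified sketch, not the source's hardware; integers stand in for the C, A, Q and M registers).

def sequential_multiply(multiplicand, multiplier, n):
    # Unsigned n-bit multiply; the 2n-bit product ends up in A (high) and Q (low).
    M, A, Q, C = multiplicand, 0, multiplier, 0
    for _ in range(n):
        if Q & 1:                       # q0 = 1: Add, otherwise No-add
            total = A + M
            C, A = total >> n, total & ((1 << n) - 1)
        # shift C, A and Q right one bit position
        Q = ((A & 1) << (n - 1)) | (Q >> 1)
        A = (C << (n - 1)) | (A >> 1)
        C = 0
    return (A << n) | Q

print(sequential_multiply(13, 11, 4))   # 143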
2.3.4 The Booth Algorithm
The Booth algorithm generates a 2n-bit product and treats both positive and
negative 2's-complement n-bit operands uniformly. We can also reduce the number of
required operations in multiplication by regarding the multiplier as the difference between
two numbers. For example, the multiplier 0011110 (+30) can be regarded as 0100000 (+32) minus 0000010 (+2). Here the product can be generated by adding 2^5 times the multiplicand to the 2's-complement of 2^1 times the multiplicand; that is, the multiplier is recoded as 0 +1 0 0 0 -1 0. When recoding, we assume that an implied 0 lies to the right of the multiplier LSB.
The Booth algorithm has two attractive features. First, it handles both positive and negative multipliers uniformly. Second, it achieves some efficiency in the number of additions required when the multiplier has a few large blocks of 1s.
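A compact Python sketch of Booth recoding and multiplication is given below (an illustration under the recoding rules stated above, not the source's register-level algorithm).

def booth_multiply(multiplicand, multiplier, n):
    # Both operands are n-bit 2's-complement patterns; returns the signed product.
    def to_signed(v, bits):
        return v - (1 << bits) if v & (1 << (bits - 1)) else v

    m = to_signed(multiplicand & ((1 << n) - 1), n)
    product = 0
    prev_bit = 0                                   # implied 0 to the right of the multiplier
    for i in range(n):
        bit = (multiplier >> i) & 1
        if (bit, prev_bit) == (1, 0):              # recoded digit -1: subtract M x 2^i
            product -= m << i
        elif (bit, prev_bit) == (0, 1):            # recoded digit +1: add M x 2^i
            product += m << i
        prev_bit = bit                             # (0,0) and (1,1) recode to 0: do nothing
    return product

# The multiplier 30 (0011110) Booth-recodes to 0 +1 0 0 0 -1 0, so 45 x 30
# needs only one addition and one subtraction.
print(booth_multiply(45, 30, 8))                   # 1350
print(booth_multiply((-3) & 0xF, (-6) & 0xF, 4))   # 18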
2.3.5 FAST MULTIPLICATION
A technique called bit-pair recoding of the multiplier results in using at most one
summand for each pair of bits in the multiplier. It is derived directly from the Booth
algorithm. Group the Booth-recoded multiplier bits in pairs, and observe the following.
The pair (+1 -1) is equivalent to the pair (0 +1). That is, instead of adding -1 times the multiplicand M at shift position i to +1 x M at position i + 1, the same result is obtained by adding +1 x M at position i. Other examples are: (+1 0) is equivalent to (0 +2), (-1 +1) is equivalent to (0 -1), and so on. Thus, if the Booth-recoded multiplier is examined
two bits at a time, starting from the right, it can be rewritten in a form that requires at
most one version of the multiplicand to be added to the partial product for each pair of
multiplier bits.
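The rule can be expressed directly as radix-4 digits, as in the following Python sketch (illustrative; the digit values follow the pairing described above).

def bit_pair_recode(multiplier, n):
    # Return the radix-4 digits (least significant first) of an n-bit
    # 2's-complement multiplier; n is assumed even.
    digits = []
    prev = 0                                  # implied 0 to the right of the LSB
    for i in range(0, n, 2):
        b0 = (multiplier >> i) & 1
        b1 = (multiplier >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)    # each pair collapses to one digit in {-2..+2}
        prev = b1
    return digits

# The 6-bit multiplier 011110 (30) recodes to the digits [-2, 0, 2] (LSB pair first),
# i.e. 30 = 2*16 + 0*4 + (-2)*1, so at most one summand is needed per bit pair.
print(bit_pair_recode(0b011110, 6))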
2.3.5.2 CARRY-SAVE ADDITION OF SUMMANDS
Multiplication requires the addition of several summands. A technique called carry-save
addition (CSA) can be used to speed up the process.
Consider the 4 x 4 multiplication array shown in Figure.
This structure is in the form of the array shown in Figure, in which the first row
consists of just the AND gates that produce the four inputs m3q0, m2q0, m1q0, and m0q0.
Instead of letting the carries ripple along the rows, they can be "saved" and
introduced into the next row, at the correct weighted positions, as shown in next Figure.
This frees up an input to each of three full adders in the first row. These inputs can be
used to introduce the third summand bits m2q2, m1q2, and m0q2. Now, two inputs of each
of three full adders in the second row are fed by the sum and carry outputs from the first
row. The third input is used to introduce the bits m2q3, m1q3, and m0q3 of the fourth
summand.
The high-order bits m3q2 and m3q3 of the third and fourth summands are
introduced into the remaining free full-adder inputs at the left end in the second and third
rows. The saved carry bits and the sum bits from the second row are now added in the
third row, which is a ripple-carry adder, to produce the final product bits. The delay
through the carry-save array is somewhat less than the delay through the ripple-carry
array. This is because the S and C vector outputs from each row are produced in parallel
in one full-adder delay.
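The carry-save idea can be stated in two lines of Python (a sketch, not the array circuit itself): three summands are reduced to a sum vector S and a carry vector C that together preserve the total, A + B + X = S + 2C.

def carry_save_add(a, b, x):
    s = a ^ b ^ x                      # per-bit sums, no carry propagation
    c = (a & b) | (a & x) | (b & x)    # per-bit carries, to be weighted by 2
    return s, c

a, b, x = 0b1011, 0b0110, 0b1101
s, c = carry_save_add(a, b, x)
print(a + b + x, s + (c << 1))         # both print 30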
2.3.5.3 SUMMAND ADDITION TREE USING 3-2 REDUCERS
A more significant reduction in delay can be achieved when dealing with longer
operands than those considered earlier. We can group the summands in threes and
perform carry-save addition on each of these groups in parallel to generate a set of S and
C vectors in one full-adder delay. We group all the S and C vectors into threes, and
perform carry-save addition on them, generating a further set of S and C vectors in one
more adder delay. We continue with this process until there are only two vectors
remaining.
The adder at each bit position of the three summands is called a 3-2 reducer,
and the logic circuit structure that reduces a number of summands to two is called a CSA
tree. The final two S and C vectors can be added in a carry-lookahead adder to produce
the desired product. The six summands, A, B, . . ., F, are added by carry-save addition.
In the given example there are six summand rows: A, B, C, D, E and F. They are grouped into two groups of three summand rows each, (A, B and C) and (D, E and F). Carry-save addition is now performed on each of these groups in parallel to generate Sum (S) and Carry (C) vectors in one full-adder delay. Here, we will refer to a full-adder circuit simply as an adder.
Example: In the multiplication of 45 (101101) by 63 (111111), the summands can be grouped into two groups as below. Carry-save addition of each group generates S1, C1 and S2, C2 as the sum and carry vectors for group 1 and group 2, respectively.
Next, we group all the S and C vectors into threes, and perform carry-save addition on them,
generating a further set of S and C vectors in one more adder delay. We continue with this
process until there are only two vectors remaining.
Add (S1 + C1 + S2) using carry-save addition to generate S3 and C3, and in the next stage add (S3, C3 and C2) to get S4 and C4.
Finally we can see that the summands are reduced to 2 and the adder at each bit position of the
three summands is called a 3-2 reducer.
The final regular addition operation on S4 and C4, which produces the product, can be done with a carry-lookahead (CLA) adder. The following figure shows the summand addition tree using 3-2 reducers at a bit position.
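A small Python sketch of the whole reduction (illustrative only; it groups vectors in threes until two remain and then does one ordinary addition) reproduces the 45 x 63 example.

def csa(a, b, x):
    return a ^ b ^ x, ((a & b) | (a & x) | (b & x)) << 1   # S and the already-shifted C

def csa_tree(summands):
    # Reduce a list of summands to their total using 3-2 reducers, then one final add.
    vectors = list(summands)
    while len(vectors) > 2:
        reduced = []
        for i in range(0, len(vectors) - 2, 3):             # take groups of three
            s, c = csa(vectors[i], vectors[i + 1], vectors[i + 2])
            reduced.extend([s, c])
        reduced.extend(vectors[len(vectors) - len(vectors) % 3:])   # leftovers pass through
        vectors = reduced
    return vectors[0] + vectors[1]      # final carry-propagate (e.g. CLA) addition

# The six summands of 45 x 63 (101101 x 111111): 45 shifted once per '1' of 63.
summands = [45 << i for i in range(6)]
print(csa_tree(summands), 45 * 63)      # both print 2835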
Drawback: The interconnection pattern between levels in a CSA tree that uses 3-2 reducers is
irregular.
When the number of summands to be reduced is a power of 2, a more regularly structured tree can be obtained by using 4-2 reducers. This is the usual case for the multiplication operation in the ALU of a 32-, 16- or 8-bit processor.
DESIGN OF 4-2 REDUCER:
The addition of four equally-weighted bits, w, x, y, and z, from four summands, produces a value
in the range 0 to 4. Such a value cannot be represented by a sum bit, s, and a single carry bit, c.
Ex: if w = 1, x = 1, y = 1, z = 1, then w + x + y + z = 4 = 100 in binary.
This indicates that an additional bit is required at the output in addition to S and C.
A second carry bit, Cout, with the same weight as C, can be used along with S and C to represent any value in the range 0 to 5, which is sufficient for our purposes here. But treating Cout as a third output vector would result in a 4-3 reducer, which provides less reduction than a 3-2 reducer; in a 4-2 reducer, Cout is instead passed laterally into the next bit position of the same row.
The specification for a 4-2 reducer is as follows:
The three outputs, s, c, and cout , represent the arithmetic sum of the five inputs, that is
(w + x + y + z + cin) = s + 2(c + cout)
Output S is the usual sum variable; that is, S is the XOR function of the five input variables.
The lateral carry, Cout , must be independent of Cin. It is a function of only the four input
variables w, x, y, and z.
The complete set of 16 possible combinations of (w, x, y, z), each with Cin = 0 and Cin = 1, is shown as 3 different cases.
Case (2) - Inputs (w,x,y,z) in which TWO of them are HIGH. As two of the inputs among (w,x,y,z) are equal to 1, Cout is set as 1 and then, excluding those bits, C and S are decided with the additional Cin. The truth table for all possible cases satisfying case (2) is as shown below.
Case (3) - Inputs (w,x,y,z) in which more than TWO are HIGH. As more than two inputs among (w,x,y,z) are equal to 1, Cout is set as 1 and then, excluding those bits, C and S are decided with the additional Cin bit. The truth table for all possible cases satisfying case (3) is as shown below.
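One common way to build a 4-2 reducer is from two cascaded full adders; the Python sketch below (an assumption about the implementation, which satisfies the arithmetic identity and keeps Cout independent of Cin, though its Cout truth table need not match the case tables above exactly) checks the defining identity for all 32 input combinations.

def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def reducer_4_2(w, x, y, z, cin):
    t, cout = full_adder(w, x, y)      # first FA produces the lateral carry cout
    s, c = full_adder(t, z, cin)       # second FA absorbs cin
    return s, c, cout

for bits in range(32):
    w, x, y, z, cin = [(bits >> k) & 1 for k in range(5)]
    s, c, cout = reducer_4_2(w, x, y, z, cin)
    assert w + x + y + z + cin == s + 2 * (c + cout)   # cout does not depend on cin
print("4-2 reducer identity holds for all inputs")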
2.4 INTEGER DIVISION
2.4.1 RESTORING DIVISION
If the remainder is negative, a quotient bit of 0 is determined, the
dividend is restored by adding back the divisor, and the divisor is repositioned for another
subtraction. This is called the restoring division algorithm.
An n-bit positive divisor is loaded into register M and an n-bit positive dividend is
loaded into register Q at the start of the operation. Register A is set to 0. After the division
is complete, the n-bit quotient is in register Q and the remainder is in register A. The
required subtractions are facilitated by using 2's-complement arithmetic. The extra bit
position at the left end of both A and M accommodates the sign bit during subtractions.
The following algorithm performs restoring division. Do the following three steps n times:
Step 1. Shift A and Q left one bit position.
Step 2. Subtract M from A, and place the answer back in A.
Step 3. If the sign of A is 1, set q0 to 0 and add M back to A (that is, restore A); otherwise, set q0 to 1.
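A Python sketch of this algorithm (illustrative only; registers are modelled as integers) is:

def restoring_divide(dividend, divisor, n):
    A, Q, M = 0, dividend, divisor
    for _ in range(n):
        # Step 1: shift A and Q left, moving the MSB of Q into the LSB of A
        A = (A << 1) | ((Q >> (n - 1)) & 1)
        Q = (Q << 1) & ((1 << n) - 1)
        A = A - M                       # Step 2: trial subtraction
        if A < 0:                       # Step 3: negative result
            Q &= ~1                     #   q0 = 0
            A = A + M                   #   restore A
        else:
            Q |= 1                      #   q0 = 1
    return Q, A                         # quotient in Q, remainder in A

print(restoring_divide(14, 6, 4))       # (2, 2)
print(restoring_divide(21, 4, 5))       # (5, 1)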
2.4.2 NON-RESTORING DIVISION
The restoring division algorithm can be improved by avoiding the need for
restoring A after an unsuccessful subtraction. Subtraction is said to be unsuccessful if the
result is negative. Consider the sequence of operations that takes place after the
subtraction operation in the preceding algorithm. If A is positive, we shift left and
subtract M, that is, we perform 2A — M. If A is negative, we restore it by performing A +
M, and then we shift it left and subtract M. This is equivalent to performing 2A + M. The
q0 bit is appropriately set to 0 or 1 after the correct operation has been performed.
We can summarize this in the following algorithm for non-restoring division.
Stage 1: Do the following two steps n times:
1. If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A.
2. Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Stage 2: If the sign of A is 1, add M to A. Stage 2 is needed to leave the proper positive
remainder in A after the n cycles of Stage 1. The logic circuitry in restoring division can
also be used to perform this algorithm, except that the restore operations are no longer
needed.
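A matching Python sketch of non-restoring division (illustrative only) is:

def non_restoring_divide(dividend, divisor, n):
    A, Q, M = 0, dividend, divisor
    for _ in range(n):
        msb_q = (Q >> (n - 1)) & 1
        Q = (Q << 1) & ((1 << n) - 1)
        if A >= 0:                       # sign of A is 0: shift left and subtract M
            A = ((A << 1) | msb_q) - M
        else:                            # sign of A is 1: shift left and add M
            A = ((A << 1) | msb_q) + M
        Q = Q | 1 if A >= 0 else Q & ~1  # set q0 from the new sign of A
    if A < 0:                            # Stage 2: restore once to leave a positive remainder
        A = A + M
    return Q, A

print(non_restoring_divide(14, 6, 4))    # (2, 2)
print(non_restoring_divide(31, 6, 5))    # (5, 1)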
Normalized number:
• A number in floating-point notation that has no leading 0s is known as a normalized number, i.e., a number that starts with a single nonzero digit. For example, 1.0ten × 10^-9 is in normalized scientific notation, but 0.1ten × 10^-8 and 10.0ten × 10^-10 are not.
Floating point:
• Computer arithmetic that represents numbers in which the binary point is not fixed.
Fraction:
• The value, generally between 0 and 1, placed in the fraction field. The fraction is also
called the mantissa.
Exponent:
• In the numerical representation system of floating-point arithmetic, the value that is
placed in the exponent field.
Single precision:
• A floating-point value represented in a single 32-bit word.
• Floating-point numbers are usually a multiple of the size of a word.
• Where s is the sign of the floating-point number (1 meaning negative), exponent is the
value of the 8-bit exponent field (including the sign of the exponent), and fraction is the
23-bit number.
In general, floating-point numbers are of the form
(-1)^S × F × 2^E
where F involves the value in the fraction field and E involves the value in the exponent field.
Overflow:
• A situation in which a positive exponent becomes too large to fit in the exponent field is
known as overflow.
Underflow:
• A situation in which a negative exponent becomes too large to fit in the exponent field is
known as underflow.
Double precision:
• One way to reduce the chances of underflow or overflow is to use a format with a larger exponent; this format is called double, and operations on doubles are called double precision floating-point arithmetic.
2.5.2 IEEE 754 Format:
• MIPS double precision allows numbers almost as small as 2.0ten × 10^-308 and almost as large as 2.0ten × 10^+308.
• Although double precision does increase the exponent range, its primary advantage is its greater precision because of the much larger fraction.
• IEEE 754 makes the leading 1-bit of normalized binary numbers implicit. The IEEE 754 encoding of floating point numbers is shown in Fig 2.8. Hence, the number is actually 24 bits long in single precision (implied 1 and a 23-bit fraction), and 53 bits long in double precision (1 + 52).
(-1)^S × (1 + Fraction) × 2^E
Disadvantage:
• Negative exponents pose a challenge to simplified sorting. If we use two’s complement or
any other notation in which negative exponents have a 1 in the most significant bit of the
exponent field, a negative exponent will look like a big number.
Solution:
The desirable notation must therefore represent the most negative exponent as 00 … 00two
and the most positive as 11 … 11two. This convention is called biased notation, with the
bias being the number subtracted from the normal, unsigned representation to determine
the real value.
Example 1:
Show the IEEE 754 binary representation of the number -0.75ten in single and double precision.
Answer:
• The number -0.75ten is represented in binary as -0.11two.
• In scientific notation, the value is -0.11two × 2^0.
• In normalized scientific notation, the value is -1.1two × 2^-1.
The general representation for a single precision number is
(-1)^S × (1 + Fraction) × 2^(Exponent - 127)
Subtracting the bias 127 from the exponent of -1.1two × 2^-1 yields
(-1)^1 × (1 + .100 0000 0000 0000 0000 0000two) × 2^(126 - 127)
so the single precision representation has sign = 1, exponent field = 126 and fraction field = 100 0000 0000 0000 0000 0000two. In double precision, the exponent field is -1 + 1023 = 1022 and the fraction field is the same pattern extended to 52 bits.
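The worked example can be cross-checked with a short Python sketch (an illustration using the standard library, not part of the text), which packs -0.75 and prints its single-precision fields.

import struct

def ieee754_single_fields(value):
    bits = struct.unpack('>I', struct.pack('>f', value))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # biased exponent (bias = 127)
    fraction = bits & 0x7FFFFF          # 23-bit fraction field
    return sign, exponent, fraction

s, e, f = ieee754_single_fields(-0.75)
print(s, e, f)        # 1 126 4194304
print(bin(f))         # 0b10000000000000000000000, i.e. fraction = .100...0
# Check: (-1)^1 x (1 + 0.5) x 2^(126 - 127) = -0.75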
Floating-point numbers have decimal points in them. The number 2.0 is a floating-point number because it has a decimal point in it. The number 2 (without a decimal point) is an integer.
Floating-point operations involve floating-point numbers and typically take longer to execute
than simple binary integer operations. For this reason, most embedded applications avoid
wide-spread usage of floating-point math in favor of faster, smaller integer operations.
The operations are done with algorithms similar to those used on sign-magnitude integers (because of the similarity of representation); for example, only add numbers of the same sign.
Example:
Let's add numbers in scientific notation by hand to illustrate the problems in floating-point addition: 9.999ten × 10^1 + 1.610ten × 10^-1. Assume that we can store only four decimal digits of the significand.
Step 1: To be able to add these numbers properly, we must align the decimal point of the number that has the smaller exponent. Hence, we need a form of the smaller number, 1.610ten × 10^-1, that matches the larger exponent.
Thus, the first step shifts the significand of the smaller number to the right until its corrected exponent matches that of the larger number. But we can represent only four decimal digits, so, after shifting, the number is
0.016ten × 10^1
Step 2: Next comes the addition of the significands:
  9.999ten
+ 0.016ten
 10.015ten
Step 3: This sum is not in normalized scientific notation, so we need to adjust it:
10.015ten × 10^1 = 1.0015ten × 10^2
•Thus, after the addition we may have to shift the sum to put it into normalized form,
adjusting the exponent appropriately.
• If there are more bits to the left of the decimal point, the fraction is shifted right and the exponent is increased. If there are leading zeros, the fraction is shifted left and the exponent is decreased.
•Whenever the exponent is increased or decreased, we must check for overflow or
underflow—that is, we must make sure that the exponent still fits in its field.
Step 4: Since we assumed that the significand can be only four digits long (excluding the sign), we must round the number. The rule is to truncate the number if the digit to the right of the desired point is between 0 and 4, and to add 1 to the last retained digit if the digit to the right is between 5 and 9. The number
1.0015ten × 10^2
is rounded to four digits in the significand to
1.002ten × 10^2
since the fourth digit to the right of the decimal point was between 5 and 9. Note that if we add 1 to a string of 9s, the sum may no longer be normalized and we need to perform Step 3 again. The complete procedure is shown in Fig 2.9.
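The four steps can be replayed with a small Python sketch (illustrative only; it mimics a four-digit decimal significand rather than the binary hardware of Fig 2.9).

from decimal import Decimal

def fp_add_decimal(sig_a, exp_a, sig_b, exp_b, digits=4):
    # Step 1: align the number with the smaller exponent to the larger exponent
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b = sig_b.scaleb(exp_b - exp_a)
    # Step 2: add the significands
    sig, exp = sig_a + sig_b, exp_a
    # Step 3: normalize, adjusting the exponent
    while abs(sig) >= 10:
        sig = sig.scaleb(-1); exp += 1
    while sig != 0 and abs(sig) < 1:
        sig = sig.scaleb(1); exp -= 1
    # Step 4: round the significand to the allowed number of digits
    sig = sig.quantize(Decimal(1).scaleb(-(digits - 1)))
    return sig, exp

print(fp_add_decimal(Decimal('9.999'), 1, Decimal('1.610'), -1))   # (Decimal('1.002'), 2)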
Fig.2.9 Floating Point Addition Algorithm
Floating Point Arithmetic on Addition and Subtraction
2.6.2 Floating-Point Multiplication
• We start with calculating the new exponent of the product by adding the biased
exponents, being sure to subtract one bias to get the proper result, which is shown in Fig
2.11.
• If rounding leads to further normalization, we once again check for exponent size.
• Finally, set the sign bit to 1 if the signs of the operands were different (negative product)
or to 0 if they were the same (positive product).
Example:
Now that we have explained floating-point addition, let's try floating-point multiplication. We start by multiplying decimal numbers in scientific notation:
1.110ten × 10^10 × 9.200ten × 10^-5
Assume that we can store only four digits of the significand and two digits of the exponent.
Step 1:
Unlike addition, we calculate the exponent of the product by simply adding the exponents of the operands together:
New exponent = 10 + (-5) = 5
If we added the biased exponents directly, the bias would be counted twice. Accordingly, to get the correct biased sum when we add biased numbers, we must subtract the bias from the sum:
New exponent = (10 + 127) + (-5 + 127) - 127 = 5 + 127 = 132
Step 2. Next comes the multiplication of the significands:
1.110ten
× 9.200ten
0000
0000
2220
9990
10212000ten
There are three digits to the right of the decimal point for each operand, so the decimal point is placed six digits from the right in the product significand:
10.212000ten
Assuming that we can keep only three digits to the right of the decimal point, the product is 10.212ten × 10^5.
Step 3. This product is not in normalized scientific notation, so we need to adjust it:
10.212ten × 10^5 = 1.0212ten × 10^6
and we check for overflow or underflow of the exponent after the adjustment.
Step 4. We assumed that the significand is only four digits long (excluding the sign), so we
must round the number. The number
1.0212ten × 10^6
is rounded to four digits in the significand to
1.021ten × 10^6
Step 5. The sign of the product depends on the signs of the original operands. If they are both the same, the sign is positive; otherwise, it's negative. Hence, the product is
+1.021ten × 10^6
The sign of the sum in the addition algorithm was determined by addition of the significands, but in multiplication, the sign of the product is determined by the signs of the operands, as shown in Fig 2.12.
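As with addition, the multiplication steps can be replayed with a brief Python sketch (illustrative; decimal significands and unbiased exponents).

from decimal import Decimal

def fp_mul_decimal(sig_a, exp_a, sig_b, exp_b, digits=4):
    exp = exp_a + exp_b                    # Step 1: add the (unbiased) exponents
    sig = sig_a * sig_b                    # Step 2: multiply the significands
    while abs(sig) >= 10:                  # Step 3: normalize and adjust the exponent
        sig = sig.scaleb(-1); exp += 1
    sig = sig.quantize(Decimal(1).scaleb(-(digits - 1)))   # Step 4: round
    return sig, exp                        # Step 5: the sign rides along with the significands

print(fp_mul_decimal(Decimal('1.110'), 10, Decimal('9.200'), -5))  # (Decimal('1.021'), 6)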
2.6.3 Floating-Point Instructions
MIPS supports the IEEE 754 single precision and double precision formats with dedicated floating-point instructions for addition, subtraction, multiplication, division and comparison, each in single and double precision. In addition:
Separate floating-point registers, $f0, $f1, $f2, . . ., are used either for single precision or double precision.
Separate loads and stores are provided for floating-point registers: lwc1 and swc1. The base registers for floating-point data transfers remain integer registers.
The MIPS code to load two single precision numbers from memory, add them, and then store the sum uses lwc1 to load the two operands into floating-point registers, add.s to add them, and swc1 to store the result back to memory.
A double precision register is really an even-odd pair of single precision registers, using the
even register number as its name. Thus, the pair of single precision registers $f2 and $f3
also form the double precision register named $f2.
Accuracy in Floating point Arithmetic
• If every intermediate result had to be truncated to the exact number of digits, there
would be no opportunity to round. IEEE 754, therefore, always keeps two extra bits on
the right during intermediate additions, called guard and round, respectively.
• Guard bit: The first of the two extra bits kept on the right during intermediate calculations of floating-point numbers; used to improve rounding accuracy.
• Round bit: The second of the two extra bits. Rounding is the method used to make the intermediate floating-point result fit the floating-point format; the goal is typically to find the nearest number that can be represented in the format.
• Since the worst case for rounding would be when the actual number is halfway between two floating-point representations, accuracy in floating point is normally measured in terms of the number of bits in error in the least significant bits of the significand. This measure, the number of bits in error between the actual number and the nearest number that can be represented, is called the number of units in the last place (ulp).
• If a number were off by 2 in the least significant bits, it would be said to be off by 2 ulps.
Rounding modes
IEEE 754 has four rounding modes:
❖ always round up (toward +∞),
❖ always round down (toward -∞),
❖ truncate, and
❖ round to nearest even.
• The final mode determines what to do if the number is exactly halfway in between: the idea is to round up half the time and round down the other half.
• In IEEE 754, if the least significant bit retained in a halfway case would be odd, add one; if it would be even, truncate. This method always creates a 0 in the least significant bit in the tie-breaking case, giving the rounding mode its name.
The goal of the extra rounding bits is to allow the computer to get the same results as if the intermediate results were calculated to infinite precision and then rounded. To support this goal and round to the nearest even, the standard has a third bit in addition to guard and round; it is set whenever there are non-zero bits to the right of the round bit. This sticky bit allows the computer to see the difference between 0.50 … 00ten and 0.50 … 01ten when rounding.
Sticky Bit:
A bit used in rounding in addition to guard and round that is set whenever there
are non-zero bits to the right of the round bits.
The sticky bit may be set, for example, during addition, when the smaller number is shifted to the right. Suppose we added 5.01ten × 10^-1 to 2.34ten × 10^2. Even with guard and round, we would be adding 0.0050 to 2.34, with a sum of 2.3450. The sticky bit would be set, since there are nonzero bits to the right. Without the sticky bit to remember whether any 1s were shifted off, we would assume the number is equal to 2.345000 … 00 and round to the nearest even of 2.34. With the sticky bit to remember that the number is larger than 2.345000 … 00, we round instead to 2.35.
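Round-to-nearest-even with guard, round and sticky bits can be illustrated with the following Python sketch (an assumption for illustration; it works on an integer significand carrying extra low-order bits).

def round_to_nearest_even(sig, extra_bits):
    # Drop extra_bits low-order bits of sig using guard/round/sticky information.
    kept = sig >> extra_bits
    dropped = sig & ((1 << extra_bits) - 1)
    halfway = 1 << (extra_bits - 1)              # weight of the guard bit
    if dropped > halfway:                        # guard set and some lower (sticky) bit set
        kept += 1
    elif dropped == halfway and (kept & 1):      # exactly halfway: round to the even value
        kept += 1
    return kept

print(bin(round_to_nearest_even(0b101101, 2)))   # 0b1011  (dropped 01 < half: round down)
print(bin(round_to_nearest_even(0b101110, 2)))   # 0b1100  (exact half, kept part odd: round up)
print(bin(round_to_nearest_even(0b1011101, 3)))  # 0b1100  (sticky bit set: round up)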
VIDEO LINKS
Quiz Time:
E-Book Link:
Patterson and Hennessy, Computer Organization and Design: The Hardware/Software Interface
ASSIGNMENTS
5. Show the IEEE 754 binary representation of the number -0.75 in single and double precision.
PART A - QUESTIONS & ANSWERS
Overflow occurs if
● (+A) − (−B) = −C
● (−A) − (+B) = +C
2) The second technique reduces the time needed to add the summands.
5. Define ALU. (K1, CO2)
The arithmetic and logic unit (ALU) of a computer system is the place where the actual execution of the instructions takes place during the processing operations. All calculations are performed and all comparisons (decisions) are made in the ALU. The data and instructions stored in the primary storage prior to processing are transferred as and when needed to the ALU, where processing takes place.
6. What are the overflow conditions for addition and subtraction? (K1, CO2)
9. When can you say that a number is normalized? (K2, CO2)
When the decimal point is placed to the right of the first (nonzero) significant digit, the number is said to be normalized. In the IEEE format, the end values 0 and 255 of the excess-127 exponent are reserved for special (non-normalized) values; for example, when E' = 0 and the mantissa fraction M is zero, the exact value 0 is represented.
10. What operations are supported by the IEEE 754 single precision and double precision formats? (K1, CO2)
The IEEE 754 single precision and double precision formats support the following operations:
■ Floating-point addition
■ Floating-point subtraction
■ Floating-point multiplication
■ Floating-point division
■ Floating-point comparison
11. Write the Add/subtract rule for floating point numbers. (K2, CO2)
1) Choose the number with the smaller exponent and shift its mantissa right a number of steps equal to the difference in exponents.
2) Set the exponent of the result equal to the larger exponent.
3) Perform addition/subtraction on the mantissas and determine the sign of the result.
4) Normalize the resulting value, if necessary.
12. Write the multiply rule for floating point numbers. (K2, CO2)
13. What are the steps in the floating-point addition? (K1, CO2)
The steps in the floating-point addition are
1. Align the decimal point of the number that has the smaller exponent.
2. Addition of the significands.
3. Normalize the sum.
4. Round the result.
14. Write the IEEE 754 floating point format. (K1, CO2)
The IEEE 754 format is (-1)^S × (1 + Fraction) × 2^(Exponent - Bias), with a 1-bit sign, an 8-bit biased exponent (bias 127) and a 23-bit fraction in single precision, and an 11-bit exponent (bias 1023) with a 52-bit fraction in double precision. The IEEE 754 standard floating point representation is almost always an approximation of the real number.
15. What are the advantages to represent number in IEEE format? (K1, CO2)
1. It simplifies exchange of data that includes floating-point numbers;
2. It simplifies the floating-point arithmetic algorithms to know that numbers will always be in this form; and
3. It increases the accuracy of the numbers that can be stored in a word, since the unnecessary leading 0s are replaced by real digits to the right of the binary point.
18. Write the double precision representation of (0.75)10. (K1, CO2)
19. In floating point numbers, when do you say that an underflow or overflow has occurred? (K2, CO2)
In single precision numbers, when the exponent is less than -126, we say that an underflow has occurred. When the exponent is greater than +127, we say that an overflow has occurred.
24. What are the floating point instructions supported by MIPS? (K1, CO2)
• Floating-point addition, single (add.s) and addition, double (add.d)
• Floating-point subtraction, single (sub.s) and subtraction, double (sub.d)
• Floating-point multiplication, single (mul.s) and multiplication, double (mul.d)
• Floating-point division, single (div.s) and division, double (div.d)
25. What are the ways to truncate the guard bits? (K1, CO2)
There are several ways to truncate the guard bits:
1) Chopping
2) Von Neumann rounding
3) Rounding
30. Write the multiply rule for floating point numbers. (K1, CO2)
1) Add the exponents and subtract the bias (127).
2) Multiply the mantissas and determine the sign of the result.
3) Normalize the resulting value, if necessary.
31. What are the difficulties faced when we use floating point arithmetic?
(K1, CO2)
• Mantissa overflow: The addition of two mantissas of the same sign may result in a carry-out of the most significant bit.
• Mantissa underflow: In the process of aligning mantissas, digits may flow off the right end of the mantissa.
• Exponent overflow: Exponent overflow occurs when a positive exponent exceeds the maximum possible exponent value.
• Exponent underflow: Exponent underflow occurs when a negative exponent falls below the minimum possible exponent value.
In conforming to the IEEE standard, a processor sets an exception flag in situations such as underflow: if the number requires an exponent less than -126 (or, in double precision, less than -1022) to represent its normalized form, an underflow exception is flagged.
32. Why floating point number is more difficult to represent and process than
integer? (K2, CO2)
An integer value requires only half the memory space of an equivalent IEEE double-precision floating-point value. Applications that use only integer-based arithmetic will therefore also have significantly smaller memory requirements. A floating-point operation usually runs hundreds of times slower than an equivalent integer-based arithmetic operation.
33. How overflow is detected in fixed point arithmetic? (K1, CO2)
Overflow can occur only when adding two numbers that have the same sign. When both operands a and b have the same sign, an overflow occurs when the sign of the result does not agree with the signs of a and b.
36. Explain about the special values in floating point numbers. (K1, CO2)
The end values 0 and 255 of the excess-127 exponent E' are used to represent special values:
• When E' = 0 and the mantissa fraction M is zero, the value exact 0 is represented.
• When E' = 255 and M = 0, the value infinity is represented.
• When E' = 0 and M ≠ 0, denormal values are represented.
• When E' = 255 and M ≠ 0, the value represented is called Not a Number (NaN).
PART B - QUESTIONS
1. Sketch and explain the operation of 4-bit carry-lookahead adder logic. (K3, CO2)
2. Solve the problem: Multiply the following signed 2's-complement numbers using the Booth algorithm: A = 001110 and B = 111001, where A is the multiplicand and B is the multiplier.
6. Discuss the division algorithm in detail with diagram and examples.
(K2, CO2)
7.Explain how floating point addition is carried out in a computer system. Give
example for a binary floating point addition. (K2, CO2)
15. Demonstrate and express the decimal value -0.75 as a signed 6-bit fraction in binary and give its single and double precision representation. (K3, CO2)
16. Demonstrate and express the decimal value -0.1 as a signed 6-bit fraction in binary and give its single and double precision representation. (K3, CO2)
17. Explain briefly about floating point addition and subtraction algorithms. (K2, CO2)
18. Explain the multiplication hardware with necessary illustrations, algorithm and
Multiply 13 X 20. (K3, CO2)
19. Divide (14)10 by (6)10 using the division algorithm with step-by-step intermediate results and explain. (K3, CO2)
20. Using manual methods, perform the operation A ÷ B on the 5-bit unsigned numbers A = 10101 and B = 00101. (K3, CO2)
21. Solve the problem: Mr. John has been assigned a job by his team leader in ALS Technologies. His project is to design an algorithm for 2's-complement division using addition and subtraction operations. Help Mr. John in designing the algorithm by sketching the flowchart for restoring division, and also check its working with the following numbers: (21)10 ÷ (4)10. (K3, CO2)
22. Sketch the block diagram of an integer divider and explain the division algorithm with an example and flowchart. (K3, CO2)
23. Solve the problem: Mr. John has been assigned a job by his team leader in ALS Technologies. His project is to design an algorithm for 2's-complement division using addition and subtraction operations. Help Mr. John in designing the algorithm by sketching the flowchart for restoring division, and also check its working with the following numbers: (31)10 ÷ (6)10. (K3, CO2)
24. Using manual methods, perform the operation A ÷ B on the 5-bit unsigned numbers A = 11 and B = 3. (K3, CO2)
SUPPORTIVE ONLINE CERTIFICATION COURSES (NPTEL, SWAYAM, COURSERA, UDEMY, ETC.)
https://www.coursera.org/learn/comparch
https://nptel.ac.in/courses/106/105/106105163/
REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND TO INDUSTRY
Applications in DSP
FPGA Programming
Floating-point operations are useful for computations involving large dynamic range,
but they require significantly more resources than integer operations. With the
current trends in system requirements and available FPGAs, floating-point
implementations are becoming more common and designers are increasingly taking
advantage of FPGAs as a platform for floating-point implementations. The rapid
advance in Field-Programmable Gate Array (FPGA) technology makes such devices
increasingly attractive for implementing floating-point arithmetic.
CONTENT BEYOND SYLLABUS
Full Adder
Full Adder is the adder which adds three inputs and produces two outputs.
The first two inputs are A and B and the third input is an input carry as C-IN. The
output carry is designated as C-OUT and the normal output is designated as S which
is SUM. A full adder circuit can be designed so that eight of them can be cascaded to create a byte-wide adder, with the carry bit cascaded from one adder to the next.
Bit-pair recoding
Bit pair recoding halves the maximum number of summands. Group the
Booth-recoded multiplier bits in pairs and observe the following: The pair (+1 -1) is
equivalent to the pair (0 +1). That is, instead of adding -1 times the multiplicand M at shift position i to +1 x M at position i+1, the same result is obtained by adding +1 x M at position i.
ASSESSMENT SCHEDULE (PROPOSED DATE & ACTUAL DATE)
3 MODEL EXAMINATION
TEXT BOOK
REFERENCE
EBOOK LINKS:
https://drive.google.com/file/d/1ZxZ7d5dVERbiCwb5Md5L137fWoMwOFBh/view?usp=sharing
MINI PROJECT SUGGESTIONS
1. Stack machine ISA : Design a stack machine, its instruction set must be stack oriented (no
register!).
2. Implement quick sort and binary search using 8085 assembly language.
4. Design an instruction set for a limited-functionality machine having all instructions of 8 bits.
5. Microprocessor based automatic attendance recorder (make use of RFID: a unique tag for each student).
7. Suggest and design a minimal CPU architecture for controlling the washing machine.
9. Design a serial interface to connect the 8085 micro-processor with a keyboard for that on
11. Design a Turing machine using java, to implement basic operations of TM.
14. Suggest a high speed addition method and logic for 4-bit addition.
17. Microprocessor based water level controller in domestic water storage tank (when water
Disclaimer:
This document is confidential and intended solely for the educational purpose of RMK Group of
Educational Institutions. If you have received this document through email in error, please notify the
system manager. This document contains proprietary information and is intended only to the
respective group / learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender immediately by e-mail if you
have received this document by mistake and delete this document from your system. If you are not
the intended recipient you are notified that disclosing, copying, distributing or taking any action in
reliance on the contents of this information is strictly prohibited.