HW_2

Download as pdf or txt
Download as pdf or txt
You are on page 1of 4

MA 1032 Numerical Methods for Computer Science

MA1032 HW #2

Please try the following problem and case study. Come ready to share and discuss your work in
class. You should use available technology to make your computations easy!

Honor Code: You may discuss HW problems with each other but you may not simply copy
results. Those who are technologically more knowledgeably please do help those who are not
understand the mechanics of using the technology, but do not use the HW problems as examples
for demonstration!

Problem: Evaluate e-5 using two approaches:

and compare with the true value of 6.737947 x 10-3.


a. Use 20 terms to evaluate each series and compute the true errors. Are these round-off
or truncation errors? Explain.
b. Suppose we did not know the true value of e -5. We could get a sequence of
approximations of the true value by systematically increasing the number of terms
used to approximate e-5.
Define the approximate relative error as
εa = current approximation – previous approximation
current approximation

For each series, find the approximate relative errors as terms are added (up to 20
terms).
MA 1032 Numerical Methods for Computer Science

Case Study 1: Round-Off Errors in Computer Representation of Numbers

(Instructions: Read and understand the background and then try the problems at the end.)

As discussed in class, many rational numbers and all irrational numbers have infinite decimal
representation. Since computers use only a fixed number of significant digits in a computation,
round off errors will result. Further, because computers use base-2 representation, they often
cannot even represent certain exact base-10 numbers using the fixed number of significant digits.

Integer Representation: We will look at a couple of methods of representing integers. The basic
unit for representing information is called a word.

Signed magnitude method: In this basic method, the first bit of the word indicates the
sign (0 for positive and 1 for negative) of the integer. The remaining 15 bits are used to
store the number.
For example, since the integer –173 = (1 x 27) + (1 x 25) + (1 x 23) + (1 x 22) +(1 x 21), it
would be stored on a 16-bit computer as follows:

1 0 0 0 0 0 0 0 1 0 1 0 1 1 0 1

Sign Number

2’s complement method: The previous method is not used to represent integers on
conventional computers. Instead the 2’s complement method is often used. In this
technique, the number’s sign is incorporated into its magnitude.

In this method, the most significant bit (left-most) indicates the sign of the number in the
usual manner (0 for positive, 1 for negative). Positive integers are represented by their
binary equivalent. Negative integers are represented by first inverting the representation
of the corresponding positive integer by changing 1s to 0s and 0s to 1s and then adding 1.
For example, using this method on an 8-bit computer,
7 is represented by 00000111 since 7 = 1112
-
7 is represented by 11111000 + 00000001 = 11111001

The method also allows binary arithmetic on signed integers to give the correct 2’s
complement results. For example, we have
7 + -7 = 00000111 + 11111001 = 00000000 = 0

Floating-Point Representation: This representation is usually used for representing fractions but
can be used for all real numbers. The number is represented as m . be where m represents a
MA 1032 Numerical Methods for Computer Science

fractional part (mantissa or significand) and be represents an integer part (exponent or


characteristic). In particular, b is the base of the number system used, and e the exponent.

For example, we represent 156.25 by 0.15675 x 103 in a floating-point base-10 system.

The picture below shows one way of storing a floating-point number representation in a word:

Sign of number Sign of exponent Magnitude of exponent Magnitude of mantissa

Since 156.75 = (1 x 27) + (1 x 24) + (1 x 23) + (1 x 22) + (1 x 2-1) + (1 x 2-2) = 1001110.112 , we


would represent 156.75 by 0.100111011 x 27 in a floating-point base-2 system.

Using the picture representation above, we would need 14-bits to store 156.75 exactly:

0 0 1 1 1 1 0 0 1 1 1 0 1 1

Sign of Magnitude
Magnitude
Number of
of Mantissa
Exponent
Sign of
Exponent

The mantissa is usually normalized if it has leading 0 digits.


For example, if we were to store 1/34 = 0.029411765… in a floating-point base-10
system that allowed only 4 decimal places to be stored. Then 1/34 would be stored as
0.0294 x 100. But the latter includes the useless 0 in the first decimal place and drops the
1 in the fifth decimal place. We can normalize the mantissa by dropping the leading 0 and
store 1/34 as 0.2941 x 10-1.

While normalization thus retains an additional significant figure when the number is stored, in
the process the absolute value of m is limited as follows: 1/b < m < 1.
Thus in a base-10 system, m is between 0.1 and 1 while in a base-2 system, it is between
0.5 and 1.

The advantage of floating-point representation is that fractions and very large numbers can be
stored on a computer. However, the representation takes more room and longer process time than
other methods used for integers. More importantly, this representation introduces a round-off
error because the mantissa holds only a finite number of significant figures.
MA 1032 Numerical Methods for Computer Science

1. Determine the range of integers in base-10 that can be represented on a 16-bit computer
using the signed-magnitude method.
2. What is the minimum number of bits required to represent the base-10 integer 1000 using the
2’s complement method?
3. Create a hypothetical floating-point number set for a machine that stores information using 7-
bit words. Use the 1st bit for the sign of the number, the next 3 for the sign and magnitude of
the exponent, and the last 3 for the magnitude of the mantissa.
a. Find the base-10 representation of the smallest possible positive number in this set
using the normalization constraint on the mantissa.
b. While keeping the same exponent as in part a, find the remaining positive numbers in
the set in increasing order.
c. What is true about the gaps between the numbers found thus far?
d. Continue to generate the positive numbers in the set in increasing order until you find
the maximum possible positive number that can be represented.
e. Based on your findings, explain why (or show how) the:
i. range of numbers that can be represented is limited
ii. number of numbers that can be represented within the range is finite
iii. interval Δx between consecutive numbers increases as the magnitude of the
numbers increases
f. Consider the following two round-off methods for storing numbers which are within
the range of the set but are not exactly represented by the numbers generated.
Suppose n is a number that falls between the consecutive numbers a and b (b > a)
generated by the set. Let Δx = b – a.
i. Chopping: Store n as a
ii. Rounding: Store n as the number it is closest to. If n falls exactly between a
and b then store n as b.

What is true of the errors that result from chopping? From rounding? Which method
gives a lower absolute error?

You might also like