Error Analysis
Figure 2.1
Newark is near New York (Stamen Design)
be quite high. When making cookies, measurements for flour or sugar can be
“about a cup” or even undefined. It is unclear just how much salt is in a pinch.
On the other hand, when working with chocolate, temperatures have to be
controlled very carefully or the chocolate may not crystallize correctly when
cooled.
Numerical error is composed of two components, precision and accuracy.
Precision and accuracy are two related terms that are often used interchange-
ably. These terms describe two different facets of how an estimated value
relates to the true value. But each has a different meaning and a different
application in numerical analysis.
For this section, we will assume there is a true value, x, that we are attempting to measure. We will further assume that x′ is the measured value. The value of x′ may be measured through observation or may be the result of a numerical process or calculation. Finally, the error will be represented by ε.
2.1.1 Accuracy
Accuracy measures how close an estimate is to the true value. This is the
commonly used definition for accuracy. For example, if we were travelling to
New York City, we might get directions. If we get directions to Newark, New
Jersey, we will end our trip in the wrong city, the wrong state, and on the
wrong side of the Hudson River. This is a problem of accuracy. Directions
to Boston or London are progressively less accurate than our directions to
Newark. But sometimes close is close enough, depending on how the estimate
could be used. New York is a brief train ride from Newark. How much accuracy
we need depends on the application the estimate is used for.
We can measure accuracy in different ways. The first is absolute accuracy
or absolute error. This measures the raw distance between the correct value
of x and the measured value. Formally,
ε_A = |x − x′|,  (2.1)
where ε_A is the absolute error. This measurement is a raw figure and, importantly, in units of the underlying measure. If x is in kilometers, so is x′. Accordingly, the value of ε_A will also be denominated in kilometers and we can say our estimate was off by ε_A kilometers.
However, the absolute error of a measurement only partially conveys the
context of the measurement and may not present a meaningful error estimate
in context. In the directions, Newark is 16 kilometers (10 miles) from New
York. That is a substantial difference in this context. But in another context,
a 16-kilometer error may not make a substantial difference. For instance, using
the radius of the Earth’s orbit around the Sun again, an error of 16 kilometers
is not very large. This is especially true considering the Earth’s orbit varies
from 147 million to 151 million kilometers from the Sun.
Measuring how substantial the estimated error is leads to a different error
type definition. The relative error measures, as a part of the true value, how
much error there is in the measurement. Formally,
ε_R = (x − x′)/x,  (2.2)
Figure 2.2
At least we got to New York, this time (Stamen Design)
and computational power to calculate a result using the full number of dig-
its. Practically, this is rarely a concern and the limits of the calculation are
described as the precision of the result.
2.1.2 Precision
Precision provides a measure for how specific a number is. That is, precision
explains the level of detail in a measurement. Continuing our travel analogy,
getting directions to New York when we are interested in going to the Empire
State Building is a problem of precision. We have gone to the right place,
but the right place is too large to get to the final destination. Directions
to Manhattan narrow the directions and you might see the Empire State
Building. But directions to the corner of 34th Street and Fifth Avenue will
take you to the building itself. As the directions increase in precision, we get
closer to our destination, without being incorrect at each step.
It is possible to estimate a value of x with an apparent high degree of
precision, but this may not necessarily be more accurate. For instance, we can
estimate the value of π at 22/7, a commonly used approximation,
π ≈ 22/7 ≈ 3.14285714…  (2.3)
This estimate is accurate to two decimal places. But the estimate given by
22/7 is a decimal that continues forever. Only the number of digits we are
willing to show limits its precision. However, the value of this estimate lies in its accuracy, and for some purposes, this level of accuracy is sufficient.
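Both error measures are easy to check in R; for instance, using R's built-in value pi as the true value, the absolute and relative errors of the 22/7 estimate are:
> abs(pi - 22/7)
[1] 0.001264489
> (pi - 22/7) / pi
[1] -0.0004024994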
Figure 2.3
A journey of a thousand miles starts with a good map (Stamen Design)
to misunderstand. The critical input value was the 5-kilogram figure. It has
one significant digit, the 5. The final result should include only one, and be
interpreted as 10 pounds.
Significant digits, also called significant figures, are the parts of a number
that include the precision of that number. This is limited to any non-zero
digits in the representation of the number. The significant digits convey all of
the precision of a number. For instance, the number π can be represented as
3, 3.1, 3.14, 3.142, 3.1416, 3.14159, and so on with an increasing number of
significant digits. Accordingly, as the number of significant digits increases, so
does the precision of the estimate.
Significant digits are frequently encoded in scientific notation. The number
1.5 × 10^8 is the same as 150 million and 150,000,000. When discussing the distance from the Earth to the Sun, this estimate was shown to be the center of a range. But as an estimate, 150 million kilometers is a good fit and 1.5 × 10^8 conveys all of the critical information. The 1.5 carries all of the precision and the 10^8 carries the magnitude of the number. As a result, this estimate of the
radius of the Earth’s orbit contains two significant digits.
However, when doing mathematics with numbers of limited precision, the
lowest precision number determines the final precision. That is, the number
with the fewest significant digits included carries that number of significant
digits into the final calculation. That is why in the case of the 5-kilogram
holding box, the estimate in pounds should only be 10, no matter how precise a
conversion factor is used. Critically, this statement holds for multiplication and
division of estimated values, but also for addition and subtraction. However,
unlike with addition and subtraction, multiplication and division can change
the magnitude without affecting the number of significant digits in the final
result.
Coming back to rounding, the process of rounding numbers reduces both
the number of significant digits and the precision of an estimated value. For
each place of rounding, the number of significant digits is reduced by one.
With π, rounding to 6 total digits is 3.14159, because the next number is 2.
However, if the value of π is rounded to only 5 digits, the value is 3.1416.
In this case, a significant digit is lost and a single digit of precision is lost.
Accuracy is also lost, but by an unknown and bounded amount. In this case,
the rounding process happened in the last place and split the input range
evenly. As a result, the upper bound on the absolute error is 0.00005. We
know the error is actually smaller than that, but only because we know the
value of π.
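In R, the base function signif rounds to a given number of significant digits and reproduces these values:
> signif(pi, 6)
[1] 3.14159
> signif(pi, 5)
[1] 3.1416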
Because a computer only has a finite amount of storage, we can only store
numbers with a limited degree of accuracy and precision. In practice, there are
standard representations of numbers that limit the accuracy and precision to
a fixed value. Two important representations are the integer and the floating
point number.
Table 2.1
Hexadecimal, decimal, and binary numbers with equivalents
[1] -2147483647
> -2147483646L - 2L
[1] NA
Here, instead of the as.integer function, we can append the capital letter “L” to the number to force the integer format. The calculation will continue using
integers if all component parts are integers. The letter L stands for long, the
name of the declaration for a 32-bit integer. Removing the final L converts
the intermediate stages to numeric.
> -2147483646L
[1] -2147483646
Generally, we will call these the maximum and minimum of the data type,
the largest positive and negative numbers. Closely related, we will call the
largest number the number with the greatest magnitude, regardless of sign.
Finally, the smallest number is the term used to describe numbers of lowest
magnitude, regardless of sign. These are the numbers closest to 0. In the case
of integers, this is 1 and -1.
The maximum value of a 32-bit integer, roughly 2.1 billion, is too small for
many common figures. The 2010 population of the world, just under 7 billion
people, could not be represented by a 32-bit integer. The value of 13! is more
than 6 billion. And there are approximately 4.33 × 10^19 potential arrangements for a Rubik’s cube (Bandelow 1982, 45). This also ignores the fact that the integer
data type cannot store numbers smaller than 1, and if we entered 1/2, it would
round down to zero.
> as.integer(0.5)
[1] 0
> as.integer(1.9)
[1] 1
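This upper limit is available directly within R:
> .Machine$integer.max
[1] 2147483647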
This is not to say there are no advantages to storing numbers as integers.
One is that it provides a convenient format for entering binary numbers indirectly. R does not support binary digit input without the use of an add-on package, but it does support hexadecimal input. And we can use hexadecimal input to quickly enter binary values because each hexadecimal digit corresponds exactly to four binary digits. Table 2.1 provides a reference table for the
binary-decimal-hexadecimal digits from 0 to 15. Hexadecimal numbers can be
entered if they are preceded by the “0x” code to signal the use of hexadecimal
numbers:
> 0xFACE
[1] 64206
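Base R's strtoi function can also parse a string of binary digits, which makes the four-bit-per-hexadecimal-digit correspondence easy to confirm:
> strtoi("1111101011001110", base = 2)
[1] 64206
> strtoi("FACE", base = 16)
[1] 64206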
Of course, R has no trouble with numbers larger than 2^31 − 1. However, it will not use the internal integer storage format to hold these numbers. In fact, the intermediate forms of 2^31 used in the prior two examples were not
integers. Accordingly, these expressions were valid and held the full value until
the final conversion to the integer data type.
> 2^32
[1] 4294967296
> class(2^32)
[1] "numeric"
In the example above, the numbers are being stored in the internal numeric
data type, which supports numbers larger than a 32-bit integer can store.
Numeric, based on the underlying floating point number format, provides the
most flexible and powerful option for native mathematics available to R.
ators, explaining his motivations and design decisions. See Zuras et al. (2008) for the stan-
dard.
digit is implicit. Therefore, this number is 0b0.11 in binary. Because the fractional part is stored in base 2, a representable number is a sum of fractions whose denominators are powers of 2. The fraction 3/4 can be represented by 1/2 + 1/4. As binary fractions, 1/2 = 0b0.1 and 1/4 = 0b0.01, which add up to 3/4 = 0b0.11. Numbers that are not the sum of such fractions cannot be represented exactly in the double precision format and are instead approximated. One common example of this is 1/3.
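We can see the approximation by asking R to print more digits than the format can exactly represent; for instance, with 1/3 and 1/10:
> print(1/3, digits = 20)
[1] 0.33333333333333331483
> print(0.1, digits = 20)
[1] 0.10000000000000000555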
Storing numbers in the double precision format has advantages for the
numerical processor that handles these numbers. For addition or subtraction,
both numbers must be expressed with a common exponent. One can be converted to the other's exponent by adjusting its mantissa. After this conversion, the numbers are added
or subtracted, then the result stored as a new floating point number. Using
10 as a base, we can add 200 to 2000 and watch the process:
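2 × 10^2 + 2 × 10^3 = 0.2 × 10^3 + 2 × 10^3 = 2.2 × 10^3 = 2200.
The smaller addend, 200, is rewritten with the larger exponent before the mantissas are added and the result is stored.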
Generally, the internal details of how floating point operations work are not necessary for numerical analysts. But some specific facts about the double precision floating point format can help us understand some analyses before they become too big and complex.
In addition to representing numbers, floating point numbers permit several
special values to be stored. The most important of these is NaN, represented
in R by “NaN.” NaN stands for “not a number.” One way to get a result of NaN
is 0/0:
> 0 / 0
[1] NaN
NaN has the property that if introduced into an expression, the NaN will generally propagate throughout the rest of the expression. For instance, NaN + x = NaN, for any value of x. Also, NaN^x, NaN ∗ x, and NaN/x all equal NaN for any value of x, including if x is itself NaN. There is an exception: under the IEEE standard, NaN^x = NaN unless x = 0, in which case NaN^0 = 1. There is a test function, is.nan, that tests if the value given is NaN. Like the other is.* functions, it returns a value of TRUE or FALSE and can accept vector and matrix arguments. Finally, NaN is not equal to anything, even itself:
> NaN == NaN
[1] NA
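A few quick examples show the propagation, the exponentiation exception, and the vectorized test:
> NaN + 1
[1] NaN
> NaN^0
[1] 1
> is.nan(c(1, NaN, 3))
[1] FALSE  TRUE FALSE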
NaN is distinct from NA. NaN implies a result that cannot be calculated for
whatever reason, or is not a floating point number. Some calculations that
lead to NaN, other than 0/0, are attempting to take a square root of a negative
number, or perform calculations with infinities that lead to undefined results:
> sqrt(-1)
[1] NaN
> Inf - Inf
[1] NaN
However, adding two infinities produces ∞:
> Inf + Inf
[1] Inf
NA is different from NaN in that NA is not a part of the IEEE standard for floating point numbers. NA is a construction of R used to represent a value that is not known, as a placeholder. NA says no result was available or the result is missing. It can be used to fill in a missing value in a vector or matrix:
> c(1, 2, 3, 4, NA, 5, 6)
[1] 1 2 3 4 NA 5 6
> matrix(c(1, 2, NA, 4, NA, 6, NA, 8, 9), 3)
[,1] [,2] [,3]
[1,] 1 4 NA
[2,] 2 NA 8
[3,] NA 6 9
In addition, NaN is different from infinity, which has its own representa-
tion. Within R, positive infinity is represented as Inf and negative infinity is
represented by -Inf. The value of infinity is the result of a couple of different
operations, including exceeding the basic storage requirements. One interest-
ing property of the double precision data type is that division by 0 yields
infinity. Within most algebraic frameworks, division by 0 yields an undefined result. We normally ascribe no meaning to this undefined result; it is just undefined, and not infinity, so the two definitions do not quite match.
Within double precision arithmetic, and by extension, R numeric arith-
metic, division by 0 leads to infinity, except when the dividend itself is 0:
> 1 / 0
[1] Inf
And this is balanced by a certain symmetry that almost any value divided by
Inf returns 0:
> 1 / Inf
[1] 0
Of course, Inf divided by itself is undefined:
> Inf / Inf
[1] NaN
In addition to Inf, the double precision data type includes a negative infinity, -Inf. This behaves in the same way as Inf, except the sign propagates as expected. Inf also has the property of being equal to itself, but not equal to -Inf. Inf is greater than all other numbers and -Inf is less than all other numbers:
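> Inf == Inf
[1] TRUE
> Inf == -Inf
[1] FALSE
> Inf > .Machine$double.xmax
[1] TRUE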
These special values give the floating point system a way to handle exceptional conditions. In some other environments, division by zero, for instance, automatically terminates the program. By instead returning a special value, such as Inf or NaN, floating point allows computation to continue while still making it clear to us that something has gone wrong.
1 + ε > 1,  (2.6)
within the floating point system. Intuitively, we know that 1 plus any number greater than 0 will be greater than 1. But with the number spacing within floating point, that is not necessarily the case. Machine ε is the threshold value for representation.
Given the smallest possible number we found the double precision type to
support, we might expect this number to be so minuscule as to be inconse-
quential. However, this number, available within R as .Machine$double.eps,
is comparatively quite large:
> .Machine$double.eps
[1] 2.220446e-16
While the smallest number is hundreds of orders of magnitude smaller than even the smallest measurable distance, the value of machine ε is at the scale of some values we might need. For instance, the width of a proton is 8.4087 × 10^−16 meters, a value just about four times larger than machine ε. But other
subatomic particles can be even smaller and it is possible to inadvertently
cross these limits with real-world analyses.
The limitation on machine ε is a real limit. Using the print function to force R to print more digits, we can see what happens when we add machine ε to 1, along with some nearby values:
> print(1 + .Machine$double.eps, digits = 20)
[1] 1.000000000000000222
> print(1 + .Machine$double.eps * 2, digits = 20)
[1] 1.0000000000000004441
> print(1 + .Machine$double.eps / 2, digits = 20)
[1] 1
Values equal to or greater than machine ε, when added to 1, are distinct. Those smaller are not. The computer will simply round the result up or down appropriately and leave it at that.
R provides a second value to measure the precision of the floating point
implementation called .Machine$double.neg.eps. This is the smallest value
such that,
1 − ε < 1,  (2.7)
within the floating point system. Like .Machine$double.eps, we expect any
positive number should meet this requirement, but for the same reasons, this
is not the case. This number is also much larger than some applications may
call for:
> .Machine$double.neg.eps
[1] 1.110223e-16
And we can see the number in action, just like with .Machine$double.eps.
> print(1 - .Machine$double.neg.eps, digits = 20)
[1] 0.99999999999999988898
> print(1 - .Machine$double.neg.eps * 2, digits = 20)
[1] 0.99999999999999977796
> print(1 - .Machine$double.neg.eps / 2, digits = 20)
[1] 1
Because floating point is a scientific notation system, the relative effect of different addends to different numbers can change. We will define a new value, ε_x, such that,
x + ε_x > x,  (2.8)
in the floating point system where x is any positive number. As x increases in magnitude, the value of ε_x also increases. We can observe this experimentally by adding machine ε to, for instance, 1000,
> print(1000 + .Machine$double.eps, digits = 20)
[1] 1000
which has no effect. The value of ε_x is directly related to, and proportional to, the value of x. A similar relationship can be shown for the corresponding negative ε for x. The library pracma contains a function, eps, that provides the value of ε_x for any given value of x:
> library(pracma)
> eps(1000)
[1] 1.136868e-13
> eps(1000000)
[1] 1.164153e-10
> eps(1000000000)
[1] 1.192093e-07
As we can see, the value of ε_x grows with x and the magnitude of ε_x has a linear relationship with the magnitude of x.
The most important effect of this is that certain numbers cannot be represented precisely within a floating point system. It is obvious that transcendental and irrational numbers, such as π or √2, require infinite precision. Since infinite precision requires infinite storage capacity, we can quickly assume these numbers must be represented by approximations rather than true values.
This is a machine-induced error called round-off error. This error is the dif-
ference between the true value x and the floating point representation thereof,
x′. Round-off error is a manifestation of accuracy problems and can lead to
some bizarre interactions.
Transcendental and irrational numbers are not the only unrepresentable
numbers. Some perfectly rational numbers such as 1/3, which repeats in-
finitely, cannot be represented. And in binary, even 1/10 repeats infinitely.
These numbers are represented by rounded off approximations. Finally, even
numbers that do not repeat in binary, and are rational, cannot be represented
if the mantissa requires more than 53 bits for storage, including the sign.
Sometimes, this is a problem, but we can counter it with our intuitive un-
derstanding of mathematics. For instance, we usually see the number 0.333 . . .
and convert it to 1/3 in our heads. But if we see 1.28571428571429, we prob-
ably do not immediately recognize it as 9/7. As numerical analysis is the
search for an answer that is good enough for the application domain, we try
to balance the need for a true answer with an answer that meets the need.
Functions within the MASS library can convert decimals to reduced frac-
tions. If double precision is insufficient to meet those requirements, several
packages for R are available that can provide greater precision. The first is
gmp which provides multiple precision floating point arithmetic. Based on a
widely available library, also called gmp, this multiple precision library is well
tested and provides precision to whatever finite degree we ask. Further, the
library Rmpfr provides additional functionality using the gmp library as a base.
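For example, consider subtracting a truncated decimal expansion of 1/3 from the stored value of 1/3:
> 1 / 3 - 0.33333333333333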
[1] 3.330669e-15
Here, 1/3 is stored as 53 binary digits, which is around 15 decimal digits of
significance. Subtracting 0.33333333333333, a non-repeating decimal, from 1/3
should return 3.3 × 10−15 . But the floating point representation of 1/3 has no
significant digits after the fifteenth. As a result, the floating point subtraction
results in the leftover bits in the representation that are, in reality, the round-
off error in the storage of 1/3.
This problem is called loss of significance. We can see the same result
subtracting a number near to 1 from 1:
> 1 - 0.999999999999
[1] 9.999779e-13
Loss of significance will occur when subtracting any two “close” numbers when
at least one is not perfectly represented in binary. Mathematically, we note
that round-off error becomes more significant in subtraction of the form x −
(x − δ) as δ goes to 0 and for any value of x. This error can intensify if the
subtraction is an interim value preceding a multiplication, since the interim
result will just be multiplied as is and the error will increase in magnitude:
> (1 - 0.999999999999) * 1000
[1] 9.999779e-10
In practice, formulae can often be rearranged to remedy these sorts of sub-
tractions, as we saw with Kahan’s summation algorithm in Section 1.3.1.
In other cases, round-off error can change intermediate results. For instance, this example shows how a simple subtraction problem, 20.55 − 1.35 − 19.2, is not commutative in floating point arithmetic:
> 20.55 - 19.2 - 1.35
[1] 1.332268e-15
> 20.55 - 1.35 - 19.2
[1] 0
When we perform both of these calculations by hand, the value is 0. But a
simple rearranging to allow for the way mathematics is done by the computer
changes the result from something close enough to 0 that we probably do not
care, to a value that is truly 0.
Loss of significance is characterized by the behavior that the change in rel-
ative error of a result, following an operation, is much larger than the change
in absolute error. As explained in Section 2.1.1, we use relative error to un-
derstand the effect of the error, by showing its magnitude relative to the final
result. Generally, the relative error should be insignificant compared to the
absolute error. Loss of significance flips this around by creating a situation
where the change in absolute error is very small, comparatively. This hinders
our ability to understand the error.
Fortunately, we can solve many instances of loss of significance by changing
our approach to the calculation. Using the familiar quadratic formula, we can
x1 <- - ( b1 + t1 ) / t2
x2 <- - ( b1 - t1 ) / t2
return ( c ( x1 , x2 ) )
}
R Function 2.1
The quadratic formula
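A complete, self-contained version of this approach might look like the following sketch; the argument names b2, b1, and b0 (the coefficients of b2x² + b1x + b0) and the definitions of t1 and t2 here are assumptions consistent with the listing, not necessarily the original function:
quadratic <- function(b2, b1, b0) {
    ## t1 is the square root of the discriminant; t2 is the denominator 2a
    t1 <- sqrt(b1^2 - 4 * b2 * b0)
    t2 <- 2 * b2

    x1 <- -(b1 + t1) / t2
    x2 <- -(b1 - t1) / t2
    return(c(x1, x2))
}
For instance, quadratic(1, -5, 6) returns 2 and 3, the roots of x² − 5x + 6 = 0.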
And the square root of 1156 is 34 and nothing terribly interesting has hap-
pened. This is usually the case and there are a number of contrived examples
of loss of significance that take place in an imaginary floating point calculator,
of five or ten significant digits. It is possible to avoid these by changing the
algorithm to not perform any subtraction in the numerator. This eliminates
the catastrophic cancellation that can cause problems.
For the quadratic formula,
x = (−b ± √(b² − 4ac)) / (2a)  (2.13)
  = ((−b ± √(b² − 4ac)) / (2a)) × ((−b ∓ √(b² − 4ac)) / (−b ∓ √(b² − 4ac)))  (2.14)
  = 2c / (−b ∓ √(b² − 4ac)).  (2.15)
x1 <- t2 / ( - b1 - t1 )
x2 <- t2 / ( - b1 + t1 )
R Function 2.2
The quadratic formula rewritten
a = 94906265.625,  (2.16)
b = 189812534.000, and  (2.17)
c = 94906268.375.  (2.18)
more detail on the subject. In addition, Kahan’s work on loss of significance is remarkable,
leading to the Kahan summation algorithm from Section 1.3.1.
Here, we have a different incorrect result that is not, in any meaningful way, better than the results we received from the quadratic function.
stantially smaller than machine ε. As shown in equation 2.8, the value of the next floating point number after x changes based on how large x is. In fact, the floating point numbers are closer together near 0. Further, there are
approximately the same number of floating point numbers between 0 and 1
as there are greater than 1.
At the other end of the spectrum, when the number is too large to be
expressed, we call the phenomenon overflow. Like underflow, overflow is largely
a theoretical problem, though one that is easier to create an example for. The
largest value can be interrogated in R using .Machine$double.xmax:
> .Machine$double.xmax
[1] 1.797693e+308
Because the same bits that hold negative exponents also hold positive
exponents, this is a number followed by more than 300 zeros. This number is
significantly larger than almost all numbers that might be used in practice. For comparison, cosmologists estimate there are only 10^80 hydrogen atoms in the entire Universe. There are 2^256 ≈ 1.15792089 × 10^77 encryption keys possible with the 256-bit variant of the AES encryption system. And the surface area of the Milky Way's galactic disk is roughly 10 × 10^41 square meters (both this value and the number of hydrogen atoms in the Universe are drawn from Wikipedia).
Like the smallest numbers, even at the extreme upper boundary of practical
mathematics and physics, the numbers involved are far smaller than the upper
bound of double precision. These exceptionally large numbers are more than
200 orders of magnitude smaller than the largest possible double precision
numbers can store. And like at the smallest numbers, we have larger numbers
available, if we need them, up to .Machine$double.xmax.
Here we can better contrive an example to illustrate the point. First, lim-
iting the values to those stored as integers, doubling the value will result in
integer overflow:
> 2147483647L * 2L
[1] NA
And a warning may be displayed, as well.
Of course, without the specification that the calculation be conducted with 32-bit integers, R will dutifully accept and calculate a result:
> 2147483647
[1] 2147483647
An advantage to using double precision data types to store numbers is that they can store integers up to 2^53 without any loss of precision.
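We can verify this boundary directly; adding 1 past 2^53 no longer changes the stored value:
> 2^53 + 1 == 2^53
[1] TRUE
> 2^53 - 1 == 2^53
[1] FALSE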
Even though the computer has a mechanism to handle numbers below
.Machine$double.xmin, there is no comparable method to store numbers
larger than .Machine$double.xmax. Multiplying .Machine$double.xmax by
2 yields a curious result:
> .Machine$double.xmax * 2
[1] Inf
While we know this number is not infinity, R returns Inf because it has no
other value available to signify the overflow has occurred.
In both the case of underflow and the case of overflow, R does not cause an error. It simply uses the best available option. In the case of underflow, 0 may well be a good enough answer. In the case of overflow, it is unlikely that infinity is a good enough answer, but if seen, it should appropriately signal that something has gone wrong.
For the same reasons as underflow, real data is unlikely to cause an overflow condition. However, both overflow and underflow can arise in intermediate calculations. This happens when all of the input values are representable, as is the true final answer, but some intermediate calculation, such as an intermediate sum when calculating a mean, is larger than can be represented.
In addition, not all systems will process data with double precision float-
ing point numbers. R is capable of interacting with and exchanging data with
systems that only support single precision arithmetic, where the limits are substantially smaller. In a single precision environment, the smallest possible number is 5.87747175411144e−39 and the largest possible number is 3.40282366920938e+38. While these limits are smaller and larger, respectively, than most applications require, we have seen examples where they are insufficient.
We should keep in mind where the boundaries are and be aware if they are
approaching. It is possible that we may reach a boundary during an intervening
step in a complex calculation. Monitoring and understanding the calculation’s
complete process will allow us to avoid those conditions, or at least be aware
that a result may not be trusted. We will examine that in the next subsection.
Figure 2.4
Error in the measurement of length
Figure 2.5
Additive error in measurement of length
ily include some error. We will say that ℓ = ℓ′ ± ε where ℓ is the true length, ℓ′ is the measured length, and ε is some error. In this case, ε is 0.5 millimeters.
An example is shown in Figure 2.4 where the potential error area is shaded.
We can allow that this measured error is acceptable because if it were not,
we would use a better measuring tool. However, even with that allowance,
it can cause trouble. If we are given two measurements and must add them, the errors themselves are additive. Under the same circumstances, we have two measurements, ℓ_1 and ℓ_2, such that,
ℓ_1 = ℓ′_1 ± ε  (2.19)
ℓ_2 = ℓ′_2 ± ε.  (2.20)
Adding the values together leads to a combined error of,
ℓ_1 + ℓ_2 = (ℓ′_1 ± ε) + (ℓ′_2 ± ε)  (2.21)
          = ℓ′_1 + ℓ′_2 ± 2ε.  (2.22)
Assuming each measurement were taken the same way, ε_1 = ε_2, but the error grows, regardless. In the example with the ruler, the combined error of two measurements added is one millimeter. This is shown in Figure 2.5 where the potential area, subject to the error estimates, is shaded. Accordingly, the errors grow linearly: if we were designing a container to hold ten objects of length ℓ, the combined error is 10ε, and so errors can propagate through an addition process.
Figure 2.6
Multiplicative error in measurement of length
Other situations can amplify error more extremely. Imagine two measurements, a length, ℓ, and a height, h, with a desire to find the surface area. If both were taken via the same method and have the same potential error in the measurement, ε, the error is multiplicative across all terms, since ℓh = (ℓ′ ± ε)(h′ ± ε). The entire error in the surface area is ℓ′ε + h′ε + ε². If the measurement error were assumed to be within half a millimeter, as in the prior example, then the combined potential error in surface area is more than half the length and height added together, in square millimeters. This is shown graphically in
Figure 2.6. The potential error in a volume measurement is, commensurately,
cubical.
Through these examples, we can see the effects of compounding errors.
When a value is subject to machine error, through the floating point repre-
sentation or through some other source, the same error propagation patterns
emerge. In the case of the surface area shown above, the calculation will yield
a result, even if it is not necessarily correct.
But if one of the source values, such as the height, had overflowed, and was
now represented as Inf, then the final result would be Inf. The same thing
can happen with NaN. While these results are obviously unnatural, they arise
just as easily through mistake as they do through actual results. An accidental
logarithm of 0 returns -Inf and if buried within a larger calculation, we can
be left scratching our heads for a few minutes as we look for what went wrong.
This leads to the search for numerical stability, another important goal of
numerical analysis. Numerical stability is the property of a numerical algorithm that the approximation errors, from whatever source, are not amplified by the
algorithm. This property manifests in two important ways.
The first way it manifests is that numerically stable algorithms will not pro-
duce drastic output changes after small changes in the input. Drastic changes
are sometimes in the eye of the beholder. An algorithm where the output
is an exponential function of the input will grow quickly as the input value
increases. Here, a drastic change is one beyond the expectation of change.
The second manifestation of numerical stability refers to how the algorithm
behaves near certain boundary values or singularity points. An algorithm that
divides by the input will behave well until the input value is 0, division’s sin-
gularity. However, how the algorithm behaves near the singularity is expected
and only produces incorrect results at 0.
A numerically unstable program is not well behaved in this sense. One
example would be an algorithm that will magnify the inherent errors without
bound. An analysis of the program will show that the final error is indetermi-
nate, or if determined, the relative error is large, leading to unusable results.
Some algorithms are only unstable at certain boundary points, when a com-
ponent function has no return value or meaning. In these cases, the algorithm
may be stable for some input domains, but not others.
Numerical stability is a desirable property to have in an algorithm, but it is
generally found in an algorithm-problem pairing. As a result, it does not nec-
essarily follow that an algorithm-problem pairing which is stable will remain
stable if the problem is changed. Sometimes, problem areas can be identified
ahead of time. In other cases, instabilities are best observed experimentally.
Observing how an algorithm behaves and produces results for different inputs
is a key aspect of numerical analysis.
2.4 Applications
With a firm understanding of how data is stored in R, we will now explore the
first application of this knowledge. In Section 2.4.1, we will see a simple method
to perform integer division, with a remainder, through repeated subtraction.
This method is slow and not very interesting. Now, using the mechanics of
binary data, we can explore a new method that implements long division like
we learned in school.
if ( n == 0)
stop ( " Attempted division by 0 " )
while ( r >= n ) {
quot <- quot + 1
r <- r - n
}
R Function 2.3
Naïve division by repeated subtraction
as a replacement for the internal R operations for division and modulo, in-
teger division that just returns the remainder. However, they do perform the
operations and can be implemented in just a few lines of R.
The simplest approach to division mimics the basic method of multiplica-
tion. When learning multiplication, we learn the times tables. But before that,
we learn to intuit the mechanism by successively adding blocks of numbers
together. Given four sets of three apples each, we learn to multiply by adding
four sets of three together, 4 × 3 = 3 + 3 + 3 + 3 = 12. We can divide by
reversing the process.
This method is provided in Function 2.3, the function naivediv. We will
describe this function, and others, as naïve, but this should not be considered a pejorative. The naïve approach is the most basic approach you could take to solving a problem. The naivediv function accepts two arguments. The first
is the dividend, or the numerator of the division process. The second is the
divisor, or the denominator.
The function begins by setting an internal value, r, to the dividend. From there, the function subtracts the denominator from r in a loop until the value of r is less than the divisor. For each subtraction, the value of a second internal value, quot, is incremented. When r is less than the divisor, the value of quot is the quotient and r contains the remainder. The function returns a list with two components, quotient and remainder, holding each value, respectively.
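Pieced together from this description, a complete version of the routine might look like the following sketch; the function header and the initialization lines are assumptions based on the description above:
naivediv <- function(m, n) {
    quot <- 0
    r <- m

    if(n == 0)
        stop("Attempted division by 0")

    ## repeatedly subtract the divisor, counting each subtraction
    while(r >= n) {
        quot <- quot + 1
        r <- r - n
    }
    return(list(quotient = quot, remainder = r))
}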
The algorithm is, at best, inefficient. It will cycle through the loop q = ⌊m/n⌋ times. That is quite a bit and while it reduces when n is larger, it
increases when m is larger. The function returns the quotient and remainder
for any positive values of m and n:
> naivediv(314, 7)
$quotient
[1] 44
$remainder
[1] 6
Before using these division functions, remember that R has a built-in op-
erator that performs the modulo operation, %%. The floor function, combined
with basic division, should be used to get the quotient. The modulo operator
should be the preferred choice:
> floor(314 / 7)
[1] 44
> 314 %% 7
[1] 6
Like most algorithms in this text, there are cleaner, faster, and more robust
implementations already available within R.
are available via LaTeX, the French style is much clearer to follow.
if ( n == 0)
stop ( " Attempted division by 0 " )
for ( i in 31:0) {
r <- bitwShiftL (r , 1)
r <- r + bitwAnd ( bitwShiftR (m , i ) , 1)
if ( r >= n ) {
r <- r - n
quot <- quot + bitwShiftL (1 , i )
}
}
R Function 2.4
Binary long division
does this. This function, after preparing an initialization for the quotient and
remainder, executes a simple loop over the bits of the dividend. The function
need only execute over the significant bits and eliminate leading 0s; however,
when executing the algorithm, it simply returns 0 for each of those bits.
The algorithm uses the bit manipulation functions inside of R to provide
access to the bits. First, shifting the bits one unit to the left is equivalent to
multiplying by two. A shift to the right is equivalent to dividing by two, with
the remainder lost. The quotient is built up bit by bit, with the highest order
bit first; any remainder, at each point, is carried through to the next step.
> longdiv(314, 7)
$quotient
[1] 44
$remainder
[1] 6
The longdiv function is substantially faster than division by repeated subtraction.
Further, its runtime is bounded by the size of the dividend, not both the
dividend and the divisor. In this implementation, the loop is executed over all
the available bits of storage, regardless of whether they are used. Therefore,
the loop will execute exactly 32 times, sufficient for the 32 bits of any integer
in R. For 1000/3, the naivediv function executes the loop 334 times, ten
times the maximum of longdiv.
Comments
Understanding error, its nature and sources, is a key part of numerical analysis.
There are two main types of error we must contend with, round-off and algo-
rithmic, and they manifest in different ways and for different reasons. Worse,
they can appear individually or in concert with each other. However, each type
of error can be mitigated through defensive programming techniques and the
knowledge that, no matter how hard we try, error cannot be eliminated. It
can, at best, be bounded, in the general case.
Further, the use of double precision arithmetic allows us to stave off some cases of round-off error before they happen. The precision in a double precision number is sufficient to store most numbers we need and to solve most problems we have. And there is good news in this.
Most numerical analysis problems represent some form of applied mathe-
matical problem. Perhaps it is defining a curve for a wind turbine or estimating
wealth disparity. Applied mathematical problems generally involve some type
of measurement of real-world values. In practice, our measurement error will
be a greater limiting factor for many of the calculations we will perform. For
instance, a standard ruler will only have a measurement resolution of 1/16th of an inch, or so. That number can be stored as a floating point number without problems. However, with real-world data, each of the individual measurement errors is a range that is added to or subtracted from the given measurement. If we
are lucky, these errors cancel each other out leaving us with a net-zero error.
In practice, when measurement error is unbiased, then the error is centered
at 0 and appears with an estimable frequency pattern.
But a measuring tool, such as a ruler, may be biased, one way or the other.
Consequently, especially in the realm of error propagation, the defined errors
are maximum potential error, not minimum and certainly not estimates of the
actual error. It is tempting to think that if we know what the error is, we can
just adjust our results to accommodate it. But in reality, all we have is an
upper bound, in most cases.
As we work with numerical analysis, we want to keep an eye on that upper
bound of the error. The upper bound is still useful. We can use that to manage
expectations on results. Many of the algorithms we work with are focused
on minimizing that upper bound. When we fit that upper bound within a
tolerance we have specified, we know that our answer was good enough, for
whatever good enough means at that time. Fitting our algorithm to meet
our needs is a crucial part of the selection process, on par with meeting the
requirements we specified in Chapter 1.
The rest of this book will now introduce a series of algorithms for solving
different types of mathematical problems. Some are general solutions and some
are keyed to specific problems. Finally, some will be applied to specific real-
world problems as we see how all of this connects back to applied mathematics.
Exercises
1. What is the binary representation, as an integer, of the number
4714?
2. What is the floating point representation of the number 8673.34?
3. What is the floating point representation of the number 1/6?
4. The number e, the base of the natural logarithm, is approximately
2.71. What is the absolute error in that estimate? What is the rel-
ative error?
5. Give an example of a measurement which is accurate, but not pre-
cise.
6. Give an example of a measurement which is precise, but not accu-
rate.
7. How many numbers are there between 1 and .Machine$double.xmax
in double precision arithmetic?
8. How many numbers are there between 0 and 1 in double precision
arithmetic?
9. If a 5×6 array contains double precision numeric, find the minimum
number of bytes necessary to store the array.
10. Write the data storage format for the double precision representation of the numbers 1/2, 1/3, and 1/10.
11. Find the smallest possible ε such that 1000 + ε > 1000.
12. Rewrite f(x) = √(x² + 1) − 1 so there is no loss of significance.
13. What is the relationship between ε_x and ε_−x?
14. Some computers support quadruple precision arithmetic, using 128
bits to store a number. The exponent uses 15 bits, the fractional
part uses 112 bits, and the sign is stored in a single bit. What is the
largest and smallest number this can represent?
15. Research other methods of storing numeric data such as binary-
coded decimal, fixed-point, or Java’s BigDecimal and discuss the
pros and cons of these methods versus floating point numeric data
types.