7-Mathematics Foundations SLM

Download as pdf or txt
Download as pdf or txt
You are on page 1of 100

SYMBIOSIS INTERNATIONAL

(DEEMED UNIVERSITY)
Established under Section 3 of the UGC Act. 1956
Awarded Category - I by UGC

Symbiosis School for Online


and Digital Learning
Gram: Lavale, Tal: Mulshi, Dist: Pune, Maharashtra,
India Pin: 412115

E-CONTENT
MATHEMATICS FOUNDATIONS
M.Sc (Data Science) SEM – 1

Dr. Mandar Pande


CONTENT

 MODULE – 1 : Fundamentals Of 2
Mathematics

 MODULE – 2 : Discrete Mathematics 11

 MODULE – 3 : Differential Calculus 34

 MODULE – 4 : Linear Algebra 60

 MODULE – 5 : Computational Complexity 90


Theory

 MODULE – 6 : Numerical Analysis 95


About Course
The course will provide an understanding of mathematical concepts, fundamentals of Discrete
Mathematics, Calculus and Linear Algebra and their relevance in all data science. Also, the course
will familiarize with underlying mathematical constructs in various algorithms used in data science
and data analytics.

Learning Objective
1. To review basic mathematical concepts and build a foundation for understanding advanced
mathematics topics with relevance to data science
2. To get an understanding of the fundamentals of Discrete Mathematics, Calculus and Linear
Algebra and their relevance in all data science algorithms and concepts
3. To get a broad perspective of the underlying mathematical constructs in various algorithms
used in data science and data analytics
4. To use these mathematical constructs in advanced courses in data science.
5. To get an overview of computational complexity for understanding complexity of computer
science problems and associated algorithms
6. To get a broad perspective of numerical mathematics and its critical importance in solving
problems numerically on a computer.

1
MODULE 1 FUNDAMENTALS OF MATHEMATICS

Learning Objectives
 Understand various types of numbers including real and absolute
 Understand the concepts of an exponent and logarithm
 Define and understand equations and graphs
 Understand the meaning of functions of one independent variable

Unit Content Summary


1.1 Introduction to real numbers
1.2 Absolute value of a number
1.3 Exponents
1.4 Logarithms
1.5 Natural exponents and logarithms
1.6 Equations and graphs

This unit gives an overview of the fundamentals of mathematics starting from real numbers, and
proceeding towards exponents, logarithms, equations, functions, and graphs. This also gives an
overview of the functions in one independent variable.

1.1. Introduction to real numbers

Figure 1, show a horizontal straight line. All types of real numbers can be represented along this
line. Each number can be represented as exactly one point on the line. Even the reverse is true, that
is, each point represents exactly one number. This line which has a one-to-one correspondence is
called as the real number line.

2
Figure 1: The real number line

As is seen from the figure, 0 (zero) is the pivot around which all numbers are referenced. Positive
numbers are on the right-hand side of 0. Negative numbers are on the left-hand side of 0. The set
of real numbers can be classified into various types.
Natural numbers are positive numbers with the difference between any two consecutive numbers
being 1. Integers, which are whole numbers. Integers are both positive and negative. Natural
numbers can be considered as positive integers. Next are rational numbers. Rational numbers are
those numbers, which can be expressed as the quotient of two integers p and q. Therefore, a rational
number can be expressed mathematically as p/q. Examples of rational numbers are shown in

Figure 1, such as ½, whose value in a decimal representation is 0.5. Or 1/3, whose value is
0.33333..... It is important to note that a number whose decimal pattern is repeating is also a
rational number.
Irrational numbers on the other hand are those numbers, which cannot be expressed in the form
of p/q; These numbers have an infinite number of non-repeating decimal values or representations.
For example, sqrt(2) has a value of 1.41421356237.... This is an infinite nonrepeating pattern.
Another example is the exponent ‘e’ called as Euler number which has a value 2.718281828459....,
which is again an infinite nonrepeating pattern.

3
1.2. Absolute value of a number
The absolute value of a number N is denoted by |N|. Let us consider a cartesian coordinate system,
which is a system of lines (more appropriately called axes), which are perpendicular to each other
as shown in Figure 2 below. This will be explained in more detail in a later section.

Figure 2: Cartesian Coordinates in 2 dimensions

Then the absolute value represents the distance of N from zero. The absolute value of a positive
number is the same as the original number. However, the absolute value of a negative number is
simply the value of the number without the negative sign. For example, |3| = 3 and |-5| = 5.
Mathematically, the absolute value of a number N is represented as follows:

𝑁, 𝑓𝑜𝑟 𝑁 ≥ 0
|𝑁| = {
−𝑁, 𝑓𝑜𝑟 𝑁 < 0
1.3. Exponents
The concept of exponents is about representing the product of a number with itself. In general, if
x is any real number or a variable and n is a positive integer, then the x multiplied by itself n times
is represented mathematically as xn. In this case, n is called as the exponent and x is called as the
base. By the definition of exponents, the following two points are important to remember
For any x,
 the value of x0 = 1
 00 is undefined
Any number or variable x, that is not raised to an exponent is assumed to be raised to the first
power or to the power of 1. Therefore, 3.51 is the same as writing it as 3.5.

4
Assuming a and b are integers and x and y are real numbers, the laws of exponents are given below.
 (xa)( xb) = xa+b

 (xa)/( xb) = xa-b

 (xa)b= xab

 (xy)a = xa ya

 (x/y)a = xa/ ya

 1/ xa = x-a
 sqrt(x) = x1/2

1.4. Logarithms
Let x be a real number. Then for x > 0, the common logarithm of x is written as log10(x) or simply
as log (x) or log x. It is the power to which 10 must be raised to get x. 10 is called as the base of
the logarithm. Other positive numbers except for 1 can serve as the base. Logarithms are usually
a very useful mathematical tool to solve problems that involve exponents.
Assuming that, b > 0 and b ≠ 1, and x, y > 0, following are the rules of logarithms
 logb (xy) = logb (x) + logb (y)
 logb (x/y) = logb (x) - logb (y)
 logb xa = a logb x

𝑎
logb √𝑥 = (1/a) (logb x)
1.5. Natural exponents and logarithms
The Euler constant ‘e’ is the most commonly used base for exponential and logarithmic functions
which are usually encountered in engineering and science. From a mathematical perspective the
expression for e is given by
1 𝑛
𝑒 = lim (1 + )
𝑛→∞ 𝑛

5
The exponential functions which are written with the base ‘e’ are called as natural exponential
functions. In general, they are written as
𝑦 = 𝑒𝑥

Consequently, the logarithmic functions which are written with the base ‘e’ are called as natural
logarithmic functions. In general, they are written as:

𝑦 = ln(𝑥) = log 𝑒 𝑥

It is very interesting and important to note that 𝑒 𝑥 and log 𝑒 𝑥 are inverses of each other.

Following are the properties of natural exponents and natural logarithms


• 𝑒 ln 𝑎 = 𝑎
• 𝑒 ln 𝑥 = 𝑥
• 𝑒 ln 𝑓(𝑥) = 𝑓(𝑥)
• Conversely
• 𝑙𝑛 𝑒 𝑎 = 𝑎
• 𝑙𝑛 𝑒 𝑥 = 𝑥
• 𝑙𝑛 𝑒 𝑓(𝑥) = 𝑓(𝑥)

1.6. Equations and graphs


An equation is defined as a mathematical statement in which two algebraic expressions are equated
to each other on both sides of the equality sign. When we have an equation in which all the
variables are raised to the first power, then such as equation is known as a linear equation. The
mathematical expressions on either side are representations of some physical quantity called
variables and the objective of writing an equation is to get a solution to the variables that are
embedded in the equation. A solution of an equation is a number that is obtained, and which is
substituted for each occurrence of the variable gives the same value on either side of the equation.
Some examples of equations are given below

6
 6x + 7 = 19. Here the solution for the variable x is 2. If we substitute 2 instead of x in this
equation, on the left side of the equation we get 6 x 2 + 7 which is 19. This is also the value
on the right side of the equation.
 3x – 8 = 2y + 12. If we go ahead and solve this equation, we get y = 2 and x = 8.

1.6.1. Basic properties of equations


Following are the properties of equations, for all real numbers a, b and c, where a = b
 Addition property: a + c = b + c
 Subtraction property: a – c = b – c
 Multiplication property: a.c = b.c
 Division property: a/c = b/c (c not equal to zero)

1.6.2. Various coordinate systems:


There are numerous coordinate systems in mathematics, depending on the kind of problem that is
to be solved. The most common coordinate system is called as the cartesian coordinate system. In
this system,
• x, y, z are called as coordinates or axes
• Each of the axes is perpendicular to the other
• The Cartesian coordinate system is static
A graphical representation of a point in the cartesian coordinate system is shown in Figure 3.

(𝒙, 𝒚, 𝒛)

7
Figure 3: Three-dimensional Cartesian Coordinate system

When the problem to be solved has a curved surface, then using the cartesian coordinate system
becomes tedious to solve the problem. In such a case, one uses the spherical coordinate system.
In a spherical coordinate system, as shown in
Figure 4
• 𝑟, 𝜃, 𝜑 are the coordinates or axes. 𝑟 is the radial axes, 𝜃, 𝜑 are the angular axes.
• Each of the coordinates is perpendicular to the other
• The spherical coordinate system is dynamic

This system is dynamic in the sense that the coordinates 𝑟, 𝜃, 𝜑 can change their values during
the problem solving itself.

𝒛
(𝑟, 𝜃, 𝜑)
𝒓
𝜽

𝒚
𝜑

Figure 4: Three-dimensional Spherical Coordinate system

8
Exercise
Problems and Questions
1. Explain the meaning of the real number line.
2. What are integers?
3. Explain the concept of rational numbers.
4. How are irrational numbers different from rational numbers?
5. Why are the Euler constant ‘e’ and 𝜋 said to be irrational numbers?
6. What is the cartesian co-ordinate system?

Simplify the following expressions using the exponent rules


7. 𝑥 3 . 𝑥 5
8. 𝑥 8 / 𝑥 2
𝑥2
9.
√𝑥

10. (𝑥 4 )5

Find the y-intercept for the following equations. (note: y-intercept is the point at which the line
crosses the y-axis. Alternatively, it is the point where x = 0).
11. 3𝑥 + 𝑦 = 7
12. 9𝑥 − 3𝑦 = 72

Match the following:


𝜋 Cartesian coordinates
x, y, z axes Spherical coordinates
𝑟, 𝜃, 𝜑 3.1415
1.41412 e
3x – 5 = 22 Irrational number
1 𝑛 Equation
lim (1 + )
𝑛→∞ 𝑛

9
References
1. Calculus for Business, Economics, And the Social Sciences, Schaum’s Outlines, Edward
T. Dowling, McGraw-Hill 1990.

10
MODULE 2 DISCRETE MATHEMATICS

Learning Objectives
 Understand the mathematical notations used in mathematics and logic
 Get an overview of set theory
 Get an understanding of permutations and combinations
 Develop a good understanding of probability theory,
 Develop an understanding of Markov chains and
 Get an understanding of graph theory

Unit Content Summary

Discrete Mathematics deals with those elements of mathematics which are discrete. This means
that in discrete mathematics we are dealing with operations which use algebra and arithmetic. In
this unit, concepts of the mathematical notations used in mathematics, and logic are presented.
Further this unit gives an overview of set theory, permutations and combinations, probability
theory, Markov chains and graph theory.

2.1 Logic
Propositional logic: A proposition is a collection of declarative statements that has two types of
truth values. The truth value can either be "true” or "false".
By definition

11
• A proposition consists of propositional variables and connectives.
• The propositional variables are denoted by capital letters (A, B, etc).
• The connectives are those that connect the propositional variables.
An example of a proposition is “4 + 5 = 14 – 4” which is “FALSE” since the left side of the
equation returns a value 9 whereas the right side returns a value 10.
Connectives: In propositional logic there are connectives which are used to connect the
propositions. These are “OR”, “AND”, “NOT” or negation, “Implication” (if- then) and “If and
only if”.
Table 1 below, gives the notations which are used for each of the connectives

Connective Name Notation


OR ∨
AND ∧
NOT (or Negation) ¬
If-then ->
If and only if ⇔
Table 1: Connectives and their notation

Each of the connectives operate on two or more propositions and follow a truth table (same as the
Boolean logic in Digital Electronics)
• The OR operation of two propositions A and B is written as 𝐴 ∨ 𝐵
• The truth table is given below
A B A∨B
True True True
True False True
False True True
False False False

In a similar manner, we can create the truth tables for the other operations including AND, NOT,
XOR (exclusive OR) and so on.

12
For example, the truth table for the XOR is given below. The operation returns a True when either
of the propositions is True. In case both the propositions are either true or false, the operation
returns False. This is shown in the table below.

A B A XOR
B
True True False
True False True
False True True
False False False

Predicate Logic deals with predicates, which are propositions containing variables.
A predicate is an expression of one or more variables defined on some specific domain.
A predicate with variables can be made a proposition by either assigning a value to the variable or
by quantifying the variable.
• The following are some examples of predicates:
• Let E(x, y) denote "x = y"
• Let X(a, b, c) denote "a + b + c = 0"
• Let M(x, y) denote "x is married to y"

Quantifiers:
The variable of predicates is quantified by quantifiers.
There are two types of quantifiers in predicate logic:
• Universal Quantifier and Existential Quantifier.

Universal Quantifier
• Universal quantifier states that the statements within its scope are true for every value of
the specific variable. It is denoted by the symbol ∀.
• ∀𝑥𝑃(𝑥) is read as for every value of 𝑥, 𝑃(𝑥) is true.
• Example − "Man is mortal" can be transformed into the propositional form ∀𝑥𝑃(𝑥) where
𝑃(𝑥) is the predicate which denotes x is mortal and the universe of discourse is all men.

13
Existential Quantifier
• Existential quantifier states that the statements within its scope are true for some values of
the specific variable. It is denoted by the symbol ∃.
• ∃𝑥𝑃(𝑥) is read as for some values of 𝑥, 𝑃(𝑥) is true.
• Example − "Some people are dishonest" can be transformed into the propositional
form ∃𝑥𝑃(𝑥) where 𝑃(𝑥) is the predicate which denotes 𝑥 is dishonest and the universe
of discourse is some people.

2.2 Permutations and Combinations


Permutations and Combinations deal with the various ways in which a set of objects or entities can
be arranged.
The arrangements depend on multiple things or constraints such as
 The various types of objects available for selection
 Whether a particular type of object has multiple copies of itself
 Whether arranging the various types of objects in a particular order matter or it does not
 Whether a particular object can be replaced after selecting it for an arrangement
Whenever the order of arranging a set of objects does not matter, we talk about combination.
However, when the order of selecting objects does matter, we talk about a permutation.
Let us take an example to understand permutations and combinations. Let us consider a situation
in which you have a set of marbles of three colours, namely, red (R), blue (B) and green (G) and
you are asked to select 1 marble of each colour.
 If there is no specification about the order in which you can select the marbles, then we talk
about a combination. Specifically, you can choose RGB or RBG or BRG and so on and all
the choices are essentially equivalent.
 However, if the order of selecting a particular colour does matter, then we talk about a
permutation.

Hence, a selection choice RGB is different from RBG and BRG and so on.
Therefore,
• In a combination, order of selection does not matter
• In a permutation, order of selection does matter

14
Another example that can be taken is of a combination lock that are available on many
travelling suitcases. Specifically, let us select a 3 number combination lock with the following
points:
• Each number can go from 0 – 9
• The suitcase opens for only one selection of each of the three numbers. For example, let us
assume that the selected choice for opening the lock is 3 5 7.
• Then, unless we get this choice, the suitcase does not open.
• Thus, the conclusion is that the order does matter in this case.
• Therefore, we are talking about a permutation here. So, we should talk about a
permutation lock rather than a combination lock.

2.3 Permutations
Permutations with repetition: Let us consider an example of 9 marbles with 9 different colours
in a box.
• We are asked to choose 3 out of these 9 marbles.
• Since repetition is allowed, each time we can select 1 out of the 9 options, since the selected
marble can be placed back into the box.
• Therefore, the number of ways in which we can choose 3 marbles out of 9 when repetition
is allowed will be
9 x 9 x 9 = 93 = 81
• If we were asked to select 5 out of the available 9 marbles, with repetition allowed, we
would get 95 choices.
• In general, if we must select r of n types, with repetition allowed, the permutations are nr

Permutations without repetition: In this case when repetition is not allowed, the number of
choices available for selection after one is selected reduces by 1 every time. Let us consider the
above example of 9 marbles of different colours again.
• We are asked to choose 3 out of the 9 with repetition not allowed.
• The difference this time is that after we select one marble, we cannot replace it back into
the box.
• The first time we choose a marble, we have 9 choices.

15
• For the next selection, however, we have one choice less since we cannot replace the
selected marble back. Therefore, we have 8 choices.
• For the third selection, we have 7 choices.
• Thus, the number of permutations that are possible for this problem is
9 x 8 x 7 = 504
• In general, if we must choose 9 objects out of the available 9 with no repetition, the number
of choices is:

9 x 8 x 7 x....x 3 x 2 x 1

This is an appropriate point in time to introduce the concept of a factorial function.

Factorial Function: The definition of a factorial function for n objects is simply

n! = n x (n – 1) x (n – 2) x....x 3 x 2 x 1

Following are the assumptions of a factorial function


 1! = 1
 0! = 1

Formula for Permutations in terms of factorial function:


The formula for finding the number of permutations for choosing r objects out of n available
objects is
𝒏!
(𝒏 − 𝒓)!

• where n is the number of things to choose from,


• and we choose r of them,
• no repetitions,
• order matters.

16
Example: In how many ways can the first, and second place be awarded to 10 people?
Solution:
𝟏𝟎! 𝟏𝟎!
= = 𝟗𝟎
(𝟏𝟎 − 𝟐)! 𝟖!

Instead of writing the whole formula, people use different notations such as:

𝑛
𝑛!
𝑟𝑃 =
(𝑛 − 𝑟)!

2.4 Combinations
Combinations without repetition:
Let us consider the following example to explain the concept of combinations without repetition.
Let us assume that there are a set of 3 balls in a box. Each ball is numbered 1, 2 and 3. We are
asked to arrange them. The table below shows the various possibilities for the cases when order
does or does not matter:

Order does matter Order does not matter

123

132

213
123
231

312

321

In the cases where placing each of the ball in a particular order matter, we get 6 options.
However, in case order does not matter, then placing the balls in any order does not make a
difference. Therefore, in the second case (that is when order does not matter), it essentially means
that placing the balls in the order 123 is equivalent to any of the other 5 options.

17
The second case is what is called as a combination and the first case is the permutation.
The formula for finding the number of combinations for choosing r objects out of n available
objects is

𝑛
𝒏! 𝟏 𝒏! 𝒏
𝑟𝐶 = . = =( )
𝑟
(𝒏 − 𝒓)! 𝒓! (𝒏 − 𝒓)! 𝒓!

Where:
• n is the number of things to choose from
• we choose r of them
• no repetition
• order does not matter

Combinations - Some important results


Let 𝑛 and 𝑟 be positive integers such that 𝑟 ≤ 𝑛. Then
𝑛 𝑛
• 𝑟𝐶 = 𝑛−𝑟𝐶

• 𝑛
𝑟𝐶 + 𝑟−1𝑛𝐶 = 𝑛+1
𝑟𝐶
𝑛𝑛−1
• 𝑟𝐶 = (𝑛 − 𝑟 + 1) 𝑟−1𝑛𝐶

2.5 Set Theory


A set is a collection of distinct objects. Sets are often specified with curly brace notation.
Example: {1, 2, 3} is a set
However, {1, 1, 3} a not a set because 1 appears twice in the second collection. Therefore, in a set
the elements need to be unique.
When we need to mathematically define a set, we use the following terminology.
For example, the set of even integers can be written as:
S = {𝟐𝒏 : 𝒏 𝒊𝒔 𝒂𝒏 𝒊𝒏𝒕𝒆𝒈𝒆𝒓}
• The opening and closing curly braces denote a set
• 2n specifies the members of the set,
• The colon says “such that” or “where” and
• Everything following the colon are conditions that explain or refine the membership.
The set definition can be read as “The set of twice n where n is an integer”.

18
The integers are the set of whole numbers, both positive and negative and can be written compactly
as a set in the following manner:

S = {0, ±1, ±2, ±3, . . .}

Some important definitions of Set Theory


Empty Set
Definition 1: The empty set is a set containing no objects. It is written as a pair of curly braces
with nothing inside {} or by using the symbol ∅.
Examples of empty sets
a. 𝐿𝑒𝑡 𝐴 = {𝑥 : 9 < 𝑥 < 10, 𝑥 𝑖𝑠 𝑎 𝑛𝑎𝑡𝑢𝑟𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 } 𝑖𝑠 𝑎 𝑛𝑢𝑙𝑙 𝑠𝑒𝑡 𝑏𝑒𝑐𝑎𝑢𝑠𝑒
𝑡ℎ𝑒𝑟𝑒 𝑖𝑠 𝑁𝑂 𝑛𝑎𝑡𝑢𝑟𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝑛𝑢𝑚𝑏𝑒𝑟𝑠 9 𝑎𝑛𝑑 10.
𝑇ℎ𝑒𝑟𝑒𝑓𝑜𝑟𝑒, 𝐴 = {} 𝑜𝑟 𝜑

b. 𝐿𝑒𝑡 𝑊 = {𝑑: 𝑑 > 8, 𝑑 𝑖𝑠 𝑡ℎ𝑒 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑑𝑎𝑦𝑠 𝑖𝑛 𝑎 𝑤𝑒𝑒𝑘} 𝑖𝑠 𝑎𝑙𝑠𝑜 𝑎 𝑣𝑜𝑖𝑑 𝑠𝑒𝑡
𝑏𝑒𝑐𝑎𝑢𝑠𝑒 𝑡ℎ𝑒𝑟𝑒 𝑎𝑟𝑒 𝑜𝑛𝑙𝑦 7 𝑑𝑎𝑦𝑠 𝑖𝑛 𝑎 𝑤𝑒𝑒𝑘.

Membership of a set
Definition 2: The set membership symbol ∈ is used to say that an object is a member of a set.
It has a partner symbol ∉ which is used to say an object is not in a set.
Equal Sets
Definition 3: We say two sets are equal if they have the same members.
𝑬𝒙𝒂𝒎𝒑𝒍𝒆: 𝑆 = {1, 2, 3}, 𝑇 = {2, 3, 1}

How to write 3 is an element of S?


3∈S

How to write 4 is not an element of S?


4∉S

How are S and T related?

19
They are equal sets
Cardinality of a set
Definition 4: The cardinality of a set is its size. For a finite set, the cardinality of a set is the number
of members it contains. In symbolic notation the size of a set S is written |S|.
Intersection of Sets
Definition 5: The intersection of two sets S and T is the collection of all objects that are in both
sets.
 It is written S ∩ T
 Using curly brace notation, 𝑺 ∩ 𝑻 = {𝒙 : ( 𝒙 ∈ 𝑺 ) 𝒂𝒏𝒅 ( 𝒙 ∈ 𝑻 )}
 The symbol ∧ represents the Boolean symbol “and”. Using this we can rewrite

𝑺 ∩ 𝑻 = {𝒙: ( 𝒙 ∈ 𝑺 ) ∧ ( 𝒙 ∈ 𝑻 )}

Example 3: Intersections of sets


𝑆𝑢𝑝𝑝𝑜𝑠𝑒 𝑆 = {1, 2, 3, 5}, 𝑇 = {1,3,4,5}, 𝑎𝑛𝑑 𝑈 = {2,3,4,5}. Determine
 S∩T = {},
{1, 3, 5}

 S∩U= {}
{2, 3, 5}

 T∩U= {}
{3, 4, 5}

 S∩T∩U = {}
{3, 5}

Disjoint Sets
Definition 6: If A and B are sets and 𝐴 ∩ 𝐵 = ∅ then we say that A and B are disjoint, or disjoint
sets.
Union of Sets

20
Definition 7: The union of two sets S and T is the collection of all objects that are in either set. It
is written S ∪ T.

Using curly brace notion

𝑺 ∪ 𝑻 = {𝒙 : (𝒙 ∈ 𝑺) 𝒐𝒓 (𝒙 ∈ 𝑻 )}

Symbolic equivalent of “OR” is ∨ which lets us re-write the definition of union as:

𝑺 ∪ 𝑻 = { 𝒙: ( 𝒙 ∈ 𝑺 ) ∨ ( 𝒙 ∈ 𝑻 )}

Universal Set
Definition 8: The universal set is the set of all possible objects (at least for a given collection of
set theoretic computations). The universal set is commonly written U

Example: If we declare our universal set to be the set of integers then { 1⁄2 , 2⁄3 } is not a well-
defined set because the objects used to define it are not members of the universal set.

Solution: The symbols {1⁄2 , 2⁄3 } do define a set if a universal set that includes 1⁄2 and 2⁄3 is
chosen. The problem arises from the fact that neither of these numbers are integers.

Compliment of a set:
Definition 9: The compliment of a set S is the collection of objects in the universal set that are not
in S.

The compliment is written Sc.

𝑆 𝑐 = { 𝑥: ( 𝑥 ∈ 𝑈 ) ∧ ( 𝑥 ∉ 𝑆 )}

or more compactly as
𝑆𝑐 = { 𝑥 : 𝑥 ∉ 𝑆 }
21
However, it should be clear that the compliment of a set always depends on which universal set is
chosen.
There is also a Boolean symbol associated with the complimentation operation which is the NOT
operation.
The notation for NOT is ¬
The definition for compliment of a set S then becomes:

𝑆𝑐 = { 𝑥 : ¬ ( 𝑥 ∈ 𝑆 ) }

Difference of sets:
Definition 10: The difference of two sets S and T is the collection of objects in S that are not in T.

The difference is written as S − T.

In curly brace notation


𝑆 − 𝑇 = { 𝑥 : 𝑥 ∈ ( 𝑆 ∩ ( 𝑇 𝑐 ) ) },

Or alternately,
𝑆 − 𝑇 = {𝑥: (𝑥 ∈ 𝑆) ∧ (𝑥 ∉ 𝑇)}

Subsets:
Definition 11: For two sets S and T we say that S is a subset of T if each element of S is also an
element of T.

In formal notation S ⊆ T if for all x ∈ S we have x ∈ T.

If S ⊆ T, then we also say T contains S which can be written T ⊇S.


If S ⊆ T and S ≠ T, then we write S ⊂ T and we say S is a proper subset of T.

Example 9: Subsets
If A = {a, b, c}, then A has eight different subsets. Write down the 8 subsets

22
Solution: The subsets are {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}, {} (or ∅)
Notice that A ⊆ A and in fact each set is a subset of itself. The empty set ∅ is a subset of every set.

2.5.1 Propositions:
Proposition 1: Two sets are equal if and only if each is a subset of the other. In symbolic notation:
(𝐴 = 𝐵) ⇔ ( 𝐴 ⊆ 𝐵 ) ∧ ( 𝐵 ⊆ 𝐴)

Proposition 2: De Morgan’s Laws: Suppose that S and T are sets. De Morgan’s Laws state that

(𝑆 ∪ 𝑇)𝑐 = 𝑆 𝑐 ∩ 𝑇 𝑐

(𝑆 ∩ 𝑇 )𝑐 = 𝑆 𝑐 ∪ 𝑇 𝑐
Venn Diagrams: A Venn diagram is a way of depicting the relationship between sets. Each set is
shown as a circle and circles overlap if the sets intersect.

Some examples of various set theory concepts through Venn diagrams are given in the figure
below:

23
2.6 Probability Theory
Probability is a measurement of the chance of some specified outcome happening. Probability is
usually measured as a percentage (10%, 20% etc. up to 100%) or as a fraction between 0 and 1

Example: Consider a standard “die”.

The probability of getting an even number is ½ (0.5).

The probability of getting a number greater than 4 is 2/6 = 1/3

The probability of getting a number greater than 2 is 4/6 = 2/3

Independent and Dependent events:


Two events are independent if the outcome of one event does not affect the probability of the
other event.

Example of independent events: The results of multiple coin flips. Each coin flip is independent
of the previous coin flip. There is no influence or impact of the previous coin flip on the subsequent
flip. Hence such events are termed as independent.

Example of dependent events: The results of drawing various cards from the same deck are
dependent, if we assume that the cards are not added back to the deck after being drawn. This is
also known as “without replacement”.

When the first card is drawn from a deck, there is one less card in the deck.

Therefore, the outcome affects the probability of drawing other cards from the deck because it
reduces the possible outcomes.

Rule of Product: When calculating the probability that multiple independent events will all occur,
the probabilities are multiplied. This is the rule of product.

24
Example: What is the chance of rolling a sum of 9 followed by a sum of 10 on a pair of traditional
dice?

Solution: The probability of a sum of 9 is 4/36


• The probability of a sum of 10 is 3/36.
• The overall probability of a 9 followed by a 10 is 4/36 × 3/36.
• Note: Because two die rolls do not affect each other (they are independent), multiplication is
allowed.

The Rule of Sum: The probability that a composite event of multiple dependent events will occur
is a sum of the probabilities of each individual event, as long as the parts cannot occur at the same
time.

Example: What is the chance of rolling a 1, 2, 3, or 4 on a single traditional die?


• A 1, 2, 3 or 4 occurs on a single throw with chance 1/6, and each part is independent, so
the chance overall is the sum of all four probabilities:
• 1/6 + 1/6 + 1/6 + 1/6 = 4/6 = 2/3

Rule of thumb to know when to apply rule of sum or product:


• Rule of Product applies when an event AND another event occur, while
• Rule of Sum applies when an event OR another event occurs.

Conditional Probability
A conditional probability is a probability that is calculated with knowledge about the outcome of
some other event.
For instance, a question might be:
• We have a bag with 3 red marbles and 2 blue marbles
• We pick out one red marble without replacing it
• What is the probability that the next pick will also be a red marble?
Note: In this case, the odds that our second pick will be a red marble are dependent on what the
first pick was.

25
Whenever additional knowledge is given about the outcome of an experiment, it potentially affects
probability.

Conditional Probability – Formal Definition


Let A and B be two events.

Conditional probability is denoted as P(A∣B)

It is read as "the probability of A given B", and can be calculated as

𝑃(𝐴 ∩ 𝐵)
𝑃(𝐴|𝐵) =
𝑃(𝐵)

This calculation can be read as "the probability of A given B is the probability of both A and B (the
intersection of A and B) divided by the probability of B.”

Bayes Theorem
Bayes theorem is a way to relate conditional probabilities to each other. It follows directly from
the definition of conditional probability, with some algebraic rearrangement.

Let A and B be events, then 𝑃(𝐴|𝐵) and 𝑃(𝐵|𝐴) are related by the following equations

𝑃(𝐵|𝐴)𝑃(𝐴)
𝑃(𝐴|𝐵) =
𝑃(𝐵)

𝑃(𝐴|𝐵)𝑃(𝐵)
𝑃(𝐵 |𝐴) =
𝑃(𝐴)

2.7 Markov Chains


A Markov chain is a mathematical system that experiences transitions from one state to another
according to certain probabilistic rules.

26
Markov chains are widely used in economics, game theory, queueing (communication) theory,
genetics, and finance.
The figure below represents a two-state Markov process. A and B are called as states of the Markov
process. Each number represents the probability of the Markov process changing from one state to
another state, with the direction indicated by the arrow from one state to another.
For example, if the Markov process is in state A, then the probability that it changes to state B is
0.7, while the probability it remains in state A is 0.3. Similarly, the probability that the Markov
process remains in state B is 0.6, while the probability that it changes to state A is 0.4. It is
important to remember that the sum of all the probabilities that go out of a particular state
(including the probability that it remains in the same state) should always be equal to 1.

Markov Chains – Example


Everyday there are two possibilities in terms of the weather. It rains or it does not rain. If there is
no rain on a particular day, then there is a 20% possibility that it will rain the next day. If it does
rain on that particular day however, there is a 30% chance that it will rain the following day. Now
if there's a 50% probability that it rains today, what is the probability that it will it rain tomorrow?
Solution: We first convert this textual problem into a Markov process, which simplifies the
problem and the mathematics.

27
We get the following Markov process

Let’s now try to solve the problem:


Let’s assume 𝑝 = 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑡ℎ𝑎𝑡 𝑖𝑡 𝑟𝑎𝑖𝑛𝑠
3 1
Then the total probability that it rains tomorrow is 𝑝. 10 + (1 − 𝑝). 5
1
It is given that 𝑝 = 2
1 3 1 1 1
Thus, probability that it will rain tomorrow = 2 . 10 + 2 . 5 = 4

Markov Chains – Transition Matrix


Another version of Markov chains is to encode the information in a matrix (also called a transition
matrix).
Let us consider the previous example with two states - rainy, and sunny. The associated transition
matrix associates each row and column with the state raining or not raining. The entry in each row
and column is the probability of transitioning from the state the row represents to the state the
column represents. With a 2 by 2 transitional matrix using states A (Sunny) and B (Rainy), the
matrix is filled as shown in the figure below, where each entry identifies the probability of a
particular state that transitions to another state.

28
End
A B
A
Start
B
State Transition Diagram and Transition probability Matrix
A 3 state Markov chain shown by a state transition diagram.
Consider a Markov chain with three possible states 1, 2, and 3 and the following transition
probability matrix
1 1 1
4 2 4
1 2
𝑃= 0
3 3
1 1
(2 0
2)
The corresponding state transition diagram is shown in the figure below

29
2.8 Graph Theory
Historical perspective of Graph Theory: Seven bridges of Königsberg (Euler)
The history of graph theory may be specifically traced to 1735, when the Swiss
mathematician Leonhard Euler solved the Königsberg bridge problem. The Königsberg bridge
problem was an old puzzle concerning the possibility of finding a path over every one of seven
bridges that span a forked river flowing past an island—but without crossing any bridge twice.
Euler argued that no such path exists. His proof involved only references to the physical
arrangement of the bridges, but essentially, he proved the first theorem in graph theory.

A graph is a set of vertices (that is, points or nodes) and edges (or lines) that connect the vertices.
The figure below shows various types of graphs.
A graph without loops and with only one edge between any two vertices is called a simple graph
(a).
A graph in which any two vertices are joined by more than one edge is called a multigraph (b).
When each vertex is connected by an edge to every other vertex, the graph is called a complete
graph (c).
A directed graph (d) is one in which the vertices are given a direction.
Graph theory is used in several disciplines including computer science, biology, chemistry and so
on.

30
Graph Theory – A Real Life Problem
Traveling salesman problem, an optimization problem in graph theory in which the nodes
(cities) of a graph are connected by directed edges (routes), where the weight of an edge indicates
the distance between two cities. The problem is to find a path that visits each city once, returns to
the starting city, and minimizes the distance traveled. The only known general
solution algorithm that guarantees the shortest path requires a solution time that grows
exponentially with the problem size (i.e., the number of cities). This is an example of an NP-
complete problem (from nonpolynomial), for which no known efficient (i.e., polynomial
time) algorithm exists.

31
Exercise
Problems and Questions
1. There are 10 people in a room who have come for lunch. Since there is place only for 3 on
a table, the waiter needs to randomly invite 3 out of these 10 to the table. What are the
different combinations that the waiter can choose from?
2. A combination lock has 4 sets of numbers, each going from 0 to 9. What is the number of
ways in which one can set a numerical lock for the same?
3. Out of a possible 7 consonants and 4 vowels one needs to form words having 3 consonants
and 2 vowels. What are the number of possible ways in which this can be accomplished?
4. We have two sets. Set A has 25 objects, set B has 30 objects. Their union has 45 objects.
How many objects will be there in the intersection of A and B?
5. There are a group of 50 friends who have come out for a party to a hotel. In the group, 20
like coca-cola, and 30 like tea or coffee. Every friend likes at least one of these cold or hot
drinks. Determine the number of friends who like the hot beverage.
6. Find out the probability of getting the sum of 9 when two dice are tossed simultaneously.
7. A person has three jobs that he does daily in 10 hours. He spends 70% of his time on his
day job. He spends 20% time on the second job and the remaining 10% time on the third
job. The probability that he takes a coffee in his first job is 1%, while it is 2% on his second
job and 5% on the third job since he keeps getting increasingly tired by the time he is on to
his 3rd job. What is the probability that he is working on the second job while taking coffee?

32
References
1. Discrete Maths – One Semester course on Sets, Logic, Proofs, Probability, Graph theory by
Dr. Trefor Bazett: https://bit.ly/3uSZFEt
2. 6.041 Probabilistic Systems Analysis and Applied Probability: https://bit.ly/3PxVQMZ
3. https://www.mathsisfun.com/combinatorics/combinations-permutations.html
4. https://www.electronics-tutorials.ws/boolean/demorgan.html
5. https://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg

33
MODULE 3 DIFFERENTIAL CALCULUS

Learning Objectives
 Understand and comprehend limits of a function of one independent variable
 Understand and evaluate continuity of functions of one independent variable
 Develop a good foundation on differentiability of functions of one independent variable
 Understand the meaning of the slope / tangent / rate of change of a function of one
independent variable
 Evaluate the derivatives of functions of one and multiple independent variables.
 Understand and calculate local and global minima and maxima of functions of one and
multiple independent variables.
 Apply principles of maxima and minima to optimization problems.

Unit Content Summary

This unit gives a detailed understanding of the principles of differential calculus and its
applications in data science. Starting from limits, continuity, and differentiability of a function of
one independent variable, we discuss extrema of functions and their applications to optimization
problems which are among the most common problems to be solved in data science.

3.1 Limit of a function

Let us consider the following two functions and their corresponding graphs

Example 1: 𝑓(𝑥) = 0.2𝑥 3 − 𝑥 2 + 2

34
Figure 5: Graph of function f(x)

+𝟐, 𝒙 ≥ 𝟎
Example 2: 𝑔(𝑥) = {
−𝟐, 𝒙 ≤ 𝟎

Figure 6: Graph of function g(x)

With reference to the Example 1 and Example 2, we now consider the concept of limit of a
function.

Example 1: Let us talk about approaching the limit from both sides in Figure 5
In Figure 5, as 𝒙 𝒂𝒑𝒑𝒓𝒐𝒂𝒄𝒉𝒆𝒔 𝟎 from either side (that is from the right and left sides of 0)

lim[𝑓(𝑥)] = 2
𝑥→0

As another example, as 𝒙 𝒂𝒑𝒑𝒓𝒐𝒂𝒄𝒉𝒆𝒔 𝟑. 𝟑𝟑 from either side,

35
lim [𝑓(𝑥)] = −1.7
𝑥→3.33

Example 2: Let us consider the discontinuous function in Figure 6. As 𝑥 𝑎𝑝𝑝𝑟𝑜𝑎𝑐ℎ𝑒𝑠 0 from the
right side in Figure 6, 𝑔(𝑥) 𝑎𝑝𝑝𝑟𝑜𝑎𝑐ℎ𝑒𝑠 2. This is called a one-sided limit.

But as 𝑥 𝑎𝑝𝑝𝑟𝑜𝑎𝑐ℎ𝑒𝑠 0 from the left side in the figure, 𝑔(𝑥) 𝑎𝑝𝑝𝑟𝑜𝑎𝑐ℎ𝑒𝑠 − 2. The limit does not
exist, therefore, since 𝑔(𝑥) does not approach a single number as 𝒙 𝒂𝒑𝒑𝒓𝒐𝒂𝒄𝒉𝒆𝒔 𝟎 from both
sides

Definition of a Limit: If the values of the function 𝑓(𝑥) of a function 𝑓 get closer to a single
number 𝐿 for all values of 𝑥, as 𝑥 gets closer to but does not equal 𝑎, then 𝐿 is defined as the limit
of 𝑓(𝑥) as 𝑥 approaches 𝑎, and is written as

𝐥𝐢𝐦 𝒇(𝒙) = 𝑳
𝒙→𝒂

3.2 Continuous and Discontinuous Functions


With reference to the graph of the function f(x) in Figure 5, we see that one can traverse from left
to right of the graph without any break. However, with reference to the graph of the function g(x)
in Figure 6, there is a sudden jump from -2 to +2 as one moves from the negative x-axis to the
positive x-axis. The function in Figure 5, is said to be continuous while the function in Figure 6 is
said to be discontinuous.

A continuous function is one whose graph can be drawn without lifting pencil from paper.
Formally, a function f is said to be continuous at x = a, if each of the following three conditions
are satisfied:

𝑓(𝑥) 𝑖𝑠 𝑑𝑒𝑓𝑖𝑛𝑒𝑑, 𝑡ℎ𝑎𝑡 𝑖𝑠, 𝑒𝑥𝑖𝑠𝑡𝑠, 𝑎𝑡 𝑥 = 𝑎

lim 𝑓(𝑥) exists


𝑥→𝑎

36
lim 𝑓(𝑥) = 𝑓(𝑎)
𝑥→𝑎

3.2.1 Slope of a curvilinear function:


The slope of a linear function is the same at every point on the line. The slope of a curvilinear
function, however, differs at different points on the curve. Geometrically, slope of a curvilinear
function at a given point is measured by the slope of a line drawn tangent to the function at that
point. To measure the slope of a curvilinear line at different points, separate tangent lines are
needed. In geometry, a tangent line to a circle or any conic section, is a straight line that touches
the circle only at one point. Figure 7 below provides a graphical view of the slope / tangent of a
curvilinear function as explained above.

12

10

8
A
∆𝑦A
6

∆𝑥𝐴
4

0
0 20 40 60 80 100 120

Figure 7: Slope / Tangent of a curvilinear function

3.2.2 Differentiability and Continuity


A function is said to be differentiable at a point if the derivative of the function exists at that point.
For a function to be differentiable at a specific point the following two conditions need to be
satisfied

37
a. The function needs to be continuous at the identified point
b. It must have a tangent which is unique at that point.

3.3 Derivative of a function of one independent variable


Let us consider a function 𝑦 = 𝑓(𝑥). The derivative of the function 𝑓 𝑎𝑡 𝑥 which is written as
𝑑𝑦
𝑓 ′ 𝑜𝑟 𝑑𝑥 is defined as

[𝑓(𝑥1 + ∆𝑥) − 𝑓(𝑥1 )]


𝑓 ′ (𝑥) = lim 𝑖𝑓 𝑡ℎ𝑒 𝑙𝑖𝑚𝑖𝑡 𝑒𝑥𝑖𝑠𝑡𝑠
∆𝑥→0 ∆𝑥

The derivative of a function measures the slope and the instantaneous rate of change of the original
function 𝑓(𝑥) at a given point. The process of finding the derivative of a function is called as
differentiation

The derivative of a function can be written in several different ways, with many symbols which
also includes some of the Greek alphabets.

Thus, if 𝑦 = 𝑓(𝑥), the derivative can be written as

𝑑𝑦 ′ 𝑑𝑓 𝑑
𝑓 ′ (𝑥), , 𝑦 , , [𝑓(𝑥)]
𝑑𝑥 𝑑𝑥 𝑑𝑥

If 𝑦 = 𝜙(𝑡), then we can write the derivative as

𝑑𝑦 ′ 𝑑𝜙 𝑑
𝜙 ′ (𝑡), ,𝑦 , , [𝜙(𝑡)]
𝑑𝑥 𝑑𝑥 𝑑𝑥

If the derivative of 𝑦 = 𝑓(𝑥) is evaluated at 𝑥 = 𝑎, proper notation includes

𝑑𝑦
𝑓 ′ (𝑎) 𝑎𝑛𝑑
𝑑𝑥

38
3.3.1 Rules of differentiation:
By differentiation, one means finding the derivative of a given function 𝑦 = 𝑓(𝑥) . There are
several rules to find the derivative of functions. In listing the rules of differentiation for the function
𝑓(𝑥) it should be noted that the function could be some combination of some other functions which
we can generalize as 𝑔(𝑥)𝑎𝑛𝑑 ℎ(𝑥). These are unspecified functions of the independent variable
𝑥.

Some of the basic ones are listed below.

a. The constant function rule:


The derivative of a constant function 𝑓(𝑥) = 𝑐 𝑤ℎ𝑒𝑟𝑒 𝑐 𝑖𝑠 𝑎 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡 is zero.
Mathematically we write this as follows:

Given 𝒇(𝒙) = 𝒄, 𝒇′ (𝒙) = 𝟎

b. The Linear Function rule:


The derivative of a linear function 𝑓(𝑥) = 𝑚𝑥 + 𝑐 is equal to the coefficient of 𝑥, which
is 𝑚 in the linear function 𝑓(𝑥) given above.

In general, the derivative of a variable 𝑥 raised to the first power is always equal to the
coefficient of the variable.

Mathematically we write this as follows:

Given 𝒇(𝒙) = 𝒎𝒙 + 𝒄, 𝒇′ (𝒙) = 𝒎

c. The Power Function rule:


When the variable 𝑥 is raised to a power 𝑛 where n is any integer either positive or negative,
then the derivative is calculated as follows:

Given 𝒇(𝒙) = 𝒙𝒏 , 𝒇′ (𝒙) = 𝒏𝒙𝒏−𝟏

39
Some examples of the fundamental rules of differentiation
a. Given 𝑓(𝑥) = 4, 𝑓 ′ (𝑥) = 0
b. Given 𝑓(𝑥) = 5𝑥 + 9, 𝑓 ′ (𝑥) = 5
c. Given 𝑓(𝑥) = 𝑥10 , 𝑓 ′ (𝑥) = 10𝑥 (10−1) = 10𝑥 9

3.3.2 Rules of differentiation for more complex functions

The following rules of differentiation apply to a function 𝑓(𝑥) which is a combination of two
functions 𝑔(𝑥)𝑎𝑛𝑑 ℎ(𝑥).

d. Sum and difference rule:


For 𝑓(𝑥) = 𝑔(𝑥) ± ℎ(𝑥), 𝑓 ′ (𝑥) = 𝑔′ (𝑥) ± ℎ′ (𝑥)

e. Product rule:
For 𝑓(𝑥) = 𝑔(𝑥) × ℎ(𝑥), 𝑓 ′ (𝑥) = 𝑔(𝑥). ℎ′ (𝑥) + 𝑔′ (𝑥). ℎ(𝑥)

f. Quotient rule:
[ℎ(𝑥).𝑔′ (𝑥)−𝑔(𝑥).ℎ′ (𝑥)]
Given 𝑓(𝑥) = 𝑔(𝑥) ÷ ℎ(𝑥), 𝑓 ′ (𝑥) = [ℎ(𝑥)2 ]

g. The generalized power function rule


Given 𝑓(𝑥) = [𝑔(𝑥)]𝑛 , 𝑓 ′ (𝑥) = 𝑛[𝑔(𝑥)]𝑛−1 . 𝑔′ (𝑥)

h. The chain rule:


For 𝑓(𝑥) = 𝑔[ℎ(𝑥)], 𝑓 ′ (𝑥) = 𝑔′ [ℎ(𝑥)]. ℎ′ (𝑥)

Examples for Rules of Differentiation for Complex Functions

The constant times a function rule


Given 𝑓(𝑥) = 6𝑥 4 , 𝑓 ′ (𝑥) = 𝑘𝑔′ (𝑥) = 6 × 4𝑥 3 = 24𝑥 3

40
Sum and difference rule:
Given 𝑓(𝑥) = 8𝑥 4 + 5𝑥 3 , 𝑓 ′ (𝑥) = 32𝑥 3 + 15𝑥 2

Given 𝑓(𝑥) = 3𝑥 5 − 2𝑥 4 − 3, 𝑓 ′ (𝑥) = 15𝑥 4 − 8𝑥 3

Product rule:
Given 𝑓(𝑥) = 8𝑥 2 (4𝑥 − 3), 𝑓 ′ (𝑥) = 8𝑥 2 (4) + 16𝑥(4𝑥 − 3)
= 32𝑥 2 + (64𝑥 2 − 48𝑥)
= 86𝑥 2 − 48𝑥

Quotient rule:
[ℎ(𝑥).𝑔′ (𝑥)−𝑔(𝑥).ℎ′ (𝑥)]
Given 𝑓(𝑥) = 𝑔(𝑥) ÷ ℎ(𝑥), 𝑓 ′ (𝑥) = [ℎ(𝑥)2 ]

2𝑥 4 [(5𝑥−6).8𝑥 3 −2𝑥 4 . (5)]


So, for 𝑓(𝑥) = 5𝑥−6, 𝑓 ′ (𝑥) = [(5𝑥−6)2 ]

Leave it to you to further simplify this.

The generalized power function rule


Given 𝑓(𝑥) = [𝑔(𝑥)]𝑛 , say 𝑓(𝑥) = (𝑥 2 + 7)4 , 𝑤ℎ𝑖𝑐ℎ 𝑚𝑒𝑎𝑛𝑠 𝑔(𝑥) = (𝑥 2 + 7)

𝑓 ′ (𝑥) = 𝑛[𝑔(𝑥)]𝑛−1 . 𝑔′ (𝑥),


so for the example, we get 4(𝑥 2 + 7)3 . (2𝑥) = 8𝑥. (𝑥 2 + 7)3

The chain rule:


𝑓(𝑥) = 𝑔[ℎ(𝑥)], 𝑓 ′ (𝑥) = 𝑔′ [ℎ(𝑥)]. ℎ′ (𝑥)
Let f(x) = (8𝑥 + 9)1/2 ,
In this equation, let 𝑔(𝑥) = 𝑥1/2 𝑎𝑛𝑑 ℎ(𝑥) = (8𝑥 + 9)
1 −1/2
𝑔′ (𝑥) = 𝑥 𝑎𝑛𝑑 ℎ′ (𝑥) = 8
2

41
Then, using the expression 𝑓 ′ (𝑥) = 𝑔′ [ℎ(𝑥)]. ℎ′ (𝑥), we get
1
𝑔′ (ℎ(𝑥)) = (2)(8𝑥 + 9)−1/2

And substituting the appropriate values into the chain rule formula, we get
1 4
𝑓 ′ (𝑥) = = (2)(8𝑥 + 9)−1/2 . 8 =
√(8𝑥+9)

3.3.3 Higher-order derivatives


The calculation of higher-order derivatives of a function 𝑓(𝑥) follows the same process as the one
for calculating the first derivative. The second-order derivative, written as 𝑓 ′′ (𝑥) essentially
measures the slope of the function 𝑓 ′ (𝑥) that is the first-order derivative of 𝑓(𝑥) which is called
as the original or primitive function.

3.3.4 Increasing and decreasing Functions


A function 𝑓(𝑥) is said to be increasing at 𝑥 = 𝑎 if the following condition holds. In the area near
the point [𝑎, 𝑓(𝑎)] the graph of the function increases as it moves from left to right.

The function 𝑓(𝑥) is said to be decreasing at 𝑥 = 𝑎 if the following condition holds. In the area
near the point [𝑎, 𝑓(𝑎)] the graph of the function decreases as it moves from left to right.

It is important to realize that in the absence of a graph, the derivatives of a function can be used to
identify important features of the function including the concepts of an increasing and decreasing
functions.

Earlier we understood that the first derivative of a function measures the following two concepts
both of which are equivalent: Rate of change of the function which is also the slope of the function.

Therefore, if the first derivative is positive at 𝑥 = 𝑎, it means that (see Figure 8 below) the
following points hold good:
• 𝑓 ′ (𝑎) > 0
• Its rate of change is positive and

42
• Its slope is positive, that is increasing

160

140

120

100

80

60

40

20

0
-15 -10 -5 0 5 10 15
𝑥 = 𝑎1 𝑥 = 𝑎2

Concave upwards
(Convex) Concave upwards
(Convex)

Figure 8: Increasing Function

If the first derivative is negative at 𝑥 = 𝑎, it means (see Figure 9 below) the following points hold
good:
• 𝑓 ′ (𝑎) < 0
• Its rate of change is negative and
• Its slope is negative that is deceasing

43
20

0
-15 -10 -5 0 5 10 15
-20

-40

-60

-80

-100

-120

-140

-160

𝑥 = 𝑎2
𝑥 = 𝑎1

Concave downwards

Figure 9: Decreasing Function

Concavity for Increasing and decreasing Functions


For increasing and deceasing functions, there is an additional concept of concavity.
 A function is said to be concave upwards (or convex) at the point 𝑥 = 𝑎 if the area close
to the point [𝑎, 𝑓(𝑎)], the graph of the function lies completely above its tangent line.
 A function is said to be concave downwards at 𝑥 = 𝑎 if the area close to the point [𝑎, 𝑓(𝑎)],
the graph of the function lies completely below its tangent line.

44
When we do not have a graph to guide us, the second derivative of a function provides useful
information about concavity
If 𝑓 ′′ (𝑎) > 0, 𝑇ℎ𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑖𝑠 𝑐𝑜𝑛𝑐𝑎𝑣𝑒 𝑢𝑝𝑤𝑎𝑟𝑑 𝑎𝑡 𝑥 = 𝑎
If 𝑓 ′′ (𝑎) < 0, 𝑇ℎ𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑖𝑠 𝑐𝑜𝑛𝑐𝑎𝑣𝑒 𝑑𝑜𝑤𝑛𝑤𝑎𝑟𝑑 𝑎𝑡 𝑥 = 𝑎

The Table 2 below explains the conditions to decide on increasing/decreasing functions along with
their concavity:

Table 2: Increasing/Decreasing and Concavity of functions

3.3.5 Extremum (Minimum or Maximum) of a function


An extremum point of a function is a point where the value of the function is either a relative
maximum or a relative minimum. In the usual case, it is not possible to draw a graph of a function
to determine whether the function has any relative maxima or minima points. In such cases, one
takes recourse to a set of mathematical steps to determine points of extrema

The mathematical steps are the following:

Step 1: For a function 𝑓(𝑥) to be at a relative maximum or minimum at a point 𝑥 = 𝑎, its first
derivative must equal zero at a. This point is called as the critical point.

45
Step 2: To specifically determine if at point a, the function is a maximum or a minimum, we
calculate the 2nd derivative of the function at point a.

If 𝑓 ′′ (𝑎) > 0, the function is at a relative minimum


If 𝑓 ′′ (𝑎) < 0, the function is at a relative maximum
If 𝑓 ′′ (𝑎) = 0, one cannot determine whether the function is at a maximum or a minimum

Figure 10 and Figure 11 below give a graphical perspective of a function at relative minimum and
relative maximum.

160

140

120

100

80

60

40

20

0
-15 -10 -5 0 5 10 15

Relative minimum
at x = a

Figure 10: Relative minimum of a function

46
Relative maximum
at x = a

20

0
-15 -10 -5 0 5 10 15
-20

-40

-60

-80

-100

-120

-140

-160

Figure 11: Relative maximum of a function

3.3.6 Inflection Point of a function


Inflection point is a very interesting feature of a function. This is the point at which the function
changes from a concave upward to a concave downwards or the opposite happens (that is the
function changes from a concave downwards to a concave upward). It should be noted that an
inflection point can be determined mathematically when the 2nd derivative of a function is zero.

This is shown in Figure 12 below. In 3 dimensions, one talks about an inflection point as well as
a saddle point, where in one of the dimensions the function gives a maximum, while in the other
dimension the function results in a minimum.

47
Figure 12: Graphical representation of an Inflection point

An example of an inflection point at x = 0, for the function 𝑦 = 𝑥 3 is shown in Figure 13 below.

𝑦 = 𝑥3

Figure 13: An example of an inflection point at x = 0, for the function 𝒚 = 𝒙𝟑

3.3.7 Optimization using differentiation


Optimization problems are amongst the most important and common across various industries,
including Retail, Manufacturing, Healthcare and so on. The most common problems involve

48
optimizing the supply chain routes for a variety of products, ensuring maximum supplies with
minimum travel time and distance. There are numerous other problems which need optimization
in terms of multiple variables and optimization for cost, quality, or time or a combination of two
or more of these factors. Some examples may be
• maximize profit subject to a capacity limitation of the current production plant
• minimize cost subject to meeting a minimal quota of output

The function which is to be optimized is usually referred to as the objective function. The function
setting the limitation is called the constraint. By expressing one variable in the constraint in terms
of the other and substituting the new expression back into the objective function, one can optimize
a function subject to a constraint using ordinary differentiation.

From a differential calculus perspective, an optimization problem involves finding the relative
maximum or minimum of a function of one or usually multiple independent variables and
constraints. The process of finding the extrema is well known as we have explained in an earlier
section involving the calculation of the first and second derivatives of the function.

Let us consider a few examples to understand the concept of constrained optimization.

Example 1: Optimize 𝑦 = 3𝑥 3 − 36𝑥 2 + 135𝑥 − 17


Solution:
We simply need to test the first and the second derivative conditions, as mentioned.

a. Take the first derivative, set it equal to zero and solve for 𝑥0 to find the critical values
• 𝑓 ′ (𝑥) = 9𝑥 2 − 72𝑥 + 135
• = 9(𝑥 − 3)(𝑥 − 5) = 0
• So critical values are 𝑥 = 3 𝑎𝑛𝑑 𝑥 = 5
b. Now take the second derivative
• 𝑓 ′′ (𝑥) = 18𝑥 − 72
• Evaluate it at the critical values that is at 𝑥 = 3 𝑎𝑛𝑑 𝑥 = 5

49
• 𝑓 ′′ (𝑥 = 3) = −18 < 0, which means the function at this point is concave down,
and is a relative maximum
• 𝑓 ′′ (𝑥 = 5) = 18 > 0, which means the function at this point is concave up, and is
a relative minimum

Example 2: Optimize (maximize) profits 𝜋 for a firm, given the total revenue 𝑅 = 6400𝑄 − 20𝑄 2
and the total cost 𝐶 = 𝑄 3 − 5𝑄 2 + 400𝑄 + 52000, assuming 𝑄 > 0, find
a. The profit function 𝜋 = 𝑅 − 𝐶
b. Find the critical values
c. Take the 2nd derivative and evaluate it at the critical points.

Solution:
a. The profit function 𝜋 = 𝑅 − 𝐶
• 𝜋 = 6400𝑄 − 20𝑄 2 − (𝑄 3 − 5𝑄 2 + 400𝑄 + 52000)
b. Find the critical values: Take the first derivative, set it to zero and solve for 𝑄0 to find the
critical values
• 𝜋 ′ = − 3𝑄 2 − 30𝑄 + 6000 = − 3(𝑄 2 + 10𝑄 − 2000) = 0

− 3(𝑄 + 50)(𝑄 − 40) = 0

So 𝑄0 = −50 𝑜𝑟𝑄0 = 40 are the critical values

c. Take the 2nd derivative


• evaluate it at the positive critical point,
• ignore the negative critical value which has no economic significance
(mathematically it is the relative minimum, which in this problem is not relevant)
• Check the sign for concavity to be sure of a relative minimum
• 𝜋 ′′ = − 6𝑄 − 30
• So 𝜋 ′′ (40) = − 6 × 40 − 30 = −270 < 0. So the curve is concave down, relative
maximum

50
• Thus, profit is maximized at 𝑄 = 40, where 𝜋(40) = −(40)3 − 15 (40)2 +
6000(40) − 52000 = 100,000

3.3.8 Derivatives of Natural Exponential and Logarithmic functions

Given 𝑓(𝑥) = 𝑒 𝑔(𝑥) 𝑤ℎ𝑒𝑟𝑒 𝑔(𝑥) 𝑖𝑠 𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑡𝑖𝑎𝑏𝑙𝑒

𝑓 ′ (𝑥) = 𝑒 𝑔(𝑥) . 𝑔′ (𝑥)

Given 𝑓(𝑥) = ln[𝑔(𝑥)] , 𝑤ℎ𝑒𝑟𝑒 𝑔(𝑥) 𝑖𝑠 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒 𝑎𝑛𝑑 𝑑𝑖𝑓𝑓𝑒𝑟𝑛𝑡𝑖𝑎𝑏𝑙𝑒

1
𝑓 ′ (𝑥) = 𝑔(𝑥). 𝑔′ (𝑥);

𝑔′ (𝑥)
𝑇ℎ𝑎𝑡 𝑖𝑠 𝑓 ′ (𝑥) = 𝑔(𝑥)

Examples of the derivatives of natural exponential functions


Given 𝑓(𝑥) = 𝑒 𝑔(𝑥) 𝑤ℎ𝑒𝑟𝑒 𝑔(𝑥) 𝑖𝑠 𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑡𝑖𝑎𝑏𝑙𝑒
𝑓 ′ (𝑥) = 𝑒 𝑔(𝑥) . 𝑔′ (𝑥)

Example 1: 𝑓(𝑥) = 𝑒 𝑥 , 𝑡ℎ𝑒𝑛 𝑔′ (𝑥) = 1, therefore


𝑓 ′ (𝑥) = 𝑒 𝑥 . The derivative of 𝑒 𝑥 is the original function itself.
Example 2: 𝑓(𝑥) = 𝑒 5𝑥−2 , 𝑔′ (𝑥) = 5
𝑓 ′ (𝑥) = 5𝑒 5𝑥−2
5
Example 3: 𝑓(𝑥) = 𝑒 4𝑥 , 𝑔′ (𝑥) = 20𝑥 4

5
So 𝑓 ′ (𝑥) = 20𝑥 4 . 𝑒 4𝑥

Examples of the derivatives of natural logarithmic functions


Given 𝑓(𝑥) = ln[𝑔(𝑥)] , 𝑤ℎ𝑒𝑟𝑒 𝑔(𝑥) 𝑖𝑠 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒 𝑎𝑛𝑑 𝑑𝑖𝑓𝑓𝑒𝑟𝑛𝑡𝑖𝑎𝑏𝑙𝑒

51
𝑔′ (𝑥)
𝑓 ′ (𝑥) = 𝑔(𝑥)

Example 1: 𝑓(𝑥) = ln 𝑥 , 𝑠𝑜 𝑔(𝑥) = 𝑥, and 𝑔′ (𝑥) = 1

1
So 𝑓 ′ (𝑥) = 𝑥

Example 2: 𝑓(𝑥) = ln(7𝑥 2 − 15), 𝑡ℎ𝑒𝑛 𝑔(𝑥) = 7𝑥 2 − 15 𝑎𝑛𝑑 𝑔′ (𝑥) = 14𝑥.


14𝑥
So 𝑓 ′ (𝑥) = 7𝑥 2 −15

Exponential and logarithmic functions are embedded in many practical applications across various
areas of science, engineering, economics, finance, social sciences and many industry verticals.
Both the exponent and ln can be used interchangeably to solve real life problems
Some examples are:
• decay of radioactive materials
• usage of the exponent in electromagnetic theory
• Percentage rate of change in literacy levels of a developing country
• Usage in machine learning and deep learning problems

3.4 Functions of two independent variables


For most of the problems in real life, functions usually depend on multiple independent variables.
We can define 𝑧 = 𝑓(𝑥, 𝑦) as a function of two independent variables 𝑥 𝑎𝑛𝑑 𝑦 and if 𝑧 has only
one value for a pair of real values of (𝑥, 𝑦) (also called as an ordered pair in the domain of the
function 𝑓

In case of a function 𝑦 = 𝑓(𝑥) that depends only on one independent variable x, the domain of
the function f(x) is defined to be the set of all x values for which the function is defined, and
the range of the function is the set of all 𝑦 values that 𝑓 takes. (Please refer to
https://www.intmath.com/functions-and-graphs/2a-domain-and-range.php for more information
on domain and range of a function).

52
By convention, 𝑧 is called the dependent variable, and 𝑥 𝑎𝑛𝑑 𝑦 are the independent variables.

Examples of some multivariable functions are


𝑧 = 𝑥 2 + 𝑦 2 + 2𝑥𝑦
𝑧 = 3𝑥𝑦 + 𝑥 3 + ln(𝑦 − 2)

3.4.1 1st order Partial Derivatives of a function of two independent variables


Let us consider a function 𝑧 = 𝑓(𝑥, 𝑦) which depends on two independent variables x and y. Then
the 1st order partial derivative gives the measures the rate of change of the dependent variable (𝑧)
with respect to one of the independent variables (𝑥 𝑜𝑟 𝑦) while the other independent variable
(𝑦 𝑜𝑟 𝑥) is kept constant. Therefore, a partial derivative exists for each independent variable.

The partial derivative of 𝑧 with respect to 𝑥 measures the rate of change of 𝑧 with respect to 𝑥
while 𝑦 is held constant.

𝜕𝑧 𝜕𝑓
It is written as 𝜕𝑥 or 𝜕𝑥 , 𝑓𝑥 (𝑥, 𝑦) or 𝑧𝑥 and is defined as

𝜕𝑧 (𝑓(𝑥 + ∆𝑥, 𝑦) − 𝑓(𝑥, 𝑦))


= lim
𝜕𝑥 ∆𝑥→0 ∆𝑥

Similarly, the partial derivative of 𝑧 with respect to 𝑦 measures the rate of change of z with
respect to y while x is kept constant

𝜕𝑧 (𝑓(𝑥, 𝑦 + ∆𝑦, ) − 𝑓(𝑥, 𝑦))


= lim
𝜕𝑦 ∆𝑦→0 ∆𝑦

Example - Partial Derivatives


The partial derivatives of a function 𝑧 = 5𝑥 3 𝑦 4 are found as follows:
When differentiating with respect to x, treat the y term as a constant by joining it with the constant
term, in this case 5. So, we can rewrite the function as 𝑧 = (5𝑦 4 )𝑥 3. Thus

53
𝜕𝑧 𝜕(𝑥 3 )
• 𝑓𝑥 = 𝜕𝑥 = (5𝑦 4 ) = (5𝑦 4 ). 3𝑥 2 = 15𝑥 2 𝑦 4
𝜕𝑥

Similarly, when differentiating with respect to y, treat the x term as a constant by joining it with
the constant term. So, we can rewrite the function as 𝑧 = (5𝑥 3 )𝑦 4. Thus

𝜕𝑧 𝜕(𝑦 4 )
• 𝑓𝑦 = 𝜕𝑦 = (5𝑥 3 ) = (5𝑥 3 ). 4𝑦 3 = 20𝑥 3 𝑦 3
𝜕𝑦

Rules of Partial Differentiation


Following are some of the important rules of partial derivatives:

Product Rule
• Given 𝑓(𝑥, 𝑦) = 𝑧 = 𝑔(𝑥, 𝑦). ℎ(𝑥, 𝑦)
𝜕𝑧 𝜕ℎ 𝜕𝑔
𝑓𝑥 = = 𝑔(𝑥, 𝑦) + ℎ(𝑥, 𝑦)
𝜕𝑥 𝜕𝑥 𝜕𝑥
𝜕𝑧 𝜕ℎ 𝜕𝑔
𝑓𝑦 = = 𝑔(𝑥, 𝑦) + ℎ(𝑥, 𝑦)
𝜕𝑦 𝜕𝑦 𝜕𝑦

Quotient Rule
• Given 𝑓(𝑥, 𝑦) = 𝑧 = 𝑔(𝑥, 𝑦)/ℎ(𝑥, 𝑦)
𝜕𝑔 𝜕ℎ
𝜕𝑧 (ℎ(𝑥, 𝑦) 𝜕𝑥 − 𝑔(𝑥, 𝑦) 𝜕𝑥 )
=
𝜕𝑥 [ℎ(𝑥, 𝑦)]2
𝜕𝑔 𝜕ℎ
𝜕𝑧 (ℎ(𝑥, 𝑦) 𝜕𝑦 − 𝑔(𝑥, 𝑦) 𝜕𝑦)
=
𝜕𝑦 [ℎ(𝑥, 𝑦)]2

3.4.2 Second-order Partial Derivatives


The 2nd order partial derivative implies that the function has been differentiated partially with
respect to one of the independent variables twice while the second independent variable has been
held constant. So the expressions for the 2nd order partial derivatives with respect to either x or y
are given as follows:

54
𝜕 𝜕𝑧 𝜕2 𝑧
• 𝑓𝑥𝑥 = ( ) = 𝜕𝑥 2
𝜕𝑥 𝜕𝑥
𝜕 𝜕𝑧 𝜕2 𝑧
• 𝑓𝑦𝑦 = 𝜕𝑦 (𝜕𝑦) = 𝜕𝑦 2

Cross partial derivatives:


When calculating the cross-partial derivatives, the first derivative will be with respect to one of
the independent variables. The second derivative will then be the differentiation of the first
derivative with respect to the 2nd independent variable. Thus we have two 2nd order cross-partial
𝜕 𝜕𝑧 𝜕 𝜕𝑧
derivatives namely ( ) and ( )
𝜕𝑦 𝜕𝑥 𝜕𝑥 𝜕𝑦

𝜕 𝜕𝑧 𝜕2 𝑧
• 𝑓𝑥𝑦 = (𝑓𝑥 )𝑦 = 𝜕𝑦 (𝜕𝑥) = 𝜕𝑦𝜕𝑥
𝜕 𝜕𝑧 𝜕2 𝑧
• 𝑓𝑦𝑥 = (𝑓𝑦 )𝑥 = 𝜕𝑥 (𝜕𝑦) = 𝜕𝑥𝜕𝑦

Note:
• A cross-partial derivative measures the rate of change of a 1st order partial derivative with
respect to the other independent variable.
• By Young’s theorem, if both the cross partial derivatives are continuous, they will be identical

3.4.3 Optimization of Multivariable functions


For a multivariable function 𝑧 = 𝑓(𝑥, 𝑦) to identify whether it is at a relative minimum or
maximum, three conditions need to be met, namely:

1. The 1st-order partial derivatives must be equal to zero simultaneously


• That is 𝑓𝑥 (𝑎, 𝑏) = 𝑓𝑦 (𝑎, 𝑏) = 0
• The interpretation of this is that at the given point (𝑎, 𝑏), called the critical point,
the function is neither increasing nor decreasing.

2. The 2nd-order direct partial derivatives, when evaluated at the critical point (𝑎, 𝑏),
• must both be positive to identify a minimum.

55
• That is: [𝑓𝑥𝑥 (𝑎, 𝑏), 𝑓𝑦𝑦 (𝑎, 𝑏)] > 0
• must both be negative to identify a maximum.
• That is: [𝑓𝑥𝑥 (𝑎, 𝑏), 𝑓𝑦𝑦 (𝑎, 𝑏)] < 0

3. The product of the 2nd-order direct partial derivatives when calculated at the critical point
must exceed the product of the cross-partials calculated at the same critical point.
Since 𝑓𝑥𝑦 = 𝑓𝑦𝑥 ,
• we have [𝑓𝑥𝑥 (𝑎, 𝑏). 𝑓𝑦𝑦 (𝑎, 𝑏)] > [𝑓𝑥𝑦 (𝑎, 𝑏)]2
2
• or [𝑓𝑥𝑥 (𝑎, 𝑏). 𝑓𝑦𝑦 (𝑎, 𝑏)] − [𝑓𝑥𝑦 (𝑎, 𝑏)] > 0

NOTE: If [𝑓𝑥𝑥 . 𝑓𝑦𝑦 ] < [𝑓𝑥𝑦 ]2


• When the same sign, the function is at an inflection point both 𝑓𝑥𝑥 and 𝑓𝑦𝑦 have
• When 𝑓𝑥𝑥 and 𝑓𝑦𝑦 have opposite signs, the function is at a saddle point
• If [𝑓𝑥𝑥 . 𝑓𝑦𝑦 ] = [𝑓𝑥𝑦 ]2 , one cannot decide whether the function is at an extrema,
inflection, or saddle point.

Problems and Questions


Using the rules of limits that have been specified in the main text, find the limits for the following
functions:
1. lim[𝑥 2 (𝑥 − 5)]
𝑥→7

4𝑥 2 −9𝑥
2. lim
𝑥→3 𝑥+7

1
3. lim( 5𝑥 2 − 13)3
𝑥→3

Based on the rules that help in evaluating if a function is continuous or discontinuous at a particular
point, determine points of discontinuity, if any, for the following functions
4. 𝑓(𝑥) = 7𝑥 2 − 4𝑥 + 23

56
5. 𝑔(𝑥) = (5𝑥 5 − 8𝑥 3 + 26𝑥)/125

𝑥−7
6. ℎ(𝑥) = 𝑥

7. Define continuity and differentiability of a function of one independent variable.

By assuming 𝑢(𝑥) 𝑎𝑛𝑑 𝑣(𝑥) as functions of the independent variable 𝑥, write down the rules of
𝑑𝑦
differentiation for the following in terms of 𝑑𝑥

8. The sum rule.

9. The product rule.

10. The quotient rule.

11. The generalized power rule.

Find the derivative of the following functions


12. 𝑓(𝑥) = 7𝑥 − 12

13. 𝑦 = 25 − 6𝑥

14. 𝑦 = 9𝑥 4

15. 𝑦 = 𝑥 3 (5𝑥 + 4)

16. 𝑦 = (24𝑥 6 + 18𝑥 5 )/3𝑥 2

17. 𝑦 = (2𝑥 3 + 7)5

18. 𝑦 = (𝑥 2 − 7𝑥 + 4)3

57
1
19. 𝑦 = 4𝑥 3 +8𝑥+5

20. 𝑦 = √16 − 𝑥 2

58
References
1. Calculus for Business, Economics, And the Social Sciences, Schaum’s Outlines, Edward T.
Dowling, McGraw-Hill 1990.
2. Some graphs are drawn courtesy GeoGebra https://www.geogebra.org
3. https://www.intmath.com/functions-and-graphs/2a-domain-and-range.php

59
MODULE 4 LINEAR ALGEBRA

Learning Objectives
 Understand about vectors and vector algebra.
 Understand about matrices, various types of matrices and matrix algebra.
 Understand vector spaces and its features including linear independence, basis functions,
orthogonality, orthogonal bases
 Understand determinants, their properties, and operations.
 Solve a system of simultaneous equations using Cramer’s rule.
 Solve a system of simultaneous linear equations 𝐴𝑥 = 𝑏 using Gaussian Elimination and
using matrix formulation (that is using Echelon matrices and Row Canonical forms).
 Obtain Inverse of a matrix using Echelon matrices and Row Canonical forms.
 Solve eigenvalue equations of the form 𝐴𝑥 = 𝜆𝑥.
 Understand about Singular Value Decomposition
 Obtain the singular values and eigenvectors of non-square matrices
 Obtain an overview of tensors

Unit Content Summary

Linear Algebra has some of the most important mathematical concepts that are used in data
science. It uses concepts of vectors, matrices, tensors and in general vector spaces. At the most

60
general level, there are 5 distinct types of problems that Linear Algebra attempts to solve. These
are the following

1. 𝐴𝑥 = 𝑏
2. 𝐴𝑥 = 𝜆𝑥
3. 𝐴𝑣 = 𝜎𝑢
4. 𝑀𝑖𝑛𝑖𝑚𝑖𝑧𝑒 ||𝐴𝑥||2 /||𝑥||2
5. 𝐹𝑎𝑐𝑡𝑜𝑟𝑖𝑧𝑒 𝑡ℎ𝑒 𝑚𝑎𝑡𝑟𝑖𝑥 𝐴

Out of these 5 problems, we will be considering the first three problems. The first problem of 𝐴𝑥 =
𝑏, deals with a system of simultaneous linear equations. The second problem 𝐴𝑥 = 𝜆𝑥, is also
referred to as the eigenvalue problem. The third problem 𝐴𝑣 = 𝜆𝑢 is also referred to as the Singular
Value Decomposition problem or SVD in short. To solve these problems, one needs to understand
the concepts of vectors, matrices, and their operations. Vectors and matrices are elements of
something called as a Vector Space.

4.1 Vectors and Vector Algebra


The set of real numbers as we have seen in the first chapter, is denoted by the symbol ℝ. It is
usually called as the field of real numbers. A vector can be defined in multiple ways. The usual
understanding of the vector from physics is an object or entity which has a magnitude and a
direction. Such entities or elements exist in a 3-dimensional space, represented by a symbol 𝑟⃑ with
a magnitude r and the direction given by the arrow on top of the magnitude. As a reference we
consider the 3-dimensional cartesian coordinates for defining the magnitude and direction of the
vector. The magnitude is called a scalar, which is a real number. The value of this scalar is
calculated as a length (or norm) of the vector in the 3-dimensional space, with components of the
vector along the 3 directions, namely, x, y and z. This length is also called as the Euclidean
distance. The mathematical formula for the Euclidean distance for a vector is

𝑟 = |𝑟⃑| = √𝑥 2 + 𝑦 2 + 𝑧 2

61
We say that this vector exists in ℝ3 as a 3-tuple (x, y, z). The most general definition of a vector
is that it is a set of n-tuples. As an example, let’s consider the weights of 6 students, in kilograms:
55, 62, 44, 50, 57, 41. We can assign a symbol 𝑥1 to the first element 55, 𝑥2 to the second element
62 and so on. Each of these 𝑥𝑖 is termed as a tuple. We combine these tuples into a linear array or
as a set of numbers, and write the same as

𝑥 = (𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 , 𝑥5 , 𝑥6 )

Such an array is called as a vector. In the above case, we say that this is a vector in ℝ6 since we
have 6 elements, and each element is a real number. A vector in ℝ𝑛 will contain n elements.
Row vector and column vector
The vector 𝑥 = (𝑥1 , 𝑥2 , 𝑥3 … 𝑥𝑛−2 , 𝑥𝑛−1 , 𝑥𝑛 ) is called as a row vector since all elements are
arranged in one row. An alternate way to write a vector is in a column form, called as a column
vector. 𝑥 𝑇 is called as the transpose of the vector 𝑥 and is written in the column form as follows:

𝑥1
𝑥2
𝑇
𝑥 = 𝑥3
𝑥𝑛−1
( 𝑥𝑛 )
4.1.1 Vector Algebra
Now that we have defined a vector either as a set arranged in a row or as a column, let us define
its algebra or the various allowed operations between two or more vectors. Consider two vectors
𝑢 = (𝑢1 , 𝑢2 , 𝑢3 … 𝑢𝑛−1 , 𝑢𝑛 ) and 𝑣 = (𝑣1 , 𝑣2 , 𝑣3 … 𝑣𝑛−1 , 𝑣𝑛 ) in ℝ𝑛 . Then the following properties
hold good:

Sum of vectors: 𝑢 + 𝑣 = (𝑢1 + 𝑣1 , 𝑢2 + 𝑣2 , … 𝑢𝑛 + 𝑣𝑛 ).


It is to be noted that in the above case, 𝑢𝑖 and 𝑣𝑖 are scalar quantities.

Scalar product of a vector with a scalar: Consider 𝑢 is a vector in ℝ𝑛 and 𝑐 is a constant in ℝ.


Then the product of the vector 𝑢 with the scalar 𝑐 is given by
𝑐𝑢 = (𝑐𝑢1 , 𝑐𝑢2 , 𝑐𝑢3 … 𝑐𝑢𝑛−1 , 𝑐𝑢𝑛 )

62
Negative of a vector: −𝑢 = (−1)𝑢 and 𝑢 − 𝑣 = 𝑢 + (−𝑣)

Linear combination of m vectors: Let’s suppose we have been given vectors 𝑢1 , 𝑢2 , 𝑢3 ... 𝑢𝑚 in
ℝ𝑚 and scalars 𝑐1 , 𝑐2 , 𝑐3 … 𝑐𝑚 in ℝ. The linear combination 𝑣 of these m vectors is given by

𝑣 = (𝑐1 𝑢1 + 𝑐2 𝑢2 + ⋯ + 𝑐𝑚 𝑢𝑚)

Dot (or inner) product of two vectors: Let us consider two vectors 𝑢 = (𝑢1 , 𝑢2 , 𝑢3 … 𝑢𝑛−1 , 𝑢𝑛 )
and 𝑣 = (𝑣1 , 𝑣2 , 𝑣3 … 𝑣𝑛−1 , 𝑣𝑛 ) in ℝ𝑛 . Then their dot product or inner product is defined as

𝑢. 𝑣 = 𝑢1 𝑢1 + 𝑢2 𝑣2 + 𝑢3 𝑣3 … + 𝑢𝑛 𝑣𝑛

The two vectors are said to be orthogonal (or perpendicular) if 𝑢. 𝑣 = 0

Norm (Length) of a vector: The norm or length of a vector 𝑣 in ℝ𝑛 is denoted by ||𝑣|| is defined
as the positive square root of the dot product of the vector with itself. That is

||𝑣|| = √𝑣. 𝑣 = √𝑣12 + 𝑣22 + ⋯ 𝑣𝑛2

Thus ||𝑣|| ≥ 0, unless 𝑣 = 0

1 𝑣
The unit vector 𝑣̂ = ||𝑣|| 𝑣 = ||𝑣||.

It is the unique unit vector which points in the same direction as 𝑣

We now summarize the above properties of vectors 𝑣⃗ and 𝑢


⃗⃗
 (−1)𝑣⃗ + 𝑣⃗ =?
Every vector 𝑣⃗ has an opposite (−1)𝑣⃗ = −𝑣⃗, where 𝑣⃗ + (−𝑣⃗) = ⃗0⃗ is a zero vector.
Other properties of the zero vector ⃗0⃗ and scalar multiplication are given below.
Note that 𝑎 𝑎𝑛𝑑 𝑏 are scalars (have only magnitude)

63
 𝑣⃗ + ⃗0⃗ = 𝑣⃗ Existence of the zero vector ⃗0⃗
 𝑢
⃗⃗ + 𝑣⃗ = 𝑣⃗ + 𝑢
⃗⃗ (Commutativity)
 (𝑎 + 𝑏)𝑣⃗ = 𝑎𝑣⃗ + 𝑏𝑣⃗ (Scalar distributivity)
 𝑎(𝑏𝑣⃗) = (𝑎𝑏)𝑣⃗ (Multiplicative associativity)
 1𝑣⃗ = 𝑣⃗ (Multiplicative identity).
 𝑎(𝑣⃗ + 𝑤
⃗⃗⃗) = 𝑎𝑣⃗ + 𝑎𝑤
⃗⃗⃗ (Vector distributivity)

Now we are ready with the concept of vector space.

4.2 Vector Spaces


A vector space 𝑉 is a non-empty set of objects {𝑣⃗} with the following addition and scalar
multiplication rules:
• 𝑣⃗ + 𝑤
⃗⃗⃗ = 𝑤
⃗⃗⃗ + 𝑣⃗
• ⃗⃗ + (𝑣⃗ + 𝑤
𝑢 ⃗⃗⃗) = (𝑢
⃗⃗ + 𝑣⃗) + 𝑤
⃗⃗⃗
• 𝑣⃗ + (−𝑣⃗) = ⃗⃗
0 is a zero vector
• 𝑣⃗ + ⃗0⃗ = 𝑣⃗ Alternative explanation for existence of the zero vector ⃗0⃗
• (𝑐 + 𝑑)𝑣⃗ = 𝑐𝑣⃗ + 𝑑𝑣⃗ (Scalar distributivity)
• 𝑐(𝑑𝑣⃗) = (𝑐𝑑)𝑣⃗ (Multiplicative associativity)
• 1𝑣⃗ = 𝑣⃗ (Multiplicative identity)
• 𝑐(𝑣⃗ + 𝑤
⃗⃗⃗) = 𝑐𝑣⃗ + 𝑐𝑤
⃗⃗⃗ (Vector distributivity)

Some real-life examples of vectors are given below. It is to be noted that all of these vectors
constitute vector spaces and satisfy the two fundamental axioms of vector spaces , namely vector
addition and scalar multiplication.

• Geometric vectors: This example of a vector may be familiar from high school mathematics
and physics. Geometric vectors are directed segments, which can be drawn (at least in two
dimensions).

64
• Polynomials are also vectors
• Two polynomials can be added together, which results in another polynomial.
• They can be multiplied by a scalar λ ∈ ℝ, and the result is a polynomial as well.
• Therefore, polynomials are (rather unusual) instances of vectors.
• Note that polynomials are very different from geometric vectors.
• While geometric vectors are concrete “drawings”, polynomials are abstract concepts.
• However, they are both vectors in the sense previously described.

• Audio signals are vectors


• Audio signals are represented as a series of numbers.
• We can add audio signals together, and their sum is a new audio signal.
• If we scale an audio signal, we also obtain an audio signal.
• Therefore, audio signals are a type of vector, too.

• Elements of ℝ𝒏 (tuples of n real numbers) are vectors


• ℝ𝑛 is more abstract than polynomials, and it is the concept we focus on in this course.
• We will largely focus on vectors in ℝ𝑛 since most algorithms in linear algebra are
formulated in ℝ𝑛

Subspaces: As the name suggests, a vector subspace is a vector space which is a subset of a larger
vector space. For example, if the vector space is defined in ℝ3 a subspace can be either in ℝ2 or
ℝ1 .

Span of a vector space


Let us consider that 𝑉 is a vector space in the field ℝ𝑛 . Let 𝑣1 , 𝑣2 , 𝑣3 … 𝑣𝑛 be vectors in the vector
space 𝑉 and 𝑎1 , 𝑎2 , 𝑎3 , … 𝑎𝑛 are scalars in ℝ. Then we say that the vectors 𝑣𝑖 span the vector space
if an unknown vector 𝑣 can be expressed as a linear combination of the vectors and scalars, that is
if
𝑣 = 𝑎1 𝑣1 + 𝑎2 𝑣2 + ⋯ + 𝑎𝑛 𝑣𝑛

65
Linear Independence, Basis, Dimension
Let us consider that 𝑉 is a vector space in the field ℝ𝑛 . We now examine the concept of linear
dependence and independence.
Let 𝑣1 , 𝑣2 , 𝑣3 … 𝑣𝑛 be vectors in the vector space 𝑉 and 𝑎1 , 𝑎2 , 𝑎3 , … 𝑎𝑛 are scalars in ℝ and not all
of them are zero. Then the vectors are said to be linearly dependent, if

𝑎1 𝑣1 + 𝑎2 𝑣2 + ⋯ + 𝑎𝑛 𝑣𝑛 = 0

Else, the vectors are said to be linearly independent.


Basis of a vector space 𝑽:
Consider a set 𝑆 = {𝑣1 , 𝑣2 , … 𝑣𝑛 } of vectors. Then 𝑆 is said to constitute a basis of V, if the
following two properties are satisfied:
 𝑆 in linearly independent
 𝑆 spans 𝑉

Dimension of a vector space 𝑽:


A vector space 𝑉 is said to be of a dimension (finite) n or we say that it is an n-dimensional vector
space if 𝑉 has a basis with n elements.

Examples of bases
A vector space in ℝ𝑛 : Let us consider the following n vectors in ℝ𝑛 :

𝑒1 = (1, 0, 0, … . .0, 0), 𝑒2 = (0, 1, 0, … . .0, 0), … 𝑒𝑛 = (0, 0, 0, … . .0, 1)

These vectors ae linearly independent since the dot product of any two vectors is 0.

Vector space 𝑴 = 𝑴𝒓,𝒔 of all 𝒓 × 𝒔 matrices


Let us consider as a specific example the set of the following 6 matrices which are of dimension
2×3

66
1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0
[ ],[ ],[ ],[ ],[ ],[ ]
0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1

These form a basis of 2 × 3 matrices such that any 2 × 3 matrix can be expressed in terms of a
linear combination of the above 6 matrices.

Orthogonality and orthonormality of vector spaces


Orthogonal vectors: Let us consider a set 𝑆 = {𝑣1 , 𝑣2 , … 𝑣𝑛 } of nonzero vectors in an inner product
space V.
𝑆 is said to be orthogonal if each pair of vectors in 𝑆 are orthogonal. 𝑆 is said to be orthonormal if
𝑆 is orthogonal and each vector in 𝑆 has unit length.
Alternatively, 𝑆 is orthogonal if the inner product of any two vectors is zero. That is

𝑣𝑖 . 𝑣𝑗 = 𝛿𝑖𝑗
0, 𝑖𝑓 𝑖 ≠ 𝑗
where, 𝛿𝑖𝑗 = {
1, 𝑖𝑓 𝑖 = 𝑗

4.3 Matrices and Matrix Algebra


Matrices can be considered as a generalized version of vectors. An alternate geometrical
perspective of matrices is the following. A transformation in linear algebra is the process that
transforms one vector into another. The function or the object that carries out this transformation
is a matrix.

A general form of a matrix A can be written as

𝑎11 𝑎12 ⋯ 𝑎1𝑛


𝐴=[ ⋮ ⋱ ⋮ ]
𝑎𝑚1 ⋯ 𝑎𝑚𝑛

The above matrix has m rows and n columns. Thus, the mathematical representation of such a
matrix is 𝐴𝑚𝑛 . Alternatively, we say that 𝐴 is a matrix in ℝ𝑚×𝑛 .

67
Matrices play an important and central role in linear algebra. They can be used to represent systems
of simultaneous linear equations in a compact form such as given below:

𝑎11 𝑥1 + 𝑎12 𝑥2 + 𝑎13 𝑥3 = 𝑏1


𝑎21 𝑥1 + 𝑎22 𝑥2 + 𝑎23 𝑥3 = 𝑏2
𝑎31 𝑥1 + 𝑎32 𝑥2 + 𝑎33 𝑥3 = 𝑏3

This is the first of the 5 general problems of linear algebra. The above system of simultaneous
linear equations can be written in an even more compact form using the concept of vectors and
matrices as follows:

𝐴𝑥 = 𝑏

In the above equation, 𝐴 is a matrix, 𝑥 and 𝑏 are vectors. Specifically, with respect to the system
of simultaneous linear equations,

𝑎11 𝑎12 𝑎13


𝑏
𝑎
𝐴 = ( 21 𝑎22 𝑎23 ), 𝑥 = (𝑥𝑥12 ), 𝑏 = (𝑏12 )
𝑎31 𝑎32 𝑎33 𝑥3 𝑏3

Each row and each column of the matrix 𝐴 can be considered as a vector.

4.3.1 Matrix addition and scalar multiplication


Before we discuss about matrix operations, some terminology is in order.

A m x n matrix can be represented compactly as 𝐴 = [𝑎𝑖𝑗 ] where, 𝑎𝑖𝑗 is 𝑖𝑗𝑡ℎ element of the matrix.
Further, 𝑖, 𝑗 are indices of the matrix and can take values from 𝑖 = 1, 2, … . 𝑚 𝑎𝑛𝑑 𝑗 = 1, 2, … 𝑛.
Two matrices can be added only if both have the same dimensions. Two matrices of dimensions
m x n can be added elementwise. That is,

If 𝐴 = [𝑎𝑖𝑗 ] and 𝐵 = [𝑏𝑖𝑗 ] are two matrices with the same dimensions, m x n, then

68
𝐴 + 𝐵 = [𝑎𝑖𝑗 + 𝑏𝑖𝑗 ]

In the expanded form we have

𝑎11 + 𝑏11 𝑎12 + 𝑏12 … 𝑎1𝑛 + 𝑏1𝑛


𝑎 + 𝑏21 𝑎22 + 𝑏22 … 𝑎2𝑛 + 𝑏2𝑛
𝐴 + 𝐵 = ( 21 )
… … …
𝑎𝑚1 + 𝑏𝑚1 𝑎𝑚2 + 𝑏𝑚2 … 𝑎𝑚𝑛 + 𝑏𝑚𝑛

The product of a matrix 𝐴 = [𝑎𝑖𝑗 ] with a scalar c is given by

𝑐. 𝐴 = [𝑐. 𝑎𝑖𝑗 ]

That is, each element 𝑎𝑖𝑗 is multiplied by the scalar c.

Note that 𝐴 + 𝐵 and 𝑐𝐴 have the same dimensions as the original matrices.

Some additional definitions are given below:

−𝐴 = (−1)𝐴

𝐴 − 𝐵 = 𝐴 + (−𝐵)

Matrix Multiplication: The product of two matrices is not straightforward. There are certain
conditions to be met before we can calculate the product of two matrices.
1. If 𝐴 and B, are matrices of dimension m x n, they cannot be multiplied.
2. For to matrices to be multiplied, the dimension of the column of the first matrix should be
equal to the dimension of the row of the second matrix. Specifically, to be able to multiply
two matrices, 𝐴 and 𝐵, if 𝐴 has dimensions m x p, then 𝐵 should necessarily have the
dimension of its rows as p. The column dimension can be anything, say n

69
3. Therefore, for a matrix 𝐴 with dimension m x p and 𝐵 with dimension p x n, the product
of these two matrices 𝐶 = 𝐴 × 𝐵 has dimension m x n.
4. If the dimensions of the rows and columns of a matrix are the same, that is n x n, it is called
a square matrix. Two square matrices having the same dimensions can be multiplied and
the dimensions of the product matrix also is the same as the individual matrices.

The product of matrices 𝐴 = [𝑎𝑖𝑘 ]and 𝐵 = [𝑏𝑘𝑗 ] is the matrix 𝐶. It is given by

𝑐𝑖𝑗 = 𝑎𝑖1 𝑏1𝑗 + 𝑎𝑖2 𝑏2𝑗 … + 𝑎𝑖𝑝 𝑏𝑝𝑗 = ∑ 𝑎𝑖𝑘 𝑏𝑘𝑗


𝑘=1

Transpose of a matrix: The transpose of a matrix 𝐴 is written as 𝐴𝑇 is the matrix that is obtained
by transposing the columns of 𝐴 to rows of 𝐴𝑇 . Therefore, if 𝐴 is a matrix with dimensions m x
n, the transpose matrix 𝐴𝑇 will have dimensions n x m. It should be noted that in case the matrix
has just one row and n columns (a row vector), the transpose will have n rows and 1 column
(column vector).

1 3
1 2 3 𝑇
For example, if 𝐴 = ( ) is a 2 x 3 matrix, then 𝐴 = (2 4) will be a 3 x 2 matrix.
3 4 5
3 5

1
If 𝑣 is a vector, 𝑣 = (1 2) then 𝑣 𝑇 = ( )
2

Square matrices: As mentioned earlier, a square matrix is one in which the number of rows and
columns are the same. It should be noted that matrix addition and multiplication of two square
matrices with the same dimensions is always possible.

Invertible (nonsingular) matrices


For any matrix A to have an inverse, it needs to be square. It should however be noted that not all
square matrices have an inverse. The mathematical condition for a square matrix A to have an
inverse is the following. A square matrix A can have an inverse or is nonsingular if there exists
another matrix B which meets the following condition:

70
𝐴𝐵 = 𝐵𝐴 = 𝐼, where 𝐼 is the identity matrix

Such a matrix will always be a unique matrix. We call the matrix 𝐵 as an inverse of 𝐴 and it is
represented by 𝐴−1

4.4 System of Linear Equations


Let us consider a situation where we have n unknowns 𝑥1 , 𝑥2 , 𝑥3 , … 𝑥𝑛 . If we must put these in a
mathematical form which is called as a linear equation, the standard format looks like this

𝑎1 𝑥1 + 𝑎2 𝑥2 + 𝑎3 𝑥3 … + 𝑎𝑛 𝑥𝑛 = 𝑏

Here, 𝑎1 , 𝑎2 , ... 𝑎𝑛 are constants. The constant 𝑎𝑖 is usually referred to as the coefficient of the
unknown variable 𝑥𝑖 . If we solve this equation and obtain the values of the unknowns, the solution
will be a list of values or a vector which as we discussed above is a list of n tuples. Let us assume
that the solution for {𝑥𝑖 } is the vector 𝑣 = (𝑘1 𝑘2 𝑘3 … 𝑘𝑛 ). Then we can substitute these values
instead of the variables 𝑥𝑖 in the equation above to obtain a solution

𝑎1 𝑘1 + 𝑎2 𝑘2 + 𝑎3 𝑘 … + 𝑎𝑛 𝑘 = 𝑏

If we have a more general system of simultaneous linear equations such as


𝑎11 𝑥1 + 𝑎12 𝑥2 + 𝑎13 𝑥3 + ⋯ 𝑎1𝑛 𝑥𝑛 = 𝑏1
𝑎21 𝑥1 + 𝑎22 𝑥2 + 𝑎23 𝑥3 + ⋯ 𝑎2𝑛 𝑥𝑛 = 𝑏2
𝑎𝑚1 𝑥1 + 𝑎𝑚2 𝑥2 + 𝑎𝑚3 𝑥3 + ⋯ 𝑎𝑚𝑛 𝑥𝑛 = 𝑏𝑚

Here, 𝑎𝑖𝑗 is the coefficient of 𝑥𝑗 in the equation 𝐿𝑖 . Both 𝑎𝑖𝑗 and 𝑏𝑖 are constants.

As mentioned in the introduction section of matrices, we can use matrix operations to solve a set
of simultaneous linear equations. We need to introduce some additional concepts before we can
approach the steps for the solutions to the above problem.

71
4.4.1 Augmented Matrix and Coefficient matrix of a system
Considering the above system of equations, we can convert it into two matrices
Augmented matrix:
𝑎11 𝑎12 … 𝑎1𝑛 𝑏1
𝑎21 𝑎22 … 𝑎2𝑛 𝑏2
𝑀=( )
… … … …
𝑎𝑚1 𝑎𝑚2 … 𝑎𝑚𝑛 𝑏𝑛

is called as the Augmented matrix.

𝑎11 𝑎12 … 𝑎1𝑛


𝑎21 𝑎22 … 𝑎2𝑛
𝐴=( … … … … )
𝑎𝑚1 𝑎𝑚2 … 𝑎𝑚𝑛

Elementary operations in a system of simultaneous linear equations


There are a few fundamental or elementary operations which are used on a system of simultaneous
linear equations. These operations help in getting at least one solution to such a system if it exists.
These operations are used in an algorithm called as the Gaussian elimination to provide solutions
to such systems.

Let us consider a system of m linear equations 𝐿1 , 𝐿2 , ... 𝐿𝑛 . The operations are the following

1. Interchange of two of the equations 𝐿𝑖 and 𝐿𝑗 .


2. Multiply and replace one of the equations 𝐿𝑖 with a constant k.
3. Replace an equation by the sum of a multiple of another equation and multiple of itself.

4.4.2 Linear Systems in Triangular and Echelon Forms


Triangular Form: Let us evaluate the following system of linear equations, which is in triangular
form

2𝑥1 + 6𝑥2 − 𝑥3 + 4𝑥4 − 2𝑥5 = 9


𝑥3 + 2𝑥4 + 3𝑥4 = 1

72
7𝑥3 − 𝑥4 = 3
2𝑥4 = 8

The first unknown 𝑥1 is the leading unknown in the first equation, 𝑥2 in the second, and so on.
These are called as pivot variables. As can be seen from this set of equations, the last equation
gives the solution for the 4th variable. Consequently, by substituting this value of 𝑥4 (which is 4)
into the previous equations, we can get a unique solution. The vector for this system of equations
is 𝑥 = (3, −2, 1, 4).

Echelon Form: Let us consider the following system of simultaneous linear equations:

2𝑥1 + 6𝑥2 − 𝑥3 + 4𝑥4 − 2𝑥5 = 15


𝑥3 + 2𝑥4 + 2𝑥5 = 5
3𝑥4 − 9𝑥5 = 6

This is said to be in echelon form. The number of equations is less than the number of variables.
There are the following options of solutions for echelon forms:
Let us consider a system of linear equations in echelon form with r equations and n unknowns.
There are 2 cases to evaluate
Case 1: If r = n, that is the number of equations is equal to the number of variables, then we have
a unique solution.
Case 2: If r < n, that is the number of equations is less than the number of variables, then we can
assign values to the remaining n – r free variables and get solutions for the r pivot variables. These
free variables are called as parameterized variables since the free variables can take any value.
Therefore, we end up with an infinite number of solutions through the parameterized variables.
Gaussian Elimination Algorithm
Using the above 3 elementary operations, and the definitions of the triangular and echelon forms,
we define the Gaussian elimination algorithm as follows. The algorithm consists of two parts.
Part A: This is the forward elimination and consists of a step wise reduction of the system of
linear equations. At the end, we get either a degenerate equation with no solution or an equivalent
simpler system in triangular or echelon form.

73
Part B: This is the backward elimination part, and this is a step wise back substitution to find the
solution of the simpler system.
Gaussian Elimination example:
Let us consider the following system of linear equations to illustrate the Gaussian elimination
algorithm for solving for the variables.

𝐿1 : 𝑥 − 3𝑦 − 2𝑧 = 6
𝐿2: 2𝑥 − 4𝑦 − 3𝑧 = 8
𝐿3: − 3𝑥 + 6𝑦 + 8𝑧 = −5

Part A: Let us leverage the coefficient 1 of 𝑥 in the first equation 𝐿1 as the pivot to eliminate 𝑥
from the second equation 𝐿2 and from the third equation 𝐿3 . We use the elementary operations to
accomplish this as follows:

1. Multiply 𝐿1 with -2 and add it to 𝐿2 . Essentially, we relace 𝐿2 with -2𝐿1 + 𝐿2


2. Multiply 𝐿1 with 3 and add it to 𝐿3 . Essentially, we replace 𝐿3 by 3𝐿1 + 𝐿3

Through these operations, we get a simplified set of equations, namely

𝐿1 : 𝑥 − 3𝑦 − 2𝑧 = 6
𝐿2: 2𝑦 + 𝑧 = −4
𝐿3: − 3𝑦 + 2𝑧 = 13

As a next step we further simplify these equations using the elementary operations and end up in
a triangular form of the equations. We use the coefficient 2 of the variable y in 𝐿2 to eliminate y
from the equations.
The operations are the following: Multiply 𝐿2 by 3/2 and add it to 𝐿3 . So essentially, we replace
𝐿2 with (3/2) 𝐿2 + 𝐿3 . The revised system is the following
𝐿1 : 𝑥 − 3𝑦 − 2𝑧 = 6
𝐿2: 2𝑦 + 𝑧 = −4
𝐿3: 7𝑧 = 14

74
This is the triangular form of the equations, and we thus have a unique solution.

Part B: To obtain the values of the variables, we now proceed in the backward direction.

1. Looking at 𝐿3 , the solution for z is 2.


2. Now we substitute the value of z in 𝐿2 and obtain y = -3
3. Finally, substituting y and z in 𝐿1 we get x = 1

Thus, the vector u = (1. -3, 2) is the unique solution.

4.4.3 Echelon Matrices, Row Canonical Form


An equivalent but an elegant option to solve a system of simultaneous linear equations is using the
matrix methods. These leverage the fundamental principles of elementary operations that was
discussed in earlier sections. We deal with Echelon and the row canonical forms of matrices to get
solutions to linear equations as well as obtaining inverse of a matrix if it exists.

Echelon Matrices: A matrix is in Echelon form if the following conditions are satisfied:

1. All the zero rows, if there are any are at the bottom of the matrix
2. Each leading nonzero entry in a row is to the right of the leading nonzero entry in the
preceding row.

Row canonical form: A matrix is in a row canonical form or Row Reduced Echelon Form (RREF)
if it is an echelon matrix, and it satisfies the following additional properties
1. Each pivot (leading nonzero entry) is equal to 1.
2. Each pivot is the only nonzero entry in its column.

To convert a matrix into the Echelon matrix and the Row canonical forms, we use the same
elementary operations as we used earlier.

75
Application to systems of linear equations
We now use the Echelon Matrix and Row canonical form to solve a system of linear equations

Example: Solve the following system of linear equation:

𝑥 + 2𝑦 + 𝑧 = 3
2𝑥 + 5𝑦 − 𝑧 = −4
3𝑥 − 2𝑦 − 𝑧 = 5

Solution:
1. Reduce the augmented matrix M to echelon form and then to row canonical form as
follows:

1 2 1 3 1 2 1 3 1 2 1 3
𝑀 = (2 5 −1 −4) ~ (0 1 −3 −10) ~ (0 1 −3 −10)
3 −2 −1 5 0 −8 −4 −4 0 0 −28 −84

1 2 1 3 1 2 1 3 1 0 0 2
(0 1 −3 −10) ~ (0 1 0 −1) ~ (0 1 0 −1)
0 0 1 3 0 0 1 3 0 0 1 3

Thus, the system has the unique solution 𝑥 = 2, 𝑦 = −1, 𝑧 = 3 or equivalently the vector u = (2, -
1, 3)

Inverse of a matrix using Echelon matrices and Row Canonical forms


Another use of the echelon and row canonical forms of matrices is to find the inverse of a matrix
if it exists. Let us see this through an example.

1 0 2
Example: Find the inverse of the matrix 𝐴 = (2 −1 3)
4 1 8

Solution: First form the block matrix M = [A, I] and row reduce M to an echelon form

76
1 0 2 1 0 0 1 0 2 1 0 0
𝑀 = (2 −1 3 0 1 0) ~ (0 −1 −1 −2 1 0)
4 1 8 0 0 1 0 1 0 −4 0 1

1 0 2 1 0 0
~ (0 −1 −1 −2 1 0)
0 0 −1 −6 1 1

In echelon form, the left half of the matrix is in triangular form; therefore, A has an inverse. Next,
we further row reduce M to its row canonical form:

1 0 0 −11 2 2 1 0 0 −11 2 2
𝑀~ (0 −1 0 4 0 −1) ~ (0 1 0 −4 0 1)
0 0 1 6 −1 −1 0 0 1 6 −1 −1

Therefore, the inverse is the right part of the row reduced matrix that is

−11 2 2
−1
𝐴 = ( −4 0 1)
6 −1 −1

4.5 Determinants
Determinant of order n is defined to be a n x n square array of numbers (or functions) with the
array written within vertical bars, as follows:
𝑎11 𝑎12 … 𝑎1𝑛
𝑎21 𝑎22 … 𝑎2𝑛
𝐷𝑛 = | … … … …|
𝑎𝑛1 𝑎𝑛2 … 𝑎𝑛𝑛

The determinant 𝐷 has a value that is obtained by forming all n! products through the following
examples
Determinants of order 2 expand to 2! that is 2 terms:
𝑎11 𝑎12
𝐷2 = |𝑎 | = 𝑎11 𝑎22 − 𝑎12 𝑎21
21 𝑎22

77
Determinants of order 3 expand to 3! That is 6 terms:

𝑎11 𝑎12 𝑎13


𝑎
D3 = | 21 𝑎22 𝑎23 |
𝑎31 𝑎32 𝑎33

Which can be expanded to: a11(a22a33-a23a32) – a12(a21a33-a23a31) + a13(a21a32-a33a31)

Properties of Determinants
 Transposing (rows to columns) does not alter a determinant’s value
 Interchanging two rows (or two columns) changes the sign of the value of a determinant
 Multiplication of all members of a single column (or row) by a constant k causes the value of
the determinant to be multiplied by k
 If the elements of a column (or row) are sums of two quantities, the determinant can be
decomposed into a sum of two determinants
 Any determinant with two rows (or two columns) equal (or proportional), has the value zero
 The value of a determinant is unchanged if a multiple of one row is added (column by column)
to another row or if a multiple of one column is added (row by row) to another column.
 If each element of a row (or column) is zero, the determinant has the value zero.

4.6 Eigenvalue equations


We now move over to solve linear algebra problems of the type

𝐴𝑥 = 𝜆𝑥

In the above equation 𝐴 is a square matrix, 𝑥 is a vector and 𝜆 is a number. In the cases we will
consider, 𝜆 is a real number. In general, 𝜆 is a complex number. In the terminology of the
eigenvalue equation, 𝜆 is called as an eigenvalue and 𝑥 is an eigenvector that corresponds to the
specific eigenvalue.
The mathematical process to solve and eigenvalue equation has the following steps:
Finding the eigenvalues

78
 Form the equation (𝐴 − 𝜆𝐼)𝑥 = 0
 We now evaluate the characteristic equation det(𝐴 − 𝜆𝐼) = 0
 The solutions to the characteristic equation are the eigenvalues.

Finding the eigenvectors


 We start with the equation (𝐴 − 𝜆𝐼)𝑥 = 0
 In the equation, we plug in the first eigenvalue 𝜆1
 We get a system of linear equations, which can be solved using elementary row operations.
 We will sometimes need to choose values for the components of the eigenvector.

Examples:
1 1
1. Find the eigenvalues and eigenvectors of the matrix 𝐴 = ( )
4 1
Solution: Since 𝐴 is a 2 x 2 matrix, we expect that it will have 2 eigenvalues and their
corresponding eigenvectors.

Finding the Eigenvalues:


1 1 𝜆 0
𝐴 − 𝜆𝐼 = ( )−( )=0
4 1 0 𝜆

1−𝜆 1
Thus: 𝑑𝑒𝑡 ( )=0
4 1−𝜆

We thus get the quadratic equation


(1 − 𝜆)(1 − 𝜆) − 4 = 0
𝜆2 − 2𝜆 − 3 = 0
Solving this equation, we get 𝜆1 = 3, 𝜆2 = −1

Finding the Eigenvectors:


 For 𝜆1 = 3, we plug in this eigenvalue into the expression (𝐴 − 𝜆𝐼)
1 1 3 0 −2 1
 ( )−( )=( )
4 1 0 3 4 −2

79
−2 1
 Now we do a row operation 2𝑅1 + 𝑅2 and get the revised matrix as ( )
0 0
 Inserting this matrix into the eigenvalue equation, we get
−2 1 𝑥1 0
( ) (𝑥 ) = ( )
0 0 2 0
 We thus get −2𝑥1 + 𝑥2 = 0 and this implies 𝑥2 = 2𝑥1
 We may now assume that 𝑥1 = 1 then 𝑥2 = 2
1
 Thus, the first eigenvector is ( )
2
 Similarly, when we go through the above process for 𝜆2 = −1 and solve for the
1
eigenvector, we get the second eigenvector ( )
−2

Example 2: Find the eigenvalues and eigen vectors for the following matrix:
1 0 0
𝐴 = (3 −2 0)
2 3 4

Solution:
 We first find out the eigenvalues by solving the characteristic equation.
 The characteristic equation is the following:

1−𝜆 0 0
| 3 −2 − 𝜆 0 |=0
2 3 4−𝜆

 Solving for 𝜆, we get 𝜆1 = 1 , 𝜆2 = −2, 𝜆3 = 4


 Finding the eigenvectors for each of the eigenvalues

1−1 0 0 𝑥1
 Substituting 𝜆1 = 1, in the matrix, ( 3 −2 − 1 𝑥
0 ) ( 2) = 0
2 3 4 − 1 𝑥3

 R1: 0𝑥1 + 0𝑥2 + 0𝑥3 = 0 gives us no information about any of the 𝑥𝑖 𝑠


 R2: 3𝑥1 − 3𝑥2 + 0𝑥3 = 0

80
 R3: 2𝑥1 + 3𝑥2 + 3𝑥3 = 0
 R2 gives 𝑥1 = 𝑥2
5
 Substituting for 𝑥2 in R3, gives 𝑥3 = − 3 𝑥1
1
 Choosing 𝑥1 = 1, we get the eigenvector 𝑥⃑ = ( 15)
−3
0
 Similarly, working out for 𝜆2 = −2, the eigenvector is 𝑥⃑ = (−2)
1
 For 𝜆3 = 4, we get the matrix equations to be the following:
−3 0 0 𝑥1
( 3 −6 0) (𝑥2 ) = 0
2 3 0 𝑥3

 Thus, we get from R1: 3𝑥1 = 0 which means 𝑥1 = 0

 From R2: 3𝑥1 − 6𝑥2 = 0 , which means after substituting for 𝑥1 = 0, we get 𝑥2 = 0

 From R3: 2𝑥1 + 3𝑥2 = 0, which gives no additional information

 Since, we get no information about 𝑥3 , we treat it as a free variable and can assign any value.
The simplest value that can be assigned is 𝑥3 = 1

0
 Thus, the eigenvector is 𝑥⃑ = (0)
1

Example 3:
0 1 0
Find the eigenvalues and eigenvectors for 𝐻 = (1 0 0)
0 0 2

Solution:
Solving for the characteristic equation, we get eigenvalues as 𝜆 = 2, +1 and -1

81
The eigenvectors are

0 1 1
𝑥1 = (0) , ⃗⃗⃗⃗⃑
⃗⃗⃗⃑ 𝑥2 = (1) , ⃗⃗⃗⃗⃑
𝑥3 = (−1)
1 0 0
Some interesting insights can be drawn from this example.

1. If we take a transpose of the matrix H, we get the same matrix H back.


a. Matrices where 𝐻 = 𝐻 𝑇 are called as symmetric matrices.
2. It is important to note that for symmetric matrices, the eigenvectors are orthogonal.
a. What orthogonality means has been explained earlier, in the dot product of two
vectors. Specifically, two vectors are said to be orthogonal if their dot product is 0.
b. Mathematically, we can write this as
0, 𝑖𝑓 𝑖 ≠ 𝑗
𝑥𝑖 ⃗⃗⃗⃑
⃗⃗⃗⃑. 𝑥𝑗 = {
1, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

We can now introduce two types of matrices, whose definitions depend on understanding
eigenvalues and eigenvectors.

Symmetric Positive Definite and Semidefinite Matrices


Symmetric Positive Definite Matrix:
• A symmetric matrix is called symmetric positive definite if all its eigenvalues are positive

Symmetric Positive Semidefinite Matrix:


• A symmetric matrix is called symmetric positive semidefinite if all its eigenvalues are
negative
• A symmetric matrix 𝐴 is positive definite or semidefinite iff there exists a real nonsingular
matrix M (det M ≠ 0), such that
• 𝐴 = 𝑀𝑀𝑇 , where 𝑀𝑇 is the transpose
4.7 Singular Value Decomposition (SVD) of non-square matrices
Singular Value Decomposition, popularly known as SVD is a generalization of the problem of
eigenvalue equation to non-square matrices. Specifically, we attempt to solve the problem

82
𝐴 = 𝑈 Σ 𝑉𝑇

In the above equation,


 𝐴 is a non-square matrix of dimension 𝑚 × 𝑛.
 𝑈 and 𝑉 are column orthogonal matrices. That is, the columns of the matrices are
orthogonal unit vectors. This is expressed in another mathematical manner as follows:

𝑈 𝑇 𝑈 = 𝐼, and 𝑉 𝑇 𝑉 = 𝐼
I, being the identity matrix

 Σ is a diagonal matrix that is it has non-zero elements only along its main diagonal
o The entries of the diagonal are
 denoted by 𝜎𝑖 ,
 are positive,
 are sorted in decreasing order
o These are called as singular values (a generalization to eigenvalues)

Finding 𝑼, 𝚺 𝐚𝐧𝐝 𝑽𝑻
Given a general matrix 𝐴 with dimensions 𝑚 × 𝑛 we now outline the process to determine
𝑈, Σ and 𝑉 𝑇

1. We first compute 𝐴𝑇 𝐴
2. Next, we determine the eigenvalues of 𝐴𝑇 𝐴 namely 𝜆𝑖
3. By definition, 𝜎𝑖 = √𝜆𝑖
4. The eigenvectors corresponding to the 𝜆𝑖 form the 𝑉 matrix, namely, 𝑣1 , 𝑣2 , 𝑣3 , … 𝑣𝑛
5. To determine the 𝑈 matrix, we apply the following formula
a. For a special case of 3 eigenvalues 𝜆𝑖

1 1 𝑁𝑆(𝐴𝑇 )
𝑈 = [𝜎 𝐴𝑣1 𝐴𝑣2 |𝐴𝑇 |
]
1 𝜎2

83
1 1 𝑁𝑆(𝐴𝑇 )
Note that in the above expression: 𝑢1 = 𝜎 𝐴𝑣1 , 𝑢2 = 𝜎 𝐴𝑣2 and 𝑢3 = |𝐴𝑇 |
1 2

NS is the null space of matrix 𝐴𝑇

Null space of a matrix 𝐴 is defined as the set of all n-dimensional column vectors 𝑣, such that
𝐴𝑣 = 0

Let us take an example

1 1
Example: Find 𝑈, Σ and 𝑉 𝑇 for the following matrix 𝐴 = (1 0)
0 1
Solution:
 We first compute the matrix 𝐴𝑇 𝐴

2 1
𝐴𝑇 𝐴 = ( )
1 2

 Next, we find the eigenvalues of the matrix 𝐴𝑇 𝐴 and we get 𝜆1 = 3, 𝜆2 = 1


 Thus, 𝜎𝑖 = √𝜆𝑖 and we get 𝜎1 = √3, 𝜎2 = 1
 The eigenvectors are now determined, which will be the column vectors for the matrix 𝑉
 The eigenvectors need to be normalized that is we need to get a set of orthonormal vectors.
1 1 1 −1
 We get 𝑣
̂1 = ( ), 𝑣
̂2 = ( ).
√2 1 √2 1
o Note that 𝑣
̂.
1 𝑣
̂2 = 0 as expected, since the eigenvectors are orthogonal as well as
normalized.
 Calculating for the eigenvectors 𝑢𝑖 we get
2 0
1 1
𝑢1 = (1), 𝑢2 = (−1)
√6 √2
1 −1
 To determine 𝑢3 we use the Gram-Schmidt orthogonalization rule

84
𝑢̂.
1 𝑒̂3 𝑢̂.
2 𝑒̂3
𝑢3 = [ .𝑢
̂1 + .𝑢
̂]
𝑢̂
1. 𝑢
̂1 𝑢̂ ̂2 2
2. 𝑢

−1
1
 After the calculations, we get 𝑢3 = 3 ( 1 )
1
 Thus, the factorization of the matrix 𝐴 is the following

2 1
0 −
√6 3 1 1
1 1 1 √3 1
𝐴= − . ( 0 1) . √2 √2
√6 √2 3 1 1
0 0 −
1 1 1
− ( √2 √2)
(√6 √2 3 )

4.8 Tensor Fundamentals


Definition of a Tensor: A tensor is a multi-dimensional array of numbers. Scalars, vectors, and
matrices are special cases of a tensor. Specifically,
 All scalars are tensor of rank 0.
 A vector is a tensor of rank 1 tensor.
 A matrix is a rank 2 tensor.
 A rank 3 tensor is a 3-dimensional list of numbers
 In general, any multi-dimensional array of numbers is a tensor

Tensors in Machine Learning:


Modern data is usually of a multivariate form which means that there are numerous independent
variables on which it depends. Tensors usually play an important role in data science, specifically
machine learning and deep learning by making it convenient to handle data by encoding
multivariate data. As a simple example, a picture is generally represented by three fields: width,
height, and depth (color). It therefore makes it convenient to encode it as a 3D tensor. However,
in most cases we deal with large number of pictures. The fourth field which is sample size becomes
relevant. A series of images, such as the famous MNIST dataset, can be easily stored in a 4D tensor
in Tensorflow. This representation allows problems involving big data to be solved easily.

85
Tensor Operations
The operations on tensors are a general case of matrix and vector operations. Some of the
operations are listed below:
 Tensor addition
 Tensor subtraction
 Tensor Hadamard product
 Tensor division
 Tensor product

86
Problems and Questions
Determine the real dimensional space in which the following are vectors. That is determine if the
vector is in ℝ𝑛 where n is an integer.
1. 𝑣 = (2, −5)

2. 𝑣 = (0, 0, 0)

3. 𝑣 = (3, 4, 5, 6)

4. Let us assume that 𝑢 = (1, −2, 3), 𝑣 = (4, 5, −1), 𝑤 = (2, 7, 4). Find out the inner
products of the following vectors: (𝑢. 𝑣), (𝑢. 𝑤) and (𝑣. 𝑤)

5. Let 𝑢 = (1, −2, −4, 5, 3). Find the norm of the vector 𝑢 that is ||𝑢||.

1 3
6. Find the product AB and BA of the following two matrices 𝐴 = ( ) and 𝐵 =
2 −1
2 0 −4
( )
5 −2 6

Find the transpose of the following matrices


1 −2 3
7. 𝐴 = ( )
7 8 −9

1 2 3
8. 𝐵 = (2 4 5)
3 5 6

2
9. 𝐷 = (−4)
6

10. Identify which of the following matrices are symmetric, that is 𝐴𝑇 = 𝐴

87
5 −7 1
11. 𝐴 = (−7 8 2)
1 2 −4

0 4 −3
12. 𝐵 = (−4 0 5)
3 −5 0

0 0 0
13. 𝐴 = ( )
0 0 0

Determine if each of the following equations is linear or nonlinear


14. 5𝑥 + 7𝑦 − 8𝑦𝑧 = 16

15. 𝑥 + 𝜋𝑦 + 𝑒𝑧 = 𝑙𝑜𝑔5

16. 3𝑥 + 𝑘𝑦 − 8𝑧 = 16

Solve the following systems of linear equations


17. 2𝑥 − 5𝑦 = 11, 3𝑥 + 4𝑦 = 5

18. 2𝑥 − 3𝑦 = 8, −6𝑥 + 9𝑦 = 6

19. 2𝑥 − 3𝑦 = 8, −4𝑥 + 6𝑦 = −16

20. Let us consider the following vectors. 𝑢 = (1, 1, 0), 𝑣 = (1, 3, 2), 𝑤 = (4, 9, 5). Determine
if the vectors are linearly independent.

3 1
21. Consider the matrix 𝐴 = ( ). Find the eigenvalues and eigenvectors of this matrix.
2 2

4 2
22. Find the eigenvalues and eigenvectors of the matrix 𝐴 = ( )
3 −1

88
References
1. Schaum’s Outline of Theory and Problems of Linear Algebra, Seymour Lipschutz, Third
Edition, McGraw-Hill Education (India) Edition 2005.
2. Linear transformation and matrices: https://bit.ly/2RND0oQ
3. Finding Eigenvalues and Eigenvectors, https://www.youtube.com/watch?v=TQvxWaQnrqI
4. Tensor Fundamentals various definitions: https://www.youtube.com/watch?v=TvxmkZmBa-
k
5. Tensor fundamentals: https://towardsdatascience.com/quick-ml-concepts-tensors-
eb1330d7760f
6. Tensor Operations: https://machinelearningmastery.com/introduction-to-tensors-for-machine-
learning/

89
MODULE 5 COMPUTATIONAL COMPLEXITY THEORY

Learning Objectives
The learning objectives of computational complexity theory are the following. After studying this
unit, you will be able to:
 Understand about computational complexity.
 Overview of theoretical machine models such as the Turing machine
 A broad understanding of the big O notation

Unit Content Summary

5.1 Meaning of difficult problems in terms of computational complexity


Problems which have a time complexity that is polynomial in the size n of a binary input are
manageable to deal with as the size of the input grows. We call these class of problems 𝒫. Some
of the most difficult or unmanageable problems are those in which the time complexity scales
exponentially with the binary input size for solving the problem. However, if the problem
statement is changed to only verify or check if a particular solution is correct or not, then they
scale polynomially with an increase in the size of the binary input. A well-known class of difficult
problems are NP problems, such as an unstructured search required to guess a random password,
integer factoring, simulating the dynamics of electrons in molecules or the so-called protein
folding problem.

The Integer Factorization Problem


Amongst the most well-known of the NP problems is integer factorization. This involves factoring
a very large semi-prime number into its prime factors. The technique of RSA public-key
encryption is developed with the assumption that given a large semi-prime number (a number that
is a product of exactly two prime numbers), verifying a prime factor is an easy problem (just divide

90
and check the remainder), but finding that prime factor is hard (NP). Decades of searching has
not yielded a fast solution to integer factoring, and many computer scientists suspect that it is not
possible to solve this problem in polynomial time (that is, it is not in P). The current research about
whether hard NP problems have fast solutions is well-known even outside the field of computer
science and is popularly known as the P vs NP.

5.2 Machine Models – Turing Machine


A Turing machine is a mathematical model of a generic computing machine. It is a theoretical
product that manipulates symbols contained on a strip of tape. Turing machines are to be looked
at as a general model of a computing machine—anything from an advanced supercomputer to a
mathematician with a pencil and paper. They are not intended as a practical computing technology.
Computational Complexity can be looked at from three perspectives as shown in the figure below

Resources
(time,
space,
energy..)

Computationa
l Complexity

Underlying
Type of computational
problem model (det.
(decision, Turing m/c,
optimization, prob. Turing
m/c, Quantum
..) Computer…)

Basic Terminology for defining computational complexity


Problem: Number of logic gates required to add two n-bit numbers
Let’s assume that a specific algorithm requires f(n) = 24n + 2log(n) + 16 gates to perform this
task. In the limit of large n (say 100,000 or more), we need to consider only the 24n term since the

91
other two terms are insignificant in comparison. Essential behavior of the algorithm is summed up
by saying that the number of operations required scales like n

 The O(f(n)) notation is used to set upper bounds on the behavior of a function.
 The O(f(n)) notation is particularly useful for studying the worst-case behavior of specific
algorithms

5.3 Big O notation


The running time of a (computer) program is proportional to some constant multiplied by one of
these terms plus some smaller terms. Leading term + smaller terms (which become negligible for
large N)

Examples:
𝑁 3 + 5𝑁 2 − 3𝑁 + 7

 Here, 𝑁 3 is the leading term.


 For large 𝑁, say 𝑁 ≈ 108 , the contributions from the 𝑁 2 and 𝑁 terms (and the constant 7) is
negligible in comparison to the 𝑁 3 term. So, we ignore them.
 The Big O notation is used to get an estimate of the worst-case performance of the algorithms
in terms of the amount of time taken to computationally solve a particular type or class of
problems.

92
Problems and Questions
1. Explain the meaning of computational complexity
2. Explain in detail the three perspectives of computational complexity
3. What is the big O notation and why is it important in defining the complexity of algorithms?
4. Differentiate between the polynomial class and non-polynomial class of problem
complexities.

93
References
1. Quantum Computation and Quantum Information, Michael A. Neilsen and Issac L. Chuang
2. https://towardsdatascience.com/complexity-theory-101-introduction-to-big-o-bab99152ad44

94
MODULE 6 NUMERICAL ANALYSIS

Learning Objectives
The learning objectives of numerical analysis are the following. After studying this unit, you will
be able to:
 Understand about basics of numerical analysis
 Meaning of computational errors
 An overview of numerical stability and well-posed problems
 Overview of interpolation, extrapolation and regression

Unit Content Summary

6.1 Overview of numerical analysis


Numerical analysis is the study of algorithms that use numerical approximation (as against
symbolic manipulations) for problems involving mathematics, other physical sciences,
engineering, medicine, business, and art. The present growth in computing power has enabled the
use of more advanced and complex numerical analysis, which has helped in presenting detailed
and real-life mathematical models in science and engineering.
Some examples of numerical analysis include ODEs (Ordinary Differential Equations) as found
in astrophysics (predicting the motions of planets, stars, and galaxies), numerical linear algebra in
data analysis, and stochastic differential equations and Markov chains for simulating living cells
in medicine and biology. Prior to the advent of modern computers, numerical methods used
hand interpolation formulas, with data that was derived from large, printed tables. However, since
the mid 20th century, computers have been brought in to calculate the required functions instead.
However, many of the same formulae continue to be used in software algorithms.

95
6.2 Generation and propagation of computational errors
The study of errors forms an important part of numerical analysis.
There are several ways in which error can be introduced in the solution of the problem.
• Round off errors: Round-off errors arise because it is impossible to represent all real
numbers exactly on a machine with finite memory (which is what all practical digital
computers are).
• Truncation and discretization errors: Truncation errors are committed when an iterative
method is terminated or a mathematical procedure is approximated, and the approximate
solution differs from the exact solution.
• Similarly, discretization induces a discretization error because the solution of the discrete
problem does not coincide with the solution of the continuous problem.
• For instance, in the iteration in the sidebar to compute the solution of 3𝑥 3 + 4 = 28, after
10 or so iterations, it can be concluded that the root is roughly 1.99 (for example).
Therefore, there is a truncation error of 0.01.
• Once an error is generated, it will generally propagate through the calculation.
• For instance, already noted is that the operation ‘+′ on a calculator (or a computer) is
inexact.
• Therefore, a calculation 𝑎 + 𝑏 + 𝑐 + 𝑑 + 𝑒 will be increasingly incorrect since the
truncation error is carried forward.

6.3 Numerical stability and well – posed problems


Numerical stability is affected by the number of the significant digits the machine keeps on.
If a machine is used that keeps only the four most significant decimal digits, a good example on
loss of significance can be given by these two equivalent functions

𝑥
𝑓(𝑥) = 𝑥(√𝑥 + 1 − √𝑥) and 𝑔(𝑥) =
√𝑥+1+√𝑥

Comparing the results of 𝑓(500) = 500(√501 − √500) = 500(22.38 − 22.36) = 500(0.2) =


10
and
500 500 500
𝑔(500) = = 22.38+22.36 = 44.74 = 11.17
√501+√500

96
By comparing the above two results, it is clear that loss of significance (caused here
by catastrophic cancellation from subtracting approximations to the nearby numbers √501 and
√500) despite the subtraction's being computed exactly) has a huge effect on the results, even
though both functions are equivalent.
The desired value, computed using infinite precision, is 11.174755 …
6.4 Interpolation, Extrapolation, Regression
 Interpolation solves the following problem: given the value of some unknown function at
several points, what value does that function have at some other point between the given
points?
 Extrapolation is very similar to interpolation, except that now the value of the unknown
function at a point which is outside the given points must be found.
 Regression is also similar, but it considers that the data is imprecise.
 Given some points, and a measurement of the value of some function at these points (with
an error), the unknown function can be found.
 The least squares-method is one way to achieve this.

97
Problems and Questions
1. Explain why numerical methods are important in solving computational problems.
2. Explain the difference between interpolation and extrapolation.
3. Explain how regression is different from extrapolation

References
1. https://en.wikipedia.org/wiki/Numerical_analysis

98

You might also like