7-Mathematics Foundations SLM
7-Mathematics Foundations SLM
7-Mathematics Foundations SLM
(DEEMED UNIVERSITY)
Established under Section 3 of the UGC Act. 1956
Awarded Category - I by UGC
E-CONTENT
MATHEMATICS FOUNDATIONS
M.Sc (Data Science) SEM – 1
MODULE – 1 : Fundamentals Of 2
Mathematics
Learning Objective
1. To review basic mathematical concepts and build a foundation for understanding advanced
mathematics topics with relevance to data science
2. To get an understanding of the fundamentals of Discrete Mathematics, Calculus and Linear
Algebra and their relevance in all data science algorithms and concepts
3. To get a broad perspective of the underlying mathematical constructs in various algorithms
used in data science and data analytics
4. To use these mathematical constructs in advanced courses in data science.
5. To get an overview of computational complexity for understanding complexity of computer
science problems and associated algorithms
6. To get a broad perspective of numerical mathematics and its critical importance in solving
problems numerically on a computer.
1
MODULE 1 FUNDAMENTALS OF MATHEMATICS
Learning Objectives
Understand various types of numbers including real and absolute
Understand the concepts of an exponent and logarithm
Define and understand equations and graphs
Understand the meaning of functions of one independent variable
This unit gives an overview of the fundamentals of mathematics starting from real numbers, and
proceeding towards exponents, logarithms, equations, functions, and graphs. This also gives an
overview of the functions in one independent variable.
Figure 1, show a horizontal straight line. All types of real numbers can be represented along this
line. Each number can be represented as exactly one point on the line. Even the reverse is true, that
is, each point represents exactly one number. This line which has a one-to-one correspondence is
called as the real number line.
2
Figure 1: The real number line
As is seen from the figure, 0 (zero) is the pivot around which all numbers are referenced. Positive
numbers are on the right-hand side of 0. Negative numbers are on the left-hand side of 0. The set
of real numbers can be classified into various types.
Natural numbers are positive numbers with the difference between any two consecutive numbers
being 1. Integers, which are whole numbers. Integers are both positive and negative. Natural
numbers can be considered as positive integers. Next are rational numbers. Rational numbers are
those numbers, which can be expressed as the quotient of two integers p and q. Therefore, a rational
number can be expressed mathematically as p/q. Examples of rational numbers are shown in
Figure 1, such as ½, whose value in a decimal representation is 0.5. Or 1/3, whose value is
0.33333..... It is important to note that a number whose decimal pattern is repeating is also a
rational number.
Irrational numbers on the other hand are those numbers, which cannot be expressed in the form
of p/q; These numbers have an infinite number of non-repeating decimal values or representations.
For example, sqrt(2) has a value of 1.41421356237.... This is an infinite nonrepeating pattern.
Another example is the exponent ‘e’ called as Euler number which has a value 2.718281828459....,
which is again an infinite nonrepeating pattern.
3
1.2. Absolute value of a number
The absolute value of a number N is denoted by |N|. Let us consider a cartesian coordinate system,
which is a system of lines (more appropriately called axes), which are perpendicular to each other
as shown in Figure 2 below. This will be explained in more detail in a later section.
Then the absolute value represents the distance of N from zero. The absolute value of a positive
number is the same as the original number. However, the absolute value of a negative number is
simply the value of the number without the negative sign. For example, |3| = 3 and |-5| = 5.
Mathematically, the absolute value of a number N is represented as follows:
𝑁, 𝑓𝑜𝑟 𝑁 ≥ 0
|𝑁| = {
−𝑁, 𝑓𝑜𝑟 𝑁 < 0
1.3. Exponents
The concept of exponents is about representing the product of a number with itself. In general, if
x is any real number or a variable and n is a positive integer, then the x multiplied by itself n times
is represented mathematically as xn. In this case, n is called as the exponent and x is called as the
base. By the definition of exponents, the following two points are important to remember
For any x,
the value of x0 = 1
00 is undefined
Any number or variable x, that is not raised to an exponent is assumed to be raised to the first
power or to the power of 1. Therefore, 3.51 is the same as writing it as 3.5.
4
Assuming a and b are integers and x and y are real numbers, the laws of exponents are given below.
(xa)( xb) = xa+b
(xa)b= xab
(xy)a = xa ya
(x/y)a = xa/ ya
1/ xa = x-a
sqrt(x) = x1/2
1.4. Logarithms
Let x be a real number. Then for x > 0, the common logarithm of x is written as log10(x) or simply
as log (x) or log x. It is the power to which 10 must be raised to get x. 10 is called as the base of
the logarithm. Other positive numbers except for 1 can serve as the base. Logarithms are usually
a very useful mathematical tool to solve problems that involve exponents.
Assuming that, b > 0 and b ≠ 1, and x, y > 0, following are the rules of logarithms
logb (xy) = logb (x) + logb (y)
logb (x/y) = logb (x) - logb (y)
logb xa = a logb x
𝑎
logb √𝑥 = (1/a) (logb x)
1.5. Natural exponents and logarithms
The Euler constant ‘e’ is the most commonly used base for exponential and logarithmic functions
which are usually encountered in engineering and science. From a mathematical perspective the
expression for e is given by
1 𝑛
𝑒 = lim (1 + )
𝑛→∞ 𝑛
5
The exponential functions which are written with the base ‘e’ are called as natural exponential
functions. In general, they are written as
𝑦 = 𝑒𝑥
Consequently, the logarithmic functions which are written with the base ‘e’ are called as natural
logarithmic functions. In general, they are written as:
𝑦 = ln(𝑥) = log 𝑒 𝑥
It is very interesting and important to note that 𝑒 𝑥 and log 𝑒 𝑥 are inverses of each other.
6
6x + 7 = 19. Here the solution for the variable x is 2. If we substitute 2 instead of x in this
equation, on the left side of the equation we get 6 x 2 + 7 which is 19. This is also the value
on the right side of the equation.
3x – 8 = 2y + 12. If we go ahead and solve this equation, we get y = 2 and x = 8.
(𝒙, 𝒚, 𝒛)
7
Figure 3: Three-dimensional Cartesian Coordinate system
When the problem to be solved has a curved surface, then using the cartesian coordinate system
becomes tedious to solve the problem. In such a case, one uses the spherical coordinate system.
In a spherical coordinate system, as shown in
Figure 4
• 𝑟, 𝜃, 𝜑 are the coordinates or axes. 𝑟 is the radial axes, 𝜃, 𝜑 are the angular axes.
• Each of the coordinates is perpendicular to the other
• The spherical coordinate system is dynamic
This system is dynamic in the sense that the coordinates 𝑟, 𝜃, 𝜑 can change their values during
the problem solving itself.
𝒛
(𝑟, 𝜃, 𝜑)
𝒓
𝜽
𝒚
𝜑
8
Exercise
Problems and Questions
1. Explain the meaning of the real number line.
2. What are integers?
3. Explain the concept of rational numbers.
4. How are irrational numbers different from rational numbers?
5. Why are the Euler constant ‘e’ and 𝜋 said to be irrational numbers?
6. What is the cartesian co-ordinate system?
10. (𝑥 4 )5
Find the y-intercept for the following equations. (note: y-intercept is the point at which the line
crosses the y-axis. Alternatively, it is the point where x = 0).
11. 3𝑥 + 𝑦 = 7
12. 9𝑥 − 3𝑦 = 72
9
References
1. Calculus for Business, Economics, And the Social Sciences, Schaum’s Outlines, Edward
T. Dowling, McGraw-Hill 1990.
10
MODULE 2 DISCRETE MATHEMATICS
Learning Objectives
Understand the mathematical notations used in mathematics and logic
Get an overview of set theory
Get an understanding of permutations and combinations
Develop a good understanding of probability theory,
Develop an understanding of Markov chains and
Get an understanding of graph theory
Discrete Mathematics deals with those elements of mathematics which are discrete. This means
that in discrete mathematics we are dealing with operations which use algebra and arithmetic. In
this unit, concepts of the mathematical notations used in mathematics, and logic are presented.
Further this unit gives an overview of set theory, permutations and combinations, probability
theory, Markov chains and graph theory.
2.1 Logic
Propositional logic: A proposition is a collection of declarative statements that has two types of
truth values. The truth value can either be "true” or "false".
By definition
11
• A proposition consists of propositional variables and connectives.
• The propositional variables are denoted by capital letters (A, B, etc).
• The connectives are those that connect the propositional variables.
An example of a proposition is “4 + 5 = 14 – 4” which is “FALSE” since the left side of the
equation returns a value 9 whereas the right side returns a value 10.
Connectives: In propositional logic there are connectives which are used to connect the
propositions. These are “OR”, “AND”, “NOT” or negation, “Implication” (if- then) and “If and
only if”.
Table 1 below, gives the notations which are used for each of the connectives
Each of the connectives operate on two or more propositions and follow a truth table (same as the
Boolean logic in Digital Electronics)
• The OR operation of two propositions A and B is written as 𝐴 ∨ 𝐵
• The truth table is given below
A B A∨B
True True True
True False True
False True True
False False False
In a similar manner, we can create the truth tables for the other operations including AND, NOT,
XOR (exclusive OR) and so on.
12
For example, the truth table for the XOR is given below. The operation returns a True when either
of the propositions is True. In case both the propositions are either true or false, the operation
returns False. This is shown in the table below.
A B A XOR
B
True True False
True False True
False True True
False False False
Predicate Logic deals with predicates, which are propositions containing variables.
A predicate is an expression of one or more variables defined on some specific domain.
A predicate with variables can be made a proposition by either assigning a value to the variable or
by quantifying the variable.
• The following are some examples of predicates:
• Let E(x, y) denote "x = y"
• Let X(a, b, c) denote "a + b + c = 0"
• Let M(x, y) denote "x is married to y"
Quantifiers:
The variable of predicates is quantified by quantifiers.
There are two types of quantifiers in predicate logic:
• Universal Quantifier and Existential Quantifier.
Universal Quantifier
• Universal quantifier states that the statements within its scope are true for every value of
the specific variable. It is denoted by the symbol ∀.
• ∀𝑥𝑃(𝑥) is read as for every value of 𝑥, 𝑃(𝑥) is true.
• Example − "Man is mortal" can be transformed into the propositional form ∀𝑥𝑃(𝑥) where
𝑃(𝑥) is the predicate which denotes x is mortal and the universe of discourse is all men.
13
Existential Quantifier
• Existential quantifier states that the statements within its scope are true for some values of
the specific variable. It is denoted by the symbol ∃.
• ∃𝑥𝑃(𝑥) is read as for some values of 𝑥, 𝑃(𝑥) is true.
• Example − "Some people are dishonest" can be transformed into the propositional
form ∃𝑥𝑃(𝑥) where 𝑃(𝑥) is the predicate which denotes 𝑥 is dishonest and the universe
of discourse is some people.
Hence, a selection choice RGB is different from RBG and BRG and so on.
Therefore,
• In a combination, order of selection does not matter
• In a permutation, order of selection does matter
14
Another example that can be taken is of a combination lock that are available on many
travelling suitcases. Specifically, let us select a 3 number combination lock with the following
points:
• Each number can go from 0 – 9
• The suitcase opens for only one selection of each of the three numbers. For example, let us
assume that the selected choice for opening the lock is 3 5 7.
• Then, unless we get this choice, the suitcase does not open.
• Thus, the conclusion is that the order does matter in this case.
• Therefore, we are talking about a permutation here. So, we should talk about a
permutation lock rather than a combination lock.
2.3 Permutations
Permutations with repetition: Let us consider an example of 9 marbles with 9 different colours
in a box.
• We are asked to choose 3 out of these 9 marbles.
• Since repetition is allowed, each time we can select 1 out of the 9 options, since the selected
marble can be placed back into the box.
• Therefore, the number of ways in which we can choose 3 marbles out of 9 when repetition
is allowed will be
9 x 9 x 9 = 93 = 81
• If we were asked to select 5 out of the available 9 marbles, with repetition allowed, we
would get 95 choices.
• In general, if we must select r of n types, with repetition allowed, the permutations are nr
Permutations without repetition: In this case when repetition is not allowed, the number of
choices available for selection after one is selected reduces by 1 every time. Let us consider the
above example of 9 marbles of different colours again.
• We are asked to choose 3 out of the 9 with repetition not allowed.
• The difference this time is that after we select one marble, we cannot replace it back into
the box.
• The first time we choose a marble, we have 9 choices.
15
• For the next selection, however, we have one choice less since we cannot replace the
selected marble back. Therefore, we have 8 choices.
• For the third selection, we have 7 choices.
• Thus, the number of permutations that are possible for this problem is
9 x 8 x 7 = 504
• In general, if we must choose 9 objects out of the available 9 with no repetition, the number
of choices is:
9 x 8 x 7 x....x 3 x 2 x 1
n! = n x (n – 1) x (n – 2) x....x 3 x 2 x 1
16
Example: In how many ways can the first, and second place be awarded to 10 people?
Solution:
𝟏𝟎! 𝟏𝟎!
= = 𝟗𝟎
(𝟏𝟎 − 𝟐)! 𝟖!
Instead of writing the whole formula, people use different notations such as:
𝑛
𝑛!
𝑟𝑃 =
(𝑛 − 𝑟)!
2.4 Combinations
Combinations without repetition:
Let us consider the following example to explain the concept of combinations without repetition.
Let us assume that there are a set of 3 balls in a box. Each ball is numbered 1, 2 and 3. We are
asked to arrange them. The table below shows the various possibilities for the cases when order
does or does not matter:
123
132
213
123
231
312
321
In the cases where placing each of the ball in a particular order matter, we get 6 options.
However, in case order does not matter, then placing the balls in any order does not make a
difference. Therefore, in the second case (that is when order does not matter), it essentially means
that placing the balls in the order 123 is equivalent to any of the other 5 options.
17
The second case is what is called as a combination and the first case is the permutation.
The formula for finding the number of combinations for choosing r objects out of n available
objects is
𝑛
𝒏! 𝟏 𝒏! 𝒏
𝑟𝐶 = . = =( )
𝑟
(𝒏 − 𝒓)! 𝒓! (𝒏 − 𝒓)! 𝒓!
Where:
• n is the number of things to choose from
• we choose r of them
• no repetition
• order does not matter
• 𝑛
𝑟𝐶 + 𝑟−1𝑛𝐶 = 𝑛+1
𝑟𝐶
𝑛𝑛−1
• 𝑟𝐶 = (𝑛 − 𝑟 + 1) 𝑟−1𝑛𝐶
18
The integers are the set of whole numbers, both positive and negative and can be written compactly
as a set in the following manner:
b. 𝐿𝑒𝑡 𝑊 = {𝑑: 𝑑 > 8, 𝑑 𝑖𝑠 𝑡ℎ𝑒 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑑𝑎𝑦𝑠 𝑖𝑛 𝑎 𝑤𝑒𝑒𝑘} 𝑖𝑠 𝑎𝑙𝑠𝑜 𝑎 𝑣𝑜𝑖𝑑 𝑠𝑒𝑡
𝑏𝑒𝑐𝑎𝑢𝑠𝑒 𝑡ℎ𝑒𝑟𝑒 𝑎𝑟𝑒 𝑜𝑛𝑙𝑦 7 𝑑𝑎𝑦𝑠 𝑖𝑛 𝑎 𝑤𝑒𝑒𝑘.
Membership of a set
Definition 2: The set membership symbol ∈ is used to say that an object is a member of a set.
It has a partner symbol ∉ which is used to say an object is not in a set.
Equal Sets
Definition 3: We say two sets are equal if they have the same members.
𝑬𝒙𝒂𝒎𝒑𝒍𝒆: 𝑆 = {1, 2, 3}, 𝑇 = {2, 3, 1}
19
They are equal sets
Cardinality of a set
Definition 4: The cardinality of a set is its size. For a finite set, the cardinality of a set is the number
of members it contains. In symbolic notation the size of a set S is written |S|.
Intersection of Sets
Definition 5: The intersection of two sets S and T is the collection of all objects that are in both
sets.
It is written S ∩ T
Using curly brace notation, 𝑺 ∩ 𝑻 = {𝒙 : ( 𝒙 ∈ 𝑺 ) 𝒂𝒏𝒅 ( 𝒙 ∈ 𝑻 )}
The symbol ∧ represents the Boolean symbol “and”. Using this we can rewrite
𝑺 ∩ 𝑻 = {𝒙: ( 𝒙 ∈ 𝑺 ) ∧ ( 𝒙 ∈ 𝑻 )}
S∩U= {}
{2, 3, 5}
T∩U= {}
{3, 4, 5}
S∩T∩U = {}
{3, 5}
Disjoint Sets
Definition 6: If A and B are sets and 𝐴 ∩ 𝐵 = ∅ then we say that A and B are disjoint, or disjoint
sets.
Union of Sets
20
Definition 7: The union of two sets S and T is the collection of all objects that are in either set. It
is written S ∪ T.
𝑺 ∪ 𝑻 = {𝒙 : (𝒙 ∈ 𝑺) 𝒐𝒓 (𝒙 ∈ 𝑻 )}
Symbolic equivalent of “OR” is ∨ which lets us re-write the definition of union as:
𝑺 ∪ 𝑻 = { 𝒙: ( 𝒙 ∈ 𝑺 ) ∨ ( 𝒙 ∈ 𝑻 )}
Universal Set
Definition 8: The universal set is the set of all possible objects (at least for a given collection of
set theoretic computations). The universal set is commonly written U
Example: If we declare our universal set to be the set of integers then { 1⁄2 , 2⁄3 } is not a well-
defined set because the objects used to define it are not members of the universal set.
Solution: The symbols {1⁄2 , 2⁄3 } do define a set if a universal set that includes 1⁄2 and 2⁄3 is
chosen. The problem arises from the fact that neither of these numbers are integers.
Compliment of a set:
Definition 9: The compliment of a set S is the collection of objects in the universal set that are not
in S.
𝑆 𝑐 = { 𝑥: ( 𝑥 ∈ 𝑈 ) ∧ ( 𝑥 ∉ 𝑆 )}
or more compactly as
𝑆𝑐 = { 𝑥 : 𝑥 ∉ 𝑆 }
21
However, it should be clear that the compliment of a set always depends on which universal set is
chosen.
There is also a Boolean symbol associated with the complimentation operation which is the NOT
operation.
The notation for NOT is ¬
The definition for compliment of a set S then becomes:
𝑆𝑐 = { 𝑥 : ¬ ( 𝑥 ∈ 𝑆 ) }
Difference of sets:
Definition 10: The difference of two sets S and T is the collection of objects in S that are not in T.
Or alternately,
𝑆 − 𝑇 = {𝑥: (𝑥 ∈ 𝑆) ∧ (𝑥 ∉ 𝑇)}
Subsets:
Definition 11: For two sets S and T we say that S is a subset of T if each element of S is also an
element of T.
Example 9: Subsets
If A = {a, b, c}, then A has eight different subsets. Write down the 8 subsets
22
Solution: The subsets are {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}, {} (or ∅)
Notice that A ⊆ A and in fact each set is a subset of itself. The empty set ∅ is a subset of every set.
2.5.1 Propositions:
Proposition 1: Two sets are equal if and only if each is a subset of the other. In symbolic notation:
(𝐴 = 𝐵) ⇔ ( 𝐴 ⊆ 𝐵 ) ∧ ( 𝐵 ⊆ 𝐴)
Proposition 2: De Morgan’s Laws: Suppose that S and T are sets. De Morgan’s Laws state that
(𝑆 ∪ 𝑇)𝑐 = 𝑆 𝑐 ∩ 𝑇 𝑐
(𝑆 ∩ 𝑇 )𝑐 = 𝑆 𝑐 ∪ 𝑇 𝑐
Venn Diagrams: A Venn diagram is a way of depicting the relationship between sets. Each set is
shown as a circle and circles overlap if the sets intersect.
Some examples of various set theory concepts through Venn diagrams are given in the figure
below:
23
2.6 Probability Theory
Probability is a measurement of the chance of some specified outcome happening. Probability is
usually measured as a percentage (10%, 20% etc. up to 100%) or as a fraction between 0 and 1
Example of independent events: The results of multiple coin flips. Each coin flip is independent
of the previous coin flip. There is no influence or impact of the previous coin flip on the subsequent
flip. Hence such events are termed as independent.
Example of dependent events: The results of drawing various cards from the same deck are
dependent, if we assume that the cards are not added back to the deck after being drawn. This is
also known as “without replacement”.
When the first card is drawn from a deck, there is one less card in the deck.
Therefore, the outcome affects the probability of drawing other cards from the deck because it
reduces the possible outcomes.
Rule of Product: When calculating the probability that multiple independent events will all occur,
the probabilities are multiplied. This is the rule of product.
24
Example: What is the chance of rolling a sum of 9 followed by a sum of 10 on a pair of traditional
dice?
The Rule of Sum: The probability that a composite event of multiple dependent events will occur
is a sum of the probabilities of each individual event, as long as the parts cannot occur at the same
time.
Conditional Probability
A conditional probability is a probability that is calculated with knowledge about the outcome of
some other event.
For instance, a question might be:
• We have a bag with 3 red marbles and 2 blue marbles
• We pick out one red marble without replacing it
• What is the probability that the next pick will also be a red marble?
Note: In this case, the odds that our second pick will be a red marble are dependent on what the
first pick was.
25
Whenever additional knowledge is given about the outcome of an experiment, it potentially affects
probability.
𝑃(𝐴 ∩ 𝐵)
𝑃(𝐴|𝐵) =
𝑃(𝐵)
This calculation can be read as "the probability of A given B is the probability of both A and B (the
intersection of A and B) divided by the probability of B.”
Bayes Theorem
Bayes theorem is a way to relate conditional probabilities to each other. It follows directly from
the definition of conditional probability, with some algebraic rearrangement.
Let A and B be events, then 𝑃(𝐴|𝐵) and 𝑃(𝐵|𝐴) are related by the following equations
𝑃(𝐵|𝐴)𝑃(𝐴)
𝑃(𝐴|𝐵) =
𝑃(𝐵)
𝑃(𝐴|𝐵)𝑃(𝐵)
𝑃(𝐵 |𝐴) =
𝑃(𝐴)
26
Markov chains are widely used in economics, game theory, queueing (communication) theory,
genetics, and finance.
The figure below represents a two-state Markov process. A and B are called as states of the Markov
process. Each number represents the probability of the Markov process changing from one state to
another state, with the direction indicated by the arrow from one state to another.
For example, if the Markov process is in state A, then the probability that it changes to state B is
0.7, while the probability it remains in state A is 0.3. Similarly, the probability that the Markov
process remains in state B is 0.6, while the probability that it changes to state A is 0.4. It is
important to remember that the sum of all the probabilities that go out of a particular state
(including the probability that it remains in the same state) should always be equal to 1.
27
We get the following Markov process
28
End
A B
A
Start
B
State Transition Diagram and Transition probability Matrix
A 3 state Markov chain shown by a state transition diagram.
Consider a Markov chain with three possible states 1, 2, and 3 and the following transition
probability matrix
1 1 1
4 2 4
1 2
𝑃= 0
3 3
1 1
(2 0
2)
The corresponding state transition diagram is shown in the figure below
29
2.8 Graph Theory
Historical perspective of Graph Theory: Seven bridges of Königsberg (Euler)
The history of graph theory may be specifically traced to 1735, when the Swiss
mathematician Leonhard Euler solved the Königsberg bridge problem. The Königsberg bridge
problem was an old puzzle concerning the possibility of finding a path over every one of seven
bridges that span a forked river flowing past an island—but without crossing any bridge twice.
Euler argued that no such path exists. His proof involved only references to the physical
arrangement of the bridges, but essentially, he proved the first theorem in graph theory.
A graph is a set of vertices (that is, points or nodes) and edges (or lines) that connect the vertices.
The figure below shows various types of graphs.
A graph without loops and with only one edge between any two vertices is called a simple graph
(a).
A graph in which any two vertices are joined by more than one edge is called a multigraph (b).
When each vertex is connected by an edge to every other vertex, the graph is called a complete
graph (c).
A directed graph (d) is one in which the vertices are given a direction.
Graph theory is used in several disciplines including computer science, biology, chemistry and so
on.
30
Graph Theory – A Real Life Problem
Traveling salesman problem, an optimization problem in graph theory in which the nodes
(cities) of a graph are connected by directed edges (routes), where the weight of an edge indicates
the distance between two cities. The problem is to find a path that visits each city once, returns to
the starting city, and minimizes the distance traveled. The only known general
solution algorithm that guarantees the shortest path requires a solution time that grows
exponentially with the problem size (i.e., the number of cities). This is an example of an NP-
complete problem (from nonpolynomial), for which no known efficient (i.e., polynomial
time) algorithm exists.
31
Exercise
Problems and Questions
1. There are 10 people in a room who have come for lunch. Since there is place only for 3 on
a table, the waiter needs to randomly invite 3 out of these 10 to the table. What are the
different combinations that the waiter can choose from?
2. A combination lock has 4 sets of numbers, each going from 0 to 9. What is the number of
ways in which one can set a numerical lock for the same?
3. Out of a possible 7 consonants and 4 vowels one needs to form words having 3 consonants
and 2 vowels. What are the number of possible ways in which this can be accomplished?
4. We have two sets. Set A has 25 objects, set B has 30 objects. Their union has 45 objects.
How many objects will be there in the intersection of A and B?
5. There are a group of 50 friends who have come out for a party to a hotel. In the group, 20
like coca-cola, and 30 like tea or coffee. Every friend likes at least one of these cold or hot
drinks. Determine the number of friends who like the hot beverage.
6. Find out the probability of getting the sum of 9 when two dice are tossed simultaneously.
7. A person has three jobs that he does daily in 10 hours. He spends 70% of his time on his
day job. He spends 20% time on the second job and the remaining 10% time on the third
job. The probability that he takes a coffee in his first job is 1%, while it is 2% on his second
job and 5% on the third job since he keeps getting increasingly tired by the time he is on to
his 3rd job. What is the probability that he is working on the second job while taking coffee?
32
References
1. Discrete Maths – One Semester course on Sets, Logic, Proofs, Probability, Graph theory by
Dr. Trefor Bazett: https://bit.ly/3uSZFEt
2. 6.041 Probabilistic Systems Analysis and Applied Probability: https://bit.ly/3PxVQMZ
3. https://www.mathsisfun.com/combinatorics/combinations-permutations.html
4. https://www.electronics-tutorials.ws/boolean/demorgan.html
5. https://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg
33
MODULE 3 DIFFERENTIAL CALCULUS
Learning Objectives
Understand and comprehend limits of a function of one independent variable
Understand and evaluate continuity of functions of one independent variable
Develop a good foundation on differentiability of functions of one independent variable
Understand the meaning of the slope / tangent / rate of change of a function of one
independent variable
Evaluate the derivatives of functions of one and multiple independent variables.
Understand and calculate local and global minima and maxima of functions of one and
multiple independent variables.
Apply principles of maxima and minima to optimization problems.
This unit gives a detailed understanding of the principles of differential calculus and its
applications in data science. Starting from limits, continuity, and differentiability of a function of
one independent variable, we discuss extrema of functions and their applications to optimization
problems which are among the most common problems to be solved in data science.
Let us consider the following two functions and their corresponding graphs
34
Figure 5: Graph of function f(x)
+𝟐, 𝒙 ≥ 𝟎
Example 2: 𝑔(𝑥) = {
−𝟐, 𝒙 ≤ 𝟎
With reference to the Example 1 and Example 2, we now consider the concept of limit of a
function.
Example 1: Let us talk about approaching the limit from both sides in Figure 5
In Figure 5, as 𝒙 𝒂𝒑𝒑𝒓𝒐𝒂𝒄𝒉𝒆𝒔 𝟎 from either side (that is from the right and left sides of 0)
lim[𝑓(𝑥)] = 2
𝑥→0
35
lim [𝑓(𝑥)] = −1.7
𝑥→3.33
Example 2: Let us consider the discontinuous function in Figure 6. As 𝑥 𝑎𝑝𝑝𝑟𝑜𝑎𝑐ℎ𝑒𝑠 0 from the
right side in Figure 6, 𝑔(𝑥) 𝑎𝑝𝑝𝑟𝑜𝑎𝑐ℎ𝑒𝑠 2. This is called a one-sided limit.
But as 𝑥 𝑎𝑝𝑝𝑟𝑜𝑎𝑐ℎ𝑒𝑠 0 from the left side in the figure, 𝑔(𝑥) 𝑎𝑝𝑝𝑟𝑜𝑎𝑐ℎ𝑒𝑠 − 2. The limit does not
exist, therefore, since 𝑔(𝑥) does not approach a single number as 𝒙 𝒂𝒑𝒑𝒓𝒐𝒂𝒄𝒉𝒆𝒔 𝟎 from both
sides
Definition of a Limit: If the values of the function 𝑓(𝑥) of a function 𝑓 get closer to a single
number 𝐿 for all values of 𝑥, as 𝑥 gets closer to but does not equal 𝑎, then 𝐿 is defined as the limit
of 𝑓(𝑥) as 𝑥 approaches 𝑎, and is written as
𝐥𝐢𝐦 𝒇(𝒙) = 𝑳
𝒙→𝒂
A continuous function is one whose graph can be drawn without lifting pencil from paper.
Formally, a function f is said to be continuous at x = a, if each of the following three conditions
are satisfied:
36
lim 𝑓(𝑥) = 𝑓(𝑎)
𝑥→𝑎
12
10
8
A
∆𝑦A
6
∆𝑥𝐴
4
0
0 20 40 60 80 100 120
37
a. The function needs to be continuous at the identified point
b. It must have a tangent which is unique at that point.
The derivative of a function measures the slope and the instantaneous rate of change of the original
function 𝑓(𝑥) at a given point. The process of finding the derivative of a function is called as
differentiation
The derivative of a function can be written in several different ways, with many symbols which
also includes some of the Greek alphabets.
𝑑𝑦 ′ 𝑑𝑓 𝑑
𝑓 ′ (𝑥), , 𝑦 , , [𝑓(𝑥)]
𝑑𝑥 𝑑𝑥 𝑑𝑥
𝑑𝑦 ′ 𝑑𝜙 𝑑
𝜙 ′ (𝑡), ,𝑦 , , [𝜙(𝑡)]
𝑑𝑥 𝑑𝑥 𝑑𝑥
𝑑𝑦
𝑓 ′ (𝑎) 𝑎𝑛𝑑
𝑑𝑥
38
3.3.1 Rules of differentiation:
By differentiation, one means finding the derivative of a given function 𝑦 = 𝑓(𝑥) . There are
several rules to find the derivative of functions. In listing the rules of differentiation for the function
𝑓(𝑥) it should be noted that the function could be some combination of some other functions which
we can generalize as 𝑔(𝑥)𝑎𝑛𝑑 ℎ(𝑥). These are unspecified functions of the independent variable
𝑥.
In general, the derivative of a variable 𝑥 raised to the first power is always equal to the
coefficient of the variable.
39
Some examples of the fundamental rules of differentiation
a. Given 𝑓(𝑥) = 4, 𝑓 ′ (𝑥) = 0
b. Given 𝑓(𝑥) = 5𝑥 + 9, 𝑓 ′ (𝑥) = 5
c. Given 𝑓(𝑥) = 𝑥10 , 𝑓 ′ (𝑥) = 10𝑥 (10−1) = 10𝑥 9
The following rules of differentiation apply to a function 𝑓(𝑥) which is a combination of two
functions 𝑔(𝑥)𝑎𝑛𝑑 ℎ(𝑥).
e. Product rule:
For 𝑓(𝑥) = 𝑔(𝑥) × ℎ(𝑥), 𝑓 ′ (𝑥) = 𝑔(𝑥). ℎ′ (𝑥) + 𝑔′ (𝑥). ℎ(𝑥)
f. Quotient rule:
[ℎ(𝑥).𝑔′ (𝑥)−𝑔(𝑥).ℎ′ (𝑥)]
Given 𝑓(𝑥) = 𝑔(𝑥) ÷ ℎ(𝑥), 𝑓 ′ (𝑥) = [ℎ(𝑥)2 ]
40
Sum and difference rule:
Given 𝑓(𝑥) = 8𝑥 4 + 5𝑥 3 , 𝑓 ′ (𝑥) = 32𝑥 3 + 15𝑥 2
Product rule:
Given 𝑓(𝑥) = 8𝑥 2 (4𝑥 − 3), 𝑓 ′ (𝑥) = 8𝑥 2 (4) + 16𝑥(4𝑥 − 3)
= 32𝑥 2 + (64𝑥 2 − 48𝑥)
= 86𝑥 2 − 48𝑥
Quotient rule:
[ℎ(𝑥).𝑔′ (𝑥)−𝑔(𝑥).ℎ′ (𝑥)]
Given 𝑓(𝑥) = 𝑔(𝑥) ÷ ℎ(𝑥), 𝑓 ′ (𝑥) = [ℎ(𝑥)2 ]
41
Then, using the expression 𝑓 ′ (𝑥) = 𝑔′ [ℎ(𝑥)]. ℎ′ (𝑥), we get
1
𝑔′ (ℎ(𝑥)) = (2)(8𝑥 + 9)−1/2
And substituting the appropriate values into the chain rule formula, we get
1 4
𝑓 ′ (𝑥) = = (2)(8𝑥 + 9)−1/2 . 8 =
√(8𝑥+9)
The function 𝑓(𝑥) is said to be decreasing at 𝑥 = 𝑎 if the following condition holds. In the area
near the point [𝑎, 𝑓(𝑎)] the graph of the function decreases as it moves from left to right.
It is important to realize that in the absence of a graph, the derivatives of a function can be used to
identify important features of the function including the concepts of an increasing and decreasing
functions.
Earlier we understood that the first derivative of a function measures the following two concepts
both of which are equivalent: Rate of change of the function which is also the slope of the function.
Therefore, if the first derivative is positive at 𝑥 = 𝑎, it means that (see Figure 8 below) the
following points hold good:
• 𝑓 ′ (𝑎) > 0
• Its rate of change is positive and
42
• Its slope is positive, that is increasing
160
140
120
100
80
60
40
20
0
-15 -10 -5 0 5 10 15
𝑥 = 𝑎1 𝑥 = 𝑎2
Concave upwards
(Convex) Concave upwards
(Convex)
If the first derivative is negative at 𝑥 = 𝑎, it means (see Figure 9 below) the following points hold
good:
• 𝑓 ′ (𝑎) < 0
• Its rate of change is negative and
• Its slope is negative that is deceasing
43
20
0
-15 -10 -5 0 5 10 15
-20
-40
-60
-80
-100
-120
-140
-160
𝑥 = 𝑎2
𝑥 = 𝑎1
Concave downwards
44
When we do not have a graph to guide us, the second derivative of a function provides useful
information about concavity
If 𝑓 ′′ (𝑎) > 0, 𝑇ℎ𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑖𝑠 𝑐𝑜𝑛𝑐𝑎𝑣𝑒 𝑢𝑝𝑤𝑎𝑟𝑑 𝑎𝑡 𝑥 = 𝑎
If 𝑓 ′′ (𝑎) < 0, 𝑇ℎ𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑖𝑠 𝑐𝑜𝑛𝑐𝑎𝑣𝑒 𝑑𝑜𝑤𝑛𝑤𝑎𝑟𝑑 𝑎𝑡 𝑥 = 𝑎
The Table 2 below explains the conditions to decide on increasing/decreasing functions along with
their concavity:
Step 1: For a function 𝑓(𝑥) to be at a relative maximum or minimum at a point 𝑥 = 𝑎, its first
derivative must equal zero at a. This point is called as the critical point.
45
Step 2: To specifically determine if at point a, the function is a maximum or a minimum, we
calculate the 2nd derivative of the function at point a.
Figure 10 and Figure 11 below give a graphical perspective of a function at relative minimum and
relative maximum.
160
140
120
100
80
60
40
20
0
-15 -10 -5 0 5 10 15
Relative minimum
at x = a
46
Relative maximum
at x = a
20
0
-15 -10 -5 0 5 10 15
-20
-40
-60
-80
-100
-120
-140
-160
This is shown in Figure 12 below. In 3 dimensions, one talks about an inflection point as well as
a saddle point, where in one of the dimensions the function gives a maximum, while in the other
dimension the function results in a minimum.
47
Figure 12: Graphical representation of an Inflection point
𝑦 = 𝑥3
48
optimizing the supply chain routes for a variety of products, ensuring maximum supplies with
minimum travel time and distance. There are numerous other problems which need optimization
in terms of multiple variables and optimization for cost, quality, or time or a combination of two
or more of these factors. Some examples may be
• maximize profit subject to a capacity limitation of the current production plant
• minimize cost subject to meeting a minimal quota of output
The function which is to be optimized is usually referred to as the objective function. The function
setting the limitation is called the constraint. By expressing one variable in the constraint in terms
of the other and substituting the new expression back into the objective function, one can optimize
a function subject to a constraint using ordinary differentiation.
From a differential calculus perspective, an optimization problem involves finding the relative
maximum or minimum of a function of one or usually multiple independent variables and
constraints. The process of finding the extrema is well known as we have explained in an earlier
section involving the calculation of the first and second derivatives of the function.
a. Take the first derivative, set it equal to zero and solve for 𝑥0 to find the critical values
• 𝑓 ′ (𝑥) = 9𝑥 2 − 72𝑥 + 135
• = 9(𝑥 − 3)(𝑥 − 5) = 0
• So critical values are 𝑥 = 3 𝑎𝑛𝑑 𝑥 = 5
b. Now take the second derivative
• 𝑓 ′′ (𝑥) = 18𝑥 − 72
• Evaluate it at the critical values that is at 𝑥 = 3 𝑎𝑛𝑑 𝑥 = 5
49
• 𝑓 ′′ (𝑥 = 3) = −18 < 0, which means the function at this point is concave down,
and is a relative maximum
• 𝑓 ′′ (𝑥 = 5) = 18 > 0, which means the function at this point is concave up, and is
a relative minimum
Example 2: Optimize (maximize) profits 𝜋 for a firm, given the total revenue 𝑅 = 6400𝑄 − 20𝑄 2
and the total cost 𝐶 = 𝑄 3 − 5𝑄 2 + 400𝑄 + 52000, assuming 𝑄 > 0, find
a. The profit function 𝜋 = 𝑅 − 𝐶
b. Find the critical values
c. Take the 2nd derivative and evaluate it at the critical points.
Solution:
a. The profit function 𝜋 = 𝑅 − 𝐶
• 𝜋 = 6400𝑄 − 20𝑄 2 − (𝑄 3 − 5𝑄 2 + 400𝑄 + 52000)
b. Find the critical values: Take the first derivative, set it to zero and solve for 𝑄0 to find the
critical values
• 𝜋 ′ = − 3𝑄 2 − 30𝑄 + 6000 = − 3(𝑄 2 + 10𝑄 − 2000) = 0
50
• Thus, profit is maximized at 𝑄 = 40, where 𝜋(40) = −(40)3 − 15 (40)2 +
6000(40) − 52000 = 100,000
1
𝑓 ′ (𝑥) = 𝑔(𝑥). 𝑔′ (𝑥);
𝑔′ (𝑥)
𝑇ℎ𝑎𝑡 𝑖𝑠 𝑓 ′ (𝑥) = 𝑔(𝑥)
5
So 𝑓 ′ (𝑥) = 20𝑥 4 . 𝑒 4𝑥
51
𝑔′ (𝑥)
𝑓 ′ (𝑥) = 𝑔(𝑥)
1
So 𝑓 ′ (𝑥) = 𝑥
Exponential and logarithmic functions are embedded in many practical applications across various
areas of science, engineering, economics, finance, social sciences and many industry verticals.
Both the exponent and ln can be used interchangeably to solve real life problems
Some examples are:
• decay of radioactive materials
• usage of the exponent in electromagnetic theory
• Percentage rate of change in literacy levels of a developing country
• Usage in machine learning and deep learning problems
In case of a function 𝑦 = 𝑓(𝑥) that depends only on one independent variable x, the domain of
the function f(x) is defined to be the set of all x values for which the function is defined, and
the range of the function is the set of all 𝑦 values that 𝑓 takes. (Please refer to
https://www.intmath.com/functions-and-graphs/2a-domain-and-range.php for more information
on domain and range of a function).
52
By convention, 𝑧 is called the dependent variable, and 𝑥 𝑎𝑛𝑑 𝑦 are the independent variables.
The partial derivative of 𝑧 with respect to 𝑥 measures the rate of change of 𝑧 with respect to 𝑥
while 𝑦 is held constant.
𝜕𝑧 𝜕𝑓
It is written as 𝜕𝑥 or 𝜕𝑥 , 𝑓𝑥 (𝑥, 𝑦) or 𝑧𝑥 and is defined as
Similarly, the partial derivative of 𝑧 with respect to 𝑦 measures the rate of change of z with
respect to y while x is kept constant
53
𝜕𝑧 𝜕(𝑥 3 )
• 𝑓𝑥 = 𝜕𝑥 = (5𝑦 4 ) = (5𝑦 4 ). 3𝑥 2 = 15𝑥 2 𝑦 4
𝜕𝑥
Similarly, when differentiating with respect to y, treat the x term as a constant by joining it with
the constant term. So, we can rewrite the function as 𝑧 = (5𝑥 3 )𝑦 4. Thus
𝜕𝑧 𝜕(𝑦 4 )
• 𝑓𝑦 = 𝜕𝑦 = (5𝑥 3 ) = (5𝑥 3 ). 4𝑦 3 = 20𝑥 3 𝑦 3
𝜕𝑦
Product Rule
• Given 𝑓(𝑥, 𝑦) = 𝑧 = 𝑔(𝑥, 𝑦). ℎ(𝑥, 𝑦)
𝜕𝑧 𝜕ℎ 𝜕𝑔
𝑓𝑥 = = 𝑔(𝑥, 𝑦) + ℎ(𝑥, 𝑦)
𝜕𝑥 𝜕𝑥 𝜕𝑥
𝜕𝑧 𝜕ℎ 𝜕𝑔
𝑓𝑦 = = 𝑔(𝑥, 𝑦) + ℎ(𝑥, 𝑦)
𝜕𝑦 𝜕𝑦 𝜕𝑦
Quotient Rule
• Given 𝑓(𝑥, 𝑦) = 𝑧 = 𝑔(𝑥, 𝑦)/ℎ(𝑥, 𝑦)
𝜕𝑔 𝜕ℎ
𝜕𝑧 (ℎ(𝑥, 𝑦) 𝜕𝑥 − 𝑔(𝑥, 𝑦) 𝜕𝑥 )
=
𝜕𝑥 [ℎ(𝑥, 𝑦)]2
𝜕𝑔 𝜕ℎ
𝜕𝑧 (ℎ(𝑥, 𝑦) 𝜕𝑦 − 𝑔(𝑥, 𝑦) 𝜕𝑦)
=
𝜕𝑦 [ℎ(𝑥, 𝑦)]2
54
𝜕 𝜕𝑧 𝜕2 𝑧
• 𝑓𝑥𝑥 = ( ) = 𝜕𝑥 2
𝜕𝑥 𝜕𝑥
𝜕 𝜕𝑧 𝜕2 𝑧
• 𝑓𝑦𝑦 = 𝜕𝑦 (𝜕𝑦) = 𝜕𝑦 2
𝜕 𝜕𝑧 𝜕2 𝑧
• 𝑓𝑥𝑦 = (𝑓𝑥 )𝑦 = 𝜕𝑦 (𝜕𝑥) = 𝜕𝑦𝜕𝑥
𝜕 𝜕𝑧 𝜕2 𝑧
• 𝑓𝑦𝑥 = (𝑓𝑦 )𝑥 = 𝜕𝑥 (𝜕𝑦) = 𝜕𝑥𝜕𝑦
Note:
• A cross-partial derivative measures the rate of change of a 1st order partial derivative with
respect to the other independent variable.
• By Young’s theorem, if both the cross partial derivatives are continuous, they will be identical
2. The 2nd-order direct partial derivatives, when evaluated at the critical point (𝑎, 𝑏),
• must both be positive to identify a minimum.
55
• That is: [𝑓𝑥𝑥 (𝑎, 𝑏), 𝑓𝑦𝑦 (𝑎, 𝑏)] > 0
• must both be negative to identify a maximum.
• That is: [𝑓𝑥𝑥 (𝑎, 𝑏), 𝑓𝑦𝑦 (𝑎, 𝑏)] < 0
3. The product of the 2nd-order direct partial derivatives when calculated at the critical point
must exceed the product of the cross-partials calculated at the same critical point.
Since 𝑓𝑥𝑦 = 𝑓𝑦𝑥 ,
• we have [𝑓𝑥𝑥 (𝑎, 𝑏). 𝑓𝑦𝑦 (𝑎, 𝑏)] > [𝑓𝑥𝑦 (𝑎, 𝑏)]2
2
• or [𝑓𝑥𝑥 (𝑎, 𝑏). 𝑓𝑦𝑦 (𝑎, 𝑏)] − [𝑓𝑥𝑦 (𝑎, 𝑏)] > 0
4𝑥 2 −9𝑥
2. lim
𝑥→3 𝑥+7
1
3. lim( 5𝑥 2 − 13)3
𝑥→3
Based on the rules that help in evaluating if a function is continuous or discontinuous at a particular
point, determine points of discontinuity, if any, for the following functions
4. 𝑓(𝑥) = 7𝑥 2 − 4𝑥 + 23
56
5. 𝑔(𝑥) = (5𝑥 5 − 8𝑥 3 + 26𝑥)/125
𝑥−7
6. ℎ(𝑥) = 𝑥
By assuming 𝑢(𝑥) 𝑎𝑛𝑑 𝑣(𝑥) as functions of the independent variable 𝑥, write down the rules of
𝑑𝑦
differentiation for the following in terms of 𝑑𝑥
13. 𝑦 = 25 − 6𝑥
14. 𝑦 = 9𝑥 4
15. 𝑦 = 𝑥 3 (5𝑥 + 4)
18. 𝑦 = (𝑥 2 − 7𝑥 + 4)3
57
1
19. 𝑦 = 4𝑥 3 +8𝑥+5
20. 𝑦 = √16 − 𝑥 2
58
References
1. Calculus for Business, Economics, And the Social Sciences, Schaum’s Outlines, Edward T.
Dowling, McGraw-Hill 1990.
2. Some graphs are drawn courtesy GeoGebra https://www.geogebra.org
3. https://www.intmath.com/functions-and-graphs/2a-domain-and-range.php
59
MODULE 4 LINEAR ALGEBRA
Learning Objectives
Understand about vectors and vector algebra.
Understand about matrices, various types of matrices and matrix algebra.
Understand vector spaces and its features including linear independence, basis functions,
orthogonality, orthogonal bases
Understand determinants, their properties, and operations.
Solve a system of simultaneous equations using Cramer’s rule.
Solve a system of simultaneous linear equations 𝐴𝑥 = 𝑏 using Gaussian Elimination and
using matrix formulation (that is using Echelon matrices and Row Canonical forms).
Obtain Inverse of a matrix using Echelon matrices and Row Canonical forms.
Solve eigenvalue equations of the form 𝐴𝑥 = 𝜆𝑥.
Understand about Singular Value Decomposition
Obtain the singular values and eigenvectors of non-square matrices
Obtain an overview of tensors
Linear Algebra has some of the most important mathematical concepts that are used in data
science. It uses concepts of vectors, matrices, tensors and in general vector spaces. At the most
60
general level, there are 5 distinct types of problems that Linear Algebra attempts to solve. These
are the following
1. 𝐴𝑥 = 𝑏
2. 𝐴𝑥 = 𝜆𝑥
3. 𝐴𝑣 = 𝜎𝑢
4. 𝑀𝑖𝑛𝑖𝑚𝑖𝑧𝑒 ||𝐴𝑥||2 /||𝑥||2
5. 𝐹𝑎𝑐𝑡𝑜𝑟𝑖𝑧𝑒 𝑡ℎ𝑒 𝑚𝑎𝑡𝑟𝑖𝑥 𝐴
Out of these 5 problems, we will be considering the first three problems. The first problem of 𝐴𝑥 =
𝑏, deals with a system of simultaneous linear equations. The second problem 𝐴𝑥 = 𝜆𝑥, is also
referred to as the eigenvalue problem. The third problem 𝐴𝑣 = 𝜆𝑢 is also referred to as the Singular
Value Decomposition problem or SVD in short. To solve these problems, one needs to understand
the concepts of vectors, matrices, and their operations. Vectors and matrices are elements of
something called as a Vector Space.
𝑟 = |𝑟⃑| = √𝑥 2 + 𝑦 2 + 𝑧 2
61
We say that this vector exists in ℝ3 as a 3-tuple (x, y, z). The most general definition of a vector
is that it is a set of n-tuples. As an example, let’s consider the weights of 6 students, in kilograms:
55, 62, 44, 50, 57, 41. We can assign a symbol 𝑥1 to the first element 55, 𝑥2 to the second element
62 and so on. Each of these 𝑥𝑖 is termed as a tuple. We combine these tuples into a linear array or
as a set of numbers, and write the same as
𝑥 = (𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 , 𝑥5 , 𝑥6 )
Such an array is called as a vector. In the above case, we say that this is a vector in ℝ6 since we
have 6 elements, and each element is a real number. A vector in ℝ𝑛 will contain n elements.
Row vector and column vector
The vector 𝑥 = (𝑥1 , 𝑥2 , 𝑥3 … 𝑥𝑛−2 , 𝑥𝑛−1 , 𝑥𝑛 ) is called as a row vector since all elements are
arranged in one row. An alternate way to write a vector is in a column form, called as a column
vector. 𝑥 𝑇 is called as the transpose of the vector 𝑥 and is written in the column form as follows:
𝑥1
𝑥2
𝑇
𝑥 = 𝑥3
𝑥𝑛−1
( 𝑥𝑛 )
4.1.1 Vector Algebra
Now that we have defined a vector either as a set arranged in a row or as a column, let us define
its algebra or the various allowed operations between two or more vectors. Consider two vectors
𝑢 = (𝑢1 , 𝑢2 , 𝑢3 … 𝑢𝑛−1 , 𝑢𝑛 ) and 𝑣 = (𝑣1 , 𝑣2 , 𝑣3 … 𝑣𝑛−1 , 𝑣𝑛 ) in ℝ𝑛 . Then the following properties
hold good:
62
Negative of a vector: −𝑢 = (−1)𝑢 and 𝑢 − 𝑣 = 𝑢 + (−𝑣)
Linear combination of m vectors: Let’s suppose we have been given vectors 𝑢1 , 𝑢2 , 𝑢3 ... 𝑢𝑚 in
ℝ𝑚 and scalars 𝑐1 , 𝑐2 , 𝑐3 … 𝑐𝑚 in ℝ. The linear combination 𝑣 of these m vectors is given by
𝑣 = (𝑐1 𝑢1 + 𝑐2 𝑢2 + ⋯ + 𝑐𝑚 𝑢𝑚)
Dot (or inner) product of two vectors: Let us consider two vectors 𝑢 = (𝑢1 , 𝑢2 , 𝑢3 … 𝑢𝑛−1 , 𝑢𝑛 )
and 𝑣 = (𝑣1 , 𝑣2 , 𝑣3 … 𝑣𝑛−1 , 𝑣𝑛 ) in ℝ𝑛 . Then their dot product or inner product is defined as
𝑢. 𝑣 = 𝑢1 𝑢1 + 𝑢2 𝑣2 + 𝑢3 𝑣3 … + 𝑢𝑛 𝑣𝑛
Norm (Length) of a vector: The norm or length of a vector 𝑣 in ℝ𝑛 is denoted by ||𝑣|| is defined
as the positive square root of the dot product of the vector with itself. That is
1 𝑣
The unit vector 𝑣̂ = ||𝑣|| 𝑣 = ||𝑣||.
63
𝑣⃗ + ⃗0⃗ = 𝑣⃗ Existence of the zero vector ⃗0⃗
𝑢
⃗⃗ + 𝑣⃗ = 𝑣⃗ + 𝑢
⃗⃗ (Commutativity)
(𝑎 + 𝑏)𝑣⃗ = 𝑎𝑣⃗ + 𝑏𝑣⃗ (Scalar distributivity)
𝑎(𝑏𝑣⃗) = (𝑎𝑏)𝑣⃗ (Multiplicative associativity)
1𝑣⃗ = 𝑣⃗ (Multiplicative identity).
𝑎(𝑣⃗ + 𝑤
⃗⃗⃗) = 𝑎𝑣⃗ + 𝑎𝑤
⃗⃗⃗ (Vector distributivity)
Some real-life examples of vectors are given below. It is to be noted that all of these vectors
constitute vector spaces and satisfy the two fundamental axioms of vector spaces , namely vector
addition and scalar multiplication.
• Geometric vectors: This example of a vector may be familiar from high school mathematics
and physics. Geometric vectors are directed segments, which can be drawn (at least in two
dimensions).
64
• Polynomials are also vectors
• Two polynomials can be added together, which results in another polynomial.
• They can be multiplied by a scalar λ ∈ ℝ, and the result is a polynomial as well.
• Therefore, polynomials are (rather unusual) instances of vectors.
• Note that polynomials are very different from geometric vectors.
• While geometric vectors are concrete “drawings”, polynomials are abstract concepts.
• However, they are both vectors in the sense previously described.
Subspaces: As the name suggests, a vector subspace is a vector space which is a subset of a larger
vector space. For example, if the vector space is defined in ℝ3 a subspace can be either in ℝ2 or
ℝ1 .
65
Linear Independence, Basis, Dimension
Let us consider that 𝑉 is a vector space in the field ℝ𝑛 . We now examine the concept of linear
dependence and independence.
Let 𝑣1 , 𝑣2 , 𝑣3 … 𝑣𝑛 be vectors in the vector space 𝑉 and 𝑎1 , 𝑎2 , 𝑎3 , … 𝑎𝑛 are scalars in ℝ and not all
of them are zero. Then the vectors are said to be linearly dependent, if
𝑎1 𝑣1 + 𝑎2 𝑣2 + ⋯ + 𝑎𝑛 𝑣𝑛 = 0
Examples of bases
A vector space in ℝ𝑛 : Let us consider the following n vectors in ℝ𝑛 :
These vectors ae linearly independent since the dot product of any two vectors is 0.
66
1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0
[ ],[ ],[ ],[ ],[ ],[ ]
0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1
These form a basis of 2 × 3 matrices such that any 2 × 3 matrix can be expressed in terms of a
linear combination of the above 6 matrices.
𝑣𝑖 . 𝑣𝑗 = 𝛿𝑖𝑗
0, 𝑖𝑓 𝑖 ≠ 𝑗
where, 𝛿𝑖𝑗 = {
1, 𝑖𝑓 𝑖 = 𝑗
The above matrix has m rows and n columns. Thus, the mathematical representation of such a
matrix is 𝐴𝑚𝑛 . Alternatively, we say that 𝐴 is a matrix in ℝ𝑚×𝑛 .
67
Matrices play an important and central role in linear algebra. They can be used to represent systems
of simultaneous linear equations in a compact form such as given below:
This is the first of the 5 general problems of linear algebra. The above system of simultaneous
linear equations can be written in an even more compact form using the concept of vectors and
matrices as follows:
𝐴𝑥 = 𝑏
In the above equation, 𝐴 is a matrix, 𝑥 and 𝑏 are vectors. Specifically, with respect to the system
of simultaneous linear equations,
Each row and each column of the matrix 𝐴 can be considered as a vector.
A m x n matrix can be represented compactly as 𝐴 = [𝑎𝑖𝑗 ] where, 𝑎𝑖𝑗 is 𝑖𝑗𝑡ℎ element of the matrix.
Further, 𝑖, 𝑗 are indices of the matrix and can take values from 𝑖 = 1, 2, … . 𝑚 𝑎𝑛𝑑 𝑗 = 1, 2, … 𝑛.
Two matrices can be added only if both have the same dimensions. Two matrices of dimensions
m x n can be added elementwise. That is,
If 𝐴 = [𝑎𝑖𝑗 ] and 𝐵 = [𝑏𝑖𝑗 ] are two matrices with the same dimensions, m x n, then
68
𝐴 + 𝐵 = [𝑎𝑖𝑗 + 𝑏𝑖𝑗 ]
𝑐. 𝐴 = [𝑐. 𝑎𝑖𝑗 ]
Note that 𝐴 + 𝐵 and 𝑐𝐴 have the same dimensions as the original matrices.
−𝐴 = (−1)𝐴
𝐴 − 𝐵 = 𝐴 + (−𝐵)
Matrix Multiplication: The product of two matrices is not straightforward. There are certain
conditions to be met before we can calculate the product of two matrices.
1. If 𝐴 and B, are matrices of dimension m x n, they cannot be multiplied.
2. For to matrices to be multiplied, the dimension of the column of the first matrix should be
equal to the dimension of the row of the second matrix. Specifically, to be able to multiply
two matrices, 𝐴 and 𝐵, if 𝐴 has dimensions m x p, then 𝐵 should necessarily have the
dimension of its rows as p. The column dimension can be anything, say n
69
3. Therefore, for a matrix 𝐴 with dimension m x p and 𝐵 with dimension p x n, the product
of these two matrices 𝐶 = 𝐴 × 𝐵 has dimension m x n.
4. If the dimensions of the rows and columns of a matrix are the same, that is n x n, it is called
a square matrix. Two square matrices having the same dimensions can be multiplied and
the dimensions of the product matrix also is the same as the individual matrices.
Transpose of a matrix: The transpose of a matrix 𝐴 is written as 𝐴𝑇 is the matrix that is obtained
by transposing the columns of 𝐴 to rows of 𝐴𝑇 . Therefore, if 𝐴 is a matrix with dimensions m x
n, the transpose matrix 𝐴𝑇 will have dimensions n x m. It should be noted that in case the matrix
has just one row and n columns (a row vector), the transpose will have n rows and 1 column
(column vector).
1 3
1 2 3 𝑇
For example, if 𝐴 = ( ) is a 2 x 3 matrix, then 𝐴 = (2 4) will be a 3 x 2 matrix.
3 4 5
3 5
1
If 𝑣 is a vector, 𝑣 = (1 2) then 𝑣 𝑇 = ( )
2
Square matrices: As mentioned earlier, a square matrix is one in which the number of rows and
columns are the same. It should be noted that matrix addition and multiplication of two square
matrices with the same dimensions is always possible.
70
𝐴𝐵 = 𝐵𝐴 = 𝐼, where 𝐼 is the identity matrix
Such a matrix will always be a unique matrix. We call the matrix 𝐵 as an inverse of 𝐴 and it is
represented by 𝐴−1
𝑎1 𝑥1 + 𝑎2 𝑥2 + 𝑎3 𝑥3 … + 𝑎𝑛 𝑥𝑛 = 𝑏
Here, 𝑎1 , 𝑎2 , ... 𝑎𝑛 are constants. The constant 𝑎𝑖 is usually referred to as the coefficient of the
unknown variable 𝑥𝑖 . If we solve this equation and obtain the values of the unknowns, the solution
will be a list of values or a vector which as we discussed above is a list of n tuples. Let us assume
that the solution for {𝑥𝑖 } is the vector 𝑣 = (𝑘1 𝑘2 𝑘3 … 𝑘𝑛 ). Then we can substitute these values
instead of the variables 𝑥𝑖 in the equation above to obtain a solution
𝑎1 𝑘1 + 𝑎2 𝑘2 + 𝑎3 𝑘 … + 𝑎𝑛 𝑘 = 𝑏
Here, 𝑎𝑖𝑗 is the coefficient of 𝑥𝑗 in the equation 𝐿𝑖 . Both 𝑎𝑖𝑗 and 𝑏𝑖 are constants.
As mentioned in the introduction section of matrices, we can use matrix operations to solve a set
of simultaneous linear equations. We need to introduce some additional concepts before we can
approach the steps for the solutions to the above problem.
71
4.4.1 Augmented Matrix and Coefficient matrix of a system
Considering the above system of equations, we can convert it into two matrices
Augmented matrix:
𝑎11 𝑎12 … 𝑎1𝑛 𝑏1
𝑎21 𝑎22 … 𝑎2𝑛 𝑏2
𝑀=( )
… … … …
𝑎𝑚1 𝑎𝑚2 … 𝑎𝑚𝑛 𝑏𝑛
Let us consider a system of m linear equations 𝐿1 , 𝐿2 , ... 𝐿𝑛 . The operations are the following
72
7𝑥3 − 𝑥4 = 3
2𝑥4 = 8
The first unknown 𝑥1 is the leading unknown in the first equation, 𝑥2 in the second, and so on.
These are called as pivot variables. As can be seen from this set of equations, the last equation
gives the solution for the 4th variable. Consequently, by substituting this value of 𝑥4 (which is 4)
into the previous equations, we can get a unique solution. The vector for this system of equations
is 𝑥 = (3, −2, 1, 4).
Echelon Form: Let us consider the following system of simultaneous linear equations:
This is said to be in echelon form. The number of equations is less than the number of variables.
There are the following options of solutions for echelon forms:
Let us consider a system of linear equations in echelon form with r equations and n unknowns.
There are 2 cases to evaluate
Case 1: If r = n, that is the number of equations is equal to the number of variables, then we have
a unique solution.
Case 2: If r < n, that is the number of equations is less than the number of variables, then we can
assign values to the remaining n – r free variables and get solutions for the r pivot variables. These
free variables are called as parameterized variables since the free variables can take any value.
Therefore, we end up with an infinite number of solutions through the parameterized variables.
Gaussian Elimination Algorithm
Using the above 3 elementary operations, and the definitions of the triangular and echelon forms,
we define the Gaussian elimination algorithm as follows. The algorithm consists of two parts.
Part A: This is the forward elimination and consists of a step wise reduction of the system of
linear equations. At the end, we get either a degenerate equation with no solution or an equivalent
simpler system in triangular or echelon form.
73
Part B: This is the backward elimination part, and this is a step wise back substitution to find the
solution of the simpler system.
Gaussian Elimination example:
Let us consider the following system of linear equations to illustrate the Gaussian elimination
algorithm for solving for the variables.
𝐿1 : 𝑥 − 3𝑦 − 2𝑧 = 6
𝐿2: 2𝑥 − 4𝑦 − 3𝑧 = 8
𝐿3: − 3𝑥 + 6𝑦 + 8𝑧 = −5
Part A: Let us leverage the coefficient 1 of 𝑥 in the first equation 𝐿1 as the pivot to eliminate 𝑥
from the second equation 𝐿2 and from the third equation 𝐿3 . We use the elementary operations to
accomplish this as follows:
𝐿1 : 𝑥 − 3𝑦 − 2𝑧 = 6
𝐿2: 2𝑦 + 𝑧 = −4
𝐿3: − 3𝑦 + 2𝑧 = 13
As a next step we further simplify these equations using the elementary operations and end up in
a triangular form of the equations. We use the coefficient 2 of the variable y in 𝐿2 to eliminate y
from the equations.
The operations are the following: Multiply 𝐿2 by 3/2 and add it to 𝐿3 . So essentially, we replace
𝐿2 with (3/2) 𝐿2 + 𝐿3 . The revised system is the following
𝐿1 : 𝑥 − 3𝑦 − 2𝑧 = 6
𝐿2: 2𝑦 + 𝑧 = −4
𝐿3: 7𝑧 = 14
74
This is the triangular form of the equations, and we thus have a unique solution.
Part B: To obtain the values of the variables, we now proceed in the backward direction.
Echelon Matrices: A matrix is in Echelon form if the following conditions are satisfied:
1. All the zero rows, if there are any are at the bottom of the matrix
2. Each leading nonzero entry in a row is to the right of the leading nonzero entry in the
preceding row.
Row canonical form: A matrix is in a row canonical form or Row Reduced Echelon Form (RREF)
if it is an echelon matrix, and it satisfies the following additional properties
1. Each pivot (leading nonzero entry) is equal to 1.
2. Each pivot is the only nonzero entry in its column.
To convert a matrix into the Echelon matrix and the Row canonical forms, we use the same
elementary operations as we used earlier.
75
Application to systems of linear equations
We now use the Echelon Matrix and Row canonical form to solve a system of linear equations
𝑥 + 2𝑦 + 𝑧 = 3
2𝑥 + 5𝑦 − 𝑧 = −4
3𝑥 − 2𝑦 − 𝑧 = 5
Solution:
1. Reduce the augmented matrix M to echelon form and then to row canonical form as
follows:
1 2 1 3 1 2 1 3 1 2 1 3
𝑀 = (2 5 −1 −4) ~ (0 1 −3 −10) ~ (0 1 −3 −10)
3 −2 −1 5 0 −8 −4 −4 0 0 −28 −84
1 2 1 3 1 2 1 3 1 0 0 2
(0 1 −3 −10) ~ (0 1 0 −1) ~ (0 1 0 −1)
0 0 1 3 0 0 1 3 0 0 1 3
Thus, the system has the unique solution 𝑥 = 2, 𝑦 = −1, 𝑧 = 3 or equivalently the vector u = (2, -
1, 3)
1 0 2
Example: Find the inverse of the matrix 𝐴 = (2 −1 3)
4 1 8
Solution: First form the block matrix M = [A, I] and row reduce M to an echelon form
76
1 0 2 1 0 0 1 0 2 1 0 0
𝑀 = (2 −1 3 0 1 0) ~ (0 −1 −1 −2 1 0)
4 1 8 0 0 1 0 1 0 −4 0 1
1 0 2 1 0 0
~ (0 −1 −1 −2 1 0)
0 0 −1 −6 1 1
In echelon form, the left half of the matrix is in triangular form; therefore, A has an inverse. Next,
we further row reduce M to its row canonical form:
1 0 0 −11 2 2 1 0 0 −11 2 2
𝑀~ (0 −1 0 4 0 −1) ~ (0 1 0 −4 0 1)
0 0 1 6 −1 −1 0 0 1 6 −1 −1
Therefore, the inverse is the right part of the row reduced matrix that is
−11 2 2
−1
𝐴 = ( −4 0 1)
6 −1 −1
4.5 Determinants
Determinant of order n is defined to be a n x n square array of numbers (or functions) with the
array written within vertical bars, as follows:
𝑎11 𝑎12 … 𝑎1𝑛
𝑎21 𝑎22 … 𝑎2𝑛
𝐷𝑛 = | … … … …|
𝑎𝑛1 𝑎𝑛2 … 𝑎𝑛𝑛
The determinant 𝐷 has a value that is obtained by forming all n! products through the following
examples
Determinants of order 2 expand to 2! that is 2 terms:
𝑎11 𝑎12
𝐷2 = |𝑎 | = 𝑎11 𝑎22 − 𝑎12 𝑎21
21 𝑎22
77
Determinants of order 3 expand to 3! That is 6 terms:
Properties of Determinants
Transposing (rows to columns) does not alter a determinant’s value
Interchanging two rows (or two columns) changes the sign of the value of a determinant
Multiplication of all members of a single column (or row) by a constant k causes the value of
the determinant to be multiplied by k
If the elements of a column (or row) are sums of two quantities, the determinant can be
decomposed into a sum of two determinants
Any determinant with two rows (or two columns) equal (or proportional), has the value zero
The value of a determinant is unchanged if a multiple of one row is added (column by column)
to another row or if a multiple of one column is added (row by row) to another column.
If each element of a row (or column) is zero, the determinant has the value zero.
𝐴𝑥 = 𝜆𝑥
In the above equation 𝐴 is a square matrix, 𝑥 is a vector and 𝜆 is a number. In the cases we will
consider, 𝜆 is a real number. In general, 𝜆 is a complex number. In the terminology of the
eigenvalue equation, 𝜆 is called as an eigenvalue and 𝑥 is an eigenvector that corresponds to the
specific eigenvalue.
The mathematical process to solve and eigenvalue equation has the following steps:
Finding the eigenvalues
78
Form the equation (𝐴 − 𝜆𝐼)𝑥 = 0
We now evaluate the characteristic equation det(𝐴 − 𝜆𝐼) = 0
The solutions to the characteristic equation are the eigenvalues.
Examples:
1 1
1. Find the eigenvalues and eigenvectors of the matrix 𝐴 = ( )
4 1
Solution: Since 𝐴 is a 2 x 2 matrix, we expect that it will have 2 eigenvalues and their
corresponding eigenvectors.
1−𝜆 1
Thus: 𝑑𝑒𝑡 ( )=0
4 1−𝜆
79
−2 1
Now we do a row operation 2𝑅1 + 𝑅2 and get the revised matrix as ( )
0 0
Inserting this matrix into the eigenvalue equation, we get
−2 1 𝑥1 0
( ) (𝑥 ) = ( )
0 0 2 0
We thus get −2𝑥1 + 𝑥2 = 0 and this implies 𝑥2 = 2𝑥1
We may now assume that 𝑥1 = 1 then 𝑥2 = 2
1
Thus, the first eigenvector is ( )
2
Similarly, when we go through the above process for 𝜆2 = −1 and solve for the
1
eigenvector, we get the second eigenvector ( )
−2
Example 2: Find the eigenvalues and eigen vectors for the following matrix:
1 0 0
𝐴 = (3 −2 0)
2 3 4
Solution:
We first find out the eigenvalues by solving the characteristic equation.
The characteristic equation is the following:
1−𝜆 0 0
| 3 −2 − 𝜆 0 |=0
2 3 4−𝜆
1−1 0 0 𝑥1
Substituting 𝜆1 = 1, in the matrix, ( 3 −2 − 1 𝑥
0 ) ( 2) = 0
2 3 4 − 1 𝑥3
80
R3: 2𝑥1 + 3𝑥2 + 3𝑥3 = 0
R2 gives 𝑥1 = 𝑥2
5
Substituting for 𝑥2 in R3, gives 𝑥3 = − 3 𝑥1
1
Choosing 𝑥1 = 1, we get the eigenvector 𝑥⃑ = ( 15)
−3
0
Similarly, working out for 𝜆2 = −2, the eigenvector is 𝑥⃑ = (−2)
1
For 𝜆3 = 4, we get the matrix equations to be the following:
−3 0 0 𝑥1
( 3 −6 0) (𝑥2 ) = 0
2 3 0 𝑥3
From R2: 3𝑥1 − 6𝑥2 = 0 , which means after substituting for 𝑥1 = 0, we get 𝑥2 = 0
Since, we get no information about 𝑥3 , we treat it as a free variable and can assign any value.
The simplest value that can be assigned is 𝑥3 = 1
0
Thus, the eigenvector is 𝑥⃑ = (0)
1
Example 3:
0 1 0
Find the eigenvalues and eigenvectors for 𝐻 = (1 0 0)
0 0 2
Solution:
Solving for the characteristic equation, we get eigenvalues as 𝜆 = 2, +1 and -1
81
The eigenvectors are
0 1 1
𝑥1 = (0) , ⃗⃗⃗⃗⃑
⃗⃗⃗⃑ 𝑥2 = (1) , ⃗⃗⃗⃗⃑
𝑥3 = (−1)
1 0 0
Some interesting insights can be drawn from this example.
We can now introduce two types of matrices, whose definitions depend on understanding
eigenvalues and eigenvectors.
82
𝐴 = 𝑈 Σ 𝑉𝑇
𝑈 𝑇 𝑈 = 𝐼, and 𝑉 𝑇 𝑉 = 𝐼
I, being the identity matrix
Σ is a diagonal matrix that is it has non-zero elements only along its main diagonal
o The entries of the diagonal are
denoted by 𝜎𝑖 ,
are positive,
are sorted in decreasing order
o These are called as singular values (a generalization to eigenvalues)
Finding 𝑼, 𝚺 𝐚𝐧𝐝 𝑽𝑻
Given a general matrix 𝐴 with dimensions 𝑚 × 𝑛 we now outline the process to determine
𝑈, Σ and 𝑉 𝑇
1. We first compute 𝐴𝑇 𝐴
2. Next, we determine the eigenvalues of 𝐴𝑇 𝐴 namely 𝜆𝑖
3. By definition, 𝜎𝑖 = √𝜆𝑖
4. The eigenvectors corresponding to the 𝜆𝑖 form the 𝑉 matrix, namely, 𝑣1 , 𝑣2 , 𝑣3 , … 𝑣𝑛
5. To determine the 𝑈 matrix, we apply the following formula
a. For a special case of 3 eigenvalues 𝜆𝑖
1 1 𝑁𝑆(𝐴𝑇 )
𝑈 = [𝜎 𝐴𝑣1 𝐴𝑣2 |𝐴𝑇 |
]
1 𝜎2
83
1 1 𝑁𝑆(𝐴𝑇 )
Note that in the above expression: 𝑢1 = 𝜎 𝐴𝑣1 , 𝑢2 = 𝜎 𝐴𝑣2 and 𝑢3 = |𝐴𝑇 |
1 2
Null space of a matrix 𝐴 is defined as the set of all n-dimensional column vectors 𝑣, such that
𝐴𝑣 = 0
1 1
Example: Find 𝑈, Σ and 𝑉 𝑇 for the following matrix 𝐴 = (1 0)
0 1
Solution:
We first compute the matrix 𝐴𝑇 𝐴
2 1
𝐴𝑇 𝐴 = ( )
1 2
84
𝑢̂.
1 𝑒̂3 𝑢̂.
2 𝑒̂3
𝑢3 = [ .𝑢
̂1 + .𝑢
̂]
𝑢̂
1. 𝑢
̂1 𝑢̂ ̂2 2
2. 𝑢
−1
1
After the calculations, we get 𝑢3 = 3 ( 1 )
1
Thus, the factorization of the matrix 𝐴 is the following
2 1
0 −
√6 3 1 1
1 1 1 √3 1
𝐴= − . ( 0 1) . √2 √2
√6 √2 3 1 1
0 0 −
1 1 1
− ( √2 √2)
(√6 √2 3 )
85
Tensor Operations
The operations on tensors are a general case of matrix and vector operations. Some of the
operations are listed below:
Tensor addition
Tensor subtraction
Tensor Hadamard product
Tensor division
Tensor product
86
Problems and Questions
Determine the real dimensional space in which the following are vectors. That is determine if the
vector is in ℝ𝑛 where n is an integer.
1. 𝑣 = (2, −5)
2. 𝑣 = (0, 0, 0)
3. 𝑣 = (3, 4, 5, 6)
4. Let us assume that 𝑢 = (1, −2, 3), 𝑣 = (4, 5, −1), 𝑤 = (2, 7, 4). Find out the inner
products of the following vectors: (𝑢. 𝑣), (𝑢. 𝑤) and (𝑣. 𝑤)
5. Let 𝑢 = (1, −2, −4, 5, 3). Find the norm of the vector 𝑢 that is ||𝑢||.
1 3
6. Find the product AB and BA of the following two matrices 𝐴 = ( ) and 𝐵 =
2 −1
2 0 −4
( )
5 −2 6
1 2 3
8. 𝐵 = (2 4 5)
3 5 6
2
9. 𝐷 = (−4)
6
87
5 −7 1
11. 𝐴 = (−7 8 2)
1 2 −4
0 4 −3
12. 𝐵 = (−4 0 5)
3 −5 0
0 0 0
13. 𝐴 = ( )
0 0 0
15. 𝑥 + 𝜋𝑦 + 𝑒𝑧 = 𝑙𝑜𝑔5
16. 3𝑥 + 𝑘𝑦 − 8𝑧 = 16
18. 2𝑥 − 3𝑦 = 8, −6𝑥 + 9𝑦 = 6
20. Let us consider the following vectors. 𝑢 = (1, 1, 0), 𝑣 = (1, 3, 2), 𝑤 = (4, 9, 5). Determine
if the vectors are linearly independent.
3 1
21. Consider the matrix 𝐴 = ( ). Find the eigenvalues and eigenvectors of this matrix.
2 2
4 2
22. Find the eigenvalues and eigenvectors of the matrix 𝐴 = ( )
3 −1
88
References
1. Schaum’s Outline of Theory and Problems of Linear Algebra, Seymour Lipschutz, Third
Edition, McGraw-Hill Education (India) Edition 2005.
2. Linear transformation and matrices: https://bit.ly/2RND0oQ
3. Finding Eigenvalues and Eigenvectors, https://www.youtube.com/watch?v=TQvxWaQnrqI
4. Tensor Fundamentals various definitions: https://www.youtube.com/watch?v=TvxmkZmBa-
k
5. Tensor fundamentals: https://towardsdatascience.com/quick-ml-concepts-tensors-
eb1330d7760f
6. Tensor Operations: https://machinelearningmastery.com/introduction-to-tensors-for-machine-
learning/
89
MODULE 5 COMPUTATIONAL COMPLEXITY THEORY
Learning Objectives
The learning objectives of computational complexity theory are the following. After studying this
unit, you will be able to:
Understand about computational complexity.
Overview of theoretical machine models such as the Turing machine
A broad understanding of the big O notation
90
and check the remainder), but finding that prime factor is hard (NP). Decades of searching has
not yielded a fast solution to integer factoring, and many computer scientists suspect that it is not
possible to solve this problem in polynomial time (that is, it is not in P). The current research about
whether hard NP problems have fast solutions is well-known even outside the field of computer
science and is popularly known as the P vs NP.
Resources
(time,
space,
energy..)
Computationa
l Complexity
Underlying
Type of computational
problem model (det.
(decision, Turing m/c,
optimization, prob. Turing
m/c, Quantum
..) Computer…)
91
other two terms are insignificant in comparison. Essential behavior of the algorithm is summed up
by saying that the number of operations required scales like n
The O(f(n)) notation is used to set upper bounds on the behavior of a function.
The O(f(n)) notation is particularly useful for studying the worst-case behavior of specific
algorithms
Examples:
𝑁 3 + 5𝑁 2 − 3𝑁 + 7
92
Problems and Questions
1. Explain the meaning of computational complexity
2. Explain in detail the three perspectives of computational complexity
3. What is the big O notation and why is it important in defining the complexity of algorithms?
4. Differentiate between the polynomial class and non-polynomial class of problem
complexities.
93
References
1. Quantum Computation and Quantum Information, Michael A. Neilsen and Issac L. Chuang
2. https://towardsdatascience.com/complexity-theory-101-introduction-to-big-o-bab99152ad44
94
MODULE 6 NUMERICAL ANALYSIS
Learning Objectives
The learning objectives of numerical analysis are the following. After studying this unit, you will
be able to:
Understand about basics of numerical analysis
Meaning of computational errors
An overview of numerical stability and well-posed problems
Overview of interpolation, extrapolation and regression
95
6.2 Generation and propagation of computational errors
The study of errors forms an important part of numerical analysis.
There are several ways in which error can be introduced in the solution of the problem.
• Round off errors: Round-off errors arise because it is impossible to represent all real
numbers exactly on a machine with finite memory (which is what all practical digital
computers are).
• Truncation and discretization errors: Truncation errors are committed when an iterative
method is terminated or a mathematical procedure is approximated, and the approximate
solution differs from the exact solution.
• Similarly, discretization induces a discretization error because the solution of the discrete
problem does not coincide with the solution of the continuous problem.
• For instance, in the iteration in the sidebar to compute the solution of 3𝑥 3 + 4 = 28, after
10 or so iterations, it can be concluded that the root is roughly 1.99 (for example).
Therefore, there is a truncation error of 0.01.
• Once an error is generated, it will generally propagate through the calculation.
• For instance, already noted is that the operation ‘+′ on a calculator (or a computer) is
inexact.
• Therefore, a calculation 𝑎 + 𝑏 + 𝑐 + 𝑑 + 𝑒 will be increasingly incorrect since the
truncation error is carried forward.
𝑥
𝑓(𝑥) = 𝑥(√𝑥 + 1 − √𝑥) and 𝑔(𝑥) =
√𝑥+1+√𝑥
96
By comparing the above two results, it is clear that loss of significance (caused here
by catastrophic cancellation from subtracting approximations to the nearby numbers √501 and
√500) despite the subtraction's being computed exactly) has a huge effect on the results, even
though both functions are equivalent.
The desired value, computed using infinite precision, is 11.174755 …
6.4 Interpolation, Extrapolation, Regression
Interpolation solves the following problem: given the value of some unknown function at
several points, what value does that function have at some other point between the given
points?
Extrapolation is very similar to interpolation, except that now the value of the unknown
function at a point which is outside the given points must be found.
Regression is also similar, but it considers that the data is imprecise.
Given some points, and a measurement of the value of some function at these points (with
an error), the unknown function can be found.
The least squares-method is one way to achieve this.
97
Problems and Questions
1. Explain why numerical methods are important in solving computational problems.
2. Explain the difference between interpolation and extrapolation.
3. Explain how regression is different from extrapolation
References
1. https://en.wikipedia.org/wiki/Numerical_analysis
98