Matrix
INTRODUCTION
"Matrix theory" redirects here. For the physics topic, see Matrix string theory.
In mathematics, a matrix (plural matrices) is a rectangular array of numbers, symbols, or
expressions, arranged in rows and columns, that is treated in certain prescribed ways. One
such way is to state the order of the matrix. For example, a matrix with two rows and three
columns has order 2 × 3. The individual items in a matrix are called its elements or entries.
Each element of a matrix is often denoted by a variable with two subscripts. For
instance, a2,1 represents the element at the second row and first column of a matrix A.
Provided that they are the same size (have the same number of rows and the same
number of columns), two matrices can be added or subtracted element by element. The rule
for matrix multiplication, however, is that two matrices can be multiplied only when the
number of columns in the first equals the number of rows in the second. A major application
of matrices is to represent linear transformations, that is, generalizations of linear functions
such as f(x) = 4x. For example, the rotation of vectors in three-dimensional space is a linear
transformation which can be represented by a rotation matrix R: if v is a column vector (a
matrix with only one column) describing the position of a point in space, the product Rv is a
column vector describing the position of that point after a rotation.
The product of two transformation matrices is a matrix that represents the
composition of two linear transformations. Another application of matrices is in the solution
of systems of linear equations. If the matrix is square, it is possible to deduce some of its
properties by computing its determinant. For example, a square matrix has an inverse if and
only if its determinant is not zero. Insight into the geometry of a linear transformation is
obtainable (along with other information) from the matrix's eigenvalues and eigenvectors.
DEFINITION
Size
The size of a matrix is defined by the number of rows and columns that it contains. A matrix
with m rows and n columns is called an m × n matrix or m-by-n matrix, while m and n are
called its dimensions. For example, the matrix A above is a 3 × 2 matrix.
Matrices which have a single row are called row vectors, and those which have a single
column are called column vectors. A matrix which has the same number of rows and columns
is called a square matrix. A matrix with an infinite number of rows or columns (or both) is
called an infinite matrix. In some contexts, such as computer algebra programs, it is useful to
consider a matrix with no rows or no columns, called an empty matrix.
Row vector (size 1 × n): a matrix with one row, sometimes used to represent a vector.
Notation
Matrices are commonly written in box brackets; an alternative notation uses large
parentheses instead of box brackets.
The specifics of symbolic matrix notation vary widely, with some prevailing trends.
Matrices are usually symbolized using upper-case letters (such as A in the examples above),
while the corresponding lower-case letters, with two subscript indices (e.g., a11, or a1,1),
represent the entries. In addition to using upper-case letters to symbolize matrices, many
authors use a special typographical style, commonly boldface upright (non-italic), to further
distinguish matrices from other mathematical objects. An alternative notation involves the use
of a double underline with the variable name, with or without boldface style (e.g., A written
with a double underline).
The entry in the i-th row and j-th column of a matrix A is sometimes referred to as the i,j,
(i,j), or (i,j)th entry of the matrix, and most commonly denoted as ai,j, or aij. Alternative
notations for that entry are A[i,j] or Ai,j. For example, the (1,3) entry of the following matrix
A is 5 (also denoted a13, a1,3, A[1,3] or A1,3):
Sometimes, the entries of a matrix can be defined by a formula such as ai,j = f(i, j). For
example, each of the entries of the following matrix A is determined by aij = i − j:
$$A = \begin{bmatrix} 0 & -1 & -2 & -3 \\ 1 & 0 & -1 & -2 \\ 2 & 1 & 0 & -1 \end{bmatrix}$$
In this case, the matrix itself is sometimes defined by that formula, within square brackets or
double parentheses. For example, the matrix above is defined as A = [i − j], or A = ((i − j)). If
the matrix size is m × n, the above-mentioned formula f(i, j) is valid for any i = 1, ..., m and any
j = 1, ..., n. This can be either specified separately, or using m × n as a subscript. For instance,
the matrix A above is 3 × 4 and can be defined as A = [i − j] (i = 1, 2, 3; j = 1, ..., 4), or
A = [i − j]3×4.
Some programming languages utilize doubly subscripted arrays (or arrays of arrays) to
represent an m × n matrix. Some programming languages start the numbering of array
indexes at zero, in which case the entries of an m-by-n matrix are indexed by 0 ≤ i ≤ m − 1
and 0 ≤ j ≤ n − 1.[9] This article follows the more common convention in mathematical writing
where enumeration starts from 1.
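The two indexing conventions can be illustrated with a short Python sketch. It assumes a matrix stored as a list of rows (an "array of arrays"); the helper name `entry` is hypothetical and introduced only for illustration.

```python
# Build the 3-by-4 matrix whose entries are a_ij = i - j,
# with i and j counted from 1 as in mathematical writing.
m, n = 3, 4
A = [[i - j for j in range(1, n + 1)] for i in range(1, m + 1)]


def entry(matrix, i, j):
    """Return the (i, j) entry using 1-based indices."""
    # Python lists start at 0, so shift both indices by one.
    return matrix[i - 1][j - 1]


print(A)               # [[0, -1, -2, -3], [1, 0, -1, -2], [2, 1, 0, -1]]
print(entry(A, 1, 3))  # -2, stored at A[0][2]
```

The shift by one in `entry` is the whole difference between the two conventions: the mathematical (1, 3) entry lives at position `[0][2]` in a zero-based language.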
Basic operations
There are a number of basic operations that can be applied to modify matrices, called matrix
addition, scalar multiplication, transposition, matrix multiplication, row operations, and
submatrix.[11]
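Addition: the sum A + B of two m-by-n matrices A and B is calculated entrywise:
(A + B)i,j = Ai,j + Bi,j, where 1 ≤ i ≤ m and 1 ≤ j ≤ n.
Scalar multiplication: the product cA of a number c (a scalar) and a matrix A is computed by
multiplying every entry of A by c:
(cA)i,j = c · Ai,j.
This operation is called scalar multiplication, but its result is not named "scalar product" to
avoid confusion, since "scalar product" is sometimes used as a synonym for "inner product".
Transposition: the transpose of an m-by-n matrix A is the n-by-m matrix A^T (also denoted A^tr
or ^tA) formed by turning rows into columns and vice versa:
(A^T)i,j = Aj,i.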
Familiar properties of numbers extend to these operations of matrices: for example, addition
is commutative, i.e., the matrix sum does not depend on the order of the summands:
A + B = B + A.[12] The transpose is compatible with addition and scalar multiplication, as
expressed by (cA)^T = c(A^T) and (A + B)^T = A^T + B^T. Finally, (A^T)^T = A.
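The entrywise operations and the identities just stated can be checked on a small example. This is a minimal sketch assuming matrices stored as lists of rows; the function names `add`, `scale` and `transpose` and the matrices `A` and `B` are illustrative choices.

```python
def add(A, B):
    """Entrywise sum of two matrices of the same size."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def scale(c, A):
    """Scalar multiplication: multiply every entry of A by c."""
    return [[c * a for a in row] for row in A]


def transpose(A):
    """Turn rows into columns and vice versa."""
    return [list(col) for col in zip(*A)]


A = [[1, 3, 1], [1, 0, 0]]
B = [[0, 0, 5], [7, 5, 0]]

print(add(A, B))     # [[1, 3, 6], [8, 5, 0]]
print(scale(2, A))   # [[2, 6, 2], [2, 0, 0]]
print(transpose(A))  # [[1, 1], [3, 0], [1, 0]]

# The identities from the text, checked on this example:
print(add(A, B) == add(B, A))                                      # True
print(transpose(scale(2, A)) == scale(2, transpose(A)))            # True
print(transpose(add(A, B)) == add(transpose(A), transpose(B)))     # True
print(transpose(transpose(A)) == A)                                # True
```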
Matrix multiplication
Multiplication of two matrices is defined if and only if the number of columns of the left
matrix is the same as the number of rows of the right matrix. If A is an m-by-n matrix and B
is an n-by-p matrix, then their matrix product AB is the m-by-p matrix whose entries are
given by the dot product of the corresponding row of A and the corresponding column of B:
$$[AB]_{i,j} = A_{i,1}B_{1,j} + A_{i,2}B_{2,j} + \cdots + A_{i,n}B_{n,j} = \sum_{r=1}^{n} A_{i,r}B_{r,j},$$
where 1 ≤ i ≤ m and 1 ≤ j ≤ p.[13] For example, an entry of a product formed from the row
(2, 3, 4) and the column (1000, 100, 10) is calculated as (2 × 1000) + (3 × 100) + (4 × 10) = 2340.
Matrix multiplication satisfies the rules (AB)C = A(BC) (associativity), and (A+B)C =
AC+BC as well as C(A+B) = CA+CB (left and right distributivity), whenever the size of the
matrices is such that the various products are defined.[14] The product AB may be defined
without BA being defined, namely if A and B are m-by-n and n-by-k matrices, respectively,
and m ≠ k. Even if both products are defined, they need not be equal, i.e., generally
AB ≠ BA; that is, matrix multiplication is not commutative.
Besides the ordinary matrix multiplication just described, there exist other less frequently
used operations on matrices that can be considered forms of multiplication, such as the
Hadamard product and the Kronecker product.[15] They arise in solving matrix equations such
as the Sylvester equation.
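The row-by-column rule for the ordinary matrix product can be sketched in a few lines of Python. The function name `matmul` and the matrices `A`, `B`, `P` and `Q` are illustrative assumptions; matrices are stored as lists of rows.

```python
def matmul(A, B):
    """Matrix product: entry (i, j) is the dot product of row i of A and column j of B."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of the left matrix must equal rows of the right matrix")
    p = len(B[0])
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(p)]
            for i in range(len(A))]


A = [[2, 3, 4],
     [1, 0, 0]]          # 2-by-3
B = [[0, 1000],
     [1, 100],
     [0, 10]]            # 3-by-2

print(matmul(A, B))      # [[3, 2340], [0, 1000]]

# Both products exist for square matrices, but they generally differ.
P = [[1, 2], [3, 4]]
Q = [[0, 1], [0, 0]]
print(matmul(P, Q))      # [[0, 1], [0, 3]]
print(matmul(Q, P))      # [[3, 4], [0, 0]]
```

The entry 2340 is the dot product (2 × 1000) + (3 × 100) + (4 × 10) of the first row of `A` and the second column of `B`.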
Row operations
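There are three types of row operations: adding a multiple of one row to another row,
multiplying all entries of a row by a nonzero constant, and interchanging two rows. These
operations are used in a number of ways, including solving linear equations and finding
matrix inverses.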
Submatrix
A submatrix of a matrix is obtained by deleting any collection of rows and/or columns.[16][17][18]
For example, from a 3-by-4 matrix, we can construct a 2-by-3 submatrix by
removing row 3 and column 2.
The minors and cofactors of a matrix are found by computing the determinant of certain
submatrices.[18][19]
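Deleting rows and columns is easy to express in code. The sketch below assumes a list-of-rows representation; the function name `submatrix`, its parameters, and the 3-by-4 matrix `A` are hypothetical and chosen only to illustrate removing row 3 and column 2.

```python
def submatrix(A, drop_rows=(), drop_cols=()):
    """Delete the given rows and columns (1-based indices, as in the text)."""
    return [[x for j, x in enumerate(row, start=1) if j not in drop_cols]
            for i, row in enumerate(A, start=1) if i not in drop_rows]


A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]

# Remove row 3 and column 2 to obtain a 2-by-3 submatrix.
print(submatrix(A, drop_rows={3}, drop_cols={2}))  # [[1, 3, 4], [5, 7, 8]]
```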
Linear equations
Matrices can be used to compactly write and work with multiple linear equations, i.e.,
systems of linear equations. For example, if A is an m-by-n matrix, x designates a column
vector (i.e., an n × 1 matrix) of n variables x1, x2, ..., xn, and b is an m × 1 column vector,
then the matrix equation
Ax = b
...
Linear transformations
The vectors represented by a 2-by-2 matrix correspond to the sides of a unit square
transformed into a parallelogram.
Matrices and matrix multiplication reveal their essential features when related to linear
transformations, also known as linear maps. A real m-by-n matrix A gives rise to a linear
transformation R^n → R^m mapping each vector x in R^n to the (matrix) product Ax, which is a
vector in R^m. Conversely, each linear transformation f: R^n → R^m arises from a unique m-by-n
matrix A: explicitly, the (i, j)-entry of A is the ith coordinate of f(e_j), where e_j =
(0,...,0,1,0,...,0) is the unit vector with 1 in the jth position and 0 elsewhere. The matrix A is
said to represent the linear map f, and A is called the transformation matrix of f.
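The recipe "column j of A is the image of the jth unit vector" can be turned directly into code. In this sketch the linear map `f` from R^2 to R^3 is a made-up example, and the helper names `matrix_of` and `matvec` are assumptions introduced for illustration.

```python
def matrix_of(f, n):
    """Return the matrix representing a linear map f on R^n.

    Column j of the result is f(e_j), where e_j is the jth unit vector.
    """
    columns = []
    for j in range(n):
        e_j = [1 if k == j else 0 for k in range(n)]
        columns.append(f(e_j))
    # The images were collected as columns; transpose them into rows.
    return [list(row) for row in zip(*columns)]


def matvec(A, x):
    """Product of a matrix with a column vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def f(v):
    """A hypothetical linear map from R^2 to R^3."""
    x, y = v
    return [x + 2 * y, 3 * x, -y]


A = matrix_of(f, 2)
print(A)                  # [[1, 2], [3, 0], [0, -1]]
print(matvec(A, [4, 5]))  # [14, 12, -5]
print(f([4, 5]))          # [14, 12, -5]
```

Applying `A` to a vector and applying `f` directly give the same result, which is what it means for `A` to represent `f`.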
For example, the 2-by-2 matrix
$$A = \begin{bmatrix} a & c \\ b & d \end{bmatrix}$$
can be viewed as the transform of the unit square into a parallelogram with vertices at (0, 0),
(a, b), (a + c, b + d), and (c, d). The parallelogram pictured at the right is obtained by
multiplying A with each of the column vectors (0, 0)^T, (1, 0)^T, (1, 1)^T, and (0, 1)^T in turn.
These vectors define the vertices of the unit square.
The following table shows a number of 2-by-2 matrices with the associated linear maps of
R2. The blue original is mapped to the green grid and shapes. The origin (0,0) is marked with
a black point.
Under the 1-to-1 correspondence between matrices and linear maps, matrix multiplication
corresponds to composition of maps:[25] if a k-by-m matrix B represents another linear map g :
R^m → R^k, then the composition g ∘ f is represented by BA since
(g ∘ f)(x) = g(f(x)) = g(Ax) = B(Ax) = (BA)x.
The last equality follows from the above-mentioned associativity of matrix multiplication.
The rank of a matrix A is the maximum number of linearly independent row vectors of the
matrix, which is the same as the maximum number of linearly independent column vectors.[26]
Equivalently it is the dimension of the image of the linear map represented by A.[27] The
rank-nullity theorem states that the dimension of the kernel of a matrix plus the rank equals the
number of columns of the matrix.[28]
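One way to compute the rank is to count pivots during Gaussian elimination. The sketch below assumes exact rational arithmetic via Python's `fractions` module to avoid rounding issues; the function name `rank` and the example matrix are illustrative.

```python
from fractions import Fraction


def rank(A):
    """Rank of a matrix, computed by Gaussian elimination over exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    rows = len(M)
    cols = len(M[0]) if M else 0
    r = 0  # number of pivots found so far
    for c in range(cols):
        if r == rows:
            break
        # Look for a nonzero entry in column c at or below row r.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        # Clear the entries below the pivot.
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


A = [[1, 2, 3],
     [2, 4, 6],   # twice the first row
     [1, 0, 1]]
A_transposed = [list(col) for col in zip(*A)]

print(rank(A))              # 2
print(rank(A_transposed))   # 2, row rank equals column rank
print(len(A[0]) - rank(A))  # 1, the dimension of the kernel
```

For this matrix the kernel is spanned by the vector (1, 1, −1), so its dimension 1 plus the rank 2 equals the number of columns 3, as the rank-nullity theorem states.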
Square matrices
A square matrix is a matrix with the same number of rows and columns. An n-by-n matrix is
known as a square matrix of order n. Any two square matrices of the same order can be added
and multiplied. The entries aii form the main diagonal of a square matrix. They lie on the
imaginary line which runs from the top left corner to the bottom right corner of the matrix.
Main types
Diagonal matrix
If all entries of A below the main diagonal are zero, A is called an upper triangular matrix.
Similarly if all entries of A above the main diagonal are zero, A is called a lower triangular
matrix. If all entries outside the main diagonal are zero, A is called a diagonal matrix.
Identity matrix
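The identity matrix I_n of size n is the n-by-n matrix in which all the elements on the main
diagonal are equal to 1 and all other elements are equal to 0, e.g.
$$I_1 = \begin{bmatrix} 1 \end{bmatrix},\quad I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\quad \ldots$$
It is a square matrix of order n, and also a special kind of diagonal matrix. It is called an
identity matrix because multiplication with it leaves a matrix unchanged:
$$A I_n = I_m A = A \quad \text{for any } m\text{-by-}n \text{ matrix } A.$$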
A square matrix A that is equal to its transpose, i.e., A = A^T, is a symmetric matrix. If instead
A is equal to the negative of its transpose, i.e., A = −A^T, then A is a skew-symmetric
matrix. In complex matrices, symmetry is often replaced by the concept of Hermitian
matrices, which satisfy A* = A, where the star or asterisk denotes the conjugate transpose of
the matrix, i.e., the transpose of the complex conjugate of A.
By the spectral theorem, real symmetric matrices and complex Hermitian matrices have an
eigenbasis; i.e., every vector is expressible as a linear combination of eigenvectors. In both
cases, all eigenvalues are real.[29] This theorem can be generalized to infinite-dimensional
situations related to matrices with infinitely many rows and columns, see below.
A square matrix A is called invertible or non-singular if there exists a matrix B such that
AB = BA = I_n.[30][31]
Definite matrix
[Table: two examples of 2-by-2 symmetric matrices and their quadratic forms. Positive definite:
Q(x, y) = 1/4 x^2 + y^2. Indefinite: Q(x, y) = 1/4 x^2 − 1/4 y^2.]
A symmetric n-by-n matrix A is called positive-definite (respectively negative-definite;
indefinite) if, for all nonzero vectors x in R^n, the associated quadratic form given by
Q(x) = x^T A x
takes only positive values (respectively only negative values; both some negative and some
positive values).[32] If the quadratic form takes only non-negative (respectively only
non-positive) values, the symmetric matrix is called positive-semidefinite (respectively
negative-semidefinite); hence the matrix is indefinite precisely when it is neither
positive-semidefinite nor negative-semidefinite.
A symmetric matrix is positive-definite if and only if all its eigenvalues are positive, i.e., the
matrix is positive-semidefinite and it is invertible.[33] The table at the right shows two
possibilities for 2-by-2 matrices.
Allowing as input two different vectors instead yields the bilinear form associated to A:
B_A(x, y) = x^T A y.[34]
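The quadratic and bilinear forms can be evaluated directly from their definitions. The sketch below uses the two diagonal matrices that correspond to the quadratic forms 1/4 x^2 + y^2 and 1/4 x^2 − 1/4 y^2; the function names `bilinear` and `quadratic` are assumptions. Evaluating a few sample vectors only illustrates the sign behaviour and does not by itself prove definiteness.

```python
from fractions import Fraction


def bilinear(A, x, y):
    """Bilinear form B_A(x, y) = x^T A y."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def quadratic(A, x):
    """Quadratic form Q(x) = x^T A x."""
    return bilinear(A, x, x)


quarter = Fraction(1, 4)
P = [[quarter, 0], [0, 1]]          # Q(x, y) = 1/4 x^2 + y^2
N = [[quarter, 0], [0, -quarter]]   # Q(x, y) = 1/4 x^2 - 1/4 y^2

print(quadratic(P, [2, 1]))   # 2
print(quadratic(N, [2, 0]))   # 1, a positive value
print(quadratic(N, [0, 2]))   # -1, a negative value, so N is indefinite
```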
Orthogonal matrix
An orthogonal matrix is a square matrix with real entries whose columns and rows are
orthogonal unit vectors (i.e., orthonormal vectors). Equivalently, a matrix A is orthogonal if
its transpose is equal to its inverse:
$$A^{T} = A^{-1},$$
which entails
$$A^{T} A = A A^{T} = I,$$
where I is the identity matrix.
An orthogonal matrix A is necessarily invertible (with inverse A^{-1} = A^T), unitary (A^{-1} = A*),
and normal (A*A = AA*). The determinant of any orthogonal matrix is either +1 or −1. A
special orthogonal matrix is an orthogonal matrix with determinant +1. As a linear
transformation, every orthogonal matrix with determinant +1 is a pure rotation, while every
orthogonal matrix with determinant −1 is either a pure reflection, or a composition of
reflection and rotation.
Main operations
Trace
The trace, tr(A), of a square matrix A is the sum of its diagonal entries. While matrix
multiplication is not commutative as mentioned above, the trace of the product of two
matrices is independent of the order of the factors:
tr(AB) = tr(BA).
Also, the trace of a matrix is equal to that of its transpose, i.e.,
tr(A) = tr(A^T).
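The trace identity holds even when AB and BA have different sizes, as a small check shows. The helper names and the matrices `A` (2-by-3) and `B` (3-by-2) below are illustrative assumptions.

```python
def matmul(A, B):
    """Matrix product of lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


A = [[2, 3, 4],
     [1, 0, 0]]
B = [[0, 1000],
     [1, 100],
     [0, 10]]

AB = matmul(A, B)   # 2-by-2
BA = matmul(B, A)   # 3-by-3
print(trace(AB), trace(BA))   # 1003 1003

# The trace is unchanged by transposition.
AB_transposed = [list(col) for col in zip(*AB)]
print(trace(AB) == trace(AB_transposed))   # True
```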
Determinant
A linear transformation on R^2 given by the indicated matrix. The determinant of this matrix is
−1, as the area of the green parallelogram at the right is 1, but the map reverses the
orientation, since it turns the counterclockwise orientation of the vectors to a clockwise one.
The determinant det(A) or |A| of a square matrix A is a number encoding certain properties of
the matrix. A matrix is invertible if and only if its determinant is nonzero. Its absolute value
equals the area (in R^2) or volume (in R^3) of the image of the unit square (or cube), while its
sign corresponds to the orientation of the corresponding linear map: the determinant is
positive if and only if the orientation is preserved.
The determinant of 2-by-2 matrices is given by
$$\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc.$$
The determinant of 3-by-3 matrices involves 6 terms (rule of Sarrus). The more lengthy
Leibniz formula generalises these two formulae to all dimensions.[35]
The determinant of a product of square matrices equals the product of their determinants:
det(AB) = det(A) · det(B).
Adding a multiple of any row to another row, or a multiple of any column to another column,
does not change the determinant. Interchanging two rows or two columns affects the
determinant by multiplying it by −1.[37] Using these operations, any matrix can be transformed
to a lower (or upper) triangular matrix, and for such matrices the determinant equals the
product of the entries on the main diagonal; this provides a method to calculate the
determinant of any matrix. Finally, the Laplace expansion expresses the determinant in terms
of minors, i.e., determinants of smaller matrices.[38] This expansion can be used for a
recursive definition of determinants (taking as starting case the determinant of a 1-by-1
matrix, which is its unique entry, or even the determinant of a 0-by-0 matrix, which is 1), that
can be seen to be equivalent to the Leibniz formula. Determinants can be used to solve linear
systems using Cramer's rule, where the division of the determinants of two related square
matrices equates to the value of each of the system's variables.[39]
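The recursive definition through the Laplace expansion, together with the effect of row operations, can be sketched as follows. The function name `det` and the example matrices are illustrative; the expansion is taken along the first row, and its cost grows very quickly with the matrix size, so it is meant for small examples only.

```python
def det(A):
    """Determinant by Laplace expansion along the first row (recursive)."""
    n = len(A)
    if n == 0:
        return 1  # starting case: the 0-by-0 matrix has determinant 1
    total = 0
    for j in range(n):
        # Minor: delete the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total


A = [[-2, 2, -3],
     [-1, 1, 3],
     [2, 0, -1]]
print(det(A))   # 18

# Interchanging two rows multiplies the determinant by -1.
swapped = [A[1], A[0], A[2]]
print(det(swapped))   # -18

# Adding a multiple of one row to another leaves it unchanged.
sheared = [A[0], A[1], [c + 2 * a for a, c in zip(A[0], A[2])]]
print(det(sheared))   # 18

# For a triangular matrix the determinant is the product of the diagonal.
T = [[2, 5, 1],
     [0, 3, 7],
     [0, 0, 4]]
print(det(T))   # 24
```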
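A number λ and a nonzero vector v satisfying
Av = λv
are called an eigenvalue and an eigenvector of A, respectively.[41]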
Computational aspects
Matrix calculations can often be performed with different techniques. Many problems can be
solved by either direct algorithms or iterative approaches. For example, the eigenvectors of a
square matrix can be obtained by finding a sequence of vectors x_n converging to an
eigenvector when n tends to infinity.[43]
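One well-known iterative scheme of this kind is power iteration, shown here as a minimal sketch; the text does not name a specific method, so this choice is an assumption. The scheme repeatedly multiplies a vector by the matrix and rescales it, and it assumes the matrix has a single eigenvalue of largest absolute value and that the starting vector is not orthogonal to the corresponding eigenvector. The function name `power_iteration` and the example matrix are illustrative.

```python
import math


def power_iteration(A, steps=100):
    """Approximate the dominant eigenvalue and an eigenvector of a square matrix."""
    n = len(A)
    x = [1.0] * n  # starting vector x_0
    for _ in range(steps):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        scale = max(abs(c) for c in y)
        x = [c / scale for c in y]  # rescale so the entries stay bounded
    # Rayleigh quotient x.Ax / x.x as the eigenvalue estimate.
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(Ax, x)) / sum(b * b for b in x)
    return eigenvalue, x


A = [[2.0, 1.0],
     [1.0, 3.0]]
eigenvalue, vector = power_iteration(A)

# For this symmetric 2-by-2 matrix the larger eigenvalue is (5 + sqrt(5)) / 2.
expected = (5 + math.sqrt(5)) / 2
print(abs(eigenvalue - expected) < 1e-9)   # True
```

The speed of convergence depends on the ratio between the two largest eigenvalues in absolute value; when they are close, many more steps are needed.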
To choose the most appropriate algorithm for each specific problem, it is important
to determine both the effectiveness and precision of all the available algorithms. The domain
studying these matters is called numerical linear algebra.[44] As with other numerical
situations, two main aspects are the complexity of algorithms and their numerical stability.
Determining the complexity of an algorithm means finding upper bounds or estimates of how
many elementary operations, such as additions and multiplications of scalars, are necessary to
perform it, e.g., multiplication of matrices. For example, calculating the matrix
product of two n-by-n matrices using the definition given above needs n^3 multiplications, since
for any of the n^2 entries of the product, n multiplications are necessary. The Strassen
algorithm outperforms this "naive" algorithm; it needs only about n^2.807 multiplications.[45] A
refined approach also incorporates specific features of the computing devices.
In many practical situations additional information about the matrices involved is known. An
important case is that of sparse matrices, i.e., matrices most of whose entries are zero. There are
specifically adapted algorithms for, say, solving linear systems Ax = b for sparse matrices A,
such as the conjugate gradient method.[46]
An algorithm is, roughly speaking, numerically stable if small deviations in the input values
do not lead to big deviations in the result. For example, calculating the inverse of a matrix via
Laplace's formula (Adj(A) denotes the adjugate matrix of A)
A^{-1} = Adj(A) / det(A)
may lead to significant rounding errors if the determinant of the matrix is very small. The
norm of a matrix can be used to capture the conditioning of linear algebraic problems, such as
computing a matrix's inverse.[47]
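Laplace's formula for the inverse can be written out for small integer matrices. The sketch below deliberately uses exact fractions, so the rounding problems described above do not appear; with floating-point numbers and a tiny determinant the division would amplify errors. The function names `minor`, `det`, `adjugate` and `inverse` are assumptions made for illustration.

```python
from fractions import Fraction


def minor(A, i, j):
    """Matrix A with row i and column j removed (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by Laplace expansion along the first row."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def adjugate(A):
    """Transpose of the cofactor matrix of A."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]


def inverse(A):
    """Inverse of an integer matrix via A^{-1} = Adj(A) / det(A)."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(x, d) for x in row] for row in adjugate(A)]


A = [[2, 1],
     [5, 3]]
print(det(A))                        # 1
print(adjugate(A))                   # [[3, -1], [-5, 2]]
print(inverse(A) == [[3, -1], [-5, 2]])   # True
```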
Although most computer languages are not designed with commands or libraries for matrices,
as early as the 1970s, some engineering desktop computers such as the HP 9830 had ROM
cartridges to add BASIC commands for matrices. Some computer languages such as APL
were designed to manipulate matrices, and various mathematical programs can be used to aid
computing with matrices.[48]
Decomposition
There are several methods to render matrices into a more easily accessible form. They are
generally referred to as matrix decomposition or matrix factorization techniques. The interest
of all these techniques is that they preserve certain properties of the matrices in question, such
as determinant, rank or inverse, so that these quantities can be calculated after applying the
transformation, or that certain matrix operations are algorithmically easier to carry out for
some types of matrices.
The LU decomposition factors matrices as a product of a lower (L) and an upper (U) triangular
matrix.[49] Once this decomposition is calculated, linear systems can be solved more
efficiently, by a simple technique called forward and back substitution. Likewise, inverses of
triangular matrices are algorithmically easier to calculate. Gaussian elimination is a
similar algorithm; it transforms any matrix to row echelon form.[50] Both methods proceed by
multiplying the matrix by suitable elementary matrices, which correspond to permuting rows
or columns and adding multiples of one row to another row. Singular value decomposition
expresses any matrix A as a product UDV*, where U and V are unitary matrices and D is a
diagonal matrix.
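The LU decomposition and the forward and back substitution it enables can be sketched as below. This is a simplified variant that assumes no row permutations are needed (every pivot is nonzero) and uses exact fractions; the function names `lu` and `solve` and the example system are illustrative.

```python
from fractions import Fraction


def lu(A):
    """Factor a square matrix as A = LU without row permutations."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(x) for x in row] for row in A]
    for k in range(n):
        if U[k][k] == 0:
            raise ValueError("zero pivot; this sketch does not permute rows")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]  # multiplier used to clear U[i][k]
            U[i] = [a - L[i][k] * b for a, b in zip(U[i], U[k])]
    return L, U


def solve(L, U, b):
    """Solve LUx = b by forward substitution (Ly = b), then back substitution (Ux = y)."""
    n = len(b)
    y = []
    for i in range(n):
        y.append(Fraction(b[i]) - sum(L[i][j] * y[j] for j in range(i)))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


A = [[2, 1, 1],
     [4, 3, 3],
     [8, 7, 9]]
L, U = lu(A)
print(L == [[1, 0, 0], [2, 1, 0], [4, 3, 1]])   # True
print(U == [[2, 1, 1], [0, 1, 1], [0, 0, 2]])   # True
print(solve(L, U, [4, 10, 24]) == [1, 1, 1])    # True
```

Once `L` and `U` are known, each additional right-hand side `b` costs only the two substitution passes, which is the efficiency gain mentioned above.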
An example of a matrix in Jordan normal form. The grey blocks are called Jordan blocks.
Given the eigendecomposition A = VDV^{-1}, where D is a diagonal matrix and V is an invertible
matrix, the nth power of A (i.e., n-fold iterated matrix multiplication) can be
calculated via
$$A^n = (VDV^{-1})^n = VDV^{-1}VDV^{-1} \cdots VDV^{-1} = VD^nV^{-1},$$
and the power of a diagonal matrix can be calculated by taking the corresponding powers of
the diagonal entries, which is much easier than doing the exponentiation for A instead. This
can be used to compute the matrix exponential e^A, a need frequently arising in solving linear
differential equations, matrix logarithms and square roots of matrices.[53] To avoid
numerically ill-conditioned situations, further algorithms such as the Schur decomposition
can be employed.[54]
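Computing a matrix power through an eigendecomposition A = VDV^{-1} can be checked on a small example. The matrix `A` below, its eigenvalues 1 and 3, and the matrices `V` and `V_inv` are a hand-picked illustration; the function names are assumptions.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product of lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def power_via_eigendecomposition(V, eigenvalues, V_inv, n):
    """Compute A^n = V D^n V^{-1}, where D is diagonal with the given eigenvalues."""
    k = len(eigenvalues)
    # Powers of a diagonal matrix are taken entry by entry on the diagonal.
    D_n = [[eigenvalues[i] ** n if i == j else 0 for j in range(k)]
           for i in range(k)]
    return matmul(matmul(V, D_n), V_inv)


A = [[2, 1],
     [1, 2]]
eigenvalues = [1, 3]
V = [[1, 1],
     [-1, 1]]   # columns are eigenvectors for the eigenvalues 1 and 3
V_inv = [[Fraction(1, 2), Fraction(-1, 2)],
         [Fraction(1, 2), Fraction(1, 2)]]

A5 = power_via_eigendecomposition(V, eigenvalues, V_inv, 5)

# Compare with five-fold iterated multiplication.
direct = [[1, 0], [0, 1]]
for _ in range(5):
    direct = matmul(direct, A)

print(A5 == direct)   # True
print(direct)         # [[122, 121], [121, 122]]
```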
Matrices can be generalized in different ways. Abstract algebra uses matrices with entries in
more general fields or even rings, while linear algebra codifies properties of matrices in the
notion of linear maps. It is possible to consider matrices with infinitely many columns and
rows. Another extension is tensors, which can be seen as higher-dimensional arrays of
numbers, as opposed to vectors, which can often be realised as sequences of numbers, while
matrices are rectangular or two-dimensional arrays of numbers.[55] Matrices subject to certain
requirements tend to form groups known as matrix groups.
This article focuses on matrices whose entries are real or complex numbers. However,
matrices can be considered with much more general types of entries than real or complex
numbers. As a first step of generalization, any field, i.e., a set where addition, subtraction,
multiplication and division operations are defined and well-behaved, may be used instead of
R or C, for example rational numbers or finite fields. For example, coding theory makes use
of matrices over finite fields. Wherever eigenvalues are considered, as these are roots of a
polynomial they may exist only in a larger field than that of the entries of the matrix; for
instance they may be complex in case of a matrix with real entries. The possibility to
reinterpret the entries of a matrix as elements of a larger field (e.g., to view a real matrix as a
complex matrix whose entries happen to be all real) then allows considering each square
matrix to possess a full set of eigenvalues. Alternatively one can consider only matrices with
entries in an algebraically closed field, such as C, from the outset.
More generally, abstract algebra makes great use of matrices with entries in a ring R.[56] Rings
are a more general notion than fields in that a division operation need not exist. The very
same addition and multiplication operations of matrices extend to this setting, too. The set
M(n, R) of all square n-by-n matrices over R is a ring called the matrix ring, isomorphic to the
endomorphism ring of the left R-module R^n.[57] If the ring R is commutative, i.e., its
multiplication is commutative, then M(n, R) is a unitary noncommutative (unless n = 1)
associative algebra over R. The determinant of square matrices over a commutative ring R
can still be defined using the Leibniz formula; such a matrix is invertible if and only if its
determinant is invertible in R, generalising the situation over a field F, where every nonzero
element is invertible.[58] Matrices over superrings are called supermatrices.[59]
Matrices do not always have all their entries in the same ring or even in any ring at all. One
special but common case is block matrices, which may be considered as matrices whose
entries themselves are matrices. The entries need not be square matrices, and thus need not
be members of any ordinary ring; but their sizes must fulfil certain compatibility conditions.
Linear maps R^n → R^m are equivalent to m-by-n matrices, as described above. More generally,
any linear map f: V → W between finite-dimensional vector spaces can be described by a
matrix A = (aij), after choosing bases v1, ..., vn of V, and w1, ..., wm of W (so n is the
dimension of V and m is the dimension of W), which is such that
$$f(v_j) = \sum_{i=1}^{m} a_{i,j} w_i \quad \text{for } j = 1, \ldots, n.$$
In other words, column j of A expresses the image of vj in terms of the basis vectors wi of W;
thus this relation uniquely determines the entries of the matrix A. Note that the matrix
depends on the choice of the bases: different choices of bases give rise to different, but
equivalent matrices.[60] Many of the above concrete notions can be reinterpreted in this light,
for example, the transpose matrix A^T describes the transpose of the linear map given by A,
with respect to the dual bases.[61]
These properties can be restated in a more natural way: the category of all matrices with
entries in a field with multiplication as composition is equivalent to the category of finite
dimensional vector spaces and linear maps over this field.
More generally, the set of m × n matrices can be used to represent the R-linear maps between
the free modules R^m and R^n for an arbitrary ring R with unity. When n = m composition of
these maps is possible, and this gives rise to the matrix ring of n × n matrices representing the
endomorphism ring of R^n.
Matrix groups
Any property of matrices that is preserved under matrix products and inverses can be used to
define further matrix groups. For example, matrices with a given size and with a determinant
of 1 form a subgroup of (i.e., a smaller group contained in) their general linear group, called a
special linear group.[64] Orthogonal matrices, determined by the condition
M^T M = I,
form the orthogonal group.[65] Every orthogonal matrix has determinant 1 or −1. Orthogonal
matrices with determinant 1 form a subgroup called the special orthogonal group.
Every finite group is isomorphic to a matrix group, as one can see by considering the regular
representation of the symmetric group.[66] General groups can be studied using matrix groups,
which are comparatively well-understood, by means of representation theory.[67]
Infinite matrices
It is also possible to consider matrices with infinitely many rows and/or columns[68] even if,
being infinite objects, one cannot write down such matrices explicitly. All that matters is that
for every element in the set indexing rows, and every element in the set indexing columns,
there is a well-defined entry (these index sets need not even be subsets of the natural
numbers). The basic operations of addition, subtraction, scalar multiplication and
transposition can still be defined without problem; however matrix multiplication may
involve infinite summations to define the resulting entries, and these are not defined in
general.
If R is a ring, the endomorphism ring of the free module M = ⊕_{i ∈ I} R, considered as a right
R-module, is isomorphic to the ring of column-finite matrices whose entries are
indexed by I × I, and whose columns each contain only finitely many nonzero entries. The
endomorphisms of M considered as a left R-module result in an analogous object, the
row-finite matrices, whose rows each have only finitely many nonzero entries.
If infinite matrices are used to describe linear maps, then only those matrices can be used all
of whose columns have but a finite number of nonzero entries, for the following reason. For a
matrix A to describe a linear map f: V → W, bases for both spaces must have been chosen;
recall that by definition this means that every vector in the space can be written uniquely as a
(finite) linear combination of basis vectors, so that written as a (column) vector v of
coefficients, only finitely many entries vi are nonzero. Now the columns of A describe the
images by f of individual basis vectors of V in the basis of W, which is only meaningful if
these columns have only finitely many nonzero entries. There is no restriction on the rows of
A however: in the product Av there are only finitely many nonzero coefficients of v
involved, so every one of its entries, even if it is given as an infinite sum of products,
involves only finitely many nonzero terms and is therefore well defined. Moreover this
amounts to forming a linear combination of the columns of A that effectively involves only
finitely many of them, whence the result has only finitely many nonzero entries, because each
of those columns does. One also sees that the product of two matrices of the given type is well
defined (provided as usual that the column-index and row-index sets match), is again of the
same type, and corresponds to the composition of linear maps.
If R is a normed ring, then the condition of row or column finiteness can be relaxed. With the
norm in place, absolutely convergent series can be used instead of finite sums. For example,
the matrices whose column sums are absolutely convergent sequences form a ring.
Analogously of course, the matrices whose row sums are absolutely convergent series also
form a ring.
In that vein, infinite matrices can also be used to describe operators on Hilbert spaces, where
convergence and continuity questions arise, which again results in certain constraints that
have to be imposed. However, the explicit point of view of matrices tends to obfuscate the
matter,[nb 3] and the abstract and more powerful tools of functional analysis can be used
instead.
Empty matrices
An empty matrix is a matrix in which the number of rows or columns (or both) is zero.[69][70]
Empty matrices help in dealing with maps involving the zero vector space. For example, if A is
a 3-by-0 matrix and B is a 0-by-3 matrix, then AB is the 3-by-3 zero matrix corresponding to
the null map from a 3-dimensional space V to itself, while BA is a 0-by-0 matrix. There is no
common notation for empty matrices, but most computer algebra systems allow creating and
computing with them. The determinant of the 0-by-0 matrix is 1 as follows from regarding
the empty product occurring in the Leibniz formula for the determinant as 1. This value is
also consistent with the fact that the identity map from any finite dimensional space to itself
has determinant 1, a fact that is often used as a part of the characterization of determinants.
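The role of the empty product can be seen by implementing the Leibniz formula literally: it sums, over all permutations of the column indices, a signed product of entries. The sketch below represents the 0-by-0 matrix as an empty list, which is an assumption of this illustration; the function names `sign` and `det_leibniz` are hypothetical.

```python
from itertools import permutations
from math import prod


def sign(perm):
    """Sign of a permutation: +1 for an even number of inversions, -1 for odd."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det_leibniz(A):
    """Determinant via the Leibniz formula (a sum over all permutations)."""
    n = len(A)
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))


print(det_leibniz([[1, 2], [3, 4]]))   # -2

# For the 0-by-0 matrix there is exactly one (empty) permutation,
# and its product of entries is the empty product, which equals 1.
print(det_leibniz([]))                 # 1
```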
Applications
There are numerous applications of matrices, both in mathematics and other sciences. Some
of them merely take advantage of the compact representation of a set of numbers in a matrix.
For example, in game theory and economics, the payoff matrix encodes the payoff for two
players, depending on which out of a given (finite) set of alternatives the players choose.[71]
Text mining and automated thesaurus compilation make use of document-term matrices such
as tf-idf to track frequencies of certain words in several documents.[72]
Complex numbers can be represented by particular real 2-by-2 matrices via
$$a + ib \leftrightarrow \begin{bmatrix} a & -b \\ b & a \end{bmatrix},$$
under which addition and multiplication of complex numbers and matrices correspond to
each other. For example, 2-by-2 rotation matrices represent the multiplication with some
complex number of absolute value 1, as above. A similar interpretation is possible for
quaternions[73] and Clifford algebras in general.
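The correspondence between complex numbers and real 2-by-2 matrices can be verified numerically. The sketch assumes the standard representation in which a + ib corresponds to the matrix with rows (a, −b) and (b, a); the function names and the sample numbers `z` and `w` are illustrative.

```python
def as_matrix(z):
    """Real 2-by-2 matrix representing the complex number z = a + ib."""
    a, b = z.real, z.imag
    return [[a, -b], [b, a]]


def matmul(A, B):
    """Matrix product of lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def matadd(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


z = 1 + 2j
w = 3 - 1j

# Multiplying the numbers and multiplying their matrices give matching results.
print(z * w)                                                  # (5+5j)
print(matmul(as_matrix(z), as_matrix(w)) == as_matrix(z * w)) # True
print(matadd(as_matrix(z), as_matrix(w)) == as_matrix(z + w)) # True
```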
Early encryption techniques such as the Hill cipher also used matrices. However, due to the
linear nature of matrices, these codes are comparatively easy to break.[74] Computer graphics
uses matrices both to represent objects and to calculate transformations of objects using affine
rotation matrices to accomplish tasks such as projecting a three-dimensional object onto a
two-dimensional screen, corresponding to a theoretical camera observation.[75] Matrices over
a polynomial ring are important in the study of control theory.
Chemistry makes use of matrices in various ways, particularly since the use of quantum
theory to discuss molecular bonding and spectroscopy. Examples are the overlap matrix and
the Fock matrix used in solving the Roothaan equations to obtain the molecular orbitals of the
Hartree–Fock method.
Graph theory
The adjacency matrix of a finite graph is a basic notion of graph theory.[76] It records which
vertices of the graph are connected by an edge. Matrices containing just two different values
(1 and 0 meaning for example "yes" and "no", respectively) are called logical matrices. The
distance (or cost) matrix contains information about distances of the edges.[77] These concepts
can be applied to websites connected by hyperlinks or cities connected by roads etc., in which
case (unless the connection network is extremely dense) the matrices tend to be sparse, i.e.,
contain few nonzero entries. Therefore, specifically tailored matrix algorithms can be used in
network theory.
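An adjacency matrix and a sparse alternative can be built from a list of edges. The small undirected graph below, with vertices numbered from 0, and the function names `adjacency_matrix` and `sparse_rows` are made up for illustration.

```python
def adjacency_matrix(n, edges):
    """Adjacency matrix of an undirected graph with vertices 0, ..., n - 1."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        A[v][u] = 1  # undirected: the matrix is symmetric
    return A


def sparse_rows(A):
    """Keep only the column indices of the nonzero entries in each row."""
    return {i: [j for j, x in enumerate(row) if x] for i, row in enumerate(A)}


edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
A = adjacency_matrix(4, edges)

print(A)               # [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
print(sparse_rows(A))  # {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
```

For large networks with few connections per vertex, a representation like `sparse_rows` stores far fewer values than the full matrix, which is the reason tailored sparse algorithms are used.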
At the saddle point (x = 0, y = 0) (red) of the function f(x, y) = x^2 − y^2, the Hessian matrix
is indefinite.
The Hessian matrix of a differentiable function f: R^n → R consists of the second derivatives
of f with respect to the several coordinate directions.
It encodes information about the local growth behaviour of the function: given a critical point
x = (x1, ..., xn), i.e., a point where the first partial derivatives of f vanish, the
function has a local minimum if the Hessian matrix is positive definite. Quadratic
programming can be used to find global minima or maxima of quadratic functions closely
related to the ones attached to matrices (see above).[79]
If n > m, and if the rank of the Jacobi matrix attains its maximal value m, f is locally
invertible at that point, by the implicit function theorem.[81]
Partial differential equations can be classified by considering the matrix of coefficients of the
highest-order differential operators of the equation. For elliptic partial differential equations
this matrix is positive definite, which has decisive influence on the set of possible solutions of
the equation in question.[82]
The finite element method is an important numerical method to solve partial differential
equations, widely applied in simulating complex physical systems. It attempts to approximate
the solution to some equation by piecewise linear functions, where the pieces are chosen with
respect to a sufficiently fine grid, which in turn can be recast as a matrix equation.[83]
Two different Markov chains. The chart depicts the number of particles (of a total of 1000) in
state "2". Both limiting values can be determined from the transition matrices.
Stochastic matrices are square matrices whose rows are probability vectors, i.e., whose
entries are non-negative and sum up to one. Stochastic matrices are used to define Markov
chains with finitely many states.[84] A row of the stochastic matrix gives the probability
distribution for the next position of some particle currently in the state that corresponds to the
row. Properties of the Markov chain like absorbing states, i.e., states that any particle attains
eventually, can be read off the eigenvectors of the transition matrices.[85]
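As a rough illustration of how a row-stochastic matrix drives a Markov chain, the sketch below repeatedly multiplies a probability row vector by a transition matrix. The matrix `P` and the helper `step` are hypothetical examples chosen for this sketch, not values taken from the text; state index 1 is made absorbing so the limiting behaviour is easy to see.

```python
def step(dist, P):
    """Advance a probability row vector one step: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical 2-state chain (illustrative values only).
# Each row is a probability vector; row 1 is [0, 1], so state 1 is absorbing.
P = [[0.7, 0.3],
     [0.0, 1.0]]

dist = [1.0, 0.0]  # every particle starts in state 0
for _ in range(200):
    dist = step(dist, P)
# After many steps almost all probability mass sits in the absorbing state.
```

The row-vector convention (distribution times matrix) matches the description of rows as "next position" distributions.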
Statistics also makes use of matrices in many different forms. [86] Descriptive statistics is
concerned with describing data sets, which can often be represented as data matrices, which
may then be subjected to dimensionality reduction techniques. The covariance matrix
encodes the mutual variance of several random variables.[87] Another technique using matrices
is linear least squares, a method that approximates a finite set of pairs (x1, y1), (x2, y2), ...,
(xN, yN), by a linear function
yi ≈ a xi + b, i = 1, ..., N
which can be formulated in terms of matrices, related to the singular value decomposition of
matrices.[88]
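A minimal sketch of the line fit described above. It solves the 2-by-2 normal equations directly rather than using the singular value decomposition mentioned in the text; that substitution, the function name `fit_line`, and the sample data are assumptions made for illustration.

```python
def fit_line(points):
    """Least-squares fit of y ≈ a*x + b by solving the 2x2 normal equations."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    # Normal equations: [[sxx, sx], [sx, n]] * [a, b]^T = [sxy, sy]^T
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the fit is not unique")
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

# Illustrative data lying exactly on y = 2x + 1.
data = [(0, 1), (1, 3), (2, 5), (3, 7)]
a, b = fit_line(data)
```

For large or ill-conditioned data sets, a library routine based on QR or SVD is usually preferred over the normal equations.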
Random matrices are matrices whose entries are random numbers, subject to suitable
probability distributions, such as matrix normal distribution. Beyond probability theory, they
are applied in domains ranging from number theory to physics.[89][90]
Linear transformations and the associated symmetries play a key role in modern physics. For
example, elementary particles in quantum field theory are classified as representations of the
Lorentz group of special relativity and, more specifically, by their behavior under the spin
group. Concrete representations involving the Pauli matrices and more general gamma
matrices are an integral part of the physical description of fermions, which behave as spinors.[91]
For the three lightest quarks, there is a group-theoretical representation involving the
special unitary group SU(3); for their calculations, physicists use a convenient matrix
representation known as the Gell-Mann matrices, which are also used for the SU(3) gauge
group that forms the basis of the modern description of strong nuclear interactions, quantum
chromodynamics. The Cabibbo-Kobayashi-Maskawa matrix, in turn, expresses the fact that
the basic quark states that are important for weak interactions are not the same as, but linearly
related to, the basic quark states that define particles with specific and distinct masses.[92]
Linear combinations of quantum states
The first model of quantum mechanics (Heisenberg, 1925) represented the theory's operators
by infinite-dimensional matrices acting on quantum states.[93] This is also referred to as matrix
mechanics. One particular example is the density matrix that characterizes the "mixed" state
of a quantum system as a linear combination of elementary, "pure" eigenstates.[94]
Another matrix serves as a key tool for describing the scattering experiments that form the
cornerstone of experimental particle physics: Collision reactions such as occur in particle
accelerators, where non-interacting particles head towards each other and collide in a small
interaction zone, with a new set of non-interacting particles as the result, can be described as
the scalar product of outgoing particle states and a linear combination of ingoing particle
states. The linear combination is given by a matrix known as the S-matrix, which encodes all
information about the possible interactions between particles.[95]
Normal modes
Geometrical optics
Geometrical optics provides further matrix applications. In this approximative theory, the
wave nature of light is neglected. The result is a model in which light rays are indeed
geometrical rays. If the deflection of light rays by optical elements is small, the action of a
lens or reflective element on a given light ray can be expressed as multiplication of a two-
component vector with a two-by-two matrix called ray transfer matrix: the vector's
components are the light ray's slope and its distance from the optical axis, while the matrix
encodes the properties of the optical element. Actually, there are two kinds of matrices, viz. a
refraction matrix describing the refraction at a lens surface, and a translation matrix,
describing the translation of the plane of reference to the next refracting surface, where
another refraction matrix applies. The optical system, consisting of a combination of lenses
and/or reflective elements, is simply described by the matrix resulting from the product of the
components' matrices.[98]
Electronics
Traditional mesh analysis in electronics leads to a system of linear equations that can be
described with a matrix.
The behaviour of many electronic components can be described using matrices. Let A be a 2-
dimensional vector with the component's input voltage v1 and input current i1 as its elements,
and let B be a 2-dimensional vector with the component's output voltage v2 and output current
i2 as its elements. Then the behaviour of the electronic component can be described by B = H
A, where H is a 2 x 2 matrix containing one impedance element (h12), one admittance
element (h21) and two dimensionless elements (h11 and h22). Calculating a circuit now reduces
to multiplying matrices.
History
Matrices have a long history of application in solving linear equations but they were known
as arrays until the 1800s. The Chinese text The Nine Chapters on the Mathematical Art
written in the 10th-2nd century BCE is the first example of the use of array methods to solve
simultaneous equations,[99] including the concept of determinants. In 1545 Italian
mathematician Girolamo Cardano brought the method to Europe when he published Ars
Magna.[100] The Japanese mathematician Seki used the same array methods to solve
simultaneous equations in 1683.[101] The Dutch mathematician Jan de Witt represented
transformations using arrays in his 1659 book Elements of Curves.[102] Between 1700
and 1710 Gottfried Wilhelm Leibniz publicized the use of arrays for recording information or
solutions and experimented with over 50 different systems of arrays. [100] Cramer presented his
rule in 1750.
The term "matrix" (Latin for "womb", derived from mater, "mother"[103]) was coined by James
Joseph Sylvester in 1850,[104] who understood a matrix as an object giving rise to a number of
determinants today called minors, that is to say, determinants of smaller matrices that derive
from the original one by removing columns and rows. In an 1851 paper, Sylvester explains:
Arthur Cayley published a treatise on geometric transformations using matrices that were not
rotated versions of the coefficients being investigated as had previously been done. Instead he
defined operations such as addition, subtraction, multiplication, and division as
transformations of those matrices and showed the associative and distributive properties held
true. Cayley investigated and demonstrated the non-commutative property of matrix
multiplication as well as the commutative property of matrix addition. [100] Early matrix theory
had limited the use of arrays almost exclusively to determinants and Arthur Cayley's abstract
matrix operations were revolutionary. He was instrumental in proposing a matrix concept
independent of equation systems. In 1858 Cayley published his Memoir on the theory of
matrices[106][107] in which he proposed and demonstrated the Cayley-Hamilton theorem.[100]
An English mathematician named Cullis was the first to use modern bracket notation for
matrices in 1913, and he simultaneously demonstrated the first significant use of the notation A
= [ai,j] to represent a matrix, where ai,j refers to the entry in the ith row and the jth column.[100]
The study of determinants sprang from several sources. [108] Number-theoretical problems led
Gauss to relate coefficients of quadratic forms, i.e., expressions such as x^2 + xy - 2y^2, and
linear maps in three dimensions to matrices. Eisenstein further developed these notions,
including the remark that, in modern parlance, matrix products are non-commutative. Cauchy
was the first to prove general statements about determinants, using as definition of the
determinant of a matrix A = [ai,j] the following: replace the powers $a_j^k$ by $a_{jk}$ in the polynomial
$$
a_1 a_2 \cdots a_n \prod_{i<j} (a_j - a_i),
$$
where $\prod$ denotes the product of the indicated terms. He also showed, in 1829, that the
eigenvalues of symmetric matrices are real.[109] Jacobi studied "functional determinants"
(later called Jacobi determinants by Sylvester), which can be used to describe geometric
transformations at a local (or infinitesimal) level, see above; Kronecker's Vorlesungen über
die Theorie der Determinanten[110] and Weierstrass' Zur Determinantentheorie,[111] both
published in 1903, first treated determinants axiomatically, as opposed to previous more
concrete approaches such as the mentioned formula of Cauchy. At that point, determinants
were firmly established.
Many theorems were first established for small matrices only; for example, the Cayley-Hamilton
theorem was proved for 2×2 matrices by Cayley in the aforementioned memoir,
and by Hamilton for 4×4 matrices. Frobenius, working on bilinear forms, generalized the
theorem to all dimensions (1898). Also at the end of the 19th century, Gauss-Jordan
elimination (generalizing a special case now known as Gauss elimination) was established by
Jordan. In the early 20th century, matrices attained a central role in linear algebra,[112] partially
due to their use in classification of the hypercomplex number systems of the previous
century.
The inception of matrix mechanics by Heisenberg, Born and Jordan led to studying matrices
with infinitely many rows and columns.[113] Later, von Neumann carried out the mathematical
formulation of quantum mechanics, by further developing functional analytic notions such as
linear operators on Hilbert spaces, which, very roughly speaking, correspond to Euclidean
space, but with an infinity of independent directions.
The word has been used in unusual ways by at least two authors of historical importance.
Bertrand Russell and Alfred North Whitehead in their Principia Mathematica (1910-1913)
use the word matrix in the context of their Axiom of reducibility. They proposed this axiom
as a means to reduce any function to one of lower type, successively, so that at the bottom
(0 order) the function is identical to its extension:
Let us give the name of matrix to any function, of however many variables, which
does not involve any apparent variables. Then any possible function other than a
matrix is derived from a matrix by means of generalization, i.e., by considering the
proposition which asserts that the function in question is true with all possible values
or with some value of one of the arguments, the other argument or arguments
remaining undetermined.[114]
For example, a function Φ(x, y) of two variables x and y can be reduced to a collection of
functions of a single variable, e.g., y, by considering the function for all possible values of
individuals ai substituted in place of variable x. And then the resulting collection of
functions of the single variable y, i.e., ∀ai: Φ(ai, y), can be reduced to a matrix of values by
considering the function for all possible values of individuals bj substituted in place of
variable y:
∀bj ∀ai: Φ(ai, bj).
Alfred Tarski in his 1946 Introduction to Logic used the word matrix synonymously with
the notion of truth table as used in mathematical logic.
Applications of Matrices and Determinants
Area of a Triangle
Consider a triangle with vertices at (x1, y1), (x2, y2), and (x3, y3). If the triangle were a right
triangle, it would be easy to compute its area by finding one-half the
product of the base and the height.
However, when the triangle is not a right triangle, there are a couple of other ways that the
area can be found.
Heron's Formula
If you know the lengths of the three sides of the triangle, you can use Heron's Formula to find the area of the triangle.
s = 1/2 (a + b + c)
Area = sqrt(s (s - a) (s - b) (s - c))
For the example triangle with vertices (-2, 2), (1, 5), and (6, -1), the distance formula gives side lengths (arbitrarily assigning a,
b, and c) of a = 3 sqrt(2), b = sqrt(61), and c = sqrt(73).
s ( s - a ) ( s - b ) ( s - c ) = 1089 / 4
When you take the square root of that, you get 33/2, so the area of that triangle is 16.5.
Drawbacks of Heron's Formula:
- You must know the lengths of the sides of the triangle. If you don't, you have to use
  the distance formula to find them.
- You have to compute the semi-perimeter, so chances are you will have fractions to
  work with.
- Lots of square roots are involved, both for the lengths of the sides of the triangle and for
  the area of the triangle.
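The Heron's Formula calculation above can be sketched in a few lines. The function names `side` and `heron_area` are illustrative choices; the vertices are the example triangle (-2, 2), (1, 5), (6, -1) used in this section.

```python
import math

def side(p, q):
    """Length of the segment between points p and q (distance formula)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def heron_area(p1, p2, p3):
    """Area of a triangle from its side lengths via Heron's Formula."""
    a, b, c = side(p1, p2), side(p2, p3), side(p3, p1)
    s = (a + b + c) / 2  # semi-perimeter
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

area = heron_area((-2, 2), (1, 5), (6, -1))
```

Because the side lengths are irrational, the result is a floating-point approximation of 33/2 rather than an exact value.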
Geometric Technique
The area of the triangle we desire will be the area of the rectangle minus the areas of the three
triangles.
The legs of the three triangles can be found by simple subtraction of coordinates and then
used to find the area since the area of a triangle is one-half the base times the height.
The sum of the areas of the triangles is 9/2 + 15 + 12 = 63 / 2 or 31.5.
The area of a rectangle is base times height, so the bounding rectangle has area = 8 ( 6 ) = 48.
The area of the triangle in the middle is the difference between the rectangle and the sum of
the areas of the three outer triangles: 48 - 31.5 = 16.5.
Determinants
It turns out that the area of a triangle can also be found using determinants. The derivation of
the formula is kind of long and most of you don't care to see it, so it's on a separate page.
What you do is form a 3×3 determinant where the first column contains the x's for all the points,
the second column contains the y's for all the points, and the last column is all ones.
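           x    y    1
point 1   -2    2    1
point 2    1    5    1
point 3    6   -1    1

$$
\begin{vmatrix} -2 & 2 & 1 \\ 1 & 5 & 1 \\ 6 & -1 & 1 \end{vmatrix}
= (-2) \begin{vmatrix} 5 & 1 \\ -1 & 1 \end{vmatrix}
- 1 \begin{vmatrix} 2 & 1 \\ -1 & 1 \end{vmatrix}
+ 6 \begin{vmatrix} 2 & 1 \\ 5 & 1 \end{vmatrix}
$$
= -2 ( 5 + 1 ) - 1 ( 2 + 1 ) + 6 ( 2 - 5 ) = -2 ( 6 ) - 1 ( 3 ) + 6 ( -3 ) = -12 - 3 - 18 = -33.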
It is possible that you will get a negative determinant, like we did here. Don't worry about
that. The sign is determined by the order you put the points in and can be easily changed just
by switching two rows of the determinant. Area, on the other hand, can't be negative, so if
you get a negative, just drop the sign and make it positive. Finally, divide it by 2 to find the
area.
| -33 | = 33
33 / 2 = 16.5, which was the area.
$$
\text{Area} = \pm \frac{1}{2} \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}
$$
The plus/minus in this case is meant to take whichever sign is needed so the answer is
positive (non-negative). Do not say the area is both positive and negative.
Why not use absolute value, you ask? Well, think how confusing it would be to have the
absolute value of a determinant.
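As a quick check of the determinant method, here is a small sketch that computes the 3×3 determinant for the example points and halves its magnitude. The helper names `det3` and `triangle_area` are hypothetical, and the code uses `abs` where the text writes plus/minus.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def triangle_area(p1, p2, p3):
    """Half the magnitude of the determinant with columns x, y, 1."""
    d = det3([[p1[0], p1[1], 1],
              [p2[0], p2[1], 1],
              [p3[0], p3[1], 1]])
    return abs(d) / 2

d = det3([[-2, 2, 1], [1, 5, 1], [6, -1, 1]])        # signed value
d_swapped = det3([[1, 5, 1], [-2, 2, 1], [6, -1, 1]])  # two rows exchanged
area = triangle_area((-2, 2), (1, 5), (6, -1))
```

Expanding along the first row instead of the first column, as the worked example does, gives the same value; swapping two rows only flips the sign.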
Continue with the idea of finding the area of a triangle. If the area of a triangle were equal to
zero, then there would be no triangle; the points would be collinear (on a line).
We can drop the plus/minus and the one-half from the front of the determinant, because the
area will be zero only if the determinant is zero.
Three points are collinear if and only if the determinant found by placing the x-coordinates in
the first column, the y-coordinates in the second column, and ones in the third column is
equal to zero.
$$
\text{Does } \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} \text{ equal } 0?
$$
You are not setting the determinant equal to zero, you are testing to see if the determinant is
zero.
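The collinearity test can be sketched as a simple predicate. The names `det3` and `are_collinear` are illustrative, and the exact comparison with zero assumes integer (or otherwise exact) coordinates; with floating-point input a small tolerance would be needed instead.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def are_collinear(p1, p2, p3):
    """Test (not set) whether the determinant with columns x, y, 1 is zero."""
    return det3([[p1[0], p1[1], 1],
                 [p2[0], p2[1], 1],
                 [p3[0], p3[1], 1]]) == 0

on_a_line = are_collinear((0, 0), (1, 2), (2, 4))         # points on y = 2x
example_triangle = are_collinear((-2, 2), (1, 5), (6, -1))  # the triangle above
```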
Equation of a Line
You can force three points to be collinear by setting the determinant equal to zero.
Notice this time, that the actual variables x and y are in the determinant. That is because you
have two points on a line given, and the point (x,y) is a generic point on the line.
When you expand this, I strongly recommend that you expand along the first row. That way,
your multiplications to find the determinants won't involve x or y.
$$
\begin{vmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{vmatrix} = 0
$$
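Expanding that determinant along its first row gives the line in the form Ax + By + C = 0, where each coefficient is a 2×2 determinant of known numbers. The sketch below does this for two of the example points; the function name `line_through` and the choice of points are assumptions for illustration.

```python
def line_through(p1, p2):
    """Return (A, B, C) such that A*x + B*y + C = 0 passes through p1 and p2.

    The coefficients come from expanding the determinant along its first row.
    """
    (x1, y1), (x2, y2) = p1, p2
    A = y1 - y2              # cofactor of x:  |y1 1; y2 1|
    B = -(x1 - x2)           # cofactor of y: -|x1 1; x2 1|
    C = x1 * y2 - x2 * y1    # cofactor of 1:  |x1 y1; x2 y2|
    return A, B, C

A, B, C = line_through((-2, 2), (1, 5))
# Here the result is -3x + 3y - 12 = 0, which simplifies to y = x + 4.
```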
Cramer's Rule
The derivation of Cramer's Rule can be found on another page. Here are the results.
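Consider the system
3x + 5y = 12
2x - 4y = 9
Let D be the determinant of the coefficient matrix, whose columns hold the x and y coefficients.
$$
D = \begin{vmatrix} 3 & 5 \\ 2 & -4 \end{vmatrix} = -22
$$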
Let Dx be the determinant of the coefficient matrix where the x column has been replaced by
the constants from the right hand side.
$$
D_x = \begin{vmatrix} 12 & 5 \\ 9 & -4 \end{vmatrix} = -93
$$
Let Dy be the determinant of the coefficient matrix where the y column has been replaced by
the constants from the right hand side.
$$
D_y = \begin{vmatrix} 3 & 12 \\ 2 & 9 \end{vmatrix} = 3
$$
In this case, D = -22 is not zero, so Cramer's Rule will work and
x = Dx / D = -93 / -22 = 93/22
y = Dy / D = 3 / -22 = -3/22
Cramer's rule is named after Gabriel Cramer who lived from 1704 - 1752. He is not the one
who developed the technique originally, however, as the Chinese were known to have used
the method before him.
Cramer's Rule can be extended to larger systems in the same manner. To find a variable, replace that variable's
column in the coefficient matrix by the right hand side, and then divide that determinant by
the determinant of the coefficient matrix.
For a 3×3 system, let Dz be the determinant of the coefficient matrix where the z column
has been replaced by the constants from the right hand side.
- If you need just one variable, you can find it with Cramer's Rule, whereas with the
  other techniques you would have to find all of the variables.
- It takes a long time to re-enter each matrix into the calculator to find the determinant.
- It is best suited for a computer or calculator program.
- It doesn't always work. If the determinant of the coefficient matrix is zero, the
  technique fails and there is either no solution or many solutions.
- If D=0 and Dx=0 and Dy=0 then there are many solutions. You won't be able to tell
  what the solutions are from Cramer's Rule, but there are many solutions. This is the
  dependent case.
- If D=0, but at least one of the other determinants is not zero, then there is no solution.
  This is the inconsistent case.
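Since the method is described as best suited to a program, here is a minimal sketch for the 2×2 case. The names `det2` and `cramer_2x2` and the `(status, solution)` return shape are assumptions of this sketch; exact fractions are used so the answers match the hand calculation. The D = 0 branch simply encodes the rule stated above and assumes the coefficient matrix is not entirely zero.

```python
from fractions import Fraction

def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def cramer_2x2(coeffs, rhs):
    """Solve a 2x2 integer system with Cramer's Rule; returns (status, solution)."""
    d = det2(coeffs)
    dx = det2([[rhs[0], coeffs[0][1]], [rhs[1], coeffs[1][1]]])  # x column replaced
    dy = det2([[coeffs[0][0], rhs[0]], [coeffs[1][0], rhs[1]]])  # y column replaced
    if d != 0:
        return "unique", (Fraction(dx, d), Fraction(dy, d))
    # D = 0: classify using the rule from the text.
    if dx == 0 and dy == 0:
        return "dependent", None
    return "inconsistent", None

status, (x, y) = cramer_2x2([[3, 5], [2, -4]], [12, 9])
```

For the system above this yields x = 93/22 and y = -3/22.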
Cryptography
Cryptography involves encrypting data so that a third party cannot intercept and read the
data.
In the early days of satellite television, the video signals weren't encrypted and anyone with a
satellite dish could watch whatever was being shown. This didn't work for the
networks using satellites: satellite dish owners could receive their
satellite feed for no cost while cable subscribers had to pay for the channel, so the networks were losing
money. So, they started encrypting the video signal with a system called Videocipher (later
replaced by Videocipher II).
What the Videocipher encryption system did was to convert the signal into digital form,
encrypt it, and send the data over the satellite. If the satellite dish owner had a Videocipher
box, and paid for the channel, then the box would descramble (unencrypt) the signal and
return it to its original, useful form.
This was done by using a key that was invertible. It was very important that the key be
invertible, or there would be no way to return the encrypted data to its original form.
Encryption Process
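1. Convert the text of the message into a stream of numerical values.
2. Place the data into a matrix.
3. Multiply the data by the encoding matrix.
4. Convert the matrix into a stream of numerical values that contains the encrypted
message.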
Example
Consider the message "Red Rum"
A message is converted into numeric form according to some scheme. The easiest scheme is
to let space=0, A=1, B=2, ..., Y=25, and Z=26. For example, the message "Red Rum" would
become 18, 5, 4, 0, 18, 21, 13.
This data is placed into matrix form. The size of the matrix depends on the size of the
encryption key. Let's say that our encryption matrix (encoding matrix) is a 2x2 matrix. Since I
have seven pieces of data, I would place them into a 4x2 matrix and fill the last spot with a
space to make the matrix complete. Let's call the original, unencrypted data matrix A.
$$
A = \begin{bmatrix} 18 & 5 \\ 4 & 0 \\ 18 & 21 \\ 13 & 0 \end{bmatrix}
$$
There is an invertible matrix which is called the encryption matrix or the encoding matrix.
We'll call it matrix B. Since this matrix needs to be invertible, it must be square.
This could really be anything; it's up to the person encrypting the message. I'll use this matrix.
$$
B = \begin{bmatrix} 4 & -2 \\ -1 & 3 \end{bmatrix}
$$
The unencrypted data is then multiplied by our encoding matrix. The result of this
multiplication is the matrix containing the encrypted data. We'll call it matrix X.
$$
X = AB = \begin{bmatrix} 67 & -21 \\ 16 & -8 \\ 51 & 27 \\ 52 & -26 \end{bmatrix}
$$
The message that you would pass on to the other person is the stream of numbers 67, -21,
16, -8, 51, 27, 52, -26.
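The encryption example can be reproduced with a short sketch. The helper names (`text_to_numbers`, `to_rows`, `matmul`) are hypothetical; the numbering scheme (space = 0, A = 1, ..., Z = 26) and the encoding matrix are the ones used in this example.

```python
def text_to_numbers(msg):
    """Map space to 0 and letters A..Z to 1..26 (case-insensitive)."""
    return [0 if ch == " " else ord(ch) - ord("A") + 1 for ch in msg.upper()]

def to_rows(values, width):
    """Split values into rows of the given width, padding with 0 (space)."""
    padded = values + [0] * (-len(values) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]

def matmul(a, b):
    """Product of matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

B = [[4, -2], [-1, 3]]                      # encoding matrix from the example
A = to_rows(text_to_numbers("Red Rum"), 2)  # unencrypted data matrix
X = matmul(A, B)                            # encrypted data matrix
stream = [v for row in X for v in row]      # numbers to transmit
```

This sketch only handles letters and spaces; any other character would need its own entry in the numbering scheme.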
Decryption Process
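1. Place the encrypted stream of numbers that represents an encrypted message into a
matrix.
2. Multiply by the decoding matrix. The decoding matrix is the inverse of the encoding
matrix.
3. Convert the matrix into a stream of numbers.
4. Convert the numbers into the text of the original message.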
Example
The message you need to decipher is in the encrypted data stream 67, -21, 16, -8, 51, 27, 52,
-26.
The encryption matrix is not transmitted. It is known by the receiving party so that they can
decrypt the message. Other times, the inverse is known by the receiving party. The encryption
matrix cannot be sent with the data; otherwise anyone could grab the data and decode the
information. Also, without the decoding matrix, someone intercepting the message
doesn't know what size of matrix to use.
The receiving end gets the encrypted message and places it into matrix form.
$$
X = \begin{bmatrix} 67 & -21 \\ 16 & -8 \\ 51 & 27 \\ 52 & -26 \end{bmatrix}
$$
The receiver must calculate the inverse of the encryption matrix. This would be the
decryption matrix or the decoding matrix.
$$
B^{-1} = \begin{bmatrix} 0.3 & 0.2 \\ 0.1 & 0.4 \end{bmatrix}
$$
The receiver then multiplies the encrypted data by the inverse of the encryption matrix. The
result is the original unencrypted matrix.
$$
A = X B^{-1} = \begin{bmatrix} 18 & 5 \\ 4 & 0 \\ 18 & 21 \\ 13 & 0 \end{bmatrix}
$$
The receiver then takes the matrix and breaks it apart into values 18, 5, 4, 0, 18, 21, 13, 0 and
converts each of those into a character according to the numbering scheme. 18=R, 5=E, 4=D,
0=space, 18=R, 21=U, 13=M, 0=space.
Trailing spaces will be discarded and the message is received as intended: "RED RUM"
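The decryption steps can be sketched in the same way. The helper names are again hypothetical, and exact fractions are used for the inverse so that no rounding creeps into the recovered numbers.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of a 2x2 integer matrix, using exact fractions."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(m[1][1], det), Fraction(-m[0][1], det)],
            [Fraction(-m[1][0], det), Fraction(m[0][0], det)]]

def matmul(a, b):
    """Product of matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def numbers_to_text(values):
    """Map 0 to a space and 1..26 to A..Z."""
    return "".join(" " if v == 0 else chr(ord("A") + int(v) - 1) for v in values)

B = [[4, -2], [-1, 3]]                                   # encoding matrix
stream = [67, -21, 16, -8, 51, 27, 52, -26]              # received numbers
X = [stream[i:i + 2] for i in range(0, len(stream), 2)]  # back into 4x2 form
B_inv = inverse_2x2(B)                                   # decoding matrix
A = matmul(X, B_inv)                                     # original data matrix
message = numbers_to_text(v for row in A for v in row).rstrip()
```

The final `rstrip()` discards the padding space that was added to complete the matrix.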
Applications of Matrix Mathematics
We see the results of matrix mathematics in every computer-generated image that has a
reflection, or distortion effects such as light passing through rippling water.
Before computer graphics, the science of optics used matrix mathematics to account for
reflection and for refraction.
Matrix arithmetic helps us calculate the electrical properties of a circuit, with voltage,
amperage, resistance, etc.
The field of probability and statistics may use matrix representations. A probability vector
lists the probabilities of different outcomes of one trial. A stochastic matrix is a square matrix
whose rows are probability vectors. Computers run Markov simulations based on stochastic
matrices in order to model events ranging from gambling through weather forecasting to
quantum mechanics.
Matrix mathematics simplifies linear algebra, at least in providing a more compact way to
deal with groups of equations in linear algebra.
An example of a square matrix with variables, rather than numbers, is . This is a square
matrix because the number of rows equals the number of columns.
We can only add matrices of the same dimensions, because we add the corresponding
elements.
Matrix multiplication is another matter entirely. Let's multiply matrices: MP = R. M is an m×n
matrix; P is n×p; and the result R will have dimension m×p. Note that the number of
columns of the left-hand matrix, M, must equal the number of rows of the right-hand matrix,
P. For example:
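A small sketch of the dimension rule, with illustrative matrices chosen for this example (a 2×3 matrix times a 3×2 matrix gives a 2×2 result). The function name `matmul` is an assumption of the sketch.

```python
def matmul(M, P):
    """Multiply an m x n matrix M by an n x p matrix P, giving an m x p result."""
    m, n = len(M), len(M[0])
    if len(P) != n:
        raise ValueError("columns of M must equal rows of P")
    p = len(P[0])
    return [[sum(M[i][k] * P[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

M = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
P = [[1, 0],
     [0, 1],
     [2, 2]]           # 3 x 2
R = matmul(M, P)       # 2 x 2
```

Calling `matmul(M, M)` raises `ValueError`, because M has three columns but only two rows.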
Graphic Uses of Matrix Mathematics
Graphic software uses matrix mathematics to process linear transformations to render images.
A square matrix, one with exactly as many rows as columns, can represent a linear
transformation of a geometric object. For example, in the Cartesian X-Y plane, the matrix
reflects an object in the vertical Y axis. In a video game, this would render the upside-
down mirror image of a castle reflected in a lake.
If the video game has curved reflecting surfaces, such as a shiny silver goblet, the linear
transformation matrix would be more complicated, to stretch or shrink the reflection.
The Identity matrix is an n×n square matrix with ones on the diagonal and zeroes elsewhere.
It causes absolutely no change as a linear transformation, much like multiplying an ordinary
number by one.
Suppose we have two square n×n matrices, A and B, such that AB = I_n. Then we call B the
inverse matrix of A, and write it as A^-1. The first practical point is that the inverse matrix A^-1
reverses the changes made by the original linear transformation matrix A.
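To make the ideas of a reflection, the identity, and an inverse concrete, the sketch below uses a matrix that negates the y coordinate, which produces the upside-down "lake" image. That particular matrix is an assumption of this sketch (the matrix in the original text was not preserved), as are the names `apply`, `FLIP_Y` and the sample `castle` outline.

```python
def apply(matrix, point):
    """Apply a 2x2 matrix to a point treated as a column vector."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)

IDENTITY = [[1, 0], [0, 1]]
FLIP_Y = [[1, 0], [0, -1]]   # assumed reflection: maps (x, y) to (x, -y)

castle = [(0, 0), (2, 0), (2, 3), (0, 3)]           # hypothetical outline
reflected = [apply(FLIP_Y, p) for p in castle]      # upside-down mirror image
restored = [apply(FLIP_Y, p) for p in reflected]    # reflecting again undoes it
unchanged = [apply(IDENTITY, p) for p in castle]    # identity changes nothing
```

Because applying `FLIP_Y` twice restores every point, this reflection is its own inverse.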
The Determinant
Another important task in matrix arithmetic is to calculate the determinant of a 2×2 square
matrix.
On the other hand, if we apply M as the linear transformation of a unit square U into UM, then
the absolute value of the determinant |M| is the area of that transformed square. In a sense, the determinant measures
how much a square matrix scales areas.
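The link between the determinant and area can be checked numerically. In this sketch the matrix `M` is an arbitrary illustrative choice, points are treated as row vectors multiplied on the right by M (matching the notation UM), and the area of the image is computed independently with the shoelace formula.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def transform(point, m):
    """Row vector times matrix: (x, y) M."""
    x, y = point
    return (x * m[0][0] + y * m[1][0], x * m[0][1] + y * m[1][1])

def polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2

M = [[3, 1], [1, 2]]                             # illustrative matrix, det = 5
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # area 1
image = [transform(p, M) for p in unit_square]   # a parallelogram
area = polygon_area(image)
```

The image of the unit square is a parallelogram whose area equals the absolute value of the determinant.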
Daily Matrix Applications
However, few of us are likely to consciously apply matrix mathematics in our day to day
lives.
http://www.slideshare.net/moneebakhtar50/application-of-matrices-in-real-life
https://en.wikipedia.org/wiki/Matrix_%28mathematics%29
http://www.decodedscience.com/practical-uses-matrix-mathematics/40494