It is now well established that all matter consists of elementary particles¹ that interact through mutual attraction or repulsion. In our everyday life, however, we do not see the elemental nature of matter; we see continuous regions of material such as the air in a room, the water in a glass or the wood of a desk. Even in a gas, the number of individual molecules in macroscopic volumes is enormous, $\approx 10^{19}\,\mathrm{cm}^{-3}$. Hence, when describing the behaviour of matter on scales of microns or above it is simply not practical to solve equations for each individual molecule. Instead, theoretical models are derived using average quantities that represent the state of the material.
In general, these average quantities will vary with position in space, but an important concept to
bear in mind is that the fundamental behaviour of matter should not depend on the particular coor-
dinate system chosen to represent the position. The consequences of this almost trivial observation
are far-reaching and we shall find that it dictates the form of many of our governing equations. We
shall always consider our space to be three-dimensional and Euclidean² and we describe position in
space by a position vector, r, which runs from a specific point, the origin, to our chosen location.
The exact coordinate system chosen will depend on the problem under consideration; ideally it
should make the problem as “easy” as possible.
1.1 Vectors
A vector is a geometric quantity that has a “magnitude and direction”. A more mathematically
precise, but less intuitive, definition is that a vector is an element of a vector space. Many physical
quantities are naturally described in terms of vectors, e.g. position, velocity and acceleration, force.
The invariance of material behaviour under changes in coordinates means that if a vector represents
a physical quantity then it must not vary if we change our coordinate system. Imagine drawing a line that connects two points in a two-dimensional (Euclidean) plane; that line remains unchanged
whether we describe it as “x units across and y units up from the origin” or “r units from the origin
in the θ direction”. Thus, a vector is an object that exists independent of any coordinate system,
but if we wish to describe it we must choose a specific coordinate system and its representation in
that coordinate system (its components) will depend on the specific coordinates chosen.
¹The exact nature of the most basic unit is, of course, still debated, but the fundamental discrete nature of matter is not.
²We won't worry about relativistic effects at all.
1.1.1 Cartesian components and summation convention
The fact that we have a Euclidean space means that we can always choose a Cartesian coordinate
system with fixed orthonormal base vectors, e1 = i, e2 = j and e3 = k. For a compact notation, it
is much more convenient to use the numbered subscripts rather than different symbols to distinguish
the base vectors. Any vector quantity a can be written as a sum of its components in the direction
of the base vectors
$$\mathbf{a} = a^1\,\mathbf{e}_1 + a^2\,\mathbf{e}_2 + a^3\,\mathbf{e}_3; \tag{1.1}$$
and the vector $\mathbf{a}$ can be represented via its components $(a^1, a^2, a^3)$; and so, $\mathbf{e}_1 = (1,0,0)$, $\mathbf{e}_2 = (0,1,0)$ and $\mathbf{e}_3 = (0,0,1)$. We will often represent the components of vectors using an index, i.e. $(a^1, a^2, a^3)$ is equivalent to $a^I$, where $I \in \{1,2,3\}$. In addition, we use the Einstein summation
convention in which any index that appears twice represents a sum over all values of that index
$$\mathbf{a} = \sum_{J=1}^{3} a^J\,\mathbf{e}_J = a^J\,\mathbf{e}_J. \tag{1.2}$$
Note that we can change the (dummy) summation index without affecting the result
$$\sum_{J=1}^{3} a^J\,\mathbf{e}_J = a^J\,\mathbf{e}_J = \sum_{K=1}^{3} a^K\,\mathbf{e}_K = a^K\,\mathbf{e}_K.$$
The summation is ambiguous if an index appears more than twice and such terms are not allowed.
For clarity later, an upper case index is used for objects in a Cartesian (or, in fact, any orthonormal)
coordinate system and, in general, we will insist that summation can only occur over a raised index
and a lowered index for reasons that will hopefully become clear shortly.
It is important to recognise that the components of a vector, $a^I$, do not actually make sense unless we know the base vectors as well. In isolation the components give you distances but not direction, which is only half the story.
In a general coordinate system with coordinates $\xi^i$, the position vector can be written as
$$\mathbf{r} = x^K(\xi^i)\,\mathbf{e}_K, \tag{1.4}$$
and derivatives of $\mathbf{r}$ with respect to the coordinates involve only derivatives of the functions $x^K$ because the Cartesian base vectors are fixed. Here the notation $x^K(\xi^i)$ means that the Cartesian coordinates can be written as functions of the general coordinates, e.g. in plane polars $x(r,\theta) = r\cos\theta$, $y(r,\theta) = r\sin\theta$, see Example 1.1. Note that equation (1.4) is the first time in which we use upper and lower case indices to distinguish between the Cartesian and general coordinate systems.
A tangent vector in the $\xi^1$ direction, $\mathbf{t}_1$, is the difference between two position vectors associated with a small (infinitesimal) change in the $\xi^1$ coordinate,
$$\mathbf{t}_1 = \mathbf{r}(\xi^1 + d\xi^1, \xi^2, \xi^3) - \mathbf{r}(\xi^i), \tag{1.5}$$
where $d\xi^1$ represents the small change in the $\xi^1$ coordinate direction, see Figure 1.1.
Figure 1.1: Sketch illustrating the tangent vector $\mathbf{t}_1(\xi^i)$ corresponding to a small change $d\xi^1$ in the coordinate $\xi^1$. The tangent lies along a line of constant $\xi^2$ in two dimensions, or a plane of constant $\xi^2$ and $\xi^3$ in three dimensions.
Assuming that r is differentiable and Taylor expanding the first term in (1.5) demonstrates that
$$\mathbf{t}_1 = \mathbf{r}(\xi^i) + \frac{\partial \mathbf{r}}{\partial \xi^1}\,d\xi^1 - \mathbf{r}(\xi^i) + O\!\left((d\xi^1)^2\right),$$
which yields
$$\mathbf{t}_1 = \frac{\partial \mathbf{r}}{\partial \xi^1}\,d\xi^1,$$
if we neglect the (small) quadratic and higher-order terms. Note that exactly the same argument
can be applied to increments in the ξ 2 and ξ 3 directions and because dξ i are scalar lengths, it follows
that
$$\mathbf{g}_i = \frac{\partial \mathbf{r}}{\partial \xi^i}$$
is also a tangent vector in the ξ i direction, as claimed. Hence, using equation (1.4), we can compute
tangent vectors in the general coordinate directions via
$$\mathbf{g}_i = \frac{\partial \mathbf{r}}{\partial \xi^i} = \frac{\partial x^K}{\partial \xi^i}\,\mathbf{e}_K. \tag{1.6}$$
We can interpret equation (1.6) as defining a local linear transformation between the Cartesian
base vectors and our new tangent vectors g i . The transformation is linear because g i is a linear
combination of the vectors eK , which should not be a surprise because we explicitly neglected the
quadratic and higher terms in the Taylor expansion. The transformation is local because, in general,
the coefficients will change with position. The coefficients of the transformation can be written as
entries in a matrix, M, in which case equation (1.6) becomes
$$\begin{pmatrix} \mathbf{g}_1 \\ \mathbf{g}_2 \\ \mathbf{g}_3 \end{pmatrix} = \overbrace{\begin{pmatrix} \dfrac{\partial x^1}{\partial \xi^1} & \dfrac{\partial x^2}{\partial \xi^1} & \dfrac{\partial x^3}{\partial \xi^1} \\ \dfrac{\partial x^1}{\partial \xi^2} & \dfrac{\partial x^2}{\partial \xi^2} & \dfrac{\partial x^3}{\partial \xi^2} \\ \dfrac{\partial x^1}{\partial \xi^3} & \dfrac{\partial x^2}{\partial \xi^3} & \dfrac{\partial x^3}{\partial \xi^3} \end{pmatrix}}^{\mathsf{M}} \begin{pmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \\ \mathbf{e}_3 \end{pmatrix}. \tag{1.7}$$
Provided that the transformation is non-singular (the determinant of the matrix M is non-zero)
the tangent vectors will also be a basis of the space and they are called covariant base vectors
because the transformation preserves the tangency of the vectors to the coordinates. In general,
the covariant base vectors are neither orthogonal nor of unit length. It is also important to note
that the covariant base vectors will usually be functions of position.
Example 1.1. Finding the covariant base vectors for plane polar coordinates
A plane polar coordinate system is defined by the two coordinates $\xi^1 = r$, $\xi^2 = \theta$ such that $x = x^1 = r\cos\theta$ and $y = x^2 = r\sin\theta$.
Find the covariant base vectors.
Solution 1.1. The position vector is given by
$$\mathbf{r} = x^1\,\mathbf{e}_1 + x^2\,\mathbf{e}_2 = r\cos\theta\,\mathbf{e}_1 + r\sin\theta\,\mathbf{e}_2 = \xi^1\cos\xi^2\,\mathbf{e}_1 + \xi^1\sin\xi^2\,\mathbf{e}_2,$$
and using the definition (1.6) gives
$$\mathbf{g}_1 = \frac{\partial \mathbf{r}}{\partial \xi^1} = \cos\xi^2\,\mathbf{e}_1 + \sin\xi^2\,\mathbf{e}_2, \quad\text{and}\quad \mathbf{g}_2 = \frac{\partial \mathbf{r}}{\partial \xi^2} = -\xi^1\sin\xi^2\,\mathbf{e}_1 + \xi^1\cos\xi^2\,\mathbf{e}_2.$$
Note that $\mathbf{g}_1$ is a unit vector,
$$|\mathbf{g}_1| = \sqrt{\mathbf{g}_1\cdot\mathbf{g}_1} = \sqrt{\cos^2\xi^2 + \sin^2\xi^2} = 1,$$
but $\mathbf{g}_2$ is not: $|\mathbf{g}_2| = \xi^1$. The vectors are orthogonal, $\mathbf{g}_1\cdot\mathbf{g}_2 = 0$, and are related to the standard orthonormal polar base vectors via $\mathbf{g}_1 = \mathbf{e}_r$ and $\mathbf{g}_2 = r\,\mathbf{e}_\theta$.
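The computation in Example 1.1 is easy to check symbolically. The following is a minimal sketch, assuming SymPy is available; the symbol names are illustrative.

```python
# A quick symbolic check of Example 1.1 (a sketch using SymPy).
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
# Position vector in Cartesian components, x^K(xi^i), for plane polars
x = sp.Matrix([r*sp.cos(theta), r*sp.sin(theta)])
# Covariant base vectors g_i = dr/dxi^i, equation (1.6)
g1 = x.diff(r)
g2 = x.diff(theta)
print(g1.T, g2.T)                 # [cos(theta), sin(theta)], [-r*sin(theta), r*cos(theta)]
print(sp.simplify(g1.dot(g1)))    # 1      -> |g_1| = 1
print(sp.simplify(g2.dot(g2)))    # r**2   -> |g_2| = r
print(sp.simplify(g1.dot(g2)))    # 0      -> the base vectors are orthogonal
```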
A second, contravariant, set of base vectors $\mathbf{g}^i$ is defined by the requirement that
$$\mathbf{g}^i\cdot\mathbf{g}_j = \delta^i_j, \tag{1.10}$$
where the object $\delta^i_j$ is known as the Kronecker delta. In orthonormal coordinate systems the two sets of base vectors coincide; for example, in our global Cartesian coordinates $\mathbf{e}^I \equiv \mathbf{e}_I$.
We can decompose $\mathbf{g}^i$ into its components in the Cartesian basis, $\mathbf{g}^i = g^i_K\,\mathbf{e}^K$, where we have used the raised index on the base vectors for consistency with our summation convention. Note that $g^i_K$ is thus defined to be the $K$-th Cartesian component of the $i$-th contravariant base vector.
From the definition (1.10) and (1.6)
$$\mathbf{g}^i\cdot\mathbf{g}_j = g^i_K\,\mathbf{e}^K\cdot\frac{\partial x^L}{\partial \xi^j}\,\mathbf{e}_L = g^i_K\,\frac{\partial x^L}{\partial \xi^j}\,\mathbf{e}^K\cdot\mathbf{e}_L = g^i_K\,\frac{\partial x^L}{\partial \xi^j}\,\delta^K_L = g^i_L\,\frac{\partial x^L}{\partial \xi^j} = \delta^i_j. \tag{1.11}$$
Note that we have used the "index-switching" property of the Kronecker delta to write $g^i_K\,\delta^K_L = g^i_L$, which can be verified by writing out all terms explicitly.
Multiplying both sides of equation (1.11) by $\partial \xi^j/\partial x^K$ yields
$$g^i_L\,\frac{\partial x^L}{\partial \xi^j}\,\frac{\partial \xi^j}{\partial x^K} = \delta^i_j\,\frac{\partial \xi^j}{\partial x^K} = \frac{\partial \xi^i}{\partial x^K};$$
and from the chain rule
$$\frac{\partial x^L}{\partial \xi^j}\,\frac{\partial \xi^j}{\partial x^K} = \frac{\partial x^L}{\partial x^K} = \delta^L_K,$$
because the Cartesian coordinates are independent. Hence,
$$g^i_L\,\delta^L_K = g^i_K = \frac{\partial \xi^i}{\partial x^K},$$
and so the new set of base vectors is
$$\mathbf{g}^i = \frac{\partial \xi^i}{\partial x^K}\,\mathbf{e}^K. \tag{1.12}$$
The equation (1.12) defines a local linear transformation between the Cartesian base vectors and
the vectors g i . In a matrix representation, equation (1.12) is
$$\begin{pmatrix} \mathbf{g}^1 \\ \mathbf{g}^2 \\ \mathbf{g}^3 \end{pmatrix} = \overbrace{\begin{pmatrix} \dfrac{\partial \xi^1}{\partial x^1} & \dfrac{\partial \xi^1}{\partial x^2} & \dfrac{\partial \xi^1}{\partial x^3} \\ \dfrac{\partial \xi^2}{\partial x^1} & \dfrac{\partial \xi^2}{\partial x^2} & \dfrac{\partial \xi^2}{\partial x^3} \\ \dfrac{\partial \xi^3}{\partial x^1} & \dfrac{\partial \xi^3}{\partial x^2} & \dfrac{\partial \xi^3}{\partial x^3} \end{pmatrix}}^{\mathsf{M}^{-T}} \begin{pmatrix} \mathbf{e}^1 \\ \mathbf{e}^2 \\ \mathbf{e}^3 \end{pmatrix}, \tag{1.13}$$
and we see that the new transformation is the inverse transpose⁵ of the linear transformation that
defines the covariant base vectors (1.6). For this reason, the vectors g i are called contravariant base
vectors.
Example 1.2. Finding the contravariant base vectors for plane polar coordinates
For the plane polar coordinate system defined in Example 1.1, find the contravariant base vectors.
Solution 1.2. The contravariant base vectors are defined by equation (1.12) and in order to use that
equation directly, we must express our polar coordinates as functions of the Cartesian coordinates
$$r = \xi^1 = \sqrt{x^1 x^1 + x^2 x^2}, \quad\text{and}\quad \tan\theta = \tan\xi^2 = \frac{x^2}{x^1},$$
and then we can compute
$$\frac{\partial \xi^1}{\partial x^1} = \cos\xi^2, \quad \frac{\partial \xi^1}{\partial x^2} = \sin\xi^2, \quad \frac{\partial \xi^2}{\partial x^1} = -\frac{\sin\xi^2}{\xi^1} \quad\text{and}\quad \frac{\partial \xi^2}{\partial x^2} = \frac{\cos\xi^2}{\xi^1}.$$
Thus, using the transformation (1.13), we have
$$\mathbf{g}^1 = \frac{\partial \xi^1}{\partial x^1}\,\mathbf{e}^1 + \frac{\partial \xi^1}{\partial x^2}\,\mathbf{e}^2 = \cos\xi^2\,\mathbf{e}_1 + \sin\xi^2\,\mathbf{e}_2 = \mathbf{g}_1,$$
where we have used the fact that $\mathbf{e}^I = \mathbf{e}_I$, and also
$$\mathbf{g}^2 = \frac{\partial \xi^2}{\partial x^1}\,\mathbf{e}^1 + \frac{\partial \xi^2}{\partial x^2}\,\mathbf{e}^2 = -\frac{\sin\xi^2}{\xi^1}\,\mathbf{e}_1 + \frac{\cos\xi^2}{\xi^1}\,\mathbf{e}_2 = \frac{1}{(\xi^1)^2}\,\mathbf{g}_2.$$
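The duality (1.10) between the two sets of base vectors in Examples 1.1 and 1.2 can also be verified mechanically; a sketch assuming SymPy:

```python
# Verify g^i . g_j = delta^i_j for the plane polar base vectors (SymPy assumed).
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
x = sp.Matrix([r*sp.cos(theta), r*sp.sin(theta)])   # x^K(xi^i)
J = x.jacobian([r, theta])          # columns are the covariant g_1, g_2
G_cov = [J[:, 0], J[:, 1]]
Jinv = J.inv()                      # rows are the contravariant g^1, g^2
G_con = [Jinv[0, :].T, Jinv[1, :].T]
for i in range(2):
    for j in range(2):
        print(i + 1, j + 1, sp.simplify(G_con[i].dot(G_cov[j])))  # delta^i_j
```

The design point here is exactly the inverse-transpose relationship (1.13): the rows of the inverse Jacobian are the contravariant base vectors.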
Similarly, components of the vector $\mathbf{a}$ in the contravariant basis are given by taking the dot product with the appropriate covariant base vectors
$$\mathbf{a} = a_k\,\mathbf{g}^k, \quad\text{where}\quad a_i = \mathbf{a}\cdot\mathbf{g}_i = a_k\,\mathbf{g}^k\cdot\mathbf{g}_i = a_k\,\delta^k_i = a_i. \tag{1.15}$$
⁵That the inverse matrix is given by
$$\mathsf{M}^{-1} = \begin{pmatrix} \dfrac{\partial \xi^1}{\partial x^1} & \dfrac{\partial \xi^2}{\partial x^1} & \dfrac{\partial \xi^3}{\partial x^1} \\ \dfrac{\partial \xi^1}{\partial x^2} & \dfrac{\partial \xi^2}{\partial x^2} & \dfrac{\partial \xi^3}{\partial x^2} \\ \dfrac{\partial \xi^1}{\partial x^3} & \dfrac{\partial \xi^2}{\partial x^3} & \dfrac{\partial \xi^3}{\partial x^3} \end{pmatrix},$$
can be confirmed by checking that $\mathsf{M}\mathsf{M}^{-1} = \mathsf{M}^{-1}\mathsf{M} = \mathsf{I}$, the identity matrix. Alternatively, the relationship follows directly from equation (1.11) written in matrix form.
In fact, we can obtain the components of a general vector in either the covariant or contravariant
basis directly from the Cartesian coordinates. If $\mathbf{a} = a^K\,\mathbf{e}_K = a_K\,\mathbf{e}^K$, then the components in the covariant basis associated with the curvilinear coordinates $\xi^i$ are
$$a^i = \mathbf{a}\cdot\mathbf{g}^i = a^K\,\mathbf{e}_K\cdot\mathbf{g}^i = a^K\,\frac{\partial \xi^i}{\partial x^J}\,\mathbf{e}^J\cdot\mathbf{e}_K = a^K\,\frac{\partial \xi^i}{\partial x^J}\,\delta^J_K = \frac{\partial \xi^i}{\partial x^K}\,a^K,$$
a contravariant transform. Similarly, the components of the vector in the contravariant basis may
be obtained by covariant transform from the Cartesian components and so
$$a_i = \frac{\partial x^K}{\partial \xi^i}\,a_K \tag{1.16a}$$
and
$$a^i = \frac{\partial \xi^i}{\partial x^K}\,a^K. \tag{1.16b}$$
Thus, a general vector may be represented in either basis,
$$\mathbf{a} = a^i\,\mathbf{g}_i = a_i\,\mathbf{g}^i. \tag{1.17}$$
We now consider a change in coordinates from ξ i to another general coordinate system χi . It will
be of vital importance later on to know which index corresponds to which coordinate system so
we have chosen to add an overbar to the index to distinguish components associated with the two
coordinate systems, ξ i and χi . The covariant base vectors associated with χi are then
$$\mathbf{g}_{\bar i} \equiv \frac{\partial \mathbf{r}}{\partial \chi^{\bar i}} = \frac{\partial x^J}{\partial \chi^{\bar i}}\,\mathbf{e}_J = \frac{\partial x^J}{\partial \xi^k}\,\frac{\partial \xi^k}{\partial \chi^{\bar i}}\,\mathbf{e}_J = \frac{\partial \xi^k}{\partial \chi^{\bar i}}\,\mathbf{g}_k; \tag{1.18}$$
and the transformation between $\mathbf{g}_{\bar i}$ and $\mathbf{g}_k$ is of the same (covariant) type as that between $\mathbf{g}_i$ and $\mathbf{e}_K$ in equation (1.6). The transformation is covariant⁶ because the "new" coordinate is the independent variable in the partial derivative (it appears in the denominator). In our new basis, the vector $\mathbf{a} = a^{\bar i}\,\mathbf{g}_{\bar i}$ and because $\mathbf{a}$ must remain invariant
$$\mathbf{a} = a^{\bar i}\,\mathbf{g}_{\bar i} = a^i\,\mathbf{g}_i.$$
Using (1.18) and equating the coefficients of $\mathbf{g}_k$, it follows that
$$a^{\bar i} = \frac{\partial \chi^{\bar i}}{\partial \xi^k}\,a^k. \tag{1.20}$$

⁶The logic for the choice of index location is the position of the generalised coordinate in the partial derivative defining the transformation:
$$\mathbf{g}_i = \frac{\partial x^K}{\partial \xi^i}\,\mathbf{e}_K \text{ (lowered index)}, \qquad \mathbf{g}^i = \frac{\partial \xi^i}{\partial x^K}\,\mathbf{e}^K \text{ (raised index)}.$$
This transformation is contravariant because the “new” coordinate is the dependent variable in the
partial derivative (it appears in the numerator).
A similar approach can be used to show that the components in the contravariant basis must
transform covariantly in order to ensure that the vector remains invariant. Thus, the use of our
summation convention ensures that the summed quantities remain invariant under coordinate trans-
formations, which will be essential when deriving coordinate-independent physical laws.
Interpretation
The fact that base vectors and vector components must transform differently for the vector to
remain invariant is actually quite obvious. Consider a one-dimensional Euclidean space in which
$\mathbf{a} = a^1\,\mathbf{g}_1$. If the base vector is rescaled⁷ by a factor $\lambda$ so that $\mathbf{g}_{\bar 1} = \lambda\,\mathbf{g}_1$ then to compensate the component must be rescaled by the factor $1/\lambda$: $a^{\bar 1} = \frac{1}{\lambda}\,a^1$. Note that for a $1\times 1$ transformation matrix with entry $\lambda$, the inverse transpose is $1/\lambda$.
In an orthonormal coordinate system the covariant and contravariant base vectors coincide, so that
$$\mathbf{g}_i = \frac{\partial x^K}{\partial \xi^i}\,\mathbf{e}_K = \mathbf{g}^i = \frac{\partial \xi^i}{\partial x^K}\,\mathbf{e}^K,$$
and so
$$\frac{\partial x^K}{\partial \xi^i} = \frac{\partial \xi^i}{\partial x^K}. \tag{1.21}$$
Hence, the covariant and contravariant transformations are identical in orthonormal coordinate
systems, which means that there is no need to distinguish between raised and lowered indices. This
simplification is adopted in many textbooks and the convention is to use only lowered indices. When
working with orthonormal coordinates we will also adopt this convention for simplicity, but we must
always make sure that we know when the coordinate system is orthonormal. It is for this reason
that we have adopted the convention that upper case indices are used for orthonormal coordinates.
If the coordinate system is not known to be orthonormal, we will use lower case indices and must distinguish between the covariant and contravariant transformations.

⁷In one dimension all we can do is rescale the length, although the scaling can vary with position.
Condition (1.21) implies that
$$\mathsf{M}\mathsf{M}^T = \mathsf{I} \quad\Rightarrow\quad \mathsf{M}^T\mathsf{M} = \mathsf{I},$$
where I is the identity matrix. In other words the components of the transformation form an
orthogonal matrix. It follows that (all) orthonormal coordinates can only be generated by an
orthogonal transformation from the reference Cartesians. This should not be a big surprise: any
other transform will change the angles between the base vectors or their relative lengths which
destroys orthonormality. The argument is entirely reversible: if either the covariant or contravariant
transform is orthogonal then the two transforms are identical and the new coordinate system is
orthonormal.
An aside
Further intuition for the reason why the covariant and contravariant transformations are identical
when the coordinate transform is orthogonal can be obtained as follows. Imagine that we have a
general linear transformation represented as a matrix $\mathsf{M}$ that acts on vectors such that components in the fixed Cartesian coordinate system $\mathbf{p} = p^K\,\mathbf{e}_K$ transform as follows
$$\tilde{p}^K = M_J{}^K\,p^J.$$
Note that the index K does not have an overbar because p̃K is a component in the fixed Cartesian
coordinate system, eK . The transformation can, of course, also be applied to the base vectors of
the fixed Cartesian coordinate system $\mathbf{e}_I$,
$$[\tilde{\mathbf{e}}_I]^K = M_J{}^K\,[\mathbf{e}_I]^J,$$
where $[\;]^K$ indicates the $K$-th component of the base vector. Now, $[\mathbf{e}_I]^J = \delta_I^J$ and it follows that
$$[\tilde{\mathbf{e}}_I]^K = M_I{}^K,$$
which allows us to define the operation of the matrix components on the base vectors directly
because
$$\tilde{\mathbf{e}}_I = [\tilde{\mathbf{e}}_I]^K\,\mathbf{e}_K = M_I{}^K\,\mathbf{e}_K. \tag{1.23a}$$
Thus the operation of the transformation on the components is the transpose of its operation on the base vectors⁸. We could write the new base vectors as $\mathbf{e}_{\tilde I} = M_{\tilde I}{}^K\,\mathbf{e}_K$ to be consistent with our previous notation, but this would probably lead to more confusion in the current exposition.
Now consider a vector a that must remain invariant under our transformation. Let the vector
ã′ be the vector with the same numerical values of its components as a but with transformed base
vectors, i.e. $\tilde{\mathbf{a}}' = a^K\,\tilde{\mathbf{e}}_K$. Thus, the vector $\tilde{\mathbf{a}}'$ will be a transformed version of $\mathbf{a}$. In order to ensure that the vector remains unchanged under transformation we must apply the appropriate inverse transformation to $\tilde{\mathbf{a}}'$ relative to the new base vectors, $\tilde{\mathbf{e}}_I$. In other words, the transformation of the coordinates must be
$$\tilde{a}^K = [M^{-1}]^K{}_J\,\tilde{a}'^J = [M^{-1}]^K{}_J\,a^J, \tag{1.23b}$$

⁸This statement also applies to general bases.
where we have used the fact that ã′J = aJ by definition. Using the two transformation equations
(1.23a,b) we see that
$$\tilde{a}^K\,\tilde{\mathbf{e}}_K = [M^{-1}]^K{}_J\,a^J\,M_K{}^L\,\mathbf{e}_L = [M^{-1}]^K{}_J\,M_K{}^L\,a^J\,\mathbf{e}_L = \delta_J{}^L\,a^J\,\mathbf{e}_L = a^J\,\mathbf{e}_J,$$
as required.
Thus, we have the two results: (i) a general property of linear transformations is that the matrix
representation of the transformation of vector components is the transpose of the matrix represen-
tation of the transformation of base vectors; (ii) in order to remain invariant the transform of the
components of the vector must actually undergo the inverse of the coordinate transformation. Thus,
the transformations of the base vectors and the coordinates coincide when the inverse transform is
equal to its transpose, i.e. when the transform is orthogonal.
If that all seems a bit abstract, then hopefully the following specific example will help make the
ideas a little more concrete.
Example 1.3. The base vectors $\mathbf{e}_{\bar I}$ are obtained by rotating the Cartesian base vectors $\mathbf{e}_I$ through an angle $\theta$ about the origin. Show that the transformations of the base vectors and of the vector components coincide.

Solution 1.3. The original and rotated bases are shown in Figure 1.2(a), from which we determine that the new base vectors are given by
$$\mathbf{e}_{\bar 1} = \cos\theta\,\mathbf{e}_1 + \sin\theta\,\mathbf{e}_2, \qquad \mathbf{e}_{\bar 2} = -\sin\theta\,\mathbf{e}_1 + \cos\theta\,\mathbf{e}_2.$$

Figure 1.2: (a) The base vectors $\mathbf{e}_{\bar I}$ are the Cartesian base vectors $\mathbf{e}_I$ rotated through an angle $\theta$ about the origin. (b) If the coordinates of the position vector $\mathbf{p}$ are unchanged it is also rotated by $\theta$ to $\mathbf{p}'$.
Consider a position vector $\mathbf{p} = p^I\,\mathbf{e}_I$ in the original basis. If we leave the coordinates unchanged then the new vector $\mathbf{p}' = p^I\,\mathbf{e}_{\bar I}$ is the original vector rotated by $\theta$, see Figure 1.2(b). We must therefore rotate the position vector $\mathbf{p}'$ through an angle $-\theta$ relative to the fixed basis $\mathbf{e}_I$, but this is actually equivalent to a positive rotation of the base vectors. Hence the transforms for the components of the vector and the base vectors are the same.
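A short numerical sketch of this conclusion, assuming NumPy; the angle and coordinates are arbitrary illustrative values:

```python
# For a rotation, the component transform (inverse transpose) and the
# base-vector transform are the same matrix.
import numpy as np

theta = 0.7
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, s],        # rows are the rotated base vectors e_1bar, e_2bar
              [-s, c]])

p = np.array([2.0, 1.0])     # components of p in the original basis

# Components must undergo the inverse transpose to keep the vector invariant;
# for an orthogonal matrix the inverse transpose equals the matrix itself.
p_bar = np.linalg.inv(R.T) @ p
print(np.allclose(p_bar, R @ p))   # True: same transform as the base vectors

# Invariance check: reassemble the vector from new components and new basis.
v = p_bar @ R                       # sum_i p_bar[i] * e_bar[i]
print(np.allclose(v, p))            # True: the vector is unchanged
```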
1.2 Tensors
Tensors are geometric objects that have magnitude and zero, one or many associated directions, but are linear in character. A more mathematically precise definition is to say that a tensor is a multilinear map or, alternatively, an element of a tensor product of vector spaces, which is somewhat tautological and really not helpful at this point. The order (or degree or rank) of a tensor is
the number of associated directions and so a scalar is a tensor of order zero and a vector is a
tensor of order one. Many quantities in continuum mechanics such as strain, stress, diffusivity and
conductivity are naturally expressed as tensors of order two. We have already seen an example
of a tensor in our discussion of vectors: linear transformations from one set of vectors to another,
e.g. the transformation from Cartesian to covariant base vectors, are second-order tensors. If the
vectors represent physical objects, then they must not depend on the coordinate representation
chosen. Hence, the linear transformation must also be independent of coordinates because the
same vectors must always transform in the same way. We can write our linear transformation in a
coordinate-independent manner as
$$\mathbf{a} = \mathsf{M}(\mathbf{b}), \tag{1.24}$$
and the transformation M is a tensor of order two. In order to describe M precisely we must
pick a specific coordinate system for each vector in equation (1.24). In the global Cartesian basis,
equation (1.24) becomes
$$a_I\,\mathbf{e}_I = \mathsf{M}(b_J\,\mathbf{e}_J) = b_J\,\mathsf{M}(\mathbf{e}_J), \tag{1.25}$$
because it is a linear transformation. We now take the dot product with $\mathbf{e}_K$ to obtain
$$a_K = b_J\,\mathbf{e}_K\cdot\mathsf{M}(\mathbf{e}_J),$$
where the dot product is written on the left to indicate that we are taking the dot product after the linear transformation has operated on the base vector $\mathbf{e}_J$. Hence, we can write the operation of the transformation on the components in the form
$$a_I = M_{IJ}\,b_J, \tag{1.26}$$
where $M_{IJ} = \mathbf{e}_I\cdot\mathsf{M}(\mathbf{e}_J)$. Equation (1.26) can be written in a matrix form to aid calculation
$$\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \\ M_{31} & M_{32} & M_{33} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}.$$
The quantity MIJ represents the component of the transformed vectors in the I-th Cartesian di-
rection if the original vector is of unit length in the J-th direction. Hence, the quantity MIJ is
meaningless without knowing the coordinate system associated with both I and J.
In fact, there is no need to choose the same coordinate system for I and J. If we write the
vector a in the covariant basis, equation (1.25) becomes
$$a^i\,\mathbf{g}_i = b_J\,\mathsf{M}(\mathbf{e}_J).$$
Taking the dot product with the appropriate contravariant base vector gives
$$a^k = b_J\,\mathbf{g}^k\cdot\mathsf{M}(\mathbf{e}_J) = b_J\,\frac{\partial \xi^k}{\partial x^K}\,\mathbf{e}_K\cdot\mathsf{M}(\mathbf{e}_J),$$
which means that
$$a^k = M^k{}_J\,b_J = \frac{\partial \xi^k}{\partial x^K}\,M_{KJ}\,b_J \quad\Rightarrow\quad M^k{}_J = \frac{\partial \xi^k}{\partial x^K}\,M_{KJ}.$$
In other words the components of each (column) vector corresponding to a fixed second index in
a coordinate representation of M must obey a contravariant transformation if the associated basis
undergoes a covariant transform, i.e. the behaviour is exactly the same as for the components of a
vector.
If we now also represent the vector b in the covariant basis, equation (1.25) becomes
$$a^i\,\mathbf{g}_i = b^j\,\mathsf{M}(\mathbf{g}_j).$$
Taking the dot product with the appropriate contravariant base vector gives
$$a^k = b^j\,\mathbf{g}^k\cdot\mathsf{M}(\mathbf{g}_j) = b^j\,\frac{\partial \xi^k}{\partial x^K}\,\mathbf{e}_K\cdot\mathsf{M}\!\left(\frac{\partial x^J}{\partial \xi^j}\,\mathbf{e}_J\right) = b^j\,\frac{\partial \xi^k}{\partial x^K}\,\frac{\partial x^J}{\partial \xi^j}\,\mathbf{e}_K\cdot\mathsf{M}(\mathbf{e}_J),$$
so that
$$a^k = M^k{}_j\,b^j = \frac{\partial \xi^k}{\partial x^K}\,M_{KJ}\,\frac{\partial x^J}{\partial \xi^j}\,b^j \quad\Rightarrow\quad M^k{}_j = \frac{\partial \xi^k}{\partial x^K}\,M_{KJ}\,\frac{\partial x^J}{\partial \xi^j},$$
and the components of each (row) vector associated with a fixed first index in a coordinate repre-
sentation of M undergo a covariant transformation when the associated basis undergoes a covariant
transform, i.e. the “opposite behaviour” to the components of a vector. The difference in behaviour
between the two indices of the components of the linear transformation arises because one index
corresponds to the basis of the “input” vector, whereas the other corresponds to the basis of the
“output” vector. There is a sum over the second (input) index and the components of the vector b
and in order for this sum to remain invariant the transform associated with the second index must
be the opposite to the components of the vector b, in other words the same as the transformation
of the base vectors of that vector.
The obvious relationships between the remaining sets of components can easily be deduced when we represent the vectors in the contravariant basis. Many books term $M^{ij}$ a contravariant second-order tensor, $M_{ik}$ a covariant second-order tensor and $M^i{}_j$ a mixed second-order tensor, but they are simply representations of the same coordinate-independent object in different bases. Another more modern notation is to say that $M^{ij}$ is a type (2,0) tensor, $M_{ij}$ is type (0,2) and $M^i{}_j$ is a type (1,1) tensor, which allows the distinction between mixed tensors of orders greater than two.
1.2.1 Invariance of second-order tensors
Let us now consider a general change of coordinates from ξ i to χi . Given that
$$a^i = M^i{}_j\,b^j, \tag{1.28a}$$
we wish to find an expression for $M^{\bar i}{}_{\bar j}$ such that⁹
$$a^{\bar i} = M^{\bar i}{}_{\bar j}\,b^{\bar j}. \tag{1.28b}$$
Using the transformation rules for the components of vectors (1.19) it follows that (1.28a) becomes
$$\frac{\partial \xi^i}{\partial \chi^{\bar n}}\,a^{\bar n} = M^i{}_j\,\frac{\partial \xi^j}{\partial \chi^{\bar n}}\,b^{\bar n}.$$
We now multiply both sides by $\partial \chi^{\bar m}/\partial \xi^i$ to obtain
$$\frac{\partial \chi^{\bar m}}{\partial \xi^i}\,\frac{\partial \xi^i}{\partial \chi^{\bar n}}\,a^{\bar n} = \delta^{\bar m}_{\bar n}\,a^{\bar n} = a^{\bar m} = \frac{\partial \chi^{\bar m}}{\partial \xi^i}\,M^i{}_j\,\frac{\partial \xi^j}{\partial \chi^{\bar n}}\,b^{\bar n}.$$
Comparing this expression to equation (1.28b) it follows that
$$M^{\bar i}{}_{\bar j} = \frac{\partial \chi^{\bar i}}{\partial \xi^n}\,M^n{}_m\,\frac{\partial \xi^m}{\partial \chi^{\bar j}},$$
and thus we see that covariant components must transform covariantly and contravariant compo-
nents must transform contravariantly in order for the invariance properties to hold. Similarly, it
can be shown that
$$M^{\bar i\bar j} = \frac{\partial \chi^{\bar i}}{\partial \xi^n}\,\frac{\partial \chi^{\bar j}}{\partial \xi^m}\,M^{nm}, \quad\text{and}\quad M_{\bar i\bar j} = \frac{\partial \xi^n}{\partial \chi^{\bar i}}\,\frac{\partial \xi^m}{\partial \chi^{\bar j}}\,M_{nm}. \tag{1.29}$$
An alternative definition of tensors is to require that they are sets of index quantities (multi-
dimensional arrays) that obey these transformation laws under a change of coordinates.
The transformations can be expressed in matrix form, but we must distinguish between the
covariant and contravariant cases. We shall write M♭ to indicate a matrix where all components
transform covariantly and M♯ for the contravariant case. We define the transformation matrix F to
have the components
$$\mathsf{F} = \begin{pmatrix} \dfrac{\partial \chi^1}{\partial \xi^1} & \dfrac{\partial \chi^1}{\partial \xi^2} & \dfrac{\partial \chi^1}{\partial \xi^3} \\ \dfrac{\partial \chi^2}{\partial \xi^1} & \dfrac{\partial \chi^2}{\partial \xi^2} & \dfrac{\partial \chi^2}{\partial \xi^3} \\ \dfrac{\partial \chi^3}{\partial \xi^1} & \dfrac{\partial \chi^3}{\partial \xi^2} & \dfrac{\partial \chi^3}{\partial \xi^3} \end{pmatrix}, \quad\text{or}\quad F^{\bar i}{}_j = \frac{\partial \chi^{\bar i}}{\partial \xi^j};$$
and then from the chain rule and independence of coordinates
$$\mathsf{F}^{-1} = \begin{pmatrix} \dfrac{\partial \xi^1}{\partial \chi^1} & \dfrac{\partial \xi^1}{\partial \chi^2} & \dfrac{\partial \xi^1}{\partial \chi^3} \\ \dfrac{\partial \xi^2}{\partial \chi^1} & \dfrac{\partial \xi^2}{\partial \chi^2} & \dfrac{\partial \xi^2}{\partial \chi^3} \\ \dfrac{\partial \xi^3}{\partial \chi^1} & \dfrac{\partial \xi^3}{\partial \chi^2} & \dfrac{\partial \xi^3}{\partial \chi^3} \end{pmatrix}, \quad\text{or}\quad [F^{-1}]^i{}_{\bar j} = \frac{\partial \xi^i}{\partial \chi^{\bar j}}.$$
If $\bar{\mathsf{M}}^\sharp$ and $\bar{\mathsf{M}}^\flat$ represent the matrices of transformed components then the transformation laws (1.29) become
$$\bar{\mathsf{M}}^\sharp = \mathsf{F}\,\mathsf{M}^\sharp\,\mathsf{F}^T, \quad\text{and}\quad \bar{\mathsf{M}}^\flat = \mathsf{F}^{-T}\,\mathsf{M}^\flat\,\mathsf{F}^{-1}. \tag{1.30}$$
⁹This is a place where the use of overbars makes the notation look cluttered, but clarifies precisely which coordinate system is associated with each index. This notation also allows the representation of components in the two different coordinate systems, so-called two-point tensors, e.g. $M^{\bar i}{}_j$, which will be useful.
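The matrix forms (1.30) are easy to test numerically. The following sketch (NumPy assumed; the "Jacobian" and components are random illustrative values, not a specific coordinate map) checks that the fully contracted scalar $M^{ij} a_i b_j$ is unchanged by the transformation:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((3, 3))          # stands in for the Jacobian d(chi)/d(xi)
M_sharp = rng.standard_normal((3, 3))    # contravariant components M^{ij}
a_flat = rng.standard_normal(3)          # covariant components a_i
b_flat = rng.standard_normal(3)          # covariant components b_j

# Transformation laws: contravariant indices pick up F, covariant F^{-T}.
M_bar = F @ M_sharp @ F.T                # equation (1.30)
a_bar = np.linalg.inv(F).T @ a_flat
b_bar = np.linalg.inv(F).T @ b_flat

# The fully contracted scalar must be coordinate independent.
print(np.isclose(a_flat @ M_sharp @ b_flat, a_bar @ M_bar @ b_bar))  # True
```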
1.2.2 Cartesian tensors
If we restrict attention to orthonormal coordinate systems, then the transformation between co-
ordinate systems must be orthogonal¹⁰ and we do not need to distinguish between covariant and
contravariant behaviour. Consider the transformation from our Cartesian basis eI to another or-
thonormal basis $\mathbf{e}_{\bar I}$. The transformation rules for components of a tensor of order two become
$$M_{\bar I\bar J} = \frac{\partial x^N}{\partial x^{\bar I}}\,\frac{\partial x^M}{\partial x^{\bar J}}\,M_{NM}.$$
The transformation between components of two vectors in the different bases is given by
$$a_{\bar I} = \frac{\partial x^{\bar I}}{\partial x^K}\,a_K = \frac{\partial x^K}{\partial x^{\bar I}}\,a_K,$$
which can be written in the form
$$a_{\bar I} = Q_{\bar I K}\,a_K, \quad\text{where}\quad Q_{\bar I K} = \frac{\partial x^{\bar I}}{\partial x^K} = \frac{\partial x^K}{\partial x^{\bar I}},$$
and the components QIK form an orthogonal matrix. Hence the transformation property of a
(Cartesian) tensor of order two can be written as
$$M_{\bar I\bar J} = Q_{\bar I N}\,M_{NM}\,Q_{\bar J M},$$
or in matrix form
$$\bar{\mathsf{M}} = \mathsf{Q}\,\mathsf{M}\,\mathsf{Q}^T. \tag{1.31}$$
In many textbooks, equation (1.31) is defined to be the transformation rule satisfied by a (Cartesian)
tensor of order two.
The scalar product of two vectors can be evaluated by decomposing one vector into the covariant basis and the other into the contravariant basis,
$$\mathbf{a}\cdot\mathbf{b} = (a^i\,\mathbf{g}_i)\cdot(b_j\,\mathbf{g}^j),$$
and so
$$\mathbf{a}\cdot\mathbf{b} = a^i\,b_j\,\mathbf{g}_i\cdot\mathbf{g}^j = a^i\,b_j\,\delta^j_i = a^i\,b_i.$$

¹⁰Although the required orthogonal transformation may vary with position, as is the case in plane polar coordinates.
An alternative decomposition demonstrates that
$$\mathbf{a}\cdot\mathbf{b} = a_i\,b^i,$$
and we note that the scalar product is invariant under coordinate transformation, as expected. In orthonormal coordinate systems, there is no distinction between co- and contravariant bases and so
$$\mathbf{a}\cdot\mathbf{b} = a_K\,b_K.$$
In order to represent the vector product with index notation it is convenient to define a quantity
known as the alternating, or Levi-Civita, symbol eIJK . In orthonormal coordinate systems the
components of $e_{IJK}$ are defined by
$$e^{IJK} = e_{IJK} = \begin{cases} 0 & \text{when any two indices are equal;} \\ +1 & \text{when } I,J,K \text{ is an even permutation of } 1,2,3; \\ -1 & \text{when } I,J,K \text{ is an odd permutation of } 1,2,3; \end{cases} \tag{1.32}$$
e.g.
$$e_{112} = e_{122} = 0, \quad e_{123} = e_{312} = e_{231} = 1, \quad e_{213} = e_{132} = e_{321} = -1.$$
Strictly speaking eIJK thus defined is not a tensor because if the handedness of the coordinate system
changes then the sign of the entries in eIJK should change in order for it to respect the appropriate
invariance properties; such objects are sometimes called pseudo-tensors. We could ensure that eIJK
is a tensor by restricting our definition to right-handed (or left-handed) orthonormal systems, which
will be the approach taken in later chapters.
The vector product of two vectors $\mathbf{a}$ and $\mathbf{b}$ in orthonormal coordinates is
$$\mathbf{a}\times\mathbf{b} = e_{IJK}\,a_I\,b_J\,\mathbf{e}_K, \tag{1.33}$$
which can be confirmed by writing out all the components. In addition, the relationship between the Cartesian base vectors $\mathbf{e}_I$ can be expressed as a vector product using the alternating tensor
$$\mathbf{e}_I\times\mathbf{e}_J = e_{IJK}\,\mathbf{e}_K. \tag{1.34}$$
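The definition (1.32) and the component formula (1.33) can be checked by brute force; a minimal sketch assuming NumPy:

```python
import numpy as np
from itertools import permutations

# Build the alternating symbol e_IJK of equation (1.32).
e = np.zeros((3, 3, 3))
for I, J, K in permutations(range(3)):
    # determinant of the permutation matrix gives the sign (+1 even, -1 odd)
    e[I, J, K] = np.sign(np.linalg.det(np.eye(3)[[I, J, K]]))

a = np.array([1.0, 2.0, 3.0])
b = np.array([-1.0, 0.5, 4.0])

# [a x b]_K = e_IJK a_I b_J, equation (1.33)
cross = np.einsum('ijk,i,j->k', e, a, b)
print(np.allclose(cross, np.cross(a, b)))  # True
```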
Let us now consider the case of general coordinates: the cross product between covariant base vectors is given by
$$\mathbf{g}_i\times\mathbf{g}_j = \frac{\partial x^I}{\partial \xi^i}\,\frac{\partial x^J}{\partial \xi^j}\,e_{IJK}\,\mathbf{e}_K.$$
If we now transform the third index covariantly we must transform the base vector contravariantly so that
$$\mathbf{g}_i\times\mathbf{g}_j = \epsilon_{ijk}\,\frac{\partial \xi^k}{\partial x^K}\,\mathbf{e}_K = \epsilon_{ijk}\,\mathbf{g}^k; \tag{1.35}$$
where
$$\epsilon_{ijk} \equiv \frac{\partial x^I}{\partial \xi^i}\,\frac{\partial x^J}{\partial \xi^j}\,\frac{\partial x^K}{\partial \xi^k}\,e_{IJK}.$$
A similar argument shows that
$$\mathbf{g}^i\times\mathbf{g}^j = \epsilon^{ijk}\,\mathbf{g}_k, \quad\text{where}\quad \epsilon^{ijk} \equiv \frac{\partial \xi^i}{\partial x^I}\,\frac{\partial \xi^j}{\partial x^J}\,\frac{\partial \xi^k}{\partial x^K}\,e^{IJK}.$$
If we decompose the vectors $\mathbf{a}$ and $\mathbf{b}$ into the contravariant basis we have
$$\mathbf{a}\times\mathbf{b} = (a_i\,\mathbf{g}^i)\times(b_j\,\mathbf{g}^j) = a_i\,b_j\,\epsilon^{ijk}\,\mathbf{g}_k.$$
Thus, if we decompose the vector product into the covariant basis we have the following expression for the components
$$[\mathbf{a}\times\mathbf{b}]^k = \epsilon^{ijk}\,a_i\,b_j, \quad\text{or}\quad [\mathbf{a}\times\mathbf{b}]^i = \epsilon^{ijk}\,a_j\,b_k.$$
The components $M_{ij}$ correspond to the representation of a tensor with respect to a basis, but which basis? We shall define the basis to be that formed from the tensor product of pairs of base vectors: $\mathbf{g}^i\otimes\mathbf{g}^j$, where the symbol $\otimes$ is used to denote the tensor product. Hence, we can represent a tensor in the different forms
$$\mathsf{M} = M_{ij}\,\mathbf{g}^i\otimes\mathbf{g}^j = M^{ij}\,\mathbf{g}_i\otimes\mathbf{g}_j = M^i{}_j\,\mathbf{g}_i\otimes\mathbf{g}^j,$$
and because $M_{ij}$ are just coefficients it follows that $\mathbf{g}^i\otimes\mathbf{g}^j$ are themselves tensors of second order¹¹.
Decomposing a and b into the contravariant and covariant bases respectively gives
Surface elements
Any surface $\xi^1 = \text{constant}$ is spanned by the covariant base vectors $\mathbf{g}_2$ and $\mathbf{g}_3$ and so an infinitesimal area within that surface is given by
$$dS_{(1)} = |\mathbf{g}_2\times\mathbf{g}_3|\,d\xi^2\,d\xi^3,$$
and
$$|\mathbf{g}_2\times\mathbf{g}_3| = |\epsilon_{231}\,\mathbf{g}^1| = |\epsilon_{231}|\,|\mathbf{g}^1| = \left|\frac{\partial x^I}{\partial \xi^2}\,\frac{\partial x^J}{\partial \xi^3}\,\frac{\partial x^K}{\partial \xi^1}\,e_{IJK}\right|\sqrt{\mathbf{g}^1\cdot\mathbf{g}^1}. \tag{1.45}$$
The term between the modulus signs is the determinant¹² of the matrix with components $\partial x^I/\partial \xi^j$, so from the definition of the determinant of the metric tensor (1.44),
$$dS_{(1)} = \sqrt{g\,g^{11}}\,d\xi^2\,d\xi^3.$$
Now
$$|\mathbf{g}_1\cdot\epsilon_{231}\,\mathbf{g}^1| = |\epsilon_{231}| = \sqrt{g},$$
using the same argument as in equation (1.45). Thus, the volume element is given by
$$dV = \sqrt{g}\,d\xi^1\,d\xi^2\,d\xi^3, \tag{1.46}$$
which is, of course, equivalent to the standard expression for change of coordinates in multiple integrals
$$dV = \left|\frac{\partial(x^1,x^2,x^3)}{\partial(\xi^1,\xi^2,\xi^3)}\right|\,d\xi^1\,d\xi^2\,d\xi^3.$$
$$a^i = \mathbf{a}\cdot\mathbf{g}^i = a_j\,\mathbf{g}^j\cdot\mathbf{g}^i = a_j\,g^{ji}.$$
Thus the contravariant metric tensor can be used to raise indices and similarly the covariant metric tensor can be used to lower indices
$$a_i = a^j\,g_{ji}.$$
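For plane polars the metric is diagonal and the raising/lowering operations are easy to verify; a short symbolic sketch, assuming SymPy:

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
x = sp.Matrix([r*sp.cos(theta), r*sp.sin(theta)])
J = x.jacobian([r, theta])          # columns are g_1, g_2

g_cov = sp.simplify(J.T * J)        # covariant metric g_ij = g_i . g_j
g_con = sp.simplify(g_cov.inv())    # contravariant metric g^ij
print(g_cov)                        # Matrix([[1, 0], [0, r**2]])
print(g_con)                        # Matrix([[1, 0], [0, r**(-2)]])

# Raise then lower the components of an arbitrary vector: the identity.
a_low = sp.Matrix([sp.Symbol('a1'), sp.Symbol('a2')])
a_up = g_con * a_low                # a^i = g^{ij} a_j
print(sp.simplify(g_cov * a_up - a_low))  # zero vector
```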
$$\frac{\partial \mathbf{a}}{\partial \chi^{\bar i}} = \frac{\partial \xi^j}{\partial \chi^{\bar i}}\,\frac{\partial \mathbf{a}}{\partial \xi^j},$$
so it transforms covariantly. If we decompose the vector into the covariant basis we have that
$$\mathbf{a}_{,j} = \left(a^i\,\mathbf{g}_i\right)_{,j} = a^i_{,j}\,\mathbf{g}_i + a^i\,\mathbf{g}_{i,j}, \tag{1.47}$$
$$\mathbf{v} = \mathbf{u} \quad\Rightarrow\quad v_I\,\mathbf{e}_I = u_I\,\mathbf{e}_I \quad\Rightarrow\quad v_I = u_I,$$
where formally the last equation is obtained via a dot product with a suitable base vector. In Cartesian components we can simply take derivatives of the component equations with respect to the coordinates because the base vectors do not depend on the coordinates:
$$v_{I,K} = u_{I,K}.$$
In other words, the components of the differentiated vector equation are simply the derivatives of the components of the original equation.
In a general coordinate system, the base vectors are not independent of the coordinates and so
the second term in equation (1.47) cannot be neglected. The vector equation
$$\mathbf{a} = \mathbf{b} \quad\Rightarrow\quad a^i\,\mathbf{g}_i = b^i\,\mathbf{g}_i \quad\Rightarrow\quad a^i = b^i,$$
where the last equation is now obtained by taking the dot product with the contravariant base vectors. However, in general coordinates, the componentwise statement
$$a^i_{,j} = b^i_{,j}$$
is problematic: although the equation is (obviously) true from direct differentiation of each component, the statement is not coordinate-independent because it does not obey the correct transformation properties for a mixed second-order tensor.
In fact, the derivatives of the base vectors are given by
$$\mathbf{g}_{i,j} = \frac{\partial^2 \mathbf{r}}{\partial \xi^j\,\partial \xi^i} = \mathbf{r}_{,ij} = \mathbf{r}_{,ji},$$
assuming symmetry of partial differentiation. If we decompose the position vector into the fixed Cartesian coordinates, we have
$$\mathbf{g}_{i,j} = \frac{\partial^2 x^K}{\partial \xi^j\,\partial \xi^i}\,\mathbf{e}_K,$$
which can be written in the contravariant basis as
$$\mathbf{g}_{i,j} = \Gamma_{ijk}\,\mathbf{g}^k,$$
where
$$\Gamma_{ijk} = \mathbf{g}_{i,j}\cdot\mathbf{g}_k = \frac{\partial^2 \mathbf{r}}{\partial \xi^j\,\partial \xi^i}\cdot\mathbf{g}_k = \frac{\partial^2 x^K}{\partial \xi^i\,\partial \xi^j}\,\frac{\partial x^K}{\partial \xi^k}. \tag{1.48}$$
The symbol $\Gamma_{ijk}$ is called the Christoffel symbol of the first kind. Thus, we can write
$$\mathbf{a}_{,j} = a^i_{,j}\,\mathbf{g}_i + a^i\,\Gamma_{ijk}\,\mathbf{g}^k = a^i_{,j}\,\mathbf{g}_i + a^i\,\Gamma_{ijk}\,g^{kl}\,\mathbf{g}_l = \left(a^l_{,j} + \Gamma^l_{ij}\,a^i\right)\mathbf{g}_l,$$
where
$$\Gamma^l_{ij} = g^{kl}\,\Gamma_{ijk} = g^{kl}\,\mathbf{g}_k\cdot\mathbf{g}_{i,j} = \mathbf{g}^l\cdot\mathbf{g}_{i,j}, \tag{1.49}$$
and $\Gamma^l_{ij}$ is called the Christoffel symbol of the second kind.
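In plane polars, for example, the only non-zero Christoffel symbols of the second kind are $\Gamma^1_{22} = -r$ and $\Gamma^2_{12} = \Gamma^2_{21} = 1/r$. A symbolic sketch using definition (1.49), assuming SymPy:

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
xi = [r, theta]
x = sp.Matrix([r*sp.cos(theta), r*sp.sin(theta)])
J = x.jacobian(xi)                  # columns are g_1, g_2
Jinv = J.inv()                      # rows are g^1, g^2

# Gamma^l_ij = g^l . g_i,j  (equation (1.49))
for l in range(2):
    for i in range(2):
        for j in range(2):
            g_i_j = J[:, i].diff(xi[j])            # derivative of base vector
            Gamma = sp.simplify(Jinv[l, :].dot(g_i_j))
            if Gamma != 0:
                print(f'Gamma^{l+1}_{i+1}{j+1} =', Gamma)
# Prints: Gamma^1_22 = -r, Gamma^2_12 = 1/r, Gamma^2_21 = 1/r
```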
Finally, we obtain
$$\mathbf{a}_{,j} = a^k|_j\,\mathbf{g}_k, \quad\text{where}\quad a^k|_j = a^k_{,j} + \Gamma^k_{ij}\,a^i,$$
and the expression $a^k|_j$ is the covariant derivative of the component $a^k$. In many books the covariant derivative is represented by a subscript semicolon, $a^k_{;j}$, and in some it is denoted by a comma, just to confuse things. The fact that $\mathbf{a}_{,j}$ is a covariant vector and that $\mathbf{g}_k$ is a covariant vector means that the covariant derivative $a^k|_j$ is a (mixed) second-order tensor¹³. Thus when differentiating equations expressed in components in a general coordinate system we should write
$$a^i = b^i \quad\Rightarrow\quad a^i|_j = b^i|_j.$$
If we had decomposed the vector $\mathbf{a}$ into the contravariant basis then
$$\mathbf{a}_{,j} = a_{i,j}\,\mathbf{g}^i + a_i\,\mathbf{g}^i_{,j},$$
and from equation (1.12),
$$\mathbf{g}^k = \frac{\partial \xi^k}{\partial x^K}\,\mathbf{e}^K \quad\Rightarrow\quad \frac{\partial x^I}{\partial \xi^k}\,\mathbf{g}^k = \frac{\partial x^I}{\partial \xi^k}\,\frac{\partial \xi^k}{\partial x^K}\,\mathbf{e}^K = \delta^I_K\,\mathbf{e}^K = \mathbf{e}^I. \tag{1.50}$$
We differentiate equation (1.50) with respect to $\xi^j$, remembering that $\mathbf{e}^I$ is a constant base vector, to obtain
$$\mathbf{g}^k_{,j}\,\frac{\partial x^I}{\partial \xi^k} + \mathbf{g}^k\,\frac{\partial^2 x^I}{\partial \xi^j\,\partial \xi^k} = \mathbf{0},$$
$$\Rightarrow\quad \mathbf{g}^i_{,j} = -\mathbf{g}^k\,\frac{\partial \xi^i}{\partial x^I}\,\frac{\partial^2 x^I}{\partial \xi^j\,\partial \xi^k} \quad\Rightarrow\quad \mathbf{g}^i_{,j} = -\left(\mathbf{g}_{k,j}\cdot\mathbf{g}^i\right)\mathbf{g}^k = -\Gamma^i_{kj}\,\mathbf{g}^k.$$
Thus the derivative of the vector $\mathbf{a}$ when decomposed into the contravariant basis is
$$\mathbf{a}_{,j} = a_{i,j}\,\mathbf{g}^i - \Gamma^i_{kj}\,a_i\,\mathbf{g}^k = \left(a_{k,j} - \Gamma^i_{kj}\,a_i\right)\mathbf{g}^k;$$
and finally, we obtain
$$\mathbf{a}_{,j} = a_i|_j\,\mathbf{g}^i, \quad\text{where}\quad a_i|_j = a_{i,j} - \Gamma^k_{ji}\,a_k,$$
In Cartesian coordinate systems, the base vectors are independent of the coordinates, so $\Gamma_{IJK} \equiv 0$ for all $I, J, K$ and the covariant derivative reduces to the partial derivative:
$$a^I|_K = a_I|_K = a_{I,K}.$$
Therefore, in most cases, when generalising equations derived in Cartesian coordinates to other
coordinate systems we simply need to replace the partial derivatives with covariant derivatives.
The partial derivative of a scalar already exhibits covariant transformation properties because
$$\frac{\partial \phi}{\partial \chi^{\bar i}} = \frac{\partial \xi^j}{\partial \chi^{\bar i}}\,\frac{\partial \phi}{\partial \xi^j};$$
and the partial derivative of a scalar coincides with the covariant derivative
$$\phi_{,i} = \phi|_i.$$
The covariant derivative of higher-order tensors can be constructed by considering the covariant
derivative of invariants and insisting that the covariant derivative obeys the product rule.
¹³Thus, the covariant derivative exhibits tensor transformation properties, which the partial derivative does not, because
$$\frac{\partial a^{\bar i}}{\partial \chi^{\bar j}} = \frac{\partial \xi^k}{\partial \chi^{\bar j}}\,\frac{\partial}{\partial \xi^k}\!\left(\frac{\partial \chi^{\bar i}}{\partial \xi^l}\,a^l\right) = \frac{\partial \xi^k}{\partial \chi^{\bar j}}\,\frac{\partial \chi^{\bar i}}{\partial \xi^l}\,\frac{\partial a^l}{\partial \xi^k} + \frac{\partial \xi^k}{\partial \chi^{\bar j}}\,\frac{\partial^2 \chi^{\bar i}}{\partial \xi^k\,\partial \xi^l}\,a^l,$$
and the presence of the second term violates the tensor transformation rule.
1.5.2 Vector differential operators
Gradient
The gradient of a scalar field f (x) is denoted by ∇f , or gradf , and is defined to be a vector field
such that
$$f(\mathbf{x} + d\mathbf{x}) = f(\mathbf{x}) + \nabla f\cdot d\mathbf{x} + o(|d\mathbf{x}|); \tag{1.51}$$
i.e. the gradient is the vector that expresses the change in the scalar field with position.
Letting dx = th, dividing by t and taking the limit as t → 0 gives the alternative definition
$$\nabla f\cdot\mathbf{h} = \lim_{t\to 0}\frac{f(\mathbf{x}+t\mathbf{h}) - f(\mathbf{x})}{t} = \left.\frac{d}{dt}f(\mathbf{x}+t\mathbf{h})\right|_{t=0}. \tag{1.52}$$
Here, the derivative is the directional derivative in the direction h.
If we decompose the vectors into components in the global Cartesian basis equation (1.52)
becomes
$$[\nabla f]_K\,h_K = \frac{\partial f}{\partial x^K}\,\left.\frac{d}{dt}\left(x^K + t\,h_K\right)\right|_{t=0} = \frac{\partial f}{\partial x^K}\,h_K,$$
and so because $\mathbf{h}$ is arbitrary, we can define
$$\nabla f = [\nabla f]_K\,\mathbf{e}_K = \frac{\partial f}{\partial x^K}\,\mathbf{e}_K, \tag{1.53}$$
which should be a familiar expression for the gradient. Relative to the general coordinate system
ξ i equation (1.53) becomes
$$\nabla f = \frac{\partial f}{\partial \xi^i}\,\frac{\partial \xi^i}{\partial x^K}\,\mathbf{e}_K = \frac{\partial f}{\partial \xi^i}\,\mathbf{g}^i = f_{,i}\,\mathbf{g}^i, \tag{1.54}$$
so we can write the vector differential operator as $\nabla = \mathbf{g}^i\,\frac{\partial}{\partial \xi^i}$. Note that because the derivative transforms covariantly and the base vectors transform contravariantly, the gradient is invariant under coordinate transformation.
In plane polar coordinates, from Example 1.2, the contravariant base vectors are
$$\mathbf{g}^1 = \cos\xi^2\,\mathbf{e}_1 + \sin\xi^2\,\mathbf{e}_2, \qquad \mathbf{g}^2 = -\frac{\sin\xi^2}{\xi^1}\,\mathbf{e}_1 + \frac{\cos\xi^2}{\xi^1}\,\mathbf{e}_2,$$
so
$$\nabla f = \left(\cos\xi^2\,\mathbf{e}_1 + \sin\xi^2\,\mathbf{e}_2\right)\frac{\partial f}{\partial \xi^1} + \left(-\frac{\sin\xi^2}{\xi^1}\,\mathbf{e}_1 + \frac{\cos\xi^2}{\xi^1}\,\mathbf{e}_2\right)\frac{\partial f}{\partial \xi^2},$$
which can be written in the (hopefully) familiar form
$$\nabla f = \frac{\partial f}{\partial r}\,\mathbf{e}_r + \frac{1}{r}\,\frac{\partial f}{\partial \theta}\,\mathbf{e}_\theta.$$
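A symbolic spot-check of this polar gradient against direct differentiation in Cartesians; a sketch assuming SymPy, with an arbitrary test function:

```python
import sympy as sp

r, theta, X, Y = sp.symbols('r theta X Y', positive=True)
f = r**2 * sp.sin(theta)                 # arbitrary test scalar field

# Gradient from (1.54), nabla f = f_,i g^i, in plane polar coordinates
g1 = sp.Matrix([sp.cos(theta), sp.sin(theta)])
g2 = sp.Matrix([-sp.sin(theta)/r, sp.cos(theta)/r])
grad_polar = g1*sp.diff(f, r) + g2*sp.diff(f, theta)

# The same gradient computed directly in Cartesians (valid for X > 0)
f_xy = f.subs([(r, sp.sqrt(X**2 + Y**2)), (theta, sp.atan(Y/X))])
grad_cart = sp.Matrix([sp.diff(f_xy, X), sp.diff(f_xy, Y)])
grad_cart = grad_cart.subs([(X, r*sp.cos(theta)), (Y, r*sp.sin(theta))])

print(sp.simplify(grad_polar - grad_cart))   # zero vector
```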
Gradient of a Vector Field
The gradient of a vector field F (x) is a second-order tensor ∇F (x) also often written as ∇ ⊗ F
that arises when the vector differential operator is applied to a vector
$$\nabla\otimes\mathbf{F} = \mathbf{g}^i\otimes\mathbf{F}_{,i} = F^k|_i\,\mathbf{g}^i\otimes\mathbf{g}_k = F_k|_i\,\mathbf{g}^i\otimes\mathbf{g}^k.$$
Note that we have used the covariant derivative because we are taking the derivative of a vector
decomposed into the covariant or contravariant basis.
Divergence
The divergence of a vector field is the trace or contraction of the gradient of the vector field. It is
formed by taking the dot product of the vector differential operator ∇ with the vector field:
$$\mathrm{div}\,\mathbf{F} = \nabla\cdot\mathbf{F} = \mathbf{g}^i\cdot\mathbf{F}_{,i} = \mathbf{g}^i\cdot\frac{\partial}{\partial \xi^i}\!\left(F^j\,\mathbf{g}_j\right) = \mathbf{g}^i\cdot F^j|_i\,\mathbf{g}_j = \delta^i_j\,F^j|_i = F^i|_i,$$
so that
$$\mathrm{div}\,\mathbf{F} = F^i|_i = F^i_{,i} + \Gamma^i_{ji}\,F^j = F^i_{,i} + \Gamma^i_{ij}\,F^j. \tag{1.55}$$
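In plane polars, formula (1.55) together with the Christoffel symbols computed earlier reproduces the familiar polar divergence; a sketch assuming SymPy, with arbitrary component functions:

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
Fr = sp.Function('Fr')(r, theta)    # contravariant component F^1
Ft = sp.Function('Ft')(r, theta)    # contravariant component F^2

# Equation (1.55) with the plane-polar Christoffel symbols:
# Gamma^2_21 = 1/r is the only one contributing to Gamma^i_ij F^j,
# giving an extra F^1 / r.
div_F = sp.diff(Fr, r) + sp.diff(Ft, theta) + Fr/r

# Compare with the equivalent conservative form (1/r) d(r F^1)/dr + dF^2/dtheta
print(sp.simplify(div_F - (sp.diff(r*Fr, r)/r + sp.diff(Ft, theta))))  # 0
```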
Curl
The curl of a vector field is the cross product of our vector differential operator ∇ with the vector
field
$$\mathrm{curl}\,\mathbf{F} = \nabla\times\mathbf{F} = \mathbf{g}^i\times\frac{\partial \mathbf{F}}{\partial \xi^i} = \mathbf{g}^i\times\frac{\partial\left(F_j\,\mathbf{g}^j\right)}{\partial \xi^i} = \mathbf{g}^i\times F_j|_i\,\mathbf{g}^j = \epsilon^{ijk}\,F_j|_i\,\mathbf{g}_k.$$
Laplacian
The Laplacian of a scalar field $f$ is defined by
$$\nabla^2 f = \Delta f = \nabla\cdot\nabla f.$$
From the expressions for the gradient and divergence above, $\nabla\cdot\nabla f = \left(g^{ij}\,f_{,i}\right)\!\big|_j$, because the partial and covariant derivatives are the same for scalar quantities. Now $g^{ij}|_j$ is a tensor (of first order), which means that we know how it transforms under a change in coordinates, but in Cartesian coordinates $g^{ij}|_j$ becomes $\delta^{IJ}_{,J} = 0$, so $g^{ij}|_j = 0$ in all coordinate systems¹⁴. Thus,
$$\nabla^2 f = \left(g^{ij}\,f_{,i}\right)\!\big|_j = \frac{1}{\sqrt{g}}\,\frac{\partial}{\partial \xi^j}\!\left(\sqrt{g}\,g^{ij}\,\frac{\partial f}{\partial \xi^i}\right).$$
The last equality is not obvious and is proved in the next section, equation (1.58).

¹⁴This is a special case of Ricci's Lemma.
1.6 Divergence theorem
The divergence theorem is a generalisation of the fundamental theorem of calculus into higher
dimensions. Consider the volume given by a ≤ ξ 1 ≤ b, c ≤ ξ 2 ≤ d and e ≤ ξ 3 ≤ f . If we want to
calculate the outward flux of a vector field F (x) from this volume then we must calculate
$$\iint \mathbf{F}\cdot\mathbf{n}\,dS,$$
where n is the outer unit normal to the surface and dS is an infinitesimal element of the surface
area.
From section 1.4.1, we know that on the faces $\xi^1 = a, b$,
$$dS = \sqrt{g\,g^{11}}\,d\xi^2\,d\xi^3,$$
and a normal to the surface is given by the contravariant base vector $\mathbf{g}^1$, with length $|\mathbf{g}^1| = \sqrt{\mathbf{g}^1\cdot\mathbf{g}^1} = \sqrt{g^{11}}$. On the face $\xi^1 = a$ the vector $\mathbf{g}^1$ is directed into the volume (it is oriented along the line of increasing coordinate) and on $\xi^1 = b$, $\mathbf{g}^1$ is directed out of the volume, so the net outward flux through these faces is
$$\int_c^d\!\!\int_e^f \mathbf{F}(a,\xi^2,\xi^3)\cdot\left(-\mathbf{g}^1/\sqrt{g^{11}}\right)\sqrt{g\,g^{11}}\,d\xi^2\,d\xi^3 + \int_c^d\!\!\int_e^f \mathbf{F}(b,\xi^2,\xi^3)\cdot\left(+\mathbf{g}^1/\sqrt{g^{11}}\right)\sqrt{g\,g^{11}}\,d\xi^2\,d\xi^3$$
$$= \int_c^d\!\!\int_e^f \left[F^1(b,\xi^2,\xi^3)\,\sqrt{g(b,\xi^2,\xi^3)} - F^1(a,\xi^2,\xi^3)\,\sqrt{g(a,\xi^2,\xi^3)}\right]d\xi^2\,d\xi^3.$$
From the fundamental theorem of calculus, the integral becomes
$$\int_c^d\!\!\int_e^f\!\!\int_a^b \left(F^1\sqrt{g}\right)_{,1}\,d\xi^1\,d\xi^2\,d\xi^3,$$
and using similar arguments, the total outward flux from the volume is given by
$$\iint \mathbf{F}\cdot\mathbf{n}\,dS = \int_a^b\!\!\int_c^d\!\!\int_e^f \left(F^i\sqrt{g}\right)_{,i}\,d\xi^1\,d\xi^2\,d\xi^3.$$
From section 1.4.1 we also know that an infinitesimal volume element is given by $dV = \sqrt{g}\,d\xi^1\,d\xi^2\,d\xi^3$, so
$$\iint \mathbf{F}\cdot\mathbf{n}\,dS = \iiint \frac{1}{\sqrt{g}}\left(F^i\sqrt{g}\right)_{,i}\,\sqrt{g}\,d\xi^1\,d\xi^2\,d\xi^3 = \iiint \frac{1}{\sqrt{g}}\left(F^i\sqrt{g}\right)_{,i}\,dV.$$
The volume integrand is
$$\frac{1}{\sqrt{g}}\left(F^i\sqrt{g}\right)_{,i} = F^i_{,i} + \frac{1}{\sqrt{g}}\,F^i\left(\sqrt{g}\right)_{,i},$$
but
$$\frac{1}{\sqrt{g}}\left(\sqrt{g}\right)_{,i} = \frac{1}{2g}\,\frac{\partial g}{\partial \xi^i} = \frac{1}{2g}\,\frac{\partial g}{\partial g_{jk}}\,\frac{\partial g_{jk}}{\partial \xi^i},$$
by the chain rule. From equations (1.42) and (1.44)
$$g_{jk}\,g^{kl} = g_{jk}\,\frac{D^{kl}}{g} = \delta^l_j \quad\Rightarrow\quad g_{jk}\,D^{kl} = g\,\delta^l_j \quad\Rightarrow\quad \frac{\partial g}{\partial g_{lk}} = D^{kl} = g\,g^{kl} = g\,g^{lk}, \tag{1.56}$$
which means that
$$\frac{1}{\sqrt{g}}\left(\sqrt{g}\right)_{,i} = \frac{1}{2g}\,g\,g^{jk}\,\frac{\partial g_{jk}}{\partial \xi^i} = \frac{1}{2}\,g^{jk}\left(\mathbf{g}_{j,i}\cdot\mathbf{g}_k + \mathbf{g}_j\cdot\mathbf{g}_{k,i}\right) = \frac{1}{2}\left(\mathbf{g}_{j,i}\cdot\mathbf{g}^j + \mathbf{g}_{k,i}\cdot\mathbf{g}^k\right) = \frac{1}{2}\left(\Gamma^j_{ji} + \Gamma^k_{ki}\right) = \Gamma^j_{ji}, \tag{1.57}$$
from the definition of the Christoffel symbol (1.49). Thus the volume integrand becomes
$$\frac{1}{\sqrt{g}}\left(F^i\sqrt{g}\right)_{,i} = F^i_{,i} + F^i\,\Gamma^j_{ji} = F^i|_i, \tag{1.58}$$
using the definition of the divergence (1.55). In fact, one should argue that the definition of the
divergence is chosen so that it is the quantity that appears within the volume integral in the
divergence theorem. The form of the theorem that we shall use most often is
$$\iint F^i\,n_i\,dS = \iint F_i\,n^i\,dS = \iiint F^i|_i\,dV = \iiint \frac{1}{\sqrt{g}}\left(F^i\sqrt{g}\right)_{,i}\,dV. \tag{1.59}$$
You might argue that the proof above is only valid for simple parallelepipeds in our general coordinate system, but we can extend the result to more general volumes by breaking the volume up into regions, each of which can be described by a parallelepiped in a general coordinate system.
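A concrete check of (1.59) on an annulus in plane polars, where $\sqrt{g} = r$; the field and limits are illustrative choices (SymPy assumed):

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
a, b = 1, 2                       # inner and outer radii of the annulus
Fr = r*sp.sin(theta)**2           # contravariant component F^1; F^2 = 0

# Volume integral of F^i|_i dV = (F^1 sqrt(g))_,1 dxi^1 dxi^2 with sqrt(g) = r
vol = sp.integrate(sp.diff(Fr*r, r), (r, a, b), (theta, 0, 2*sp.pi))

# Outward flux: F . (g^1/|g^1|) dS = F^1 sqrt(g g^11) = F^1 * r on r = const
flux = (Fr*r).subs(r, b) - (Fr*r).subs(r, a)
flux = sp.integrate(flux, (theta, 0, 2*sp.pi))
print(sp.simplify(vol - flux))    # 0: the two sides of (1.59) agree
```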
where the vector area is dS = n dS, and n is a unit vector normal to the surface. Again we can
make the argument that the definition of the curl is chosen so that this relationship is satisfied.
You may be interested to know that an entire theory (differential forms) can be constructed in which the Stokes theorem is a fundamental result,
$$\int_{\partial C} \omega = \int_C d\omega,$$
where ω is an n-form and dω is its differential. The theory encompasses all known integral theorems
because the appropriate n-dimensional differential forms coincide with the divergence and curl and
the theory provides a rational extension to higher dimensions.