The $25,000,000,000 Eigenvector: The Linear Algebra Behind Google
The $25,000,000,000 Eigenvector: The Linear Algebra Behind Google
The $25,000,000,000 Eigenvector: The Linear Algebra Behind Google
2006 SocietyforIndustrial
andAppliedMathematics
Vol.48,No. 3, pp.569-581
Abstract. Google's success derives in large part from its PageRank algorithm, which ranks the im
portance of web pages according to an eigenvector of a weighted link matrix. Analysis
of the PageRank formula provides a wonderful applied topic for a linear algebra course.
Instructors may assign this article as a project to more advanced students or spend one
or two lectures presenting the material with assigned homework from the exercises. This
material also complements the discussion of Markov chains in matrix algebra. Maple and
Mathematica files supporting this material can be found at www.rose-hulman.edu/-bryan.
DOI. 10.1137/050623280
1. Introduction. When Google went online in the late 1990s, one thing that set
it apart from other search engines was that its search result listings always seemed to
deliver the "good stuff" up front.With other search engines you often had to wade
through screen after screen of links to irrelevant web pages that just happened to
match the search text. Part of the magic behind Google is its PageRank algorithm,
which quantitatively rates the importance of each page on the web, allowing Google
to rank the pages and thereby present to the user the more important (and typically
most relevantand helpful)pages first.
Understanding how to calculate PageRank is essential for anyone designing a web
page that they want people to access frequently, since getting listed first in a Google
search leads tomany people looking at your page. Indeed, due to Google's prominence
as a search engine, its ranking system has had a deep influence on the development and
structure of the Internet and on what kinds of information and services get accessed
most frequently. Our goal in this paper is to explain one of the core ideas behind how
Google calculates web page rankings. This turns out to be a delightful application of
standard linear algebra.
Search engines such as Google have to do three basic things:
1. Crawl the web and locate all web pages with public access.
2. Index the data from step 1, so that they can be searched efficiently for relevant
key words or phrases.
*
Received by the editors January 25, 2005; accepted for publication (in revised form) January 5,
2006;published electronically August 1, 2006. $25,000,000,000 was the approximate market value of
Google when the company went public in 2004.
http://www.siam.org/journals/sirev/48-3/62328.html
"("Department ofMathematics, Rose-Hulman Institute of Technology, Terre Haute, IN 47803 (kurt.
bryan@rose-hulman.edu).
*Mathematics and Computer Science Amherst MA 01002
Department, College, Amherst, (tleise?
amherst.edu).
569
This content downloaded by the authorized user from 192.168.52.61 on Wed, 28 Nov 2012 15:52:29 PM
All use subject to JSTOR Terms and Conditions
570 KURT BRYANAND TANYALEISE
1 ~3
Fig. I An example of a web with only four pages. An arrow from page A to page B indicates a link
from page A to page B.
3. Rate the importance of each page in the database, so that when a user does
a search and the subset of pages in the database with the desired information
has been found, the more important pages can be presented first.
This paper will focus on step 3. In an interconnected web of pages, how can one
meaningfully define and quantify the "importance" of any given page?
The rated importance of web pages is not the only factor in how links are pre
sented, but it is a significant one. There are successful ranking algorithms other than
PageRank. The interested reader will find a wealth of information about ranking
algorithms and search engines, and we list just a few references for getting started
(see the extensive bibliography in [9], for example, for a more complete list). For a
brief overview of how Google handles the entire process, see [6], and for an in-depth
treatment of PageRank, see [3] and a companion article [9]. Another article with
good concrete examples is [5]. For more background on PageRank and explanations
of essential principles of web design tomaximize a website's PageRank, go to the web
sites [4, 11, 14]. To find out more about search engine principles in general and other
ranking algorithms, see [2] and [8]. Finally, for an account of some newer approaches
to searching the web, see [12] and [13].
2.1. The Basic Idea. Inwhat follows we will use the phrase "importance score" or
just "score" for any quantitative rating of a web page's importance. The importance
score for any web page will always be a nonnegative real number. A core idea in
assigning a score to any given web page is that the page's score is derived from the
links made to that page from other web pages. The links to a given page are called
the backlinks for that page. The web thus becomes a democracy where pages vote for
the importance of other pages by linking to them.
Suppose the web of interest contains n pages, each page indexed by an integer k,
1 < k < n. A typical example is illustrated in Figure 1, in which an arrow from page
A to page B indicates a link from page A to page B. Such a web is an example of a
directed graph.1 We'll use Xk to denote the importance score of page k in the web. Xk
is nonnegative and xj > Xk indicates that page j ismore important than page k (so
xj = 0 indicates that page j has the least possible importance score).
1A graph consists of a set of vertices (in this context, the web pages) and a set of edges. Each
edge joins a pair of vertices. The graph is undirected if the edges have no direction. The graph is
directed if each edge (in the web context, the links) has a direction, that is, a starting and an ending
vertex.
This content downloaded by the authorized user from 192.168.52.61 on Wed, 28 Nov 2012 15:52:29 PM
All use subject to JSTOR Terms and Conditions
EIGENVECTOR
THE $25,000,000,000 571
Xk = E Z'
n
(1ELk
where nj is the number of outgoing links from page j (which must be positive since if
j E Lk, then page j links to at least page k). We will assume that a link from a page
to itselfwill not be counted. In this "democracy of the web," you don't get to vote
foryourself!
Let's apply this approach to the four-page web of Figure 1. For page 1, we have
Xl = X3/1 + X4/2, since pages 3 and 4 are backlinks for page 1 and page 3 contains
only one link, while page 4 contains two links (splitting its vote in half). Similarly,
X2 = X1/3, X3 = xi/3 X2/2 + X4/2, and X4 x1/3 + x2/2. These linear equations
can be written Ax = x, where x [x1 X2 X3 X4]T and
oO I 1 -
100 0
(2) A= 3[ 2
L 3 2 J
This transforms the web ranking problem into the "standard" problem of finding an
eigenvector for a square matrix! (Recall that the eigenvalues A and eigenvectors x of
a matrix A satisfy the equation Ax = Ax, x 74 0, by definition.) We thus seek an
eigenvectorx with eigenvalue 1 forthematrix A. We will refertoA as the "link
matrix" forthegivenweb.
This content downloaded by the authorized user from 192.168.52.61 on Wed, 28 Nov 2012 15:52:29 PM
All use subject to JSTOR Terms and Conditions
572 KURT BRYANAND TANYALEISE
It turns out that the link matrix A in (2) does indeed have eigenvectors with
eigenvalue 1, namely, all multiples of the vector [12 4 9 6]T (recall that any nonzero
multiple of an eigenvector is again an eigenvector). Let's agree to scale these "impor
tance score eigenvectors" so that the components sum to 1. In this case we obtain
1 ; 0.129 X3 9 = 6
X =32
31 0.387, X2 =4? 31 29 3 ,= 0.290, andX4
0.29, x 0.194. Note that
this ranking differs from that generated by simply counting backlinks. It might seem
surprising that page 3, linked to by all other pages, is not the most important. To
understand this, note that page 3 links only to page 1 and so casts its entire vote for
page 1. This, with the vote of page 2, results in page 1 getting the highest importance
score.
More generally, the matrix A for any web must have 1 as an eigenvalue if the
web has no dangling nodes (pages with no outgoing links). To see this,
in question
firstnote that for a general web of n pages, formula (1) gives rise to a matrix A with
= 1/nj if page j links to page i, and Aij = 0 otherwise. The jth column of A
Aid
then contains nj nonzero entries, each equal to 1/nj, and the column thus sums to 1.
This motivates the following definition, used in the study ofMarkov chains.
DEFINITION 1. A square matrix is called a column-stochastic matrix if all of its
entries are nonnegative and the entries in each column sum to 1.
The matrix A for a web with no dangling nodes is column-stochastic. We now
prove the following.
PROPOSITION 1. Every column-stochastic matrix has 1 as an eigenvalue.
Proof. Let A be an n x n column-stochastic matrix and let e denote an n
dimensional column vector with all entries equal to 1. Recall that A and its transpose
AT have the same eigenvalues. Since A is column-stochastic, it is easy to see that
ATe = e, so that 1 is an eigenvalue forAT and hence forA. I
In what follows, we use Vi (A) to denote the eigenspace for the eigenvalue 1 of a
matrixA.
column-stochastic
2.2. Shortcomings. Several difficulties arise with using formula (1) to rank web
sites. In this section we discuss two issues: webs with nonunique rankings and webs
with danglingnodes.
2.2.1. Nonunique Rankings. For our rankings, it is desirable that the dimension
of VI (A) equal 1, so that there is a unique eigenvector x with Ei xi = 1 that we can
use for importance scores. This is true in the web of Figure 1 and more generally is
always true for the special case of a strongly connected web (that is, you can get from
any page to any other page in a finite number of steps); see Exercise 10 below.
Unfortunately, it is not always true that the link matrix A will yield a unique
ranking for all webs. Consider the web in Figure 2, forwhich the linkmatrix is
0 1 0 0 0
I 0 0 0 0
2
2
L0 0 0 0 0 J
We find here that V1(A) is two-dimensional; one possible pair of basis vectors is x =
[1/2, 1/2, 0, 0,o]T and y = [0,0, 1/2,1/2, O]T. But note that any linear combination of
In 3
these two vectors yields another vector inV1(A), e.g., ,-[3/8, 3/8, 1/8, 1/8, n]T.
we shoulduse forthe rankings!
It is not clearwhich, ifany,of theseeigenvectors
This content downloaded by the authorized user from 192.168.52.61 on Wed, 28 Nov 2012 15:52:29 PM
All use subject to JSTOR Terms and Conditions
THE $25,000,000,000
EIGENVECTOR 573
1 3
2 4
Fig. 2 A web of five pages, consisting of two disconnected "subwebs" Wi (pages 1 and 2) and W2
(pages 3, 4, 5).
It is no coincidence that for the web of Figure 2 we find that dim(Vi (A)) > 1. It is
a consequence of the fact that ifa web W, considered as an undirected graph (ignoring
which direction each arrow points), consists of r disconnected subwebs WI,. .Wr,
,
then dim(Vi (A)) > r, and hence there is no unique importance score vector x E V1 (A)
with Ei xi = 1. This makes intuitive sense: if a web W consists of r disconnected
subwebs WI, . .Wr,, then one would expect difficulty in finding a common reference
frame for comparing the scores of pages in one subweb with those in another subweb.
Indeed, it is not hard to see why a web W consisting of r disconnected subwebs
forces dim(V1 (A)) > r. Suppose a web W has n pages and r component subwebs
W.,. . .,Wr. Let ni denote the number of pages inWi. Index the pages inW1 with
indices 1 through nl, the pages inW2 with indices n1 + 1 through n1 + n2, the pages
inW3 with n1 + n2 + 1 through ni + n2 + n3, etc. In general, let Ni = _ nj for
i> 1,with No = 0, so Wi contains pages Ni-, + 1 through Ni. For example, in the
web of Figure 2, we can take N1 = 2 and N2 = 5, so Wi contains pages 1 and 2 and
W2 contains pages 3,4, and 5. The web in Figure 2 is a particular example of the
general case, inwhich the matrix A assumes a block diagonal structure
[A1 0 ... 0
0 A2 0 0
o :
o o 0 Alr
where Ai denotes the link matrix forWi. In fact, Wi can be considered as a web
in its own right. Each ni x ni matrix Ai is column-stochastic and hence possesses
some eigenvector v2 E Rni with eigenvector 1. For each i between 1 and r construct
a vector wi E in which has 0 components for all elements corresponding to blocks
other than block i. For example,
w4 I w2= [0J
This content downloaded by the authorized user from 192.168.52.61 on Wed, 28 Nov 2012 15:52:29 PM
All use subject to JSTOR Terms and Conditions
574 KURT BRYANAND TANYALEISE
0
Awi = A vi wi.
0
This content downloaded by the authorized user from 192.168.52.61 on Wed, 28 Nov 2012 15:52:29 PM
All use subject to JSTOR Terms and Conditions
THE $25,000,000,000
EIGENVECTOR 575
3.1. A Modification to the Link Matrix A. For an n-page web with no dangling
nodes, we can generate unambiguous importance scores as follows, including the case
of a web with multiple subwebs.
Let S denote an n x n matrix with all entries 1/n. The matrix S is column
stochastic, and it is easy to check that VI(S) is one-dimensional. We will replace the
matrix A with the matrix
where s is a column vector with all entries 1/n. Note that Sx = s if>i xi = 1.
We will provebelow thatV (M) is alwaysone-dimensional,
but firstlet's lookat
a couple of examples.
This content downloaded by the authorized user from 192.168.52.61 on Wed, 28 Nov 2012 15:52:29 PM
All use subject to JSTOR Terms and Conditions
576 KURTBRYANAND TANYALEISE
Example 1. For the web of four pages in Figure 1 with matrix A given by (2),
the new formula gives (with m - 0.15)
n n
Sum both sides of inequality (6) from i = 1 to i = n, and swap the i and j summations.
Then use the fact thatM is column-stochastic (i Mij = 1 for all j) to find
n n n n n( \ n
a contradiction. Hence x cannot contain both positive and negative elements. If xi > 0
forall i (and not all xi are 0), thenxi > 0 followsimmediatelyfromx, = ,n=1M
andMi > 0. Similarlyxi,< 0 forall i impliesthateach xi < 0. 5
This content downloaded by the authorized user from 192.168.52.61 on Wed, 28 Nov 2012 15:52:29 PM
All use subject to JSTOR Terms and Conditions
EIGENVECTOR
THE $25,000,000,000 577
This content downloaded by the authorized user from 192.168.52.61 on Wed, 28 Nov 2012 15:52:29 PM
All use subject to JSTOR Terms and Conditions
578 KURT BRYANAND TANYALEISE
EXERCISE 11. Consider again the web in Figure 1, with the addition of a page 5
that links to page 3, where page 3 also links to page 5. Calculate the new ranking by
findingtheeigenvectorofM (correspondingto A = 1) thathas positivecomponents
summing to 1. Use m = 0.15.
EXERCISE 12. Add a sixth page that links to every page of theweb in the previous
exercise, but to which no other page links. Rank the pages using A, then ussingM
with m = 0.15, and compare the results.
EXERCISE 13. Constructa web consistingof twoormore subwebsand determine
therankinggiven byformula(3).
As of January 2005, the web contains at least eight billion pages how does one
computean eigenvectorforan eightbillionby eightbillionmatrix? One reasonable
approach is an iterative procedure called the power method (along with modifications),
which we will now examine for the special case at hand. It isworth noting that there
ismuch additional analysis one can do, and there are many improved methods for the
computation of PageRank. The reference [7]provides a typical example and additional
references.
4. Computing the Importance Score Eigenvector. The roughidea behind the
power method2 for computing an eigenvector of a matrix M is this: One starts with
a "typical" vector x0 then generates the sequence Xk = Mxk-l (so Xk = Mkxo) and
lets k approaches infinity. The vector xk is, to good approximation, an eigenvector
for the dominant (largest-magnitude) eigenvalue ofM. However, depending on the
magnitude of this eigenvalue, the vector xk may also grow without bound or decay
to the zero vector. One thus typically rescales at each iteration, say, by computing
can be vector norm.
xk = MXk-1,
where any
However, to guarantee that the power method converges at a reasonable rate, one
typically requires that the dominant eigenvalue A =1 forM be simple. That is, the
characteristic polynomial forM should be of the form p(A) =(A - 1)q(A) for some
polynomial q(A) of degree n - 1, where (A - 1) does not divide q(A). Alternatively,
M should possess no "generalized eigenvectors" for the eigenvalue A = 1 other than
the eigenvectors in V1(M) (see [1] formore on generalized eigenvectors).
Actually, the following proposition provides what we need to prove that the power
method converges in this case, but with no explicit reference to the algebraic nature
of the dominant A =1 eigenvalue. It can also be used to show that this eigenvalue is
simple.
DEFINITION v =
4. The 1-norm of a vector v is lvlll j Ivil.
PROPOSITION 4. Let M be a positive column-stochastic n x n matrix and letV
denote the subspace of RWnconsisting of vectors v such that ,j v. = 0. Then Mv EV
for any v c V, and
< cllvlli
11MvII1
for any v e V, where c = max,<j<n 11-2 minl<i<n Mii I< 1.
Proof. To see that Mv E V is straightforward: Let w = Mv, so that wi =
n1 Mijvj and
n n n n n \ n
This content downloaded by the authorized user from 192.168.52.61 on Wed, 28 Nov 2012 15:52:29 PM
All use subject to JSTOR Terms and Conditions
THE $25,000,000,000
EIGENVECTOR 579
n n n
where ei = sgn(wi). Note that the ei are not all of one sign, since Ei wi = 0 (unless
w _ 0, inwhich case the bound clearly holds). Reverse the double sum to obtain
n n n
(7) Iw1il= Evj E( eiMij) = ajvj
j=l i=l j=l_
where aj = E>=1 eiMij. Since the ei are of mixed sign and >j Mij = 1 with 0 <
Mij < 1, it is easy to see that
Let c = max1 <j<n 11- 2min,<i<nMi3j. Observe that c < 1 and 1aj1 < c for all j.
From (7) we have
n n n n
=
Iwli =5 ajvj Eaj vj < E = cfVII,
j=1 j=1 j=1 j=1
This content downloaded by the authorized user from 192.168.52.61 on Wed, 28 Nov 2012 15:52:29 PM
All use subject to JSTOR Terms and Conditions
580 KURT BRYANAND TANYALEISE
k
____ I[Mkxo-q|ll IMkxo-qlll
f~~~~JMklx0-q~J1
0 0.62
1 0.255 0.411
5 0.133 0.85
10 0.0591 0.85
50 8.87 x 10-5 0.85
It is clear that the bound IlMkxo - qlll < ckIIxo - qlll is rather pessimistic (note that
0.85 is the value 1 - m, and 0.85 turns out to be the second-largest eigenvalue for
M). One can show that in general the power method will converge asymptotically
according to MxkII - q IIl A2
I Ix-q
I IIl, where A2 is the second-largest eigenvalue
ofM. Moreover, forM of the formM = (1 - m)A + mS with A column-stochastic
and all Sij = 1/n, it can be shown that IA21< 1 - m (see, e.g., [1, Theorem 5.10]).
As a result, the power method will converge much more rapidly than indicated by
ck IIXo-qI1-. Nonetheless, the value of c in Proposition 4 provides a very simple bound
on the convergence of the power method here. It is easy to see that since all entries
ofM are at least m/n, we will always have c < 1 - 2m/n in Proposition 4.
As a practical matter, note that the n x n positive matrix M has no nonzero ele
ments, so the multiplication Mv for v E lRnwill typically take Q(nr2) multiplications
and additions, a formidable computation ifn = 8,000,000,000. But (4) shows that ifx
ispositivewith Ixli Mx
1, thenthemultiplication isequivalentto (1- m)Ax+ms.
This efficient computation, since A can be expected to contain mostly
is a farmore
zeros (most web pages link to only a few other pages). We've now proved our main
theorem.
THEOREM 5. The matrix M defined by (3) for a web with no dangling nodes will
always be a positive column-stochastic matrix and so have a unique q with positive
components such thatMq q and Ei qi = 1. The vector q may be computed as
the limit of iterations xk (1 - m)Axk-1 + ins, where xo is any initial vector with
positivecomponentsand lIxoll 1.
The eigenvector x defined by (4) also has a probabilistic interpretation. Consider
a web-surfer on a web of n pages with no dangling nodes. The surfer begins at
some web page (it doesn't matter where) and randomly moves fromweb page to web
page according to the following procedure: If the surfer is currently at a page with
r outgoing links, he either randomly chooses any one of these links with uniform
probability 1rm or he jumps to any randomly selected page on the web, each with
probability n (note that r 1 -m+ nm= 1, so this accounts for everything he can do).
The surfer repeats this page-hopping procedure ad infinitum. The component xj of
the normalized vector x in (4) is the fraction of time that the surfer spends, in the
long run, on page j of the web. More important pages tend to be linked to by many
other pages and so the surfer hits those most often.
EXERCISE 14. For the web inExercise 11, compute the values of flMkxo-q|ll and
IkMx1oqlllI for k = 1,5, 10,50 using an initial guess xo not too close to the actual
eigenvector q (so that you can watch the convergence). Determine c = max,<?<? 1 -
2min1i<<n MijI and the absolute value of the second-largest eigenvalue ofM.
EXERCISE 15. To see why the second-largest eigenvalue plays a role in bounding
x
lMk-1?Xqqlll nconsider an n n positive column-stochastic matrix M that is diagonal
izable. Let xo be any vector with nonnegative components that sum to 1. Since M
I}
is diagonalizable, we can create a basis of eigenvectors {q,vi 4... . vf"r where q is
This content downloaded by the authorized user from 192.168.52.61 on Wed, 28 Nov 2012 15:52:29 PM
All use subject to JSTOR Terms and Conditions
EIGENVECTOR
THE $25,000,000,000 581
?2 2
A=[O O
Show thatM = (1 - m)A + mS (all Sij = 1/3) is not diagonalizable for 0 < m < 1.
EXERCISE 17. How shouldthevalue ofm be chosen?How does thischoiceaffect
therankingsand thecomputationtime?
REFERENCES
This content downloaded by the authorized user from 192.168.52.61 on Wed, 28 Nov 2012 15:52:29 PM
All use subject to JSTOR Terms and Conditions