Comparative Study of Page Rank and Weighted Page Rank Algorithm
Comparative Study of Page Rank and Weighted Page Rank Algorithm
Comparative Study of Page Rank and Weighted Page Rank Algorithm
www.ijircce.com
2929
ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
The query engine is responsible for receiving and filling search requests from user. When a user fires a query, query engine
receives it and after matching the query keywords with the index, returns the URLs of the pages to the user.
In general Query Engine may return several hundreds or thousands of URL that match the keywords for a given query.
But often users look at top ten results that can be seen without scrolling. Users seldom look at results coming after first
search result page, which means that results which are not among top ten are nearly invisible for general user. Therefore to
provide better search result, page ranking mechanisms are used by most search engines for putting the important pages on
top leaving the less important pages in the bottom of result list. Some of the common page ranking algorithms are
PageRank Algorithm [2, 3], Weighted PageRank Algorithm [4] and Hyperlinked Induced Topic search Algorithm [5].
The aim of the paper is to analyse the two popular web page ranking algorithms- Weighted PageRank algorithm and
PageRank algorithm and to provide a comparative study of both and to highlight their relative strengths and limitations.
II. PAGE RANKING ALGORITHMS
To present the documents in an ordered manner, Page Ranking methods are applied, which can arrange the documents in
order of their relevance, importance and content score. Search engines use two different kinds of ranking factors: querydependent factors and query Independent Factors .Query-dependent are all ranking factors that are specific to a given query,
while query-independent factors are attached to the documents, regardless of a given query. Query-dependent factors used
by search engines are measures such as word documents frequency, the position of the query terms within the document or
the inverted document frequency, which are all measures that are used in traditional Information Retrieval. Some of the
query independent factors are Link popularity, Click popularity and uptodateness etc. Ranking algorithms based on link
popularity, falls under Link based ranking algorithm category.
III. LINK BASED PAGE RANKING ALGORITHMS
Link based Page ranking algorithms are based on link structure of the web document. They view the web as a directed
graph where the web pages form the nodes and the hyperlinks between the web pages form the directed edges between
these nodes [6]. Two important Link Based Page algorithms are given below :
Copyright to IJIRCCE
www.ijircce.com
2930
ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
PageRank[2]
Weighted PageRank [5].
A. PageRank
Brin and Page [2] developed a ranking algorithm at Stanford University named PageRank after Larry Page. Page Rank
algorithm uses link structure to determine the importance of web page. This algorithm is based on random surfer model.
The random surfer model assumes that a user randomly keeps on clicking the links on a page and if she/he get bored of a
page then switches to another page randomly. Thus, a user under this model shows no bias towards any page or link.
PageRank(PR) is the probability of a page being visited by such user under this model.
Page Rank algorithm assumes that if a page has a link to another page then it votes for that page. Therefore, each inlink
to a page raises its importance. PageRank is a recursive algorithm in which the PageRank of a page depends upon the
PageRank of the pages linking to it. Thus, not only the number of inlinks of a page influences its ranking but also the page
ranks of the pages linking to it. A page confers importance to the pages it references to by evenly distributing its PageRank
value among all its outlinks.
The PageRank of page P is given as, follows:
Where
N0...Nn are the pages that point to page P.
O(Ni) is defined as the number of links going out of page P.
The parameter d is a Damping factor which can be set between 0 and 1.
Damping factor, d is the probability of users following the direct links and 1- d denotes the rank distribution from non
directly linked web pages. It is usually set to 0.85. So it is easy to infer that every page distributes 85% of its original
PageRank evenly among all pages to which it points. As is evident from the above equations, even if a page doesnt have
any inlinks it still has a minimum PR value of 1-d.
Following is a simplified example of the PR algorithm. Consider web graph shown in fig3.
S
Fig 3
Page, P being referenced by pages Q and R .Q, R has 2,3 outlinks. Then pageRank value of the page P is given as:
PR(P)=1-d + d(PR(Q)/2 + PR(R)/3)
Iterative Method of Page Rank
In iterative calculation, each page is assigned a starting page rank value of 1. These rank values are iteratively
Copyright to IJIRCCE
www.ijircce.com
2931
ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
substituted in page rank equations to find the final values. In general, much iteration could be followed to normalize the
page ranks.
The PageRank algorithm can be iteratively applied as:
1) Initially let Page rank of all web pages is one
2) Calculate page ranks of all pages by using above formula.
3) Repeat step 2 until values of two consecutive iterations match.
1) Advantages:
Less Time consuming:- As pageRank is a query independent algorithm i.e. it precomputes the rank score
so it takes very less time .
Feasibility:-This algorithm is more feasible as it computes rank score at indexing time not at query time.
Importance: - It returns important pages as Rank is calculated on the basis of the popularity of a page.
Less susceptibility to localized links: - For calculating rank value of a page, it consider the entire web
graph, rather than a small subset, it is less susceptible to localized link spam.
2) Disadvantages:
The main disadvantage is that it favours older pages, because a new page, even a very good one, will not
have many links unless it is part of an existing web site.
Relevancy of the resultant pages to the user query is very less as it does not consider the content of web
page.
Dangling link: This occurs when a page contains a link such that the hypertext points to a page with no
outgoing links. Such a link is known as Dangling Link.
Rank Sinks: The Rank sinks problem occurs when in a network pages get in infinite link cycles.
Dead Ends: Dead Ends are simply pages with no outgoing links.
Spider Traps: Another problem in PageRank is Spider Traps. A group of pages is a spider trap if there are
no links from within the group to outside the group.
Circular References: If you have circle references in your website, then it will reduce your front pages
PageRank.
A. Weighted PageRank
Wenpu Xing and Ali Ghorbani [5] projected a Weighted PageRank (WPR) algorithm which is a modification to the
PageRank algorithm. This algorithm assigns a larger rank values to the more important pages rather than dividing the rank
value of a page evenly among its outgoing linked pages. Each outgoing link gets a value proportional to its importance. The
importance is assigned in terms of weight values to the incoming and outgoing links and are denoted as Win(u,v) and
Wout(u,v) respectively.
Win(u,v)is the weight of link (u, v) calculated based on the number of incoming links of page v and the number of incoming
links of all orientations pages of page u.
Where Iv and Ip are the number of incoming links of page v and page p respectively. R (u) denotes the allusion page list of
page u.
Copyright to IJIRCCE
www.ijircce.com
2932
ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
Wout(u,v) is the weight of link (u,v) calculated based on the number of outgoing links of page v and the number of outgoing
links of all orientations pages of page u.
O
Fig 4.
1) Advantages:
Quality: Quality of the pages returned by this algorithm is high as compared to pageRank algorithm.
Efficiency: It is more efficient than pageRank because rank value of a page is divided among its outlink
pages according to importance of that page.
2) Disadvantages:
Less Relevant: As this algorithm considers only link structure not the content of the page, it returns less
relevant pages to the user query.
IV. EXPERIMENTS
The WPR and pageRank algorithms were implemented and their results are being compared. Fig 5 illustrates different
components involved in the implementation and evaluation of these algorithms. The simulation studies carried out in this
work consist of following activities:
1. Finding a web site: Finding a web site with rich hyperlinks is necessary because the standard PageRank and the
WPR algorithms rely on the web structure. Insert these pages into database.
2. Extracting URL : This module will extract links within the given page i.e outlinks of a web page. Then these
outlink details are also added in the database.
Copyright to IJIRCCE
www.ijircce.com
2933
ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
Fig 5. Architectural components of the system used for implementation and evaluation
3.
4.
5.
Finding Inlinks: From these extracted outlinks, inlinks can be found and stored in database.
Applying algorithms: The Standard PageRank and the WPR algorithms are applied to the pages stored in the
database.
Evaluating the results: The algorithms are evaluated by comparing their results.
V. EVALUATION
A website that contains many web pages is considered. These web pages can be represented as a web graph as shown
below in fig 6.
In this graph, nodes(A,B,,J) are representing web pages pageA , pageB,., pageJ and links between the web pages are
represented by edges . Rank Score of web pages according to pageRank algorithm, rank_value and rank score of web
pages by weighted pageRank algorithm , wrank_value are shown below in the fig 7.
Copyright to IJIRCCE
www.ijircce.com
2934
ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
Fig 7. rank and wrank values of web pages at damping factor (d) = 0.85
As both of these algorithms work iteratively. The rank_value and wrank_value of web pages in various iterations is shown
in below table I.
TABLE I
RANK AND WRANK VALUES OF PAGES IN VARIOUS ITERATIONS
1
pageA
pageB
pageC
PageD
PageE
pageF
pageG
pageH
pageI
pageJ
Rank
0.269
0.286
0.286
0.2605
0.1755
0.2435
0.303
0.1925
0.201
0.2605
Wrank
0.22069
0.24676
0.24211
0.17355
0.15102
0.17678
0.31745
0.15561
0.15684
0.23968
Rank
0.4475
0.49
0.4475
0.3625
0.2095
0.405
0.507
0.2605
0.269
0.3795
Iterations
2
Wrank
0.291
0.333
0.3312
0.1877
0.1522
0.2159
0.4443
0.1597
0.1660
0.2905
3
Rank
0.609
0.634
0.575
0.4305
0.2435
0.5325
0.66
0.303
0.345
0.49
Wrank
0.32765
0.36665
0.36518
0.19
0.15297
0.23737
0.47302
0.16053
0.1714
0.2958
Copyright to IJIRCCE
www.ijircce.com
2935
ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
TABLE II
RANK AND WRANK VALUES OF PAGES AT VARIOUS DAMPING FACTORS
0.15
pageA
pageB
pageC
paged
Page
pageF
pageG
pageH
pageI
pageJ
Rank
0.871
0.874
0.874
0.8695
0.8545
0.8665
0.877
0.8575
0.859
0.8695
Wrank
0.8624
0.8670
0.8664
0.8541
0.8501
0.8547
0.8795
0.8509
0.8512
0.8658
Rank
0.57
0.58
0.58
0.565
0.515
0.555
0.59
0.525
0.53
0.565
0.85
Rank
0.269
0.286
0.286
0.2605
0.1755
0.2435
0.303
0.1925
0.201
0.2605
Wrank
0.22069
0.24676
0.24211
0.17355
0.15102
0.17678
0.31745
0.15561
0.15684
0.23968
Criteria
Basic criteria
Technique used
Description
Complexity
Relevancy
Quality of result
Query
Dependency
Advantage
Disadvantage
pageRank
Weighted pageRank
<O(log(N))
Less as ranking is based on calculating weight at indexing
time.
Higher than PR
Query independent
Copyright to IJIRCCE
www.ijircce.com
2936
ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
N. Duhan, A. K. Sharma and Bhatia K. K., Page Ranking Algorithms: A Survey, Proceedings of the IEEE International Conference on Advance
Computing, 2009, 978-1-4244-1888-6.
2.
S. Brin, and Page L., The Anatomy of a Large Scale Hypertextual Web Search Engine, Computer Network and ISDN Systems, Vol. 30, Issue 1-7,
pp. 107-117, 1998.
3.
Larry Page, and Sergey Brin, Rajeev Motwani, Terry Winograd, The PageRank Citation Ranking: Bring Order to the , Technical report in Stanford
U, 1998.
4.
Etizioni O.,The World Wide Web: Quagmire or Gold Mine, Communications of the ACM, 39(11), pp. 65-68(1996).
5.
Wenpu Xing and Ghorbani Ali, Weighted PageRank Algorithm, Proceedings of the Second Annual Conference on Communication Networks and
Services Research (CNSR 04), IEEE, 2004.
6.
G.Kumar; N. Duhan; A.K. Sharma, Page Ranking Based on Number of Visits of Links of Web Page , International Conference on Computer &
Communication Technology (ICCCT), 2011.
7.
R. Kosala.; H.Blockeel ,Web Mining Research: A survey, In ACM SIGKDD Explorations, 2(1), PP. 115.
8.
Mercy Paul Selvan ,A .Chandra Sekar and A.Priya Dharshin , Survey on Web Page Ranking Algorithms ,International [Journal of Computer
Applications (0975 8887) Volume 41 No.19, March 2012.
9.
J. Kleinberg, Authoritative Sources in a Hyper-Linked Environment , Journal of the ACM 46(5), pp. 604-632,1999.
Copyright to IJIRCCE
www.ijircce.com
2937