Survey Paper On Link Mining
Abstract: Link mining refers to data mining techniques that explicitly consider the links among objects when building predictive or
descriptive models of the linked data. Commonly addressed link mining tasks include object ranking, group detection,
collective classification, link prediction and subgraph discovery. While network analysis has been studied in depth in
particular areas such as social network analysis, hypertext mining, and web analysis, only recently has there been a
cross-fertilization of ideas among these different communities. This is an exciting, rapidly expanding area. Links among the
objects may demonstrate certain patterns, which can be helpful for many data mining tasks and are usually hard to capture
with traditional statistical models.
Keywords: object ranking, group detection, link prediction, graph discovery
compatible teams from a pool of personnel, and have generic preference data from everyone, then build up a graph, where
each node represents a person and each edge represents a common preference between two persons. After that, manually
assign different group labels to a select set of individuals and then assign groups to everyone else based on the number
of edges they share with people who have already been labeled. A few iterations of this process should result in an
amicable classification of team members. Such classification efforts that create groups of nodes are sometimes referred
to as Group Detection tasks.
2.2 Link Based Object Ranking (LOR)
Link Based Object Ranking ranks objects in a graph based on several factors affecting their importance in the graph
structure, whereas LOC assigns to an object a label drawn from a closed, finite set of values [6]. The purpose of LOR is
not to assign distinctive labels to the nodes. Usually, all nodes in such networks are understood to be of the same type,
and the goal is to associate a relative quantitative assessment with each node.
LOR can sometimes be a more fine-grained version of LOC. If we wish to mark each node with the precise number representing
its degree of connectivity, then that is one form of ranking nodes. Ranking nodes is usually much more complex than that,
and takes into account a large part of the graph when coming up with a figure for each node.
One of the most well-known ranking tasks is ranking web pages according to their relevance to a search query. Research
and practical use have shown that the relevance of a search result depends not only upon the content of the document but
also on how it is linked to other similar documents. There are algorithms that try to identify research papers that have
the most comprehensive knowledge about a given topic by analyzing how many other relevant papers have cited them. Some
social network games include a notion of popularity that is defined by how well connected each person is with others and
what this person's respective popularity figure is.
2.3 Link Prediction
Being able to see the future is usually a nice capability, although it is quite hard. Predicting how things may turn out,
within some proven bounds of approximation, is not bad either. Prediction has always been a basis for the development of
many artificial intelligence techniques.
Note that while LOC and LOR analyze links in order to say something about the nodes in a network, link prediction actually
deals with the links themselves.
2.3.1 Link Prediction Algorithm
(a) Graph Data Processing:
The line numbers at the end of each step correspond to the line numbers of that step in Algorithm 3.
1. Accept raw data representation of a collaboration or co-authorship network, in the form of an edge list with, at the
least, a year attribute for each edge.
2. Split this data into training and test sets. For maximum accuracy, the prediction process should depend only on
attributes intrinsic to the network. Hence the newer vertices in the test graph that are not in the training graph are
pruned.
Algorithm: Graph Data Processing
1. Input: D - Duration of test data
IG - Input graph
Output: GTtraining - The training graph
GT1test - The test graph
G`Ttest - The pruned test graph
/* Let yearstart denote the begin year of the data */
/* Let yearend denote the end year of the data */
/* Let pruned denote the vertices to be pruned from the test data */
/* Let V(G) denote the vertices of graph G */
2. Extract the yearstart and yearend from the year attribute of the edges.
3. GT1test = IG[yearend - D + 1 : yearend]
4. GTtraining = IG - GT1test
5. pruned = V(GT1test) - V(GTtraining)
6. G`Ttest = V(GT1test) - pruned
7. return GTtraining, GT1test, G`Ttest
(b) Computing Most Probable Links:
After having processed the graph data, the steps involved in computing probable links are quite straightforward.
1. Compute the score of all possible edges using the chosen proximity measure.
2. Select the proximity values above the threshold and return the edges associated with these values as a graph.
Algorithm: Compute Most Probable Links
1. Input: G2 - Input graph
T1 - Threshold for prediction
M1 - Proximity measure to be used in link prediction
Output: G1predicted - A graph containing predicted scores
/* Let Predicted denote a matrix of proximity values for each pair of vertices */
/* Let Output denote a matrix of Boolean values */
/* Compute the proximity values by applying the measure on G2 */
2. Predicted := M1(G2)
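The data split and the thresholded scoring described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, edges are assumed to be (u, v, year) tuples with sortable vertex ids, the common-neighbor count stands in for the unspecified proximity measure M1, and only vertex pairs that are not already linked are scored.

```python
from itertools import combinations


def split_by_year(edges, duration):
    """Split (u, v, year) edges into training, test and pruned test edge lists.

    The last `duration` years form the test data; everything earlier is training.
    """
    edges = list(edges)
    year_end = max(year for _, _, year in edges)
    test_start = year_end - duration + 1
    training = [(u, v) for u, v, y in edges if y < test_start]
    test = [(u, v) for u, v, y in edges if y >= test_start]
    train_vertices = {x for edge in training for x in edge}
    # Prune test edges that touch a vertex never seen in the training graph.
    pruned_test = [(u, v) for u, v in test
                   if u in train_vertices and v in train_vertices]
    return training, test, pruned_test


def common_neighbor_scores(edges):
    """Assumed proximity measure: number of shared neighbors per unlinked pair."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v not in adjacency[u]:  # assumption: skip pairs that are already linked
            scores[(u, v)] = len(adjacency[u] & adjacency[v])
    return scores


def predict_links(edges, threshold):
    """Return the vertex pairs whose proximity score is above the threshold."""
    scores = common_neighbor_scores(edges)
    return {pair for pair, score in scores.items() if score > threshold}
```

A typical use would be to call split_by_year on the raw edge list, run predict_links on the training edges, and compare the returned pairs against the pruned test edges. Any other proximity measure can replace common_neighbor_scores as long as it returns a score per candidate pair.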