Survey Paper On Link Mining

Download as pdf or txt
Download as pdf or txt
You are on page 1of 3

ISSN 2394-3777 (Print)

ISSN 2394-3785 (Online)


Available online at www.ijartet.com

International Journal of Advanced Research Trends in Engineering and Technology (IJARTET)


Vol. 5, Issue 03, March 2018

Survey Paper on Link Mining


C.Uma1 S.Krithika2 N.Rajasekaran3
1
Assistant Professor, Department of CT & IT, umachinnsamy@gmail.com
2
Assistant Professor, Department of Computer Science (P.G.), krithitup86@gmail.com
3
Assistant Professor, Department of Computer Applications, rajasekarandpm@gmail.com
Kongu Arts and Science College (Autonomous), Erode-638 107.Tamil Nadu.

Abstract: Link mining refers to data mining techniques that explicitly consider these links when building predictive or
descriptive models of the linked data. Commonly addressed link mining tasks include object ranking, group detection,
collective classification, link prediction and sub graph discovery. While network analysis has been studied in depth in
particular areas such as social network analysis, hypertext mining, and web analysis, only recently has there been a cross-
fertilization of ideas among these different communities. This is an ex-citing, rapidly expanding area. Links among the
objects may demonstrate certain patterns, which can be helpful for many data mining tasks and are usually hard to capture
with traditional statistical models.
Keywords: object ranking, group detection, link prediction, graph discovery

I. INTRODUCTION Link Mining


Link mining is a newly emerging research area that is
at the intersection of the work in link analysis [1; 2], Link Based Link based
hypertext and web mining [3], relational learning and Object Object Ranking
inductive logic programming [4], and graph mining [5]. We Classification
use the term link mining to put a special emphasis on the
links moving them up to rest-class citizens in the data Iterative Page Rank, HITS
analysis endeavor. In recent years, there have been several Classification and SimRank
workshop series devoted to topics related to link mining. and Relaxation
Link mining encompasses a range of tasks including Labeling
descriptive and predictive modeling. Both classification and
clustering in linked relational domains require new data [Fig 1.1 ] Link based mining activities
mining algorithms. But with the introduction of links, new
tasks also come to light. Examples include predicting the 2.1 Link-Based Object Classification (LOC)
numbers of links, predicting the type of link between two Link based Object Classification is a technique used to
objects, inferring the existence of a link, inferring the assign class labels to nodes according to their link
identity of an object, finding co-references, and discovering characteristics. One simplified example is to classify nodes
subgraph patterns. as strongly connected and weakly connected depending
II. LINK MINING TASKS solely on their degree.
The domain of link analysis encompasses several A slightly more complex process would be find the
distinct tasks. These are essentially determined by the average distance of each node to all other nodes, and classify
different possible outcome of analyzing link data. Link them according to that quality. The distance of one node to
analysis tasks can usually be grouped into a small set of another is number of edges that needed to be traverses along
overall categories. the shortest path between them. Assuming that all nodes are
connected to each other, this average distance would be
indicator of how central a node is within a network. Thus,
nodes can be classified as belonging to the core of a network
or not, based on a suitable threshold.
LOC can also incorporate information about a node’s
properties for classification. For instance, if task is to create

All Rights Reserved © 2018 IJARTET 6


ISSN 2394-3777 (Print)
ISSN 2394-3785 (Online)
Available online at www.ijartet.com

International Journal of Advanced Research Trends in Engineering and Technology (IJARTET)


Vol. 5, Issue 03, March 2018

compatible teams from a pool of personnel, and have generic The line numbers at the end of each step correspond to
preference data from everyone, then build up a graph, where the line numbers of that step in Algorithm 3.
each node represents a person and each edge represents a 1. Accept raw data representation of a collaboration or co-
common preference between two persons. After that authorship network, in the form of edge list and a year
manually assign different group labels to a select set of attribute for each edge at the least.
individuals and then assign groups to everyone else based on 2. Spilt this data into training and test sets. For maximum
the number of edges share with people who have already accuracy, the prediction process should depend only on
have been labeled. A few iteration of this process should attributes intrinsic to the network. Hence the newer vertices
result in an amicable classification of team members. Such in the test graph that are not in the training graph are pruned.
classification efforts that create groups of nodes are Algorithm: Graph Data Processing
sometimes referred to as Group Detection tasks. 1. Input: D- Duration of test data
2.2 Link Based Object Ranking (LOR) IG- Input graph
Link Based Object Ranking ranks objects in a graph Output: GT training – The training graph
based on several factors affecting their importance in the GT1test – The test graph
graph structure, whereas LOC assigns labels specifically G`Ttest- The pruned test graph.
belonging to a closed set of finite values to an object [6]. /*Let yearstart denote begin year of data
The purpose of LOR is not to assign distinctive labels to the /* Let yearend denote end of data
nodes usually, all nodes in such networks are understood to /* Let pruned denote vertices to be pruned from the test
be of the same type the goal is to associative a relative data
quantitative assessment with each node. /* Let V(G) denote vertices of graph G
LOR can sometimes be more fine-grained version of 2. Extract the yearstart and yearend from the year
LOC. If desire to mark each node with the precise number attribute of the edges.
representing its degree of connectivity, then it can be one 3. GT1test = IG[ yearend –D+1: yearend]
form of ranking nodes. Ranking nodes are usually much 4. GTtraining = IG- GT1test
more complex than that, and take into account a large part of 5. pruned = V(GT1test) – V(GTtraining)
the graph when coming up with a figure for each node. 6. G`Ttest =V(GTtest) – prund
One of the most well- known ranking tasks is ranking 7. return GTtraining, GT1test, G`Ttest
web pages according to their relevance to a search query.
Research and practical use have shown that the relevance of (b) Computing Most Portable Links:
a search result not only depends upon the content of the After having processed the graph data, the steps
document but also on how it is linked to other similar involved in computing probable links are quite
documents. There are algorithms that try to identify research straightforward.
papers that have the most comprehensive knowledge about a 1. Compute the score of all possible edges using the chosen
given topic by analyzing how many other relevant papers proximity measure.
have cited them. Some social network games include a 2. Select the proximity values above the threshold and return
notion of popularity that is defined by how well connected the edges associated with these values as a graph.
each person is with others and what this person’s respective Algorithm: Compute Most Portable Links
popularity figure is. 1. Input: G2 – Input Graph
2.3 Link Prediction T1 – Threshold for prediction
Being able to see the future is usually a nice capability, M1- Proximity measure to be used in link
although it is quite hard. Predicting how things may turn out, prediction
within some proven bounds of approximation, is not bad Output: G1priticited – A graph containing predicted
either. Prediction has always been a basic for development scores.
of a many artificial intelligence techniques. /* Let predicted denote a matrix of proximity values
Not that while LOC and LOR are analysis of links to for each pair of vertices
talk the nodes in a network, Link prediction actually deals /* Let Output denotes a matrix of Boolean values
with links themselves. /* compute the proximity values by applying the
2.3.1 Link Prediction Algorithm measure on G2
(a) Graph Data Processing: 2. Predicted:= M1(G2)

All Rights Reserved © 2018 IJARTET 7


ISSN 2394-3777 (Print)
ISSN 2394-3785 (Online)
Available online at www.ijartet.com

International Journal of Advanced Research Trends in Engineering and Technology (IJARTET)


Vol. 5, Issue 03, March 2018

3. Output: = (Predicted >= T1) REFERENCES


4. Generate graph Gpredicted from adjacency matrix
represented by output. [1]. D. Jensen and H. Goldberg. AAAI Fall Symposium on AI and Link
Analysis. AAAI Press, 1998.
5. Return G1predicted
2.4 Graph Classification [2]. R. Feldman. Link analysis: Current state of the art, 2002.
Unlike link-based object classification, which attempts [3]. S. Chakrabarti. Mining the Web. Morgan Kaufman, 2002.
to label nodes in a graph, graph classification is a supervised
[4]. S. Dzeroski and N. Lavrac, editors. Relational Data Mining. Kluwer,
learning problem in which the goal is to categorize an entire Berlin, 2001
graph as a positive or negative instance of a concept. This is
one of the earliest tasks addressed within the context of [5]. D. J. Cook and L. B. Holder. Graph-based data mining. IEEE
Intelligent Systems, 15(2):32{41, 2000
applying machine learning and data mining techniques to
graph data. Graph classification does not typically require [6]. L.Getoor. Link Mining: A new data mining challenge. ACM SIGKDD
Explorations Newsletter, 5:84-89, 2003.
collective inference, as is needed for classifying objects and
edges, because the graphs are generally assumed to be [7]. R. D. King, S. H. Muggleton, A. Srinivasan, and M. J. E. Sternberg.
independently generated. Structure-activity relationships derived by machine learning: The use
of atoms and their bond connectivities to predict mutagenicity by
Three main approaches to graph classification have inductive logic programming. National Academy of Sciences,
been explored. These are based on feature mining on graphs, 93(1):438{442, January 1996.
inductive logic programming (ILP), and defining graph
kernels. Feature mining on graphs is usually performed by Ms. C.Uma, received M.Sc degree from
finding all frequent or informative substructures in the graph Anna University, Chennai and M.Phil
instances. These substructures are used for transforming the degree from Bharathiar University,
graph data into data represented as a single table, and then Coimbatore, TN, India. At currently
traditional classifiers are used for classifying the instances. working as an Assistant Professor in
As an example of an ILP approach, King et al. [7] rst map Department of Computer Technology and
the graph data describing mutagenesis into a relational Information Technology, Kongu Arts and Science College
representation. Their logical representation uses relations (Autonomous), Erode. I have 9 years of teaching and 7 years
such as vertex (graphId, VertexId, VertexLabel, of research experience. I had published more number of
VertexAttributes) and edge (graphId, vertexId1, vertexId2, papers in various international and national journals. My
BondLabel), and then uses an ILP system to and a research interest in the area of data mining.
hypothesis in this space. Finding all frequent substructures is Ms. S. Krithika, received M.C.A and
usually computationally prohibitive. An alternative approach M.Phil degree from Bharathiar University,
makes use of kernel methods. Coimbatore, TN, India. At currently
III. CONCLUSION working as an Assistant Professor in
More and more domains of interest today are best Department of Computer Science (PG),
described as a linked collection or network of interrelated Kongu Arts and Science College
heterogeneous objects. Data mining algorithms have (Autonomous), Erode. I have 9 years of
typically addressed the discovery of patterns in collections teaching and 7 years of research experience. I had published
of IID in-stances. Link mining is an emerging area within papers in national and international conferences. My area of
data mining that is focused on finding patterns in data by interest is Data Mining.
exploiting and explicitly modeling the links among the data Mr. N.Rajasekaran, received M.C.A
instances. We have surveyed several of the better studied and M.Phil degree from Bharathiar
link mining tasks: link-based object ranking, link-based University, Coimbatore, TN, India. At
object classification, group detection, entity resolution, link currently working as an Assistant
prediction, subgraph discovery, graph classification, and Professor in Department of Computer
generative models for graphs. Applications, Kongu Arts and Science
College (Autonomous), Erode. I have 8
years of teaching and 7 years of research experience. I had
published papers in international jourals. My area of interest
is software testing.

All Rights Reserved © 2018 IJARTET 8

You might also like