
Unit-3

• Association Rule Mining- Algorithms and Techniques

•Definition:
•Association Rule Mining aims to discover interesting relations
between variables in large datasets.
•Example:
•"People who buy bread also tend to buy butter."
•Key Terms:
•Item: Individual product or service.
•Itemset: Collection of one or more items.
Basic Algorithms for Association
Rule Mining
• Apriori
• It applies a two-stage approach to discover frequent itemsets and
confident association rules.
• Frequent Itemset Discovery. To find all frequent itemsets, Apriori introduces a candidate generation and test strategy. The basic idea is that it first generates the candidate k-itemsets (k is 1 at the beginning and is incremented by 1 in each subsequent cycle); each candidate is then tested against the database to determine whether it is frequent or not.
Association Rule Mining. Given all frequent itemsets, finding all frequent and confident association rules is straightforward. The approach is very similar to the frequent itemset mining algorithm. Because the cost of finding frequent itemsets is high and dominates the overall cost of discovering association rules, almost all research so far has focused on the frequent itemset generation stage.
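To make the candidate generation and test idea concrete, the following is a minimal Python sketch of level-wise frequent itemset discovery; the toy transactions and the absolute min_sup value are illustrative, not taken from the text.

```python
def apriori(transactions, min_sup):
    """Minimal Apriori sketch: level-wise candidate generation and test."""
    # Frequent 1-itemsets.
    items = {i for t in transactions for i in t}
    freq = {frozenset([i]) for i in items
            if sum(i in t for t in transactions) >= min_sup}
    all_frequent, k = set(freq), 2
    while freq:
        # Candidate k-itemsets: join pairs of frequent (k-1)-itemsets.
        candidates = {a | b for a in freq for b in freq if len(a | b) == k}
        # Test: keep only candidates whose support meets the threshold.
        freq = {c for c in candidates
                if sum(c <= t for t in transactions) >= min_sup}
        all_frequent |= freq
        k += 1
    return all_frequent

# Example with min_sup given as an absolute count.
db = [frozenset(t) for t in (["bread", "butter"], ["bread", "milk"],
                             ["bread", "butter", "milk"])]
print(apriori(db, min_sup=2))
```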

Eclat
Many algorithms have been proposed based on the Apriori idea. Among them, Eclat [270, 268] is distinct in that it was the first to generate all frequent itemsets in a depth-first manner, while employing the vertical database layout and using an intersection-based approach to compute the support of an itemset.
Eclat - Frequent Itemset Mining
Input: A transaction database D, a user specified threshold min sup, a set of
atoms of
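The rest of the pseudo code is not reproduced above; below is a minimal sketch of the depth-first, vertical-layout idea, where each item maps to the set of transaction ids (its tidset) containing it and support is computed by tidset intersection. The toy vertical database is illustrative.

```python
def eclat(prefix, items, min_sup, out):
    """Depth-first Eclat sketch over a vertical layout: item -> tidset."""
    while items:
        item, tids = items.pop()
        if len(tids) >= min_sup:                 # support = size of the tidset
            out.append((prefix + [item], len(tids)))
            # Extend the current prefix: intersect tidsets with the remaining items.
            suffix = [(other, tids & other_tids) for other, other_tids in items]
            eclat(prefix + [item], suffix, min_sup, out)

# Vertical database: each item maps to the ids of the transactions containing it.
vertical = {"bread": {1, 2, 3}, "butter": {1, 3}, "milk": {2, 3}}
result = []
eclat([], sorted(vertical.items()), min_sup=2, out=result)
print(result)
```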
FP-growth - The algorithm applies a divide-and-conquer approach.

•FP-Tree: A compact representation of transaction databases.


•Purpose: To reduce the number of database scans in frequent pattern mining.
•Structure: A tree where nodes represent items and edges represent itemset
relationships.
FP-Tree Construction - Step-by-Step
1.Step 1: Create the root node labeled “null.”
2.Step 2: Scan the database to find frequent items.
3.Step 3: Order items by descending support (frequency).
4.Step 4: Reorder items in each transaction.
5.Step 5: Insert reordered items into the tree, updating counters for shared
nodes.
6.Step 6: Build the header table for tracking item occurrences and their
supports.
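A minimal sketch of the six construction steps above, assuming transactions are Python lists and min_sup is an absolute count; the FPNode class and the header table layout are illustrative choices.

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 1, {}

def build_fp_tree(transactions, min_sup):
    """Sketch of FP-tree construction following the six steps above."""
    # Steps 2-3: scan once, keep frequent items, order them by descending support.
    support = defaultdict(int)
    for t in transactions:
        for item in t:
            support[item] += 1
    order = {i: s for i, s in support.items() if s >= min_sup}
    root, header = FPNode(None, None), defaultdict(list)    # Step 1 root, Step 6 header table
    for t in transactions:
        # Step 4: reorder each transaction by descending support.
        items = sorted((i for i in t if i in order), key=lambda i: -order[i])
        node = root
        for item in items:                                   # Step 5: insert, sharing prefixes
            if item in node.children:
                node.children[item].count += 1               # shared node: update counter
            else:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])     # track item occurrences
            node = node.children[item]
    return root, header
```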
Sequential Pattern Mining
Sequential pattern mining identifies patterns of events that occur in
sequence over time.
It extends association rule mining by discovering inter-event patterns
(sequences) instead of just intra-event patterns (itemsets).
This approach is useful for various fields, such as retail (e.g., customer
purchase behavior) and biology (e.g., DNA or protein sequence analysis).
Sequential mining has applications in tasks like correlations, causality, and
periodic patterns.
Key methodologies include general sequential pattern mining, constraint-
based, incremental, and approximate mining.
The most foundational techniques rely on the Apriori heuristic and
pattern-growth strategies, which support more advanced algorithms.
Existing Sequential Pattern Mining Algorithms

Sequential pattern mining algorithms can be grouped into two categories:

Apriori-like Algorithms:

Apriori-All, GSP, SPADE, SPAM

Pattern Growth Algorithms:

PrefixSpan
AprioriALL
Sequential pattern mining was first introduced by Agrawal in [11], where three Apriori-based algorithms were proposed. Given a transaction database with three attributes, customer-id, transaction-time and purchased-items, the mining process was decomposed into five phases:
Sort Phase: The original transaction database is sorted based on the customer and transaction
time.
L-itemsets Phase: The sorted database is first scanned to obtain those frequent (or large) 1-
itemsets based on the user specified support threshold. Suppose the minimal support is 70%. In
this case the minimal support count is 2, and the result of large 1-itemsets is listed.
Transformation Phase: All the large itemsets are mapped into a series of integers and the
original database is converted by replacing the itemsets. For example, with the help of the
mapping table.
Sequence Phase: Mine the transformed database and find all frequent sequential patterns.
Maximal Phase: Prune those patterns which are contained in other sequential patterns. In other words, only maximal sequential patterns are retained.
Since most of the phases are straightforward, research has focused on the sequence phase. AprioriAll [11] was first proposed based on the Apriori algorithm in association rule mining [9]. There are two steps to mine sequential patterns, i.e., candidate generation and test.
GSP

Srikant and Agrawal generalized the definition of the sequential pattern mining problem by incorporating some new properties, i.e., time constraints, transaction relaxation, and taxonomy.
For the time constraints, the maximum gap and the minimal gap are defined to specify the allowed gap between any two adjacent transactions in the sequence. When testing a candidate, if any gap of the candidate falls outside the range between the maximum gap and the minimal gap, then the candidate is not a pattern.

Furthermore, the authors relaxed the definition of a transaction by using a sliding window, so that when the time range between two items is smaller than the sliding window, the two items are considered to be in the same transaction. The taxonomy is used to generate multiple-level sequential patterns.
GSP efficiently finds the generalized sequential patterns. Similar to the AprioriAll algorithm, there are two steps in GSP, i.e., candidate generation and test.

•Candidate k-sequences are generated from frequent (k-1)-sequences.


•Subsequence Definition: c is a contiguous subsequence of sequence s if:
 • c is derived by dropping an item from the start or end of s;
 • c is derived by dropping an item from an element of s with at least 2 items; or
 • c is a contiguous subsequence of another contiguous subsequence of s.
Candidate Generation Steps
1.Join Phase:
1. Generate candidate k-sequences by joining two (k-1)-sequences with the same contiguous
subsequence.
2. Items can be added as part of an element or as a separate element.
3. Example: Join sequences d(bc)a and d(bc)d to form:
1. d(bc)(ad), d(bc)ad, d(bc)da.
2.Prune Phase:
1. Remove candidates with contiguous subsequences having support counts below minimum support.
2. Utilize hash-tree structure to reduce candidate numbers.
Support Counting Challenge
•Issue with GSP: Counting support of candidate
sequences is complex due to generalization rules.
•Comparison with AprioriAll: GSP requires an
efficient counting strategy.
Support Counting Strategy
1.Forward Phase:
1. Search for successive elements in a
sequence d.
2. Continue until the gap exceeds maximum
allowed.
3. If an element is missing, s is not in d.
2.Backward Phase:
 1. Attempt to pull back previous elements if the gap exceeds the maximum allowed.
 2. Continue until the gap conditions are satisfied or the first element is reached.
 3. Switch back to the forward phase.
Incorporating Taxonomies
•Extend sequences with corresponding taxonomies.
•Original sequences replaced by extended versions.
•Challenge: Increased number of rules leads to redundancy.
•Solution:
• Precompute ancestors for each item.
• Remove candidates that contain both an item and its ancestors.

•The GSP algorithm effectively generates candidate sequences through


structured phases.
•Challenges in support counting are addressed with a forward-backward
strategy.
•Taxonomies enhance sequences but require careful rule management.
Bayesian Classifiers
The key idea of Bayesian learning is that the learning process applies Bayes' rule to update (or train) the prior distribution of the parameters in the model and computes the posterior distribution for prediction. Bayes' rule can be represented as:

Pr(s|D) = Pr(D|s) Pr(s) / Pr(D)

where Pr(s) is the prior probability of the sample s, Pr(D) is the prior probability of the training data D, Pr(s|D) is the probability of s given D, and Pr(D|s) is the probability of D given s.
The sample generation process can be modeled as follows: (1) each class c has an associated prior probability Pr(c), with ∑c Pr(c) = 1; the sample s first randomly chooses a class label with the corresponding probability; and (2) based on the class-conditional distribution Pr(s|c) and the chosen class label c, we obtain the overall probability of generating the sample, i.e., Pr(c)Pr(s|c). The posterior probability that s was generated from class c can thus be deduced using Bayes' rule:

Pr(c|s) = Pr(c) Pr(s|c) / ∑γ Pr(γ) Pr(s|γ)

where γ ranges over all the classes. The probability Pr(s|c) can be determined by two kinds of parameters: (1) the prior domain knowledge unrelated to the training data; and (2) the parameters which belong to the training samples. For simplicity, the overall parameters are denoted as Θ and thus Equation 3.3 can be extended as

The advantages of Bayesian classifiers are: (1) it can outperform other classifiers when the
size of training data is small; and (2) it is a model based on probabilistic theory, which is
robust to noise in real data. The limitation of Bayesian classifiers is that they are typically
limited to learning classes that occupy contiguous regions of the instance space.
Naive Bayes Learners
The naive Bayesian classifier is a basic type of Bayesian learner with the assumption that all the attributes are conditionally independent given the value of the class C. By independence we mean probabilistic independence, that is, A is independent of B given C whenever Pr(A|B,C) = Pr(A|C) for all possible values of A, B and C with Pr(C) > 0. The pseudo code of the naive Bayesian classifier is shown in Algorithm
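The referenced pseudo code is not reproduced here; the following is a minimal sketch of a naive Bayes learner for categorical attributes (the function names are illustrative and no smoothing is applied).

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Count class priors Pr(c) and per-attribute conditional counts for Pr(value | c)."""
    prior = Counter(labels)
    cond = defaultdict(Counter)
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            cond[(c, j)][v] += 1
    return prior, cond, len(labels)

def classify(x, prior, cond, n):
    """Pick the class maximizing Pr(c) * prod_j Pr(x_j | c), using the independence assumption."""
    best, best_p = None, -1.0
    for c, pc in prior.items():
        p = pc / n                                # estimated prior Pr(c)
        for j, v in enumerate(x):
            p *= cond[(c, j)][v] / pc             # estimated Pr(x_j | c)
        if p > best_p:
            best, best_p = c, p
    return best
```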
Neural Networks Classifier

The neural network (NN) classifier, or artificial neural network (ANN) classifier, is one of the classic methods used for supervised learning. A neural network is commonly represented as a graph consisting of densely interconnected elements; the elements are called neurons.

The input is a number of values X1, X2, ..., Xn at the left side and the output is Y at the right side, both of which are in a continuous space (commonly between 0 and 1). The neuron in the middle of the network first computes the weighted sum of the inputs, then adjusts it by subtracting a threshold θ, and finally passes the result through a non-linear function f (e.g., the sigmoid) to produce the output. In summary, the process can be modeled as

Y = f(∑i Wi Xi − θ).
The distribution of the sigmoid function is shown in Figure
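A minimal sketch of the neuron computation just described, using the sigmoid as the non-linear function f; the example inputs, weights and threshold are made up.

```python
import math

def neuron(x, w, theta):
    """Single neuron sketch: weighted sum, threshold adjustment, sigmoid activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) - theta   # weighted sum minus threshold
    return 1.0 / (1.0 + math.exp(-z))                  # sigmoid squashes the result into (0, 1)

print(neuron([0.5, 0.2, 0.9], w=[0.4, -0.3, 0.8], theta=0.1))
```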

In a multi-layer perceptron topology of NN classifier,


neurons are arranged in distinct layers as illustrated in
Figure. Output of each layer is connected to input of
nodes in the next layer, i.e., inputs of the first layer
(input layer) are the inputs to the network, while the
outputs of the last layer form the output of the network.
Training Neural Network

When training a neural network, the inputs are [3]: (1) a series of examples of the items to be classified; and (2) their proper classifications or targets. Such input can be viewed as a vector < X1, X2, ..., Xn, θ, t >, where t is the target or true classification. The neural network uses these to modify its weights, aiming to match its classifications with the targets in the training set. Weight modification follows a learning rule. The pseudo code of training a neural network is shown in Algorithm
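The referenced training pseudo code is not reproduced here; the sketch below applies one simple learning rule (the delta rule for a single sigmoid neuron) as an illustrative assumption, not the source's exact algorithm. The learning rate, epoch count and OR-function data are made up.

```python
import math

def neuron(x, w, theta):
    """Weighted sum minus threshold, passed through the sigmoid (as sketched earlier)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, w, theta, eta=0.5, epochs=200):
    """Training sketch: adjust weights after each example so outputs move toward the targets."""
    for _ in range(epochs):
        for x, t in examples:                      # each example: (inputs, target t)
            y = neuron(x, w, theta)                # current classification
            delta = (t - y) * y * (1 - y)          # prediction error scaled by the sigmoid derivative
            w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
            theta -= eta * delta                   # threshold updated like a bias term
    return w, theta

# Learn the OR function from four labeled vectors <X1, X2, theta, t>.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, theta = train(data, w=[0.0, 0.0], theta=0.0)
print([round(neuron(x, w, theta), 2) for x, _ in data])
```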
The advantage of neural nets is that they perform very well on difficult, non-linear domains, where it becomes more and more difficult to use decision trees or rule induction systems, which cut the space of examples parallel to the attribute axes. One disadvantage of using neural nets for data mining is the slow learning process compared to, for example, decision trees; this difference can easily be several orders of magnitude.

Another disadvantage is that neural networks do not give an explicit knowledge representation in the form of rules or some other easily interpretable form. The model is implicit, hidden in the network structure and the optimized weights between the nodes.

In addition to the above commonly used supervised learning methods, many others have been developed, such as discriminative classification, maximum entropy learners, SVMs, and so forth. Refer to [33] for a more comprehensive survey.
Unsupervised Learning

In this section, we will introduce major techniques of unsupervised learning (or clustering).
Among a large amount of approaches that have been proposed, there are three representative
unsupervised learning strategies, i.e., k-means, hierarchical clustering and density based
clustering.

The k-Means Algorithm


k-means clustering is a top-down algorithm that partitions the objects into k groups with regard to attributes or features, where k is a positive integer specified a priori by the user. The grouping is done by minimizing the sum of squared distances between each object and the corresponding cluster centroid.
The key idea of k-means is simple: in the beginning the number of clusters k is determined. Then the algorithm assumes the centroids or centers of these k clusters. These centroids can be randomly selected or designed deliberately.
One special case arises when the number of objects is less than the number of clusters. If so, each object is treated as the centroid of a cluster and allocated a cluster number.

If the number of objects is greater than the number of clusters, the algorithm computes the distance (i.e., the Euclidean distance) between each object and all centroids to find the minimum distance.
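A minimal k-means sketch following the description above: random initial centroids, assignment of each object to the nearest centroid by squared Euclidean distance, and recomputation of the centroids; the fixed iteration count is an arbitrary choice.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Sketch of k-means: random centroids, assign to nearest, recompute the means."""
    centroids = random.sample(points, k)                 # initial centroids chosen at random
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                                  # assign each object to the closest centroid
            i = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):                 # recompute each centroid as the cluster mean
            if cl:
                centroids[i] = [sum(vals) / len(cl) for vals in zip(*cl)]
    return centroids, clusters
```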
Hierarchical Clustering

Hierarchical clustering constructs a hierarchy of clusters that can be illustrated in a


tree structure which is also known as a dendrogram. Each node of the
dendrogram, including the root, represents a cluster and the parent-child
relationship among them enables us to explore different levels of clustering
granularity.

There are mainly two types of algorithms for hierarchical clustering: one is in an
agglomerative bottom-up manner that the algorithm starts with all the objects
and successively combines them into clusters; the other is in a divisive top-down
manner that the algorithm starts with one whole cluster which includes all objects
and recursively splits the clusters into smaller ones. The second type can be seen
as a reverse procedure of the first one and hence, we mainly introduce the former
in this section
Hierarchical Agglomerative Clustering

Hierarchical Agglomerative Clustering is typically visualized as a dendrogram as shown in


Figure.

The figure shows an example of hierarchically clustering 10 objects. Each horizontal line in the dendrogram indicates a merging of two sub-clusters. The value of the horizontal line (i.e., on the y-coordinate) is the similarity between the two sub-clusters merged. The bottom layer of the dendrogram indicates that each object is viewed as a singleton cluster. By moving up from the bottom layer to the top one, the dendrogram allows us to reconstruct the intermediate merges that resulted in the final hierarchy of clusters.
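For illustration, agglomerative clustering and dendrogram cutting can be sketched with SciPy (assumed available); the random data, the average-linkage choice, and the cut into 3 clusters are arbitrary.

```python
# Minimal sketch: build the agglomerative merge tree, then cut it into clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.random.rand(10, 2)              # 10 objects with 2 attributes
merges = linkage(points, method="average")  # bottom-up (agglomerative) merge history
labels = fcluster(merges, t=3, criterion="maxclust")  # cut the dendrogram into 3 clusters
print(labels)
```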
The advantages of the hierarchical
clustering are [29]: (1) enable flexibility
with regard to the level of granularity;
(2) ease of dealing with any form of
similarity metric; and (3) applicability
to any attribute type. The
disadvantages of the hierarchical
clustering are summarized as: (1)
vagueness at judging when to
terminate; and (2) the fact that most
hierarchical algorithms do not revisit
intermediate clusters with the purpose
of their improvement.
Density based Clustering
K-means and hierarchical clustering algorithms generally perform well on grouping large data, yet they are awkward at discovering clusters of arbitrary shapes and, moreover, their performance is sensitive to outliers in the data. To address these issues, density based clustering strategies have been proposed, of which DBSCAN is a representative one.

The DBSCAN algorithm was first introduced by Ester et al. The authors grouped the objects by recognizing their density. Given a pre-defined density threshold Eps, those areas with a higher density compared with Eps are considered qualified clusters and the others are treated as outliers or noise. With the help of this inherent characteristic of cluster density, DBSCAN can discover clusters of arbitrary shape.
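As a quick illustration, DBSCAN is available in scikit-learn (assumed installed); the eps and min_samples values below are arbitrary, and the label -1 marks points treated as outliers or noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two blob-like groups of points; density-based clustering finds them without fixing k.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
print(set(labels))   # cluster ids, with -1 for noise points
```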
Semi-supervised Learning
Semi-supervised learning (or semi-supervised classification) aims to address the problem by using a large amount of unlabeled data, together with the labeled data, to build better classifiers. Many approaches have been proposed for semi-supervised classification, among which the representatives are self-training, co-training, generative models, and graph-based methods. We will introduce them in the next few sections.

Self-Training

The basic idea of self-training is as follows: a classifier c is first trained on the labeled data L (which is small). Then we use the classifier c to classify the unlabeled data U. Those confidently classified instances (judged by a threshold), together with their predicted class labels, are extracted from U and inserted into L. The two datasets, labeled and unlabeled, are updated and the procedure is repeated. The pseudo code of self-training is shown in Algorithm
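Since the referenced pseudo code is not shown, here is a minimal sketch of the loop described above, assuming a scikit-learn style classifier exposing fit/predict_proba; the confidence threshold and round limit are illustrative.

```python
import numpy as np

def self_train(clf, L, y, U, threshold=0.9, rounds=10):
    """Self-training sketch: repeatedly label confident unlabeled points and retrain."""
    L, y, U = np.asarray(L), np.asarray(y), np.asarray(U)
    for _ in range(rounds):
        clf.fit(L, y)                                   # train classifier c on the labeled data L
        if len(U) == 0:
            break
        proba = clf.predict_proba(U)
        confident = proba.max(axis=1) >= threshold      # confidence judged by a threshold
        if not confident.any():
            break
        new_labels = clf.classes_[proba.argmax(axis=1)[confident]]
        L = np.vstack([L, U[confident]])                # move confident points into L
        y = np.concatenate([y, new_labels])
        U = U[~confident]                               # and remove them from U
    return clf
```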
The advantages of the self-training approach are: (1) the method is intuitive and simple, and can be wrapped around other more complex classifiers; and (2) it can be applied to many real applications, e.g., word sense disambiguation.

The disadvantage of self-training is that early errors can be accumulated and enlarged in later iterations.
Co-Training
Blum et al. first proposed the co-training algorithm based on the idea of self-training, by employing two classifiers. It assumes that the features of the data can be partitioned into two independent subsets (w.r.t. the class) and that each subset can be used to train a good classifier. The algorithm is then fulfilled as a dual training of the two classifiers, each teaching the other.

The advantages of the co-training approach are: (1) the method is simple and based on self-training, and it can be wrapped around other more complex classifiers; and (2) it is less sensitive to early errors compared with self-training.

The disadvantage of co-training is that the assumption is often too strong: the features of a dataset cannot always be partitioned into two independent subsets.
The pseudo code of co-training is shown in Algorithm
Generative Models
The key idea of the generative-model-based semi-supervised learning approach is that we construct an appropriate model which can best “match” the data, and as a result the classification can be easily done. The essential part of constructing the model is to find the proper values of the model parameters.

There is an assumption that both the labeled and the unlabeled datasets follow the same kind of model with similar parameters. One common strategy is to employ an EM-like (i.e., expectation maximization) algorithm to estimate the most likely values of the parameters.

Generally, the model to be chosen is application oriented and can differ between users, e.g., a mixture of Gaussians, naive Bayes, and so forth. The Algorithm shows an example pseudo code combining a naive Bayes model and the EM procedure.
The advantages of the generative model based approach are:

(1) the method is clear and embedded in a probabilistic framework; and

(2) it is very effective if the model correctly matches the data.

The main drawbacks of the generative model based approach are:

(1) the appropriate model is difficult to choose, and a wrong model may weaken the effectiveness of the approach; and

(2) EM is prone to local maxima; if the global maximum is very different from a local maximum, the effectiveness of the method is weakened.
Graph based Methods

Definition: Semi-supervised learning utilizes both labeled and unlabeled data,


with graphs used to represent similarities between data points.
Nodes: Represent labeled and unlabeled examples.
Edges: Indicate the similarity between nodes (data points).
Blum et al.:
• Introduced the first graph-based semi-supervised learning approach.
• Key Concept: Minimum cut strategy to separate labeled data.
• Extension: Majority voting applied to the minimum cut for label assignment.
Zhu et al. (2003):Generalized assumption: prediction functions on unlabeled
data can be continuous.
Label Propagation: Developed using the harmonic property of the lowest
energy function in graphs with Gaussian random fields.
Constructing Effective Graphs
•Challenges:
• Building graphs that best represent data is critical.
•Methods:
• Expert Knowledge: Used for graph construction in video surveillance.
• Neighbor Graphs: Based on multiple minimum spanning trees with perturbation.
• Local Fit (LLE-like Operations): Weights of locally linear embeddings are used as graph weights.
Method 1: Expert Knowledge
•Key Idea: Domain knowledge helps construct graphs, especially in specific applications like video surveillance.
•Example: Without expert knowledge, graph construction may fail.

Method 2: Neighbor Graphs


•Graph Construction:
• Combining minimum spanning trees through edge removal.
• Challenge: Choosing the correct bandwidth for the Gaussian function.
Method 3: Local Fit
•LLE-like operations: Local fit strategy uses non-negative weights for graphs.
•Preprocessing Step: Hein and Maier emphasized the removal of noisy data before graph construction for better
accuracy.
Markov Models

Markov model is one of the well-known approaches because of its broad applications, such
as speech recognition, financial forecasting, gene prediction, cryptanalysis, natural language
processing, data compression, and so forth.

The common point of these applications is that their goal is to predict or recover a data
sequence that is not immediately observable. For a concrete example, i.e., stock price
prediction, people always want to “guess” the trend of some stock, i.e., whether its price will
go up in the next day.

In fact, the Markov model may be the simplest and most effective mathematical model for analyzing such time-series processes, while the class of Markov models is rich enough to tackle all of these applications.
Regular Markov Models

Predicting stock price trends based on historical data.


•Naive Approach: Predict based on frequency of instances (up or down), ignoring time
correlation.
•Limitation: This method does not consider relationships between consecutive events.

Markov Model Overview

•Key Concept: The future is dependent only on the most recent observation.
•Example: If today's price goes up, it’s more likely tomorrow’s price will also go up.
•Markov Assumption: Future prediction depends only on the previous event.

First-Order Markov Chain

•Definition: The nth event depends only on the (n−1)th event.


•Mathematical Representation: p(x_n | x_1, ..., x_{n−1}) = p(x_n | x_{n−1})
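A tiny sketch of the first-order assumption for the stock up/down example; the transition probabilities are made up for illustration.

```python
# Toy first-order Markov chain for the up/down stock example.
P = {"up":   {"up": 0.7, "down": 0.3},     # p(x_n | x_{n-1} = up)
     "down": {"up": 0.4, "down": 0.6}}     # p(x_n | x_{n-1} = down)

def predict_next(today):
    """Prediction uses only the most recent observation (the Markov assumption)."""
    return max(P[today], key=P[today].get)

print(predict_next("up"))    # -> "up"
```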
Higher-Order Markov Chains
•Second-Order Markov Chain: The nth event depends on the two most recent
events.

•Mth-Order Markov Chain: The nth event depends on the last M events.


• Trade-off: More neighboring events contribute to better accuracy, but
computational cost increases
.
Trade-offs of Higher-Order Markov Chains
•Increased Parameters: For discrete events with S states:
• The number of parameters grows as S^{M−1}(S−1).
• As M increases, computation becomes exponentially expensive.
Markov Chains with Continuous Variables
•For continuous event spaces:
• Linear-Gaussian conditional distributions are often used.
• Neural Networks: Can be applied for parametric models in continuous
spaces.
Conclusion and Next Steps
•Summary:
• Markov models provide a way to predict future events based on recent
observations.
• Higher-order models offer flexibility but at a higher computational cost.
•Next Section: Introduction to Hidden Markov Models (HMMs), which account for
latent variables in discrete states.
Hidden Markov Models
A hidden Markov model (HMM) is a statistical model in which the system being modeled is
assumed to be a Markov process with unobserved state. In a regular Markov model , the
state can be visible to the observer directly and thus, the only parameter in the model is the
state transition probability. However, in a hidden Markov model, the state can not be
directly visible (i.e., hidden), but the event (or observation, output) which is dependent on
the state, is visible. In other words, the state behaves like a latent variable, where the latent
variable is discrete. The hidden Markov model can be illustrated in a graph, as shown in
Figure
The intuitive idea of HMM is that, because every state has a probability
distribution over the possible output, the output sequence generated by an
HMM provides some related information about the state sequence. Hidden
Markov models are especially known for their application in temporal pattern
recognition such as speech, handwriting, natural language processing , musical
score following, partial discharges and bioinformatics.
Suppose that the hidden state behaves like a discrete multinomial variable y_n and that it controls how the corresponding event x_n is generated. The probability distribution of y_n depends on the previous latent variable y_{n−1}, giving a conditional distribution p(y_n | y_{n−1}). We assume the latent variable has S states, so the conditional distribution corresponds to a set of numbers called transition probabilities. Together they are denoted by T and can be represented as T_{jk} ≡ p(y_{n,k} = 1 | y_{n−1,j} = 1), where 0 ≤ T_{jk} ≤ 1 and ∑_k T_{jk} = 1. Therefore, the matrix T has S(S−1) independent parameters.
Introduction to Hidden Markov Models
•Definition: HMM is a statistical model where the system is a Markov process with unobserved
(hidden) states.
•Key Difference from Regular Markov Models: In HMM, the states are hidden, but the observations
dependent on the states are visible.
How HMM Works
•Latent Variables: The state is modeled as a discrete, hidden variable (denoted y_n).
•Observable Events: The output or observation x_n is visible and depends on the hidden state y_n.
•Graphical Representation: States influence observations, and current states depend on previous ones.
Applications of HMM
•Temporal Pattern Recognition:
• Speech Recognition
• Handwriting Recognition
• Natural Language Processing
• Bioinformatics
•Other Examples:
• Musical score following
• Partial discharges
Transition Probabilities in HMM

Emission Probabilities in HMM

Homogeneous HMM
Variants of HMM

•Left-to-Right HMM: Often used for modeling time-sequential data, e.g.,


speech.
•Generalized HMM: Extended to handle more complex relationships between
states.
Pair HMM: Often used in bioinformatics.

Parameter Estimation in HMM


•Expectation Maximization (EM) Algorithm:
• Widely used to estimate HMM parameters.
•Forward-Backward Algorithm:
• An efficient way to compute the likelihood of the observed data.
•Sum-Product Approach: Alternative method to improve computational
efficiency.
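As an illustration of the forward pass used in likelihood computation, the sketch below implements the standard forward recursion for a discrete-emission HMM; the two-state parameters and observation alphabet are made up.

```python
def forward(obs, pi, T, E):
    """Forward pass of an HMM: likelihood of an observation sequence.
    pi: initial state probabilities, T[j][k]: transition probs, E[j][o]: emission probs."""
    S = len(pi)
    alpha = [pi[j] * E[j][obs[0]] for j in range(S)]           # initialization
    for o in obs[1:]:
        alpha = [sum(alpha[j] * T[j][k] for j in range(S)) * E[k][o]
                 for k in range(S)]                            # recursion over time steps
    return sum(alpha)                                          # total likelihood of the sequence

# Illustrative 2-state HMM with binary observations 0/1.
pi = [0.6, 0.4]
T = [[0.7, 0.3], [0.4, 0.6]]
E = [[0.9, 0.1], [0.2, 0.8]]
print(forward([0, 1, 0], pi, T, E))
```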
K-Nearest-Neighboring

The K-Nearest-Neighbor (kNN) approach is the most often used recommendation scoring algorithm in recommender systems. It compares the current user activity with the historic records of other users to find the top k users who share the most similar behaviors with the current one.

In conventional recommender systems, finding the k nearest neighbors is usually accomplished by measuring the similarity in ratings of items or visits to web pages between the current user and others.

The found neighboring users are then used to produce a prediction of items that are potentially of interest but not yet rated or visited by the current active user, via collaborative filtering approaches.

Therefore, the core component of the kNN algorithm is the similarity function used to measure the similarity or correlation between users in terms of attribute vectors, in which each user activity is characterized as a sequence of attributes associated with corresponding weights.
Content-based Recommendation
In content-based recommendation, a user is associated with the attributes of the items that he or she has rated, and a user profile is learned from the attributes of the items to model the interest of the user. The recommendation score is computed by measuring the similarity of the attributes the user has rated with those not yet rated, to determine which attributes might be potentially rated by the same user. As a result of this attribute similarity comparison, the method is essentially a conventional information processing approach applied to recommendation.

The learned user profile reflects the long-term preference of a user within a period, and can be updated as more rated attributes representing the user's interests are observed. Content-based recommendation is helpful for predicting an individual's preference since it refers to the individual's historic rating data rather than taking others' preferences into consideration.
Collaborative Filtering Recommendation
Collaborative filtering recommendation is probably the most commonly and
widely used technique that has been well developed for recommender systems.

As the name indicates, collaborative recommender systems work in a collaborative referring way: they aggregate ratings or preferences on items, discover user profiles/patterns via learning from users' historic rating records, and generate new recommendations on the basis of inter-pattern comparison.

A typical user profile in a recommender system is expressed as a vector of the ratings on different items. The rating values can be either binary (like/dislike) or numeric values indicating the degree of preference, depending on the application scenario. In the context of collaborative filtering recommendation, there are two major kinds of collaborative filtering algorithms mentioned in the literature, namely memory-based and model-based collaborative filtering algorithms.
Memory-based collaborative recommendation
Memory-based algorithms use the total ratings of users in the training database when computing a recommendation. These systems can be further classified into two sub-categories: user-based and item-based algorithms.
For example, the user-based kNN algorithm, which is based on calculating the similarity between two users, discovers a set of users who have similar taste to the target user, i.e., neighboring users, using the kNN algorithm. After the k nearest neighboring users are found, the system uses a collaborative filtering algorithm to produce a prediction of the top-N recommendations that the user may be interested in later. Given a target user u, the prediction on item i is then calculated by:

Here R_{j,i} denotes the rating on item i voted by user j, and only the k most similar users (i.e., the k nearest neighbors of user u) are considered in making the recommendation.
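The prediction formula itself is not reproduced above; a commonly used weighted-average form is sketched below as an assumption (cosine similarity over rating dictionaries, k neighbors), not necessarily the exact formula in the source.

```python
import math

def cosine(a, b):
    """Similarity between two users' rating vectors (dicts: item -> rating)."""
    common = set(a) & set(b)
    num = sum(a[i] * b[i] for i in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def predict_user_based(u, item, ratings, k=2):
    """Weighted average of the k most similar users' ratings on `item`."""
    sims = sorted(((cosine(ratings[u], ratings[j]), j)
                   for j in ratings if j != u and item in ratings[j]), reverse=True)[:k]
    num = sum(s * ratings[j][item] for s, j in sims)
    den = sum(s for s, _ in sims)
    return num / den if den else 0.0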
In contrast to user-based kNN, the item-based kNN algorithm is a different collaborative filtering algorithm based on computing the similarity between two columns, i.e., two items. In item-based kNN systems, a mutual item-item similarity relation table is constructed first by comparing the item vectors, in which each item is modeled as a set of ratings by all users.

To produce the prediction on an item i for user u, it computes the ratio of the sum of the ratings given by the user on the items that are similar to i with respect to the sum of the involved item similarities, as follows:

Here R_{u,j} denotes the rating given by user u on item j, and only the k most similar items (the k nearest neighbors of item i) are used to generate the prediction.
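Following the ratio just described, a minimal item-based prediction sketch might look like this, assuming a precomputed item-item similarity table item_sim (an illustrative name).

```python
def predict_item_based(u, i, ratings, item_sim, k=2):
    """Sum of similarity-weighted ratings by user u on items similar to i,
    divided by the sum of the involved item similarities (k nearest items only)."""
    rated = ratings[u]                                   # items already rated by user u
    neighbors = sorted(((item_sim[i][j], j) for j in rated if j != i),
                       reverse=True)[:k]
    num = sum(s * rated[j] for s, j in neighbors)
    den = sum(s for s, _ in neighbors)
    return num / den if den else 0.0
```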
Model-based Recommendation
A model-based collaborative filtering algorithm derives a model from the historic rating data and, in turn, uses it for making recommendations. To derive the hidden model, a variety of statistical machine learning algorithms can be employed on the training database, such as Bayesian networks, neural networks, clustering, latent semantic analysis, and so on.

For example, in a model-based recommender system named Profile Aggregations based on Clustering Transaction (PACT), a clustering algorithm was employed to generate aggregations of user sessions, which are viewed as profiles, by grouping users with similar access tastes into various clusters.

The centroids of the user session clusters can be considered access patterns/models learned from web usage data and used to make recommendations by referring to the web objects visited by other users who share the most similar access taste with the current target user.
Although different recommendation algorithms exist in recommender systems, it is easily seen that these algorithms all execute in a collaborative manner, and the recommendation score depends on the significance weight.
Social Network Analysis
A social network is a social structure consisting of individuals (or organizations), called nodes, and ties that connect the nodes by different types of interdependency, such as friendship, affiliation, communication mode, or relationships of religion, interest or prestige.

Social network analysis aims to reveal or visualize the relationships residing in the network via network theory. Basically, graph theory is the common principle used to conduct the analysis.

Since a social network compounds the complicated interaction behaviors between various individuals from different aspects, the analysis is not an easy job with traditional means alone, and the resulting graph-based structure is often very complex.

Social network analysis has been studied for several decades in social science areas such as psychology, sociology, behavior science and organization science.
Nowadays the emergence of web-based communities and hosted services such as social networking sites, wikis and folksonomies brings tremendous freedom of Web autonomy and facilitates collaboration and knowledge sharing between users.
Along with the interaction between users and computers, social media are rapidly becoming an important part of our digital experience, ranging from digital textual information to diverse multimedia forms.

These aspects and characteristics constitute the core of the second generation of the Web. The close involvement of computing intelligence science has injected many new methodologies and technologies into social network analysis research, making social network analysis a very active and hot topic in web research.

Detecting Community Structure in Networks


One of the main tasks in social network analysis is to capture the community structure of networks. Community structure has been recognized as an important statistical feature of networked systems over the past decade. Generally, a community within a network is a subgraph whose nodes are densely connected within itself but less connected with the rest of the network. From this perspective, finding such groups of nodes in a network can be treated as a clustering problem, i.e., satisfying the criterion that the nodes within each cluster have dense connections with each other but fewer links with nodes in other clusters, as discussed in the section on unsupervised learning above.
Basically there are two categories of clustering algorithms: agglomerative (or hierarchical) clustering, which starts from the individual isolated nodes and successively aggregates those that are close or similar enough to each other, producing aggregated groups; and divisive clustering, which continuously assigns nodes into separate clusters by following some optimization rules, such as k-means or k-nodes.

Modularity Function Q for Community Structure Detection

Newman and Girvan developed a new approach to community structure detection. They introduced a two-stage procedure. First, it iteratively removes edges from the network to split it into communities; the removed edges are identified using any one of a number of possible “betweenness” measures. Second, these measures are iteratively recomputed after each removal.
The partition operation ceases when the optimization condition is reached. In order to choose an appropriate number of communities and to ensure the strength of the community structure, a modularity function Q for monitoring community structure is proposed, providing an effective means of quantitatively measuring the ‘goodness’ of the clusters discovered.
The modularity of a particular division of a network is computed based on the difference between the actual number of edges within a community in the division and the expected number of such edges if edges were placed randomly. Hence, finding the underlying community structure hidden in a network is equivalent to determining the optimal modularity value over all possible divisions of the network.
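To make the definition concrete, the sketch below computes modularity for a given division from an adjacency structure; it follows the standard Newman-Girvan definition, and the toy graph is illustrative.

```python
def modularity(adj, communities):
    """Newman-Girvan modularity sketch: fraction of edges inside communities minus the
    fraction expected under random placement. `adj` maps node -> set of neighbors."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2          # total number of edges
    q = 0.0
    for com in communities:
        internal = sum(1 for v in com for w in adj[v] if w in com) / 2
        degree = sum(len(adj[v]) for v in com)
        q += internal / m - (degree / (2 * m)) ** 2
    return q

# Two triangles joined by one edge, split into their natural communities.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(modularity(adj, [{1, 2, 3}, {4, 5, 6}]))
```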
Updated Modularity Function using Fuzzy C-means Clustering
Zhang et al. proposed an updated version of the modularity function Q. The main idea in this work is to replace the “hard division” with a “soft division” by using fuzzy c-means clustering. Given k communities in the network, we define a corresponding n × k soft division matrix U_k = [u_1, ···, u_k] with 0 ≤ u_ic ≤ 1 for each c = 1, ···, k and ∑_{c=1}^{k} u_ic = 1 for each i = 1, ···, n.

With the introduction of fuzzy assignment, we can further define the membership of each community as V_c = {i | u_ic > λ, i ∈ V}, where λ is a threshold used to turn a soft division into a final clustering. The new modularity function Q is reformulated as
With the updated modularity function Q, Zhang et al. proposed a new algorithm for
detecting the community structure by combining the spectral and fuzzy c-mean
clustering. The detailed algorithm description is stated as follows.
• Spectral mapping: (i) Compute the diagonal matrix D = (d_ii), where d_ii = ∑_k a_ik. (ii) Form the eigenvector matrix E_K = [e_1, e_2, ···, e_K] by computing the top K eigenvectors of the generalized eigensystem Ax = tDx.

• Fuzzy c-means: for each value of k, 2 ≤ k ≤ K: (i) Form the matrix E_k = [e_2, e_3, ···, e_k] from the matrix E_K. (ii) Normalize the rows of E_k to unit length using the Euclidean norm. (iii) Cluster the row vectors of E_k using fuzzy c-means or any other fuzzy clustering method to obtain a soft assignment matrix U_k.

• Maximizing the modularity function: Pick the k and the corresponding fuzzy partition U_k that maximizes Q(U_k).
The main purpose of the updated modularity function is to detect the overlapping of
the community structure in a network. In Fig.3.20, an example is presented to illustrate
the detection of community structure overlapping along with calculated modularity
function Q values by using k-means and fuzzy spectral clustering.
The Evolution of Social Networks
Another important aspect of social networks is the dynamic and evolutionary change of the network structure over time. Since the social network is an explicit demonstration of relationships or interactions of individuals within a network environment, the networked communities may evolve over time, due to changes to individual roles, social status, and even personal preferences in the network. Hence the analysis of these communities (memberships, structure and temporal dynamics) is emerging as an interesting and important research issue within the social network analysis research area.

There are a variety of approaches and algorithms proposed to address the dynamic evolution of
social networks. These algorithms are usually categorized into two main families. The first one is
using an additional temporal dimension along with the traditional networked graphs.

In these approaches, the ordinary two-way graph model is expanded into a three-way tensor model by adding a temporal dimension. The network at each timestamp is modeled as an individual networked graph, which is expressed as a matrix. The set of networked graphs at a sequence of time stamps then constitutes a three-way data cube, in which each slice of the data cube corresponds to a networked graph at a specific time stamp.
For a given tensor X modeling the sequence of network structures, the process is to find the best rank-k approximation tensor X̂ that minimizes ‖X − X̂‖_F, where X ∈ R^{N1×···×Nm} and X̂ ∈ R^{R1×···×Rm}. The best approximation tensor X̂ is decomposed into the product of a core tensor Y and two projection matrices of the row and column variables, U1 and U2, as shown in the Figure.
The second category of temporal analysis of dynamic community structures is the simultaneous discovery of community structure and dynamic evolution. Apart from the classic community detection and dynamic evolution analysis algorithms, this kind of approach takes both requirements into account when conducting social network analysis.

The hypothesis in these approaches is that the evolution of community structures in a network from one stage to its consecutive stage will not change dramatically; instead, it follows the principle that the incurred cost is minimized. This assumption is quite reasonable in the real world, where real observations evolve gradually in a longitudinal manner.

In their approach, the authors first introduced a cost function to measure the quality of the community structure at a certain time t and assumed that the formation of a stable community structure depends on the minimization of the cost function. In particular, the cost function consists of two aspects - a snapshot cost and a temporal cost:
The snapshot cost is determined by the distribution difference between the real similarity matrix of nodes within the network and the calculated similarity matrix of the formed community structures.

That is, the snapshot cost is the KL-divergence between the above mentioned similarity matrices. On the other hand, the temporal cost indicates the distribution difference between the similarity matrices at two consecutive time stamps, using the same formula.

Combining these two types of cost gives the whole cost measure, which is used to guide the
community structure detection.

Hence the analysis of community structures and their evolutions is converted to an


optimization problem of finding appropriate communities that minimizes the whole cost.

To solve the optimization problem, an iterative algorithm is devised to alternately update the requested community structures until an optimal solution is reached.
