ML Unit 6


Course Name: 6IT4-02:Machine Learning

Unit-VI: Recommender Systems

 Collaborative filtering
 Content-based filtering
 Artificial neural network
 Perceptron
 Multilayer network
 Backpropagation
 Introduction to Deep learning
Recommender System
These days, whether you look at a video on YouTube, a movie on Netflix or a
product on Amazon, you are going to get recommendations for more things to
view, like or buy. You can thank the advent of machine learning algorithms
and recommender systems for this development.
Recommender System
 Recommender systems are one of the most successful
and widespread applications of machine learning
technologies in business.
 Recommender systems are an important class of machine
learning algorithms that offer "relevant" suggestions to
users. They are typically categorized as either collaborative
filtering or content-based systems.
Recommender System
 A recommender system is a subclass of information
filtering that seeks to predict the "rating" or "preference"
a user will give an item, such as a product, movie, song,
etc.
 Recommender systems provide personalized information
by learning the user’s interests through traces of
interaction with that user. 
 Like other machine learning applications, a recommender
system makes predictions based on a user's past
behavior. Specifically, it is designed to predict user
preference for a set of items based on experience.
Recommender System
Mathematically, a recommendation task can be framed as follows:
 A set of users (U)
 A set of items (I) that are to be recommended to users in U
 Learn a function, based on the users' past interaction data, that predicts
how likely a user in U is to like an item in I
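As a rough illustration of this formulation, the sketch below frames the task as a function to be learned from past interactions. All names (Interactions, predict_score) are illustrative assumptions, and the placeholder body simply averages the user's past ratings rather than actually learning anything.

```python
# A minimal sketch of the recommendation task: learn f(user, item) -> score
# from past interaction data. Names and the toy logic are illustrative only.
from typing import Dict, Tuple

# Past interactions: (user, item) -> observed rating
Interactions = Dict[Tuple[str, str], float]

def predict_score(user: str, item: str, history: Interactions) -> float:
    """Predict how much `user` will like `item` from past interaction data.

    Placeholder: returns the user's average past rating. Real systems learn
    this function with the collaborative or content-based methods below.
    """
    user_ratings = [r for (u, _), r in history.items() if u == user]
    return sum(user_ratings) / len(user_ratings) if user_ratings else 0.0
```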


Examples
Some key examples of recommender systems at work include:
 Product recommendations on Amazon and other shopping sites
 Movie and TV show recommendations on Netflix
 Article recommendations on news sites
Recommendation Engine
 Until recently, people generally tended to buy products recommended to them by
their friends or by people they trust. This used to be the primary method of
purchase when there was any doubt about the product.
 But with the advent of the digital age, that circle has expanded to include online
sites that utilize some sort of recommendation engine.
 A recommendation engine filters the data using different algorithms
and recommends the most relevant items to users. It first captures the
past behavior of a customer and based on that, recommends products
which the users might be likely to buy.
Recommendation Engine
Q. If a completely new user visits an e-commerce site, that site will not
have any past history of that user. So how does the site go about
recommending products to the user in such a scenario?

Solution:

 One possible solution could be to recommend the best-selling products, i.e.
the products which are high in demand.
 Another possible solution could be to recommend the products which
would bring the maximum profit to the business.
Recommendation Engine
If a few items can be recommended to a customer based on their needs and
interests, it will create a positive impact on the user experience and lead to
frequent visits.

Hence, businesses nowadays are building smart and intelligent
recommendation engines by studying the past behavior of their users.
Types
There are basically three important types of
recommendation engines:
 Collaborative filtering
 Content-based filtering
 Hybrid recommendation systems
Content Based Filtering
 Content-based recommendation systems take into
account the data provided by the user both directly and
indirectly. For example, age can be used to determine
classes of products or items reviewed and bought by the
user.
 This type of recommendation system relies on
characteristics of the object.
Content Based Filtering
 New content can be quickly recommended to the user.

E.g. if the user has a history of watching action movies, a newly
released action movie will be recommended by this system.
However, this system does not take into account behavior/data about
other users in the system; hence, if a particular action movie receives
very low ratings or negative reviews from other users, it will still
be recommended to the user.
Content Based Filtering
Techniques used in content-based filtering are:
 TF-IDF (Term Frequency - Inverse Document Frequency)
 Cosine Similarity
Content Based Filtering

TF-IDF
This technique is used in information retrieval and text mining.
TF-IDF, as the name suggests, has two terms:
TF calculates the normalized frequency at which a given term appears in
a document.
IDF calculates the importance of a term in general. E.g. the terms
'recommendation', 'system' and 'movie' convey more information about the
document than terms like 'the', 'and', 'are'.
Content Based Filtering

TF-IDF

TF: It measures the frequency of a term in the document.
Since the size of a document may vary, it would be misleading to
use a simple count, so the count is normalized:
TF(w) = (Number of times term w appears in the document) /
(Total number of terms in the document)
Content Based Filtering
TF-IDF

IDF: It measures the overall importance of a given term. Since
commonly used terms like 'is', 'the' and 'are' don't usually provide
information about the document, the IDF for these terms is low.

IDF is calculated as:
IDF(t) = log_e(Total number of documents / Number of
documents containing term t)
Content Based Filtering
TF-IDF

In the content-based filtering technique, TF-IDF can be useful for
determining products which are similar to a given product.
Since this technique is keyword based, it is most useful in
areas with rich textual data, e.g. book recommendation, as shown in the
sketch below.
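Below is a minimal from-scratch sketch of the TF and IDF formulas above, applied to a tiny illustrative corpus; the function names and documents are assumptions for demonstration only.

```python
# A minimal sketch of TF-IDF computed from scratch, following the TF and IDF
# formulas above. The corpus and function names are illustrative.
import math

def tf(term, document_tokens):
    # Normalized term frequency: count of the term / total terms in the document
    return document_tokens.count(term) / len(document_tokens)

def idf(term, corpus_tokens):
    # log_e(total documents / documents containing the term)
    n_containing = sum(1 for doc in corpus_tokens if term in doc)
    return math.log(len(corpus_tokens) / n_containing) if n_containing else 0.0

def tf_idf(term, document_tokens, corpus_tokens):
    return tf(term, document_tokens) * idf(term, corpus_tokens)

corpus = [
    "a gripping action movie with a daring heist".split(),
    "a romantic drama about two poets".split(),
    "a quiet documentary about the ocean".split(),
]
print(tf_idf("action", corpus[0], corpus))  # distinctive term -> positive score
print(tf_idf("a", corpus[0], corpus))       # appears in every document -> score of 0
```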
Content Based Filtering
Cosine-Similarity

As the name suggests, this method calculates the cosine of the
angle between two vectors, and so provides an estimate of the
similarity between two objects as a measure of the angle
between the vectors that represent them.
Cosine similarity is calculated by taking the dot product of the
two vectors and dividing it by the product of their magnitudes.
Content Based Filtering
This algorithm recommends products which are similar to
the ones that a user has liked in the past.
Content Based Filtering
Cosine-Similarity
For example: Netflix saves all the information related to each user in a
vector. This vector contains the past behavior of the user, i.e. the
movies liked/disliked by the user and the ratings given by them. This
vector is known as the profile vector.

All the information related to movies is stored in another vector called
the item vector. The item vector contains the details of each movie, such as
genre, cast, director, etc.
Content Based Filtering
Cosine-Similarity
The content-based filtering algorithm finds the cosine of the
angle between the profile vector and the item vector,
i.e. the cosine similarity. Suppose A is the profile vector and B
is the item vector; then the similarity between them can be
calculated as:

sim(A, B) = cos(θ) = (A · B) / (||A|| × ||B||)
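A minimal sketch of this cosine-similarity calculation in plain Python; the profile and item vectors are illustrative assumptions.

```python
# A minimal sketch of the cosine-similarity formula above.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

profile_vector = [5, 3, 0, 1]   # e.g. a user's affinity for four genres
item_vector    = [4, 2, 0, 1]   # e.g. a movie's weight on the same genres
print(cosine_similarity(profile_vector, item_vector))  # close to 1 -> very similar
```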

Content Based Filtering
Cosine-Similarity
Based on the cosine value, which ranges between -1 and 1,
the movies are arranged in descending order and one of the
two approaches below is used for recommendation:
 Top-n approach: the top n movies are recommended
(here n can be decided by the business)
 Rating scale approach: a threshold is set and all the
movies above that threshold are recommended
Content Based Filtering
Other methods that can be used to calculate the similarity
are listed below; a code sketch of both follows the list.
 Euclidean Distance: similar items will lie in close
proximity to each other if plotted in n-dimensional space. So,
we can calculate the distance between items and, based on
that distance, recommend items to the user. The Euclidean
distance between items x and y is given by:
d(x, y) = sqrt((x1 − y1)^2 + (x2 − y2)^2 + ... + (xn − yn)^2)
 Pearson's Correlation: it tells us how strongly two items
are correlated; the higher the correlation, the greater the
similarity. Pearson's correlation can be calculated using
the following formula:
corr(x, y) = Σ(xi − x̄)(yi − ȳ) / ( sqrt(Σ(xi − x̄)^2) × sqrt(Σ(yi − ȳ)^2) )
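Here is a minimal sketch of both similarity measures from the list above; the item vectors are illustrative assumptions.

```python
# A minimal sketch of the Euclidean-distance and Pearson-correlation measures.
import math

def euclidean_distance(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def pearson_correlation(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mean_x) ** 2 for xi in x)) * \
          math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return num / den if den else 0.0

item_a = [4, 1, 2, 4]
item_b = [2, 4, 4, 2]
print(euclidean_distance(item_a, item_b))   # smaller distance -> more similar
print(pearson_correlation(item_a, item_b))  # closer to +1 -> more similar
```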
Content Based Filtering
A major drawback of this algorithm is that it is limited to
recommending items that are of the same type. It will never
recommend products which the user has not bought or liked
in the past.
So if a user has watched or liked only action movies in the
past, the system will recommend only action movies. It’s a
very narrow way of building an engine.
Collaborative Filtering
The collaborative filtering algorithm uses "user behavior" for
recommending items. This is one of the most commonly
used algorithms in industry, as it is not dependent on any
additional information.

Let us understand this with an example:
If person A likes 3 movies, say M1, M2 and M3, and person
B likes M2, M3 and M4, then they have almost similar
interests. We can say with some certainty that A should like
M4 and B should like M1.
Collaborative Filtering
User-User collaborative filtering
 This algorithm first finds the similarity score between users.
 Based on this similarity score, it then picks out the most
similar users and recommends products which these similar
users have liked or bought previously.
Collaborative Filtering
 In terms of our movies example from earlier, this
algorithm finds the similarity between each user
based on the ratings they have previously given to
different movies.
 The prediction of an item for a user u is calculated by
computing the weighted sum of the user ratings
given by other users to an item i.
Collaborative Filtering
The prediction Pu,i is given by:
Pu,i = Σv (Su,v × Rv,i) / Σv Su,v
where the sum runs over the users v who are most similar to user u.
Collaborative Filtering
Here,
Pu,i is the prediction of item i for user u
Rv,i is the rating given by user v to movie i
Su,v is the similarity between users u and v
Collaborative Filtering
Now, we have the ratings given by users in their profile vectors, and based on
these we have to predict the ratings for other users. The following
steps are followed to do so:

1. For the predictions we need the similarity between user u and
user v. We can make use of the Pearson correlation.

2. First we find the items rated by both users, and based on
those ratings the correlation between the users is calculated.
Collaborative Filtering

3. The predictions can be calculated using the similarity
values. The algorithm first calculates the similarity
between each pair of users and then, based on each similarity,
calculates the predictions. Users having a higher
correlation will tend to be more similar.
Collaborative Filtering
Based on these prediction values, recommendations are made. Let us
understand it with an example:

Consider the user-movie rating matrix:


User/Movie    M1   M2   M3   M4   M5   Mean Rating
A              4    1    -    4    -    3
B              -    4    -    2    3    3
C              -    1    -    4    4    3
Collaborative Filtering
Let's find the similarity between users (A, C) and (B, C) in
the above table. The common movies rated by A and C are
M2 and M4, and those rated by B and C are movies M2, M4
and M5.
Collaborative Filtering
 The correlation between user A and C is more than the
correlation between B and C. Hence users A and C
have more similarity and the movies liked by user A will
be recommended to user C and vice versa.
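A minimal sketch of this user-user similarity on the rating table above, using Pearson correlation over the commonly rated movies and the mean ratings from the table; the dictionary layout and function name are illustrative.

```python
# A minimal sketch of user-user similarity on the rating table above,
# using Pearson correlation over commonly rated movies.
import math

ratings = {
    "A": {"M1": 4, "M2": 1, "M4": 4},
    "B": {"M2": 4, "M4": 2, "M5": 3},
    "C": {"M2": 1, "M4": 4, "M5": 4},
}
mean_rating = {"A": 3, "B": 3, "C": 3}   # mean ratings from the table

def user_similarity(u, v):
    common = set(ratings[u]) & set(ratings[v])
    num = sum((ratings[u][m] - mean_rating[u]) * (ratings[v][m] - mean_rating[v])
              for m in common)
    den = math.sqrt(sum((ratings[u][m] - mean_rating[u]) ** 2 for m in common)) * \
          math.sqrt(sum((ratings[v][m] - mean_rating[v]) ** 2 for m in common))
    return num / den if den else 0.0

print(user_similarity("A", "C"))  # 1.0         -> A and C are highly similar
print(user_similarity("B", "C"))  # about -0.87 -> B and C are dissimilar
```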
Collaborative Filtering
Drawback:
This algorithm is quite time consuming, as it involves
calculating the similarity for every pair of users and then
calculating a prediction for each similarity score.
One way of handling this problem is to select only a few
users (neighbors) instead of all of them.
Collaborative Filtering
There are various ways to select the neighbors:

1. Select a similarity threshold and choose all the users
above that value.

2. Randomly select the users.

3. Arrange the neighbors in descending order of their
similarity value and choose the top-N users.

4. Use clustering for choosing neighbors.
Collaborative Filtering
This algorithm is useful when the number of users is small.
It is not effective when there are a large number of users, as it will
take a lot of time to compute the similarity between all user pairs.

This leads us to item-item collaborative filtering, which is effective
when the number of users is greater than the number of items being
recommended.
Collaborative Filtering
Item-item collaborative filtering
 In this algorithm, we compute the similarity between
each pair of items.
 This algorithm works similarly to user-user collaborative
filtering, with just a little change: instead of taking the weighted
sum of ratings of "user-neighbors", we take the weighted sum of
ratings of "item-neighbors". The prediction is given by:
Pu,i = ΣN (Si,N × Ru,N) / ΣN Si,N
where the sum runs over the item-neighbors N of item i, Si,N is the
similarity between items i and N, and Ru,N is the rating that user u
has given to item N.
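A minimal sketch of this prediction formula; the similarity scores and ratings below are illustrative assumptions rather than values computed from the earlier tables.

```python
# A minimal sketch of the item-item prediction: a weighted sum of the user's
# ratings of neighbour items, weighted by item similarity.
def predict_rating(user_ratings, item_similarities):
    # user_ratings: ratings the user gave to neighbour items N
    # item_similarities: similarity S(i, N) between the target item i and each N
    num = sum(item_similarities[n] * user_ratings[n] for n in user_ratings)
    den = sum(item_similarities[n] for n in user_ratings)
    return num / den if den else 0.0

user_ratings = {"M4": 4, "M5": 3}           # ratings user u gave to neighbours of i
item_similarities = {"M4": 0.9, "M5": 0.4}  # S(i, M4), S(i, M5)
print(predict_rating(user_ratings, item_similarities))  # weighted-average prediction
```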
Collaborative Filtering
item-Item collaborative filtering
Now, as we have the similarity between each movie and the ratings,
predictions are made and based on those predictions, similar movies
are recommended. Let us understand it with an example.

User/Movie    M1   M2   M3   M4   M5
A              4    1    2    4    4
B              2    4    4    2    1
C              -    1    -    3    4
Mean Rating    3    2    3    3    3
Collaborative Filtering
item-Item collaborative filtering

Here the mean item rating is the average of all the
ratings given to a particular item (compare it with the
table we saw in user-user filtering). Instead of finding the
user-user similarity as we saw earlier, we find the item-item
similarity.
Collaborative Filtering
item-Item collaborative filtering

To do this, we first need to find the users who have rated both
items; based on their ratings, the similarity between the items is
calculated.
Let us find the similarity between movies (M1, M4) and (M1, M5).
The common users who have rated movies M1 and M4 are A and B,
while the users who have rated movies M1 and M5 are also A and B.
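A minimal sketch of this item-item similarity on the rating table above, using mean-centered (adjusted) cosine similarity over the common users; the choice of mean-centering is an assumption, but it reproduces the ordering discussed below.

```python
# A minimal sketch of item-item similarity on the table above, using
# mean-centered cosine similarity over the users who rated both items.
import math

item_ratings = {
    "M1": {"A": 4, "B": 2},
    "M4": {"A": 4, "B": 2, "C": 3},
    "M5": {"A": 4, "B": 1, "C": 4},
}
mean_item_rating = {"M1": 3, "M4": 3, "M5": 3}   # mean ratings from the table

def item_similarity(i, j):
    common_users = set(item_ratings[i]) & set(item_ratings[j])
    ci = [item_ratings[i][u] - mean_item_rating[i] for u in common_users]
    cj = [item_ratings[j][u] - mean_item_rating[j] for u in common_users]
    num = sum(a * b for a, b in zip(ci, cj))
    den = math.sqrt(sum(a * a for a in ci)) * math.sqrt(sum(b * b for b in cj))
    return num / den if den else 0.0

print(item_similarity("M1", "M4"))  # 1.0   -> M1 and M4 are very similar
print(item_similarity("M1", "M5"))  # ~0.95 -> slightly less similar
```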
Collaborative Filtering
item-Item collaborative filtering

The similarity between movies M1 and M4 is greater than the
similarity between movies M1 and M5.
So, based on these similarity values, if any user searches for
movie M1, they will be recommended movie M4, and vice
versa.
Q. What will happen if a new user or a new item is added to
the dataset?
This situation is called a Cold Start. There can be two types of cold
start:
Collaborative Filtering
item-Item collaborative filtering

1. Visitor Cold Start:
This means that a new user is introduced into the dataset. Since
there is no history for that user, the system does not know
the preferences of that user, and it becomes harder to
recommend products to them.
Collaborative Filtering
item-Item collaborative filtering

1. Visitor Cold Start (continued):
So, how can we solve this problem?
One basic approach could be to apply a popularity-based
strategy, i.e. recommend the most popular products.
Once we know the preferences of the user, recommending
products becomes easier.
Collaborative Filtering
item-Item collaborative filtering

2. Product Cold Start:
This means that a new product is launched in the market or
added to the system.
User interaction is most important in determining the value of
any product: the more interaction a product receives, the easier
it is to recommend that product to the right user.
Artificial Neural Network
 Neural networks involve long training times and are therefore
more suitable for applications where this is feasible.
 They require a number of parameters that are typically best
determined empirically such as the network topology or
“structure.”
 Neural networks have been criticized for their poor
interpretability. For example, it is difficult for humans to
interpret the symbolic meaning behind the learned weights and
the "hidden units" in the network. These features initially made
neural networks less desirable for data mining.
Artificial Neural Network
 “What is backpropagation?” Backpropagation is a neural network
learning algorithm.
 The neural networks field was originally kindled by psychologists and
neurobiologists who sought to develop and test computational analogs
of neurons.
 Roughly speaking, a neural network is a set of connected input/output
units in which each connection has a weight associated with it. During
the learning phase, the network learns by adjusting the weights so as to
be able to predict the correct class label of the input tuples.
 Neural network learning is also referred to as connectionist learning
due to the connections between units.
Artificial Neural Network
 Advantages of neural networks
include their high tolerance of noisy data as well as their ability to classify
patterns on which they have not been trained.
They can be used when you have little knowledge of the relationships
between attributes and classes.
They are well suited for continuous-valued inputs and outputs, unlike most
decision tree algorithms. They have been successful on a wide array of
real-world data, including handwritten character recognition, pathology and
laboratory medicine, and training a computer to pronounce English text.
Neural network algorithms are inherently parallel; parallelization techniques
can be used to speed up the computation process. In addition, several
techniques have been recently developed for rule extraction from trained
neural networks. These factors contribute to the usefulness of neural
networks for classification and numeric prediction in data mining.
Artificial Neural Network
 There are many different kinds of neural networks and neural network
algorithms. The most popular neural network algorithm is
backpropagation, which gained repute in the 1980s.
Multilayer Feedforward Neural Network
 The backpropagation algorithm performs learning on a multilayer feed-
forward neural network. It iteratively learns a set of weights for
prediction of the class label of tuples. A multilayer feed-forward neural
network consists of an input layer, one or more hidden layers, and an
output layer.
Multilayer Feedforward Neural Network
[Figure 9.2: a multilayer feed-forward neural network]
Multilayer Feedforward Neural Network
 Each layer is made up of units. The inputs to the network correspond to
the attributes measured for each training tuple.
 The inputs are fed simultaneously into the units making up the input
layer.
 These inputs pass through the input layer and are then weighted and fed
simultaneously to a second layer of “neuronlike” units, known as a hidden
layer.
 The outputs of the hidden layer units can be input to another hidden
layer, and so on.
 The number of hidden layers is arbitrary, although in practice, usually
only one is used.
 The weighted outputs of the last hidden layer are input to units making
up the output layer, which emits the network's prediction for the given tuples.
Multilayer Feedforward Neural Network
 The units in the input layer are called input units. The units in the
hidden layers and output layer are sometimes referred to as neurodes,
due to their symbolic biological basis, or as output units.
 The multilayer neural network shown in Figure 9.2 has two layers of
output units. Therefore, we say that it is a two-layer neural network.
(The input layer is not counted because it serves only to pass the input
values to the next layer.)
 Similarly, a network containing two hidden layers is called a three-layer
neural network, and so on. It is a feed-forward network since none of
the weights cycles back to an input unit or to a previous layer’s output
unit. It is fully connected in that each unit provides input to each unit
in the next forward layer.
Multilayer Feedforward Neural Network
 Each output unit takes, as input, a weighted sum of the outputs from
units in the previous layer.
 It applies a nonlinear (activation) function to the weighted input.
Multilayer feed-forward neural networks are able to model the class
prediction as a nonlinear combination of the inputs.
 From a statistical point of view, they perform nonlinear regression.
Multilayer feed-forward networks, given enough hidden units and
enough training samples, can closely approximate any function.
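To make the structure concrete, here is a minimal NumPy sketch of one forward pass through a fully connected feed-forward network with a single hidden layer. The input tuple, layer sizes, and random weights are illustrative assumptions.

```python
# A minimal sketch of a fully connected feed-forward pass with one hidden
# layer, matching the description above. Weights and inputs are illustrative.
import numpy as np

def sigmoid(x):
    # Logistic (squashing) activation, also used later by backpropagation
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0])          # one input tuple with 3 attributes

W_hidden = rng.normal(size=(3, 2))     # weights: input layer -> hidden layer
b_hidden = rng.normal(size=2)          # hidden-layer biases
W_out = rng.normal(size=(2, 1))        # weights: hidden layer -> output layer
b_out = rng.normal(size=1)             # output-layer bias

hidden = sigmoid(x @ W_hidden + b_hidden)   # hidden-layer outputs
output = sigmoid(hidden @ W_out + b_out)    # network's prediction
print(output)
```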
Backpropagation Algorithm
 The weights and biases are first initialized to small random numbers,
and each training tuple is fed to the network's input layer. The inputs pass
through the input units unchanged; that is, for an input unit j, its output,
Oj, is equal to its input value, Ij. Next, the net input and
output of each unit in the hidden and output layers are computed.
 The net input to a unit in the hidden or output layers is computed as a
linear combination of its inputs.
 To compute the net input to the unit, each input connected to the unit
is multiplied by its corresponding weight, and these products are summed.
Given a unit j in a hidden or output layer, the net input, Ij, to unit j is
Ij = Σi wij Oi + θj
Backpropagation Algorithm

where wij is the weight of the connection from unit i in
the previous layer to unit j; Oi is the output of unit i
from the previous layer; and θj is the bias of the unit.
The bias acts as a threshold in that it serves to vary
the activity of the unit.
Backpropagation Algorithm

 Given the net input Ij to unit j, then Oj,
the output of unit j, is computed as
Oj = 1 / (1 + e^(−Ij))
Backpropagation Algorithm

This function is also referred to as a squashing function,
because it maps a large input domain onto the smaller
range of 0 to 1. The logistic function is nonlinear and
differentiable, allowing the backpropagation algorithm to
model classification problems that are linearly inseparable.
Backpropagation Algorithm

We compute the output values, Oj, for each hidden
layer, up to and including the output layer, which gives
the network's prediction.
Backpropagation Algorithm

Backpropagate the error: The error is propagated
backward by updating the weights and biases to reflect
the error of the network's prediction. For a unit j in the
output layer, the error Errj is computed by
Errj = Oj (1 − Oj)(Tj − Oj)
where Oj is the actual output of unit j, and Tj is the
known target value of the given training tuple. Note that
Oj(1 − Oj) is the derivative of the logistic function.
To compute the error of a hidden layer unit j, the weighted
sum of the errors of the units connected to unit j in the
next layer is considered. The error of a hidden layer unit j is
Errj = Oj (1 − Oj) Σk Errk wjk
where wjk is the weight of the connection from unit j to a
unit k in the next higher layer, and Errk is the error of unit k.
The weights and biases are updated to reflect the
propagated errors. Weights are updated by the following
equations, where Δwij is the change in weight wij:
Δwij = (l) Errj Oi
wij = wij + Δwij
Here l is the learning rate, a constant typically between 0.0 and 1.0.
Biases are updated similarly: Δθj = (l) Errj and θj = θj + Δθj.
 Terminating condition: Training stops when (1) all Δwij in the previous epoch are so
small as to be below some specified threshold, or (2) the percentage of tuples
misclassified in the previous epoch is below some threshold, or (3) a prespecified
number of epochs has expired. In practice, several hundred thousand
epochs may be required before the weights converge.
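The following minimal NumPy sketch puts the steps above together for a single-hidden-layer network: forward propagation with logistic units, the output- and hidden-layer error terms, and the weight/bias updates. The training tuples, layer sizes, and learning rate are illustrative assumptions, and only one epoch is shown.

```python
# A minimal sketch of one epoch of backpropagation for a single-hidden-layer
# network, following the equations above (logistic units, Errj terms, and the
# Δw / Δθ updates). Data, sizes, and the learning rate are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
X = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])  # training tuples
T = np.array([[1.0], [0.0]])                      # known target values

# Small random initial weights and zero biases
W_h, b_h = rng.normal(scale=0.5, size=(3, 2)), np.zeros(2)
W_o, b_o = rng.normal(scale=0.5, size=(2, 1)), np.zeros(1)
l = 0.5  # learning rate

for x, t in zip(X, T):
    # Propagate the inputs forward
    O_h = sigmoid(x @ W_h + b_h)       # hidden-layer outputs
    O_o = sigmoid(O_h @ W_o + b_o)     # output-layer outputs (prediction)

    # Backpropagate the error
    Err_o = O_o * (1 - O_o) * (t - O_o)          # output-layer error
    Err_h = O_h * (1 - O_h) * (W_o @ Err_o)      # hidden-layer error

    # Update weights and biases: Δw_ij = l * Err_j * O_i, Δθ_j = l * Err_j
    W_o += l * np.outer(O_h, Err_o)
    b_o += l * Err_o
    W_h += l * np.outer(x, Err_h)
    b_h += l * Err_h
```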
Backpropagation Algorithm
 “How efficient is backpropagation?” The computational efficiency depends on
the time spent training the network. Given |D| tuples and w weights, each
epoch requires O(|D| × w) time. However, in the worst-case scenario, the
number of epochs can be exponential in n, the number of inputs. In practice,
the time required for the networks to converge is highly variable. A number of
techniques exist that help speed up the training time. For example, a technique
known as simulated annealing can be used, which also ensures convergence to
a global optimum
“How can we classify an unknown tuple using a trained network?”
To classify an unknown tuple, X, the tuple is input to the trained network, and the
net input and output of each unit are computed. (There is no need for computation
and/or backpropagation of the error.)
 If there is one output node per class, then the output node with the highest value
determines the predicted class label for X.
 If there is only one output node, then output values greater than or equal to 0.5
may be considered as belonging to the positive class, while values less than 0.5 may
be considered negative.
 Several variations and alternatives to the backpropagation algorithm have been
proposed for classification in neural networks. These may involve the dynamic
adjustment of the network topology and of the learning rate or other parameters, or
the use of different error functions.
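A minimal sketch of the two decision rules described above; the output values are illustrative numbers rather than the outputs of an actual trained network.

```python
# A minimal sketch of classifying an unknown tuple from a trained network's
# output values, using the two decision rules above. Values are illustrative.
import numpy as np

# One output node per class: pick the node with the highest value
class_outputs = np.array([0.12, 0.81, 0.35])     # e.g. outputs for classes 0, 1, 2
predicted_class = int(np.argmax(class_outputs))  # -> 1

# Single output node: threshold at 0.5 for the positive class
single_output = 0.73
predicted_label = "positive" if single_output >= 0.5 else "negative"
print(predicted_class, predicted_label)
```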
What is Deep Learning (DL)

• A machine learning subfield focused on learning
representations of data; exceptionally effective at
learning patterns.
• Deep learning algorithms attempt to learn (multiple
levels of) representation by using a hierarchy of
multiple layers.
• If you provide the system tons of information, it
begins to understand it and respond in useful ways.
What is Deep Learning (DL)
Why is DL useful?
o Manually designed features are often over-specified,
incomplete, and take a long time to design and validate.
o Learned features are easy to adapt and fast to learn.
o Deep learning provides a very flexible, (almost?)
universal, learnable framework for representing world,
visual and linguistic information.
o It can learn in both unsupervised and supervised settings.
o It enables effective end-to-end joint system learning.
