
Artificial Intelligence & Machine Learning

Certificate Course
RRIT
Shridhar Venkat
About the Trainer
• Name: Shridhar Venkatanarasimhan
• Department: CSE RRIT 3rd Floor
• Phone Number: 9663352923
• Email: svnara12@yahoo.com

• If you meet me, I can help you install and run Python and AI/ML
examples. I can give you sample programs.
Topics for Today
• Multiple Regression
• Polynomial Regression
• Artificial Neural Networks
• Recommender Systems
• Sentiment Analysis
• Exam
Prediction Algorithms
• Simple Linear Regression
• Multiple (Linear) Regression
• Polynomial Regression
• RIDGE
• LASSO
Validating the Model
• Once a model is built, we have to check whether it is
appropriate, based on validation measures, including
measures specific to that model.
Simple Linear Regression
• Try to fit a straight line using independent variable x
and dependent variable y

• yi = β0 + β1xi + εi
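
A minimal scikit-learn sketch of this (the toy data is invented for illustration):

```python
# Simple linear regression: fit y = b0 + b1*x on toy data.
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # independent variable
y = np.array([2.1, 4.2, 6.1, 8.3, 9.9])            # dependent variable

model = LinearRegression()
model.fit(x, y)

print("b1 (slope):    ", model.coef_[0])
print("b0 (intercept):", model.intercept_)
print("prediction at x=6:", model.predict([[6.0]])[0])
```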
Multiple (Linear) Regression
• Try to fit a hyperplane using n independent variables
x1, …, xn and dependent variable y

• yi = β0 + β1x1i + β2x2i + … + βnxni + εi


Precaution with Multiple Regression
• If X1 and X2 are two features used, X1 and X2 should
not be highly correlated.
• Similarly, for one-hot encoding/dummy variables,
drop_first should be used, to avoid the dummy variable trap.
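
A short pandas sketch of both precautions (column names and values are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "area":  [1000, 1500, 1200, 2000],
    "city":  ["Bengaluru", "Mysuru", "Bengaluru", "Mysuru"],
    "price": [50, 60, 55, 80],
})

# Check that numeric features are not highly correlated with each other.
print(df[["area", "price"]].corr())

# Dummy variables with drop_first=True avoids the dummy variable trap:
# one dummy column is dropped because it is fully determined by the others.
X = pd.get_dummies(df[["area", "city"]], drop_first=True)
print(X.head())
```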
Polynomial Regression
• Try to fit a polynomial using independent variable x
and dependent variable y

• yi = β0 + β1xi + β2xi² + … + βnxiⁿ + εi


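A minimal scikit-learn sketch, using PolynomialFeatures to build the xⁱ terms (the toy data is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 4.1, 9.3, 15.8, 25.2])   # roughly quadratic toy data

# Expand x into [1, x, x^2]; linear regression on these terms
# fits the polynomial y = b0 + b1*x + b2*x^2.
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(x)

model = LinearRegression().fit(X_poly, y)
print(model.predict(poly.transform([[6.0]])))
```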
Deep Learning/ Neural Networks
• Artificial Neural Network (ANN)
• Convolutional Neural Network (CNN)
• Recurrent Neural Network (RNN)
Structure of a Nerve Cell (Neuron)
A Single Perceptron

[Diagram: inputs x1, x2, …, xn feed into a single perceptron, which outputs y]
Single Perceptron Formula

• z = w0 + Σi wi xi

• y = f(z)

• Where f is an activation function, usually ReLU or sigmoid or
softmax or tanh.
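
A minimal NumPy sketch of one perceptron's forward pass (the weights and inputs are invented for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w0 = 0.5                     # bias
w  = np.array([0.2, -0.4, 0.1])
x  = np.array([1.0,  2.0, 3.0])

z = w0 + np.sum(w * x)       # z = w0 + sum_i wi * xi
y = sigmoid(z)               # y = f(z), here f = sigmoid
print(y)
```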
Neural Network
• A neural network consists of many layers of perceptrons.
• The first layer takes the inputs.
• The last layer yields the outputs.
• In between, there are 0 or more hidden layers.
What is Training a Neural Network?
• Training a neural network means determining all the
weights wi in the network including the biases b = w0.
• The weights are such that for the training tuples, the best
fit is achieved.
Typical Cost Function to Minimize
• C = (1/(2N)) Σ (y − y′)²

• For full batch gradient descent, the sum is over all tuples.
• For stochastic gradient descent, each y is taken separately.
• For mini-batch gradient descent, the sum is over a given
batch size.
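
A small NumPy sketch contrasting the three variants of the cost computation (the y values are invented for illustration):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0, 4.2, 1.1])
y_pred = np.array([2.5,  0.0, 2.1, 7.8, 4.0, 1.3])

# Full batch: sum over ALL tuples.
N = len(y_true)
C_full = np.sum((y_true - y_pred) ** 2) / (2 * N)

# Mini-batch: sum over one batch (here, batch size 2).
n = 2
batch = slice(0, n)
C_batch = np.sum((y_true[batch] - y_pred[batch]) ** 2) / (2 * n)

# Stochastic: each tuple taken separately (batch size 1).
C_sgd = (y_true[0] - y_pred[0]) ** 2 / 2

print(C_full, C_batch, C_sgd)
```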
Neural Network Steps
• Initialize the weights to random values close to 0.
• Forward propagate and compute outputs.
• Compute the loss function C.
• Find the gradient of C with respect to the w values; this is done one
layer at a time, starting from the last layer towards the first. Hence,
this is backpropagation.
• Apply the learning rate and move towards the minimum.
• In gradient descent, the entire data set is seen before the weights
are updated.
• In stochastic gradient descent, weights are updated every sample.
• In mini-batch gradient descent a batch size number of tuples is seen
before updating the weights.
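
A minimal NumPy sketch of these steps for the simplest possible "network" (one input, one weight, one bias), trained with full-batch gradient descent on toy data:

```python
import numpy as np

# Toy data: y is roughly 2x + 1, invented for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.0, 4.9, 7.2, 9.0])

rng = np.random.default_rng(0)
w0, w1 = rng.normal(0, 0.01, size=2)   # weights initialized close to 0
lr = 0.05                               # learning rate
N = len(x)

for epoch in range(1000):
    y_pred = w0 + w1 * x                       # forward propagate
    C = np.sum((y - y_pred) ** 2) / (2 * N)    # cost function
    # Gradients of C w.r.t. w0 and w1 (the backpropagation step,
    # trivial here because there is only one layer).
    dC_dw0 = -np.sum(y - y_pred) / N
    dC_dw1 = -np.sum((y - y_pred) * x) / N
    # Full-batch update: all tuples are seen before the weights change.
    w0 -= lr * dC_dw0
    w1 -= lr * dC_dw1

print(w0, w1)   # should approach roughly 1 and 2
```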
SGD and other optimizers
• As far as keras is concerned, all variants of gradient descent are
called ‘sgd’.
• The batch size determines the exact algorithm.
• However, there are improved optimizers like Adam, which adapt
the learning rate during training.
• Gradient descent is deterministic but might give a local minimum.
• Stochastic gradient descent is not deterministic but tries to reach
the global minimum.
• Please check documentation for optimizers.
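
A minimal Keras sketch (layer sizes are arbitrary; X and y stand for your training data):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# optimizer="sgd" covers the plain gradient-descent family;
# Adam adapts the learning rate during training.
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# batch_size=1 -> stochastic GD; batch_size=len(X) -> full batch;
# anything in between -> mini-batch gradient descent.
# model.fit(X, y, epochs=10, batch_size=32)
```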
Loss Function C
• C has many choices, and it need not be convex. Common
options are:
• binary_crossentropy
• categorical_crossentropy
• sparse_categorical_crossentropy

• Please check documentation for where each can be used.


Choice of Learning Rate
• If learning rate is too small, it might take a long time to
converge.
• If it is too large, training might oscillate and even jump across
minima.
An Epoch
• An epoch is one complete pass through all of the tuples in the
data set, adjusting the weights along the way.
• For better results, use multiple epochs.
Why is Gradient Descent used?
• Brute force search is not feasible, as there would be too many points
and far too many dimensions (the curse of dimensionality).
• So, we use either gradient descent or stochastic gradient
descent or mini batch gradient descent and move towards
the optimum for the cost function using a learning rate.
ANN Steps
• Forward Propagation
• Back Propagation
• Gradient Descent
• Stochastic Gradient Descent
• Mini Batch Gradient Descent
• Batch
• Epoch
Convolutional Neural Networks (CNN)
• Convolution or Filtering or Feature Extraction
• ReLU Layer
• Max Pooling
• Flattening
• Full Connection
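
A minimal Keras sketch mirroring this layer sequence (filter counts and input shape are arbitrary, for illustration):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                      # e.g. grayscale images
    keras.layers.Conv2D(32, (3, 3), activation="relu"),  # convolution + ReLU
    keras.layers.MaxPooling2D((2, 2)),                   # max pooling
    keras.layers.Flatten(),                              # flattening
    keras.layers.Dense(10, activation="softmax"),        # full connection
])
model.summary()
```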
Recurrent Neural Networks (RNN)
• RNNs have some “memory” in them.
• They have the ability to deal with historical sequences.
• They are capable of detecting, for example, moving objects in video.
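
A minimal Keras sketch of an RNN (sequence length and layer sizes are arbitrary, for illustration):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(20, 3)),    # 20 time steps, 3 features each
    keras.layers.SimpleRNN(16),    # hidden state carries the "memory"
    keras.layers.Dense(1),
])
model.summary()
```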
Natural Language Processing (NLP)
• Natural language processing (NLP) refers to the branch of computer
science—and more specifically, the branch of artificial intelligence
or AI—concerned with giving computers the ability to understand
text and spoken words in much the same way human beings can.
Recommender Systems
• A recommender system suggests items based on the past choices
or behaviour of a particular user.
• For example, YouTube recommendation
• For example, Amazon Books recommendation
• Netflix movies suggestion
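
A minimal sketch of one common approach, item-based collaborative filtering with cosine similarity (the ratings matrix is invented for illustration):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows are users, columns are items; 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
])

# Similarity between items, based on how users rated them.
item_sim = cosine_similarity(ratings.T)

# For a user who liked item 0, recommend the most similar other item.
scores = item_sim[0].copy()
scores[0] = -1                      # exclude the item itself
print("recommend item:", int(np.argmax(scores)))
```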
Sentiment Analysis
• Sentiment analysis is the process of analyzing digital text to
determine if the emotional tone of the message is positive,
negative, or neutral.
• Today, companies have large volumes of text data like emails,
customer support chat transcripts, social media comments, and
reviews.
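
A minimal sketch using the TextBlob library, one simple way to get a polarity score (install with pip install textblob; the example sentences are invented):

```python
from textblob import TextBlob

reviews = [
    "This product is wonderful, I love it!",
    "Terrible service, very disappointed.",
    "The package arrived on Tuesday.",
]

for text in reviews:
    polarity = TextBlob(text).sentiment.polarity  # -1 (negative) .. +1 (positive)
    label = "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"
    print(f"{label:8s} ({polarity:+.2f}): {text}")
```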
Thank You!
