
Natural Language Processing
Lecture 14: Machine Learning: Feed-forward Neural Networks,
Autoencoders/Embeddings, Dense Networks

12/7/2019

COMS W4705
Yassine Benajiba
Perceptron Expressiveness
• The simple perceptron learning algorithm starts with an arbitrary hyperplane and adjusts it using the training data.

• The step function is not differentiable, so there is no closed-form solution.

• The perceptron produces a linear separator.

• It can only learn linearly separable patterns.

• It can represent boolean functions like and, or, and not, but not the xor function.
The problem with xor
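As a quick numerical illustration of this point (a minimal sketch assuming scikit-learn is available; not part of the original slides), a perceptron cannot fit the xor function:

```python
import numpy as np
from sklearn.linear_model import Perceptron

# XOR: no hyperplane separates the positive and negative points.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

clf = Perceptron(max_iter=1000).fit(X, y)
print(clf.score(X, y))  # stays below 1.0 because XOR is not linearly separable
```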
Multi-Layer Neural Networks

[Figure: a network with an input layer, a hidden layer, and an output layer]

• Basic idea: represent any (non-linear) function as a composition of soft-threshold functions. This is a form of non-linear regression.

• Lippmann 1987: two hidden layers suffice to represent any arbitrary region (provided enough neurons), even discontinuous functions!
Activation Functions
• One problem with perceptrons is that the threshold (step) function is not differentiable.

• It is therefore unsuitable for gradient descent.

• One alternative is the sigmoid (logistic) function:

g(z) = 1 / (1 + e^(-z))

g(z) → 0 as z → -∞
g(z) → 1 as z → ∞
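A minimal NumPy sketch of the sigmoid and its limits (not part of the original slides); the derivative is included because gradient descent needs it:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative g'(z) = g(z) * (1 - g(z)), which gradient descent relies on."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Saturates at 0 for very negative z and at 1 for very positive z.
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[0.0000454, 0.5, 0.9999546]
```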
Activation Functions
• Two other popular activation functions:
Output Representation
• Many NLP problems are multi-class classification problems.

• Each output neuron represents one class. Predict the class with the highest activation.

[Figure: example output activations — y0: 0.9, y1: 0.1, y2: 0.7, y3: 0.4; the predicted class is y0]
Softmax
• We often want the activations at the output layer to represent probabilities.

• Exponentiate the activation of each output unit and normalize by the sum of all exponentiated activations (as in log-linear models).

[Figure: the unnormalized activations z0: 0.9, z1: 0.1, z2: 0.7, z3: 0.4 become, after softmax, z0: 0.35, z1: 0.16, z2: 0.28, z3: 0.21 — the network now computes a probability distribution.]
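A minimal NumPy sketch (not from the slides) that reproduces the numbers in the example above:

```python
import numpy as np

def softmax(z):
    """Exponentiate and normalize so the outputs sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([0.9, 0.1, 0.7, 0.4])
print(softmax(z).round(2))  # [0.35 0.16 0.28 0.21]
```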
Learning in Multi-Layer Neural Networks
• The network structure is fixed, but we want to train the weights. Assume feed-forward neural networks: no connections that form loops.

• Backpropagation algorithm:

• Given the current weights, compute the network output and the loss function (assume multiple outputs / a vector of outputs).

• Use gradient descent to update the weights and minimize the loss.

• Problem: We only know how to do this for the last layer!

• Idea: Propagate the error backwards through the network.


Backpropagation

[Figure: feed-forward computation of the network outputs — the input vector x = (x1, ..., x4) passes through the input layer, hidden layer, and output layer to produce the output vector h_w(x) = (a1, a2), which the error function E_train(w) compares against the target vector y; the error gradients are then propagated backwards through the network.]

Negative Log-Likelihood (also known as cross-entropy)

• Assume the target output is a one-hot vector and c(y) is the target class for target y.

• Compute the negative log-likelihood for a single example.

• Empirical error for the entire training data:
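The loss formulas on this slide did not survive extraction; the following is a standard reconstruction under the notation above (where ŷ denotes the network's softmax output, and the 1/N averaging is an assumption):

```latex
% Negative log-likelihood (cross-entropy) for a single example (x, y):
L(\mathbf{w}) = -\log \hat{y}_{c(y)}

% Empirical error over the N training examples (shown here as an average):
E_{\text{train}}(\mathbf{w}) = \frac{1}{N} \sum_{j=1}^{N} -\log \hat{y}^{(j)}_{c(y^{(j)})}
```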


Stochastic Gradient Descent (for a single unit)
• Goal: Learn parameters that minimize the empirical error.

  Randomly initialize w
  for a set number of iterations T:
      shuffle the training data
      for j = 1...N:
          for each weight wi in the network:
              take a gradient step on the loss of example j with respect to wi

• The learning rate controls the size of each update step.

• It often makes sense to compute the gradient over batches of examples instead of just one ("mini-batch").
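A minimal sketch of this loop (an illustration assuming a single softmax output layer trained with cross-entropy; the variable names and the update wi ← wi − eta · ∂L/∂wi are spelled out here, not taken verbatim from the slides):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sgd(X, y, num_classes, T=10, eta=0.1, rng=np.random.default_rng(0)):
    """SGD for a single softmax layer with cross-entropy loss."""
    N, d = X.shape
    W = rng.normal(scale=0.01, size=(num_classes, d))  # randomly initialize w
    for _ in range(T):                                 # fixed number of iterations T
        for j in rng.permutation(N):                   # shuffle the training data
            probs = softmax(W @ X[j])                  # forward pass
            grad = np.outer(probs, X[j])               # gradient of -log p(y_j | x_j)
            grad[y[j]] -= X[j]
            W -= eta * grad                            # wi <- wi - eta * dL/dwi
    return W
```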
Backpropagation
• Simplified multi-layer case (a single unit per layer):

[Figure: x → g (weight w1) → g(x) → f (weight w2) → f(g(x)) → Loss]

• Stochastic gradient descent should update each weight by taking a step along the negative gradient of the loss.

• Problem: How do we compute the gradients for parameters w1 and w2?
Chain Rule of Calculus

• To compute gradients for hidden units, we need to apply the chain rule of calculus:

The derivative of f(g(x)) with respect to x is f'(g(x)) · g'(x).
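As a concrete instance (an illustration with assumed functions, not from the slides), take f(z) = z² and g(w) = wx and differentiate with respect to the weight w:

```latex
% Example: f(z) = z^2 (outer function), g(w) = w x (a weighted input).
\frac{\partial}{\partial w} f(g(w)) = f'(g(w)) \cdot g'(w) = 2(wx) \cdot x
```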
Backpropagation

[Figure: x → f (weight w1) → f(x) → g (weight w2) → g(f(x)) → Loss]
Backpropagation

[Figure: forward pass — ... → x → f (with weight w) → f(x) → ... → Loss; backward pass — the error gradient flows back through f.]

• Assume we know the gradient of the loss with respect to f(x), arriving from the layers after f.

• We want to compute the gradient with respect to x, to propagate it back,

• and the gradient with respect to w (for the weight update).


Backpropagation

[Figure: the same forward/backward diagram — ... → x → f (with weight w) → f(x) → ... → Loss]

• To compute these gradients, we have to know the derivative of the function f.
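A minimal numerical sketch of this backward pass for the single-unit-per-layer chain shown earlier (x → g → f → Loss); the choices g = tanh, f = identity, and a squared-error loss are assumptions made only for illustration:

```python
import numpy as np

# Chain from the earlier slide: x --(w1)--> g --(w2)--> f --> Loss.
def forward_backward(x, y, w1, w2):
    # Forward pass: keep the intermediate values, the backward pass reuses them.
    a = w1 * x                 # input to the hidden unit
    h = np.tanh(a)             # g(x)
    out = w2 * h               # f(g(x)) with f = identity
    loss = 0.5 * (out - y) ** 2

    # Backward pass: apply the chain rule, propagating the gradient back.
    d_out = out - y                       # dLoss/d out
    d_w2 = d_out * h                      # dLoss/d w2 (update for the output weight)
    d_h = d_out * w2                      # gradient propagated back to the hidden unit
    d_a = d_h * (1.0 - np.tanh(a) ** 2)   # multiply by the derivative of g = tanh
    d_w1 = d_a * x                        # dLoss/d w1 (update for the hidden weight)
    return loss, d_w1, d_w2

loss, d_w1, d_w2 = forward_backward(x=1.0, y=0.5, w1=0.3, w2=-0.2)
```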
Autoencoders
Embeddings
(Word level semantics)
Skip-Gram Model
• Input: a single word in one-hot representation.

• Output: the probability of seeing any single word as a context word.

[Figure: the one-hot input for "eat" (|V| neurons) feeds d hidden neurons, which feed |V| output neurons with softmax activation; example output probabilities — a: 0.02, thought: 0.0, cheese: 0.04, place: 0.03, run: 0.0]

• The softmax function normalizes the activations of the output neurons to sum up to 1.0.
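A minimal sketch of this forward pass in NumPy (the matrix names, sizes, and the word index are assumptions, not from the slides):

```python
import numpy as np

V, d = 10000, 300                            # vocabulary size and hidden layer size
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.01, size=(V, d))   # input -> hidden weights (the embeddings)
W_out = rng.normal(scale=0.01, size=(d, V))  # hidden -> output weights

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def skipgram_forward(word_index):
    """One-hot input -> hidden layer -> softmax over all |V| context words."""
    h = W_in[word_index]          # multiplying a one-hot vector just selects a row
    return softmax(h @ W_out)     # probability of each word appearing as context

probs = skipgram_forward(word_index=42)       # hypothetical index for "eat"
```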
Skip-Gram Model
• Compute the error with respect to each context word.

[Figure: for the sentence "...a place to eat delicious cheese.", the target word w_t = "eat" and its context words w_{t-c}, ..., w_{t-1}, w_{t+1}, ..., w_{t+c} yield the training pairs (eat, place), (eat, to), (eat, delicious), (eat, cheese)]

• Combine the errors for each context word, then use the combined error to update the weights using back-propagation.
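A minimal sketch of how such training pairs can be generated from a sentence (the window size and whitespace tokenization are assumptions):

```python
def skipgram_pairs(tokens, window=2):
    """Yield (target, context) pairs for every word and its surrounding window."""
    for t, target in enumerate(tokens):
        for c in range(max(0, t - window), min(len(tokens), t + window + 1)):
            if c != t:
                yield (target, tokens[c])

sentence = "a place to eat delicious cheese".split()
# For the target "eat" this yields (eat, place), (eat, to), (eat, delicious), (eat, cheese).
pairs = [p for p in skipgram_pairs(sentence) if p[0] == "eat"]
```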
Continuous Bag-of-Words Model (CBOW)

[Figure: the context words w_{t-c}, ..., w_{t-1}, w_{t+1}, ..., w_{t+c} are summed and averaged in the hidden layer to predict the target word w_t]

• Input: context words, averaged in the hidden layer.

• Output: probability that each word is the target word.
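A minimal, self-contained sketch of the CBOW forward pass (the matrix names, sizes, and word indices are assumptions for illustration):

```python
import numpy as np

V, d = 10000, 300
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.01, size=(V, d))   # word embeddings (input -> hidden)
W_out = rng.normal(scale=0.01, size=(d, V))  # hidden -> output weights

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cbow_forward(context_indices):
    """Average the context word embeddings, then predict the target word."""
    h = W_in[context_indices].mean(axis=0)   # context words averaged in the hidden layer
    return softmax(h @ W_out)                # probability that each word is the target

probs = cbow_forward([7, 12, 99, 101])       # hypothetical indices of the context words
```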


Embeddings are Magic
(Mikolov 2016)

vector(‘king’) - vector(‘man’) + vector(‘woman’) ≈ vector(‘queen’)


Application: Word Pair Relationships

Using Word Embeddings
• Word2Vec:
  • https://code.google.com/archive/p/word2vec/
• GloVe: Global Vectors for Word Representation
  • https://nlp.stanford.edu/projects/glove/
• Can either use pre-trained word embeddings or train them on a large corpus.
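A minimal sketch of using pre-trained embeddings (gensim and the specific downloader model name are assumptions, not mentioned on the slides):

```python
import gensim.downloader as api

# Load pre-trained GloVe vectors (model name assumed to exist in gensim's downloader).
vectors = api.load("glove-wiki-gigaword-100")

# Cosine similarity between two words.
print(vectors.similarity("king", "queen"))

# The word-pair relationship from the earlier slide:
# vector('king') - vector('man') + vector('woman') ~ vector('queen')
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```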
Word embeddings

[Figure: the same skip-gram network as above — one-hot input for "eat" (|V| neurons), d hidden neurons, |V| output neurons with softmax activation]
Word embeddings

Pros
- Groups semantically similar words together
- A simple way to measure similarity
- A good way to deal with words unseen in the training data

Cons
- Doesn't distinguish between function words and content words
- Only one representation for polysemous words
- Semantic dimensions are not interpretable

How can we build a sentence representation using word-level distributional representations?
Acknowledgments
• Some slides by Chris Kedzie
