Autoencoder
Tuan Nguyen - AI4E
Outline
● Unsupervised Learning (Introduction)
● Autoencoder (AE)
● Autoencoder applications
● Convolutional AE
● Denoising AE
Supervised vs Unsupervised
Supervised Learning
• Data: (X, Y)
• Goal: Learn a mapping function f where f(X) = Y
Supervised Learning
Label data?
01. What happens when our labels are noisy?
• Missing values.
• Labeled incorrectly.
02. What happens when we don't have labels for training at all?
Unsupervised Learning
Up until now, this course has mostly covered Supervised Learning problems and algorithms.
Let's talk about Unsupervised Learning.
Unsupervised Learning
Unsupervised Learning
• Data: X (no labels)
• Goal: Learn the structure of the data; learn correlations between features.
Unsupervised Learning
Examples: Clustering, Compression, Feature & Representation learning, Dimensionality reduction, Generative models, etc.
PCA – Principal Component Analysis
- A statistical approach for data compression and visualization.
- Invented by Karl Pearson in 1901.
- Weakness: linear components only.
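As a point of reference, here is a minimal PCA compression sketch; it assumes scikit-learn and NumPy, and the 784-dimensional matrix X is a random stand-in for flattened 28×28 images.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 784)          # stand-in for 1000 flattened 28x28 images

pca = PCA(n_components=30)             # keep 30 linear components
codes = pca.fit_transform(X)           # (1000, 30) compressed representation
X_rec = pca.inverse_transform(codes)   # (1000, 784) linear reconstruction

print(codes.shape, X_rec.shape)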
Autoencoder
- The autoencoder idea has been part of NN history for decades (LeCun et al., 1987).
- Traditionally, an autoencoder is used for dimensionality reduction and feature learning.
- Recently, the connection between autoencoders and latent-space modeling has brought autoencoders to the forefront of generative modeling.
Simple Idea
Training Autoencoder
Traditional Autoencoder
▪ Unlike PCA, we can now use activation functions to achieve non-linearity.
▪ It has been shown that an AE without non-linear activations recovers essentially the same subspace as PCA.
Auto-encoder
- Input object: 28 × 28 = 784 pixels.
- NN Encoder → code: a compact representation of the input object (usually < 784 dimensions).
- NN Decoder: can reconstruct the original object from the code.
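A minimal sketch of this encoder/decoder pair and its training objective, assuming PyTorch; the hidden sizes and the 30-dimensional code are illustrative choices, and x is a random stand-in batch of flattened 28×28 images.

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 30))
decoder = nn.Sequential(nn.Linear(30, 256), nn.ReLU(), nn.Linear(256, 784), nn.Sigmoid())

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(64, 784)                   # stand-in batch of flattened 28x28 images
code = encoder(x)                         # compact code (< 784 dimensions)
x_hat = decoder(code)                     # reconstruction of the original object

loss = nn.functional.mse_loss(x_hat, x)   # output as close as possible to the input
loss.backward()
opt.step()

Note that no labels are needed: the input itself is the reconstruction target.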
Deep Autoencoder
- Of course, the auto-encoder can be deep: Input Layer → … → latent code z → … → Output Layer.
- The output should be as close as possible to the input.
- A symmetric encoder/decoder is not necessary.
Deep Autoencoder
(Figure: original 784-pixel images, reconstructions from PCA with a 30-dimensional code (784 → 30 → 784), and reconstructions from a deep auto-encoder with layers 784 → 1000 → 500 → 250 → 30 and a mirrored decoder back to 784.)
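A sketch of that deep, symmetric architecture, assuming PyTorch; only the layer sizes are taken from the slide, everything else (activations, output squashing) is an illustrative choice.

import torch.nn as nn

encoder = nn.Sequential(
    nn.Linear(784, 1000), nn.ReLU(),
    nn.Linear(1000, 500), nn.ReLU(),
    nn.Linear(500, 250),  nn.ReLU(),
    nn.Linear(250, 30),                  # 30-dimensional latent code z
)
decoder = nn.Sequential(
    nn.Linear(30, 250),   nn.ReLU(),
    nn.Linear(250, 500),  nn.ReLU(),
    nn.Linear(500, 1000), nn.ReLU(),
    nn.Linear(1000, 784), nn.Sigmoid(),  # pixel intensities in [0, 1]
)
deep_ae = nn.Sequential(encoder, decoder)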
Denoise
- Add noise to the input, then encode and decode; the output should be as close as possible to the original (clean) input.
Vincent, Pascal, et al. "Extracting and composing robust features with denoising autoencoders." ICML, 2008.
Text Retrieval
Vector Space Model with Bag-of-word representation:
- Each query or document is a vector of word counts, e.g. the word string "This is an apple" → (this: 1, is: 1, a: 0, an: 1, apple: 1, pen: 0, …).
- Semantics are not considered.
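A small illustration of such a bag-of-word vector; this is a sketch assuming scikit-learn, with a toy fixed vocabulary (a real system would use the full corpus vocabulary).

from sklearn.feature_extraction.text import CountVectorizer

vocab = ["this", "is", "a", "an", "apple", "pen"]   # toy vocabulary
vectorizer = CountVectorizer(vocabulary=vocab)

bow = vectorizer.transform(["This is an apple"]).toarray()
print(dict(zip(vocab, bow[0].tolist())))
# {'this': 1, 'is': 1, 'a': 0, 'an': 1, 'apple': 1, 'pen': 0}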
Text Retrieval
- Documents talking about the same thing will have close codes.
- Feed the bag-of-word vector (document or query) through an autoencoder: 2000 → 500 → 250 → 125 → 2-dimensional code.
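A sketch of retrieval in that 2-dimensional code space; it assumes PyTorch, an encoder with the layer sizes from the slide (untrained here, for shape illustration only), and random stand-in bag-of-word vectors.

import torch
import torch.nn as nn

encoder = nn.Sequential(                 # 2000-dim bag-of-words -> 2-dim code
    nn.Linear(2000, 500), nn.ReLU(),
    nn.Linear(500, 250),  nn.ReLU(),
    nn.Linear(250, 125),  nn.ReLU(),
    nn.Linear(125, 2),
)

docs = torch.rand(10000, 2000)           # stand-in bag-of-word vectors for the corpus
query = torch.rand(1, 2000)              # stand-in bag-of-word vector for the query

with torch.no_grad():
    doc_codes = encoder(docs)            # (10000, 2)
    query_code = encoder(query)          # (1, 2)

# Documents about the same topic as the query should have nearby codes.
dist = torch.cdist(query_code, doc_codes)[0]
print(dist.topk(5, largest=False).indices)   # indices of the 5 closest documents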
Similar image search
Retrieved using Euclidean distance in pixel-intensity space.
Reference: Krizhevsky, Alex, and Geoffrey E. Hinton. "Using very deep autoencoders for content-based image retrieval." ESANN, 2011.
Similar image search
- Deep autoencoder for 32 × 32 images, with layers 8192 → 4096 → 2048 → 1024 → 512 → 256 (code).
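The contrast between the two retrieval strategies, as a sketch: Euclidean distance on raw pixel intensities versus Euclidean distance on autoencoder codes. It assumes PyTorch, treats the 32×32 colour images as 3072-dimensional vectors, and uses an untrained encoder with the slide's layer sizes purely for shape illustration.

import torch
import torch.nn as nn

images = torch.rand(5000, 3072)          # stand-in flattened 32x32 colour images
query = images[:1]

# Pixel-intensity space: distances on raw pixels.
pixel_nn = torch.cdist(query, images)[0].topk(5, largest=False).indices

# Code space: distances on 256-dimensional codes from a deep encoder
# (8192 -> 4096 -> 2048 -> 1024 -> 512 -> 256, untrained here).
encoder = nn.Sequential(
    nn.Linear(3072, 8192), nn.ReLU(),
    nn.Linear(8192, 4096), nn.ReLU(),
    nn.Linear(4096, 2048), nn.ReLU(),
    nn.Linear(2048, 1024), nn.ReLU(),
    nn.Linear(1024, 512),  nn.ReLU(),
    nn.Linear(512, 256),
)
with torch.no_grad():
    code_nn = torch.cdist(encoder(query), encoder(images))[0].topk(5, largest=False).indices

print(pixel_nn, code_nn)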
Auto-encoder for CNN
- Encoder: Convolution and Pooling layers map the input image to a code.
- Decoder: Deconvolution and Unpooling layers map the code back to an image as close as possible to the input.
Deconvolution
- The "deconvolution" used in the decoder is implemented as a transposed convolution.
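A quick shape check of a transposed convolution, as a sketch assuming PyTorch; the channel counts and kernel size are illustrative.

import torch
import torch.nn as nn

x = torch.rand(1, 16, 7, 7)                                   # small feature map (e.g. a code)
deconv = nn.ConvTranspose2d(16, 8, kernel_size=2, stride=2)   # up-samples spatially
print(deconv(x).shape)                                        # torch.Size([1, 8, 14, 14])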
Convolutional AE
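A minimal convolutional AE sketch, assuming PyTorch and 1×28×28 inputs; the channel counts are illustrative, and max-pooling plus transposed convolution stand in for the pooling/unpooling and deconvolution blocks above.

import torch
import torch.nn as nn

enc = nn.Sequential(                                            # Convolution + Pooling
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)
dec = nn.Sequential(                                            # transposed convs up-sample back
    nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2), nn.ReLU(),
    nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2), nn.Sigmoid(),
)

x = torch.rand(8, 1, 28, 28)       # stand-in batch of images
x_hat = dec(enc(x))                # reconstruction, same shape as the input
print(x_hat.shape)                 # torch.Size([8, 1, 28, 28])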
Denoising AE
Intuition:
- We still aim to encode the input, not to mimic the identity function.
- We try to undo the effect of a corruption process stochastically applied to the input.
- The result is a more robust model.
Pipeline: apply noise → noisy input → Encoder → latent-space representation → Decoder → denoised input.
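A training-step sketch of this idea, assuming PyTorch and the simple dense encoder/decoder pair from earlier; additive Gaussian noise is just one possible corruption process.

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 30))
decoder = nn.Sequential(nn.Linear(30, 256), nn.ReLU(), nn.Linear(256, 784), nn.Sigmoid())
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(64, 784)                                   # clean stand-in batch
x_noisy = (x + 0.3 * torch.randn_like(x)).clamp(0, 1)     # stochastic corruption

x_denoised = decoder(encoder(x_noisy))                    # encode/decode the noisy input
loss = nn.functional.mse_loss(x_denoised, x)              # ...but target the clean input
loss.backward()
opt.step()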
Q&A