ML Lec 19 Autoencoder
• An autoencoder encodes the data, i.e., it learns its own coded representation of the data.
• This is unsupervised learning, as we do not need the class labels of the data during training.
• Whatever is fed to the input of the autoencoder, it tries to reproduce at the output.
• For this, we need two different functions: encoder and decoder.
• The encoder will encode the input data into a compressed domain knowledge representation using one or more hidden layers, where the last hidden layer is called the bottleneck or latent layer.
• The decoder will reconstruct the data from the compressed representation available at the bottleneck layer, producing the original input (or something as close to it as possible) at the output layer. The decoder may also contain many hidden layers.
• The encoder part runs from the input layer to the bottleneck layer, and the decoder part runs from the bottleneck layer to the output layer.
Base Architecture of an Autoencoder
[Figure: input X → encoder → bottleneck layer → decoder → output x̂]
• But if the input is an image, say of size M×N, then we have MN pixels, represented as a vector.
• Each pixel is represented by a node, so we need MN+1 nodes at the input layer, as one extra node is required for the bias.
• We do not need the bias at the output layer, so the number of nodes at the output layer is MN.
• The hidden (bottleneck) layer is the compressed domain knowledge representation of the input image, so the representation space contains vectors of dimension d << MN. But here also we need a bias node, so the number of nodes in the bottleneck is d+1 (see the sketch below).
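
Below is a minimal sketch of such an architecture in PyTorch, assuming a 28×28 grayscale input (so MN = 784) and an illustrative bottleneck size d = 32; the hidden-layer width of 128 is also just an example, and nn.Linear handles the bias terms internally rather than through explicit bias nodes.

import torch
import torch.nn as nn

# Minimal fully connected autoencoder (illustrative sizes: MN = 28*28, d = 32).
MN, d = 28 * 28, 32

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: input layer -> hidden layer -> bottleneck (latent) layer
        self.encoder = nn.Sequential(
            nn.Linear(MN, 128), nn.ReLU(),
            nn.Linear(128, d), nn.ReLU(),
        )
        # Decoder: bottleneck layer -> hidden layer -> output layer with MN nodes
        self.decoder = nn.Sequential(
            nn.Linear(d, 128), nn.ReLU(),
            nn.Linear(128, MN), nn.Sigmoid(),  # pixel values kept in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)      # compressed representation at the bottleneck
        x_hat = self.decoder(z)  # reconstruction of the input
        return x_hat

model = Autoencoder()
x = torch.rand(16, MN)           # a batch of 16 flattened "images"
x_hat = model(x)
print(x_hat.shape)               # torch.Size([16, 784])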
Loss Function
• Whatever the type of autoencoder, we have to encode the input data and then decode the encoded data for a faithful reconstruction of the input.
• At the encoder side, the bottleneck layer compresses the input data, and at the decoder side the compressed data is reconstructed.
• So the autoencoder should balance two tasks:
(i) It should be sensitive to the input for accurate reconstruction.
(ii) It should not be so sensitive that it simply memorizes or overfits the training data.
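
A common way to express this trade-off (a standard formulation, not specific to any one autoencoder variant) is a loss with a reconstruction term and a regularization term:

L(x, x̂) = ‖x − x̂‖² + λ·Ω(h)

where x̂ = g(f(x)) is the reconstruction, h = f(x) is the bottleneck activation, Ω is a penalty (e.g., a sparsity or weight penalty), and λ trades reconstruction accuracy against the penalty.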
Undercomplete Autoencoder
• Applications:
i) Dimensionality Reduction: An undercomplete autoencoder can be used for
dimensionality reduction in large datasets, allowing for faster processing in
subsequent tasks.
ii) Anomaly Detection: If trained only on normal data, the autoencoder will have
trouble reconstructing anomalous inputs, making it a good tool for detecting
outliers (a small sketch follows this slide).
iii) Pretraining for Deep Networks: The learned representations in the bottleneck
can be used as feature representations for initializing deep neural networks
(transfer learning).
• Summary:
o An undercomplete autoencoder is a neural network model designed to learn
compact, efficient representations by restricting the size of the latent space.
o It achieves dimensionality reduction, feature extraction, and noise reduction by
forcing the network to encode only the most important aspects of the input data.
o This constraint makes it an effective tool for many machine learning tasks,
particularly when the goal is to capture the essential structure of the data.
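
To make application (ii) above concrete, here is a minimal anomaly-detection sketch; it reuses model and x from the earlier architecture sketch, and the threshold value is purely illustrative (in practice it would be chosen from reconstruction errors on normal validation data).

import torch

def reconstruction_error(model, x):
    # Per-sample mean squared reconstruction error.
    with torch.no_grad():
        x_hat = model(x)
    return ((x - x_hat) ** 2).mean(dim=1)

threshold = 0.05                            # illustrative cut-off
errors = reconstruction_error(model, x)     # x: batch of flattened inputs
is_anomaly = errors > threshold             # samples the autoencoder reconstructs poorly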
Sparse Autoencoder
• Sparsity Constraint:
– The primary goal is to ensure that only a small
fraction of the neurons are active (non-zero) for
any given input. This is usually achieved by adding
a sparsity penalty to the loss function.
– Common methods include using L1 regularization
on the activations or incorporating a sparsity
constraint that compares the average activation of
the neurons to a predefined sparsity parameter.
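
A minimal sketch of the L1-on-activations variant (reusing the Autoencoder class from the earlier sketch; the penalty weight is illustrative):

import torch
import torch.nn.functional as F

sparsity_weight = 1e-3                  # illustrative weight of the sparsity penalty

def sparse_autoencoder_loss(model, x):
    z = model.encoder(x)                # bottleneck activations
    x_hat = model.decoder(z)
    recon = F.mse_loss(x_hat, x)        # sensitivity to the input
    sparsity = z.abs().mean()           # L1 penalty pushing most activations toward zero
    return recon + sparsity_weight * sparsity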
Sparse Autoencoder
• Benefits
(i) Feature Extraction: By promoting sparsity, these
autoencoders tend to learn more meaningful and
interpretable features, which can be useful for
downstream tasks.
(ii) Dimensionality Reduction: Sparse representations
can be more efficient in capturing the underlying
structure of the data, which is beneficial for
reducing dimensionality.
(iii) Robustness: Sparsity can enhance the model's
robustness to noise and irrelevant variations in the
data.
Denoising Autoencoder
• A denoising autoencoder (DAE) is a type of autoencoder
specifically designed to learn robust representations of
data by reconstructing clean input from noisy versions.
• This approach helps the model learn to filter out noise
and can improve its ability to generalize to unseen data.
• Architecture:
– Like a standard autoencoder, a denoising autoencoder
consists of an encoder and a decoder.
– The encoder compresses the noisy input data into a lower-
dimensional latent representation, and the decoder
reconstructs the original clean input from this representation.
Denoising Autoencoder
• Input Corruption: During training, the original
input data is intentionally corrupted. Common
methods of corruption include:
– Adding Gaussian Noise: Random noise is added to the
input features.
– Dropout: Randomly setting a fraction of input units to
zero.
– Salt-and-Pepper Noise: Randomly replacing some input
pixels with maximum and minimum values (for images).
• The model learns to reconstruct the original clean
input from this corrupted version.
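
Minimal sketches of the three corruption methods (the noise level and probabilities are illustrative, and pixel values are assumed to be scaled to [0, 1]):

import torch

def add_gaussian_noise(x, std=0.1):
    # Add zero-mean Gaussian noise to every input feature.
    return x + std * torch.randn_like(x)

def input_dropout(x, p=0.3):
    # Randomly set a fraction p of the input units to zero.
    mask = (torch.rand_like(x) > p).float()
    return x * mask

def salt_and_pepper(x, p=0.1):
    # Randomly replace a fraction p of the pixels with 0 (pepper) or 1 (salt).
    r = torch.rand_like(x)
    x = x.clone()
    x[r < p / 2] = 0.0                       # pepper
    x[(r >= p / 2) & (r < p)] = 1.0          # salt
    return x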
Denoising Autoencoder : Loss Function
• The objective is to minimize the
reconstruction loss, typically using mean
squared error (MSE) or binary cross-entropy,
comparing the reconstructed output with the
original clean input.
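
A minimal training step tying the corruption and the loss together (reusing the Autoencoder model and add_gaussian_noise from the earlier sketches; the learning rate is illustrative):

import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def dae_train_step(x_clean):
    x_noisy = add_gaussian_noise(x_clean)    # corrupt the input
    x_hat = model(x_noisy)                   # reconstruct from the noisy version
    loss = F.mse_loss(x_hat, x_clean)        # compare against the CLEAN input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()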
Denoising Autoencoder : Regularization
• Though the added noise during training already acts as a form of regularization to prevent overfitting, some other regularizers are also helpful.
• Common Regularization Techniques are:
• Dropout: Randomly dropping units during training can
help prevent co-adaptation of neurons, making the
model more robust.
• Weight Regularization: Adding L1 or L2 penalties to the
weights in the loss function can help keep the model
from becoming overly complex.
• Early Stopping: Monitoring the validation loss and stopping training when it starts to increase can prevent overfitting.
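
A sketch of the latter two techniques (reusing model and dae_train_step from the sketches above; the L2 penalty is applied through the optimizer's weight_decay argument, and the patience value and stand-in validation batch are illustrative):

import torch
import torch.nn.functional as F

# L2 penalty on the weights via the optimizer's weight_decay parameter.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

x_val = torch.rand(16, MN)                   # stand-in validation batch
best_val, patience, bad_epochs = float("inf"), 5, 0

for epoch in range(100):
    train_loss = dae_train_step(x)           # x: a batch of clean training data
    with torch.no_grad():
        val_loss = F.mse_loss(model(x_val), x_val).item()
    # Early stopping: stop once validation loss has not improved for `patience` epochs.
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break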
Denoising Autoencoder
• Benefits
(i) Robust Feature Learning: By learning to
reconstruct from noisy inputs, DAEs can capture
more robust features that generalize well to new,
unseen data.
(ii) Regularization: The added noise during training
acts as a form of regularization, helping to prevent
overfitting.
(iii) Improvements in Data Quality: DAEs can be effectively used for tasks such as denoising images, speech signals, or any other data that is prone to noise.
Applications of DAE