
Chapter 17: Representation Learning and Generative Learning Using Autoencoders and GANs (Part 1)
Tsz-Chiu Au
chiu@unist.ac.kr
Ulsan National Institute of Science and Technology (UNIST), South Korea
Introduction
• Autoencoders are artificial neural networks capable of learning dense
representations of the input data, called latent representations or
codings, without any supervision (i.e., the training set is unlabeled).
» These codings typically have a much lower dimensionality than the input data, making
autoencoders useful for dimensionality reduction, especially for visualization purposes.
» Autoencoders also act as feature detectors, and they can be used for unsupervised
pretraining of deep neural networks
» Some autoencoders are generative models: they are capable of randomly generating
new data that looks very similar to the training data.
§ However, the generated images are usually fuzzy and not entirely realistic.
• Generative adversarial networks (GANs) are now widely used for
» Face generation
» Super resolution (increasing the resolution of an image)
» Colorization
» Powerful image editing (e.g., replacing photo bombers with realistic background)
» Turning a simple sketch into a photorealistic image
» Predicting the next frames in a video
» Augmenting a dataset (to train other models)
» Generating other types of data (such as text, audio, and time series)
» Identifying the weaknesses in other models and strengthening them
Introduction (cont.)
• Autoencoders and GANs are both unsupervised, they both learn dense
representations, they can both be used as generative models, and they
have many similar applications.
• But autoencoders and GANs work differently:
» Autoencoders simply learn to copy their inputs to their outputs.
§ Constraints are added to force the autoencoder to learn efficient ways of representing the
data.
» GANs are composed of two neural networks: a generator that tries to generate data that
looks similar to the training data, and a discriminator that tries to tell real data from fake
data.
§ The generator and the discriminator compete against each other during training
• In this chapter, we will learn about
» Autoencoders and how to use them for dimensionality reduction, feature extraction,
unsupervised pretraining, or as generative models.
» GANs and how to use them to generate fake images.
» Adversarial training
Efficient Data Representations
• An autoencoder looks at the inputs, converts them to an efficient latent
representation, and then spits out something that looks very close to the inputs.
• An autoencoder is always composed of two parts:
» An encoder (or recognition network) that converts the inputs to a latent representation
» A decoder (or generative network) that converts the internal representation to the outputs.
• The outputs are often called the reconstructions because the autoencoder tries
to reconstruct the inputs.
• The cost function contains a reconstruction loss that penalizes the model when the reconstructions are different from the inputs.
• When the internal representation has a lower dimensionality than the input data, the autoencoder is said to be undercomplete.
» This forces the autoencoder to learn the most important features in the input data and drop the unimportant ones.
Performing PCA with an Undercomplete Linear Autoencoder
• If the autoencoder uses only linear activations (i.e., no activation function)
and the cost function is the mean squared error (MSE), then it ends up
performing Principal Component Analysis (PCA).

• You can think of autoencoders as a form of self-supervised learning, since the targets are simply the inputs themselves.
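• As a rough sketch, such a linear autoencoder can be written in Keras with two Dense layers and no activation functions (the 3D dataset X_train here is hypothetical, just to make the shapes concrete):

```python
from tensorflow import keras

# Both layers are linear (no activation), so with an MSE loss this autoencoder
# learns to project onto the same subspace that PCA would find.
encoder = keras.models.Sequential([keras.layers.Dense(2, input_shape=[3])])
decoder = keras.models.Sequential([keras.layers.Dense(3, input_shape=[2])])
linear_ae = keras.models.Sequential([encoder, decoder])

linear_ae.compile(loss="mse", optimizer=keras.optimizers.SGD(learning_rate=1.5))
# X_train is assumed to be an (n_samples, 3) array of unlabeled data:
# history = linear_ae.fit(X_train, X_train, epochs=20)
# codings = encoder.predict(X_train)   # 2D projections, spanning the PCA subspace
```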


Stacked Autoencoders
• Stacked autoencoders (or deep autoencoders) are autoencoders with
multiple hidden layers.
• Adding more layers helps the autoencoder learn more complex codings.
» But do not make the autoencoder too powerful (e.g., able to map each input to a single arbitrary number)
§ Otherwise, it will not have learned any useful data representation.
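• A hedged sketch of a small stacked autoencoder for Fashion MNIST in Keras (the layer sizes are illustrative; X_train and X_valid are assumed to hold 28×28 images scaled to [0, 1]):

```python
from tensorflow import keras

# Encoder: 784 -> 100 -> 30-dimensional codings.
stacked_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(30, activation="selu"),
])
# Decoder: 30 -> 100 -> 784, reshaped back to 28x28.
stacked_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[30]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28]),
])
stacked_ae = keras.models.Sequential([stacked_encoder, stacked_decoder])

stacked_ae.compile(loss="binary_crossentropy", optimizer="nadam")
# history = stacked_ae.fit(X_train, X_train, epochs=20,
#                          validation_data=(X_valid, X_valid))
```

• Binary cross-entropy treats each pixel intensity as a probability, which works well here because the inputs are scaled to [0, 1].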
Visualizing the Reconstructions
• Let’s plot a few images from the validation set, as well as their
reconstructions:
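• A minimal plotting sketch (assuming the stacked_ae model and validation set X_valid from the previous slides; the helper name show_reconstructions is illustrative):

```python
import matplotlib.pyplot as plt

def show_reconstructions(model, images, n_images=5):
    # Run the images through the autoencoder and plot originals above reconstructions.
    reconstructions = model.predict(images[:n_images])
    plt.figure(figsize=(n_images * 1.5, 3))
    for i in range(n_images):
        plt.subplot(2, n_images, 1 + i)
        plt.imshow(images[i], cmap="binary")
        plt.axis("off")
        plt.subplot(2, n_images, 1 + n_images + i)
        plt.imshow(reconstructions[i], cmap="binary")
        plt.axis("off")
    plt.show()

# show_reconstructions(stacked_ae, X_valid)
```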

• The reconstructions are recognizable, but a bit too lossy.
» We may need to train the model for longer, or make the encoder and decoder deeper, or make the codings larger.
• But if we make the network too powerful, it will manage to make perfect
reconstructions without having learned any useful patterns in the data.
Visualizing the Fashion MNIST Dataset
• For visualization, stacked autoencoders do not give great results compared
to other dimensionality reduction algorithms.
» But they can handle large datasets, with many instances and many features.
• Use an autoencoder to reduce the dimensionality down to a reasonable
level, then use another dimensionality reduction algorithm for visualization.
• For example, to visualize Fashion MNIST,
» Use the encoder from our stacked autoencoder to reduce the dimensionality down to 30.
» Then we use Scikit-Learn’s implementation of the t-SNE algorithm to reduce the
dimensionality down to 2 for visualization.
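• A sketch of this two-step pipeline (assuming the stacked_encoder and X_valid from before; the labels y_valid are used only to color the scatter plot):

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Step 1: use the trained encoder to compress 784 dimensions down to 30.
X_valid_compressed = stacked_encoder.predict(X_valid)

# Step 2: use t-SNE to reduce those 30 dimensions down to 2 for plotting.
tsne = TSNE(n_components=2)
X_valid_2D = tsne.fit_transform(X_valid_compressed)

plt.scatter(X_valid_2D[:, 0], X_valid_2D[:, 1], c=y_valid, s=10, cmap="tab10")
plt.show()
```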
Unsupervised Pretraining Using Stacked Autoencoders
• Having plenty of unlabeled data and little labeled data is common.
• If you have a large dataset but most of it is unlabeled,
» Train a stacked autoencoder using all the data.
» Reuse the lower layers to create a neural network for your actual task and train it using
the labeled data.
» You may want to freeze the pretrained layers (at least the lower ones).
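• A hedged sketch of this reuse (the names are illustrative; X_train_labeled and y_train_labeled stand for the small labeled subset):

```python
from tensorflow import keras

# Phase 1 (not shown): train stacked_ae on all the data, labeled and unlabeled.
# Phase 2: reuse the pretrained encoder as the lower layers of a classifier.
classifier = keras.models.Sequential([
    stacked_encoder,                        # pretrained lower layers
    keras.layers.Dense(10, activation="softmax"),
])
stacked_encoder.trainable = False           # freeze the pretrained layers at first

classifier.compile(loss="sparse_categorical_crossentropy", optimizer="nadam",
                   metrics=["accuracy"])
# history = classifier.fit(X_train_labeled, y_train_labeled, epochs=10)
# Optionally unfreeze the encoder afterwards and fine-tune with a low learning rate.
```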
Tying Weights
• If an autoencoder is neatly symmetrical, a common technique is to tie the
weights of the decoder layers to the weights of the encoder layers.
» This halves the number of weights in the model.
§ This speeds up training and limits the risk of overfitting.
• Specifically, if the autoencoder has a total of N layers (not counting the input layer), and W_L represents the connection weights of the Lth layer:
» The decoder layer weights can be defined simply as W_{N–L+1} = W_L⊺ (with L = 1, 2, ..., N/2).
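• One possible way to implement this in tf.keras is a custom layer that reuses another Dense layer's kernel, transposed; this is only a sketch of the idea, with illustrative layer sizes:

```python
import tensorflow as tf
from tensorflow import keras

class DenseTranspose(keras.layers.Layer):
    """Dense layer whose kernel is the transpose of another Dense layer's kernel."""
    def __init__(self, dense, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.dense = dense
        self.activation = keras.activations.get(activation)

    def build(self, batch_input_shape):
        # Only the biases are new parameters; the kernel is shared (tied).
        self.biases = self.add_weight(name="bias",
                                      shape=[self.dense.input_shape[-1]],
                                      initializer="zeros")
        super().build(batch_input_shape)

    def call(self, inputs):
        z = tf.matmul(inputs, self.dense.weights[0], transpose_b=True)
        return self.activation(z + self.biases)

dense_1 = keras.layers.Dense(100, activation="selu")
dense_2 = keras.layers.Dense(30, activation="selu")
tied_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]), dense_1, dense_2])
tied_decoder = keras.models.Sequential([
    DenseTranspose(dense_2, activation="selu"),
    DenseTranspose(dense_1, activation="sigmoid"),
    keras.layers.Reshape([28, 28])])
tied_ae = keras.models.Sequential([tied_encoder, tied_decoder])
tied_ae.compile(loss="binary_crossentropy", optimizer="nadam")
```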
Training One Autoencoder at a Time
• Greedy layerwise training: train one shallow autoencoder at a time, then
stack all of them into a single stacked autoencoder.
• For several years, greedy layerwise training was the only efficient way to
train deep nets.
» See restricted Boltzmann machines.
» But it is no longer common.
Convolutional Autoencoders
• Convolutional neural networks are far better suited than dense networks
to work with images.
• Convolutional autoencoder: The encoder is a regular CNN composed of
convolutional layers and pooling layers.
» It typically reduces the spatial dimensionality of the inputs (i.e., height and width) while
increasing the depth (i.e., the number of feature maps).
§ The decoder does the reverse, using transpose convolutional layers.
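• A small illustrative sketch for 28×28 Fashion MNIST images (the filter counts are arbitrary choices):

```python
from tensorflow import keras

# Encoder: a regular CNN that shrinks height/width while growing the depth.
conv_encoder = keras.models.Sequential([
    keras.layers.Reshape([28, 28, 1], input_shape=[28, 28]),
    keras.layers.Conv2D(16, kernel_size=3, padding="same", activation="selu"),
    keras.layers.MaxPool2D(pool_size=2),                       # -> 14x14x16
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="selu"),
    keras.layers.MaxPool2D(pool_size=2),                       # -> 7x7x32
])
# Decoder: transpose convolutions upsample back to 28x28x1.
conv_decoder = keras.models.Sequential([
    keras.layers.Conv2DTranspose(16, kernel_size=3, strides=2, padding="same",
                                 activation="selu", input_shape=[7, 7, 32]),
    keras.layers.Conv2DTranspose(1, kernel_size=3, strides=2, padding="same",
                                 activation="sigmoid"),
    keras.layers.Reshape([28, 28]),
])
conv_ae = keras.models.Sequential([conv_encoder, conv_decoder])
conv_ae.compile(loss="binary_crossentropy", optimizer="nadam")
```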
Recurrent Autoencoders
• In a recurrent autoencoder, the encoder is typically a sequence-to-vector
RNN which compresses the input sequence down to a single vector.
» The decoder is a vector-to-sequence RNN that does the reverse.

• This recurrent autoencoder can process sequences of any length, with 28 dimensions per time step.
» Use a RepeatVector layer as the first layer of the decoder, to ensure that its input vector
gets fed to the decoder at each time step.
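• A sketch of such a recurrent autoencoder in Keras (LSTM sizes are illustrative; each image is treated as a sequence of 28 rows of 28 pixels):

```python
from tensorflow import keras

# Sequence-to-vector encoder: compresses a sequence of 28-dimensional steps
# into a single 30-dimensional coding. input_shape=[None, 28] allows any length.
recurrent_encoder = keras.models.Sequential([
    keras.layers.LSTM(100, return_sequences=True, input_shape=[None, 28]),
    keras.layers.LSTM(30),
])
# Vector-to-sequence decoder: RepeatVector feeds the coding to every time step.
recurrent_decoder = keras.models.Sequential([
    keras.layers.RepeatVector(28, input_shape=[30]),
    keras.layers.LSTM(100, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(28, activation="sigmoid")),
])
recurrent_ae = keras.models.Sequential([recurrent_encoder, recurrent_decoder])
recurrent_ae.compile(loss="binary_crossentropy", optimizer="nadam")
```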
Overcomplete Autoencoder
• Up to now, in order to force the autoencoder to learn
interesting features, we have limited the size of the coding
layer, making it undercomplete.
» For data visualization and unsupervised pretraining.
• If we allow the coding layer to be just as large as the inputs, or
even larger, the result is an overcomplete autoencoder.
Denoising Autoencoders
• Stacked denoising autoencoders: add noise to its inputs, training it to
recover the original, noise-free inputs.
» The noise can be pure Gaussian noise added to the inputs, or it can be randomly
switched-off inputs, just like in dropout.
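• A sketch of a Gaussian-noise denoising autoencoder (the noise level is illustrative; a Dropout layer could be used instead of GaussianNoise, and both are only active during training):

```python
from tensorflow import keras

# Noise is added to the inputs, but the reconstruction targets stay clean.
denoising_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.GaussianNoise(0.2),           # or keras.layers.Dropout(0.5)
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(30, activation="selu"),
])
denoising_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[30]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28]),
])
denoising_ae = keras.models.Sequential([denoising_encoder, denoising_decoder])
denoising_ae.compile(loss="binary_crossentropy", optimizer="nadam")
# Trained with clean targets: denoising_ae.fit(X_train, X_train, ...)
```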
Sparse Autoencoders
• Another kind of constraint for feature extraction is sparsity
» By adding an appropriate term to the cost function, the autoencoder is pushed to
reduce the number of active neurons in the coding layer.
» E.g., use the sigmoid activation function in the coding layer and add some L1
regularization to the coding layer’s activations.
§ Using the L1 norm rather than the L2 norm will push the neural network to preserve the most important codings
while eliminating the ones that are not needed for the input image (rather than just reducing all codings).
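• A sketch of this sigmoid-plus-L1 setup (the 300-unit coding size and the L1 weight are illustrative choices):

```python
from tensorflow import keras

# Sparse autoencoder: sigmoid coding layer with an L1 penalty on its activations,
# which pushes most coding activations toward zero.
sparse_l1_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(300, activation="sigmoid"),
    keras.layers.ActivityRegularization(l1=1e-3),   # L1 penalty on the codings
])
sparse_l1_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[300]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28]),
])
sparse_l1_ae = keras.models.Sequential([sparse_l1_encoder, sparse_l1_decoder])
sparse_l1_ae.compile(loss="binary_crossentropy", optimizer="nadam")
```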
Sparse Autoencoders (cont.)
• A better approach is to measure the actual sparsity of the coding layer at
each training iteration, and penalize the model when the measured
sparsity differs from a target sparsity.
» Compute the average activation of each neuron in the coding layer, over the whole
training batch.
» Once we have the mean activation per neuron, we want to penalize the neurons that
are too active, or not active enough, by adding a sparsity loss to the cost function.
» Add the Kullback–Leibler (KL) divergence to the cost function.
Sparse Autoencoders (cont.)
• Once we have computed the sparsity loss for each neuron in the coding
layer, we sum up these losses and add the result to the cost function.
» To control the relative importance of the sparsity loss and the reconstruction loss, we
can multiply the sparsity loss by a sparsity weight hyperparameter.
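• A hedged sketch of such a KL-based sparsity regularizer (the class name and hyperparameter values are illustrative), using KL(p‖q) = p log(p/q) + (1 − p) log((1 − p)/(1 − q)) with target sparsity p and measured mean activation q:

```python
import tensorflow as tf
from tensorflow import keras

class KLDivergenceRegularizer(keras.regularizers.Regularizer):
    """Penalizes each coding neuron's mean activation q for deviating from a
    target sparsity p, via KL(p || q) = p*log(p/q) + (1-p)*log((1-p)/(1-q))."""
    def __init__(self, weight, target=0.1):
        self.weight = weight      # sparsity weight hyperparameter
        self.target = target      # target sparsity p

    def __call__(self, inputs):
        # Mean activation of each neuron over the batch
        # (assumes sigmoid activations, so values lie strictly in (0, 1)).
        mean_activities = tf.reduce_mean(inputs, axis=0)
        p, q = self.target, mean_activities
        kl = p * tf.math.log(p / q) + (1. - p) * tf.math.log((1. - p) / (1. - q))
        return self.weight * tf.reduce_sum(kl)

# Attached to the sigmoid coding layer, e.g. (sizes and weights illustrative):
# keras.layers.Dense(300, activation="sigmoid",
#     activity_regularizer=KLDivergenceRegularizer(weight=0.05, target=0.1))
```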

• E.g., after training this sparse autoencoder on Fashion MNIST, the activations of the neurons in the coding layer are mostly close to 0, and all neurons have a mean activation around 0.1.
Variational Autoencoders
• Variational autoencoders are one of the most popular types of autoencoders.
» They are probabilistic autoencoders, meaning that their outputs are partly determined by
chance, even after training.
» They are generative autoencoders, meaning that they can generate new instances that look
like they were sampled from the training set.
• Variational autoencoders perform variational Bayesian inference, which is an
efficient way to perform approximate Bayesian inference.
• Instead of directly producing a coding for a given input, the encoder produces a mean coding μ and a standard deviation σ.
• The actual coding is then sampled randomly from a Gaussian distribution with mean μ and standard deviation σ.
• The decoder decodes the sampled coding.
Variational Autoencoders (cont.)
• A variational autoencoder tends to produce codings that look as though they
were sampled from a simple Gaussian distribution.
» During training, the cost function pushes the codings to gradually migrate within the coding
space (also called the latent space) to end up looking like a cloud of Gaussian points.
• The cost function is composed of two parts.
» The reconstruction loss
» The latent loss that pushes the autoencoder to have codings that look as though they were
sampled from a simple Gaussian distribution.
§ The KL divergence between the target distribution (i.e., the Gaussian distribution) and
the actual distribution of the codings.

§ A common tweak to the variational autoencoder’s architecture is to make the encoder output γ = log(σ²) rather than σ.
› This approach is more numerically stable and speeds up training.
Variational Autoencoders (cont.)
• Build a variational autoencoder for Fashion MNIST in Keras:
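• A sketch of such a variational autoencoder (layer sizes and the 10-dimensional coding size are illustrative choices):

```python
import tensorflow as tf
from tensorflow import keras

# Custom layer that samples a coding from N(mean, sigma), where the encoder
# outputs gamma = log(sigma^2) for numerical stability.
class Sampling(keras.layers.Layer):
    def call(self, inputs):
        mean, log_var = inputs
        return tf.random.normal(tf.shape(log_var)) * tf.exp(log_var / 2) + mean

codings_size = 10

# Encoder: outputs the mean coding, gamma = log(sigma^2), and a sampled coding.
inputs = keras.layers.Input(shape=[28, 28])
z = keras.layers.Flatten()(inputs)
z = keras.layers.Dense(150, activation="selu")(z)
z = keras.layers.Dense(100, activation="selu")(z)
codings_mean = keras.layers.Dense(codings_size)(z)
codings_log_var = keras.layers.Dense(codings_size)(z)
codings = Sampling()([codings_mean, codings_log_var])
variational_encoder = keras.Model(inputs=[inputs],
                                  outputs=[codings_mean, codings_log_var, codings])

# Decoder: maps a sampled coding back to a 28x28 image.
decoder_inputs = keras.layers.Input(shape=[codings_size])
x = keras.layers.Dense(100, activation="selu")(decoder_inputs)
x = keras.layers.Dense(150, activation="selu")(x)
x = keras.layers.Dense(28 * 28, activation="sigmoid")(x)
outputs = keras.layers.Reshape([28, 28])(x)
variational_decoder = keras.Model(inputs=[decoder_inputs], outputs=[outputs])

# Full VAE: reconstruction loss (binary cross-entropy) plus the latent loss
# (KL divergence between the coding distribution and a standard Gaussian).
_, _, codings = variational_encoder(inputs)
reconstructions = variational_decoder(codings)
variational_ae = keras.Model(inputs=[inputs], outputs=[reconstructions])

latent_loss = -0.5 * tf.reduce_sum(
    1 + codings_log_var - tf.exp(codings_log_var) - tf.square(codings_mean), axis=-1)
variational_ae.add_loss(tf.reduce_mean(latent_loss) / 784.)  # scale to per-pixel units
variational_ae.compile(loss="binary_crossentropy", optimizer="rmsprop")
# history = variational_ae.fit(X_train, X_train, epochs=25, batch_size=128,
#                              validation_data=(X_valid, X_valid))
```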
Generating Fashion MNIST Images
• To use the variational autoencoder to generate images, sample random
codings from a Gaussian distribution and decode them.
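• For example (assuming the variational_decoder and codings_size from the sketch above):

```python
import matplotlib.pyplot as plt
import tensorflow as tf

# Sample 12 random codings from a standard Gaussian and decode them into images.
codings = tf.random.normal(shape=[12, codings_size])
images = variational_decoder(codings).numpy()

# Plot the generated images on a 3x4 grid.
for i, image in enumerate(images):
    plt.subplot(3, 4, i + 1)
    plt.imshow(image, cmap="binary")
    plt.axis("off")
plt.show()
```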

• Fashion MNIST images generated by the variational autoencoder:

• Give it a bit more fine-tuning and training time, and those images should
look better.
Semantic Interpolation
• Variational autoencoders make it possible to perform semantic
interpolation
» Instead of interpolating two images at the pixel level (which would look as if the two
images were overlaid), we can interpolate at the codings level.
» We first run both images through the encoder, then we interpolate the two codings we
get, and finally we decode the interpolated codings to get the final image.
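• A sketch of this procedure (assuming the variational encoder/decoder from before and a validation set X_valid of Fashion MNIST images scaled to [0, 1]):

```python
import numpy as np

# Interpolate between two images in coding space rather than pixel space.
_, _, codings = variational_encoder(X_valid[:2])                  # encode two images
ratios = np.linspace(0, 1, num=7).reshape(-1, 1).astype(np.float32)
interpolated = (1 - ratios) * codings[0] + ratios * codings[1]    # blend the codings
images = variational_decoder(interpolated).numpy()                # decode the blends
```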
