
MODULE 4

Syllabus: The purpose of GAN, An analogy from the real world, Building blocks of GAN,
Implementation of GAN, Applications of GAN, Challenges of GAN models, Setting up failure
and bad initialization, Mode collapse, Problems with counting, Problems with perspective

The purpose of GAN

Generative adversarial networks are machine learning systems that can learn to mimic a
given distribution of data. They were first proposed in a 2014 NeurIPS paper by deep
learning expert Ian Goodfellow and his colleagues.
GANs consist of two neural networks, one trained to generate data and the other trained to
distinguish fake data from real data (hence the “adversarial” nature of the model). Although
the idea of a structure to generate data isn’t new, when it comes to image and video
generation, GANs have provided impressive results such as:
 Style transfer using CycleGAN, which can perform a number of convincing style
transformations on images
 Generation of human faces with StyleGAN, as demonstrated on the website This
Person Does Not Exist
Structures that generate data, including GANs, are considered generative models, in contrast
to the more widely studied discriminative models.
Some generative models are able to draw samples from the model distribution; GANs are one
example, and they focus primarily on generating samples from that distribution. You might
wonder why generative models are worth studying, especially ones that only generate data
rather than providing an estimate of the density function. Some of the reasons to study
generative models are as follows:
 Sampling (or generation) is straightforward
 Training doesn't involve maximum likelihood estimation
 They are robust to overfitting, since the generator never sees the training data
 GANs are good at capturing the modes of a distribution

An analogy from the real world

Let's consider the real-world relationship between a money-counterfeiting criminal and the
police, and enumerate the objectives of each in terms of money:
Figure 1a: GAN real-world analogy
 To become a successful money counterfeiter, the criminal needs to fool the police so
that the police can't tell the difference between counterfeit/fake money and real money.
 As a paragon of justice, the police want to detect fake money as effectively as possible.
This can be modeled as a minimax game in game theory, and the phenomenon is called an
adversarial process. GAN, introduced by Ian Goodfellow in 2014, is a special case of an
adversarial process where two neural networks compete against each other. The first network
generates data and the second network tries to tell the difference between the real data and
the fake data generated by the first network. The second network outputs a scalar in the
range [0, 1], which represents the probability that the input is real data.
The building blocks of GAN

In a GAN, the first network is called the generator and is often represented as G(z), and the
second network is called the discriminator and is often represented as D(x).
Here are the steps a GAN takes:

 The generator takes in random numbers and returns an image.


 This generated image is fed into the discriminator alongside a stream of images
taken from the actual, ground-truth dataset.
 The discriminator takes in both real and fake images and returns probabilities, a
number between 0 and 1, with 1 representing a prediction of authenticity and 0
representing fake.

So you have a double feedback loop:

 The discriminator is in a feedback loop with the ground truth of the images, which
we know.
 The generator is in a feedback loop with the discriminator.

At the equilibrium point, which is the optimal point in the minimax game, the first network
will model the real data and the second network will output a probability of 0.5, because the
output of the first network is indistinguishable from real data:
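This value of 0.5 follows from the form of the optimal discriminator derived in the original
GAN paper. For a fixed generator whose samples follow a distribution p_g, the discriminator
that minimizes its loss is

D*(x) = p_data(x) / (p_data(x) + p_g(x))

so once the generator has matched the real distribution (p_g = p_data), the best the
discriminator can do is output D*(x) = 1/2 for every input.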

Sometimes the two networks eventually reach equilibrium, but this is not always
guaranteed and the two networks can continue learning for a long time. An example of
learning with both generator and discriminator loss is shown in the following figure:
Figure 1c: Loss of two networks, generator and discriminator
Generator
The generator network takes as input random noise and tries to generate a sample of data.
In the preceding figure, we can see that generator G(z) takes an input z from probability
distribution p(z) and generates data that is then fed into a discriminator network D(x).
Discriminator
The discriminator network takes its input either from the real data or from the generator's
generated data and tries to predict whether the input is real or generated. It takes an input
x from the real data distribution Pdata(x) and then solves a binary classification problem,
giving an output in the scalar range 0 to 1.
GANs are gaining a lot of popularity because of their ability to tackle the important challenge
of unsupervised learning, since the amount of available unlabeled data is much larger than
the amount of labeled data. Another reason for their popularity is that GANs are able to
generate the most realistic images among generative models. Although this is subjective, it
is an opinion shared by most practitioners.

Figure-1d: Vector arithmetic in GANs


Besides this, GANs are often very expressive: they can perform arithmetic operations in the
latent space, that is, the space of the z vectors, and translate them into corresponding
operations in feature space. As shown in Figure 1d, if you take the representation of a man
with glasses in latent space, subtract the neutral-man vector and add back the neutral-woman
vector, you end up with a picture of a woman with glasses in feature space.
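A minimal sketch of this latent-space arithmetic, assuming a trained generator G that maps
latent vectors to images and three averaged latent vectors obtained beforehand (the file
names and the 100-dimensional shape here are hypothetical):

import numpy as np

# Hypothetical averaged latent vectors, e.g. means of z's whose generated
# images were labelled "man with glasses", "man", and "woman"
z_man_glasses = np.load('z_man_glasses.npy')   # shape: (100,)
z_man = np.load('z_man.npy')
z_woman = np.load('z_woman.npy')

# Arithmetic in latent space ...
z_new = z_man_glasses - z_man + z_woman

# ... translates into the corresponding semantic change in image space
woman_with_glasses = G(z_new[np.newaxis, :])   # G is the trained generator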
Generative adversarial networks consist of an overall structure composed of two neural
networks, one called the generator and the other called the discriminator.
The role of the generator is to estimate the probability distribution of the real samples in
order to provide generated samples resembling real data. The discriminator, in turn, is
trained to estimate the probability that a given sample came from the real data rather than
being provided by the generator.
These structures are called generative adversarial networks because the generator and
discriminator are trained to compete with each other: the generator tries to get better at
fooling the discriminator, while the discriminator tries to get better at identifying generated
samples.
To understand how GAN training works, consider a toy example with a dataset composed of
two-dimensional samples (x₁, x₂), with x₁ in the interval from 0 to 2π and x₂ = sin(x₁), as
illustrated in the following figure:

As you can see, this dataset consists of points (x₁, x₂) located over a sine curve, having a very
particular distribution. The overall structure of a GAN to generate pairs (x̃₁, x̃₂) resembling
the samples of the dataset is shown in the following figure:

The generator G is fed with random data from a latent space, and its role is to generate data
resembling the real samples. In this example, you have a two-dimensional latent space, so
that the generator is fed with random (z₁, z₂) pairs and is required to transform them so that
they resemble the real samples.
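As a concrete illustration, the toy training set and the random latent inputs described above
could be generated as follows (a minimal NumPy sketch; the sample count is an arbitrary
assumption):

import numpy as np

num_samples = 1024

# Real samples: points lying on a sine curve
x1 = 2 * np.pi * np.random.rand(num_samples)   # x1 uniform in [0, 2*pi]
x2 = np.sin(x1)                                # x2 = sin(x1)
real_samples = np.stack([x1, x2], axis=1)      # shape: (num_samples, 2)

# Latent inputs for the generator: random (z1, z2) pairs
latent_samples = np.random.randn(num_samples, 2)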
The structure of the neural network G can be arbitrary, allowing you to use architectures
such as a multilayer perceptron (MLP), a convolutional neural network (CNN), or any other
structure, as long as the dimensions of the input and output match the dimensions of the
latent space and the real data.
The discriminator D is fed with either real samples from the training dataset or generated
samples provided by G. Its role is to estimate the probability that the input belongs to the
real dataset. The training is performed so that D outputs 1 when it’s fed a real sample and 0
when it’s fed a generated sample.
As with G, you can choose an arbitrary neural network structure for D as long as it respects
the necessary input and output dimensions. In this example, the input is two-dimensional.
For a binary discriminator, the output may be a scalar ranging from 0 to 1.
The GAN training process consists of a two-player minimax game in which D is adapted to
minimize the discrimination error between real and generated samples, and G is adapted to
maximize the probability of D making a mistake.
Although the dataset containing the real data isn’t labeled, the training processes
for D and G are performed in a supervised way. At each step in the training, D and G have
their parameters updated. In fact, in the original GAN proposal, the parameters of D are
updated k times, while the parameters of G are updated only once for each training step.
However, to make the training simpler, you can consider k equal to 1.
To train D, at each iteration you label some real samples taken from the training data as 1
and some generated samples provided by G as 0. This way, you can use a conventional
supervised training framework to update the parameters of D in order to minimize a loss
function, as shown in the following scheme:

For each batch of training data containing labeled real and generated samples, you update
the parameters of D to minimize a loss function. After the parameters of D are updated, you
train G to produce better generated samples. The output of G is connected to D, whose
parameters are kept frozen, as depicted here:
You can imagine the system composed of G and D as a single classification system that
receives random samples as input and outputs the classification, which in this case can be
interpreted as a probability.
When G does a good enough job to fool D, the output probability should be close to 1. You
could also use a conventional supervised training framework here: the dataset to train the
classification system composed of G and D would be provided by random input samples,
and the label associated with each input sample would be 1.
During training, as the parameters of D and G are updated, it’s expected that the generated
samples given by G will more closely resemble the real data, and D will have more trouble
distinguishing between real and generated data.
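The two-step procedure just described can be sketched for the toy sine-curve example in a few
lines of Keras. This is only an illustrative sketch; the layer sizes, optimizer, and number
of steps are assumptions, not choices prescribed by this module:

import numpy as np
from tensorflow import keras

# Assumed toy models: 2-D latent input -> 2-D sample, and 2-D sample -> probability
generator = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(2,)),
    keras.layers.Dense(2)])
discriminator = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(2,)),
    keras.layers.Dense(1, activation="sigmoid")])
discriminator.compile(loss="binary_crossentropy", optimizer="adam")

# Combined model used only to train G; D's weights are frozen inside it
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(loss="binary_crossentropy", optimizer="adam")

batch_size = 32
for step in range(5000):
    # 1) Train D: real samples labelled 1, generated samples labelled 0
    x1 = 2 * np.pi * np.random.rand(batch_size)
    real = np.stack([x1, np.sin(x1)], axis=1)
    fake = generator.predict(np.random.randn(batch_size, 2), verbose=0)
    discriminator.trainable = True
    discriminator.train_on_batch(
        np.concatenate([real, fake]),
        np.concatenate([np.ones(batch_size), np.zeros(batch_size)]))
    # 2) Train G through the frozen D: random latent inputs labelled 1
    discriminator.trainable = False
    gan.train_on_batch(np.random.randn(batch_size, 2), np.ones(batch_size))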

Implementation of GAN

As per the definition of a GAN, we basically require two networks, be it a sophisticated
network such as a ConvNet or a simple two-layer neural network. Let's use a simple two-layer
neural network with the MNIST dataset, implemented in TensorFlow. MNIST is a dataset of
handwritten digits where each image is grayscale with dimensions of 28x28 pixels:
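The snippets below follow the TensorFlow 1.x API (tf.placeholder and sessions). They also rely
on a Xavier-initialization helper, a loaded MNIST reader, and a couple of hyperparameters that
are not shown explicitly; a minimal preamble along these lines is assumed (the names mb_size
and Z_dim match their use in the training loop further down):

import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Hyperparameters used by the later snippets
mb_size = 128   # mini-batch size
Z_dim = 100     # dimensionality of the noise vector z

# MNIST reader (images come flattened as 784-dimensional vectors)
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# Xavier/Glorot initialization helper used for the weight matrices below
def xavier_init(size):
    in_dim = size[0]
    xavier_stddev = 1. / tf.sqrt(in_dim / 2.)
    return tf.random_normal(shape=size, stddev=xavier_stddev)

# A Session is created and variables are initialized after the two networks
# below have been defined, for example:
#   sess = tf.Session()
#   sess.run(tf.global_variables_initializer())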

# Random noise setting for Generator


Z = tf.placeholder(tf.float32, shape=[None, 100], name='Z')
#Generator parameter settings
G_W1 = tf.Variable(xavier_init([100, 128]), name='G_W1')
G_b1 = tf.Variable(tf.zeros(shape=[128]), name='G_b1')
G_W2 = tf.Variable(xavier_init([128, 784]), name='G_W2')
G_b2 = tf.Variable(tf.zeros(shape=[784]), name='G_b2')
theta_G = [G_W1, G_W2, G_b1, G_b2]
# Generator Network
def generator(z):
    G_h1 = tf.nn.relu(tf.matmul(z, G_W1) + G_b1)
    G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
    G_prob = tf.nn.sigmoid(G_log_prob)
    return G_prob

The generator(z) takes as input a 100-dimensional vector drawn from a random distribution
(in this case we are using a uniform distribution) and returns a 784-dimensional vector,
which is an MNIST image (28x28). The z here is the prior for G(z); in this way, the generator
learns a mapping from the prior space to pdata (the real data distribution):

#Input Image MNIST setting for Discriminator [28x28=784]


X = tf.placeholder(tf.float32, shape=[None, 784], name='X')
#Discriminator parameter settings
D_W1 = tf.Variable(xavier_init([784, 128]), name='D_W1')
D_b1 = tf.Variable(tf.zeros(shape=[128]), name='D_b1')
D_W2 = tf.Variable(xavier_init([128, 1]), name='D_W2')
D_b2 = tf.Variable(tf.zeros(shape=[1]), name='D_b2')
theta_D = [D_W1, D_W2, D_b1, D_b2]
# Discriminator Network
def discriminator(x):
    D_h1 = tf.nn.relu(tf.matmul(x, D_W1) + D_b1)
    D_logit = tf.matmul(D_h1, D_W2) + D_b2
    D_prob = tf.nn.sigmoid(D_logit)
    return D_prob, D_logit

The discriminator(x), on the other hand, takes MNIST image(s) as input and returns a scalar
that represents the probability that the image is real. Now, let's discuss an algorithm for
training the GAN.
G_sample = generator(Z)
D_real, D_logit_real = discriminator(X)
D_fake, D_logit_fake = discriminator(G_sample)
# Loss functions according to the GAN original paper
D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
G_loss = -tf.reduce_mean(tf.log(D_fake))
The TensorFlow optimizer can only do minimization, so in order to maximize the objective we
minimize its negative, hence the negative signs in the losses shown previously. Also, as per
the paper's pseudo-algorithm, it is better to maximize tf.reduce_mean(tf.log(D_fake)) instead
of minimizing tf.reduce_mean(tf.log(1. - D_fake)). Then we train the networks one by one with
those preceding loss functions:
# Only update D(X)'s parameters, so var_list = theta_D
D_solver = tf.train.AdamOptimizer().minimize(D_loss, var_list=theta_D)
# Only update G(Z)'s parameters, so var_list = theta_G
G_solver = tf.train.AdamOptimizer().minimize(G_loss, var_list=theta_G)
def sample_Z(m, n):
    '''Uniform prior for G(Z)'''
    return np.random.uniform(-1., 1., size=[m, n])

for it in range(1000000):
    X_mb, _ = mnist.train.next_batch(mb_size)
    _, D_loss_curr = sess.run([D_solver, D_loss],
                              feed_dict={X: X_mb, Z: sample_Z(mb_size, Z_dim)})
    _, G_loss_curr = sess.run([G_solver, G_loss],
                              feed_dict={Z: sample_Z(mb_size, Z_dim)})

After that, we start with random noise and, as the training continues, G(Z) starts moving
towards pdata. This is evidenced by the samples generated by G(Z) becoming increasingly
similar to the original MNIST images.
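To inspect what the generator has learned during or after training, one can sample a few
noise vectors, run them through G, and reshape the 784-dimensional outputs back to 28x28
images. A small sketch using matplotlib (the grid size is an arbitrary choice):

import matplotlib.pyplot as plt

# 16 fake images, each a 784-dimensional vector in [0, 1]
samples = sess.run(G_sample, feed_dict={Z: sample_Z(16, Z_dim)})

fig, axes = plt.subplots(4, 4, figsize=(4, 4))
for img, ax in zip(samples, axes.flatten()):
    ax.imshow(img.reshape(28, 28), cmap='gray')
    ax.axis('off')
plt.show()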

Applications of GAN

GANs are generating lots of excitement in a wide variety of fields. Some of the exciting
applications of GANs in recent years are listed as follows:
 Translating one image to another (such as horse to zebra) with CycleGAN, and
performing image editing through Conditional GAN.
 Automatic synthesis of realistic images from a textual sentence using StackGAN, and
transferring style from one domain to another domain using Discovery GAN (DiscoGAN).
 Enhancing image quality and generating high-resolution images with pre-trained
models using SRGAN.
 Generating a realistic image from attributes: let's say a burglar comes to your
apartment but you don't have a picture of him/her. Now the system at the police
station could generate a realistic image of the thief based on the description
provided by you and search a database.
 Predicting the next frame in a video, or dynamic video generation.

Challenges of GAN models


Training a GAN is basically about two networks, the generator G(z) and the discriminator
D(x), racing against each other and trying to reach an optimum, more specifically a Nash
equilibrium. The definition of a Nash equilibrium, as per Wikipedia (in economics and game
theory), is a stable state of a system involving the interaction of different participants,
in which no participant can gain by a unilateral change of strategy if the strategies of the
others remain unchanged.
 Setting up failure and bad initialization
If you think about it, this is exactly what a GAN is trying to do: the generator and
discriminator reach a state where neither can improve any further while the other is kept
unchanged. Now, the setup of gradient descent is to take a step in a direction that reduces
the loss measure defined on the problem, but we are by no means enforcing the networks to
reach a Nash equilibrium in a GAN, which has a non-convex objective with continuous,
high-dimensional parameters. The networks take successive steps to minimize a non-convex
objective and can end up in an oscillating process rather than decreasing the underlying true
objective.
In most cases, when your discriminator attains a loss very close to zero, you can figure out
right away that something is wrong with your model. The biggest difficulty is figuring out
exactly what is wrong.
Another practical trick used during the training of a GAN is to purposefully make one of the
networks stall or learn more slowly, so that the other network can catch up. In most
scenarios it's the generator that lags behind, so we usually let the discriminator wait. This
might be fine to some extent, but remember that for the generator to get better it requires a
good discriminator, and vice versa. Ideally, we want both networks to learn at a rate where
both get better over time. The ideal minimum loss for the discriminator is close to 0.5; this
is where the generated images are indistinguishable from the real images from the perspective
of the discriminator.
 Mode collapse
One of the main failure modes when training a generative adversarial network is called mode
collapse, or sometimes the Helvetica scenario. The basic idea is that the generator can
accidentally start to produce several copies of exactly the same image, and the reason is
related to the game-theory setup. We can think of the way we train generative adversarial
networks as first maximizing with respect to the discriminator and then minimizing with
respect to the generator. If we fully maximize with respect to the discriminator before we
start to minimize with respect to the generator, everything works out just fine. But if we go
the other way around, minimizing with respect to the generator and then maximizing with
respect to the discriminator, everything actually breaks. The reason is that if we hold the
discriminator constant, it will describe a single region in space as the point that is most
likely to be real rather than fake, and the generator will then choose to map all noise input
values to that same most-likely-to-be-real point.
 Problems with counting
GANs can sometimes be far-sighted and fail to differentiate the number of particular objects
that should occur at a location. As we can see, the model generates more eyes in the head
than are originally present:

 Problems with perspective


GANs are sometimes not capable of differentiating between front and back views, and hence
fail to adapt well to 3D objects while generating 2D representations of them, as follows:

What are Generative Adversarial Networks?

Generative Adversarial Networks (GANs) were introduced in 2014 by Ian J. Goodfellow and
co-authors. GANs perform unsupervised learning tasks in machine learning. They consist of
two models that automatically discover and learn the patterns in input data.

The two models are known as Generator and Discriminator.

They compete with each other to scrutinize, capture, and replicate the variations within a
dataset. GANs can be used to generate new examples that plausibly could have been drawn
from the original dataset.

Shown below is an example of a GAN. There is a database that has real 100 rupee notes. The
generator neural network generates fake 100 rupee notes. The discriminator network will
help identify the real and fake notes.

What is a Generator?

A Generator in GANs is a neural network that creates fake data on which the discriminator is
trained. It learns to generate plausible data. The generated examples/instances become
negative training examples for the discriminator. It takes a fixed-length random vector
carrying noise as input and generates a sample.
The main aim of the Generator is to make the discriminator classify its output as real. The
part of the GAN that trains the Generator includes:

 noisy input vector


 generator network, which transforms the random input into a data instance
 discriminator network, which classifies the generated data
 generator loss, which penalizes the Generator for failing to fool the discriminator
The backpropagation method is used to adjust each weight in the right direction by
calculating the weight's impact on the output. It is also used to obtain gradients and these
gradients can help change the generator weights.

Let’s see the next topic in this article on what GANs are, i.e., a Discriminator.

What is a Discriminator?

The Discriminator is a neural network that identifies real data from the fake data created by
the Generator. The discriminator's training data comes from two different sources:

 The real data instances, such as real pictures of birds, humans, currency notes, etc., are
used by the Discriminator as positive samples during training.
 The fake data instances created by the Generator are used as negative examples during
the training process.
While training the discriminator, it connects to two loss functions. During discriminator
training, the discriminator ignores the generator loss and just uses the discriminator loss.

In the process of training the discriminator, the discriminator classifies both real data and
fake data from the generator. The discriminator loss penalizes the discriminator for
misclassifying a real data instance as fake or a fake data instance as real.

The discriminator updates its weights through backpropagation from the discriminator loss
through the discriminator network.

Now, let’s learn how GANs work in this article on ‘What are GANs’.

How Do GANs Work?

GANs consist of two neural networks. There is a Generator G(x) and a Discriminator D(x).
Both of them play an adversarial game. The generator's aim is to fool the discriminator by
producing data that are similar to those in the training set. The discriminator will try not to
be fooled by identifying fake data from real data. Both of them work simultaneously to learn
and train complex data like audio, video, or image files.
The Generator network takes a random noise sample and generates a fake sample of data. The
Generator is trained to increase the Discriminator network's probability of making mistakes.

Below is an example of a GAN trying to identify if the 100 rupee notes are real or fake. So,
first, a noise vector or the input vector is fed to the Generator network. The generator
creates fake 100 rupee notes. The real images of 100 rupee notes stored in a database are
passed to the discriminator along with the fake notes. The Discriminator then identifies the
notes, classifying them as real or fake.

We train the model, calculate the loss function at the end of the discriminator network, and
backpropagate the loss into both discriminator and generator models.

Mathematical Equation

The mathematical equation for training a GAN can be represented as:

min_G max_D V(D, G) = E[x ~ Pdata(x)][log D(x)] + E[z ~ p(z)][log(1 - D(G(z)))]

Here,

G = Generator

D = Discriminator

Pdata(x) = distribution of real data

p(z) = distribution of the generator's input noise

x = sample from Pdata(x)

z = sample from p(z)


D(x) = Discriminator network
G(z) = Generator network
Steps for Training GAN

 Define the problem


 Choose the architecture of GAN
 Train discriminator on real data
 Generate fake inputs for the generator
 Train discriminator on fake data
 Train generator with the output of the discriminator
Let us now look at the different types of GANs.

Vanilla GANs: Vanilla GANs have a min-max optimization formulation where the
Discriminator is a binary classifier and uses sigmoid cross-entropy loss during optimization.
The Generator and the Discriminator in Vanilla GANs are multi-layer perceptrons. The
algorithm tries to optimize the mathematical equation using stochastic gradient descent.

Deep Convolutional GANs (DCGANs): DCGANs use convolutional neural networks instead of vanilla
fully connected networks for both the Discriminator and the Generator. They are more stable
and generate better-quality images. The Generator is a set of convolutional layers with
fractionally-strided convolutions (transpose convolutions), so it up-samples its input at
every convolutional layer. The Discriminator is a set of convolutional layers with strided
convolutions, so it down-samples its input at every convolutional layer.

Conditional GANs: Vanilla GANs can be extended into conditional models by using extra label
information to generate better results. In CGAN, an additional parameter 'y' (the label) is
fed to the Generator so that it generates data corresponding to that label. Labels are also
fed as input to the Discriminator to help it distinguish the real data from the fake
generated data.
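A minimal sketch of this conditioning in Keras is shown below; the layer sizes, the
784-dimensional flattened images, and the use of one-hot labels are illustrative assumptions
rather than a prescribed architecture:

from tensorflow import keras

num_classes, latent_dim = 10, 100

# Generator: noise z concatenated with a one-hot label y
z_in = keras.Input(shape=(latent_dim,))
y_in = keras.Input(shape=(num_classes,))
h = keras.layers.Concatenate()([z_in, y_in])
h = keras.layers.Dense(128, activation="relu")(h)
img_out = keras.layers.Dense(784, activation="sigmoid")(h)
cond_generator = keras.Model([z_in, y_in], img_out)

# Discriminator: image x concatenated with the same label y
x_in = keras.Input(shape=(784,))
yd_in = keras.Input(shape=(num_classes,))
d = keras.layers.Concatenate()([x_in, yd_in])
d = keras.layers.Dense(128, activation="relu")(d)
p_real = keras.layers.Dense(1, activation="sigmoid")(d)
cond_discriminator = keras.Model([x_in, yd_in], p_real)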

Super Resolution GANs: SRGANs use deep neural networks along with an adversarial
network to produce higher resolution images. SRGANs generate a photorealistic high-
resolution image when given a low-resolution image.

How GANs Learn

Generative methods are very powerful tools that can be used to solve a number of problems.
Their goal is to generate new data samples that are likely to belong to the training dataset.
Generative methods can do this in two ways: by learning an approximate distribution of the
data space and then sampling from it, or by learning to generate samples that are likely to
belong to this data space (avoiding the step of approximating the data distribution).
Above you can see a diagram of the architecture of GANs. GANs consist of two networks
(generator and discriminator) that are essentially competing against each other; the two
networks have adversarial goals.

The generator attempts to maximize the probability of fooling the discriminator into thinking
its generated images are real. The discriminator’s goal is to correctly classify the real data as
real, and the generated data as fake. These objectives are expressed in the loss functions of
the networks, which will be optimized during training.

In GANs, the generator's loss function is minimized, and the discriminator's objective is
maximized. The generator attempts to maximize the number of samples on which the
discriminator produces false positives (generated samples classified as real), while the
discriminator attempts to maximize its classification accuracy.

Figure 1: GAN training algorithm pseudo-code


In the pseudo-code above, for each epoch and for every batch, the gradients of the
discriminator and the generator are computed. The discriminator's objective is built from the
logarithms of its outputs on the correctly classified samples from the real dataset and from
the fake dataset; we want to maximize this. The generator's objective measures how often the
discriminator correctly classifies the fake images; we want to minimize this.
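Concretely, these correspond to the two gradient steps in Algorithm 1 of the original GAN
paper: for each mini-batch of m real samples x^(i) and m noise samples z^(i), the
discriminator is updated by ascending its stochastic gradient of

(1/m) * sum_i [ log D(x^(i)) + log(1 - D(G(z^(i)))) ]

and the generator is updated by descending its stochastic gradient of

(1/m) * sum_i log(1 - D(G(z^(i))))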

The goals are naturally opposite, and therefore so will be the gradients used to train the
networks. This can become a problem, and I will discuss this later.

Once the training is complete, the generator is all we care about. The generator takes in a
random noise vector and outputs an image that is likely to belong to the training data space.
Remember that even though this effectively learns a mapping between the random variable (z)
and the image data space, there is no guarantee that the mapping between the two spaces will
be smooth. GANs do not learn the distribution of the data; they learn how to generate samples
similar to those belonging to the training data.

Application of GANs

 With the help of DCGANs, you can train on images of cartoon characters to generate faces
of anime characters as well as Pokemon characters.

 GANs can be trained on the images of humans to generate realistic faces. The faces that
you see below have been generated using GANs and do not exist in reality.
 GANs can build realistic images from textual descriptions of objects like birds, humans,
and other animals. We input a sentence and generate multiple images fitting the
description.
Below is an example of text-to-image translation using GANs for a bird with a black head,
yellow body, and a short beak.

The Drawbacks of GANs

A major disadvantage of GANs is that, as mentioned earlier, the discriminator and the
generator have opposite objectives and therefore opposite-sign gradients. It can be shown
that when optimizing a GAN, a minimum will not be achieved; instead, the optimization
algorithm will end up at a saddle point.
Another common problem with GANs is that when training these models, it is easy for the
discriminator to overpower the generator. The discriminator simply gets too good too quickly,
and the generator is unable to learn how to generate images that fool the discriminator.
Intuitively this makes sense: a classification task will always be easier than the
generator's task of learning how to generate new samples.

Deep Convolutional Generative Adversarial Network (DCGAN)


GANs are used for teaching a deep learning model to generate new data from the same
distribution as its training data. They were invented by Ian Goodfellow in 2014 in the paper
Generative Adversarial Nets. They are made up of two different models, a generator and
a discriminator. The generator produces synthetic or fake images which look like training
images. The discriminator looks at an image and outputs whether the image is real or fake.
During training, the generator produces better and better fake images to fool the
discriminator into believing that a generated image is real, while the discriminator tries to
become better at detecting and classifying whether an image is real or fake.
Need for DCGANs:
DCGANs were introduced to reduce the problem of mode collapse. Mode collapse occurs
when the generator becomes biased towards a few outputs and is unable to produce outputs
covering every variation in the dataset. For example, take the case of the MNIST digits
dataset (digits from 0 to 9): we want the generator to generate all types of digits, but
sometimes the generator becomes biased towards two or three digits and produces only them.
Because of that, the discriminator also gets optimized towards those particular digits only,
and this state is known as mode collapse. This problem can be mitigated by using DCGANs.

DCGAN
DCGAN uses convolutional and convolutional-transpose layers in the discriminator and the
generator, respectively. It was proposed by Radford et al. in the paper Unsupervised
Representation Learning with Deep Convolutional Generative Adversarial Networks. Here
the discriminator consists of strided convolution layers, batch normalization layers, and
LeakyReLU as the activation function, and it takes a 3x64x64 input image. The generator
consists of convolutional-transpose layers, batch normalization layers, and ReLU activations.
The output will be a 3x64x64 RGB image.
Architecture

The generator of the DCGAN architecture takes a 100-dimensional noise vector as input. First,
it projects and reshapes it to 4x4x1024, and then performs a fractionally-strided convolution
four times with a stride of 1/2 (this means that every time it is applied, it doubles the
spatial dimensions while reducing the number of output channels). The generated output has
dimensions of (64, 64, 3). There are some architectural changes proposed in the generator,
such as the removal of all fully connected layers and the use of Batch Normalization, which
helps stabilize training. In the paper, the authors use the ReLU activation function in all
layers of the generator except for the output layer. We will be implementing the generator
with similar guidelines, but not exactly the same architecture.
The role of the discriminator here is to determine whether an image comes from the real
dataset or from the generator. The discriminator can simply be designed like a convolutional
neural network that performs an image classification task. However, the authors of the paper
suggested some changes in the discriminator architecture: instead of fully connected layers,
they used only strided convolutions with LeakyReLU as the activation function. The input of
the discriminator is a single image, either from the dataset or generated, and the output is
a score that determines whether the image is real or generated.

Implementation
In this section we will be discussing the implementation of DCGAN in Keras. Since our dataset
is the Fashion-MNIST dataset, which contains images of size (28, 28) with 1 color channel
instead of (64, 64) with 3 color channels, we need to make some changes to the architecture;
we will discuss these changes as we go along.
In the first step, we need to import the necessary classes such as TensorFlow, Keras,
matplotlib, etc. We will be using TensorFlow version 2. This version
of TensorFlow provides inbuilt support for the Keras library as its default High-level API.

# code
%matplotlib inline


import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
from IPython import display

# Check tensorflow version


print('Tensorflow version:', tf.__version__)

Now we load the Fashion-MNIST dataset. The good thing is that the dataset can be imported
from the tf.keras.datasets API, so we don't need to load it manually by copying files. This
dataset contains 60k training images and 10k test images, each of dimension (28, 28). Since
the value of each pixel is in the range [0, 255], we divide these values by 255 to normalize
them.

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()


x_train = x_train.astype(np.float32) / 255.0
x_test = x_test.astype(np.float32) / 255.0
x_train.shape, x_test.shape

((60000, 28, 28), (10000, 28, 28))

Now, in the next step, we will be visualizing some of the images from the Fashion-MNIST
dataset; we use the matplotlib library for that.

# We plot first 25 images of training dataset
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)
plt.show()
Now, we define training parameters such as batch size and divide the dataset into batches,
and fill those batches by randomly sampling the training data.

# code
batch_size = 32
# This dataset fills a buffer with buffer_size elements,
# then randomly samples elements from this buffer,
# replacing the selected elements with new elements.
def create_batch(x_train):
    dataset = tf.data.Dataset.from_tensor_slices(x_train).shuffle(1000)
    # Combines consecutive elements of this dataset into batches.
    dataset = dataset.batch(batch_size, drop_remainder=True).prefetch(1)
    # Creates a Dataset that prefetches elements from this dataset
    return dataset

Now, we define the generator architecture. This generator takes a vector of size 100, first
reshapes it into a (7, 7, 128) tensor, and then applies transposed convolutions in
combination with batch normalization to that reshaped tensor. The output of this generator
is a generated image of dimension (28, 28, 1).
#code
num_features = 100

generator = keras.models.Sequential([
    keras.layers.Dense(7 * 7 * 128, input_shape=[num_features]),
    keras.layers.Reshape([7, 7, 128]),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2DTranspose(64, (5, 5), (2, 2), padding="same", activation="selu"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2DTranspose(1, (5, 5), (2, 2), padding="same", activation="tanh"),
])
generator.summary()

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 6272) 633472
_________________________________________________________________
reshape (Reshape) (None, 7, 7, 128) 0
_________________________________________________________________
batch_normalization (BatchNo (None, 7, 7, 128) 512
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 14, 14, 64) 204864
_________________________________________________________________
batch_normalization_1 (Batch (None, 14, 14, 64) 256
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 28, 28, 1) 1601
=================================================================
Total params: 840,705
Trainable params: 840,321
Non-trainable params: 384
_________________________________________________________________
Now, we define the discriminator architecture. The discriminator takes an image of size 28x28
with 1 color channel and outputs a scalar value indicating whether the image comes from the
dataset or from the generator.

discriminator = keras.models.Sequential([
    keras.layers.Conv2D(64, (5, 5), (2, 2), padding="same", input_shape=[28, 28, 1]),
    keras.layers.LeakyReLU(0.2),
    keras.layers.Dropout(0.3),
    keras.layers.Conv2D(128, (5, 5), (2, 2), padding="same"),
    keras.layers.LeakyReLU(0.2),
    keras.layers.Dropout(0.3),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation='sigmoid')
])
discriminator.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 14, 14, 64) 1664
_________________________________________________________________
leaky_re_lu (LeakyReLU) (None, 14, 14, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 14, 14, 64) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 7, 7, 128) 204928
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU) (None, 7, 7, 128) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 7, 7, 128) 0
_________________________________________________________________
flatten (Flatten) (None, 6272) 0
_________________________________________________________________
dense_1 (Dense) (None, 1) 6273
=================================================================
Total params: 212,865
Trainable params: 212,865
Non-trainable params: 0
_________________________________________________________________
Now we need to compile our DCGAN model (the combination of generator and discriminator). We
first compile the discriminator, then set its trainable attribute to False, so that only the
generator's weights are updated when we train the combined model.

# compile discriminator using binary cross-entropy loss and adam optimizer
discriminator.compile(loss="binary_crossentropy", optimizer="adam")
# make discriminator non-trainable inside the combined model
discriminator.trainable = False
# Combine both generator and discriminator
gan = keras.models.Sequential([generator, discriminator])
# compile the combined GAN model using binary cross-entropy loss and adam optimizer
gan.compile(loss="binary_crossentropy", optimizer="adam")


Now, we define the training procedure for this GAN model. We will be using the tqdm package,
which we imported earlier; this package helps in visualizing training progress.
seed = tf.random.normal(shape=[batch_size, 100])

def train_dcgan(gan, dataset, batch_size, num_features, epochs=5):
    generator, discriminator = gan.layers
    for epoch in tqdm(range(epochs)):
        print()
        print("Epoch {}/{}".format(epoch + 1, epochs))

        for X_batch in dataset:
            # create random noise of size batch_size x 100
            # to pass into the generator
            noise = tf.random.normal(shape=[batch_size, num_features])
            generated_images = generator(noise)

            # take a batch of generated images and real images and
            # use them to train the discriminator
            X_fake_and_real = tf.concat([generated_images, X_batch], axis=0)
            y1 = tf.constant([[0.]] * batch_size + [[1.]] * batch_size)
            discriminator.trainable = True
            discriminator.train_on_batch(X_fake_and_real, y1)

            # Here we train the combined GAN model: we pass noise through the
            # generator and label the result as [1], so the generator is
            # pushed to fool the discriminator
            noise = tf.random.normal(shape=[batch_size, num_features])
            y2 = tf.constant([[1.]] * batch_size)
            discriminator.trainable = False
            gan.train_on_batch(noise, y2)

        # generate images for the GIF as we go
        generate_and_save_images(generator, epoch + 1, seed)

    generate_and_save_images(generator, epochs, seed)


Now we define a function that generates and saves images from the generator (during
training). We will use these generated images to build the GIF later.
# code
def generate_and_save_images(model, epoch, test_input):
    predictions = model(test_input, training=False)

    fig = plt.figure(figsize=(10, 10))

    for i in range(25):
        plt.subplot(5, 5, i + 1)
        plt.imshow(predictions[i, :, :, 0] * 127.5 + 127.5, cmap='binary')
        plt.axis('off')

    plt.savefig('image_epoch_{:04d}.png'.format(epoch))
Now, we need to train the model, but before that, we also need to create batches of
training data and add a dimension that represents the number of color channels.
# reshape to add a color channel
x_train_dcgan = x_train.reshape(-1, 28, 28, 1) * 2. - 1.
# create batches
dataset = create_batch(x_train_dcgan)
# call the training function with 10 epochs (the %%time magic can be used to record time)
train_dcgan(gan, dataset, batch_size, num_features, epochs=10)
Now we will define a function that takes the saved images and converts them into a GIF:
import imageio
import glob

anim_file = 'dcgan_results.gif'

with imageio.get_writer(anim_file, mode='I') as writer:
    filenames = glob.glob('image*.png')
    filenames = sorted(filenames)
    last = -1
    for i, filename in enumerate(filenames):
        frame = 2 * i
        if round(frame) > round(last):
            last = frame
        else:
            continue
        image = imageio.imread(filename)
        writer.append_data(image)
    image = imageio.imread(filename)
    writer.append_data(image)

display.Image(filename=anim_file)

ADDITIONAL
Let X be a set of images of horses and Y be a set of images of zebras.
The goal is to learn a mapping function G: X -> Y such that images generated by G(X) are
indistinguishable from images in Y. This objective is achieved using an adversarial loss. The
formulation not only learns G, but also an inverse mapping function F: Y -> X, and uses a
cycle-consistency loss to enforce F(G(X)) ≈ X and vice versa.
While training, two kinds of training observations can be given as input.
 One set of observations has paired images {Xi, Yi}, where each Xi has its Yi counterpart.
 The other set of observations has a set of images from X and another set of images from Y,
without any match between Xi and Yi.

Fig5: The training procedure for CycleGAN.


As mentioned earlier, two mapping functions are being learned: G, which transforms X to Y,
and F, which transforms Y to X, so the formulation comprises two individual GAN models.
Accordingly, you will find two discriminator functions, Dx and Dy.
As part of the adversarial formulation, the discriminator Dy classifies whether a translated
image G(X) is indistinguishable from real images in Y. Similarly, the discriminator Dx
classifies whether F(Y) is indistinguishable from real images in X.
Along with the adversarial losses, CycleGAN uses a cycle-consistency loss to enable training
without paired images; this additional loss helps the model minimize the reconstruction
errors F(G(X)) ≈ X and G(F(Y)) ≈ Y. All in all, the CycleGAN formulation comprises three
individual losses, and as part of optimization the combined loss function below is optimized.
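Following the CycleGAN paper, the three losses can be written as:

L_GAN(G, DY, X, Y): the adversarial loss for the mapping G: X -> Y with discriminator DY
L_GAN(F, DX, Y, X): the adversarial loss for the mapping F: Y -> X with discriminator DX
L_cyc(G, F) = E_x[ ||F(G(x)) - x||_1 ] + E_y[ ||G(F(y)) - y||_1 ]: the cycle-consistency loss

and the full objective that is optimized is

L(G, F, DX, DY) = L_GAN(G, DY, X, Y) + L_GAN(F, DX, Y, X) + λ * L_cyc(G, F)

where λ controls the relative importance of the cycle-consistency term.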


CycleGAN

# Generator G translates X -> Y.
# Generator F translates Y -> X.
fake_y = generator_g(real_x, training=True)
cycled_x = generator_f(fake_y, training=True)

fake_x = generator_f(real_y, training=True)
cycled_y = generator_g(fake_x, training=True)

# same_x and same_y are used for identity loss.
same_x = generator_f(real_x, training=True)
same_y = generator_g(real_y, training=True)

disc_real_x = discriminator_x(real_x, training=True)
disc_real_y = discriminator_y(real_y, training=True)

disc_fake_x = discriminator_x(fake_x, training=True)
disc_fake_y = discriminator_y(fake_y, training=True)

# calculate the loss
gen_g_loss = generator_loss(disc_fake_y)
gen_f_loss = generator_loss(disc_fake_x)

total_cycle_loss = calc_cycle_loss(real_x, cycled_x) + calc_cycle_loss(real_y, cycled_y)

# Total generator loss = adversarial loss + cycle loss
total_gen_g_loss = gen_g_loss + total_cycle_loss + identity_loss(real_y, same_y)
total_gen_f_loss = gen_f_loss + total_cycle_loss + identity_loss(real_x, same_x)

disc_x_loss = discriminator_loss(disc_real_x, disc_fake_x)
disc_y_loss = discriminator_loss(disc_real_y, disc_fake_y)
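These statements are written in the style of the TensorFlow CycleGAN tutorial, where they sit
inside a training step that records operations with tf.GradientTape and applies the resulting
gradients with one optimizer per model. A rough sketch of that wrapper is shown below; the
optimizer names and the placement of the snippet above inside the tape are assumptions, not
something specified by this module:

@tf.function
def train_step(real_x, real_y):
    # persistent=True because the tape is queried once per model below
    with tf.GradientTape(persistent=True) as tape:
        # (the forward passes and loss computations shown above go here)
        pass

    # Gradients of each model's own loss with respect to its own variables
    gen_g_grads = tape.gradient(total_gen_g_loss, generator_g.trainable_variables)
    gen_f_grads = tape.gradient(total_gen_f_loss, generator_f.trainable_variables)
    disc_x_grads = tape.gradient(disc_x_loss, discriminator_x.trainable_variables)
    disc_y_grads = tape.gradient(disc_y_loss, discriminator_y.trainable_variables)

    # Apply the gradients with one optimizer per network
    generator_g_optimizer.apply_gradients(zip(gen_g_grads, generator_g.trainable_variables))
    generator_f_optimizer.apply_gradients(zip(gen_f_grads, generator_f.trainable_variables))
    discriminator_x_optimizer.apply_gradients(zip(disc_x_grads, discriminator_x.trainable_variables))
    discriminator_y_optimizer.apply_gradients(zip(disc_y_grads, discriminator_y.trainable_variables))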
