6 - High Resolution Diffusive Model
Overview
Until recently, image synthesis tasks were mostly performed with deep generative models such as GANs, VAEs, and autoregressive models. However, these models struggle to synthesize high-quality samples on difficult, high-resolution datasets: GANs suffer from unstable training, while autoregressive models are generally slow to sample from. Diffusion models have therefore recently become important for high-resolution image and sound generation since, compared with the other generative methods (GANs, VAEs), their training is stable, which makes them very promising. Diffusion models work by corrupting the training data: Gaussian noise is added progressively, slowly erasing detail until the data becomes pure noise, and a neural network is trained to reverse this corruption. Running the learned reverse process synthesizes data from pure noise by gradually removing the noise until a clean sample is produced. A comparison of different high-resolution image generation models based on diffusion models is shown in figure 1.
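The progressive corruption described above has a convenient property: adding Gaussian noise step by step collapses into a single closed-form Gaussian perturbation, so a noisy sample at any step can be drawn directly. A minimal sketch of that forward process, assuming a linear noise schedule (the schedule values and array shapes here are illustrative, not taken from any particular paper):

```python
import numpy as np

def forward_noise(x0, t, betas):
    """Sample x_t ~ q(x_t | x_0) in closed form: t steps of progressive
    Gaussian noising collapse into one Gaussian perturbation of x_0."""
    alpha_bar = np.cumprod(1.0 - betas)[t]      # cumulative signal-retention factor
    noise = np.random.randn(*x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

# Toy example with a linear beta schedule (a common, illustrative choice).
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.linspace(-1.0, 1.0, 64).reshape(8, 8)   # stand-in for an image
xt = forward_noise(x0, 999, betas)              # at the last step, x_t is almost pure noise
print(xt.shape)  # (8, 8)
```

The trained network would then be asked to undo one such noising step at a time, starting from pure noise.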
Diffusion Models
Artificial Intelligence (AI) is gaining more and more ground in the world of imaging, from creating realistic photographs and generating deepfakes to colorization and upscaling. Lately, Google has employed its AI to convert pixelated photographs into high-resolution images: a machine learning model takes a low-resolution photo and upscales it, recovering as much detail as possible. There are several methods for upscaling a photo with Artificial Intelligence; the mechanisms employed by Google, called SR3 and CDM, are diffusion models. Below is a list of different diffusion models that generate high-resolution images.
SR3 Super-Resolution via Repeated Refinement, also known as SR3, is a method that takes a low-resolution image as input and builds a high-quality photograph from pure noise. During training, the model corrupts the target image by repeatedly adding noise until only noise remains, and then learns to reverse this process, conditioned on the low-resolution input.
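One way to picture how the low-resolution input guides the denoising: the network receives the upsampled low-resolution image alongside the current noisy high-resolution estimate. A sketch of that conditioning step, assuming an integer scale factor and using nearest-neighbour upsampling as a stand-in for the bicubic interpolation and U-Net denoiser used in practice:

```python
import numpy as np

def sr3_denoiser_input(lowres, noisy_highres):
    """Build the conditioned input for a denoising step: the low-res image,
    upsampled to the target size, stacked channel-wise with the current
    noisy high-res estimate (nearest-neighbour upsampling here is a
    simplification; SR3 itself uses bicubic interpolation)."""
    scale = noisy_highres.shape[0] // lowres.shape[0]   # assumes integer scale factor
    upsampled = np.kron(lowres, np.ones((scale, scale)))
    return np.stack([upsampled, noisy_highres], axis=0)  # 2-channel input

inp = sr3_denoiser_input(np.random.rand(16, 16), np.random.randn(64, 64))
print(inp.shape)  # (2, 64, 64)
```

Because the conditioning image is present at every denoising step, the reverse process converges toward a high-resolution image consistent with the low-resolution input rather than an arbitrary sample.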
CDM The CDM (a cascade of multiple diffusion models) is a class-conditional diffusion tool trained on ImageNet data to generate high-resolution natural images. The mechanism starts with a standard diffusion model at the lowest resolution, followed by a sequence of super-resolution models that add detail at progressively higher resolutions. This tool can be applied directly to improve photographs, for example to increase the resolution of images taken with a mobile camera.
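The cascade structure can be sketched as a simple pipeline: each stage upsamples the previous output and refines it. In this illustrative skeleton the refinement is a placeholder; in CDM each stage is a conditional diffusion model, and the scale factors here are made up for the example:

```python
import numpy as np

def upsample(img, scale):
    """Nearest-neighbour upsampling, standing in for a learned stage's resizer."""
    return np.kron(img, np.ones((scale, scale)))

def cascade(base_sample, stages):
    """Run a CDM-style cascade: each stage takes the previous output,
    upsamples it, and refines it. The refinement here is a placeholder
    (identity plus small noise) where CDM uses a conditional diffusion model."""
    x = base_sample
    for scale in stages:
        x = upsample(x, scale)
        x = x + 0.01 * np.random.randn(*x.shape)   # placeholder for diffusion refinement
    return x

out = cascade(np.random.rand(32, 32), stages=[2, 4])  # 32 -> 64 -> 256
print(out.shape)  # (256, 256)
```

Splitting generation across resolutions lets each model solve an easier problem than generating the full-resolution image in one shot.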
Big-GAN You can find more information about this model in the document referenced in [1].
Adaptive feature modification layers:
Paper: https://arxiv.org/pdf/1904.08118.pdf.
Code: https://github.com/hejingwenhejingwen/AdaFM
Tasks:
• Study diffusion methods (not including SR3) for generating high-resolution images.
• Implement an alternative method from the suggested ones (GANs, VAE, Big-GAN) and evaluate the results on different images from the ImageNet dataset, using the metrics indicated in the papers.
Supervisors
The project supervisors are:
References