Project Paper


Image Colorization using Deep Learning

Aniket More, Siddhi Shewale, Pal Jain, Prathamesh Kharade


{ animore14, siddhi.shewale12345, paljain02, prathameshkharade2003 }@gmail.com

MIT ADT University

Abstract: In this study, we set out to address the challenge of infusing color into
black-and-white photographs. Previous methods required a lot of human effort
and often resulted in dull outcomes. We developed an automatic system trained to
predict and apply colors to photos, making them look vibrant and lifelike. Our
system relies on deep learning, and we trained it using a large dataset of color
images. We also discovered that colorization can be a valuable tool for helping
computers better understand images, hinting at its broader applications beyond
just adding color to photos.

1 Introduction: Grayscale photographs pose an interesting challenge: restoring their colors when much
of the color information is missing. However, a closer look reveals that for many regions in
these images, semantic cues are available. While these cues don't hold universally, our goal in this paper
isn't to precisely recover the true colors but to create plausible colorizations that could convincingly fool
a human observer. To achieve this, we aim to model the statistical relationships between the semantics
and textures in grayscale images and their color counterparts, producing visually compelling results.

A convenient property of color prediction is that training data is essentially free. Any color photo can
serve as a training example, with its brightness information as the input and its color details as the
supervisory signal. Some past efforts trained models on large datasets to predict colors, but their results
looked washed out. The reason is that their training objectives favor conservative predictions that stick
with common colors. We take a different approach.
We use an objective tailored to the colorization task. Color prediction is inherently ambiguous: an
apple can be red, green, or yellow, but it is rarely blue or orange. To handle this ambiguity, our model
predicts a distribution over possible colors for each pixel. We also give rare colors more weight during
training, so that the model can use the full range of colors it saw in the large dataset. Finally, we obtain
the output color by averaging over the predicted possibilities. The result is colorizations that look more
vibrant and realistic than those of previous methods.

Our approach leverages the remarkable power of deep learning, employing a large dataset of color
images to train the model. We embarked on this journey with a clear objective: to create colorizations
that can convincingly fool human observers, not necessarily by perfectly reproducing ground truth
colors but by producing plausible and lifelike results. To evaluate the effectiveness of our model, we
designed a unique "colorization Turing test" to gauge the perceptual realism of the generated images,
challenging human participants to differentiate between real and synthesized colors.
Furthermore, we demonstrate that our colorizations are not only visually appealing but also practically
useful for downstream tasks such as object classification. Our CNN model, operating as a self-
supervised representation learner, learns features by using the raw image data as its source of
supervision, introducing a novel approach that offers promising results. By exploring colorization as a
self-supervised learning technique, our project reveals the potential for the model to generalize
effectively and achieve state-of-the-art performance on various metrics.
In the following sections, we delve into the architecture and training of our CNN model, the innovative
evaluation method, and the implications of colorization as a form of self-supervised representation
learning.

2 Prior Work on Colorization:


Preprocessing and Data Preparation for Image Colorization

Image colorization is a complex task that requires careful preprocessing and data preparation to ensure
the effectiveness of the subsequent colorization process. In this section, we detail the steps taken before
applying colorization models in our research.

1. Data Collection and Selection:


- The foundation of our colorization project lies in the data collection process. We gathered a diverse
dataset of grayscale images for experimentation.
- Image selection involved considerations for image quality, content diversity, and relevance to the
project's objectives. High-quality images with clear object boundaries and rich details were prioritized.

2. Data Augmentation:
- To enhance the model's ability to generalize, we applied data augmentation techniques. This
involved generating variations of the grayscale images, such as rotations, flips, and minor adjustments
to brightness and contrast.
- Data augmentation helped expand the dataset and exposed the model to a wider range of image
characteristics.
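
The augmentations listed above can be sketched on a tiny grayscale image. This is a minimal illustration using only the Python standard library; the function names, the 2x2 example image, and the brightness offset of 0.1 are assumptions made for the example, not the settings used in our experiments.

```python
# Minimal sketch of flip / rotation / brightness augmentation on a tiny
# grayscale image stored as a list of rows with values in [0, 1].
# All names and parameter values here are illustrative assumptions.

def hflip(img):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def adjust_brightness(img, delta):
    """Add delta to every pixel and clip the result to [0, 1]."""
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in img]

def augment(img):
    """Return the original image plus a few simple variants."""
    return [img, hflip(img), rot90(img), adjust_brightness(img, 0.1)]

image = [[0.0, 0.5],
         [1.0, 0.25]]
variants = augment(image)
```

Each call to augment turns one image into four training examples, which is how augmentation expands the effective size of the dataset.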

3. Image Preprocessing:
- Images were converted to the LAB color space, where the L channel represents the brightness
information (the grayscale, single-channel view of the image) and the AB channels contain the color
information.
- During this preprocessing step, we also normalized the image values to a consistent scale, typically
ranging from 0 to 1.
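
To make the color space conversion concrete, the sketch below converts a single sRGB pixel to LAB using the standard CIE formulas with a D65 white point, then scales L to the range 0 to 1. It is an illustration only: in practice an image library would perform this conversion on whole images, and the function names and the choice of dividing L by 100 are assumptions for this example.

```python
# Minimal sketch: sRGB (components in [0, 1]) -> CIE LAB for one pixel,
# using the standard sRGB / D65 formulas. Names are illustrative.

def srgb_to_linear(c):
    """Undo the sRGB gamma curve for one channel."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def _f(t):
    """Nonlinearity used in the XYZ -> LAB conversion."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3 * delta ** 2) + 4.0 / 29.0

def rgb_to_lab(r, g, b):
    """Return (L, A, B) for an sRGB pixel with components in [0, 1]."""
    r, g, b = (srgb_to_linear(v) for v in (r, g, b))
    # Linear RGB -> XYZ (sRGB primaries, D65 white point)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = _f(x / 0.95047), _f(y / 1.0), _f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def normalize_l(l_value):
    """Scale L from [0, 100] to [0, 1] (an assumed normalization)."""
    return l_value / 100.0

L_red, A_red, B_red = rgb_to_lab(1.0, 0.0, 0.0)
network_input = normalize_l(L_red)  # the grayscale value fed to the model
```

Neutral grays map to A and B values near zero, which is why the L channel alone can stand in for the grayscale image.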

4. Data Splitting:
- We divided the dataset into training, validation, and test sets. The training set was used for model
training, the validation set to fine-tune hyperparameters and monitor training progress, and the test set to
evaluate the model's performance.
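
A split of this kind can be sketched as follows. The 80/10/10 proportions, the fixed seed, and the function name are assumptions chosen for illustration; the text above does not specify the proportions we used.

```python
import random

def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists.

    The fractions are illustrative defaults, not values from the paper.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is repeatable
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # everything left over
    return train, val, test

# Image identifiers stand in for actual image files here.
train, val, test = split_dataset(range(100))
```

Shuffling before slicing keeps the three sets disjoint while avoiding any ordering bias in how the images were collected.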

5. Model-Specific Preparations:
- Different colorization models may have specific input requirements. We ensured that the data format
matched the input expectations of our chosen Convolutional Neural Network (CNN) model.

6. Noise Reduction:
- Grayscale images can contain noise, which may negatively impact the quality of colorization. We
applied noise reduction techniques, such as Gaussian blurring, to clean the input images.
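
As an illustration of Gaussian blurring, the sketch below builds a small normalized Gaussian kernel and applies it along rows and then columns. The radius, the sigma value, and the border handling (clamping to the nearest edge pixel) are assumptions for this example; a library routine would normally be used instead.

```python
import math

def gaussian_kernel(radius, sigma):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(img, kernel):
    """Blur every row of img with the 1-D kernel, clamping at the borders."""
    radius = len(kernel) // 2
    width = len(img[0])
    out = []
    for row in img:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(width - 1, max(0, x + k - radius))  # clamp index
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, radius=1, sigma=1.0):
    """Separable Gaussian blur: one pass along rows, one along columns."""
    kernel = gaussian_kernel(radius, sigma)
    horizontal = blur_rows(img, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))

noisy = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]
smoothed = gaussian_blur(noisy)
```

The isolated bright pixel is spread over its neighbors, which is the smoothing effect that suppresses pixel-level noise.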

7. Ground Truth Generation:


- For evaluation purposes, we generated ground truth color images corresponding to the grayscale
images in the dataset. This step was essential for quantitatively assessing the colorization model's
performance.

In this section, we have provided an overview of the critical steps taken before applying our colorization
models. These steps lay the groundwork for effective colorization and help ensure the reliability and
quality of the results achieved in the later stages of the research.

3 Approach:
Our image colorization method is built on a convolutional neural network (CNN) architecture,
which is central to the colorization process. Below we give an overview of the approach and its main
components.

1. Convolutional Neural Network (CNN) Architecture:


- The core of our approach is a CNN that maps grayscale input images to distributions over color
values. The stacked layers of the CNN capture complex patterns, features, and dependencies in image
data, which is what makes colorization possible.

- Our choice of CNN architecture is not arbitrary: it is motivated by the network's ability to extract
hierarchical representations from grayscale inputs, which suits the image colorization task. Using an
openly documented architecture also keeps the model clear and accessible to a wide range of researchers.

1.1 Loss Function:


● Formula: MSE (Mean Squared Error)
● How it's applied: In the objective function during training.
● Mathematical Explanation: Loss = (1/N) * Σ( (I_colorized - I_ground_truth)^2 ), where I_colorized
and I_ground_truth are pixel values of the colorized and ground truth images, and N is the number of
pixel values summed over.
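
A minimal sketch of the mean squared error, computed over flat lists of pixel values, is given here for illustration. The function name and the example values are assumptions; a training framework would compute the same quantity on tensors.

```python
def mse_loss(colorized, ground_truth):
    """Mean of squared per-pixel differences between two equal-length lists."""
    if len(colorized) != len(ground_truth):
        raise ValueError("images must contain the same number of values")
    squared = [(c - g) ** 2 for c, g in zip(colorized, ground_truth)]
    return sum(squared) / len(squared)

# Two pixel values, one of which is off by 1.0
loss = mse_loss([0.0, 1.0], [0.0, 0.0])
```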

1.2 Convolution Operation:


● Formula: (I * K)(x, y) = Σ(Σ(I(x+i, y+j) * K(i, j))) for i, j in the kernel size.
● How it's applied: Convolutional layers in CNNs.
● Mathematical Explanation: I is the input image, K is the kernel, and (x, y) is the position in the
output feature map.
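
The formula above can be written directly as nested loops. The sketch below assumes no padding and a stride of 1, so the output is smaller than the input; the function name and the example image and kernel are illustrative.

```python
def conv2d_valid(image, kernel):
    """Apply the kernel at every position where it fits entirely inside
    the image: out[x][y] = sum over i, j of image[x+i][y+j] * kernel[i][j]."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for x in range(out_h):
        row = []
        for y in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[x + i][y + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]  # responds to change along the diagonal
feature_map = conv2d_valid(image, kernel)
```

In a CNN the kernel values are not hand-chosen as here but learned during training.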

1.3 Probability Distribution for Color Predictions:


○ Formula: P(color) ~ N(μ, σ^2)
○ How it's applied: To represent the distribution of pixel color.
○ Mathematical Explanation: Use Gaussian (Normal) distribution to model the likelihood of
colors. μ represents the mean color, and σ^2 is the variance.

1.4 Structural Similarity Index (SSI):


● Formula: SSI(I1, I2) = ((2 * μI1 * μI2 + C1) * (2 * σI1I2 + C2)) / ((μI1^2 + μI2^2 + C1) *
(σI1^2 + σI2^2 + C2))
● How it's applied: As an evaluation metric for colorized images.
● Mathematical Explanation: SSI measures the structural similarity between two images I1 and
I2, where μ represents the mean, σ^2 is the variance, σI1I2 is the covariance of the two images, and
C1 and C2 are small constants that avoid instability when the denominator is close to zero.
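
For illustration, the structural similarity index can be computed over a whole image treated as a single window. This is a simplification: in common practice the index is computed over local windows and averaged. The constants C1 = (0.01 * MAX)^2 and C2 = (0.03 * MAX)^2 are the commonly used defaults and are an assumption here, as are the function and parameter names.

```python
def ssi_global(img1, img2, max_val=1.0, k1=0.01, k2=0.03):
    """Structural similarity of two flat pixel lists, one global window."""
    n = len(img1)
    mu1 = sum(img1) / n
    mu2 = sum(img2) / n
    var1 = sum((a - mu1) ** 2 for a in img1) / n
    var2 = sum((b - mu2) ** 2 for b in img2) / n
    cov = sum((a - mu1) * (b - mu2) for a, b in zip(img1, img2)) / n
    c1 = (k1 * max_val) ** 2  # assumed default constants
    c2 = (k2 * max_val) ** 2
    numerator = (2 * mu1 * mu2 + c1) * (2 * cov + c2)
    denominator = (mu1 ** 2 + mu2 ** 2 + c1) * (var1 + var2 + c2)
    return numerator / denominator

same = ssi_global([0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0])
inverted = ssi_global([0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0])
```

Identical images score 1, while an image and its inverse have negative covariance and score far lower.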

1.5 Peak Signal-to-Noise Ratio (PSNR):

○ Formula: PSNR(I1, I2) = 10 * log10(MAX^2 / MSE)


○ How it's applied: As an evaluation metric for image quality.
○ Mathematical Explanation: PSNR quantifies the relationship between the highest achievable
power of an image and the influence of disruptive noise, with MAX representing the maximum
attainable pixel value.
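
A short sketch of the PSNR computation follows. It assumes 8-bit images by default (MAX = 255) and returns infinity for identical images, where the MSE is zero; both choices and the function name are assumptions for this example.

```python
import math

def psnr(img1, img2, max_val=255.0):
    """Peak signal-to-noise ratio in decibels for two flat pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(img1, img2)) / len(img1)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10 * math.log10(max_val ** 2 / mse)

# An error of one tenth of the full range on every pixel
score = psnr([0.0, 0.0], [25.5, 25.5])
```

Higher values indicate that the colorized image is closer to the reference.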

1.6 Multimodal Predictions:

○ Formula: P(color|pixel) is a distribution over pixel color.


○ How it's applied: In the color prediction step to handle the multimodal nature of colorization.
○ Mathematical Explanation: Instead of a single color prediction, a distribution of pixel color is
determined.
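
The idea of predicting a distribution and then reducing it to one color can be sketched for a single pixel. The three color bins, their AB centers, and the scores below are hypothetical values chosen for illustration; the sketch takes the mean of the distribution as the final color.

```python
import math

def softmax(scores):
    """Turn raw per-bin scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def expected_ab(probs, bin_centers):
    """Mean (A, B) color under the predicted distribution."""
    a = sum(p * c[0] for p, c in zip(probs, bin_centers))
    b = sum(p * c[1] for p, c in zip(probs, bin_centers))
    return a, b

# Hypothetical AB bin centers: a reddish bin, a greenish bin, a neutral bin
bin_centers = [(60.0, 40.0), (-50.0, 50.0), (0.0, 0.0)]
scores = [2.0, 2.0, 0.0]  # the model finds red and green equally likely
probs = softmax(scores)
a_value, b_value = expected_ab(probs, bin_centers)
```

Here the two likely colors nearly cancel along the A axis, which shows why keeping the full distribution matters: a single averaged guess can land between the plausible colors.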

1.7 Cross-Channel Encoder:

○ Formula: AB_pred = F(L), where F is the CNN, L is the grayscale (brightness) channel of the image,
and AB_pred are the predicted color channels.
○ How it's applied: In the self-supervised feature learning process.
○ Mathematical Explanation: The cross-channel encoder acquires representations by predicting the
color channels of an image from its grayscale channel, so the image itself supplies the training target
and no manual labels are needed.

2. Using the Network Definition File:


- We built on the "colorization_deploy_v2.prototxt" network definition file in our colorization
work. This file plays a crucial role within our system, establishing the network's design and
configuration.
- Using "colorization_deploy_v2.prototxt" gave us a solid foundation for the CNN. The file simplifies
the colorization process by providing predefined layers and settings that drive the network's
predictions.
- Together, these definitions make it easier to work with color, allowing our CNN to use its
design to make color predictions and convert grayscale images into realistic colors.

In this section, we examined the image colorization method in depth, highlighted the central role of the
CNN architecture, and described how "colorization_deploy_v2.prototxt" is used to achieve our research
goals. Together, these elements increase the efficiency and reliability of our colorization system.

3. Mathematical Formulation and Solution for Image Colorization

3.1 Problem Statement

In the context of image colorization, we encounter a challenging problem: assigning colors to
grayscale images. This problem can be aptly framed as a mathematical optimization task, where our
primary objective is to identify the optimal color assignment for each pixel, thereby effecting the
transformation of grayscale imagery into vivid, colored representations.

3.2 The Objective Function

Central to this endeavor is the objective function, a mathematical measure that quantifies the quality of
colorization for each pixel. To be precise, the color error for pixel i is defined as the squared
difference between the assigned color and the true color for that pixel:

Objective Function = ColorError(i) = (AssignedColor(i) − TrueColor(i))^2
Herein:

● AssignedColor(i) signifies the color assigned to pixel i.
● TrueColor(i) characterizes the genuine color of pixel i as observed in the original image.

[Figure: trained images]

3.3 The Solution: An Optimization Paradigm

The solution to the colorization problem is predicated on an optimization approach, outlined as follows:

Initialization: Commencing the process, we initialize the color assignments for each pixel. Every pixel
is endowed with an initial color to kick-start the procedure.

Gradient Computation: We embark on the computation of gradients for the objective function
concerning the assigned colors for each pixel. These gradients, akin to guides, direct the adjustments
made to the pixel colors.

[Figure: input/output images]

Color Updates: Progressing forward, the colors assigned to each pixel are updated, moving contrary to
the direction indicated by the computed gradients. This iterative mechanism serves to minimize the
objective function, consequently optimizing the pixel color assignments.

Convergence: The iterative process—comprising gradient computation and color updates—is reiterated
until the assigned colors converge to values that faithfully replicate the true colors, mirroring the hues
present in the original image.

3.4 A Demonstrative Instance

In order to elucidate this mathematical problem and its corresponding solution, we consider a simplified
illustrative scenario. An initial grayscale image, representing a single pixel, is handed to us. Our
objective pertains to the colorization of this singular-pixel image. At the outset, we assign the pixel a
color, opting for blue. Yet, the genuine color of the pixel, as observed in the original image, is red.
Employing the principles of gradient descent, we proceed to iteratively fine-tune the assigned color,
aiming to mitigate the color error and eventually achieving a precise colorization.
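
The single-pixel example can be sketched in a few lines. Colors are represented as RGB triples in the range 0 to 1, the error is summed over the three channels, and the learning rate, step limit, and tolerance are assumptions chosen for illustration. When training a network, the same kind of update is applied to the network's weights rather than directly to pixel colors.

```python
def color_error(assigned, true):
    """Squared difference between assigned and true color, summed over RGB."""
    return sum((a - t) ** 2 for a, t in zip(assigned, true))

def gradient(assigned, true):
    """Gradient of color_error with respect to the assigned color."""
    return [2 * (a - t) for a, t in zip(assigned, true)]

def colorize_pixel(initial, true, lr=0.1, steps=100, tol=1e-8):
    """Gradient descent on one pixel's color until the error is below tol."""
    color = list(initial)
    for _ in range(steps):
        if color_error(color, true) < tol:
            break  # converged
        grad = gradient(color, true)
        # Move against the gradient to reduce the error
        color = [c - lr * g for c, g in zip(color, grad)]
    return color

blue = [0.0, 0.0, 1.0]  # initial guess
red = [1.0, 0.0, 0.0]   # true color from the original image
result = colorize_pixel(blue, red)
```

With a learning rate of 0.1, each step shrinks the remaining difference in every channel by a factor of 0.8, so the assigned color moves steadily from blue toward red.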

This illustrative example serves as a foundational concept underpinning the broader domain of image
colorization, with optimization as its core component.

In practice, the process of colorization extends to encompass all pixels within the image. The objective
function matures into a more multifaceted measure of colorization quality, capturing the intricacies of
real-world images.

4 Evaluating Colorized Images:


Evaluating colorized images constitutes a vital step in our research, allowing us to measure the quality
and realism of our colorization process. In this section, we describe the techniques and criteria we use to
assess the effectiveness of our approach in producing captivating and lifelike colorized photos.

Assessing Human Perception:

At the heart of our evaluation is the "colorization Turing test." This innovative method focuses on
directly evaluating how real and convincing our colorized images appear to human observers. We
present both actual and colorized images to human participants and ask them to identify which ones are
computer-generated. This approach prioritizes the perceptual realism of our colorizations, aligning with
our goal of creating images that can genuinely deceive human viewers.
The results of this test provide invaluable insights. They showcase our algorithm's ability to produce
colorizations that are nearly indistinguishable from real photos.
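
One simple way to summarize such a test is the fraction of trials in which a participant failed to spot the computer-generated image. The sketch below is an illustration under that assumption; the function name and the response data are hypothetical and are not results from our study.

```python
def fooling_rate(identified_correctly):
    """Fraction of trials where the participant did NOT correctly identify
    the computer-generated image."""
    if not identified_correctly:
        raise ValueError("at least one trial is required")
    fooled = sum(1 for correct in identified_correctly if not correct)
    return fooled / len(identified_correctly)

# Hypothetical responses: True means the participant spotted the fake
responses = [True, False, True, True, False, True, True, True]
rate = fooling_rate(responses)
```

If each trial asks the participant to choose between one real and one colorized image, a rate near 0.5 would mean participants are guessing at chance.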

Usefulness for Practical Applications:


Beyond assessing human perception, we investigate the practical utility of our colorized images.
Specifically, we evaluate their effectiveness in object classification using a commonly used VGG
network. This evaluation demonstrates that our colorizations are not only visually appealing but also
valuable for real-world tasks like object recognition.

Self-Supervised Feature Learning:


Our research extends into the domain of self-supervised feature learning. In this approach, we use raw
data as a source of guidance. Our model operates as a "cross-channel encoder," taking advantage of the
many possible colors for each object. We explore how well our model generalizes to various tasks and
compare it to previous and concurrent self-supervision methods.
Impressively, our method achieves top-tier performance on multiple metrics, highlighting its
potential as a valuable tool for self-supervised representation learning.

5 Conclusion
In our research, we tackled the challenge of turning black-and-white photos into colorful ones. We used
advanced technology like Convolutional Neural Networks (CNNs) to make this happen. Our study
involved preparing data, training models, and creating new ways to predict colors. The heart of our
work, the CNN architecture, helped us find patterns in grayscale images and played a key role in
colorizing them. Data preparation was important for creating accurate color reference images. We also
introduced a unique test, the 'colorization Turing test,' to check if our colorized images looked real. We
found that our colorized images could be useful for tasks like object recognition, and our approach had
broader applications in computer vision and machine learning. Looking ahead, we see opportunities to
improve CNN designs, expand training data, and explore more advanced methods. Our project,
grounded in math and innovative techniques, points to a future where turning black-and-white pictures
into colorful ones is easier and more convincing.

Results
