Unit 1


"Deep drive" sounds like an exciting journey into the world of deep learning!

Let's take a dive into the key aspects of deep learning:

1. Definition: Deep learning is a subset of machine learning that involves neural networks with
multiple layers (deep neural networks). These networks, inspired by the structure and function of
the human brain, are capable of learning hierarchical representations of data.

2. Neural Networks: At the core of deep learning are artificial neural networks. These networks
consist of interconnected nodes (neurons) organized into layers. Input layers receive data, hidden
layers process it, and output layers produce the final results.

3. Deep Neural Networks: "Deep" refers to the depth of these networks, meaning they have
multiple hidden layers. The depth allows them to learn intricate features and patterns from
complex data, making them powerful for tasks like image recognition, natural language
processing, and more.

4. Training Process: Deep learning models learn through a training process where they adjust
their internal parameters (weights and biases) based on a labeled dataset. This process involves
forward and backward passes, with optimization algorithms like gradient descent fine-tuning the
model.

What is Deep Learning?

Deep learning works with artificial neural networks, which are designed to imitate how humans
think and learn.

Deep Learning is a part of machine learning that deals with algorithms inspired by the structure
and function of the human brain. It uses artificial neural networks to build intelligent models and
solve complex problems.

What is a Neural Network?

A neural network is a combination of advanced systems and hardware designed to operate and
function like a human brain. It consists of layers such as an input layer, hidden layers, and an
output layer. It can perform tasks like translation of text, identification of faces, speech
recognition, controlling robots, and a lot more.

Each layer consists of nodes. The connections between the nodes depict the flow of information
from one layer to the next. The neurons are connected by weighted links. The network feeds the
inputs to a neuron, which processes the data and produces an output.

The following is an example of a basic neural network.

A neural network has three main layers.

● Input Layer: This layer is responsible for accepting the inputs.
● Hidden Layer: This layer processes the input data to find out hidden information and
performs feature extraction.
● Output Layer: This layer gives the desired output.

Basic Components of a Neural Network:

1. Neurons (Nodes):

- The basic units of a neural network are neurons. Each neuron receives one or more inputs,
performs a computation, and produces an output.

2. Layers:

- Neurons are organized into layers. The input layer receives the initial data, the output layer
produces the final result, and there can be one or more hidden layers in between.

3. Weights and Biases:

- Each connection between neurons has an associated weight, representing the strength of that
connection. Additionally, each neuron has a bias, allowing it to influence the output.

Feedforward Process:

The process of information flow through a neural network is called feedforward. Here's how it
works:

1. Input Layer:

- The input layer receives the initial data or features. Each neuron in this layer represents a
feature of the input.

2. Weighted Sum:

- Each connection between neurons is associated with a weight. The input values are multiplied
by their respective weights, and these weighted values are summed up for each neuron.

3. Activation Function:

- The sum is then passed through an activation function, which introduces non-linearity to the
model. This helps the neural network learn complex patterns.

4. Hidden Layers:

- The process is repeated for each hidden layer. The output from one layer serves as the input
to the next, creating hierarchical representations of the data.

5. Output Layer:

- The final layer produces the network's output. The activation function used in the output layer
depends on the nature of the task (e.g., softmax for classification, linear for regression).
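
The five feedforward steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the layer sizes, the weight values, and the choice of ReLU for the hidden layer are all assumptions made for the example.

```python
import math

def relu(x):
    # Non-linear activation used here for the hidden layer (an assumption).
    return max(0.0, x)

def softmax(values):
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    # Weighted sum plus bias for every neuron in one layer.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def feedforward(features, hidden_w, hidden_b, out_w, out_b):
    hidden = [relu(s) for s in dense(features, hidden_w, hidden_b)]
    return softmax(dense(hidden, out_w, out_b))

# Hypothetical weights: 2 input features -> 3 hidden neurons -> 2 classes.
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
out_w = [[0.3, -0.6, 0.2], [-0.4, 0.5, 0.7]]
out_b = [0.05, -0.05]

probs = feedforward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
print(probs)  # two class probabilities that sum to 1
```

For a regression task, the softmax call would be dropped so that the output layer stays linear.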

Training Process (Backpropagation):

1. Loss Function:

- A loss function measures the difference between the predicted output and the actual target.
The goal during training is to minimize this loss.

2. Backpropagation:

- Backpropagation is a process used to update the weights and biases of the network to
minimize the loss. It involves calculating the gradient of the loss with respect to the weights and
biases and adjusting them accordingly using optimization algorithms like gradient descent.

3. Epochs:

- The training process is repeated over multiple iterations called epochs. During each epoch,
the entire dataset is passed through the network, and weights are updated to improve
performance.
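
As a rough sketch of how loss, gradients, and epochs fit together, the example below trains a single linear neuron with gradient descent on a mean squared error loss. The data, learning rate, and number of epochs are made up for illustration, and the function name `train_linear_neuron` is hypothetical.

```python
def train_linear_neuron(xs, ys, learning_rate=0.05, epochs=1000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward pass: predictions for the whole dataset.
        preds = [w * x + b for x in xs]
        # Gradients of the loss mean((pred - y)^2) with respect to w and b.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Gradient descent step: move against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = train_linear_neuron(xs, ys)
print(w, b)  # close to 2 and 1
```

In a multi-layer network the same update rule applies, but the gradients are obtained layer by layer with the chain rule, which is what backpropagation does.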

Model Evaluation:

After training, the neural network can be evaluated on new, unseen data to assess its
generalization ability. Common metrics for evaluation include accuracy, precision, recall, and
mean squared error, depending on the task.
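
The metrics named above can be written out directly. This is a small sketch for binary labels (1 = positive, 0 = negative); the function names are chosen for the example, and returning 0.0 when a denominator is zero is an assumption.

```python
def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def precision(y_true, y_pred):
    # Of everything predicted positive, how much was actually positive?
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(y_true, y_pred):
    # Of everything actually positive, how much was found?
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

labels = [1, 0, 1, 1, 0, 0]
predictions = [1, 0, 0, 1, 1, 0]
print(accuracy(labels, predictions))
print(precision(labels, predictions), recall(labels, predictions))
```

Accuracy, precision, and recall suit classification tasks, while mean squared error suits regression.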

Applications:

Neural networks are applied to a wide range of tasks, including image recognition, natural
language processing, speech recognition, autonomous vehicles, and many others. The
architecture and complexity of a neural network depend on the specific task it is designed to
solve.

Activation Function

The following operations are performed within each neuron,

● The product of each input value and the weight of the channel it has passed over is
found.
● It computes the sum of the weighted products. We call this the weighted sum.
● It adds a bias unique to the neuron to the weighted sum.
● We then subject the final sum to a particular function.
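
The four operations listed above map onto a few lines of Python. The sketch assumes a sigmoid as the "particular function"; the notes do not fix a specific activation, so that choice and the name `neuron_output` are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    # 1. Multiply each input by the weight of its channel.
    products = [x * w for x, w in zip(inputs, weights)]
    # 2. Add the products up to get the weighted sum.
    weighted_sum = sum(products)
    # 3. Add the neuron's bias.
    total = weighted_sum + bias
    # 4. Pass the result through the activation function.
    return activation(total)

print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0))  # weighted sum is 0, so sigmoid gives 0.5
```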

How Do Neural Networks work?

● A network comprises layers of neurons. It associates each neuron with a random
number called the bias.
● Neurons present in each layer transmit information to neurons of the next layer over
channels.
● These channels are associated with values called weights.
● The weights, along with the biases, determine the information that is passed over from
neuron to neuron.
● Neurons from each layer transmit information to neurons of the next layer.
● The output layer gives a predicted output.
Let’s go ahead and build a neural network to predict bike prices based on a few of its
features.

● The input features such as cc, mileage, and abs are fed to the input layer.
● The hidden layers help in improving output accuracy.
● Each of the connections has a weight assigned to it. The neuron takes a subset of the
inputs and processes it.
----> x1*w1 + x2*w2 + b1

----> Φ(x1* w1 + x2*w2 + b1), where Φ is an activation function.

● The information reaching the neurons in the hidden layer is subjected to the respective
activation function.
● It sends the processed information to the output layer over the weighted channels.
● It compares the predicted output to the original output value.
● A cost function determines the error in prediction, and this error is propagated back
through the neural network. We call this backpropagation.
● The weights are adjusted to minimize the error.
● We now train the network using the new weights.
● Once again, it determines the cost, and it continues backpropagation until the cost
cannot be reduced any further.
● We consider our neural network trained when the value for the cost function is
minimum.

What Are Optimizers in Deep Learning?

Optimizers are algorithms that adjust the model’s parameters during training to minimize a loss
function. They enable neural networks to learn from data by iteratively updating weights and
biases. Common optimizers include Stochastic Gradient Descent (SGD), Adam, and RMSprop.
Each optimizer has specific update rules, learning rates, and momentum to find optimal model
parameters for improved performance.

An optimizer is a function or an algorithm that adjusts the attributes of the neural network, such
as weights and learning rates. Thus, it helps in reducing the overall loss and improving accuracy.
The problem of choosing the right weights for the model is a daunting task, as a deep learning
model generally consists of millions of parameters. You can use different optimizers in the
machine learning model to change your weights and learning rate. However, choosing the best
optimizer depends upon the application. There are various deep-learning optimizers, such as Gradient
Descent, Stochastic Gradient Descent, Stochastic Gradient Descent with momentum, Mini-Batch
Gradient Descent, Adagrad, RMSProp, AdaDelta, and Adam. We will study optimizers in depth
later in Unit 5.
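
To make the idea of an "update rule" concrete, the sketch below applies three of the optimizers named above to a one-parameter toy loss, f(w) = (w - 3)^2, whose minimum is at w = 3. The toy loss, the hyperparameter values, and the function names are assumptions for illustration; real libraries wrap the same rules behind their own APIs.

```python
import math

def grad(w):
    # Derivative of the toy loss (w - 3)^2.
    return 2.0 * (w - 3.0)

def sgd(w, steps=200, lr=0.1):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def sgd_momentum(w, steps=200, lr=0.1, momentum=0.9):
    velocity = 0.0
    for _ in range(steps):
        # The velocity accumulates a decaying sum of past gradients.
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w

def adam(w, steps=2000, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

print(sgd(0.0), sgd_momentum(0.0), adam(0.0))  # each should end near 3
```

All three reach the same minimum here; they differ in how the step size and direction are derived from the gradient history.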

Spatial Transformer Network


Spatial Transformer Network is a module that enhances the spatial invariance of neural
networks by allowing them to learn and apply spatial transformations to input data. This
capability contributes to improved performance in tasks where object position,
orientation, and scale vary.

Spatial Transformers can be inserted before or after conventional layers. A network can also have
multiple Spatial Transformers at the same level.

1. Spatial Transformers allow dynamic, conditional cropping and warping of images/feature
maps
2. They can be constrained and used as a very fast attention mechanism.
3. Spatial Transformer Networks localise and rectify objects automatically.
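
The warping step of a spatial transformer can be sketched as follows. This is a simplified illustration under several assumptions: the 2x3 affine matrix `theta` is supplied by hand (in a real Spatial Transformer Network a small localisation network predicts it), sampling uses nearest-neighbour lookup rather than the differentiable bilinear sampling normally used, and the function name `affine_sample` is hypothetical.

```python
def affine_sample(image, theta):
    """Warp a 2D image (list of rows, at least 2x2) with a 2x3 affine matrix.

    Coordinates are normalized to [-1, 1]. For every output pixel the matrix
    gives the source location to read from in the input image.
    """
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # Normalized coordinates of the output (target) pixel.
            x_t = 2.0 * col / (width - 1) - 1.0
            y_t = 2.0 * row / (height - 1) - 1.0
            # Affine transform gives the source coordinates.
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            # Back to pixel indices, nearest neighbour.
            src_col = round((x_s + 1.0) * (width - 1) / 2.0)
            src_row = round((y_s + 1.0) * (height - 1) / 2.0)
            if 0 <= src_row < height and 0 <= src_col < width:
                output[row][col] = image[src_row][src_col]
    return output

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
identity = [[1, 0, 0], [0, 1, 0]]
mirror = [[-1, 0, 0], [0, 1, 0]]  # flips the image left to right
print(affine_sample(image, identity))
print(affine_sample(image, mirror))
```

Scaling the diagonal entries of `theta` below 1 zooms in on the centre of the image, which is how the module can act as a crop or attention window.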

End-to-end optimized image compression


End-to-end optimized image compression in deep learning involves leveraging neural
networks to learn an optimal representation of images for efficient compression and
reconstruction. This approach is in contrast to traditional image compression methods, which
often rely on handcrafted algorithms. Here's an overview of the key components and
considerations for end-to-end optimized image compression using deep learning:

Generative Adversarial Networks (GANs):

- GANs perform unsupervised learning tasks in machine learning. A GAN consists of two models that
automatically discover and learn the patterns in input data. The two models are known as the
Generator and the Discriminator.

GANs consist of a generator and a discriminator, and they can be used for image compression.
The generator generates compressed images, and the discriminator evaluates the realism of the
reconstructed images. This adversarial training can lead to improved compression quality.

We can improve a GAN by paying attention to balancing the loss between the generator and
the discriminator.

Shown below is an example of a GAN. There is a database that has real 100 rupee notes. The
generator neural network generates fake 100 rupee notes. The discriminator network will help
identify the real and fake notes.

What is a Generator?
- The generator is a neural network tasked with generating new data samples that resemble the
training data. In the context of image generation, for example, the generator takes random noise
as input and produces synthetic images as output. The goal of the generator is to create realistic
samples that are indistinguishable from real data.

- The generator is trained to transform random noise into samples that are convincing enough
to deceive the discriminator. It learns to capture the underlying patterns and structures present in
the training data.

A Generator in GANs is a neural network that creates fake data on which the discriminator is
trained.

The main aim of the Generator is to make the discriminator classify its output as real. The part of
the GAN that trains the Generator includes:

• noisy input vector
• generator network, which transforms the random input into a data instance
• discriminator network, which classifies the generated data
• generator loss, which penalizes the Generator for failing to fool the discriminator

The backpropagation method is used to adjust each weight in the right direction by calculating
the weight's impact on the output. It is also used to obtain gradients and these gradients can help
change the generator weights.

The Discriminator is a neural network that identifies real data from the fake data created by the
Generator. The discriminator's training data comes from two different sources:

The real data instances, such as real pictures of birds, humans, currency notes, etc., are used by
the Discriminator as positive samples during training.

The fake data instances created by the Generator are used as negative examples during the
training process.

While training the discriminator, it connects to two loss functions. During discriminator training,
the discriminator ignores the generator loss and just uses the discriminator loss.

In the process of training the discriminator, the discriminator classifies both real data and fake
data from the generator. The discriminator loss penalizes the discriminator for misclassifying a
real data instance as fake or a fake data instance as real.

The discriminator updates its weights through backpropagation from the discriminator loss
through the discriminator network.

- The discriminator is another neural network that acts as a binary classifier. It is trained to
distinguish between real data samples from the training set and fake samples generated by the
generator. Essentially, it tries to determine whether an input data sample is genuine or synthetic.
- The discriminator is simultaneously trained to correctly classify real samples as real and
generated samples as fake. Its objective is to become a better detector over time, making it
challenging for the generator to produce samples that can fool the discriminator.
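
The two losses described above can be written down for a single real sample and a single fake sample. The notes do not give formulas, so this sketch assumes the commonly used binary cross-entropy form, with the "non-saturating" variant for the generator; `d_real` and `d_fake` stand for the discriminator's estimated probability that the real and the generated sample are real.

```python
import math

EPS = 1e-12  # avoids log(0)

def discriminator_loss(d_real, d_fake):
    # Penalizes calling a real sample fake (low d_real)
    # or a fake sample real (high d_fake).
    return -(math.log(d_real + EPS) + math.log(1.0 - d_fake + EPS))

def generator_loss(d_fake):
    # Penalizes the generator when the discriminator is not fooled (low d_fake).
    return -math.log(d_fake + EPS)

print(discriminator_loss(0.9, 0.1))  # confident and correct: small loss
print(discriminator_loss(0.5, 0.5))  # undecided: larger loss
print(generator_loss(0.1), generator_loss(0.9))
```

The two objectives pull in opposite directions on `d_fake`, which is what makes the training adversarial.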

How Do GANs Work?

GANs consist of two neural networks: a Generator G(x) and a Discriminator D(x).
Both of them play an adversarial game. The generator's aim is to fool the discriminator by
producing data that are similar to those in the training set. The discriminator tries not to be
fooled, by telling fake data apart from real data. Both are trained simultaneously to learn
from complex data like audio, video, or image files.

The Generator network takes a random noise sample and generates a fake sample of data. The
Generator is trained to increase the Discriminator network's probability of making mistakes.

Above is an example of a GAN trying to identify if the 100 rupee notes are real or fake. So, first,
a noise vector or the input vector is fed to the Generator network. The generator creates fake 100
rupee notes. The real images of 100 rupee notes stored in a database are passed to the
discriminator along with the fake notes. The Discriminator then classifies the notes as real
or fake.

We train the model, calculate the loss function at the end of the discriminator network, and
backpropagate the loss into both discriminator and generator models.

Steps for Training a GAN (techniques for training a GAN)
1. Define the problem
2. Choose the architecture of the GAN
3. Train the discriminator on real data
4. Generate fake inputs for the generator
5. Train the discriminator on fake data
6. Train the generator with the output of the discriminator
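
Steps 3 to 6 can be sketched as one alternating loop on a deliberately tiny problem. Everything here is an assumption made for illustration: the "real data" are numbers drawn around 4.0, the generator is a line G(z) = a*z + b, the discriminator is D(x) = sigmoid(w*x + c), and the gradients are worked out by hand for the binary cross-entropy losses. It shows the order of the updates, not a practical GAN.

```python
import math
import random

def sigmoid(s):
    # Numerically safe sigmoid for large positive or negative inputs.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a * z + b
    w, c = 0.1, 0.0  # discriminator parameters: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        x_real = rng.gauss(4.0, 0.5)  # step 3: a real sample
        z = rng.gauss(0.0, 1.0)       # step 4: noise input for the generator
        x_fake = a * z + b
        # Steps 3 and 5: update the discriminator on the real and the fake sample.
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = (d_real - 1.0) * x_real + d_fake * x_fake
        grad_c = (d_real - 1.0) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c
        # Step 6: update the generator using the discriminator's output.
        d_fake = sigmoid(w * x_fake + c)
        grad_a = (d_fake - 1.0) * w * z
        grad_b = (d_fake - 1.0) * w
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, w, c

print(train_toy_gan())
```

During the generator update the discriminator's parameters are left untouched; only its output is used to compute the generator's gradients.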

Different types of Neural Networks

1. Classical Neural Networks (also known as Fully Connected Neural Networks or Multi-Layer
Perceptrons) and Convolutional Neural Networks (CNNs) are both types of artificial neural
networks, but they differ in their architectures and purposes.

Architecture:

Classical Neural Networks (Fully Connected Networks): In classical neural networks, each
neuron in one layer is connected to every neuron in the next layer. These networks are
characterized by fully connected layers, and each connection has a weight associated with it.

Parameter Sharing:- Classical Neural Networks: Parameters (weights and biases) are unique
for each connection in a fully connected layer. This results in a large number of parameters,
making these networks prone to overfitting, especially when dealing with high-dimensional data
like images.

Applications:- Typically used for general-purpose machine learning tasks, such as tabular data,
speech recognition, and simple image classification.

Convolutional Neural Network (CNN)
Convolutional Neural Networks (CNNs): CNNs, on the other hand, have a more specialized
architecture. They consist of convolutional layers, pooling layers, and fully connected layers.
Convolutional layers use filters to convolve over the input data, capturing local patterns.
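
A minimal sketch of what "convolving a filter over the input" means is given below. It assumes a single-channel image, no padding, and a stride of 1, and it computes the sliding dot product (cross-correlation) that deep learning libraries conventionally call convolution. The image and filter values are made up.

```python
def conv2d(image, kernel):
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    # Slide the kernel over every position where it fits entirely
    # and take the sum of element-wise products.
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k_h) for j in range(k_w))
             for c in range(out_w)]
            for r in range(out_h)]

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds where brightness rises left to right
feature_map = conv2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

The filter responds only at the column where the dark region meets the bright region, which is the kind of local pattern a convolutional layer captures.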

Parameter sharing : CNNs use parameter sharing through the convolutional operation. Instead
of having unique parameters for each connection, a set of shared filters is used across the entire
input. This reduces the number of parameters and helps the network generalize better to spatial
patterns.
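
A quick parameter count shows the effect of sharing. The layer sizes below (a 32x32 RGB image, 64 hidden neurons, 64 filters of size 3x3) are hypothetical numbers picked for the comparison.

```python
def dense_params(n_inputs, n_outputs):
    # One weight per input-output connection, plus one bias per output neuron.
    return n_inputs * n_outputs + n_outputs

def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    # One small filter per output channel, reused at every image position,
    # plus one bias per filter.
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

fully_connected = dense_params(32 * 32 * 3, 64)
convolutional = conv_params(3, 3, 3, 64)
print(fully_connected)  # 196672
print(convolutional)    # 1792
```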

Applications: Specialized for tasks involving grid-like data, such as image and video analysis,
object detection, and image classification. CNNs have been particularly successful in computer
vision tasks.

Recurrent Neural Network (RNN)

A Recurrent Neural Network (RNN) works better than a simple neural
network when the data is sequential, such as time-series data and text data. Its parameters are
updated using backpropagation. Because an RNN works on sequential data, it uses a modified form of
backpropagation known as Backpropagation Through Time.

Recurrent Neural Network (RNN) is a type of Neural Network where the output from the
previous step is fed as input to the current step. In traditional neural networks, all the inputs and
outputs are independent of each other. Still, in cases when it is required to predict the next word
of a sentence, the previous words are required and hence there is a need to remember the
previous words.

RNNs solve this issue with the help of a hidden layer. The main
and most important feature of an RNN is its hidden state, which remembers some information
about a sequence. The state is also referred to as the memory state, since it remembers the previous
input to the network.

It uses the same parameters for each input as it performs the same task on all the inputs or hidden
layers to produce the output. This reduces the complexity of parameters, unlike other neural
networks.
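
The hidden state and the reuse of the same parameters at every step can be sketched with a one-unit RNN. The scalar weights, the tanh activation, and the name `rnn_forward` are assumptions chosen to keep the example small.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run h_t = tanh(w_x * x_t + w_h * h_(t-1) + b) over a sequence."""
    h = h0
    states = []
    for x in inputs:
        # The same w_x, w_h, and b are used at every time step.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states stay non-zero
# because the hidden state carries the information forward.
print(rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0))
# With w_h = 0 there is no memory: the later states drop to zero.
print(rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.0, b=0.0))
```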
Advantages
1. An RNN carries information from earlier inputs forward through time in its hidden state,
which makes it useful for time series prediction. Long Short-Term Memory (LSTM) networks are
an RNN variant designed to retain such information over longer spans.
2. Recurrent neural networks are even used with convolutional layers to extend the effective
pixel neighborhood.
Disadvantages
1. Gradient vanishing and exploding problems.
2. Training an RNN is a very difficult task.
3. It cannot process very long sequences if using tanh or relu as an activation function.

Applications of Recurrent Neural Network


1. Language Modelling and Generating Text
2. Speech Recognition
3. Machine Translation
4. Image Recognition, Face detection
5. Time series Forecasting

S.No. | CNN | RNN
1 | CNN stands for Convolutional Neural Network. | RNN stands for Recurrent Neural Network.
2 | CNN is considered to be more potent than RNN. | RNN includes less feature compatibility when compared to CNN.
3 | CNN is ideal for images and video processing. | RNN is ideal for text and speech analysis.
4 | It is suitable for spatial data like images. | RNN is used for temporal data, also called sequential data.
5 | The network takes fixed-size inputs and generates fixed-size outputs. | RNN can handle arbitrary input/output lengths.
6 | CNN is a type of feed-forward artificial neural network with variations of multilayer perceptrons designed to use minimal amounts of preprocessing. | RNN, unlike feed-forward neural networks, can use its internal memory to process arbitrary sequences of inputs.
7 | CNNs use connectivity patterns between the neurons. CNN is affected by the organization of the animal visual cortex, whose individual neurons are arranged in such a way that they can respond to overlapping regions in the visual field. | Recurrent neural networks use time-series information: what a user spoke last would impact what he will speak next.

Learning Algorithm

The neural network learns by adjusting its weights and bias (threshold) iteratively to yield the
desired output. These are also called free parameters. For learning to take place, the Neural
Network is trained first. The training is performed using a defined set of rules, also known as the
learning algorithm.

Supervised vs. Unsupervised Learning

What is supervised learning?

Supervised learning is a machine learning approach that’s defined by its use of labeled datasets.
These datasets are designed to train or “supervise” algorithms into classifying data or predicting
outcomes accurately. Using labeled inputs and outputs, the model can measure its accuracy and
learn over time.

Supervised learning can be separated into two types of problems when data mining:
classification and regression.

Classification problems use an algorithm to accurately assign test data into specific categories,
such as separating apples from oranges. Or, in the real world, supervised learning algorithms can
be used to classify spam in a separate folder from your inbox. Linear classifiers, support vector
machines, decision trees and random forest are all common types of classification algorithms.

Regression is another type of supervised learning method that uses an algorithm to understand
the relationship between dependent and independent variables. Regression models are helpful for
predicting numerical values based on different data points, such as sales revenue projections for
a given business. Some popular regression algorithms are linear regression and polynomial
regression. Logistic regression, despite its name, is mainly used for classification.
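
As a small sketch of regression, the example below fits a straight line to a handful of points using the closed-form least-squares solution. The advertising and sales figures are invented for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

ad_spend = [1.0, 2.0, 3.0, 4.0]    # independent variable (made-up data)
revenue = [10.0, 20.0, 30.0, 40.0]  # dependent variable (made-up data)
slope, intercept = fit_line(ad_spend, revenue)
print(slope, intercept)
print(slope * 5.0 + intercept)  # projected revenue for an unseen input
```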

What is unsupervised learning?

Unsupervised learning uses machine learning algorithms to analyze and cluster unlabeled data
sets. These algorithms discover hidden patterns in data without the need for human intervention
(hence, they are “unsupervised”).

Unsupervised learning models are used for three main tasks: clustering, association and
dimensionality reduction:

1. Clustering is a data mining technique for grouping unlabeled data based on their similarities
or differences. For example, K-means clustering algorithms assign similar data points into
groups, where the K value sets the number of groups and therefore the granularity. This technique is
helpful for market segmentation, image compression, etc.

2. Association is another type of unsupervised learning method that uses different rules to find
relationships between variables in a given dataset. These methods are frequently used for market
basket analysis and recommendation engines, along the lines of “Customers Who Bought This
Item Also Bought” recommendations.

3. Dimensionality reduction is a learning technique used when the number of features (or
dimensions) in a given dataset is too high. It reduces the number of data inputs to a manageable
size while also preserving the data integrity. Often, this technique is used in the preprocessing
data stage, such as when autoencoders remove noise from visual data to improve picture quality.
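
The clustering task described in item 1 can be illustrated with a bare-bones K-means on one-dimensional data. This is a sketch under simplifying assumptions: the first K points serve as the initial centroids, the number of iterations is fixed, and the data values are made up.

```python
def kmeans_1d(points, k, iterations=20):
    centroids = list(points[:k])  # naive initialization (an assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(data, k=2)
print(centroids)
print(clusters)
```

No labels are supplied at any point; the two groups emerge from the distances between the points alone.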

The main difference between supervised and unsupervised learning: Labeled data

The main distinction between the two approaches is the use of labeled datasets. To put it simply,
supervised learning uses labeled input and output data, while an unsupervised learning algorithm
does not.

In supervised learning, the algorithm “learns” from the training dataset by iteratively making
predictions on the data and adjusting for the correct answer. While supervised learning models
tend to be more accurate than unsupervised learning models, they require upfront human
intervention to label the data appropriately.

For example, a supervised learning model can predict how long your commute will be based on
the time of day, weather conditions and so on. But first, you’ll have to train it to know that rainy
weather extends the driving time.

Major differences between Supervised and Unsupervised Learning


Supervised Learning | Unsupervised Learning
Supervised Learning can be used for 2 different types of problems, i.e. regression and classification. | Unsupervised Learning can be used for 2 different types of problems, i.e. clustering and association.
Input data is provided to the model along with the output in Supervised Learning. | Only input data is provided in Unsupervised Learning.
Output is predicted by Supervised Learning. | Hidden patterns in the data can be found using the unsupervised learning model.
Labeled data is used to train supervised learning algorithms. | Unlabeled data is used to train unsupervised learning algorithms.
Accurate results are produced using a supervised learning model. | The accuracy of the results produced is lower in unsupervised learning models.
Training the model to predict output when new data is provided is the objective of Supervised Learning. | Finding useful insights and hidden patterns in the unknown dataset is the objective of unsupervised learning.
Supervised Learning includes various algorithms such as Bayesian Logic, Decision Tree, Logistic Regression, Linear Regression, Multi-class Classification, Support Vector Machine etc. | Unsupervised Learning includes various algorithms like K-means, Apriori Algorithm, and Clustering.
To assess whether the right output is being predicted, direct feedback is accepted by the Supervised Learning model. | No feedback will be taken by the unsupervised learning model.
In Supervised Learning, for right prediction of output, the model has to be trained for each data, hence Supervised Learning does not have close resemblance to Artificial Intelligence. | Unsupervised Learning has more resemblance to Artificial Intelligence, as it keeps learning new things with more experience.
The number of classes is known in Supervised Learning. | The number of classes is not known in Unsupervised Learning.
In scenarios where one is aware of output and input data, supervised learning can be used. | In scenarios where one is not aware of output data, but is only aware of the input data, Unsupervised Learning could be used.
Computational complexity is higher in Supervised Learning compared to Unsupervised Learning. | There is less computational complexity in Unsupervised Learning when compared to Supervised Learning.
Supervised Learning will use off-line analysis. | Unsupervised Learning uses real-time analysis of data.
Some of the applications of Supervised Learning are spam detection, handwriting detection, pattern recognition, speech recognition etc. | Some of the applications of Unsupervised Learning are detecting fraudulent transactions, data preprocessing etc.

Step-by-step procedure to choose correct machine learning algorithm

1. Understand Your Problem: Begin by gaining a deep understanding of the problem you are
trying to solve. What is your goal? Is the problem one of classification, regression,
clustering, or something else? What kind of data are you working with?
2. Process the Data: Ensure that your data is in the right format for your chosen algorithm.
Clean and prepare your data so that it suits the task at hand, such as clustering or regression.
3. Exploration of Data: Conduct data analysis to gain insights into your data. Visualizations
and statistics help you understand the relationships within your data.
4. Metrics Evaluation: Decide on the metrics that will measure the success of the model. The
metric you choose should align with your problem.
5. Simple Models: Begin with simple, easy-to-learn algorithms. For
classification, try logistic regression or a decision tree. A simple model provides a baseline for comparison.
6. Use Multiple Algorithms: Try multiple algorithms to check which one performs best on
your dataset. These may include:
● Decision Trees
● Gradient Boosting (XGBoost, LightGBM)
● Random Forest
● k-Nearest Neighbors (KNN)
● Naive Bayes
● Support Vector Machines (SVM)
● Neural Networks (Deep Learning)

7. Hyperparameter Tuning: Grid Search and Random Search can help tune the hyperparameters
of the chosen algorithm to find the best combination.
8. Cross-Validation: Use cross-validation to assess the performance of your models. This
helps guard against overfitting.
9. Comparing Results: Evaluate the models' performance using the chosen evaluation metrics.
Compare their performance and choose the one that best aligns with the problem's goal.
10. Consider Model Complexity: Balance the complexity of the model against its performance,
and prefer the algorithm that generalizes better.
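
Steps 5, 6, 8, and 9 of the procedure above can be sketched as a small cross-validation harness that compares a baseline with a second model on the same folds. The dataset, the two candidate models, and all function names are hypothetical, and the folds are simple contiguous blocks without shuffling.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

def cross_validate(model_fit, xs, ys, k=5):
    """Average mean squared error of a model over k held-out folds."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        predict = model_fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        fold_error = sum((predict(xs[i]) - ys[i]) ** 2 for i in test_idx) / len(test_idx)
        errors.append(fold_error)
    return sum(errors) / len(errors)

def fit_mean_baseline(xs, ys):
    # Simple baseline: always predict the average training target.
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_nearest_neighbour(xs, ys):
    # 1-nearest-neighbour regression: copy the target of the closest training point.
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict

xs = [float(i) for i in range(20)]
ys = [3.0 * x + 2.0 for x in xs]  # made-up data
scores = {
    "mean baseline": cross_validate(fit_mean_baseline, xs, ys),
    "1-nearest neighbour": cross_validate(fit_nearest_neighbour, xs, ys),
}
best = min(scores, key=scores.get)  # lower error is better
print(scores, best)
```

Because every candidate is scored on data it was not trained on, the comparison reflects generalization rather than memorization.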

Deep Learning Applications:-


1. Virtual Assistants:-

Virtual assistants understand natural language voice commands and complete tasks for the user.
Amazon Alexa and Google Assistant are typical examples of virtual assistants.

2. Chatbots

A chatbot is an AI application to chat online via text or text-to-speech. It is capable of
communicating and performing actions similar to a human. Chatbots are used a lot in customer
interaction, marketing on social network sites, and instant messaging with clients. A chatbot delivers
automated responses to user inputs. It uses machine learning and deep learning algorithms to
generate different types of responses.

3. Healthcare

Deep Learning has found its application in the Healthcare sector. Computer-aided disease
detection and computer-aided diagnosis have been possible using Deep Learning. It is widely
used for medical research, drug discovery, and diagnosis of life-threatening diseases such as
cancer and diabetic retinopathy through the process of medical imaging.

4. Entertainment

Companies such as Netflix, Amazon, YouTube, and Spotify give relevant movies, songs, and
video recommendations to enhance their customer experience. This is all thanks to Deep
Learning. Based on a person’s browsing history, interest, and behavior, online streaming
companies give suggestions to help them make product and service choices. Deep learning
techniques are also used to add sound to silent movies and generate subtitles automatically.

5. News Aggregation and Fake News Detection

Deep Learning allows you to customize news depending on the readers’ persona. You can
aggregate and filter out news information as per social, geographical, and economic parameters
and the individual preferences of a reader. Neural Networks help develop classifiers that can
detect fake and biased news and remove it from your feed. They also warn you of possible
privacy breaches.

6. Composing Music

A machine can learn the notes, structures, and patterns of music and start producing music
independently. Deep Learning-based generative models such as WaveNet can be used to develop
raw audio. Long Short Term Memory Network helps to generate music automatically. Music21
Python toolkit is used for computer-aided musicology. It allows us to train a system to develop
music by teaching music theory fundamentals, generating music samples, and studying music.

7. Image Coloring

Image colorization is taking an input of a grayscale image and then producing an output of a
colorized image. ChromaGAN is an example of a picture colorization model. A generative
network is framed in an adversarial model that learns to colorize by incorporating a perceptual
and semantic understanding of both class distributions and color.

8. Automatic Colorization of Black and White Images


Image colorization is the problem of adding color to black and white photographs.
Traditionally this was done by hand with human effort because it is such a difficult task.
Deep learning can use the objects and their context within the photograph to color
the image, much like a human operator might approach the problem.

9. Automatically Adding Sounds To Silent Movies


In this task, the system must synthesize sounds to match a silent video.
The system was trained using 1000 examples of video with sound of a drum stick striking different
surfaces and creating different sounds. A deep learning model associates the video frames with
a database of pre-recorded sounds in order to select the sound that best matches what is
happening in the scene.
The system was then evaluated using a Turing-test-like setup in which humans had to determine
which video had the real sounds and which had the fake (synthesized) sounds.

10. Automatic Machine Translation


In this task, a word, phrase, or sentence in one language is automatically translated
into another language.
Automatic machine translation has been around for a long time, but deep learning is achieving
top results in two specific areas:

Automatic Translation of Text.


Automatic Translation of Images.

11. Object Classification and Detection in Photographs


This task requires the classification of objects within a photograph as one of a set of previously
known objects.

12. Automatic Handwriting Generation


In this task, given a corpus of handwriting examples, the system generates new handwriting for a
given word or phrase.

The handwriting is provided as a sequence of coordinates used by a pen when the handwriting
samples were created. From this corpus, the relationship between the pen movement and the
letters is learned, and new examples can be generated ad hoc.
What is fascinating is that different styles can be learned and then mimicked. I would love to
see this work combined with some forensic handwriting analysis expertise.

13. Automatic Text Generation


This is an interesting task, where a corpus of text is learned and from this model new text is
generated, word-by-word or character-by-character.
The model is capable of learning how to spell, punctuate, form sentences and even capture the
style of the text in the corpus.
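
The character-by-character generation loop can be sketched as follows. This is only an illustration: a simple table of character-pair counts stands in for the trained neural network, and the names `fit_bigram`, `generate`, and the tiny corpus are hypothetical. A deep learning model would replace the count table with a learned predictor, but the sampling loop has the same shape.

```python
import random
from collections import Counter, defaultdict


def fit_bigram(corpus):
    """Count which character follows which (a stand-in for a trained model)."""
    table = defaultdict(Counter)
    for cur, nxt in zip(corpus, corpus[1:]):
        table[cur][nxt] += 1
    return table


def generate(table, start, length, seed=0):
    """Generate text one character at a time by sampling the next character."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        options = table.get(out[-1])
        if not options:          # no known successor: stop early
            break
        chars = list(options)
        counts = [options[c] for c in chars]
        out.append(rng.choices(chars, weights=counts)[0])
    return "".join(out)


corpus = "the cat sat on the mat. "   # assumed toy corpus
table = fit_bigram(corpus)
sample = generate(table, start="t", length=40)
print(sample)
```

Every pair of adjacent characters in the generated text was seen in the corpus, which is why even this crude stand-in picks up some spelling regularities.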

14. Automatic Image Caption Generation


Automatic image captioning is the task in which, given an image, the system must generate a
caption that describes the contents of the image.
Once you can detect objects in photographs and generate labels for those objects, you can see
that the next step is to turn those labels into a coherent sentence description.

15. Automatic Game Playing


This is a task where a model learns how to play a computer game based only on the pixels on
the screen.
This very difficult task is the domain of deep reinforcement learning models and is the
breakthrough that DeepMind (now part of Google) is renowned for achieving.

Self Organizing Map (or Kohonen Map or SOM)

It is a type of Artificial Neural Network inspired by biological models of neural
systems from the 1970s. It follows an unsupervised learning approach and trains its network
through a competitive learning algorithm. SOM is used for clustering and mapping (or
dimensionality reduction): it maps multidimensional data onto a lower-dimensional space,
which allows people to reduce complex problems for easy interpretation. SOM has two layers,
one is the Input layer and the other one is the Output layer.

SOMs were invented for data visualization: they use self-organizing neural networks to make the
dimensions of data understandable. They are mainly applied to data that humans cannot visualize
directly. Such data are generally high-dimensional, so there is little scope for human
involvement and, in turn, less human error.

SOMs help in visualizing the data by initializing the weights of the nodes and then choosing
random vectors from the given training data. Each node is examined to find the one whose weights
are most similar to the sample vector. This winning node is called the Best Matching Unit (BMU).
The neighborhood of the BMU is then determined, and the size of this neighborhood shrinks over
time. The closer a node is to the BMU, the more its weights are adjusted toward the sample
vector. The process is repeated for many iterations so that no node close to the BMU is missed.
One example is the mapping of the RGB color combinations that we use in our daily tasks.
Consider the image below to understand how they function.
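
The training loop described above can be sketched in plain Python. This is a minimal sketch, not a reference implementation: the grid size, the linear decay schedules, the Gaussian neighborhood function, and the names `find_bmu` and `train_som` are all assumptions chosen for illustration.

```python
import math
import random


def find_bmu(weights, x):
    """Return the grid position of the node whose weights are closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )


def train_som(data, rows=4, cols=4, steps=300, lr0=0.5, seed=0):
    """Train a small SOM; all parameter names and defaults are illustrative."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialize one weight vector per output node.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for t in range(steps):
        frac = t / steps
        lr = lr0 * (1.0 - frac)                  # learning rate decays
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood shrinks
        x = rng.choice(data)                     # random training vector
        bmu = find_bmu(weights, x)               # winning node (BMU)
        for node, w in weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            # Nodes closer to the BMU on the grid are pulled harder toward x.
            influence = math.exp(-grid_d2 / (2.0 * sigma ** 2))
            for i in range(dim):
                w[i] += lr * influence * (x[i] - w[i])
    return weights


colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]  # RGB samples
som = train_som(colors)
print(find_bmu(som, (1.0, 0.0, 0.0)))  # grid cell that best matches red
```

After training, each output node holds an RGB-like weight vector, and similar colors tend to end up in nearby grid cells.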

What Are Autoencoders?

Autoencoders are a type of neural network commonly used in end-to-end image compression.
They consist of an encoder and a decoder. The encoder compresses the input image into a
lower-dimensional representation (latent space), and the decoder reconstructs the image from
this representation.

Autoencoders are very useful in the field of unsupervised machine learning. You can use them to
compress the data and reduce its dimensionality.

If the original data is needed, it can be reconstructed (approximately) from the compressed
data using the autoencoder.

Architecture of an Autoencoder

An Autoencoder is a type of neural network that can learn to reconstruct images, text, and other
data from compressed versions of themselves.

An Autoencoder consists of three layers:

1. Encoder
2. Code
3. Decoder

The Encoder layer compresses the input image into a latent space representation. It encodes the
input image as a compressed representation in a reduced dimension.

The compressed image is a distorted version of the original image.

The Code layer represents the compressed input fed to the decoder layer.

The decoder layer decodes the encoded image back to the original dimension. The decoded
image is reconstructed from the latent space representation and is a lossy reconstruction
of the original image.
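
The encoder, code, and decoder stages can be illustrated with a very small linear autoencoder. This is a sketch under stated assumptions: it uses short numeric vectors instead of images, has no activation functions, and is trained with plain stochastic gradient descent on squared reconstruction error. The names `matvec` and `train_autoencoder` and all hyperparameters are hypothetical.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def train_autoencoder(data, code_dim=2, epochs=300, lr=0.05, seed=0):
    """Train linear encoder/decoder matrices; settings are illustrative."""
    rng = random.Random(seed)
    n = len(data[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(code_dim)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(code_dim)] for _ in range(n)]
    for _ in range(epochs):
        for x in data:
            code = matvec(enc, x)        # Encoder: input -> code
            recon = matvec(dec, code)    # Decoder: code -> reconstruction
            err = [r - xi for r, xi in zip(recon, x)]
            # Gradient of 0.5 * ||recon - x||^2 with respect to the code.
            grad_code = [
                sum(dec[i][j] * err[i] for i in range(n))
                for j in range(code_dim)
            ]
            for i in range(n):           # update decoder weights
                for j in range(code_dim):
                    dec[i][j] -= lr * err[i] * code[j]
            for j in range(code_dim):    # update encoder weights
                for k in range(n):
                    enc[j][k] -= lr * grad_code[j] * x[k]
    return enc, dec


# Assumed toy data: 4-dimensional vectors to be compressed to 2 numbers.
data = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
enc, dec = train_autoencoder(data)
code = matvec(enc, data[0])      # compressed (latent) representation
recon = matvec(dec, code)        # lossy reconstruction at original size
print(code)
print(recon)
```

The code vector has fewer entries than the input, and the reconstruction has the original dimension but is generally only an approximation of the input, which matches the lossy behavior described above.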
