Autoencoder Report 1
2023
Autoencoder
Mattu University,
Department of ECE
MULUKEN TESFAYE
MATTU UNIVERSITY
INDIVIDUAL ASSIGNMENT
COURSE TITLE: MACHINE LEARNING FOR COMMUNICATION SYSTEM
METTU, ETHIOPIA
Table of Contents
Abstract
Introduction
Basic Autoencoder System
Components of Autoencoder
Difference Between Encoding and Decoding
Architecture of Autoencoder
Types of Autoencoders
Application of Autoencoders
Implementation of Autoencoder
Explanation of Big Data Architecture
References
Abstract
Autoencoders, one of the most promising and successful architectures in deep learning (DL), have been widely used in wireless communications. However, the fast-increasing size of the neural networks (NNs) used in autoencoders leads to high storage requirements and heavy computational overhead, which poses a challenge to the practical deployment of autoencoders in real communication systems. An autoencoder is a type of artificial neural network used to discover efficient data codings in an unsupervised manner. Its goal is to learn a representation for a set of data, especially for dimensionality reduction. Autoencoders have the distinctive feature that the input is expected to equal the output, realized as a feedforward network: the autoencoder turns the input into compressed data to form a low-dimensional code and then reconstructs the input from that code to form the desired output. The compressed code of the input is also called the latent space representation. In short, the main aim is to minimize the distortion between the input and its reconstruction [1].
Keywords
Wireless communication
Deep learning (DL)
Neural networks (NNs)
Introduction
A traditional autoencoder is an unsupervised neural network that learns how to efficiently
compress data, which is also called encoding. The autoencoder also learns how to reconstruct
the data from the compressed representation such that the difference between the original
data and the reconstructed data is minimal.
Traditional wireless communication systems are designed to provide reliable data transfer
over a channel that impairs the transmitted signals. These systems have multiple components
such as channel coding, modulation, equalization, synchronization, etc. Each component is
optimized independently based on mathematical models that are simplified to arrive at closed
form expressions. On the contrary, an autoencoder jointly optimizes the transmitter and the
receiver as a whole. This joint optimization has the potential of providing a better
performance than the traditional systems [2],[3].
Traditional autoencoders are usually used to compress images, in other words remove
redundancies in an image and reduce its dimension. A wireless communication system on the
other hand uses channel coding and modulation techniques to add redundancy to the
information bits. With this added redundancy, the system can recover the information bits
that are impaired by the wireless channel. So, a wireless autoencoder actually adds
redundancy and tries to minimize the number of errors in the received information for a given
channel while learning to apply both channel coding and modulation in an unsupervised way.
Basic Autoencoder System
A wireless autoencoder system consists of an encoder (transmitter), a channel, and a decoder (receiver). The encoder accepts a message s drawn from a set of M possible messages, where M = 2^k, and maps it to n real numbers to create x = f(s) ∈ ℝ^n. The last layer of the encoder imposes constraints on x to further restrict the encoded symbols; typical constraints, implemented using a normalization layer, are an energy constraint (||x||^2 = n) or an average power constraint (E[|x_i|^2] = 1 for all i).
Define the communication rate of this system as R = k/n [bits/channel use], where (n,k) means that the system sends one of M = 2^k messages using n channel uses. The channel impairs the encoded (i.e. transmitted) symbols to generate y ∈ ℝ^n. The decoder (i.e. receiver) produces an estimate, ŝ, of the transmitted message, s.
The input message is defined as a one-hot vector 1_s ∈ ℝ^M, i.e. a vector whose elements are all zeros except the s-th one. The channel is additive white Gaussian noise (AWGN) that adds noise to achieve a given energy per data bit to noise power spectral density ratio, Eb/No.
The autoencoder maps k data bits into n channel uses, which results in an effective coding rate of R = k/n data bits per channel use. Then, 2 channel uses are mapped into one channel symbol, i.e. 2 channel uses per symbol; this channel-uses-per-symbol value is mapped to the BitsPerSymbol parameter of the AWGN channel.
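As an illustration of this setup, the following is a minimal Keras sketch of an (n,k) = (7,4) wireless autoencoder trained end to end over an AWGN channel. The layer sizes, the Eb/No value, the noise model, and the use of an energy-normalization layer are assumptions made for the sketch, not details taken from the text above.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

k, n = 4, 7                                     # (n,k): one of M = 2^k messages over n channel uses
M = 2 ** k
EbNo_dB = 7.0                                   # assumed training SNR
noise_std = np.sqrt(1.0 / (2 * (k / n) * 10 ** (EbNo_dB / 10)))

# Encoder: one-hot message -> n real-valued channel symbols with an energy constraint
inp = layers.Input(shape=(M,))
x = layers.Dense(M, activation='relu')(inp)
x = layers.Dense(n, activation='linear')(x)
x = layers.Lambda(lambda v: (n ** 0.5) * tf.math.l2_normalize(v, axis=1))(x)  # ||x||^2 = n

# AWGN channel (noise is applied while training)
y = layers.GaussianNoise(noise_std)(x)

# Decoder: estimate which of the M messages was sent
d = layers.Dense(M, activation='relu')(y)
out = layers.Dense(M, activation='softmax')(d)

autoencoder = Model(inp, out)
autoencoder.compile(optimizer='adam', loss='categorical_crossentropy')

# Train on randomly drawn one-hot messages (the message is both input and label)
msgs = np.eye(M)[np.random.randint(0, M, 10000)]
autoencoder.fit(msgs, msgs, epochs=10, batch_size=256, verbose=0)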
Components of Autoencoder
There are three main components in an autoencoder: the encoder, the decoder, and the code. The encoder and decoder are fully connected feedforward networks, while the code is a single layer whose dimensionality is chosen by the designer. To develop an autoencoder, you have to set a hyperparameter: the number of nodes in the code layer. The decoder network mirrors the encoder, and the decoder produces the desired output using only the code layer.
The encoder and decoder should have matching dimensions. The important parameters when setting up an autoencoder are the code size, the number of layers, and the number of nodes in each layer.
Code size is defined by the total number of nodes in the middle layer; a smaller middle layer gives stronger compression. The autoencoder can be as deep or as shallow as you wish, but the number of nodes should match between the encoder and decoder, i.e. the decoder and encoder layers must be symmetric.
In a stacked autoencoder, you have one hidden layer in both the encoder and decoder. The example dataset consists of handwritten digit images of size 28*28. You can develop an autoencoder with 128 nodes in the hidden layer and a code size of 32. To add more layers, use model.add:
model.add(Dense(16,activation='relu'))
model.add(Dense(8, activation='relu'))
Alternatively, layers can be defined in the functional style:
layer_1 = Dense(16, activation='relu')(input)
The output of this layer is then passed as the input to the next layer; each Dense layer is callable in this way. The decoder is built the same way, but its final layer uses a sigmoid activation to obtain outputs between 0 and 1, since the input values also lie in the 0 to 1 range.
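Putting these pieces together, a minimal sketch of the stacked autoencoder described above (28*28 images flattened to 784 values, a 128-node hidden layer, and a code size of 32) could look as follows; the choice of optimizer and loss is an assumption for the sketch.

from tensorflow.keras import layers, Model

input_img = layers.Input(shape=(784,))                              # flattened 28*28 image
hidden_enc = layers.Dense(128, activation='relu')(input_img)
code = layers.Dense(32, activation='relu')(hidden_enc)              # code size = 32
hidden_dec = layers.Dense(128, activation='relu')(code)             # decoder mirrors the encoder
output_img = layers.Dense(784, activation='sigmoid')(hidden_dec)    # sigmoid keeps outputs in [0, 1]

autoencoder = Model(input_img, output_img)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')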
Difference Between Encoding and Decoding
Encoding means the creation of a message (something you want to communicate to another person). Decoding, on the other hand, is what the listener or audience does with the encoded message: it means interpreting the meaning of the message, i.e. understanding what has just been said.
All communication begins with the sender. The first step the sender is faced with involves
the encoding process. In order to convey meaning, the sender must begin encoding, which means
translating information into a message in the form of symbols that represent ideas or concepts.
The decoding of a message is how an audience member is able to understand and interpret the message.
Architecture of Autoencoder
In this stacked architecture, the code layer has a smaller dimension than the input information; such a network is called an undercomplete autoencoder.
1. Denoising Autoencoders
In this method you cannot simply copy the input signal to the output to get a perfect result, because the input signal contains noise that must be removed to recover the underlying data; an autoencoder trained this way is called a denoising autoencoder. Starting from the original images, some noise is added to create the noisy input signal. You can then train the autoencoder to produce a noise-free output as follows:
autoencoder.fit(x_train_noisy, x_train)
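One way to create the noisy training set used in the call above is sketched below; the Gaussian noise and the noise factor of 0.5 are assumptions for the sketch, and x_train is assumed to contain pixel values already scaled to [0, 1].

import numpy as np

noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(size=x_train.shape)
x_train_noisy = np.clip(x_train_noisy, 0.0, 1.0)   # keep pixel values in [0, 1]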
A convolutional autoencoder can be used to handle complex signals such as images and generally gives better results than the fully connected approach.
2. Sparse Autoencoders
Another effective method is regularization, applied here in the form of a sparsity constraint. To keep only part of the nodes in a layer active, extra terms are added to the loss function; this pushes the autoencoder to represent each input with a small combination of nodes and makes the encoder find unique structures in the given data. It also scales well to large datasets, because only a fraction of the nodes is activated at a time.
In such a model the final loss can be as small as 0.01, largely because of the regularization term. In this sparse model, many of the code values come close to the expected result, while keeping fairly low variance.
Regularized autoencoders have useful properties such as robustness to missing inputs, sparse representations, and small derivatives of the representation. Rather than relying on a minimal code size and a shallow encoder and decoder, they use the regularized loss itself to keep the encoding useful: the model can have high capacity and still discover structure in the inputs, because it is trained to do more than simply copy the input to the output.
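As a sketch of how such a regularization term can be added in Keras, an L1 activity regularizer on the code layer penalizes large activations and drives most of them towards zero; the layer sizes and the penalty weight of 1e-5 are assumed values.

from tensorflow.keras import layers, regularizers, Model

inputs = layers.Input(shape=(784,))
# L1 penalty on the code activations: only a few nodes stay active per input
code = layers.Dense(32, activation='relu',
                    activity_regularizer=regularizers.l1(1e-5))(inputs)
outputs = layers.Dense(784, activation='sigmoid')(code)

sparse_ae = Model(inputs, outputs)
sparse_ae.compile(optimizer='adam', loss='binary_crossentropy')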
3. Variational Autoencoder
It is used in more complex cases: it learns the probability distribution that models the input data. The variational autoencoder uses a sampling step to produce its output, and otherwise follows the same overall architecture as regularized autoencoders.
Hence, autoencoders are used to learn from real-world data and images involved in binary and multiclass classification. They offer a simple approach to dimensionality reduction, are closely related to the Restricted Boltzmann machine and play a vital role alongside it, and are also used in the biochemical industry to uncover previously hidden structure and identify intelligent behavior patterns. Many components in machine learning have a self-organizing character; the autoencoder is one of the approaches of this kind that has been successful in artificial intelligence.
When it comes to managing heavy data and performing complex operations on that massive data, there is a need to use big data tools and techniques. Using big data tools and techniques effectively means making use of the various software and procedures that belong to the big data ecosystem. There is no generic solution that fits every use case, so a solution has to be crafted to match the business requirements of a particular company. There is therefore a need for different big data architectures, since it is the combination of various technologies that achieves the required use case. By establishing a fixed architecture, it can be ensured that a viable solution is provided for the use case at hand.
Types of Autoencoders
There are, basically, 7 types of autoencoders:
Denoising autoencoder
Sparse Autoencoder
Deep Autoencoder
Contractive Autoencoder
Undercomplete Autoencoder
Convolutional Autoencoder
Variational Autoencoder
1) Denoising Autoencoder
Denoising autoencoders create a corrupted copy of the input by introducing some noise. This prevents the autoencoder from simply copying the input to the output without learning features of the data. These autoencoders take a partially corrupted input while training to recover the original undistorted input. The model learns a vector field that maps the input data towards a lower-dimensional manifold describing the natural data, cancelling out the added noise.
Advantages
It was introduced to achieve a good representation: one that can be obtained robustly from a corrupted input and that is useful for recovering the corresponding clean input.
Corruption of the input can be done randomly by setting some of the input values to zero; the remaining values are copied unchanged into the noisy input.
Training minimizes the loss between the reconstructed output and the original, uncorrupted input.
Drawbacks
This model isn't able to develop a mapping which memorizes the training data
because our input and target output are no longer the same.
2) Sparse Autoencoder
Sparse autoencoders can have more hidden nodes than input nodes, yet they can still discover important features of the data. In a typical visualization of a sparse autoencoder, the opacity of a node corresponds to its level of activation. A sparsity constraint is introduced on the hidden layer to prevent the output layer from simply copying the input data. Sparsity may be obtained by additional terms in the loss function during the training process, either by comparing the probability distribution of the hidden unit activations with some low desired value, or by manually zeroing all but the strongest hidden unit activations. Some of the most powerful AIs in the 2010s involved sparse autoencoders stacked inside deep neural networks.
Advantages
Sparse autoencoders have a sparsity penalty, a value close to zero but not exactly
zero. Sparsity penalty is applied on the hidden layer in addition to the reconstruction
error. This prevents overfitting.
They take the highest activation values in the hidden layer and zero out the rest of the hidden nodes. This prevents the autoencoder from using all of the hidden nodes at a time and forces only a reduced number of hidden nodes to be used.
Drawbacks
For it to work, it is essential that the individual nodes of a trained model which activate are data dependent, and that different inputs result in activations of different nodes through the network.
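The KL-divergence form of the sparsity penalty mentioned above can be sketched as a custom activity regularizer that pushes the average activation of each hidden unit towards a small target value rho; rho, beta, and the layer sizes are assumed values for the sketch.

import tensorflow as tf
from tensorflow.keras import layers, Model

class KLSparsity(tf.keras.regularizers.Regularizer):
    """Penalize hidden units whose mean activation drifts away from rho."""
    def __init__(self, rho=0.05, beta=1e-3):
        self.rho, self.beta = rho, beta

    def __call__(self, activations):
        rho_hat = tf.clip_by_value(tf.reduce_mean(activations, axis=0), 1e-7, 1 - 1e-7)
        kl = (self.rho * tf.math.log(self.rho / rho_hat)
              + (1 - self.rho) * tf.math.log((1 - self.rho) / (1 - rho_hat)))
        return self.beta * tf.reduce_sum(kl)

inputs = layers.Input(shape=(784,))
code = layers.Dense(1024, activation='sigmoid',        # more hidden nodes than inputs
                    activity_regularizer=KLSparsity())(inputs)
outputs = layers.Dense(784, activation='sigmoid')(code)

sparse_ae = Model(inputs, outputs)
sparse_ae.compile(optimizer='adam', loss='binary_crossentropy')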
3) Deep Autoencoder
Deep autoencoders consist of two identical deep belief networks, one network for encoding and another for decoding. Typically deep autoencoders have 4 to 5 layers for encoding and the next 4 to 5 layers for decoding. We use unsupervised layer-by-layer pre-training for this model. The layers are Restricted Boltzmann Machines, which are the building blocks of deep belief networks. Processing the benchmark dataset MNIST, a deep autoencoder would use binary transformations after each RBM. Deep autoencoders are useful in topic modeling, i.e. statistically modeling abstract topics that are distributed across a collection of documents. They are also capable of compressing images into 30-dimensional vectors.
Advantages
Deep autoencoders can be used for other types of datasets with real-valued data, on
which you would use Gaussian rectified transformations for the RBMs instead.
Drawbacks
There is a chance of overfitting, since there are more parameters than input data.
Training may be a nuisance, since at the stage of the decoder's backpropagation the learning rate should be lowered or slowed depending on whether binary or continuous data is being handled.
4) Contractive Autoencoder
A contractive autoencoder adds an explicit regularization term to the reconstruction loss that penalizes the sensitivity of the code to small changes of the input (the Frobenius norm of the Jacobian of the encoder activations with respect to the input), so the learned representation is robust to slight variations of the training examples.
Advantages
This model learns an encoding in which similar inputs have similar encodings. Hence,
we're forcing the model to learn how to contract a neighborhood of inputs into a
smaller neighborhood of outputs.
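A sketch of this idea is given below, using the closed-form Jacobian penalty for a single sigmoid Dense encoder layer; the layer sizes and the penalty weight lam are assumed values, and inputs are assumed to be floats in [0, 1]. The model can then be compiled with an optimizer and trained with model.fit(x_train, batch_size=..., epochs=...), since the target is the input itself.

import tensorflow as tf
from tensorflow.keras import layers, Model

class ContractiveAE(Model):
    def __init__(self, input_dim=784, code_dim=64, lam=1e-4):
        super().__init__()
        self.lam = lam
        self.encode = layers.Dense(code_dim, activation='sigmoid')
        self.decode = layers.Dense(input_dim, activation='sigmoid')

    def call(self, x):
        return self.decode(self.encode(x))

    def train_step(self, x):
        with tf.GradientTape() as tape:
            h = self.encode(x)
            x_hat = self.decode(h)
            recon = tf.reduce_mean(tf.keras.losses.binary_crossentropy(x, x_hat))
            # For a sigmoid Dense encoder, dh_j/dx_i = h_j * (1 - h_j) * W_ij, so the
            # squared Frobenius norm of the Jacobian has a simple closed form:
            w_sq = tf.reduce_sum(tf.square(self.encode.kernel), axis=0)       # (code_dim,)
            jacobian_norm = tf.reduce_sum(tf.square(h * (1 - h)) * w_sq, axis=1)
            loss = recon + self.lam * tf.reduce_mean(jacobian_norm)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}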
5) Undercomplete Autoencoder
The objective of an undercomplete autoencoder is to capture the most important features present in the data. Undercomplete autoencoders have a smaller dimension for the hidden layer compared to the input layer. This helps to obtain important features from the data. It minimizes the loss function by penalizing the reconstruction g(f(x)) for being different from the input x.
Advantages
The small hidden layer itself forces the network to keep only the most salient features of the training data, so no separate regularization term is needed for the compression to be meaningful.
Drawbacks
Using an overparameterized model due to lack of sufficient training data can create
overfitting.
6) Convolutional Autoencoder
Autoencoders in their traditional formulation do not take into account the fact that a signal can be seen as a sum of other signals. Convolutional autoencoders use the convolution operator to exploit this observation. They learn to encode the input as a set of simple signals and then try to reconstruct the input from them, modifying the geometry or the reflectance of the image. They are the state-of-the-art tools for unsupervised learning of convolutional filters. Once these filters have been learned, they can be applied to any input in order to extract features. These features can then be used for any task that requires a compact representation of the input, such as classification.
Advantages
Due to their convolutional nature, they scale well to realistic-sized high dimensional
images.
Drawbacks
The reconstruction of the input image is often blurry and of lower quality due to
compression during which information is lost.
7) Variational Autoencoder
Variational autoencoders use a variational approach for latent representation learning, which results in an additional loss component and a specific training estimator called the Stochastic Gradient Variational Bayes estimator. The model assumes that the data is generated by a directed graphical model p_θ(x|z) and that the encoder learns an approximation q_Ф(z|x) to the posterior distribution, where Ф and θ denote the parameters of the encoder (recognition model) and decoder (generative model) respectively. The probability distribution of the latent vector of a variational autoencoder typically matches that of the training data much more closely than that of a standard autoencoder.
Advantages
It gives significant control over how we want to model our latent distribution unlike
the other models.
After training you can just sample from the distribution followed by decoding and
generating new data.
Drawbacks
When training the model, there is a need to calculate the relationship of each
parameter in the network with respect to the final output loss using a technique known
as backpropagation. Hence, the sampling process requires some extra attention.
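The sampling issue mentioned in the drawback above is usually handled with the reparameterization trick, so that gradients can flow through the sampling step. A compact sketch, with assumed layer sizes and a 2-dimensional latent space, is given below.

import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim = 2

inputs = layers.Input(shape=(784,))
h = layers.Dense(256, activation='relu')(inputs)
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)

def sample(args):
    mean, log_var = args
    eps = tf.random.normal(shape=tf.shape(mean))
    return mean + tf.exp(0.5 * log_var) * eps      # reparameterization trick

z = layers.Lambda(sample)([z_mean, z_log_var])
h_dec = layers.Dense(256, activation='relu')(z)
outputs = layers.Dense(784, activation='sigmoid')(h_dec)

vae = Model(inputs, outputs)

# Loss = reconstruction error + KL divergence between q(z|x) and the prior N(0, I)
recon = 784 * tf.reduce_mean(tf.keras.losses.binary_crossentropy(inputs, outputs))
kl = -0.5 * tf.reduce_mean(tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
vae.add_loss(recon + kl)
vae.compile(optimizer='adam')

After training, new data can be generated by sampling z from N(0, I) and passing it through the decoder layers.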
Application of Autoencoders
So far we have seen a variety of autoencoders, and each of them is good at a specific task. Let's look at some of the tasks they can perform:
1. File Compression: The primary use of autoencoders is to reduce the dimensionality of input data, commonly referred to as file compression. Autoencoders work with all kinds of data, such as images, video, and audio; this helps in sharing and viewing data faster than we could with the original file size.
2. Image De-noising: Autoencoders are also used for noise removal (image de-noising). What makes them a good choice for de-noising is that they do not require any human interaction; once trained on a given kind of data, they can reproduce that data with less noise than the original image.
3. Image Transformation: Autoencoders are also used for image transformations, which are typically classified under GAN (Generative Adversarial Network) models. Using these we can transform black-and-white images to colored ones and vice versa, and we can up-sample and down-sample the input data, etc.
4. Data Compression
Although autoencoders are designed for data compression, they are hardly used for this purpose in practical situations. The reasons are:
Lossy compression: The output of the autoencoder is not exactly the same as the
input, it is a close but degraded representation. For lossless compression, they are not
the way to go.
Data-specific: Autoencoders are only able to meaningfully compress data similar to
what they have been trained on. Since they learn features specific for the given
training data, they are different from a standard data compression algorithm like jpeg
or gzip. Hence, we can’t expect an autoencoder trained on handwritten digits to
compress landscape photos.
Since we have more efficient and simpler algorithms like JPEG, LZMA, and LZSS (used in WinRAR in tandem with Huffman coding), autoencoders are not generally used for compression, although they have seen use for image denoising and dimensionality reduction in recent years.
5. Machine translation
Autoencoders have been applied to machine translation, which is usually referred to as neural
machine translation (NMT). Unlike traditional autoencoders, the output does not match the
input - it is in another language. In NMT, texts are treated as sequences to be encoded into the
learning procedure, while on the decoder side sequences in the target language(s) are
generated. Language-specific autoencoders incorporate further linguistic features into the
learning procedure, such as Chinese decomposition features. Machine translation is rarely done with autoencoders anymore; transformer networks are used instead.
6. Anomaly detection
Another application for autoencoders is anomaly detection. By learning to replicate the most
salient features in the training data under some of the constraints described previously, the
model is encouraged to learn to precisely reproduce the most frequently observed
characteristics. When facing anomalies, the model should worsen its reconstruction
performance. In most cases, only data with normal instances are used to train the
autoencoder; in others, the frequency of anomalies is small compared to the observation set
so that its contribution to the learned representation could be ignored. After training, the
autoencoder will accurately reconstruct "normal" data, while failing to do so with unfamiliar
anomalous data. Reconstruction error (the error between the original data and its low-dimensional reconstruction) is used as an anomaly score to detect anomalies.
Recent literature has however shown that certain autoencoding models can,
counterintuitively, be very good at reconstructing anomalous examples and consequently not
able to reliably perform anomaly detection.
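In code, the reconstruction-error score described above can be sketched as follows; here autoencoder is assumed to be a model already trained on normal data only, x_normal and x_test are assumed NumPy arrays of flattened samples, and the 95th-percentile threshold is an assumed choice.

import numpy as np

# Score = mean squared reconstruction error per sample
recon_normal = autoencoder.predict(x_normal)
errors_normal = np.mean(np.square(x_normal - recon_normal), axis=1)
threshold = np.percentile(errors_normal, 95)     # tolerate the worst 5% of normal data

recon_test = autoencoder.predict(x_test)
errors_test = np.mean(np.square(x_test - recon_test), axis=1)
is_anomaly = errors_test > threshold             # flag samples the model reconstructs poorly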
7. Information retrieval
Information retrieval benefits particularly from dimensionality reduction in that search can
become more efficient in certain kinds of low dimensional spaces. Autoencoders were indeed
applied to semantic hashing, proposed by Salakhutdinov and Hinton in 2007. By training the
algorithm to produce a low-dimensional binary code, all database entries could be stored in
a hash table mapping binary code vectors to entries. This table would then support
information retrieval by returning all entries with the same binary code as the query, or
slightly less similar entries by flipping some bits from the query encoding.
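A sketch of the semantic hashing idea is shown below: the code produced by a trained encoder is binarized and used as a key into a hash table, so retrieval returns all entries sharing the query's binary code. Here encoder, database, and query are assumed to exist (a trained encoder model, a 2-D NumPy array of entries, and a single 1-D query vector).

import numpy as np

codes = encoder.predict(database)                      # low-dimensional codes for all entries
binary_codes = (codes > 0.5).astype(np.uint8)          # binarize the codes

hash_table = {}
for idx, code in enumerate(binary_codes):
    hash_table.setdefault(code.tobytes(), []).append(idx)

# Retrieval: all entries whose binary code matches the query's code
query_code = (encoder.predict(query[None, :]) > 0.5).astype(np.uint8)
matches = hash_table.get(query_code[0].tobytes(), [])

Slightly less similar entries could then be found by flipping some bits of the query code, as described above.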
Implementation of Autoencoder
In this section, we explore the concept of Image denoising which is one of the applications of
autoencoders. After getting images of handwritten digits from the MNIST dataset, we add
noise to the images and then try to reconstruct the original image out of the distorted image.
In this tutorial, we use convolutional autoencoders to reconstruct the image as they work
better with images. Also, we use Python programming language along with Keras and
TensorFlow to code this up.
### Downloading and preprocessing the dataset and adding some noise to it
import keras
import numpy as np
import matplotlib.pyplot as plt
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
The next step is to add noise to our dataset. For this purpose, we use the NumPy library to
generate random numbers with a mean of 0.5 and a standard deviation of 0.5 in the shape of
our input data. Also, to make sure the pixel values stay between 0 and 1, we use NumPy's clip function:
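A minimal sketch of this preparation step, using the imports above, might look like the following; loading MNIST through keras.datasets and the noise parameters (mean 0.5, standard deviation 0.5) follow the description in the text, while the variable names are chosen to match the training code further below.

from keras.datasets import mnist

(trainX, _), (testX, _) = mnist.load_data()
trainX = np.expand_dims(trainX.astype("float32") / 255.0, axis=-1)   # scale to [0, 1], shape (N, 28, 28, 1)
testX = np.expand_dims(testX.astype("float32") / 255.0, axis=-1)

trainXNoisy = np.clip(trainX + np.random.normal(loc=0.5, scale=0.5, size=trainX.shape), 0, 1)
testXNoisy = np.clip(testX + np.random.normal(loc=0.5, scale=0.5, size=testX.shape), 0, 1)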
Now let us visualize the distorted dataset and compare it with our original dataset. Here we display five images before and after adding noise to them:
# Display the first five original images
plt.figure(figsize=(10, 10))
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(trainX[i].reshape(28, 28), cmap='gray')
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
plt.tight_layout()
plt.show()

# Display the same five images after adding noise
plt.figure(figsize=(10, 10))
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(trainXNoisy[i].reshape(28, 28), cmap='gray')
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
plt.tight_layout()
plt.show()
As we can see above, the images are partially distorted after adding noise to them and we can
hardly recognize the digits. Next, we define the structure of our autoencoder, fit the distorted
images, and pass the original images as labels.
# A minimal reconstruction of the convolutional autoencoder definition (intermediate layers were missing from the source)
input_img = Input(shape=(28, 28, 1))  # adapt this if using `channels_first` image data format
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)   # encoder
x = MaxPooling2D((2, 2), padding='same')(x)                            # 28x28 -> 14x14
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)           # decoder
x = UpSampling2D((2, 2))(x)                                            # 14x14 -> 28x28
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
H = autoencoder.fit(
trainXNoisy, trainX,
validation_data=(testXNoisy, testX),
epochs=20,
batch_size=32)
N = np.arange(0, 20)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.show()
Output: Here is a plot which shows loss at each epoch for both training and validation sets
pred = autoencoder.predict(testXNoisy)

# Original test images
plt.figure(figsize=(10, 10))
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(testX[i].reshape(28, 28), cmap='gray')
plt.show()

# Noisy test images
plt.figure(figsize=(10, 10))
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(testXNoisy[i].reshape(28, 28), cmap='gray')
plt.show()

# Denoised (reconstructed) images
plt.figure(figsize=(10, 10))
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(pred[i].reshape(28, 28), cmap='gray')
plt.show()
As we can see above, the model is able to successfully denoise the images and generate the
pictures that are pretty much identical to the original images.
Explanation of Big Data Architecture
This architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional database management systems.
Different organizations have different thresholds for what counts as big data; for some it is a few hundred gigabytes, while for others even several terabytes are not a large enough threshold value.
At the same time, if you look at commodity systems and commodity storage, the cost of storage has reduced significantly. There is a huge variety of data that demands different ways of being handled.
Some of it is batch-related data that arrives at a particular time, so the jobs must be scheduled in a similar fashion, while other data belongs to the streaming class, for which a real-time streaming pipeline has to be built to cater to all the requirements. All these challenges are addressed by big data architecture.
Big data systems involve more than one workload type, and they are broadly classified as follows:
1. Batch processing of big data sources at rest.
2. Real-time processing of big data in motion.
3. Interactive exploration of big data tools and technologies.
4. Machine learning and predictive analysis.
1. Data Sources
The data sources involve all the golden sources from which the data extraction pipeline is built, so this can be said to be the starting point of the big data pipeline.
Examples:
(i) Datastores of applications, such as relational databases.
(ii) Files produced by a number of applications that are mainly part of static file systems, such as web server files generating logs.
2. Data Storage
This includes the data that is managed for batch operations and is stored in distributed file stores capable of holding large volumes of big files in different formats. This is called the data lake. This is generally where Hadoop storage such as HDFS, or Microsoft Azure, AWS, and GCP storage, is provided, along with blob containers.
3. Batch Processing
All the data is segregated into different categories or chunks, and long-running jobs are used to filter, aggregate, and prepare the data in a processed state for analysis. These jobs usually read from the sources, process the data, and write the output to new files. Batch processing is done in various ways: by making use of Hive jobs or U-SQL based jobs, or by making use of Sqoop or Pig along with custom map-reduce jobs, which are generally written in Java, Scala, or another language such as Python.
4. Real-Time Message Ingestion
This includes, in contrast with batch processing, all those real-time streaming systems that cater to data being generated sequentially and in a fixed pattern. This is often a simple data mart or store responsible for all the incoming messages, which are dropped inside the folder used for data processing. The majority of solutions, however, require a message-based ingestion store that acts as a message buffer and also supports scale-based processing, provides comparatively reliable delivery, and offers other message queuing semantics. The options include Apache Kafka, Apache Flume, Event Hubs from Azure, etc.
5. Stream Processing
There is a slight difference between the real-time message ingestion and stream processing.
The former collects the ingested data first and then uses it through a publish-subscribe kind of tool. Stream processing, on the other hand, handles all the streaming data occurring in windows or streams and then writes the data to the output sink. This includes Apache Spark, Apache Flink, Storm, etc.
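As an illustration of the stream-processing stage, a minimal PySpark Structured Streaming sketch is shown below: it reads messages from a Kafka topic, aggregates them over a one-minute window, and writes the result to an output sink. The topic name, server address, and console output format are placeholders, and running it requires the Spark Kafka connector package.

from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.appName("stream-processing-sketch").getOrCreate()

# Read a stream of messages from a Kafka topic (the message buffer from step 4)
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load())

# Count messages in one-minute windows based on the Kafka timestamp column
counts = (events
          .groupBy(window(col("timestamp"), "1 minute"))
          .count())

# Write the windowed counts to an output sink (the console, for the sketch)
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()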
6. Analytics-Based Datastore
This is the data store used for analytical purposes: the already processed data is queried and analyzed using analytics tools that can correspond to BI solutions. The data can also be presented with the help of a NoSQL data warehouse technology like HBase, or through interactive use of a Hive database, which can provide metadata abstraction over the data store. Tools include Hive, Spark SQL, HBase, etc.
7. Reporting and Analysis Tools
Insights have to be generated on the processed data, and that is effectively done by the reporting and analysis tools, which make use of their embedded technology and solutions to generate useful graphs, analyses, and insights helpful to the businesses. Tools include Cognos, Hyperion, etc.
8. Orchestration
Big data based solutions consist of data-related operations that are repetitive in nature and are encapsulated in workflows that can transform the source data, move data across sources and sinks, load the data into stores, and push it into analytical units. Examples include Sqoop, Oozie, Azure Data Factory, etc.
References
[1] T. J. O'Shea and J. Hoydis, "An Introduction to Deep Learning for the Physical Layer," IEEE Trans. Cognitive Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
[2] T. O’Shea and J. Hoydis, "An Introduction to Deep Learning for the Physical
Layer," in IEEE Transactions on Cognitive Communications and Networking, vol. 3,
no. 4, pp. 563-575, Dec. 2017, doi: 10.1109/TCCN.2017.2758370.
[3] S. Dörner, S. Cammerer, J. Hoydis and S. t. Brink, "Deep Learning Based
Communication Over the Air," in IEEE Journal of Selected Topics in Signal
Processing, vol. 12, no. 1, pp. 132-143, Feb. 2018, doi:
10.1109/JSTSP.2017.2784180.
[4] B. Karanov, M. Chagnon, F. Thouin, T. A. Eriksson, H. Bülow, D. Lavery, P. Bayvel, and L. Schmalen, "End-to-End Deep Learning of Optical Fiber Communications," J. Lightw. Technol., vol. 36, no. 20, pp. 4843–4855, Oct. 2018.
[5] Z. Zhu, J. Zhang, R. Chen, and H. Yu, “Autoencoder-Based Transceiver Design
for OWC Systems in Log-Normal Fading Channel,” IEEE Photon. J., vol. 11, no. 5,
pp. 1–12, Oct. 2019.