Handwritten Digit Recognition Using CNN
ABSTRACT

In this document, we explore the process of training a neural network model to recognize handwritten digits provided as input. The chosen algorithm for this task is the Convolutional Neural Network (CNN), a deep learning architecture that excels at learning from image data. CNNs identify patterns in images, enabling object and category recognition; they comprise multiple layers that analyze input data and generate output, and they are among the most widely used architectures in contemporary AI. As a class of Artificial Neural Network (ANN), CNNs perform strongly on computer vision tasks, with notable achievements in diverse fields such as medical research. CNNs process the grid structure of images through three essential layer types: convolution, pooling, and fully connected layers. The earlier layers perform feature extraction, while the later layers map the extracted features to the final output. The convolution layer plays the pivotal role: it applies a small grid of learnable parameters, known as a kernel, to each position in the image's two-dimensional array of pixel values, which makes CNNs exceptionally efficient at image processing. Convolution and subsampling execute sequentially across the layers, with each layer's output serving as input to the next, and the predicted value is derived from the output of the final layer. This hierarchical approach extracts progressively more complex features as information flows through the network, which is central to the efficacy of CNNs in intricate tasks such as image recognition and classification.
INTRODUCTION

Artificial intelligence (AI), in its essence, involves enabling a computer to perform tasks that
traditionally rely on human cognitive abilities. AI possesses the capacity to process extensive
datasets, surpassing human limitations, and leverages this data to identify patterns, make
informed decisions, and exercise judgment. Within the domain of AI, Machine Learning (ML)
emerges as a crucial subset. ML empowers computers to assimilate behaviors akin to humans
through two primary approaches: supervised learning and unsupervised learning.

Supervised learning entails providing the computer with a labeled dataset, comprising input
data and corresponding output. Through ML, the computer learns the underlying algorithm,
discerning the relationship between specific inputs and their corresponding outputs. This
method is particularly effective in scenarios where the desired outcome is known, facilitating
the machine in grasping intricate patterns. Conversely, unsupervised learning unfolds when
input data lacks pre-defined outputs. In this scenario, ML mechanisms are tasked with
autonomously analyzing datasets, discerning patterns, and categorizing information into
relevant groups. This method is instrumental when insights into data relationships are sought
without predefined objectives.

This nuanced exploration emphasizes the versatility of AI and ML, elucidating their pivotal
role in automating tasks and extracting meaningful insights from vast datasets. The dynamic
interplay between supervised and unsupervised learning showcases the adaptability of these
technologies in addressing diverse challenges across industries. As AI continues to evolve, the
symbiotic relationship with ML reinforces its transformative impact, positioning it as a
cornerstone in shaping the future of computing.

Machine Learning (ML), a pivotal field in computer science, empowers machines to learn
autonomously without explicit programming. Instead of relying on traditional programming,
ML harnesses algorithms and advanced techniques to facilitate learning from data. This process
involves machines performing tasks, gauging accuracy, and assessing whether their task
performance improves over time. ML leverages past experiences and examples to construct
models capable of predicting new values. Particularly beneficial for tackling large and intricate
problems, ML significantly reduces the time needed to unearth crucial insights from vast
datasets. Its ability to swiftly address complex challenges positions machines as rapid learners,
often outpacing human capabilities in certain domains. The escalating demand for ML attests
to its transformative potential, and its integration continues to advance, shaping the landscape
of computing and problem-solving.

In recent years, there has been an unprecedented surge in the volume of information and data,
necessitating the quest for effective tools that facilitate accurate decision-making. Machine
Learning has emerged as a cornerstone in addressing this challenge, representing a realm within Artificial Intelligence (AI) where machines assimilate knowledge from experiences
and examples, akin to human learning. The construction of a model stems from the data
provided to the algorithm, enabling predictions of new values based on this acquired model.
Beyond its predictive prowess, Machine Learning serves as a catalyst for exploration,
unraveling novel and undiscovered insights. Its applications span diverse domains,
encompassing finance, health, media, travel, image processing, computer vision, automated
trading, aerospace, natural language processing, manufacturing, automotive, and beyond. This
report serves as a comprehensive exploration of the fundamentals of machine learning, delving
into the algorithms employed and their wide-ranging applications.

The focal algorithm discussed in this paper is the Convolutional Neural Network (CNN), a robust network architecture designed for deep learning. Operating on input images, a CNN tackles the intricate task of pattern recognition, deciphering objects and categories through a process known as feature extraction. Convolution and max pooling then unfold across the multiple layers of the CNN structure. These layers meticulously analyze the input, culminating in the production of meaningful output, as illustrated in Fig. 1. The pervasive influence of CNNs, with their ability to uncover intricate patterns, exemplifies their significance in the realm of deep learning and underscores their pivotal role in deciphering complex visual data.

This algorithm stands out as one of the extensively employed methodologies in the realm of
deep learning, proving to be highly efficient in our contemporary AI-driven world. Notably,
Convolutional Neural Network (CNN) distinguishes itself by eliminating the necessity for
manually crafted feature extraction. Unlike traditional approaches, CNN architectures dispense
with the need for experts to painstakingly segment organs or tumors. This intrinsic capability
streamlines processes, making CNN a versatile and user-friendly tool in various applications
within the field of artificial intelligence.

CNN possesses the capacity to handle vast amounts of data owing to its large number of learnable parameters, albeit at a higher computational cost. The architecture encompasses three distinct types of layers, each carrying out specific operations:

a. Convolution Layer: This layer constructs feature maps by sliding a small kernel of learnable weights across the image, computing an activation at each position and thereby indicating how strongly each feature is present. It scans the pixels systematically, position by position.

b. Pooling Layer: Responsible for reducing the information generated by the convolution layer, this layer, also known as downsampling, iteratively scales down the output. The convolution and pooling layers repeat multiple times, extracting progressively richer features.

c. Fully Connected Layers: Comprising the fully connected input, hidden, and output layers, this segment plays a crucial role. The input layer flattens the image pixels for streamlined analysis; the hidden layers apply weights to the generated inputs along with an activation function; and the output layer concludes the process by generating the final probabilities that determine the image's class and category. A minimal sketch of the full three-layer-type stack follows this list.
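To make these roles concrete, the following minimal Keras sketch stacks the three layer types in order. The specific sizes (32 filters, a 3x3 kernel, 128 hidden units) are illustrative assumptions rather than values taken from this report.

    import tensorflow as tf

    # A minimal sketch of the three CNN layer types, assuming 28x28 greyscale input.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                               input_shape=(28, 28, 1)),   # convolution: builds feature maps
        tf.keras.layers.MaxPooling2D((2, 2)),              # pooling: downsamples the feature maps
        tf.keras.layers.Flatten(),                         # flattens pixels for the dense layers
        tf.keras.layers.Dense(128, activation='relu'),     # fully connected hidden layer
        tf.keras.layers.Dense(10, activation='softmax'),   # output layer: class probabilities
    ])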

In a practical implementation:

1. The chosen approach involved employing the Convolutional Neural Network algorithm
to identify digits inscribed in a personalized style and font.

2. The utilized CNN architecture comprised an input layer, a hidden layer, and an output
layer.

3. Libraries such as Keras and TensorFlow were harnessed for dataset loading and model training.

4. The activation function chosen for the hidden layer was ReLU, known for its simplicity,
high efficiency, and ease of comprehension, as it eliminates negative values, allowing
positive values to traverse through the network.

Fig. 1 Convolutional Neural Network


METHODOLOGY

1. Incorporating Libraries: Utilizing libraries proves to be an invaluable asset, greatly enhancing the efficiency of a developer's tasks. Essentially, a library comprises prewritten code modules that can be seamlessly invoked while programming. It encapsulates the collective efforts of developers who have already undertaken specific coding tasks, providing a reservoir of functions and features that can be leveraged without the need to start from scratch. This collaborative approach expedites the development process and enriches the functionality of applications. The libraries used in this particular code include:

a. TensorFlow – TensorFlow, an open-source framework, plays a pivotal role in machine learning and various data computations. Recognizable by its TF symbol, TensorFlow empowers developers to create learning algorithms tailored to their datasets and models. It facilitates experimentation with diverse algorithms and enables the creation of data flow structures, illustrating the movement of data through a series of graphs or nodes.

b. OpenCV – OpenCV, short for Open Source Computer Vision Library, stands as a prominent open-source software library for computer vision and machine learning. Its Python binding, cv2, addresses computer vision challenges and accelerates the integration of machine perception into a wide array of products.

c. NumPy – NumPy, an abbreviation for Numerical Python, serves as a robust library for array manipulation. Its functionality extends to linear algebra, matrices, and Fourier transforms. Created by Travis Oliphant in 2005, NumPy is an open-source project, freely available for use.

2. Dataset Loading: To facilitate the training and testing phases, we opt for a dataset readily available through the TensorFlow API. The chosen dataset is the MNIST dataset, comprising sixty thousand images for training and an additional ten thousand images for testing. Each image in this dataset is a greyscale bitmap with a fixed dimensionality of 28x28, ensuring uniformity and coherence throughout the dataset. A typical loading snippet is sketched below.
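A loading sketch using the tf.keras.datasets API; scaling the pixels to [0, 1] is a common preprocessing step assumed here rather than stated in the report.

    import tensorflow as tf

    # Load MNIST: 60,000 training and 10,000 test images, each 28x28 greyscale.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

    # Scale pixel values from [0, 255] to [0, 1] for stable training.
    x_train, x_test = x_train / 255.0, x_test / 255.0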

3. Model Creation: We construct the model as a sequential neural network designed to discern handwritten digits effectively. This entails three dense layers: the input, hidden, and output layers. The input and hidden layers are configured with 128 units each, while the final output layer uses 10 units, one per digit class. This allocation gives the model the capacity to capture the intricate patterns within the handwritten digit data for accurate recognition. A sketch of this configuration follows.
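A minimal sketch of this configuration, assuming Keras dense layers with the unit counts stated above:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),     # flatten the 28x28 pixel grid
        tf.keras.layers.Dense(128, activation='relu'),     # input dense layer, 128 units
        tf.keras.layers.Dense(128, activation='relu'),     # hidden layer, 128 units
        tf.keras.layers.Dense(10, activation='softmax'),   # output layer, one unit per digit
    ])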

4. Model Training: The training phase uses the three dense layers described above, configured with 128, 128, and 10 units, respectively. In the initial stage, the greyscale image pixels are flattened in the input layer. Activation functions are then applied to the weighted sums in the hidden layer, contributing to the learning process. The output layer concludes by delivering predictions based on the acquired knowledge and learned patterns, enhancing the model's proficiency in recognizing handwritten digits. A compile-and-fit sketch appears below.
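A compile-and-fit sketch consistent with the optimizer, loss, and metric named in step 5. The sparse variant of cross-entropy is an assumption matching MNIST's integer labels, and epochs=50 mirrors the training run mentioned in the conclusion.

    # Configure the optimizer, loss, and metric, then train.
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=50)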

Fig. 2 Methodology used in the implementation of handwritten digit recognition


5. Model Evaluation: Following the training phase, the model is tested using both the loss function and the accuracy metric. The aim is an accuracy value approaching 1 and a loss value nearing 0, indicating optimal performance. In this context, the ADAM optimizer, cross-entropy as the chosen loss function, and accuracy as the designated metric drive the evaluation process, refining the model's predictive capabilities so it yields the desired outcomes with precision and efficiency. An evaluation sketch follows.
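A minimal evaluation sketch on the held-out test split:

    # Report loss and accuracy on the 10,000 unseen test images.
    loss, accuracy = model.evaluate(x_test, y_test)
    print(f"loss: {loss:.4f}  accuracy: {accuracy:.4f}")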

6. Obtaining Results: Feeding a set of data into our model initiates output generation. Using a while loop, we systematically analyze each input individually, obtaining a prediction for each corresponding image. This process is visually represented in the flowchart in Fig. 2, and a sketch of the loop appears below. This systematic approach ensures a comprehensive examination of the input data, facilitating a precise prediction for each image in the given dataset.
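One way to realize this loop, reusing the model trained above and assuming the inputs are 28x28 greyscale files named digit1.png, digit2.png, and so on (the folder and naming scheme are hypothetical):

    import os
    import cv2
    import numpy as np

    i = 1
    while os.path.isfile(f"digits/digit{i}.png"):          # hypothetical file layout
        img = cv2.imread(f"digits/digit{i}.png")[:, :, 0]  # keep one greyscale channel
        img = np.invert(np.array([img])) / 255.0           # invert to white-on-black like MNIST, scale to [0, 1]
        prediction = model.predict(img)
        print(f"digit{i}.png is probably a {np.argmax(prediction)}")
        i += 1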
TOOLS AND ALGORITHMS USED

1. The utilized neural network is a Sequential Model, the simplest way to compose a model in Keras: the layers are stacked along a single linear path connecting the input, hidden, and output layers without omitting any layers in between, as illustrated in Fig. 3. This configuration is optimal for a straightforward stack of layers in which each layer possesses exactly one input tensor and one output tensor. (The Keras Sequential model should not be confused with sequence models such as Recurrent Neural Networks, which process sequential data like audio clips, text streams, time-series data, and video clips.)

Fig. 3 Structure of a Sequential Network

2. The implemented activation functions encompass:

a. Rectified Linear Unit (ReLU): ReLU is a piecewise linear function that outputs the input directly when it is positive and zero otherwise, as depicted in Fig. 4. ReLU serves as the default activation function for many neural networks because models using it are easy to train and often achieve strong performance.

b. Softmax: Softmax transforms a vector of numbers into a vector of probabilities with values between 0 and 1 that sum to 1, where the largest value marks the most probable outcome. The probability assigned to each value is proportional to its relative scale in the vector. This function is particularly useful for tasks requiring a probability distribution over outcomes. A minimal sketch of both activation functions follows Fig. 4.
Fig. 4 ReLU Activation Function
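A minimal NumPy sketch of both activation functions, written here for illustration rather than taken from the report's code:

    import numpy as np

    def relu(x):
        # Pass positive values through unchanged; clamp negatives to zero.
        return np.maximum(0, x)

    def softmax(x):
        # Shift by the max for numerical stability, exponentiate, normalize to sum to 1.
        e = np.exp(x - np.max(x))
        return e / e.sum()

    print(relu(np.array([-2.0, 3.0])))           # [0. 3.]
    print(softmax(np.array([2.0, 1.0, 0.1])))    # approx [0.659 0.242 0.099]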

3. The utilized optimizer in this code is ADAM, an acronym for Adaptive Moment Estimation. ADAM is an optimization algorithm that extends stochastic gradient descent, and it proves remarkably effective on large problems involving numerous parameters and substantial datasets. It demands comparatively little memory, and it combines two pivotal ideas: the RMSP (Root Mean Square Propagation) algorithm and gradient descent with momentum.

ADAM enhances the overall efficiency of gradient descent by maintaining exponentially weighted averages of the gradients, which propels the algorithm towards minima at an accelerated pace. This strategic use of averages influences the trajectory of the updates, steering them towards faster convergence. The standard update equations are sketched below.
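In its standard form (following the original Adam formulation, with typical defaults $\beta_1 = 0.9$, $\beta_2 = 0.999$, and a small $\epsilon$), the update at step $t$ for parameters $\theta$ with gradient $g_t$ is:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad \theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

Here $m_t$ is the momentum-style running average of the gradient, $v_t$ is the RMSP-style running average of its square, and the hatted terms correct the bias introduced by initializing both at zero.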

4. The chosen loss function for this application is CROSS-ENTROPY, a particularly fitting choice when optimizing classification models. Cross-entropy proves highly effective in classification tasks within supervised learning, where models predict class labels from one or more input variables, and the cross-entropy loss is well suited to the intricacies of this process. Its standard form is given below.
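For a predicted probability distribution $\hat{y}$ over $C$ classes and a one-hot true label $y$, the loss takes the standard form

$$L = -\sum_{i=1}^{C} y_i \log \hat{y}_i,$$

which reduces to $-\log \hat{y}_c$ for the true class $c$: confident correct predictions incur near-zero loss, while confident wrong predictions are penalized heavily.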

5. The performance metric employed in this code is ACCURACY, a pivotal function that
assesses the efficacy of the model. While both metric and loss functions share
similarities, the key distinction lies in the fact that outputs derived from evaluating a
metric are not utilized during the model training process. Accuracy, in this context,
serves as a crucial measure to evaluate how effectively the model performs its intended
tasks.
IMPLEMENTATION

This paper presents the implementation of a neural network designed to recognize and predict handwritten digits. The foundation of this implementation is built using TensorFlow and Keras. The datasets, sourced from these open-source libraries, provide the necessary inputs for the model. Through extensive analysis of thousands of images, the model acquires a profound understanding of patterns and pixel arrangements within greyscale images, and establishes intricate neural connections.

Keras, functioning as an Application Programming Interface (API), is specifically crafted for machine learning and deep learning applications. As an open-source library, it incorporates a wealth of pre-existing data and essentially operates as the interface for the TensorFlow library.

1. Keras adheres to optimal practices, effectively reducing cognitive load for users.

2. It offers straightforward and consistent APIs for seamless interaction.

3. Extensive documentation and comprehensive developer guides contribute to its user-friendly nature.

4. Keras streamlines common use cases by minimizing the required user actions and
provides clear, actionable error messages.

TensorFlow, an open-source framework symbolized by TF, serves as a robust tool in the realm
of machine learning and diverse computations across various datasets. Developers leverage
TensorFlow to craft learning algorithms, experiment with different models, and construct data
flow structures, illustrating how data navigates through interconnected graphs and nodes. The
datasets employed in this program are sourced from TensorFlow, which boasts an extensive
library containing pre-trained models applicable in various research contexts.

The nomenclature "TensorFlow" is derived from its unique trait of accepting input as multi-
dimensional arrays, commonly referred to as tensors. Users can articulate a kind of operational
flowchart, delineating the desired operations on the input. This flow begins as the input enters
one end of the system, traverses through a sequence of operations, and emerges as output from
the other end, encapsulating the essence of TensorFlow.
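A toy example of this tensors-in, tensors-out flow (the values are illustrative):

    import tensorflow as tf

    # Two input tensors flow through a matmul operation to produce an output tensor.
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
    c = tf.matmul(a, b)    # the output emerges from the other end of the graph
    print(c.numpy())       # [[1. 2.] [3. 4.]]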

In the training phase, we employ the ADAM optimizer, which efficiently updates the neural network weights during training, significantly reducing training time while enhancing efficiency. Subsequently, we evaluate the model's performance using the loss function and the metrics function, employing cross-entropy and accuracy, respectively. Once the model is trained, it is saved for future use, containing all the necessary data, as sketched below.
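Saving and later restoring the trained model can be sketched as follows (the filename is hypothetical):

    # Persist the trained model, then reload it for later prediction runs.
    model.save('handwritten_digits.keras')
    model = tf.keras.models.load_model('handwritten_digits.keras')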
In the final step, we provide inputs to our model by scanning hand-written digits or creating
digital representations. By uploading these images into our Python script, we utilize a while
loop to systematically analyze and predict the output for each image, facilitating a seamless
and effective workflow.
RESULTS AND DISCUSSION

This implementation allows us to discern handwritten digits, provided as input to the code, as
depicted in Fig. 5. The code systematically analyzes the input and predicts the probability
output. Notably, the code demonstrates the capability to handle data created in MS Paint, where
the saved file is seamlessly loaded into the program for execution. Employing a while loop, the program thoroughly examines and predicts each image, accommodating scenarios where multiple dataset files are present.

The program relies on the neural connections it has learned, coupled with an understanding of
grayscale pixels, to accurately predict the output of recognized data. This successful integration
enables us to effectively recognize and digitize our data. The utilization of key components
such as the Sequential Model, ReLU, SoftMax, ADAM optimizer, Cross-Entropy, and
Accuracy functions for image recognition underscores the versatility and efficacy of the
implemented solution.

Fig. 5 Handwritten digits given as input images


PERFORMANCE ANALYSIS

To assess the performance of our model, we leverage two crucial metrics: the loss function and
accuracy. The chosen loss function, Cross-Entropy, quantifies the extent of loss incurred during
the training of our model. Essentially, it provides insights into how well our model is adapting
and learning from the provided data. On the other hand, accuracy serves as a pivotal metric,
revealing the overall precision of our model post-training.

Ideally, in a well-trained model, accuracy should approach 1, indicating near-perfect alignment with the desired outcome, while the loss should approach 0, signifying minimal discrepancies. In our case, the achieved loss after training the model hovers around 3%, indicating a relatively low level of divergence. Concurrently, the attained accuracy stands at 97%, affirming the model's proficiency in accurately predicting outcomes.

These metrics serve as benchmarks to gauge the model's performance. If the loss is
disproportionately high or accuracy significantly deviates from the ideal, it signals the need for
further optimization, potentially by adjusting parameters or increasing the number of training
epochs. This iterative refinement process ensures that the model progressively approaches a
state of optimal performance.
CONCLUSION

In this implementation, we meticulously devised a neural network utilizing TensorFlow and Keras to proficiently recognize handwritten digits. Leveraging the MNIST dataset, our model exhibits a remarkable ability to identify and interpret input handwritten digits. The convolutional neural network (CNN) employed in this approach demonstrates commendable accuracy, substantiated by the minimal loss percentage observed across multiple training sessions.

While the method excels in capturing the nuances of handwritten digits, challenges arise,
particularly in handling image noise. Nevertheless, through rigorous training sessions, the
model strives to optimize its performance, aiming for the most accurate outcomes. To evaluate
the model's efficacy, we conducted rigorous testing post-training, spanning 50 epochs. The
results affirm the model's robustness and accuracy, providing a foundation for further
enhancements and facilitating the exploration of streamlined approaches for complex data
scenarios, such as converting handwritten paragraphs into text.

This research has deepened our comprehension of the intricate mechanisms involved in
identifying handwritten data. Recognizing the practical significance of handwritten data
recognition, especially in scenarios where users prefer the convenience of transcribing paper-
based content rather than typing on a keyboard, underscores the relevance of our work. Looking
ahead, we recommend implementing this solution on edge computing platforms, such as the
Raspberry Pi 4 system, for real-world applications, promising a seamless integration into
practical usage scenarios.
