
A

PROJECT REPORT
ON

OBJECT DETECTION
Submitted in partial fulfillment of the requirements
of the degree of

Bachelor of Engineering
In
Computer Engineering
by
KRUTIKA BHIDE
(Roll No. 04)
SANJANA BHOSLE
(Roll No. 09)
GAYATRI HADKAR
(Roll No. 26)

Supervisor:
Prof. Nilima Patil

Department of Computer Engineering


K.C. College of Engineering and Management Studies and
Research, Thane (E)
University of Mumbai
2023-24
CERTIFICATE

This is to certify that the project entitled “Object Detection” is a bonafide work of

“KRUTIKA BHIDE” (Roll No. 04), “SANJANA BHOSLE” (Roll No. 09), and
“GAYATRI HADKAR” (Roll No. 26)
submitted to the University of Mumbai in partial fulfillment of the requirement for the

award of the degree of “Bachelor of Engineering” in “Computer Engineering”.

Name and sign Name and sign


Supervisor/Guide Co-Supervisor/Guide

Dr. Vilas Nitnaware
Head of Department Principal
Project Report Approval for T.E.

This project report entitled OBJECT DETECTION by KRUTIKA BHIDE,
SANJANA BHOSLE, and GAYATRI HADKAR is approved for the degree
of Bachelor of Engineering in Computer Engineering.

Examiners

1.---------------------------------------------

2.---------------------------------------------

Date:

Place:
DECLARATION
We declare that this written submission represents our ideas in our own
words and where others' ideas or words have been included, we have
adequately cited and referenced the original sources. We also declare that
we have adhered to all principles of academic honesty and integrity and
have not misrepresented or fabricated or falsified any
idea/data/fact/source in our submission. We understand that any
violation of the above will be cause for disciplinary action by the Institute
and can also evoke penal action from the sources which have thus not
been properly cited or from whom proper permission has not been taken
when needed.
_________________________

(Signature)

_______________________

(KRUTIKA BHIDE A-04)

(SANJANA BHOSLE A-09)

(GAYATRI HADKAR A-26)

Date:
ACKNOWLEDGEMENT
We would like to express our special gratitude to our guide, Prof. Nilima Patil, and our
Project Coordinator, who gave us the golden opportunity to work on this wonderful project
on the topic of OBJECT DETECTION, which also led us to do a great deal of research and
come to know about many new things. We are very grateful to our Head of the Department,
Mahesh Maurya, for extending help directly and indirectly through various channels in our
project work. We would also like to thank our Principal, Dr. Vilas Nitnaware, for providing
us the opportunity to implement our project. Finally, we would like to thank our parents and
friends, who helped us a lot in finalizing this project within the limited time frame.

Thanking You.
TABLE OF CONTENTS

Certificate
Approval Sheet
Declaration
Acknowledgement
List of Figures
List of Tables
Abstract

1. Introduction
2. Literature Survey
3. Proposed Work
   3.1 Requirement Analysis
      3.1.1 Scope
      3.1.2 Feasibility Study
      3.1.3 Hardware & Software Requirements
   3.2 Problem Statement
   3.3 Project Processing
   3.4 Methodology
4. Results
5. Conclusion and Future Scope
6. References
LIST OF FIGURES

Sr. No.  Topic
1        Project Processing
2        Working of Object Detection

ABSTRACT
This report presents a personalized object detection model for visually impaired
individuals: a specialized tool crafted exclusively for users with visual impairments. The
model harnesses advanced computer vision and deep learning techniques to provide tailored
assistance, recognizing not just the objects around the user but specifically the ones that
matter most to them. Running on a webcam feed, the program acts as a set of fast eyes,
quickly identifying objects in view; each frame is resized and normalized so the model
receives consistent input. Upon recognizing an object, the system provides immediate spoken
feedback, offering enhanced confidence and independence in daily activities. By combining
the OpenCV, NumPy, TensorFlow, Keras, and pyttsx3 libraries, the program seamlessly
processes video frames: OpenCV ensures smooth webcam operation, NumPy handles the
essential numerical operations, TensorFlow and Keras run a personalized deep learning
model that recognizes objects important to the user, and pyttsx3 delivers customized audio
feedback with instant, tailored information about the surroundings. This solution represents
a significant stride in assistive technology, empowering visually impaired individuals with a
newfound level of autonomy. By focusing on the objects that matter most to each user, it
facilitates easier navigation and greater effectiveness in their environment.
1. Introduction
Living with visual impairment presents unique challenges, particularly when it comes to
perceiving and navigating the environment. Recognizing objects, a task often taken for
granted, can be a crucial aspect of daily independence. Object detection technology, rooted
in cutting-edge computer vision and machine learning, offers a transformative solution. It
enables real-time identification of objects through visual input, allowing individuals with
visual impairments to receive immediate feedback about their surroundings. This feedback,
often delivered through auditory or tactile means, bridges the gap between the visual world
and non-visual perception. This technology serves as a vital link, providing individuals with
visual impairments the information they need to navigate their surroundings with
confidence and accuracy.
Furthermore, object detection systems play a pivotal role in enhancing safety for visually
impaired individuals. By alerting them to the presence of obstacles or objects in their path,
these systems mitigate potential hazards and facilitate safer navigation. This heightened
level of awareness not only increases physical safety but also contributes to a greater sense
of overall well-being and independence.
Additionally, the accessibility provided by object detection technology extends beyond
physical safety. It opens up new avenues for educational and vocational opportunities. With
the ability to independently identify and interact with their environment, visually impaired
individuals can more actively participate in learning and work environments, leveling the
playing field and empowering them to pursue their goals with greater autonomy.
In summary, object detection technology is a powerful tool that transcends the boundaries
imposed by visual impairment. By providing real-time feedback and enhancing safety, it
revolutionizes the way visually impaired individuals interact with their surroundings. This
technology is not only about increasing accessibility; it's about fostering confidence,
independence, and a brighter outlook on life for those with visual impairments.

2. Literature Survey
• TensorFlow is a popular open-source machine learning framework developed by
Google. It provides a comprehensive ecosystem of tools, libraries, and community
resources for building and deploying machine learning models.
• Keras, on the other hand, is a high-level neural networks API that runs on top of
TensorFlow (and other backends). It's known for its simplicity and ease of use,
allowing developers to quickly prototype and build neural network models.
• MobileNet, specifically MobileNetV2, is a lightweight deep learning model designed
for mobile and edge devices with resource constraints. It's known for its efficiency
and compact architecture while maintaining good performance in image
classification tasks.
3. Proposed Work
3.1 Requirement Analysis

3.1.1 Scope

The project aims to develop an object detection system using Keras tailored to assist visually
impaired individuals in navigating their surroundings independently. The scope
encompasses creating a real-time object recognition model capable of accurately identifying
diverse objects. Integrating Keras-based deep learning with auditory feedback mechanisms
is pivotal, translating visual information into accessible forms for the visually impaired. The
project also involves designing an intuitive user interface and optimizing the model for
efficiency, ensuring it can run on various devices. Continuous refinement, ethical testing,
and collaboration with the visually impaired community are integral to the project.
Documentation and dissemination of the findings will be crucial to encourage further
research and facilitate broader accessibility in assisting the visually impaired.

3.1.2 Feasibility Study

A feasibility study for an object detection project aimed at assisting visually impaired
individuals using Keras involves a comprehensive assessment of various key factors:

• Technical Feasibility: Determine whether Keras, along with the available hardware,
can support real-time object detection and the integration of feedback mechanisms
for the visually impaired.
• Data Availability and Collection: Assess the availability and accessibility of a diverse
dataset suitable for training the object detection model, ensuring it covers a wide
range of objects relevant to the visually impaired.
• Model Training and Optimization: Evaluate the feasibility of training and optimizing a
model using Keras to achieve high accuracy in real-time object detection, considering
computational requirements.
• Integration of Feedback Mechanisms: Investigate the feasibility of integrating
auditory feedback systems to convert visual information into accessible formats for
the visually impaired.
• Testing and Validation: Plan for thorough testing involving visually impaired
individuals to validate usability, accuracy, and reliability of the system in real-world
scenarios.
• Resource Allocation and Timeline: Evaluate required resources, including human
resources, budget, and time to create a realistic project timeline considering the
project's complexity.
• Scalability and Future Development: Determine the potential for scalability and
future improvements, exploring possibilities for updates, adaptability to new
environments, or evolving user needs.
• Cost-Benefit Analysis: Conduct a comprehensive cost-benefit assessment to
understand financial implications and the potential societal impact for the visually
impaired community and stakeholders.

This feasibility study encompasses a thorough evaluation of the technical, practical, ethical,
and financial aspects essential for determining the viability and potential success of the
object detection project tailored for visually impaired individuals using Keras.

3.1.3 Hardware & Software Requirements

Hardware Requirements:

1. Processor (CPU): A multi-core CPU (such as an Intel i7 or i9, or an AMD Ryzen series)
is beneficial for faster training of complex models.
2. Memory (RAM): At least 16 GB of RAM is recommended, especially for training larger
models and handling sizable datasets.
3. Storage: A solid-state drive (SSD) offers faster read/write speeds than a traditional
hard disk drive (HDD), aiding quicker data access and model training.

Software Requirements:

1. Python: The fundamental programming language for machine learning and computer
vision tasks.
2. Anaconda (Python 3.11): Anaconda includes popular Python versions and provides a
comprehensive platform for managing Python environments and packages.
3. Git / GitPython (Python 3.8.3): GitPython is a Python library used to interact with
Git repositories, at a high level (git-porcelain) or a low level (git-plumbing).
4. Deep Learning Frameworks: Choose a deep learning framework to implement and
train object detection models. Popular choices include:
   • TensorFlow: TensorFlow provides various pre-built models and tools for
     object detection, making it a preferred framework for such tasks.
   • Keras: Keras is a high-level deep learning API developed at Google for
     implementing neural networks. It is written in Python, makes the
     implementation of neural networks easy, and supports multiple backends
     for neural network computation.
5. Object Detection Frameworks/Libraries: Consider libraries designed explicitly for
object detection:
   • OpenCV (Open Source Computer Vision Library): Often used for image
     processing and video analysis; it also ships pre-trained models for object
     detection.
6. Other Supporting Libraries: NumPy for numerical operations and array manipulation.
7. pyttsx3 (installed via pip): pyttsx3 is a text-to-speech conversion library in Python.
Unlike some alternatives, it works offline and is compatible with both Python 2 and 3.
It is very easy to use and converts the entered text into speech. On Windows, the
SAPI5 engine provides the module with two voices, one female and one male.

3.2 Problem Statement

Vision is critical in our daily lives. However, according to the World Health Organization
(2019), more than 2.2 billion people worldwide live with a vision impairment. Unlike people
with normal vision, they cannot see the objects around them and face challenges in detecting
obstacles while navigating. Although several aids are available to help visually impaired
people navigate, such as white canes and more advanced technologies, users still encounter
problems when accessing or using these tools. The challenge lies in creating an object
detection system that can accurately identify and locate multiple objects within complex
scenes, handle various object sizes and orientations, and operate in real time. This project
seeks to address these challenges using cutting-edge machine learning models and innovative
implementation techniques. Navigating the environment poses a considerable challenge for
visually impaired individuals because they cannot visually recognize and interpret the
objects in their surroundings. To address this, the project endeavors to develop an object
detection system using Keras, aiming to provide real-time assistance to the visually
impaired. The absence of immediate object recognition significantly hinders their mobility
and independence, emphasizing the need for an innovative solution. Leveraging deep learning
methodologies through Keras, this project seeks to create a system that can accurately
identify a wide array of objects in real time and convey this information to the user
through auditory feedback. The system's primary goal is to bridge the gap between the visual
world and those with visual impairments, empowering them with crucial information about
their environment and enhancing their ability to navigate safely and independently.
3.3 Project Processing:

The program uses a camera for real-time detection of personalized objects for visually
impaired people. It establishes the necessary tools for processing visual information,
performing numerical operations, and producing auditory feedback. The program captures
video frames, processes them through OpenCV, applies numerical operations using NumPy,
runs a pre-trained deep learning model for object detection through TensorFlow and Keras,
and finally provides both visual and auditory feedback about the detected objects using
pyttsx3. Together these components form a system that continuously processes webcam
frames, detects objects, and reports them to the user.

Capturing and resizing the images:

OpenCV is a comprehensive computer vision library that provides tools for a wide range of
tasks related to understanding and processing images and videos. In this context, it's used to
handle the webcam feed. It initializes the camera, captures individual frames, and allows for
resizing them. Resizing is important because the deep learning model expects images of
specific dimensions for processing.
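A minimal sketch of this capture-and-resize step, assuming the default webcam at device index 0 and a model that expects 224x224 input (both are assumptions, not fixed by the report):

```python
import cv2

# Open the default webcam; device index 0 is an assumption and may
# differ on machines with multiple cameras.
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError("Could not open webcam")

ret, frame = cap.read()  # grab one BGR frame from the feed
if ret:
    # Resize to the input size the model expects; 224x224 is assumed,
    # a common default for MobileNet-style classifiers.
    resized = cv2.resize(frame, (224, 224))

cap.release()
```

In the actual program this step runs inside a loop so that frames are processed continuously.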

Processing the captured images:

NumPy is a fundamental library for numerical operations in Python. It provides support for
working with arrays and matrices, making it indispensable for handling image data. Within
this code, NumPy is crucial for converting images to numerical arrays. This conversion allows
for efficient numerical operations, such as normalization, which is essential for preparing
the images for processing by the deep learning model. Additionally, NumPy helps create an
empty array with the right shape to serve as input for the model. This is where the
processed image data will be stored before prediction.
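Continuing the sketch above, the resized frame is converted to a NumPy array, normalized, and placed into a batch-shaped input array; the [-1, 1] scaling shown is an assumption matching common MobileNet-style preprocessing:

```python
import numpy as np

# Convert the resized BGR frame (from the capture sketch) to a float
# array and normalize pixel values into [-1, 1]; this scaling is an
# assumption matching common MobileNet-style inputs.
image_array = np.asarray(resized, dtype=np.float32)
normalized = (image_array / 127.5) - 1.0

# Create the batch-shaped array the model expects: one image of
# 224x224 pixels with 3 color channels.
data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)
data[0] = normalized
```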
Predicting the image class:

TensorFlow is a powerful numerical computation library, especially popular for deep
learning applications. Keras is a high-level API that runs on top of TensorFlow, making it
easier to build, train, and deploy neural network models. In this code, TensorFlow and Keras
are used to handle the pre-trained deep learning model. This model is capable of processing
images and making predictions about the objects present.
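A hedged sketch of the prediction step, continuing from the preprocessing above; the model file name and label list are hypothetical placeholders for the actual trained model and its classes:

```python
import numpy as np
from tensorflow.keras.models import load_model

# Load the pre-trained classifier; "keras_model.h5" and the label list
# below are hypothetical placeholders, not the project's actual artifacts.
model = load_model("keras_model.h5")
class_names = ["Airpods", "Bottle", "Keys"]

prediction = model.predict(data)        # `data` from the preprocessing step
index = int(np.argmax(prediction[0]))   # index of the most probable class
label = class_names[index]
confidence = float(prediction[0][index])
print(label, confidence)
```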

Audio Output:

Pyttsx3 is used to provide auditory feedback about the detected objects. Once an object is
identified, the program uses pyttsx3 to convert the text label (e.g., "Airpods") into spoken
words. This auditory feedback voices out the visual information displayed on the screen.

Working of Object Detection

Image Classification using CNN

A convolutional neural network (CNN) is a type of artificial neural network (ANN) used in
image recognition and processing, specially designed for processing pixel data. When shown
an image, it looks at it in small pieces, much as we focus on one part of a picture at a
time. It then learns which patterns to look for in these pieces, such as edges, colors, or
shapes. Next, the CNN puts these patterns together, like building blocks, to figure out what
is in the whole image. The CNN improves over time by practicing with many images and
learning which patterns are important for different things, much as we learn to recognize
objects better with practice. In the end, the CNN reports what it thinks is in the picture.
A CNN operates by mimicking the visual processing that occurs in the human brain. When
presented with an image, a CNN breaks it down into smaller, overlapping regions known as
"convolutional windows" or "kernels". These windows allow the network to focus on
specific localized features within the image, such as edges, corners, and textures.
Each convolutional window moves across the entire image, systematically scanning and
extracting patterns. Through a process called convolution, the network calculates a
weighted sum of pixel values within the window. These weights, or filters, are learned
through training and are designed to detect specific features. For instance, one filter might
be attuned to identifying diagonal edges, while another may recognize textures like fur or
scales.
As the CNN progresses through its layers, it organizes these extracted features into
increasingly complex hierarchies. Early layers identify basic features, while deeper layers
combine them to recognize more intricate structures, such as shapes or object parts. This
hierarchical feature extraction allows the CNN to discern complex patterns within the
image.
Subsequently, the network undergoes a pooling operation, which reduces the spatial
dimensions of the extracted features. Pooling helps to retain the most salient information
while minimizing computational complexity. This process is akin to zooming out on an image
to capture its essential characteristics.
The learned features are then flattened and fed into a traditional artificial neural network
for further processing. This fully connected network interprets the hierarchical features and
performs the final classification. Through a process called backpropagation, the network
adjusts its internal parameters (weights and biases) during training to improve accuracy.
In summary, a CNN employs a sophisticated series of mathematical operations to
systematically analyze and interpret the visual content of an image. By utilizing
convolutional layers, pooling, and fully connected layers, the network learns to recognize
complex patterns and make accurate classifications. This capability finds applications in a
wide range of tasks, from identifying objects in images to more advanced computer vision
applications.
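As an illustration of this convolution, pooling, and fully connected pipeline, here is a small Keras CNN; the input size, filter counts, and 10-class head are arbitrary choices for the sketch, not the project's actual model:

```python
from tensorflow.keras import layers, models

# Convolution -> pooling -> flatten -> fully connected, as described above.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(224, 224, 3)),      # learn local edge/texture filters
    layers.MaxPooling2D((2, 2)),                   # keep the most salient activations
    layers.Conv2D(64, (3, 3), activation="relu"),  # deeper layers combine features
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                              # flatten features for the dense head
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),        # output class probabilities
])
model.summary()
```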
3.4 Methodology

• Data Collection and Preparation: Gather a diverse dataset comprising various object
classes, ensuring it covers objects significant for the visually impaired. Preprocess
and annotate the data for model training.
• Model Selection and Training:
   ▪ Select a suitable pre-trained model architecture from Keras or design a custom
     model architecture. Ensure it is optimized for real-time object detection and
     capable of integration with auditory or haptic feedback systems.
   ▪ Augment the dataset to increase its diversity and size. Train the selected or
     custom-built model using the prepared dataset, and fine-tune it for accurate
     object recognition (a transfer-learning sketch follows this list).
• Integration of Text-to-Speech (TTS) Module: Implement mechanisms to convert
visual information into auditory feedback. This involves integrating the model
outputs with text-to-speech notification systems for user accessibility.
• Real-Time Object Detection: Real-time object detection involves instantly
recognizing and identifying objects within a continuous stream of data, such as live
video. It utilizes algorithms and models to swiftly process frames, analyzing and
identifying various objects as they appear, with minimal delay, allowing immediate
responses or notifications. This rapid identification is crucial in providing timely
information, particularly in dynamic or time-sensitive environments, making it highly
relevant for applications aimed at aiding visually impaired individuals.
• Audio Feedback Generation: Audio feedback generation is the process of creating
sound-based information or notifications in response to certain events or actions. It
involves generating auditory cues or alerts to convey information, provide guidance,
or signal changes in a system. For visually impaired individuals, audio feedback can
be used to describe surroundings, convey important notifications, or assist in
navigation by converting visual data into spoken information. This audio feedback
aims to enhance accessibility and understanding of the environment for those who
rely on auditory cues due to visual impairment.
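A minimal transfer-learning sketch for the training step above, using MobileNetV2 (discussed in the literature survey) as a frozen feature extractor; the class count, input size, and datasets are placeholders to adapt to the collected data:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 5  # placeholder: number of personalized object classes

# MobileNetV2 pre-trained on ImageNet, without its classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights="imagenet",
)
base.trainable = False  # freeze pre-trained weights; unfreeze later to fine-tune

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),                 # pool features to a vector
    layers.Dense(num_classes, activation="softmax"), # personalized class head
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # placeholder datasets
```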
4. Results:
5. Conclusion and Future Scope:
Conclusion:

Developing an object detection system for visually impaired individuals is not just a
technological endeavor; it is a crucial step toward enhancing accessibility and
independence. By employing cutting-edge technology, this project aims to provide a
sophisticated tool that empowers those with visual impairments to navigate their
surroundings more confidently and safely.
The integration of computer vision and machine learning algorithms enables the system to
identify and interpret objects in real-time, translating visual information into auditory
feedback. This groundbreaking technology has the potential to revolutionize the daily lives
of visually impaired individuals, granting them greater autonomy and freedom in their
interactions with the environment.
Through this project, the overarching goal is to bridge the gap between the sighted and the
visually impaired, fostering inclusivity and equality in accessing information and the
surrounding world. The ultimate vision is not only to provide object recognition but also to
cultivate a sense of security, enabling individuals with visual impairments to engage more
fully in various activities and situations, ultimately enhancing their quality of life.
Continued advancements and refinements in this technology hold promise for a future
where the visually impaired can navigate the world with increased ease, confidence, and
independence. While the current system marks a significant milestone, ongoing
improvements and broader accessibility will be key in ensuring its widespread adoption and
usefulness to the visually impaired community.

Future Scope:

The future scope for AI and neural networks in object detection is vast. As these
technologies continue to evolve, their integration will become even more integral to a wide
range of applications, leading to increased efficiency, precision, and automation across
industries. Our work in these areas aims to contribute to this exciting and extensive future.

1. Integration with Wearable Devices:
Merging object detection with wearable devices like smart glasses or watches
enables instant real-time object identification. It provides hands-free assistance for
tasks, aids visually impaired individuals, and enhances security. This integration
requires a balance of tech innovation, user-centered design, and ethical
considerations for efficient and private object recognition.
2. Object Identification Customization:
Custom object identification allows users to tailor the system to recognize specific
objects or patterns of interest, offering a more personalized experience. This
customization enhances precision in various applications, empowering users with
tailored solutions for specific industries or tasks.
3. Integration with Voice Assistants: Combining object detection with voice assistants
enables users to interact verbally with the system, acquiring real-time information
about recognized objects through voice commands. This integration streamlines
interaction with the environment, offering a more intuitive way for users to engage
with their surroundings.
4. Expansion of Recognized Object Classes: Expanding recognized object classes
broadens the range of identifiable objects in real-world environments, allowing the
system to detect and categorize a wider array of items. This enhancement fosters
more comprehensive recognition, making applications more inclusive and precise in
real-world scenarios.

6. References:
1. IEEE: https://ieeexplore.ieee.org/document/9342049
2. IEEE: https://ieeexplore.ieee.org/document/8987942
3. https://www.geeksforgeeks.org/region-proposal-object-detection-with-opencv-keras-and-tensorflow/
4. https://keras.io/guides/keras_cv/object_detection_keras_cv/
5. https://www.tensorflow.org/hub/tutorials/object_detection
