Sem 5 Proj Report Group
PROJECT REPORT
ON
OBJECT DETECTION
Submitted in partial fulfilment of the requirements
of the degree of
Bachelor of Engineering
In
Computer Engineering
by
KRUTIKA BHIDE
(Roll No. 04)
SANJANA BHOSLE
(Roll No. 09)
GAYATRI HADKAR
(Roll No. 26)
Supervisor(s):
Prof. Nilima Patil
CERTIFICATE
This is to certify that the project entitled “Object Detection” is a bonafide work of Krutika Bhide (Roll No. 04), Sanjana Bhosle (Roll No. 09), and Gayatri Hadkar (Roll No. 26).

Prof. Nilima Patil                                          Dr. Vilas Nitnaware
Guide                        Head of Department             Principal
Project Report Approval for T.E.
Examiners
1.---------------------------------------------
2.---------------------------------------------
Date:
Place:
DECLARATION
We declare that this written submission represents our ideas in our own
words and where others' ideas or words have been included, we have
adequately cited and referenced the original sources. We also declare that
we have adhered to all principles of academic honesty and integrity and
have not misrepresented or fabricated or falsified any
idea/data/fact/source in our submission. We understand that any
violation of the above will be cause for disciplinary action by the Institute
and can also evoke penal action from the sources which have thus not
been properly cited or from whom proper permission has not been taken
when needed.
_________________________
(Signature)
_______________________
Date:
ACKNOWLEDGEMENT
We would like to express our sincere gratitude to our guide, Prof. Nilima Patil, and our Project Coordinator for giving us the opportunity to work on this project on OBJECT DETECTION, which also involved a great deal of research and introduced us to many new things. We are very grateful to our Head of the Department, Mahesh Maurya, for extending help directly and indirectly through various channels during our project work. We would also like to thank our Principal, Dr. Vilas Nitnaware, for providing us the opportunity to implement our project. Finally, we thank our parents and friends, who helped us greatly in finalizing this project within the limited time frame.
Thanking You.
TABLE OF CONTENTS
Certificate ..............................................................................................i
Declaration …….......................……………………………………….iii
Acknowledgement ……………………………………..…..................iv
Abstract ....…………………………………………………………….viii
1. Introduction …………………………………………………………………1
2. Literature Survey…………………………………………………………………..
3. Proposed Work
3.1 Requirement Analysis ……………………………………………….
3.1.1 Scope ……………………………………………….
3.1.2 Feasibility Study ………………………………………
3.1.3 Hardware & Software Requirement …………………………
3.2 Problem Statement ………………………………………………………..
3.3 Project Processing……………………………………………………………...
3.4 Methodology………………………………………………………………
4. Results…………………………………………………………………………..
5. Conclusion and Future Scope…………………………………………………..
References ……………………………………………………………....
LIST OF FIGURES
1. Project Processing
ABSTRACT
Introducing the "Personalized Object Detection Model for Visually Impaired Individuals" - a
specialized tool crafted exclusively for individuals with visual impairments. This innovative
model harnesses the power of advanced technologies to provide tailored assistance.
The model recognizes not only generic objects but also the specific objects that matter most to the user. Through a webcam feed, the program rapidly identifies objects in view, resizing each frame to the dimensions the model expects, and announces every recognized object through immediate spoken feedback, offering enhanced confidence and independence in daily activities. The program combines the OpenCV, NumPy, TensorFlow, Keras, and pyttsx3 libraries to process video frames seamlessly: OpenCV handles webcam capture, NumPy performs the numerical operations needed to prepare each frame, TensorFlow and Keras run a personalized deep learning model that recognizes the objects important to the user, and pyttsx3 delivers customized audio feedback with instant, tailored information about the surroundings. This solution represents a significant stride in assistive technology, empowering visually impaired individuals with a new level of autonomy. By focusing on the objects that matter most to each user, it facilitates easier navigation and more effective interaction with the environment.
1. Introduction
Living with visual impairment presents unique challenges, particularly when it comes to
perceiving and navigating the environment. Recognizing objects, a task often taken for
granted, can be a crucial aspect of daily independence. Object detection technology, rooted
in cutting-edge computer vision and machine learning, offers a transformative solution. It
enables real-time identification of objects through visual input, allowing individuals with
visual impairments to receive immediate feedback about their surroundings. This feedback,
often delivered through auditory or tactile means, bridges the gap between the visual world
and non-visual perception. This technology serves as a vital link, providing individuals with
visual impairments the information they need to navigate their surroundings with
confidence and accuracy.
Furthermore, object detection systems play a pivotal role in enhancing safety for visually
impaired individuals. By alerting them to the presence of obstacles or objects in their path,
these systems mitigate potential hazards and facilitate safer navigation. This heightened
level of awareness not only increases physical safety but also contributes to a greater sense
of overall well-being and independence.
Additionally, the accessibility provided by object detection technology extends beyond
physical safety. It opens up new avenues for educational and vocational opportunities. With
the ability to independently identify and interact with their environment, visually impaired
individuals can more actively participate in learning and work environments, leveling the
playing field and empowering them to pursue their goals with greater autonomy.
In summary, object detection technology is a powerful tool that transcends the boundaries
imposed by visual impairment. By providing real-time feedback and enhancing safety, it
revolutionizes the way visually impaired individuals interact with their surroundings. This
technology is not only about increasing accessibility; it's about fostering confidence,
independence, and a brighter outlook on life for those with visual impairments.
2. Literature Survey
• TensorFlow is a popular open-source machine learning framework developed by
Google. It provides a comprehensive ecosystem of tools, libraries, and community
resources for building and deploying machine learning models.
• Keras, on the other hand, is a high-level neural networks API that runs on top of
TensorFlow (and other backends). It's known for its simplicity and ease of use,
allowing developers to quickly prototype and build neural network models.
• MobileNet, specifically MobileNetV2, is a lightweight deep learning model designed
for mobile and edge devices with resource constraints. It's known for its efficiency
and compact architecture while maintaining good performance in image
classification tasks.
3. Proposed Work
3.1 Requirement Analysis
3.1.1 Scope
The project aims to develop an object detection system using Keras tailored to assist visually
impaired individuals in navigating their surroundings independently. The scope
encompasses creating a real-time object recognition model capable of accurately identifying
diverse objects. Integrating Keras-based deep learning with auditory feedback mechanisms
is pivotal, translating visual information into accessible forms for the visually impaired. The
project also involves designing an intuitive user interface and optimizing the model for
efficiency, ensuring it can run on various devices. Continuous refinement, ethical testing,
and collaboration with the visually impaired community are integral to the project.
Documentation and dissemination of the findings will be crucial to encourage further
research and facilitate broader accessibility in assisting the visually impaired.
3.1.2 Feasibility Study
A feasibility study for an object detection project aimed at assisting visually impaired individuals using Keras involves a comprehensive assessment of several key factors:
3.1.3 Hardware & Software Requirements
Hardware Requirements:
1. Processor (CPU): A multi-core CPU (such as an Intel i7 or i9, or an AMD Ryzen series processor) is beneficial for faster training of complex models.
2. Memory (RAM): At least 16 GB of RAM is recommended, especially for training larger models and handling sizable datasets.
3. Storage: A solid-state drive (SSD) offers faster read/write speeds than a traditional hard disk drive (HDD), aiding quicker data access and model training.
Software Requirements:
Python 3 with the OpenCV, NumPy, TensorFlow, Keras, and pyttsx3 libraries.
3.2 Problem Statement
Vision is critical in our daily lives. However, according to the World Health Organization (2019), more than 2.2 billion people worldwide were living with a visual impairment. Unlike people with normal vision, they cannot see the objects around them and face challenges in detecting obstacles while navigating. Although several aids are available to help them navigate, such as white canes and more advanced technologies, they still encounter problems when accessing or using these tools. The challenge lay in creating an object detection system that
could accurately identify and locate multiple objects within complex scenes, handle various
object sizes and orientations, and operate in real-time. The project sought to address these
challenges using cutting-edge machine learning models and innovative implementation
techniques. Navigating the environment poses a considerable challenge for visually
impaired individuals due to their inability to visually recognize and interpret objects in their
surroundings. To address this, the project endeavors to develop an object detection system
using Keras, aiming to provide real-time assistance to the visually impaired. The absence of
immediate object recognition significantly hinders their mobility and independence,
emphasizing the need for an innovative solution. Leveraging deep learning methodologies
through Keras, this project seeks to create a system that can accurately identify a wide array
of objects in real time, subsequently conveying this information to the user through
auditory feedback. This system's primary goal is to bridge the gap between the visual world
and those with visual impairments, empowering them with crucial information about their
environment and enhancing their ability to navigate safely and independently.
3.3 Project Processing:
The program uses a camera for real-time detection of objects personalized to visually impaired users. It sets up the necessary tools for processing visual information, performing numerical operations, and producing auditory feedback. The program captures video
frames, processes them through OpenCV, applies numerical operations using NumPy,
utilizes a pre-trained deep learning model for object detection through TensorFlow and
Keras, and finally, provides both visual and auditory feedback about the detected objects
using pyttsx3. This collaborative effort creates a system that continuously processes
webcam frames, detects objects, and provides feedback to the user.
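The continuous capture-process-announce loop described above might be sketched as follows. The class labels, the model filename, and the loop details are illustrative assumptions, not taken from the report:

```python
import numpy as np

# Hypothetical class labels for the personalized model (illustrative only).
LABELS = ["Airpods", "Bottle", "Keys"]

def normalize(frame):
    """Scale 8-bit pixel values (0..255) to the [-1, 1] range expected by
    MobileNet-style Keras models."""
    return frame.astype(np.float32) / 127.5 - 1.0

def main():
    # Deferred imports: only needed when actually running the webcam loop.
    import cv2
    import pyttsx3
    from tensorflow.keras.models import load_model

    model = load_model("personal_objects.h5")  # hypothetical model file name
    engine = pyttsx3.init()
    cap = cv2.VideoCapture(0)  # default webcam
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            resized = cv2.resize(frame, (224, 224))
            batch = normalize(resized)[np.newaxis, ...]  # shape (1, 224, 224, 3)
            probs = model.predict(batch, verbose=0)[0]
            label = LABELS[int(np.argmax(probs))]
            # Visual feedback: draw the label on the frame.
            cv2.putText(frame, label, (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
            cv2.imshow("Object detection", frame)
            # Auditory feedback for the detected object.
            engine.say(label)
            engine.runAndWait()
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()

# To start the live loop, call main() with a webcam attached.
```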
OpenCV is a comprehensive computer vision library that provides tools for a wide range of
tasks related to understanding and processing images and videos. In this context, it's used to
handle the webcam feed. It initializes the camera, captures individual frames, and allows for
resizing them. Resizing is important because the deep learning model expects images of
specific dimensions for processing.
NumPy is a fundamental library for numerical operations in Python. It provides support for
working with arrays and matrices, making it indispensable for handling image data. Within
this code, NumPy is crucial for converting images to numerical arrays. This conversion allows
for efficient numerical operations, such as normalization, which is essential for preparing
the images for processing by the deep learning model. Additionally, NumPy helps create an
empty array with the right shape to serve as input for the model. This is where the
processed image data will be stored before prediction.
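This preparation step can be sketched with NumPy alone; the 224×224 input size is an assumption based on common MobileNet-style models, not stated in the report:

```python
import numpy as np

def to_model_input(image, size=224):
    """Place an already-resized size x size x 3 image into a batch array of
    shape (1, size, size, 3) and normalize pixel values to [-1, 1]."""
    # Empty array with the shape the model expects, as described above.
    data = np.ndarray(shape=(1, size, size, 3), dtype=np.float32)
    data[0] = image.astype(np.float32) / 127.5 - 1.0
    return data
```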
Predicting the image class:
Each preprocessed frame is passed to the Keras model, which returns a probability for every known class; the label with the highest probability is taken as the detected object.
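Selecting the final label from the model's output vector might look like the sketch below; the confidence threshold is an added assumption, not specified in the report:

```python
import numpy as np

def predict_label(probs, labels, threshold=0.5):
    """Return the highest-probability label, or None if the model's best
    guess is below the confidence threshold."""
    idx = int(np.argmax(probs))
    if probs[idx] < threshold:
        return None  # too uncertain to announce anything
    return labels[idx]
```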
Audio Output:
Pyttsx3 is used to provide auditory feedback about the detected objects. Once an object is
identified, the program uses pyttsx3 to convert the text label (e.g., "Airpods") into spoken
words. This auditory feedback speaks aloud the visual information shown on the screen.
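A small wrapper around pyttsx3 could look like the following sketch; announcing a label only when it changes (so the same object is not repeated on every frame) is a design choice added here, not described in the report:

```python
class SpokenFeedback:
    """Speak a detected object's label, announcing it only when it changes."""

    def __init__(self):
        self.last_label = None

    def should_announce(self, label):
        """Pure decision logic: announce only new, non-empty labels."""
        if label is None or label == self.last_label:
            return False
        self.last_label = label
        return True

    def announce(self, label):
        if not self.should_announce(label):
            return
        # Deferred import: pyttsx3 needs a platform TTS backend at runtime.
        import pyttsx3
        engine = pyttsx3.init()
        engine.say(f"{label} detected")
        engine.runAndWait()
```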
3.4 Methodology
• Data Collection and Preparation: Gather a diverse dataset comprising various object
classes, ensuring it covers objects significant for the visually impaired. Preprocess
and annotate the data for model training.
• Model Selection and Training:
▪ Select a suitable pre-trained model architecture from Keras or design a custom
model architecture. Ensure it is optimized for real-time object detection and
capable of integration with auditory or haptic feedback systems.
▪ Augment the dataset to increase its diversity and size. Train the selected or
custom-built model using the prepared dataset. Fine-tune the model for accurate
object recognition.
• Integration of Text-to-Speech (TTS) Module: Implement mechanisms to convert
visual information into auditory feedback. This involves integrating the model
outputs with text-to-speech notification systems for user accessibility.
• Real-Time Object Detection: Real-time object detection involves instantly
recognizing and identifying objects within a continuous stream of data, such as live
video. It utilizes algorithms and models to swiftly process frames, analyzing and
identifying various objects as they appear, with minimal delay, allowing immediate
responses or notifications. This rapid identification is crucial in providing timely
information, particularly in dynamic or time-sensitive environments, making it highly
relevant for applications aimed at aiding visually impaired individuals.
• Audio Feedback Generation: Audio feedback generation is the process of creating
sound-based information or notifications in response to certain events or actions. It
involves generating auditory cues or alerts to convey information, provide guidance,
or signal changes in a system. For visually impaired individuals, audio feedback can
be used to describe surroundings, convey important notifications, or assist in
navigation by converting visual data into spoken information. This audio feedback
aims to enhance accessibility and understanding of the environment for those who
rely on auditory cues due to visual impairment.
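The model selection and training step above, using MobileNetV2 from the literature survey as the pre-trained base, could be sketched as follows; the classification head, optimizer, and frozen-base choice are assumptions rather than details from the report:

```python
import numpy as np

def one_hot(index, num_classes):
    """Encode a class index as a one-hot training label."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[index] = 1.0
    return vec

def build_finetune_model(num_classes, input_shape=(224, 224, 3)):
    """Stack a small classification head on a frozen MobileNetV2 base."""
    # Deferred imports: only needed when TensorFlow is installed.
    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import MobileNetV2

    base = MobileNetV2(input_shape=input_shape, include_top=False,
                       weights="imagenet")
    base.trainable = False  # keep pretrained features; fine-tune later if needed
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The returned model would then be trained with `model.fit` on the annotated, augmented dataset described above.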
4. Results:
5. Conclusion and Future Scope:
Conclusion:
Developing an object detection system for visually impaired individuals is not just a
technological endeavor; it is a crucial step toward enhancing accessibility and
independence. By employing cutting-edge technology, this project aims to provide a
sophisticated tool that empowers those with visual impairments to navigate their
surroundings more confidently and safely.
The integration of computer vision and machine learning algorithms enables the system to
identify and interpret objects in real-time, translating visual information into auditory
feedback. This groundbreaking technology has the potential to revolutionize the daily lives
of visually impaired individuals, granting them greater autonomy and freedom in their
interactions with the environment.
Through this project, the overarching goal is to bridge the gap between the sighted and the
visually impaired, fostering inclusivity and equality in accessing information and the
surrounding world. The ultimate vision is not only to provide object recognition but also to
cultivate a sense of security, enabling individuals with visual impairments to engage more
fully in various activities and situations, ultimately enhancing their quality of life.
Continued advancements and refinements in this technology hold promise for a future
where the visually impaired can navigate the world with increased ease, confidence, and
independence. While the current system marks a significant milestone, ongoing
improvements and broader accessibility will be key in ensuring its widespread adoption and
usefulness to the visually impaired community.
Future Scope:
The future scope for AI and neural networks in object detection is vast. As these
technologies continue to evolve, their integration will become even more integral to a wide
range of applications, leading to increased efficiency, precision, and automation across
industries. Our work in these areas aims to contribute to this exciting and extensive future.
6. References:
1. IEEE Paper:
https://ieeexplore.ieee.org/document/9342049
https://ieeexplore.ieee.org/document/8987942
2. https://www.geeksforgeeks.org/region-proposal-object-detection-with-opencv-keras-and-tensorflow/
3. https://keras.io/guides/keras_cv/object_detection_keras_cv/
4. https://www.tensorflow.org/hub/tutorials/object_detection