
Smart AI Cane

Abstract
According to the WHO report, about 30 million people are permanently blind and 285 million people suffer from vision impairment. Often, they rely on a cane to ease their day-to-day travel. This project aims to ease that travel by creating a more interactive and advanced version of the traditional cane.
Our Smart AI Cane guides the user along the way to their destination by generating haptic feedback and audio output. It detects the proximity of obstacles: when the user is close to an obstacle, it issues an alert message along with haptic feedback. If an object lies at a distance, the cane detects what type of object is in front and conveys information about it to the user as audio. In addition, the stick can extract text from a captured image with the help of a 5MP camera and provide audio output through a 3.5mm audio jack incorporated in the stick.
The motive behind designing this stick is to provide a visual aid to blind users by producing audio responses and giving them a more vivid idea of their surroundings.

I. Introduction
We have developed a “Smart Cane” for visually impaired people using an AIML (Artificial Intelligence and Machine Learning) approach. A handy SBC (Single Board Computer), the Raspberry Pi 4, fitted with a 5MP camera, enables the stick to detect objects, read text, and alert the user according to the surrounding circumstances.

More details about the models we developed and used in this Smart AI Cane:

 OCR (Optical Character Recognition):
Here we use OpenCV (Open Source Computer Vision) as our image-processing library, while pytesseract is used to fetch the text from the given image. Pytesseract is a Python wrapper for the Tesseract OCR engine, originally developed by HP and maintained under Google's sponsorship since 2006. Tesseract contains an LSTM (Long Short-Term Memory) RNN (Recurrent Neural Network) based model that extracts the text present in the image. Pyttsx3 is used to vocalize the text. We have tried to develop a system that recognizes the text and dictates it to the blind person.
For example, if a shop sign or a hoarding is close enough to the person that its text lies within the range of the camera, the stick tries to fetch the text and read it out to the user.
This brings us to the first stage of the OCR pipeline, where the input image is sent for pre-processing.

The Image Pre-processing

Here we have used certain methods imported from OpenCV to pre-process our
image, so that the edges of the text become more visible for our Tesseract Engine.

I. cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
Parameters:
o img - The image whose color space is to be converted.
o cv2.COLOR_RGB2BGR - The color space conversion code.

OpenCV represents images in BGR order (Blue, Green, and Red), so to match OpenCV's expected input we convert the image from RGB to BGR format using the above call. Other .cvtColor conversion codes, such as BGR2GRAY, can be used to change the color format of the image as well.
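
Below is a minimal sketch of this conversion step, assuming the camera frame arrives as an RGB NumPy array; the zero-filled frame and variable names are illustrative only.

import cv2
import numpy as np

# Stand-in for an RGB frame delivered by the camera (illustrative only)
rgb_frame = np.zeros((480, 640, 3), dtype=np.uint8)

bgr = cv2.cvtColor(rgb_frame, cv2.COLOR_RGB2BGR)   # reorder channels for OpenCV
gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)       # grayscale is a common OCR step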

II. cv2.erode(img, kernel, iterations=1)

Parameters:
o img - The image which is to be eroded.
o kernel - A structuring element used for erosion.
o iterations - The number of times erosion is applied.

The cv2.erode() method performs erosion on the image. By eroding the image, we try to keep only those edges in the text that are distinctively visible. Erosion eats away the boundaries of the foreground object and is used to reduce the features of an image.

The kernel here is created with:

kernel = np.ones((2, 1), np.uint8)

Here, np.ones() is a NumPy function that creates a matrix with ‘1’ as all of its elements, using the dimensions and data type given in its parameters.

Working of erosion:

o A kernel (a small matrix, usually of odd size such as 3, 5, or 7) is convolved with the image.
o A pixel in the original image (either 1 or 0) is kept as 1 only if all the pixels under the kernel are 1; otherwise, it is eroded (set to zero).
o Thus, pixels near the boundary are discarded, depending upon the size of the kernel.
o So, the thickness of the foreground object decreases, i.e. the white region in the image shrinks.
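
A minimal sketch of this erosion step, using the (2, 1) kernel shown above; the random image stands in for the grayscale frame produced by the color conversion.

import cv2
import numpy as np

# Stand-in for the grayscale frame from the previous step (illustrative only)
gray = np.random.randint(0, 256, (480, 640), dtype=np.uint8)

kernel = np.ones((2, 1), np.uint8)                 # structuring element of ones
eroded = cv2.erode(gray, kernel, iterations=1)     # shrink the white foreground regions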

III. cv2.dilate(img, kernel, iterations=1)

Parameters:
o img - The image which is to be dilated.
o kernel - A structuring element used for dilation.
o iterations - The number of times dilation is applied.

It is the opposite of cv2.erode(). It increases the object area and is used to accentuate the features.

The kernel is made using the same procedure as in erosion, and we use the same kernel size, since a change in kernel size can drastically affect the image features.

Working of dilation:

o A kernel (a small matrix, usually of odd size such as 3, 5, or 7) is convolved with the image.
o A pixel in the original image becomes ‘1’ if at least one pixel under the kernel is ‘1’.
o This increases the white region in the image, i.e. the size of the foreground object grows.

After this, the pre-processed image is sent to the Tesseract engine, which detects the text and gives us the output, which is then conveyed to the user with the help of the Python text-to-speech library (pyttsx3).
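
A condensed sketch of this OCR pipeline is given below, assuming the captured frame has already been saved to disk; the file name and the error handling are illustrative, not taken verbatim from the project code.

import cv2
import numpy as np
import pytesseract
import pyttsx3

def read_text_aloud(image_path="capture.jpg"):      # hypothetical file name
    img = cv2.imread(image_path)                    # BGR image from disk
    if img is None:
        return                                      # nothing captured
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # single-channel image for OCR
    kernel = np.ones((2, 1), np.uint8)
    gray = cv2.erode(gray, kernel, iterations=1)    # thin the foreground strokes
    gray = cv2.dilate(gray, kernel, iterations=1)   # restore stroke thickness

    text = pytesseract.image_to_string(gray)        # Tesseract LSTM engine
    if text.strip():
        engine = pyttsx3.init()                     # text-to-speech engine
        engine.say(text)
        engine.runAndWait()

read_text_aloud()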

 OBD (Object Detection):
Here OpenCV alone is used to detect random objects coming in the way of the blind person, recognizing each item from our self-made labels. Our OBD detects the items appearing in front of the user and tries to recognize and label them according to our object_label.txt file. These labels are then conveyed to the user by our Python speech generator (pyttsx3).
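
The project description does not name the exact detector, so the sketch below assumes an SSD-style model loaded through OpenCV's DNN module; the model and config file names are placeholders, and only object_label.txt comes from the description above.

import cv2
import pyttsx3

labels = [line.strip() for line in open("object_label.txt")]   # self-made labels
net = cv2.dnn.readNet("obd_model.pb", "obd_config.pbtxt")      # placeholder model files
engine = pyttsx3.init()

def detect_and_announce(frame, conf_threshold=0.5):
    blob = cv2.dnn.blobFromImage(frame, size=(300, 300), swapRB=True)
    net.setInput(blob)
    detections = net.forward()                       # SSD output: (1, 1, N, 7)
    for det in detections[0, 0]:
        class_id, confidence = int(det[1]), det[2]
        if confidence > conf_threshold and class_id < len(labels):
            engine.say(f"{labels[class_id]} ahead")  # announce the detected label
    engine.runAndWait()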

 Obstacle Avoidance System (OBV):
If an object appears to be close to our user, the system generates an emergency alert message saying, “Something in front!! Please Wait !!”, until the object is at a safe distance from the user. It detects objects present in the user's proximity and safeguards the user from hitting an obstacle.

The Obstacle Avoidance System starts working once we determine whether the detected object covers a large area in front of the camera. If the object is very close to the user, it will obviously cover a large area of the frame, and we can trigger the emergency alert.
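
A minimal sketch of this proximity check: if a detected bounding box covers more than some fraction of the frame, the object is treated as too close and the alert is spoken. The 40% threshold is an illustrative value, not taken from the project.

def is_too_close(box, frame_shape, area_fraction=0.4):
    x1, y1, x2, y2 = box                              # bounding box in pixels
    box_area = max(0, x2 - x1) * max(0, y2 - y1)
    frame_area = frame_shape[0] * frame_shape[1]      # height * width
    return box_area / frame_area > area_fraction

# if is_too_close(box, frame.shape):
#     engine.say("Something in front!! Please Wait !!")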

Object Detection and Obstacle Avoidance System (Flowchart)

 GPS (Global Positioning System) Module:
We use a GPS module with the Raspberry Pi; it sends the longitude and latitude of the current position, which is redirected to our site. We have used a NEO-6M GPS module together with an ESP8266 on our Smart Cane, which sends the current position of the stick to the user's relatives so that they can ensure the user's safety.

The GPS module starts working when power is supplied to the NodeMCU ESP8266 + NEO-6M GPS module via the Raspberry Pi 4. This module is responsible for detecting the current longitude and latitude of the stick. After acquiring a fix from the satellites, it starts broadcasting the coordinates of the stick. These coordinates are sent to a ThingSpeak channel via the ThingSpeak API and stored there, to be fetched later by the backend services. The coordinates are then passed to the Google Maps API to display the current location of the stick on the web interface (web application or Android application), which the end user can access to trace the location of the stick.
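
The backend step of fetching the stored coordinates could look roughly like the sketch below, which reads the latest entry from the ThingSpeak channel over its REST API and builds a Google Maps link; the channel ID, read key, and the assumption that field1/field2 hold latitude and longitude are placeholders.

import requests

CHANNEL_ID = "0000000"          # placeholder ThingSpeak channel ID
READ_API_KEY = "YOUR_READ_KEY"  # placeholder read key

def latest_position():
    url = f"https://api.thingspeak.com/channels/{CHANNEL_ID}/feeds.json"
    resp = requests.get(url, params={"api_key": READ_API_KEY, "results": 1}, timeout=10)
    feed = resp.json()["feeds"][-1]
    lat, lon = feed["field1"], feed["field2"]    # assumed field mapping
    return f"https://www.google.com/maps?q={lat},{lon}"

print(latest_position())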

Working of GPS Module (Flowchart)

Additional Features

For added security, we have also created an Emergency SOS Push Button System. When triggered, it sends a location link along with an SOS message, captures 5 images of the scene from the camera, and records about 15 seconds of audio, all of which are sent to the emergency contacts assigned during the initial setup. Haptic feedback is also installed, which gives our cane a more realistic touch.
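
As a rough sketch of the SOS flow on the Raspberry Pi (the GPIO pin number, file names, and the send_to_contacts() delivery step are hypothetical; only the five images and roughly 15 seconds of audio come from the description above):

import subprocess
import cv2
import RPi.GPIO as GPIO

SOS_PIN = 17                                        # hypothetical button pin
GPIO.setmode(GPIO.BCM)
GPIO.setup(SOS_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)

def handle_sos():
    cam = cv2.VideoCapture(0)
    images = []
    for i in range(5):                              # five snapshots of the scene
        ok, frame = cam.read()
        if ok:
            path = f"sos_{i}.jpg"
            cv2.imwrite(path, frame)
            images.append(path)
    cam.release()
    # record roughly 15 seconds of audio with the ALSA arecord utility
    subprocess.run(["arecord", "-d", "15", "-f", "cd", "sos_audio.wav"], check=False)
    # send_to_contacts(images, "sos_audio.wav")     # hypothetical delivery step

GPIO.wait_for_edge(SOS_PIN, GPIO.FALLING)           # block until the button is pressed
handle_sos()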

Experimental Setup

 We have used Python for the OCR, OBD and OBV AI models.
 The Arduino IDE is used for flashing the code onto our NodeMCU ESP8266 and NEO-6M GPS module, using the C/C++ programming language with the ESP8266WiFi and TinyGPS++ libraries.
 A Raspberry Pi 4 is used as the SBC (Single Board Computer) to run all the AI models; it is connected to a Raspberry Pi Camera (5MP), which takes the live image input and feeds it to our models.

II. Related Work


As one can find on the web, the internet is full of obstacle-avoidance canes that use ultrasonic or infrared sensors. These are good at times, but they lack the other features our Smart AI Cane provides: the OCR can read text aloud, the OBD can tell what type of object is appearing in front of the user, and the OBV issues an alert message instead of a plain buzzer. All interactions happen through both the haptic feedback mechanism and audio output, where the Python text-to-speech generator speaks like a normal human being, which makes the cane more human-like. The GPS module helps track the location of the user.

Furthermore, having a CPU inside, the cane can be updated over the internet, as it is a combination of IoT and AIML. If there are software upgrades, they can easily be made available to our users in this age of the internet. Additional AI models can also be added later on, and services like Alexa Voice Assistant and Google Voice Assistant can make the cane more feature-rich: for example, our user can listen to their favourite tunes on the go, call their loved ones by simply asking the assistant to initiate a call, or ask about facts, locations, temperature, weather, and so on.
