RGB-D Camera Based Navigation for the Visually Impaired
Young Hoon Lee
Gérard Medioni
Abstract—We present a wearable navigation system for the visually impaired based on an RGB-D camera. The system is expected to enable the visually impaired to extend the range of their activities compared to conventional aid devices such as the white cane. Since this design succeeds a previous stereo-camera-based system and aims to overcome the limitations of stereo vision, the algorithmic structure of the system is maintained. To extract the orientation of the blind user, we incorporate visual odometry and feature-based metric-topological SLAM (Simultaneous Localization And Mapping) into our system. We build a vicinity map from the dense 3D data obtained from the RGB-D camera, and perform path planning to provide the visually impaired with 3D traversability on the map. The 3D traversability analysis helps subjects steer away from obstacles in their path. A vest-type interface consisting of four micro-vibration motors delivers cues for real-time navigation with obstacle avoidance. Our system operates at 12–15 Hz, and helps the visually impaired improve their mobility in cluttered environments. The results show that indoor navigation performance with the proposed lightweight and affordable RGB-D camera improves over a stereo-vision-based system.
I. INTRODUCTION
Visual impairment hinders a person's essential and routine activities [3]. Furthermore, low vision or complete vision loss severely lowers the functional independence of the visually impaired, and greatly reduces their ability and willingness to travel independently.
About 109,000 people with vision loss in the U.S. used
long canes for navigation and obstacle avoidance purposes
[10]. However, the device has limited functionality in crowded public areas. Most importantly, its relatively short range means there is still a high risk of collision, because users can avoid obstacles only after making contact with them.
To address this problem and improve the mobility of people with vision loss, recent work has proposed replacing the white cane with various types of sensors, including ultrasonic [1, 5] and laser [20] devices.
Recently, a real-time, wearable, stereo-vision-based navigation system for the visually impaired [12] was proposed; it is known to be the first wearable system of its kind. It consists of a head-mounted stereo camera and a vest-type interface device with four tactile feedback effectors. The head-mounted design enables blind users to stand and scan the scene to integrate wide-field view information, whereas waist-mounted or shoulder-mounted systems require body rotation. Furthermore, a head-mounted device matches the frame of reference of the person, allowing relative position commands.
Fig. 1. An RGB-D camera (top left) and a tactile vest interface device (bottom left). The RGB-D camera operates over a USB interface, and the tactile interface system communicates via a Zigbee wireless network.
The system uses a stereo camera for data acquisition and implements real-time SLAM, obstacle avoidance, and path planning algorithms. Based on these algorithms, an appropriate cue is generated at every frame and delivered to the tactile array, alerting the user to the presence of obstacles and guiding the blind user along the generated safe path.
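To make the cue delivery concrete, the following is a minimal sketch of how the heading along a planned safe path might be quantized into one of four tactile cues. The motor layout, angle thresholds, and function names are our assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch (our assumption, not the authors' code) of how the planned
# safe path could be turned into one of four tactile cues for the vest motors.
from enum import Enum

class Cue(Enum):
    GO_STRAIGHT = 0
    TURN_LEFT = 1
    TURN_RIGHT = 2
    STOP = 3        # no safe path available: proximity alert

def generate_cue(heading_deg, path_available):
    """Quantize the heading of the next waypoint into a tactile cue.

    heading_deg: angle of the waypoint relative to the camera axis
                 (0 = straight ahead, positive = to the right).
    The +/-15 degree dead-zone is an illustrative threshold.
    """
    if not path_available:
        return Cue.STOP
    if heading_deg < -15.0:
        return Cue.TURN_LEFT
    if heading_deg > 15.0:
        return Cue.TURN_RIGHT
    return Cue.GO_STRAIGHT
```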
The authors carried out experiments to evaluate the mobility performance of blindfolded and truly blind users, designed to compare the effectiveness of tactile cuing in navigation against the widely used white cane. The results indicate that the tactile vest system is more efficient than the white cane at alerting blind users to the presence of obstacles and helping them avoid collisions. Navigation performance was validated by showing that the trajectories generated with the proposed navigation system are the closest to the ideal trajectories produced by sighted subjects.
The main limitation of the system stems from inherent shortcomings of stereo vision. For example, depth maps extracted by stereo camera systems in low-textured environments, such as white walls, are not accurate enough to build a reliable traversability map.
Fig. 2. System overview. Visual odometry: FAST corner detection, KLT tracking, and RANSAC motion estimation. SLAM: 3D depth information and a Rao-Blackwellized particle filter. Path planning: traversability map update and safe path generation. Interface: if a safe path is available, a cue is generated and sent over Zigbee communication; otherwise the proximity alert mode is triggered.
Fig. 3. Extracted FAST corners are shown as green dots. KLT tracking estimates the optical flow, and the estimated camera motion is represented by red bars. Four consecutive image frames are shown as a subject made a left turn and then moved forward in a corridor; the numbers in each figure indicate the order of the scenes. In the top-left figure, the red bars point left, indicating that the subject is making a left turn. When the subject has finished the turn and starts moving forward, the red bars point toward the front, as shown in the bottom-right figure.
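As a concrete illustration of this front-end, the sketch below detects FAST corners, tracks them with pyramidal KLT, and estimates frame-to-frame motion with RANSAC using OpenCV. Note this is a monocular stand-in: the actual system estimates motion from the RGB-D depth data, whereas this sketch uses the essential matrix, and the threshold values are assumptions.

```python
# Minimal sketch of a FAST + KLT + RANSAC visual-odometry front-end (OpenCV
# stand-in for the system's pipeline; the real system also uses 3D depth).
import cv2
import numpy as np

def track_and_estimate(prev_gray, gray, K):
    """Detect FAST corners in the previous frame, track them with KLT,
    and estimate the relative camera motion with RANSAC."""
    fast = cv2.FastFeatureDetector_create(threshold=25)  # threshold assumed
    kps = fast.detect(prev_gray, None)
    p0 = cv2.KeyPoint_convert(kps).reshape(-1, 1, 2).astype(np.float32)

    # Pyramidal Lucas-Kanade optical flow [6]
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
    good0, good1 = p0[status == 1], p1[status == 1]

    # RANSAC [2] rejects outlier tracks while fitting the motion model
    E, inliers = cv2.findEssentialMat(good0, good1, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, good0, good1, K, mask=inliers)
    return R, t  # rotation and (unit-scale) translation between frames
```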
B. Metric-Topological SLAM
Consistent and accurate 3D localization of the camera is important for robust path planning. SLAM algorithms prevent the accumulation of errors by continuously estimating and correcting the camera location based on known landmarks, thereby maintaining an accurate position estimate. However, since we expect to navigate large areas, the number of landmarks to be observed and registered would be very large, which would slow down the SLAM process. Hence, we adopt the metric-topological SLAM approach described in [11]. The approach describes the environment at two levels, metric and topological. The metric (local) level estimates the six-dimensional camera position and orientation s_t and a sparse map m_t, given feature observations z^t (such as KLT, SIFT, or DAISY features) and camera motion estimates u^t up to frame t. This can be represented using a standard Rao-Blackwellized FastSLAM framework [7, 8, 9] as follows.
p(s^t, m_t | z^t, u^t) = p(s^t | z^t, u^t) p(m_t | s^t, z^t)    (2)
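The sketch below is a schematic Rao-Blackwellized particle filter step illustrating this factorization: particles sample the camera trajectory s^t, while each particle's map m_t is maintained conditioned on that sampled trajectory. The motion model, map update, and likelihood functions are placeholders, not the authors' implementation.

```python
# Schematic Rao-Blackwellized particle filter step illustrating Eq. (2).
import copy
import numpy as np

class Particle:
    def __init__(self, pose, landmarks):
        self.pose = pose            # hypothesis of the 6-DOF camera pose
        self.landmarks = landmarks  # per-particle sparse map
        self.weight = 1.0

def rbpf_step(particles, u_t, z_t, sample_motion, update_map, likelihood):
    # 1. Propagate each trajectory sample with the motion estimate u_t
    #    (here, the visual-odometry output).
    for p in particles:
        p.pose = sample_motion(p.pose, u_t)
        # 2. Update the map analytically, conditioned on the sampled
        #    trajectory: the p(m_t | s^t, z^t) factor of Eq. (2).
        update_map(p.landmarks, p.pose, z_t)
        # 3. Reweight by the observation likelihood of z_t.
        p.weight *= likelihood(z_t, p.pose, p.landmarks)

    # 4. Normalize weights and resample to concentrate particles on
    #    likely trajectories.
    w = np.array([p.weight for p in particles])
    w /= w.sum()
    idx = np.random.choice(len(particles), size=len(particles), p=w)
    new_particles = [copy.deepcopy(particles[i]) for i in idx]
    for p in new_particles:
        p.weight = 1.0
    return new_particles
```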
Fig. 5. Free space (green areas), occupied space (red areas), the current position of the blind user (yellow dot), the way point (purple dot), the generated shortest path (white line), and the search region (inside the blue triangle).
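To illustrate the path generation shown in Fig. 5, the sketch below runs a breadth-first shortest-path search over a 2D traversability grid from the current position to the way point. This is a stand-in: the excerpt does not specify the exact planner (a fast replanning algorithm such as that of [4] would be a natural alternative), and the grid encoding is assumed.

```python
# Illustrative shortest-path search on a traversability grid (a stand-in for
# the system's planner). Cells: 0 = free (green), 1 = occupied (red).
from collections import deque

def shortest_path(grid, start, goal):
    """BFS from the current position to the way point; returns the list of
    grid cells on a shortest safe path, or None if no safe path exists."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk parent links back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None  # no safe path -> trigger the proximity alert mode
```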
III. RESULTS
We present results of the navigation system using the PrimeSensor in an indoor environment. Our navigation system runs at 12–15 Hz on the following configuration:
CPU: Intel(R) Xeon Quad Core @ 3.72 GHz
RAM: 3.00 GB
OS: Windows XP (32-bit)
Fig. 6. Visual odometry and SLAM results. Left: RGB-D camera image with extracted feature points. Center: Traversability map with the path-finding algorithm; the white bar shows the safe path to the way point. Right: Calculated cue based on the traversability map (go straight).
Fig. 7. Disparity map experiments in low-textured areas. Left: RGB camera images. Center: Disparity maps generated by a stereo camera. Right: Disparity maps generated by an RGB-D camera.
B. Traversability
The goal of the system is to guide blind users to a destination without collisions with obstacles along the way. To verify the performance of the navigation system, experiments were conducted on a simple navigation task: a subject starts from one end of a corridor and aims to reach the other end. As shown in Fig. 8 and Fig. 9, the experimental environment has solid walls and reflective floors, which are quite common in indoor environments. Both the RGB-D camera and stereo camera experiments involved about 350 frames. Fig. 8 suggests that the 3D depth map from the stereo camera is inaccurate and results in an inconsistent traversability map of the experiment area. From the second row of Fig. 8 onward, errors in the traversability map accumulate, and this spurious map information eventually causes navigation to fail.
Fig. 8. Four snapshots of the traversability map of the system with the stereo camera. Navigation fails.
IV. CONCLUSION AND FUTURE WORK
We have presented an integrated system using an RGB-D camera to improve, in low-textured environments, the navigation performance of the stereo-camera-based system proposed in [12].
Fig. 9. Snapshots of the traversability map of the system with the RGB-D camera. Dense depth information improves navigation and mapping.
REFERENCES
[1] J. Borenstein and I. Ulrich. The GuideCane - a computerized travel aid for the active guidance of blind pedestrians. In IEEE Int. Conf. on Robotics and Automation, pages 1283–1288, 1997.
[2] M. Fischler and R. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
[3] R. G. Golledge, J. R. Marston, and C. M. Costanzo. Attitudes of visually impaired persons towards the use of public transportation. Journal of Visual Impairment & Blindness, 91(5):446–459, 1997.
[4] S. Koenig and M. Likhachev. Fast replanning for navigation in unknown terrain. IEEE Transactions on Robotics, 21(3):354–363, 2005.
[5] B. Laurent and T. N. A. Christian. A sonar system modeled after spatial hearing and echolocating bats for blind mobility aid. Int. Journal of Physical Sciences, 2(4):104–111, 2007.
[6] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Int. Joint Conf. on Artificial Intelligence, pages 674–679, 1981.
[7] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proc. of the AAAI National Conf. on Artificial Intelligence, pages 1151–1156, 2002.
[8] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM 2.0: An improved particle filtering algorithm for simultaneous localization and mapping that provably converges. In Int. Joint Conf. on Artificial Intelligence, pages 1151–1156, 2003.
[9] K. Murphy. Bayesian map learning in dynamic environments. In Neural Information Processing Systems, pages 1015–1021, 1999.
[10] JVIB News Service. Demographics update: Use of white long canes. Journal of Visual Impairment & Blindness, 88(1):4–5, 1994.
[11] V. Pradeep, G. Medioni, and J. Weiland. Visual loop closing using multi-resolution SIFT grids in metric-topological SLAM. In IEEE Conf. on Computer Vision and Pattern Recognition, pages 1438–1445, 2009.
[12] V. Pradeep, G. Medioni, and J. Weiland. Robot vision for the visually impaired. In Computer Vision Applications for the Visually Impaired, pages 15–22, 2010.
[13] D. Qiu, S. May, and A. Nüchter. GPU-accelerated nearest neighbor search for 3D registration. In Proc. of the 7th Int. Conf. on Computer Vision Systems, pages 194–203, 2009.
[14] E. Rosten and T. Drummond. Fusing points and lines for high performance tracking. In IEEE Int. Conf. on Computer Vision, pages 1508–1511, 2005.
[15] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. In European Conf. on Computer Vision, pages 430–443, 2006.
[16]
[17]
[18]
[19]
[20]