
Low-Cost FPGA-Based System for Pothole Detection on Indian Roads

M.Tech Project Report – Stage I


Submitted in partial fulfillment of the requirements for the degree
of
Master of Technology

by
Shonil Vijay
Roll No: 05329001

under the guidance of


Dr. Kavi Arya

Kanwal Rekhi School of Information Technology


Indian Institute of Technology, Bombay
July 2006
Acknowledgments
I express my deep sense of gratitude to Dr. Kavi Arya for his invaluable help and
guidance during the first stage of this project. I am highly indebted to him for
constantly encouraging me with his critiques of my work, and I am grateful for his
support and confidence.

Shonil Vijay
July 2006,
Indian Institute of Technology, Bombay

Abstract
The objective of this project is to develop a low-cost, vision-based driver assistance
system on an FPGA that provides a solution for the detection and avoidance of potholes
in the path of a vehicle on the road. The project will deploy image processing
algorithms on an FPGA so that results can be produced in real time.

Pothole avoidance may be considered similar to other obstacle avoidance, except that
potholes are depressions rather than extrusions from a surface. A vision approach is
used since potholes are visually distinct from the background surface.

Detecting defects in the road surface is a prerequisite for locating a pothole. The
system as a whole is intended to be far more efficient than common methods such as
manual inspection, and almost as accurate.

Contents
1. Introduction
   1.1 Motivation
   1.2 Problem Statement
2. Design Choices
   2.1 PC Digital Signal Processing Programs
   2.2 Application Specific Integrated Circuits
   2.3 Dedicated Digital Signal Processors
   2.4 Field Programmable Gate Arrays
3. Obstacle Detection Methods
   3.1 Off-Road Obstacle Detection
   3.2 Optical Flow
   3.3 Stereo Vision
   3.4 RADAR (Radio Detection and Ranging)
   3.5 LADAR (Laser Detection and Ranging)
   3.6 Other Vision-Based Detection Modalities
4. DSP (Digital Signal Processing) Algorithms
   4.1 Introduction to Windowing Operators
   4.2 Rank Order Filter
   4.3 Morphological Operators
   4.4 Convolution
5. Summary and Future Work
6. References

1. Introduction
1.1 Motivation
Interest in Collision Avoidance Systems comes from the problems caused by traffic
congestion worldwide and a synergy of new information technologies for simulation,
real-time control and communications networks. Traffic congestion has been increasing
world-wide as a result of increased motorization, urbanization, population growth and
changes in population density. Congestion reduces utilization of the transportation
infrastructure and increases travel time, air pollution, fuel consumption and most
importantly traffic accidents.

Worldwide, around 1.2 million people died as a result of road traffic accidents in
2002, an average of 3,242 deaths each day. In addition to these deaths, around 50
million people globally are estimated to be injured or disabled every year. Projections
indicate that these figures will increase by about 65% over the next 20 years. Road
accidents are currently the world's eleventh leading cause of death; by 2020 they are
projected to become the third, behind deaths linked to heart disease and mental
illness. In the United States alone, around 6.2 million traffic accidents occurred in
2004, accounting for 42,636 deaths and 2,788,000 nonfatal injuries. Of these, 31,693
occupants of passenger cars were killed and an additional 2,543,000 were injured,
accounting for 85% of all fatalities and 95% of all injuries. In 2000, the economic
cost of motor vehicle crashes was $230.6 billion.

While different factors contribute to vehicle crashes, driver behavior is considered
to be the leading cause in around 95% of all accidents. More recent data has identified
inattention, including distraction, “looked but did not see,” and falling asleep at the
wheel, as the primary cause of accidents, accounting for at least 25% of crashes.
Driver behavior can be improved either by alerting the driver to a probable collision
or by controlling the vehicle itself. Car companies are already introducing incremental
improvements (although largely in luxury models) such as collision avoidance, GPS
navigation, roadway monitoring, wireless communications, and Lane Departure Warning
Systems. The most visionary proposal is the “driverless vehicle” which, armed with GPS
(global positioning system), anti-collision radar, and the like, is driven by an
on-board computer linked to a traffic control center and thus automatically takes the
best route to the chosen destination.

At present, commercially available traffic-detection equipment includes loop
detectors, pressure sensors, infrared-, radar-, ladar- or ultrasound-based sensors, and
video cameras. Although loop detectors are cheap to manufacture, their installation and
repair are very expensive because they involve digging and re-surfacing the road, which
is labor intensive, time consuming, and disruptive to traffic. Pressure-based traffic
sensors have the same problem. Infrared, radar, and ultrasound sensors, on the other
hand, are more expensive to make; the use of these active devices in urban areas may
have safety and other regulatory implications, and their functionality can be affected
by bad weather. Video cameras have long been deployed at key traffic bottlenecks, but
the effectiveness of this type of sensor can also be affected by bad weather, and the
images have to be processed further to obtain useful data. A number of on-going
research projects aim to extract objective traffic-condition indicators from video
images, but the huge volume of data involved dictates that such systems require
substantial computation power and, consequently, will not be low-cost devices for a
considerable period of time. Laser-beam or radar based systems add $1,500 to $3,000 to
the cost of a car and are therefore used in luxury cars only. Finally, GPS-based
systems have very high operational costs, as they need to obtain the precise positions
of the vehicles every few milliseconds.

1.2 Problem Statement


An examination of automobile accident statistics in India reveals that, despite the
declining number of fatalities, the number of accidents resulting in injury is steadily
increasing. The major cause of traffic accidents is improper driving due to driver
inattention and fatigue. Another major contributor to these mishaps is the increasing
rate of defects in the roads themselves. These defects result from the increasing
amount of traffic and from climatic conditions, neither of which can be changed.

Fig 1: Example defects. From left: cracking, pop-outs, wear and polishing, and
potholes.
The different types of defects can be briefly described as:
• Cracking – areas where the road surface has been split apart; classified as
longitudinal, transverse, and crocodile or combination cracking.
• Pop-outs – areas where pressure from below has forced the surface of the road to
be raised.
• Wear and polishing – areas of smooth black tar occurring on asphalt surfaces.
• Potholes – areas where there is an absence of asphalt, i.e. a hole. Potholes usually
emerge from areas where there has been severe cracking.

Of these defect types, the one of particular interest here, and the problem at hand,
is the pothole. Early detection of cracks in road surfaces allows maintenance to be
performed before the cracks develop into more serious problems, such as potholes and
pop-outs.

This research discusses a solution for the automatic detection and avoidance of such
potholes in the path of an autonomous vehicle operating in an unstructured environment.
Pothole avoidance may be considered similar to other obstacle avoidance except that the
potholes are depressions rather than extrusions from a surface.

A vision approach is used since the potholes are visually distinct from the background
surface. Vision-based research can be categorized into three classes according to the
range of operating wavelength, namely the visible spectrum region (0.38–0.8 µm), the
Near Infra-Red (NIR) region (0.8–1.2 µm), and the Far Infra-Red (FIR) region
(7–14 µm).

However, humans cannot perceive wavelengths outside the visible spectrum; NIR and FIR
images can only be viewed through sensors capable of detecting the energy radiated in
those bands of the electromagnetic spectrum. For cost and performance reasons, a CCD
camera operating in the visible band is taken as the sensing device for this system.

Recently, Field Programmable Gate Array (FPGA) technology has become a viable
target for the implementation of algorithms suited to video image processing applications.
The unique architecture of the FPGA has allowed the technology to be used in many such
applications encompassing all aspects of video image processing.

As image sizes and bit depths grow larger, software has become less capable of meeting
the demands of real-time video processing. Real-time systems such as those targeted by
this project require processing speeds that software alone struggles to reach. DSP
systems are being employed to selectively reduce the amount of data to process,
ensuring that only relevant data is passed on to a human user. Eventually, it is
expected that most video processing can and will take place in DSP systems, with
little human interaction; this is advantageous, since human observers are not entirely
accurate.

2. Design Choices
There are several different choices a designer has when implementing a DSP system
of any sort. Hardware, of course, offers much greater speed than a software
implementation, but one must consider the increase in development time inherent in
creating a hardware design. Most software designers are familiar with C, but in order to
develop a hardware system, one must either learn a hardware design language such as
VHDL or Verilog, or use a software-to-hardware conversion scheme, such as Streams-C,
which converts C code to VHDL, or MATCH, which converts MATLAB code to VHDL.
While the goals of such conversion schemes are admirable, they are still in
development and not yet suited to high-speed applications such as video processing.
Ptolemy is a system that allows modeling, design, and simulation of embedded systems,
and provides software synthesis from models. While this type of system may become a
dominant design platform in the future, it is still under heavy development, meaning
that it may not be a viable design choice for some time. A discussion of the various
viable options for DSP system design follows.

2.1 PC Digital Signal Processing Programs


Signal processing programs used on a PC allow for rapid development of algorithms,
as well as equally rapid debug and test capabilities. It is common for many hardware
designers to use some sort of PC programming environment to implement a design to
verify functionality prior to a lengthy hardware design.

MATLAB is such an environment. Although it was created for manipulating
matrices in general, it is well suited to some image processing applications. MATLAB
treats an image as a matrix, allowing a designer to develop optimized matrix operations
implementing an algorithm. However, if the eventual goal is a hardware device, the
algorithms are instead often written to operate similarly to the proposed hardware system,
which results in an even slower algorithm.

However, even specialized image processing programs running on PCs cannot adequately
process large amounts of high-resolution streaming data, since PC processors are
designed for general-purpose use. Further optimization must take place on a hardware
device.

2.2 Application Specific Integrated Circuits


Application Specific Integrated Circuits (ASICs) represent a technology in which
engineers create a fixed hardware design using a variety of tools. Once a design has been
programmed onto an ASIC, it cannot be changed. Since these chips represent true,
custom hardware, highly optimized, parallel algorithms are possible. However, except in
high-volume commercial applications, ASICs are often considered too costly for many
designs. In addition, if an error exists in the hardware design and is not discovered before
product shipment, it cannot be corrected without a very costly product recall.

2.3 Dedicated Digital Signal Processors


Digital Signal Processors (DSPs) such as those available from Texas Instruments are
a class of hardware devices that fall somewhere between an ASIC and a PC in terms of
performance and design complexity. They can be programmed with either assembly code
or the C programming language, which is one of the platform’s distinct advantages.
Hardware design knowledge is still required, but the learning curve is significantly lower
than some other design choices, since many engineers have knowledge of C prior to
exposure to DSP systems. However, algorithms designed for a DSP cannot be highly
parallel without using multiple DSPs. Algorithm performance is certainly higher than on
a PC, but in some cases, ASIC or FPGA systems are the only choice for a design. Still,
DSPs are a very common and efficient method of processing real-time data.

One area where DSPs are particularly useful is the design of floating point systems.
On ASICs and FPGAs, floating-point operations are rather difficult to implement. For the
scope of this project, this is not an issue because all images consist of only integer data.

Recent advances in DSP technology have resulted in very high-speed algorithm
implementations. While the advantages of ASICs and FPGAs still apply, this new
generation of DSPs has made some engineers reconsider FPGA development. Still, as new
DSPs arrive on the market, so do new FPGAs, and the two architectures are expected to
show similarly increasing performance with each new generation of processors.

2.4 Field Programmable Gate Arrays
Field Programmable Gate Arrays (FPGAs) represent reconfigurable computing
technology, which is in some ways ideally suited for video processing. Reconfigurable
computers are processors which can be programmed with a design, and then
reprogrammed (or reconfigured) with virtually limitless designs as the designer’s needs
change. FPGAs generally consist of a system of logic blocks (usually look up tables and
flip-flops) and some amount of Random Access Memory (RAM), all wired together
using a vast array of interconnects. All of the logic in an FPGA can be rewired, or
reconfigured, with a different design as often as the designer likes. This type of
architecture allows a large variety of logic designs (dependent on the processor's
resources), which can be interchanged for a new design as soon as the device can be
reprogrammed.

Today, FPGAs can be developed to implement parallel design methodology, which is not
possible in dedicated DSP designs. ASIC design methods can be used for FPGA design,
allowing the designer to implement designs at gate level. Usually, however, engineers
use a hardware description language such as VHDL or Verilog, which allows for a design
methodology similar to software design. This software view of hardware design allows
for a lower overall support cost and design abstraction.

The goal of this project is real-time (30 frames per second) processing of grayscale
image data, a goal that an FPGA system using parallel algorithms should have little
difficulty achieving.

3. Obstacle Detection Methods


There are a variety of competing methods that have been proposed for obstacle
detection and many papers written on such methods. There have been few papers,
however, which have offered more than a trivial comparison of the various methods,
especially in any analytical fashion. This chapter seeks to fill that gap by examining the
various obstacle detection methods for a highway environment from an analytic point of
view. By examining each of the basic methods, I hope to illuminate which methods ought
to work best for the specific problem of on-road obstacle detection in real time.

3.1 Off-Road Obstacle Detection


There has also been extensive work on the off-road, or cross-country, obstacle
detection problem. Two natural questions to ask are: Can we learn anything from this
body of work? Can we use the same methods? The answers are “yes” and “no.” We may
categorize the requirements for an obstacle detection system into two basic types:
system requirements and detectability/sensor requirements. The system requirements
abstract away from the specific sensor and deal with the response and throughput of
the system. The detectability requirements deal with the accuracy and resolution of
the sensor and are often at odds with the system requirements. A quick comparison of
the off-road and on-road problems will show that we cannot use off-road detection
methods without serious modification and still hope to succeed. However, much of the
basic system analysis still holds.

Basic system analysis has been neglected in most obstacle detection papers, but it is
necessary to examine both the detection requirements and the system requirements as a
whole so that adequate trade-offs can be made to ensure good system performance. This
chapter will explore several basic methods used for on-road obstacle detection,
including optical flow and stereo, but I will also briefly consider the possibility of
other detection modalities.

Despite considerable effort, there has been no satisfactory solution so far to the
problem of small static obstacle detection at highway speeds. Although there have been
good results published for vehicle detection, there have been few results reported on the
detection of small static obstacles. Moreover, these few results have generally been
reported in vague terms which give the reader little ability to compare methods. Papers
on cross-country obstacle detection systems are typically no better at reporting results in
a standardized way, although this is probably more excusable since rough terrain is
difficult to describe in an analytical fashion and an obstacle may be less well-defined in
cross-country applications.

There has been some success in cross-country obstacle detection, so it is worth
briefly examining the problem. Typical speeds for cross-country applications are on
the order of 5 to 10 m.p.h. Typical highway speeds, however, are many times this. Since
stopping distance is proportional to the square of the speed, the stopping distance for on-
road applications is much larger, perhaps by 2 orders of magnitude. In addition, a
cross-country vehicle moving at relatively slow speeds can climb larger objects without
damage than can a typical passenger car traveling on the highway. For the highway
scenario, we must be able to detect any object larger than a few inches.
Coupling these facts together, we see that the resolution necessary for on-road obstacle
detection may be 2 to 3 orders of magnitude greater if we use standard techniques from
cross-country with similar field-of-view. Given higher speeds, we must also examine a
greater amount of terrain. Although the effect of latencies on look-ahead distance is
generally dwarfed by the stopping distance for highway applications, it is still important
to have small latencies in the processing system. High latency systems can cause
problems in vehicle control.

These difficulties might make the on-road problem seem almost intractable given
that the off-road obstacle detection problem is still far from being solved. Fortunately,
however, roads are reasonably planar. Given our previous analysis, it should be quite
clear that we need to use this fact in order to win in this scenario. Assuming planarity of
the road improves our signal-to-noise ratio greatly and simplifies a number of things:
we do not need to build maps, estimate the heights of objects, or check for rollover
conditions. We only need to find areas which violate the planarity constraint and
avoid them.

For many of the computer vision methods I describe, there are both dense and
discrete methods. Dense methods provide some estimate (range, optical flow, etc.) at
every image pixel. Discrete methods first use some feature detection method to pick out
candidate potential obstacles and then attempt to calculate range, optical flow, or
some other measure in order to determine whether a feature (or group of features)
actually belongs to an obstacle.

In the near future, it is likely that only discrete methods will be able to achieve the
low processing latencies necessary for highway obstacle detection without special
purpose hardware. For this reason, this chapter focuses on discrete methods, although I
do not ignore the dense methods since these will become feasible as computers become
faster.

I name the process of finding obstacles, once given a set of image features or
patches, obstacle determination, and I focus on this problem in this chapter. Because
an image is a mapping of 3-D space to 2-D space, a single image cannot determine
whether an image patch belongs to the ground plane or not (Fig 2). Additional
information, in the form of more images, certain assumptions, or a priori knowledge
about potential obstacles, must be used to perform obstacle determination. If an image
feature or patch belongs to the ground plane, it is not a threat; if it does not, it
should be considered an obstacle and avoided. I refer to the obstacle determination
problem throughout the chapter.

Fig 2: A single image cannot distinguish whether image patch C is caused by A (an
obstacle) or B (a ground plane feature, i.e. no obstacle). We call the problem of using
additional information to disambiguate the two cases obstacle determination.

3.2 Optical Flow


One method of performing obstacle determination is optical flow. Optical flow uses
the motion of either the objects or the camera to determine the structure of the
environment. Optical flow uses two or more images taken at different times to perform
obstacle detection. The basic concept of using optical flow for obstacle determination is
illustrated below (Fig 3). A point in the image, y0, can be mapped to either obstacle point
P or ground point Q (or any point in between) at time t0. At time t1, the camera takes
another image after the vehicle has moved forward by distance d. A matching technique
or flow calculation is employed to map y0 to a point in the new image. If this point is

-7-
close to yobs, then it is likely that it corresponds to an obstacle. If it is closer to ygnd,
however, then it most likely corresponds to a ground-plane point. By examining the flow
(in the y-direction in the illustrated case), i.e. the movement of y0 at t0 to a new point in
the second image, the algorithm can theoretically detect whether the point belongs to an
obstacle or to the ground plane.

Fig 3: A point y0 on the image plane at time t0 can be mapped to either point P or Q in the world (or any
point in-between). Another image taken at time t1 after the camera has moved a distance d can be used to
disambiguate between P and Q depending on whether it is mapped to yobs or ygnd.

There have been a number of optical flow or similar techniques proposed in the
obstacle detection literature. Although some of these techniques have had success in
detecting large static obstacles, optical flow methods are best at detecting moving
objects whose motion fields differ significantly from the background. Another vehicle,
for example, can be detected and tracked using optical flow techniques even if it is
moving with low relative velocity with respect to our own vehicle, because it is still
moving with respect to the background. This may be sensed by comparing the direction
of the calculated flow vector with the direction of the model flow vector. I briefly
describe several methods for calculating optical flow, and then illustrate why optical
flow is too insensitive to be used effectively for the basic static obstacle
determination problem of small obstacles at large distances.

Edge-based methods have been used with success for detecting vehicles and tracking
them over time. Typically, these methods involve an edge extraction step followed by a
dynamic programming method to match edges between images. The algorithms tend to
run faster than analytical methods since they compute one flow vector only for every
edge rather than every pixel. Reliable edges can be matched fairly robustly in these
schemes because of the length and orientation constraints that can be used in matching.

One problem with these methods is that the flow component along the edge is generally
unreliable insofar as the edge-detectors themselves often find broken lines, etc. Broken
lines can cause mismatches that make the flow component normal to the edge wrong too,
but this happens less frequently.

Pixel-based discrete methods, on the other hand, only compute the optical flow at
points where the flow is fully constrained without resorting to smoothness constraints.
Typically, the algorithm will choose local intensity minima or maxima or use an interest
operator, e.g. Moravec interest operator, to select points in an image. Then the algorithm
matches points using correlation similar to that in stereo matching. Using the observation
that under a planar ground assumption, an obstacle only changes the magnitude and not
the direction of the flow vector, it is possible to constrain the search to a single dimension
along the direction of the model flow vector (with the drawback that it may not allow the
detection of moving objects). The flow vector for the pixel is then just the vector from the
original pixel position to the location of the maximum correlation in the new image.
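
As an illustration only, here is a minimal sketch of such a one-dimensional
correlation search in C (the names are hypothetical; the feature detector, the
model-flow computation, and image-bounds checks are assumed to exist elsewhere):

    #include <limits.h>
    #include <stdlib.h>

    /* Slide a 5x5 patch centered on feature (fx, fy) of frame t0 along the
       unit model-flow direction (ux, uy) in frame t1, scoring each step by
       the sum of absolute differences (SAD). Bounds checks are omitted. */
    int match_along_flow(const unsigned char *img0, const unsigned char *img1,
                         int width, int fx, int fy,
                         double ux, double uy, int max_step)
    {
        int best_step = 0;
        long best_sad = LONG_MAX;
        for (int s = 0; s <= max_step; s++) {
            int cx = fx + (int)(s * ux + 0.5);   /* candidate point in t1 */
            int cy = fy + (int)(s * uy + 0.5);
            long sad = 0;
            for (int dy = -2; dy <= 2; dy++)
                for (int dx = -2; dx <= 2; dx++)
                    sad += labs((long)img0[(fy + dy) * width + (fx + dx)]
                              - (long)img1[(cy + dy) * width + (cx + dx)]);
            if (sad < best_sad) { best_sad = sad; best_step = s; }
        }
        return best_step;   /* flow magnitude along the model direction */
    }

Comparing the returned magnitude with the predicted ground-plane flow magnitude then
classifies the feature as an obstacle or a ground point, as described above.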

3.3 Stereo Vision


Binocular stereo vision has been a popular approach for obstacle detection in both
cross-country and on-road scenarios. Stereo can be calculated in either a dense or discrete
(point or edge-based) manner, but its calculation is the same in all cases. Discrete stereo
vision is very similar to discrete optical flow except that a spatial baseline is used rather
than a temporal one. For point-based discrete stereo methods, an identical approach can
be used to choose the features for correspondence as in optical flow, i.e. an operator
which finds local minima, maxima, or corners. A correlation approach is then used to
perform correspondence between the features in the first image and features in the second
image (hopefully the same). In edge-based methods, the same edges that are matched in
optical flow calculations can be used for matching in stereo algorithms.

As in optical flow, the search space for a corresponding point in the second image
can be reduced to one dimension: in the case of stereo, the match must fall along the
epipolar line. Given a single image, the same features are found and essentially the
same matching procedure is run to match features to the second image, although the
search is made along different directions. Stereo has the advantage, however, that if
the cameras are situated with the optical axes perpendicular to the line connecting
the optical centers, the search space is constrained to the same image row. This makes
stereo simpler to implement and faster than optical flow. The conceptual difference is
that one set of images is taken with a spatial baseline, and one with a temporal
baseline. I show that in these highway scenarios, the spatial baseline provides better
detectability.

Fig 4: Stereo camera setup. The distance of a point from the cameras is proportional
to the baseline and inversely proportional to the disparity.
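
For a rectified setup like that of Fig 4, the standard pinhole-camera relation
(symbols introduced here for illustration only) is Z = f·B/d, where Z is the distance
to the point, f the focal length in pixels, B the baseline between the optical
centers, and d the disparity in pixels. For example, with f = 800 pixels and
B = 0.3 m, a disparity of 4 pixels corresponds to Z = 800 × 0.3 / 4 = 60 m.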
3.4 RADAR (Radio Detection and Ranging)
Radar is an excellent means of detecting other vehicles because it works at long
ranges and is relatively unaffected by rain or snow. In one reported system, the radar
was capable of detecting vehicles at distances of up to 200 meters with a range
resolution of approximately 0.1 meters; the sensor had a 3° vertical field of view and
a 12° horizontal field of view. Bearing to a target could be estimated via wavefront
reconstruction, and when combined with geometric information about the road, potential
obstacles could be mapped to an individual lane. Since radars provide direct range and
may also provide a doppler velocity measurement, they will most likely be a standard
sensor for automated vehicles.

Unfortunately, current radars are not able to reliably detect small objects at ample
distances. Metal surfaces are good radar reflectors, which makes vehicle detection
fairly easy; the ease with which an object may be detected at a given range is related
to its radar cross section. Vehicles have a much larger radar cross section (10 m²)
than people (0.2 to 2 m²), and most road debris has an even smaller radar cross
section, making it undetectable.

3.5 LADAR (Laser Detection and Ranging)


Although we are unaware of any systems that use laser reflectance for obstacle
detection, there have been many systems which have used laser range measurements for
obstacle detection. Range measurements can be either indirect or direct.

Laser striping is an indirect ranging method that has been used on a number of
robots (especially indoor robots). Laser striping uses a fixed CCD camera and a laser
stripe (visible to the CCD) that is swept across a scene. Triangulating between the known
direction of the laser and the direction provided by the laser stripe position on the CCD
provides a range estimate at each image row. Laser striping can provide 3-D geometry
fairly quickly since the laser only needs to be scanned in one direction, and computation
is simple. There are a couple of problems with laser striping. First, the laser must
be easily detectable within the image (significantly stronger than ambient light).
Second, as with any triangulation system, range accuracy improves as the distance
between the laser and the CCD is increased; however, as this distance grows, the
problem of “shadowing” worsens.
Shadowing occurs when points visible to the laser may not be visible to the camera and
vice versa. For the simplest systems (a beam diverged by a cylindrical lens), laser striping
requires more laser power than direct methods and hence is most useful indoors or at
short distances.

Laser rangefinders avoid the shadowing problem by keeping the transmitted and
received beams approximately co-axial, and measuring range directly. A 3-D laser
scanner operates by sweeping a laser across the scene in two dimensions. At each pixel,
the instrument measures the time that it takes for a laser beam to leave the sensor, strike a
surface, and return. There are several methods for measuring the time. Many sensors also
provide an intensity measurement at each pixel by measuring the energy of the returned
laser signal. Thus, a full sweep in two-dimensions can provide both a depth map and an
intensity image.

Once the laser provides a depth map for the scene, the data is typically transformed
into an elevation map based on the known position of the sensor. Terrain-typing
algorithms may then be run on the resulting elevation map, and discrete obstacles may be
detected by looking for discontinuities in the elevation map. However, there are a number
of problems with this method. First, it is computationally expensive. Second, the discrete
grid size cannot represent vertical surfaces, and hence may miss some potentially
dangerous obstacles.

Most ranging software systems have been image-based, meaning that the system
waited for an entire laser image to be acquired and processed before obstacles were
reported to the path generation module. Hebert proposed a system that processes range
data pixel by pixel to reduce the latencies involved and improve system efficiency. A
pixel by pixel method also reduces the dependency on a particular range sensor, since
methods which use entire images are tuned to the specific field-of-view and geometry of
the sensor. Reported latencies were reduced to 100 ms or under. With the addition of a
planning module, the system was demonstrated in multi-kilometer traverses through
unknown terrain.

Despite problems with previous laser range methods, laser ranging would likely provide
adequate obstacle detection capability for highway environments given adequate laser
power. The distance between an obstacle and the road surface behind it is the same as
in the stereo vision case. A laser sensor mounted at a height of 1.5 meters looking at
the road 60 meters ahead would see the top of a 20 cm obstacle at a range of about
52 meters (by similar triangles, 60 × (1.5 − 0.2)/1.5 = 52 m). Given adequate power,
this difference should be easily detectable by a laser rangefinder. Our current laser
scanner is not powerful enough to provide reliable range estimates from the road at
these distances. The real problem with laser rangefinders at present is that they are
too expensive as an obstacle detection solution for consumer vehicles.

3.6 Other Vision-Based Detection Modalities
Although the most promising obstacle detection methods have been discussed, there
is the potential for new methods based on numerous visual cues. At least 12 depth cues
are available to humans, only two of which are binocular. And there are other cues
unrelated to depth that are used for obstacle detection (shape, color, luminance, etc.).
Unfortunately, many of the depth cues operate at distances shorter or longer than the
distances with which we are concerned (on the order of 50 meters). Most are also difficult
to encode in a computer program because they require a priori knowledge: typical size of
the obstacle, etc. It would be impractical to teach a system what every potential obstacle
might look like. Nevertheless, some of these other cues may be useful to future
obstacle detection systems. One depth cue that can easily be used on highways to
estimate the distance to a suspected obstacle is the linear perspective effect: linear
perspective states that the distance between two points subtends a smaller angle at
larger depths. Since we know approximately the width of the road, we can use linear
perspective to estimate the distance to a given image row. Relative height may also be
used as a depth cue because, given level ground, objects which are closer to the
horizon are farther away. Once calibrated using linear perspective, the vertical
location of an object in the image can be used to estimate its distance.
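
As a concrete sketch of this cue (assuming a pinhole camera and a road of
approximately known width; the numbers are illustrative): if a road of width W spans
w pixels in some image row and the focal length is f pixels, that row lies at distance
D ≈ f·W/w. With f = 800 pixels and W = 3.5 m, a row where the road spans 56 pixels is
at roughly 800 × 3.5 / 56 = 50 m, the working distance mentioned above.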

4. DSP (Digital Signal Processing) Algorithms


Based on the computational and communication characteristics, computer vision
tasks can be divided into a three-level hierarchy, namely low-level, intermediate-level,
and high-level.
• Low-level vision tasks consist of pixel-based operations such as filtering and edge
detection. The tasks at this level are characterized by a large amount of data
(pixels), small neighborhood operators, and relatively simple operations (e.g.,
multiply and add).
• The pixel grouping operations, such as segmentation and region labeling, are
intermediate-level vision tasks. These tasks are again characterized by local data
access but more complex pixel operations.
• High-level vision tasks are more decision oriented, such as point matching, tree
matching, and graph matching. These tasks are characterized by non-local data
access and nondeterministic and complex algorithms. Communication
requirements of vision tasks tend to be unstructured in intermediate and high level
vision tasks.

Several examples of vision tasks belonging to this three-level hierarchy are shown in
Table 1.

This project focuses on developing hardware implementations of image processing
algorithms for use in an FPGA-based video processing system. This section discusses
these algorithms.

Task level   | Computational characteristics                                            | Examples
Low          | Small neighborhood data access, simple operations, large amount of data  | Edge detection, filtering, image morphology
Intermediate | Small neighborhood data access, more complex operations                  | Hough transform, connected component labeling, relaxation
High         | Nonlocal data access, nonpolynomial and complex operations               | Point matching, tree matching, graph matching
Table 1: Examples of the three-level vision task hierarchy

4.1 Introduction to Windowing Operators


In image processing, several algorithms belong to a category called windowing
operators. Windowing operators use a window, or neighborhood of pixels, to calculate
their output. For example, a windowing operator may compute the average of all pixels
in the neighborhood of a pixel. The pixel around which the window is centered is
called the origin. Fig. 5 shows a 3 by 3 pixel window and the corresponding origin.

Fig 5: Pixel Window and Origin

The work for this project is based on image processing algorithms that use such pixel
windows to calculate their output. Although a pixel window may be of any size and
shape, a square 3x3 window was chosen for this application because it is large enough
to work properly and small enough to implement efficiently in hardware.

4.2 Rank Order Filter


The rank order filter is a particularly common algorithm in image processing systems.
It is a nonlinear filter, so while it is easy to develop, its properties are difficult
to analyze. It offers several useful effects, such as smoothing and noise removal; the
median filter, which is a rank order filter, is especially useful in noise removal.

4.2.1 Algorithm
This filter works by analyzing a neighborhood of pixels around an origin pixel, for
every valid pixel in an image. Often a 3x3 area, or window, of pixels is used to
calculate the output. For every pixel in the image, the window of neighboring pixels
is found and the pixel values are sorted in ascending, or rank, order. Next, the pixel
in the output image corresponding to the origin pixel in the input image is replaced
with the value specified by the filter order. Fig. 6 shows an example of this
algorithm for a median filter (order 5), a filter that is quite useful in noise
filtering.

Fig. 6: Graphic Depiction of Rank Order Filter Operation

As is evident in the above figure, any order up to the number of pixels in the window
may be used: a rank order filter using a 3x3 window has 9 possible orders, and one
using a 5x5 window has 25 possible orders. Whatever the window size, using the middle
value in the sorted list always results in a median filter. Similarly, using the
maximum and minimum values in the sorted list results in the flat dilation and erosion
of the image, respectively. These two operations are considered part of the
morphological operations, and are discussed in the next sub-section.
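
As an illustration, a minimal software sketch of the 3x3 rank order filter follows
(in C, with hypothetical names; border pixels are simply copied through, and this is a
sketch rather than the implementation targeted by this project). With zero-based
indexing, order = 4 is the middle of the sorted list (the text's order 5) and gives
the median filter; order = 0 and order = 8 give flat erosion and dilation,
respectively.

    #include <stdlib.h>
    #include <string.h>

    /* Ascending comparison of two pixels, for qsort. */
    static int cmp_pixel(const void *a, const void *b)
    {
        return *(const unsigned char *)a - *(const unsigned char *)b;
    }

    /* 3x3 rank order filter: each interior output pixel is the value at
       position 'order' (0..8) in the sorted 3x3 neighborhood. */
    void rank_filter_3x3(const unsigned char *in, unsigned char *out,
                         int width, int height, int order)
    {
        memcpy(out, in, (size_t)width * height);  /* borders unchanged */
        for (int y = 1; y < height - 1; y++)
            for (int x = 1; x < width - 1; x++) {
                unsigned char win[9];
                int k = 0;
                for (int dy = -1; dy <= 1; dy++)  /* gather the window */
                    for (int dx = -1; dx <= 1; dx++)
                        win[k++] = in[(y + dy) * width + (x + dx)];
                qsort(win, 9, 1, cmp_pixel);      /* sort in rank order */
                out[y * width + x] = win[order];  /* select by order */
            }
    }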

4.3 Morphological Operators


The term morphological image processing refers to a class of algorithms concerned
with the geometric structure of an image. Morphology can be used on binary and
grayscale images, and is useful in many areas of image processing, such as
skeletonization, edge detection, restoration, and texture analysis.

A morphological operator uses a structuring element to process an image. We usually
think of a structuring element as a window passing over an image, similar to the pixel
window used in the rank order filter. The structuring element can likewise be of any
size, but 3x3 and 5x5 sizes are common. As the structuring element passes over an
element in the image, it either fits or does not fit. At the places where the
structuring element fits, we obtain a resultant image that represents the structure of
the image. Fig. 7 demonstrates the concept of a structuring element fitting and not
fitting inside an image object.

Fig. 7: Concept of Structuring Element Fitting and Not Fitting
4.3.1 Algorithm

There are two fundamental operations in morphology: erosion and dilation. It is common
to think of erosion as shrinking (eroding) an object in an image; dilation does the
opposite, growing the image object. Both concepts depend on the structuring element
and how it fits within the object. For example, when a binary image is eroded, the
output contains a foreground pixel at every origin position where the surrounding
structuring-element-sized window fits entirely within the object. The output of a
dilation operation contains foreground pixels at every point covered by the
structuring element whenever its origin lies within an image object.
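
To make the fit/no-fit idea concrete, here is a minimal sketch of binary 3x3 erosion
and dilation with a flat, all-ones structuring element (illustrative C with
hypothetical names, not this project's implementation; pixels are 0 or 1 and borders
are copied through):

    #include <string.h>

    /* Binary 3x3 erosion (dilate == 0) or dilation (dilate != 0) with a
       flat, all-ones structuring element. */
    void morph_3x3(const unsigned char *in, unsigned char *out,
                   int width, int height, int dilate)
    {
        memcpy(out, in, (size_t)width * height);  /* borders unchanged */
        for (int y = 1; y < height - 1; y++)
            for (int x = 1; x < width - 1; x++) {
                int acc = dilate ? 0 : 1;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++) {
                        int v = in[(y + dy) * width + (x + dx)];
                        if (dilate) acc |= v;  /* any neighbor set -> set   */
                        else        acc &= v;  /* all neighbors set -> keep */
                    }
                out[y * width + x] = (unsigned char)acc;
            }
    }

An opening is then morph_3x3 with dilate = 0 followed by dilate = 1 on the result,
and a closing is the reverse order, matching the definitions given later in this
section.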

Grayscale morphology is more powerful and more difficult to understand. The concepts
are the same, but instead of the structuring element fitting inside a two-dimensional
object, it is thought to either fit or not fit within a three-dimensional object.
Grayscale morphology also allows the use of grayscale structuring elements; binary
structuring elements are termed flat structuring elements in grayscale morphology. The
combination of grayscale images and grayscale structuring elements can be quite
powerful.

One of the strongest features of morphological image processing stems from the fact
that the basic operators, performed in different orders, can yield many different,
useful results. For example, if the output of an erosion operation is dilated, the
resulting operation is called an opening. The dual of opening, called closing, is a
dilation followed by an erosion. These two secondary morphological operations can be
useful in image restoration, and their iterative use can yield further interesting
results, such as skeletonization and granulometries of an input image.

Grayscale erosion and dilation can also be achieved using a rank order filter. Erosion
corresponds to a rank order filter of minimum order, and dilation corresponds to a
rank order filter of maximum order: the result of a minimum-order rank order filter is
the minimum value in the pixel neighborhood, which is exactly what an erosion
operation computes, and likewise for a maximum-order rank order filter and dilation.
However, the rank order filter only works as a morphological operation with a flat
structuring element, because the rank order filter window acts as a structuring
element consisting of all ones. Still, this is a powerful feature, since grayscale
morphology with flat structuring elements accounts for the most common usage of
morphology.

In a grayscale image, erosion tends to grow darker areas, and dilation tends to grow
lighter areas. Opening and closing each tend to emphasize certain features in the
image while de-emphasizing others. Applied iteratively, the morphological operations
can be used to pick out specific features in an image, such as horizontal or vertical
lines.

4.4 Convolution
Convolution is another commonly used algorithm in DSP systems. It is from a class
of algorithms called spatial filters. Spatial filters use a wide variety of masks, also known
as kernels, to calculate different results, depending on the function desired. For example,
certain masks yield smoothing, while others yield low pass filtering or edge detection.

4.4.1 Algorithm
The convolution algorithm can be calculated in the following manner. For each input
pixel window, the values in that window are multiplied by the convolution mask. Next,
those results are added together and divided by the number of pixels in the window. This
value is the output for the origin pixel of the output image for that position.

The input pixel window is always the same size as the convolution mask, and the
output pixel is rounded to the nearest integer. As an example, Fig. 9 shows an input
pixel window, the convolution mask, and the resulting output. The convolution mask in
this example is often used as a noise-cleaning filter.

Fig. 9: Convolution Algorithm Example

Convolution Output
= (50*1 + 10*1 + 20*1 + 30*1 + 70*2 + 90*1 + 40*1 + 60*1 + 80*1)/9 = 57.7778
=> 58

Applying this algorithm over an entire input image yields an output image with
reduced salt-and-pepper noise. An important aspect of the convolution algorithm is
that it supports a virtually infinite variety of masks, each with its own effect. This
flexibility allows for many powerful uses.
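
As an illustration, a minimal C sketch of the scheme exactly as described above
follows (names are hypothetical; note that, following the worked example, it divides
by the window size of 9 rather than by the sum of the mask weights):

    /* 3x3 convolution as described above: multiply the window by the mask,
       sum, divide by the number of pixels (9), and round to the nearest
       integer. Border pixels are skipped for brevity. */
    void convolve_3x3(const unsigned char *in, unsigned char *out,
                      int width, int height, const int mask[9])
    {
        for (int y = 1; y < height - 1; y++)
            for (int x = 1; x < width - 1; x++) {
                int sum = 0, k = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        sum += mask[k++] * in[(y + dy) * width + (x + dx)];
                int v = (sum + 4) / 9;            /* divide by 9, rounded */
                if (v < 0)   v = 0;               /* clamp to 8-bit range */
                if (v > 255) v = 255;
                out[y * width + x] = (unsigned char)v;
            }
    }

    /* The noise-cleaning mask of Fig. 9: */
    static const int noise_clean_mask[9] = { 1, 1, 1, 1, 2, 1, 1, 1, 1 };

Applied to the window of Fig. 9, this reproduces the worked result:
(520 + 4)/9 = 58.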

5. Summary and Future Work
Many of the key imaging functions break down into highly repetitive tasks that are
well suited to modern FPGAs, especially those with built-in hardware multipliers and on-
chip RAM. The remaining tasks require hardware better suited to control flow and
decision making, such as DSPs.

To gain the full benefit of both approaches, systems are required that can effectively
combine FPGAs and DSPs. With the addition of standard imaging functions written in
either VHDL or C, all of the key building blocks are available for building an image
processing system. Using the DSP functions provided with the camera, we can implement
the hardware for the project.

This research has been an excellent learning experience in various engineering
disciplines. The digital signal processing algorithms are now to be implemented in
MATLAB so that results can be compared and simulations run to tune for optimum
performance.

6. References
[1] Hussain, Z.: “Digital Image Processing – Practical Applications of Parallel
Processing Techniques,” Ellis Horwood, West Sussex, UK, 1991.

[2] Pratt, W.: “Digital Image Processing,” Wiley, New York, NY, 1978.

[3] Ratha, N. K. & Jain, A. K.: “Computer Vision Algorithms on Reconfigurable Logic
Arrays,” IEEE, 1999.

[4] Evans, T.: “Semi-Automated Detection of Defects in Road Surfaces,” Thesis, Monash
University, 2004.

[5] Ethans, P. M.: “Real-Time Image Processing on a Custom Computing Platform,” IEEE,
1995.

[6] Kruger, W. & Enkelman, W.: “Real-Time Estimation and Tracking of Optical Flow
Vectors for Obstacle Detection,” Germany.

[7] Nakai, H. & Takeda, N.: “A Practical Stereo Scheme for Obstacle Detection in
Automotive Use,” IEEE, 2004.

[8] Wang, C. & Huang, S.: “Driver Assistance System for Lane Detection and Vehicle
Recognition with Night Vision,” National Taiwan University, Taipei, Taiwan.

[9] Hancock, J.: “Laser Intensity-Based Obstacle Detection and Tracking,” Thesis,
Carnegie Mellon University, Pittsburgh, Pennsylvania.

