EECS-2014-117
Sunil Shah
Abstract
In 2012 a federal mandate was imposed that required the FAA to integrate unmanned aerial
systems (UAS) into the national airspace (NAS) by 2015 for civilian and commercial use. A significant
driver for the increasing popularity of these systems is the rise in open hardware and open software
solutions which allow hobbyists to build small UAS at low cost and without specialist equipment.
This paper describes our work building, evaluating and improving performance of a vision-based
system running on an embedded computer onboard such a small UAS. This system utilises open
source software and open hardware to automatically land a multi-rotor UAS with high accuracy.
Using parallel computing techniques, our final implementation runs at the maximum possible rate of
30 frames per second. This demonstrates a valid approach for implementing other real-time vision
based systems onboard UAS using low power, small and economical embedded computers.
1 Introduction
To most, the rapid deployment of unmanned aerial systems (UAS) is inevitable. UAS are now vital to
military operations and, as the hobbyist multi-rotor movement takes off, the colloquialism "drone" has
become a frequent reference in popular culture. Hobbyists can buy vehicles off-the-shelf for less than a
thousand US dollars and fly them almost immediately out of the box.
In 2012, President Obama imposed a federal mandate that the Federal Aviation Administration (FAA)
must propose rule changes to integrate UAS into the National Airspace (NAS) by 2015 [15]; currently all
non-military and non-government flight is prohibited.
Figure 1: UAS companies incorporated by year (own data).
While the incumbent defense companies are prepared for this approaching rule change, many nascent
businesses and established technology companies are looking to gain a foothold in this market. In
particular, there has been a dramatic growth
in the number of startups looking to provide UAS products and services (as shown in figure 1), as well
as previously unprecedented acquisitions of UAS businesses by technology behemoths like Google and
Facebook. Amazon recently made use of UAS for marketing purposes and has devoted an entire team
to researching their potential integration into their business. This growing commercial interest in the
use of UAS extends from applications such as precision agriculture [7] [21] to package delivery. Industry
association AUVSI predicts that integration of UAS will cause $82 billion of economic activity [4].
The rise in hobbyist interest in UAS has been driven by several factors: 1) the appearance of affordable
prototyping tools (such as the Arduino platform and 3D printers); 2) affordable sensors (such as the
accelerometers used in smartphones); and 3) the growth in popularity of open source software. The growth in
recent years of online communities centred around hardware and software that facilitate interaction with
the physical world (for example, the Arduino platform) has made it possible for others to build robotics
applications on top of relatively stable and well understood hardware. DIYDrones, the foundation for
our industrial sponsor 3DRobotics' business, is one of these. As an online community formed around
the creation of an Arduino-based open source autopilot called ArduPilot, DIYDrones has become a
significant force in the field. For just a few hundred dollars, a motivated hobbyist is now able to buy the necessary
intelligence to fly a model aircraft autonomously.
Unfortunately, the simple and economical design of these boards results in lacklustre performance. While
the ArduPilot project has moved on from the Arduino platform, their next generation autopilot, the
PixHawk, is still insufficient for complex processing tasks, running at a mere 168 MHz with just 256
kilobytes of memory [1]. While it is possible to write highly optimised code for this platform, doing so is
time-consuming for users looking to quickly prototype a sensing application. Additionally, the limited memory
available makes it infeasible to re-use open source libraries without significant manual manipulation
during installation.
This Master's project, therefore, extends the concept of a co-computer from the research of the
Cyber-Physical Cloud Computing lab in the Civil Systems department at UC Berkeley. Their research
typically involves the integration of various subsystems to produce UAS suited to particular applications, such
as the tracking of pedestrians or collaborative sensing. Their systems offload heavy sensor data processing
to a secondary co-computer [12], usually a much faster computer with a traditional x86 instruction set
processor. This design allows the UAS to compute advanced computer vision and robotics algorithms
onboard in real-time.
This approach is commonplace in the design of computational systems for complex robots. Stanford's
DARPA Grand Challenge winner, the autonomous car Stanley, used six rack-mounted computers
which individually handled processing for the various subsystems that carried out localisation, sensor fusion,
planning and other tasks [20]. It is almost certain that FAA certification of UAS for commercial flight
will be predicated on having redundant subsystems that allow the UAS to maintain control if higher
level functions fail.
Because of their specialist use cases, the computers traditionally used are often expensive, typically
costing over a thousand dollars at current prices. Furthermore, using processors intended for desktop use
requires larger and hence heavier cooling equipment, meaning that their weight and size renders them
infeasible as payload on hobbyist-sized multi-rotor UAS. Their power draw is also significant, at
between 60 and 80 watts.
This paper therefore concentrates on the increasingly prevalent open source ARM-based embedded
computers. We focus on the popular BeagleBone Black, a single core ARM Cortex A8 board that costs
just $45, and the Hardkernel Odroid XU, an octo-core ARM Cortex A15 board that costs $169 but is
considerably more powerful. Each of these boards draws less than 20 watts of power and is easily carried
on small multi-rotor aircraft.
We build and implement a vision-based automated landing system based on the work of Sharp, Shakernia,
and Sastry in 2001 [18]. We follow a similar approach to theirs but optimise for re-use of popular
open source projects such as OpenCV, a computer vision library, and ROS, a middleware system that allows
a modular software architecture. We then focus on improving the performance of this system, first by
optimising the hotspots inherent in our algorithm, and then by utilising processor-specific optimisations
and parallel computing techniques to maximise throughput.
Automated landing is an interesting problem with a very clear commercial application. If UAS are
to be used in urban environments for tasks such as package delivery, it is necessary for them to be able
to land accurately. The current state of the art relies upon localisation using GPS. Our testing using
the built-in return-to-launch mode yielded a mean accuracy of 195.33 cm, with a standard deviation
of 110.73 cm over 10 launches. While this may be sufficient for landing in an uninhabited field, it is
certainly not sufficient for landing in an inhabited area with spatial constraints.
The next section surveys the prior work in this area. Section 3 describes the design methodology and
the optimisations implemented. The results of our work and a discussion of these are outlined in section
4. Finally, section 5 covers concluding remarks.
2 Prior Work
This project draws upon prior work in two different disciplines. Firstly, we consider accurate vision-based
landing of a UAS, for which there have been several published approaches.
For this project we work with vertical takeoff and landing (VTOL) multi-rotor UAS, since these are
the most popular type of UAS used by hobbyists and nascent commercial entities. Multi-rotors
have become popular only recently; traditional approaches to automated landing of VTOL aircraft
are modelled on the automated landing of helicopters.
The automated landing problem was investigated in the research literature of the early 2000s. Sharp,
Shakernia, and Sastry [18] designed an approach for automatically landing an unmanned helicopter. Their
landing target uses a simple monochromatic design made up of several squares. Onboard the helicopter,
they use a pan-tilt-zoom camera and two embedded computers with Intel CPUs. They discuss the details
of their approach to pose estimation, but omit the details of the helicopter controller. Using a real-time
OS and highly optimised custom code, they are able to get their vision system to operate at a rate of 30
Hz.
Saripalli, Montgomery, and Sukhatme [17] designed another approach for automatically landing an
unmanned helicopter. They use a monochromatic H-shaped landing target. Their onboard vision system
detects this landing target and outputs the helicopter's relative position with respect to it. This is sent
wirelessly to a behavior-based controller running on a ground station, which then directs the helicopter
to land on top of the target. They are able to run their controller at 10 Hz this way. They also use
a high-accuracy differential GPS system, and it is not clear how much their differential GPS and vision
systems each contribute to a successful landing.
Garcia-Pardo, Sukhatme, and Montgomery [6] look at a more general problem, where there is no
pre-specified landing target, and their helicopter has to search autonomously for a suitable clear area on
which to land.
The second discipline we draw upon is that of high performance embedded computing on reduced
instruction set computers, such as those implementing the ARM architecture. Several efforts have been
made to explore the effect of parallelising certain robotics applications, but these typically involve the
use of general purpose computing on the GPU. This does not translate well to embedded computers due
to the lack of vendor support for the graphics chips they provide. These chips often do not support
heterogeneous parallel programming languages such as OpenCL or NVidia's CUDA.
However, there are several efforts looking at optimising performance for ARM-based processors [9].
This is driven by growing smartphone usage; nearly all smartphones use ARM processor designs. Qualcomm,
in particular, provides an ARM-optimised computer vision library for Android called FastCV. While
this is optimised for their own series of processors, it does have generic ARM optimisations that are
manufacturer agnostic. Efforts have also been made to explore OpenCV optimisation for real-time computer
vision applications [13].
Finally, it should be noted that this project is not the first attempt to use an open source embedded
computer on a UAS. Many hobbyists experiment with these boards, as noted in the discussion threads
on DIYDrones.
A San Francisco based startup, Skycatch Inc., uses a BeagleBone Black to provide a custom runtime
environment. This allows their users to write applications on top of their custom designed UAS in
the scripting language JavaScript. While the user friendliness of this approach is evident, it is also clear
that using a high-level interpreted language results in a tremendous loss of performance, which makes
it impossible to do all but the most basic image processing in real-time. This implementation is also
closed source.
Other commercial entities, such as Cloud Cap Technologies, provide proprietary embedded computers
running highly customised computer vision software. However, these cost many thousands of dollars and
are difficult to hack, making them impractical for research and startup use.
Ultimately, this project's contribution is to demonstrate the tools and techniques that can be used
to implement highly performant vision algorithms onboard a UAS using low-cost open source hardware
and open source software.
              BeagleBone Black                      Hardkernel Odroid XU
Memory        512 MB
Storage       2 GB onboard & MicroSD
Ports         USB 2.0 client, USB 2.0 host,
              Ethernet, HDMI, 2x 46-pin headers
Cost          $45                                   $169

Table 1: Specifications of the BeagleBone Black and the Hardkernel Odroid XU.
had good driver support in Linux and provided enough power to peripheral devices (such as the camera).
Our architecture is shown in figure 2.
Due to hardware limitations, connecting a computer to the APM autopilot via USB turns off the
wireless telemetry that is natively provided. Therefore, it was necessary to set up and configure the
embedded computer as an ad-hoc wireless network to allow us to receive telemetry and debugging data
while testing.
Figure 2: Architecture of our automated landing system. We use inexpensive off-the-shelf hardware. The
laptop and remote control are for monitoring and emergency takeover by a human pilot. All
the computation is performed onboard the UAS.
3.1.4 Camera
Our testing rig and vehicle were equipped with a Logitech C920 USB web camera. This is a high end
consumer product that has higher than average optical acuity, good driver support on Linux and a global
shutter. (Cheaper web cameras use rolling shutters, which allow increased low light sensitivity at the
cost of a slower shutter speed. This is acceptable when there is not significant motion in the frame but
unsuitable for this application.)
3.1.5 Integrating Co-computer
It was necessary to design a special rack to integrate the co-computer onto our UAS. Figure 3 shows our
design to allow for adequate ventilation and the requisite connections to power and the autopilot.
As a first step, we detect the corners of the landing platform in an image, shown visually in figure 7.
The procedure is as follows (a code sketch of the first steps is given after the list):
Figure 5: Left: Design of our landing platform. Right: Output of the corner detector (24 points, in
order).
1. Median Blur, Canny Edge Detection Denoise the image using a 3x3 median filter, and pass
it through the Canny edge detector.
2. Find Contours Identify contours and establish a tree-structured hierarchy among them.
3. Approximate Polygons Discard contours which are not four-sided convex polygons, as well as
those whose area is less than an experimentally determined threshold value. We look for four-sided
polygons and not specifically for squares, since the squares will not appear as squares under perspective
projection.
4. Get Index Of Outer Square Using the contour hierarchy, determine a contour which contains 6
other contours. This contour represents the boundary of our landing platform. Store coordinates
of the corners of these 6 inner contours.
5. Label Polygons Label the largest of the 6 polygons as A and the farthest one from A as D.
Label polygons as B, C, E and F based on their orientation and distance relative to the vector
formed by joining centers of A and D.
6. Label Corners For each polygon, label corners in anti-clockwise order.
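The following is a minimal OpenCV (C++) sketch of steps 1-3 above, assuming an input frame already captured into a cv::Mat; the Canny thresholds and the area cut-off shown here are illustrative placeholders rather than the experimentally determined values used in our implementation.

    #include <opencv2/opencv.hpp>
    #include <cmath>
    #include <vector>

    // Steps 1-3: denoise, detect edges, find contours, then keep only
    // four-sided convex polygons above an (illustrative) area threshold.
    std::vector<std::vector<cv::Point> > findCandidatePolygons(const cv::Mat& frame) {
        cv::Mat gray, blurred, edges;
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::medianBlur(gray, blurred, 3);                    // 3x3 median filter
        cv::Canny(blurred, edges, 50, 150);                  // thresholds are placeholders

        std::vector<std::vector<cv::Point> > contours;
        std::vector<cv::Vec4i> hierarchy;                    // tree-structured hierarchy (step 2)
        cv::findContours(edges, contours, hierarchy,
                         cv::RETR_TREE, cv::CHAIN_APPROX_SIMPLE);

        std::vector<std::vector<cv::Point> > candidates;
        for (size_t i = 0; i < contours.size(); i++) {
            std::vector<cv::Point> poly;
            cv::approxPolyDP(contours[i], poly, 0.02 * cv::arcLength(contours[i], true), true);
            if (poly.size() == 4 && cv::isContourConvex(poly) &&
                std::fabs(cv::contourArea(poly)) > 100.0) {  // area threshold: placeholder
                candidates.push_back(poly);
            }
        }
        return candidates;
    }

The contour hierarchy returned here is what step 4 then uses to find the outer square containing the 6 inner polygons.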
Pose Estimation
We define the origin of the world coordinate frame to be the center of the landing
platform, such that all points on the landing platform have a Z coordinate of zero. The corner detector
gives us image coordinates for the 24 corners. Thus, we have a set of 24 point correspondences between
world coordinates and image coordinates. Given this input, we want to compute the quadcopter's pose,
i.e. the position and orientation of the camera in the world coordinate frame. To do this, we followed the
approach of Sharp et al. [18], whose details are omitted here for brevity. We use SVD to approximately
solve a linear system of 48 equations with 6 degrees of freedom.
Figure 6
Figure 7
The solution gives us the translation vector t = (tx, ty, tz)^T and the rotation
matrix R. We compute the camera position in world coordinates as C = -R^T t, and the yaw angle as
arctan(R21/R11). (The roll and pitch angles can be computed similarly, but we do not require them
in the control algorithm.)
The approach above assumes a calibrated pinhole camera. For the pose estimates to be meaningful,
our camera had to be calibrated first. We calibrated our camera using the camera calibration tool
provided in the OpenCV tutorials, plus some manual tuning. We used the resulting calibration matrix
to convert the raw pixel coordinates into coordinates for a calibrated pinhole camera model, which we
then fed into the equations above.
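The sketch below shows this pipeline, assuming a calibration matrix K and distortion coefficients obtained from the OpenCV calibration tool. Our implementation solves the Sharp et al. linear system via SVD; here OpenCV's solvePnP is used as a stand-in for that step, while the extraction of the camera position and yaw from R and t follows the description above.

    #include <opencv2/opencv.hpp>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Estimate camera pose from the 24 world<->image correspondences.
    // cv::solvePnP stands in for our SVD-based solver; position and yaw
    // are then recovered from the rotation matrix R and translation t.
    void estimatePose(const std::vector<cv::Point3f>& worldPts,  // Z = 0 on the pad
                      const std::vector<cv::Point2f>& imagePts,  // raw pixel coordinates
                      const cv::Mat& K, const cv::Mat& distCoeffs) {
        cv::Mat rvec, tvec;
        cv::solvePnP(worldPts, imagePts, K, distCoeffs, rvec, tvec);

        cv::Mat R;
        cv::Rodrigues(rvec, R);                    // rotation vector -> 3x3 matrix

        cv::Mat C = -R.t() * tvec;                 // camera position in world coordinates
        double yaw = std::atan2(R.at<double>(1, 0), R.at<double>(0, 0));

        std::printf("x=%.2f y=%.2f z=%.2f yaw=%.2f rad\n",
                    C.at<double>(0), C.at<double>(1), C.at<double>(2), yaw);
    }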
3.3.3 Real-time Control
In order to actually land a vehicle using these pose estimates, it was necessary to implement a high-level
controller which worked in conjunction with the autopilot's own stabilisation modes.
Our controller takes the form of a state machine, illustrated in Figure 8. The UAV starts out in
the FLYING state. When landing is desired, it switches into the SEEK HOME state. This uses the
autopilot's return-to-launch mode to bring the UAV close to the original takeoff location, using GPS.
When the landing platform becomes visible, the UAV switches into the LAND HIGH state. Here we
use our vision-based pose estimates with a simple proportional controller to guide the UAV towards
the landing platform. (The error terms in our controller are given as x, y, z, and yaw deviations. The
controller descends at a fixed rate, using the z deviation only as an estimate of the altitude.) When the
UAV reaches a predefined altitude (where pose estimates are no longer possible, due to limited field of
view), our controller enters the LAND LOW state, and descends slowly by dead reckoning. When the
barometric pressure sensor indicates that the UAV has reached the ground, the controller switches into
the POWER OFF state.
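A minimal sketch of this state machine is shown below. The state names follow figure 8; the fields of the Inputs structure are hypothetical placeholders for the checks described above.

    // States follow figure 8 of the text.
    enum class LandingState { FLYING, SEEK_HOME, LAND_HIGH, LAND_LOW, POWER_OFF };

    // Hypothetical snapshot of the inputs the controller reacts to.
    struct Inputs {
        bool landingRequested;     // operator has commanded a landing
        bool platformVisible;      // corner detector has found the pad
        bool belowVisionAltitude;  // pad no longer fits in the field of view
        bool onGround;             // barometric pressure indicates touchdown
    };

    LandingState step(LandingState s, const Inputs& in) {
        switch (s) {
        case LandingState::FLYING:    return in.landingRequested ? LandingState::SEEK_HOME : s;
        case LandingState::SEEK_HOME: return in.platformVisible ? LandingState::LAND_HIGH : s;    // GPS return-to-launch
        case LandingState::LAND_HIGH: return in.belowVisionAltitude ? LandingState::LAND_LOW : s; // vision + P controller
        case LandingState::LAND_LOW:  return in.onGround ? LandingState::POWER_OFF : s;           // dead-reckoning descent
        case LandingState::POWER_OFF: return s;
        }
        return s;
    }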
Due to the interface provided by roscopter, our control input consists of the raw pulse width modulation
(PWM) values that would typically be read from the human pilot's radio-control receiver. For instance,
a value of 900 represents no throttle, whereas 1800 represents full throttle. By overriding these values,
we can simulate human control input from our controller. This crude approach is obviously limited: a
better approach would be to extend the MAVLink protocol with a new message type to allow error values
to be sent directly into the inner control loop of the autopilot.
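As an illustration of this PWM override interface, the sketch below maps a controller error term onto a raw PWM value for one channel. The 900 and 1800 endpoints follow the text; the neutral value and gain are illustrative placeholders.

    #include <algorithm>

    // Map a controller error term to a raw PWM override value for one channel.
    // 900 and 1800 are the endpoints given in the text; the neutral value and
    // gain are illustrative placeholders.
    int errorToPwm(double error, double gain) {
        const int kPwmMin = 900;       // no throttle / stick fully low
        const int kPwmMax = 1800;      // full throttle / stick fully high
        const int kPwmNeutral = 1350;  // assumed mid-stick value
        int pwm = kPwmNeutral + static_cast<int>(gain * error);
        return std::min(kPwmMax, std::max(kPwmMin, pwm));
    }

One such mapping is applied per overridden channel before the values are published through roscopter.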
3.3.4 Optimisation Techniques
Our initial implementation operated at less than 3 Hz (i.e. it calculated pose estimates fewer than 3
times a second). This is far too slow for real-time control. The control loop of our autopilot operates
at 10 Hz, taking input from the GPS sensor at a rate of 5 Hz. In order to precisely measure the effect
of each stage of optimisation, we ran the pose estimation implementation through the same set of tests to
identify hotspots: areas which took the majority of processing time. Figure 9 shows hotspots inherent
in our overall process and figure 10 shows hotspots within the Detect Corners subroutine.
Removing Redundancy We first identified steps and substeps that were unnecessary. Through
benchmarking, we attempted to remove or reduce the impact of calls which were taking a significant
amount of processing time. At each stage, we ensured that robustness to image quality was retained by
testing against approximately 6,000 images captured in the lab and in the air.
Figure 9: Hotspots in the overall process.
Figure 10: Hotspots within the Detect Corners subroutine.
Compiler Optimisations Modern compilers offer a range of optimisation options. For example, they
offer user-specified flags that cause the generated executable file to be optimised for a certain instruction
set. When compiling libraries such as OpenCV for an embedded computer, it is typical to cross-compile,
since compilation on embedded computers typically takes many times longer. Cross-compilation means
that compilation happens on a more powerful host computer that has available to it a compiler for the
target architecture. In our case, compilation was on a quad-core x86 computer for an ARM target
architecture. At this stage, it is possible to pass certain parameters to the compiler that permit it to use
ARM NEON instructions in the generated binary code.
NEON is an advanced SIMD instruction set introduced in the ARMv7 architecture, the basis for
all modern ARM processors. Single instruction, multiple data (SIMD) instructions are data parallel
instructions that allow a single operation to be performed in parallel on two or more operands. While the
compiler has to be conservative in how it utilises these in the generated binary code so that correctness
is guaranteed, they allow some performance gain.
Additionally, library providers may implement optional code paths that are enabled when explicitly
compiling for an ARM target architecture. For instance, the OpenCV maintainers have several functions
that have optional NEON instructions implemented. This also results in a performance boost.
Library Optimisations OpenCV relies on third-party libraries for functionality such as image
format parsing and multi-threading. For core functions, such as parsing of image formats, a standard
library is included and used. For advanced functions, support is optional and so these are disabled by
default.
We experimented with enabling multi-threading functionality by compiling OpenCV with Intel's
Threading Building Blocks library. This is a library that provides a multi-threading abstraction and for
which support is available in select OpenCV functions [3].
Secondly, we re-compiled OpenCV, replacing the default JPEG parsing library, libjpeg, with a performance
optimised variant called libjpeg-turbo. This claims to be 2.1 to 5.3x faster than the standard
library [8] and has ARM-optimised code paths. Using this, it is possible to capture images at 30 frames
per second on the BeagleBone Black [5].
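For reference, the sketch below shows how a capture device can be configured for 640x480 Motion-JPEG at 30 FPS through OpenCV (using current OpenCV constant names; older 2.4-era builds use the equivalent CV_CAP_PROP_* macros). Whether the requested settings are honoured depends on the V4L2 backend and the camera.

    #include <opencv2/opencv.hpp>

    // Request 640x480 Motion-JPEG frames at 30 FPS from the first camera.
    bool configureCamera(cv::VideoCapture& cap) {
        if (!cap.open(0)) return false;                  // device index 0 assumed
        cap.set(cv::CAP_PROP_FOURCC, cv::VideoWriter::fourcc('M', 'J', 'P', 'G'));
        cap.set(cv::CAP_PROP_FRAME_WIDTH, 640);
        cap.set(cv::CAP_PROP_FRAME_HEIGHT, 480);
        cap.set(cv::CAP_PROP_FPS, 30);
        return true;
    }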
Note also that we were unable to use the Intel Integrated Performance Primitives (IPP) library. IPP is
the primary method of compiling a high performance version of OpenCV. However, it accomplishes this
by utilising significant customisation for x86 processors that implement instruction sets only available
on desktop computers (e.g. Streaming SIMD Extensions (SSE), Advanced Vector Extensions (AVX)).
Disabling CPU Scaling Modern embedded computers typically use aggressive CPU frequency scaling,
via the ondemand governor, in order to minimise power consumption [11]. This is beneficial for
consumer applications, where battery life is a significant concern and a commercial selling point.
However, for a real-time application such as this, CPU scaling is undesirable since it introduces a
slight latency as load increases and the CPU frequency is raised by the governor. This effect is even more
pronounced on architectures such as the big.LITTLE design used on the Odroid XU, which actually
switches automatically between a low-power optimised processor and a performance optimised processor
as load increases.
This can be mitigated by manually setting the governor to the performance setting. This effectively
disables frequency scaling and forces the board to run at its maximum clock speed.
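This is normally done once, as root, before starting the vision process; a small sketch that writes the setting through sysfs is shown below (the sysfs path is the usual Linux location, but it may vary by kernel and board).

    #include <fstream>
    #include <string>

    // Write "performance" to each core's cpufreq governor via sysfs.
    bool setPerformanceGovernor(int numCores) {
        bool ok = true;
        for (int cpu = 0; cpu < numCores; ++cpu) {
            std::string path = "/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                               "/cpufreq/scaling_governor";
            std::ofstream f(path.c_str());
            if (f) f << "performance";
            else ok = false;
        }
        return ok;
    }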
Use of SIMD Instructions As described above, ARMv7 processors provide the NEON SIMD instruction
set. Compilers such as gcc make instruction-set-specific SIMD operations available to the programmer
through built-in functions called intrinsics. Intrinsics essentially wrap calls to the underlying SIMD
instruction. Using intrinsics, it is possible to exploit data parallelism in our own code, essentially
rewriting higher level, non-parallel function calls with low level calls that operate on multiple data
simultaneously. This approach is laborious and is therefore used sparingly. It can, however, yield
significant performance improvements [9].
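As an illustration of the intrinsics style (not code from our implementation), the sketch below performs a saturating add of two 8-bit image buffers sixteen pixels at a time; it must be compiled with NEON enabled (e.g. -mfpu=neon on ARMv7) and assumes the buffer length is a multiple of sixteen.

    #include <arm_neon.h>
    #include <stddef.h>
    #include <stdint.h>

    // Saturating add of two 8-bit image buffers, 16 pixels per iteration.
    void addPixelsNeon(const uint8_t* a, const uint8_t* b, uint8_t* dst, size_t n) {
        for (size_t i = 0; i < n; i += 16) {
            uint8x16_t va = vld1q_u8(a + i);    // load 16 pixels from each input
            uint8x16_t vb = vld1q_u8(b + i);
            uint8x16_t vs = vqaddq_u8(va, vb);  // one saturating-add instruction
            vst1q_u8(dst + i, vs);              // store 16 results
        }
    }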
Use of Multi-threading Multi-threading is a promising approach. Certain library functions may
inherently exploit multi-threading, and a single-threaded process gains a slight benefit from having multiple
cores (the operating system will balance processes across cores). However, our single-threaded
implementation still gains little performance from having multiple cores available, many of which were not
occupied with useful work. Our multi-threaded implementation separates out the Capture step from
the later Detect Corners, Calibrate Image Points and Estimate Pose steps, as shown in figure 11.
This is accomplished by creating a pool of worker threads (the exact number of threads is configurable,
generally one for every available core). Each time an image is captured, it is dispatched to the next
free thread in a round-robin fashion. Work is distributed evenly across threads and, since each thread
finishes computation of images in a similar time, pose estimates are published to the /simplePose topic
in order of image acquisition. This approach is essentially pipelining: each thread can be considered a
pipeline, allowing concurrent processing of an image while the master thread captures the next image.
Figure 12 shows how frames are processed for a single-threaded process above that for a pipelined
multi-threaded process.
We use the POSIX thread (pthread) libraries to assist with thread creation and management.
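The sketch below outlines this structure using pthreads: a master capture loop dispatches frames round-robin to per-worker queues. The queueing details and the processFrame placeholder are illustrative; the real implementation publishes pose estimates on a ROS topic rather than calling a local function.

    #include <pthread.h>
    #include <opencv2/opencv.hpp>
    #include <deque>
    #include <vector>

    // One small work queue per worker thread; the master thread captures
    // frames and hands them out round-robin.
    struct Worker {
        pthread_t thread;
        pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
        pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
        std::deque<cv::Mat> queue;
        bool stop = false;
    };

    // Placeholder for Detect Corners, Calibrate Image Points and Estimate Pose.
    void processFrame(const cv::Mat& frame) { (void)frame; }

    void* workerMain(void* arg) {
        Worker* w = static_cast<Worker*>(arg);
        while (true) {
            pthread_mutex_lock(&w->mtx);
            while (w->queue.empty() && !w->stop)
                pthread_cond_wait(&w->cond, &w->mtx);
            if (w->queue.empty()) { pthread_mutex_unlock(&w->mtx); return nullptr; }
            cv::Mat frame = w->queue.front();
            w->queue.pop_front();
            pthread_mutex_unlock(&w->mtx);
            processFrame(frame);
        }
    }

    void startWorkers(std::vector<Worker>& workers) {
        for (size_t i = 0; i < workers.size(); ++i)
            pthread_create(&workers[i].thread, nullptr, workerMain, &workers[i]);
    }

    void captureLoop(cv::VideoCapture& cap, std::vector<Worker>& workers) {
        size_t next = 0;
        cv::Mat frame;
        while (cap.read(frame)) {                 // master thread: Capture step only
            Worker& w = workers[next];
            pthread_mutex_lock(&w.mtx);
            w.queue.push_back(frame.clone());     // hand a copy to the worker
            pthread_cond_signal(&w.cond);
            pthread_mutex_unlock(&w.mtx);
            next = (next + 1) % workers.size();   // round-robin dispatch
        }
    }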
Figure 11: Separation of the Capture step from the processing steps in our multi-threaded implementation.
Figure 12: Frame processing for the single-threaded and pipelined multi-threaded implementations.
z mean     z std     x bound    x std     y bound    y std     yaw std
89.3 cm    0.05 cm   0.19 cm    0.43 cm   0.14 cm    0.39 cm   0.12
121.1 cm   0.08 cm   0.26 cm    1.16 cm   0.19 cm    1.06 cm   0.12
172.0 cm   0.18 cm   0.37 cm    2.74 cm   0.27 cm    2.17 cm   0.07
229.0 cm   0.54 cm   0.49 cm    6.51 cm   0.36 cm    6.05 cm   0.34

Table 2: Accuracy of our pose estimates. x bound and y bound are lower bounds for the error on x
and y, given by the limited image resolution.
The controller should switch to the LAND LOW state when below the height at which it is able
to calculate pose estimates. However, this was suboptimal: 1) ground effects made the motion of the
aircraft very unstable when hovering close to the ground, and it would drift significantly as it lowered;
and 2) the drift in altitude estimates provided by the barometric pressure sensor made it difficult to tell
when to POWER OFF.
In order to mitigate these issues, we modified our original code and built a smaller landing pad. While
this requires the UAS to be closer (at a maximum height of 3 metres) to start receiving pose estimates, it
means the camera can see the entire pad until it is 50 centimetres from the ground. (This is considerably
better than the approximately 2 metre altitude at which pose estimates would cut off for the larger pad.)
4.2.3 Control
While this project was not explicitly focussed on the control aspects, our control algorithm was necessary
to demonstrate that this approach is technically viable. Unfortunately, windy conditions when testing,
combined with slow performance, meant that the UAS was unable to maintain a steady position and
instead drifted significantly as gusts of wind presented themselves.
Extending our proportional controller to a full PID controller would provide more aggressive control
and would perform better under these conditions. However, this approach would still rely on passing
raw PWM values to the autopilot via roscopter.
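For reference, a minimal discrete PID update of the kind suggested here is sketched below; the gains and timestep are placeholders, and the controller we actually flew used only the proportional term. One instance per axis (x, y and yaw) would drive the corresponding PWM override described in section 3.3.3.

    // Minimal discrete PID update. Gains and timestep are placeholders.
    struct Pid {
        double kp, ki, kd;
        double integral = 0.0;
        double prevError = 0.0;

        double update(double error, double dt) {
            integral += error * dt;
            double derivative = (error - prevError) / dt;
            prevError = error;
            return kp * error + ki * integral + kd * derivative;
        }
    };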
The optimal approach would be to extend the MAVLink protocol such that only error values are
sent to the inner control loop of the autopilot. These error values would then be presented to the core
PID loop maintaining stability, reducing the need for an extra control loop on the co-computer.
4.3 Performance
A considerable amount of effort went into optimising the performance of this vision system when run on
an embedded ARM computer. The following section describes the methods used to benchmark various
implementations, the maximum attainable performance, and the results we obtained.
4.3.1 Benchmarking Methodology
Environment The large fixed-size landing pad was placed at a fixed distance of approximately 1.5 metres
from the web camera. All tests were performed indoors in a room without windows and with a constant
lighting level produced by a fluorescent tube light. No other user space processes were running on each
computer aside from roscore and the pose estimator process itself.
Timing Data Each implementation was benchmarked using calls to the gettimeofday function. For
each implementation we profiled the amount of time taken for various calls within our pose estimation
routine over 20 frames. For each test, we discarded the first 20-frame chunk, since this is when the camera
automatically adjusts its exposure and focus settings. This data was collected for 10 arbitrary 20-frame
chunks and averaged to provide overall figures.
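A sketch of this style of timing harness is shown below; the step being timed is a placeholder for one of the subroutines profiled above.

    #include <sys/time.h>
    #include <cstdio>

    // Wall-clock time in milliseconds from gettimeofday.
    static double nowMs() {
        struct timeval tv;
        gettimeofday(&tv, 0);
        return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
    }

    // Time an arbitrary step over a 20-frame chunk and report the average.
    void benchmarkStep(void (*step)(), int frames = 20) {
        double total = 0.0;
        for (int i = 0; i < frames; ++i) {
            double t0 = nowMs();
            step();                                // e.g. one call to Detect Corners
            total += nowMs() - t0;
        }
        std::printf("average per frame: %.2f ms\n", total / frames);
    }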
4.3.2 Maximum Performance
The Logitech C920 web camera we are using captures frames at a 640 x 480 pixel resolution at a maximum
of 30 frames per second. We cannot operate any faster than the camera delivers frames, and therefore
the upper bound on performance is 30 Hz.
Board         No computation    Basic decoding using OpenCV
BeagleBone    29.57             18.69
Odroid        29.65             24.60

Table 3: Maximum capture rate (average FPS) with no computation and with basic decoding using OpenCV.
Board         Pad not in view    Pad in view
BeagleBone    2.94               3.01
Desktop       30.04              29.97
Odroid        8.80               8.93

Table 4: Naive implementation: average FPS for the BeagleBone, Odroid and a desktop computer.
The desktop computer has no issue running at the maximum 30 FPS, but both the BeagleBone and
Odroid fail to reach the 10 Hz required for real-time control (as described in section 3.3.4).
There is a slight but noticeable discrepancy between performance when the pad is in view and when the
pad is not, which only appears to manifest itself on the two ARM boards (the BeagleBone and Odroid).
By profiling the steps of Detect Corners, the first four of which are shown for the Odroid in table 5, we
can begin to see why. Canny edge detection and finding contours take slightly longer on both
boards when there is no pad in view. Intuitively this is because when the pad is in view, it occludes a
significant part of the image; the pad is constructed of very simple quadrilaterals, which are atypical of
the many small heterogeneous shapes that a normal frame is composed of. For the remaining tests the
figures represent performance when the pad is in view.
Pad    Median Blur    Canny    Find Contours    Approximate Polygons
No     0.054          0.039    0.003            0.001
Yes    0.054          0.038    0.002            0.001

Table 5: Naive implementation: breakdown of the first four steps of Detect Corners with and without the
pad in view, for the Odroid (similar results for the BeagleBone).
Figure 14 shows a breakdown of the naive implementation of the overall algorithm on all three of our
boards. It is clear that the majority of time is spent in the Capture and Detect Corners stages. Video
capture is handled by an OpenCV function, so we are unable to easily profile it further. However, figure 15
shows a breakdown of time spent in the Detect Corners subroutine. Here it is clear that the two calls
to Median Blur and Canny contribute the majority of the processing time.
                            BeagleBone    Odroid
Standard                    2.91          8.93
NEON                        2.84          9.60
NEON+TBB                    2.98          9.60
NEON+TBB+libjpeg-turbo      3.20          9.90

Table 6: Naive implementation with optimised libraries: average FPS for the BeagleBone and Odroid
with various library optimisations.
Figure 16 shows how the time taken for steps in the overall algorithm changes with each adaptation.
Compiling for the NEON architecture causes consistent gains across each procedure, while using
libjpeg-turbo specifically optimises the Capture step. This can be explained by the fact that the Capture step
decodes frame data encoded in Motion-JPEG from the web camera into the Mat object OpenCV uses internally.
Figure 16: Naive implementation with optimised libraries: breakdown of overall algorithm for Odroid
(similar results for BeagleBone).
              Optimised Libraries    Single-thread Optimised
BeagleBone    3.20                   5.08
Odroid        9.90                   21.58

Table 7: Single-thread optimised implementation: average FPS for the BeagleBone and Odroid.
Figure 17: Single-thread optimised implementation: breakdown of Detect Corners for Odroid (similar
results for BeagleBone).
Figure 18: Multi-thread optimised implementation: average FPS for the BeagleBone for different numbers of threads.
Figure 19: Multi-thread optimised implementation: average FPS for the Odroid for different numbers of threads.
Table 8 shows the 1-minute average system load (broadly, the number of processes running or waiting to run)
reported by Linux as additional threads are added when running on the Odroid. This does not increase
significantly beyond 2 threads, validating the earlier point that 2 threads are sufficient to carry out nearly
all of the computation.
Threads    System Load
1          0.59
2          1.10
4          1.17
8          1.20

Table 8: Multi-thread optimised implementation: 1-minute average system load for the Odroid for different
numbers of threads.
              Single-thread optimised    Single-thread optimised (2)
BeagleBone    5.08                       5.97
Odroid        21.58                      30.19

Table 9: Single-thread optimised implementation (2): average FPS for the BeagleBone and Odroid.
Figure 20 shows that the majority of the speed increase came from the reduction in time taken by the
Canny Edge Detection step. There was also a corresponding time saving during the Find Contours step
- presumably because thresholding is actually a more selective preprocessing method than Canny edge
detection.
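The sketch below illustrates the substitution this implies: a binary threshold replacing Canny edge detection as the preprocessing step before Find Contours. The threshold value is a placeholder; Otsu's method is an alternative under stable lighting.

    #include <opencv2/opencv.hpp>

    // Binary threshold in place of Canny edge detection before Find Contours.
    // 128 is a placeholder threshold for the monochromatic pad.
    void preprocess(const cv::Mat& gray, cv::Mat& binary) {
        cv::threshold(gray, binary, 128, 255, cv::THRESH_BINARY);
    }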
Figure 20: Single-thread optimised implementation 2: breakdown of Detect Corners for Odroid (similar
results for BeagleBone).
Figure 23: Naive to optimised implementation: breakdown of Detect Corners for BeagleBone and Odroid.
6 Acknowledgements
Without the generous support of 3DRobotics and advice from Brandon Basso, this project would not have
been possible. Additionally, several other graduate students collaborated on and contributed extensively
to this project: Constantin Berzan, Nahush Bhanage, Gita Dombrowski and Hoang Nguyen.
References
[1]
[2]
[3] OpenCV Adventure. Parallelizing Loops with Intel Thread Building Blocks. [Online]. 2011. url:
http://experienceopencv.blogspot.com/2011/07/parallelizing-loops-with-intel-thread.html.
[4] AUVSI. The Economic Impact of Unmanned Aircraft Systems Integration in the United States.
[Online]. 2013. url: http://www.auvsi.org/econreport.
[5] Michael Darling. How to Achieve 30 fps with BeagleBone Black, OpenCV, and Logitech C920
Webcam. [Online]. 2013. url: http://blog.lemoneerlabs.com/3rdParty/Darling_BBB_30fps_DRAFT.html.
[6]
[7] Stanley R Herwitz et al. "Precision agriculture as a commercial application for solar-powered
unmanned aerial vehicles". In: AIAA 1st Technical Conference and Workshop on Unmanned Aerospace
Vehicles. 2002.
[8] libjpeg-turbo. Performance. [Online]. 2013. url: http://www.libjpeg-turbo.org/About/Performance.
[9] Gaurav Mitra et al. "Use of SIMD vector operations to accelerate application code performance
on low-powered ARM and Intel platforms". In: Parallel and Distributed Processing Symposium
Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International. IEEE. 2013, pp. 1107-1116.
[10] Stack Overflow. What is the best library for computer vision in C/C++? [Online]. 2009. url:
http://stackoverflow.com/questions/66722/what-is-the-best-library-for-computer-vision-in-c-c.
[11] Venkatesh Pallipadi and Alexey Starikovskiy. "The ondemand governor". In: Proceedings of the
Linux Symposium. Vol. 2. 2006, pp. 215-230.
[12] E. Pereira, R. Sengupta, and K. Hedrick. "The C3UV Testbed for Collaborative Control and
Information Acquisition Using UAVs". In: American Control Conference. AACC. 2013.
[13] Kari Pulli et al. "Real-time computer vision with OpenCV". In: Communications of the ACM 55.6
(2012), pp. 61-69.
[14] Morgan Quigley et al. "ROS: an open-source Robot Operating System". In: ICRA Workshop on
Open Source Software. Vol. 3. 3.2. 2009.
[15] Rep. John L. Mica et al. FAA Modernization and Reform Act of 2012. 2012.
[16] Katie Roberts-Hoffman and Pawankumar Hegde. "ARM Cortex-A8 vs. Intel Atom: Architectural
and benchmark comparisons". In: Dallas: University of Texas at Dallas (2009).
[17] Srikanth Saripalli, James F Montgomery, and Gaurav S Sukhatme. "Vision-based autonomous
landing of an unmanned aerial vehicle". In: IEEE International Conference on Robotics and
Automation. Vol. 3. IEEE. 2002, pp. 2799-2804.
[18] Courtney S. Sharp, Omid Shakernia, and Shankar Sastry. "A Vision System for Landing an Unmanned
Aerial Vehicle". In: IEEE International Conference on Robotics and Automation. IEEE,
2001, pp. 1720-1727.
[19] Eric Stotzer et al. "OpenMP on the Low-Power TI Keystone II ARM/DSP System-on-Chip". In:
OpenMP in the Era of Low Power Devices and Accelerators. Springer, 2013, pp. 114-127.
[20] Sebastian Thrun et al. "Stanley: The robot that won the DARPA Grand Challenge". In: Journal
of Field Robotics 23.9 (2006), pp. 661-692.
[21] Chunhua Zhang and John M Kovacs. "The application of small unmanned aerial systems for
precision agriculture: a review". In: Precision Agriculture 13.6 (2012), pp. 693-712.