OpenCV With Python Blueprints - Sample Chapter
Michael Beyeler
This is his first technical book that, in contrast to his (or any) dissertation, might
actually be read.
Michael has professional programming experience in Python, C/C++, CUDA,
MATLAB, and Android. Born and raised in Switzerland, he received a BSc degree
in electrical engineering and information technology, as well as an MSc degree in
biomedical engineering from ETH Zurich. When he is not "nerding out" on robots, he
can be found on top of a snowy mountain, in front of a live band, or behind the piano.
Preface
OpenCV is a native, cross-platform C++ library for computer vision, machine learning,
and image processing. It is increasingly being adopted in Python for development.
OpenCV has C++/C, Python, and Java interfaces, with support for Windows, Linux,
Mac, iOS, and Android. Developers who use OpenCV build applications to process
visual data; this can include live streaming data such as photographs or videos from
a device such as a camera. However, as developers move beyond their first computer
vision applications, they might find it difficult to come up with solutions that are well-optimized, robust, and scalable for real-world scenarios.
This book demonstrates how to develop a series of intermediate to advanced projects
using OpenCV and Python, rather than teaching the core concepts of OpenCV in
theoretical lessons. The working projects developed in this book teach you how to
apply your theoretical knowledge to topics such as image manipulation, augmented
reality, object tracking, 3D scene reconstruction, statistical learning, and object
categorization.
By the end of this book, you will be an OpenCV expert, and your newly gained
experience will allow you to develop your own advanced computer vision
applications.
Chapter 3, Finding Objects via Feature Matching and Perspective Transforms, is where
you develop an app to detect an arbitrary object of interest in the video stream of
a webcam, even if the object is viewed from different angles or distances, or under
partial occlusion.
Chapter 4, 3D Scene Reconstruction Using Structure from Motion, shows you how to
reconstruct and visualize a scene in 3D by inferring its geometrical features from
camera motion.
Chapter 5, Tracking Visually Salient Objects, helps you develop an app to track multiple
visually salient objects in a video sequence (such as all the players on the field during
a soccer match) at once.
Chapter 6, Learning to Recognize Traffic Signs, shows you how to train a support
vector machine to recognize traffic signs from the German Traffic Sign Recognition
Benchmark (GTSRB) dataset.
Chapter 7, Learning to Recognize Emotions on Faces, is where you develop an app that
is able to both detect faces and recognize their emotional expressions in the video
stream of a webcam in real time.
Black-and-white pencil sketch: To create this effect, we will make use of two
image blending techniques, known as dodging and burning
OpenCV is such an advanced toolchain that often the question is not how to
implement something from scratch, but rather which pre-canned implementation
to choose for your needs. Generating complex effects is not hard if you have a lot of
computing resources to spare. The challenge usually lies in finding an approach that
not only gets the job done, but also gets it done in time.
Instead of teaching the basic concepts of image manipulation through theoretical
lessons, we will take a practical approach and develop a single end-to-end app that
integrates a number of image filtering techniques. We will apply our theoretical
knowledge to arrive at a solution that not only works but also speeds up seemingly
complex effects so that a laptop can produce them in real time.
The following screenshot shows the final outcome of the three effects running
on a laptop:
All of the code in this book is targeted for OpenCV 2.4.9 and has been
tested on Ubuntu 14.04. Throughout this book, we will make extensive
use of the NumPy package (http://www.numpy.org). In addition,
this chapter requires the UnivariateSpline module of the SciPy
package (http://www.scipy.org) as well as the wxPython 2.8
graphical user interface (http://www.wxpython.org/download.php)
for cross-platform GUI applications. We will try to avoid further
dependencies wherever possible.
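If you want to verify that these packages are in place before diving in, a quick version check (just a convenience snippet, not part of the book's code) will do:

import cv2
import numpy
import scipy
import wx

# print the versions of the packages this chapter relies on
print cv2.__version__    # expecting 2.4.9
print numpy.__version__
print scipy.__version__
print wx.__version__     # expecting 2.8.x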
The app for this chapter consists of the following modules:
filters: A module comprising different classes for the three different image
effects. The modular approach will allow us to use the filters independently
of any graphical user interface (GUI).
filters.PencilSketch, filters.WarmingFilter, filters.CoolingFilter,
filters.Cartoonizer: Classes that apply the respective effect to an RGB
color image.
gui: A module providing a wxPython GUI application to access the
webcam and display the camera feed, which we will make extensive use of
throughout the book.
gui.BaseLayout: A generic layout from which more complicated layouts
can be built.
chapter1.main: The main function routine for starting the GUI application.
Creating a black-and-white pencil sketch
To create the pencil sketch effect, we will rely on two image blending techniques
known as dodging and burning. These terms refer to techniques employed during
the printing process in traditional photography: dodging lightened certain areas of
a print, whereas burning darkened them. Areas that were not supposed to undergo
changes were protected with a mask.
Today, modern image editing programs, such as Photoshop and Gimp, offer ways to
mimic these effects in digital images. For example, masks are still used to mimic the
effect of changing exposure time of an image, wherein areas of a mask with relatively
intense values will expose the image more, thus lightening the image. OpenCV does
not offer a native function to implement these techniques, but with a little insight
and a few tricks, we will arrive at our own efficient implementation that can be used
to produce a beautiful pencil sketch effect.
If you search on the Internet, you might stumble upon the following common
procedure to achieve a pencil sketch from an RGB color image:
1. Convert the color image to grayscale.
2. Invert the grayscale image to get a negative.
3. Apply a Gaussian blur to the negative from step 2.
4. Blend the grayscale image from step 1 with the blurred negative from step 3
using a color dodge.
Whereas steps 1 to 3 are straightforward, step 4 can be a little tricky. Let's get that
one out of the way first.
OpenCV 3 comes with a pencil sketch effect right out of the
box. The cv2.pencilSketch function uses a domain filter
introduced in the 2011 paper Domain transform for edge-aware
image and video processing, by Eduardo Gastal and Manuel
Oliveira. However, for the purpose of this book, we will develop
our own filter.
A color dodge essentially divides the value of an A[idx] image pixel by the
inverted value of the B[idx] mask pixel (that is, by 255 - B[idx]), while making
sure that the resulting pixel value stays in the range of [0, 255] and that we do
not divide by zero.
We could translate this into the following naive Python function, which accepts two
OpenCV matrices (image and mask) and returns the blended image:
import numpy as np

def dodgeNaive(image, mask):
    # determine the shape of the input image
    width, height = image.shape[:2]

    # prepare output argument with same size as image
    blend = np.zeros((width, height), np.uint8)

    for c in xrange(width):
        for r in xrange(height):
            # shift image pixel value by 8 bits
            # divide by the inverse of the mask
            tmp = (image[c, r] << 8) / (255. - mask[c, r])

            # make sure resulting value stays within bounds
            if tmp > 255:
                tmp = 255
            blend[c, r] = tmp

    return blend
As you might have guessed, although this code might be functionally correct, it will
undoubtedly be horrendously slow. Firstly, the function uses for loops, which are
almost always a bad idea in Python. Secondly, NumPy arrays (the underlying format
of OpenCV images in Python) are optimized for array calculations, so accessing and
modifying each image[c,r] pixel separately will be really slow.
Instead, we should realize that the << 8 operation is the same as multiplying the
pixel value by 2^8 = 256, and that pixel-wise division can be achieved with
the cv2.divide function. Thus, an improved version of our dodge function could
look like this:
import cv2

def dodgeV2(image, mask):
    return cv2.divide(image, 255 - mask, scale=256)
We have reduced the dodge function to a single line! The dodgeV2 function produces
the same result as dodgeNaive but is orders of magnitude faster. In addition,
cv2.divide automatically takes care of division by zero, making the result 0 where
255 - mask is zero.
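To see how all four steps fit together, here is a minimal end-to-end sketch that uses dodgeV2 for the blend; img_rgb stands for any RGB input frame, and the helper name pencil_sketch is ours, not taken from the book's code:

import cv2

def pencil_sketch(img_rgb):
    # step 1: convert the color image to grayscale
    img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2GRAY)

    # step 2: invert the grayscale image to get a negative
    img_gray_inv = 255 - img_gray

    # step 3: blur the negative with a large Gaussian kernel
    img_blur = cv2.GaussianBlur(img_gray_inv, (21, 21), 0, 0)

    # step 4: blend the grayscale image with the blurred negative
    # using the color dodge defined above
    return dodgeV2(img_gray, img_blur)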
The constructor of the PencilSketch class will accept the image dimensions as well as an
optional background image, which we will make use of in just a bit. If the file
exists, we will open it and scale it to the right size:
self.width = width
self.height = height

# try to open background canvas (if it exists)
self.canvas = cv2.imread(bg_gray, cv2.CV_8UC1)
if self.canvas is not None:
    self.canvas = cv2.resize(self.canvas,
                             (self.width, self.height))
Note that it does not matter whether the input image is RGB or BGR.
5. We then invert the image and blur it with a large Gaussian kernel of size
(21,21):
img_gray_inv = 255 - img_gray
img_blur = cv2.GaussianBlur(img_gray_inv, (21, 21), 0, 0)
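Presumably, the next step blends the blurred negative with the grayscale image using the color dodge; in line with the dodgeV2 function developed earlier, this boils down to a single call (the variable name img_blend matches its use in the canvas blend below):

# blend the grayscale image with the blurred negative (color dodge)
img_blend = cv2.divide(img_gray, 255 - img_blur, scale=256)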
For kicks and giggles, we want to lightly blend our transformed image (img_blend)
with a background image (self.canvas) that makes it look as if we drew the image
on a canvas:
if self.canvas is not None:
    img_blend = cv2.multiply(img_blend, self.canvas, scale=1./256)
return cv2.cvtColor(img_blend, cv2.COLOR_GRAY2BGR)
And we're done! The final output looks like what is shown here:
Every set of anchor points should include (0,0) and (255,255). This is
important in order to prevent the image from appearing as if it has an
overall tint, as black remains black and white remains white.
Rather than computing the curve function f(x) for every pixel of every frame, we
make use of a lookup table. Since there are only 256 possible pixel values for our
purposes, we need to calculate f(x) only for the 256 possible values of x. Interpolation
is handled by the UnivariateSpline function of the scipy.interpolate module,
as shown in the following code snippet:
from scipy.interpolate import UnivariateSpline

def _create_LUT_8UC1(self, x, y):
    spl = UnivariateSpline(x, y)
    return spl(xrange(256))
The function returns a 256-element array that contains the interpolated f(x) values
for every possible value of x.
All we need to do now is come up with a set of anchor points, (x_i, y_i), and we are
ready to apply the filter to a grayscale input image (img_gray):
import cv2
import numpy as np

x = [0, 128, 255]
y = [0, 192, 255]
myLUT = _create_LUT_8UC1(x, y)
img_curved = cv2.LUT(img_gray, myLUT).astype(np.uint8)
The result looks like this (the original image is on the left, and the transformed image
is on the right):
If you have a minute to spare, I advise you to play around with the different curve
settings for a while. You can choose any number of anchor points and apply the
curve filter to any image channel you can think of (red, green, blue, hue, saturation,
brightness, lightness, and so on). You could even combine multiple channels, or
decrease one and shift another to a desired region. What will the result look like?
However, if the number of possibilities dazzles you, take a more conservative
approach. First, by making use of our _create_LUT_8UC1 function developed in the
preceding steps, let's define two generic curve filters, one that (by trend) increases all
pixel values of a channel, and one that generally decreases them:
class WarmingFilter:
    def __init__(self):
        self.incr_ch_lut = _create_LUT_8UC1([0, 64, 128, 192, 256],
                                            [0, 70, 140, 210, 256])
        self.decr_ch_lut = _create_LUT_8UC1([0, 64, 128, 192, 256],
                                            [0, 30, 80, 120, 192])
The easiest way to make an image appear as if it was taken on a hot, sunny day
(maybe close to sunset), is to increase the reds in the image and make the colors
appear vivid by increasing the color saturation. We will achieve this in two steps:
1. Increase the pixel values in the R channel and decrease the pixel values in
the B channel of an RGB color image using incr_ch_lut and decr_ch_lut,
respectively:
def render(self, img_rgb):
    c_r, c_g, c_b = cv2.split(img_rgb)
    c_r = cv2.LUT(c_r, self.incr_ch_lut).astype(np.uint8)
    c_b = cv2.LUT(c_b, self.decr_ch_lut).astype(np.uint8)
    img_rgb = cv2.merge((c_r, c_g, c_b))
2. Transform the image into the HSV color space (H means hue, S means
saturation, and V means value), and increase the S channel using
incr_ch_lut. This can be achieved with the following code, which
expects an RGB color image as input:
# increase color saturation
c_h, c_s, c_v = cv2.split(cv2.cvtColor(img_rgb,
                                       cv2.COLOR_RGB2HSV))
c_s = cv2.LUT(c_s, self.incr_ch_lut).astype(np.uint8)
return cv2.cvtColor(cv2.merge((c_h, c_s, c_v)),
                    cv2.COLOR_HSV2RGB)
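As a quick sanity check, the filter can then be applied to any RGB image; the filename below is only a placeholder:

# load an image, convert BGR (OpenCV's default) to RGB, and warm it up
img_rgb = cv2.cvtColor(cv2.imread('landscape.jpg'), cv2.COLOR_BGR2RGB)
img_warm = WarmingFilter().render(img_rgb)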
Analogously, we can define a cooling filter that increases the pixel values in the B
channel, decreases the pixel values in the R channel of an RGB image, converts the
image into the HSV color space, and decreases color saturation via the S channel:
class CoolingFilter:
    def render(self, img_rgb):
        c_r, c_g, c_b = cv2.split(img_rgb)
        c_r = cv2.LUT(c_r, self.decr_ch_lut).astype(np.uint8)
        c_b = cv2.LUT(c_b, self.incr_ch_lut).astype(np.uint8)
        img_rgb = cv2.merge((c_r, c_g, c_b))

        # decrease color saturation
        c_h, c_s, c_v = cv2.split(cv2.cvtColor(img_rgb,
                                               cv2.COLOR_RGB2HSV))
        c_s = cv2.LUT(c_s, self.decr_ch_lut).astype(np.uint8)
        return cv2.cvtColor(cv2.merge((c_h, c_s, c_v)),
                            cv2.COLOR_HSV2RGB)
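Note that this snippet assumes that CoolingFilter carries the same two lookup tables as WarmingFilter; a constructor along the following lines (an assumption, mirroring the warming filter) would supply them:

class CoolingFilter:
    def __init__(self):
        # same generic curve LUTs as in WarmingFilter (assumed)
        self.incr_ch_lut = _create_LUT_8UC1([0, 64, 128, 192, 256],
                                            [0, 70, 140, 210, 256])
        self.decr_ch_lut = _create_LUT_8UC1([0, 64, 128, 192, 256],
                                            [0, 30, 80, 120, 192])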
Cartoonizing an image
Over the past few years, professional cartoonizer software has popped up all over
the place. In order to achieve the basic cartoon effect, all that we need is a bilateral
filter and some edge detection. The bilateral filter will reduce the color palette, or
the number of colors that are used in the image. This mimics a cartoon drawing,
wherein a cartoonist typically has few colors to work with. Then we can apply edge
detection to the resulting image to generate bold silhouettes. The real challenge,
however, lies in the computational cost of bilateral filters. We will thus use some
tricks to produce an acceptable cartoon effect in real time.
We will adhere to the following procedure to transform an RGB color image into
a cartoon:
1. Apply a bilateral filter to reduce the color palette of the image.
2. Convert the original color image into grayscale.
3. Apply a median blur to reduce image noise.
4. Use adaptive thresholding to detect and emphasize the edges in an
edge mask.
5. Combine the color image from step 1 with the edge mask from step 4.
A pixel value in the resized image will correspond to the pixel average of a small
neighborhood in the original image. However, this process may produce image
artifacts, an effect also known as aliasing. While this is bad enough on its own, the
effect might be amplified by subsequent processing, for example, edge detection.
A better alternative might be to use the Gaussian pyramid for downscaling (again to
a quarter of the original size). The Gaussian pyramid consists of a blur operation that
is performed before the image is resampled, which reduces aliasing effects:
img_small = cv2.pyrDown(img_rgb)
However, even at this scale, the bilateral filter might still be too slow to run in real
time. Another trick is to repeatedly (say, five times) apply a small bilateral filter to
the image instead of applying a large bilateral filter once:
num_iter = 5
for _ in xrange(num_iter):
    img_small = cv2.bilateralFilter(img_small, d=9, sigmaColor=9,
                                    sigmaSpace=7)
The result looks like a blurred color painting of a creepy programmer, as follows:
The Sobel operator (cv2.Sobel) can reduce such artifacts, but it is not rotationally
symmetric. The Scharr operator (cv2.Scharr) was targeted at correcting this, but
only looks at the first image derivative. If you are interested, there are even more
operators for you, such as the Laplacian or ridge operator (which includes the
second derivative), but they are far more complex. And in the end, for our specific
purposes, they might not look better, maybe because they are as susceptible to
lighting conditions as any other algorithm.
For the purpose of this project, we will choose a function that might not even be
associated with conventional edge detection: cv2.adaptiveThreshold. Like
cv2.threshold, this function uses a threshold pixel value to convert a grayscale image into
a binary image. That is, if a pixel value in the original image is above the threshold,
then the pixel value in the final image will be 255. Otherwise, it will be 0. However,
the beauty of adaptive thresholding is that it does not look at the overall properties
of the image. Instead, it detects the most salient features in each small neighborhood
independently, without regard to the global image optima. This makes the algorithm
extremely robust to lighting conditions, which is exactly what we want when we seek
to draw bold, black outlines around objects and people in a cartoon.
However, it also makes the algorithm susceptible to noise. To counteract this, we
will preprocess the image with a median filter. A median filter does what its name
suggests; it replaces each pixel value with the median value of all the pixels in a
small pixel neighborhood. We first convert the RGB image (img_rgb) to grayscale
(img_gray) and then apply a median blur with a seven-pixel local neighborhood:
# convert to grayscale and apply median blur
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2GRAY)
img_blur = cv2.medianBlur(img_gray, 7)
After reducing the noise, it is now safe to detect and enhance the edges using
adaptive thresholding. Even if there is some image noise left, the
cv2.ADAPTIVE_THRESH_MEAN_C algorithm with blockSize=9 will ensure that the threshold is
applied to the mean of a 9 x 9 neighborhood minus C=2:
img_edge = cv2.adaptiveThreshold(img_blur, 255,
                                 cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY, 9, 2)
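What remains is step 5 of the procedure: combining the downscaled, bilaterally filtered color image with the edge mask. A plausible way to do this (a sketch; the exact resizing and masking calls may differ from the book's implementation) is to upsample the small image back to the original frame size, convert the edge mask to three channels, and black out every pixel that the mask marks as an edge:

# upsample the filtered color image back to the original frame size
img_color = cv2.resize(img_small, (img_rgb.shape[1], img_rgb.shape[0]))

# edge pixels are 0 in the mask, so a bitwise AND draws them as
# bold black outlines on top of the color image
img_edge = cv2.cvtColor(img_edge, cv2.COLOR_GRAY2RGB)
img_cartoon = cv2.bitwise_and(img_color, img_edge)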
We will also have to import a generic GUI layout (from gui) and all the designed
image effects (from filters):
from gui import BaseLayout
from filters import PencilSketch, WarmingFilter, CoolingFilter, Cartoonizer
In order to give our application a fair chance to run in real time, we will limit the size
of the video stream to 640 x 480 pixels:
capture.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, 640)
capture.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, 480)
If you are using OpenCV 3, the constants that you are looking for
are called cv2.CAP_PROP_FRAME_WIDTH and
cv2.CAP_PROP_FRAME_HEIGHT.
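If the app should run unchanged on both versions, one option (a small sketch, not part of the book's code) is to pick the constants at runtime:

# choose the capture property constants based on the OpenCV version
try:
    # OpenCV 2.4.x exposes them under the cv2.cv submodule
    frame_w = cv2.cv.CV_CAP_PROP_FRAME_WIDTH
    frame_h = cv2.cv.CV_CAP_PROP_FRAME_HEIGHT
except AttributeError:
    # OpenCV 3 moved them into the cv2 namespace
    frame_w = cv2.CAP_PROP_FRAME_WIDTH
    frame_h = cv2.CAP_PROP_FRAME_HEIGHT

capture.set(frame_w, 640)
capture.set(frame_h, 480)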
Then the capture stream can be passed to our GUI application, which is an instance
of the FilterLayout class:
# start graphical user interface
app = wx.App()
layout = FilterLayout(None, -1, 'Fun with Filters', capture)
layout.Show(True)
app.MainLoop()
The BaseLayout class is designed as an abstract base class. You can think of this class
as a blueprint or recipe that will apply to all the layouts that we are yet to design: a
skeleton class, if you will, that will serve as the backbone for all of our future GUI code.
In order to use abstract classes, we need the following import statement:
from abc import ABCMeta, abstractmethod
We also include some other modules that will be helpful, especially the wx Python
module and OpenCV (of course):
import time
import wx
import cv2
The class is designed to be derived from the blueprint or skeleton, that is, the
wx.Frame class. We also mark the class as abstract by adding the __metaclass__
attribute:
class BaseLayout(wx.Frame):
    __metaclass__ = ABCMeta
Later on, when we write our own custom layout (FilterLayout), we will use the
same notation to specify that the class is based on the BaseLayout blueprint (or
skeleton) class, for example, in class FilterLayout(BaseLayout):. But for now,
let's focus on the BaseLayout class.
An abstract class has at least one abstract method. An abstract method is akin to
specifying that a certain method must exist, but we are not sure at that time what
it should look like. For example, suppose BaseLayout contains a method specified
as follows:
@abstractmethod
def _init_custom_layout(self):
    pass
Then any class deriving from it, such as FilterLayout, must specify a fully
fleshed-out implementation of a method with that exact signature. This will
allow us to create custom layouts, as you will see in a moment.
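For instance, a derived class might satisfy the contract like this; the body shown here is only an illustrative placeholder, not the actual FilterLayout implementation:

class FilterLayout(BaseLayout):
    def _init_custom_layout(self):
        # instantiate the image filters this layout will offer (placeholder)
        self.warm_filter = WarmingFilter()
        self.cool_filter = CoolingFilter()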
But first, let's proceed to the GUI constructor.
We will use the image size to prepare a buffer that will store each video frame as
a bitmap, and to set the size of the GUI. Because we want to display a bunch of
control buttons below the current video frame, we set the height of the GUI to
self.imgHeight+20:
self.bmp = wx.BitmapFromBuffer(self.imgWidth,
                               self.imgHeight, frame)
wx.Frame.__init__(self, parent, id, title,
                  size=(self.imgWidth, self.imgHeight+20))
We then provide two methods to initialize some more parameters and create the
actual layout of the GUI:
self._init_base_layout()
self._create_base_layout()
3. The _on_next_frame method will process the new video frame and store the
processed frame in a bitmap. This will trigger another event, wx.EVT_PAINT.
We want to bind this event to the _on_paint method, which will paint the
new frame on the display:
self.Bind(wx.EVT_PAINT, self._on_paint)
The _on_next_frame method grabs a new frame and, once done, sends the frame to
another method, __process_frame, for further processing:
def _on_next_frame(self, event):
    ret, frame = self.capture.read()
    if ret:
        frame = self._process_frame(cv2.cvtColor(frame,
                                                 cv2.COLOR_BGR2RGB))
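The snippet ends before the frame reaches the screen. Presumably, still inside the if ret: block, the processed frame is copied into the bitmap buffer and a repaint is requested, roughly as follows (self.bmp comes from the constructor shown earlier; the exact calls are an assumption):

        # copy the processed frame into the bitmap buffer and request a
        # repaint of the video panel, which in turn fires wx.EVT_PAINT
        self.bmp.CopyFromBuffer(frame)
        self.Refresh(eraseBackground=False)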
The paint method then grabs the frame from the buffer and displays it:
def _on_paint(self, event):
    deviceContext = wx.BufferedPaintDC(self.pnl)
    deviceContext.DrawBitmap(self.bmp, 0, 0)
Then, we just need to set the minimum size of the resulting layout and center it:
self.SetMinSize((self.imgWidth, self.imgHeight))
self.SetSizer(self.panels_vertical)
self.Centre()
self.mode_warm = wx.RadioButton(pnl, -1, 'Warming Filter', (10, 10),
                                style=wx.RB_GROUP)
self.mode_cool = wx.RadioButton(pnl, -1, 'Cooling Filter', (10, 10))
self.mode_sketch = wx.RadioButton(pnl, -1, 'Pencil Sketch', (10, 10))
self.mode_cartoon = wx.RadioButton(pnl, -1, 'Cartoon', (10, 10))

hbox = wx.BoxSizer(wx.HORIZONTAL)
hbox.Add(self.mode_warm, 1)
hbox.Add(self.mode_cool, 1)
hbox.Add(self.mode_sketch, 1)
hbox.Add(self.mode_cartoon, 1)
pnl.SetSizer(hbox)
Here, the style=wx.RB_GROUP option makes sure that only one of these radio
buttons can be selected at a time.
To make these changes take effect, pnl needs to be added to the list of existing panels:
self.panels_vertical.Add(pnl, flag=wx.EXPAND | wx.BOTTOM | wx.TOP,
                         border=1)
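With the radio buttons in place, the custom layout's _process_frame method can dispatch each incoming RGB frame to whichever filter is currently selected. The following is a sketch of how that dispatch might look; the filter attribute names are assumptions, not the book's exact code:

def _process_frame(self, frame_rgb):
    # apply the filter that corresponds to the selected radio button
    if self.mode_warm.GetValue():
        return self.warm_filter.render(frame_rgb)
    elif self.mode_cool.GetValue():
        return self.cool_filter.render(frame_rgb)
    elif self.mode_sketch.GetValue():
        return self.sketch_filter.render(frame_rgb)
    else:
        return self.cartoon_filter.render(frame_rgb)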
Summary
In this chapter, we explored a number of interesting image processing effects. We
used dodging and burning to create a black-and-white pencil sketch effect, explored
lookup tables to arrive at an efficient implementation of curve filters, and got creative
to produce a cartoon effect.
In the next chapter, we will shift gears a bit and explore the use of depth sensors,
such as Microsoft Kinect 3D, to recognize hand gestures in real time.