Image Editing From Multiple Photos To One Single Best
May 2015
Abstract
This thesis explores multiple images taken from a similar viewpoint. It contributes an
editing tool that makes it easy to recompose a photo from a burst of images after the
photographs have been taken. With the editing tool it is simple to hand-pick one single photo
from the burst of images in a practical way. Our method handles multiple images taken with
a hand-held camera, where an efficient alignment procedure compensates for small camera
motions during the capture. We find differences within the dynamics of a scene. These
differences enable recomposing the photo by moving different regions from one photograph
to another in a graphical user interface. Our pipeline is implemented with client-side
web technologies. A user study with a questionnaire and interviews shows that our method
is practical and that it works well on a large real-world example.
Thanks to the Flickr users badkleinkirchheim, for making the sledge-set and the bike2-set,
and dmourati, for the skate-set, and for putting these images under the Creative Commons licence.
Thanks to Morris Franken for providing useful feedback in the design of the user interface.
Thanks to Jan van Gemert, who was really supportive during the creation of this master
thesis and gave good constructive feedback.
Lastly, a thank you to all the respondents who took the time to answer the survey and
the interview respondents who volunteered.
Table of contents

1 Introduction
  1.1 Objectives
  1.2 Burst of Images
4 Camera Motion
  4.1 Image Alignment
  4.2 Keypoint Detection
  4.3 Oriented and Rotated BRIEF
    4.3.1 Keypoint localization
    4.3.2 Accurate orientation component of the keypoints
    4.3.3 BRIEF
    4.3.4 ORB
  4.4 Matching
  4.5 The DLT algorithm
  4.6 Isotropic point Normalization
    4.6.1 Finding a Homography
  4.7 RANSAC
  4.8 Perspective Warp
5 Scene Motion
  5.1 Level Set Method
    5.1.1 Blob
    5.1.2 Signed Distance function
    5.1.3 Level Set Equation
  5.2 Fast Marching Method
  5.3 Geodesic Distance
  5.4 Image Difference
    5.4.1 Gaussian kernel
    5.4.2 Difference map with Gaussian kernel
    5.4.3 Label Layers
  5.5 Local adjustments
  5.6 Optimization
6 Inpainting
  6.1 Blending
    6.1.1 Blending and matting
    6.1.2 Poisson blending
  6.2 Solve the discrete equation
9 Conclusions
References
Chapter 1
Introduction
Many cameras contain a Continuous Shooting mode. This is a feature that allows multiple
shots to be captured in quick succession by holding down the shutter button. More generally
this mode is called burst. The feature was created to capture motion and to allow a selection
of the best image within the burst.
The limitation has been the computational power of the camera. The mode depends mostly
on memory buffer size, and in DSLRs the mechanism of the mirror flip is a key factor.
Recently the computational power of smartphones, DSLRs and compact cameras has been rising;
for example, the iPhone 6 burst mode takes 10 photos per second. Nowadays people tend to
use burst mode frequently to take several photos of the same scene.
Today almost all devices and PCs have a web browser pre-installed, and all web browsers
come with a JavaScript engine. This is why JavaScript is the client-side scripting language of
web technologies.
In this thesis we present a plug-in-free web browser image editing tool. We chose
JavaScript and HTML5 to implement our proposal, as the tool in this way becomes platform
independent and users can run it in the browser on any operating system. The advantage is
that we can reach a large variety of users to test our tool in a flexible way and thereby
get feedback easily.
1.1 Objectives
The goal of this thesis is to create an interactive tool that makes it easier for users to get the
single best picture from multiple images taken within the same scene. This involves both
picking the best preferred image and to do image editing on the photograph.
The aim of this thesis is also to investigate how independent users use our proposal. We
conducted a user study with a questionnaire and interviews to determine the usability and the
end-user experience of the image editing tool. With the opportunity to recompose and to
simply hand-pick an appropriate image, we believe that users will become more aware of how to
photograph a scene. In the end we expect that this awareness will lead users to become more
accomplished photographers.
(a) Input 1. (b) Input 2.
We have three photos taken from the same spot at slightly different times, which essentially
resulted in us capturing multiple cars. We have photographs of cars, but spread over several
different frames. The preference could be one single photo in which as many cars as possible
appear visible.
Our goal is to put together details from multiple images into one single frame. Here we
add details from our pictures into one reference image. We select one reference view
that will include an area for the multiple cars; this is the middle frame, which also contains a
car in the middle. By doing so we can combine the left cars from the left frame and the
right car from the right frame to create the concise photo we are looking for, see figure 2.2d.
(a) Input 1. (b) Input 2.
Here are two photographs from a sequence of shots taken of a group of husky dogs.
These animals are hyperactive and their movements can sometimes be very rapid. It is a
difficult task to capture the moment when all the dogs look towards the camera and at the
same time look engaged. To do this within one single frame is even more challenging.
In this example we decided to combine the top-left exposure, where the dogs in
the middle are looking towards the camera, with the bottom-left exposure, where the two dogs
in the front look more focused and more serious than in the top-left frame. So here we
replace the faces of the two front dogs from image a with the faces of the dogs from image
b, while we simultaneously keep the middle dogs looking towards us. The result is seen
in figure 2.3c.
When we replace the dogs' faces from frame a with the frame b faces, the swapped result
must match up color-wise. This is where the two exposures become one in a
seamless blend. We will come back to this in a later chapter.
As in the previous examples we combine details from several different pictures into one
single frame. The difference is that it is the same objects we are adding: in this example
we duplicate the two objects from our set of frames into one single frame.
With this technique one can use several images to, for example, "clone" people within a
scene and thereby visualize their path of motion in one single frame. This cloning technique
is not something new in the scope of image editing. It has been explored and exploited
by photographers [3], but our application opens up a new, simpler employment of the
technique which is available to users under a less demanding approach. It is much easier to
achieve results with our tool than with, for example, commercial software like Photoshop.
The conditions are that we have appropriate data suited to the purpose, in the form
of multiple photographs of the same scene taken from more or less the same
viewpoint. Preferably, the scene in question should also contain some form of dynamics.
There are a lot of scenes that are of potential use for our application; figure 2.5 shows
just a few example results that can be obtained.
Burst mode has been studied well in different areas of computer vision. It has been
successfully exploited in the area of image denoising in the paper of Liu et al. [16], where they
introduce a homography flow representation for image alignment that led to the mobile
application Blink.
Early ideas in the direction of spotting the differences are found in the paper of Tasli
et al. [27], where they introduce what we in this thesis call the Mosaic view. Furthermore, the
paper proposes a user-friendly tool for exploring a personal photo gallery and recompositions
with the help of patches from different photos.
Chapter 3
This chapter provides an overview of the implementation of the Image Editing Tool.
Here we present the algorithms and explain the main steps of the workflow that are used in
the thesis.
3.1 Outline
A summary of our application pipeline consists of three main parts:
1. Model the camera motion by aligning the images. This is done with the help of
perspective projection to compensate for small camera motions.
2. Model the scene motion by finding differences between the consecutive images.
3. Inpainting that combines the user's selected layers from different images into one
final image.
The rest of this section covers these main parts in a more comprehensive way (a code
sketch of the pipeline follows below), and the following three chapters go more in depth
with each part.
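To make the data flow concrete, the following is a minimal JavaScript sketch of the pipeline. The helper functions alignImages, findDifferences and blendSelection are hypothetical names standing in for the procedures of the next three chapters, not the actual function names of our implementation.

// Minimal sketch of the three-stage pipeline under the assumptions above.
function processBurst(images, referenceIndex, userSelection) {
  const reference = images[referenceIndex];

  // 1. Camera motion: warp every image onto the reference view.
  const aligned = images.map(img => alignImages(reference, img));

  // 2. Scene motion: a difference map between the reference and each aligned image.
  const layers = aligned.map(img => findDifferences(reference, img));

  // 3. Inpainting: blend the user's selected layers into the reference image.
  return blendSelection(reference, aligned, layers, userSelection);
}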
3.2.2 Homography
A transformation between two images of the same projective plane can be calculated with
the help of a homography. The homography H describes a planar projective transformation,
and it is invertible. In this thesis we deal with 2D-to-2D transformations.
The relation between two corresponding points in homogeneous coordinates is x′ = Hx,
where H is a 3 × 3 matrix.
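As a small illustration, applying a homography to a point amounts to a matrix-vector product followed by a division by the homogeneous coordinate. This is a generic sketch, assuming H is stored as a row-major array of nine numbers; it is not necessarily the exact routine of our implementation.

// Map a 2D point (x, y) through a 3x3 homography H (row-major, 9 numbers).
function applyHomography(H, x, y) {
  const xh = H[0] * x + H[1] * y + H[2];
  const yh = H[3] * x + H[4] * y + H[5];
  const w  = H[6] * x + H[7] * y + H[8];
  return { x: xh / w, y: yh / w }; // divide by w to leave homogeneous coordinates
}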
By aligning and adding several images together we create a mosaic, as seen in Figure
3.1d. This is done by taking an input image (Figure 3.1b) and computing the perspective
transformation relative to a base image (Figure 3.1a).
3.4 Inpainting
The final step of our pipeline is blending the user-selected region into the reference image.
We use the Poisson image matting technique [20]. Blending is sometimes used along with
image stitching as a composition technique to reduce artifacts and other misalignments.
Here we use blending in a similar manner, to get rid of the visual seam that occurs
when inserting a region from one image into another. The aim of the inpainting
matting technique is to seamlessly blend the foreground, usually an object, into the background,
in our case the reference image.
Fig. 3.1 (a) Input 1, the base image. (b) Input 2. (c) Input 2 rectified and cropped. (d) Mosaic.
Chapter 4
Camera Motion
The creation of the mosaic is divided into four steps:
1. Localize points of interest, referred to as interest points or sometimes keypoints.
2. Describe the points of interest with local descriptors.
3. Find point correspondences by matching the descriptors, estimate a homography with the
Direct Linear Transformation (DLT), remove outliers with RANSAC and finally re-estimate
a homography from all inliers.
4. Warp the separate images and stitch them together.
The stitching procedure handles multiple images of the same scene. To achieve a robust
alignment we perform isotropic normalization [13] on the found points.
The image alignment is done to determine the overlap between the reference image and
the transformed image. The first task of our pipeline is to find a homography H that
transforms our input image to overlap the reference image, as seen in Figure 3.1c. A
summary of the implementation is given in figure 4.1. These steps are implemented as
described in the rest of this chapter.
Our keypoint detector must above all be fast to compute. Our choice is a Laplacian
minimum-eigenvalue based feature detector developed by Lepetit and Fua [15].
As shown in figure 4.2, each pixel x̂ is centered within a surrounding discretized circle
and we consider the intensities of the pixels on the circle. We then eliminate pixels whose
value is close to or almost the same as the two opposed pixels on the circle. From the
remaining pixels we approximate the Laplacian by using the value differences between pixels
on the circle. Mathematically this approximation is described by a first scan to reject keypoint
positions:
if |Ĩ(x̂) − Ĩ(x̂ + dRα)| ≤ τ and |Ĩ(x̂) − Ĩ(x̂ − dRα)| ≤ τ, then x̂ is not a keypoint   (4.1)

LoG(x̂) ≈ ∑α∈[0:π] [ Ĩ(x̂ − dRα) − 2Ĩ(x̂) + Ĩ(x̂ + dRα) ]   (4.2)
Where x̂ is the (x, y) position of a point, τ is a threshold value, Ĩ is the image smoothed
with a Gaussian filter, and dRα = (R cos α, R sin α) where R is the radius and α varies in the
range [0 : π]. Thereafter we threshold the minimum eigenvalue of the Hessian matrix, keeping
x̂ only if

min(λ1, λ2) ≥ λ   (4.3)

Where λ is the minimum-eigenvalue threshold and λ1 and λ2 are the eigenvalues
of the Hessian. When the ratio of the eigenvalues is too high the patch is considered
edge-like and the keypoint is rejected.
The stability of the Laplacian and minimum-eigenvalue detector is visualized in figure 4.3,
illustrating that the points are retained under rotation. Our keypoint detector has a low
computational complexity: it easily runs in real time inside a web browser on a standard
PC. To keep the computation time low we use a fixed number of feature points per image.
Instead of sorting points by non-maximal suppression radius as described by the authors of [7],
we sort the points by their score and perform non-maximal suppression after detecting the
keypoints. In our case the score is the minimum eigenvalue of the Hessian matrix.
Fig. 4.2 (a) Keypoint detection: if Ĩ(x̂), Ĩ(x̂ + dR) and Ĩ(x̂ − dR) are similar then x̂ is not a
keypoint. (b) We also look at the neighboring pixels to avoid responses near edges.
4.3.2 Accurate orientation component of the keypoints
To assign an orientation to each keypoint we use the intensity centroid. First we compute
the moments of a patch:
m_pq = ∑_{x,y} x^p y^q I(x, y)   (4.4)
Then we find the centroid from the moments to determine an orientation for each patch, as
described by Rosin [23]:
C = ( m10/m00 , m01/m00 )   (4.5)
In the last step we conclude that the orientation of the patch is:

θ = atan2(m01, m10)   (4.6)

As the angle measure is consistent regardless of the corner type, this also works for
our Laplacian detector.
4.3.3 BRIEF
Binary descriptors work by comparing pairs of intensity values. One of the most basic binary
descriptors is the Local Binary Pattern (LBP). The LBP compares the gray-level value of one
specific pixel with the values of its neighbors to create one descriptor. BRIEF: Binary Robust
Independent Elementary Features [9], is a compact binary descriptor. It randomly selects
pairs of intensity values with a fixed sampling pattern of either 128, 256 or 512 pairs. The
comparison of intensity values is a binary test τ on a patch p:

τ(p; x, y) := 1 if p(x) < p(y), and 0 otherwise   (4.7)
here p(x) is the intensity of p at point x, smoothed. The BRIEF descriptor is built up as
an n-dimensional bit string of these tests:

f_n(p) := ∑_{1≤i≤n} 2^{i−1} τ(p; x_i, y_i)   (4.8)

Where n is the number of sampled pairs. In the original ORB paper [25] they choose the
patch size to be a 31 × 31 pixel window and show that this yields near-optimal resistance
to noise. Our descriptor is of the same size, which means that one descriptor vector
consists of 256 tests and is 32 bytes wide.
4.3.4 ORB
Since BRIEF is sensitive to in-plane rotation, ORB adds rotation invariance to BRIEF by
rectifying the patches with the help of the keypoints' orientation. ORB does this by using
the orientation θ to rectify each patch with its corresponding rotation. In this way ORB
minimizes the correlation under various orientation changes.
The similarity between descriptors is measured with the Hamming distance between
the corresponding binary vectors. This is very efficient because the Hamming distance can be
computed with a bitwise XOR operation followed by a bit count. This yields fast comparisons
compared to the squared Euclidean (L2) distance or the L1 distance between feature vectors.
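As an illustration, the Hamming distance between two descriptors stored as Uint8Array(32) could be computed as sketched below. The byte-wise XOR plus Kernighan bit count is one common way to do it, offered here as an assumed approach rather than the exact routine of our implementation.

// Hamming distance between two equal-length byte arrays:
// XOR each byte pair, then count the set bits of the result.
function hammingDistance(a, b) {
  let dist = 0;
  for (let i = 0; i < a.length; i++) {
    let v = a[i] ^ b[i];   // bits that differ in this byte
    while (v) {            // Kernighan's trick: clear the lowest set bit
      v &= v - 1;
      dist++;
    }
  }
  return dist;
}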
4.4 Matching
As matcher we use a brute-force matcher with the Hamming distance to find corresponding
feature points in two images. That is, we compare all features in one image against all features
in the other image.
So for all the feature descriptors in image one, we loop through all feature descriptors
from image two and compute a distance between them to find the first nearest neighbor
(1NN) and the second nearest neighbor (2NN). We use the Hamming distance as the measure
between the feature vectors. If they pass the ratio test, 1NN/2NN less than a threshold, then we
consider these two points a corresponding pair. This criterion is also termed Lowe's
criterion [17].
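A sketch of this brute-force matching with the ratio test follows. The inlined hamming helper is the bit-count routine from the previous section, and the default ratio of 0.8 is an illustrative value, not necessarily the threshold used in our implementation.

// Brute-force matching with Lowe's ratio test [17].
function matchDescriptors(desc1, desc2, ratio = 0.8) {
  // Hamming distance via XOR and bit count, as in section 4.3.4.
  const hamming = (a, b) => {
    let d = 0;
    for (let i = 0; i < a.length; i++) {
      let v = a[i] ^ b[i];
      while (v) { v &= v - 1; d++; }
    }
    return d;
  };
  const matches = [];
  for (let i = 0; i < desc1.length; i++) {
    let best = Infinity, second = Infinity, bestIndex = -1;
    for (let j = 0; j < desc2.length; j++) {
      const d = hamming(desc1[i], desc2[j]);
      if (d < best) { second = best; best = d; bestIndex = j; }
      else if (d < second) { second = d; }
    }
    // Keep the pair only if the 1NN is clearly better than the 2NN.
    if (best < ratio * second) matches.push([i, bestIndex]);
  }
  return matches;
}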
4.5 The DLT algorithm
Each point correspondence (x_i, y_i) ↔ (x′_i, y′_i) contributes two rows to a linear system A h = 0:

⎡ x1  y1  1   0   0   0   −x1 x′1   −y1 x′1   −x′1 ⎤
⎢ 0   0   0   x1  y1  1   −x1 y′1   −y1 y′1   −y′1 ⎥
⎢ ⋮                                             ⋮ ⎥  h = 0   (4.9)
⎢ xn  yn  1   0   0   0   −xn x′n   −yn x′n   −x′n ⎥
⎣ 0   0   0   xn  yn  1   −xn y′n   −yn y′n   −y′n ⎦

where h = (h11, h12, h13, h21, h22, h23, h31, h32, h33)ᵀ.
The left-hand side matrix A is a 2N × 9 matrix, where x_n and y_n are the coordinates of a
sampled point correspondence and N is the number of sampled points. The homography h
is a 9-vector, and the right-hand side is a 2N-dimensional zero vector. To estimate a homography
from a pair of images we need four or more point correspondences. This set of linear equations
can be solved with standard methods, and in this thesis we use the convenient and stable
singular value decomposition (SVD):
A = USV T (4.10)
The solution to h is the singular vector corresponding to the smallest singular value.
When the singular values in S are sorted in descending order then this is the last column of V.
4.6 Isotropic point Normalization
For each point set we compute the centroid and the mean distance to the centroid:

m_x = (1/n) ∑_{i=1}^{n} x_i   (4.11)

m_y = (1/n) ∑_{i=1}^{n} y_i   (4.12)

d = (1/n) ∑_{i=1}^{n} √((x_i − m_x)² + (y_i − m_y)²)   (4.13)

      ⎡ √2/d    0      −m_x √2/d ⎤
T =   ⎢ 0       √2/d   −m_y √2/d ⎥   (4.14)
      ⎣ 0       0       1        ⎦
Where x_i and y_i are the coordinates from either the point set x or x′. T is the
similarity transformation which gives us the normalized points of image one as:

x̂ = T x   (4.15)

and, with the corresponding transformation T′ for image two:

x̂′ = T′ x′   (4.16)
4.7 RANSAC
To reject incorrectly matched features and to get a robust estimate of the homography we apply
RANSAC. RANdom SAmple Consensus [12] is used to iteratively improve the estimated
model parameters by removing outliers from the data. For our problem the mathematical
model to fit is the homography mapping between the two input images, and the outliers/inliers
are the matched image points. The RANSAC algorithm is outlined by the following steps;
a code sketch follows after the list:
1. Randomly select a subset of the matched image points and estimate a homography
from this subset. Here the estimation is done with the normalized DLT.
2. The points from the entire dataset that fit the model with an error less than a threshold
are inliers, and they are placed in a consensus set. As error measurement
we use the symmetric transfer error, d(x, H⁻¹x′)² + d(x′, Hx)².
3. When the maximum number of iterations has been reached the homography is
recomputed, this time from the points in the consensus set, and now we also need
to denormalize the homography.
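The loop below is a minimal JavaScript sketch of this procedure. estimateHomographyDLT, symmetricTransferError and randomSubset are assumed helpers corresponding to sections 4.5 to 4.7, and the iteration count and pixel threshold are illustrative values rather than the parameters of our implementation.

// RANSAC over matched point pairs: fit a homography, keep the largest
// consensus set, then re-estimate from all of its inliers.
function ransacHomography(matches, iterations = 1000, threshold = 3.0) {
  let bestInliers = [];
  for (let it = 0; it < iterations; it++) {
    // 1. Minimal random subset of 4 correspondences, fitted with normalized DLT.
    const sample = randomSubset(matches, 4);
    const H = estimateHomographyDLT(sample);
    // 2. Consensus set: points whose symmetric transfer error is small.
    const inliers = matches.filter(m => symmetricTransferError(H, m) < threshold);
    if (inliers.length > bestInliers.length) bestInliers = inliers;
  }
  // 3. Re-estimate the homography from all inliers of the best model.
  return estimateHomographyDLT(bestInliers);
}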
4.8 Perspective Warp
To avoid holes and other artifacts we use an inverse warping algorithm. That is, the mapping
of pixels is done in reverse order, from destination to source. For each pixel of the
destination image we compute the coordinates of the corresponding pixel in the source
image.
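A sketch of the inverse warp is shown below; applyHomography is the point-mapping helper sketched in section 3.2.2, nearest-neighbor sampling is used for brevity (a real warp could use bilinear interpolation), and the get/set pixel accessors on the image objects are assumptions.

// Inverse warping: map every destination pixel back through the inverse
// homography Hinv and sample the source image there.
function inverseWarp(src, dst, Hinv) {
  for (let y = 0; y < dst.height; y++) {
    for (let x = 0; x < dst.width; x++) {
      const p = applyHomography(Hinv, x, y);   // source coordinates
      const sx = Math.round(p.x), sy = Math.round(p.y);
      if (sx >= 0 && sx < src.width && sy >= 0 && sy < src.height) {
        dst.set(x, y, src.get(sx, sy));        // assumed pixel accessors
      }
    }
  }
}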
Chapter 5
Scene Motion
After aligning the individual images and determining their overlap with respect to a single
reference image, we now address the problem of finding the differences. We create a
difference image in which the differences are converted into layers that users can interact
with. These layers are, for example, highlighted with a color to visualize a difference between
the two images.
One naive approach to finding differences would be to subtract the values of the two
images at pixel level. Although this approach would work in an ideal environment in which
the two images are perfectly aligned, this is almost never the case in real-world scenarios,
where even small exposure divergences are registered as differences. The result would be
a difference image cluttered with a huge number of small differences, as in figure 5.1.
The intention behind our approach is to compare regions inside the two images rather
than looking at individual pixel differences. These regions can be seen as blobs. For
each region we create a separate layer to make it interactive, so that users can
click and choose which layered blob region they would like to include in the final result.
This means that each separate "blob" layer is assigned a unique id so that our underlying
architecture can keep track of the user's interactions.
Fig. 5.1 The pixel differences between input 1 (fig 3.1a) and input 2 (fig 3.1c). In the
difference image we can see that static objects are registered as differences, like the sculpture
on the left side and on the roof. This is not something we want.
5.1 Level Set Method
5.1.1 Blob
The interactive difference region that we term blob can be described as similar to a curve
defined on our pixel grid. For convenience we call the grid Ω and the blobs C. We
represent a blob implicitly, by all the points inside of it, with the function φ.
We can implicitly represent boundaries directly as the zero level of the function φ. This
function is sometimes called the embedding function in the literature because it embeds the
border as the zero level set of a signed distance function. Our blob C is the zero level set of
the embedding function:

C = {x ∈ Ω | φ(x) = 0}   (5.1)

We represent a blob by all the points that satisfy a condition. In other words we separate the
pixel domain Ω into three different subdomains: Ω−, the inside portion of the blob;
Ω+, the outside portion of the blob; and ∂Ω, the border between the inside and the outside.
The border is also the zero level set:

{x ∈ Ω | φ(x) = 0} ⇒ ∂Ω   (5.2)

{x ∈ Ω | φ(x) > 0} ⇒ Ω+   (5.3)

{x ∈ Ω | φ(x) < 0} ⇒ Ω−   (5.4)

5.1.2 Signed Distance function
As embedding function we use the signed distance to the blob border:

φ(x) = −d(x, C) for x ∈ Ω−,  φ(x) = d(x, C) for x ∈ Ω+   (5.5)

This gives the distance to a blob, and in our case we define the sign to be negative inside a
blob and positive outside a blob.
Every point on the border of the blob in figure 5.2c has zero distance to the blob, every point
inside the blob has negative distance and every point outside the blob has positive distance.
Blobs can be evolved by thresholding the level. What drives the evolution of a blob is the
motion in the normal direction, where the outward normal is the vector:

n = ∇φ / |∇φ|   (5.6)

The speed of the movement is given by F, the speed in the outward normal direction. The
motion comes from the difference of two images smoothed with a Gaussian kernel. The
Level Set Equation is defined as:

φt + F|∇φ| = 0   (5.7)
and can be reformulated as ∂φ/∂t = −F|∇φ|. This is a Partial Differential Equation (PDE) with
two variables: x in space and the threshold t. We check at any threshold t where we
have a sign change, and marking that change gives us a blob. Blobs are evolved by iterating this
function. This representation has the advantage that blobs can undergo splitting and
merging through changes of the threshold, as seen in figure 5.3; during all kinds of splitting
or merging of blobs we can read out the value of φ at any threshold t.
The Level Set Equation requires that at a blob location the embedding function φ must be
zero for all thresholds [19][26]. When we evaluate the embedding function at threshold t:

φ(C(t), t) = 0  ∀t   (5.8)

Here we have a twofold dependence on t: first, we consider the embedding function at different
thresholds; second, the spatial location where we locate a blob depends on the threshold t.
Fig. 5.3 The threshold value is changed by scrolling the mouse wheel. This causes splitting and
merging of blobs, and these are handled with the level set method.
5.2 Fast Marching Method

|∇T| F = 1,  T = 0 on Γ   (5.9)
Here Γ is a blob's initial location, where the method starts and works outwards. From Γ, the
pixels that we start from are said to have a frozen condition and we compute distances at
their neighbors. Thorough implementation details are explained in [4], but in essence we are
solving the Eikonal equation:

|∇T| = 1/F   (5.10)
Where again T is the arrival time of the front and F is the speed of the evolution. We
follow Sethian's approach to compute distances on our pixel grid. The initial values of the
boundary are known, and the nodes neighboring these can be updated using the squared
length of the gradient:

|∇T|² = max(V_A − V_B, V_A − V_C)² + max(V_A − V_D, V_A − V_E)²   (5.11)

Where V_A is the unknown distance value and the rest are the neighboring values as in figure
5.5. We only use known values to compute the distance value, and if some of the
neighbors are not known those terms are dropped from the equation.
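Solving equation 5.11 for the unknown V_A is a small quadratic problem. The sketch below shows one way to do it on a unit grid, where a and b are the smallest known horizontal and vertical neighbor values; unknown neighbors are passed as Infinity so that their term drops out, matching the behavior described above.

// One fast marching update: solve (V - a)^2 + (V - b)^2 = (1/F)^2 for V,
// falling back to a one-sided update when only one term is active.
function updateDistance(a, b, F) {
  const h = 1 / F;
  if (!isFinite(b)) return a + h;                        // only horizontal term known
  if (!isFinite(a)) return b + h;                        // only vertical term known
  if (Math.abs(a - b) >= h) return Math.min(a, b) + h;   // quadratic degenerates
  // Both terms active: take the larger root of the quadratic.
  return (a + b + Math.sqrt(2 * h * h - (a - b) * (a - b))) / 2;
}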
G(u, v) = (1 / (2πσ²)) e^{−(u² + v²) / (2σ²)}   (5.12)
Even with this smoothing the difference map is cluttered with small irrelevant parts, so
for that reason we do a second smoothing, this time with a larger post-sigma. This procedure
is outlined in algorithm 1 and is done for all input images with respect to the reference image.
The result obtained by computing the difference of image 3.1a and image 3.1c with our
method is visualized in figure 5.7, top right; this is our difference image. The difference
image is a real-valued height map in which high values mean a large difference between the
two images and small values mean little to no difference at a specific point. It is from this
algorithm that we create our φ function as described in section 5.1.3.
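Since Algorithm 1 is referenced above, the following is a minimal sketch of that difference-map procedure as described in this section. gaussianBlur and absDifference are assumed helpers, and the sigma values are illustrative, not the exact parameters of our implementation.

// Difference map: smooth both images, take the absolute per-pixel
// difference, then smooth again with a larger post-sigma.
function differenceMap(reference, aligned, preSigma = 1.5, postSigma = 5.0) {
  const a = gaussianBlur(reference, preSigma);  // suppress noise and small misalignments
  const b = gaussianBlur(aligned, preSigma);
  const diff = absDifference(a, b);             // per-pixel absolute difference
  return gaussianBlur(diff, postSigma);         // remove small cluttered responses
}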
Fig. 5.7 (Top left) The labeled blobs found in the contours of the difference image φ
(top right). (Middle) The difference image φ plotted in 3D. (Bottom) The inverse of the
difference image, labeled at a higher level than the top left.
In other words we segment out regions above a threshold; this is visualized in
Figure 5.7, where we go from the difference image (figure 5.7 right) to labeled layers
(figure 5.7 left).
This is done with our blob extraction algorithm, which starts by thresholding the difference
image: if a point P's difference value is below a specific threshold th it is set to zero
in a layer map, otherwise we start labeling P with a new label. The labeling is done by checking
whether the pixel P has 8-connectivity with another pixel label; if any of P's 8-connected
pixel neighbors (see Figure 5.8) are already labeled then P becomes part of that label. Setting
the threshold to lower values will yield bigger blobs. This is outlined in Algorithm 2 and in
the sketch below. As seen in the bottom of figure 5.7 we can label the same persons appearing
two times differently, here with labels 1 and 2.
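Below is a sketch of this blob extraction. It uses an explicit flood-fill stack rather than the sequential scan of Algorithm 2, but implements the same 8-connected labeling of pixels whose difference value is at or above the threshold th.

// Label connected blobs in a w*h difference map (Float32Array).
// Pixels below th keep label 0 (background).
function labelBlobs(diff, w, h, th) {
  const labels = new Int32Array(w * h);   // 0 = background / unlabeled
  let nextLabel = 1;
  for (let i = 0; i < w * h; i++) {
    if (diff[i] < th || labels[i] !== 0) continue;
    const stack = [i];                    // start a new blob at pixel i
    labels[i] = nextLabel;
    while (stack.length) {
      const p = stack.pop();
      const px = p % w, py = (p / w) | 0;
      for (let dy = -1; dy <= 1; dy++) {  // visit the 8-connected neighbors
        for (let dx = -1; dx <= 1; dx++) {
          const nx = px + dx, ny = py + dy;
          if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
          const n = ny * w + nx;
          if (diff[n] >= th && labels[n] === 0) {
            labels[n] = nextLabel;
            stack.push(n);
          }
        }
      }
    }
    nextLabel++;
  }
  return labels;
}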
For all selected layers we compute a relative distance map (figure 5.9, right) by solving
the Eikonal equation 5.10. This gives users the ability to adjust each individual layer
separately with a relative threshold value.
5.6 Optimization
As stated in section 5.2, the values on the boundary, our initial curve Γ, are known;
the neighboring pixels are updated and inserted into a priority queue. The pixel
with the smallest distance is then removed and marked as known. A heap-based priority
queue normally requires O(n log n) operations for n nodes. There is however
a more efficient solution presented by Yatziv et al. [28] in the paper "O(N) implementation
of the fast marching algorithm"; they term the solution the Untidy Priority Queue. This queue
makes use of the fact that any inserted key is always greater than or equal to the key of the
item we just popped off. By maintaining a circular array of lists, items with the lowest key
are always popped first.
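A minimal sketch of such a queue in JavaScript is given below. The bucket count and width are parameters the caller must choose so that all keys in flight fall within one revolution of the circular array; the class itself is an assumed illustration of the idea, not the exact data structure of our implementation.

// Untidy priority queue (after Yatziv et al. [28]): a circular array of
// buckets, exploiting that inserted keys never fall below the key just popped.
class UntidyQueue {
  constructor(nBuckets, bucketWidth) {
    this.buckets = Array.from({ length: nBuckets }, () => []);
    this.width = bucketWidth;
    this.current = 0;   // index of the lowest non-empty bucket
    this.size = 0;
  }
  push(key, item) {
    const b = Math.floor(key / this.width) % this.buckets.length;
    this.buckets[b].push(item);
    this.size++;
  }
  pop() {
    if (this.size === 0) return null;
    while (this.buckets[this.current].length === 0) {
      this.current = (this.current + 1) % this.buckets.length; // advance circularly
    }
    this.size--;
    return this.buckets[this.current].pop();
  }
}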
Chapter 6
Inpainting
6.1 Blending
The last step of our pipeline is to combine the selected blob elements with the reference
image and to merge these multiple sources into one convincing final image. This is
achieved by blending the user-selected blob into the reference image. Blending is sometimes
used together with image stitching as a composition technique for correcting the difference in
exposure between multiple shots, and it is also used to reduce other misalignments and artifacts
that can occur when stitching images (see [20] and [8]).
We use blending in mainly two ways. One way is to add an object from a source
image into the reference image, as seen in figures 6.1a-c. The other way is to remove an
existing object in the reference image with a background patch from a source image, as seen
in figures 6.1d-e.
One form of blending is the closed form matting [21]. The matting equation
I = αF + (1 − α)B is used to form a new image from a background (BG) layer and a
foreground (FG) layer, where these two layers can come from different sources. In the case of

α(x, y) = 1 in FG,  α(x, y) = 0 in BG

the resulting image is

I(x, y) = F(x, y) where α = 1,  I(x, y) = B(x, y) where α = 0.
Fig. 6.1 In the left column we see the masked region marked as a green transparent layer.
In the middle column we see the masked out region from the source image pasted into the
reference image. In the right column we see the masked out region from the source image
blended into the reference image. The mask and the result were created by our application.
This process is similar to cutting out a region from a foreground layer and putting it on the
background image. In the resulting image we would see the edges and/or a seam between the
different layers. So instead of using the matting equation we will use a different terminology
with three layers: a source image, a target image and a mask (Figure 6.2).
Fig. 6.2 Left: Source, Target and Mask. Right: Visual seam. Image credit [22]
We use the gradient-based Poisson image matting technique to seamlessly blend the selected
region into the reference image. Pérez et al. [20] present Poisson image editing. To describe
the process, instead of a binary mask M and a source image S, Pérez et al. describe the
source region as defined over a closed region Ω; the boundary of the region is denoted ∂Ω,
and v is a vector field defined over the source region Ω, see figure 6.3.
Now the objective is to make the edges inside Ω look like the edges of the source S
and to make the colors on the boundary ∂Ω match the target's colors. This is formulated as
an optimization problem in the continuous domain:

min_f ∬_Ω |∇f − v|²  with  f|_∂Ω = f*|_∂Ω   (6.1)
Here f is the intensity value at a pixel p, which can also be written I(x, y); v is the
gradient field of the source, ∇S(x, y); and f* is the known destination function. Ω is the
inside of our mask region, where we optimize the image colors of our composite. In
other words we minimize the difference between the gradient of the image and the gradient
of the source by making it as small as possible. So we can rewrite the equation as:
min_{I(x,y)} ∬_Ω |∇I(x, y) − ∇S(x, y)|² dx dy   (6.2)
and from equation 6.1 we have a constraint that on the boundary the composite
colors have to match the target's colors:

I(x, y) = T(x, y) on ∂Ω   (6.3)

This removes the visual seam while we make the edges look as good as possible.
In the continuous domain this is a variational calculus problem whose solution satisfies [22]:

∇²I(x, y) = ∇²S(x, y) in Ω
s.t. I(x, y) = T(x, y) on ∂Ω   (6.4)

The result can be seen by comparing figure 6.1b with figure 6.1c and figure 6.1d with
figure 6.1e, where we see that the edges of the region have been smeared out into the
background and the colors inside the mask now match up.
6.2 Solve the discrete equation
In the discrete domain, equation 6.4 becomes a system of linear equations over the pixels
inside the mask, where N_p denotes the neighbors of a pixel p as in figure 6.3:

f_p = ( ∑_{q∈N_p∩Ω} f_q + ∑_{q∈N_p∩∂Ω} f*_q + ∑_{q∈N_p} v_pq ) / |N_p|   (6.5)
The update consists of three sums:

1. ∑_{q∈N_p∩Ω} f_q, the current estimates at the neighbors inside the mask;
2. ∑_{q∈N_p∩∂Ω} f*_q, the known target values on the boundary;
3. ∑_{q∈N_p} v_pq, the source gradient term.
do
    for all pixels p do
        if pixel p is inside the mask then
            // solve the system of linear equations
            for each RGB channel do
                var sum_fq = 0;      // step (1)
                var sum_∂Ω = 0;      // step (2)
                var sum_vpq = 0;     // step (3)
                for all neighbors q of p do
                    if q is inside the mask then
                        sum_fq += result_pixel[q];              // step (1)
                    else
                        sum_∂Ω += base_pixel[q];                // step (2)
                    end
                    sum_vpq += src_pixel[p] - src_pixel[q];     // step (3), v_pq = S(p) - S(q)
                end
                // Store the estimated RGB value of p, equation 6.5
                result_pixel[p] = (sum_fq + sum_∂Ω + sum_vpq) / |N_p|;
            end
        end
    end
    check if converged
while not converged;
Algorithm 3: Solve Poisson with Gauss-Seidel method
Chapter 7
Our whole proposal is implemented in JavaScript. Although this is not the fastest or the best
supported (in terms of off-the-shelf libraries) language, the reason for using this technology
is that our experiments are designed to be very dependent on human interaction, as motivated
in Chapter 2. By having the experiment live inside a webpage we allow easy access for
users to test and interact with it.
7.1 Overview
Our proposal is an interactive web tool that makes it easy to get the single best picture from
multiple exposures taken within the same scene. With the tool it is possible to recompose a
photo from a burst of images after the image is photographed. This is done by removing,
adding, swapping or combining objects from each individual photograph into one reference
photo, as seen in figure 7.1. Supported web browsers are Chrome, Firefox and Safari,
although Chrome is highly recommended. To explain the user interface we will take a
step-by-step walkthrough in the next sections.
The user selects their choice of reference image by scrolling the mouse wheel inside the
"mosaic view"; in this way they can traverse the multiple images. Selecting the base reference
image is done simply by clicking in the highlighted image.
(a) Default view presented initially to the users (mosaic view). (b) Mouse wheel scroll in one
direction. (c) Mouse wheel scroll in the other direction.
The differences between the reference image and all other images are computed and
highlighted; these highlighted areas are called layers. This allows the users to select content
to add or remove. Adjusting the global threshold is done by scrolling the mouse wheel
anywhere except in a green layer; this splits or merges the layers. The green layer is
explained in the next step, and a sketch of how the threshold can be bound to the mouse
wheel follows after the figure.
(a) Mouse wheel scroll in one direction to increase the global threshold. (b) Mouse wheel
scroll in the other direction to decrease the global threshold.
When some content that the user wants to change is present, they can click in that layer to
make changes in the image. The user can toggle the content of that specific layer by clicking in
it. The green layer is just an indication that the layer is selected. The content inside a green
layer comes from an image other than the reference image. The content inside all other
colored layers than the green ones comes from the reference image. Different colors mean
that in that area there are differences with respect to the reference image. This example only
has one image other than the reference image, so there is only one blue color indicating
differences between these two. Choosing a green layer means that when the user clicks the
blend button, the application will automatically blend the content we see inside the green
layer into the reference image.
Adjust the relative threshold by scrolling the mouse wheel inside any green layer; the layer you
are currently hovering over is highlighted in light green to indicate which region you are
adjusting.
(a) Mouse wheel scroll in one direction to increase the relative threshold. (b) Mouse wheel
scroll in the other direction to decrease the relative threshold.
If the user is not satisfied with what the layer regions are covering, they can manually paint
or erase the layers by selecting these options in the check-box. Simply paint inside a green
region to add to the layer. Erase parts of the layer by first selecting the eraser to the left.
When the user is satisfied with the rough result presented, they just click the blend button to let
the application blend all selected layers into the reference image.
For a comprehensive view over Step 2 - 4a please see the flowchart in Appendix A.
Chapter 8
Evaluation and Results
In this chapter we describe the data collection approach of our research study. We explain
the methods we have chosen and why we chose them. We also explain why we
based our study on a questionnaire and interviews.
The reason we chose a web-based survey was partly because it can easily be spread
to various online forums, and partly because we can easily compile the results with the
provided survey tools.
After gathering the data, the material was compiled into various charts in order to make
the analysis of the data easier and more transparent. The survey includes a total of 9 questions
(see the next section). These questions were of different types. We mostly used multiple-choice
questions, which meant that the respondent could choose one or more possible answers to a
question. We provided one open question, which made it possible for respondents to write down
what they thought. None of the questions were marked as required. These questions
were then used as the basis for the subsequent interviews.
Before publishing the survey we conducted a small pilot test with a small number of
people, to detect any flaws in the formulations of the questions and in the answer options. The
pilot test helped us see the questionnaire from a different perspective and it also helped
us alter some questions before we published and sent out the questionnaire.
different way. Afterwards we asked the user to spend approximately 10 minutes going through
our set of example images (see Appendix A.2), meanwhile we took notes of the comments
they gave while interacting with the Tool.
8.4 Results
In this section we present the results and analysis from the quantitative survey and the
four qualitative semi-structured interviews. Through the survey we received a total of 14
responses, of which 7 respondents filled in the second section of the survey.
There was a shared view on how to use the tool: just as many said they would remove
content as said they would add content from different images. One third had the opinion that
they would use the tool to swap content, and only two of the respondents would use it for
duplication of content.
We asked how the tool might affect the awareness of a photographer within a scene.
Here we let respondents rate their opinion from 1 to 5, where 1 represented no influence and 5
represented a lot of influence. The responses indicate that the tool may have an impact on
how photographers interact with and perceive the scenery surrounding them.
Here we present the results from the second section of the questionnaire. These questions cover
the steps of figure 7.1. The graphs for the corresponding steps are found below.
Step 1 As seen in graph a, the majority found that the mosaic view helped them select
a reference image. In graph b we see that the majority of the respondents would prefer
to choose a reference image as presented in the demo, and only two said that they
would prefer to look through each image one by one.
Step 2 In graph c we see that the respondents were able to perceive what the global threshold
does, except for one single individual.
Step 4 Graph d tells us that the users were able to fine-tune the selected layers, but it
does not tell us whether they tried to adjust with the relative threshold or with the
Paint/Erase tool.
Step 5 Graph e tells us that the respondents rate their opinion of the outcome of the
resulting image as positive.
(Graphs a-e show the survey responses: (a) Step 1, (b) Step 1, (c) Step 2, (d) Step 4, (e) Step 5.)
Here we present the results of the part of the interviews where the users commented on how
they used and interacted with the tool.
Fine-tune adjustment with Paint Layer. There were several comments on how to improve
the Paint Layer feature. The first was to only paint in one layer that the user themselves
chooses, because it is a problem when multiple selected layers are next to each
other or overlapping. When the user paints, our tool chooses the layer that the pointer
starts to paint in. This situation was perceived as the user losing control of the painting.
Fine-tune adjustment with Erase Layer. There turned out to be a similar situation when
Erase Layer is selected. When multiple selected layers are next to each other or overlap,
the user cannot control which layer to erase in; instead the erase tool will erase all the
layers underneath the eraser.
Choose which layer to Paint/Erase. Suggestions were made that the user should be able
to choose which layer to paint or erase in.
Erase without selecting. Contrary to the previous comment, suggestions from other
users were that it should not be necessary to first select a layer when erasing in
it. Instead they thought it would be better to erase any layer when the Eraser tool is
selected.
Ctrl-Z. Other suggestions were that when the user has painted or erased in a layer,
they should be able to undo their last commands by pressing Ctrl-Z on the keyboard.
Zoom in. A desired feature when doing fine-tune adjustment is to be able to zoom in on a
region, to allow even more fine-grained adjustments.
Display threshold value. When adjusting a threshold value it could help to
display that value.
Display the options. Another suggestion was to show our layer options separately in a
smaller window next to the main canvas.
• The instructions could be even clearer, and perhaps it would help to have interactive
instructions, that is, instructions that show a small popup window on the different
tool options.
• With the help of the mosaic view some users experienced that they gained knowledge
about 3D perspective and its relation to the 2D plane.
• Users would probably use the tool on a group portrait photo to make everybody look
good.
• The static differences do not affect the user experience, according to the respondents,
when they have such control over which blobs to include or not include.
Chapter 9
Conclusions
We have presented and implemented a web-based image editing tool that helps photographers
go from multiple images to one single best preferred image. Our approach finds the differences
with the help of Level Set Methods. In an efficient way we do local adjustments of segments with
Fast Marching Methods. The pipeline runs in a web browser without an external server; only
a web server storing the necessary HTML5 and JavaScript files is needed.
A user study validates that users could work with multiple images of the same scene and get
satisfying results in an easy way by interacting with our proposed method. The empirical
data gathered from the survey and the four interviews has been the basis of our analysis.
The results clearly show that the tool is useful; they also suggest that there is a lot to improve
when it comes to user friendliness. Most comments and suggestions were about the fine-tune
Paint/Erase tool. This was unexpected, because the main focus was on developing a good and
fast enough algorithm for finding differences; the manual fine-tuning was not our greatest
priority, but our study shows that this area is more important than expected.
The user study shows that the tool is useful in multiple areas, such as adding, removing,
swapping and duplicating content. The division in how the respondents thought they
would use the tool was relatively unpredictable; this result was surprising, as we would
have expected that most users would choose to use the tool for duplicating content.
Our results also confirm the hypothesis that the image editing tool leads to better pictures
and that it can help users to be more aware of how to photograph a scene.
References
[1] (2009). MIT 18.336 Numerical Methods for Partial Differential Equations, Spring
2009. (Accessed 25 March, 2015) [online] http://ocw.mit.edu/courses/mathematics/
18-336-numerical-methods-for-partial-differential-equations-spring-2009/.
[2] (2012). Black, Noel and Moore, Shirley. "Gauss-Seidel Method." From MathWorld, a
Wolfram Web Resource. (Accessed 28 March, 2015) [online] http://mathworld.wolfram.
com/Gauss-SeidelMethod.html.
[3] (2012). Mathieu Bernard-Reymond, Intervalles series. (Accessed 28 March, 2015) [online]
http://matbr.com/intervalles/.
[4] Bærentzen, J. A. (2001). On the implementation of fast marching methods for 3D lattices.
[5] Bai, X. and Sapiro, G. (2007). A geodesic framework for fast interactive image and
video segmentation and matting. In Computer Vision, 2007. ICCV 2007. IEEE 11th
International Conference on, pages 1–8.
[6] Barnes, C., Shechtman, E., Finkelstein, A., and Goldman, D. B. (2009). Patchmatch: A
randomized correspondence algorithm for structural image editing. ACM Trans. Graph.,
28(3):24:1–24:11.
[7] Brown, M., Szeliski, R., and Winder, S. (2005). Multi-image matching using multi-scale
oriented patches. In Proceedings of the 2005 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR’05) - Volume 1 - Volume 01, CVPR ’05,
pages 510–517, Washington, DC, USA. IEEE Computer Society.
[8] Burt, P. J. and Adelson, E. H. (1983). A multiresolution spline with application to image
mosaics. ACM Trans. Graph., 2(4):217–236.
[9] Calonder, M., Lepetit, V., Ozuysal, M., Trzcinski, T., Strecha, C., and Fua, P. (2012).
BRIEF: Computing a Local Binary Descriptor Very Fast. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 34(7):1281–1298.
[10] Caselles, V., Kimmel, R., and Sapiro, G. (1995). Geodesic active contours. International
Journal of Computer Vision, 22:61–79.
[11] Dubrovina, A., Hershkovitz, R., and Kimmel, R. (2014). Image editing using level
set trees. In Image Processing (ICIP), 2014 IEEE International Conference on, pages
4442–4446.
[12] Fischler, M. A. and Bolles, R. C. (1981). Random sample consensus: A paradigm for
model fitting with applications to image analysis and automated cartography. Communications
of the ACM, 24(6):381–395.
[13] Hartley, R. I. (1997). In defense of the eight-point algorithm. IEEE Trans. Pattern Anal.
Mach. Intell., 19(6):580–593.
[14] Hartley, R. I. and Zisserman, A. (2004). Multiple View Geometry in Computer Vision.
Cambridge University Press, ISBN: 0521540518, second edition.
[15] Lepetit, V. and Fua, P. (2004). Towards Recognizing Feature Points using Classification
Trees. Technical Report IC/2004/74, EPFL.
[16] Liu, Z., Yuan, L., Tang, X., Uyttendaele, M., and Sun, J. (2014). Fast burst images
denoising. ACM Transactions on Graphics (TOG), 33(6).
[17] Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. Int. J.
Comput. Vision, 60(2):91–110.
[18] Malladi, R., Sethian, J. A., and Vemuri, B. C. (1995). Shape modeling with front
propagation: A level set approach. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 17:158–175.
[19] Osher, S. and Fedkiw, R. P. (2003). Level set methods and dynamic implicit surfaces.
Applied mathematical science. Springer, New York, N.Y.
[20] Pérez, P., Gangnet, M., and Blake, A. (2003). Poisson image editing. In ACM SIG-
GRAPH 2003 Papers, SIGGRAPH ’03, pages 313–318, New York, NY, USA. ACM.
[21] Porter, T. and Duff, T. (1984). Compositing digital images. In Proceedings of the 11th
Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’84,
pages 253–259, New York, NY, USA. ACM.
[22] Radke, R. J. (2012). Computer Vision for Visual Effects. Cambridge University Press,
New York, NY, USA.
[23] Rosin, P. L. (1999). Measuring corner properties. Computer Vision and Image Under-
standing, 73(2):291–307.
[24] Rosten, E. and Drummond, T. (2006). Machine learning for high-speed corner detection.
In European Conference on Computer Vision, volume 1, pages 430–443.
[25] Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011). Orb: An efficient
alternative to sift or surf. In Proceedings of the 2011 International Conference on
Computer Vision, ICCV ’11, pages 2564–2571, Washington, DC, USA. IEEE Computer
Society.
[26] Sethian, J. A. (1999). Level set methods and fast marching methods : evolving interfaces
in computational geometry, fluid mechanics, computer vision, and materials science.
Cambridge monographs on applied and computational mathematics. Cambridge University
Press, Cambridge, GB. Première édition parue sous le titre : Level set methods, 1996.
[27] Tasli, H. E., van Gemert, J. C., and Gevers, T. (2013). Spot the differences: From
a photograph burst to the single best picture. In ACM International Conference on
Multimedia.
[28] Yatziv, L., Bartesaghi, A., and Sapiro, G. (2006). O(n) implementation of the fast
marching algorithm. Journal of Computational Physics, 212(2):393 – 399.
Appendix A
To give a comprehensive view of Steps 2-4a, we here present flowcharts. These
include the handling of the mouse scroll wheel, mouse pointer and mouse click when the
check box Select Layer is selected.