EZ-sketching: Three-level optimization for error-tolerant image tracing
Su, Qingkun; Andy Li, Wing Ho; Wang, Jue; Fu, Hongbo
Published in: ACM Transactions on Graphics, 01/07/2014
Document version: Post-print (Accepted Author Manuscript)
Published version (DOI): 10.1145/2601097.2601202
Citation: Su, Q., Andy Li, W. H., Wang, J., & Fu, H. (2014). EZ-sketching: Three-level optimization for error-tolerant image tracing. ACM Transactions on Graphics, 33(4), Article 54. https://doi.org/10.1145/2601097.2601202
EZ-Sketching: Three-Level Optimization for Error-Tolerant Image Tracing

Qingkun Su¹  Wing Ho Andy Li²  Jue Wang³  Hongbo Fu²

¹Hong Kong University of Science and Technology  ²City University of Hong Kong  ³Adobe Research
Figure 1: The proposed system automatically refines sketch lines (a, c, e) (created by different users) roughly traced over a single image in a three-level optimization framework. The refined sketches (b, d, f) show closer resemblance to the traced images and are often aesthetically more pleasing, as confirmed by the user study. Image credits: John Ragai; Jeremy G.; Aubrey.
Abstract
We present a new image-guided drawing interface called EZ-Sketching, which uses a tracing paradigm and automatically corrects sketch lines roughly traced over an image by analyzing and
utilizing the image features being traced. While previous edge
snapping methods aim at optimizing individual strokes, we show
that a co-analysis of multiple roughly placed nearby strokes better captures the user’s intent. We formulate automatic sketch improvement as a three-level optimization problem and present an efficient solution to it. EZ-Sketching can tolerate errors from various
sources such as indirect control and inherently inaccurate input, and
works well for sketching on touch devices with small screens using
fingers. Our user study confirms that the drawings our approach
helped generate show closer resemblance to the traced images, and
are often aesthetically more pleasing.
CR Categories: I.3.3 [Computer Graphics]: Line and curve generation; J.5 [Arts and Humanities]: Fine arts

Keywords: interactive drawing, stroke improvement, edge snapping, co-analysis

© ACM, 2014. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Graphics, Volume 33, Issue 4 (July 2014). http://doi.acm.org/10.1145/2601097.2601202

1 Introduction

Freeform drawing gives artists complete freedom but is rather difficult to master. In contrast, the technique commonly known as "under painting" provides a target image as visual guidance and allows users to paint or draw over it. This method has been widely adopted by novice users as a means to improve their drawing skills, and to generate a plausible drawing that they otherwise could not achieve. It has also been extensively used by professionals to improve the composition and layout accuracy of their artwork derived from the image.
One popular drawing style is line drawing, i.e., using abstract
strokes to depict the shapes of objects in a scene. When drawing
over an image, the user usually wants input strokes to be aligned
with object boundaries for drawing accuracy. However, accurate tracing is easy to achieve only under direct manipulation with an accurate input device, e.g., tracing a printed photo with a pencil. Due to the gap between the control space and
the display space, indirect interaction makes it difficult for the user
to accurately express his/her intention when drawing a stroke, even
if the input device (e.g., a computer mouse or even a graphic tablet
like Wacom Intuos) itself is of high precision. On the other hand,
when the input device is inherently inaccurate (e.g., existing touch
devices due to the well-known fat-finger problem), it is extremely
difficult for the user to specify strokes that are accurate enough for
a high quality sketch.
This work introduces a novel stroke refinement technique that automatically refines the location, orientation, and shape of input
strokes by exploiting gradient features from the underlying image
being traced (Figure 1). Automatically adjusting user strokes with
respect to image gradients, i.e., edge snapping is not an easy task,
since a desired image edge that the user intends to trace is often
mixed with other, possibly much stronger ones in complex image regions, and is difficult to extract even with the state-of-the-art
edge detectors (e.g., [Arbelaez et al. 2011]). Although various approaches have been proposed for tackling this problem, they all try
to use local image features (i.e., edges near the user stroke) to find
a locally-optimal solution for each user stroke independently, and thus always face the danger of snapping to a wrong edge.
Our key idea is that while identifying a desired edge for a single stroke might be ambiguous, properly modeling the correlation
among multiple input strokes in a larger image region can greatly
resolve ambiguities. We thus propose a bottom-up optimization approach, which optimizes across three levels: local, semi-global, and
global, to improve user strokes and at the same time preserve the
original drawing style.
Our experiments show that the proposed technique is robust against
various sources of error, e.g., errors caused by indirect control or
inaccurate touch devices. It is flexible and applicable to many types
of images, as shown in Figure 1. Moreover, our approach is simple
and efficient, and can run at an interactive rate (see the accompanying video). Our user study confirms that the user drawings refined
by our algorithm show closer resemblance to the traced images than
the original ones, and are often visually more pleasing.
2 Related Work
Several systems have been developed in recent years to help users
draw better. The iCanDraw system [Dixon et al. 2010] assists users
in drawing human faces by providing corrective feedback. Iarussi
et al. [2013] extend this idea and provide more general guidance
to draw arbitrary models. ShadowDraw [Lee et al. 2011] provides
guidance for freeform drawing of objects by dynamically retrieving
and blending relevant images from a large database with respect to
the user’s current sketch. Such a computer-assisted drawing concept has also been explored for drawing, painting or even sculpture
with traditional media and tools [Flagg and Rehg 2006; Laviole and
Hachet 2012; Rivers et al. 2012]. Unlike these works, our system
adopts the widely used “under painting” technique and utilizes an
underlying image as visual guidance for drawing.
Gingold et al. [2012] show how to aggregate a crowdsourced set of
low-quality inputs to produce better drawings, paintings, and songs.
Limpaecher et al. [2013] take a similar crowdsourcing idea to correct sketches traced over a picture. That is, their work shares the
same goal as ours but adopts a completely different approach. For a
given picture, their approach first collects a large database of drawings by different artists through crowdsourcing and then uses a consensus of the collected drawings to correct new drawings. While
their results are very positive, their technique is limited to pictures
accompanied with a rich set of registered drawings and has been
demonstrated for face portraits only. In contrast our technique demands the information from the picture being traced only and is
thus more general.
Our local optimization step is largely inspired by the existing efforts
on edge detection or contour extraction. Automatic edge detectors,
even with the state-of-the-art techniques [Arbelaez et al. 2011], tend
to produce noisy and/or inconsistent edges. The traditional interactive contour extraction methods such as active contour model [Kass
et al. 1988] and intelligent scissor [Mortensen and Barrett 1995]
typically require very precise user input for accurate selection. The
recent work by Yang et al. [2013] allows the user to roughly scribble over the desired region to extract semantically dominant edges
in it. We instead do not explicitly extract image edges given our
goal of a drawing interface, where different input drawing styles
should be retained. Kang et al. [2005] present an interactive system for generating artistic sketches from images, based on livewire
algorithms [Mortensen and Barrett 1995]. However like other contour tracing techniques, it does not solve the inherent ambiguity
problem during local snapping. Instead, we show that a co-analysis of multiple roughly specified strokes often leads to results that better capture the user's intent. This co-analysis procedure bears some resemblance to the sketch-based co-retrieval idea in [Xu et al. 2013].
Our work is also related to the previous interactive or automatic
beautification systems [Igarashi et al. 1997; Orbay and Kara 2011;
Zitnick 2013], none of which, however, uses a picture to guide the
beautification process. We do not regard our technique as a beautification system, since drawing is a creative process and we try to
preserve the original drawing style of the user. The main purpose of
snapping input sketches to image edges is to improve the drawing
accuracy, and it does not necessarily lead to aesthetic enhancement
of strokes. However in many cases our algorithm indeed improves
aesthetics, especially for users who are not good at freeform drawing.
Our semi-global optimization step essentially leads to correspondences between user-specified strokes and underlying image features. From this point of view, it is related to non-rigid image registration [Sýkora et al. 2009; Levi and Gotsman 2013] on a concept
level. Unlike image registration, which often employs one-to-one
feature point matching and computes a globally optimal transformation, our approach is essentially a candidate selection process,
by first proposing multiple matching candidates through local optimization, and then selecting from them at the semi-global level. In
addition, we have to tackle the unique issues such as temporal information (determined by the drawing order) and user interaction.
Another relevant topic is the vectorization of line drawings, where
ambiguities also exist when performing vectorization locally. For
clean drawings, the difficulty often lies near junctions, for which
automatic non-local solutions [Bao and Fu 2012; Noris et al. 2013]
often suffice. For drawings with over-traced scribbles, a possible
solution is to first group over-traced line strokes into line groups,
for example using Gabor filtering techniques. The Gabor filter used in [Bartolo et al. 2007] bears some resemblance to our direction-aware FDoG filter, with a key difference that our filter is driven by
a user-specified stroke. There exist various techniques (e.g., [Kang
et al. 2005; Baran et al. 2010]) for approximating non-smooth vector lines with high-quality curves. Such techniques might be incorporated into our method to achieve curves with high-order continuity.
3 Three-Level Sketch Stroke Optimization
As mentioned earlier, at the core of our system lies a three-level
optimization method that automatically improves the user strokes
based on image features. It consists of three components:
Local Optimization. Given a new user stroke, we first analyze
its nearby image features to propose several snapping candidates
(Section 3.1). Note that we do not simply choose one candidate as
the final result as traditional edge snapping approaches do, due to
the inherent ambiguity in local snapping.
Semi-global Optimization. Once the snapping candidates of a new
stroke are determined, we jointly optimize the current stroke and
its spatial-temporal neighboring ones, to determine their optimal
snapping results together (Section 3.2). The nearby strokes will
influence each other’s decision on candidate selection to produce a
locally consistent result.
Global Optimization. Finally, for strokes that do not have good
snapping candidates, we adjust their positions using a global optimization procedure (Section 3.3) based on an existing mesh deformation technique, to preserve the overall stroke layout.
Figure 2: Illustration of the chain graph in the local optimization step. The graph (right) is complete, but only a subset of edges is shown to avoid clutter. Image credits: John Ragai.
3.1 Local Optimization
Given a parameterized user stroke s = {p_1, p_2, ..., p_N}, where the p_i's are stroke vertex coordinates in the image space, the goal of the local optimization step is to find several good snapping candidates, each candidate being a new stroke, which satisfy two constraints: (1) the candidates should have similar shapes to the original stroke; and (2) each candidate should align well with some structural image edges. We denote these candidates as s'_j = {p'_{j,1}, p'_{j,2}, ..., p'_{j,N}}, where j ∈ {1, ..., M}, and p'_{j,i} is the new location of p_i in the jth candidate. The number of candidates M for each stroke is at most 4 in our experiments.
For each stroke point p_i, we look in a local neighborhood window centered at it with radius r_s to find its snapping candidates. We
first apply a small threshold (0.2 after normalization) on the image gradient magnitudes in the window to remove all pixels that
have small gradient magnitudes. In the remaining pixels, we identify those that have the locally maximal gradient magnitudes (i.e.,
pixels whose gradient magnitudes are the greatest in 3 × 3 neighborhoods), and randomly sample K (≤ 100 in our implementation)
positions Qi = {qi,1 , qi,2 , ..., qi,K }, as the snapping candidates for
pi . We then construct a chain shaped graph G = (V, E), where V
contains all candidate points Qi . Unlike previous livewire techniques [Mortensen and Barrett 1995; Kang et al. 2005], we do not
pin the stroke ends in place, since their initial positions might not be
accurate. We thus add a virtual starting point Q_0 = {q_{0,0}}, so that V = \bigcup_{i=0}^{N} Q_i, as shown in Figure 2. A complete bipartite graph is constructed between two neighboring node sets Q_i and Q_{i+1}.
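For concreteness, the candidate sampling just described (gradient thresholding, 3 × 3 non-maximum suppression, and random sampling of up to K positions) might be sketched in Python as follows; the function name and grid representation are illustrative assumptions, not the authors' C++ implementation:

```python
import random

def snapping_candidates(grad_mag, p, rs, K=100, thresh=0.2, rng=None):
    """Collect up to K candidate positions Q_i for one stroke point p.

    grad_mag: 2D list of normalized gradient magnitudes, grad_mag[y][x].
    p: (x, y) stroke vertex; rs: neighborhood radius in pixels.
    Pixels below `thresh` are discarded; of the rest, only 3x3 local
    maxima are kept, and at most K of them are randomly sampled.
    """
    rng = rng or random.Random(0)
    h, w = len(grad_mag), len(grad_mag[0])
    px, py = p
    maxima = []
    for y in range(max(1, py - rs), min(h - 1, py + rs + 1)):
        for x in range(max(1, px - rs), min(w - 1, px + rs + 1)):
            g = grad_mag[y][x]
            if g < thresh:
                continue  # too weak to be a structural edge
            neighborhood = [grad_mag[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            if g >= max(neighborhood):  # 3x3 local maximum
                maxima.append((x, y))
    if len(maxima) > K:
        maxima = rng.sample(maxima, K)
    return maxima
```

A chain graph is then built over these per-vertex candidate sets, with a complete bipartite edge set between consecutive layers.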
The weight of each edge e = (q_{i,k_i}, q_{i+1,k_{i+1}}) ∈ E plays the
most important role for this optimization process. We first want to
encourage the input stroke to snap to nearby strong image edges that
are parallel to the stroke direction. To identify such edges, inspired
by the flow-based Difference-of-Gaussians (FDoG) filter proposed
in [Kang et al. 2007], we define a DoG filter H along the stroke
direction as:
H(m, v) = \int_{-Y}^{Y} G_{\sigma_m}(y) \int_{-X}^{X} I(l(x, y))\, f(x)\, dx\, dy,   (1)
where I is the grayscale version of the input image. m and v are the midpoint and the unit direction vector of the graph edge e (see Figure 3), i.e., m = (q_{i,k_i} + q_{i+1,k_{i+1}})/2 and v = c · (q_{i+1,k_{i+1}} − q_{i,k_i}), where c is the normalization factor. l(x, y) = m + xu + yv, where u is the unit vector perpendicular to v. f(x) = G_{σ_c}(x) − ρ G_{σ_s}(x) is the difference of two Gaussian functions, as shown in Figure 3. H(m, v) can thus effectively find image edges that are parallel to the local stroke direction v, better capturing the user's intention. Compared to Kang et al.'s work [2007], the local filter direction is no longer determined by the image features, but is provided by the user stroke. Figure 3 (bottom) shows the benefit of our direction-aware FDoG filter.
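A direct numeric evaluation of Equation 1 could look like the sketch below, which approximates the double integral by sampling at integer offsets along u and v; the sampling scheme and function names are our assumptions, not the paper's implementation:

```python
import math

def gauss(x, sigma):
    """1D Gaussian G_sigma(x)."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)

def fdog_response(I, m, v, X=4, Y=7,
                  sigma_c=1.0, sigma_s=1.6, sigma_m=3.0, rho=0.99):
    """Evaluate H(m, v) of Equation 1 by sampling the double integral
    at integer offsets. I(x, y) returns grayscale intensity; m is the
    edge midpoint and v its unit direction; u is perpendicular to v."""
    u = (-v[1], v[0])  # unit normal to the edge direction
    total = 0.0
    for y in range(-Y, Y + 1):
        inner = 0.0
        for x in range(-X, X + 1):
            # l(x, y) = m + x*u + y*v
            lx = m[0] + x * u[0] + y * v[0]
            ly = m[1] + x * u[1] + y * v[1]
            f = gauss(x, sigma_c) - rho * gauss(x, sigma_s)  # DoG across the edge
            inner += I(lx, ly) * f
        total += gauss(y, sigma_m) * inner
    return total
```

For an image edge running parallel to v through m, the response is strongly negative, which Equation 2 converts into a near-zero edge weight.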
Figure 3: Top: a DoG filter (right) is applied along the edge direction v at the middle point m between two candidate vertices q_{i,k_i} and q_{i+1,k_{i+1}} (left), to find image edges that are roughly parallel to v. Bottom: our direction-aware FDoG filter can effectively remove image edges that are perpendicular to the input stroke. From left to right: input image overlaid with one user stroke; result of Kang et al.'s approach [2007]; and our result.
We then convert the filtering response H(m, v) into an edge weight term as:

\tilde{H}(m, v) = \begin{cases} 1 + \tanh(H(m, v)) & \text{if } H(m, v) < 0, \\ 1 & \text{otherwise.} \end{cases}   (2)

Specifically, if the middle point m is on an image edge whose direction is close to v, H(m, v) will be a large negative value, and \tilde{H}(m, v) in Equation 2 will thus be close to zero, leading to a small edge weight. On the contrary, an edge that is perpendicular to v will lead to a large edge weight, and thus would not be preferred in this energy minimization framework.
We also expect that after local snapping, the shape and position
of the input stroke would not change dramatically. This can be
achieved by adding another term in the edge weight that penalizes
the shifts of vertices. Combining these two terms, the total edge
weight w_e is defined as:

w_e = \frac{1}{r_s^2} \left\| (p_{i+1} - p_i) - (q_{i+1,k_{i+1}} - q_{i,k_i}) \right\|_2^2 + \alpha \tilde{H}(m, v),   (3)
where α is a balancing weight.
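Equations 2 and 3 combine into a single edge-weight computation, which might be sketched as follows (the paper's α = 0.1 is used as a default, but the r_s value and function names are illustrative assumptions):

```python
import math

def edge_weight(p_i, p_i1, q_i, q_i1, H_val, rs=20.0, alpha=0.1):
    """Total edge weight w_e of Equation 3: a shape term penalizing
    deviation from the original stroke segment, plus the edge-alignment
    term H-tilde of Equation 2."""
    # H-tilde: near 0 for a strong edge parallel to the candidate segment
    h_tilde = 1.0 + math.tanh(H_val) if H_val < 0 else 1.0
    # Shape term: how much the candidate segment differs from the original
    dx = (p_i1[0] - p_i[0]) - (q_i1[0] - q_i[0])
    dy = (p_i1[1] - p_i[1]) - (q_i1[1] - q_i[1])
    return (dx * dx + dy * dy) / (rs * rs) + alpha * h_tilde
```

With a strongly negative filter response and an unchanged segment shape, the weight approaches zero, making such edges preferred by the shortest-path search.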
We use an iterative approach to search for snapping candidates.
We aim at a small number of candidates but with large variations.
Firstly, with the constructed graph G = (V, E), we compute the
best snapping position s′1 of the user stroke s by finding the shortest
path (q_{0,0}, q_{1,k_1}, q_{2,k_2}, ..., q_{N,k_N}) from Q_0 to Q_N, with the total
energy of a path defined as the sum of edge weights along the path.
This problem can be effectively solved using dynamic programming. Secondly, we check two conditions: a) whether the average
edge weight of s′1 is small enough (≤ 0.07 in our implementation);
b) whether s′1 is significantly different from the existing ones1 . s′1
can be added to the final set of candidates only if both conditions
are true. The first condition helps filter out snapping candidates
1 This is achieved by computing the average distance between the corresponding vertices of the new candidate and existing ones. The condition is
true only if the distance is large enough (> rs /4).
Figure 4: Input strokes (opaque) and their snapping candidates
(semi-transparent). For clean visualization, not all candidates are
visualized. Strokes with single candidates are shown in black. The
size of the local neighborhood window (with radius rs ) is visualized by the gray circle in the top-left corner. Image credits: David
Dennis; scarletgreen.
with low confidence. Note that it is possible that no snapping candidate satisfies this condition, in which case the input stroke s is left unadjusted until the global optimization step (Section 3.3).
The second condition encourages the selected candidates to have
large variations. Thirdly, no matter whether s′1 is selected or not,
we remove all edges in s′1 from the graph. The removal of almost
identical candidates lowers the computational cost for future steps.
The above three steps are repeated until at most M candidates are
identified.
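The shortest-path computation at the heart of this candidate search can be sketched as a standard chain-graph dynamic program (an illustrative sketch; the virtual start Q_0 is modeled by giving every first-layer node zero initial cost, and the candidate-acceptance tests and edge removal described above are left to the surrounding loop):

```python
def shortest_chain_path(layers, w):
    """Dynamic-programming shortest path through a chain graph.

    layers: list of candidate lists Q_1..Q_N.
    w(i, a, b): weight of the edge between candidate a of layer i and
    candidate b of layer i+1 (Equation 3 in the paper).
    Returns (per-layer candidate indices, average edge weight).
    """
    n = len(layers)
    cost = [0.0] * len(layers[0])  # virtual start: zero cost into layer 1
    back = []
    for i in range(n - 1):
        nxt = [float("inf")] * len(layers[i + 1])
        bp = [0] * len(layers[i + 1])
        for a, ca in enumerate(cost):
            for b in range(len(layers[i + 1])):
                c = ca + w(i, layers[i][a], layers[i + 1][b])
                if c < nxt[b]:
                    nxt[b], bp[b] = c, a
        cost = nxt
        back.append(bp)
    # trace back from the cheapest end node
    k = min(range(len(cost)), key=cost.__getitem__)
    path = [k]
    for bp in reversed(back):
        path.append(bp[path[-1]])
    path.reverse()
    return path, cost[k] / max(1, n - 1)
```

The returned average edge weight corresponds to the acceptance threshold (≤ 0.07) used in the first condition above.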
Figure 4 shows two examples of the final snapping candidate sets.
When an input stroke has a high ambiguity, we keep it at its original
position instead of taking a simple guess, as shown in the accompanying video. This conservative behavior is under the consideration
that if simple guesses are taken early on, they will inevitably introduce frequent errors, which could easily annoy the user.
3.2 Semi-global Optimization
The local optimization step yields multiple snapping candidates for
some user strokes. As discussed earlier, determining the right candidate from them is sometimes ambiguous if we only consider a single stroke. This is the major limitation of previous edge snapping
approaches. In our system we propose a semi-global optimization
method to jointly consider multiple strokes to determine their snapping positions.
While considering multiple strokes together is beneficial, we would
like to point out that optimizing too many strokes together is also
not ideal, as it brings stability and convergence problems to the system. Consider the other extreme where we always optimize all user
strokes. The system then would become uncontrollable, as any new
stroke the user adds would alter the existing results, even in faraway
regions that have nothing to do with the local area that the user is
focusing on at this moment. To avoid this problem our system only
considers spatio-temporal neighboring strokes of the current user
stroke. In other words, every new user stroke invokes a semi-global optimization procedure, and only those strokes that are both
spatial and temporal neighbors of the new one are considered in it.
The temporal neighboring relationship is determined by the drawing order: the three most recent strokes are considered the temporal
neighbors of the new stroke. To determine if they are spatially close
to the new stroke, for each vertex on the new stroke, we find the
closest vertex on a neighboring stroke and record their distance. If
more than 30% of such distances are smaller than 1.5 · r_s, we treat them as spatial neighbors.

Figure 5: Semi-global stroke optimization. Given an active stroke set (green), we build a local mesh by connecting stroke vertices in the local region, and select the best candidate combination that minimizes the energy in Equation 5. The bottom right image shows that without semi-global optimization, the local optimization alone generates a less satisfactory result. The boundary vertices φ_S are tagged by green hollow points. Image credits: John Ragai.

We call the current user stroke and
its valid spatio-temporal neighbors as an active stroke set, denoted
as S = {s1 , s2 , ..., sm }, and their candidate set C = {c(i, j)},
where c(i, j) means the jth candidate of the ith stroke si .
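The spatial-neighbor test above might be sketched as follows (the thresholds follow the text; the function name and stroke representation are illustrative assumptions):

```python
import math

def are_spatial_neighbors(stroke_a, stroke_b, rs, ratio=0.3, factor=1.5):
    """Spatial-neighbor test: for each vertex of stroke_a, find the
    distance to the closest vertex of stroke_b; the strokes are spatial
    neighbors if more than `ratio` of these distances fall below
    factor * rs."""
    close = 0
    for p in stroke_a:
        d = min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in stroke_b)
        if d < factor * rs:
            close += 1
    return close > ratio * len(stroke_a)
```

Only strokes passing both the temporal and this spatial test join the active stroke set S.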
Our goal is to select the best candidate combination T =
{c(1, j1 ), c(2, j2 ), ..., c(m, jm )} ⊂ C which satisfies the following
three constraints: (1) each selected candidate has a small snapping
cost in local optimization; (2) their spatial layout is similar to S;
and (3) T should not introduce new conflicts with other strokes that
are not involved in the current optimization step. We thus define an
energy term to describe each of the three constraints.
We first define Eg (T ) as the sum of the snapping costs computed
in local optimization (i.e., edge weights we defined in Equation 3)
for snapping candidates in T . By minimizing this term, the first
constraint is satisfied.
To encourage layout consistency between S and T , we define a 2D
mesh M by connecting each vertex pi in S to its spatially neighboring vertices, as shown in Figure 5. Note that the resulting mesh
is not necessarily a triangular mesh. Similarly, we define a corresponding mesh M ′ , by connecting each vertex ci in T to its spatial
neighboring vertices. We then define two energy terms as:
E_r(T) = \sum_{e_{ij} \in M} \eta \, \| (p_i - p_j) - (c_i - c_j) \|_2^2,

E_b(T) = \sum_{i \in \phi_S} \eta \, \| p_i - c_i \|_2^2,   (4)
where η = 1/r_s^2. Intuitively, E_r(T) corresponds to the second
constraint and measures the mesh deformation that is covered by
S. Eb (T ) measures the mesh deformation along the boundary of
S, where φS contains all boundary vertices of S, i.e., vertices that
are connected with strokes that are not involved in the current optimization step in the mesh M . Minimizing this term will encourage
the boundary vertices of S to stay at the same positions, thus avoiding introducing conflicts with other nearby strokes.
Combining the three terms together, the final energy minimization problem is defined as:

\min_{T \subset C} \; E_g(T) + \beta \left( E_r(T) + E_b(T) \right),   (5)

where β is a balancing weight.
Figure 6: We adjust strokes with no snapping candidate (green) with respect to those with snapping candidates (blue) via global optimization. The panels show the original strokes, the strokes after semi-global and after global optimization, and the original and deformed meshes used for the mesh deformation. Image credits: Mike Baird.
In our system, given that the size of T is limited (the active stroke
set contains at most 3 strokes in our implementation) and all energy terms can be computed efficiently, we simply use an exhaustive search to find the globally optimal solution to Equation 5. As
we will demonstrate later, the exhaustive search can be done effectively in real-time in our test application. Increasing the size
of T could potentially slow down the computation. However as
discussed earlier we intentionally limit the size of T to avoid the
convergence problem and provide a better user experience.
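Since the active stroke set is small, the exhaustive search over candidate combinations can be sketched directly; here the energy callbacks stand in for the terms of Equations 3 and 4, and the function names are our assumptions:

```python
from itertools import product

def best_combination(candidates, E_g, E_r, E_b, beta=0.5):
    """Exhaustive search for the candidate combination T minimizing
    Equation 5: E_g(T) + beta * (E_r(T) + E_b(T)).

    candidates: one list of snapping candidates per stroke in the active
    set; with at most 3 strokes and 4 candidates each, at most 64
    combinations are scored, so brute force stays real-time.
    """
    best, best_cost = None, float("inf")
    for combo in product(*candidates):
        cost = E_g(combo) + beta * (E_r(combo) + E_b(combo))
        if cost < best_cost:
            best, best_cost = combo, cost
    return best, best_cost
```

This mirrors the intentional cap on the active stroke set: the combinatorial space stays small enough for exact search.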
Figure 5 shows an example of how semi-global optimization can
simultaneously adjust nearby strokes to achieve a semanticallycompatible result (top right). In contrast, without semi-global optimization, local optimization alone cannot find the optimal snapping
positions for all strokes, leading to a less satisfactory result shown
in the bottom right corner of Figure 5.
3.3 Global Optimization
As discussed previously, some strokes might remain untouched during the semi-global optimization step, since their corresponding
snapping energy is too high and thus they have no snapping candidate. We often have such strokes in the scenario where the user
intentionally does not want to follow any image edges at all, e.g.,
when making hatching-like effects or applying decorative strokes.
However, since we adjust other strokes with snapping candidates,
there is a potential topology issue: an adjusted stroke may cross
some nearby untouched strokes that have been drawn earlier, which
is undesirable (see Figure 6). To avoid this problem, we employ
a global optimization step to adjust the positions of strokes with
no snapping candidate so that the overall stroke topology remains
unchanged, and the user’s drawing style is better preserved. In other
words, we intend to preserve the original global layout of untouched
strokes while responding to the change of strokes with snapping
candidates.
The global optimization is achieved using a 2D deformation approach that is similar to the one used in the semi-global optimization step. However, it is worth emphasizing that the semi-global optimization essentially performs selection rather than deformation.
Denoting all the original strokes with snapping candidates as S and their refined positions as T, we triangulate the image plane using
all stroke vertices, and compute a mesh deformation according to
the mapping from S to T , using the method proposed by Zhou et
al. [2005], which is originally designed for feature-preserving 3D
mesh deformation. We then use the computed deformation field to
move strokes with no snapping candidate to their adjusted positions,
as shown in Figure 6.
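As a simplified stand-in for this deformation step (the paper uses the feature-preserving mesh deformation of Zhou et al. [2005], not the scheme below), one can illustrate the idea of transferring the S → T displacements to unsnapped strokes with inverse-distance weighting:

```python
def propagate_displacements(snapped_src, snapped_dst, free_points, eps=1e-6):
    """Move each vertex of an unsnapped stroke by an inverse-distance-
    weighted blend of the displacements of the snapped vertices. This is
    only a sketch of transferring the S -> T mapping; the actual system
    computes a feature-preserving mesh deformation instead."""
    moved = []
    for p in free_points:
        wsum = wx = wy = 0.0
        for s, t in zip(snapped_src, snapped_dst):
            d2 = (p[0] - s[0]) ** 2 + (p[1] - s[1]) ** 2
            w = 1.0 / (d2 + eps)  # nearer anchors dominate
            wsum += w
            wx += w * (t[0] - s[0])
            wy += w * (t[1] - s[1])
        moved.append((p[0] + wx / wsum, p[1] + wy / wsum))
    return moved
```

Either way, the goal is the same: untouched strokes follow the deformation induced by the snapped ones, so the overall stroke topology is preserved.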
Figure 7: Comparing our method with previous edge snapping methods. From left to right: input strokes, our method, active contours, and intelligent scissors. The top and bottom examples are sketched using a computer mouse. Image credits: Andrea Campi.
4 Experimental Results and Evaluation
We implemented our method in C++ as a real-time drawing application running on the Windows platform. A touch-screen notebook (Aspire R7-572G with an Intel Core i5-4200U @ 1.6 GHz (2.30 GHz) and 16 GB RAM) running Windows 8.1 was used as the
testing device. Our experiments show that EZ-Sketching is able
to achieve real-time performance on this device: for a typical input
stroke, it takes 0.03 seconds for local optimization, 0.01 seconds for
semi-global optimization, and 0.5 seconds for global optimization.
Note that the global optimization is more like a post-processing step
and is not needed for every input stroke. The delay caused by stroke
refinement is thus almost unnoticeable in a real drawing session.
We have tested the system on many images that contain a wide variety of scenes and objects, as shown in Figure 9. In Figure 7, we
show some comparisons between our approach and previous edge
snapping methods, such as Active Contours [Kass et al. 1988] and
Intelligent Scissors [Mortensen and Barrett 1995]. These results
clearly show that EZ-Sketching outperforms previous methods especially in regions with high ambiguities, thanks to the bottom-up
optimization framework. It is worth mentioning that our method
performs better even on some of the single, isolated strokes (e.g.,
the bottom example in Figure 7). This is possibly due to the
direction-aware FDoG filter and the energy minimization framework employed in this step, which are quite different from previous
methods. Furthermore, the top and bottom results were sketched by us using a computer mouse, indicating that our system can tolerate errors caused by indirect control. Figures 1(d) and 10 give two hatching examples. In our implementation, no snapping results are extracted for strokes with length < r_s, due to the high ambiguity involved.
To demonstrate the necessity and importance of the proposed semiglobal optimization step, in Figure 8 we compare the snapping results of several examples (top two are from Figure 4) with and without semi-global optimization. The results show that if we only use
local optimization, then dense strokes that are close to each other
are often undesirably snapped to the same strong edges nearby. In
A
D
B
C
E
F
H
G
J
K
I
L
Figure 9: Images used in User Study I. Image credits: David
Saddler; gynti 46; Steve Cadman; Jeremy G.; Moyan Brenn;
DeusXFlorida; Tim Pearce; John Ragai; Herv; Kevin Gibbons; Ian
Barbour.
4.1 User Study I
Figure 8: Comparing results generated by our system with and
without using the semi-global optimization step. Left: user’s input. Middle: with local optimization only. Right: with both local
and semi-global optimization. Note that without semi-global optimization, nearby dense strokes often collapse into the same position. Image credits: scarletgreen; David Dennis; Aubrey; Andrea
Campi.
contrast, with the help of the semi-global optimization step, ambiguous strokes can snap to the correct positions and maintain their
original spatial layout properties.
We used fixed parameter setting for generating all results. We set
α = 0.1 in Equation 3, β = 0.5 in Equation 5, and σc = 1.0,
σs = 1.6, σm = 3.0, ρ = 0.99, X = 4, and Y = 7 in Equation 1. The radius of the local window introduced in Section 3.1
is set to be rs = 7.5mm. For this parameter we purposely use a
unit of physical measurement instead of pixels in order to cover the
errors produced by “Fat Fingers”, a well-known problem in touch
interfaces. Since a “fat finger” has a fixed physical size which is independent of the display resolution, defining this size using pixels
would make the system depend more on a specific display and thus
lose its generality. Defining it as a physical distance has shown to
work well on multiple types of displays in our experiments.
We conducted a formal user study to evaluate the effectiveness of
EZ-Sketching. The user study was divided into two parts. In User
Study I, a number of participants were invited to create drawings
with our system. It was designed to collect user feedback on the
usability of our system. The drawings created before and after auto-refinement were then evaluated by a different group of participants
in User Study II.
Participants. We invited 12 volunteers (a1 to a12) to participate in the study: 7 men and 5 women, aged from 21 to 27, 10 right-handed and 2 left-handed. The first 6 participants (a1 to a6) had good drawing skills: they had received professional drawing training, taught themselves some drawing skills, or practiced drawing more than 5 times per week. The other 6 participants (a7 to a12) had little drawing experience or knowledge.
Design and Procedure. We prepared 12 images (A to L) to be traced by the participants, as listed in Figure 9. The image set covers different types of objects and scenes, including man-made objects, architecture, plants, animals, and human portraits. Each participant was assigned to trace 6 of the 12 images twice, once with naïve image tracing (i.e., without auto-refinement) and once with EZ-Sketching. All the drawings were sketched by touching fingers on images displayed in a fixed window of 5 inches (diagonal), which simulated a mobile touch-screen device.
Each participant was first given a short tutorial of the drawing interface and its two modes (with and without auto-refinement), followed by a short practice on an image not included in the formal test dataset. Next, each participant sketched the 6 assigned images with one of the two drawing modes; the same set of images was then traced with the other mode. Finally, the participants were asked to complete a questionnaire.
We applied a Latin square design to image assignment and tracing order: 1) each participant was assigned a different image set; 2) every image was drawn by 6 different participants, 3 with good drawing skills and 3 with little drawing experience; 3) half of the participants in each skill group started with auto-refinement first; 4) ordering and first-order carryover effects were balanced.
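An ordering with these balance properties can be produced by the standard zig-zag construction of a balanced Latin square for an even number of conditions. The sketch below is illustrative only and is not the paper's actual assignment code:

```python
def balanced_latin_square(n: int) -> list[list[int]]:
    """Build an n-by-n balanced Latin square for even n: every condition
    appears once in each position, and every ordered pair of adjacent
    conditions occurs exactly once, balancing first-order carryover."""
    if n % 2 != 0:
        raise ValueError("this construction requires an even number of conditions")
    # Zig-zag first row: 0, 1, n-1, 2, n-2, ...
    first, low, high = [0], 1, n - 1
    take_low = True
    while len(first) < n:
        first.append(low if take_low else high)
        low, high = (low + 1, high) if take_low else (low, high - 1)
        take_low = not take_low
    # Each later row shifts every condition index by the row number.
    return [[(c + r) % n for c in first] for r in range(n)]
```

With 6 conditions, the 6 rows could be assigned to the 6 participants of each skill group so that positions and image-to-image transitions are covered evenly.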
Results. After completing the sketching tasks, each participant was asked in the questionnaire to rate the drawing interface, with and without auto-refinement, in terms of ease of use, accuracy, and aesthetics of the results, on a discrete scale from 1 (poorest) to 5 (best).

Figure 10: Our system also supports hatching effects. Top: raw strokes (left) and corresponding results (right). Bottom: close-ups. Image credits: tommerton2010.

Figure 11: Average ratings of the drawing interface with and without auto-refinement, in terms of ease of use, accuracy, and aesthetics. Error bars are standard errors.
Figure 11 plots the average rating scores. In terms of ease of use, only one participant (a10) preferred the interface without auto-refinement, and another participant (a4) had no preference; the rest gave a higher rating to the interface with auto-refinement. A paired t-test confirmed that the ratings with auto-refinement were significantly higher (t = 2.86, p < 0.02). A similar conclusion held for accuracy, which asked "how accurately your drawings capture the image structure". Only two participants (a6 and a10) felt that the drawings made without auto-refinement were aesthetically more pleasing; however, no statistically significant difference was found in terms of aesthetics (t = 1.54, p = 0.15).
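For reference, the paired t statistic used here is computed from the per-participant rating differences. The snippet below is a generic illustration with made-up ratings, not the study's actual data:

```python
import math

def paired_t(x, y):
    """Paired t statistic: t = mean(d) / (sd(d) / sqrt(n)), where d = x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # unbiased sample variance
    return mean / math.sqrt(var / n)

# Hypothetical ease-of-use ratings, with vs. without auto-refinement:
with_ref = [5, 4, 5, 3, 4, 5]
without_ref = [3, 4, 4, 2, 3, 4]
t = paired_t(with_ref, without_ref)
```

In practice one would look up (or compute) the p-value from the t distribution with n - 1 degrees of freedom, e.g. via scipy.stats.ttest_rel.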
During the study our system also recorded the number of undos and the time for drawing a stroke (from touch-down to touch-up). Our hypothesis was that EZ-Sketching could reduce the drawing time and the number of undos. A Wilcoxon rank-sum test found a statistically significant difference (z = −2.249, p = 0.024) in the drawing time per stroke between EZ-Sketching (0.431s) and naïve tracing (0.511s). However, surprisingly, the average number of undos per drawing was slightly higher with auto-refinement, though the difference was not statistically significant (z = 0.87, p = 0.38). Looking into the drawings created by the participants, we found that since each participant was asked to trace each image twice, the two drawings often consisted of two different sets of lines; in this case, comparing the drawing time or the number of undos with and without auto-refinement is not conclusive.
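The rank-sum z statistic reported above comes from the normal approximation to the Wilcoxon rank-sum (Mann-Whitney) test. A minimal sketch, ignoring tie correction and using made-up data rather than the study's logs:

```python
import math

def rank_sum_z(x, y):
    """Normal-approximation z for the Wilcoxon rank-sum test (no tie correction)."""
    combined = sorted((v, 0 if i < len(x) else 1)
                      for i, v in enumerate(list(x) + list(y)))
    # Rank sum of the first sample (ranks start at 1).
    w = sum(rank for rank, (_, grp) in enumerate(combined, start=1) if grp == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    return (w - mean) / math.sqrt(var)
```

A production analysis would use average ranks for ties and a tie-corrected variance, as scipy.stats.ranksums and mannwhitneyu do.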
4.2 User Study II
Participants. In the second part of the user study, we invited a different group of participants to evaluate the drawings before and after auto-refinement, i.e., to compare the raw strokes input to our algorithm with the strokes refined by it. Some such drawing pairs are shown in Figure 12. In total 48 individuals participated (33 men and 15 women), aged from 20 to 50; 15 of them reported that they had good drawing skills.
Tasks. Our initial plan was to compare the results created by naïve tracing against those created by our system. However, we found this to be a difficult task, because the two types of drawings usually contained completely different sets of strokes. We thus asked the participants only to evaluate the difference between pairs of drawings before and after auto-refinement, taken from User Study I. For such pairs there naturally exists a one-to-one stroke correspondence, so their difference is produced solely by the stroke refinement technique.
An online questionnaire was designed. Each pair of drawings before and after auto-refinement was placed at the same position, switching between the two every 1.5 seconds. Each source image and the corresponding overlaid pair of drawings were placed side by side. Each participant was asked to pick which drawing better resembled the input image and which was aesthetically more pleasing.
To avoid fatigue, each participant evaluated only 12 of the 72 pairs created in User Study I. The pairs were randomly selected subject to two constraints: 1) each participant rated a different set of 12 drawing pairs; 2) the creators of the 12 drawing pairs were all different. The presentation order of the drawings was randomized.
Results. Each of the 72 drawing pairs was evaluated by at least 4 participants. The overall opinion favored the drawings refined by our system on both criteria, as summarized in Table 1. Specifically, 69.97% of the votes deemed the auto-refinement results more accurate; a paired t-test confirmed a statistically significant difference (t = 10.44, p < 0.01). 66.84% of the votes agreed that the auto-refinement results were aesthetically more pleasing (t = 8.58, p < 0.01). Around 33% of the votes still considered the original sketches more aesthetic. We speculate that this is because some sketches may appear more artistically appealing, or be taken as artistic exaggeration, when they do not exactly follow the original image features. For example, people may find the curved building outlines in the fourth column of Figure 12 artistically more appealing even though they obviously deviate from the image. Our system is not designed to capture such high-level aesthetic properties of sketches.
We further analyzed the opinions on the drawings created by the two groups of participants, with (a1-a6) and without (a7-a12) good drawing skills. As shown in Table 1, a higher percentage of the refined drawings from the participants with little drawing skill were voted as both more accurate and aesthetically more pleasing. An independent t-test confirmed that both differences were significant (t = 2.65, p < 0.01 and t = 3.30, p < 0.01). We speculate that this is because users with less drawing skill produced more errors in their strokes, giving our method more room for improvement and leading to a more obvious difference before and after refinement. This suggests that our method would be especially useful to users with less drawing skill. Nevertheless, for both participant groups, the refined drawings received significantly more votes (p < 0.01) in both accuracy and aesthetics, suggesting that our method is helpful to users at different skill levels.

Table 1: Percentage of votes for the refined drawings from the two groups of participants, with good drawing skill (a1-a6) and with little drawing skill (a7-a12).

             a1-a6     a7-a12    overall
accuracy     64.93%    75.00%    69.97%
aesthetics   60.41%    73.26%    66.84%

Figure 12: Sample sketches created in User Study I. Top: original user sketches. Bottom: refined by our method. The ones in the odd columns are from users with good drawing skills.
For each individual image (Figure 13), a paired t-test confirmed that significantly more votes were given to the refined drawings as more accurate (p < 0.05), except for images B (t = 0.57, p > 0.56), K (t = 1.46, p > 0.15), and L (t = 1.46, p > 0.15). A paired t-test also confirmed that significantly more votes were given to the refined drawings as aesthetically more pleasing for images A (t = 2.77, p < 0.01), C (t = 3.54, p < 0.01), D (t = 5.49, p < 0.01), E (t = 2.77, p < 0.01), G (t = 4.42, p < 0.01), and H (t = 3.96, p < 0.01). However, no significant preference was found in terms of aesthetics for the other images (B, F, I, J, K, L).
4.3 Limitations
To better understand the limitations of the system, we examined the examples that did not perform well in the user study. In particular, our system achieved only minimal success on test image L. As shown in Figure 14 (left), this is largely because the image contains many weak edges, and the object boundary is quite weak compared with the background clutter; our system may thus snap some boundary strokes to wrong edges in this difficult case. In contrast, human perception can easily identify the boundary using higher-level semantics. This suggests that our system could potentially incorporate high-level object recognition methods for more semantic edge snapping.
Our system is designed to be conservative in order to preserve the original drawing style. Hence, if the original sketch contains large errors, our refined result will not be accurate either. Such an example is shown in Figure 14 (right). However, this might be a desirable feature in practice, as the user still gets correct feedback for improving his/her drawing skills.
Figure 13: Participants' opinions on the accuracy and aesthetics of individual drawings (source images A to L) before and after being processed by our method. The bars for which the votes for the corrected drawings were not significantly higher than for the uncorrected drawings (p > 0.05) are hatched in white.

Discussions. Knowing the ability of our tool, participants could have a tendency to provide less accurate strokes with less effort and rely on our tool for auto-refinement. In other words, the input strokes for auto-refinement might be less accurate than they would be without auto-refinement (similar to spelling-correction systems). Therefore, User Study II might not fully capture the advantage of our tool over naïve tracing. A more proper, though more expensive, evaluation would be to recruit two independent groups, from a bigger pool of participants, to assess auto-refined and naïve image tracing independently, as similarly done in [Limpaecher et al. 2013].

5 Conclusion and Future Work
We presented EZ-Sketching, a new drawing interface with the
power of automatic stroke refinement, using a novel three-level optimization framework. Our system automatically improves the accuracy of the user’s drawing while maintaining the original style.
We believe that our technique will encourage novice users with
limited drawing skills to create drawings under an under-painting
paradigm, especially on touch devices.
As future work, we plan to port our implementation to popular mobile platforms such as Android and iOS and release the app in their app stores, so that EZ-Sketching can reach a larger group of users, enabling a larger-scale evaluation. Our idea of modeling the interaction of multiple strokes might also inspire other applications such as scribble-based image segmentation. How to provide snapping suggestions when multiple reference images are available [Lee et al. 2011] is also an interesting direction to explore.
Figure 14: Limitations. Left: our method has difficulty dealing with weak edges (highlighted in green) and cluttered backgrounds (highlighted in red); the middle column shows the FDoG filtering result. Right: our method is designed to preserve the original drawing style, and thus cannot correct large errors or dramatically improve aesthetics (top: input; bottom: refined). Image credits: Ian Barbour.

Acknowledgements

We thank the reviewers for their constructive comments, the user study participants for their time, and the Flickr users for making their images available through Creative Commons licenses. This work was substantially supported by grants from the RGC of HKSAR (CityU 113513) and the City University of Hong Kong (7002925).

References

Arbelaez, P., Maire, M., Fowlkes, C., and Malik, J. 2011. Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 5, 898–916.

Bao, B., and Fu, H. 2012. Vectorizing line drawings with near-constant line width. In ICIP 2012.

Baran, I., Lehtinen, J., and Popović, J. 2010. Sketching clothoid splines using shortest paths. In Computer Graphics Forum, vol. 29, 655–664.

Bartolo, A., Camilleri, K. P., Fabri, S. G., Borg, J. C., and Farrugia, P. J. 2007. Scribbles to vectors: Preparation of scribble drawings for CAD interpretation. In Proceedings of the 4th Eurographics Workshop on Sketch-Based Interfaces and Modeling, 123–130.

Dixon, D., Prasad, M., and Hammond, T. 2010. iCanDraw: Using sketch recognition and corrective feedback to assist a user in drawing human faces. In CHI 2010, 897–906.

Flagg, M., and Rehg, J. M. 2006. Projector-guided painting. In UIST '06, 235–244.

Gingold, Y., Vouga, E., Grinspun, E., and Hirsh, H. 2012. Diamonds from the rough: Improving drawing, painting, and singing via crowdsourcing. In Proceedings of the AAAI Workshop on Human Computation (HCOMP).

Iarussi, E., Bousseau, A., and Tsandilas, T. 2013. The drawing assistant: Automated drawing guidance and feedback from photographs. In UIST '13, 183–192.

Igarashi, T., Matsuoka, S., Kawachiya, S., and Tanaka, H. 1997. Interactive beautification: A technique for rapid geometric design. In UIST '97, 105–114.

Kang, H. W., He, W., Chui, C. K., and Chakraborty, U. K. 2005. Interactive sketch generation. The Visual Computer 21, 8-10, 821–830.

Kang, H., Lee, S., and Chui, C. K. 2007. Coherent line drawing. In NPAR 2007, 43–50.

Kass, M., Witkin, A., and Terzopoulos, D. 1988. Snakes: Active contour models. International Journal of Computer Vision 1, 4, 321–331.

Laviole, J., and Hachet, M. 2012. PapARt: Interactive 3D graphics and multi-touch augmented paper for artistic creation. In 3DUI.

Lee, Y. J., Zitnick, C. L., and Cohen, M. F. 2011. ShadowDraw: Real-time user guidance for freehand drawing. ACM Trans. Graph. 30, 27:1–27:10.

Levi, Z., and Gotsman, C. 2013. D-Snake: Image registration by as-similar-as-possible template deformation. IEEE Transactions on Visualization and Computer Graphics 19, 2, 331–343.

Limpaecher, A., Feltman, N., Treuille, A., and Cohen, M. 2013. Real-time drawing assistance through crowdsourcing. ACM Trans. Graph. 32, 4, 54:1–54:8.

Mortensen, E. N., and Barrett, W. A. 1995. Intelligent scissors for image composition. In SIGGRAPH '95, 191–198.

Noris, G., Hornung, A., Sumner, R. W., Simmons, M., and Gross, M. 2013. Topology-driven vectorization of clean line drawings. ACM Trans. Graph. 32, 1, Article 4.

Orbay, G., and Kara, L. B. 2011. Beautification of design sketches using trainable stroke clustering and curve fitting. IEEE Transactions on Visualization and Computer Graphics 17, 5, 694–708.

Rivers, A., Adams, A., and Durand, F. 2012. Sculpting by numbers. ACM Trans. Graph. 31, 6, 157:1–157:7.

Sýkora, D., Dingliana, J., and Collins, S. 2009. LazyBrush: Flexible painting tool for hand-drawn cartoons. Computer Graphics Forum 28, 2, 599–608.

Xu, K., Chen, K., Fu, H., Sun, W.-L., and Hu, S.-M. 2013. Sketch2Scene: Sketch-based co-retrieval and co-placement of 3D models. ACM Trans. Graph. 32, 4, Article 123.

Yang, S. L., Wang, J., and Shapiro, L. 2013. Supervised semantic gradient extraction using linear optimization. In CVPR 2013.

Zhou, K., Huang, J., Snyder, J., Liu, X., Bao, H., Guo, B., and Shum, H.-Y. 2005. Large mesh deformation using the volumetric graph Laplacian. ACM Trans. Graph. 24, 3, 496–503.

Zitnick, C. L. 2013. Handwriting beautification using token means. ACM Trans. Graph. 32, 4, 53:1–53:8.