This article has been accepted for publication in a future issue of IEEE Access, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2018.2835659, IEEE Access
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier xx.xxxx/ACCESS.2018.DOI
ABSTRACT Conventional stitching techniques for images and videos are based on smooth warping
models, and therefore, they often fail to work on multi-view images and videos with large parallax captured
by cameras with wide baselines. In this paper, we propose a novel video stitching algorithm for such
challenging multi-view videos. We estimate the parameters of the ground plane homography, the fundamental
matrix, and the vertical vanishing points reliably, using both appearance-based and activity-based feature
matches validated by geometric constraints. We alleviate the parallax artifacts in stitching by adaptively
warping the off-plane pixels into geometrically accurate matching positions through their ground plane
pixels based on the epipolar geometry. We also exploit the inter-view and inter-frame correspondence
matching information together to estimate the ground plane pixels reliably, which are then refined by energy
minimization. Experimental results show that the proposed algorithm provides geometrically accurate
stitching results of multi-view videos with large parallax and outperforms the state-of-the-art stitching
methods qualitatively and quantitatively.
INDEX TERMS Multi-view videos, video stitching, image stitching, large parallax, adaptive pixel warping,
epipolar geometry.
2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
K.-Y. Lee and J.-Y. Sim: Stitching for Multi-View Videos With Large Parallax Based on Adaptive Pixel Warping
FIGURE 1: Stitching images with large parallax. (a) A target image and (b) a reference image. The resulting stitched images obtained by (c) a
homography-based warping scheme, (d) APAP [13], and (e) the proposed parallax-adaptive stitching, respectively.
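The baseline warp in (c) maps every pixel by a single 3×3 homography, x' ~ Hx in homogeneous coordinates. A minimal NumPy sketch of this planar warp follows; the matrix values are illustrative (a pure translation), not estimated from the test sequences:

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography to an N x 2 array of pixel coordinates.

    Points are lifted to homogeneous coordinates, multiplied by H,
    and de-homogenized by dividing by the third coordinate.
    """
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])  # N x 3
    mapped = homog @ H.T                                   # rows are H @ p
    return mapped[:, :2] / mapped[:, 2:3]                  # back to N x 2

# Illustrative homography: pure translation by (10, 5) pixels.
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 5.0],
              [0.0, 0.0, 1.0]])
print(warp_points(H, [[0.0, 0.0], [100.0, 50.0]]))
```

Since a single H is exact only for a planar scene, every off-plane point warped this way is displaced by its parallax, which is the failure mode visible in (c).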
the relative orders of control points are changed across multiple images due to large parallax [33].

The stitched images usually exhibit perspective distortions in non-overlapping regions among multiple images where no valid feature matches are obtained. To alleviate the perspective distortions, shape-preserving warps were proposed which extrapolate the warping models to non-overlapping regions using similarity transformation and/or homography linearization [15]–[18]. Chang et al. applied a homography to the overlapping region of images and similarity transformations to the non-overlapping regions, respectively [15]. Lin et al. proposed a homography linearization method to combine homography and similarity transformations smoothly [16]. Chen et al. improved the shape-preserving warp by accurately estimating the scale and rotation of the similarity transformation [17]. Li et al. proposed quasi-homography warps which linearly extrapolate the horizontal component of homography [18]. The shape-preserving warps provide visually plausible stitching results, but do not always produce geometrically correct results.

Attempts have also been made to align only a certain region of input images and hide the artifacts of mismatched regions by applying seam-based composition methods. Gao et al. obtained multiple homographies by taking the groups of inlier feature matches in order, and selected the best homography that yields a minimum seam cost [19]. Zhang et al. clustered closely located feature points together and found an optimal local homography associated with a minimum seam cutting error to align a local image region [20]. They also applied content-preserving warping (CPW) [34] to further refine the local alignment. Lin et al. generated multiple local homographies using a superpixel-based grouping scheme, and further refined each homography to select the best one by using energy minimization [21]. They also designed an energy function to encourage the warp to undergo a similarity transformation and to preserve structures like curves and lines after warping. Note that these techniques register only one local region and thus inevitably cause geometrically inaccurate stitching results.

On the other hand, previous video stitching algorithms simply apply the existing image stitching techniques to the video frames at each time instance, respectively [28]. Also, they extend the image stitching techniques straightforwardly to video stitching for the purposes of improving the computation speed or reducing the flickering artifacts. El-Saban et al. computed SIFT descriptors for selected frames only and tracked the feature points to reduce the computational complexity of video stitching [28]. Jiang et al. extended CPW of local alignment and image composition to video stitching by applying the seam cutting scheme to the spatiotemporal domain [29].

B. STATIC MULTI-CAMERA BASED TRACKING
Multi-camera based people tracking techniques detect walking pedestrians on a ground plane from multiple videos, which are captured by different static cameras set toward a common ground plane and positioned with relatively wide baselines. Specifically, moving foreground objects are first detected by background subtraction methods, and then the elongated shapes of the detected people are represented by principal axes [24], which are used for people tracking in addition to the ground plane homography. To localize each person for robust tracking, Khan et al. computed multiple homographies associated with planes parallel to the ground plane using vanishing points [25]. In addition to the homography and vanishing points, the fundamental matrix was also used to reliably find correspondence matches for the top points of people [26].

III. PARALLAX-ADAPTIVE PIXEL WARPING MODEL
In many practical applications of multi-view videos such as surveillance and sports, static multiple cameras are located with wide baselines toward a target real-world scene, which yields severely different camera parameters, e.g., rotation, translation, and zoom factor. Also, in a typical video sequence, the background is composed of a ground plane and optionally a far distant region orthogonal to the ground plane, and moreover, people moving on the ground plane at different distances from the cameras are captured as multiple foreground objects. Figs. 1(a) and (b) show two frames of the “Soccer” sequence captured by two cameras with severely different positions and viewing directions from each other, where large parallax is observed especially in the vicinity of the foreground objects. For example, the players denoted by red boxes in Fig. 1(a) appear in a different order in Fig. 1(b). In addition, the players denoted by yellow boxes appear only in the view of Fig. 1(a) and not in Fig. 1(b).

Such large parallax makes multi-view video stitching quite a challenging problem, and the conventional stitching techniques often fail to provide faithful results. Fig. 1(c) shows the stitched image obtained by warping a target frame in Fig. 1(a) to a reference frame in Fig. 1(b) according to the homography. Since the homography-based warping assumes a planar scene structure, only the ground plane is accurately
F̃_spatial^(k). For B, we test only the first constraint and apply multi-structure guided sampling (MULTI-GS) [40] to obtain a refined set B̃. Fig. 6 shows that the proposed matching refinement for MVLP successfully removes most of the spurious matches on both the foreground objects and the background. Finally, we estimate the fundamental matrix F by applying RANSAC to the appearance-based feature matches of the F̃_spatial^(k)’s and B̃, as well as the activity-based matches of B_ground. Note that, to limit the computational complexity, we empirically collect 1000 feature matches from the F̃_spatial^(k)’s associated with randomly selected frames.
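As context for this estimation step, RANSAC [36] over the normalized 8-point algorithm can be sketched as follows. This is a generic illustration on synthetic matches, not the authors' implementation (which additionally mixes MULTI-GS-refined and activity-based matches):

```python
import numpy as np

def _normalize(pts):
    # Hartley normalization: centroid to origin, mean distance sqrt(2).
    pts = np.asarray(pts, dtype=float)
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.sqrt(((pts - c) ** 2).sum(axis=1)).mean()
    T = np.array([[s, 0.0, -s * c[0]],
                  [0.0, s, -s * c[1]],
                  [0.0, 0.0, 1.0]])
    return np.hstack([pts, np.ones((len(pts), 1))]) @ T.T, T

def eight_point(p1, p2):
    # Normalized 8-point algorithm: each match gives one row of A f = 0.
    x1, T1 = _normalize(p1)
    x2, T2 = _normalize(p2)
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1))])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)        # enforce rank 2
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    return T2.T @ F @ T1               # undo the normalization

def sampson_error(F, p1, p2):
    # First-order geometric error of the epipolar constraint x2' F x1 = 0.
    x1 = np.hstack([np.asarray(p1, float), np.ones((len(p1), 1))])
    x2 = np.hstack([np.asarray(p2, float), np.ones((len(p2), 1))])
    Fx1, Ftx2 = x1 @ F.T, x2 @ F
    num = (x2 * Fx1).sum(axis=1) ** 2
    den = (Fx1[:, :2] ** 2).sum(axis=1) + (Ftx2[:, :2] ** 2).sum(axis=1)
    return num / den

def ransac_fundamental(p1, p2, iters=500, thresh=1.0, seed=0):
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(p1), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(p1), 8, replace=False)
        inl = sampson_error(eight_point(p1[idx], p2[idx]), p1, p2) < thresh
        if inl.sum() > best.sum():
            best = inl
    return eight_point(p1[best], p2[best]), best  # refit on all inliers

# Synthetic demo: purely horizontal disparities, so epipolar lines are rows.
rng = np.random.default_rng(1)
pts1 = rng.uniform(0, 100, size=(50, 2))
pts2 = pts1 + np.column_stack([rng.uniform(5, 20, 50), np.zeros(50)])
F, inliers = ransac_fundamental(pts1, pts2)
print(inliers.sum(), "of", len(pts1), "matches consistent with F")
```

The function names and the synthetic correspondences are ours; in practice a minimum inlier count and an adaptive iteration budget would also be enforced.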
FIGURE 9: Stitching results of multiple foreground objects using the proposed ground plane pixel estimation methods. (a) Target frames and
(b) reference frames. The stitched images by using (c) RE, (d) SME+RE, (e) SME+TME+RE without optimization, and (f) SME+TME+RE with
optimization, respectively. From top to bottom, “Lawn,” “Street,” and “Garden” sequences.
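The pixel warping rule underlying these results can be sketched in homogeneous coordinates, where the line through two points and the intersection of two lines are both cross products. The snippet below is our simplified reading of the model: an off-plane pixel is mapped to the intersection of its epipolar line with the vertical line through its homography-warped ground plane pixel (GPP). The toy H, F, and vanishing point are illustrative, not values from the paper:

```python
import numpy as np

def adaptive_warp(x, g, H, F, v_ref):
    """Warp an off-plane target pixel x through its ground plane pixel g.

    H: ground plane homography (target -> reference),
    F: fundamental matrix giving the epipolar line F x in the reference view,
    v_ref: vertical vanishing point of the reference view (homogeneous).
    """
    xh = np.array([x[0], x[1], 1.0])
    gh = np.array([g[0], g[1], 1.0])
    g_ref = H @ gh                   # GPP warped into the reference view
    vline = np.cross(g_ref, v_ref)   # vertical line through the warped GPP
    eline = F @ xh                   # epipolar line of x in the reference view
    p = np.cross(vline, eline)       # intersection = warped pixel position
    return p[:2] / p[2]

# Toy configuration: identity ground homography, a horizontal baseline
# (epipolar lines are image rows), and a vertical vanishing point at infinity.
H = np.eye(3)
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
v_ref = np.array([0.0, 1.0, 0.0])
print(adaptive_warp((30.0, 10.0), (30.0, 50.0), H, F, v_ref))
```

In this toy setup the pixel at (30, 10) standing above its GPP at (30, 50) stays on its own image column and on its epipolar row after warping, which is the geometric consistency the full method enforces.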
FIGURE 11: Video stitching results of the proposed algorithm. For each sequence, pairs of target and reference frames (left) and the stitched
images (right) are shown.
29857th frames. The proposed algorithm warps this object naturally on the non-overlapped area in the stitched images. In the “Trail” sequence, the foreground object approaches the camera, yielding severely changing scene depths, but the proposed algorithm aligns this object correctly at various scales. On the other hand, the proposed algorithm yields artifacts in some exceptional situations. In the “Badminton” sequence, the person marked with a red circle is jumping and never touches the ground plane at the 27767th frame. In such a case, no valid inter-view feature matches are obtained on this region due to the geometric constraint in Section IV-B, and thus RE yields the misalignment artifact. In the “Office” sequence, we see some artifacts near the right person, since a moving car behind the cameras is reflected on the background windows. “Soccer” is quite a challenging sequence which includes various fast-moving players, where multiple people occlude one another at the 3800th and 5038th frames. In such cases, SIFT provides insufficient correct inter-view matches, or even no correct match at all, resulting in the stitching artifacts indicated by red circles.

C. COMPARISON WITH CONVENTIONAL METHODS
We compare the performance of the proposed algorithm with that of four conventional methods including the state-of-the-art image stitching techniques: Homography, CPW [34], SPHP [15], and APAP [13]. Note that CPW is used as an alignment model for the stitching methods in [20], [29]. SPHP is a shape-preserving warping method which can be compared to evaluate the naturalness of warping on non-overlapping regions. APAP is one of the most flexible warping methods, which directly estimates multiple homographies for local image regions. However, we do not compare the seam-based techniques [19]–[21], since they just hide the misalignment artifacts using seam-cutting based composition. We apply the compared image stitching techniques to the frames at each time instance, respectively. We implement Homography and CPW, and set the warp parameters of CPW as in [29]. We obtain the stitching results of SPHP and APAP using the source codes provided on the authors’ webpages [43], [44]. In our experiment, MULTI-GS [40], used in [13], yields a better outlier removal performance than RANSAC, and thus we also apply MULTI-GS to remove outlier matches of SIFT in Homography, CPW, and SPHP as well.

Fig. 12 compares the stitching results on selected frames of the 12 test video sequences. All the conventional methods as well as the proposed algorithm achieve good stitching results on the “Fountain” sequence, which yields the smallest parallax angle of 1.9°. However, for the other sequences of MVLP, the conventional methods fail to align multiple foreground objects and the background simultaneously. For example, in the “Square” and “Office” sequences, the feet of multiple people are well aligned on the ground planes, but the mismatch artifact gets worse toward the heads, since the ground plane warping is dominant in the conventional methods. On the other hand, in the “Stadium,” “Soccer,” and “Garden” sequences, the same person appears twice at different locations without any overlap in the stitched domain, since the conventional methods extract dominant features from the distant background regions, causing misalignment artifacts on the ground planes and the foreground objects. Specifically, Homography warps all the pixels in a target frame by a global transformation derived from a dominant planar scene structure, and thus it mismatches either the ground plane or a distant background region. CPW adaptively refines the initial homography according to feature matches, and reduces the parallax artifacts on the foreground objects compared with Homography, as shown in the “Tennis,” “Office,” and “Street” sequences. SPHP adopts a similarity transformation to reduce the perspective distortion of the non-overlapping area, and thus it aligns the foreground objects on the non-overlapping areas well in the “Square” sequence, as marked with a red circle. However, at the same time, SPHP distorts the line structures on the ground plane into curves, as marked with green ellipses in the “Lawn” and “Square” sequences. APAP estimates locally adaptive warps and reduces the spatial deviation of the same foreground object in the stitched domain compared with CPW, as shown in the “School” sequence; however, APAP results in unnatural distortions in the “Badminton,” “Trail,” and “School” sequences, as marked with green ellipses.

On the contrary, in all the frames, the proposed algorithm successfully alleviates the parallax artifacts of video stitching by adaptively aligning the multiple foreground objects and the background simultaneously. It also performs geometrically accurate warping on the non-overlapping areas, as shown in the “Badminton,” “Square,” and “Soccer” sequences. Moreover, the proposed algorithm correctly determines the existence of distant background regions in all 12 test sequences. Thus both the ground plane and the distant background region are correctly aligned, as shown in the “Badminton,” “Office,” and “School” sequences. In the “Soccer” sequence, even though some ghost artifacts are observed due to a significant amount of occlusion, as marked by a red circle, the proposed algorithm aligns most people accurately, while the compared methods fail to work on this challenging case. Also, the umpire chair and the net in the “Tennis” sequence, and the net and the light lamp in the “Badminton” sequence, are static objects over the whole video sequence which are not detected as moving foreground objects, and therefore the proposed algorithm cannot align them correctly. However, all the compared methods also fail to align these objects, as marked with yellow ellipses. More comparative results of video stitching are provided in the supplementary video.

We also quantitatively compare the performance of the proposed algorithm with that of the conventional methods using manually obtained ground truth correspondence matches on the foreground objects and the background together. We use the same ground truth matches on the foreground objects as explained in Sec. VI-A. We generate ground truth matches on the background only once for each sequence using the background image. We first consider multiple large planar
FIGURE 12: Comparison of video stitching results of the proposed algorithm and the four existing methods: Homography, CPW [34], SPHP [15],
and APAP [13]. From top to bottom, “Fountain,” “Tennis,” “Lawn,” “Badminton,” “Square,” “Office,” “Trail,” “Stadium,” “Soccer,” “Street,” “School,”
and “Garden” sequences.
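The quantitative evaluation in this section reports the RMSE between manually obtained ground truth correspondences and the warped pixels. A minimal sketch of that error measure (our helper function, not the authors' code):

```python
import numpy as np

def rmse(warped, ground_truth):
    """Root-mean-square Euclidean distance between warped pixels and
    their ground truth correspondences (both given as N x 2 arrays)."""
    diff = np.asarray(warped, dtype=float) - np.asarray(ground_truth, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Toy example: one exact match and one match off by a 3-4-5 triangle.
print(rmse([[0, 0], [3, 4]], [[0, 0], [0, 0]]))
```

In the experiments this error is accumulated per frame over the ground truth matches on both the foreground objects and the background.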
areas in the background, and compute an optimal homography for each planar area by using manually obtained feature matches. Then we select regularly distributed query pixels on the background image of a target view, and find the ground truth matching pixels by warping the query pixels, selectively employing the multiple homographies. For the query pixels on small and/or non-planar areas, we manually obtain the ground truth matching pixels. The resulting ground truth matches on the background image are added to each of the 100 frames which are selected for finding ground truth matches on the foreground objects, where we exclude the background query pixels occluded by the foreground objects. Consequently, on average, we have 724 ground truth matches on the background for each of the 100 selected frames over the 12 test sequences.

Fig. 13 presents the RMSE between the ground truth
corresponding pixels and the warped pixels on the overlapped regions of the target and reference frames. We see that the conventional methods tend to yield large RMSEs on the test sequences with large parallax angles. For example, the RMSEs of all the stitching methods are below 2 pixels on the “Fountain” sequence, which exhibits the smallest parallax angle of 1.9°. However, on the challenging sequences of MVLP such as “Soccer” and “School,” the conventional methods yield significantly larger RMSEs compared with those on the other sequences. On the other hand, the proposed algorithm always achieves smaller RMSEs than the conventional methods on all the test sequences, and yields a much smaller average error of 5.64 pixels, while Homography, CPW, SPHP, and APAP result in average errors of 35.37, 34.91, 32.05, and 34.86 pixels, respectively.

D. EXECUTION TIME COMPARISON
Table 2 compares the execution times of the conventional methods and the proposed algorithm measured on a PC with a 3.4 GHz AMD Ryzen 7 1700X CPU and 32 GB RAM. Note that this may not be a fair comparison, since the optimization level of implementation differs across the compared methods. The execution times of the conventional methods and the stitching (ST) step in the proposed algorithm are averaged over 100 frames for each sequence, and those of the preprocessing (PP) and the parameter estimation (PE) steps in the proposed algorithm are averaged over the entire frames for each sequence. Homography is the fastest method, taking 0.57 seconds per frame on average. CPW, SPHP, and APAP require relatively longer execution times, since these methods use different warping models for each cell or mesh grid in an image. Note that CPW is a non-parametric warping scheme and takes the longest execution time of 15.4 seconds per frame among the four conventional methods. The proposed algorithm is divided into three steps to evaluate the execution times. PP includes the background subtraction and the activity extraction for activity-based correspondence matching [38]. PE includes the homography estimation with the activity-based correspondence matching computation, the fundamental matrix estimation, and the estimation of vertical vanishing points. ST includes the SIFT matching computation, ground pixel estimation, warping, and blending. Note that PP and PE are performed once over the entire frames for each video sequence, and thus yield relatively short execution times per frame. However, ST in the proposed algorithm consumes a major portion of the execution time to compute hole pixels in the warped target frame using the valid warped pixels, which takes 33.8 seconds per frame on average. Note that the “Fountain,” “Lawn,” and “Square” sequences exhibit relatively short execution times of ST, since they do not have distant background regions.

VII. CONCLUSIONS
We proposed a novel video stitching algorithm to achieve geometrically accurate alignment of MVLP. We warped the multiple foreground objects, the distant background, and the ground plane adaptively based on the epipolar geometry, where an off-plane pixel in a target view is warped to a reference view through its GPP. We also estimated optimal GPPs for the foreground objects by using the spatiotemporal feature matches, and for the background by using the spatial feature matches, respectively. The initially obtained GPPs are refined by energy minimization. Experimental results demonstrated that the proposed algorithm aligns various MVLP accurately, and yields significantly better parallax artifact reduction, both qualitatively and quantitatively, compared with the state-of-the-art image stitching techniques. Our future research topics include the warping of static objects with large parallax and parallax-free stitching for MVLP captured by moving cameras.

REFERENCES
[1] W. Liu, M. Zhang, Z. Luo, and Y. Cai, “An ensemble deep learning method for vehicle type classification on visual traffic surveillance sensors,” IEEE Access, vol. 5, pp. 24417–24425, 2017.
[2] R. Panda and A. K. Roy-Chowdhury, “Multi-view surveillance video summarization via joint embedding and sparse optimization,” IEEE Trans. Multimedia, vol. 19, no. 9, pp. 2010–2021, May 2017.
[3] M. Wang, B. Cheng, and C. Yuen, “Joint coding-transmission optimization for a video surveillance system with multiple cameras,” IEEE Trans. Multimedia, Sep. 2017.
[4] K. Bilal, A. Erbad, and M. Hefeeda, “Crowdsourced multi-view live video streaming using cloud computing,” IEEE Access, vol. 5, pp. 12635–12647, 2017.
[5] S. A. Pettersen, D. Johansen, H. Johansen, V. Berg-Johansen, V. R. Gaddam, A. Mortensen, R. Langseth, C. Griwodz, H. K. Stensland, and P. Halvorsen, “Soccer video and player position dataset,” in Proc. ACM Multimedia Syst., 2014, pp. 18–23.
[6] Q. Yao, H. Sankoh, K. Nonaka, and S. Naito, “Automatic camera self-calibration for immersive navigation of free viewpoint sports video,” in Proc. Int’l Conf. Multimedia Signal Process., Sep. 2016, pp. 1–6.
[7] B. Kwon, J. Kim, K. Lee, Y. K. Lee, S. Park, and S. Lee, “Implementation of a virtual training simulator based on 360° multi-view human action recognition,” IEEE Access, vol. 5, pp. 12496–12511, 2017.
[8] B. Macchiavello, C. Dorea, E. M. Hung, G. Cheung, and W. T. Tan, “Loss-resilient coding of texture and depth for free-viewpoint video conferencing,” IEEE Trans. Multimedia, vol. 16, no. 3, pp. 711–725, Apr. 2014.
[9] L. Toni, G. Cheung, and P. Frossard, “In-network view synthesis for interactive multiview video systems,” IEEE Trans. Multimedia, vol. 18, no. 5, pp. 852–864, May 2016.
[10] R. Szeliski, “Image alignment and stitching: A tutorial,” Foundations and Trends in Computer Graphics and Vision, vol. 2, no. 1, pp. 1–104, 2006.
[11] J. Gao, S. J. Kim, and M. S. Brown, “Constructing image panoramas using dual-homography warping,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2011.
[12] W.-Y. Lin, S. Liu, Y. Matsushita, T.-T. Ng, and L.-F. Cheong, “Smoothly varying affine stitching,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2011.
[13] J. Zaragoza, T.-J. Chin, Q.-H. Tran, M. S. Brown, and D. Suter, “As-projective-as-possible image stitching with moving DLT,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 7, pp. 1285–1298, Jul. 2014.
[14] G. Zhang, Y. He, W. Chen, J. Jia, and H. Bao, “Multi-viewpoint panorama construction with wide-baseline images,” IEEE Trans. Image Process., vol. 25, no. 7, pp. 3099–3111, Jul. 2016.
[15] C.-H. Chang, Y. Sato, and Y.-Y. Chuang, “Shape-preserving half-projective warps for image stitching,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014.
[16] C.-C. Lin, S. U. Pankanti, K. N. Ramamurthy, and A. Y. Aravkin, “Adaptive as-natural-as-possible image stitching,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015.
[17] Y.-S. Chen and Y.-Y. Chuang, “Natural image stitching with the global similarity prior,” in Proc. Eur. Conf. Comput. Vis., 2016.
[18] N. Li, Y. Xu, and C. Wang, “Quasi-homography warps in image stitching,” IEEE Trans. Multimedia, vol. PP, no. 99, pp. 1–1, 2017.
[19] J. Gao, Y. Li, T.-J. Chin, and M. S. Brown, “Seam-driven image stitching,” in Proc. Eurographics, 2013.
[20] F. Zhang and F. Liu, “Parallax-tolerant image stitching,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014.
[28] M. El-Saban, M. Izz, and A. Kaheel, “Fast stitching of videos captured from freely moving devices by exploiting temporal redundancy,” in Proc. IEEE Int’l Conf. Image Process., 2010.
[29] W. Jiang and J. Gu, “Video stitching with spatial-temporal content-preserving warping,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2015.
[30] K.-Y. Lee and J.-Y. Sim, “Robust video stitching using adaptive pixel transfer,” in Proc. IEEE Int’l Conf. Image Process., 2015.
[31] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[32] T. Igarashi, T. Moscovich, and J. F. Hughes, “As-rigid-as-possible shape manipulation,” ACM Trans. Graphics, vol. 24, no. 3, pp. 1134–1141, 2005.
[33] S. Schaefer, T. McPhail, and J. Warren, “Image deformation using moving least squares,” ACM Trans. Graphics, vol. 25, no. 3, pp. 533–540, 2006.
[34] F. Liu, M. Gleicher, H. Jin, and A. Agarwala, “Content-preserving warps for 3D video stabilization,” ACM Trans. Graphics, vol. 28, no. 3, p. 44, 2009.
[35] J.-M. Morel and G. Yu, “ASIFT: A new framework for fully affine invariant image comparison,” SIAM J. Imaging Sciences, vol. 2, no. 2, pp. 438–469, 2009.
[36] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Comm. ACM, vol. 24, no. 6, pp. 381–395, 1981.
[37] E. Ermis, P. Clarot, P. Jodoin, and V. Saligrama, “Activity based matching in distributed camera networks,” IEEE Trans. Image Process., vol. 19, no. 10, pp. 2595–2613, Oct. 2010.
[38] S.-Y. Lee, J.-Y. Sim, C.-S. Kim, and S.-U. Lee, “Correspondence matching of multi-view video sequences using mutual information based similarity measure,” IEEE Trans. Multimedia, vol. 15, no. 8, pp. 1719–1731, Dec. 2013.
[39] J. M. McHugh, J. Konrad, V. Saligrama, and P.-M. Jodoin, “Foreground-adaptive background subtraction,” IEEE Signal Process. Lett., vol. 16, no. 5, pp. 390–393, May 2009.
[40] T.-J. Chin, J. Yu, and D. Suter, “Accelerated hypothesis generation for multistructure data via preference analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 4, pp. 625–638, Apr. 2012.
[41] F. Lv, T. Zhao, and R. Nevatia, “Camera calibration from video of a walking human,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 9, pp. 1513–1518, Sep. 2006.
[42] E. Tola, V. Lepetit, and P. Fua, “DAISY: An efficient dense descriptor applied to wide-baseline stereo,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 5, pp. 815–830, May 2010.
[43] [Online]. Available: https://www.cmlab.csie.ntu.edu.tw/~frank/
[44] [Online]. Available: http://cs.adelaide.edu.au/~tjchin/apap/
Conf. Comput. Vis. Pattern Recognit., 2014.
[21] K. Lin, N. Jiang, L.-F. Cheong, M. Do, and J. Lu, “Seagull: Seam-guided
local alignment for parallax-tolerant image stitching,” in Proc. Eur. Conf.
Comput. Vis., 2016.
[22] M. Yu and G. Ma, “360 surround view system with parking guidance,”
SAE Int’l J. Commercial Vehicles, vol. 7, no. 2014-01-0157, pp. 19–24,
2014.
[23] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,”
Int’l J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
[24] W. Hu, M. Hu, X. Zhou, T. Tan, J. Lou, and S. Maybank, “Principal
axis-based correspondence between multiple cameras for people tracking,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 663–671, Apr.
KYU-YUL LEE received the B.S. degree in elec-
2006.
trical and computer engineering from Ulsan Na-
[25] S. M. Khan and M. Shah, “Tracking multiple occluding people by local-
tional Institute of Science and Technology, Ulsan,
izing on multiple scene planes,” IEEE Trans. Pattern Anal. Mach. Intell.,
vol. 31, no. 3, pp. 505–519, Mar. 2009. Korea, in 2013, where he is currently pursuing the
[26] A. Yildiz and Y. S. Akgul, “A fast method for tracking people with multiple Ph.D. degree in electrical and computer engineer-
cameras,” in Proc. Eur. Conf. Comput. Vis. Workshops, 2010. ing. His research interests include correspondence
[27] M. Takahashi, K. Ikeya, M. Kano, H. Ookubo, and T. Mishina, “Robust matching, video stitching and deep learning.
volleyball tracking system using multi-view cameras,” in Proc. Int’l Conf.
Pattern Recognit., Dec. 2016, pp. 2740–2745.
K.-Y. Lee and J.-Y. Sim: Stitching for Multi-View Videos With Large Parallax Based on Adaptive Pixel Warping