[Figure 1: overview diagram of the approach. The reference image $I$ and the support image $J$ undergo camera calibration and image segmentation, yielding the ground plane equation and the ground & foreground objects; ground, background, and foreground images are then combined into the stereoscopic pair with baseline $\delta$.]
where $^i\mathbf{l}_\pi$ is the vanishing line of $\pi$ in image $I$. The signed distance $d_\pi$ can be obtained by triangulating any two corresponding points under the homography $\mathrm{H}_\pi$ (induced by $\pi$ between $I$ and $J$, and estimated as detailed in subsect. 2.2.1) and imposing the passage of $\pi$ through the triangulated 3D point. The vanishing line $^i\mathbf{l}_\pi$ of the planar region $\pi$ is composed of points that are mapped from $I$ to $J$ both by $\mathrm{H}_\pi$ and by the infinite homography $\mathrm{H}_\infty = \mathrm{K}_j \mathrm{R} \mathrm{K}_i^{-1}$. The homography

$$\mathrm{H}_p = \mathrm{H}_\pi^{-1}\,\mathrm{H}_\infty \quad (3)$$

mapping $I$ onto itself is actually a planar homology, i.e., a special planar transformation having a line of fixed points (the axis) and a distinct fixed point (the vertex), not on the line. In the case of $\mathrm{H}_p$, the vertex is the epipole $^i\mathbf{e}_j \in I$ of view $J$, and the axis is the vanishing line $^i\mathbf{l}_\pi$, since it is the intersection of $\pi$ with the plane at infinity $\pi_\infty$ [5]. Thus, thanks to the properties of homologies, $^i\mathbf{l}_\pi$ is obtained as $^i\mathbf{l}_\pi = \mathbf{w}_1 \times \mathbf{w}_2$, where $\mathbf{w}_1, \mathbf{w}_2$ are the two eigenvectors of $\mathrm{H}_p$ corresponding to the two equal eigenvalues.
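As a concrete illustration, the following is a minimal numerical sketch of this recovery, assuming NumPy and that $\mathrm{H}_\pi$ and $\mathrm{H}_\infty$ have already been estimated; the function name and the handling of noisy eigenvalues are ours, not from the paper:

```python
import numpy as np

def vanishing_line_from_homology(H_pi, H_inf):
    """Recover the vanishing line i_l_pi of plane pi in image I as the
    axis of the planar homology H_p = H_pi^{-1} H_inf (Eq. 3)."""
    H_p = np.linalg.inv(H_pi) @ H_inf
    H_p /= np.cbrt(np.linalg.det(H_p))   # unit determinant, so eigenvalues are comparable
    vals, vecs = np.linalg.eig(H_p)
    # Find the pair of (nearly) equal eigenvalues; their eigenvectors span the axis.
    pairs = [(0, 1), (0, 2), (1, 2)]
    a, b = min(pairs, key=lambda p: abs(vals[p[0]] - vals[p[1]]))
    if np.iscomplex(vals[a]):
        # With noisy estimates the repeated eigenvalue may split into a conjugate
        # pair; its invariant plane is spanned by Re/Im of one eigenvector.
        w1, w2 = vecs[:, a].real, vecs[:, a].imag
    else:
        w1, w2 = vecs[:, a].real, vecs[:, b].real
    l_pi = np.cross(w1, w2)              # i_l_pi = w1 x w2
    return l_pi / np.linalg.norm(l_pi)
```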
In order to obtain robust warping results, it is required that the homography $\mathrm{H}_\pi$ be compatible with the fundamental matrix $\mathrm{F}$, i.e., $\mathrm{H}_\pi^\top \mathrm{F} + \mathrm{F}^\top \mathrm{H}_\pi = 0$. This is achieved by using a proper parametrization for $\mathrm{H}_\pi$ [5]. Given the fundamental matrix $\mathrm{F}$ between two views, the three-parameter family of homographies induced by a world plane $\pi$ is

$$\mathrm{H}_\pi = \mathrm{A} - {}^j\mathbf{e}_i\,\mathbf{v}^\top, \quad (4)$$

where $[{}^j\mathbf{e}_i]_\times \mathrm{A} = \mathrm{F}$ is any decomposition (up to scale) of the fundamental matrix, and $^j\mathbf{e}_i$ is the epipole of view $I$ in image $J$ (in other words, $^j\mathbf{e}_i^\top \mathrm{F} = \mathbf{0}^\top$). Since $[{}^j\mathbf{e}_i]_\times [{}^j\mathbf{e}_i]_\times \mathrm{F} = -\|{}^j\mathbf{e}_i\|^2\, \mathrm{F}$, the matrix $\mathrm{A}$ can be chosen as

$$\mathrm{A} = [{}^j\mathbf{e}_i]_\times \mathrm{F}. \quad (5)$$

Both the fundamental matrix $\mathrm{F}$ and the ground plane homography $\mathrm{H}_\pi$ are robustly computed by running the RANSAC algorithm [3] on SIFT correspondences [10]. In particular, for the ground plane homography the parametrization of Eq. 4 is used, thus requiring only three point correspondences for its estimation.
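A sketch of this estimation step is given below, under stated assumptions: OpenCV's SIFT and RANSAC routines stand in for [10] and [3], the unknown vector $\mathbf{v}$ of Eq. 4 is solved in least-squares form from a given set of ground correspondences rather than inside the RANSAC loop, and all function names are ours:

```python
import cv2
import numpy as np

def estimate_F_and_ground_homography(img_i, img_j, gnd_i, gnd_j):
    """F from SIFT matches + RANSAC; H_pi in the three-parameter form of
    Eqs. 4-5, i.e. H_pi = A - e v^T with A = [e]_x F and e = j_e_i."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img_i, None)
    kp2, des2 = sift.detectAndCompute(img_j, None)
    knn = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    good = [m for m, n in knn if m.distance < 0.75 * n.distance]  # Lowe's ratio test
    pts_i = np.float32([kp1[m.queryIdx].pt for m in good])
    pts_j = np.float32([kp2[m.trainIdx].pt for m in good])
    F, _ = cv2.findFundamentalMat(pts_i, pts_j, cv2.FM_RANSAC)

    # Epipole j_e_i satisfies e^T F = 0^T, i.e. it is the left null vector of F.
    _, _, Vt = np.linalg.svd(F.T)
    e = Vt[-1]
    e_x = np.array([[0, -e[2], e[1]],
                    [e[2], 0, -e[0]],
                    [-e[1], e[0], 0]])
    A = e_x @ F                                   # Eq. 5

    # Each ground correspondence x <-> x' gives x' x (A x) = (x' x e)(v^T x):
    # three linear equations in the three unknowns of v (two independent).
    rows, rhs = [], []
    for x, xp in zip(gnd_i, gnd_j):
        x = np.append(np.asarray(x, float), 1.0)
        xp = np.append(np.asarray(xp, float), 1.0)
        ce, ca = np.cross(xp, e), np.cross(xp, A @ x)
        for k in range(3):
            rows.append(ce[k] * x)
            rhs.append(ca[k])
    v, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    H_pi = A - np.outer(e, v)                     # Eq. 4
    return F, H_pi
```

In the paper's actual pipeline, Eq. 4 reduces the RANSAC sample size for $\mathrm{H}_\pi$ to three correspondences; the least-squares solve above would be the model-fitting step inside each RANSAC iteration.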
2.1.1 Camera self-calibration

Camera self-calibration follows the approach of [11], which exploits the fundamental matrix $\mathrm{F}$ between $I$ and $J$. In our notation, $\mathrm{F}$ is defined by

$${}^j\mathbf{x}^\top \mathrm{F}\,{}^i\mathbf{x} = 0, \quad (6)$$

for any two corresponding points $^i\mathbf{x} \in I$ and $^j\mathbf{x} \in J$. In [11], the internal camera matrices $\mathrm{K}_i$ and $\mathrm{K}_j$ are estimated by forcing the matrix $\hat{\mathrm{E}} = \mathrm{K}_j^\top \mathrm{F} \mathrm{K}_i$ to have the same properties as the essential matrix. This is achieved by minimizing the difference between the two non-zero singular values of $\hat{\mathrm{E}}$, since they must be equal. The Levenberg-Marquardt algorithm is used, so an initial guess for $\mathrm{K}_i$ and $\mathrm{K}_j$ is required. The most uncertain value among the entries of $\mathrm{K}_i$ and $\mathrm{K}_j$ is the focal length: as suggested in [6], this value is expected to fall in the interval $[\frac{1}{3}(w+h),\, 3(w+h)]$, where $w$ and $h$ are respectively the width and height of the image. In our approach, the first guess for the focal length is obtained with the method proposed in [15] if the solution falls in the above interval; otherwise it is set to $w+h$. The principal point is set at the center of the image, while pixels are assumed square (unit aspect ratio and zero skew). The extrinsic parameters (rotation matrix $\mathrm{R}$ and translation vector $\mathbf{t}$) of the support camera with respect to the reference camera are then recovered by factorizing the estimated essential matrix as $\hat{\mathrm{E}} = [\mathbf{t}]_\times \mathrm{R}$ [5].
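The sketch below illustrates this optimization, with its deviations stated up front: for brevity it assumes a single camera matrix shared by $I$ and $J$ (the paper estimates $\mathrm{K}_i$ and $\mathrm{K}_j$ separately), it uses SciPy's Levenberg-Marquardt driver, and it relies on OpenCV's recoverPose to factorize the essential matrix; the function names are ours:

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def self_calibrate(F, image_size, f_init):
    """Find the focal length making E_hat = K^T F K closest to a valid
    essential matrix, i.e. with two equal non-zero singular values [11].
    Principal point at the image center, square pixels, zero skew."""
    w, h = image_size

    def K(f):
        return np.array([[f, 0.0, w / 2.0],
                         [0.0, f, h / 2.0],
                         [0.0, 0.0, 1.0]])

    def residual(f):
        E = K(f[0]).T @ F @ K(f[0])
        s = np.linalg.svd(E, compute_uv=False)   # s[2] ~ 0 for an essential matrix
        return [(s[0] - s[1]) / s[1]]            # zero iff the two non-zero values agree

    # Clamp the first guess (e.g. from Sturm's method [15]) to the plausible
    # interval [(w + h)/3, 3(w + h)] suggested in [6].
    if not (w + h) / 3.0 <= f_init <= 3.0 * (w + h):
        f_init = float(w + h)
    f = least_squares(residual, [f_init], method="lm").x[0]
    return K(f)

def relative_pose(K_cam, F, pts_i, pts_j):
    """Recover R, t from E_hat = [t]_x R; recoverPose resolves the fourfold
    factorization ambiguity by cheirality (points in front of both cameras)."""
    E_hat = K_cam.T @ F @ K_cam
    _, R, t, _ = cv2.recoverPose(E_hat, pts_i, pts_j, K_cam)
    return R, t
```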
Figure 2. (a): Reference image $I$. (b): Support image $J$.

Figure 3. Ground plane recovery. (a): Ground classification for image $I$: the brighter the color, the more probable the ground region. (b): Recovery of the ground plane vanishing line (dashed line in the picture), after camera self-calibration and ground plane homography estimation.

2.2. Stereo pair generation and rendering
So far, we have described how to compute the pair of homographies mapping the image of a generic planar region onto the two translated virtual views forming the stereoscopic pair ($I_l$, $I_r$). This section specializes the use of Eq. 1 to the case of a scene including a planar ground, and then expounds how to warp the background and foreground objects properly, given the image of the ground plane. Fig. 2 shows the images $I$ and $J$ that will be used to illustrate the various rendering phases.
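To make the rendering step concrete, here is a minimal sketch of applying a precomputed pair of homographies to synthesize the two virtual views, assuming cv2.warpPerspective as the warping routine; the function name and signature are ours:

```python
import cv2

def render_stereo_pair(image, H_l, H_r):
    """Warp the image of a planar region into the left/right virtual views
    (I_l, I_r) using the two homographies obtained from Eq. 1."""
    h, w = image.shape[:2]
    I_l = cv2.warpPerspective(image, H_l, (w, h), flags=cv2.INTER_LINEAR)
    I_r = cv2.warpPerspective(image, H_r, (w, h), flags=cv2.INTER_LINEAR)
    return I_l, I_r
```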
…corresponding background column, the foreground pixels are not copied. The remaining background portions, i.e., those occluded by the foreground objects, are filled in with smooth color interpolation.

…that the 3D image is inside the TV, starting from the screen surface. Users are nonetheless free to change the overall shift and put other frontal regions on the screen surface, if required.
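The surviving text does not specify the interpolation scheme; as a hypothetical stand-in, diffusion-based inpainting over the mask of uncopied foreground pixels achieves a comparable smooth fill, and the screen-surface shift amounts to a constant horizontal offset between the two views (both function names are ours):

```python
import cv2
import numpy as np

def fill_occluded_background(background, fg_mask, radius=3):
    """Fill background pixels occluded by foreground objects with a smooth
    color interpolation (here: Navier-Stokes inpainting as a stand-in).
    fg_mask: 8-bit mask, non-zero where foreground pixels were not copied."""
    return cv2.inpaint(background, fg_mask, radius, cv2.INPAINT_NS)

def apply_screen_shift(img_l, img_r, shift_px):
    """Add a constant disparity offset: shifting one view horizontally moves
    the whole rendered scene relative to the screen surface."""
    h, w = img_r.shape[:2]
    M = np.float32([[1, 0, shift_px], [0, 1, 0]])
    return img_l, cv2.warpAffine(img_r, M, (w, h))
```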
Figure 5. Background generation for $I_r$. (a): Top border of the background not occluded. (b): Recovery of the background for the occluded part of the ground top border.
Figure 6. Stereoscopic rendering for $I$ of Fig. 2(a). (a): $I_l$. (b): $I_r$. (c): Superimposed stereoscopic images. (d): Disparity map.
…good accuracy of geometric estimates. Indeed, the ground boundaries next to the walls (dashed lines) are almost perfectly orthogonal, as they should be, despite the very slanted view of the ground in the original image.

Figs. 8(a) and (b) illustrate the "bushes" pair, where two partially self-occluding foreground objects are present. Notice, from both Figs. 8(c) and (d), the small blurred regions, especially evident to the left (c) and right (d) of the closer bush, due to color interpolation inside occluded background areas. As evident from the disparity map of Fig. 8(e), the two bushes are correctly rendered as belonging to two distinct depth layers. The good quality of the disparity map obtained with our approach is confirmed by a visual comparison against the disparity map of Fig. 8(f), which was obtained with a state-of-the-art dense stereo approach [16]: the two maps look very similar. However, dense stereo is much slower than our approach, taking about 50 minutes per image pair on a quad-core Intel Xeon 2.5 GHz PC. In the present MATLAB implementation of our approach, the overall processing time for an image pair is less than 5 minutes, also taking into account the semi-automatic foreground segmentation procedure.

Fig. 9 illustrates the results obtained with the "bride statues" pair. This pair also includes two foreground objects,…
Figure 8. The "bushes" example. (a): Reference image $I$. (b): Support image $J$. (c): Left stereoscopic image $I_l$. (d): Right stereoscopic image $I_r$. (e): Disparity map with our approach. (f): Disparity map with a dense stereo approach.
Figure 10. Some frames of a synthetic video sequence for the "horse" example of Fig. 7. The camera translates along its $x$-axis from right to left. Black pixels around the horse correspond to occluded background points.

…parallel stereoscopic displays, where the disparities of all scene elements are generated after statistical segmentation and geometric localization of the ground plane in the scene.

Future work will address (1) extending the approach to videos (which will lead to investigating the problem of temporal consistency among frames), (2) relaxing the ground plane assumption, (3) performing a totally automatic image segmentation based on a multi-planar scene model, thus further speeding up computations (in the current implementation, more than 90% of the time is taken by the semi-automatic foreground object segmentation) while retaining the basic geometric structure of the approach expounded in subsect. 2.1, and (4) implementing an automatic method to determine the optimal range of disparities for 3D perception.

Acknowledgements

We heartily thank Oliver Woodford for providing us with the experimental results used to compare our approach with his dense stereo method [16].

References

[1] S. Coren, L. M. Ward, and J. T. Enns. Sensation and Perception. Harcourt Brace, 1993.
[2] A. Criminisi, M. Kemp, and A. Zisserman. Bringing pictorial space to life: computer techniques for the analysis of paintings. In on-line Proc. Computers and the History of Art, 2002.
[3] M. Fischler and R. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM, 24(6):381–395, 1981.
[4] M. Guttman, L. Wolf, and D. Cohen-Or. Semi-automatic stereo extraction from video footage. In Proc. IEEE International Conference on Computer Vision, 2009.
[5] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2004.
[6] A. Heyden and M. Pollefeys. Multiple view geometry. In G. Medioni and S. B. Kang, editors, Emerging Topics in Computer Vision. Prentice Hall, 2005.
[7] D. Hoiem, A. Efros, and M. Hebert. Recovering surface layout from an image. International Journal on Computer Vision, 75(1), 2007.
[8] G. Jones, D. Lee, N. Holliman, and D. Ezra. Controlling perceived depth in stereoscopic images. In Proc. SPIE Stereoscopic Displays and Virtual Reality Systems VIII, volume 4297, 2001.
[9] J. Koenderink, A. van Doorn, A. M. L. Kappers, and J. T. Todd. Ambiguity and the 'mental eye' in pictorial relief. Perception, 30(4):431–448, 2001.
[10] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal on Computer Vision, 60(2):91–110, 2004.
[11] P. Mendonça and R. Cipolla. A simple technique for self-calibration. In Proc. Conf. Computer Vision and Pattern Recognition, 1999.
[12] V. Nedovic, A. W. M. Smeulders, A. Redert, and J. M. Geusebroek. Stages as models of scene geometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, (in press), 2010.
[13] C. Rother, V. Kolmogorov, and A. Blake. GrabCut: Interactive foreground extraction using iterated graph cuts. In ACM Transactions on Graphics (SIGGRAPH), 2004.
[14] A. Saxena, M. Sun, and A. Y. Ng. Learning 3-D scene structure from a single still image. In Proc. IEEE International Conference on Computer Vision, pages 1–8, 2007.
[15] P. Sturm. On focal length calibration from two views. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2001.
[16] O. Woodford, P. Torr, I. Reid, and A. Fitzgibbon. Global stereo reconstruction under second-order smoothness priors. IEEE Trans. on Pattern Analysis and Machine Intelligence, 31(12):2115–2128, 2009.
[17] G. Zhang, W. Hua, X. Qin, T. T. Wong, and H. Bao. Stereoscopic video synthesis from a monocular video. IEEE Transactions on Visualization and Computer Graphics, 13(4):686–696, 2007.
[18] G. Zhang, J. Jia, T.-T. Wong, and H. Bao. Consistent depth maps recovery from a video sequence. IEEE Trans. on Pattern Analysis and Machine Intelligence, 31(6):974–988, 2009.