Novel view synthesis using a translating camera

Geetika Sharma (a), Ankita Kumar (a), Shakti Kamal (a), Santanu Chaudhury (b,*), J.B. Srivastava (a)

(a) Department of Mathematics, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110 016, India
(b) Department of Electrical Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110 016, India

Abstract

This paper addresses the problem of synthesizing novel views of a scene using images taken by an uncalibrated translating camera. We propose a method for synthesizing views corresponding to translational motion of the camera. Our scheme can handle occlusions and changes in visibility in the synthesized views. We give a characterisation of the viewpoints for which views can be synthesized. Experimental results establish the validity and effectiveness of the method. Our synthesis scheme can also be used to detect translational pan motion of the camera in a given video sequence, and we present experimental results to illustrate this feature of our scheme.

Keywords: Camera translation; Image-based rendering; Novel view synthesis; Pan-detection

1. Introduction

View synthesis from images of real-world scenes has gained much attention in recent times, mainly due to its wide and numerous applications, ranging from video compression to virtual walkthroughs to special effects and animation. Active research in this area strives for faster rendering algorithms and more realistic synthesized views. In this work, we consider the problem of synthesizing novel views of a scene using two or more images taken by an uncalibrated, translating camera. Since the given images are taken by a translating camera, their image planes are parallel. Under the additional assumption of constant but unknown internal camera parameters, we have developed a synthesis scheme which produces perspectively correct views for arbitrary translation of the virtual camera. Our technique also computes z-buffer values for the given and novel views so that changes in visibility between objects in the scene may be handled correctly. Further, our technique can be used to detect translational camera motion in video sequences, which has applications in video segmentation and in characterising video sequences by their camera motion patterns. Our scheme can be used as part of a rendering engine for virtual walkthroughs and also to generate translational videos of a scene from still images of it. Additionally, the translational pan detection scheme may be combined with the view synthesis scheme to first identify and then compress translational videos.

Previous work in this area includes Akhloufi et al. (1999), who require the fundamental matrices relating the given and novel views to be known and do not explicitly handle occlusions, while we do not require knowledge of the fundamental matrix and handle occlusions explicitly. Avidan and Shashua (1998) need to estimate the trifocal tensor for view synthesis; while they use arbitrary input views, they cannot handle occlusions. Beier and Neely (1992) use morphing between line segments specified and matched by a human animator, and Chen and Williams (1993) require the camera transformation and range data to be given. Fusiello et al. (2003) use relative affine structure (Shashua and Navab, 1996), which is a projective reconstruction.
They can generate novel views only for the same displacement of the virtual camera as between the given views and do not handle occlusions explicitly; our technique can generate all in-between views and can also extrapolate. Genc and Ponce (2001) use constraints imposed by weak-perspective and para-perspective cameras, while we work with a perspective camera. Chang and Zakhor (2001) propose a novel representation consisting of multiple depth and intensity levels for new view synthesis; however, they require calibrated cameras. Inamoto and Saito (2002), Saito et al. (2002) and Yaguchi and Saito (2002) use a set of views taken from multiple cameras to reconstruct the scene in Projective Grid Space, defined by the epipolar geometry between two chosen basis cameras. Buehler et al. (2001) address the problem of video stabilisation using image-based rendering: a new sequence, as seen from a stabilised camera trajectory, is synthesized using a quasi-affine reconstruction of the scene. Unlike them, we do not require any kind of reconstruction to synthesize novel views. Lhuillier and Quan (2003) give a method for obtaining a quasi-dense disparity map and a novel representation of a pair of images for view synthesis. They handle occlusions by rendering new views in a heuristically determined order using disparity values. Computing disparity from image information alone requires rectified images, which correspond to a translation of the camera in the direction of the u axis of the image; our technique handles occlusions using z-buffer values and arbitrary camera translation. Seitz and Dyer (1996) perform a rectification and interpolation of arbitrary views followed by a projective transformation of the interpolated new view. We can create novel views for any virtual viewpoint within the rectangle whose diagonal is the line joining the viewpoints of the two given views, and not only on the line joining them. In the case of xyz translation, the virtual viewpoint can move in rectangles of different sizes depending on the amount of z translation. Also, we compute z-buffer values for explicit occlusion handling. The novel views we render are perspectively correct, while most existing techniques, except Seitz and Dyer (1996) and Avidan and Shashua (1998), require camera parameters to produce correct perspective views.

Camera motion detection is an important prerequisite in video processing. Techniques in this area, like Srinivasan et al. (1997) and Sudhir and Lee (1997), are based on optical flow; Jadon et al. (2002) uses fuzzy set theory; Park et al. (1994) uses a transformation model based on perspective projection, 3D rotation and zoom. We present a novel approach to camera motion detection based on the geometric relationships between objects in a static scene and the constraints imposed by translational camera motion. Our technique for view synthesis can be used to detect translational pan motion in video sequences in which the internal parameters of the camera do not change, and can identify translationally related frames, i.e., frames whose image planes are parallel, even though the camera may have undergone arbitrary motion between them.

This paper is organised as follows. In Section 2 we describe the view synthesis scheme for camera translation in the xy direction, while Section 3 describes the scheme for translation in the xyz direction. We describe the application of our technique to translational pan detection in Section 4 and conclude in Section 5.
2. Camera translation in xy direction

We first consider the problem of view synthesis from a set of views taken by an uncalibrated camera translating parallel to the image plane, i.e., in the xy direction.

2.1. Synthesis from two views

Let I₁ and I₂ be the two views of the scene and let their centres of projection be C₁ and C₂, respectively. Also, let P = (X, Y, Z)ᵀ be a point in the scene and pᵢ = (uᵢ, vᵢ, 1)ᵀ its image in the ith view, i = 1, 2. Then, the world-to-image projection is given by (Hartley and Zisserman, 2000)

    λ₁ p₁ = K (R P + t₁)                                          (1)

where

    K = [ f sᵤ   0     x₀ ]
        [ 0      f sᵥ  y₀ ]
        [ 0      0     1  ]

is the matrix of camera internals, R is a rotation matrix representing the orientation of the camera and t₁ = (x₁, y₁, z₁)ᵀ is a vector representing the position of the first camera. In K, f is the focal length, sᵤ and sᵥ are the scale factors along the u and v axes of the image plane, respectively, and (x₀, y₀) are the coordinates of the principal point. Also, λ₁ is the depth of the point P in the coordinate system of the first camera. We assume that R ≠ I and t₁ ≠ (0, 0, 0)ᵀ, so that the camera is not aligned with the world coordinate system. This allows us to further assume that the origin of the world coordinate system is visible in the given views. So, if we have n point correspondences given between I₁ and I₂, without any loss of generality we may choose one of them as the origin of the world coordinate system. Let p₁⁰ denote its image in the first view. Then λ₁⁰ p₁⁰ = K t₁, and substituting this in Eq. (1) we get

    P = R⁻¹ K⁻¹ (λ₁ p₁ − λ₁⁰ p₁⁰).                                 (2)

The second view I₂ has camera matrix K[R | t₁ + t₂], where t₂ = (x₂, y₂, 0)ᵀ, since we are assuming camera translation parallel to the image plane. The visibility of the origin in this view, λ₂⁰ p₂⁰ = K(t₁ + t₂), reduces the projection equation to

    λ₂ p₂ = K R P + λ₂⁰ p₂⁰,                                      (3)

and zero translation along the z direction implies λ₂ = λ₁ for every scene point (and, in particular, λ₂⁰ = λ₁⁰). Eliminating P between (2) and (3), we get

    λ₁ p₂ − λ₁⁰ p₂⁰ = λ₁ p₁ − λ₁⁰ p₁⁰.                             (4)

The only unknowns in the above equation are λ₁⁰, which is fixed, and λ₁ for each point correspondence. Also, each point correspondence gives us two equations, linear in the unknowns. So, given n ≥ 2 point correspondences, we can set up a system of 2(n − 1) equations in n unknowns which can be solved using singular value decomposition. The λ's computed are actually the values that would be stored in a z-buffer, as they are the depths of scene points from the centre of projection in the camera coordinate system. Since these λ's are computed using image information alone, they can be used for rendering new views without any perspective error. It can be shown that the equations in (4) imply the well-known epipolar constraint. To the best of our knowledge, the equations in (4) have not been documented in the literature.
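To make the linear system of Section 2.1 concrete, the following sketch sets up the 2(n − 1) equations of (4) and extracts the λ's from the smallest singular vector. It is only a minimal Python/numpy illustration, not the authors' implementation; the function name, the input format (pixel coordinates of matched points) and the sign normalisation are our assumptions.

import numpy as np

def xy_translation_depths(pts1, pts2, origin_idx=0):
    """Solve the homogeneous system implied by Eq. (4),
        lam_i (p2_i - p1_i) = lam_0 (p2_0 - p1_0),
    for the projective depths lam (the z-buffer values), up to a common scale.
    pts1, pts2 : (n, 2) arrays of corresponding pixel coordinates in I1 and I2.
    origin_idx : index of the correspondence chosen as the world origin.
    Returns an array lam of length n, with lam[origin_idx] = lambda_1^0."""
    pts1 = np.asarray(pts1, dtype=float)
    pts2 = np.asarray(pts2, dtype=float)
    n = len(pts1)
    d0 = pts2[origin_idx] - pts1[origin_idx]          # p2^0 - p1^0 (u and v components)

    rows = []
    for i in range(n):
        if i == origin_idx:
            continue
        di = pts2[i] - pts1[i]                        # p2^i - p1^i
        for c in range(2):                            # one equation each for u and v
            row = np.zeros(n)
            row[i] = di[c]
            row[origin_idx] = -d0[c]
            rows.append(row)
    A = np.vstack(rows)                               # 2(n - 1) x n coefficient matrix

    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    lam = vt[-1]
    if lam[origin_idx] < 0:                           # depths are positive; fix the sign
        lam = -lam
    return lam

Because the system is homogeneous, the depths are recovered only up to a common scale, which is all that is needed: only the ratio λ₁⁰/λ₁ enters the rendering equation (7) below.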
2.2. Synthesis of novel views

Suppose that the virtual camera undergoes a translation tₛ = (xₛ, yₛ, 0)ᵀ relative to the first camera. Then the projection equation for the novel view Iₛ is

    λₛ pₛ = K (R P + t₁ + tₛ).

Again, since zₛ = 0, λₛ = λ₁. Using (1), we get

    λₛ pₛ = λ₁ p₁ + K tₛ                                          (5)

and, for the origin,

    λ₁⁰ pₛ⁰ = λ₁⁰ p₁⁰ + K tₛ,                                     (6)

where pₛ⁰ = (uₛ⁰, vₛ⁰, 1)ᵀ is the image of the origin in the new view Iₛ. Equating the first coordinate on both sides of (6) and rearranging, we get xₛ = λ₁⁰ (uₛ⁰ − u₁⁰) / (f sᵤ). Since λ₁⁰, u₁⁰, f and sᵤ are fixed by the given views, different values of uₛ⁰ correspond to different translations xₛ in the x direction. Thus, a translation for the virtual camera may be chosen, interactively, by giving a value to uₛ⁰. Similarly, the y translation of the virtual camera may be chosen by giving a value to vₛ⁰. It follows that a choice of pₛ⁰ fixes the translation of the virtual camera and specifies the new view. Substituting K tₛ from (6) in (5), we get

    λ₁ pₛ = λ₁ p₁ + λ₁⁰ (pₛ⁰ − p₁⁰).                               (7)

The positions of point correspondences in the new view can be obtained from the above equation. We also have λ₁ (p₂ − p₁) = K t₂; combining this equation with (7) we get

    uₛ = (1 − xₛ/x₂) u₁ + (xₛ/x₂) u₂,   vₛ = (1 − yₛ/y₂) v₁ + (yₛ/y₂) v₂,

so that, when the virtual viewpoint lies between the given viewpoints (Section 2.3), the synthesized view is in fact a convex combination of the given views. Since we do not assume dense correspondences, the new view can be rendered by triangulating it and the given views using corresponding points as vertices. Triangles in the new view are then texture mapped by combining textures from corresponding triangles in the given views. Since the computed λ₁'s act as z-buffer values for corresponding points, they can be used to handle occlusions while rendering the new view. Thus, given n point correspondences, the algorithm for synthesis of new views is as follows:

1. Set up a system of equations using (4) and compute the λ's.
2. Specify a new view by giving values to uₛ⁰ and vₛ⁰.
3. Compute the positions of the corresponding points in the new view using (7), resolving any visibility conflicts using the λ's as z-buffer values.
4. Triangulate the new view using the rendered points as vertices and texture map the triangles from the given views.
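A possible realisation of steps 2-4 is sketched below: point positions in the new view follow from (7), and the λ's order the triangles for texture mapping. The paper does not prescribe a triangulation method or rendering order; the use of a Delaunay triangulation (via scipy) and a far-to-near painter's ordering, as well as the function names, are our assumptions, and the actual texture warping is only indicated in a comment.

import numpy as np
from scipy.spatial import Delaunay

def synthesize_points(pts1, lam, origin_idx, ps0):
    """Eq. (7): lam_i p_s = lam_i p_1 + lam_0 (p_s^0 - p_1^0), i.e. in pixel
    coordinates p_s = p_1 + (lam_0 / lam_i) (p_s^0 - p_1^0).
    ps0 is the chosen position (u_s^0, v_s^0) of the world origin in the new view."""
    pts1 = np.asarray(pts1, dtype=float)
    lam = np.asarray(lam, dtype=float)
    shift = np.asarray(ps0, dtype=float) - pts1[origin_idx]      # p_s^0 - p_1^0
    return pts1 + (lam[origin_idx] / lam)[:, None] * shift

def depth_ordered_triangles(pts_new, lam):
    """Triangulate the synthesized points and return the triangles sorted
    far-to-near, so that texture mapping them in this order resolves visibility
    (the lam's play the role of z-buffer values). Each triangle would then be
    texture mapped from the corresponding triangle in the given views, e.g.
    with an affine warp."""
    tri = Delaunay(pts_new).simplices                 # (m, 3) vertex indices
    depth = np.asarray(lam, dtype=float)[tri].mean(axis=1)
    return tri[np.argsort(depth)[::-1]]               # farthest triangles first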
2.3. Characterisation of viewpoints

Our view synthesis scheme works for any arbitrary translation of the camera. However, it is possible that the chosen camera translation is so large that, in the new view, all corresponding points except the origin move outside the field of view (FOV) of the virtual camera. None of the corresponding points would be visible in the image in such a case, making it impossible to triangulate and render the new view. To avoid such choices, we give a characterisation of all viewpoints for which it is guaranteed that all the corresponding points lie within the FOV of the virtual camera. Note that we only ensure that all corresponding points are within the FOV of the virtual camera; it is possible that, subsequently, some points become occluded by others due to the scene geometry.

Since the image is restricted to a finite portion of the image plane, the u and v coordinates of image points are bounded. Also, since the matrix of camera internals does not change across the given and the synthesized views, these bounds are the same for all the views. Let a be the lower bound for the u axis and b the upper bound. We will obtain a condition on xₛ so that uₛ is within the bounds of the image plane of the new view. We have

    a ≤ u₁ ≤ b,   a ≤ u₂ ≤ b.                                     (8)

First suppose that 0 ≤ xₛ ≤ x₂. Then, multiplying the first inequality in (8) by 1 − xₛ/x₂, the second by xₛ/x₂ and adding, we get a ≤ uₛ ≤ b. Following a similar approach we can show that, if c and d are the bounds for the v axis, then c ≤ vₛ ≤ d if 0 ≤ yₛ ≤ y₂. Thus, if we specify Iₛ such that the centre of projection remains within the rectangle whose diagonal is the line joining C₁ and C₂, all the points in the fields of view of I₁ and I₂ are also in the FOV of Iₛ. We call this rectangle the rectangle of renderable views; it is the shaded rectangle in Fig. 4.

Fig. 4. Rectangle of renderable views: the camera can move in the shaded rectangle, which is the rectangle of renderable views.

If xₛ > x₂ and there is a point in the scene such that its u coordinate in I₁ is a and in I₂ is b, then its image in Iₛ will be a(1 − xₛ/x₂) + b(xₛ/x₂) > b. Such a point will not lie in the FOV of Iₛ. Thus, if xₛ > x₂ we cannot guarantee that all points in the fields of view of I₁ and I₂ will lie in the FOV of Iₛ as well. Similarly, for xₛ < 0, we can show that there can be points in the fields of view of I₁ and I₂ which are not in the FOV of Iₛ. We conclude that, for all corresponding points to lie in the FOV of the new view, we must move the centre of projection only within the rectangle of renderable views. Note that in this case uₛ becomes a convex combination of u₁ and u₂ and vₛ becomes a convex combination of v₁ and v₂.

We now give an interpretation of these conditions in terms of the image of the origin in the new frame. We have

    λ₁⁰ (u₂⁰ − uₛ⁰) = f sᵤ (x₂ − xₛ).

Since f sᵤ, λ₁⁰ > 0 and (x₂ − xₛ) ≥ 0, we must have uₛ⁰ ≤ u₂⁰. Similarly, comparing the second coordinate on both sides, we get vₛ⁰ ≤ v₂⁰. Thus, to remain within the rectangle of renderable views we must specify uₛ⁰ ≤ u₂⁰ and vₛ⁰ ≤ v₂⁰. Also, note that specifying uₛ⁰ = u₂⁰ implies xₛ = x₂ and gives the set of views along the boundary of the rectangle of renderable views at xₛ = x₂. Similarly, the views along the other boundaries of the rectangle may be obtained by giving appropriate values to uₛ⁰ and vₛ⁰. We would like to mention here that this is only a characterisation of the viewpoints for which all corresponding points will lie in the FOV of the virtual camera; our technique can be used to synthesize views for any translation of the virtual camera. However, it is possible that, for viewpoints outside the rectangle of renderable views, some or all of the corresponding points move outside the FOV of the virtual camera.
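As a small illustration of how this characterisation could be enforced when pₛ⁰ is chosen interactively, the check below accepts the new view only if the chosen origin position lies, coordinate-wise, between the origin's positions in the two given views. The function name and the inclusive bounds are ours, not the paper's.

def in_rectangle_of_renderable_views(ps0, p10, p20):
    """Accept a requested origin position (u_s^0, v_s^0) for the new view only
    if it lies, coordinate-wise, between the origin's positions in the two
    given views, which corresponds to keeping the virtual centre of projection
    inside the rectangle of renderable views (Section 2.3)."""
    (us, vs), (u1, v1), (u2, v2) = ps0, p10, p20
    u_ok = min(u1, u2) <= us <= max(u1, u2)
    v_ok = min(v1, v2) <= vs <= max(v1, v2)
    return u_ok and v_ok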
2.4. Results

We have tested the proposed scheme on a variety of scenes. Correspondences were established between feature points detected by the Harris corner detector. Fig. 1 shows the input images of an outdoor scene with translation in the x direction and six synthesized novel views. As the viewpoint shifts in the synthesized views, some portions of the chimneys in the background become occluded by the tree in the foreground while others come into view. These changes in visibility have been correctly rendered by our method.

Fig. 1. Results for pure x translation: (a) and (h) are the input images with translation only in the x direction. (b)-(g) are six synthesized views with translation in the x direction. Note the movement of the tree with respect to the two chimneys highlighted by the box. In (b)-(d) the first chimney starts getting occluded by the tree while the second chimney starts becoming visible. In (e), the first chimney is completely occluded; it starts becoming visible again in (f) and (g), while the second chimney is completely visible in these frames.

Fig. 2(a)-(d) are four views of the same scene synthesized by Lhuillier and Quan (2003), who work with uncalibrated cameras. The images have been obtained from http://wwwlasmea.univ-bpclermont.fr/Personnel/Maxime.Lhuillier/Interpol2.html. Observe the distortion in the shape of the tree. While occlusions are handled correctly, due to the distortion in shape parts of the image are not rendered correctly. For instance, in (a) a portion of the chimney that would have been occluded had the shape of the tree been preserved is visible.

Fig. 2. (a)-(d) Four views synthesized by Lhuillier and Quan (2003).

2.5. Synthesis from n ≥ 3 views

Our technique can be extended to synthesize novel views from three or more views. This allows for a better coverage of the scene and expands the rectangle of renderable views to the union of the rectangles formed when the images are taken two at a time. These facts are illustrated by Fig. 3, which shows three input images of a lab scene and six synthesized views. The input views have been obtained by translation both in the x and y directions. Again, the changes in visibility induced by a shift of the viewpoint have been correctly rendered by our scheme. Using all three views we get correspondences on different parts of the big box, which would not have been possible from just the first and third views. Also, since the second view has y translation, we can generate novel views with y translation, i.e., the space of renderable views expands.

Fig. 3. Results for xy translation: (a)-(c) are the input views. (a) and (b) are related by a translation in y, while (a) and (c) are related by a translation in x. (d)-(f) are three synthesized views with translation in the x direction, while (g)-(i) are those from a sequence with xy translation. Observe the changing positions of the objects relative to each other as highlighted by the boxes.

3. Camera translation in xyz direction

In this section we describe the view synthesis scheme when the input views have been obtained by a camera translating in an arbitrary, or xyz, direction.

3.1. Synthesis from two views

The projection equations for the two views I₁ and I₂ are given by

    λᵢ pᵢ = K (R P + tᵢ),   i = 1, 2,

where tᵢ = (xᵢ, yᵢ, zᵢ)ᵀ and, in general, z₁ ≠ z₂. The equation relating the λ values of a point in the two views is

    λ₂ p₂ − λ₂⁰ p₂⁰ = λ₁ p₁ − λ₁⁰ p₁⁰.                             (9)

In this equation the number of unknowns increases, since λ₁ ≠ λ₂. However, the equations continue to be linear in the unknowns and can be solved given n ≥ 3 point correspondences. The equation defining the new view is

    λₛ pₛ = λ₁ p₁ − λ₁⁰ p₁⁰ + λₛ⁰ pₛ⁰.                             (10)

The non-linearity on the left-hand side can be resolved by first computing λₛ from the last coordinate on both sides and then computing uₛ and vₛ. To specify the new view we need to give values to uₛ⁰, vₛ⁰ and λₛ⁰, which corresponds to the translation of the camera in the z direction. The view synthesis scheme and the rendering algorithm are the same as for the xy translation case, except that we use (9) to compute the λ's and (10) to render the new views.

If we want the virtual camera to undergo a non-zero translation in each direction, we can choose uₛ⁰, vₛ⁰ and λₛ⁰ independently of each other. However, if we want zero translations along the x, y or both x and y directions, we do not have enough independent parameters to choose uₛ⁰ and vₛ⁰ independently; such translations can be specified if the internal parameters x₀ and y₀ are known. We can, however, specify a zero translation along the z direction by choosing λₛ⁰ = λ₁⁰ and uₛ⁰ and vₛ⁰ arbitrarily. In this case λₛ = λ₁, and choosing uₛ⁰ = u₁⁰ corresponds to zero translation along the x direction. Thus, if the translation along the z direction is zero, we can have zero translations along x or y as well. Note that the λₛ's are the z-buffer values again; since they have been computed using image information alone, any view rendered using these values will be perspectively correct.

Fig. 5 shows a lab scene and an outdoor scene with translation in the xyz direction and three synthesized views of each. As the viewpoint translates in z, towards the scene, the size of the objects in the images increases. This increase in the size of the objects is apparent in the synthesized views.

Fig. 5. Results for xyz translation: (a) and (b) are the input images of a lab scene with translation in the xyz direction. (c)-(e) are three synthesized views. (f) and (g) are input images of an outdoor scene. (h)-(j) are three synthesized frames. Note the increase in the size of the objects as the virtual camera translates closer to the scene.
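The xyz case can be sketched in the same numpy style, again only as an illustration under our own naming and input conventions: the first function solves the homogeneous system of (9) for the depths in both views (up to a common scale), and the second applies (10), dividing by the last coordinate to recover λₛ and the pixel positions.

import numpy as np

def xyz_translation_depths(pts1, pts2, origin_idx=0):
    """Solve the homogeneous system implied by Eq. (9),
        lam2_i p2_i - lam2_0 p2_0 = lam1_i p1_i - lam1_0 p1_0,
    for the depths in both views, up to a common scale.
    pts1, pts2 : (n, 2) corresponding pixel coordinates; returns (lam1, lam2)."""
    h1 = np.hstack([np.asarray(pts1, float), np.ones((len(pts1), 1))])   # homogeneous
    h2 = np.hstack([np.asarray(pts2, float), np.ones((len(pts2), 1))])
    n = len(h1)
    others = [i for i in range(n) if i != origin_idx]
    m = len(others)
    # Unknowns ordered as [lam1_0, lam2_0, lam1_i (i in others), lam2_i (i in others)].
    rows = []
    for k, i in enumerate(others):
        for c in range(3):                      # three coordinates per correspondence
            row = np.zeros(2 + 2 * m)
            row[0] = h1[origin_idx, c]          # +lam1_0 p1_0
            row[1] = -h2[origin_idx, c]         # -lam2_0 p2_0
            row[2 + k] = -h1[i, c]              # -lam1_i p1_i
            row[2 + m + k] = h2[i, c]           # +lam2_i p2_i
            rows.append(row)
    sol = np.linalg.svd(np.vstack(rows))[2][-1]
    if sol[0] < 0:                              # fix the arbitrary overall sign
        sol = -sol
    lam1, lam2 = np.empty(n), np.empty(n)
    lam1[origin_idx], lam2[origin_idx] = sol[0], sol[1]
    lam1[others], lam2[others] = sol[2:2 + m], sol[2 + m:]
    return lam1, lam2

def synthesize_xyz(pts1, lam1, origin_idx, ps0, lam_s0):
    """Eq. (10): lam_s p_s = lam1 p_1 - lam1_0 p1_0 + lam_s0 p_s^0.
    ps0 = (u_s^0, v_s^0) and lam_s0 (the z translation) specify the new view.
    Returns the pixel positions in the new view and the new depths lam_s."""
    h1 = np.hstack([np.asarray(pts1, float), np.ones((len(pts1), 1))])
    lam1 = np.asarray(lam1, dtype=float)
    rhs = (lam1[:, None] * h1
           - lam1[origin_idx] * h1[origin_idx]
           + lam_s0 * np.append(np.asarray(ps0, float), 1.0))
    lam_s = rhs[:, 2]                           # last coordinate gives lam_s
    return rhs[:, :2] / lam_s[:, None], lam_s

Note that, because the recovered depths carry an arbitrary global scale, the user-supplied λₛ⁰ is interpreted in those same units; only depth ratios affect the synthesized pixel positions.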
3.2. Characterisation of viewpoints

As in Section 2.3, we give a characterisation of all viewpoints for which it is guaranteed that all corresponding points will lie within the field of view (FOV) of the virtual camera. Since the orientation matrix R is the same in all the views and the translation for the new view is specified with respect to the first view, to characterise the set of viewpoints we may assume that the first camera is aligned. Let the first camera be K[I | 0], the second be K[I | t₁] and the new camera be K[I | tₛ], tₛ = (xₛ, yₛ, zₛ)ᵀ. Let a and b be the bounds for the u axis and c and d be the bounds for the v axis.

The view-volume of a camera is defined by four planes which intersect at the centre of projection and pass through the boundaries of the image plane. In order to ensure that all points in the fields of view of the given views are also in the FOV of the novel view, we intersect the view-volumes of the given cameras and require that the view-volume of the new camera contain the intersection. For the first view, let the planes defining the view-volume be π₁, π₂, π₃ and π₄. Then the equations of these planes are π₁: fX − aZ = 0, π₂: fY − dZ = 0, π₃: fX − bZ = 0, π₄: fY − cZ = 0. Let Nᵢ denote the normal of πᵢ, i = 1, 2, 3, 4, that points outside the view-volume, and let P = (X, Y, Z)ᵀ be any point inside the view-volume. Then the dot product of Nᵢ with the vector joining any point on the plane πᵢ and P is less than zero. Choosing the centre of projection as the point on the plane, P will lie inside the view-volume if

    (a/f) Z < X < (b/f) Z   and   (c/f) Z < Y < (d/f) Z.

Similarly, if π′₁, π′₂, π′₃ and π′₄ are the planes defining the second view-volume, then π′₁: fX − aZ + fx₁ − az₁ = 0, π′₂: fY − dZ + fy₁ − dz₁ = 0, π′₃: fX − bZ + fx₁ − bz₁ = 0, π′₄: fY − cZ + fy₁ − cz₁ = 0. A point P lies inside the second view-volume if

    (a/f)(Z + z₁) − x₁ < X < (b/f)(Z + z₁) − x₁   and   (c/f)(Z + z₁) − y₁ < Y < (d/f)(Z + z₁) − y₁.

We may assume that y₁ > 0, choosing as I₁ the view with the lesser vertical displacement. Then the intersection of the given view-volumes is bounded by the left and bottom planes of I₂ and the right and top planes of I₁. Requiring that a point lying in the given view-volumes also lie in the new view-volume gives

    (a/f) zₛ − max(0, (a/f) z₁ − x₁) ≤ xₛ ≤ (b/f) zₛ − min(0, (b/f) z₁ − x₁).

We can get similar relations for yₛ. Thus, if the internal parameters and the size of the image plane are known, we can get the bounds for xₛ and yₛ. Using these we can obtain bounds on uₛ⁰ and vₛ⁰. Note that the amount of x or y translation permissible depends on the z translation. The set of possible viewpoints for the virtual camera therefore turns out to be a set of rectangles of different sizes depending on the amount of z translation, as shown in Fig. 6.

Fig. 6. The set of possible virtual camera locations is the set of solid rectangles.
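The bound on xₛ translates directly into code. The sketch below simply transcribes the inequality above; it assumes, as the text requires, that the focal length f, the image-plane bounds a and b and the second camera's x₁ and z₁ are known, and the helper name is ours.

def xs_bounds(zs, z1, x1, a, b, f):
    """Permissible x translation of the virtual camera for a given z translation:
        (a/f) zs - max(0, (a/f) z1 - x1) <= xs <= (b/f) zs - min(0, (b/f) z1 - x1).
    a, b are the bounds of the u axis of the image plane, f the focal length and
    (x1, z1) the x and z components of the second camera's position.
    Returns (lo, hi); the analogous bound on ys uses c, d and y1."""
    lo = (a / f) * zs - max(0.0, (a / f) * z1 - x1)
    hi = (b / f) * zs - min(0.0, (b / f) * z1 - x1)
    return lo, hi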
4. Translational pan detection

In this section we describe how our view synthesis scheme can be used to detect whether a video segment, in which the internal parameters of the camera are constant, is a translational pan sequence. We choose three frames from the sequence. The idea is that, if these frames are in fact part of a translational pan sequence, then the corresponding points in these frames must satisfy all the geometric relationships that we have used for view synthesis in the previous sections. We determine to what extent these constraints are satisfied by the corresponding points and then decide whether a given sequence is a translational pan sequence or not.

Let the three frames be I₁, I₂ and I₃, and set up point correspondences between them. Treating I₁ and I₃ as input images, we synthesize the frame corresponding to I₂. This is done by choosing a point visible in I₂ as the origin and synthesizing a new view with the origin in the same position as in I₂. We then measure the similarity between the rendered view and I₂: the mean of the distances between the positions of corresponding points in the rendered view and their positions in I₂ is used as the measure of similarity. Any one of the corresponding points could be chosen as the origin to synthesize the new view; however, we choose the point which gives the minimum mean error, since that point synthesizes a frame closest to the given frame. If the mean error is within a certain threshold, the segment is classified as a translational pan sequence. The threshold is determined from a training data set consisting of translational pan sequences.

We have tested our algorithm on a number of sequences. In order to determine the threshold on the mean error for declaring two frames to be translationally related, we used a training set consisting of 80 sequences to assess the distribution of errors. On the basis of the distribution, we have found that, with 99% confidence, we can identify a frame as translational pan if the mean error is less than 2.275863 pixels. Fig. 7 shows three sequences for which it is known that the coffee and house sequences are translational pan sequences and the lab sequence is not. These have been correctly classified by our technique.

Fig. 7. Sequences for pan detection: the coffee sequence (mean error 2.22), the house sequence (mean error 1.96) and the lab sequence (mean error 56.36). Frames in the middle row were checked for translational pan using correspondences between frames in the first and last rows.

Our technique can also determine if three non-consecutive frames of a sequence are translationally related, i.e., if their image planes are parallel, although the camera motion between them may be arbitrary. Thus, we can identify both translational pan sequences and translationally related frames in a video sequence. This is possible because our technique is based on the geometric relationships between points in a static scene and the constraints imposed by translational camera motion, which are independent of the intermediate camera motion.
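A compact sketch of this test, under the same assumptions and naming conventions as the earlier snippets, tries each correspondence as the origin, synthesizes I₂ from I₁ and I₃ with the xy-translation equations (4) and (7), and thresholds the minimum mean reprojection error. The 2.275863-pixel threshold is the value reported above; everything else (function name, input format) is illustrative.

import numpy as np

def is_translational_pan(pts1, pts2, pts3, threshold=2.275863):
    """Check whether frames I1, I2, I3 are translationally related.
    pts1, pts2, pts3 : (n, 2) corresponding pixel coordinates in the three
    frames; I1 and I3 are treated as the input pair and I2 is the frame
    being checked. Returns (is_pan, best_mean_error)."""
    pts1, pts2, pts3 = (np.asarray(p, float) for p in (pts1, pts2, pts3))
    n = len(pts1)
    best = np.inf
    for o in range(n):                               # try every correspondence as origin
        d0 = pts3[o] - pts1[o]
        rows = []
        for i in range(n):
            if i == o:
                continue
            for c in range(2):
                row = np.zeros(n)
                row[i] = (pts3[i] - pts1[i])[c]
                row[o] = -d0[c]
                rows.append(row)
        lam = np.linalg.svd(np.vstack(rows))[2][-1]  # depths from Eq. (4)
        if lam[o] < 0:
            lam = -lam
        # Synthesize I2 via Eq. (7), placing the origin where it appears in I2.
        pred = pts1 + (lam[o] / lam)[:, None] * (pts2[o] - pts1[o])
        err = np.linalg.norm(pred - pts2, axis=1).mean()
        best = min(best, err)
    return best < threshold, best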
5. Conclusions

In this paper we have proposed a technique for the synthesis of novel views using two or more views taken by an uncalibrated translating camera. Our scheme produces correct perspective views, while most existing techniques require calibrated cameras to do so. We have also characterised the set of viewpoints for which new views can be rendered. Our scheme can also be used for detecting translational pan motion in video sequences in which the internal camera parameters do not change. This work can be extended to handle arbitrary motion of the camera; however, some additional information, for example the correspondence of vanishing points, would be required. This will be a topic for future endeavours.

References

Akhloufi, M.A., Polotski, V., Cohen, P., 1999. Virtual view synthesis from uncalibrated stereo cameras. In: Proc. Internat. Conf. on Multimedia Computing and Systems, vol. 2, pp. 672-676.
Avidan, S., Shashua, A., 1998. Novel view synthesis by cascading trilinear tensors. IEEE Trans. Visualiz. Comput. Graphics 4 (4), 293-306.
Beier, T., Neely, S., 1992. Feature-based image metamorphosis. In: Proc. SIGGRAPH, pp. 35-42.
Buehler, C., Bosse, M., McMillan, L., 2001. Non-metric image-based rendering for video stabilization. In: Proc. Computer Vision and Pattern Recognition, vol. 2, pp. 609-614.
Chang, N.L., Zakhor, A., 2001. Constructing a multivalued representation for view synthesis. Internat. J. Computer Vision 45 (2), 157-190.
Chen, S.E., Williams, L., 1993. View interpolation for image synthesis. In: Proc. SIGGRAPH, pp. 279-288.
Fusiello, A., Caldrer, S., Sara, C., Mattern, N., Murino, V., 2003. View synthesis from uncalibrated images using parallax. In: Proc. Internat. Conf. on Image Analysis and Processing, pp. 146-151.
Genc, Y., Ponce, J., 2001. Image-based rendering using parametrised image varieties. Internat. J. Computer Vision 41 (3), 143-170.
Hartley, R., Zisserman, A., 2000. Multiple View Geometry in Computer Vision, first ed. Cambridge University Press.
Inamoto, N., Saito, H., 2002. Intermediate view generation of soccer scene from multiple views. In: Proc. ICPR.
Jadon, R.S., Chaudhury, S., Biswas, K.K., 2002. A fuzzy theoretic approach for camera motion detection. In: Proc. IPMU.
Lhuillier, M., Quan, L., 2003. Image-based rendering by joint view triangulation. IEEE Trans. Circ. Systems Video Technol. 13 (11), 1051-1063.
Park, J., Yagi, N., Enami, K., Aizawa, K., Hatori, M., 1994. Estimation of camera parameters from image sequence for model-based video coding. IEEE Trans. Circ. Systems Video Technol. 4 (3), 288-296.
Saito, H., Kimura, M., Yaguchi, S., Inamoto, N., 2002. View interpolation of multiple cameras based on projective geometry. In: International Workshop on Pattern Recognition and Understanding for Visual Information Media.
Seitz, S.M., Dyer, C.R., 1996. View morphing. In: Proc. SIGGRAPH, pp. 21-30.
Shashua, A., Navab, N., 1996. Relative affine structure: Canonical model for 3D from 2D geometry and applications. IEEE Trans. Pattern Anal. Machine Intell. 18 (9), 873-883.
Srinivasan, M.V., Venkatesh, A., Hosie, R., 1997. Quantitative estimation of camera motion parameters from video sequences. Pattern Recognit. 30, 593-606.
Sudhir, G., Lee, J., 1997. Video annotation by motion interpretation using optical flow streams. J. Visual Comm. Image Represent. 7, 354-368.
Yaguchi, S., Saito, H., 2002. Arbitrary viewpoint video synthesis from uncalibrated multiple cameras. In: Proc. WSCG.