Papers by Ira Kemelmacher
Given a photo of person A, we seek a photo of person B with similar pose and expression. Solving this problem enables a form of puppetry, in which one person appears to control the face of another. When deployed on a webcam-equipped computer, our approach enables a user to control another person's face in real time. This image-retrieval-inspired approach employs a fully automated pipeline of face analysis techniques and is extremely general: we can puppet anyone directly from their photo collection or videos in which they appear. We show several examples using images and videos of celebrities from the Internet.
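At its core, the retrieval step amounts to a nearest-neighbor search over per-frame pose and expression descriptors. A minimal sketch, assuming such descriptors have already been extracted by the face analysis pipeline (the descriptor contents and distance metric here are illustrative, not the paper's exact choices):

```python
import numpy as np

def retrieve_matching_photo(query_descriptor, target_descriptors):
    """Return the index of person B's photo whose pose/expression descriptor
    is closest to the descriptor of person A's current frame."""
    dists = np.linalg.norm(target_descriptors - query_descriptor, axis=1)
    return int(np.argmin(dists))
```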
Recent face recognition experiments on the LFW benchmark show that face recognition is performing stunningly well, surpassing human recognition rates. In this paper, we study face recognition at scale. Specifically, we have collected one million faces from Flickr and evaluated state-of-the-art face recognition algorithms on this dataset. We found that performance varies across algorithms: while all perform very well on LFW, recognition rates drop drastically for most algorithms once evaluated at scale. Interestingly, the deep learning based approach of Schroff et al. (FaceNet, 2015) performs much better, but still becomes less robust at scale. We consider both verification and identification problems, and evaluate how pose affects recognition at scale. Moreover, we ran an extensive human study on Mechanical Turk to evaluate human recognition at scale, and report the results. All the photos are Creative Commons photos and will be released for research and further experiments.
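The abstract distinguishes verification (same/different decisions on face pairs) from identification (ranking a probe against a large gallery of distractors). A minimal sketch of how these two protocols are typically scored, assuming precomputed face embeddings and cosine similarity; this is illustrative and not the paper's exact evaluation code:

```python
import numpy as np

def verification_accuracy(emb_a, emb_b, same_labels, threshold):
    """Score same/different pairs: cosine similarity above threshold => 'same person'."""
    sims = np.sum(emb_a * emb_b, axis=1) / (
        np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1))
    return np.mean((sims > threshold) == np.asarray(same_labels))

def rank1_identification(probe_emb, probe_ids, gallery_emb, gallery_ids):
    """Rank-1 identification: the nearest gallery face must share the probe's identity."""
    probes = probe_emb / np.linalg.norm(probe_emb, axis=1, keepdims=True)
    gallery = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = probes @ gallery.T                      # (num_probes, num_gallery)
    best = np.argmax(sims, axis=1)
    return np.mean(np.asarray(gallery_ids)[best] == np.asarray(probe_ids))
```

Adding distractors to the gallery leaves the code unchanged but makes rank-1 identification progressively harder, which is the effect studied at the million-face scale.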
We reconstruct a controllable model of a person from a large photo collection that captures his or her persona, i.e., physical appearance and behavior. The ability to operate on unstructured photo collections enables modeling a huge number of people, including celebrities and other well-photographed people, without requiring them to be scanned. Moreover, we show the ability to drive or puppeteer the captured person B using any other video of a different person A. In this scenario, B acts out the role of person A, but retains his or her own personality and character. Our system is based on a novel combination of 3D face reconstruction, tracking, alignment, and multi-texture modeling, applied to the puppeteering problem. We demonstrate convincing results on a large variety of celebrities derived from Internet imagery and video.
2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005
The task of identifying 3D objects in 2D images is difficult due to variation in objects' appearance with changes in pose and lighting. The task is further complicated by the presence of occlusion and clutter. Shape indexing is a method for rapid association between features identified in an image and their corresponding 3D features stored in a database. Previous indexing methods ignored variations due to lighting, restricting the approach to polyhedral objects. In this paper, we further develop these methods to handle variations in both pose and lighting. We focus on rigid objects undergoing a scaled-orthographic projection and use spherical harmonics to represent lighting. The resulting integrated algorithm can recognize 3D objects from a single input image; furthermore, it recovers the pose and lighting of each familiar object in the given image. The algorithm has been tested on a database of real objects, demonstrating its performance on cluttered scenes under a variety of poses and illumination conditions.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000
Human faces are remarkably similar in global properties, including size, aspect ratio, and location of main features, but can vary considerably in details across individuals, gender, race, or due to facial expression. We propose a novel method for 3D shape recovery of faces that exploits the similarity of faces. Our method takes as input a single image and uses only a single 3D reference model of a different person's face. Classical reconstruction methods from single images, i.e., shape-from-shading, require knowledge of the reflectance properties and lighting, as well as depth values for boundary conditions. Recent methods circumvent these requirements by representing input faces as combinations of (hundreds of) stored 3D models. We propose instead to use the input image as a guide to "mold" a single reference model to reach a reconstruction of the sought 3D shape. Our method assumes Lambertian reflectance and uses harmonic representations of lighting. It has been tested on images taken under controlled viewing conditions as well as on uncontrolled images downloaded from the Internet, demonstrating its accuracy and robustness under a variety of imaging conditions, and overcoming significant differences in shape between the input and reference individuals, including differences in facial expression, gender, and race.
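For context, the harmonic representation of lighting mentioned here is commonly written, to first order, as the following image-formation model; this is the standard approximation used in this line of work, not a reproduction of the paper's exact objective:

```latex
% First-order spherical-harmonic approximation of Lambertian shading:
% intensity as a function of albedo rho and surface normal n = (n_x, n_y, n_z).
\[
  I(x,y) \;\approx\; \rho(x,y)\,\bigl( l_0 + l_1\, n_x(x,y) + l_2\, n_y(x,y) + l_3\, n_z(x,y) \bigr)
\]
% The four coefficients (l_0, ..., l_3) summarize the unknown lighting; reconstruction
% then seeks a depth map whose normals best explain I, with the reference model
% providing the prior on shape and albedo.
```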
2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012
Given a pair of images (first and last in the sequence), the in-between photos are automatically synthesized using our flow estimation method. Note the significant variation in lighting and facial expression between the two input photos.
2014 2nd International Conference on 3D Vision, 2014
We present an algorithm that takes a single frame of a person's face from a depth camera, e.g., Kinect, and produces a high-resolution 3D mesh of the input face. We leverage a dataset of 3D face meshes of 1204 distinct individuals ranging from age 3 to 40, captured in a neutral expression. We divide the input depth frame into semantically significant regions (eyes, nose, mouth, cheeks) and search the database for the best matching shape per region. We then combine the input depth frame with the matched database shapes into a single mesh, resulting in a high-resolution shape of the input person. Our system is fully automatic and uses only depth data for matching, making it invariant to imaging conditions. We evaluate our results using ground truth shapes, and compare to state-of-the-art shape estimation methods. We demonstrate the robustness of our local matching approach with high-quality reconstruction of faces that fall outside the span of the dataset, e.g., faces of people older than 40, facial expressions, and different ethnicities.
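The per-region search step can be illustrated as a nearest-neighbor lookup over a database of aligned depth images. A minimal sketch under those assumptions; the alignment, matching cost, and mesh blending used in the paper are omitted, and the region masks are assumed given:

```python
import numpy as np

REGIONS = ("eyes", "nose", "mouth", "cheeks")

def best_match_per_region(input_depth, region_masks, database):
    """For each semantic face region, find the database entry whose depth values
    best match the input (sum of squared differences inside the region mask).

    input_depth  : (H, W) depth image of the input face
    region_masks : dict mapping region name -> (H, W) boolean mask
    database     : list of (H, W) depth images, assumed pre-aligned to the input
    """
    matches = {}
    for region in REGIONS:
        mask = region_masks[region]
        errors = [np.sum((entry[mask] - input_depth[mask]) ** 2) for entry in database]
        matches[region] = int(np.argmin(errors))
    return matches  # region name -> index of best-matching database shape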
ACM Transactions on Graphics, 2011
We present an approach for generating face animations from large image collections of the same person. Such collections, which we call photobios, sample the appearance of a person over changes in pose, facial expression, hairstyle, age, and other variations. By optimizing the order in which images are displayed and cross-dissolving between them, we control the motion through face space and create compelling animations (e.g., render a smooth transition from frowning to smiling). Used in this context, the cross-dissolve produces a very strong motion effect; a key contribution of the paper is to explain this effect and analyze its operating range. The approach operates by creating a graph with faces as nodes and similarities as edges, and solving for walks and shortest paths on this graph. The processing pipeline involves face detection, locating fiducials (eyes, nose, mouth), solving for pose, warping to frontal views, and image comparison based on Local Binary Patterns. We demonstrate results on a variety of datasets, including time-lapse photography, personal photo collections, and images of celebrities downloaded from the Internet. Our approach is the basis for the Face Movies feature in Google's Picasa.
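The graph-and-shortest-path idea can be sketched in a few lines, assuming each photo has already been warped to a frontal view and summarized by a descriptor (e.g., a Local Binary Pattern histogram); the descriptor, distance, and edge pruning used in the paper are simplified here:

```python
import itertools
import numpy as np
import networkx as nx

def build_face_graph(descriptors):
    """Nodes are photos; edge weights are distances between face descriptors."""
    graph = nx.Graph()
    graph.add_nodes_from(range(len(descriptors)))
    for i, j in itertools.combinations(range(len(descriptors)), 2):
        dist = float(np.linalg.norm(descriptors[i] - descriptors[j]))
        graph.add_edge(i, j, weight=dist)
    return graph

def face_movie_path(graph, start, end):
    """A transition from photo `start` to photo `end` follows the lowest-cost walk
    through face space, so consecutive frames stay visually similar."""
    return nx.shortest_path(graph, start, end, weight="weight")
```

Cross-dissolving along the returned sequence of photos is what produces the apparent motion analyzed in the paper.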
Lecture Notes in Computer Science, 2014
Given a YouTube video of a person's face, our method estimates high-detail geometry (full 3D flow and pose) in each video frame, completely automatically.
2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, 2012
Multiview structure recovery from a collection of images requires the recovery of the positions and orientations of the cameras relative to a global coordinate system. Our approach recovers camera motion as a sequence of two global optimizations. First, pairwise Essential Matrices are used to recover the global rotations by applying robust optimization using either spectral or semidefinite programming relaxations. Then, we directly employ feature correspondences across images to recover the global translation vectors using a linear algorithm based on a novel decomposition of the Essential Matrix. Our method is efficient and, as demonstrated in our experiments, achieves highly accurate results on collections of real images for which ground truth measurements are available.
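The rotation stage can be illustrated with the standard spectral relaxation for rotation averaging: stack the pairwise relative rotations into a symmetric block matrix and read the global rotations off its top eigenvectors. A minimal sketch, assuming the convention R_ij = R_i R_j^T and omitting the robust weighting, the SDP alternative, and the translation step described above:

```python
import numpy as np

def global_rotations_from_pairwise(rel_rot, n_cameras):
    """Spectral relaxation for rotation averaging.

    rel_rot maps camera pairs (i, j) to relative rotations R_ij = R_i @ R_j.T;
    missing pairs are left as zero blocks. Returns one 3x3 rotation per camera,
    determined up to a single global rotation.
    """
    H = np.zeros((3 * n_cameras, 3 * n_cameras))
    for (i, j), R_ij in rel_rot.items():
        H[3*i:3*i+3, 3*j:3*j+3] = R_ij
        H[3*j:3*j+3, 3*i:3*i+3] = R_ij.T
    for i in range(n_cameras):                      # diagonal blocks: identity
        H[3*i:3*i+3, 3*i:3*i+3] = np.eye(3)

    # The eigenvectors with the three largest eigenvalues span the stacked rotations.
    _, eigvecs = np.linalg.eigh(H)
    G = eigvecs[:, -3:]

    rotations = []
    for i in range(n_cameras):                      # project each 3x3 block onto SO(3)
        U, _, Vt = np.linalg.svd(G[3*i:3*i+3, :])
        D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
        rotations.append(U @ D @ Vt)
    return rotations
```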
International Journal of Computer Vision, 2007
Work on photometric stereo has shown how to recover the shape and reflectance properties of an object using multiple images taken with a fixed viewpoint and variable lighting conditions. This work has primarily relied on known lighting conditions or the presence of a single point source of light in each image. In this paper, we show how to perform photometric stereo assuming that all lights in a scene are distant from the object but otherwise unconstrained. Lighting in each image may be unknown and may include an arbitrary combination of diffuse, point, and extended sources. Our work is based on recent results showing that, for Lambertian objects, general lighting conditions can be represented using low-order spherical harmonics. Using this representation, we can recover shape by performing a simple optimization in a low-dimensional space. We also analyze the shape ambiguities that arise in such a representation. We demonstrate our method by reconstructing the shape of objects from images obtained under a variety of lightings. We further compare the reconstructed shapes against shapes obtained with a laser scanner.
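Under the first-order harmonic model, stacking the images yields a matrix that is approximately rank 4, which is what makes a low-dimensional optimization possible. A minimal sketch of that factorization step, assuming images from a fixed viewpoint flattened into rows; the ambiguity resolution and the higher-order terms discussed in the abstract are omitted:

```python
import numpy as np

def rank4_factorization(images):
    """Factor an image stack (n_images x n_pixels, fixed viewpoint) into per-image
    harmonic lighting coefficients L (n_images x 4) and a per-pixel surface matrix
    S (4 x n_pixels), where each column of S is approximately
    albedo * (1, n_x, n_y, n_z), up to a 4x4 linear ambiguity."""
    U, s, Vt = np.linalg.svd(images, full_matrices=False)
    L = U[:, :4] * np.sqrt(s[:4])               # split singular values between factors
    S = np.sqrt(s[:4])[:, None] * Vt[:4, :]
    return L, S
```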
2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008
Two-tone ("Mooney") images seem to arouse vivid 3D percept of faces, both familiar and unfamiliar... more Two-tone ("Mooney") images seem to arouse vivid 3D percept of faces, both familiar and unfamiliar, despite their seemingly poor content. Recent psychological and fMRI studies suggest that this percept is guided primarily by topdown procedures in which recognition precedes reconstruction. In this paper we investigate this hypothesis from a mathematical standpoint. We show that indeed, under standard shape from shading assumptions, a Mooney image can give rise to multiple different 3D reconstructions even if reconstruction is restricted to the Mooney transition curve (the boundary curve between black and white) alone. We then use top-down reconstruction methods to recover the shape of novel faces from single Mooney images exploiting prior knowledge of the structure of at least one face of a different individual. We apply these methods to thresholded images of real faces and compare the reconstruction quality relative to reconstruction from gray level images.
Communications of the ACM, 2014
We present an approach for generating face animations from large image collections of the same person. Such collections, which we call photobios, are remarkable in that they summarize a person's life in photos; the photos sample the appearance of a person over changes in age, pose, facial expression, hairstyle, and other variations. Yet, browsing and exploring photobios is infeasible due to their large volume. By optimizing the quantity and order in which photos are displayed and cross-dissolving between them, we can render smooth transitions in face pose and expression (e.g., from frowning to smiling) and create moving portraits from collections of still photos. Used in this context, the cross-dissolve produces a very strong motion effect; a key contribution of the paper is to explain this effect and analyze its operating range. We demonstrate results on a variety of datasets, including time-lapse photography, personal photo collections, and images of celebrities downloaded from the Internet. Our approach is completely automatic and has been widely deployed as the "Face Movies" feature in Google's Picasa.
Lecture Notes in Computer Science, 2006