At TechFest 2011, Microsoft researchers are showcasing a new, 3-D, photo-real talking head with freely controlled head motions and facial expressions. "It extends our prior, high-quality, 2-D, photo-real talking head to 3-D. First, we apply a 2-D-to-3-D reconstruction algorithm frame by frame on a 2-D video to construct a 3-D training database," explains Laura Foy.
In training, super-feature vectors consisting of 3-D geometry, texture, and speech are formed to train a statistical, multistream hidden Markov model (HMM). The HMM is then used to synthesize both the geometric-animation trajectories and the dynamic texture. The 3-D talking head is animated by the geometric trajectory, while the facial expressions and articulator movements are rendered with dynamic-texture sequences. Head motions and facial expressions can also be controlled separately by manipulating the corresponding parameters.
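To make the "super-feature vector" idea concrete, here is a minimal sketch in NumPy. The feature dimensions and names are illustrative assumptions, not values from the research: each frame's geometry, texture, and speech features are simply stacked into one vector so a multistream model can learn their joint dynamics.

```python
import numpy as np

# Hypothetical per-frame features; dimensions are illustrative assumptions.
n_frames = 100
geometry = np.random.rand(n_frames, 30)  # e.g. 3-D shape/PCA coefficients
texture = np.random.rand(n_frames, 50)   # e.g. appearance coefficients
speech = np.random.rand(n_frames, 13)    # e.g. MFCC-like acoustic features

# Stack the three streams frame by frame into one super-feature vector
# per frame; a multistream HMM would be trained on these sequences.
super_features = np.hstack([geometry, texture, speech])
print(super_features.shape)  # (100, 93)
```

In a real system each stream would come from the reconstructed 3-D training database rather than random numbers, and the HMM would model the streams with separate observation distributions tied to shared states.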
The new 3-D talking head has many useful applications, such as voice agents, telepresence, gaming, and speech-to-speech translation.
Face recognition in video is an emerging technology that will have a great impact on user experience in fields such as television, gaming, and communication. In the near future, a television or an Xbox will be able to recognize people in the living room, home video will be annotated automatically and become searchable, and TV watchers will be able to get information about an unfamiliar actor, athlete, or singer just by pointing to the person on the screen.
Microsoft Research showcases face-recognition technology developed by iLabs that includes novel algorithms for face detection, recognition, and tracking. The research demonstrates semi-automatic labeling of videos, a novel TV-watching experience that uses faces in a video as hyperlinks to more information, and automatic recognition of the person in front of the television, Xbox, or computer.
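The recognition step in such a pipeline is typically a nearest-neighbor match between a detected face's feature vector and a gallery of labeled references. The sketch below is a toy illustration of that idea, not the iLabs algorithm: embeddings are random stand-ins, and the identity names and threshold are hypothetical.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical gallery of labeled face embeddings (random stand-ins).
rng = np.random.default_rng(0)
gallery = {name: rng.normal(size=128) for name in ["actor_a", "athlete_b"]}

def recognize(face_embedding, gallery, threshold=0.5):
    # Return the best-matching identity, or None if nothing is close enough.
    best_name, best_score = None, threshold
    for name, ref in gallery.items():
        score = cosine_sim(face_embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# A query embedding near "actor_a" should match that identity.
query = gallery["actor_a"] + rng.normal(scale=0.1, size=128)
print(recognize(query, gallery))
```

A production system would obtain the embeddings from a trained face-recognition model and track faces across frames so each person is only matched once per shot.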
In the video, Microsoft Research demonstrates an easy-to-use system for creating photorealistic, 3-D-image-based models simply by walking around an object of interest with your phone, still camera, or video camera. The objects might be your custom car or motorcycle, a wedding cake or dress, a rare musical instrument, or a hand-crafted artwork. The system uses 3-D stereo matching techniques combined with image-based modeling and rendering to create a photorealistic model you can navigate simply by spinning it around on your screen, tablet, or mobile device.
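At the heart of stereo matching is finding, for each patch in one view, its horizontal offset (disparity) in a second view. The following is a minimal block-matching sketch under simplifying assumptions (rectified grayscale images, brute-force sum-of-absolute-differences search); the actual system would use far more sophisticated multi-view matching.

```python
import numpy as np

def disparity_map(left, right, patch=5, max_d=8):
    # For each patch in the left image, find the horizontal shift d that
    # minimizes the sum of absolute differences (SAD) against the right image.
    h, w = left.shape
    half = patch // 2
    disp = np.zeros((h, w), dtype=int)
    for y in range(half, h - half):
        for x in range(half + max_d, w - half):
            ref = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [
                np.abs(ref - right[y - half:y + half + 1,
                                   x - d - half:x - d + half + 1]).sum()
                for d in range(max_d + 1)
            ]
            disp[y, x] = int(np.argmin(costs))
    return disp

# Synthetic check: shift a random texture 3 pixels, so the true disparity is 3.
rng = np.random.default_rng(1)
left = rng.random((20, 40))
right = np.roll(left, -3, axis=1)
d = disparity_map(left, right)
print(d[10, 20])  # 3
```

Disparity maps like this, together with known camera motion from walking around the object, give the depth estimates that image-based modeling and rendering turn into a navigable 3-D model.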