To quench the Kinect thirst, Microsoft Research recently released an 8-page research paper, entitled "Real-Time Human Pose Recognition in Parts from a Single Depth Image", to be presented at the IEEE Conference on Computer Vision and Pattern Recognition in June. The paper reveals a lot of interesting facts, science and data behind Kinect's algorithms.
One area the research team focused on was per-frame initialization. That is, the system estimates the pose from each frame on its own, so it can work without a lengthy "set-up" phase for the user.
The success of their work is of course a key component of Kinect -- anyone can hop in to play at any time.
As part of its development, the team also collected a database of around 500,000 frames of motion capture data of simulated poses in entertainment scenarios such as driving, dancing, kicking, running and navigating menus.
From that, they reduced the dataset to around 100,000 more distinct poses, on which the system was trained to estimate body parts. As an indication of just how computationally intensive the development process really was, "training 3 (decision) trees to depth 20 from 1 million images takes about a day on a 1000 core cluster".
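The post does not say how the 500,000 frames were cut down to 100,000 poses. As a rough illustration only, here is a minimal Python sketch of one common way to prune near-duplicate poses: greedy farthest-point selection. The function names, the pose representation (a list of 3D joint positions) and the distance measure (largest per-joint distance) are all assumptions made for this example, not details taken from the paper.

```python
import math

def pose_distance(a, b):
    """Largest Euclidean distance between corresponding joints of two poses.

    A pose is assumed to be a list of (x, y, z) joint positions (hypothetical format).
    """
    return max(math.dist(p, q) for p, q in zip(a, b))

def select_distinct_poses(poses, k):
    """Greedily pick up to k pose indices, each time taking the pose that is
    farthest from everything chosen so far (illustrative, not Kinect's code)."""
    if not poses or k <= 0:
        return []
    chosen = [0]  # start from the first pose
    # nearest[i] = distance from pose i to its closest already-chosen pose
    nearest = [pose_distance(poses[0], p) for p in poses]
    while len(chosen) < min(k, len(poses)):
        idx = max(range(len(poses)), key=nearest.__getitem__)
        if nearest[idx] == 0.0:
            break  # only exact duplicates of chosen poses remain
        chosen.append(idx)
        # Update each pose's distance to the chosen set with the new pick.
        for i, p in enumerate(poses):
            d = pose_distance(poses[idx], p)
            if d < nearest[i]:
                nearest[i] = d
    return chosen

# Tiny made-up example: five single-joint "poses", two of them identical.
poses = [[(0, 0, 0)], [(0, 0, 0)], [(1, 0, 0)], [(0.1, 0, 0)], [(5, 0, 0)]]
selected = select_distinct_poses(poses, 3)
```

With this toy input the duplicate pose is never picked and the near-duplicate is picked last, which is the behaviour you would want when trimming a large motion capture set down to its most varied frames. The loop costs on the order of k times the number of poses, so a real dataset of this size would need a vectorized or approximate version.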
The system runs at 200 frames per second on consumer hardware.
Here's the full paper:
In the video below, Jamie Shotton, one of the inventors of Human Skeletal Tracking, chats about this great invention: