Microsoft is Making Big Bets on Speech NUI with Tellme Powering 'Kinect, WP, Bing, Windows 8'

Microsoft is working hard to get technology out of your way so your experience is more natural and intuitive. "This year at the E3 Expo 2011, more than 50 years after the introduction of the remote control, the company's Xbox team introduced a new, innovative way to experience entertainment. It begins by giving TV a new voice: yours. This Xbox experience uses Kinect for Xbox 360 and speech technology from Microsoft Tellme to combine voice and gestures in a way that humanizes the power of the TV, from searching through media to interacting with games. I believe this kind of speech NUI will also change how we interact with devices of any screen size, whether they're in your pocket, bag, car or office. We'll use our voices across these varied devices to get more done, quickly and easily," wrote Zig Serafin, General Manager, Microsoft Tellme.

Microsoft is making big bets on speech NUI. Microsoft Tellme is driving that forward, powering the speech experiences in Kinect for Xbox 360, Windows Phone, Bing Mobile and Microsoft Tellme IVR. Because speech fits well with NUI across devices of all screen sizes, Microsoft Tellme is truly at the center of the NUI evolution.

Microsoft execs have been demonstrating publicly how Windows Phones currently can handle spoken queries. With Mango, Windows Phones will support even more speech functions, including speech-to-text and text-to-speech. And the Kinect sensor is going to get more sophisticated voice-command support this fall, enabling users to use Bing to search for movies, TV, music and other content via voice. But within the coming year, even more Microsoft products and services are getting the speech recognition/understanding treatment.

Windows 7 today can recognize a limited set of spoken commands, but Microsoft will take this work further with Windows 8, said Ilya Bukshteyn, Tellme Senior Director of Sales and Marketing. Windows 8 slates running on ARM and Intel chips will be able to recognize many speech commands, which makes sense given that they won't be optimized for keyboard and mouse input. And because Windows 8 is "HTML-based," the HTML5 speech tag could allow developers inside and outside Microsoft to create speech-capable applications for Windows 8, Bukshteyn added.

As the Tellme team pushes beyond speech recognition and into conversational understanding, the scenarios become even more interesting, Bukshteyn said. When CEO Steve Ballmer recently touted Bing's ability to handle complex natural-language queries, he didn't explain what would make that magic happen. It turns out to be Tellme's voice technology, combined with social-graph information delivered via Windows Live, plus Bing's search functionality. ("Windows Live is a social graph hub for Facebook, Twitter and LinkedIn," Bukshteyn explained.)

From a Tellme blog post on August 9, here's Microsoft's explanation as to what's coming with Bing/Tellme/social-graph integration:

"We see a future where the service will know you: know your intent, your social and business connections, your likes and dislikes, your privacy preferences, and the things that define the context that's important to you. The result will be a speech NUI service that helps you accomplish everyday tasks in a more natural and conversational manner. This service will simplify tasks that used to be tedious or impossible on a TV or other device, by combining an understanding of language and intent with a deep knowledge of you, the user. We envision a future where we build on the experiences we deliver today with Kinect for Xbox 360, Windows Phone, or Bing for iPad or iPhone apps, by enhancing the speech NUI experience to understand more layers of context: what you are doing, where you are doing it, the kinds of devices you are using and your historical preferences. Because this is a cloud-based service, your interactions will be able to persist over time, enabling you to pick up where you left off, regardless of what device you may be using."

The Tellme team is also planning to add support for the Tellme speech cloud to Windows Azure at some point, so that developers will be able to build and support IVR-enabled apps and services running on Azure. Tellme's speech cloud doesn't run on Azure today, and there's no firm timetable for when, or whether, Microsoft will move it there, Bukshteyn said. But the Tellme service will be available to third-party developers regardless of whether Microsoft moves Tellme itself to Azure, he said.

Microsoft has also posted a video clip highlighting how this kind of conversational understanding could work, and showed the clip during the SpeechTEK conference keynote in New York today.

Example: Say you want to meet a friend in New York for dinner next week. Maybe as soon as a couple of years from now, Microsoft officials think, you'll be able to say to your PC "arrange a dinner with Joe in Manhattan on Thursday," and Tellme will recognize the query, link to your Facebook or LinkedIn social-graph information to discern which "Joe" you most likely want to meet, compare your calendars, and use Bing to search for restaurants you have both "Liked" on Facebook.

[Source: Microsoft Press]