Microsoft on Tuesday shared more details about Project Brainwave, its new deep learning acceleration platform built on customizable chips -- a major leap forward in both performance and flexibility for cloud-based serving of deep learning models, moving Azure toward becoming an 'AI cloud.'
"Real-time AI is becoming increasingly important as cloud infrastructures process live data streams, whether they be search queries, videos, sensor streams, or interactions with users," writes Microsoft.
Because the Project Brainwave system is designed for real-time AI, it processes requests as fast as it receives them, with ultra-low latency. The system is built in three main layers: "a high-performance, distributed system architecture, a hardware DNN engine synthesized onto FPGAs, and a compiler and runtime for low-friction deployment of trained models."
The Brainwave system was ported to Intel's new 14 nm Stratix 10 FPGA. Even on early Stratix 10 silicon, it ran a large GRU model five times larger than ResNet-50 "with no batching, and achieved record-setting performance," writes the company.
The demo used Microsoft's customized 8-bit floating point format ("ms-fp8"), which, on average, suffers no accuracy loss across a range of models.
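Microsoft has not published ms-fp8's exact bit layout, but the trade-off a narrow float format exploits -- giving up mantissa precision to keep dynamic range -- can be illustrated with a generic 8-bit float. The sketch below assumes a hypothetical 1-sign / 5-exponent / 2-mantissa split; the field widths, function names, and values are illustrative only, not Microsoft's actual format:

```python
import math

# Hypothetical 8-bit float layout: 1 sign bit, 5 exponent bits, 2 mantissa bits.
# (ms-fp8's real layout is unpublished; this only illustrates the idea.)
SIGN_BITS, EXP_BITS, MAN_BITS = 1, 5, 2
BIAS = (1 << (EXP_BITS - 1)) - 1  # exponent bias = 15

def encode_fp8(x: float) -> int:
    """Pack a Python float into the hypothetical 8-bit layout (round-to-nearest)."""
    if x == 0.0:
        return 0
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))            # abs(x) = m * 2**e, with 0.5 <= m < 1
    e -= 1                               # renormalize to 1.m * 2**e
    m = m * 2 - 1                        # fractional part in [0, 1)
    frac = round(m * (1 << MAN_BITS))
    if frac == (1 << MAN_BITS):          # rounding overflowed the mantissa
        frac, e = 0, e + 1
    e_field = max(0, min((1 << EXP_BITS) - 1, e + BIAS))  # clamp; no subnormals/inf here
    return (sign << 7) | (e_field << MAN_BITS) | frac

def decode_fp8(b: int) -> float:
    """Unpack the hypothetical 8-bit layout back into a Python float."""
    if b & 0x7F == 0:                    # treat all-zero exponent+mantissa as zero
        return 0.0
    sign = -1.0 if b & 0x80 else 1.0
    e_field = (b >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    frac = b & ((1 << MAN_BITS) - 1)
    return sign * (1 + frac / (1 << MAN_BITS)) * 2.0 ** (e_field - BIAS)

# Round-tripping loses precision but stays within the mantissa's error bound:
x = 0.7321
q = decode_fp8(encode_fp8(x))            # ≈ 0.75 with a 2-bit mantissa
```

With only two mantissa bits the worst-case relative error is large for a single value, which is why such formats rely on averaging effects across millions of weights and activations to preserve model accuracy.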
"We showed Stratix 10 sustaining 39.5 Teraflops on this large GRU, running each request in under one millisecond. At that level of performance, the Brainwave architecture sustains execution of over 130,000 compute operations per cycle, driven by one macro-instruction being issued each 10 cycles. Running on Stratix 10, Project Brainwave thus achieves unprecedented levels of demonstrated real-time AI performance on extremely challenging models," explains Microsoft.
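The quoted figures can be cross-checked with simple arithmetic: dividing the sustained 39.5 teraflops by 130,000 operations per cycle implies the clock rate the FPGA would need to sustain. The derived clock below is an inference from the quote, not a number Microsoft states:

```python
# Sanity-check the quoted Brainwave numbers.
teraflops = 39.5e12            # sustained operations per second (from the quote)
ops_per_cycle = 130_000        # compute operations per cycle (from the quote)

# Implied FPGA clock: ops/second divided by ops/cycle gives cycles/second.
clock_hz = teraflops / ops_per_cycle          # ≈ 304 MHz

# "Under one millisecond" per request bounds the work done per request.
latency_s = 1e-3
ops_per_request = teraflops * latency_s       # ≈ 3.95e10 operations

print(f"implied clock: {clock_hz / 1e6:.0f} MHz")
```

The roughly 300 MHz figure is plausible for a large FPGA design, which is the point of the cross-check: the throughput claim is consistent with the per-cycle claim at a realistic clock rate.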
Microsoft further notes that as it continues tuning the system, "significant further performance improvements are expected over the next few quarters."
In the near future, this powerful real-time AI system will be brought to Azure, "so customers can benefit directly, complementing the indirect access through services such as Bing," said Microsoft.
For more information, you can see Microsoft's Hot Chips demonstration slides below:
In other related news, Microsoft researchers achieved a new milestone in conversational speech recognition: a 5.1 percent error rate, substantially surpassing the accuracy they achieved last year.
Advances in speech recognition have created services such as Speech Translator, which can translate presentations in real time for multilingual audiences. A technical report published this weekend documents the details and states:
"Team members reduced their error rate by about 12 percent from last year's accuracy level, using a series of improvements to their neural net-based acoustic and language models. They introduced an additional CNN-BLSTM (convolutional neural network combined with bidirectional long-short-term memory) model for improved acoustic modeling. Additionally, their approach to combine predictions from multiple acoustic models now does so at both the frame/senone and word levels.
They also strengthened the recognizer's language model by using the entire history of a dialog session to predict what is likely to come next, effectively allowing the model to adapt to the topic and local context of a conversation."
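As a rough illustration of the frame-level half of that combination strategy, the sketch below averages per-frame senone posteriors from two toy acoustic models and picks the best senone per frame. All names, weights, and scores here are made up for illustration; the actual system's combination is far more sophisticated and also operates at the word level:

```python
# Frame-level system combination (hypothetical sketch): average each frame's
# senone posterior distribution across several acoustic models.

def combine_frames(model_posteriors, weights=None):
    """model_posteriors: one entry per model; each entry is a list of frames,
    and each frame is a list of senone probabilities."""
    n_models = len(model_posteriors)
    weights = weights or [1.0 / n_models] * n_models  # default: equal weighting
    n_frames = len(model_posteriors[0])
    combined = []
    for t in range(n_frames):
        n_senones = len(model_posteriors[0][t])
        frame = [sum(w * m[t][s] for w, m in zip(weights, model_posteriors))
                 for s in range(n_senones)]
        combined.append(frame)
    return combined

# Two toy models over 2 frames and 3 senones; they disagree on frame 1,
# and the ensemble resolves it by weight of evidence.
model_a = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
model_b = [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2]]

combined = combine_frames([model_a, model_b])
best = [max(range(len(f)), key=f.__getitem__) for f in combined]
```

Averaging posteriors is the simplest form of frame-level combination; because each model makes somewhat independent errors, the averaged distribution is typically sharper around the correct senone than any single model's output.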
You can find the full report HERE.