Microsoft’s Speech Recognition Tech Now Understands Conversations


Microsoft’s Artificial Intelligence and Research group claims in a paper published this week that its speech recognition system now transcribes conversational speech as accurately as professional human transcriptionists, marking the first time “human parity” has been reported for conversational speech. According to Network World, Microsoft’s researchers are now working on making speech recognition perform well in more real-life settings.

The researchers write in the paper: “The error rate of professional transcriptionists is 5.9% for the Switchboard portion of the data, in which newly acquainted pairs of people discuss an assigned topic, and 11.3% for the CallHome portion where friends and family members have open-ended conversations. In both cases, our automated system establishes a new state-of-the-art, and edges past the human benchmark. This marks the first time that human parity has been reported for conversational speech”.

Microsoft says that its speech recognition system now makes the same number of errors as professional human transcriptionists, or fewer. The 5.9% error rate is about equal to that of people who were asked to transcribe the same conversations, and it is the lowest ever recorded on Switchboard, a standard collection of recorded conversations and their transcripts used to test speech recognition systems.
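The article does not spell out how the error rate is calculated. Speech recognition results are conventionally reported as a word error rate: the number of word substitutions, deletions and insertions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. The sketch below assumes that metric. The function name and the sample sentences are made up for illustration and do not come from the paper or from Switchboard.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word in a ten-word reference.
reference = "we talked about the weather for most of the call"
hypothesis = "we talked about the whether for most of the call"
print(f"{word_error_rate(reference, hypothesis):.1%}")  # prints 10.0%
```

Under this definition, a 5.9% error rate corresponds to roughly six word errors for every 100 words in the reference transcript.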

The company believes the milestone will have broad implications for consumer and business products that can be “significantly augmented by speech recognition”. These include consumer entertainment devices like the Xbox, accessibility tools such as instant speech-to-text transcription and personal digital assistants such as Cortana.

For more information, head over to the Microsoft blog.
