Microsoft has taken another step toward highly accurate speech recognition software. The new version achieved a word error rate of 5.9% – about the same as that of human transcribers.
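For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition only; it is not Microsoft's scoring pipeline. The function name and sample sentences are made up for illustration, and real evaluations typically normalize casing and punctuation first, which is skipped here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # reference word missing (deletion)
                curr[j - 1] + 1,                  # extra hypothesis word (insertion)
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one
# deletion (the second "the") against a six-word reference.
rate = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{rate:.1%}")
```

By this definition, a 5.9% rate corresponds to roughly six word errors for every hundred words in the reference transcript.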
The software relies on deep neural networks, a technology loosely inspired by how the human brain processes information, as well as specialized graphics processing units (GPUs) that allow the software to learn at speeds not previously possible.
The milestone has far-reaching implications. On a practical level, it means that Microsoft’s products could soon be a whole lot better at understanding humans. The researchers name Microsoft’s personal assistant app Cortana and the Xbox as two products that could immediately benefit from the research. Accessibility software, such as instant transcription services, could also benefit from the advancement.
The technology could also easily be incorporated into Microsoft’s productivity tools like Office (imagine how much better Word’s dictation feature would be with near-human levels of accuracy) or its enterprise offerings.
Consumer products aside, the result also marks a turning point for AI research. In a statement, Geoffrey Zweig, from Microsoft’s Speech and Dialog research group, notes that the next phase is to help build software that can not only transcribe human speech but also understand it. Though that goal is much further away, being able to accurately transcribe human speech is a big step forward.