Here’s How Apple Is Making Siri Sound More Human

With iOS 11, Apple is giving its digital voice assistant Siri a significantly improved new voice. The new voice, which Apple hopes will encourage people to engage with Siri more, sounds considerably more natural and human than it did in iOS 9 or iOS 10. Alex Acero, the Apple executive in charge of the technology behind Siri, has explained how the company is making Siri sound more human (via Wired).

“For iOS 11, we chose a new female voice talent with the goal of improving the naturalness, personality and expressivity of Siri’s voice,” said Acero. Apple evaluated hundreds of candidates before choosing one, then had the winning voice actress record more than 20 hours of speech. That recording was sliced into its elementary components, which can then be recombined to create entirely new speech.

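Stitching recorded fragments together can sound robotic, though. To make Siri sound more human, Apple turned to deep learning and built a system that can “accurately predict both target and concatenation” costs for the half-phones in its database: how well each unit matches the sound needed at a given point (the target cost), and how smoothly it joins the units on either side (the concatenation cost).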
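Choosing recorded units by target and concatenation cost is known as unit selection. The sketch below is a generic textbook version of that search, not Apple's implementation: each position in an utterance has a few candidate units, a hypothetical `target_cost` measures how far a unit is from the desired pitch, a hypothetical `concat_cost` measures the pitch jump at a join, and a Viterbi-style dynamic program finds the cheapest overall sequence. In Apple's system the costs are predicted by a neural network; here they are plain absolute differences over invented `pitch`, `start`, and `end` values.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical unit description: a mid-unit pitch plus the pitch at each edge (Hz).
Unit = Dict[str, float]


def target_cost(unit: Unit, target_pitch: float) -> float:
    """How far a candidate unit is from the sound wanted at this position."""
    return abs(unit["pitch"] - target_pitch)


def concat_cost(left: Unit, right: Unit) -> float:
    """How audible the join between two consecutive units would be."""
    return abs(left["end"] - right["start"])


def select_units(
    candidates: Sequence[Sequence[Unit]], targets: Sequence[float]
) -> Tuple[List[int], float]:
    """Viterbi search: pick one candidate per position minimizing total cost."""
    # best[i][j] = cheapest cost of any path that ends at candidates[i][j]
    best = [[target_cost(u, targets[0]) for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row, ptr = [], []
        for unit in candidates[i]:
            # Accumulated cost of reaching `unit` from each possible predecessor.
            via = [
                best[i - 1][k] + concat_cost(prev, unit)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_best = min(range(len(via)), key=via.__getitem__)
            row.append(via[k_best] + target_cost(unit, targets[i]))
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # Start from the cheapest final unit and follow the back-pointers.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for i in range(len(targets) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, total


candidates = [
    [{"pitch": 104, "start": 100, "end": 140}, {"pitch": 108, "start": 100, "end": 110}],
    [{"pitch": 112, "start": 110, "end": 112}, {"pitch": 118, "start": 140, "end": 118}],
]
print(select_units(candidates, [105, 112]))  # ([1, 0], 3)
```

With these toy numbers, picking the best-matching unit at each position on its own would choose the first candidate twice and pay a 30 Hz jump at the join (total cost 31). The search instead accepts a slightly worse first unit to get a seamless join, for a total cost of 3, which is the trade-off that weighing target and concatenation costs together is meant to capture.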

Apple explains why that matters: “The benefit of this approach becomes more clear when we consider the nature of speech. Sometimes the speech features, such as formants, are rather stable and evolve slowly, such as in the case of vowels. Elsewhere, speech can change quite rapidly, such as in transitions between voiced and unvoiced speech sounds. To take this variability into account, the model needs to be able to adjust its parameters according to the aforementioned variability.”
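One way to picture a model that adjusts to this variability is a cost that depends on a predicted spread as well as a predicted value. In this minimal sketch, which is an illustrative assumption rather than Apple's code, a model is assumed to output a mean and a standard deviation for a speech feature, and a candidate is scored by its Gaussian negative log-likelihood. The formant values and spreads are invented: the same 20 Hz miss is penalized heavily where the predicted spread is tight, as in a steady vowel, and only lightly where it is wide, as in a voiced/unvoiced transition.

```python
import math


def gaussian_cost(observed: float, mean: float, std: float) -> float:
    """Negative log-likelihood of `observed` under a Gaussian N(mean, std**2).

    A model that predicts `std` along with `mean` can tell the search how much
    deviation to tolerate at each point in the utterance.
    """
    var = std ** 2
    return 0.5 * math.log(2 * math.pi * var) + (observed - mean) ** 2 / (2 * var)


# Hypothetical formant values (Hz): same 20 Hz miss, different predicted spread.
stable = gaussian_cost(520.0, 500.0, std=5.0)       # steady vowel: tight prediction
transition = gaussian_cost(520.0, 500.0, std=50.0)  # voiced/unvoiced change: loose
print(stable > transition)  # True
```

A fixed-weight distance would treat both cases identically; letting the predicted spread scale the penalty is what allows the cost to be strict in slowly evolving regions and forgiving where speech changes quickly.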

We’ll all be able to hear the new and improved Siri when iOS 11 is released later this month.