In its latest Machine Learning Journal entry, Apple explains the technical approach and implementation details behind the “Hey Siri” trigger phrase, which lets users access Siri when their hands are otherwise occupied. Apple specifically highlights the concept of “speaker recognition,” which is designed to wake Siri only when the device’s main user says “Hey Siri.”
Apple notes that speaker recognition is why people are asked to repeat “Hey Siri” several times when they set up the assistant for the first time. “We are interested in ‘who is speaking,’ as opposed to the problem of speech recognition, which aims to ascertain ‘what was spoken,’” notes Apple, adding that the overall goal of speaker recognition is to ascertain the identity of a person using his or her voice.
Apple says it measures the performance of a speaker recognition system as “a combination of an Imposter Accept (IA) rate and a False Reject (FR) rate”.
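To make the two error rates concrete, here is a minimal, hypothetical sketch of how an Imposter Accept (IA) rate and a False Reject (FR) rate could be computed from similarity scores at a given decision threshold. The scores, threshold, and function names are illustrative assumptions, not Apple’s actual implementation.

```python
def ia_fr_rates(imposter_scores, genuine_scores, threshold):
    """Return (IA rate, FR rate) for a speaker-recognition threshold.

    IA rate: fraction of imposter utterances wrongly accepted (score >= threshold).
    FR rate: fraction of genuine utterances wrongly rejected (score < threshold).
    """
    ia = sum(s >= threshold for s in imposter_scores) / len(imposter_scores)
    fr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return ia, fr

# Illustrative scores: raising the threshold lowers IA but raises FR.
imposters = [0.1, 0.4, 0.6, 0.2]
genuine = [0.7, 0.9, 0.5, 0.8]
print(ia_fr_rates(imposters, genuine, 0.55))  # -> (0.25, 0.25)
```

The trade-off is the point of measuring both rates together: a stricter threshold admits fewer imposters but also rejects the real user more often.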
“The application of a speaker recognition system involves a two-step process: enrollment and recognition. During the enrollment phase, the user is asked to say a few sample phrases. These phrases are used to create a statistical model for the user’s voice. In the recognition phase, the system compares an incoming utterance to the user-trained model and decides whether to accept that utterance as belonging to the existing model or reject it.”
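The enrollment/recognition flow described in the quote can be sketched in a few lines. This is an assumption-laden illustration, not Apple’s method: it assumes each utterance has already been reduced to a fixed-length feature vector, builds the “statistical model” as a simple average of the enrollment vectors, and accepts a new utterance if its cosine similarity to that profile clears a made-up threshold.

```python
import math

def enroll(sample_vectors):
    """Enrollment: average a few sample utterances into a speaker profile."""
    n = len(sample_vectors)
    dim = len(sample_vectors[0])
    return [sum(v[i] for v in sample_vectors) / n for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recognize(profile, utterance_vector, threshold=0.8):
    """Recognition: accept the utterance if it is close enough to the profile."""
    return cosine(profile, utterance_vector) >= threshold

# Enroll from three sample phrases, then test a new utterance.
profile = enroll([[1.0, 0.2], [0.9, 0.3], [1.1, 0.25]])
print(recognize(profile, [1.0, 0.25]))  # matches the profile -> True
```

A real system would use a learned speaker model rather than a plain average, but the accept/reject decision against a per-user profile is the same shape as the process the article describes.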
You can read the lengthy article in its entirety at this link.