A team of Apple researchers reports in a paper recently published on arXiv.org that its new approach to selecting training data for Siri’s domain classifier substantially reduces the virtual assistant’s errors when understanding whether a person’s command relates to, say, their calendar rather than their alarms.
According to the researchers, Siri uses a classifier called the “Domain Chooser” to identify a given user’s intent. Once an utterance is matched to one of more than 60 defined domains, a component called the Statistical Parser assigns a parse label to each part of the utterance.
The domain and parse labels predicted by the Domain Chooser and Statistical Parser are then “mapped into an intent representation that kicks off the appropriate action.”
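The flow described above (pick a domain, label each part of the utterance, then combine both into an intent) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the keyword lists, the label names, and the functions `choose_domain`, `parse_tokens` and `to_intent` are hypothetical, and simple keyword matching stands in for the trained models the paper describes. It is not Apple's implementation.

```python
from dataclasses import dataclass

# Hypothetical keyword lists standing in for a trained domain classifier.
DOMAIN_KEYWORDS = {
    "calendar": {"meeting", "calendar", "appointment", "schedule"},
    "alarm": {"alarm", "wake"},
}


@dataclass
class Intent:
    """Toy intent representation: a domain plus per-token parse labels."""
    domain: str
    parse: list  # list of (token, label) pairs


def choose_domain(utterance: str) -> str:
    """Toy 'domain chooser': pick the domain with the most keyword hits."""
    tokens = utterance.lower().split()
    scores = {
        domain: sum(token in keywords for token in tokens)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def parse_tokens(utterance: str, domain: str) -> list:
    """Toy 'parser': assign one (made-up) label to each token."""
    labelled = []
    for token in utterance.lower().split():
        if token in DOMAIN_KEYWORDS.get(domain, set()):
            labelled.append((token, "domain_term"))
        elif any(ch.isdigit() for ch in token):
            labelled.append((token, "time"))
        else:
            labelled.append((token, "other"))
    return labelled


def to_intent(utterance: str) -> Intent:
    """Map the predicted domain and parse labels into an intent."""
    domain = choose_domain(utterance)
    return Intent(domain=domain, parse=parse_tokens(utterance, domain))


if __name__ == "__main__":
    print(to_intent("Set an alarm for 7am"))
    print(to_intent("Schedule a meeting at 3pm"))
```

The sketch also hints at why the quality of the domain classifier's training data matters: a wrong domain choice in the first step sends every later step down the wrong path.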
“In this paper, we have proposed a simple but effective method for efficient discovery of useful training data for a domain chooser classifier, as part of [Siri],” the researchers wrote. “The method produces … better quality data … [which] reduces the time taken for human annotation … Although developed and tested in the setting of a commercial intelligence assistant, the technique is widely applicable.”
The researchers have also compiled a corpus consisting of 850,000 randomly selected utterances from a development set previously used to debug Siri, plus 20,000 utterances of labelled training data obtained with their method. The full details are in the paper on arXiv.org.