TY - JOUR
T1 - An online model for vowel imitation learning
AU - Rasilo, Heikki
AU - Räsänen, Okko
PY - 2017/2/1
Y1 - 2017/2/1
N2 - When infants learn to pronounce speech sounds of their native language, they face the so-called correspondence problem – how to know which articulatory gestures lead to acoustic sounds that are recognized as native speech sounds by other speakers? Previous research suggests that infants might not learn to imitate their parents via autonomous babbling because direct evaluation of the acoustic similarity between the speech sounds of the two is not possible due to different spectral characteristics of the voices caused by differing vocal tract morphologies. We present a novel robust model of infant vowel imitation learning, following a hypothesis that an infant learns to match their productions to their caregiver's speech sounds when the caregiver imitates the infant's babbles. Adapting a cross-situational associative learning technique, evidently present in infant word learning, our simulated language learner can cope with ambiguity in caregiver's responses to babbling as well as with the imprecision of the articulatory gestures of the infant itself. Our fully online learning model also combines vocal exploration and imitative interaction into a single process. Learning performance is evaluated in experiments using Finnish adults as caregivers for a virtual infant, responding to the infant's babbles with lexical words and, after a learning stage, evaluating the quality of the vowels produced by the learner. After 1000 babble-response pairs, our virtual infant is seen to reach a satisfying vowel imitation accuracy of 70–80 %.
AB - When infants learn to pronounce speech sounds of their native language, they face the so-called correspondence problem – how to know which articulatory gestures lead to acoustic sounds that are recognized as native speech sounds by other speakers? Previous research suggests that infants might not learn to imitate their parents via autonomous babbling because direct evaluation of the acoustic similarity between the speech sounds of the two is not possible due to different spectral characteristics of the voices caused by differing vocal tract morphologies. We present a novel robust model of infant vowel imitation learning, following a hypothesis that an infant learns to match their productions to their caregiver's speech sounds when the caregiver imitates the infant's babbles. Adapting a cross-situational associative learning technique, evidently present in infant word learning, our simulated language learner can cope with ambiguity in caregiver's responses to babbling as well as with the imprecision of the articulatory gestures of the infant itself. Our fully online learning model also combines vocal exploration and imitative interaction into a single process. Learning performance is evaluated in experiments using Finnish adults as caregivers for a virtual infant, responding to the infant's babbles with lexical words and, after a learning stage, evaluating the quality of the vowels produced by the learner. After 1000 babble-response pairs, our virtual infant is seen to reach a satisfying vowel imitation accuracy of 70–80 %.
KW - Associative algorithm
KW - Correspondence problem
KW - Imitation
KW - Normalization problem
KW - Speech acquisition
KW - Vowel learning
KW - Weakly supervised learning
UR - http://www.scopus.com/inward/record.url?scp=84995813059&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2016.10.010
DO - 10.1016/j.specom.2016.10.010
M3 - Article
AN - SCOPUS:84995813059
SN - 0167-6393
VL - 86
SP - 1
EP - 23
JO - Speech Communication
JF - Speech Communication
ER -