This thesis examines the cognitive and probabilistic nature of prominence perception in speech. In recent years, a growing number of studies in linguistics, phonetics, and neuroscience have provided evidence that (i) prominence is related to attention- and expectation-based factors, (ii) frequency and predictability effects play an important role in language processing, accounting for several linguistic phenomena, and (iii) the human brain represents information probabilistically, with humans behaving as optimal probabilistic observers. On the basis of this evidence, the thesis explores the relationship between prominence, attention, and predictability. It proposes the hypothesis that prominence perception in speech is connected with the unpredictability of prosodic features that draw the listeners' attention to the surprising aspects of the input. The thesis consists of a series of computational and behavioral studies that investigate different aspects of the prominence–attention–predictability triad. The core idea throughout is to investigate the probabilistic relations in the acoustic prosodic domain through statistical modeling of the acoustic correlates of prominence, examining how they relate to the concurrent prominent and non-prominent units. Because the probabilistic view of prominence also implies that listeners use some type of statistical learning mechanism operating at the suprasegmental acoustic prosodic level, a number of behavioral experiments are also conducted. These experiments aim to determine whether, and to what extent, human listeners are sensitive to the statistical regularities of suprasegmental speech acoustics. A basic application of statistical models to the automatic detection of prominence in speech is also reported.
Together, these studies show that predictability at the acoustic prosodic level is strongly correlated with human listeners' perception of prominence in speech. This statistical connection is not fixed, however: it depends on the listeners' experience with the language and thereby on their subjective expectations of prosodic outcomes. This is illustrated by results showing that the human perceptual system appears to adapt quickly to the suprasegmental probabilistic structure of the incoming speech, so that prosodic patterns that are less frequent in the recent, discourse-specific acoustics are perceived as more prominent. The experiments thus point to a type of statistical learning mechanism operating at the suprasegmental acoustic level. Finally, a practical application of the predictability framework to the unsupervised detection of prominence in speech is described. Experiments in several languages show that the method agrees closely with human judgments of prominence even though the detector has no access to prominence labels during training.
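The link between low predictability and prominence can be illustrated with a toy calculation. The sketch below is not the method used in the thesis: it assumes, purely for illustration, that each syllable has already been reduced to a single quantized pitch label, and it scores each label by its surprisal (negative log probability) under smoothed unigram frequencies. The function name `surprisal_scores`, the smoothing constant `alpha`, and the example labels are all hypothetical.

```python
import math
from collections import Counter


def surprisal_scores(symbols, alpha=1.0):
    """Return -log2 P(symbol) for every item in `symbols`.

    Probabilities are add-alpha smoothed unigram frequencies estimated
    from the same sequence (an illustrative assumption, not the thesis model).
    """
    counts = Counter(symbols)
    vocab_size = len(counts)
    total = len(symbols)
    denom = total + alpha * vocab_size
    return [-math.log2((counts[s] + alpha) / denom) for s in symbols]


# Hypothetical toy input: one quantized pitch label per syllable.
labels = ["low", "low", "mid", "low", "high", "low", "mid", "low"]
scores = surprisal_scores(labels)

# The least predictable syllable is the candidate for prominence.
most_surprising = max(range(len(labels)), key=scores.__getitem__)
print(labels[most_surprising], round(scores[most_surprising], 3))
```

In this toy sequence the rare label receives the highest surprisal, which mirrors the claim that less frequent prosodic patterns tend to be heard as more prominent. A realistic model would use continuous acoustic features and context-dependent probabilities rather than unigram counts.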
|Status||Published - 2017|
|MoE publication type||G5 Doctoral dissertation (article)|