In agglutinative languages, such as Finnish, a single word can have a large number of possible inflected and derived forms. To process such words, the human brain must recognize regularities in subword structure. Study IV observed that brain responses to linguistic stimuli are related to fine-grained predictions of the language input, at least at the syllable level. Studies I-III tested quantitative models for describing the relationship between subword structure and the responses related to human word processing.
Statistical machine-learning models developed for automated applications in Natural Language Processing have proven useful for describing morphological regularities in languages. In this thesis, these models are applied to human word processing.
Visual word recognition evokes a distinct pattern of neural responses that can be functionally, temporally and spatially separated using magnetoencephalography (MEG). In Study I, these responses were linked to language models describing different levels of linguistic abstraction. The early occipital and occipito-temporal responses could be modeled using visual and orthographic features, whereas the responses in the bilateral temporal areas were best described by models that represented words as compositions of morphemic units or as whole words.
In the statistical model of morphology used in these studies, the subword structure emerges from optimization of information representation. The structure is determined by the cost of storing distinct morphemic units and the cost of combining them. Study III found that the best-performing model for describing eye movements used compositions of morphemic segments to represent many, but not all, complex words; many words were kept intact. The optimal morphemes were generally more coarse-grained than those implied by linguistic analysis. In Study II, the morphemes from the optimal statistical model were compared to linguistic morphemes in a neural decoding task in which words were identified from the cortical responses. Both statistically and linguistically structured models were successful in the decoding task.
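The trade-off between the cost of storing morphemic units and the cost of combining them can be illustrated with a minimal sketch. It is a simplified two-part description-length calculation, not the cost function used in the thesis: the function name `description_length`, the uniform per-letter code, the unigram morph model, and the toy Finnish word list are all assumptions made for illustration.

```python
import math
from collections import Counter


def description_length(segmentations):
    """Return (lexicon_cost, corpus_cost) in bits for a segmented word list.

    segmentations: dict mapping each word to its list of morphs.
    Both cost terms are illustrative assumptions, not the thesis model.
    """
    # Count how often each morph is used across all words.
    morph_counts = Counter(
        morph for morphs in segmentations.values() for morph in morphs
    )
    total_tokens = sum(morph_counts.values())

    # Storage cost: spell out each distinct morph once, letter by letter,
    # with a uniform code over the alphabet plus an end-of-morph symbol.
    alphabet = {ch for morph in morph_counts for ch in morph}
    bits_per_char = math.log2(len(alphabet) + 1)
    lexicon_cost = sum((len(m) + 1) * bits_per_char for m in morph_counts)

    # Combination cost: encode every morph token under a unigram model.
    corpus_cost = -sum(
        count * math.log2(count / total_tokens)
        for count in morph_counts.values()
    )
    return lexicon_cost, corpus_cost


# Toy data (hypothetical): the same six words stored whole or segmented.
whole = description_length({
    "talo": ["talo"], "talossa": ["talossa"], "talot": ["talot"],
    "auto": ["auto"], "autossa": ["autossa"], "autot": ["autot"],
})
segmented = description_length({
    "talo": ["talo"], "talossa": ["talo", "ssa"], "talot": ["talo", "t"],
    "auto": ["auto"], "autossa": ["auto", "ssa"], "autot": ["auto", "t"],
})

print("whole words: lexicon %.1f bits, corpus %.1f bits" % whole)
print("segmented:   lexicon %.1f bits, corpus %.1f bits" % segmented)
```

In this toy case, segmentation shrinks the storage cost because stems and suffixes are stored only once, while the combination cost grows because each word requires more units. A model that minimizes the sum segments a word only when the storage saving outweighs the extra combination cost, which is one way to see why some words may be kept intact.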
The results of this thesis suggest that the neural responses to words are related to word representation by compositions of morphemic units. The units may not be strictly linguistically determined; instead, the word structures can reflect the statistical regularities of the language environment. This thesis demonstrates that quantitative modeling of cortical responses is useful for describing even relatively abstract linguistic phenomena such as morphology.
|Publication status|Published - 2020|
|MoE publication type|G5 Doctoral dissertation (article)|
- Keywords: MEG, morphology, computational linguistics, Morfessor, word recognition