Computational Modeling and Simulation of Language and Meaning: Similarity-Based Approaches

    Research output: Thesis / Doctoral Thesis / Collection of Articles


    This dissertation covers several similarity-based, data-driven approaches to modeling language and lexical semantics. The availability of large amounts of text data in electronic form allows the use of unsupervised, data-driven methodologies. Compared to linguistic models based on expert knowledge, which are often costly or unavailable, data-driven analysis is faster and more flexible, and the same methodologies can often be used regardless of the language. In addition, data-driven analysis may be exploratory and offer a new view on the data. The complexity of different European languages was analyzed at the syntactic and morphological levels using unsupervised methods based on compression and unsupervised morphology induction. The results showed that the unsupervised methods are able to produce useful analyses that correspond to linguistic models. Distributional word vector space models represent the meaning of a word through its textual context, that is, through the co-occurring words collected from a large corpus. The vector space models were evaluated against linguistic models and human semantic similarity judgment data. Two unsupervised methods, Independent Component Analysis and Latent Dirichlet Allocation, were able to find groups of semantically similar words that corresponded reasonably well to the evaluation sets. In addition to validating the results of the unsupervised methods with the evaluation data, the research was also exploratory. The unsupervised methods found semantic word sets not covered by the evaluation sets, and the analysis of the categories of the evaluation sets showed quality differences between the categories. In the agent simulation models, the meaning of words was directly linked to the perceived context of the agent. Each agent had a subjective conceptual memory, in which the associations between words and perceptions were formed. In a population of simulated agents, the emergence of a shared vocabulary was studied through simulated language games. As a result of the simulations, a shared vocabulary emerged in the community.
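To make the idea of a distributional word vector space concrete, the following is a minimal sketch and not the implementation used in the dissertation. It assumes a hypothetical three-sentence toy corpus, a symmetric context window of two words, and raw co-occurrence counts compared with cosine similarity; the function names `cooccurrence_vectors` and `cosine` are illustrative only.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus; a real study would use a large text collection.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
]

vecs = cooccurrence_vectors(corpus)
print("cat vs dog: ", cosine(vecs["cat"], vecs["dog"]))
print("cat vs milk:", cosine(vecs["cat"], vecs["milk"]))
```

In this toy setting, "cat" and "dog" share more context words than "cat" and "milk", so their vectors are more similar. Methods such as Independent Component Analysis or Latent Dirichlet Allocation would then operate on much larger matrices of this kind.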
    Translated title of the contribution: Kielen ja merkityksen laskennallinen mallintaminen ja simulointi: samankaltaisuuteen perustuvia menetelmiä
    Original language: English
    Qualification: Doctor's degree
    Awarding Institution
    • Aalto University
    Supervisors/Advisors
    • Oja, Erkki, Supervising Professor
    • Honkela, Timo, Thesis Advisor
    • Creutz, Mathias, Thesis Advisor
    Print ISBNs: 978-952-60-5643-2
    Electronic ISBNs: 978-952-60-5644-9
    Publication status: Published - 2014
    MoE publication type: G5 Doctoral dissertation (article)


    Keywords
    • lexical semantics
    • language
    • meaning
    • computational modeling
    • vector space models
    • language complexity
    • agent simulation
    • unsupervised learning
    • machine learning

