Comparison of Non-parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing

Shreyas Seshadri, Ulpu Remes, Okko Räsänen

    Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceedings; Scientific; peer-review

    Abstract

    Zero-resource speech processing (ZS) systems aim to learn structural representations of speech without access to labeled data. One starting point for these systems is the extraction of syllable tokens based on the rhythmic structure of the speech signal. Several recent ZS systems have therefore focused on clustering such syllable tokens into linguistically meaningful units. So far, these systems have used a heuristically set number of clusters, which can be highly dataset-dependent and cannot be optimized in truly unsupervised settings. This paper focuses on improving the flexibility of ZS systems using Bayesian non-parametric (BNP) mixture models, which can simultaneously learn the cluster models and the number of clusters from the properties of the dataset. We also compare different model design choices, namely the priors over the mixture weights and the cluster component models, as the impact of these choices is rarely reported in previous studies. Experiments are conducted using conversational speech from several languages. The models are first evaluated on a standalone syllable clustering task and then as part of a full ZS system, in order to examine the potential of BNP methods and to illuminate the relative importance of the different model design choices.
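    The key property claimed for BNP mixtures is that the prior over the mixture weights lets the data decide how many clusters are actually used. As an illustration only, the sketch below draws truncated stick-breaking weights from a Pitman-Yor process prior (one of the weight priors named in the keywords); setting the discount to zero recovers the Dirichlet process. The function name `pitman_yor_weights`, the truncation level and the parameter values are hypothetical choices for this example. It is not the authors' implementation and leaves out the cluster component models (e.g. von Mises-Fisher) and the variational inference used to fit them.

```python
import random


def pitman_yor_weights(alpha, discount, truncation, rng=None):
    """Hypothetical helper: draw truncated stick-breaking weights from a
    Pitman-Yor process prior.

    alpha:      concentration parameter (restricted to > 0 in this sketch)
    discount:   discount d in [0, 1); d = 0 gives a Dirichlet process
    truncation: number of sticks K kept, as in truncated variational schemes
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive in this sketch")
    rng = rng or random.Random()
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for k in range(1, truncation + 1):
        if k == truncation:
            v = 1.0  # last stick takes all remaining mass so weights sum to 1
        else:
            # Pitman-Yor stick proportion: V_k ~ Beta(1 - d, alpha + k * d)
            v = rng.betavariate(1.0 - discount, alpha + k * discount)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


# Usage: compare a Dirichlet process (d = 0) with a Pitman-Yor prior (d = 0.5).
rng = random.Random(0)
dp = pitman_yor_weights(alpha=1.0, discount=0.0, truncation=50, rng=rng)
py = pitman_yor_weights(alpha=1.0, discount=0.5, truncation=50, rng=rng)
print(round(sum(dp), 6), round(sum(py), 6))  # → 1.0 1.0
```

    With a positive discount, the expected weights decay as a power law rather than geometrically as under the Dirichlet process, so more components tend to keep non-negligible mass. This is one plausible reason why the choice of weight prior can affect the number of clusters a model ends up using.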
    Original language: English
    Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
    Publisher: International Speech Communication Association (ISCA)
    Pages: 2744-2748
    Number of pages: 5
    Volume: 2017-August
    ISBN (Print): 978-1-5108-4876-4
    Publication status: Published - Aug 2017
    MoE publication type: A4 Conference publication
    Event: Interspeech - Stockholm, Sweden
    Duration: 20 Aug 2017 to 24 Aug 2017
    Conference number: 18
    http://www.interspeech2017.org/

    Publication series

    Name: Interspeech: Annual Conference of the International Speech Communication Association
    ISSN (Electronic): 1990-9772

    Conference

    Conference: Interspeech
    Country/Territory: Sweden
    City: Stockholm
    Period: 20/08/2017 to 24/08/2017

    Keywords

    • Non-parametric clustering
    • zero-resource processing
    • variational inference
    • Pitman-Yor process
    • von Mises-Fisher mixtures
