Unsupervised discovery of recurring speech patterns using probabilistic adaptive metrics

Okko Räsänen, María Andrea Cruz Blandón

    Research output: Chapter in book/conference proceedings › Conference article in proceedings › Scientific › peer-reviewed

    20 Citations (Scopus)
    164 Downloads (Pure)

    Abstract

    Unsupervised spoken term discovery (UTD) aims at finding recurring segments of speech from a corpus of acoustic speech data. One potential approach to this problem is to use dynamic time warping (DTW) to find well-aligning patterns from the speech data. However, automatic selection of initial candidate segments for the DTW alignment, and detection of “sufficiently good” alignments among them, require some type of predefined criteria, often operationalized as threshold parameters for pair-wise distance metrics between signal representations. In existing UTD systems, the optimal hyperparameters may differ across datasets, limiting their applicability to new corpora and truly low-resource scenarios. In this paper, we propose a novel probabilistic approach to DTW-based UTD named PDTW. In PDTW, distributional characteristics of the processed corpus are utilized for adaptive evaluation of alignment quality, thereby enabling systematic discovery of pattern pairs whose similarity exceeds what would be expected by coincidence. We test PDTW on the Zero Resource Speech Challenge 2017 datasets as a part of the 2020 implementation of the challenge. The results show that the system performs consistently on all five tested languages using fixed hyperparameters, clearly outperforming the earlier DTW-based system in terms of coverage of the detected patterns.
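
The general idea of replacing a fixed distance threshold with a corpus-adaptive probability can be illustrated with a minimal sketch. This is not the authors' PDTW implementation: the function names (`dtw_cost`, `chance_probability`, `discover_pairs`), the normal fit to background costs, and the parameters `alpha` and `n_background` are all assumptions made for illustration. The sketch estimates how DTW costs are distributed over randomly chosen segment pairs, then keeps only the pairs whose cost would be very unlikely under that background distribution.

```python
import math
import numpy as np


def dtw_cost(a, b):
    """Length-normalized DTW cost between two feature sequences (frames x dims)."""
    n, m = len(a), len(b)
    # Pairwise Euclidean distances between all frames of a and b.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = d[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m] / (n + m)


def chance_probability(cost, background_costs):
    """Probability of a cost this low under a normal fit to background costs.

    The normal model is an illustrative assumption, not taken from the paper.
    """
    mu = float(np.mean(background_costs))
    sigma = float(np.std(background_costs)) + 1e-12
    z = (cost - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def discover_pairs(segments, alpha=0.01, n_background=200, seed=0):
    """Return (i, j, p) for segment pairs whose DTW cost is unlikely by chance."""
    rng = np.random.default_rng(seed)
    k = len(segments)
    # Background distribution: DTW costs of randomly sampled segment pairs.
    background = []
    for _ in range(n_background):
        i, j = rng.choice(k, size=2, replace=False)
        background.append(dtw_cost(segments[i], segments[j]))
    pairs = []
    for i in range(k):
        for j in range(i + 1, k):
            p = chance_probability(dtw_cost(segments[i], segments[j]), background)
            if p < alpha:  # alpha is a probability, not a raw distance threshold
                pairs.append((i, j, p))
    return pairs
```

Because `alpha` is expressed as a probability relative to the corpus's own cost distribution, the same value can in principle be reused across datasets whose raw distance scales differ, which is the motivation the abstract gives for the adaptive approach.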

    Original language: English
    Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
    Publisher: International Speech Communication Association (ISCA)
    Pages: 4871-4875
    Number of pages: 5
    Volume: 2020-October
    Publication status: Published - 2020
    OKM publication type: A4 Article in conference proceedings
    Event: Interspeech - Shanghai, China
    Duration: 25 Oct 2020 → 29 Oct 2020
    Conference number: 21
    http://www.interspeech2020.org/

    Publication series

    Name: Interspeech
    Publisher: International Speech Communication Association
    ISSN (print): 2308-457X

    Conference

    Conference: Interspeech
    Abbreviated title: INTERSPEECH
    Country/Territory: China
    City: Shanghai
    Period: 25/10/2020 → 29/10/2020
