Blind phoneme segmentation with temporal prediction errors

Paul Michel, Okko Räsänen, Roland Thiolliere, Emmanuel Dupoux

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review


Abstract

Phonemic segmentation of speech is a critical step in speech recognition systems. We propose a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks. Our approach consists of analyzing the error profile of a model trained to predict speech features frame by frame. Specifically, we learn the dynamics of speech in the MFCC space and hypothesize phoneme boundaries at local maxima of the prediction error. We evaluate our system on the TIMIT dataset and report improvements over similar methods.
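The idea in the abstract (predict each frame from the previous one, then place boundaries at local maxima of the prediction error) can be illustrated with a minimal sketch. This is not the authors' implementation: a least-squares affine predictor stands in for the Markov chain and recurrent network models, synthetic piecewise-constant vectors stand in for real MFCC frames, and the names `fit_linear_predictor`, `prediction_errors`, `local_maxima`, `segment`, `toy_features` and the threshold factor `k` are all hypothetical.

```python
import numpy as np


def fit_linear_predictor(features):
    """Least-squares fit of x[t+1] ~ [x[t], 1] @ W (assumed stand-in model)."""
    inputs = np.hstack([features[:-1], np.ones((len(features) - 1, 1))])
    targets = features[1:]
    weights, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return weights


def prediction_errors(features, weights):
    """Euclidean error per frame; errors[i] is the error on frame i + 1."""
    inputs = np.hstack([features[:-1], np.ones((len(features) - 1, 1))])
    residual = features[1:] - inputs @ weights
    return np.linalg.norm(residual, axis=1)


def local_maxima(errors, threshold):
    """Indices of interior points above threshold that beat both neighbours."""
    inner = errors[1:-1]
    is_peak = (inner > errors[:-2]) & (inner >= errors[2:]) & (inner > threshold)
    return np.flatnonzero(is_peak) + 1


def segment(features, k=2.0):
    """Hypothesize boundary frames from peaks in the prediction error.

    The threshold mean + k * std is an illustrative choice, not taken
    from the paper.
    """
    weights = fit_linear_predictor(features)
    errors = prediction_errors(features, weights)
    threshold = errors.mean() + k * errors.std()
    # errors[i] refers to frame i + 1, so shift peaks by one frame.
    return local_maxima(errors, threshold) + 1


def toy_features(seed=0):
    """Three 50-frame segments of 13-dim vectors with small Gaussian noise."""
    rng = np.random.default_rng(seed)
    levels = [0.0, 3.0, -3.0]
    clean = np.vstack([np.full((50, 13), level) for level in levels])
    return clean + 0.1 * rng.standard_normal(clean.shape)


boundaries = segment(toy_features())
```

On real speech, the feature matrix would hold one MFCC vector per frame, and the predictor would be replaced by whichever sequence model is being studied; only the peak-picking step would carry over unchanged.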
Original language: English
Title of host publication: Proceedings of the Student Research Workshop at the Annual Meeting of the Association for Computational Linguistics
Pages: 62-68
Number of pages: 7
ISBN (Electronic): 978-1-945626-56-2
Publication status: Published - 2017
MoE publication type: A4 Article in a conference publication
Event: Annual Meeting of the Association for Computational Linguistics: Student Research Workshop - Vancouver, Canada
Duration: 30 Jul 2017 to 4 Aug 2017

Conference

Conference: Annual Meeting of the Association for Computational Linguistics
Abbreviated title: SRW
Country: Canada
City: Vancouver
Period: 30/07/2017 to 04/08/2017
