Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Word count estimation (WCE) from audio recordings has a number of applications, including quantifying the amount of speech that language-learning infants hear in their natural environments, as captured by daylong recordings made with devices worn by infants. To be applicable in a wide range of scenarios, including low-resource domains, WCE tools should be extremely robust against varying signal conditions and require minimal access to labeled training data in the target domain. For this purpose, earlier work has used automatic syllabification of speech, followed by a least-squares mapping of syllable counts to word counts. This paper compares a number of previously proposed syllabifiers in the WCE task, including a supervised bi-directional long short-term memory (BLSTM) network that is trained on a language for which high-quality syllable annotations are available (a “high resource language”), and reports how the alternative methods compare on different languages and signal conditions. We also explore additive noise and varying-channel data augmentation strategies for BLSTM training, and show how they improve performance in both matching and mismatching languages. Intriguingly, we also find that even though the BLSTM works on languages beyond its training data, the unsupervised algorithms can still outperform it in challenging signal conditions on novel languages.
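The least-squares mapping step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that syllable counts are mapped to word counts by least squares, so the single-feature linear model with an intercept, the function names, and the toy numbers below are all assumptions made for illustration.

```python
def fit_syllable_to_word_mapping(syllable_counts, word_counts):
    """Fit word_count ~ slope * syllable_count + intercept by ordinary least squares.

    Hypothetical helper: the linear-plus-intercept form is an assumption,
    not a detail taken from the paper.
    """
    n = len(syllable_counts)
    if n != len(word_counts) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(syllable_counts) / n
    mean_y = sum(word_counts) / n
    # Closed-form least-squares solution for a single predictor.
    sxx = sum((x - mean_x) ** 2 for x in syllable_counts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(syllable_counts, word_counts))
    if sxx == 0:
        raise ValueError("syllable counts must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_word_counts(syllable_counts, slope, intercept):
    """Apply the fitted mapping to new syllable counts."""
    return [slope * x + intercept for x in syllable_counts]


# Toy calibration data, invented for illustration only:
# per-recording syllable counts from a syllabifier and reference word counts.
train_syllables = [10, 20, 30, 40]
train_words = [7, 14, 21, 28]

slope, intercept = fit_syllable_to_word_mapping(train_syllables, train_words)
predicted = estimate_word_counts([50], slope, intercept)
```

In this sketch only the small calibration set needs word-level labels, which is in line with the abstract's requirement of minimal labeled data in the target domain; the syllabifier itself is treated as a black box that returns a count per recording.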

Original language: English
Title of host publication: Proceedings of Interspeech
Publisher: International Speech Communication Association
Pages: 1200-1204
Number of pages: 5
Volume: 2018-September
DOI: https://doi.org/10.21437/Interspeech.2018-1047
Publication status: Published - 1 Jan 2018
MoE publication type: A4 Article in a conference publication
Event: Interspeech - Hyderabad International Convention Centre, Hyderabad, India
Duration: 2 Sep 2018 to 6 Sep 2018
http://interspeech2018.org/

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
ISSN (Print): 2308-457X

Conference

Conference: Interspeech
Country: India
City: Hyderabad
Period: 02/09/2018 to 06/09/2018
Keywords

  • Daylong recordings
  • Language acquisition
  • Noise robustness
  • Syllabification
  • Word count estimation

Projects

ACLEW: Analyzing Child Language Experiences Around the World

Räsänen, O. & Seshadri, S.

01/06/2017 to 19/05/2020

Project: Academy of Finland: Other research funding

Cite this

Räsänen, O., Seshadri, S., & Casillas, M. (2018). Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions. In Proceedings of Interspeech (Vol. 2018-September, pp. 1200-1204). (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2018-1047