Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions

Research output: Article in book/conference proceedings, peer-reviewed

Standard

Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions. / Räsänen, Okko; Seshadri, Shreyas; Casillas, Marisa.

Proceedings of Interspeech. Vol. 2018-September. International Speech Communication Association, 2018. pp. 1200-1204 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH).

Harvard

Räsänen, O, Seshadri, S & Casillas, M 2018, Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions. in Proceedings of Interspeech. vol. 2018-September, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, International Speech Communication Association, pp. 1200-1204, Interspeech, Hyderabad, India, 02/09/2018. https://doi.org/10.21437/Interspeech.2018-1047

APA

Räsänen, O., Seshadri, S., & Casillas, M. (2018). Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions. In Proceedings of Interspeech (Vol. 2018-September, pp. 1200-1204). (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2018-1047

Vancouver

Räsänen O, Seshadri S, Casillas M. Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions. In Proceedings of Interspeech. Vol. 2018-September. International Speech Communication Association. 2018. p. 1200-1204. (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH). https://doi.org/10.21437/Interspeech.2018-1047

Author

Räsänen, Okko; Seshadri, Shreyas; Casillas, Marisa. / Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions. Proceedings of Interspeech. Vol. 2018-September. International Speech Communication Association, 2018. pp. 1200-1204 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH).

BibTeX

@inproceedings{ce49edc281a24267a1d9516b50fdbbf4,
title = "Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions",
abstract = "Word count estimation (WCE) from audio recordings has a number of applications, including quantifying the amount of speech that language-learning infants hear in their natural environments, as captured by daylong recordings made with devices worn by infants. To be applicable in a wide range of scenarios and also low-resource domains, WCE tools should be extremely robust against varying signal conditions and require minimal access to labeled training data in the target domain. For this purpose, earlier work has used automatic syllabification of speech, followed by a least-squares-mapping of syllables to word counts. This paper compares a number of previously proposed syllabifiers in the WCE task, including a supervised bi-directional long short-term memory (BLSTM) network that is trained on a language for which high quality syllable annotations are available (a “high resource language”), and reports how the alternative methods compare on different languages and signal conditions. We also explore additive noise and varying-channel data augmentation strategies for BLSTM training, and show how they improve performance in both matching and mismatching languages. Intriguingly, we also find that even though the BLSTM works on languages beyond its training data, the unsupervised algorithms can still outperform it in challenging signal conditions on novel languages.",
keywords = "Daylong recordings, Language acquisition, Noise robustness, Syllabification, Word count estimation",
author = "Okko R{\"a}s{\"a}nen and Shreyas Seshadri and Marisa Casillas",
year = "2018",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2018-1047",
language = "English",
volume = "2018-September",
series = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
publisher = "International Speech Communication Association",
pages = "1200--1204",
booktitle = "Proceedings of Interspeech",

}

RIS

TY - GEN

T1 - Comparison of syllabification algorithms and training strategies for robust word count estimation across different languages and recording conditions

AU - Räsänen, Okko

AU - Seshadri, Shreyas

AU - Casillas, Marisa

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Word count estimation (WCE) from audio recordings has a number of applications, including quantifying the amount of speech that language-learning infants hear in their natural environments, as captured by daylong recordings made with devices worn by infants. To be applicable in a wide range of scenarios and also low-resource domains, WCE tools should be extremely robust against varying signal conditions and require minimal access to labeled training data in the target domain. For this purpose, earlier work has used automatic syllabification of speech, followed by a least-squares-mapping of syllables to word counts. This paper compares a number of previously proposed syllabifiers in the WCE task, including a supervised bi-directional long short-term memory (BLSTM) network that is trained on a language for which high quality syllable annotations are available (a “high resource language”), and reports how the alternative methods compare on different languages and signal conditions. We also explore additive noise and varying-channel data augmentation strategies for BLSTM training, and show how they improve performance in both matching and mismatching languages. Intriguingly, we also find that even though the BLSTM works on languages beyond its training data, the unsupervised algorithms can still outperform it in challenging signal conditions on novel languages.

AB - Word count estimation (WCE) from audio recordings has a number of applications, including quantifying the amount of speech that language-learning infants hear in their natural environments, as captured by daylong recordings made with devices worn by infants. To be applicable in a wide range of scenarios and also low-resource domains, WCE tools should be extremely robust against varying signal conditions and require minimal access to labeled training data in the target domain. For this purpose, earlier work has used automatic syllabification of speech, followed by a least-squares-mapping of syllables to word counts. This paper compares a number of previously proposed syllabifiers in the WCE task, including a supervised bi-directional long short-term memory (BLSTM) network that is trained on a language for which high quality syllable annotations are available (a “high resource language”), and reports how the alternative methods compare on different languages and signal conditions. We also explore additive noise and varying-channel data augmentation strategies for BLSTM training, and show how they improve performance in both matching and mismatching languages. Intriguingly, we also find that even though the BLSTM works on languages beyond its training data, the unsupervised algorithms can still outperform it in challenging signal conditions on novel languages.

KW - Daylong recordings

KW - Language acquisition

KW - Noise robustness

KW - Syllabification

KW - Word count estimation

UR - http://www.scopus.com/inward/record.url?scp=85054995553&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2018-1047

DO - 10.21437/Interspeech.2018-1047

M3 - Conference contribution

AN - SCOPUS:85054995553

VL - 2018-September

T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SP - 1200

EP - 1204

BT - Proceedings of Interspeech

PB - International Speech Communication Association

ER -

ID: 29109610