SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech

Research output: Journal article, peer-reviewed

Standard

SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech. / Seshadri, Shreyas; Räsänen, Okko.

In: IEEE Signal Processing Letters, Vol. 26, No. 9, 09.2019, p. 1359-1363.



BibTeX - Download

@article{cdd5a0b7c7354657b7a6a95dff84fc45,
title = "SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech",
abstract = "Automatic syllable count estimation (SCE) is used in a variety of applications ranging from speaking rate estimation to detecting social activity from wearable microphones or developmental research concerned with quantifying speech heard by language-learning children in different environments. The majority of previously utilized SCE methods have relied on heuristic digital signal processing (DSP) methods, and only a small number of bi-directional long short-term memory (BLSTM) approaches have made use of modern machine learning approaches in the SCE task. This letter presents a novel end-to-end method called SylNet for automatic syllable counting from speech, built on the basis of recent developments in neural network architectures. We describe how the entire model can be optimized directly to minimize SCE error on the training data without annotations aligned at the syllable level, and how it can be adapted to new languages using limited speech data with known syllable counts. Experiments on several different languages reveal that SylNet generalizes to languages beyond its training data and further improves with adaptation. It also outperforms several previously proposed methods for syllabification, including end-to-end BLSTMs.",
keywords = "syllable count estimation, end-to-end learning, deep learning, speech processing, SEGMENTATION",
author = "Shreyas Seshadri and Okko R{\"a}s{\"a}nen",
year = "2019",
month = "9",
doi = "10.1109/LSP.2019.2929415",
language = "English",
volume = "26",
pages = "1359--1363",
journal = "IEEE Signal Processing Letters",
issn = "1070-9908",
number = "9",

}

RIS - Download

TY - JOUR

T1 - SylNet

T2 - An Adaptable End-to-End Syllable Count Estimator for Speech

AU - Seshadri, Shreyas

AU - Räsänen, Okko

PY - 2019/9

Y1 - 2019/9

N2 - Automatic syllable count estimation (SCE) is used in a variety of applications ranging from speaking rate estimation to detecting social activity from wearable microphones or developmental research concerned with quantifying speech heard by language-learning children in different environments. The majority of previously utilized SCE methods have relied on heuristic digital signal processing (DSP) methods, and only a small number of bi-directional long short-term memory (BLSTM) approaches have made use of modern machine learning approaches in the SCE task. This letter presents a novel end-to-end method called SylNet for automatic syllable counting from speech, built on the basis of recent developments in neural network architectures. We describe how the entire model can be optimized directly to minimize SCE error on the training data without annotations aligned at the syllable level, and how it can be adapted to new languages using limited speech data with known syllable counts. Experiments on several different languages reveal that SylNet generalizes to languages beyond its training data and further improves with adaptation. It also outperforms several previously proposed methods for syllabification, including end-to-end BLSTMs.

AB - Automatic syllable count estimation (SCE) is used in a variety of applications ranging from speaking rate estimation to detecting social activity from wearable microphones or developmental research concerned with quantifying speech heard by language-learning children in different environments. The majority of previously utilized SCE methods have relied on heuristic digital signal processing (DSP) methods, and only a small number of bi-directional long short-term memory (BLSTM) approaches have made use of modern machine learning approaches in the SCE task. This letter presents a novel end-to-end method called SylNet for automatic syllable counting from speech, built on the basis of recent developments in neural network architectures. We describe how the entire model can be optimized directly to minimize SCE error on the training data without annotations aligned at the syllable level, and how it can be adapted to new languages using limited speech data with known syllable counts. Experiments on several different languages reveal that SylNet generalizes to languages beyond its training data and further improves with adaptation. It also outperforms several previously proposed methods for syllabification, including end-to-end BLSTMs.

KW - syllable count estimation

KW - end-to-end learning

KW - deep learning

KW - speech processing

KW - SEGMENTATION

U2 - 10.1109/LSP.2019.2929415

DO - 10.1109/LSP.2019.2929415

M3 - Article

VL - 26

SP - 1359

EP - 1363

JO - IEEE Signal Processing Letters

JF - IEEE Signal Processing Letters

SN - 1070-9908

IS - 9

ER -

ID: 36533127