Mel-frequency cepstral coefficients derived using the zero-time windowing spectrum for classification of phonation types in singing

Research output: Contribution to journal › Article › Scientific › peer-review

BibTeX

@article{d2476c193afd46ac9c58aba164faadc8,
title = "Mel-frequency cepstral coefficients derived using the zero-time windowing spectrum for classification of phonation types in singing",
abstract = "Existing studies in classification of phonation types in singing use voice source features and Mel-frequency cepstral coefficients (MFCCs) showing poor performance due to high pitch in singing. In this study, high-resolution spectra obtained using the zero-time windowing (ZTW) method are utilized to capture the effect of voice excitation. ZTW does not call for computing the source-filter decomposition (which is needed by many voice source features), which makes it robust to high pitch. For the classification, the study proposes extracting MFCCs from the ZTW spectrum. The results show that the proposed features give a clear improvement in classification accuracy compared to the existing features.",
author = "Kadiri, {Sudarsana Reddy} and Paavo Alku",
year = "2019",
month = "11",
day = "8",
doi = "10.1121/1.5131043",
language = "English",
volume = "146",
pages = "EL418--EL423",
journal = "Journal of the Acoustical Society of America",
issn = "0001-4966",
publisher = "Acoustical Society of America",
number = "5",

}

RIS

TY  - JOUR
T1  - Mel-frequency cepstral coefficients derived using the zero-time windowing spectrum for classification of phonation types in singing
AU  - Kadiri, Sudarsana Reddy
AU  - Alku, Paavo
PY  - 2019/11/8
Y1  - 2019/11/8
N2  - Existing studies in classification of phonation types in singing use voice source features and Mel-frequency cepstral coefficients (MFCCs) showing poor performance due to high pitch in singing. In this study, high-resolution spectra obtained using the zero-time windowing (ZTW) method are utilized to capture the effect of voice excitation. ZTW does not call for computing the source-filter decomposition (which is needed by many voice source features), which makes it robust to high pitch. For the classification, the study proposes extracting MFCCs from the ZTW spectrum. The results show that the proposed features give a clear improvement in classification accuracy compared to the existing features.
AB  - Existing studies in classification of phonation types in singing use voice source features and Mel-frequency cepstral coefficients (MFCCs) showing poor performance due to high pitch in singing. In this study, high-resolution spectra obtained using the zero-time windowing (ZTW) method are utilized to capture the effect of voice excitation. ZTW does not call for computing the source-filter decomposition (which is needed by many voice source features), which makes it robust to high pitch. For the classification, the study proposes extracting MFCCs from the ZTW spectrum. The results show that the proposed features give a clear improvement in classification accuracy compared to the existing features.
U2  - 10.1121/1.5131043
DO  - 10.1121/1.5131043
M3  - Article
VL  - 146
SP  - EL418
EP  - EL423
JO  - Journal of the Acoustical Society of America
JF  - Journal of the Acoustical Society of America
SN  - 0001-4966
IS  - 5
ER  -
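As a rough illustration of the pipeline the abstract outlines (a spectrum computed with zero-time windowing, then MFCCs extracted from that spectrum), here is a minimal Python/NumPy sketch. It is not the authors' feature extractor. Assumptions: the window forms w1[n] = 1/(4 sin²(πn/2N)) (squared, with w1[0] = 0) and the taper w2[n] = 4 cos²(πn/2N) follow the general ZTW literature rather than this article; the 5 ms window, 512-point FFT, 20 mel filters and 13 coefficients are illustrative values; and the group-delay processing that the published ZTW method uses to sharpen the spectrum is omitted, so the spectrum here is a simplified stand-in. The function names `zero_time_window`, `mel_filterbank`, `dct_ii` and `ztw_mfcc` are hypothetical.

```python
import numpy as np


def zero_time_window(n_win: int) -> np.ndarray:
    """Heavily decaying window w1[n]^2 * w2[n] (assumed ZTW form), peak-normalized."""
    n = np.arange(n_win)
    w1 = np.zeros(n_win)  # w1[0] = 0 by definition
    w1[1:] = 1.0 / (4.0 * np.sin(np.pi * n[1:] / (2 * n_win)) ** 2)
    w2 = 4.0 * np.cos(np.pi * n / (2 * n_win)) ** 2  # taper to reduce truncation effects
    window = (w1 ** 2) * w2
    return window / window.max()


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, nfft: int, fs: float) -> np.ndarray:
    """Triangular filters equally spaced on the mel scale, shape (n_filters, nfft//2 + 1)."""
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    fb = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def dct_ii(x: np.ndarray, n_out: int) -> np.ndarray:
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(x)
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x


def ztw_mfcc(signal, fs, start, win_ms=5.0, nfft=512, n_filters=20, n_ceps=13):
    """MFCC-style coefficients from a ZTW-windowed segment starting at sample `start`."""
    n_win = int(round(win_ms * 1e-3 * fs))
    segment = np.asarray(signal[start:start + n_win], dtype=float)
    if len(segment) < n_win:  # zero-pad a segment that runs past the end of the signal
        segment = np.pad(segment, (0, n_win - len(segment)))
    # Power spectrum of the windowed segment (group-delay sharpening omitted).
    spectrum = np.abs(np.fft.rfft(segment * zero_time_window(n_win), nfft)) ** 2
    mel_energies = mel_filterbank(n_filters, nfft, fs) @ spectrum
    return dct_ii(np.log(mel_energies + 1e-12), n_ceps)
```

Because the window is concentrated in the first few samples after the analysis instant, each spectrum reflects a very short stretch of the signal; repeating `ztw_mfcc` at successive `start` positions gives a frame-by-frame feature sequence that could feed a classifier.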
