Analysis and classification of phonation types in speech and singing voice

Research output: Contribution to journal › Article › Scientific › peer-review

Research units

  • International Institute of Information Technology Hyderabad

Abstract

In both speech and singing, humans are capable of generating sounds of different phonation types (e.g., breathy, modal and pressed). Previous studies on the analysis and classification of phonation types have mainly used voice source features derived using glottal inverse filtering (GIF). Even though glottal source features are useful in discriminating phonation types in speech, their performance deteriorates in singing voice because the high fundamental frequency of sung sounds reduces the accuracy of source-filter separation in GIF. In the present study, features describing the glottal source were computed using three signal processing methods that do not perform source-filter separation: zero frequency filtering (ZFF), zero time windowing (ZTW) and single frequency filtering (SFF). From each method, a group of scalar features was extracted. In addition, cepstral coefficients were derived from the spectra computed using ZTW and SFF. Experiments were conducted with the proposed features to analyse and classify three phonation types (breathy, modal and pressed) in speech and singing voice. Statistical pair-wise comparisons between the phonation types showed that most of the features separated the phonation types significantly for both speech and singing voice. Classification with support vector machine classifiers indicated that the proposed features and their combinations improved accuracy compared to the commonly used glottal source features and mel-frequency cepstral coefficients (MFCCs).
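
As a rough illustration of the first of the three methods, the following is a minimal sketch of zero frequency filtering as it is commonly described in the literature: the signal is differenced, passed through a cascade of two zero-frequency resonators (equivalent to four cumulative sums), and the resulting polynomial trend is removed by repeatedly subtracting a local mean. It is not the implementation used in the study. The function names, the 10 ms trend-removal window, the three trend-removal passes, and the choice of positive-going zero crossings as excitation instants are all assumptions made for this example; the window is usually chosen close to the average pitch period, and the relevant crossing polarity depends on the polarity of the recorded signal.

```python
def cumulative_sum(values):
    """Running sum of a sequence (one pole at zero frequency)."""
    total = 0.0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


def subtract_local_mean(values, half_width):
    """Subtract the mean over a symmetric window (truncated at the edges)."""
    n = len(values)
    prefix = [0.0] + cumulative_sum(values)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        out.append(values[i] - (prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def zero_frequency_filter(signal, fs, window_ms=10.0, passes=3):
    """Return the zero-frequency-filtered signal (hypothetical helper).

    window_ms and passes are illustrative defaults, not values from the paper.
    """
    if not signal:
        return []
    # Difference the signal to remove any slowly varying bias.
    diff = [signal[0]] + [signal[i] - signal[i - 1] for i in range(1, len(signal))]
    # Two cascaded zero-frequency resonators, each a double integrator,
    # amount to four cumulative sums in total.
    y = diff
    for _ in range(4):
        y = cumulative_sum(y)
    # Remove the polynomial trend by repeated local-mean subtraction.
    half_width = max(1, int(round(window_ms * 1e-3 * fs / 2)))
    for _ in range(passes):
        y = subtract_local_mean(y, half_width)
    return y


def positive_zero_crossings(y):
    """Indices where the signal crosses zero from negative to non-negative."""
    return [i for i in range(1, len(y)) if y[i - 1] < 0.0 <= y[i]]
```

Applied to a clean periodic excitation, the spacing of the detected crossings follows the pitch period away from the signal edges, where the truncated averaging windows distort the output. Scalar features such as the slope of the filtered signal around each crossing can then be read from the same output, although the exact feature definitions used in the study are not given in this abstract.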

Details

Original language: English
Pages (from-to): 33-47
Number of pages: 15
Journal: Speech Communication
Volume: 118
Publication status: Published - 2020
MoE publication type: A1 Journal article-refereed

Research areas

  • Phonation type, Voice quality, Singing voice, Glottal source, Glottal inverse filtering, Zero frequency filtering (ZFF), Zero time windowing (ZTW), Single frequency filtering (SFF)

ID: 41313006