TY - JOUR
T1 - Automatic Classification of Strain in the Singing Voice using Machine Learning
AU - Liu, Yuanyuan
AU - Reddy, Mittapalle Kiran
AU - Yagnavajjula, Madhu
AU - Räsänen, Okko
AU - Alku, Paavo
AU - Ikävalko, Tero
AU - Hakanpää, Tua
AU - Öyry, Aleksi
AU - Laukkanen, Anne-Maria
PY - 2025
Y1 - 2025
N2 - Objectives: Classifying strain in the singing voice can help protect professional singers from vocal overuse and support singing training. This study investigates whether machine learning can automatically classify singing voices into two levels of perceived strain. The singing samples represent two genres: classical and contemporary commercial music (CCM). Methods: A total of 324 singing voice samples from 15 professional normophonic singers (nine female, six male) were analyzed. Nine singers were classical, and six were CCM singers. The samples consisted of syllable strings produced at three to six pitches and three loudness levels. Based on expert auditory-perceptual ratings, the samples were categorized into two strain levels: normal-mild and moderate-severe. Three acoustic feature sets [mel-frequency cepstral coefficients (MFCCs), the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), and wavelet scattering features] were compared using two classifier models [support vector machine (SVM) and multilayer perceptron (MLP)]. Feature selection was performed using recursive feature elimination, and the Mann-Whitney U test was used to assess the discriminative power of the selected features. Results: The highest classification accuracy of 86.1% was achieved using a subset of wavelet scattering features with the MLP classifier. A comparison between individual features showed that the first MFCC coefficient, representing spectral tilt, exhibited the greatest between-class separation. Conclusion: This study demonstrates that machine learning models utilizing selected acoustic features can classify perceptual strain of singing voices automatically with high accuracy. These preliminary findings highlight the potential for larger studies involving more diverse singer groups across different genres.
KW - auditive-perceptual evaluation
KW - fisher vector
KW - mel-frequency cepstral coefficients
KW - multiple layer perceptron
KW - support vector machine
KW - wavelet scattering coefficients
UR - http://www.scopus.com/inward/record.url?scp=105003207303&partnerID=8YFLogxK
U2 - 10.1016/j.jvoice.2025.03.040
DO - 10.1016/j.jvoice.2025.03.040
M3 - Article
SN - 0892-1997
JO - Journal of Voice
JF - Journal of Voice
ER -