Optimal temporal dynamics of MFCCs for low-complexity VAD systems - A case study

Alexandra Craciun, Tom Bäckström

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Scientific; peer-review



Recent advances in machine learning strategies for speech classification require increasingly complex classifiers and large numbers of features. For practical application in low-resource systems, such methods use prohibitively large numbers of operations. A better approach involves reducing the features to the fewest, most salient ones, while simplifying the classifier structure to a minimum. The mel-frequency cepstral coefficients (MFCCs) are often used in speech-related classification tasks, which suggests the compressed information therein is highly informative. They are computed by warping the spectral energy to a mel scale, followed by a logarithm and a discrete cosine transform. To better understand the properties governing such features, we examine different MFCC configurations using a simple neural network classifier for a low-complexity voice activity detector. In particular, we investigate the optimal number of MFCCs, the extent of the required temporal information and the best compression rate for different analysis settings, with varying frequency resolutions.
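
The MFCC computation described in the abstract (mel warping of the spectral energy, logarithm, discrete cosine transform) can be illustrated for a single frame. The following is a minimal sketch and not the authors' implementation: the function names, the default filter and coefficient counts, the triangular filter shape and the mel formula `2595 * log10(1 + f / 700)` are all assumptions chosen for illustration, and the input is assumed to be a power spectrum covering 0 Hz up to half the sampling rate.

```python
import math


def hz_to_mel(hz):
    """One common mel-scale convention (assumed here, not taken from the paper)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_bins, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    n_bins is the number of spectral bins covering 0 .. sample_rate / 2.
    """
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_mels + 2 edge frequencies: each filter spans three consecutive edges.
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            # Zero outside [lo, hi], peak of 1 at mid.
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def mfcc_from_power_spectrum(power, sample_rate, n_mels=20, n_mfcc=13):
    """Return the first n_mfcc cepstral coefficients of one frame."""
    bank = mel_filterbank(n_mels, len(power), sample_rate)
    # 1) Warp the spectral energy to the mel scale.
    mel_energy = [sum(w * p for w, p in zip(row, power)) for row in bank]
    # 2) Logarithm (a small floor avoids log(0)).
    log_mel = [math.log(e + 1e-10) for e in mel_energy]
    # 3) Unnormalized DCT-II, keeping only the first n_mfcc coefficients.
    return [
        sum(
            log_mel[m] * math.cos(math.pi * k * (m + 0.5) / n_mels)
            for m in range(n_mels)
        )
        for k in range(n_mfcc)
    ]
```

In this sketch, `n_mfcc` controls how many coefficients are kept and `len(power)` reflects the frequency resolution of the analysis, which correspond loosely to two of the settings the abstract says were varied. Scaling the input power by a constant shifts only the zeroth coefficient, since the DCT basis functions for higher indices sum to zero.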
Original language: English
Title of host publication: ITG-Fb. 282: Speech Communication
Publisher: VDE Verlag
Number of pages: 5
ISBN (Electronic): 978-3-8007-4767-2
Publication status: Published - 2018
MoE publication type: A4 Article in a conference publication
Event: ITG Conference on Speech Communication - Oldenburg, Germany
Duration: 10 Oct 2018 to 12 Oct 2018
Conference number: 13


Conference: ITG Conference on Speech Communication

