In the era of the Internet of Things (IoT), the number of smart devices with permanent internet access is expected to increase. As the range of functions of these devices has grown rapidly, controlling them with conventional human-machine interfaces (HMIs) can be cumbersome. Natural language user interfaces (NLUIs) offer a more natural form of interaction without the need for complex menus. However, NLUIs rely on speech recognition frameworks that are computationally complex and therefore impractical for small, battery-powered devices. A preprocessing stage such as voice activity detection (VAD) can therefore be used to power up such a framework only when speech is present, saving power and prolonging battery life. We implemented a low-power VAD composed of two stages: the first evaluates time-domain features, and the second complements them with frequency-domain features; both apply a linear classification scheme. The proposed approach was evaluated under different conditions, with signals degraded by noise and reverberation. We show that it has low computational complexity while attaining error rates below 10% even under adverse conditions. Moreover, we implemented a simple keyword spotting (KS) algorithm based on mel-frequency cepstral coefficients (MFCCs), a linear classifier, and a sequence detector. With this simple scheme, the recognition rate was between 50% and 80% under non-reverberant conditions, though performance drops with increasing reverberation.
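The two-stage idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the features or classifier parameters, so everything specific here is an assumption made for illustration: log energy and zero-crossing rate as time-domain features, a speech-band energy ratio (300 to 3400 Hz) as the frequency-domain feature, and the hand-picked weights `STAGE1_W`, `STAGE1_B`, `STAGE2_W`, `STAGE2_B`. It is not the authors' implementation. The point it shows is the cascade: the costlier spectral feature is computed only for frames that the cheap time-domain stage has not already rejected.

```python
import math

def time_features(frame):
    """Stage-1 features (assumed): log energy and zero-crossing rate."""
    n = len(frame)
    energy = sum(x * x for x in frame) / n
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [math.log10(energy + 1e-12), crossings / (n - 1)]

def band_energy_ratio(frame, sample_rate, lo_hz=300.0, hi_hz=3400.0):
    """Stage-2 feature (assumed): share of spectral energy in the speech band.

    Uses a naive DFT to stay dependency-free; a real device would use an FFT.
    """
    n = len(frame)
    total = in_band = 0.0
    for k in range(1, n // 2):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        power = re * re + im * im
        total += power
        if lo_hz <= k * sample_rate / n <= hi_hz:
            in_band += power
    return in_band / (total + 1e-12)

def linear_decision(features, weights, bias):
    """Linear classifier: speech if the weighted sum plus bias is positive."""
    return sum(w * f for w, f in zip(weights, features)) + bias > 0.0

# Hypothetical, hand-picked parameters; a real system would train these.
STAGE1_W, STAGE1_B = [1.0, -2.0], 3.0
STAGE2_W, STAGE2_B = [1.0, -2.0, 4.0], 0.5

def two_stage_vad(frame, sample_rate=8000):
    """Return True if the frame is classified as speech by both stages."""
    feats = time_features(frame)
    if not linear_decision(feats, STAGE1_W, STAGE1_B):
        return False  # cheap rejection: the spectrum is never computed
    feats = feats + [band_energy_ratio(frame, sample_rate)]
    return linear_decision(feats, STAGE2_W, STAGE2_B)
```

Calling `two_stage_vad(frame)` on successive 20 ms frames (160 samples at the assumed 8 kHz rate) yields one speech/non-speech decision per frame. With these illustrative weights, a loud low-frequency hum passes the first stage on energy alone and is rejected only once the band-energy feature is added, which is the role the abstract assigns to the second stage.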
|Title of host publication||Proceedings: VDE-Kongress 2016 – Internet der Dinge|
|Publication status||Published - 2016|
|MoE publication type||A4 Article in a conference publication|
|Event||VDE-Kongress: Internet der Dinge - Mannheim, Germany|
|Period||07/11/2016 → 08/11/2016|