Computers and technology are so deeply embedded in our lives today that people spend a considerable part of their day interacting with technology. Conventional modes of human-technology interaction have been predominantly device-centric, requiring users to be in the vicinity of the device. This becomes cumbersome as the number of personal devices owned by an individual increases. A recent positive trend is the evolution towards user-centric modes of interaction, enabled by the growing adoption of speech user interfaces. Developments in virtual and ad hoc microphone networks and in sensor technology further support this evolution. As a result, speech processing methods are moving towards a more distributed and collaborative approach. However, this shift has raised new challenges and technical problems in managing speech enhancement, coding and user privacy in acoustic sensor networks.

The objectives of this thesis are two-fold: to develop methods that advance conventional speech coding for multiple microphones, and to understand the state of privacy in speech user interfaces.

In the first part, we study and develop postfilters for coding, with the final goal of advancing the postfilters to enable conventional speech and audio coding methods in distributed microphone networks. A primary requirement in sensor networks is that systems and algorithms be simple and robust. Therefore, we develop methods that need neither the transmission of side information nor inter-microphone communication, and the postfilters operate entirely at the decoder. To that end, we develop single-microphone postfilters that employ the envelope and harmonic models of speech. Following this, we advance these methods to develop a model-based postfilter for multi-microphone speech coding using conventional coding approaches. Our experiments demonstrate that incorporating speech models in the postfilters as proposed improves the output signal quality in comparison to baseline postfiltering and enhancement approaches.

The lack of user privacy considerations in the design of speech interfaces has had an adverse impact on their widespread adoption. Therefore, methods to enforce the privacy of users within the framework of speech interfaces are necessary and timely. In the second part of the thesis, we address how to instill smart speech interfaces with an intuitive understanding of user privacy preferences. Towards that end, we investigate how people perceive privacy in noisy acoustic scenarios. The results indicate that individuals have an intuitive understanding of privacy in speech communication that depends on the acoustic scenario, among other factors. The insights from these studies can be further exploited by conditioning the privacy preferences on the sensed acoustic environment in a speech interface.
|Translated title of the contribution||Robust and Efficient Methods for Distributed Speech Processing - Perspectives on Coding, Enhancement and Privacy|
|Publication status||Published - 2021|
|MoE publication type||G5 Doctoral dissertation (article)|