Preserving Speech Privacy in Interactions with Ad Hoc Sensor Networks

Pablo Pérez Zarazaga

Research output: Thesis › Doctoral Thesis › Collection of Articles


Speech is our main method of communication. It allows us to convey complex ideas intuitively and to give our messages a deeper meaning than their lexical content alone; for example, we can stress specific words to add emphasis to, or take it away from, different parts of a sentence. It is therefore only natural that voice user interfaces, which let us interact with our electronic devices through speech, are increasingly popular, and their popularity is expected to keep growing in the coming years. Any device that we can interact with by voice can be considered a voice user interface, and these cover a wide variety of services, from telecommunication applications such as Zoom or Skype to virtual assistants such as Alexa or Siri. However, to provide better services and more natural interactions, voice user interfaces need to collect large amounts of our speech data and transmit it, usually without our awareness. If that data is misused, or if an unauthorised party manages to obtain it, the result is a serious violation of the user's privacy. In an environment where multiple electronic devices provide a voice user interface, collaboration between them as a wireless acoustic sensor network can improve the services they provide individually. It is therefore important to study applications that need to send our voice to a remote party in order to provide their services. More specifically, in a scenario where multiple devices can pick up the voices of multiple users, it is crucial to define which of these devices are actually allowed to record a given user's speech. For example, if one user's voice leaks into another user's interaction and is thereby transmitted to a destination that the first user has not authorised, that user's privacy is violated.
As a solution, if our devices could perceive privacy the same way we do, they could adapt the information they share to protect the users' personal data. To that end, we need to analyse how users perceive privacy in their spoken interactions; from that analysis we can derive rules that our devices can follow when they provide a voice user interface. In this thesis, we study methods to recognise when two devices are located in the same acoustic space based on the audio signals they record. We show how acoustic fingerprints can be used both to share a device's audio information securely and to estimate the physical proximity of devices. We also created a speech corpus of conversational scenarios to analyse how the acoustic properties of the environment affect the level of privacy that we perceive. Finally, we developed source separation methods that remove the voices of interfering speakers in a multi-device scenario, thereby protecting the privacy of external users.
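The abstract does not specify which fingerprint the thesis uses, so the following is only an illustrative sketch of the general idea of detecting co-located devices by comparing acoustic fingerprints. It swaps in a band-energy-difference binary fingerprint in the style of Haitsma and Kalker's audio fingerprinting scheme and compares two devices' fingerprints by bit error rate (BER): recordings of the same sound field should give a low BER, while unrelated recordings give a BER near 0.5. The frame length, band count, the 0.35 decision threshold, and all function names are hypothetical choices for illustration, and the secure-sharing aspect is not modelled.

```python
# Hypothetical illustration only: not the fingerprint or protocol used in the thesis.
import cmath
import math
import random


def band_energies(frame, n_bands):
    """Energy of one frame in n_bands equal-width frequency bands (naive DFT)."""
    n = len(frame)
    # Power spectrum for bins 1 .. n/2 - 1 (DC and Nyquist bins skipped)
    power = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    width = len(power) // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def fingerprint(signal, frame_len=64, n_bands=9):
    """Binary fingerprint: sign of the band-energy difference across time and frequency.

    Non-overlapping frames; each frame after the first contributes n_bands - 1 bits.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    energies = [band_energies(f, n_bands) for f in frames]
    bits = []
    for n in range(1, len(energies)):
        cur, prev = energies[n], energies[n - 1]
        for m in range(n_bands - 1):
            delta = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            bits.append(1 if delta > 0 else 0)
    return bits


def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits; close to 0.5 for unrelated recordings."""
    n = min(len(fp_a), len(fp_b))
    return sum(a != b for a, b in zip(fp_a[:n], fp_b[:n])) / n


def co_located(fp_a, fp_b, threshold=0.35):
    """Hypothetical decision rule: same acoustic space if the BER is below threshold."""
    return bit_error_rate(fp_a, fp_b) < threshold


if __name__ == "__main__":
    rng = random.Random(0)
    length = 64 * 40
    source = [rng.gauss(0, 1) for _ in range(length)]       # shared sound in room 1
    other = [rng.gauss(0, 1) for _ in range(length)]        # unrelated sound in room 2
    dev_a = [s + rng.gauss(0, 0.05) for s in source]        # device A, near the source
    dev_b = [0.5 * s + rng.gauss(0, 0.05) for s in source]  # device B, attenuated copy
    dev_c = [o + rng.gauss(0, 0.05) for o in other]         # device C, other room
    fa, fb, fc = fingerprint(dev_a), fingerprint(dev_b), fingerprint(dev_c)
    print("A vs B:", bit_error_rate(fa, fb), co_located(fa, fb))
    print("A vs C:", bit_error_rate(fa, fc), co_located(fa, fc))
```

Because each bit depends only on the sign of an energy difference, a uniform gain change (device B hearing the same source more quietly) leaves the fingerprint largely intact, which is why sign-based fingerprints of this kind are a common choice for comparing recordings made at different distances from a source.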
Translated title of the contribution: Preserving Speech Privacy in Interactions with Ad Hoc Sensor Networks
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
Supervisors/Advisors
  • Bäckström, Tom, Supervising Professor
Print ISBNs: 978-952-64-0971-9
Electronic ISBNs: 978-952-64-0972-6
Publication status: Published - 2022
MoE publication type: G5 Doctoral dissertation (article)

Keywords
  • voice user interface
  • experience of privacy
  • audio fingerprint
  • acoustic sensor networks

