Description
Speech is one of our most essential means of communication. Through speech we are able to convey meaning and intent to other individuals. Yet the contents of speech go beyond what we would call lexical and prosodic content. The complexity of speech production means that every system involved in the process leaves its mark on our voice. This is what makes it possible for us to determine a speaker’s gender or age, whether they are healthy or ill, or even which emotions they are experiencing, just by listening to their voice. Its ubiquity, its uniqueness, the knowledge it conveys about the speaker, and the lexical information it carries make speech a particularly useful but sensitive type of data that should be protected.
At the same time, mobile devices, cloud-based applications, social media platforms and, more recently, voice-based virtual assistants, are quickly becoming pervasive, giving companies and researchers unprecedented access to data. Combined with the advent of deep learning, this has allowed the development of highly accurate predictive Machine Learning (ML) models. In turn, these are offered back to users as Machine Learning as a Service (MLaaS) applications that automate time-consuming tasks – e.g. transcribing and annotating speech – and assist users in performing everyday tasks – e.g. voice-based virtual assistants.
As an increasing number of these services turn to speech as a means of interaction or authentication, as a biomarker for health, or simply as the focus of their target task, there is a growing demand for measures that give users more control over the privacy of their voice, along with all the information it contains, while it is being processed. The European Union’s recent General Data Protection Regulation (GDPR) is a strong indicator of the growing societal awareness of the problem of data misuse. In fact, in light of this regulation, speech may legally be considered Personally Identifiable Information (PII).
With this thesis we aim to address the issue of privacy-preserving machine learning for speech by focusing on two paradigms – cryptographic processing and privacy-oriented speech manipulation. In this proposal we present methods already implemented under both paradigms, discuss the trade-offs offered by each, and outline objectives for future work.
|Period||23 Feb 2022|