Federated Learning for Privacy-Preserving Speaker Recognition

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

State-of-the-art speaker recognition systems are usually trained on a single computer using speech data collected from multiple users. However, these speech samples may contain private information that users are unwilling to share. To avoid potential breaches of privacy, we investigate the use of federated learning, with and without secure aggregators, for both supervised and unsupervised speaker recognition systems. Federated learning enables a shared model to be trained without sharing private data, because the models are trained on the edge devices where the data resides. In the proposed system, each edge device trains an individual model, which is then sent either to a secure aggregator or directly to the main server. To provide contrasting data without transmitting any data, we use a generative adversarial network to generate imposter data at the edge. The secure aggregator or the main server then merges the individual models, builds a global model, and transmits the global model back to the edge devices. Experimental results on the VoxCeleb-1 dataset show that federated learning provides two advantages for both supervised and unsupervised speaker recognition systems. Firstly, it preserves privacy, since the raw data does not leave the edge devices. Secondly, the aggregated model achieves a better average equal error rate than the individual models when the federated model does not use a secure aggregator. Our results thus quantify the challenges in the practical application of privacy-preserving training of speaker recognition systems, in particular the trade-off between privacy and accuracy.
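The merging step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which aggregation rule is used, so the example below assumes plain federated averaging, that is, a data-size-weighted mean of the edge models' parameters. The function name `federated_average`, the dictionary-of-lists model layout, and the toy numbers are all hypothetical and chosen only for illustration; they are not taken from the article.

```python
from typing import Dict, List

# Hypothetical model layout: parameter name -> flat list of weights.
Model = Dict[str, List[float]]


def federated_average(client_models: List[Model], client_sizes: List[int]) -> Model:
    """Merge edge-device models into one global model.

    Assumption: the merge is a weighted mean, with each device weighted by
    the number of local training samples it holds. The article may use a
    different rule.
    """
    if not client_models or len(client_models) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    global_model: Model = {}
    for name in client_models[0]:
        merged = [0.0] * len(client_models[0][name])
        for model, size in zip(client_models, client_sizes):
            weight = size / total
            for i, value in enumerate(model[name]):
                merged[i] += weight * value
        global_model[name] = merged
    return global_model


# Toy example: two edge devices holding 1 and 3 local samples respectively.
edge_models = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]
global_model = federated_average(edge_models, [1, 3])
```

Only the model parameters and the sample counts reach the aggregation step in this sketch; the raw speech data never appears in it, which mirrors the privacy argument made in the abstract.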
Original language: English
Pages (from-to): 149477-149485
Number of pages: 9
Journal: IEEE Access
Volume: 9
DOIs
Publication status: Published - 28 Oct 2021
MoE publication type: A1 Journal article-refereed
