Non-parallel voice conversion using i-vector PLDA: Towards unifying speaker verification and transformation

Tomi Kinnunen*, Lauri Juvela, Paavo Alku, Junichi Yamagishi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

61 Citations (Scopus)

Abstract

Text-independent speaker verification (recognizing speakers regardless of content) and non-parallel voice conversion (transforming voice identities without requiring content-matched training utterances) are related problems. We apply the i-vector method to voice conversion. An i-vector is a fixed-dimensional representation of a speech utterance, which enables treating voice conversion in the utterance domain rather than the frame domain. The high dimensionality (800) and the small number of training utterances (24) necessitate using prior information about speakers. We adopt probabilistic linear discriminant analysis (PLDA) for voice conversion. The proposed approach requires no parallel utterances, transcriptions, or time alignment procedures at any stage.
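A rough counting argument shows why 24 utterances in an 800-dimensional space call for prior information. The sketch below is not taken from the paper. It assumes, purely for illustration, that one tried to fit a full-covariance Gaussian directly to the 24 training i-vectors; the function names are hypothetical.

```python
def full_gaussian_param_count(dim: int) -> int:
    """Free parameters of a Gaussian with a full covariance matrix:
    dim mean entries plus dim * (dim + 1) / 2 unique covariance entries."""
    return dim + dim * (dim + 1) // 2


def max_covariance_rank(num_vectors: int, dim: int) -> int:
    """Upper bound on the rank of a sample covariance matrix.
    Centering the data removes one degree of freedom."""
    return min(dim, num_vectors - 1)


# Figures quoted in the abstract (illustrative use only).
DIM = 800
NUM_UTTERANCES = 24

observed_values = NUM_UTTERANCES * DIM          # scalars available for fitting
free_parameters = full_gaussian_param_count(DIM)
rank_bound = max_covariance_rank(NUM_UTTERANCES, DIM)

print(f"observed scalar values: {observed_values}")
print(f"free parameters:        {free_parameters}")
print(f"covariance rank bound:  {rank_bound} of {DIM}")
```

Under these assumptions the model has far more free parameters than observed values, and the sample covariance cannot be full rank, so an unconstrained per-speaker estimate is not feasible without a prior learned from other speakers.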

Original language: English
Title of host publication: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
Publisher: IEEE
Pages: 5535-5539
Number of pages: 5
ISBN (Electronic): 9781509041176
Publication status: Published - 16 Jun 2017
MoE publication type: A4 Article in a conference publication
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - New Orleans, United States
Duration: 5 Mar 2017 to 9 Mar 2017

Publication series

Name: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
Publisher: IEEE
ISSN (Electronic): 2379-190X

Conference

Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP
Country/Territory: United States
City: New Orleans
Period: 05/03/2017 to 09/03/2017

Keywords

  • i-vector
  • non-parallel training
  • Voice conversion
