Interpretable Latent Space Using Space-Filling Curves for Phonetic Analysis in Voice Conversion

Research output: Article in book/conference proceedings › Conference article in proceedings › Scientific › Peer-reviewed


Abstract

Vector quantized variational autoencoders (VQ-VAE) are well-known deep generative models, which map input data to a latent space that is used for data generation. Such latent spaces are unstructured and can thus be difficult to interpret. Some earlier approaches have introduced a structure to the latent space through supervised learning by defining data labels as latent variables. In contrast, we propose an unsupervised technique incorporating space-filling curves into vector quantization (VQ), which yields an arranged form of latent vectors such that adjacent elements in the VQ codebook refer to similar content. We applied this technique to the latent codebook vectors of a VQ-VAE, which encode the phonetic information of a speech signal in a voice conversion task. Our experiments show there is a clear arrangement in latent vectors representing speech phones, which clarifies what phone each latent vector corresponds to and facilitates other detailed interpretations of latent vectors.
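
As a rough illustration of the idea in the abstract (not the authors' implementation), the sketch below treats a small codebook as the corner points of a piecewise-linear curve and maps an input vector to its closest point on that curve. The function name `project_to_curve`, the toy 2-D corner points, and the use of plain Euclidean distance are assumptions made for this example only.

```python
import math

def project_to_curve(x, corners):
    """Map x to its closest point on the piecewise-linear curve through `corners`.

    Returns (segment index k, position t in [0, 1] along segment k, projected point).
    Assumes consecutive corner points are distinct (no zero-length segments).
    """
    best = None
    for k in range(len(corners) - 1):
        a, b = corners[k], corners[k + 1]
        d = [bi - ai for ai, bi in zip(a, b)]          # segment direction
        denom = sum(di * di for di in d)               # squared segment length
        # Orthogonal projection onto the segment's line, clipped to the segment.
        t = sum((xi - ai) * di for xi, ai, di in zip(x, a, d)) / denom
        t = min(1.0, max(0.0, t))
        p = [ai + t * di for ai, di in zip(a, d)]      # closest point on segment k
        dist = math.dist(x, p)
        if best is None or dist < best[0]:
            best = (dist, k, t, p)
    _, k, t, p = best
    return k, t, p

# Toy 2-D curve with four corner points (hypothetical values).
corners = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]
k, t, q = project_to_curve([1.2, 0.5], corners)
curve_position = k + t  # a single scalar coordinate along the curve
```

Because neighbouring positions along the curve are also neighbours in the vector space, the scalar `curve_position` gives an ordered index of the kind the abstract describes, where adjacent codebook elements refer to similar content. How the corner points are trained is not covered by this sketch.
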
Original language: English
Title of host publication: Proceedings of Interspeech Conference
Publisher: International Speech Communication Association (ISCA)
Pages: 306-310
Number of pages: 5
Volume: 2023-August
Publication status: Published - 2023
Ministry of Education and Culture (OKM) publication type: A4 Article in conference proceedings
Event: Interspeech - Dublin, Ireland
Duration: 20 Aug 2023 to 24 Aug 2023

Publication series

Name: Interspeech
Publisher: International Speech Communication Association
ISSN (electronic): 2958-1796

Conference

Conference: Interspeech
Country/Territory: Ireland
City: Dublin
Period: 20/08/2023 to 24/08/2023
