SphereDiar: An Effective Speaker Diarization System for Meeting Data

Tuomas Kaseva, Aku Rouhe, Mikko Kurimo

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review


Abstract

In this paper, we present SphereDiar, a speaker diarization system composed of three novel subsystems: the Sphere-Speaker (SS) neural network, designed for speaker embedding extraction; a segmentation method called Homogeneity Based Segmentation (HBS); and a clustering algorithm called Top Two Silhouettes (Top2S). The system is evaluated on a set of over 200 manually transcribed multiparty meetings. The evaluation reveals that the system can be further simplified by omitting HBS. Furthermore, we show that SphereDiar achieves state-of-the-art results on two different meeting data sets.
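The abstract names Top2S only as a clustering algorithm, while the keywords point to spherical K-means and silhouette coefficients. To illustrate how those two ingredients are commonly combined, here is a minimal NumPy sketch, assuming speaker embeddings arrive as rows of an array. It clusters them with spherical K-means (cosine similarity, unit-norm centroids) and picks the number of speakers with the highest mean silhouette coefficient. This is not the authors' Top2S selection rule, which the abstract does not specify, and it does not model SS or HBS; all function names and the `k_max` parameter are hypothetical.

```python
import numpy as np


def _normalize(x):
    # Project vectors onto the unit sphere so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def _farthest_point_init(x, k):
    # Deterministic seeding (an assumption, not from the paper): start from the first
    # embedding, then repeatedly add the one least similar to all chosen centroids.
    idx = [0]
    for _ in range(1, k):
        closest = np.max(x @ x[idx].T, axis=1)
        idx.append(int(np.argmin(closest)))
    return x[idx].copy()


def spherical_kmeans(x, k, n_iter=100):
    """Hypothetical helper: K-means with cosine similarity and unit-norm centroids."""
    x = _normalize(x)
    centroids = _farthest_point_init(x, k)
    for _ in range(n_iter):
        labels = np.argmax(x @ centroids.T, axis=1)
        # New centroid = normalized sum of its members; keep the old one if a cluster empties.
        new = np.array([
            x[labels == j].sum(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        new = _normalize(new)
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels


def mean_silhouette(x, labels):
    """Mean silhouette coefficient using cosine distance (1 - cosine similarity)."""
    x = _normalize(x)
    dist = 1.0 - x @ x.T
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    scores = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        same[i] = False
        if not np.any(same):
            continue  # singleton clusters score 0 by convention
        a = dist[i, same].mean()  # mean distance to own cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())


def estimate_num_speakers(x, k_max=8):
    """Hypothetical helper: choose k in [2, k_max] with the highest mean silhouette."""
    best_k, best_score, best_labels = None, -np.inf, None
    for k in range(2, min(k_max, len(x) - 1) + 1):
        labels = spherical_kmeans(x, k)
        score = mean_silhouette(x, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels


# Usage on synthetic data: three well-separated "speakers" in an 8-dim embedding space.
rng = np.random.default_rng(0)
centers = np.eye(8)[:3]
emb = np.vstack([c + 0.05 * rng.standard_normal((20, 8)) for c in centers])
k, labels = estimate_num_speakers(emb)
print(k)  # 3 for this well-separated synthetic data
```

Farthest-point seeding keeps the sketch deterministic. Splitting a tight speaker cluster sharply lowers the silhouette of its members, which is why the silhouette peaks at the true speaker count on data like this.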

Original language: English
Title of host publication: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
Publisher: IEEE
Pages: 373-380
Number of pages: 8
ISBN (Electronic): 9781728103068
Publication status: Published, 1 Dec 2019
MoE publication type: A4 Article in a conference publication
Event: IEEE Automatic Speech Recognition and Understanding Workshop, Singapore, Singapore
Duration: 15 Dec 2019 to 18 Dec 2019

Workshop

Workshop: IEEE Automatic Speech Recognition and Understanding Workshop
Abbreviated title: ASRU
Country: Singapore
City: Singapore
Period: 15/12/2019 to 18/12/2019

Keywords

  • segmentation
  • silhouette coefficients
  • speaker diarization
  • speaker embeddings
  • spherical K-means
