BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Combination of Deep Speaker Embeddings for Diarisation and Discrim
 inative Neural Clustering for Speaker Diarisation - Brian Sun\, Qiujia Li
 \, Florian Kreyssig\, Cambridge University Speech Research Group
DTSTART:20210316T120000Z
DTEND:20210316T130000Z
UID:TALK157138@talks.cam.ac.uk
CONTACT:Dr Kate Knill
DESCRIPTION:*Combination of Deep Speaker Embeddings for Diarisation*\n\n*B
 rian Sun*\n\n*Abstract:* Recently\, significant progress has been made in 
 speaker diarisation after the introduction of d-vectors as speaker embeddi
 ngs extracted from neural network (NN) speaker classifiers for clustering 
 speech segments. To extract better-performing and more robust speaker embe
 ddings\, this paper proposes a c-vector method by combining multiple sets 
 of complementary d-vectors derived from systems with different NN componen
 ts. Three structures are used to implement the c-vectors\, namely 2D self-
  attentive\, gated additive\, and bilinear pooling structures\, relying on
  attention mechanisms\, a gating mechanism\, and a low-rank bilinear pooli
 ng mechanism respectively. Furthermore\, a neural-based single-pass speake
 r diarisation pipeline is also proposed in this paper\, which uses NNs to 
 achieve voice activity detection\, speaker change point detection\, and sp
 eaker embedding extraction. Experiments and detailed analyses are conducte
 d on the challenging AMI and NIST RT05 datasets which consist of real meet
 ings with 4–10 speakers and a wide range of acoustic conditions. Consist
 ent improvements are obtained by using c-vectors instead of d-vectors\, an
 d similar relative improvements in diarisation error rates are observed on
  both AMI and RT05\, which shows the robustness of the proposed methods.\n
 \n*Discriminative Neural Clustering for Speaker Diarisation*\n\n*Qiujia Li
  and Florian Kreyssig*\n\n*Abstract:* In this paper\, we propose Discrimin
 ative Neural Clustering (DNC) that formulates data clustering with a maxim
 um number of clusters as a supervised sequence-to-sequence learning proble
 m. Compared to traditional unsupervised clustering algorithms\, DNC learns
  clustering patterns from training data without requiring an explicit defi
 nition of a similarity measure. An implementation of DNC based on the Tran
 sformer architecture is shown to be effective on a speaker diarisation tas
 k using the challenging AMI dataset. Since AMI contains only 147 complete 
 meetings as individual input sequences\, data scarcity is a significant is
 sue for training a Transformer model for DNC. Accordingly\, this paper pro
 poses three data augmentation schemes: sub-sequence randomisation\, input 
 vector randomisation\, and Diaconis augmentation\, which generates new dat
 a samples by rotating the entire input sequence of L2-normalised speaker e
 mbeddings. Experimental results on AMI show that DNC achieves a reduction 
 in speaker error rate (SER) of 29.4% relative to spectral clustering.\n\nT
 his talk is from SLT 2021 where it was awarded Best Student Paper.
LOCATION:Zoom: https://zoom.us/j/95352633552?pwd=RzJVK2UzOGZyNU5mVHd1Y1VPT
 2tDUT09
END:VEVENT
END:VCALENDAR
