Pretrained ECAPA-TDNN net outputs same feature vector for different signals


The problem is that the pretrained ECAPA-TDNN outputs the same feature vector for two very different signals. Try it out yourself:

import torchaudio
from speechbrain.pretrained import EncoderClassifier

signal1, fs = torchaudio.load('Cleanclnsp126_3Wjw0nadnM4_snr15_tl-22_fileid_0_00.wav')
signal2, fs = torchaudio.load('Cleanclnsp129_lJoaywrZPsU_snr0_tl-27_fileid_252_40.wav')
classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb", freeze_params=False)

# Duplicate each signal along the batch dimension (batch size 2)
signal1 = signal1.repeat(2, 1)
signal2 = signal2.repeat(2, 1)

embeddings1 = classifier.encode_batch(signal1)
embeddings2 = classifier.encode_batch(signal2)

# Take the embedding of the first item in each batch
e1 = embeddings1[0]
e2 = embeddings2[0]
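To check whether the embeddings really collapse onto the same vector, it helps to compare e1 and e2 numerically rather than by eye, e.g. with cosine similarity. Here is a minimal pure-Python sketch of that comparison (the placeholder vectors below stand in for the real e1/e2 tensors from the code above, which you would flatten to lists first; with the model behaving as described, the similarity would come out close to 1.0):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length vectors:
    # dot(a, b) / (|a| * |b|); 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors standing in for e1 and e2 (hypothetical values,
# not real model outputs). Identical vectors give similarity 1.0.
e1 = [0.1, 0.2, 0.3]
e2 = [0.1, 0.2, 0.3]
print(cosine_similarity(e1, e2))  # → 1.0
```

For the real tensors, `torch.nn.functional.cosine_similarity(e1, e2)` does the same comparison directly.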

I can’t upload the .wav files I used here, but I’m sure the same thing happens with your own .wav files. Just use two clean speech .wav files from two different speakers (I even used a male and a female speaker) that are not in the training set of that pretrained model.
