I am using the speaker recognition scripts in the directory “speechbrain/recipes/VoxCeleb/SpeakerRec”. I have trained the ECAPA-TDNN speaker recognition model and am now trying to test its performance.
However, I was surprised to find that when I changed the batch size in the yaml file, I got different scores for the same pair, enrol.wav and test.wav, in test.txt. What's more, when test.txt contained only one pair, I always got a score of “-1.0” from the CosineSimilarity function.
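To make the “-1.0” case concrete, here is a minimal sketch of one way a score of exactly -1.0 could arise with a single pair. It assumes (this is an assumption, not something confirmed from the recipe) that the embeddings are mean-normalized with a mean computed from only the two embeddings being compared. The vectors and the helper names `cosine_similarity` and `center_pair` are hypothetical and written in plain Python, not taken from SpeechBrain.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def center_pair(a, b):
    """Subtract the mean of the two vectors from each of them."""
    mean = [(x + y) / 2 for x, y in zip(a, b)]
    centered_a = [x - m for x, m in zip(a, mean)]
    centered_b = [y - m for y, m in zip(b, mean)]
    return centered_a, centered_b


# Hypothetical embeddings standing in for enrol.wav and test.wav.
enrol_emb = [0.9, 0.1, 0.4]
test_emb = [0.8, 0.2, 0.5]

# Score on the raw embeddings: close to 1 because the vectors are similar.
raw_score = cosine_similarity(enrol_emb, test_emb)

# Score after centering on the mean of just these two vectors.
# The centered vectors are (a - b) / 2 and (b - a) / 2, so they point
# in opposite directions regardless of how similar a and b were.
enrol_centered, test_centered = center_pair(enrol_emb, test_emb)
centered_score = cosine_similarity(enrol_centered, test_centered)

print(raw_score, centered_score)
```

If something like this is happening, the score would say nothing about the speakers and would only reflect that the normalization statistics came from two samples.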
Why do the scores for the same pair change with the batch size? It seems that the same audio file can produce different embeddings from the speaker recognition model. Is this expected?
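As a toy illustration of how batch composition could change an embedding, the sketch below pools a short signal alone and then after zero-padding it to the length of a longer batch companion. This is not the recipe's code and makes no claim about ECAPA-TDNN internals; `mean_pool`, `pad_to`, and `masked_mean_pool` are hypothetical helpers. It only shows that any statistic computed over padded frames without a length mask depends on what else is in the batch.

```python
def mean_pool(frames):
    """Average over all frames, including any padding."""
    return sum(frames) / len(frames)


def pad_to(frames, length):
    """Zero-pad a list of frames on the right up to the given length."""
    return frames + [0.0] * (length - len(frames))


def masked_mean_pool(frames, true_len):
    """Average over the first true_len frames only, ignoring padding."""
    return sum(frames[:true_len]) / true_len


# Hypothetical per-frame features for a short utterance.
short_utt = [1.0, 2.0, 3.0]

# Batch size 1: no padding is needed.
pooled_alone = mean_pool(short_utt)

# Larger batch: the utterance is padded to the longest item (6 frames here).
padded = pad_to(short_utt, 6)
pooled_padded = mean_pool(padded)

# Using the true length as a mask recovers the unpadded result.
pooled_masked = masked_mean_pool(padded, len(short_utt))

print(pooled_alone, pooled_padded, pooled_masked)
```

If every layer in the model masked padding this way, I would expect the embedding of a file not to depend on the batch size, which is why the changing scores are confusing.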
Looking forward to your reply. I can't thank you enough!