Spkrec-ecapa-voxceleb is using mel filterbanks instead of MFCCs


First of all, I want to congratulate and thank you for this awesome repository! Thanks a lot for all the efforts!

I have a question regarding the speaker verification model spkrec-ecapa-voxceleb.

I saw that the model is based on the ECAPA-TDNN paper: [2005.07143] ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification.
In the paper, the authors use MFCCs.

But the model implemented by SpeechBrain is trained on mel filterbanks instead. Do you have any experience with whether mel filterbanks work better than MFCCs? Or how much the EER changes if we use MFCCs instead of mel filterbanks?

Yes. FBANKs normally work better with CNNs. As far as I remember, the authors of ECAPA also used FBANKs in follow-up papers.
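
For context on how the two features relate: MFCCs are conventionally obtained by applying a DCT to the log mel filterbank energies, so a full-length DCT is just an invertible linear transform of the FBANKs, while keeping only the first few coefficients discards fine spectral detail. Below is a minimal NumPy sketch of that relationship. It is not the SpeechBrain feature pipeline; the random input, the 80 mel bins, the 24 coefficients, and the helper names `dct_ii_matrix` and `fbank_to_mfcc` are all assumptions made for illustration.

```python
import numpy as np


def dct_ii_matrix(n_out: int, n_in: int) -> np.ndarray:
    """Return the first n_out rows of the orthonormal DCT-II basis for length n_in."""
    k = np.arange(n_out)[:, None]   # coefficient index
    n = np.arange(n_in)[None, :]    # mel-bin index
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis *= np.sqrt(2.0 / n_in)
    basis[0] *= np.sqrt(0.5)        # extra scaling of the k = 0 row for orthonormality
    return basis


def fbank_to_mfcc(log_fbank: np.ndarray, n_mfcc: int) -> np.ndarray:
    """Map log mel filterbank features (frames, n_mels) to MFCCs (frames, n_mfcc)."""
    n_mels = log_fbank.shape[-1]
    return log_fbank @ dct_ii_matrix(n_mfcc, n_mels).T


# Placeholder data standing in for real log-mel features (hypothetical sizes).
rng = np.random.default_rng(0)
log_fbank = rng.standard_normal((100, 80))

mfcc_full = fbank_to_mfcc(log_fbank, 80)    # keeps all coefficients
mfcc_trunc = fbank_to_mfcc(log_fbank, 24)   # keeps only the first 24

# The full DCT is orthogonal, so the FBANKs can be recovered exactly.
recovered = mfcc_full @ dct_ii_matrix(80, 80)
```

Since the full transform loses nothing, any difference in EER between the two features would come from how the network handles the decorrelated representation and from how many coefficients are kept, which has to be measured empirically.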

Thanks, that clarified it for me