spkrec-ecapa-voxceleb uses mel filterbanks instead of MFCCs


I have a question regarding the Speaker Verification model: spkrec-ecapa-voxceleb

I saw that the model is based on the ECAPA-TDNN paper: [2005.07143] ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification.
In the paper, the authors use MFCCs.

But the model implemented by SpeechBrain is trained on mel filterbanks instead. Do you have any experience with whether mel filterbanks work better than MFCCs? Or by how much the EER changes if we use MFCCs instead of mel filterbanks?

Yes. FBANKs normally work better with CNNs. As far as I remember, the authors of ECAPA also used FBANKs in follow-up papers.
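
For anyone comparing the two front ends: MFCCs are a DCT applied to the log mel filterbank (FBANK) energies, usually with the higher coefficients dropped. Here is a minimal NumPy sketch of that relationship. The frame size, hop, number of mel bands, and number of cepstral coefficients are illustrative assumptions, not the settings of spkrec-ecapa-voxceleb or of the paper, and the helper names are made up for this example.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_fbank(signal, sample_rate=16000, n_fft=512, hop=160, n_mels=40):
    """Log mel filterbank energies (FBANKs), shape (n_frames, n_mels)."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10)  # small floor avoids log(0)

def dct_matrix(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II basis, shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)
    return basis

def mfcc_from_fbank(fbank, n_mfcc):
    """MFCCs = DCT over the mel axis of the FBANKs, keeping n_mfcc coefficients."""
    return fbank @ dct_matrix(n_mfcc, fbank.shape[1]).T

# Random noise stands in for one second of 16 kHz audio (assumption for the demo).
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)

fbank = log_mel_fbank(signal)              # (n_frames, 40)
mfcc = mfcc_from_fbank(fbank, n_mfcc=20)   # (n_frames, 20)
print(fbank.shape, mfcc.shape)
```

If you keep all the DCT coefficients, the transform is invertible and the MFCCs carry the same information as the FBANKs. Once the higher coefficients are dropped, the fine spectral detail cannot be recovered, whereas with FBANKs the network sees the full mel spectrum. This only illustrates the relationship between the features; it says nothing about how much the EER would differ.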

Thanks, that clarified it for me