I am trying to compute the MFCC of an audio signal. The output of SpeechBrain's MFCC has shape (len, 660), whereas a traditional MFCC is expected to have shape (len, num_mfcc_coefficients).
1.) I just want to know how I should interpret the output of SpeechBrain's MFCC.
```python
from speechbrain.lobes.features import MFCC as sb_mfcc
from speechbrain.dataio.dataio import read_audio, write_audio

sbmfcc = sb_mfcc()
sb_audio = read_audio(file_path)
sb_full = sbmfcc(sb_audio.unsqueeze(0)).squeeze(0)
sb_full.shape
# > (<len>, 660)
```
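My current guess, based on reading the default arguments of `speechbrain.lobes.features.MFCC` (which I may be misreading), is that 660 comes from deltas and a context window being enabled by default:

```python
# Assumed defaults of speechbrain.lobes.features.MFCC (please correct me if wrong):
n_mfcc = 20           # base cepstral coefficients per frame
delta_blocks = 3      # static + delta + delta-delta (deltas=True)
context = 5 + 1 + 5   # left_frames=5, current frame, right_frames=5 (context=True)

print(n_mfcc * delta_blocks * context)  # 660
```

If that decomposition is right, the last dimension is not just the cepstral coefficients but the concatenation of coefficients, their derivatives, and neighboring frames.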
2.) Is this output comparable to the output of `torchaudio.transforms.MFCC`? Is there a reason SpeechBrain preferred its own implementation of these featurizers? Are you trying to make them learnable? It would be great if you could elaborate on this.