Filterbank non-constant amplitudes

Plotting the triangular filterbank, I noticed that the amplitudes at the central frequencies are not
constant. So even with a 50% overlap, I think the filters don’t meet the COLA constraint, and as a result a signal reconstructed from the framed segments will be distorted.
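
One possible explanation for the non-constant peaks, offered as a sketch rather than a diagnosis of the SpeechBrain code: mel-spaced central frequencies generally fall between FFT bins, so each triangle is sampled slightly off its apex. The pure-Python example below uses textbook mel triangles and reports the largest sampled value of each filter. The helper names and the HTK-style mel formula are assumptions made for illustration; this is not SpeechBrain's implementation.

```python
import math


def hz_to_mel(f):
    # HTK-style mel formula (assumed here for illustration)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def sampled_peaks(n_mels=20, n_fft=400, sample_rate=16000):
    """Largest value of each triangular filter when sampled on the FFT bins."""
    mel_max = hz_to_mel(sample_rate / 2)
    # n_mels + 2 edge points, equally spaced on the mel axis
    edges = [mel_to_hz(i * mel_max / (n_mels + 1)) for i in range(n_mels + 2)]
    # Frequencies (Hz) of the n_fft // 2 + 1 one-sided FFT bins
    bins = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    peaks = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        # Triangle rising from lo to mid and falling from mid to hi
        tri = [
            max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
            for f in bins
        ]
        peaks.append(max(tri))
    return peaks


peaks_400 = sampled_peaks(n_fft=400)
peaks_4096 = sampled_peaks(n_fft=4096)
print("min sampled peak, n_fft=400: ", min(peaks_400))
print("min sampled peak, n_fft=4096:", min(peaks_4096))
```

With n_fft=400 at 16 kHz the bins are 40 Hz apart, so the narrow low-frequency filters are sampled coarsely and their apex is missed. A larger n_fft brings the sampled peaks closer to 1, which would be consistent with unequal peak heights in the plot if the library builds its filters in a similar way.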

Is that ok? What do you think?

This is ultra weird, as we checked this part many, many times. Could you share the code that produced this?

import torch
import matplotlib.pyplot as plt
from speechbrain.dataio.dataio import read_audio
from speechbrain.processing.features import STFT

from speechbrain.processing.features import spectral_magnitude
from speechbrain.processing.features import Filterbank

# %%

signal, _ = read_audio('spk1_snt1.wav') 
signal = signal.unsqueeze(0) # [batch, time]

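compute_STFT = STFT(sample_rate=16000, win_length=25, hop_length=10, n_fft=400)
compute_fbanks = Filterbank(n_mels=20)

# Keep the result under a distinct name so the imported STFT class is not shadowed
signal_STFT = compute_STFT(signal)
mag = spectral_magnitude(signal_STFT)

# The second return value only exists with the edited forward() described below
fbanks, fb_mat = compute_fbanks(mag)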
plt.figure(figsize=(8, 4), dpi=100)

I edited the Filterbank forward function to return the fbank_matrix as well,
i.e., in /speechbrain/processing/

return fbanks, fbank_matrix

Trying the Gaussian filterbank,
the first filter at DC starts at a nonzero value.

compute_fbanks_gauss = Filterbank(n_mels=10, n_fft=1024, filter_shape="gaussian")

The more FFT points used to compute the STFT, the smoother the curves at lower frequencies get. However, the first Gaussian filter still starts at tensor(0.1353) instead of 0.
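
A small observation that may help the investigation: 0.1353 is numerically exp(-2), which is the value of an unnormalised Gaussian two standard deviations away from its centre. The sketch below only illustrates that arithmetic. The function `gaussian_filter` and the choice of centre and width are hypothetical, and whether the library actually places DC two standard deviations from the first centre is an assumption to be checked against the source.

```python
import math


def gaussian_filter(f, f_central, sigma):
    # Unnormalised Gaussian: equals 1 at f_central and never reaches exactly 0
    return math.exp(-0.5 * ((f - f_central) / sigma) ** 2)


# Hypothetical numbers: centre at 100 Hz, sigma of 50 Hz, so DC sits 2 sigma away
value_at_dc = gaussian_filter(0.0, f_central=100.0, sigma=50.0)
print(round(value_at_dc, 4))  # 0.1353
```

Unlike a triangle, a Gaussian has infinite support, so some nonzero value at DC is expected unless the filter is explicitly truncated or made much narrower.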


OK, this looks weird; we are looking into it. Please do not hesitate to investigate a bit and share your findings. It is hard to keep up with all the questions :frowning: