Error (speech_separation project): mse_loss function, predictions and targets don't have the same length

Hi all!

I have run into a problem in the MSE loss calculation, as shown below:

The script complains that the predictions and targets don't have the same dimensions (predictions["spec"] and clean_spec, respectively, in the figure). Has anyone run into the same issue before? I wonder why the predictions and targets would differ in size, since I believe the mixture and clean audio samples have the same length. Any help would be greatly appreciated!

Thank you,
Agudemu

Hi Agudemu,

There are two likely causes: either the dimension orders do not match, or the sequence lengths are not the same. Can you print the shapes of predictions and clean_spec?
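A minimal sketch of that check, in case it helps. The helper name describe_shape_mismatch is hypothetical (it is not part of SpeechBrain or PyTorch); it only compares two shape tuples and reports which of the two cases above applies.

```python
def describe_shape_mismatch(pred_shape, target_shape):
    """Return a short diagnosis of how two tensor shapes differ.

    Hypothetical helper for illustration; it works on plain tuples,
    and torch.Size is a tuple subclass, so tensor.shape can be passed directly.
    """
    pred_shape, target_shape = tuple(pred_shape), tuple(target_shape)
    if pred_shape == target_shape:
        return "shapes match"
    if len(pred_shape) != len(target_shape):
        return (
            f"different number of dimensions: "
            f"{len(pred_shape)} vs {len(target_shape)}"
        )
    # Same sizes but permuted: most likely a dimension-order problem.
    if sorted(pred_shape) == sorted(target_shape):
        return "same sizes in a different order: check the dimension order"
    # Otherwise list the axes whose sizes disagree.
    differing = [
        axis
        for axis, (p, t) in enumerate(zip(pred_shape, target_shape))
        if p != t
    ]
    return f"sizes differ on axis/axes {differing}: check the sequence lengths"


# In the training script this would be called just before the loss, e.g.
# print(describe_shape_mismatch(predictions["spec"].shape, clean_spec.shape))
print(describe_shape_mismatch((8, 257, 553), (8, 553, 257)))
```

If the sizes are the same but permuted, the dimension order is the suspect; if a single axis differs, the sequence lengths are.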

Best,
Cem

Hi Cem,

Thanks for your reply!

It is a mismatch in length: predictions: torch.Size([8, 553, 257]), clean_spec: torch.Size([8, 522, 257]).

I wonder why the model produces predicted spectrograms whose size does not match that of the clean spectrograms. I am running the template script for speech enhancement in SpeechBrain. There was no such problem when the mixture was made "on the fly". Now I am just trying to get it to work with a fixed, premixed LibriSpeech dataset (data pairs: mixture and clean target speech).

Thank you,
Agudemu

Follow-up: the spectrograms of the noisy mixture have the same dimensions as the predicted spectrograms, so the model output is consistent with its input. It turns out the problem is with my dataset: in some instances, the noisy mixture and the clean target don't have the same length!
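For anyone hitting the same thing, here is a rough sketch of how the dataset could be scanned for such pairs before training. It assumes the premixed files are PCM WAV files readable by Python's standard wave module and that you already have a list of (mixture, clean) path pairs; the function names are hypothetical and not part of SpeechBrain.

```python
import wave


def num_samples(path):
    """Return the number of audio frames (samples per channel) in a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes()


def find_length_mismatches(pairs):
    """Return (mix_path, clean_path, n_mix, n_clean) for pairs whose lengths differ.

    `pairs` is an iterable of (mixture_path, clean_path) tuples. How that list
    is built depends on your dataset layout, which is an assumption here.
    """
    mismatches = []
    for mix_path, clean_path in pairs:
        n_mix = num_samples(mix_path)
        n_clean = num_samples(clean_path)
        if n_mix != n_clean:
            mismatches.append((mix_path, clean_path, n_mix, n_clean))
    return mismatches
```

Pairs reported by this check can then be regenerated, or both signals can be trimmed to the shorter length, depending on how the mixtures were created.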

Thanks for the help! Really appreciate it!