Speech enhancement model inference

SpeechBrain is a great tool with many applications. I'm trying to do DNN speech enhancement by following the instructions here:
My question is:
How do I load the trained model (results\4234\save\CKPT*\model.ckpt) for inference? Is there an example for that?
A stretch ask is the ability to export the model to ONNX format.

Hey, we provide different interfaces in sb.pretrained.interfaces. Pretty sure you will find the one you are looking for. If not, simply create your own from our Pretrained class.
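If none of the built-in interfaces fits, a checkpoint written by SpeechBrain's checkpointer is an ordinary PyTorch state dict, so it can also be loaded directly. A minimal sketch (the model class and checkpoint path below are placeholders for your own; substitute the class from templates/enhancement/custom_model.py and your results/.../save/CKPT.../model.ckpt):

```python
import torch

# Placeholder: use the same model class you trained with
# (e.g. the one defined in templates/enhancement/custom_model.py).
class TinyEnhancer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(257, 257)

    def forward(self, x):
        return torch.sigmoid(self.net(x))

model = TinyEnhancer()

# Stand-in for a real checkpoint; in practice this file already exists
# under results/.../save/CKPT.../model.ckpt.
ckpt_path = "model.ckpt"
torch.save(model.state_dict(), ckpt_path)

# Load the saved weights and switch to inference mode.
state = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(state)
model.eval()

with torch.no_grad():
    mask = model(torch.randn(1, 10, 257))  # (batch, frames, freq bins)
```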

We are doing our best to provide new features (such as ONNX export). Do not hesitate to try it yourself and share your experience with us.

I see SpectralMaskEnhancement.from_hparams for pretrained interfaces. It currently supports metricgan-plus-voicebank. The model in the example link is not supported.
It seems two files are needed: enhance_model.ckpt and hyperparams.yaml. Are there any instructions for generating hyperparams.yaml? It's different from the file generated during training.
Is it possible to add the file and instructions to the Colab notebook to make it a complete example?

Hi @hua, interfaces are given to "freeze" a model. Hence, a slightly different (frozen) yaml file is needed. In practice, it is just a matter of removing unnecessary (training) parameters and renaming some models so they match the interface standards. See speechbrain/metricgan-plus-voicebank · Hugging Face for an example of a cleanly formatted yaml and how to use it.
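The frozen file can be quite small. A hedged sketch of what it might look like for a custom spectral-mask model (the key names follow the metricgan-plus-voicebank layout; the class name `custom_model.EnhancementModel` and the STFT parameters are placeholders you should match against your own training yaml):

```yaml
# Frozen inference hyperparams (sketch; adapt names and values to your model).
compute_stft: !new:speechbrain.processing.features.STFT
  sample_rate: 16000
  win_length: 32
  hop_length: 16
  n_fft: 512

spectral_magnitude: !name:speechbrain.processing.features.spectral_magnitude
  power: 0.5

resynth: !name:speechbrain.processing.signal_processing.resynthesize
  stft: !ref <compute_stft>

# Placeholder: the model class defined in your custom_model.py.
enhance_model: !new:custom_model.EnhancementModel

modules:
  enhance_model: !ref <enhance_model>

pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
  loadables:
    enhance_model: !ref <enhance_model>
```

The key point is that only the parts the interface needs at inference time remain; optimizers, schedulers, and data-loading options from the training yaml are dropped.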

Hi, I've been trying to run inference on the templates/enhancement model. I've run training, and a checkpoint was saved. I've created a hyperparams.yaml file similar to the metricgan-plus-voicebank one:

But when I run my test code:
I get this error:
Any ideas what's going on? (Note: I've got the yaml file, the ckpt file, the test code, and custom_model.py from templates/enhancement all in the same folder.)

Hi, why are you calling enhance_batch directly instead of enhance_file? Consider simply not passing the length, as your custom model does not handle this parameter.

Hi, ok thanks. Yes, if I call enhance_file instead:
It creates a wav file that is supposed to be enhanced, though it doesn't actually sound that great, probably because the training dataset is too small and the model is quite simple. I suppose I should check out the recipes/Voicebank/spectral_mask recipe and use the pre-trained model there:

By the way, when I tried to run:
I get a torchaudio error, as it expects the third positional argument of torchaudio.save() to be the sample rate:


@davlar could you open an issue on your latest point, please? (GitHub)

How do you open an issue? Is there a guide on how to do this?

Hi, simply go here and click on New issue :slight_smile: