Save/Load Models for Speaker Recognition & Yes/No for current speech

I’m using the Speaker ID tutorial from the Google Colab and have replaced the dataset with mine (along with a few other changes in the YAML file). The code runs and trains the model successfully. Now I need to save that model and load it later for predictions on incoming data.


  1. I understand the results/ folder contains the checkpoints for the model, so I need to use the Checkpointer to recall the ‘best’ version, correct? Is this equivalent to loading a pre-trained model? I am not trying to continue training the model, but rather to use it for actual work.
    I’m looking at the Checkpoint tutorial and see:

checkpoint_dir = "./nutshell_checkpoints"
checkpointer = Checkpointer(
    checkpoint_dir,
    recoverables={"mdl": model,
                  "opt": optimizer,
                  "epochs": epoch_counter},
)

Am I looking in the right place?
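In case it helps, here is a sketch of what recovery-for-inference might look like with that Checkpointer (assuming SpeechBrain is installed; the "error" metric key is hypothetical and should match whatever meta key your recipe actually records when it saves checkpoints):

```python
# Sketch: recover the "best" checkpoint for inference, not for resuming training.
# Requires speechbrain; the metric name "error" is a placeholder for whatever
# your recipe stored in the checkpoint meta (e.g. via save_and_keep_only).
def load_best_model(model, checkpoint_dir="./nutshell_checkpoints"):
    from speechbrain.utils.checkpoints import Checkpointer

    # Only the model needs to be recoverable for inference; optimizer and
    # epoch counter matter only if you intend to resume training.
    checkpointer = Checkpointer(checkpoint_dir, recoverables={"mdl": model})

    # min_key selects the checkpoint with the lowest recorded "error";
    # use max_key instead for accuracy-like metrics.
    checkpointer.recover_if_possible(min_key="error")

    model.eval()  # inference mode: disables dropout, freezes batch-norm stats
    return model
```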

  2. The tutorial file ends with fit() and evaluate() - where can I see a predict() usage?

  3. Can SpeechBrain be used for a yes/no on whether the provided audio contains speech at all? Will the speech-to-text package give me a null or false result if I provide noise, sound effects, instrumental music, etc.?
    I have long segments of multi-speaker audio that I have spliced into segments and I only want to pass the bits with speech to the Speaker ID system - to save on resources. I am also concerned about the results from Speaker ID if a particular segment has no speech, but the NN gives me its best guess anyway. Should the Speaker ID system have a tag for non-speech? Should I provide it with examples of music, noise, sound effects etc.?

Any feedback is appreciated. Thanks!

Q1 and Q2 - Found the inference syntax and model loading in the Colab. Need to make a new YAML, etc. No problem.
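For anyone landing here later, a sketch of what "predict" can look like via the pretrained-model interface (the `speechbrain/spkrec-ecapa-voxceleb` source is SpeechBrain's public speaker-ID model; for a custom-trained model you would point source= at your own directory with its hparams and checkpoint):

```python
# Sketch: speaker classification on a single file with SpeechBrain's
# pretrained interface. Requires speechbrain; the first call downloads
# the model into savedir.
def classify_speaker(audio_path):
    from speechbrain.pretrained import EncoderClassifier

    classifier = EncoderClassifier.from_hparams(
        source="speechbrain/spkrec-ecapa-voxceleb",
        savedir="pretrained_models/spkrec-ecapa-voxceleb",
    )
    # classify_file returns (posterior probs, score, predicted index, labels)
    out_prob, score, index, text_lab = classifier.classify_file(audio_path)
    return text_lab
```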

Q3 - I suspect the probabilities would all come out low. I should really just run an experiment to test this.
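As a zero-dependency baseline for that experiment, a crude RMS-energy gate can at least reject near-silent segments before they reach Speaker ID (this is not SpeechBrain's VAD and cannot tell speech from loud music or noise; the threshold value is made up and would need tuning on your data):

```python
import math
import struct
import wave

def has_energy(wav_path, rms_threshold=0.01):
    """Crude pre-filter: True if the clip's RMS energy exceeds a
    (hand-tuned, hypothetical) threshold. Rejects near-silence only;
    it does NOT distinguish speech from music or sound effects."""
    with wave.open(wav_path, "rb") as wf:
        raw = wf.readframes(wf.getnframes())
    # assumes 16-bit PCM; normalize samples to [-1, 1]
    samples = [s / 32768.0 for s in struct.unpack("<%dh" % (len(raw) // 2), raw)]
    rms = math.sqrt(sum(s * s for s in samples) / max(len(samples), 1))
    return rms > rms_threshold
```

Segments passing this gate could then go to a proper VAD or straight to Speaker ID.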

Thanks for the great product. Will cite the hell out of this when I can.

Hi, we are happy that you appreciate SpeechBrain :smiley: Sorry for the late reply; during the summer most of the community is on vacation, haha.

Glad you found the answer for 1 and 2.

For 3, we currently have a PR on Voice Activity Detection that basically does speech/non-speech classification. I guess you could take the PR and train it yourself, or wait for us to merge it along with a pretrained model :slight_smile: