Pretraining, transfer learning

I want to train an ASR model on a very large Hindi-language dataset and then fine-tune the same trained model on a few other small datasets, using a transformer+CTC attention model and Conformer models. Is this possible with this toolkit?

Hey, yes, see these two tutorials:

Pretraining and Fine-tuning with HF

Speech Recognition From Scratch


But there, pretraining and fine-tuning are done using Hugging Face models. I want to fine-tune using the model I trained myself.

It is exactly the same, just replace the Hugging Face Hub name with your local path :slight_smile:
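For illustration, the Pretrainer entry in the fine-tuning yaml could point at a local checkpoint instead of a Hub id. The paths below are hypothetical placeholders, not from a real recipe:

```yaml
# Hypothetical example: load weights from your own training run
# instead of a HuggingFace Hub repository.
pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    loadables:
        model: !ref <model>
    paths:
        # A Hub id like speechbrain/asr-crdnn-commonvoice-fr/model.ckpt
        # becomes a local filesystem path:
        model: /path/to/my_hindi_run/save/model.ckpt
```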

1 Like

I want to train an xvector model for speaker identification, starting from the weights of a model pretrained on the CommonVoice dataset. What should the manifest format look like, and how do I use the initial weights?

Hi, start with this tutorial. The concept is really simple: put your xvector in the yaml, define the Pretrainer in the yaml as well (example: hyperparams.yaml · speechbrain/asr-crdnn-commonvoice-fr at main), and then call it in your recipe, and you are done!
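As a sketch of what that yaml definition might look like (names and paths here are illustrative, adapted from the kind of entry in the linked example):

```yaml
# Hypothetical Pretrainer definition: each loadable name maps to the
# module it should populate, and to a matching checkpoint path.
pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    loadables:
        embedding_model: !ref <embedding_model>
    paths:
        embedding_model: speechbrain/spkrec-xvect-voxceleb/embedding_model.ckpt
```

In the recipe itself, the weights are then typically fetched and loaded with `hparams["pretrainer"].collect_files()` followed by `hparams["pretrainer"].load_collected()`, before training starts.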

Thank you.
We followed this tutorial and trained the network without initial weights, then we added these lines to the file:

model = hparams["embedding_model"]
pretrain = Pretrainer(loadables={'model': speaker_brain}, paths={'model': 'speechbrain/spkrec-xvect-voxceleb/embedding_model.ckpt'})


I am not sure the initial weights are actually loaded into the model. How can I test whether they are?

The logs should tell you if it is loaded or not :slight_smile: Otherwise you can just have a look at the weights directly, by accessing them as for any PyTorch layer!
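One way to check is to snapshot a parameter before loading and compare it afterwards. A minimal sketch, using a stand-in `nn.Linear` instead of the real xvector network and a plain `state_dict` load in place of the Pretrainer:

```python
import torch
import torch.nn as nn

# Stand-in for the embedding model (the real one is the xvector network).
model = nn.Linear(4, 2)

# Snapshot one parameter tensor before loading the checkpoint.
before = model.weight.detach().clone()

# Simulate loading pretrained weights; the Pretrainer ultimately performs
# a state_dict load from the downloaded .ckpt file.
pretrained = nn.Linear(4, 2)
torch.save(pretrained.state_dict(), "ckpt.pt")
model.load_state_dict(torch.load("ckpt.pt"))

# If the load happened, the parameter values should have changed.
print(not torch.equal(before, model.weight))
```

If this prints `True`, the weights were actually replaced; identical values before and after would suggest the load never ran.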

I’m interested in fine-tuning an xvector model (the pretrained spkrec-xvect-voxceleb) to perform some classification tasks (e.g., speakers with one characteristic vs. speakers with another).

I think that, in terms of the model pipeline, I only need to modify the classifier module (change the number of classes) and then retrain the whole pretrained network on my own data. I’ve read the tutorials, but I’m still confused about how the whole process works.
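For the classifier change, a sketch of what the yaml might contain, assuming the standard `speechbrain.lobes.models.Xvector.Classifier` block (argument names and shapes here should be checked against the recipe you start from):

```yaml
# Hypothetical: keep the pretrained embedding model, but give the
# classification head your own number of output classes.
classifier: !new:speechbrain.lobes.models.Xvector.Classifier
    input_shape: [null, null, 512]
    lin_neurons: 512
    out_neurons: 2  # e.g., two speaker-characteristic classes
```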

Some questions:

  1. For the steps mentioned above:
model = hparams["embedding_model"]
pretrain = Pretrainer(loadables={'model': speaker_brain}, paths={'model': 'speechbrain/spkrec-xvect-voxceleb/embedding_model.ckpt'})


This only loads one module of the whole spkrec-xvect-voxceleb network, right? How should I actually start training/fine-tuning the whole network on my own data? Do I just load this module and put it somewhere in the Brain class initialization before retraining?

    # Brain class initialization
    speaker_brain = SpeakerBrain(
  2. What does defining the Pretrainer in the yaml file actually do? The tutorials do not seem to ask us to do that; they just tell us to do this:
from speechbrain.utils.parameter_transfer import Pretrainer

# Initialization of the pre-trainer 
pretrain = Pretrainer(loadables={'model': model}, paths={'model': 'speechbrain/spkrec-ecapa-voxceleb/embedding_model.ckpt'})

# We download the pretrained model from HuggingFace in this case
  3. What does “put your xvector in the yaml” mean above?

  4. How should my own data folder be formatted? Is there a way to use the VoxCeleb dataset-preparation process in SpeechBrain to prepare my own data? What data info/txt/csv files should I generate before the preparation process, and are there any requirements for my audio files (e.g., duration)?
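On the data-format question, the VoxCeleb preparation in SpeechBrain generates csv manifests roughly along these lines. This is a hedged sketch with made-up paths; the exact column set should be verified against the recipe's preparation script:

```csv
ID,duration,wav,start,stop,spk_id
utt_0001,3.2,/data/my_corpus/spk1/utt_0001.wav,0,51200,spk1
utt_0002,2.7,/data/my_corpus/spk2/utt_0002.wav,0,43200,spk2
```

Generating a csv like this for your own corpus (one row per utterance, one speaker label per row) is usually enough to reuse the existing dataio pipeline.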

I have the same questions. Did you get answers to them, and did you manage to train the model? Can you help clarify the training steps you followed?