Problem with defining pretrained_path in ASR train.yaml


I am trying to create my own Speech Recognition model.
I followed your Google Colab instructions for Speech Recognition and managed to 1. prepare the data, 2. train the tokenizer, and 3. train the LM, but I am having problems with the 4th step.

I am having problems defining pretrained_path when training the ASR model.
I tried setting pretrained_path to the LM output directory, but it contains no lm.ckpt, tokenizer.ckpt, or asr.ckpt.
So I am wondering how I can train my own model when I can’t use the saved LM output.


Maybe the output model names/paths of the pre-trained Tokenizer & LM are different from what the recipe expects. I followed the recipe in ‘templates/speech_recognition’, which has 3 folders: ‘ASR’, ‘LM’, & ‘Tokenizer’. Unless you have manually modified the code to change the output paths, the trained Tokenizer & LM are saved as follows:

  1. Tokenizer: Tokenizer/save/*******.model
  2. LM: LM/results/RNNLM/save/CKPT+2021-06-14+19-34-37+00/model.ckpt

So please check the path and name of the saved Tokenizer & LM models for your case.
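
If it helps, here is a minimal Python sketch for locating those files, assuming the default folder layout listed above. The helper names `find_tokenizer_model` and `find_latest_lm_checkpoint` are hypothetical (they are not part of SpeechBrain), and picking the most recent CKPT folder is only an assumption; the newest checkpoint is not necessarily the best one.

```python
from pathlib import Path
from typing import Optional


def find_tokenizer_model(tokenizer_save_dir: str) -> Optional[Path]:
    """Return the first *.model file in the Tokenizer save folder, or None."""
    matches = sorted(Path(tokenizer_save_dir).glob("*.model"))
    return matches[0] if matches else None


def find_latest_lm_checkpoint(lm_save_dir: str) -> Optional[Path]:
    """Return model.ckpt from the most recent CKPT+... folder, or None."""
    # Folder names embed a timestamp (CKPT+YYYY-MM-DD+HH-MM-SS+NN),
    # so sorting by name puts the newest folder last.
    candidates = sorted(Path(lm_save_dir).glob("CKPT+*/model.ckpt"))
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    # Paths are relative to the ASR folder of the template (assumed layout).
    print(find_tokenizer_model("../Tokenizer/save"))
    print(find_latest_lm_checkpoint("../LM/results/RNNLM/save"))
```

Both helpers return None when nothing is found, which usually means the output folder was changed in the corresponding YAML file.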

In my case, in the ‘ASR/train.yaml’ file, I manually set the lm and tokenizer paths (instead of building them from the pretrained_path variable) as below:

pretrained_path: speechbrain/asr-crdnn-rnnlm-librispeech
pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    collect_in: !ref <save_folder>
    loadables:
        lm: !ref <lm_model>
        tokenizer: !ref <tokenizer>
        model: !ref <model>
    paths:
        lm: !ref ../LM/results/RNNLM/save/CKPT+2021-06-14+19-34-37+00/model.ckpt
        tokenizer: !ref ../Tokenizer/save/*******.model
        model: !ref <pretrained_path>/asr.ckpt

Also, keep the ‘model: …’ lines like this only if you want to use the pre-trained LibriSpeech ASR model to initialize the AM training. Otherwise, just comment out the lines that start with ‘model: …’ and that pre-training step will be skipped.

Hope this helps! @Grga


Thank you for the fast response,

I will try that today. :grin: