Maybe the output model names/paths of the pre-trained Tokenizer & LM are different from what you are expecting. I followed the recipe in ‘templates/speech_recognition’, which contains three folders: ‘ASR’, ‘LM’, and ‘Tokenizer’. Unless you have manually modified the code to change the output path, the trained Tokenizer & LM are saved as follows:
- Tokenizer: Tokenizer/save/*******.model
- LM: LM/results/RNNLM/save/CKPT+2021-06-14+19-34-37+00/model.ckpt
So please check the path and name of the saved Tokenizer & LM models in your case.
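If you are unsure what your run actually produced, a small script can glob for the artifacts. This is only a sketch assuming the ‘templates/speech_recognition’ directory layout above; the `find_artifacts` helper name is mine, not part of SpeechBrain:

```python
from pathlib import Path

def find_artifacts(root="."):
    """Locate trained tokenizer models and LM checkpoints under `root`.

    Assumes the templates/speech_recognition layout: the tokenizer is
    saved as Tokenizer/save/*.model and the LM checkpoint lands in a
    timestamped LM/results/RNNLM/save/CKPT*/ folder.
    """
    root = Path(root)
    tokenizers = sorted(root.glob("Tokenizer/save/*.model"))
    lm_ckpts = sorted(root.glob("LM/results/RNNLM/save/CKPT*/model.ckpt"))
    return tokenizers, lm_ckpts

if __name__ == "__main__":
    toks, lms = find_artifacts()
    print("Tokenizer models:", [str(p) for p in toks])
    print("LM checkpoints:", [str(p) for p in lms])
```

Run it from the folder that contains ‘ASR’, ‘LM’, and ‘Tokenizer’, and it prints whatever it finds (empty lists if the recipes have not finished).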
In my case, I manually changed the paths in the ‘ASR/train.yaml’ file (instead of setting the pretrained_path variable) as below:
```yaml
pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    collect_in: !ref <save_folder>
    loadables:
        lm: !ref <lm_model>
        tokenizer: !ref <tokenizer>
        model: !ref <model>
    paths:
        lm: !ref ../LM/results/RNNLM/save/CKPT+2021-06-14+19-34-37+00/model.ckpt
        tokenizer: !ref ../Tokenizer/save/*******.model
        model: !ref <pretrained_path>/asr.ckpt
```
Also, keep the ‘model: …’ entries like this only if you want to initialise the AM training from the pre-trained LibriSpeech ASR; otherwise, just comment out the lines that start with ‘model: …’ and the pre-training will be skipped.
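Before launching ASR training, it can save time to verify that the files the YAML points to actually exist. A minimal sketch (the `check_paths` helper and the example paths are mine; in particular the tokenizer file name is a placeholder, substitute your own):

```python
import os

def check_paths(paths):
    """Return {name: bool} telling whether each referenced file exists."""
    return {name: os.path.isfile(p) for name, p in paths.items()}

if __name__ == "__main__":
    # Hypothetical example paths, relative to the ASR/ folder; the CKPT
    # directory name and the tokenizer file name will differ in your run.
    paths = {
        "lm": "../LM/results/RNNLM/save/CKPT+2021-06-14+19-34-37+00/model.ckpt",
        "tokenizer": "../Tokenizer/save/your_tokenizer.model",
    }
    for name, ok in check_paths(paths).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If anything prints MISSING, fix the corresponding entry under `paths:` in ‘ASR/train.yaml’ before training.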
Hope this helps! @Grga