ASR from scratch

Hello guys,
I want to train an ASR model on my own data, so I first trained a tokenizer and an LM, and now I want to use them to train my ASR. I started from the ASR-from-scratch template and modified train.yaml in the ASR folder so it points to my LM and tokenizer, like this:

pretrained_path: ../trained
# where my LM and tokenizer are saved
llm: !ref <pretrained_path>/model.ckpt
tokenizer: !ref <pretrained_path>/1000_char.model
model: !ref speechbrain/asr-crdnn-rnnlm-librispeech/asr.ckpt
# i.e. my LM and tokenizer, plus the pretrained ASR model
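(For context: these lines are the paths section of the Pretrainer block that the template declares further down in train.yaml. A rough sketch of how the whole block fits together; the loadable names and the <save_folder> / <lm_model> / <tokenizer> / <model> references are illustrative and have to match what the actual template declares:)

pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    collect_in: !ref <save_folder>
    loadables:
        # each key names a module declared elsewhere in the yaml...
        llm: !ref <lm_model>
        tokenizer: !ref <tokenizer>
        model: !ref <model>
    paths:
        # ...and each path says which checkpoint to load into it
        llm: !ref <pretrained_path>/model.ckpt
        tokenizer: !ref <pretrained_path>/1000_char.model
        model: speechbrain/asr-crdnn-rnnlm-librispeech/asr.ckpt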

But I got this error:

speechbrain.core - Exception:
Traceback (most recent call last):
  File "train.py", line 447, in <module>
    hparams["pretrainer"].load_collected(device=run_opts["device"])
  File "/home/ubuntu/speechbrain/speechbrain/utils/parameter_transfer.py", line 203, in load_collected
    self._call_load_hooks(paramfiles, device)
  File "/home/ubuntu/speechbrain/speechbrain/utils/parameter_transfer.py", line 218, in _call_load_hooks
    default_hook(obj, loadpath, device=device)
  File "/home/ubuntu/speechbrain/speechbrain/utils/checkpoints.py", line 143, in torch_parameter_transfer
    torch.load(path, map_location=device), strict=False
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1224, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RNNLM:
        size mismatch for embedding.Embedding.weight: copying a param with shape torch.Size([1000, 256]) from checkpoint, the shape in current model is torch.Size([1000, 128]).

Can anyone help me? Thank you!

You need to match the embedding size in the ASR training configuration file with the one in the LM training configuration. The error says your LM checkpoint was saved with an embedding of shape [1000, 256], i.e. an embedding dimension of 256, while the RNNLM declared in your ASR train.yaml is built with the default 128.
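For example (a minimal sketch, assuming the LM is the RNNLM from the SpeechBrain recipes; rnn_layers, rnn_neurons, etc. are only illustrative defaults here, and every architecture parameter has to match the ones used when the LM checkpoint was trained, or the state dict won't load):

emb_size: 256         # must equal the embedding_dim used to train the LM
output_neurons: 1000  # tokenizer vocabulary size

lm_model: !new:speechbrain.lobes.models.RNNLM.RNNLM
    output_neurons: !ref <output_neurons>
    embedding_dim: !ref <emb_size>  # defaults to 128 -> the size mismatch
    activation: !name:torch.nn.LeakyReLU
    dropout: 0.0
    rnn_layers: 2
    rnn_neurons: 2048
    dnn_blocks: 1
    dnn_neurons: 512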


Thanks for your help, it works!


Can you provide your yaml file, please?