Training an Arabic model on my own data

Hi,
I have just prepared my JSON-formatted data and obtained the Tokenizer and LM models.
When I ran the LibriSpeech tutorial, I got a size mismatch error because train.py downloads a pre-trained model.
How can I train my model without using the pre-trained model?
Here is the log:
speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/CRDNN_BPE_960h_LM/2602
mini_librispeech_prepare - Preparation completed in previous run, skipping.
speechbrain.pretrained.fetching - Fetch lm.ckpt: Using existing file/symlink in results/CRDNN_BPE_960h_LM/2602/save/lm.ckpt.
speechbrain.pretrained.fetching - Fetch tokenizer.ckpt: Using existing file/symlink in results/CRDNN_BPE_960h_LM/2602/save/tokenizer.ckpt.
speechbrain.pretrained.fetching - Fetch asr.ckpt: Using existing file/symlink in results/CRDNN_BPE_960h_LM/2602/save/model.ckpt.
speechbrain.utils.parameter_transfer - Loading pretrained files for: lm, tokenizer, model
speechbrain.core - Exception:
Traceback (most recent call last):
File "/media/hussein/DATA/speechbrain/templates/speech_recognition-Arabic/ASR/train.py", line 447, in
hparams["pretrainer"].load_collected(device=run_opts["device"])
File "/media/hussein/DATA/speechbrain/speechbrain/utils/parameter_transfer.py", line 203, in load_collected
self._call_load_hooks(paramfiles, device)
File "/media/hussein/DATA/speechbrain/speechbrain/utils/parameter_transfer.py", line 218, in _call_load_hooks
default_hook(obj, loadpath, device=device)
File "/media/hussein/DATA/speechbrain/speechbrain/utils/checkpoints.py", line 142, in torch_parameter_transfer
incompatible_keys = obj.load_state_dict(
File "/home/hussein/anaconda3/envs/speechbrain/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for RNNLM:
size mismatch for embedding.Embedding.weight: copying a param with shape torch.Size([1000, 256]) from checkpoint, the shape in current model is torch.Size([92, 128]).
size mismatch for out.w.weight: copying a param with shape torch.Size([1000, 512]) from checkpoint, the shape in current model is torch.Size([92, 92]).
size mismatch for out.w.bias: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([92]).
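
To make the three "size mismatch" lines easier to read, here is a small plain-Python sketch that compares the shapes reported in the error. The shapes are copied from the log; the helper name `find_shape_mismatches` is made up for illustration and is not part of SpeechBrain or PyTorch. An embedding weight has shape [number of tokens, embedding dimension], so the checkpoint appears to be an LM built for 1000 tokens while my model has 92.

```python
# Shapes copied from the RuntimeError in the log above.
checkpoint_shapes = {
    "embedding.Embedding.weight": (1000, 256),
    "out.w.weight": (1000, 512),
    "out.w.bias": (1000,),
}
model_shapes = {
    "embedding.Embedding.weight": (92, 128),
    "out.w.weight": (92, 92),
    "out.w.bias": (92,),
}


def find_shape_mismatches(checkpoint, model):
    """Return {name: (checkpoint_shape, model_shape)} for each differing parameter.

    Hypothetical helper, only for reading the error message.
    """
    return {
        name: (shape, model[name])
        for name, shape in checkpoint.items()
        if name in model and model[name] != shape
    }


mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt_shape, model_shape) in mismatches.items():
    print(f"{name}: checkpoint {ckpt_shape} vs current model {model_shape}")
```

All three parameters differ, so the checkpoint in the save folder cannot be loaded into the RNNLM defined by my hyperparameters.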

I solved the problem by removing the call in train.py that loads the pre-trained models.
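
Instead of deleting the call, it could also be guarded so that it is easy to switch back. This is only a rough sketch, not the template's actual code: the function name `load_pretrained_if_requested` and the `use_pretrained` flag are made up, and `_FakePretrainer` is a stand-in so the snippet runs without SpeechBrain installed. The only calls assumed on the real object are `collect_files()` and `load_collected(device=...)`, the second being the one that appears in the traceback.

```python
def load_pretrained_if_requested(hparams, run_opts, use_pretrained):
    """Run the pretrainer only when asked to; otherwise train from scratch.

    `use_pretrained` is a hypothetical flag, not an existing option.
    Returns True if pretrained parameters were loaded.
    """
    if not use_pretrained or "pretrainer" not in hparams:
        return False
    pretrainer = hparams["pretrainer"]
    pretrainer.collect_files()
    pretrainer.load_collected(device=run_opts["device"])
    return True


class _FakePretrainer:
    """Stand-in for the real pretrainer object, for demonstration only."""

    def __init__(self):
        self.calls = []

    def collect_files(self):
        self.calls.append("collect_files")

    def load_collected(self, device=None):
        self.calls.append(("load_collected", device))


fake = _FakePretrainer()
hparams = {"pretrainer": fake}
run_opts = {"device": "cpu"}

# Training from scratch: the pretrainer is never touched.
loaded = load_pretrained_if_requested(hparams, run_opts, use_pretrained=False)
print(loaded, fake.calls)
```

One thing to keep in mind: in the log the pretrainer loads lm, tokenizer and model together, so if it is skipped entirely, your own tokenizer and LM have to be loaded some other way.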
