Problem with inference after training a wav2vec model

Dear SpeechBrain community,

We are facing an issue with inference using a model we trained based on the English wav2vec2 HuggingFace model. We basically followed the training process described here: speechbrain/asr-wav2vec2-commonvoice-fr · Hugging Face

  • For the training:
python train_with_wav2vec.py hparams/train_en_with_wav2vec.yaml --skip_prep=True --batch_size=12 --number_of_epochs=30 --data_folder="XXXX"  --output_folder="YYYY/" --data_parallel_backend
  • Training ends successfully. Then for inference, we are using the following code:
from speechbrain.pretrained import EncoderDecoderASR
asr_model = EncoderDecoderASR.from_hparams(source="YYYY/", savedir="ZZZZ/", run_opts={"device": "cuda"})
print(asr_model.transcribe_file("test.wav"))

But EncoderDecoderASR.from_hparams returns the following error:

  File "/usr/local/lib/python3.8/dist-packages/speechbrain/pretrained/interfaces.py", line 238, in from_hparams
    pretrainer = hparams["pretrainer"]
KeyError: 'pretrainer'

It is true that there is no pretrainer entry in YYYY/hyperparams.yaml, which is consistent with the example training results (models, logs, etc.) referenced in the tutorial.

Do we have to patch the generated hyperparams.yaml to add such a pretrainer? If so, how?
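
Our current guess is that the inference interface expects a pretrainer entry roughly like the sketch below (adapted from the hyperparams.yaml of the pretrained speechbrain/asr-wav2vec2-commonvoice-fr repo). The loadable names wav2vec2, asr and tokenizer are placeholders and would have to match the objects defined in our own yaml, so we are not sure this is the intended fix:

# hypothetical pretrainer entry for the inference hyperparams.yaml
# loadables maps names to the objects whose saved parameters should be loaded
pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    loadables:
        wav2vec2: !ref <wav2vec2>
        asr: !ref <asr_model>
        tokenizer: !ref <tokenizer>

If that is the right direction, it is still unclear to us how each loadable gets pointed at the corresponding checkpoint file under YYYY/save/ when loading from a local output folder rather than from the Hub.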

What did we miss? Any help would be greatly appreciated.

Fabien.

Duplicate of GitHub issue #1407