Fine-tuning, saving hparams, model

Hello!
I have fine-tuned a wav2vec model (like in the advanced tutorial), but I don't really know how to save it.
I saw that create_experiment_directory can be used to save the log and the environment.
I saw that SpeechBrain has a Checkpointer class, but I cannot use it because speechbrain.Checkpointer doesn't exist.
I would like to save my model with the same structure as asr-wav2vec2-commonvoice-fr on Hugging Face.
What am I missing?

Thanks for help :slight_smile:

Hi again, I've found the SpeechBrain Checkpointer in speechbrain.utils.checkpoints, but I still can't save the model the way I would like. Is there another module for this?

Hi, did you check our tutorial on this subject? (Google Colaboratory)

With that, you should be able to learn everything you need to know. (It's basically just a matter of passing the wav2vec2 model to the Checkpointer in the YAML.)
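In Python terms (the YAML does the same thing declaratively), that just means registering the wav2vec2 module as one of the Checkpointer's recoverables. A minimal sketch, where wav2vec2, asr_model, and the paths are placeholders for objects from your own recipe:

from speechbrain.utils.checkpoints import Checkpointer

# wav2vec2 and asr_model are assumed to come from your own recipe/hparams.
checkpointer = Checkpointer(
    checkpoints_dir="results/save",
    recoverables={
        "wav2vec2": wav2vec2,  # the fine-tuned wav2vec2 front-end
        "model": asr_model,    # the rest of the ASR model (enc, dec, ...)
    },
)

checkpointer.save_checkpoint()      # writes wav2vec2.ckpt, model.ckpt, ... in a CKPT folder
checkpointer.recover_if_possible()  # reloads the most recent checkpoint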


Thanks! I've checked, but I can only save the model and not the tokenizer.
Put differently, I want to end up with the same structure as asr-wav2vec2-commonvoice-fr, with:

  • asr.ckpt: I don't really know how and when it is saved and loaded
  • preprocessor_config.json: I think it comes from the model, but I'm not sure
  • tokenizer.ckpt: I don't know how to save it; I get the error RuntimeError: Don't know how to save <class 'sentencepiece.SentencePieceProcessor'>. Register default hook or add custom hook for this object.
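For context, those are the files the pretrained interface downloads and loads when you use that repo. A rough illustration, assuming the EncoderASR interface (the exact class depends on the recipe):

from speechbrain.pretrained import EncoderASR

# Downloads hyperparams.yaml from the repo, which fetches and loads the
# checkpoint files listed above through its pretrainer.
asr = EncoderASR.from_hparams(
    source="speechbrain/asr-wav2vec2-commonvoice-fr",
    savedir="pretrained_models/asr-wav2vec2-commonvoice-fr",
)
print(asr.transcribe_file("example.wav"))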

Hi, the tokenizer cannot be saved with the checkpoint (but it can be loaded), because it comes from an external library. By default, the tokenizer is saved as VOCABSIZE_type.model in the results folder.
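So with vocab_size=500 and model_type="unigram", the trained tokenizer should end up as 500_unigram.model under the save folder, and you can reload it directly with the sentencepiece library (the path below is an assumption based on that default naming):

import sentencepiece as spm

# Path assumed from the default VOCABSIZE_type.model naming in the results folder.
sp = spm.SentencePieceProcessor()
sp.load("results/save/500_unigram.model")

ids = sp.encode_as_ids("bonjour tout le monde")
print(ids, sp.decode_ids(ids))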

Hi again: I have found a solution:

import pickle
import torch

# Gather the fine-tuned modules into a single ModuleList and pickle it.
asr_model_brain = torch.nn.ModuleList(
    [fine_tuned_model.modules.enc, fine_tuned_model.modules.emb,
     fine_tuned_model.modules.dec, fine_tuned_model.modules.ctc_lin,
     fine_tuned_model.modules.seq_lin])

pickle.dump(asr_model_brain, open("/path/to/save/asr_model.ckpt", "wb"))

and:

import pickle
from speechbrain.tokenizers.SentencePiece import SentencePiece

# Train (or re-load) the SentencePiece tokenizer from the training CSV,
# then pickle the wrapper object.
tokenizer = SentencePiece(
    model_dir="/path/to/save/",
    vocab_size=500,
    annotation_train="/path/to/load/csv/train.csv",
    annotation_read="wrd",
    model_type="unigram",
    character_coverage=1.0,
    bos_id=1,
    eos_id=2,
)

pickle.dump(tokenizer, open("/path/to/save/tokenizer.ckpt", "wb"))
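For completeness, loading both files back is just the mirror of the dump calls (the same SpeechBrain/torch versions need to be available when unpickling):

import pickle

# Reload the pickled ModuleList and tokenizer (paths as above).
with open("/path/to/save/asr_model.ckpt", "rb") as f:
    asr_model_brain = pickle.load(f)

with open("/path/to/save/tokenizer.ckpt", "rb") as f:
    tokenizer = pickle.load(f)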

Now I have to fine-tune a SentencePiece model, but that is another subject.

Thanks for the help :slight_smile: and have a great day!