Checkpointer Load Warning

Hi, I am new to SpeechBrain. Since I want to train an ASR model in a different language, I cannot use the pretrained models, so I have to train everything from scratch (Tokenizer, LM, ASR). To learn the SpeechBrain toolkit, I followed your speech recognition tutorial and tried training a model on the mini_librispeech dataset. While training the LM, I gave the local path of the tokenizer created in the previous step. The RNNLM.yaml file is below:

# ############################################################################
# Model: Language model with a recurrent neural network (RNNLM)
# Training: mini-librispeech transcripts
# Authors:  Ju-Chieh Chou 2020, Jianyuan Zhong 2021, Mirco Ravanelli 2021
# ############################################################################

# Seed needs to be set at top of yaml, before objects with parameters are made
seed: 2602
__set_seed: !apply:torch.manual_seed [!ref <seed>]
output_folder: !ref results/RNNLM/
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt

# If you plan to train a system on an HPC cluster with a big dataset,
# we strongly suggest doing the following:
# 1- Compress the dataset in a single tar or zip file.
# 2- Copy your dataset locally (i.e., the local disk of the computing node).
# 3- Uncompress the dataset in the local folder.
# 4- Set lm_{train,valid,test}_data with the local path.
# Reading data from the local disk of the compute node (e.g. $SLURM_TMPDIR with SLURM-based clusters) is very important.
# It allows you to read the data much faster without slowing down the shared filesystem.
lm_train_data: data/train.txt
lm_valid_data: data/valid.txt
lm_test_data: data/test.txt

# The train logger writes training statistics to a file, as well as stdout.
train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger
    save_file: !ref <train_log>

# Tokenizer model (you must use the same tokenizer for LM and ASR training)
tokenizer_file: ../Tokenizer/save/1000_unigram.model

# Training parameters
number_of_epochs: 20
batch_size: 80
lr: 0.001
accu_steps: 1 # Gradient accumulation to simulate large batch training
ckpt_interval_minutes: 15 # save checkpoint every N min

# Dataloader options
train_dataloader_opts:
    batch_size: !ref <batch_size>
    shuffle: True

valid_dataloader_opts:
    batch_size: 1

test_dataloader_opts:
    batch_size: 1

# Model parameters
emb_dim: 256 # dimension of the embeddings
rnn_size: 512 # dimension of hidden layers
layers: 2 # number of hidden layers

# Outputs
output_neurons: 1000 # index(blank/eos/bos) = 0
blank_index: 0
bos_index: 0
eos_index: 0


# To design a custom model, either just edit the simple CustomModel
# class that's listed here, or replace this `!new` call with a line
# pointing to a different file you've defined.
model: !new:templates.speech_recognition.LM.custom_model.CustomModel
    embedding_dim: !ref <emb_dim>
    rnn_size: !ref <rnn_size>
    layers: !ref <layers>


# Cost function used for training the model
compute_cost: !name:speechbrain.nnet.losses.nll_loss

# This optimizer will be constructed by the Brain class after all parameters
# are moved to the correct device. Then it will be added to the checkpointer.
optimizer: !name:torch.optim.Adam
    lr: !ref <lr>
    betas: (0.9, 0.98)
    eps: 0.000000001

# This function manages learning rate annealing over the epochs.
# We here use the NewBoB algorithm, that anneals the learning rate if
# the improvements over two consecutive epochs is less than the defined
# threshold.
lr_annealing: !new:speechbrain.nnet.schedulers.NewBobScheduler
    initial_value: !ref <lr>
    improvement_threshold: 0.0025
    annealing_factor: 0.8
    patient: 0


# The first object passed to the Brain class is this "Epoch Counter"
# which is saved by the Checkpointer so that training can be resumed
# if it gets interrupted at any point.
epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter
    limit: !ref <number_of_epochs>

# Objects in "modules" dict will have their parameters moved to the correct
# device, as well as having train()/eval() called on them by the Brain class.
modules:
    model: !ref <model>

# Tokenizer initialization
tokenizer: !new:sentencepiece.SentencePieceProcessor

# Tokenizer parameters
token_type: unigram  # ["unigram", "bpe", "char"]
token_output: 1000  # index(blank/eos/bos/unk) = 0
character_coverage: 1.0
annotation_read: words # field to read



# This object is used for saving the state of training both so that it
# can be resumed if it gets interrupted, and also so that the best checkpoint
# can be later loaded for evaluation or inference.
checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
    checkpoints_dir: !ref <save_folder>
    recoverables:
        model: !ref <model>
        scheduler: !ref <lr_annealing>
        counter: !ref <epoch_counter>

# Pretrain the tokenizer
pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    loadables:
        tokenizer: !ref <tokenizer>
    paths:
        tokenizer: !ref <tokenizer_file>

The train_log file from LM training is:

Epoch: 1 - train loss: 2.23 - valid loss: 2.11e-03
Epoch: 2 - train loss: 3.65e-04 - valid loss: 2.06e-05
Epoch: 3 - train loss: 1.17e-05 - valid loss: 7.27e-06
Epoch: 4 - train loss: 6.27e-06 - valid loss: 5.48e-06
Epoch: 5 - train loss: 5.41e-06 - valid loss: 5.36e-06
Epoch: 6 - train loss: 5.36e-06 - valid loss: 5.36e-06
Epoch: 7 - train loss: 5.28e-06 - valid loss: 5.25e-06
Epoch: 8 - train loss: 5.18e-06 - valid loss: 5.13e-06

Then, because I don’t want to use the Hugging Face pretrained models (as shown in ASRfromScratch-Step4), I ran the ASR train.py script with this train.yaml file:

# ############################################################################
# Model: E2E ASR with attention-based ASR
# Encoder: CRDNN
# Decoder: GRU + beamsearch + RNNLM
# Tokens: 1000 BPE
# losses: CTC+ NLL
# Training: mini-librispeech
# Pre-Training: librispeech 960h
# Authors:  Ju-Chieh Chou, Mirco Ravanelli, Abdel Heba, Peter Plantinga, Samuele Cornell 2020
# ############################################################################

# Seed needs to be set at top of yaml, before objects with parameters are instantiated
seed: 2602
__set_seed: !apply:torch.manual_seed [!ref <seed>]

# If you plan to train a system on an HPC cluster with a big dataset,
# we strongly suggest doing the following:
# 1- Compress the dataset in a single tar or zip file.
# 2- Copy your dataset locally (i.e., the local disk of the computing node).
# 3- Uncompress the dataset in the local folder.
# 4- Set data_folder with the local path
# Reading data from the local disk of the compute node (e.g. $SLURM_TMPDIR with SLURM-based clusters) is very important.
# It allows you to read the data much faster without slowing down the shared filesystem.

data_folder: ../data # In this case, data will be automatically downloaded here.
data_folder_rirs: !ref <data_folder> # noise/RIR dataset will automatically be downloaded here
output_folder: !ref results_8batch_80LMbatch_1lr/CRDNN_BPE_960h_LM/<seed>
wer_file: !ref <output_folder>/wer.txt
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt

# Language model (LM) pretraining
# NB: To avoid mismatch, the speech recognizer must be trained with the same
# tokenizer used for LM training. Here, we download everything from the
# speechbrain HuggingFace repository. However, a local path pointing to a
# directory containing the lm.ckpt and tokenizer.ckpt may also be specified
# instead, e.g., if you want to use your own LM / tokenizer.


pretrained_path: ../

# Path where data manifest files will be stored. The data manifest files are created by the
# data preparation script
train_annotation: ../train.json
valid_annotation: ../valid.json
test_annotation: ../test.json
deneme_annotation: ../deneme.json

# The train logger writes training statistics to a file, as well as stdout.
train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger
    save_file: !ref <train_log>

# Training parameters
number_of_epochs: 15
number_of_ctc_epochs: 5
batch_size: 8
lr: 1.0
ctc_weight: 0.5
sorting: ascending
ckpt_interval_minutes: 15 # save checkpoint every N min
label_smoothing: 0.1

# Dataloader options
train_dataloader_opts:
    batch_size: !ref <batch_size>

valid_dataloader_opts:
    batch_size: !ref <batch_size>

test_dataloader_opts:
    batch_size: !ref <batch_size>

transcribe_dataloader_opts:
    batch_size: !ref <batch_size>

deneme_dataloader_opts:
    batch_size: !ref <batch_size>


# Feature parameters
sample_rate: 16000
n_fft: 400
n_mels: 40

# Model parameters
activation: !name:torch.nn.LeakyReLU
dropout: 0.15
cnn_blocks: 2
cnn_channels: (128, 256)
inter_layer_pooling_size: (2, 2)
cnn_kernelsize: (3, 3)
time_pooling_size: 4
rnn_class: !name:speechbrain.nnet.RNN.LSTM
rnn_layers: 4
rnn_neurons: 1024
rnn_bidirectional: True
dnn_blocks: 2
dnn_neurons: 512
emb_size: 256
dec_neurons: 1024
output_neurons: 1000  # Number of tokens (same as LM)
blank_index: 0
bos_index: 0
eos_index: 0
unk_index: 0

# Decoding parameters
min_decode_ratio: 0.0
max_decode_ratio: 1.0
valid_beam_size: 8
test_beam_size: 80
eos_threshold: 1.5
using_max_attn_shift: True
max_attn_shift: 240
lm_weight: 0.50
ctc_weight_decode: 0.0
coverage_penalty: 1.5
temperature: 1.25
temperature_lm: 1.25

# The first object passed to the Brain class is this "Epoch Counter"
# which is saved by the Checkpointer so that training can be resumed
# if it gets interrupted at any point.
epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter
    limit: !ref <number_of_epochs>

# Feature extraction
compute_features: !new:speechbrain.lobes.features.Fbank
    sample_rate: !ref <sample_rate>
    n_fft: !ref <n_fft>
    n_mels: !ref <n_mels>

# Feature normalization (mean and std)
normalize: !new:speechbrain.processing.features.InputNormalization
    norm_type: global

# Added noise and reverb come from OpenRIR dataset, automatically
# downloaded and prepared with this Environmental Corruption class.
env_corrupt: !new:speechbrain.lobes.augment.EnvCorrupt
    openrir_folder: !ref <data_folder_rirs>
    babble_prob: 0.0
    reverb_prob: 0.0
    noise_prob: 1.0
    noise_snr_low: 0
    noise_snr_high: 15

# Adds speed change + time and frequency dropouts (time-domain implementation).
augmentation: !new:speechbrain.lobes.augment.TimeDomainSpecAugment
    sample_rate: !ref <sample_rate>
    speeds: [95, 100, 105]

# The CRDNN model is an encoder that combines CNNs, RNNs, and DNNs.
encoder: !new:speechbrain.lobes.models.CRDNN.CRDNN
    input_shape: [null, null, !ref <n_mels>]
    activation: !ref <activation>
    dropout: !ref <dropout>
    cnn_blocks: !ref <cnn_blocks>
    cnn_channels: !ref <cnn_channels>
    cnn_kernelsize: !ref <cnn_kernelsize>
    inter_layer_pooling_size: !ref <inter_layer_pooling_size>
    time_pooling: True
    using_2d_pooling: False
    time_pooling_size: !ref <time_pooling_size>
    rnn_class: !ref <rnn_class>
    rnn_layers: !ref <rnn_layers>
    rnn_neurons: !ref <rnn_neurons>
    rnn_bidirectional: !ref <rnn_bidirectional>
    rnn_re_init: True
    dnn_blocks: !ref <dnn_blocks>
    dnn_neurons: !ref <dnn_neurons>
    use_rnnp: False

# Embedding (from indexes to an embedding space of dimension emb_size).
embedding: !new:speechbrain.nnet.embedding.Embedding
    num_embeddings: !ref <output_neurons>
    embedding_dim: !ref <emb_size>

# Attention-based RNN decoder.
decoder: !new:speechbrain.nnet.RNN.AttentionalRNNDecoder
    enc_dim: !ref <dnn_neurons>
    input_size: !ref <emb_size>
    rnn_type: gru
    attn_type: location
    hidden_size: !ref <dec_neurons>
    attn_dim: 1024
    num_layers: 1
    scaling: 1.0
    channels: 10
    kernel_size: 100
    re_init: True
    dropout: !ref <dropout>

# Linear transformation on the top of the encoder.
ctc_lin: !new:speechbrain.nnet.linear.Linear
    input_size: !ref <dnn_neurons>
    n_neurons: !ref <output_neurons>

# Linear transformation on the top of the decoder.
seq_lin: !new:speechbrain.nnet.linear.Linear
    input_size: !ref <dec_neurons>
    n_neurons: !ref <output_neurons>

# Final softmax (for log posteriors computation).
log_softmax: !new:speechbrain.nnet.activations.Softmax
    apply_log: True

# Cost definition for the CTC part.
ctc_cost: !name:speechbrain.nnet.losses.ctc_loss
    blank_index: !ref <blank_index>


# Tokenizer initialization
tokenizer: !new:sentencepiece.SentencePieceProcessor


# Objects in "modules" dict will have their parameters moved to the correct
# device, as well as having train()/eval() called on them by the Brain class
modules:
    encoder: !ref <encoder>
    embedding: !ref <embedding>
    decoder: !ref <decoder>
    ctc_lin: !ref <ctc_lin>
    seq_lin: !ref <seq_lin>
    normalize: !ref <normalize>
    env_corrupt: !ref <env_corrupt>
    lm_model: !ref <lm_model>

# Gathering all the submodels in a single model object.
model: !new:torch.nn.ModuleList
    - - !ref <encoder>
      - !ref <embedding>
      - !ref <decoder>
      - !ref <ctc_lin>
      - !ref <seq_lin>

# This is the RNNLM that is used according to the Huggingface repository
# NB: It has to match the pre-trained RNNLM!!
lm_model: !new:speechbrain.lobes.models.RNNLM.RNNLM
    output_neurons: !ref <output_neurons>
    embedding_dim: !ref <emb_size>
    activation: !name:torch.nn.LeakyReLU
    dropout: 0.0
    rnn_layers: 2
    rnn_neurons: 2048
    dnn_blocks: 1
    dnn_neurons: 512
    return_hidden: True  # For inference

# Beamsearch is applied on the top of the decoder. If the language model is
# given, a language model is applied (with a weight specified in lm_weight).
# If ctc_weight is set, the decoder uses CTC + attention beamsearch. This
# improves the performance, but slows down decoding. For a description of
# the other parameters, please see the speechbrain.decoders.S2SRNNBeamSearchLM.

# It makes sense to have a lighter search during validation. In this case,
# we don't use the LM and CTC probabilities during decoding.
valid_search: !new:speechbrain.decoders.S2SRNNBeamSearcher
    embedding: !ref <embedding>
    decoder: !ref <decoder>
    linear: !ref <seq_lin>
    ctc_linear: !ref <ctc_lin>
    bos_index: !ref <bos_index>
    eos_index: !ref <eos_index>
    blank_index: !ref <blank_index>
    min_decode_ratio: !ref <min_decode_ratio>
    max_decode_ratio: !ref <max_decode_ratio>
    beam_size: !ref <valid_beam_size>
    eos_threshold: !ref <eos_threshold>
    using_max_attn_shift: !ref <using_max_attn_shift>
    max_attn_shift: !ref <max_attn_shift>
    coverage_penalty: !ref <coverage_penalty>
    temperature: !ref <temperature>

# The final decoding on the test set can be more computationally demanding.
# In this case, we use the LM + CTC probabilities during decoding as well.
# Please, remove this part if you need a faster decoder.
test_search: !new:speechbrain.decoders.S2SRNNBeamSearchLM
    embedding: !ref <embedding>
    decoder: !ref <decoder>
    linear: !ref <seq_lin>
    ctc_linear: !ref <ctc_lin>
    language_model: !ref <lm_model>
    bos_index: !ref <bos_index>
    eos_index: !ref <eos_index>
    blank_index: !ref <blank_index>
    min_decode_ratio: !ref <min_decode_ratio>
    max_decode_ratio: !ref <max_decode_ratio>
    beam_size: !ref <test_beam_size>
    eos_threshold: !ref <eos_threshold>
    using_max_attn_shift: !ref <using_max_attn_shift>
    max_attn_shift: !ref <max_attn_shift>
    coverage_penalty: !ref <coverage_penalty>
    lm_weight: !ref <lm_weight>
    ctc_weight: !ref <ctc_weight_decode>
    temperature: !ref <temperature>
    temperature_lm: !ref <temperature_lm>

# This function manages learning rate annealing over the epochs.
# We here use the NewBoB algorithm, that anneals the learning rate if
# the improvements over two consecutive epochs is less than the defined
# threshold.
lr_annealing: !new:speechbrain.nnet.schedulers.NewBobScheduler
    initial_value: !ref <lr>
    improvement_threshold: 0.0025
    annealing_factor: 0.8
    patient: 0

# This optimizer will be constructed by the Brain class after all parameters
# are moved to the correct device. Then it will be added to the checkpointer.
opt_class: !name:torch.optim.Adadelta
    lr: !ref <lr>
    rho: 0.95
    eps: 1.e-8

# Functions that compute the statistics to track during the validation step.
error_rate_computer: !name:speechbrain.utils.metric_stats.ErrorRateStats

cer_computer: !name:speechbrain.utils.metric_stats.ErrorRateStats
    split_tokens: True

# This object is used for saving the state of training both so that it
# can be resumed if it gets interrupted, and also so that the best checkpoint
# can be later loaded for evaluation or inference.
checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
    checkpoints_dir: !ref <save_folder>
    recoverables:
        model: !ref <model>
        scheduler: !ref <lr_annealing>
        normalizer: !ref <normalize>
        counter: !ref <epoch_counter>

# This object is used to pretrain the language model and the tokenizers
# (defined above). In this case, we also pretrain the ASR model (to make
# sure the model converges on a small amount of data)
pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    collect_in: !ref <save_folder>
    loadables:
        lm: !ref <lm_model>
        tokenizer: !ref <tokenizer>
    paths:
        lm: !ref <pretrained_path>/LM/results/RNNLM/save/CKPT+2021-11-03+05-31-44+01/model.ckpt
        tokenizer: !ref <pretrained_path>/Tokenizer/save/1000_unigram.model
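
For reference, the ASR train.py consumes this pretrainer section the same way before training starts (this is what produces the “Collecting files …” and “Fetch … Linking to local file” lines in the log further below). A sketch of the relevant template lines, as far as I understand them:

# inside ASR/train.py (sketch of the template flow)
run_on_main(hparams["pretrainer"].collect_files)
hparams["pretrainer"].load_collected(device=run_opts["device"])

# the loaded tokenizer is then attached to the Brain object:
asr_brain.tokenizer = hparams["tokenizer"]
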

As you can see, I gave the filepath of the LM trained in the previous step (at the end of the .yaml file above). When I follow these steps, the ASR model produces terrible WER scores, as shown in the train_log.txt file below.

epoch: 1, lr: 1.00e+00 - train loss: 8.27 - valid loss: 5.78, valid CER: 9.31e+02, valid WER: 1.31e+03
epoch: 2, lr: 1.00e+00 - train loss: 5.79 - valid loss: 5.47, valid CER: 5.38e+02, valid WER: 4.84e+02
epoch: 3, lr: 1.00e+00 - train loss: 5.64 - valid loss: 5.31, valid CER: 6.31e+02, valid WER: 8.43e+02
epoch: 4, lr: 8.00e-01 - train loss: 5.52 - valid loss: 5.19, valid CER: 3.17e+02, valid WER: 4.15e+02
epoch: 5, lr: 8.00e-01 - train loss: 5.45 - valid loss: 5.14, valid CER: 2.76e+02, valid WER: 3.74e+02
epoch: 6, lr: 8.00e-01 - train loss: 5.38 - valid loss: 5.10, valid CER: 1.69e+02, valid WER: 2.56e+02
epoch: 7, lr: 8.00e-01 - train loss: 5.32 - valid loss: 5.06, valid CER: 1.57e+02, valid WER: 2.35e+02
epoch: 8, lr: 8.00e-01 - train loss: 5.27 - valid loss: 5.04, valid CER: 1.95e+02, valid WER: 2.96e+02
epoch: 9, lr: 6.40e-01 - train loss: 5.20 - valid loss: 5.03, valid CER: 1.91e+02, valid WER: 2.88e+02
epoch: 10, lr: 6.40e-01 - train loss: 5.16 - valid loss: 5.02, valid CER: 2.06e+02, valid WER: 3.02e+02
epoch: 11, lr: 5.12e-01 - train loss: 4.70 - valid loss: 5.00, valid CER: 1.98e+02, valid WER: 2.86e+02
epoch: 12, lr: 5.12e-01 - train loss: 4.63 - valid loss: 4.99, valid CER: 2.09e+02, valid WER: 3.03e+02

And here is a small excerpt from the wer.txt file:

%WER 121.16 [ 63701 / 52576, 11304 ins, 14021 del, 38376 sub ]
%SER 100.00 [ 2620 / 2620 ]
Scored 2620 sentences, 0 not present in hyp.
================================================================================
ALIGNMENTS

Format:
<utterance-id>, WER DETAILS
<eps> ; reference  ; on ; the ; first ;  line
  I   ;     S      ; =  ;  =  ;   S   ;   D  
 and  ; hypothesis ; on ; the ; third ; <eps>
================================================================================
1089-134686-0003, %WER 128.57 [ 9 / 7, 2 ins, 0 del, 7 sub ]
HELLO ; BERTIE ;   ANY    ; GOOD ;  IN ;   YOUR   ; MIND ; <eps> ;  <eps>  
  S   ;   S    ;    S     ;  S   ;  S  ;    S     ;  S   ;   I   ;    I    
 KNIT ;  TWO   ; TOGETHER ; KNIT ; TWO ; TOGETHER ; KNIT ;  TWO  ; TOGETHER
================================================================================
1089-134686-0028, %WER 100.00 [ 18 / 18, 0 ins, 2 del, 16 sub ]
THE ; RETREAT ; WILL ;  BEGIN   ;  ON  ; WEDNESDAY ; AFTERNOON ;  IN  ; HONOUR ;    OF    ; SAINT ; FRANCIS ;  XAVIER  ; WHOSE ; FEAST ;   DAY    ;   IS  ; SATURDAY
 S  ;    S    ;  S   ;    S     ;  S   ;     S     ;     S     ;  S   ;   S    ;    S     ;   S   ;    S    ;    S     ;   S   ;   S   ;    S     ;   D   ;    D    
 I  ;  DON'T  ; TWO  ; TOGETHER ; KNIT ;    TWO    ;  TOGETHER ; KNIT ;  TWO   ; TOGETHER ;  KNIT ;   TWO   ; TOGETHER ;  KNIT ;  TWO  ; TOGETHER ; <eps> ;  <eps>  
================================================================================
1284-1180-0029, %WER 100.00 [ 16 / 16, 0 ins, 4 del, 12 sub ]
SOMETIMES ;  IT ;    IS    ; CALLED ;  A  ;  CRAZY   ; QUILT ; BECAUSE ;   THE    ; PATCHES ; AND ;  COLORS  ;  ARE  ;   SO  ; MIXED ;   UP 
    S     ;  S  ;    S     ;   S    ;  S  ;    S     ;   S   ;    S    ;    S     ;    S    ;  S  ;    S     ;   D   ;   D   ;   D   ;   D  
   KNIT   ; TWO ; TOGETHER ;  KNIT  ; TWO ; TOGETHER ;  KNIT ;   TWO   ; TOGETHER ;   KNIT  ; TWO ; TOGETHER ; <eps> ; <eps> ; <eps> ; <eps>
================================================================================

In the ASR/train.py file I wrote a transcribe_dataset function, as shown in ASRfromScratch-Step5, for a “deneme.json” dataset that looks like this (the function itself is sketched after the JSON):

{
  "3947-13262-0024": {
    "wav": "{data_root}/LibriSpeech/train-clean-5/3947/13262/3947-13262-0024.flac",
    "length": 13.1,
    "words": "WHAT WOULD JESUS DO I KEEP ASKING IT THE ANSWER COMES SLOWLY FOR I AM FEELING MY WAY SLOWLY ONE THING I HAVE FOUND OUT THE MEN ARE NOT FIGHTING SHY OF ME I THINK THAT IS A GOOD SIGN"
  },
  "3947-13262-0001": {
    "wav": "{data_root}/LibriSpeech/train-clean-5/3947/13262/3947-13262-0001.flac",
    "length": 14.6,
    "words": "SAID RACHEL SHE TELLS ME THE ARRANGEMENTS ARE NEARLY COMPLETED FOR THE TRANSFER OF THE RECTANGLE PROPERTY YES IT HAS BEEN A TEDIOUS CASE IN THE COURTS DID VIRGINIA SHOW YOU ALL THE PLANS AND SPECIFICATIONS FOR BUILDING"
  },
  "3947-13262-0004": {
    "wav": "{data_root}/LibriSpeech/train-clean-5/3947/13262/3947-13262-0004.flac",
    "length": 14.285,
    "words": "WHAT HAVE YOU BEEN DOING ALL SUMMER I HAVE NOT SEEN MUCH OF YOU RACHEL SUDDENLY ASKED AND THEN HER FACE WARMED WITH ITS QUICK FLUSH OF TROPICAL COLOR AS IF SHE MIGHT HAVE IMPLIED TOO MUCH INTEREST IN ROLLIN OR TOO MUCH REGRET AT NOT SEEING HIM OFTENER"
  }
}
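
The transcribe_dataset function is essentially the one from the tutorial; roughly this (a sketch of my version, decoding with the raw SentencePiece tokenizer defined in the YAML; torch and speechbrain as sb are imported at the top of train.py):

    def transcribe_dataset(self, dataset, min_key, loader_kwargs):
        # Build a DataLoader if a plain dataset was passed in.
        if not isinstance(dataset, torch.utils.data.DataLoader):
            loader_kwargs["ckpt_prefix"] = None
            dataset = self.make_dataloader(dataset, sb.Stage.TEST, **loader_kwargs)

        self.on_evaluate_start(min_key=min_key)  # load the best checkpoint
        self.modules.eval()  # switch off dropout etc.

        transcripts = []
        with torch.no_grad():
            for batch in dataset:
                # At TEST stage, compute_forward runs the test_search beam search
                # and returns the predicted token sequences.
                p_seq, wav_lens, predicted_tokens = self.compute_forward(
                    batch, stage=sb.Stage.TEST
                )
                # Tokens -> words with the SentencePiece model.
                predicted_words = [
                    self.tokenizer.decode_ids(tokens) for tokens in predicted_tokens
                ]
                transcripts.append(predicted_words)
        return transcripts
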

The output of the “transcribe_dataset” function for the “deneme.json” dataset is:


[["I DON'T TWO TOGETHER KNIT TWO TOGETHER KNIT TWO TOGETHER KNIT TWO TOGETHER KNIT TWO TOGETHER", "HE SAID JIMMIE DALE OF COURSE TWO TOGETHER KNIT TWO TOGETHER KNIT TWO TOGETHER KNIT TWO TOGETHER KNIT TWO TOGETHER", "MISTER CROWN'T TWO TOGETHER KNIT TWO TOGETHER KNIT TWO TOGETHER KNIT TWO TOGETHER KNIT TWO TOGETHER KNIT TWO TOGETHER"]]

When I look at log.txt, I see messages like:

“loading from results_8batch_80LMbatch_1lr/CRDNN_BPE_960h_LM/2602/save/lm.ckpt, the object could not use the parameters loaded with the key: rnn.bias_ih_l0”

and

“loading from results_8batch_80LMbatch_1lr/CRDNN_BPE_960h_LM/2602/save/lm.ckpt, the transferred parameters did not have parameters for the key: dnn.linear.w.bias”

Here are some WARNING examples from log.txt:

2021-11-03 06:44:16,425 - mini_librispeech_prepare - INFO - Preparation completed in previous run, skipping.
2021-11-03 06:44:16,439 - speechbrain.utils.parameter_transfer - DEBUG - Collecting files (or symlinks) for pretraining in results_8batch_80LMbatch_1lr/CRDNN_BPE_960h_LM/2602/save.
2021-11-03 06:44:16,439 - speechbrain.pretrained.fetching - INFO - Fetch model.ckpt: Linking to local file in /stuff/guray/speechbrain/templates/speechrecognition_1/ASR/../LM/results/RNNLM/save/CKPT+2021-11-03+05-31-44+01/model.ckpt.
2021-11-03 06:44:16,439 - speechbrain.pretrained.fetching - INFO - Fetch 1000_unigram.model: Linking to local file in /stuff/guray/speechbrain/templates/speechrecognition_1/ASR/../Tokenizer/save/1000_unigram.model.
2021-11-03 06:44:16,439 - speechbrain.utils.parameter_transfer - INFO - Loading pretrained files for: lm, tokenizer
2021-11-03 06:44:18,184 - speechbrain.utils.checkpoints - WARNING - During parameter transfer to RNNLM(
  (embedding): Embedding(
    (Embedding): Embedding(1000, 256)
  )
  (dropout): Dropout(p=0.0, inplace=False)
  (rnn): LSTM(
    (rnn): LSTM(256, 2048, num_layers=2, batch_first=True)
  )
  (dnn): Sequential(
    (linear): Linear(
      (w): Linear(in_features=2048, out_features=512, bias=True)
    )
    (norm): LayerNorm(
      (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
    )
    (act): LeakyReLU(negative_slope=0.01)
    (dropout): Dropout(p=0.0, inplace=False)
  )
  (out): Linear(
    (w): Linear(in_features=512, out_features=1000, bias=True)
  )
)

loading from results_8batch_80LMbatch_1lr/CRDNN_BPE_960h_LM/2602/save/lm.ckpt, the object could not use the parameters loaded with the key: rnn.bias_ih_l0
2021-11-03 08:53:03,011 - speechbrain.utils.checkpoints - WARNING - During parameter transfer to RNNLM(
  (embedding): Embedding(
    (Embedding): Embedding(1000, 256)
  )
  (dropout): Dropout(p=0.0, inplace=False)
  (rnn): LSTM(
    (rnn): LSTM(256, 2048, num_layers=2, batch_first=True)
  )
  (dnn): Sequential(
    (linear): Linear(
      (w): Linear(in_features=2048, out_features=512, bias=True)
    )
    (norm): LayerNorm(
      (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
    )
    (act): LeakyReLU(negative_slope=0.01)
    (dropout): Dropout(p=0.0, inplace=False)
  )
  (out): Linear(
    (w): Linear(in_features=512, out_features=1000, bias=True)
  )
)

loading from results_8batch_80LMbatch_1lr/CRDNN_BPE_960h_LM/2602/save/lm.ckpt, the transferred parameters did not have parameters for the key: rnn.rnn.bias_hh_l1
2021-11-03 08:53:03,011 - speechbrain.utils.checkpoints - WARNING - During parameter transfer to RNNLM(
  (embedding): Embedding(
    (Embedding): Embedding(1000, 256)
  )
  (dropout): Dropout(p=0.0, inplace=False)
  (rnn): LSTM(
    (rnn): LSTM(256, 2048, num_layers=2, batch_first=True)
  )
  (dnn): Sequential(
    (linear): Linear(
      (w): Linear(in_features=2048, out_features=512, bias=True)
    )
    (norm): LayerNorm(
      (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
    )
    (act): LeakyReLU(negative_slope=0.01)
    (dropout): Dropout(p=0.0, inplace=False)
  )
  (out): Linear(
    (w): Linear(in_features=512, out_features=1000, bias=True)
  )
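
If it helps with debugging, this is the quick check I can run to compare the parameter names in my trained LM checkpoint with the ones the RNNLM defined in train.yaml expects (a sketch; I am assuming the checkpoint is simply the state_dict saved by the Checkpointer):

import torch
from speechbrain.lobes.models.RNNLM import RNNLM

# parameter names stored in my trained LM (the CustomModel from LM/custom_model.py)
ckpt = torch.load(
    "../LM/results/RNNLM/save/CKPT+2021-11-03+05-31-44+01/model.ckpt",
    map_location="cpu",
)
print(sorted(ckpt.keys()))

# parameter names the lm_model in train.yaml expects (same hyperparameters as the YAML)
rnnlm = RNNLM(
    output_neurons=1000,
    embedding_dim=256,
    activation=torch.nn.LeakyReLU,
    dropout=0.0,
    rnn_layers=2,
    rnn_neurons=2048,
    dnn_blocks=1,
    dnn_neurons=512,
    return_hidden=True,
)
print(sorted(rnnlm.state_dict().keys()))
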

What could be the reason for this issue? Can you please help me?

Regards.

Hi, this is a duplicate of a GitHub issue. The discussion will happen on GitHub.