Error while training on audio data with the CommonVoice recipe

I am currently working on a volunteer project for people with disabilities. I am looking for a speech recognition tool that lets users with poor vision carry out day-to-day tasks by voice, for example "save Mahesh as a contact" or "please call Sandeep".

The target speakers have Indian accents. I tried speechbrain/asr-wav2vec2-commonvoice-en.
I said "PLEASE CALL MAHESH" and got the following:

image (1)

I got hold of Indian-accent data on which I want to train the ASR model, but upon running

!python train.py hparams/train_en_with_wav2vec.yaml --data_folder=/content/nptel-pure
I got this error:
speechbrain.core - Exception:
Traceback (most recent call last):
  File "train.py", line 310, in <module>
    character_coverage=hparams["character_coverage"],
  File "/content/speechbrain/recipes/CommonVoice/ASR/seq2seq/speechbrain/speechbrain/tokenizers/SentencePiece.py", line 170, in __init__
    run_on_main(self._train_BPE)
  File "/content/speechbrain/recipes/CommonVoice/ASR/seq2seq/speechbrain/speechbrain/utils/distributed.py", line 61, in run_on_main
    func(*args, **kwargs)
  File "/content/speechbrain/recipes/CommonVoice/ASR/seq2seq/speechbrain/speechbrain/tokenizers/SentencePiece.py", line 304, in _train_BPE
    spm.SentencePieceTrainer.train(query)
  File "/usr/local/lib/python3.7/dist-packages/sentencepiece/__init__.py", line 407, in Train
    return SentencePieceTrainer._TrainFromString(arg)
  File "/usr/local/lib/python3.7/dist-packages/sentencepiece/__init__.py", line 385, in _TrainFromString
    return _sentencepiece.SentencePieceTrainer__TrainFromString(arg)
RuntimeError: Internal: src/trainer_interface.cc(406) [!sentences_.empty()] 

What does this error mean?

Hi, this is an issue related to the SentencePiece tokeniser. Would you mind opening an issue on GitHub, as this is code-related? I am really sorry for the late reply, by the way.

I have opened the issue.

However, my intuition is that it could be due to a failure in parsing the raw speech dataset [Speech dataset].
As you can see, the training failed.

Speech dataset
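
One way to test that intuition: the `[!sentences_.empty()]` check in the traceback suggests that SentencePiece received no text to train on, which would happen if the transcripts extracted from the dataset are empty. Below is a minimal sketch that counts non-empty transcripts in a CSV manifest. The column name `wrd` and the idea that the recipe writes a `train.csv` manifest are assumptions about the data preparation step, not something confirmed in this thread; adjust them to whatever your prepared files actually contain.

```python
import csv


def count_nonempty_transcripts(csv_path, column="wrd"):
    """Return (total_rows, nonempty_rows) for the transcript column.

    `column` defaults to "wrd", which is an assumed name for the
    transcript field; pass the real header name if it differs.
    """
    total = 0
    nonempty = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # An empty file has no header at all; a wrong column name is
        # reported together with the header that was actually found.
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(
                f"column {column!r} not found; header is {reader.fieldnames}"
            )
        for row in reader:
            total += 1
            # Missing trailing fields come back as None, so guard for that.
            if (row[column] or "").strip():
                nonempty += 1
    return total, nonempty
```

Calling `count_nonempty_transcripts("train.csv")` (a hypothetical path) and getting `(0, 0)` or `(N, 0)` would point to the data preparation step producing no usable text, rather than to the tokenizer itself.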

Thanks