Does SpeechBrain have support for low-resource languages?

If not, is there a team dedicated to its development? Is there a way for me to contribute to this?



Hi, I am aware of very little work on that, especially work incorporating pre-trained wav2vec models (see the ongoing PR on that). However, we don’t have a “dedicated team” for it. I invite everyone interested in this topic to discuss it here. I suppose a good first step could be to select a low-resource dataset and develop an official recipe for it.

Mozilla’s Common Voice is an open-source dataset covering many low-resource languages. Also, Facebook has open-sourced a massive dataset for such languages. It would be nice if we could get something done that is different from transfer learning (pre-training on a high-resource language and fine-tuning on a low-resource one).

Yes … For now, we don’t have many recipes on CommonVoice. As a first step, a good contribution would be to run more of them (and then distribute the models on HF afterwards). Now that wav2vec2 is integrated, we could use the multilingual wav2vec2 model as a feature extractor for a few very low-resource languages in CV and see what happens.
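To make the “feature extractor” idea concrete, here is a minimal sketch of the pattern: freeze a pretrained encoder and train only a small head on the low-resource labels. A real setup would load a multilingual wav2vec2 checkpoint (e.g. through SpeechBrain or HuggingFace transformers); the `DummyEncoder` below is a hypothetical stand-in that just keeps the example self-contained, and all dimensions are made up.

```python
import torch
import torch.nn as nn

class DummyEncoder(nn.Module):
    """Hypothetical stand-in for a pretrained multilingual wav2vec2 encoder."""
    def __init__(self, dim=16):
        super().__init__()
        self.proj = nn.Linear(1, dim)

    def forward(self, wav):                    # wav: (batch, time)
        return self.proj(wav.unsqueeze(-1))    # (batch, time, dim)

encoder = DummyEncoder()

# Freeze the pretrained encoder: its weights are never updated.
for p in encoder.parameters():
    p.requires_grad = False

# Only this small head is trained on the low-resource labels.
head = nn.Linear(16, 30)                       # e.g. 30 output characters
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

wav = torch.randn(2, 100)                      # fake batch of audio
with torch.no_grad():
    feats = encoder(wav)                       # frozen features
logits = head(feats)                           # (batch, time, chars)
```

The same pattern applies unchanged when the dummy encoder is swapped for an actual wav2vec2 model: only the head (and optionally the top encoder layers) receives gradients.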

I have a forthcoming paper on integrating ASR with endangered languages; please have a look and let me know if you have any questions.

Hi, I’m also wondering how to approach my low-resource problem with SpeechBrain. I have a lot of unaligned audio and no existing acoustic model to do forced alignment. Any advice is welcome. For example, should I just hand-label as much audio as I can, so I can build an acoustic model and then do forced alignment? Or something else?

You will have to dive a bit into the literature; this isn’t specific to SpeechBrain, it’s the same for all toolkits. You could start with a pretrained SSL model (e.g. wav2vec 2.0), fine-tune it on your few labels, and then try self-training on your unlabelled data.
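The self-training step above can be sketched in a few lines. This is a toy illustration of confidence-based pseudo-labeling, not an ASR implementation: the “model” is a 1-D threshold classifier, and all names (`fit_threshold`, `self_train`, the `tau` threshold) are hypothetical. In a real setup the model would be the fine-tuned wav2vec2 system and the confidence would come from its decoder scores.

```python
def fit_threshold(examples):
    """Toy training: place a decision threshold between the class means."""
    xs0 = [x for x, y in examples if y == 0]
    xs1 = [x for x, y in examples if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

def predict(threshold, x):
    """Return a label and a crude confidence (distance to the threshold)."""
    return int(x > threshold), abs(x - threshold)

def self_train(labelled, unlabelled, tau=0.3, rounds=3):
    """Iteratively pseudo-label confident examples and retrain."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        thr = fit_threshold(labelled)
        keep = []
        for x in pool:
            label, conf = predict(thr, x)
            if conf >= tau:
                labelled.append((x, label))   # confident: adopt pseudo-label
            else:
                keep.append(x)                # uncertain: leave unlabelled
        pool = keep
    return labelled, pool

# A few hand-labelled points plus unlabelled audio "features":
labelled, pool = self_train(
    [(0.1, 0), (0.2, 0), (0.9, 1), (1.1, 1)],
    [0.0, 0.15, 1.0, 0.55],
)
```

The point of the confidence threshold `tau` is that examples near the decision boundary (here, 0.55) are never pseudo-labelled, which is what keeps self-training from reinforcing its own early mistakes.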