I am interested in testing SpeechBrain on the CHiME 5 and REVERB datasets.
Are any of these corpora already being worked on?
What about pre-trained models? For example, ASR models that were trained and tested on the WSJ dataset?
Will they be offered? What about the licensing?
Assuming no further training is necessary, will users be able and allowed to use the pre-trained engines?
- We currently don’t have a recipe for CHiME 5. We would be very happy if someone contributed one, and we could provide help if needed (but the main development would have to be done by an external collaborator, as we are currently working hard toward the release).
- Pretrained models are being integrated; for example, see our HuggingFace repository: speechbrain (SpeechBrain). If you want to give it a try, you simply need to fetch this branch: https://github.com/speechbrain/speechbrain/pull/534 . Hopefully, we will integrate this feature before next Wednesday. With respect to your WSJ question, what you could do is look at the PR I gave you, especially the LibriSpeech seq2seq recipe (train folder). You will see that we now have a pre-trainer. We will provide a tutorial soon, but the idea is that you could easily build a recipe for WSJ where you load the ASR or LM part of a pre-trained LibriSpeech model and try it. Building new recipes is the hardest part, as you need to "multithread" human resources. Usually, this grows once the toolkit is released (very soon now).
- We will never host any dataset that isn’t “open-sourced”. This is terrible for the speech community, but we still need to go through LDC for some of the datasets. However, more and more datasets are now released without any prohibitive licensing scheme (CommonVoice, VoxPopuli, …).
- This is a very good question. I think that from a legal perspective, you can definitely use a pretrained model, as you don’t have any access to the data.
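To make the pre-trainer idea above a bit more concrete, here is a minimal sketch in plain Python. It is not SpeechBrain's actual API: the function `load_pretrained_parts`, the component names, and the placeholder "weights" are all hypothetical. Plain dictionaries stand in for real checkpoints, so it only illustrates the principle of reusing selected parts (for example, only the LM) of a LibriSpeech model in a new WSJ recipe.

```python
# Hypothetical illustration only: plain dictionaries stand in for real
# checkpoints, and none of these names come from SpeechBrain itself.


def load_pretrained_parts(checkpoint, model, parts):
    """Return a copy of `model` with the requested parts taken from `checkpoint`."""
    missing = [name for name in parts if name not in checkpoint]
    if missing:
        raise KeyError(f"checkpoint has no component(s): {missing}")
    updated = dict(model)  # leave the caller's model untouched
    for name in parts:
        updated[name] = checkpoint[name]
    return updated


# Stand-in for a checkpoint trained on LibriSpeech (values are placeholders).
librispeech_checkpoint = {
    "encoder": "librispeech-encoder-weights",
    "decoder": "librispeech-decoder-weights",
    "lm": "librispeech-lm-weights",
}

# Stand-in for a freshly initialised WSJ recipe.
wsj_model = {
    "encoder": "random-init",
    "decoder": "random-init",
    "lm": "random-init",
}

# Reuse only the language model; the acoustic parts stay as initialised.
wsj_with_lm = load_pretrained_parts(librispeech_checkpoint, wsj_model, ["lm"])
print(wsj_with_lm)
```

In a real recipe the same selection would be expressed through the pre-trainer in the recipe's configuration rather than hand-written code; the sketch only shows which parts get replaced and which are left alone.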
Thank you for your replies.
I can try to make a recipe for the CHiME 5 dataset.
Unfortunately, I am not very familiar with the codebase and would need some help
or documentation about the hierarchy, project tree, or just UML schematics that describe the interconnects between the different modules.
Also, a template recipe example would be very useful.
Right, thanks for that. What you can do:
- Start by properly reviewing the basic tutorials: SpeechBrain. You need to understand these properly to be able to fully grasp the toolkit. I believe that all the tutorials can be understood in less than an hour.
- Then you have two choices: either you jump into a big recipe directly (like CommonVoice), or you start with a simpler template such as the one provided here: https://github.com/speechbrain/speechbrain/tree/develop/templates .
- To get some help: simply consider forking the repository and opening a Pull Request. Contribution instructions are given on the Contributing page of the SpeechBrain 0.0.0 documentation. This will help you set everything up properly so you can contribute. It will also allow us to see your code and relate more easily to your problems, if you have some.
Having a recipe for CHiME 5 and CHiME 6 would be super cool! I know @popcornell was interested in that as well…