Why a Roadmap?
As a community-based, open-source project, SpeechBrain needs the help of its community to grow in the right direction. Opening the roadmap to our users allows the toolkit to benefit from new ideas, new research directions, and even new technologies. The roadmap lists all the changes and updates planned for the current version of SpeechBrain.
How can I contribute?
If you don’t have a precise idea of what you would like to do with SpeechBrain, feel free to simply pick one or more items from the list and implement or solve them! Do not hesitate to contact us to learn more about the progress of the items that interest you; there is certainly room for your help!
How do I propose a new item?
The roadmap relies heavily on the needs expressed by the community, so we need you to keep expanding it! If you think an item should be added to the roadmap, simply open a topic in the appropriate category to foster discussion. If enough people are interested, or if the task leader is convinced by your idea, they will add it to the roadmap.
- Measure the performance of mixed-precision training
- Facilitate the use of multiple optimizers
- Facilitate partial and gradual unfreezing of architectures
- Ensure batch independent evaluations
- Facilitate the use of multiple dataloaders
- Dynamic batching
- Make the beamformer jitable (TorchScript-compatible)
- Improve and expand tutorials
- Online decoding
- Extend N-Gram decoding to word-level decoding
- Refactor the transformer interface for even more transparency.
- Refactor the rescoring interface for even more transparency.
- Windowed attention for faster training and decoding with attention
- Scale wav2vec 2.0 experiments for ASR (various datasets, architectures …)
- Other types of efficient transformers
- Jasper and QuartzNet
- Optimize and test for production scenarios (benchmarking)
- Real-time CTC decoding
- K2 integration
- HCLG ASR
- Full implementation of wav2vec 2.0 (not only loading from Fairseq or HuggingFace)
- Full implementation of PASE+
- More fine-tuned languages
- Add MEDIA and Port-Media recipes
- All done for now!
- Couple Diarization pipeline with VAD (+ put model on HuggingFace)
- Speech separation with a varying number of speakers
- wav2vec 2.0 for IEMOCAP
- Add more acoustic features (PLP, pitch …).
- Tacotron 2
- G2P on HuggingFace
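To make the "Dynamic batching" item above more concrete: the idea is to group utterances into batches by total duration rather than by a fixed batch size, so that short utterances are packed together and padding is reduced. The sketch below is purely illustrative and is not SpeechBrain API; the function name and the duration threshold are hypothetical.

```python
def dynamic_batches(durations, max_batch_len=60.0):
    """Group utterance IDs into batches whose summed duration stays
    under max_batch_len (seconds).

    durations: dict mapping utterance ID -> duration in seconds.
    Sorting by length first keeps utterances of similar length in the
    same batch, which minimizes padding within each batch.
    """
    items = sorted(durations.items(), key=lambda kv: kv[1])
    batches, current, total = [], [], 0.0
    for utt_id, dur in items:
        # Close the current batch if adding this utterance would
        # exceed the duration budget.
        if current and total + dur > max_batch_len:
            batches.append(current)
            current, total = [], 0.0
        current.append(utt_id)
        total += dur
    if current:
        batches.append(current)
    return batches
```

A production version would also cap the number of utterances per batch and bucket lengths with some randomness to avoid always seeing the same batches, but the duration-budget loop above is the core of the technique.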
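As background for the "Real-time CTC decoding" item: the simplest CTC decoder is the greedy one, which takes the most likely symbol at each frame, collapses consecutive repeats, and removes blanks. A streaming decoder applies this rule incrementally as frames arrive. The snippet below is a minimal, standalone sketch of the standard greedy CTC collapse rule, not SpeechBrain's implementation.

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Apply the greedy CTC rule to a sequence of per-frame argmax IDs:
    collapse consecutive repeated symbols, then drop blank tokens.

    frame_ids: iterable of int token IDs, one per acoustic frame.
    blank: ID of the CTC blank token (assumed 0 here).
    """
    out, prev = [], None
    for token in frame_ids:
        # Emit a token only when it differs from the previous frame
        # (repeat collapsing) and is not the blank symbol.
        if token != prev and token != blank:
            out.append(token)
        prev = token
    return out
```

Because the rule only needs the previous frame's token, the same loop can run online: keep `prev` across calls and feed frames in as they arrive, which is exactly what a real-time decoder needs.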