Hi SpeechBrain community,
First of all, many thanks for your tremendous work!
We are building an ASR system for a specific technical domain in which the audio is noisy (background noise, audio distortion, etc.) and recorded at 8 kHz / 8-bit. To build our STT engine, we collected a dataset of around 50k clips with transcriptions.
Our first attempts, based on another ASR engine, showed that we got better performance when fine-tuning an existing pre-trained model (after upsampling our audio to 16 kHz / 16-bit).
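For context, here is a minimal sketch of what such an upsampling step can look like. It assumes unsigned 8-bit linear PCM and plain linear interpolation; the function name is hypothetical, telephony audio is often mu-law or A-law encoded and would need decoding first, and a real pipeline would more likely use a band-limited resampler.

```python
def upsample_8k8b_to_16k16b(samples_8bit):
    """Double the sample rate by linear interpolation and widen 8-bit to 16-bit.

    Assumption: input is unsigned 8-bit linear PCM (0..255, midpoint 128).
    """
    # Centre on zero and scale to the signed 16-bit range (-32768..32512).
    widened = [(s - 128) * 256 for s in samples_8bit]
    out = []
    for i, cur in enumerate(widened):
        out.append(cur)
        # Insert the midpoint between this sample and the next one;
        # after the last sample, that sample is simply repeated.
        nxt = widened[i + 1] if i + 1 < len(widened) else cur
        out.append((cur + nxt) // 2)
    return out


if __name__ == "__main__":
    print(upsample_8k8b_to_16k16b([128, 130, 126]))
```

Linear interpolation is only the simplest possible choice here; it does not add any information above 4 kHz, it just puts the audio in the format a 16 kHz pre-trained model expects.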
We now want to try your framework, but the list of available recipes is long.
So that we don't start at random, we need some advice: which recipe(s) would you recommend for our case?