Which recipe(s) for fine-tuning with a noisy dataset?

Hi Speechbrain community,

First of all, many thanks for your tremendous work! :+1:

We are building an ASR system for a specific technical domain, in which the audio is noisy (background noise, audio distortion, etc.) and recorded at 8 kHz / 8-bit. To build our STT engine, we collected a dataset of around 50k clips with transcriptions.
Our first attempts with another ASR engine showed that we got better performance when fine-tuning an existing pre-trained model (after upsampling our audio to 16 kHz / 16-bit).
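
For concreteness, the upsampling step can be sketched as below. This is only an illustrative sketch, not a recommended pipeline: it assumes the 8-bit samples are linear unsigned PCM (telephony audio is often mu-law or A-law instead, which would need decoding first), it uses plain linear interpolation rather than a proper low-pass resampler, and the function name `upsample_8k8b_to_16k16b` is hypothetical.

```python
def upsample_8k8b_to_16k16b(samples_u8: bytes) -> list[int]:
    """Convert unsigned 8-bit PCM at 8 kHz to signed 16-bit PCM at 16 kHz.

    Assumption: input is linear unsigned PCM (0..255, silence at 128).
    """
    # Widen each sample: recentre around zero, then scale to the 16-bit range.
    wide = [(s - 128) << 8 for s in samples_u8]

    out = []
    for i, cur in enumerate(wide):
        # For the last sample there is no successor, so repeat it.
        nxt = wide[i + 1] if i + 1 < len(wide) else cur
        out.append(cur)
        # Insert the midpoint between neighbours (linear interpolation),
        # which doubles the sample rate from 8 kHz to 16 kHz.
        out.append((cur + nxt) // 2)
    return out
```

Note that this only changes the container format: the upsampled audio still carries no information above 4 kHz, which may matter when fine-tuning a model pre-trained on native 16 kHz speech.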

We now want to try your framework, but the list of possible recipes is impressive.

To avoid picking one at random, we need some advice: which recipe(s) would you recommend for our case?

Fabien.
