Baseline results for very small datasets?

Are there any reported results with a very small (~10-15 hours) training set trained from scratch (without using a pre-trained model)? Or have the developers run any similar experiment? I trained a system with 15 hours from scratch using an enc-dec attention architecture (CRDNN), and the minimum WER I get is ~99% (12 epochs). I assume the dataset is too small for training to converge, but I am not sure whether that is the case or I am doing something wrong.
Any help/comments are highly appreciated.

Many thanks,

Hey, it actually depends on many parameters. 15 hours can be sufficient, or completely insufficient, depending on the complexity of the task (noise, vocabulary, number of speakers, recording conditions, languages, etc.). You can try to play with a CTC-only recipe first.
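For reference, SpeechBrain recipes are typically launched by running the recipe's `train.py` with a hyperparameter YAML file. The exact directory and YAML name below are assumptions; check the recipe folder in the SpeechBrain repository for the actual paths:

```shell
# Hypothetical paths -- verify against the recipes/ folder in the SpeechBrain repo.
cd recipes/TIMIT/ASR/CTC
python train.py hparams/train.yaml
```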

@titouan.parcollet Are you referring to the TIMIT/CTC recipe here?

That one, for instance. We also have a CommonVoice CTC recipe, I think.