I’m training ASR with train_fr_with_wav2vec.yaml
and I spotted the following options:
Does this mean that unigram gives the best results?
Any hints on how to set output_neurons for a general-purpose Spanish ASR with ~100k words?
Well, it depends on many factors, the biggest one being your task. Honestly, in this context there isn’t any magical heuristic. Maybe you can find a paper investigating this … Unigram is good for ASR because it smooths the BPE with some "language information", but this could be bad for other tasks. As for the size of the BPE vocabulary, honestly, I have no real intuition … Sometimes large numbers work better, sometimes they don’t.
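Since there is no heuristic, the practical route is probably a small sweep. Here is a rough sketch of how one might set it up, not a tested recipe. It assumes (as far as I understand the recipe) that output_neurons should match the tokenizer vocabulary size, and that the tokenizer is trained with the `sentencepiece` package. The helper names, the multipliers in `candidate_sizes`, and the sample sentences are all made up for illustration. The character count is a useful floor: the vocabulary cannot usefully be smaller than the number of distinct characters in your transcripts.

```python
from collections import Counter


def char_inventory(transcripts):
    """Count every character in the transcripts (space included as word boundary)."""
    counts = Counter()
    for line in transcripts:
        counts.update(line.strip().lower())
    return counts


def candidate_sizes(n_chars, factors=(1, 4, 16, 32)):
    """Hypothetical sweep grid: the character count times a few arbitrary multipliers."""
    return [n_chars * f for f in factors]


def train_unigram(corpus_path, prefix, vocab_size):
    """Train a unigram tokenizer with SentencePiece.

    sentencepiece is imported lazily so the helpers above work without it.
    corpus_path is a text file with one transcript per line (assumption).
    """
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(
        input=corpus_path,
        model_prefix=prefix,
        vocab_size=vocab_size,
        model_type="unigram",
        character_coverage=1.0,
    )
    return prefix + ".model"


if __name__ == "__main__":
    # Toy transcripts; replace with your own training text.
    sample = ["hola mundo", "buenos días", "¿cómo estás?"]
    n_chars = len(char_inventory(sample))
    print("distinct characters:", n_chars)
    print("vocab sizes to try:", candidate_sizes(n_chars))
```

You would then train one model per candidate size and compare WER on a dev set. Note that SentencePiece refuses a vocab_size that is too large for the amount of text, so the upper end of the grid depends on your corpus.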