Should the LM be updated every time the corpus file changes?

Hello, I am training an ASR model for Turkish. While training it, I plan to improve the corpus file periodically. My question is: should I retrain the LM and the ASR model every time I update the corpus file?

Second, if I set “vocab size” to 5000 when training the tokenizer, should I also set the “output neurons” parameter to 5000 in both the LM and the ASR model?

Thank you.

For 1: this is an open question. Ideally, yes, but you don’t need to redo the whole training. You can lightly fine-tune your LM on the new data while re-injecting a few of the previous samples.

  2. Yes, all of them must be 5000 (and they must actually use exactly the same tokenizer!)
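As a rough illustration of that constraint, here is a small sanity check in plain Python. The config dictionaries, the key names (`output_neurons`, `tokenizer_file`) and the tokenizer path are placeholders for whatever your own LM and ASR hyperparameter files contain, not a real toolkit API.

```python
def find_vocab_mismatches(tokenizer_vocab_size, tokenizer_file, configs):
    """Return a list of human-readable problems; an empty list means all configs agree."""
    problems = []
    for name, cfg in configs.items():
        # The output layer must have one neuron per tokenizer entry.
        if cfg.get("output_neurons") != tokenizer_vocab_size:
            problems.append(f"{name}: output_neurons={cfg.get('output_neurons')}")
        # Both models must point at the very same tokenizer model file.
        if cfg.get("tokenizer_file") != tokenizer_file:
            problems.append(f"{name}: tokenizer_file={cfg.get('tokenizer_file')}")
    return problems


# Hypothetical values, for illustration only.
TOKENIZER_FILE = "tokenizer/5000_unigram.model"
configs = {
    "lm": {"output_neurons": 5000, "tokenizer_file": TOKENIZER_FILE},
    "asr": {"output_neurons": 5000, "tokenizer_file": TOKENIZER_FILE},
}

problems = find_vocab_mismatches(5000, TOKENIZER_FILE, configs)
print(problems or "LM and ASR configs are consistent with the tokenizer")
```

Running a check like this before each training run catches the case where the tokenizer was retrained with a different vocab size but one of the model configs was left unchanged.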

Thanks for your response. How can I do that? Can you please give some more detail?
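One possible reading of the "fine-tune on new data and re-inject a few previous samples" suggestion is to build a small mixed training set and continue training the existing LM checkpoint on it. The sketch below only covers building that mixed set. The function name, the `replay_fraction` value of 0.2 and the example sentences are assumptions for illustration, not values recommended in this thread.

```python
import random


def build_finetune_set(new_lines, old_lines, replay_fraction=0.2, seed=0):
    """Mix all new corpus lines with a random sample of old ones.

    replay_fraction is the number of replayed old lines relative to the
    number of new lines (an assumed knob, tune it for your data).
    """
    rng = random.Random(seed)
    n_replay = min(len(old_lines), int(len(new_lines) * replay_fraction))
    replay = rng.sample(list(old_lines), n_replay)
    mixed = list(new_lines) + replay
    rng.shuffle(mixed)  # avoid feeding all new lines first, then all old ones
    return mixed


# Toy example: 10 new sentences, 100 sentences from the previous corpus.
new_corpus = [f"yeni cümle {i}" for i in range(10)]
old_corpus = [f"eski cümle {i}" for i in range(100)]

finetune_set = build_finetune_set(new_corpus, old_corpus)
print(len(finetune_set))  # 10 new lines + 2 replayed old lines
```

The resulting lines would then be tokenized with the same tokenizer as before and used to continue training from the previous LM checkpoint for a short time, rather than starting from scratch. If the new text contains many characters or words the old tokenizer handles poorly, retraining the tokenizer (and therefore both models) may still be necessary.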