I have trained an RNN-T model for Bangla speech recognition. It works well on frequently seen words, but it fails badly on out-of-vocabulary (OOV) words and performs poorly on under-represented words.
I used BPE with a vocabulary size of 1000. Can you suggest how I can improve recognition of these words?