Handling OOV and under-represented words

I have trained an RNN-T model for Bangla speech recognition. It works well on frequently seen words, but it fails badly on out-of-vocabulary (OOV) words and performs poorly on under-represented ones.
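To make the three categories concrete, evaluation words can be bucketed by how often they occur in the training transcripts. This is only a minimal sketch: it assumes whitespace-tokenized transcripts, and the function name `bucket_words` and the `rare_threshold` cutoff of 5 are hypothetical choices for illustration, not part of my actual pipeline.

```python
from collections import Counter


def bucket_words(train_transcripts, eval_transcripts, rare_threshold=5):
    """Group evaluation words into OOV / rare / frequent by training-set count.

    rare_threshold is an assumed cutoff: words seen fewer than this many
    times in training (but at least once) are treated as under-represented.
    """
    # Count every whitespace-separated word in the training transcripts.
    train_counts = Counter(
        word for line in train_transcripts for word in line.split()
    )

    buckets = {"oov": set(), "rare": set(), "frequent": set()}
    for line in eval_transcripts:
        for word in line.split():
            count = train_counts[word]  # Counter returns 0 for unseen words
            if count == 0:
                buckets["oov"].add(word)
            elif count < rare_threshold:
                buckets["rare"].add(word)
            else:
                buckets["frequent"].add(word)
    return buckets


if __name__ == "__main__":
    train = ["a b a", "a c"]
    evaluation = ["a b d"]
    print(bucket_words(train, evaluation, rare_threshold=2))
```

Scoring the error rate separately for each bucket would show how much of the overall error comes from OOV words versus rare ones.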

I used BPE tokenization with a vocabulary size of 1000. How can I improve recognition of OOV and under-represented words?