Using count-based language models

Hi there,
I’d like to use SB with count-based language models. Is there a recipe that shows how to use them (e.g., with an LM in ARPA format)?

Many thanks,
Leonardo

Hi! ARPA LM integration is currently at an in-between stage. On one side, we are completely refactoring the decoding pipeline, and this will integrate ARPA LMs. But that PR is quite complex, so I suggest that, in the meantime, you have a look at CTC beamsearch decoding via ctcdecode by Antoine-Caubriere · Pull Request #773 · speechbrain/speechbrain · GitHub. There you will find an example of integrating ctc_decode from DeepSpeech, which lets you use an external ARPA LM. On our side, it worked well. Feel free to reach out to us if this PR isn’t clear enough.
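
In case it helps to see what an ARPA file actually holds before plugging one into a decoder, here is a minimal sketch in plain Python. It is not SpeechBrain's or ctcdecode's API: the names `parse_arpa`, `log10_prob`, and `score_sentence` and the toy bigram model are made up for illustration, and it assumes a well-formed file that contains an `<unk>` unigram. It reads the log10 probabilities and backoff weights, then scores a sentence with the standard backoff rule.

```python
# Toy ARPA model (hypothetical values): log10 prob, n-gram, optional backoff.
TOY_ARPA = r"""
\data\
ngram 1=5
ngram 2=3

\1-grams:
-1.0 <unk>
-99 <s> -0.5
-1.0 </s>
-0.7 hello -0.3
-0.9 world -0.2

\2-grams:
-0.2 <s> hello
-0.4 hello world
-0.3 world </s>

\end\
"""


def parse_arpa(text):
    """Return (table, max_order); table maps n-gram tuple -> (log10 prob, log10 backoff)."""
    table, order, max_order = {}, 0, 0
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue  # skip blank lines and the count header
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])  # e.g. "\2-grams:" -> 2
            max_order = max(max_order, order)
            continue
        fields = line.split()
        words = tuple(fields[1:1 + order])
        # The backoff weight is optional; a missing one means log10(1) = 0.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        table[words] = (float(fields[0]), backoff)
    return table, max_order


def log10_prob(table, word, context):
    """log10 P(word | context), backing off to shorter contexts when needed."""
    ngram = tuple(context) + (word,)
    if ngram in table:
        return table[ngram][0]
    if not context:
        return table[("<unk>",)][0]  # unseen word: fall back to <unk>
    backoff = table.get(tuple(context), (0.0, 0.0))[1]
    return backoff + log10_prob(table, word, tuple(context)[1:])


def score_sentence(table, max_order, words):
    """Sum of log10 probabilities over the sentence, including </s>."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    total = 0.0
    for i in range(1, len(tokens)):
        context = tuple(tokens[max(0, i - max_order + 1):i])
        total += log10_prob(table, tokens[i], context)
    return total


table, max_order = parse_arpa(TOY_ARPA)
seen = score_sentence(table, max_order, ["hello", "world"])
unseen = score_sentence(table, max_order, ["world", "hello"])
print(seen > unseen)  # the word order covered by bigrams scores higher
```

A score like this could be used to rescore an n-best list from the acoustic model. For decoding during beam search, the ctcdecode route in the PR above is the practical option.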

OK, thank you. I’ll look into what you suggest.

P.S. Sorry for my late reply. I thought I would get an email alert once you had replied.