Pretrained Models for Audio-to-Phoneme

I am working on a project, and one of the elements in the pipeline is extracting phonemes from audio data. I am trying to get the phonemes directly from the audio rather than performing grapheme-to-phoneme conversion.

Is there any SpeechBrain pretrained model that I can use in my pipeline to get phonemes from the audio data? It would be super helpful if I could get directed to any other tools as well.

Hmm, not really, apart from the TIMIT one, but I wouldn’t expect much from it, as it has been trained on a very small amount of data. However, I guess it should be feasible to extract phonemes with G2P from a bigger dataset and then do a new training …
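To make the G2P idea a bit more concrete, here is a rough sketch of how phoneme targets could be derived from the transcripts of a larger dataset. It is only an illustration: the tiny lexicon, the file names, and the helpers `transcript_to_phonemes` and `build_phoneme_labels` are all made up for this example, and the dictionary lookup is a stand-in for a real G2P model or a full pronunciation dictionary.

```python
# Hypothetical toy lexicon with ARPAbet-style phonemes (no stress markers).
# In practice this would be replaced by a real G2P model or a full
# pronunciation dictionary; these entries are only for illustration.
TOY_LEXICON = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
}


def transcript_to_phonemes(transcript, lexicon, unk="<unk>"):
    """Map a text transcript to a flat phoneme sequence via lexicon lookup."""
    phonemes = []
    for word in transcript.lower().split():
        word = word.strip(".,!?;:")  # drop surrounding punctuation
        if not word:
            continue
        # Words missing from the lexicon become a single unknown token.
        phonemes.extend(lexicon.get(word, [unk]))
    return phonemes


def build_phoneme_labels(utterances, lexicon):
    """Turn (audio_path, transcript) pairs into (audio_path, phonemes) pairs."""
    return [(path, transcript_to_phonemes(text, lexicon))
            for path, text in utterances]


# Hypothetical utterances; the audio paths are placeholders.
utterances = [
    ("utt1.wav", "The cat sat."),
    ("utt2.wav", "The dog sat"),
]

labels = build_phoneme_labels(utterances, TOY_LEXICON)
for path, phones in labels:
    print(path, " ".join(phones))
```

The resulting (audio, phoneme sequence) pairs could then serve as targets for training a new audio-to-phoneme model. Utterances containing the unknown token would probably need to be filtered out or handled by a proper G2P model.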