Hi everyone, I’m working on my undergraduate thesis on sign language recognition. My thesis title is “Recognition of signing sequences with hybrid architectures”. My professor proposed that we use a DNN-HMM in Kaldi. However, despite Kaldi’s state-of-the-art performance, I didn’t find it very user-friendly. So I did my research and came across SpeechBrain.
Since speech and vision recognition are both sequence learning problems (Kaldi reads a sequence of acoustic features for speech, and one could equally feed a sequence of visual features for vision), I was wondering whether I could use SpeechBrain for this task instead.
This is a great question.
First, you have to check with your advisor whether the use of HMMs is mandatory. SpeechBrain does not support them yet, although support is planned.
If it is not, then it becomes interesting: SpeechBrain is designed to deal with sequences. As long as you have a time dimension, you could in theory process any kind of “feature” dimension, whether it comes from a speech signal or from images. However, I don’t think the architectures we propose are well adapted to image processing. You would have to create an encoder able to encode a sequence of images before applying our decoder on top of it. But this sounds cool.
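To make the idea concrete, here is a minimal sketch (not SpeechBrain code, just plain PyTorch under my own assumptions) of what such an encoder could look like: a small per-frame CNN turns each image into a feature vector, and a recurrent layer models the time dimension, producing a (batch, time, features) tensor that a sequence decoder could consume. The module name, layer sizes, and input resolution are all hypothetical choices for illustration.

```python
import torch
import torch.nn as nn

class FrameSequenceEncoder(nn.Module):
    """Hypothetical sketch: map a sequence of video frames to a
    sequence of feature vectors, so a sequence decoder (e.g. CTC
    or attention-based) can be applied on top of it."""

    def __init__(self, feat_dim=128):
        super().__init__()
        # Per-frame CNN: a (3, 64, 64) image -> feat_dim vector.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Temporal model over the per-frame features.
        self.rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)

    def forward(self, frames):
        # frames: (batch, time, channels, height, width)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1))  # (b*t, feat_dim)
        feats = feats.view(b, t, -1)            # (b, t, feat_dim)
        out, _ = self.rnn(feats)                # (b, t, feat_dim)
        return out

# Batch of 2 clips, 10 frames each, 3x64x64 RGB images.
x = torch.randn(2, 10, 3, 64, 64)
enc = FrameSequenceEncoder()
print(enc(x).shape)  # torch.Size([2, 10, 128])
```

The key point is the output shape: once the video is encoded as a (batch, time, features) tensor, it looks exactly like a sequence of acoustic features, which is the interface a sequence decoder expects.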
@titouan.parcollet Thank you for your answer; I really appreciate it.
I have some questions regarding your answer.
- Do you know when SpeechBrain is going to support HMMs?
- As I’m a beginner in the field of speech/visual recognition, could you please describe some requirements the encoder will need to meet so it can be used with SpeechBrain’s architecture?
- And finally, after applying the encoder, can you suggest any tutorial for the decoding part?
Thank you in advance.