FORMAT OF DATA SET FOR MAKING CUSTOM LANGUAGE ASR SYSTEM (Speech Emotion and Speech to Text)

Hello.

I'm working on building an ASR system from scratch for the Urdu and Punjabi languages as my thesis work (so I am not much of an expert). I have a large collection of WAV files from a call center, well above 2000 hours of audio. I want to build a Speech Emotion Recognition system (with voice fingerprinting) and a transcription system (on which I will apply sentiment analysis). It is to be run on live calls as well as on saved data.

Can anyone please show me a sample of how to set up the dataset (even if it is in English, Hindi, Mandarin, or some other language) so that I don't face implementation issues? I just want to understand the architecture so that I only have to prepare the dataset once and don't have to worry about it again.

Regards

What do you mean by “set up the dataset”?