How long will it take to create a speech recognition and speaker diarization app?

Hello there everyone!

I wanted to know how long it would take someone to create a well-trained, accurate speech recognition and diarization app.

I’m trying to make an app that does these specific tasks using SpeechBrain, but I don’t know how long it will take me. I’m still following the tutorials in Google Colab (I’m currently reading about and testing the Brain class), and it has already taken me a week to reach that part.

So I’m asking how long it would take someone to learn the rest of SpeechBrain and to train the models so the app is as accurate as possible.

I also wanted to know whether it’s possible with SpeechBrain to read an audio file and generate a JSON file similar to this:

{
  "results": [
        {
          "timestamps": [
            [
              "hello",
              0.68,
              1.19
            ],
            [
              "yeah",
              1.47,
              1.91
            ],
            [
              "yeah",
              1.96,
              2.12
            ],
            [
              "how's",
              2.12,
              2.59
            ],
            [
              "Billy",
              2.59,
              3.17
            ],
            [
              "good",
              4.01,
              4.30
            ]
          ],
          "transcript": "hello yeah yeah how's Billy good "
        }
  ],
  "speaker_labels": [
    {
      "from": 0.68,
      "to": 1.19,
      "speaker": 2
    },
    {
      "from": 1.47,
      "to": 1.93,
      "speaker": 1
    },
    {
      "from": 1.96,
      "to": 2.12,
      "speaker": 2
    },
    {
      "from": 2.12,
      "to": 2.59,
      "speaker": 2
    },
    {
      "from": 2.59,
      "to": 3.17,
      "speaker": 2
    },
    {
      "from": 4.01,
      "to": 4.30,
      "speaker": 1
    }
  ]
}

Thanks!

Hey hi,

Building a full app is quite hard, especially if it has to work well :stuck_out_tongue:

Even for an expert in ASR, diarization, and SpeechBrain, it would take about a month, depending on the datasets available.
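
As for the JSON: a file like that is typically a merge of two separate outputs, word-level timestamps from the recognition side and speaker segments from the diarization side. Here is a rough sketch of only that merge step in plain Python. It assumes you already have the words as `(word, start, end)` tuples and the speaker segments as `(start, end, speaker)` tuples; those input shapes and the `build_output` helper are my own assumptions for illustration, not a SpeechBrain API or data format.

```python
import json


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def build_output(words, segments):
    """Combine word timings and diarization segments into one dict.

    words:    list of (word, start, end) tuples (hypothetical ASR output)
    segments: list of (start, end, speaker) tuples (hypothetical diarization output)
    """
    speaker_labels = []
    for word, start, end in words:
        # Pick the diarization segment that overlaps this word the most.
        best = max(
            segments,
            key=lambda s: overlap(start, end, s[0], s[1]),
            default=None,
        )
        # If nothing overlaps the word at all, leave the speaker unknown.
        if best is not None and overlap(start, end, best[0], best[1]) > 0:
            speaker = best[2]
        else:
            speaker = None
        speaker_labels.append({"from": start, "to": end, "speaker": speaker})

    return {
        "results": [
            {
                "timestamps": [[w, s, e] for w, s, e in words],
                "transcript": " ".join(w for w, _, _ in words),
            }
        ],
        "speaker_labels": speaker_labels,
    }


# Made-up example inputs, loosely based on the values in the question.
words = [("hello", 0.68, 1.19), ("yeah", 1.47, 1.91), ("good", 4.01, 4.30)]
segments = [(0.5, 1.3, 2), (1.4, 2.0, 1), (3.9, 4.5, 1)]

result = build_output(words, segments)
print(json.dumps(result, indent=2))
```

Assigning each word to the segment with the largest overlap is just one simple choice; the hard part is still getting accurate word timings and speaker segments in the first place.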
