Hi! I would like to detect a particular sound in an audio stream (for example, my doorbell).
My first idea was to use the command recognition model, but fine-tuning it would require a lot of recordings of my doorbell, which I don't have.
So, does it make sense to apply data augmentation to my doorbell recordings (though the amount of data may still be too small)?
Or should I use a different approach, such as a Siamese network? Can SpeechBrain help me with this?
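To make the augmentation idea concrete, here is a rough sketch of what I have in mind. It uses only NumPy, not SpeechBrain. The `augment` function, its parameters, and the synthetic 800 Hz tone standing in for a real doorbell recording are all placeholders I made up for illustration.

```python
import numpy as np

def augment(clip, rng, snr_db=10.0, max_shift=0.1, sample_rate=16000):
    """Return one randomly perturbed copy of a mono clip (1-D float array)."""
    # Random circular time shift of up to max_shift seconds in either direction.
    limit = int(max_shift * sample_rate)
    shifted = np.roll(clip, rng.integers(-limit, limit + 1))
    # Random gain between 0.5x and 1.5x.
    scaled = shifted * rng.uniform(0.5, 1.5)
    # Add white noise at the requested signal-to-noise ratio (in dB).
    signal_power = np.mean(scaled ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=scaled.shape)
    return scaled + noise

rng = np.random.default_rng(0)
# Placeholder for a real recording: one second of an 800 Hz tone at 16 kHz.
t = np.arange(16000) / 16000
doorbell = np.sin(2 * np.pi * 800 * t)
# Turn one clip into 20 perturbed variants.
augmented = [augment(doorbell, rng) for _ in range(20)]
```

In practice I would probably mix in real background noise from my house instead of white noise, but I am not sure whether variants like these are diverse enough for fine-tuning.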
Thank you guys!