Detect a sound in an audio recording

Hi! I would like to detect a particular sound in an audio recording (like my doorbell).
My first idea was to use the command recognition model, but fine-tuning it would require a lot of recordings of my doorbell, which I don't have.

So, does it make sense to do data augmentation with my doorbell recordings (maybe the amount of data would still be too small)?
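
To make the augmentation idea concrete, here is a rough sketch of what I have in mind. It is plain Python with no audio library, just to illustrate; the function name `augment` and all parameter values (shift range, gain range, SNR) are my own guesses, not something taken from a toolkit. It assumes a mono clip stored as a list of floats in [-1, 1].

```python
import math
import random

def augment(samples, noise_snr_db=20.0, max_shift=1600, gain_db_range=6.0, rng=None):
    """Return one randomly perturbed copy of a mono waveform (floats in [-1, 1])."""
    rng = rng or random.Random()
    n = len(samples)
    if n == 0:
        return []

    # 1) Random circular time shift, so the doorbell starts at a different offset.
    shift = rng.randint(-max_shift, max_shift) % n
    shifted = samples[-shift:] + samples[:-shift] if shift else list(samples)

    # 2) Random gain (in dB), to imitate a louder or quieter recording.
    gain = 10 ** (rng.uniform(-gain_db_range, gain_db_range) / 20)
    scaled = [s * gain for s in shifted]

    # 3) Additive white Gaussian noise at the requested signal-to-noise ratio.
    power = sum(s * s for s in scaled) / n
    noise_std = math.sqrt(power / 10 ** (noise_snr_db / 10))
    noisy = [s + rng.gauss(0.0, noise_std) for s in scaled]

    # Clip back into the valid amplitude range.
    return [max(-1.0, min(1.0, s)) for s in noisy]

# Example: turn one (synthetic) one-second clip at 16 kHz into 50 variants.
clip = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
variants = [augment(clip, rng=random.Random(seed)) for seed in range(50)]
```

With real recordings I would probably also mix in background noise from my house instead of only white noise, but I am not sure how far this can stretch a handful of original clips.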

Or should I use a different approach, such as a Siamese network? Can SpeechBrain help me with this?
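
By "Siamese" I mean something like the sketch below: compare the embedding of an incoming audio window against the embeddings of a few reference doorbell clips and decide by similarity. How the embeddings are produced is left open here (it would be some pretrained audio encoder); the names `cosine_similarity` and `is_doorbell` and the 0.8 threshold are hypothetical and only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as "not similar"
    return dot / (norm_a * norm_b)

def is_doorbell(query_emb, reference_embs, threshold=0.8):
    """True if the query is close enough to at least one reference doorbell clip."""
    best = max(cosine_similarity(query_emb, ref) for ref in reference_embs)
    return best >= threshold

# Toy 3-dimensional "embeddings" standing in for real encoder outputs.
references = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
print(is_doorbell([0.95, 0.05, 0.0], references))  # similar direction
print(is_doorbell([0.0, 0.0, 1.0], references))    # orthogonal direction
```

The appeal for me is that this would need only a few doorbell recordings as references, instead of a large labeled set for fine-tuning.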

Thank you guys!