Another way of batch sampling


I did some experiments with the built-in SpeechBrain sampler. It turned out that the SpeechBrain sorting strategies don’t work well in our ASR setup.

  1. Sorting utterances in ascending order does not achieve maximum quality (at least in our setup).
  2. Random shuffling of utterances is inefficient. When utterance durations vary widely, batches need a lot of padding, which leads to extra memory usage.
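
To make the padding cost concrete, here is a minimal sketch that compares the share of padded time in randomly ordered batches and in duration-sorted batches. The durations are synthetic and the helper name `padding_fraction` is hypothetical, so the numbers are only illustrative and not taken from our experiments.

```python
import random


def padding_fraction(durations, batch_size):
    """Fraction of each padded batch that is padding, for the given utterance order."""
    padded_total = 0.0
    real_total = 0.0
    for start in range(0, len(durations), batch_size):
        batch = durations[start:start + batch_size]
        # Every utterance in a batch is padded to the longest one.
        padded_total += max(batch) * len(batch)
        real_total += sum(batch)
    return 1.0 - real_total / padded_total


rng = random.Random(0)
# Assumption: synthetic utterance durations between 1 and 20 seconds.
durations = [rng.uniform(1.0, 20.0) for _ in range(1000)]

random_order = durations[:]
rng.shuffle(random_order)
sorted_order = sorted(durations)

print(f"random order: {padding_fraction(random_order, 16):.1%} padding")
print(f"sorted order: {padding_fraction(sorted_order, 16):.1%} padding")
```

Sorting removes most of the padding but fixes the batch composition, which is the trade-off between the two strategies above.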

We have a solution, although the code is not clean by SpeechBrain standards. It is a class named BucketingBatchSampler.

It works like this:

  1. It groups utterances by duration.
  2. Each epoch, it shuffles the utterances within each group and then shuffles the batches themselves.
    All this minimizes padding in the batch.
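
The two steps above can be sketched roughly as follows. This is not our actual BucketingBatchSampler, only a minimal pure-Python illustration; the class name `BucketingBatchSamplerSketch` and the parameters `num_buckets`, `seed` and `set_epoch` are assumptions made for the example.

```python
import random


class BucketingBatchSamplerSketch:
    """Yields batches of indices whose utterances have similar durations."""

    def __init__(self, durations, batch_size, num_buckets=10, seed=0):
        self.batch_size = batch_size
        self.seed = seed
        self.epoch = 0
        # Step 1: sort indices by duration and cut them into contiguous buckets.
        order = sorted(range(len(durations)), key=lambda i: durations[i])
        bucket_size = max(1, -(-len(order) // num_buckets))  # ceiling division
        self.buckets = [
            order[i:i + bucket_size] for i in range(0, len(order), bucket_size)
        ]

    def set_epoch(self, epoch):
        # A different epoch number gives a different, reproducible shuffle.
        self.epoch = epoch

    def __iter__(self):
        rng = random.Random(self.seed + self.epoch)
        batches = []
        # Step 2a: shuffle utterances inside each bucket, then slice into batches.
        for bucket in self.buckets:
            shuffled = bucket[:]
            rng.shuffle(shuffled)
            for i in range(0, len(shuffled), self.batch_size):
                batches.append(shuffled[i:i + self.batch_size])
        # Step 2b: shuffle the order of the batches themselves.
        rng.shuffle(batches)
        return iter(batches)

    def __len__(self):
        return sum(-(-len(b) // self.batch_size) for b in self.buckets)


# Hypothetical durations in seconds; indices stand in for utterance IDs.
durations = [3.1, 9.8, 3.0, 10.2, 6.5, 6.4, 9.9, 3.2]
sampler = BucketingBatchSamplerSketch(durations, batch_size=2, num_buckets=2)
for epoch in range(2):
    sampler.set_epoch(epoch)
    print(epoch, list(sampler))
```

Because batches never cross bucket boundaries, padding stays small, while the per-epoch seed changes which utterances end up together. An object like this yields lists of indices, so it could in principle be passed as `batch_sampler` to a PyTorch `DataLoader`.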

As a result, with BucketingBatchSampler we were able to train an ASR model whose WER is 1 pp (percentage point) better.

What do you think? Would this functionality be of interest to the project?

Hi, really sorry for the late reply. This is quite interesting. We have this implemented in our Dynamic Sampler (this is the PR that does dynamic batching). It should be merged very soon :)

Another feature that is most likely missing in the Dynamic Sampler is the shuffling of utterances between epochs. With shuffling, the batches are not identical from one epoch to the next, which works like data augmentation. Am I right that the Dynamic Sampler does not support shuffling? Does this feature look interesting?
For example:
Epoch 1. Batches: [utt1, utt2], [utt3, utt4], [utt5, utt6]
Epoch 2. Batches: [utt1, utt3], [utt2, utt5], [utt4, utt6]