How to set up for 12 GB CUDA (train_with_wav2vec)

I’ve started using seq2seq/train_with_wav2vec.py with train_fr_with_wav2vec.yaml.
The example parameters are tailored for 16 GB cards, but mine has 12 GB, and it is not clear how to change the parameters to fit on my card. In particular, training runs fine for quite a long time and then, close to the end (around 70%), crashes with an out-of-memory error.

Every time I train, I see that my memory consumption slowly increases during training; it can grow by several GB. Maybe that is a sign that something is not being freed?

How do I adjust the settings to make sure it will train on a 12 GB card?

This is an illustration of my drama: a few hours of training, monitoring with a spreadsheet, and the final roulette of the big boom.
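
In case it is useful to anyone else, here is a small sketch of how the GPU memory could be logged directly from Python instead of a spreadsheet; `log_gpu_memory` is just an illustrative helper name, not part of the recipe:

```python
import torch

def log_gpu_memory(step, device=0):
    # Hypothetical helper: print currently allocated and peak GPU memory in GB.
    if not torch.cuda.is_available():
        return
    allocated = torch.cuda.memory_allocated(device) / 1024 ** 3
    peak = torch.cuda.max_memory_allocated(device) / 1024 ** 3
    print(f"step {step}: allocated={allocated:.2f} GB, peak={peak:.2f} GB")

# Example: call it every 100 batches inside the training loop.
# if step % 100 == 0:
#     log_gpu_memory(step)
```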

This increase in VRAM consumption is expected if you use sorting=ascending: short sequences come at the beginning and the longest ones at the end, so memory use peaks near the end of the epoch. What you can do is:

- use sorting=descending and try to find the right architecture so it fits your GPU;
- remove sentences that are too long;
- reduce the batch size;
- check out our PR on dynamic batching and try to adapt it to your needs.

The last option would be the best, but also the most complex one :stuck_out_tongue: A rough sketch of the first three options is shown below.
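
For illustration, a minimal sketch of the first three options at the data-loading level, assuming a SpeechBrain DynamicItemDataset whose CSV provides a `duration` key; the values and the `avoid_if_longer_than` name are only examples, and the actual yaml keys in the recipe may differ:

```python
from speechbrain.dataio.dataset import DynamicItemDataset

# Illustrative values: tune them to a 12 GB card.
avoid_if_longer_than = 10.0  # seconds; drop utterances longer than this
batch_size = 4               # smaller than the default used for 16 GB cards

train_data = DynamicItemDataset.from_csv(csv_path="train.csv")

# sorting=descending: the longest (most memory-hungry) batches come first,
# so an out-of-memory error shows up within minutes instead of hours later.
train_data = train_data.filtered_sorted(
    sort_key="duration",
    reverse=True,
    key_max_value={"duration": avoid_if_longer_than},  # remove too-long sentences
)

# batch_size is then passed on to the dataloader options (in the recipe this
# normally lives in the yaml rather than in Python).
```

If the first few (longest) batches fit in 12 GB with this setup, the rest of the epoch should fit as well.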

Once you have found the correct set of hyperparameters, do not forget to switch back to ascending.

Thanks, I will check it.
Why not sort randomly?

Random sorting induces longer training times :slight_smile: Batches then mix short and long utterances, so a lot of compute is wasted on padding.