Speech separation model does not use multiple GPUs. Only the first GPU is assigned


I am using the SepFormer model for speech separation. First, I tried it on the LibriMix dataset. No matter how many GPUs I assign to my job, it only uses the first GPU. I am running this experiment on AWS, simply using the DataParallel (DP) utility. Also, when I try to train the model using Distributed Data Parallel (DDP) with even 2 GPUs, using the same command given in the LibriMix recipe (speechbrain/recipes/LibriMix/separation), I get an error and it does not run. The command is:

python -m torch.distributed.launch --nproc_per_node=2 train.py hparams/sepformer-libri2mix.yaml --data_folder /yourdatapath --distributed_launch --distributed_backend='nccl'
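One quick thing worth ruling out before debugging the launcher itself: if the scheduler or environment restricts `CUDA_VISIBLE_DEVICES`, both DP and DDP can only use the devices listed there, which would produce exactly the "only the first GPU is used" symptom. A minimal, hypothetical sketch of such a check (the helper name `requested_gpus` is my own, not part of SpeechBrain):

```python
import os

def requested_gpus(env=os.environ):
    """Return the list of GPU indices exposed to this process,
    or None if CUDA_VISIBLE_DEVICES is unset (all GPUs visible)."""
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None  # no restriction imposed by the environment
    # An empty or malformed value hides GPUs, so filter blanks out.
    return [v.strip() for v in value.split(",") if v.strip() != ""]

if __name__ == "__main__":
    # Example: an environment exposing two GPUs to the job.
    print(requested_gpus({"CUDA_VISIBLE_DEVICES": "0,1"}))  # ['0', '1']
```

If this reports fewer devices than you requested from AWS, the problem is in the job environment rather than in the training script.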

Finally, I tried to use the SepFormer model on my own dataset. I created the mixture dataset in the same way as Libri2Mix; however, I ran into the same multiple-GPU issue. Moreover, training is extremely slow (each epoch takes almost one day). I should note that I am using exactly the same train.py file provided in the LibriMix recipe, and each utterance in my dataset has the same length as in Libri2Mix.

I would appreciate any help.

Hi, please see the GitHub issues on this:

The command given in the issue should work correctly. I would suggest using DDP rather than DP, as I observed worse performance with DP.

For the custom dataset: can you try training with shorter sequences? You can do this by setting:

limit_training_signal_len: True
training_signal_len: 32000
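For intuition, here is an illustrative sketch (not the recipe's actual code) of what these two hparams do: when the limit is enabled, each training mixture is randomly cropped to at most `training_signal_len` samples, which keeps the sequences the transformer attends over short and speeds up each step. The function name `crop_signal` is my own for illustration:

```python
import random

def crop_signal(signal, training_signal_len=32000, rng=random):
    """Randomly crop a 1-D sequence of samples to at most
    `training_signal_len` samples; shorter signals pass through."""
    if len(signal) <= training_signal_len:
        return signal
    # Pick a random start so different crops are seen across epochs.
    start = rng.randrange(len(signal) - training_signal_len + 1)
    return signal[start:start + training_signal_len]

if __name__ == "__main__":
    clip = crop_signal(list(range(48000)))
    print(len(clip))  # 32000
```

At 8 kHz, 32000 samples is 4 seconds of audio, so this bounds the per-utterance cost regardless of how long your raw mixtures are.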

Let me know if it works.