Why do log_probs need to be detached in transducer_loss?

Hi, Dear!
I’m confused about the code in speechbrain/nnet/loss/transducer_loss.py, line 254:

 def forward(ctx, log_probs, labels, T, U, blank, reduction):
  log_probs = log_probs.detach()

What’s the reason for detaching log_probs? In my understanding, once it is detached, the backward gradient can no longer reach log_probs, so the rest of the network would never be updated.
Please give me some insight. Thanks very much!

@aheba, who implemented it, can probably answer in more detail. For the Transducer loss, we compute the gradient manually for efficiency instead of relying on autograd (you can do it with autograd, but it will be much slower), so I’m guessing the detach has something to do with that.

Hi @galois_xiong, let me give you some insights:
1- We use a custom loss based on torch.autograd.Function, which lets us define the forward and backward functions ourselves as @staticmethod functions. This makes it possible to pass the tensors we need from forward to backward through ctx (the context). You can check an example here: PyTorch: Defining New autograd Functions — PyTorch Tutorials 1.7.0 documentation
2- We use log_probs = log_probs.detach() so that the tensor can be handed to Numba CUDA for the parallel computation. Numba didn’t accept a PyTorch tensor with ._grad, and that gradient state is not needed to compute the alpha and beta matrices and the grad.
3- The backward function of our Transducer loss returns the grad with respect to log_probs, so the gradient still flows through the log_probs tensor back to the network.
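
To make the three points above concrete, here is a minimal sketch of the same pattern on a toy loss. It is not the SpeechBrain code: the class name ManualNLL and the simple negative log-likelihood it computes are assumptions chosen for illustration, and no Numba kernel is involved. It only shows that detaching inside forward does not stop training, because backward returns the hand-computed gradient for log_probs.

```python
import torch


class ManualNLL(torch.autograd.Function):
    """Toy loss with a hand-written gradient (hypothetical, not SpeechBrain code)."""

    @staticmethod
    def forward(ctx, log_probs, labels):
        # Only the raw values are needed here; an external kernel (e.g. Numba)
        # would receive this plain tensor without any autograd state attached.
        log_probs = log_probs.detach()

        # loss = -sum_i log_probs[i, labels[i]]
        index = labels.unsqueeze(1)
        loss = -log_probs.gather(1, index).sum()

        # Gradient computed by hand: -1 at each target position, 0 elsewhere.
        grad = torch.zeros_like(log_probs)
        grad.scatter_(1, index, -1.0)

        # Stash the gradient on the context so backward can return it.
        ctx.grad = grad
        return loss

    @staticmethod
    def backward(ctx, grad_output):
        # One return value per forward input; labels receive no gradient.
        return grad_output * ctx.grad, None


logits = torch.randn(3, 5, requires_grad=True)
labels = torch.tensor([1, 0, 4])

# log_softmax is tracked by autograd as usual; only the custom loss is manual.
log_probs = torch.log_softmax(logits, dim=1)
loss = ManualNLL.apply(log_probs, labels)
loss.backward()

# The gradient reached the leaf tensor even though forward detached its input.
print(logits.grad is not None)
```

The detach only affects the local variable inside forward. Autograd still records ManualNLL as a node whose input is the original log_probs, and it routes whatever backward returns into the rest of the graph.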

Hi @aheba, thank you very much! Your answer is really helpful, and I can learn a lot from your code :+1: