Tracking gradients w.r.t. the input audio sample: Fbank breaks the computation graph

Hi,

I am using the pretrained seq2seq model. I want to obtain the gradient w.r.t. the audio input. Unfortunately, calling the compute_features method breaks the computation graph, and the output features will not have a grad_fn associated with them.

Consider the following code example:

import torch
from speechbrain.pretrained import EncoderDecoderASR

model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech"
)

x = torch.zeros(1, 10000)
x.requires_grad = True

feat = model.hparams.compute_features(x)

# Fails: compute_features breaks the computation graph
assert feat.grad_fn is not None

The class Fbank offers a requires_grad attribute, which defaults to False. However, changing it to True has no effect. I also tried wrapping the call in a torch.enable_grad() context, without success.
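For illustration, the underlying behaviour can be reproduced in plain PyTorch (a minimal sketch, not the actual SpeechBrain code): any operation executed under torch.no_grad() yields a tensor without a grad_fn, regardless of the input's requires_grad flag.

```python
import torch

x = torch.zeros(1, 10, requires_grad=True)

# A no_grad() context inside the callee severs the graph, even though
# the input requires grad:
with torch.no_grad():
    y = x * 2.0

assert y.grad_fn is None
assert not y.requires_grad
```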

Of course, I could manually provide a modified Fbank, but I would love to reuse the methods available via model.hparams.

Any tips on how to best proceed are very much appreciated.

-HT

Hey hi, I think that this is due to the Pretrained class (from which EncoderDecoderASR inherits). freeze_params is set to True by default; you need to set it to False.

Hi Titouan,

Thanks for your suggestion. I tried it, but this does not resolve the issue, which makes sense, I think. Let me share my thoughts:

Instantiating the pretrained model with the argument freeze_params=False only leaves the model in train mode, so that the parameters have requires_grad=True.

But that is not what I am after; I am rather happy that the model parameters are already frozen. What I would like is to backpropagate through the model w.r.t. the input, not w.r.t. the parameters.

I initially failed to give additional motivation for my use case. I am looking into generating adversarial examples, which is why I am interested in obtaining gradients w.r.t. the original inputs and not w.r.t. the parameters.
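To make the use case concrete, here is a minimal sketch of a single FGSM-style step. A toy linear model stands in for the ASR pipeline, and the epsilon value is an arbitrary choice for illustration; the point is that the parameters stay frozen while the gradient flows to the input.

```python
import torch

# Toy stand-in for the ASR model: freeze the parameters, but take
# the gradient w.r.t. the input (the setting described above).
model = torch.nn.Linear(10, 2)
for p in model.parameters():
    p.requires_grad = False  # parameters stay frozen

x = torch.randn(1, 10, requires_grad=True)
loss = model(x).sum()
loss.backward()

# One signed-gradient step on the input; the parameters are untouched.
epsilon = 0.01  # illustrative value
x_adv = x.detach() + epsilon * x.grad.sign()
```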

Following your suggestion, I continued to dig a bit deeper. As stated in my first post, the problem likely lies in the Filterbank, the feature extractor. Its forward pass is wrapped in a torch.no_grad() context, so it will always break the computation graph.

Consider the following example:

Let's assume we have set freeze_params=False, freeze=False and requires_grad=True wherever necessary. The forward pass of an Fbank object calls the forward of a Filterbank instance, see speechbrain/features.py at 5141b53a172385f08ea250774c9192e6114d647d · speechbrain/speechbrain · GitHub. In the latter forward call, a new tensor f_central_mat is derived from the attribute self.f_central, which is a parameter and therefore has self.f_central.requires_grad=True. However, f_central_mat.requires_grad=False. Again, this is expected due to the outer torch.no_grad() context of the Fbank forward pass.
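This also explains why wrapping the call in torch.enable_grad() from the outside has no effect: in PyTorch, the innermost grad-mode context active at the time an operation runs wins. A small sketch (the function below mimics the hard-coded no_grad() wrapper, it is not the actual Filterbank code):

```python
import torch

def forward_no_grad(x):
    # Mimics a forward pass hard-wrapped in torch.no_grad(),
    # as described above for Filterbank.
    with torch.no_grad():
        return x * 2.0

x = torch.ones(1, requires_grad=True)

# An outer torch.enable_grad() does not help: the innermost
# context takes precedence.
with torch.enable_grad():
    y = forward_no_grad(x)

assert y.grad_fn is None
```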

I wonder if it would be possible to enter a torch.enable_grad() context instead whenever Fbank.requires_grad is set to True.
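Such a change might look roughly like this (a minimal sketch of the idea, not the actual Filterbank implementation):

```python
import torch

def forward(x, requires_grad=False):
    # Sketch of the suggestion: pick the grad context from the
    # requires_grad flag instead of hard-coding torch.no_grad().
    ctx = torch.enable_grad() if requires_grad else torch.no_grad()
    with ctx:
        return x * 2.0

x = torch.ones(1, requires_grad=True)
assert forward(x).grad_fn is None                      # graph still cut
assert forward(x, requires_grad=True).grad_fn is not None  # graph kept
```

This would preserve the current default behaviour while letting callers opt in to gradient tracking.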

A similar issue holds for the MFCC feature pipeline.

Looking forward to your thoughts.

-HT

@mravanelli any idea why we have torch.no_grad() in the forward? This would prevent the use of requires_grad, wouldn't it?