Complicated dataio pipeline

Hi everyone, I’m trying to create an automatic speech recognition system. It should be able to recognize a target speaker’s speech from a noisy audio signal, which is created by mixing the target speaker’s speech from the dataset with a speech signal from another random speaker from the same dataset.
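Just to make the goal concrete, by “mixing” I simply mean summing the two waveforms at some SNR. A minimal sketch of that (assuming 1-D torch tensors and a hypothetical target SNR in dB, not any existing SpeechBrain helper):

```python
import torch

def mix_at_snr(target: torch.Tensor, interferer: torch.Tensor, snr_db: float = 5.0) -> torch.Tensor:
    """Sum two 1-D waveforms so the mixture has roughly the requested SNR."""
    # Truncate both signals to the shorter length.
    n = min(target.shape[0], interferer.shape[0])
    target, interferer = target[:n], interferer[:n]
    # Scale the interfering speech to hit the target SNR.
    target_power = target.pow(2).mean()
    interferer_power = interferer.pow(2).mean().clamp(min=1e-10)
    scale = torch.sqrt(target_power / (interferer_power * 10 ** (snr_db / 10)))
    return target + scale * interferer
```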
I’m currently using a .json data manifest file, created on MiniLibriSpeech with a modified version of the prepareminilibrispeech.py script.
Each example has this structure:
{
  "5694-64038-0000": {
    "wav": "{data_root}/LibriSpeech/dev-clean-2/5694/64038/5694-64038-0000.flac",
    "length": 2.595,
    "words": "ADVANCE INTO TENNESSEE",
    "speaker_id": "5694",
    ...
  }
}
(plus other fields for other operations…)
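For reference, I load the manifest roughly like this (a sketch assuming SpeechBrain’s DynamicItemDataset.from_json; the file name and data folder below are placeholders for my real paths):

```python
from speechbrain.dataio.dataset import DynamicItemDataset

# "dev.json" and the data folder are placeholders for my actual paths.
dataset = DynamicItemDataset.from_json(
    "dev.json", replacements={"data_root": "/path/to/data"}
)
dataset.set_output_keys(["id", "wav", "words", "speaker_id"])
print(dataset[0]["wav"])  # the {data_root} placeholder gets expanded
```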

There are two aspects I can’t figure out:

  1. How do I (smartly) include all the audio files I could use in the mixture inside the examples of the data annotation file? There are lots of audio files available for mixing, and I don’t want to store a huge list of candidates in every single example of the data manifest file.
  2. How do I sample one random speaker (different from the target speaker) from all the dataset examples inside the dataio pipeline? A rough sketch of what I have in mind is below.
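To show the kind of thing I’m considering for both points: build a speaker_id → utterance-id index once from the manifest (so nothing extra has to be stored per example), and sample the interfering utterance on the fly inside a dynamic item. This is only a sketch under my assumptions about SpeechBrain’s DynamicItemDataset / data_pipeline API; train.json, data_root and the output key names are hypothetical:

```python
import json
import random

import speechbrain as sb
from speechbrain.dataio.dataset import DynamicItemDataset

data_root = "/path/to/data"   # placeholder for my actual data folder
manifest_path = "train.json"  # hypothetical manifest name

# Load the manifest once and build a speaker_id -> [utterance ids] index,
# so no per-example candidate lists are needed in the manifest itself.
with open(manifest_path) as f:
    manifest = json.load(f)

ids_by_speaker = {}
for utt_id, entry in manifest.items():
    ids_by_speaker.setdefault(entry["speaker_id"], []).append(utt_id)

dataset = DynamicItemDataset.from_json(
    manifest_path, replacements={"data_root": data_root}
)

@sb.utils.data_pipeline.takes("wav", "speaker_id")
@sb.utils.data_pipeline.provides("target_sig", "interferer_sig")
def audio_pipeline(wav, speaker_id):
    # Read the target speaker's utterance.
    yield sb.dataio.dataio.read_audio(wav)
    # Pick a random different speaker, then one of their utterances.
    other_speakers = [s for s in ids_by_speaker if s != speaker_id]
    interferer_id = random.choice(ids_by_speaker[random.choice(other_speakers)])
    interferer_wav = manifest[interferer_id]["wav"].replace("{data_root}", data_root)
    yield sb.dataio.dataio.read_audio(interferer_wav)

dataset.add_dynamic_item(audio_pipeline)
dataset.set_output_keys(["id", "target_sig", "interferer_sig", "words"])
```

Does something along these lines make sense, or is there a more idiomatic way to do the random sampling inside the pipeline?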

I would really appreciate some help! Bye.