Integrating Dragonfly with SpeechBrain for Commands

Greetings everyone! I use speech recognition to control my computer environment, and I work on open-source projects built on the Dragonfly platform, such as Caster, to support others who do the same.

Dragonfly is a speech recognition framework for Python that makes it convenient to create custom commands to use with speech recognition software. It was written to make it very easy for Python macros, scripts, and applications to interface with speech recognition engines. Its design allows speech commands and grammar objects to be treated as first-class Python objects.

Dragonfly can be used for general programming by voice. It is flexible enough to allow programming in any language, not just Python. It can also be used for speech-enabling applications, automating computer activities and dictating prose.

Dragonfly contains its own powerful framework for defining and executing actions. It includes actions for text input and key-stroke simulation. This framework is cross-platform, working on Windows, macOS and Linux (X11 only).
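To give a flavour of that action framework, here is a minimal sketch using Dragonfly's `Key` and `Text` actions (the particular key names and text strings are just illustrative, not part of any real command set):

```python
from dragonfly import Key, Text

# Text types a literal string into the focused window.
type_greeting = Text("Hello from Dragonfly!")

# Key simulates key-strokes; "c-s" means Ctrl+S, "enter" is the Enter key.
save_and_confirm = Key("c-s") + Key("enter")

# Actions are first-class objects: compose them with + and execute them.
(type_greeting + save_and_confirm).execute()
```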

Dragonfly currently supports the following speech recognition engines:

  • Dragon, a product of Nuance. All versions up to 15 (the latest) should be supported, via Natlink (Windows only)
  • Windows Speech Recognition (WSR), included with Microsoft Windows Vista, Windows 7+, and freely available for Windows XP (Windows OS only)
  • Kaldi, via Kaldi Active Grammar
  • CMU Pocket Sphinx (with caveats)

As you can see, Dragonfly can integrate with many different backend engines. Some use an intermediate package such as Natlink or Kaldi Active Grammar, while others, like WSR or CMU Pocket Sphinx, integrate directly.
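Selecting a backend is done in one place; the rest of a grammar file stays the same. The snippet below is a rough sketch using Dragonfly's `get_engine()` helper (the engine name strings are the ones I recall from the Dragonfly docs, so double-check against your installed version):

```python
from dragonfly import get_engine

# Pick a backend by name, e.g. "kaldi", "natlink" (Dragon),
# "sapi5inproc" (WSR) or "sphinx" (CMU Pocket Sphinx).
engine = get_engine("kaldi")
engine.connect()  # initialise the chosen backend
print("Using engine:", engine.name)
```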

Dragonfly uses contexts to determine when sets of commands (rules and grammars) should be active; after all, you don't want every command available all the time. This is done primarily by checking the foreground window's title or executable, with more advanced behaviour available via function contexts. These rules and grammars are engine-agnostic. I'm intrigued by the SpeechBrain project, so I wanted to bring the Dragonfly project to your attention as a means of providing commands and executing actions on behalf of the user.
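As a concrete example, here is a minimal sketch of a context-restricted grammar, assuming Notepad as the target application (the spoken phrases and actions are illustrative):

```python
from dragonfly import Grammar, MappingRule, AppContext, Key, Text

class NotepadRule(MappingRule):
    # Spoken phrase on the left, action to execute on the right.
    mapping = {
        "save file":      Key("c-s"),
        "new line":       Key("enter"),
        "type signature": Text("-- LexonCode"),
    }

# The grammar is only active while Notepad is the foreground window.
notepad_context = AppContext(executable="notepad")
grammar = Grammar("notepad commands", context=notepad_context)
grammar.add_rule(NotepadRule())
grammar.load()
```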

Hi LexonCode, Dragonfly sounds awesome, thanks for bringing it to our attention. One issue with using an ASR model from SpeechBrain as a backend is that we don’t have online decoding yet (it’s on the roadmap), so it might be a little while before it’s feasible. But stay tuned!

(BTW, you might be interested in our spoken language understanding recipes: we have some for similar-ish voice control applications under speechbrain/recipes/timers-and-such on the develop branch of speechbrain/speechbrain on GitHub.)

Thank you for your response. I look forward to online decoding! I did take a look at timers-and-such; it's geared more towards smart home/device integration than OS integration.