Problem: Apply deep learning technology to recognize speech in English.
Context: For decades, we’ve communicated with machines through a mouse and keyboard, and transcribing audio into text has long been a very challenging task. With deep neural networks, the ability of computers to recognize speech has significantly improved. DeepSpeech, an open source speech-to-text engine based on the “Deep Speech: Scaling Up End-to-End Speech Recognition” research paper, has played a big part in this improvement. The latest pre-built American English model of DeepSpeech has an 11% word error rate.
Model type: Deep neural networks (DeepSpeech)
What we did: We deployed a DeepSpeech pre-built model using a SnapLogic pipeline within SnapLogic’s integration platform, the Enterprise Integration Cloud. An aside: you can deploy the SnapLogic pipeline on your own GPU instance to speed up the process. (More on how we built this demo.)
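Outside the hosted pipeline, a transcription call with the open-source DeepSpeech Python package can be sketched roughly as follows. This is a minimal illustration, not the exact code behind the demo: the model filename is a placeholder, and it assumes a 16-bit mono WAV input (DeepSpeech's pre-built English model expects 16 kHz audio).

```python
# Minimal sketch of transcription with the DeepSpeech Python package
# (Model/stt API as in deepspeech 0.9.x; the model path is a placeholder).
import wave

import numpy as np


def wav_to_samples(path):
    """Read a 16-bit mono WAV file into the int16 sample array
    that DeepSpeech's stt() expects."""
    with wave.open(path, "rb") as w:
        # DeepSpeech wants raw 16-bit PCM, single channel.
        assert w.getsampwidth() == 2 and w.getnchannels() == 1
        frames = w.readframes(w.getnframes())
    return np.frombuffer(frames, dtype=np.int16)


def transcribe(wav_path, model_path="deepspeech-0.9.3-models.pbmm"):
    # Imported lazily so the WAV helper works without the package installed.
    from deepspeech import Model  # pip install deepspeech
    ds = Model(model_path)
    return ds.stt(wav_to_samples(wav_path))
```

In the demo itself this kind of call runs inside a SnapLogic pipeline rather than a local script, but the input format requirements are the same.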
In this demo, click “Record,” then speak for five seconds. The model will then interpret what you’ve said. Note: since we have limited resources for demos, you may experience delayed responses. You can also click “Random” to hear a random voice clip from a subset of Common Voice.
In practice, you can train the model on your own audio dataset.
We promise not to store your voice in this demo.
The transcript will be displayed here.