Speech to Text with DeepSpeech

Speech to text (STT) is a useful building block, so I took DeepSpeech 0.6.1 for a test drive. My recipe for installing DeepSpeech on a Pi 4 running Raspbian Lite follows. If the Pi 4 is running the GUI desktop, some of the packages may already be installed.


# Install the system packages, then DeepSpeech itself
sudo apt install git python3-pip python3-scipy python3-numpy python3-pyaudio libatlas3-base
pip3 install deepspeech==0.6.1
# Download the pre-trained models and the sample audio into a working directory
mkdir ~/dspeech
cd ~/dspeech
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/deepspeech-0.6.1-models.tar.gz
tar xvf deepspeech-0.6.1-models.tar.gz
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/audio-0.6.1.tar.gz
tar xvf audio-0.6.1.tar.gz
# Reload ~/.profile so the deepspeech command (typically installed in ~/.local/bin) is on PATH
source ~/.profile
# Transcribe the three sample recordings
deepspeech --model deepspeech-0.6.1-models/output_graph.tflite --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio audio/2830-3980-0043.wav
deepspeech --model deepspeech-0.6.1-models/output_graph.tflite --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio audio/4507-16021-0012.wav
deepspeech --model deepspeech-0.6.1-models/output_graph.tflite --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio audio/8455-210777-0068.wav



At this point DeepSpeech should have transcribed the three test audio files.
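
The three deepspeech commands above differ only in the audio file. Here is a minimal Python sketch that builds the same command line for every .wav file in a directory. It assumes it is run from ~/dspeech, that the deepspeech command is on PATH, and that the transcript is printed on standard output. The names build_command and transcribe_all are hypothetical helpers, not part of DeepSpeech.

```python
import subprocess
from pathlib import Path

# Directory unpacked from deepspeech-0.6.1-models.tar.gz (relative to ~/dspeech)
MODELS = Path("deepspeech-0.6.1-models")


def build_command(wav, models=MODELS):
    """Return the deepspeech command line for one WAV file as an argument list."""
    return [
        "deepspeech",
        "--model", str(models / "output_graph.tflite"),
        "--lm", str(models / "lm.binary"),
        "--trie", str(models / "trie"),
        "--audio", str(wav),
    ]


def transcribe_all(audio_dir="audio"):
    """Run deepspeech on each .wav file in audio_dir and map file name to transcript.

    Assumption: the CLI writes the transcript to stdout and its log messages to stderr.
    """
    results = {}
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        done = subprocess.run(
            build_command(wav),
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,  # raise if deepspeech exits with an error
        )
        results[wav.name] = done.stdout.strip()
    return results
```

Calling transcribe_all() from ~/dspeech should run the same three transcriptions as the commands above, one file at a time.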

The next step is to plug in a USB microphone to test live STT. ALSA card 0 is the Raspberry Pi internal audio hardware and ALSA card 1 is the external USB microphone, so edit alsa.conf to change the default ALSA card from 0 to 1. This makes the microphone the default ALSA device.

sudo nano /usr/share/alsa/alsa.conf

OLD: defaults.ctl.card 0
NEW: defaults.ctl.card 1
OLD: defaults.pcm.card 0
NEW: defaults.pcm.card 1
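
The same edit can be scripted instead of made by hand in nano. This is a minimal sketch, assuming the two settings appear in alsa.conf on their own lines exactly as shown above. The function name set_default_card is hypothetical. The function only transforms text, so reading the file and writing it back to /usr/share/alsa/alsa.conf (which needs root) is left out.

```python
import re

# Matches "defaults.ctl.card N" or "defaults.pcm.card N" on a line of its own.
# [ \t] is used instead of \s so the pattern never swallows a newline.
CARD_LINE = re.compile(r"^(defaults\.(?:ctl|pcm)\.card)[ \t]+\d+[ \t]*$", re.MULTILINE)


def set_default_card(conf_text, card=1):
    """Return conf_text with the default ctl and pcm card numbers set to card."""
    return CARD_LINE.sub(lambda m: "{} {}".format(m.group(1), card), conf_text)
```

Other settings, such as defaults.pcm.device, are left untouched, and running the function twice gives the same result as running it once.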


Install the DeepSpeech examples, which include the microphone example, then run it.

git clone https://github.com/mozilla/DeepSpeech-examples
pip3 install halo webrtcvad
cd deepspeech-0.6.1-models/
python3 ../DeepSpeech-examples/mic_vad_streaming/mic_vad_streaming.py -m ./output_graph.tflite -l lm.binary -t trie -v 3
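
If the microphone example hears nothing, it can help to confirm that the USB microphone shows up as an input device. This is a minimal sketch using PyAudio, which the python3-pyaudio package installed earlier provides. The function names are hypothetical, and the indexes PyAudio reports are its own device indexes, which are not necessarily the ALSA card numbers used in alsa.conf.

```python
def input_devices(infos):
    """Keep only devices that can record, that is, with at least one input channel."""
    return [(d["index"], d["name"]) for d in infos if d.get("maxInputChannels", 0) > 0]


def query_pyaudio():
    """Collect the device info dictionaries that PyAudio reports."""
    import pyaudio  # imported here so input_devices() works without PyAudio installed

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()  # release the audio backend
```

Printing input_devices(query_pyaudio()) on the Pi should list the USB microphone among the results if it is visible to PyAudio.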


The results are very good. Once DeepSpeech is installed, it does not depend on cloud servers or the Internet. All the work is done on one core of the Pi 4. With some additional hardware such as a Trinket M0, STT could be added to systems that lack STT but accept USB keyboard input.

References

https://github.com/mozilla/DeepSpeech
https://github.com/mozilla/DeepSpeech/wiki
https://discourse.mozilla.org/c/deep-speech
