Speech Recognition – Smart Microphone – Jetson Development Kits

This article starts a new series on Speech Recognition. A “smart microphone” is an array of microphones with special signal processing hardware to locate and isolate speech, even in noisy environments. Looky here:


All the cool kids now have in-home, voice-activated devices like Amazon Echo or Google Home. These devices can play your favorite music, answer questions, read books, control home automation, and do all those other things people in the 1960s thought the future was about. For the most part, the speech recognition on the devices works well, although you may occasionally find yourself with an extra dollhouse or two.

One of the enabling technologies of these devices is the microphone array. Several microphones are placed in a circle, and their output is sent to a Digital Signal Processor, or DSP for short. The DSP runs several specialized algorithms which detect where a voice originates (localization) and use audio beamforming to isolate that voice while reducing echo and reverberation in the signal. The result is an audio stream that is an accurate representation of the original voice.
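To make localization and beamforming a little more concrete, here is a minimal sketch in plain Python. It is not the algorithm running on any particular DSP; it only illustrates the textbook idea. The time delay between two microphones is estimated by cross-correlation, the delay is converted to an arrival angle, and the channels are aligned and averaged (delay-and-sum). The function names, the two-microphone setup and the spacing value in the usage note are assumptions made for illustration.

```python
import math


def estimate_delay(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive lag means the sound reached `other` later than `ref`.
    Brute-force cross-correlation; fine for short illustrative signals.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(other):
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def angle_from_delay(lag, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Far-field arrival angle in degrees for one microphone pair.

    0 degrees means the source is broadside (equidistant from both mics).
    """
    ratio = (lag / sample_rate) * speed_of_sound / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))


def delay_and_sum(channels, lags):
    """Shift each channel by its lag and average the aligned samples."""
    out = []
    for i in range(len(channels[0])):
        total, count = 0.0, 0
        for channel, lag in zip(channels, lags):
            j = i + lag
            if 0 <= j < len(channel):
                total += channel[j]
                count += 1
        out.append(total / count if count else 0.0)
    return out
```

As a usage example, given two sample lists `mic_a` and `mic_b` recorded at 16000 Hz from microphones assumed to be 0.1 m apart, `lag = estimate_delay(mic_a, mic_b, 8)` followed by `angle_from_delay(lag, 16000, 0.1)` gives a rough direction, and `delay_and_sum([mic_a, mic_b], [0, lag])` gives the combined signal. Sound from the estimated direction adds up coherently, while sound from other directions tends to partially cancel.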

Once a suitable audio stream has been acquired, the stream can be either processed locally or sent to a server for further processing. In the case of something like an Amazon Echo, a local processor “listens” to the incoming audio stream for a keyword trigger, e.g. “Alexa”. Once the keyword has been identified, the rest of the audio stream is sent to online servers which do speech recognition on the stream, and then parse the result into “actions”. The service then sends the action back to the device. These actions vary from device to device, but typically allow the user to ask the device to play music, control home automation devices, or answer questions. Amazon, Google and Microsoft all have APIs to interface audio with their online services.
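The flow described above can be sketched as a small state machine. This is an illustrative outline only, not how any of these products is actually implemented. The class name and the three callables (`detect_keyword`, `is_silence`, `send_to_service`) are hypothetical placeholders for a local keyword spotter, an end-of-utterance check, and a call to an online recognition service.

```python
class WakeWordGate:
    """Hold back audio until a keyword is heard, then forward one utterance."""

    def __init__(self, detect_keyword, is_silence, send_to_service):
        # All three callables are hypothetical stand-ins supplied by the caller.
        self.detect_keyword = detect_keyword    # frame -> bool, runs locally
        self.is_silence = is_silence            # frame -> bool, marks end of speech
        self.send_to_service = send_to_service  # utterance bytes -> action
        self.listening = False
        self.buffer = []

    def feed(self, frame):
        """Process one audio frame; return an action or None."""
        if not self.listening:
            # Idle state: nothing leaves the device until the keyword is heard.
            if self.detect_keyword(frame):
                self.listening = True
                self.buffer = []
            return None
        if self.is_silence(frame):
            # End of utterance: hand the buffered audio to the online service.
            self.listening = False
            if not self.buffer:
                return None
            utterance = b"".join(self.buffer)
            self.buffer = []
            return self.send_to_service(utterance)
        # Active state: collect the audio that follows the keyword.
        self.buffer.append(frame)
        return None
```

In a real system the frames would be raw audio buffers and the detector a trained model. The sketch only shows the division of labor: cheap, always-on detection locally, with the heavier recognition done by the service.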

The online services have large databases which they have used, along with machine learning techniques, to train their speech recognizers. You may have noticed that many of the online services have become significantly better at recognizing speech over the last couple of years. This improvement is mostly due to advances in machine learning.

Speech Recognition for the rest of us

The consumer devices are interesting, and now the technology for smart microphones is available separately from several manufacturers. In the video, a Seeed Studio ReSpeaker is shown. There are several other manufacturers; the ReSpeaker in the video was ordered through a Kickstarter campaign.

The Far Field Microphone Array is built around an XVSM-2000 chip from XMOS. Watch the video for a rundown of the rest of the fun hardware that is available on the ReSpeaker, with sprinkles like RGB LEDs and an Arduino type of processor. The Jetson can talk to either the ReSpeaker Core or the Microphone Array using USB.


Over the course of the next few articles, we’ll figure out how to interface with the Microphone Array, gather the audio stream, and then perform speech recognition both locally and through online services.


    • There are several manufacturers of such devices. The one in the video is a ReSpeaker from Seeed Studio. To me, the Mic Array ($79 USD) is the interesting piece (https://www.seeedstudio.com/ReSpeaker-Mic-Array-Far-field-w%2F-7-PDM-Microphones-p-2719.html). The ReSpeaker Core has 1 mic in the middle, along with a WiFi chip, LEDs and such. The Mic Array appears to be able to run independently of the Core, and has 7 mics, LEDs, and all the DSP goodies built in. But I have just started work on getting the Microphone Array to work on the Jetson. When the Mic Array is plugged into the Jetson, it does act as a USB mic, so at least it appears to work.

  1. Hello,
    Did you test recognition?
    Could you do a video with noise, and play a recorded sample so we can hear the echo cancellation and other goodies?
    Christmas is past, but we always need to hope!! Lol
    Thanks Jetson hacks for all your posts. (I’m the happy owner of a TX1, and it’s due to you. Thanks again, and don’t stop.)
    PS: I’ll bother you regularly, lol.

    • The microphone array can be used stand alone, or attached to the ReSpeaker Core. The headers come attached to the mic array, no soldering necessary.
      The Mic Array board can be used independently either through the micro-USB connection, or through the headers.

  2. Anything new on using the ReSpeaker to connect to Amazon AWS (Alexa) or Google Home instead of the Echo Dot or Google’s hardware? Saw your video where you control the robot using a headset, but it would be nice to use the ReSpeaker. Also, I understand that Google Home has now released the Speaker Recognition code. It would be even better if your robot knew it was you talking, not someone else! BTW, I have a TX1 and a ReSpeaker. Thanks

    • This was as far as I was able to figure out what was going on: https://github.com/jetsonhacks/JHReSpeaker
      The mic array is not terribly well documented, and I couldn’t get some of the more interesting parts to work. I thought I had a better handle on it when I wrote this article. I was wrong. The code in the repository is all the low-level stuff; it should now be possible to plug it into recognition frameworks and such. A little work needs to be done to get the sampling rates to match the native rate.

  3. The ODAS open source software can be used to simulate sound source localization, finding the azimuth and elevation angles, without a microphone array.
