Amazon Tap: Go hands-free but say goodbye to battery life

Amazon Tap’s latest update, announced February 9, 2017, lets users go hands-free with the portable speaker. But this new convenience has a hidden cost: battery drain.

Tap users previously had to press the device’s button to activate Alexa, but the update adds an in-app setting that puts Alexa in always-listening mode. With it enabled, the Tap listens continuously for “Alexa” or one of the system’s other approved wake words; no button press needed.

It takes a lot of energy to be in listening mode constantly, which is why the standard Echo needs to be plugged in (and why the battery-powered Tap launched with its namesake button to limit the time it spends in listening mode). The switch to a hands-free Tap means more convenience but more power consumption, too.

How much more? The Tap’s batteries last about 3 weeks when it isn’t in listening mode. Turn on always-listening mode, and battery life drops to 8 hours. That’s about 504 hours down to 8, roughly a 63-fold reduction. If you’re streaming music or listening to other audio, as opposed to asking Alexa questions that require processing, Amazon says the Tap’s batteries will last for 9 hours.
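As a quick check of that arithmetic, the short Python sketch below converts the quoted standby figure to hours and divides it by the hands-free figure. It uses only the two numbers quoted above (about 3 weeks and 8 hours); the constant and function names are our own, for illustration.

```python
# Figures quoted above: about 3 weeks of battery life with button activation,
# about 8 hours with always-listening (hands-free) mode enabled.
STANDBY_WEEKS = 3
ALWAYS_LISTENING_HOURS = 8

HOURS_PER_WEEK = 7 * 24  # 168 hours in a week

def reduction_factor(standby_weeks: float, listening_hours: float) -> float:
    """How many times shorter battery life is in always-listening mode."""
    return standby_weeks * HOURS_PER_WEEK / listening_hours

standby_hours = STANDBY_WEEKS * HOURS_PER_WEEK
print(f"{STANDBY_WEEKS} weeks = {standby_hours} hours")
print(f"reduction: {reduction_factor(STANDBY_WEEKS, ALWAYS_LISTENING_HOURS):.0f}x")
```

Running it prints `3 weeks = 504 hours` and `reduction: 63x`.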

Eight or nine hours of battery life under heavy use isn’t bad for a portable Wi-Fi speaker, but it could be much better. If Amazon used smart piezoelectric microphones like ours in the Tap, the device could stay in always-listening mode for three months. Put another way, a Tap that is constantly listening through our VM1010 microphone would use the same amount of power as today’s Tap with button activation. The microphone’s incremental power consumption is so small that it’s a rounding error on battery life. We also have a roadmap to cut this already very low power consumption in half, which would double battery life again.

Imagine that you have a quiet vacation house in Maine (one can dream), and you accidentally leave your Amazon Tap in listening mode when you go. It’d be dead in 8 hours. But say you swapped your standard Tap for a power-optimized version with piezoelectric microphones: if you came back one, two, even four years later, walked into the house, and said “Alexa,” your Tap would light up and listen.

The chart below shows how this power-saving works:

In step 1, wake-on-sound, Vesper’s system waits in an extremely low-power mode for the wake word. The whole system consumes less than 100 microamps, yet it’s always listening.

Step 2, sleep/low power, is the Tap’s lowest-power mode today. It draws 2,000 microamps while waiting to hear the wake word.

Step 3, keyword identification, kicks in when the Tap hears the wake word, “Alexa.” Current draw rises to 3,000 microamps here. The check of “is this the right wake word or not” still happens locally on the device; nothing is sent to the cloud yet. The system is also working to separate human voices from other background noise.

In step 4, active voice processing, you’ve already said the wake word and the Tap has recognized it as the correct one. Now you’re making your actual request, like asking for the weather or a particular playlist. Your request is processed in the cloud on Amazon’s servers. This heavier processing demand, along with speaker and Wi-Fi use, shows up in power consumption, which spikes to 100,000 microamps.
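To make the four steps concrete, here is a minimal Python sketch that models them as a simple state machine and tallies the charge used over a day. The currents are the ones quoted in the steps (step 1’s “less than 100 microamps” is taken as an upper bound of 100). Everything else is an illustrative assumption, not Amazon’s or Vesper’s firmware: the gating rules, the hypothetical `Stage`, `Event`, `next_stage`, and `simulate` names, and the made-up event durations. Passing `standby_ua=100` models the piezo wake-on-sound standby of step 1; passing `standby_ua=2_000` models today’s sleep mode from step 2.

```python
from enum import Enum
from typing import NamedTuple

class Stage(Enum):
    STANDBY = "standby"        # step 1 (piezo wake-on-sound) or step 2 (today's sleep mode)
    KEYWORD_ID = "keyword_id"  # step 3: local "is this the right wake word?" check
    ACTIVE = "active"          # step 4: request sent to the cloud, speaker and Wi-Fi on

# Currents quoted in the steps above, in microamps.
PIEZO_STANDBY_UA = 100   # step 1, "less than 100": used here as an upper bound
TAP_STANDBY_UA = 2_000   # step 2
KEYWORD_ID_UA = 3_000    # step 3
ACTIVE_UA = 100_000      # step 4

class Event(NamedTuple):
    """Hypothetical input: time spent in the current stage, then what was heard."""
    seconds: float
    sound: bool = False         # something crossed the wake-on-sound threshold
    wake_word: bool = False     # local check confirmed "Alexa"
    request_done: bool = False  # cloud response finished playing

def next_stage(stage: Stage, ev: Event) -> Stage:
    """Assumed gating: each stage wakes the next, costlier one only when needed."""
    if stage is Stage.STANDBY:
        return Stage.KEYWORD_ID if ev.sound else Stage.STANDBY
    if stage is Stage.KEYWORD_ID:
        # False trigger (sound but no wake word): drop straight back to standby.
        return Stage.ACTIVE if ev.wake_word else Stage.STANDBY
    return Stage.STANDBY if ev.request_done else Stage.ACTIVE

def simulate(events: list[Event], standby_ua: float) -> tuple[Stage, float]:
    """Run events through the state machine; return (final stage, charge in mAh)."""
    current_ua = {Stage.STANDBY: standby_ua,
                  Stage.KEYWORD_ID: KEYWORD_ID_UA,
                  Stage.ACTIVE: ACTIVE_UA}
    stage, ua_seconds = Stage.STANDBY, 0.0
    for ev in events:
        ua_seconds += current_ua[stage] * ev.seconds  # charge used while in this stage
        stage = next_stage(stage, ev)
    return stage, ua_seconds / 3_600_000  # 1 mAh = 1,000 uA * 3,600 s

# A made-up day: almost 24 h of quiet, one wake word, one 10-second request.
day = [Event(86_388, sound=True), Event(2, wake_word=True), Event(10, request_done=True)]
for label, standby in [("piezo wake-on-sound", PIEZO_STANDBY_UA),
                       ("today's sleep mode", TAP_STANDBY_UA)]:
    _, mah = simulate(day, standby)
    print(f"{label}: {mah:.1f} mAh per day")
```

With these assumed durations it prints about 2.7 mAh per day for the piezo standby versus about 48.3 mAh per day for today’s sleep mode. The idle stage dominates the total, so the standby current (step 1 versus step 2) is what mostly decides battery life.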


Today, all responsive, high-performance voice systems have to be plugged in (or suffer a quick battery death). But extremely low-power components, like piezoelectric microphones, remove the plug-in constraint. Voice-activated systems won’t have to live next to outlets anymore; they can be indoors, outdoors, in kitchens, in the woods, in pockets… anywhere. Voice interfaces will be truly everywhere.

1 comment on “Amazon Tap: Go hands-free but say goodbye to battery life”

  1. Victor Lorenzo
    April 17, 2017

    I read about your VM1010 last year, when it was not yet commercially available.

    Have you incorporated new technologies into the MEMS device? At that time it was more a sound-activated than a voice-activated device, as could be deduced from the block diagram you published then.

    IMHO, this threshold-based operation is perfect for detecting events like the user tapping on the device, but it won't be able to tell a loud noise apart from a user calling “Alexa”, “Thomas” or “Pizza”.

    I see something that could improve it and would probably be a great innovation (oops, patent trolls, don't read this ;D): incorporate the ability to detect vowel formants using some sort of SAW filtering. Most vowels can be discriminated using two formants, F1 and F2, and more precise results can be obtained using more formants. You will increase quiescent current consumption during standby by a factor of 3, maybe more, but false-positive triggers will be dramatically reduced.
