Usually, voice assistants only send commands to the cloud for further analysis once they recognize a previously registered activation word such as “Alexa”, “OK Google” or the associated brand name. But the devices are susceptible to the so-called fake wake phenomenon. “This phenomenon causes common voice assistants to recognize a different word as their own 'wake word' and listen for supposed commands,” explains Professor Ahmad-Reza Sadeghi, head of the System Security Lab. The voice assistant thus accepts false wake-up words (“fuzzy words”), e.g. from passing conversations or television programs. An attacker can use these fuzzy words to activate the voice assistant without alerting the user. So far, research has focused on such existing audio sources of the fake wake phenomenon.
Small phonetic similarity is all that is needed
For the first time, the international research team led by Professor Wenyuan Xu, Dr. Yanjiao Chen and Professor Sadeghi has now succeeded in systematically and automatically generating its own false wake-up words instead of searching existing audio material for them. For the project, the team studied the four most popular voice assistants in each of the English and Chinese language areas. The team's findings provide valuable information on how to better protect the privacy of users and how manufacturers can make their voice assistants more secure.
The generation of fuzzy words in the project began with a well-known initial word such as “Alexa.” In the process, the researchers did not have access to the model that recognizes the wake-up words or to the vocabulary underlying the voice assistant. They also explored the question of what causes the acceptance of incorrect wake-up words. First, they identified the features that most frequently contributed to the acceptance of fuzzy words. The determining features were concentrated in only a small phonetic section of the word. However, the voice assistants were also activated by fuzzy words that differed significantly more from the real wake-up words. By contrast, factors such as ambient noise, the volume of the words and the gender of the person speaking hardly played a role.
Machine learning for privacy protection
Using genetic algorithms and machine learning, the researchers were able to generate more than 960 custom fuzzy words in English and Chinese that activated the voice assistants' “wake-up word detector”. On the one hand, this shows the severity of the fake wake phenomenon and, on the other hand, it provides deeper insights into its causes.
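To give a rough idea of how a genetic algorithm can search for fuzzy words without access to the detector's internals, here is a minimal sketch. It is not the researchers' implementation: the symbol inventory, the seed word spelling "aleksa", and the `toy_detector_confidence` function are all hypothetical stand-ins, and the toy detector simply assumes that only a short section of the word matters. The search only ever queries the detector's output, starts from the known wake word, and keeps candidates that are accepted while differing from it.

```python
import random

# Toy symbol inventory and seed word: illustrative assumptions, not the paper's data.
PHONEMES = list("aeioklmnstx")
WAKE_WORD = list("aleksa")
FOCUS = range(1, 4)  # assumed: the toy detector only checks positions 1..3


def toy_detector_confidence(candidate):
    """Hypothetical black-box detector: share of focus positions that match."""
    hits = sum(candidate[i] == WAKE_WORD[i] for i in FOCUS)
    return hits / len(FOCUS)


def accepted(candidate):
    return toy_detector_confidence(candidate) >= 1.0


def dissimilarity(candidate):
    """Share of positions that differ from the real wake word."""
    return sum(c != w for c, w in zip(candidate, WAKE_WORD)) / len(WAKE_WORD)


def fitness(candidate):
    # Accepted candidates always outrank rejected ones; among accepted ones,
    # those further from the real wake word score higher.
    confidence = toy_detector_confidence(candidate)
    return confidence + dissimilarity(candidate) if accepted(candidate) else confidence


def crossover(a, b, rng):
    """Single-point crossover of two equally long candidates."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(candidate, rng, rate=0.2):
    """Replace each symbol with a random one with probability `rate`."""
    return [rng.choice(PHONEMES) if rng.random() < rate else c for c in candidate]


def generate_fuzzy_words(pop_size=30, generations=20, seed=0):
    rng = random.Random(seed)
    # Start from the known wake word, changing one position per individual.
    population = []
    for i in range(pop_size):
        child = WAKE_WORD.copy()
        pos = i % len(child)
        child[pos] = rng.choice([p for p in PHONEMES if p != child[pos]])
        population.append(child)
    found = set()
    for _ in range(generations):
        # Record every accepted candidate that is not the real wake word.
        for cand in population:
            if accepted(cand) and cand != WAKE_WORD:
                found.add("".join(cand))
        # Keep the fitter half and refill with mutated offspring.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return sorted(found)


if __name__ == "__main__":
    print(generate_fuzzy_words()[:5])
```

In this toy setting, every word found shares the short section the detector checks and may differ everywhere else. Against a real assistant, the confidence function would be replaced by playing synthesized audio to the device and observing whether it wakes up.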
The phenomenon can be mitigated by retraining the wake word detector with the generated fuzzy words, which allows the voice assistant to distinguish more accurately between fake and real wake-up words. Manufacturers can use the generated fuzzy words in this way to retrain existing models, making them more accurate and less vulnerable to fake wake attacks. Thus, the research results offer a promising way to identify, understand, and mitigate privacy and security issues in voice assistants.
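The retraining idea can be illustrated with a deliberately simple sketch. Real wake word detectors operate on audio; here a perceptron over (position, symbol) features stands in for the detector, and the seed word, the unrelated words and the fuzzy words are all made up for illustration. The point is only to show the mechanism: an existing model that accepts fuzzy words is trained further with those words labeled as negatives.

```python
from collections import defaultdict

WAKE_WORD = "aleksa"                    # toy seed word (assumption)
UNRELATED = ["tomino", "sixtim"]        # toy negative examples (assumption)
FUZZY = ["eleksa", "alekso", "ilekta"]  # hypothetical generated fuzzy words


def features(word):
    """One-hot (position, symbol) features plus a bias term."""
    return [("bias", "")] + list(enumerate(word))


def score(weights, word):
    return sum(weights[f] for f in features(word))


def accepts(weights, word):
    return score(weights, word) > 0


def train(weights, examples, max_epochs=100):
    """Perceptron updates until every example is classified correctly."""
    for _ in range(max_epochs):
        mistakes = 0
        for word, is_wake in examples:
            if accepts(weights, word) != is_wake:
                step = 1 if is_wake else -1
                for f in features(word):
                    weights[f] += step
                mistakes += 1
        if mistakes == 0:
            break
    return weights


# 1) Original toy detector: trained without any fuzzy words.
base_examples = [(WAKE_WORD, True)] + [(w, False) for w in UNRELATED]
weights = train(defaultdict(int), base_examples)
before = [accepts(weights, w) for w in FUZZY]

# 2) Retrain the existing model with the fuzzy words added as negatives.
train(weights, base_examples + [(w, False) for w in FUZZY])
after = [accepts(weights, w) for w in FUZZY]
wake_still_accepted = accepts(weights, WAKE_WORD)

if __name__ == "__main__":
    print("fuzzy words accepted before retraining:", before)
    print("fuzzy words accepted after retraining: ", after)
    print("real wake word still accepted:", wake_still_accepted)
```

In this toy model the listed fuzzy words are rejected after retraining while the real wake word is still accepted. The retrained toy model can still accept other words it has never seen, so the sketch says nothing about how robust a real retrained detector would be.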
The results of the research project are presented in the paper FakeWake: Understanding and Mitigating Fake Wake-up Words of Voice Assistants (2021) by Yanjiao Chen, Yijie Bai, Richard Mitev, Kaibo Wang, Ahmad-Reza Sadeghi and Wenyuan Xu.