AI-equipped glasses can read silent speech
(Nanowerk News) Developed by Cornell’s Smart Computer Interfaces for Future Interactions (SciFi) Lab, the low-power wearable interface requires only minutes of user training data before it can recognize commands, and it runs on a smartphone, researchers said.
Ruidong Zhang, a doctoral student in information science, is the lead author of “EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing” (pdf), which will be presented at the Association for Computing Machinery Conference on Human Factors in Computing Systems (CHI) this month in Hamburg, Germany.
“For people who cannot vocalize sound, this silent speech technology could be an excellent input for a voice synthesizer. It could give patients their voices back,” Zhang said of the technology’s potential uses with further development.
In its current form, EchoSpeech could be used to communicate with others via smartphone in places where speech is inconvenient or inappropriate, such as a noisy restaurant or a quiet library. It can also be paired with a stylus and used with design software like CAD, all but eliminating the need for a keyboard and mouse.
Equipped with a pair of microphones and a speaker smaller than a pencil eraser, the EchoSpeech glasses become a wearable AI-powered sonar system, sending and receiving sound waves across the face and sensing mouth movements. A deep learning algorithm, also developed by SciFi Lab researchers, then analyzes these echo profiles in real time, with about 95% accuracy.
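The sonar principle described above can be illustrated with a rough sketch: emit a known probe signal, record the returning echoes, and cross-correlate the two so that the delay and strength of skin-surface reflections stand out. Everything below is an illustrative assumption, not a detail from the paper; EchoSpeech’s actual signal design and deep learning model are more sophisticated (the model classifies stacks of these echo profiles over time).

```python
import numpy as np

def chirp(fs=50_000, dur=0.01, f0=16_000, f1=24_000):
    """A linear frequency-modulated pulse, a common probe signal in
    acoustic sensing. These parameters are illustrative placeholders."""
    t = np.arange(int(fs * dur)) / fs
    # Instantaneous phase of a linear sweep from f0 to f1.
    phase = 2 * np.pi * (f0 * t + (f1 - f0) * t**2 / (2 * dur))
    return np.sin(phase)

def echo_profile(tx, rx):
    """Cross-correlate the received signal with the transmitted pulse.
    The peak location encodes the round-trip delay of the dominant echo;
    a sequence of such profiles forms the input a learned model could
    classify into speech movements."""
    return np.abs(np.correlate(rx, tx, mode="valid"))

# Simulate a single echo: the pulse returns attenuated after a
# 40-sample delay, buried in a little noise.
rng = np.random.default_rng(0)
tx = chirp()
delay = 40
rx = np.zeros(len(tx) + 200)
rx[delay:delay + len(tx)] += 0.3 * tx
rx += 0.01 * rng.standard_normal(len(rx))

profile = echo_profile(tx, rx)
print(int(np.argmax(profile)))  # index of the strongest echo, near the delay
```

As the mouth moves, the delays and amplitudes of these reflections shift, which is the signal the deep learning algorithm picks up on.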
“We’re moving sonar onto the body,” said Cheng Zhang, assistant professor of information science at Cornell Ann S. Bowers College of Computing and Information Science and director of the SciFi Lab.
“We are very excited about this system,” he said, “because it really pushes the field forward on performance and privacy. It’s small, low-power and privacy-sensitive, which are all important features for deploying new wearable technologies in the real world.”
The SciFi Lab has developed several wearable devices that track body, hand and facial movements using machine learning and miniature video cameras. The lab has recently shifted from cameras to acoustic sensing for tracking facial and body movements, citing improved battery life; tighter security and privacy; and smaller, more compact hardware. EchoSpeech builds on a similar acoustic-sensing device from the lab called EarIO, a wearable earbud that tracks facial movements.
Most technology in silent-speech recognition is limited to a select set of predetermined commands and requires the user to face or wear a camera, which is neither practical nor feasible, Cheng Zhang said. There are also major privacy concerns with wearable cameras – for both the user and those with whom the user interacts, he said.
Acoustic sensing technologies like EchoSpeech eliminate the need for wearable video cameras. And because audio data is much smaller than image or video data, it requires less bandwidth to process and can be passed to a smartphone via Bluetooth in real time, said François Guimbretière, professor of information science at Cornell Bowers CIS and co-author.
“And because the data is processed locally on your smartphone instead of being uploaded to the cloud,” he said, “privacy-sensitive information never leaves your control.”
Battery life improves dramatically as well, said Cheng Zhang: 10 hours with acoustic sensing versus 30 minutes with a camera.
The team is exploring the commercialization of the technology behind EchoSpeech, thanks in part to Ignite: Cornell Research Lab to Market gap funding.
In forthcoming work, SciFi Lab researchers are exploring smart-glass applications to track facial, eye and upper-body movements.
“We think glasses will be an important personal computing platform for understanding human activities in everyday settings,” said Cheng Zhang.