Voice-assisted technologies are extremely popular: about 2.5 billion such devices are already in use, and that number is expected to roughly triple to 8 billion by 2023. This growth appears unstoppable, despite the privacy and security vulnerabilities in mainstream voice-assisted technology.
One of these is the "open-window" vulnerability: a malicious person walks by an open window, shouts, "Hey, unlock the door!" and gains access to the house. Researchers have also identified thousands of false-positive wake words for Alexa and Google Assistant, each a potential attack vector for injecting malicious commands. Some people bring up the risk of inaudible commands injected over TV audio. Amazon may already be using frequency manipulations to keep Alexa from activating during its commercials. And, as with any internet-connected device, there's the potential for backdoors and other common vulnerabilities.
Open source for privacy and security
The company I work for, Mycroft, is built around the idea that voice-assisted technology can be secure. We have a process for handling disclosed Common Vulnerabilities and Exposures (CVEs). Because Mycroft is a developer tool, we rely heavily on systems like SSH and always encourage changing default passwords on devices, including in setup wizards. We also run intent parsing, skills, and text-to-speech (TTS) on the device, work that other major players do in the cloud. For the technically savvy home user, we'll soon release the necessary pieces to run the entire experience in a household behind a firewall. We want to give our users as much control over the software as possible.
Voice-printing, which enables the AI to differentiate people by voice, should ease the "open-window" and false-positive wake-word vulnerabilities by letting users assign permissions for locks, purchases, and other sensitive capabilities to specific people. These protections can be bolstered with two-factor authentication or deeper voice biometrics using one-time spoken passwords. Google and Amazon have deployed some voice-printing in their assistants, and we're connecting with some companies that show promise in the field. Improvements in wake-word spotting will shrink the list of usable false positives. In a properly designed feedback loop, a false positive should be unlikely to work, because it would be tagged by technology like Mycroft's Precise tagger.
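To make the permission idea concrete, here is a minimal sketch of how per-person permissions could gate sensitive commands. It is an illustration under stated assumptions, not Mycroft's implementation: the names VoiceProfile, SENSITIVE_ACTIONS, and is_authorized are hypothetical, and the voice-print matching itself is assumed to happen elsewhere and hand back either a known profile or None.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# All names here are hypothetical; this is not Mycroft's API.
SENSITIVE_ACTIONS = {"unlock_door", "make_purchase"}


@dataclass
class VoiceProfile:
    """A person the assistant recognizes by voice, plus what they may do."""
    name: str
    allowed_actions: Set[str] = field(default_factory=set)


def is_authorized(
    speaker: Optional[VoiceProfile],
    action: str,
    require_otp: bool = False,
    otp_ok: bool = False,
) -> bool:
    """Decide whether an action may run for the identified speaker.

    speaker is None when the voice print matched nobody, for example
    a stranger shouting through an open window.
    """
    if action not in SENSITIVE_ACTIONS:
        return True  # everyday commands stay open to anyone
    if speaker is None:
        return False  # unknown voices never trigger sensitive actions
    if action not in speaker.allowed_actions:
        return False  # known voice, but no permission for this action
    # Optional second factor, such as a one-time spoken password.
    return otp_ok or not require_otp


alice = VoiceProfile("alice", {"unlock_door"})
print(is_authorized(None, "unlock_door"))    # stranger at the window
print(is_authorized(alice, "unlock_door"))   # permitted household member
```

With this shape, the "open-window" shout fails at the second check, because an unrecognized voice has no profile to carry the permission.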
Open source for innovation
Historically, the way to encourage experimentation is through open source. That's how real innovation happens and new technologies get built.
We give innovators the opportunity to experiment with voice assistant technologies with fewer strings attached. Open source makes the software a community effort. For example, instead of a team of 16 with nine technical members, Mycroft has thousands of developers providing feedback, building new features, and contributing to the core software. And thanks to our licensing, innovators can easily turn around and build a business out of their new, awesome implementation.
On a deeper note, being open means this technology can fully represent the user, not only the company providing it. Voice assistants have the capability to transform the way people interact with all technology. Voice will transform homes, offices, mobile devices, and public spaces in ways not yet imagined.
The question we hope people ask is: "Do I want that technology to work solely in my best interest, or in the best interest of a retailer or a search and data company?" We think most people would pick the former. And we know the only way to provide that is by being open, transparent, and community-driven.
Voice assistant features
Mycroft does the same things as other voice assistants, but differently. In most voice assistants, the voice stack uses on-device wake-word spotting, then sends the rest of the interaction to the cloud, where it is handled and a response is streamed back to the speaker.
Mycroft moves most of this onto the device where the software is running. We use on-device wake-word spotting to listen for a command. When the wake word is detected, the command is recorded and sent to the cloud for speech-to-text transcription. Once transcribed, the text is sent back down to the device, where the natural language processing, skill handling, and speech synthesis are carried out.
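The flow described above can be sketched as a single function in which only one step leaves the device. This is a simplified illustration, not Mycroft's code: handle_interaction and every callable passed to it are hypothetical stand-ins, and the lambdas in the demo are stubs rather than real audio processing.

```python
from typing import Callable, Optional


def handle_interaction(
    frame: bytes,
    detect_wake_word: Callable[[bytes], bool],  # runs on the device
    record_command: Callable[[], bytes],        # runs on the device
    cloud_transcribe: Callable[[bytes], str],   # the only cloud step
    parse_intent: Callable[[str], str],         # runs on the device
    run_skill: Callable[[str, str], str],       # runs on the device
    synthesize: Callable[[str], bytes],         # runs on the device
) -> Optional[bytes]:
    """Return synthesized speech, or None if no wake word was heard."""
    if not detect_wake_word(frame):
        return None  # nothing is recorded or sent anywhere
    command_audio = record_command()
    text = cloud_transcribe(command_audio)  # audio goes up, text comes back
    intent = parse_intent(text)
    reply = run_skill(intent, text)
    return synthesize(reply)


# Demo with stub components standing in for the real ones.
speech = handle_interaction(
    b"hey mycroft",
    detect_wake_word=lambda f: f == b"hey mycroft",
    record_command=lambda: b"what time is it",
    cloud_transcribe=lambda audio: audio.decode(),
    parse_intent=lambda t: "time.query" if "time" in t else "unknown",
    run_skill=lambda intent, t: f"handled {intent}",
    synthesize=lambda reply: reply.encode(),
)
print(speech)
```

Passing each stage in as a callable keeps the cloud boundary explicit: swapping cloud_transcribe for a local transcriber would move the whole interaction onto the device without touching the other steps.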
Skills give Mycroft its abilities. We've been steadily adding skills from both the internal team and the community to the new Mycroft Skills Marketplace. Mycroft can control multiple music sources, connect to numerous IoT platforms, get general info from 12 sources like Wikipedia and Wolfram|Alpha, play games, roll dice, tell stories, run speed tests, and more.
We made Mycroft modular, so it's easy for users to swap out pieces. For example, we currently provide two wake words and three voices, with more on the way, but Mycroft can also run custom wake words and TTS voices from any provider—cloud or local.
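One common way to get this kind of swappability is a small registry keyed by a name in the user's configuration. The sketch below shows that general pattern only; it is not Mycroft's plugin system, and the registry, the backend names, and the config keys are all made up for illustration.

```python
from typing import Callable, Dict

# Hypothetical registry of text-to-speech backends, keyed by name.
TTS_BACKENDS: Dict[str, Callable[[str], bytes]] = {}


def register_tts(name: str):
    """Decorator that adds a TTS backend to the registry under a name."""
    def decorator(fn: Callable[[str], bytes]) -> Callable[[str], bytes]:
        TTS_BACKENDS[name] = fn
        return fn
    return decorator


@register_tts("local_voice")
def local_voice(text: str) -> bytes:
    # Stand-in for an on-device synthesizer.
    return b"local:" + text.encode()


@register_tts("cloud_voice")
def cloud_voice(text: str) -> bytes:
    # Stand-in for a third-party cloud synthesizer.
    return b"cloud:" + text.encode()


def speak(text: str, config: dict) -> bytes:
    """Synthesize text with whichever backend the config selects."""
    backend = TTS_BACKENDS[config["tts"]["module"]]
    return backend(text)


# Changing one config value swaps the voice; no other code changes.
print(speak("hello", {"tts": {"module": "local_voice"}}))
print(speak("hello", {"tts": {"module": "cloud_voice"}}))
```

The same pattern would apply to wake words: each engine registers under a name, and the configuration decides which one runs.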