Someone remotely accesses your bank account using a computer-generated voice that sounds indistinguishable from yours.
A hacker uses the voice recognition in your Alexa to charge a bunch of Amazon orders to your account.
You flip on the TV and learn that leaked audio of a political candidate saying something explosive is sweeping across social media, disrupting a country’s presidential election. No one knows it yet, but that recording has been faked, too.
Sounds like a dystopian vision of the future. But according to Hafiz Malik, associate professor of electrical and computer engineering at UM-Dearborn, all of these scenarios are possible today — and are likely to proliferate as voice-based technologies become an even bigger part of our lives.
Malik seems to be among the few who saw it coming. He began working in the then-obscure field of “audio forensics” in 2010, when Alexa and Google Home were still in the thought-experiment stage.
The problem, as he defines it, is that big tech companies spent a ton of energy building products that could both understand your voice and emulate human speech. But their eye was on creating a super cool user experience, not on the scary things people could do with the tech.
As a result, many of the technologies that rely on voice are now vulnerable to attacks. Threats like “replay attacks,” which use a recording of someone’s voice to fool a piece of technology, are simple but effective. Malik said an even more sinister threat comes from “cloned” audio: using just a few seconds of a person’s real voice and off-the-shelf software, a computer can generate a realistic simulation of that person saying anything.
The implications of the latter are particularly troubling, and not just when it comes to hacked bank accounts or smart speakers. In the political sphere, there’s a whole emerging world of bogus multimedia known as “deepfakes,” in which a computer generates video or audio that’s nearly indistinguishable from the real thing.
Malik anticipated such abuses years ago and soon began researching possible defenses. The first step was seeing whether there was, in fact, a reliable way to sort real audio from fake.
“Our initial hypothesis was that audio generated through computers, even though it was perceptually identical to human voice-based audio, would look different on the ‘microscopic’ level,” he said.
That turned out to be the case: When he subjected real and fake audio to spectral analysis, he found some easily detectable markers. The detection method he’s developed has nearly 100 percent accuracy.
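The general idea lends itself to a short illustration. The Python sketch below is not Malik’s published method; it simply computes one hypothetical “microscopic” statistic, the spectral flatness of a signal above 4 kHz, of the kind a detector might compare between genuine and synthesized speech. The function name, the 4 kHz cutoff and the frame size are illustrative choices, not values from his research.

```python
# Minimal illustration (not Malik's actual method): one spectral statistic
# a detector might compare between genuine and computer-generated speech.
import numpy as np
from scipy.signal import spectrogram

def high_band_flatness(samples: np.ndarray, sample_rate: int) -> float:
    """Mean spectral flatness above 4 kHz (near 0 = tonal, near 1 = noise-like)."""
    freqs, _, power = spectrogram(samples, fs=sample_rate, nperseg=1024)
    band = power[freqs >= 4000] + 1e-12           # keep log() well defined
    geometric_mean = np.exp(np.mean(np.log(band), axis=0))
    arithmetic_mean = np.mean(band, axis=0)
    return float(np.mean(geometric_mean / arithmetic_mean))

# Usage: compute the statistic for a clip known to be genuine and for a
# suspect clip; a consistent gap between the two is the kind of "microscopic"
# difference a classifier could be trained to flag.
```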
But Malik said that’s still only a partial solution.
“In the case of political disinformation, for example, let’s suppose somebody creates audio of a political figure. It will spread almost instantly on social media. By the time law enforcement or the media or the campaign team responds, it’s already out everywhere. In some ways, the damage is already done,” he said.
Malik said a robust defense must not only detect fake multimedia but also do so nearly in real time. That’s where he’s turning his attention now. The latest suite of tools he’s developing would process an audio signal, identify tell-tale signs of computer fakery and then render an “authenticity score.”
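To make the idea concrete, here is a deliberately simplified sketch of how detector output could be turned into an “authenticity score.” It is not the tool Malik is building; the statistic names and weights are hypothetical, and a real system would combine many cues rather than one.

```python
# Hypothetical illustration of an "authenticity score": combine one or more
# per-cue suspicion statistics (each scaled to the range 0..1) into a single
# number a newsroom or platform could threshold automatically.
def authenticity_score(statistics: dict[str, float],
                       weights: dict[str, float]) -> float:
    suspicion = sum(weights[name] * statistics[name] for name in weights)
    suspicion /= sum(weights.values())
    return 1.0 - suspicion  # 1.0 = looks genuine, 0.0 = looks synthetic

# Usage: a clip whose high-band flatness statistic looks suspicious
# (e.g. 0.8 on a 0..1 scale) would score 0.2 and could be flagged for review.
score = authenticity_score({"high_band_flatness": 0.8},
                           {"high_band_flatness": 1.0})
```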
Newsrooms could use it to digitally “fact check” leaked audio from an anonymous source. Social media platforms could integrate these detection tools directly into their architecture — automatically processing multimedia and alerting users of suspicious content. And Alexa or Google Home could judge whether it’s really you who’s ordering that new gadget — or just a computer that sounds like you.
Malik said that even then, we are still in for a wild ride.
“As with most security issues, it’s a game of back and forth,” he said. “An attacker develops a strategy, and you defend against it. Then the attacker figures out your defense strategy, so you have to update your defense.
“But I think we are very close to a place where seeing is no longer believing. I already feel that way, personally. I don’t assume something is authentic unless I can verify it from multiple sources. It’s scary, but it’s the world we’re living in.”