A new study published September 24 in PLoS One has shown that average listeners can no longer reliably tell AI-generated "deepfake" voices apart from real human voices.
Study Findings: AI vs. Human Voices
- Voice Cloning vs. From-Scratch AI: Generic AI voices generated from scratch can still be spotted by most listeners (only 41% of listeners were fooled), but AI voice clones trained on samples of a real person's speech are nearly indistinguishable: 58% of these deepfake voices were rated as genuine, compared with 62% for actual human recordings.
- Minimal Data Needed: Researchers created convincing voice clones from as little as four minutes of audio using basic commercial software, requiring minimal expertise and cost.
- Security and Ethics Risks: Sophisticated deepfake voices make it easier for criminals to bypass voice-authentication systems at banks, impersonate loved ones, or fabricate public statements from politicians and celebrities.
- Example: One victim handed over $15,000 after being tricked by a cloned "daughter's" voice, and con artists have also cloned politicians' voices for scams.
Real-World Impact and Opportunities
- Threats: The rise of indistinguishable deepfake voices raises urgent concerns about fraud, security breaches, misinformation, and social division. A cloned voice could be used to authorize fraudulent financial transactions or to manipulate public opinion with fabricated interviews.
- Accessibility & Communication: On the positive side, AI voices offer new opportunities for accessibility (bespoke voices for communication aids), education, and high-quality voiceovers produced with minimal expertise, cost, or effort.
Summary:
AI-generated voice clones are now so lifelike, even with minimal training data, that the average listener can't reliably distinguish them from real people. This poses profound ethical, security, and societal risks, but it also opens new options for education and accessibility.