What Is the SOTA Text-to-Speech and Voice Cloning Model?
Microsoft’s VALL-E can mimic your voice with just 3 seconds of audio, and 92% naturalness scores say it’s eerily close. Then there’s ElevenLabs, cranking out speech so spot-on that 9 out of 10 listeners can’t tell it’s fake in blind tests. It’s March 19, 2025, and the top text-to-speech and voice cloning models are doing things I […]