ElevenLabs just killed the keyboard (and launched 3 new platforms)

ElevenLabs isn't just a voice tool anymore. With ElevenAgents, ElevenCreative, and ElevenAPI, they're building the infrastructure for the voice-first future.

We’ve spent the last three years obsessing over which AI model is the smartest. Is it GPT-4? Claude? Gemini? We argue about reasoning capabilities and coding benchmarks like sports fans arguing about stats.

But while everyone was looking at the brain, ElevenLabs was quietly building the mouth—and more importantly, the ears.

This week, they dropped a massive update that makes one thing very clear: Voice is the interface of the future. And they don't just want to be a part of it; they want to be the infrastructure it runs on.

They announced three new platforms: ElevenAgents, ElevenCreative, and ElevenAPI.

Here is why this matters more than another 5% bump in reasoning ability.

Talking > Typing

Earlier this week I spoke with Carles Reina, one of the earliest team members at ElevenLabs. He said something that stuck with me:

"Talking to technology is a lot more natural and engaging and quicker and easier than actually writing to technology."

It sounds obvious, but look at how we actually use AI today. We type. We edit prompts. We treat it like a terminal.

But humans weren't built to type. We were built to speak. We convey more in the tone of a "hello" than in three paragraphs of text. The friction of typing is the biggest bottleneck between us and the intelligence we've built.

ElevenLabs is removing that friction.

Platform 1: ElevenAgents

This is the big one. Until now, building a voice agent was a nightmare of gluing together a transcription model (Whisper), an LLM (GPT-4), and a TTS model (ElevenLabs). The latency killed the vibe. You’d say something, wait three seconds, and get a robotic response.
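To see where those three seconds went, here's a rough sketch of the old cascaded pipeline. The helper functions are placeholders standing in for three separate vendor calls, not any real SDK:

```python
import time

# Placeholder stand-ins for three separate vendor round trips. In the
# old architecture they could only run in sequence: no LLM prompt until
# transcription finishes, no audio until the LLM finishes.

def transcribe(audio: bytes) -> str:   # e.g. a Whisper-style endpoint
    return "user said something"

def think(text: str) -> str:           # e.g. a GPT-4-class LLM
    return f"a reply to: {text}"

def speak(text: str) -> bytes:         # e.g. a TTS endpoint
    return text.encode()

def cascaded_turn(user_audio: bytes) -> bytes:
    start = time.monotonic()
    text = transcribe(user_audio)   # hop 1: speech to text
    reply = think(text)             # hop 2: text to text
    audio = speak(reply)            # hop 3: text to speech
    print(f"turn latency: {time.monotonic() - start:.2f}s")
    return audio

cascaded_turn(b"...")  # three hops' worth of latency, every single turn
```

Every hop is a full network round trip, and they stack. That's where the awkward three-second silence came from.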

ElevenAgents changes the architecture.

Expressive Mode & Turn-Taking

The new "Expressive Mode" isn't just reading text; it's acting. It understands context. But the real magic is the turn-taking system.

In a real conversation, you don't wait for a "complete" signal to start talking. You interrupt, you pause, you read the room. ElevenAgents reads emotional cues from how you speak, not just what you say. It knows when you're hesitant, angry, or rushing, and it responds in kind.
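ElevenLabs hasn't published how their turn-taking works under the hood, but the classic building block is VAD-driven "barge-in": keep listening while the agent talks, and kill playback the instant the user starts speaking. A minimal sketch, assuming hypothetical vad and playback objects:

```python
import threading
import time

class BargeInPlayer:
    """Plays agent audio, but cuts off the moment the user starts talking."""

    def __init__(self, vad, playback):
        self.vad = vad            # assumed interface: .user_is_speaking() -> bool
        self.playback = playback  # assumed interface: .play(audio), .stop()
        self._done = threading.Event()

    def _watch_mic(self):
        # Keep listening while the agent speaks; yield the turn on speech.
        while not self._done.is_set():
            if self.vad.user_is_speaking():
                self.playback.stop()
                break
            time.sleep(0.02)      # poll every 20 ms

    def play(self, audio: bytes):
        self._done.clear()
        threading.Thread(target=self._watch_mic, daemon=True).start()
        self.playback.play(audio)  # blocks until finished or interrupted
        self._done.set()
```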

This is already live in government deployments, from Ukraine to local US municipalities, which tells me this isn't just a tech demo. It's enterprise-ready infrastructure.

Platform 2: ElevenCreative

If ElevenAgents is for conversation, ElevenCreative is for production.

This is their play to own the entire media generation pipeline. It’s not just voice anymore. You can generate:

  • Images
  • Video
  • Music
  • Sound Effects

The "Flows" Editor

The most interesting part here is Flows, a node-based video editor coming soon. Instead of a linear timeline, you build automated content pipelines. Imagine a flowchart where one node generates a script, the next generates the voiceover, the third picks stock footage (or generates it), and the final node stitches it all together.
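Flows isn't out yet, so details are scarce, but the node-pipeline idea is easy to picture in code. Here's a toy sketch of that script → voiceover → footage → stitch chain; every node here is a hypothetical placeholder, not the real product:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Node:
    name: str
    run: Callable[[dict], dict]   # takes the pipeline state, returns updates

def run_pipeline(nodes: list[Node], state: dict) -> dict:
    # A linear chain for simplicity; a real node editor would be a DAG.
    for node in nodes:
        state |= node.run(state)
    return state

# Placeholder node functions. In a real flow these would call
# generation models for text, voice, and video.
pipeline = [
    Node("script",    lambda s: {"script": f"60s explainer on {s['topic']}"}),
    Node("voiceover", lambda s: {"audio": f"<tts of {len(s['script'])} chars>"}),
    Node("footage",   lambda s: {"clips": ["b-roll-1.mp4", "b-roll-2.mp4"]}),
    Node("stitch",    lambda s: {"video": (s["audio"], s["clips"])}),
]

result = run_pipeline(pipeline, {"topic": "voice agents"})
print(result["video"])
```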

Music Marketplace

They’re also launching a marketplace where musicians can publish tracks or fine-tune models on their style. If someone generates a song using your style, you earn royalties. This is the "creator economy" model applied to AI generation, and it's a smart way to get artists on board rather than just scraping their work.

Platform 3: ElevenAPI

This is the foundational layer. ElevenAPI gives developers direct access to the raw models powering everything else.

Whether you need dubbing, transcription, sound effects, or speech-to-speech, it’s all here. This is the "AWS for Voice" play. By opening up the API, they ensure that even if you don't use their apps, the app you do use is probably running on their rails.
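If ElevenAPI keeps the shape of ElevenLabs' existing REST API, a basic text-to-speech call looks something like this. The voice ID is a placeholder, and you should check the current docs for exact endpoints and model names:

```python
import requests

API_KEY = "your-api-key"     # from your ElevenLabs account settings
VOICE_ID = "your-voice-id"   # placeholder; pick one from the voice library

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "text": "Voice is the interface of the future.",
        "model_id": "eleven_multilingual_v2",  # assumed model name
    },
)
resp.raise_for_status()

with open("output.mp3", "wb") as f:
    f.write(resp.content)    # the endpoint returns raw audio bytes
```

The same key-and-endpoint pattern presumably extends to the other model families they're exposing.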

Why "Empathy" is the Killer Feature

We often talk about AI "alignment" in terms of safety. But there's another kind of alignment: Emotional Alignment.

The smartest AI in the world is useless if talking to it feels like filing a tax return.

I've noticed a shift in my own behavior recently. I speak to models far more than I type now. When an agent can hear the frustration in my voice and soften its tone, or hear my excitement and match my energy, the dynamic changes. It stops being a tool and starts feeling like a collaborator.

ElevenLabs understood this before anyone else. They realized that while OpenAI and Google fight over the brain, the real value is in the connection.

Conclusion

The keyboard had a good run. It served us well for the first few decades of computing. But as AI gets smarter, our interface needs to get faster and more fluid.

ElevenLabs isn't just making better voices. They are building the operating system for a world where we finally stop typing and start talking.

Give the new Expressive Mode a try. It might be the first time you forget you're talking to a machine.