ElevenLabs' new speech-to-text model Scribe is here with the highest accuracy rate so far (96.7% for English)
ElevenLabs, the well-regarded AI voice cloning and generation startup launched by former Palantir employees, today debuted Scribe v1, a new speech-to-text model that, according to the company, achieves the highest accuracy across multiple languages. Users can try it on the ElevenLabs website.
According to the company's benchmarks, it outperforms Google's Gemini 2.0 Flash, OpenAI's Whisper v3 and Deepgram's Nova-3 on speech-to-text accuracy, achieving new record-low error rates.
The company claims that Scribe delivers state-of-the-art transcription accuracy across 99 languages, including improved performance in previously underserved languages such as Serbian, Cantonese and Malayalam.
As Flavio Schneider, a lead researcher at ElevenLabs, wrote on X, Scribe is the “smartest audio understanding model” released by ElevenLabs so far.
“Scribe doesn't just transcribe, it understands audio,” Schneider continued in a reply in the thread. “It can detect non-verbal events (such as laughter, sound effects, music and background noise) and analyze long audio contexts for accurate diarization, even in the most challenging environments.”
“Diarization” is the name given to the process of separating speakers in a recording by their voice characteristics.
In fact, the ElevenLabs documentation states that Scribe can distinguish and isolate up to 32 different speakers in the same audio file.
While ElevenLabs cautions that Scribe is “best used when high-precision transcription is required, not real-time transcription”, the company also plans to introduce a low-latency version soon, expanding its use to real-time applications.
Lowest word error rates (WER)
Scribe is designed to handle real-world audio challenges with precision. According to benchmark results on FLEURS and Common Voice, it records the lowest word error rates (WER) for many languages, with accuracy reaching 98.7% for Italian and 96.7% for English.
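For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not ElevenLabs' evaluation code: the function name `word_error_rate` and the simple lowercase-and-split tokenization are assumptions made for the example, and real benchmarks usually apply heavier text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization here is a plain lowercase + whitespace
    split, which is an assumption and not how any vendor benchmark works.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one dropped word ("the")
# against a six-word reference.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

If the quoted accuracy figures are read as 100% minus WER, which is an assumption about how they were derived, 96.7% accuracy for English would correspond to a WER of roughly 3.3%.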
The main features include:
- Speaker diarization to distinguish speakers in multi-speaker recordings
- Timestamps for detailed transcription accuracy
- Non-speech event detection, such as laughter and background noise
- Transcript output for seamless integration via the API
Pricing and availability
Scribe is now available through the ElevenLabs website and API.
Pricing is set at $0.40 per hour of input audio, with a 50% discount for the next six weeks. A low-latency version for real-time applications is also planned.
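As a rough illustration of what that pricing means at volume, the sketch below estimates cost from hours of input audio. It assumes simple linear billing with no minimums, tiers or rounding rules, none of which are confirmed here; the constant and function names are hypothetical and used only for this example.

```python
LIST_PRICE_PER_HOUR = 0.40  # USD per hour of input audio, as quoted above
LAUNCH_DISCOUNT = 0.50      # 50% off during the six-week launch window


def transcription_cost(audio_hours: float, discounted: bool = False) -> float:
    """Estimate cost in USD, assuming purely linear per-hour billing."""
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    rate = LIST_PRICE_PER_HOUR
    if discounted:
        rate *= 1 - LAUNCH_DISCOUNT
    return round(audio_hours * rate, 2)


# 1,000 hours of audio at list price and at the launch discount.
list_price_total = transcription_cost(1000)
launch_total = transcription_cost(1000, discounted=True)
```

Under these assumptions, 1,000 hours of audio would cost about $400 at list price and about $200 during the discount period.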
What it means for businesses
For enterprise decision-makers, Scribe offers a scalable, high-accuracy transcription tool, which makes it useful for industries that rely on automated documentation, meeting transcription and content accessibility.
The model's ability to process different languages with high accuracy is also beneficial for multinational enterprises, media companies and customer support applications.
Scribe's pricing structure makes it competitive for businesses that require high-volume transcription services, and its API-based integration allows for seamless enterprise adoption.
In addition, the upcoming low-latency version could position Scribe as a viable option for real-time communication tools.
Coming the same day as rival Hume AI's text-to-speech model Octave
Timing is everything, and ElevenLabs chose to release Scribe on the same day that rival Hume AI introduced Octave, an LLM-powered text-to-speech model that lets users customize AI-generated voices with adjustable emotions.
It is designed for content creation, including audiobooks, podcasts and video game voiceovers. Unlike standard TTS systems, Octave considers context beyond individual sentences, adjusting tone, rhythm and cadence dynamically to sound more natural.
Hume AI positions Octave as a direct competitor to ElevenLabs' text-to-speech offerings, emphasizing that Octave costs about half as much as ElevenLabs' current AI voice services.
While Scribe and Octave serve different functions, their development reflects the growing competition in AI-powered audio models.
ElevenLabs prioritizes precise, multilingual speech recognition, while Hume AI refines expressive AI speech.
For enterprises, this means more specialized solutions for both transcription and synthetic voice applications, enabling more efficient content production, customer engagement and accessibility tools.
Scribe is already live, and ElevenLabs is hosting a virtual event next week with the team behind its development. More details, benchmarks and API documentation are available in the official blog post.