Move over, Alexa: Amazon launches new real-time voice model Nova Sonic for third-party enterprise development


Amazon is best known as an e-commerce giant. Somewhere a little further down its list of notable offerings is its voice assistant product, Alexa, which just received a major intelligence upgrade last month thanks to Amazon Nova and Anthropic, a company Amazon has invested in.

Now Alexa will have to make room for a new Amazon voice AI sibling: Today the company is introducing Amazon Nova Sonic, a new foundation model designed to let third-party application developers build real-time, natural, conversational voice interactions into their products using the Amazon Bedrock platform.

It is available now through a bidirectional streaming application programming interface (API). In fact, Amazon has already incorporated some parts of it (a speech encoder and a speech synthesizer) into the new Alexa, Alexa+.
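To make the idea of a bidirectional streaming API more concrete, here is a minimal sketch in plain Python asyncio. It does not call Amazon Bedrock or any AWS SDK: `fake_voice_model`, `send_audio`, `receive_events`, `converse` and `run_demo` are hypothetical names invented for illustration, and the "model" simply echoes each chunk. The only point is the shape of the exchange, in which audio goes up and events come down over the same session at the same time.

```python
import asyncio

END = None  # sentinel marking the end of a stream (an assumption of this sketch)


async def fake_voice_model(inbound: asyncio.Queue, outbound: asyncio.Queue) -> None:
    """Hypothetical stand-in for a speech model: replies to each chunk as it arrives."""
    while (chunk := await inbound.get()) is not END:
        await outbound.put(f"reply-to-{chunk}")
    await outbound.put(END)


async def send_audio(chunks, inbound: asyncio.Queue) -> None:
    """Upstream half: push audio chunks without waiting for replies."""
    for chunk in chunks:
        await inbound.put(chunk)
        await asyncio.sleep(0)  # yield control so the other tasks can run
    await inbound.put(END)


async def receive_events(outbound: asyncio.Queue) -> list:
    """Downstream half: collect model output while audio is still being sent."""
    events = []
    while (event := await outbound.get()) is not END:
        events.append(event)
    return events


async def converse(chunks) -> list:
    """Run sender, model and receiver concurrently over two queues."""
    inbound, outbound = asyncio.Queue(), asyncio.Queue()
    _, _, events = await asyncio.gather(
        fake_voice_model(inbound, outbound),
        send_audio(chunks, inbound),
        receive_events(outbound),
    )
    return events


def run_demo(chunks) -> list:
    return asyncio.run(converse(chunks))
```

A real integration would replace the two queues with the streaming connection offered by the provider; the sender and receiver would still run as separate concurrent tasks.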

“This approach allows us to deliver the benefits of our speech technologies across different use cases simultaneously, while continuing to evolve both systems based on customer feedback and technological advances,” a spokesperson said.

Obvious use cases include customer support and service, guidance, information and entertainment.

A unified approach

Nova Sonic addresses a key challenge in voice AI: technology fragmentation.

Traditionally, building voice interfaces has required stitching together separate models for speech recognition, speech processing and speech synthesis, according to Rohit Prasad, SVP and head scientist for artificial general intelligence at Amazon, in an interview with VentureBeat yesterday conducted over Amazon’s video service.

This complexity often leads to robotic, unnatural interactions and increased development costs.

Nova Sonic seeks to improve on this state of affairs by combining all three model types into one.

Prasad explained the model’s core innovation: “Nova Sonic brings together three traditionally separate models (speech-to-text, text understanding and text-to-speech) in a single system that can model not only the ‘what’ but also the ‘how’ of communication.”

By preserving acoustic context, such as tone, cadence and style, Nova Sonic helps maintain the nuances of human conversation.

Recognizing the subtleties and quirks of live, two-way audio conversations

One of Nova Sonic’s defining capabilities is its ability to handle live, two-way conversations. It recognizes when users pause, hesitate or interrupt, all common behaviors in human speech, and responds smoothly while maintaining context.

“The real breakthrough here is real-time, interactive, low-latency conversation, which means that you can interrupt the AI mid-sentence and it will still maintain the context and respond coherently,” Prasad said. This feature is especially suitable for scenarios such as customer service, where responsiveness and adaptability are crucial.

Nova Sonic is also designed to integrate seamlessly with other systems. It automatically generates transcripts of spoken input, which can be used to trigger API calls or interact with proprietary tools. This allows companies to build AI agents that can perform tasks such as booking meetings, retrieving live information or answering complex customer inquiries.

“You can use Nova Sonic via Amazon Bedrock and connect it to any tools or proprietary data sources, even visual ones, as long as they are wrapped as APIs that can be called,” Prasad said. This flexibility makes the model suitable for a wide range of industries, from education and travel to enterprise operations and entertainment.
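As a rough illustration of how a transcript could drive tools that are wrapped as callable APIs, the sketch below routes a transcript string to a function. Everything in it is an assumption made for illustration: the tool functions, the `TOOLS` registry and the `dispatch` helper are hypothetical, and simple phrase matching stands in for whatever tool-selection logic a real voice model or agent framework would apply.

```python
from typing import Callable, Dict


# Hypothetical tools; real ones would call a calendar or data API.
def book_meeting(transcript: str) -> str:
    return "meeting booked"


def lookup_info(transcript: str) -> str:
    return "info retrieved"


# Trigger phrase -> tool. Phrase matching is a placeholder for model-driven selection.
TOOLS: Dict[str, Callable[[str], str]] = {
    "book": book_meeting,
    "look up": lookup_info,
}


def dispatch(transcript: str) -> str:
    """Route a transcript to the first tool whose trigger phrase it contains."""
    text = transcript.lower()
    for trigger, tool in TOOLS.items():
        if trigger in text:
            return tool(transcript)
    return "no tool matched"
```

The design choice worth noting is the registry: because each tool is just a callable behind a common signature, new capabilities can be added without changing the dispatch logic.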

Performance benchmarks and industry comparison

Nova Sonic has been benchmarked against other real-time voice models, including OpenAI’s GPT-4o and Google’s Gemini Flash 2.0. On the Common Eval data set, it achieved a 69.7% win rate over Gemini Flash 2.0 and a 51.0% win rate over GPT-4o for single-turn American English conversations using a male voice. Similar gains were observed with female and British English voices.

Prasad emphasized Nova Sonic’s strong performance in its core language markets: “Nova Sonic is currently the best in US and British English, outperforming even GPT-4o real-time in both conversational naturalness and accuracy.” He added: “To the best of our knowledge, only two other models, GPT-4o real-time and a GPT-4o mini version, come close to what Nova Sonic does in combining real-time speech understanding and generation. This space is still early and very hard.”

Multilingual capabilities and noisy environment handling

In speech recognition, Nova Sonic also stands out in multilingual and real-world conditions. It records a word error rate (WER) of 4.2% on the Multilingual LibriSpeech benchmark, outperforming GPT-4o Transcribe by over 36% across English, French, German, Italian and Spanish. In noisy environments with multiple speakers (measured using the AMI benchmark), Nova Sonic showed a 46.7% improvement in WER over GPT-4o Transcribe.
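For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference text, divided by the number of words in the reference. The sketch below computes it with a standard edit-distance routine. The function names are illustrative, and reading the percentage improvements quoted above as relative reductions in WER is an assumption, since the article does not spell out how they were calculated.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer (assumed interpretation)."""
    return (baseline_wer - new_wer) / baseline_wer
```

For example, recognizing "the cat sat on the mat" as "the cat sat on mat" drops one of six reference words, giving a WER of 1/6, or about 16.7%.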

Expressive voices and language expansion

The model currently supports multiple expressive voices, both masculine and feminine, in American and British English. Amazon noted that additional accents and languages are in development and will arrive in future updates.

Low latency and lower costs

Speed and cost are also part of the appeal. Third-party benchmarking shows Nova Sonic delivers a customer-perceived latency of 1.09 seconds, compared with 1.18 seconds for OpenAI’s GPT-4o and 1.41 seconds for Google’s Gemini Flash.
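To put those latency figures in relative terms, the short calculation below expresses the gap as a percentage of each competitor's latency. It uses only the three numbers quoted above; the helper names are illustrative.

```python
# Latency figures in seconds, as quoted in the article.
LATENCY_S = {"Nova Sonic": 1.09, "GPT-4o": 1.18, "Gemini Flash": 1.41}


def percent_lower(candidate: float, baseline: float) -> float:
    """Return how much lower `candidate` is than `baseline`, in percent."""
    return 100 * (baseline - candidate) / baseline


def compare(reference: str = "Nova Sonic") -> dict:
    """Percent by which the reference model's latency undercuts each other model."""
    ref = LATENCY_S[reference]
    return {
        name: round(percent_lower(ref, value), 1)
        for name, value in LATENCY_S.items()
        if name != reference
    }
```

On these figures, Nova Sonic's latency works out to roughly 7.6% lower than GPT-4o's and 22.7% lower than Gemini Flash's.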

In terms of pricing, Amazon is positioning Nova Sonic as a production-ready enterprise solution. “We are nearly 80% cheaper than GPT-4o real-time, and this superior pricing resonates with businesses moving from experimentation to deployment,” Prasad said.

Early adoption across sectors

According to Amazon, companies in various sectors have already begun using or testing Nova Sonic.

ASAPP is applying the technology to optimize contact center workflows, praising its accuracy and natural dialogue handling.

Education First (EF) uses the model to support real-time pronunciation practice, especially for non-native speakers with various accents.

Sports data provider Stats Perform is using Nova Sonic’s low latency and simple setup to power fast, data-rich interactions in its Opta AI Chat platform.

Responsible AI and safety commitment

Beyond performance and cost, Amazon emphasizes its commitment to responsible AI development. The Nova model family includes built-in safeguards and is supported by AWS AI Service Cards, which outline intended use cases, potential limitations and ethical guidelines.

Prasad emphasized Amazon’s focus on trust and safety: “Trust is paramount to us. Developers can customize the personality within bounds, but we have put strong guardrails in place to prevent voice cloning or unwanted mimicry.” He added: “We are working extremely hard to eliminate hallucinations and voice drift. The bar we have set for release is high because speech generation must be reliable.”

Amazon Nova Sonic is generally available through Amazon Bedrock. Developers and businesses interested in exploring the model can get started by visiting https://aws.amazon.com/nova/S


 