Meta’s vanilla Maverick AI ranks below rivals on a popular chat benchmark

Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on the crowdsourced benchmark LM Arena. The incident prompted LM Arena’s maintainers to apologize, change their policies, and score the unmodified, vanilla Maverick.

It turns out the vanilla Maverick isn’t very competitive.

The unmodified Maverick, “Llama-4-Maverick-17B-128E-Instruct,” was ranked below models including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro as of Friday. Many of these models are months old.

Why the poor performance? Meta’s experimental Maverick, Llama-4-Maverick-03-26-Experimental, was “optimized for conversationality,” the company explained in a chart published last Saturday. Those optimizations evidently played well on LM Arena, where human raters compare the outputs of models and choose which they prefer.

As we’ve written before, LM Arena has never been the most reliable measure of an AI model’s performance, for various reasons. Still, tailoring a model to a benchmark, besides being misleading, makes it challenging for developers to predict exactly how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with “all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat-optimized version we experimented with that also performs well on LMArena,” the spokesperson said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”



 