OpenAI’s new GPT-4.1 models focus on coding

OpenAI on Monday launched a new family of models called GPT-4.1. Yes, “4.1” – as if the company’s nomenclature wasn’t confusing enough already.

The family includes GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, all of which OpenAI says “excel” at coding and instruction following. Available through OpenAI’s API but not ChatGPT, the multimodal models have a 1-million-token context window, meaning they can take in roughly 750,000 words in one go (longer than “War and Peace”).
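For developers who want to kick the tires, the models are reachable through OpenAI’s standard chat API. Below is a minimal sketch using the official Python SDK; it assumes the announced names (gpt-4.1, gpt-4.1-mini, gpt-4.1-nano) are also the API model identifiers, so check OpenAI’s API documentation for the exact strings available to your account.

```python
# Minimal sketch: calling GPT-4.1 through OpenAI's Python SDK.
# Assumes the `openai` package (v1.x) is installed and the
# OPENAI_API_KEY environment variable is set.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1",  # assumed identifier; also "gpt-4.1-mini" / "gpt-4.1-nano"
    messages=[
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
)

# Print the model's reply.
print(response.choices[0].message.content)
```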

GPT-4.1 arrives as OpenAI rivals such as Google and Anthropic ratchet up efforts to build sophisticated programming models. Google recently released Gemini 2.5 Pro, which also has a 1-million-token context window and ranks highly on popular coding benchmarks. So do Anthropic’s Claude 3.7 Sonnet and Chinese AI startup DeepSeek’s upgraded V3.

It’s the goal of many tech giants, including OpenAI, to train AI coding models capable of performing sophisticated software engineering tasks. OpenAI’s grand ambition is to create an “agentic software engineer,” as CFO Sarah Friar put it during a tech summit in London last month. The company claims its future models will be able to program entire apps end to end, handling aspects such as quality assurance, bug testing, and documentation writing.

GPT-4.1 is a step in this direction.

“We’ve optimized GPT-4.1 for real-world use based on direct feedback to improve in areas that developers care most about: frontend coding, making fewer extraneous edits, following formats reliably, adhering to response structure and ordering, and consistent tool usage,” OpenAI said. “These improvements enable developers to build agents that are significantly better at real-world software engineering tasks.”

OpenAI claims the full GPT-4.1 model outperforms its GPT-4o and GPT-4o mini models on coding benchmarks, including SWE-bench. GPT-4.1 mini and nano are said to be more efficient and faster at the cost of some accuracy, with OpenAI saying GPT-4.1 nano is its speediest and cheapest model ever.

GPT-4.1 costs $2 per million input tokens and $8 per million output tokens. GPT-4.1 mini is $0.40 per million input tokens and $1.60 per million output tokens, and GPT-4.1 nano is $0.10 per million input tokens and $0.40 per million output tokens.
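To make those rates concrete, here’s a quick back-of-the-envelope cost calculator. The prices are the ones quoted above; the token counts in the example are invented purely for illustration.

```python
# Per-request cost at the GPT-4.1 rates quoted above,
# expressed as (USD per 1M input tokens, USD per 1M output tokens).
PRICES = {
    "gpt-4.1":      (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "gpt-4.1-nano": (0.10, 0.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single request for the given model."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# Example: a 10,000-token prompt with a 2,000-token completion.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
```

At those example token counts, the full model works out to about 3.6 cents per request, with mini and nano proportionally cheaper.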

According to OpenAI’s internal testing, GPT-4.1, which can generate more tokens at once than GPT-4o (32,768 versus 16,384), scored between 52% and 54.6% on SWE-bench Verified, a human-validated subset of SWE-bench. (OpenAI noted in a blog post that some solutions to SWE-bench Verified problems couldn’t run on its infrastructure, hence the range of scores.) Those figures are slightly under the scores Google and Anthropic reported for Gemini 2.5 Pro (63.8%) and Claude 3.7 Sonnet (62.3%), respectively, on the same benchmark.

In a separate evaluation, OpenAI probed GPT-4.1 using Video-MME, a benchmark designed to measure a model’s ability to “understand” content in videos. GPT-4.1 reached a chart-topping 72% accuracy in the “long, no subtitles” video category, OpenAI claims.

While GPT-4.1 scores reasonably well on benchmarks and has a more recent “knowledge cutoff,” giving it a better frame of reference for current events (up to June 2024), it’s important to keep in mind that even some of the best models today struggle with tasks that wouldn’t trip up experts. For example, many studies have shown that code-generating models often fail to fix, and even introduce, security vulnerabilities and bugs.

OpenAI also acknowledges that GPT-4.1 becomes less reliable (i.e., likelier to make mistakes) the more input tokens it has to handle. On one of the company’s own tests, OpenAI-MRCR, the model’s accuracy dropped from around 84% with 8,000 tokens to 50% with 1 million tokens. GPT-4.1 also tends to be more “literal” than GPT-4o, the company says, sometimes necessitating more specific, explicit prompts.

 