GPT-4.5 for enterprise: Do its accuracy and knowledge justify the cost?



The release of OpenAI's GPT-4.5 was somewhat disappointing, with many pointing to its insane price point (about 10 to 20 times more expensive than Claude 3.7 Sonnet and 15 to 30 times more expensive than GPT-4o).

However, given that it is OpenAI's largest and most powerful non-reasoning model, it is worth considering its strengths and the areas where it shines.

Better knowledge and alignment

There are few details about the model's architecture or training corpus, but we have a rough estimate that it was trained with 10 times more compute. The model was so big that OpenAI had to distribute training across multiple data centers to finish within a reasonable time.

Larger models have more capacity to learn world knowledge and the nuances of human language (provided they have access to high-quality training data). This is evident in some of the metrics presented by the OpenAI team. For example, GPT-4.5 has a record-high ranking on PersonQA, a benchmark that evaluates hallucinations in AI models.

Practical experiments also show that GPT-4.5 is better than other general-purpose models at staying true to the facts and following user instructions.

Users have pointed out that GPT-4.5's responses feel more natural and context-aware than those of previous models. Its ability to follow tone and style guidelines has also improved.

After the release of GPT-4.5, AI scientist and OpenAI co-founder Andrej Karpathy, who had early access to the model, said he “expect[ed] …

However, evaluating writing quality is also very subjective. In a survey that Karpathy ran on different prompts, most people preferred the responses of GPT-4o over those of GPT-4.5. He wrote on X: “Either the high-taste testers are noticing the new and unique structure but the low-taste ones are overwhelming the poll. Or we're just hallucinating things. Or these examples are just not that great. Or it's actually pretty close and this is way too small sample size. Or all of the above.”

Better processing of documents

In its experiments, Box, which has integrated GPT-4.5 into its Box AI Studio product, wrote that GPT-4.5 is “particularly potent for enterprise use cases, where accuracy and integrity are mission critical … our testing shows that GPT-4.5 is one of the best models available in terms of our eval scores …”

In its internal evaluations, Box found GPT-4.5 to be more accurate on enterprise document question-answering tasks, outperforming the original GPT-4 by about 4 percentage points on its test set.

Source: Box

Box's tests also indicate that GPT-4.5 excels at math questions embedded in business documents, which older GPT models often struggled with. For example, it was better at answering questions about financial documents that require reasoning over data and performing calculations.

GPT-4.5 also showed improved performance in extracting information from unstructured data. In a test that involved extracting fields from hundreds of legal documents, GPT-4.5 was 19% more accurate than GPT-4o.

Planning, coding, evaluating results

Given its improved world knowledge, GPT-4.5 can also be a suitable model for creating high-level plans for complex tasks. The broken-down steps can then be handed over to smaller but more efficient models to elaborate and execute.

According to Constellation, “In initial testing, GPT-4.5 seems to show strong capabilities in agentic planning and execution, including multi-step coding workflows and complex task automation.”
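The planner-and-executor split described above can be sketched in a few lines. This is a minimal illustration, not how any vendor implements it: `planner` and `executor` are hypothetical callables that map a prompt to text, and the toy functions below stand in for real model calls so the sketch runs on its own.

```python
from typing import Callable, List

# Hypothetical stand-in type: any function that maps a prompt string to a reply.
# In practice these would wrap a large planning model and a smaller, cheaper model.
Model = Callable[[str], str]


def plan_and_execute(task: str, planner: Model, executor: Model) -> List[str]:
    """Ask the planner for a list of steps, then run each step on the executor."""
    plan_text = planner(f"Break this task into short numbered steps:\n{task}")
    # Treat every non-empty line of the plan as one step.
    steps = [line.strip() for line in plan_text.splitlines() if line.strip()]
    results = []
    for step in steps:
        results.append(executor(f"Task: {task}\nCarry out this step: {step}"))
    return results


# Toy models (assumptions for illustration only, no API access needed).
def toy_planner(prompt: str) -> str:
    return "1. Collect the quarterly reports\n2. Summarize revenue trends"


def toy_executor(prompt: str) -> str:
    step = prompt.splitlines()[-1]
    return f"done -> {step}"


outputs = plan_and_execute("Prepare a revenue briefing", toy_planner, toy_executor)
for line in outputs:
    print(line)
```

The expensive model is called once per task, while the cheaper model handles every step, which is where most of the tokens are likely to be spent.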

GPT-4.5 can also be useful in coding tasks that require internal and contextual knowledge. GitHub now provides limited access to the model in its Copilot coding assistant and notes that GPT-4.5 “performs effectively with creative prompts and provides reliable responses to obscure knowledge queries.”

Given its deeper world knowledge, GPT-4.5 is also suitable for “LLM-as-a-Judge” tasks, in which a strong model evaluates the output of smaller models. For example, a model such as GPT-4o or o3 can generate one or several responses, reason over the solution, and pass the final answer to GPT-4.5 for review and refinement.
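A simplified version of this judging pattern might look like the sketch below. It only scores candidates and picks the best one, leaving out the refinement step. The `Generator` and `Judge` types and the toy functions are hypothetical placeholders for real model calls.

```python
from typing import Callable, List

Generator = Callable[[str], str]     # hypothetical: question -> candidate answer
Judge = Callable[[str, str], float]  # hypothetical: (question, answer) -> score


def judge_best(question: str, generators: List[Generator], judge: Judge) -> str:
    """Collect one candidate per generator and return the highest-scoring one."""
    candidates = [generate(question) for generate in generators]
    scored = [(judge(question, answer), answer) for answer in candidates]
    best_score, best_answer = max(scored, key=lambda pair: pair[0])
    return best_answer


# Toy stand-ins so the sketch runs without any model access.
def short_model(question: str) -> str:
    return "Paris"


def long_model(question: str) -> str:
    return "The capital of France is Paris."


def toy_judge(question: str, answer: str) -> float:
    # Illustrative rule only: prefer answers written as full sentences.
    return 1.0 if answer.endswith(".") else 0.5


best = judge_best("What is the capital of France?", [short_model, long_model], toy_judge)
print(best)
```

In a real setup, the judge would be a prompt to the stronger model asking it to rate or rewrite each candidate, and the scoring rule here would be replaced by that model's output.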

Is it worth the price?

Given the huge cost of GPT-4.5, though, it is very hard to justify for many use cases. But that doesn't mean it will remain that way. One of the constant trends we have seen in recent years is the plummeting cost of inference, and if this trend applies to GPT-4.5, it is worth experimenting with it and finding ways to put its power to use in enterprise applications.
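One way to reason about the cost is to ask what share of requests could be routed to the expensive model within a fixed budget. The sketch below is back-of-the-envelope arithmetic under stated assumptions: every request uses the same number of tokens on either model, the price multiple is the 15x to 30x range quoted for GPT-4.5 relative to GPT-4o, and the budget of twice the GPT-4o-only spend is a made-up figure for illustration.

```python
def max_share_on_expensive_model(budget_multiple: float, price_multiple: float) -> float:
    """Largest fraction f of requests that can go to the expensive model.

    Blended cost per request, relative to the cheap model, is
    (1 - f) * 1 + f * price_multiple. Setting that equal to budget_multiple
    and solving for f gives (budget_multiple - 1) / (price_multiple - 1).
    """
    if price_multiple <= 1:
        return 1.0  # the "expensive" model is not actually more expensive
    share = (budget_multiple - 1) / (price_multiple - 1)
    return max(0.0, min(1.0, share))  # clamp to a valid fraction


# Hypothetical budget: willing to spend twice what a GPT-4o-only setup costs.
for multiple in (15, 30):
    share = max_share_on_expensive_model(budget_multiple=2.0, price_multiple=multiple)
    print(f"{multiple}x price: up to {share:.1%} of requests")
```

Under these assumptions, only a small slice of traffic can go to the larger model, which is why reserving it for planning or judging steps is a natural fit.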

It is also worth noting that this new model can become the basis for future reasoning models. Per Karpathy: “Keep in mind that GPT4.5 was only trained with pretraining, supervised finetuning and RLHF [reinforcement learning from human feedback], so this is not yet a reasoning model. Therefore, this model release does not push forward model capability in cases where reasoning is critical (math, code, etc.) … Presumably, OpenAI will now be looking to further train with reinforcement learning on top of GPT-4.5 model to allow it to think, and push model capability in these domains.”


 