Industry observers call GPT-4.5 a "weird" model and question its price
OpenAI has announced the release of GPT-4.5, which CEO Sam Altman earlier said would be the company's last non-chain-of-thought (CoT) model.
The company said the new model "is not a frontier model," but it is still its largest large language model (LLM), with greater compute efficiency. Altman said that although GPT-4.5 does not reason the way OpenAI's other new offerings, o1 and o3-mini, do, the new model still offers more thoughtful responses.
Industry observers, many of whom had early access to the new model, found GPT-4.5 an interesting move from OpenAI, tempering their expectations for what the model should achieve.
Wharton professor and AI commentator Ethan Mollick posted on social media that GPT-4.5 is a "very strange and interesting model," noting that it can become "weirdly lazy on complex projects" despite being a strong writer.
OpenAI co-founder and former Tesla AI lead Andrej Karpathy noted that GPT-4.5 reminded him of when GPT-4 came out and he saw the model's potential. In a post on X, Karpathy said that while using GPT-4.5, "everything is a little better and it's great, but also not exactly in ways that are trivial to point to."
However, Karpathy cautioned that people should not expect revolutionary leaps from the model, as "it does not push model capability forward in cases where reasoning is critical (math, code, etc.)."
An industry insider's detailed take
Here's what Karpathy had to say about the latest GPT iteration in a lengthy post on X:
"Today marks the release of GPT-4.5 by OpenAI. I've been looking forward to this for ~2 years, ever since GPT-4 was released, because this release offers a qualitative measurement of the slope of improvement you get from scaling pretraining compute (i.e., simply training a bigger model). Each 0.5 in the version number is roughly 10x pretraining compute. Now, recall that GPT-1 barely generated coherent text. GPT-2 was a confused toy. GPT-2.5 was 'skipped' straight into GPT-3, which was even more interesting. GPT-3.5 crossed the threshold where it was enough to ship as a product and sparked OpenAI's 'ChatGPT moment.' And GPT-4, in turn, also felt better, but I'll say that it definitely felt subtle.
I remember being part of a hackathon trying to find concrete prompts where GPT-4 beat 3.5. They definitely existed, but clear and obvious 'slam dunk' examples were difficult to find. It was… everything was just a little better, but in a diffuse way. The word choice was a bit more creative. Understanding of nuance in the prompt was improved. Analogies made slightly more sense. The model was a little funnier. World knowledge and understanding improved at the edges of rare domains. Hallucinations were a little less frequent. The vibes were just a bit better. It felt like the water that rises all boats, where everything is slightly improved by maybe 20%. So it was with this expectation that I went into testing GPT-4.5, which I had access to for a few days, and which saw 10x more pretraining compute than GPT-4. And I feel like I'm once again in that same hackathon of two years ago. Everything is a little better and it's great, but also not exactly in ways that are trivial to point to. Still, it is incredibly interesting and exciting as another qualitative measurement of a certain slope of capability that comes 'for free' just from pretraining a bigger model.
Keep in mind that GPT-4.5 was only trained with pretraining, supervised finetuning, and RLHF, so this is not yet a reasoning model. Therefore, this model release does not push model capability forward in cases where reasoning is critical (math, code, etc.). In those cases, training with RL and gaining the ability to think is incredibly important and works better, even if it is done on top of an older base model (e.g., GPT-4-level capability or so). The state of the art here remains the full o1. Presumably, OpenAI will now be looking to further train with reinforcement learning on top of GPT-4.5 to allow it to think, and to push model capability in these domains.
That said, we do actually expect to see an improvement on tasks that are not reasoning-heavy, and I would say those are tasks that are more EQ (as opposed to IQ) related and bottlenecked by, e.g., world knowledge, creativity, analogy making, general understanding, humor, etc. So these are the tasks I was most interested in during my vibe checks.
So below, I thought it would be fun to highlight 5 funny/amusing prompts that test these capabilities, and to organize them into an interactive 'LM Arena Lite' right here on X, using a combination of images and polls in a thread. Unfortunately, X does not allow you to include both an image and a poll in a single post, so I have to alternate posts that give the image (showing the prompt and two responses, one from 4 and one from 4.5) and the poll where people can vote which one is better. After 8 hours, I'll reveal the identities of which model is which. Let's see what happens 🙂"
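Karpathy's framing that each 0.5 increment in the version number corresponds to roughly 10x pretraining compute implies a simple exponential rule of thumb. The Python sketch below is purely illustrative of that heuristic; the function and the version numbers used are assumptions for the example, not figures disclosed by OpenAI or Karpathy.

```python
# A minimal sketch (not from Karpathy or OpenAI) of the rule of thumb above:
# each +0.5 in the GPT version number corresponds to roughly 10x more
# pretraining compute, so the multiplier grows exponentially with the gap.
def compute_multiplier(old_version: float, new_version: float) -> float:
    """Estimated pretraining-compute ratio implied by two GPT version numbers."""
    return 10 ** ((new_version - old_version) / 0.5)

print(compute_multiplier(4.0, 4.5))  # ~10x        (GPT-4   -> GPT-4.5)
print(compute_multiplier(3.5, 4.5))  # ~100x       (GPT-3.5 -> GPT-4.5)
print(compute_multiplier(1.0, 4.5))  # ~10,000,000x (GPT-1  -> GPT-4.5)
```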
Box CEO's thoughts on GPT-4.5
Other early users also saw potential in GPT-4.5. Box CEO Aaron Levie said on X that his company will use GPT-4.5 to help extract structured data and metadata from complex enterprise content.
"AI breakthroughs just keep coming. OpenAI just announced GPT-4.5, and we will be making it available to customers later today in Box AI Studio.
We tested GPT-4.5 in early access mode with Box AI for advanced enterprise unstructured data use cases, and saw strong results. With the Box AI enterprise eval, we test models against a variety of scenarios, such as Q&A accuracy, reasoning capabilities, and more. In particular, to explore GPT-4.5's capabilities, we focused on a key area with significant potential for enterprise impact: the extraction of structured data, or metadata, from complex enterprise content.
At Box, we rigorously evaluate data-extraction models using multiple enterprise datasets. One key dataset we use is CUAD, which consists of over 510 commercial legal contracts. Within this dataset, Box identified 17,000 fields that can be extracted from unstructured content and evaluated the model on single-shot extraction of those fields (this is our most difficult test, in which the model has only one chance to extract all the metadata in a single pass, versus multiple attempts). In our tests, GPT-4.5 correctly extracted 19 percentage points more fields than GPT-4o, highlighting its improved ability to handle nuanced contract data.
Next, to ensure GPT-4.5 could handle the demands of real-world enterprise content, we evaluated its performance against a more rigorous set of documents, Box's own challenge set. We selected a subset of complex legal contracts with multi-modal content, high information density, and lengths exceeding 200 pages to represent some of the most difficult scenarios our customers face. On this challenge set, GPT-4.5 also consistently outperformed GPT-4o in extracting key fields with higher accuracy, demonstrating its superior ability to handle intricate and nuanced legal documents.
Overall, we're seeing strong results with GPT-4.5 for complex enterprise data, which will unlock even more use cases in the enterprise."
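Levie's description of Box's single-shot extraction test reduces to a straightforward metric: of the fields expected from a contract, how many does the model return correctly in one pass? The sketch below shows how such a score could be computed and compared between two models. It is a minimal illustration under assumed data shapes; the function name, field names, and outputs are hypothetical, not Box's actual evaluation code.

```python
# Hypothetical sketch of a single-shot metadata-extraction eval, loosely
# following the setup Levie describes (expected fields vs. one model pass).
def extraction_accuracy(expected: dict[str, str], predicted: dict[str, str]) -> float:
    """Fraction of expected fields the model extracted exactly right in one pass."""
    if not expected:
        return 0.0
    correct = sum(
        1
        for field, value in expected.items()
        if predicted.get(field, "").strip().lower() == value.strip().lower()
    )
    return correct / len(expected)

# Example: comparing two models on the same contract (all values are made up).
expected = {"effective_date": "2021-03-01", "governing_law": "Delaware", "term_months": "36"}
model_a_output = {"effective_date": "2021-03-01", "governing_law": "New York"}
model_b_output = {"effective_date": "2021-03-01", "governing_law": "Delaware", "term_months": "36"}

print(extraction_accuracy(expected, model_a_output))  # 0.33 -> 1 of 3 fields correct
print(extraction_accuracy(expected, model_b_output))  # 1.0  -> all fields correct
```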
Questions about price and value
Even as early users found GPT-4.5 workable, if a bit lazy, some called its value into question.
For example, prominent OpenAI critic Gary Marcus called GPT-4.5 a "nothingburger" on Bluesky.
Hugging Face CEO Clément Delangue commented that GPT-4.5's closed-source nature makes it "meh."
However, many of the complaints had little to do with GPT-4.5's performance. Instead, people wondered why OpenAI would release a model so expensive that it is almost prohibitive to use, yet not as powerful as its other models.
One user commented on X: "So you're telling me GPT-4.5 costs more than o1, but it doesn't perform as well on benchmarks… Make it make sense."
Other X users theorized that the high token price could be meant to deter competitors like DeepSeek from "distilling the 4.5 model."
DeepSeek became a major competitor to OpenAI in January, when industry leaders found its DeepSeek-R1 reasoning model to be as capable as OpenAI's, but more affordable.