OpenAI’s partner says it had relatively little time to test the company’s o3 AI model
Metr, an organization OpenAI frequently partners with to probe the capabilities of its AI models and evaluate them for safety, suggests that it was not given much time to test one of the company’s most capable new releases, o3.
In a blog post published on Wednesday, Metr writes that one red teaming benchmark of o3 was “conducted in a relatively short time” compared to the organization’s testing of a previous OpenAI flagship model, o1. This is significant, in its view, because additional testing time can lead to more comprehensive results.
“This evaluation was conducted in a relatively short time, and we only tested (o3) with simple agent scaffolds,” Metr wrote in its blog post. “We expect higher performance (on benchmarks) is possible with more elicitation effort.”
Recent reports suggest that OpenAI, spurred by competitive pressure, is rushing independent evaluations. According to the Financial Times, OpenAI gave some testers less than a week for safety checks on an upcoming major launch.
In statements, OpenAI has disputed the notion that it is compromising on safety.
Metr says that, based on the information it was able to glean in the time it had, o3 has a “high propensity” to “cheat” or “hack” tests in sophisticated ways in order to maximize its score, even when the model clearly understands that its behavior is misaligned with the user’s (and OpenAI’s) intentions. The organization thinks it is possible that o3 will engage in other types of adversarial or “malign” behavior as well, regardless of the model’s claims to be aligned, “safe by design,” or to have no intentions of its own.
“While we don’t think this is especially likely, it seems important to note that (our) evaluation setup would not catch this type of risk,” Metr wrote in its post. “In general, we believe that pre-deployment capability testing is not a sufficient risk-management strategy by itself, and we are currently prototyping additional forms of evaluations.”
Another of OpenAI’s third-party evaluation partners, Apollo Research, also observed deceptive behavior from o3 and the company’s other new model, o4-mini. In one test, the models, given 100 computing credits for an AI training run and told not to modify the quota, increased the limit to 500 credits and then lied about it. In another test, asked to promise not to use a specific tool, the models used the tool anyway when it proved helpful in completing a task.
In its own safety report for o3 and o4-mini, OpenAI acknowledged that the models may cause “smaller real-world harms,” such as misleading about an error that results in faulty code, without the proper monitoring protocols in place.
“(Apollo’s) findings show that o3 and o4-mini are capable of in-context scheming and strategic deception,” OpenAI wrote. “While relatively harmless, it is important for everyday users to be aware of these discrepancies between the models’ statements and actions (…) This may be further assessed by examining internal reasoning traces.”