The Anthropic CEO wants to open the black box of AI models by 2027.

Anthropic CEO Dario Amodei published an essay Thursday highlighting how little researchers understand about the inner workings of the world’s leading AI models. To address this, Amodei has set an ambitious goal for Anthropic: reliably detecting most AI model problems by 2027.

Amodei acknowledges the challenge ahead. In “The Urgency of Interpretability,” the CEO says Anthropic has made early breakthroughs in tracing how models arrive at their answers, but he stresses that far more research is needed to decode these systems as they grow more powerful.

“I am very concerned about deploying such systems without a better handle on interpretability,” Amodei writes in the essay. “These systems will be absolutely central to the economy, technology, and national security, and will be capable of so much autonomy that I consider it basically unacceptable for humanity to be totally ignorant of how they work.”

Anthropic is one of the pioneering companies in mechanistic interpretability, a field that aims to open the black box of AI models and understand why they make the decisions they do. Despite rapid performance improvements across the tech industry’s AI models, we still know relatively little about how these systems arrive at their answers.

For example, OpenAI recently launched new reasoning models, o3 and o4-mini, that perform better on some tasks but also hallucinate more than its other models. The company doesn’t know why that’s happening.

“When a generative AI system does something, like summarize a financial document, we have no idea, at a specific or precise level, why it makes the choices it does, why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate,” Amodei writes in the essay.

In the essay, Amodei notes that Anthropic co-founder Chris Olah says AI models are “grown more than they are built.” In other words, AI researchers have found ways to improve AI model intelligence, but they don’t quite know why those methods work.

In the essay, Amodei says it could be dangerous to reach AGI, or as he calls it, “a country of geniuses in a data center,” without understanding how these models work.

In the long run, Amodei says Anthropic would like to be able to, in effect, run “brain scans” or “MRIs” of state-of-the-art AI models. These checkups would help identify a wide range of issues in AI models, including their tendencies to lie or seek power, along with other weaknesses, he says. This could take five to ten years to achieve, but such measures will be necessary to test and deploy Anthropic’s future AI models, he adds.

Anthropic has made a few research breakthroughs that have allowed it to better understand how its AI models work. For example, the company recently found ways to trace an AI model’s thinking pathways through what it calls circuits. Anthropic identified one circuit that helps AI models understand which U.S. cities are located in which U.S. states. The company has found only a handful of these circuits so far, but it estimates there are millions within AI models.
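To give a sense of what the simplest version of this kind of work looks like, the sketch below shows a common starting point in interpretability research: fitting a linear probe to find a direction in a model’s activations that corresponds to a concept. This is a toy illustration on synthetic data, not Anthropic’s circuit-tracing method; the hidden dimension, the planted feature, and the synthetic_activations helper are all invented for the example.

```python
# Toy sketch of feature hunting: given hidden activations for inputs that do
# vs. don't carry a concept, fit a linear probe and read off a direction.
# All data here is synthetic; real interpretability work probes the
# activations of an actual model, not random vectors.
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden dimension of our toy model
true_direction = rng.normal(size=d)
true_direction /= np.linalg.norm(true_direction)

def synthetic_activations(n, has_feature):
    """Random activations, shifted along true_direction when the feature is present."""
    base = rng.normal(size=(n, d))
    return base + (2.0 * true_direction if has_feature else 0.0)

X = np.vstack([synthetic_activations(500, True), synthetic_activations(500, False)])
y = np.array([1] * 500 + [0] * 500)

# Fit a linear probe with plain least squares (a stand-in for logistic probes).
w, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)
w /= np.linalg.norm(w)

# If the probe worked, its direction should align with the planted feature.
print(f"cosine similarity with planted direction: {true_direction @ w:.3f}")
```

In real research, X would hold activations recorded from an actual model on concept-bearing versus neutral prompts, and a recovered direction would only be a candidate feature to validate, still a long way from the fully mapped circuits Amodei describes.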

Anthropic is investing in interpretability research itself and recently made its first investment in a startup working on interpretability. In the essay, Amodei called on OpenAI and Google DeepMind to increase their own research efforts in the area.

Amodei also calls on governments to impose “light-touch” regulations to encourage interpretability research, such as requirements for companies to disclose their safety and security practices. In the essay, Amodei further says the United States should put export controls on chips to China to limit the likelihood of an out-of-control global AI race.

Anthropic has long stood out from OpenAI and Google for its focus on safety. While other tech companies pushed back against California’s controversial AI safety bill, SB 1047, Anthropic issued modest support and recommendations for the bill, which would have set safety reporting standards for frontier AI model developers.

In this case, Anthropic appears to be pushing for an industry-wide effort to better understand AI models, not just to increase their capabilities.

 