The “era of experience” will unleash self-learning AI agents on the web: here is how to prepare

David Silver and Richard Sutton, two renowned AI scientists, argue in a new paper that artificial intelligence is about to enter a new phase, the “era of experience.” In this era, AI systems rely less and less on human-provided data and improve themselves by gathering data from, and interacting with, the world.

While the paper is conceptual and forward-looking, it has direct implications for enterprises that aim to build with, and for, future AI agents and systems.

Both Silver and Sutton are seasoned scientists with a track record of making accurate predictions about the future of AI, and the validity of their forecasts can be seen in today’s most advanced AI systems. In 2019, Sutton, a pioneer in reinforcement learning, wrote the famous essay “The Bitter Lesson,” in which he argues that the greatest long-term progress in AI has consistently come from applying large-scale computation to general-purpose search and learning methods, rather than relying mainly on engineering complex human-derived knowledge into systems.

David Silver, a senior scientist at DeepMind, was a key contributor to AlphaGo, AlphaZero and AlphaStar, all landmark achievements in deep reinforcement learning. He also co-authored a 2021 paper arguing that reinforcement learning with a well-designed reward signal would be sufficient to create highly advanced AI systems.

Today’s state-of-the-art large language models (LLMs) draw on both of these ideas. The wave of LLMs that took over the AI scene, such as GPT-3, relied mainly on scaling compute and data to internalize vast amounts of knowledge. The latest wave of reasoning models, such as DeepSeek-R1, has demonstrated that reinforcement learning with a simple reward signal is enough to learn complex reasoning skills.

What is the era of experience?

The “era of experience” builds on the same concepts that Sutton and Silver have been discussing for years and adapts them to recent advances in AI. The authors argue that the “pace of progress driven solely by supervised learning from human data is demonstrably slowing, signalling the need for a new approach.”

And that approach requires a new source of data, one that is generated in a way that continually improves as the agent becomes stronger. “This can be achieved by allowing agents to learn continually from their own experience, i.e., data that is generated by the agent interacting with its environment,” Sutton and Silver write. They argue that eventually “experience will become the dominant medium of improvement and ultimately dwarf the scale of human data used in today’s systems.”
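The learning loop the authors describe can be illustrated with a toy sketch: an agent that improves purely from data generated by its own interaction with an environment, with no human-labeled training set. The environment, action space and update rule below are made-up minimal examples for illustration, not code from the paper.

```python
import random

class Environment:
    """A toy world: the agent picks a number; reward grows as it nears a hidden target."""
    def __init__(self, target: int = 7):
        self.target = target

    def step(self, action: int) -> float:
        # Reward decays with distance from the hidden target.
        return -abs(action - self.target)

def experience_loop(env: Environment, steps: int = 500) -> int:
    """Learn only from self-generated experience: keep a running value
    estimate per action and prefer high-value actions (epsilon-greedy)."""
    values = {a: 0.0 for a in range(10)}
    counts = {a: 0 for a in range(10)}
    for _ in range(steps):
        if random.random() < 0.1:
            action = random.randrange(10)      # occasionally explore
        else:
            action = max(values, key=values.get)  # otherwise exploit
        reward = env.step(action)              # interact with the world
        counts[action] += 1                    # experience is the only training data
        values[action] += (reward - values[action]) / counts[action]
    return max(values, key=values.get)

best = experience_loop(Environment(target=7))  # converges on the target action
```

The point of the sketch is the data flow, not the algorithm: every number the agent learns from was produced by its own actions, which is exactly the property the authors argue will dominate future systems.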

According to the authors, beyond learning from their own experiential data, future AI systems will “break through the limitations of human-centric AI systems” along four dimensions:

  1. Streams: Instead of operating in disconnected episodes, AI agents will “have their own stream of experience that progresses, like humans, over a long time-scale.” This will allow agents to plan for long-term goals and adapt their behavior over time. Glimpses of this can be seen in AI systems with very long context windows and memory architectures that are continuously updated based on user interactions.
  2. Actions and observations: Rather than being confined to human-privileged actions and observations, agents in the era of experience will act autonomously in the real world. Examples include agentic systems that can interact with external applications and resources through tools such as computer use and the Model Context Protocol (MCP).
  3. Rewards: Current reinforcement learning systems mostly rely on human-designed reward functions. In the future, AI agents should be able to design their own dynamic reward functions that adapt over time, matching user preferences with real-world signals gathered from the agent’s actions and observations. Early versions of self-designed rewards can be seen in systems such as Nvidia’s DrEureka.
  4. Planning and reasoning: Current reasoning models are designed to imitate the human thought process. The authors argue that “more efficient mechanisms of thought surely exist, using non-human languages that may, for example, utilize symbolic, distributed, continuous, or differentiable computations.” AI agents should engage with the world, observe the results, and use that data to validate and update their reasoning process and develop a world model.
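The dynamic rewards in the third dimension can be sketched as a weighted combination of grounded signals whose weights drift toward whatever correlates with sparse user feedback, rather than a fixed human-designed formula. The signal names and the update rule here are hypothetical illustrations, not the mechanism used by any of the systems mentioned above.

```python
class AdaptiveReward:
    """Toy self-adjusting reward: combines real-world signals into one score
    and nudges the combination toward signals that earn positive feedback."""
    def __init__(self, signals):
        self.weights = {name: 1.0 for name in signals}

    def score(self, observation: dict) -> float:
        # Combine grounded signals (e.g. task success, speed) into one reward.
        return sum(self.weights[k] * observation[k] for k in self.weights)

    def adapt(self, observation: dict, user_feedback: float, lr: float = 0.1):
        # Strengthen weights on signals that co-occurred with approval.
        for k in self.weights:
            self.weights[k] += lr * user_feedback * observation[k]

reward = AdaptiveReward(["task_success", "speed"])
obs = {"task_success": 1.0, "speed": 0.2}
before = reward.score(obs)
reward.adapt(obs, user_feedback=1.0)  # the user approved this behavior
after = reward.score(obs)             # the same behavior now scores higher
```

The design choice this illustrates is the one the authors emphasize: the reward function itself is a learned, moving object shaped by interaction, not a constant handed down by its designers.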

The idea of AI agents that adapt to their environment through reinforcement learning is not new. But in the past, those agents were limited to highly constrained environments such as board games. Today, agents that can interact with complex environments (e.g., through AI computer use), combined with advances in reinforcement learning, will overcome those restrictions and drive the transition to the era of experience.

What does it mean for the enterprise?

Buried in Sutton and Silver’s paper is an observation with important consequences for real-world applications: agents can use “human-friendly” actions and observations, such as user interfaces, that naturally facilitate communication and collaboration with the user. They can also take “machine-friendly” actions that execute code and call APIs, allowing the agent to act autonomously in the service of its goals.

The era of experience means that developers will have to build their applications not only for humans but also for AI agents. Machine-friendly actions require secure, accessible APIs that agents can reach easily, either directly or through interfaces such as MCP. It also means building agents that can be discovered through protocols such as Google’s Agent2Agent. You will also need to design your APIs and agent interfaces to provide access to both actions and observations, allowing agents to gradually reason about and learn from their interactions with your applications.
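What an agent-facing surface for such an application might look like can be sketched as a machine-readable catalog that pairs each action with a schema and a matching observation endpoint. The invoicing endpoints, field names and schema layout below are entirely hypothetical, loosely in the spirit of MCP tool definitions rather than an actual MCP implementation.

```python
import json

# Hypothetical catalog for an invoicing app: each action carries a
# machine-readable input schema, and each observation endpoint lets the
# agent verify the outcome of its own actions.
AGENT_INTERFACE = {
    "actions": [
        {
            "name": "create_invoice",          # hypothetical endpoint
            "method": "POST",
            "path": "/api/v1/invoices",
            "input_schema": {
                "type": "object",
                "properties": {
                    "customer_id": {"type": "string"},
                    "amount_cents": {"type": "integer", "minimum": 0},
                },
                "required": ["customer_id", "amount_cents"],
            },
        }
    ],
    "observations": [
        {
            "name": "invoice_status",
            "method": "GET",
            "path": "/api/v1/invoices/{invoice_id}",
            "description": "Lets the agent observe the result of its action.",
        }
    ],
}

def discovery_document() -> str:
    """Serve this JSON from a well-known URL so agents can discover
    both the actions they may take and the observations they can make."""
    return json.dumps(AGENT_INTERFACE, indent=2)
```

Exposing observations alongside actions is the key design choice: it is what lets an agent close the loop, seeing the consequences of its calls, which is precisely the experiential data the paper argues future systems will learn from.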

If the vision Sutton and Silver present becomes reality, billions of agents will soon be roaming the web (and, before long, the physical world) to accomplish tasks. Their behaviors and needs will be very different from those of human users and developers, and offering an agent-friendly way to interact with your application will improve your ability to leverage future AI systems (and also prevent the harm they can cause).

“By building upon the foundations of RL and adapting its core principles to the challenges of this new era, we can unlock the full potential of autonomous learning and pave the way to truly superhuman intelligence,” Sutton and Silver write.

DeepMind declined to provide additional comment for this story.

