Researchers improved AI agent performance on unfamiliar tasks using ‘Dungeons and Dragons’




Organizations interested in deploying AI agents must first fine-tune them, especially in workflows that can otherwise feel rote. While some organizations want agents that perform only one type of task in a single workflow, sometimes agents need to be introduced to new environments in the hope that they will adapt.

Researchers from Beijing University of Posts and Telecommunications revealed a new method, AgentRefine. It teaches agents to self-correct, leading to more generalized and adaptive AI agents.

The researchers said that current tuning methods limit agents to the same tasks seen in their training data set, or “held-in” tasks, and leave them performing worse in “held-out,” or novel, environments. By following only the rules exposed through the training data, agents trained with these frameworks have trouble “learning” from their mistakes and cannot be turned into generic agents or incorporated into new workflows.

To combat this limitation, AgentRefine aims to create more generalizable agent training datasets that allow the model to learn from mistakes and fit into new workflows. In a new paper, the researchers said the goal of AgentRefine is “to develop generalized agent tuning data and establish the relationship between agent generalization and self-improvement.” If agents can self-correct, they will not simply memorize the mistakes they made during training and carry those same mistakes into other environments in which they are deployed.

“We find that tuning the agent on self-improvement data improves the agent to explore more viable actions when faced with bad situations, thereby leading to better generalization to novel agent environments,” the researchers wrote.

D&D-inspired AI agent training

Taking a cue from the tabletop RPG Dungeons & Dragons, the researchers created characters, scenarios for the agent to follow, and challenges. And yes, there is a Dungeon Master (DM).

They divided the AgentRefine data build into three areas: script generation, trajectory generation, and verification.

In script generation, the model creates a script, or guide, with information about the environment, the tasks, and the actions that people can take. (The researchers tested AgentRefine using Llama-3-8B-Instruct, Llama-3-70B-Instruct, Mistral-7B-Instruct-v0.3, GPT-4o-mini and GPT-4o.)

The model then generates agent data that contains errors, acting as both DM and player during the trajectory stage: it evaluates the actions it can take and then checks whether they contain errors. The final stage, verification, checks the script and trajectory, preserving the potential for the agents it trains to perform self-correction.
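
To make the three-stage data build concrete, here is a minimal Python sketch of how such a pipeline could be wired together. The function names, prompts, and the generic `llm` callable are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of an AgentRefine-style data pipeline:
# script generation -> trajectory generation -> verification.
# All names and prompts are assumptions for illustration only.

def generate_script(llm, domain: str) -> dict:
    """Stage 1: the model writes a 'script' describing the environment,
    the task, and the actions a player could take (the D&D-style scenario)."""
    prompt = f"Create an environment, a task, and allowed actions for the domain: {domain}."
    return {"domain": domain, "script": llm(prompt)}

def generate_trajectory(llm, script: dict, max_turns: int = 8) -> list:
    """Stage 2: the model plays both Dungeon Master and player, producing a
    multi-turn trajectory that includes erroneous actions and corrections."""
    trajectory = []
    for _ in range(max_turns):
        observation = llm(f"As DM for {script['script']}, describe the state after: {trajectory}")
        action = llm(f"As player, choose an action given: {observation}")
        critique = llm(f"As DM, does this action contain an error? {action}")
        trajectory.append({"observation": observation, "action": action, "critique": critique})
    return trajectory

def verify(llm, script: dict, trajectory: list) -> bool:
    """Stage 3: check that the script is coherent and that every flagged
    error in the trajectory is followed by a correction."""
    verdict = llm(f"Verify script and trajectory; is every error corrected? {script} {trajectory}")
    return "valid" in verdict.lower()

def build_dataset(llm, domains: list) -> list:
    """Run all three stages and keep only verified samples."""
    dataset = []
    for domain in domains:
        script = generate_script(llm, domain)
        trajectory = generate_trajectory(llm, script)
        if verify(llm, script, trajectory):
            dataset.append({"script": script, "trajectory": trajectory})
    return dataset
```

In a setup like this, only trajectories that pass the verification stage would be kept for tuning, which is what would let the resulting dataset teach self-correction rather than reinforce mistakes.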

Better and more varied abilities to perform tasks

The researchers found that agents trained using the AgentRefine method and dataset performed better on a variety of tasks and adapted to new scenarios. These agents self-correct more to redirect their actions and decision-making to avoid mistakes, and become more robust in the process.

In particular, AgentRefine improved the performance of all the models on held-out tasks.

Enterprises need to make agents more task-adaptive so that, instead of just repeating what they have learned, they can become better decision-makers. Orchestrator agents not only “route traffic” for multiple agents, but also determine whether agents have completed tasks based on user requests.
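
As a rough illustration of that pattern, the sketch below shows an orchestrator loop that routes a request to a worker agent and then judges whether the task is actually complete before returning. The worker registry, the judge callable, and the keyword-based routing heuristic are all assumptions for the example, not any particular framework's API.

```python
# Hypothetical orchestrator loop: route a user request to a worker agent,
# then verify completion against the request before returning the result.
from typing import Callable, Dict

def orchestrate(request: str,
                workers: Dict[str, Callable[[str], str]],
                judge: Callable[[str, str], bool],
                max_rounds: int = 3) -> str:
    result = ""
    for _ in range(max_rounds):
        # Route: pick the worker whose name best matches words in the request.
        name = max(workers, key=lambda w: sum(tok in request.lower() for tok in w.split("_")))
        result = workers[name](request)
        # Determine completion against the user request, not just that a worker responded.
        if judge(request, result):
            return result
        request = f"{request}\nPrevious attempt was incomplete: {result}"
    return result
```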

OpenAI’s o3 offers “program synthesis,” which could improve task adaptability. Other orchestration and learning frameworks, like Microsoft’s Magentic-One, set actions for supervisor agents to learn when to move tasks to different agents.


 