Researchers find you don’t need a ton of data to train LLMs for reasoning tasks

Large language models (LLMs) can learn complex reasoning tasks without relying on large datasets, according to a new study by researchers at Shanghai Jiao Tong University. Their findings show that with just a small batch of well-curated examples, you can train an LLM for tasks that were thought to require tens of thousands of training instances.

This efficiency stems from the knowledge that modern LLMs absorb during the pre-training phase. As new training methods become more data- and compute-efficient, enterprises may be able to create customized models without requiring access to the resources of large AI labs.

Less is more (LIMO)

In their study, the researchers challenge the assumption that you need large amounts of data to train LLMs for reasoning tasks. They introduce the concept of “less is more” (LIMO). Their work builds on previous research that showed LLMs could be aligned with human preferences with just a few examples.

Less is more (LIMO) for reasoning (source: arXiv)

In their experiments, they demonstrated that they could create a LIMO dataset for complex mathematical reasoning tasks with a few hundred training examples. An LLM fine-tuned on the dataset was able to create complex chain-of-thought (CoT) reasoning chains that enabled it to accomplish the tasks at a very high success rate.

For example, a Qwen2.5-32B-Instruct model fine-tuned on 817 training examples chosen based on LIMO reached 57.1% accuracy on the highly challenging AIME benchmark and 94.8% on MATH, outperforming models that were trained on a hundred times more examples. It also scored higher on these benchmarks than reasoning models such as QwQ-32B-Preview (a version of the Qwen model that has been trained for reasoning) and OpenAI o1-preview, both of which have been trained with larger data and compute resources.

Moreover, LIMO-trained models generalize to examples drastically different from their training data. For example, on the OlympiadBench scientific benchmark, the LIMO model outperformed QwQ-32B-Preview, and on the challenging GPQA benchmark, it achieved 66.7% accuracy, close to OpenAI o1-preview’s leading score of 73.3%.
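To make this concrete, here is a minimal sketch of what supervised fine-tuning on a small, curated set of problem/chain-of-thought pairs can look like with Hugging Face’s TRL library. The file name, field names and hyperparameters are illustrative assumptions, not the authors’ exact recipe.

```python
# Minimal sketch: supervised fine-tuning on a few hundred curated reasoning examples.
# File name, field names and hyperparameters are illustrative assumptions,
# not the LIMO authors' exact setup.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Each JSON line is assumed to hold a hard problem and a detailed chain-of-thought solution.
dataset = load_dataset("json", data_files="curated_reasoning_examples.jsonl", split="train")

def to_text(example):
    # Fold the problem and its full reasoning chain into a single training string.
    return {"text": f"Problem: {example['problem']}\n\nSolution: {example['solution']}"}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # the base model reported in the study
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="limo-style-sft",
        num_train_epochs=3,                 # illustrative; tune for your data
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=1e-5,
    ),
)
trainer.train()
```

In practice, a 32B-parameter model would need multiple GPUs or parameter-efficient methods such as LoRA, but the structure of the job stays the same: a few hundred high-quality examples, not tens of thousands.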

What it means for enterprise AI

Customizing LLMs is an attractive use case for enterprise applications. Thanks to techniques such as retrieval-augmented generation (RAG) and in-context learning, LLMs can be customized to use proprietary data or perform new tasks without the need for expensive fine-tuning.
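As a rough illustration of the in-context learning side of that spectrum, the sketch below steers a model with a handful of worked examples in the prompt instead of changing any weights. The model name and the examples are placeholders, not a recommendation.

```python
# Minimal sketch of in-context (few-shot) learning: the model is guided by
# worked examples in the prompt rather than by any weight updates.
# The model name and the examples are placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

few_shot_examples = [
    ("Convert 2 hours to minutes.", "2 hours x 60 minutes/hour = 120 minutes."),
    ("Convert 3 days to hours.", "3 days x 24 hours/day = 72 hours."),
]

prompt = "Answer the question, showing your reasoning.\n\n"
for question, answer in few_shot_examples:
    prompt += f"Q: {question}\nA: {answer}\n\n"
prompt += "Q: Convert 5 weeks to days.\nA:"

print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```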

However, reasoning tasks often require training and fine-tuning LLMs. The widely held belief has been that such tasks require large volumes of training examples with highly detailed reasoning chains and solutions. Creating such datasets is slow and impractical for many applications and companies.

More recently, researchers have shown that pure reinforcement learning approaches can enable models to train themselves for reasoning by generating many solutions and choosing the ones that work best. While this approach requires less manual effort, it still demands expensive compute resources that are beyond the reach of many enterprises.
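The core loop behind those approaches is easy to sketch, even if the full reinforcement learning machinery is not: sample many candidate solutions per problem, keep the ones a verifier accepts, and reuse them as training signal. The generate_solutions and check_answer helpers below are hypothetical placeholders, not part of any specific library.

```python
# Minimal sketch of the sample-and-select idea behind RL-style reasoning training:
# generate many candidate solutions per problem and keep those a verifier accepts.
# generate_solutions() and check_answer() are hypothetical placeholders.
from typing import Callable

def collect_verified_solutions(
    problems: list[dict],
    generate_solutions: Callable[[str, int], list[str]],  # problem text -> n sampled solutions
    check_answer: Callable[[str, str], bool],              # (solution, reference answer) -> correct?
    samples_per_problem: int = 16,
) -> list[dict]:
    """Return (problem, solution) pairs whose final answers pass verification."""
    training_pairs = []
    for item in problems:
        candidates = generate_solutions(item["question"], samples_per_problem)
        for solution in candidates:
            if check_answer(solution, item["answer"]):
                training_pairs.append({"question": item["question"], "solution": solution})
                break  # keep one verified solution per problem
    return training_pairs
```

The expense comes from the sampling itself: generating and checking many candidate solutions per problem at scale is what puts this approach out of reach for many organizations.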

On the other hand, creating a few hundred examples is an endeavor that many companies can tackle, bringing specialized reasoning models within reach of a wider range of organizations.

“This discovery has profound implications for artificial intelligence research: It suggests that even competition-level complex reasoning abilities can be effectively elicited through minimal but curated training samples,” the researchers write.

Why LIMO works

In their experiments, the researchers identify two key reasons why LLMs can learn complex reasoning tasks with fewer examples.

First, state-of-the-art foundation models have been trained on a very large amount of mathematical content and code during pre-training. This means these LLMs already possess rich reasoning knowledge in their parameters that can be activated through carefully crafted examples.

Second, new post-training techniques show that allowing models to generate extended reasoning chains significantly improves their reasoning ability. In essence, giving the models more time to “think” allows them to unpack and apply their pre-trained knowledge more effectively.
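In practice, “more time to think” at inference largely comes down to giving the model a generous generation budget so the full reasoning chain fits before the final answer. The sketch below illustrates the idea; the model name, prompt and token budgets are arbitrary choices for illustration.

```python
# Minimal sketch: a tight generation budget tends to truncate the reasoning chain,
# while a large one lets the model lay out its full chain of thought before answering.
# Model name, prompt and token budgets are arbitrary illustrative choices.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

prompt = (
    "Solve the problem step by step, then state the final answer.\n\n"
    "Problem: A train travels 180 km in 2.5 hours. What is its average speed?\nSolution:"
)

truncated = generator(prompt, max_new_tokens=32)[0]["generated_text"]
extended = generator(prompt, max_new_tokens=512)[0]["generated_text"]
print(extended)
```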

“We hypothesize that successful reasoning emerges from the synergy of these two factors: rich pre-trained knowledge and sufficient computational resources at inference time,” the researchers write. “These developments collectively suggest a striking possibility: if models possess rich reasoning knowledge and are given adequate computational space, then activating their reasoning capabilities may require only a small number of high-quality training samples.”

Choosing more complex problems to include in the training dataset can have a significant effect on the trained model’s accuracy on reasoning tasks (source: arXiv)

According to the researchers’ findings, creating useful LIMO datasets hinges on choosing the right problems and solutions. Data curators should prioritize challenging problems that require complex reasoning chains, diverse thought processes and integration of knowledge. The problems should also deviate from the model’s training distribution to encourage new reasoning approaches and force it to generalize.

Accordingly, the solutions should be clear and well organized, with the reasoning steps adapted to the complexity of the problem. High-quality solutions should also provide strategic educational support by gradually building understanding through carefully structured explanations.
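A curation pipeline along these lines can be sketched as a simple filter over a larger candidate pool. The scoring helpers below are hypothetical placeholders for whatever difficulty, novelty and quality signals a team actually has; they are not part of the researchers’ released code.

```python
# Minimal sketch of LIMO-style data curation: keep only candidates that are hard,
# far from the base model's training distribution, and paired with well-structured solutions.
# The scoring helpers are hypothetical placeholders for real difficulty/quality signals.
from typing import Callable

def curate(
    candidates: list[dict],                     # each: {"problem": ..., "solution": ...}
    difficulty: Callable[[dict], float],        # higher = harder, multi-step problem
    novelty: Callable[[dict], float],           # higher = farther from the model's training distribution
    solution_quality: Callable[[dict], float],  # higher = clearer, better-structured reasoning steps
    budget: int = 800,                          # roughly the few-hundred-example scale in the study
) -> list[dict]:
    """Rank candidates by a combined score and keep only a small, high-quality subset."""
    scored = sorted(
        candidates,
        key=lambda c: difficulty(c) + novelty(c) + solution_quality(c),
        reverse=True,
    )
    return scored[:budget]
```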

“By focusing on a minimal yet meticulously curated set of reasoning chains, we embody the core principle of LIMO: high-quality demonstrations, rather than sheer data volume, are key to unlocking complex reasoning capabilities,” the researchers write.

The researchers have released the code and data used to train the LIMO models in their experiments. In the future, they plan to expand the concept to other domains and applications.


 