Do AI reasoning models require new approaches to prompting?




The era of reasoning AI models is in full swing.

After OpenAI kick-started yet another AI revolution with its o1 reasoning model, introduced in September 2024 (which takes longer to answer questions, but with the payoff of higher performance, especially on complex, multi-step problems in math and science), the commercial AI field has been flooded with imitators and competitors.

These include DeepSeek's R1, Google's Gemini 2.0 Flash Thinking, and, just today, LlamaV-o1, all of which aim to offer built-in "reasoning" similar to OpenAI's new o1 and upcoming o3 model families. These models engage in "chain-of-thought" (CoT) prompting, or "self-prompting," forcing them to think through their analysis midstream, double back, check their own work, and ultimately arrive at a better answer than simply firing one off from their embeddings as quickly as possible, as other large language models (LLMs) do.

Yet the high cost of o1 and o1-mini ($15.00/1M input tokens vs. $1.25/1M input tokens for GPT-4o on OpenAI's API) has led some to balk at the purported performance gains. Is it really worth paying 12x more than a typical, state-of-the-art LLM?

As it turns out, there is a growing number of converts, but the key to unlocking the true value of reasoning models may lie in users prompting them differently.

Shawn Wang (founder of the AI news service Smol AI) featured on his Substack over the weekend a guest post by Ben Hylak, a former Apple Inc. interface designer for visionOS (which powers the Vision Pro spatial computing headset). The post has gone viral because it convincingly explains how Hylak prompts OpenAI's o1 model to get incredibly valuable outputs (for him).

In short, instead of writing terse prompts for the o1 model, human users should consider writing "briefs": longer, more detailed explanations that include plenty of context up front about what the user wants the model to output, who the user is, and in what format they want the model to output the information.

As Hylak writes on Substack:

With most models, we are trained to tell the model how we want it to respond to us, e.g. "You are an expert software engineer. Think slowly and carefully."

This is the opposite of how I've had success with o1. I don't instruct it on the how, only the what. Then I let o1 take over, plan, and decide its own steps. This is what autonomous reasoning is for, and it can actually be much faster than if you were to manually review and chat as the "human in the loop."
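To make the "brief" approach concrete, here is a minimal sketch using the OpenAI Python SDK. The brief text and the hiking scenario are illustrative assumptions, not Hylak's actual prompt, and the model name may need adjusting to whatever reasoning model your account can access.

```python
# Minimal sketch of the "brief" style of prompting with the OpenAI Python SDK.
# The brief below is an invented example, not Hylak's actual prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A "brief": plenty of upfront context on the goal, the user, and the desired
# output format -- but no step-by-step instructions on *how* to reason.
brief = """
Goal: Recommend a weekend hiking itinerary near San Francisco.

About me: I hike roughly once a month, prefer loops under 8 miles,
and am traveling with a dog.

Output format: A ranked list of 3 trails. For each, give distance,
elevation gain, dog policy, and a one-sentence reason it fits me.
"""

response = client.chat.completions.create(
    model="o1",  # assumed model name; "o1-mini" is a cheaper option
    messages=[{"role": "user", "content": brief}],
)
print(response.choices[0].message.content)
```

The point is that the entire message is the "what": the model is left to plan its own steps rather than being walked through them.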

Hylak also includes a great annotated screenshot of an example o1 prompt that produced useful results for a list of hikes:

The blog post was so helpful that OpenAI's own president and co-founder Greg Brockman re-shared it on his X account with the message: "o1 is a different kind of model. Great performance requires using it in a new way relative to standard chat models."

I tried it myself in my ongoing quest to learn to speak Spanish fluently, and here is the result, for the curious. Maybe it's not as impressive as Hylak's well-constructed prompt and response, but it definitely shows strong potential.

Separately, even when it comes to non-reasoning LLMs like Claude 3.5 Sonnet, there may be room for regular users to improve their prompts to get better, less constrained results.

As Louis Arge, former Teton.ai engineer and current creator of the neuromodulation device openFUS, wrote on X: "one trick I've found is that LLMs trust their own prompts more than my prompts," and he gave an example of how he persuaded Claude to be "less of a coward" by first "trigger[ing] a fight" with it over its outputs.
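One way to apply that observation (the article doesn't show Arge's exact prompts, so this is an assumption) is to prefill part of an assistant turn, so the model continues from words it appears to have chosen itself. A rough sketch with the Anthropic Python SDK:

```python
# Hypothetical sketch of "putting words in the model's mouth" via
# assistant-turn prefilling with the Anthropic Python SDK. This is one way
# to act on Arge's observation, not his published method.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Critique this business plan in two sentences: "
                       "a subscription service for houseplant care reminders.",
        },
        # Prefill the assistant turn so the model continues from a blunt
        # stance it seems to have adopted on its own.
        {"role": "assistant", "content": "Frankly, the biggest weakness I see is"},
    ],
)
print(response.content[0].text)
```

Because the model treats the prefilled text as its own prior output, it tends to carry that tone forward rather than hedging back toward a cautious default.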

All of this shows that prompt engineering remains a valuable skill as the AI age progresses.


 