2025 Handbook for enterprise AI success, from agents to evaluators

2025 is shaping up to be a pivotal year for enterprise AI. Last year saw rapid innovation, and this year will see the same. That makes it more important than ever to revise your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize in their AI strategy this year.

1. Agents: the next generation of automation

AI agents are no longer theoretical. In 2025 they are indispensable tools for businesses looking to streamline operations and improve customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs.

At the beginning of 2024, agents weren’t ready for prime time, making frustrating mistakes like hallucinating URLs. They began to improve as the frontier large language models themselves improved.

“Let me put it this way,” said Sam Witteveen, co-founder of Red Dragon, a company that develops agents for businesses and recently reviewed the 48 agents it built last year. “What’s interesting is that a lot of the ones we built at the beginning of the year worked much better by the end of the year, simply because the models got better.” Witteveen shared this in the video podcast we recorded to discuss these five big trends in detail.

Models are getting better, hallucinating less, and are increasingly trained to perform agent tasks. Another approach model vendors are exploring is using an LLM as a judge: as models become cheaper (something we’ll cover below), companies can run three or more models and let a judge pick the best result.
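
The judge pattern described above can be sketched in a few lines: several models answer, a judge scores each answer, and the best one wins. In the hypothetical sketch below the model and judge calls are stubbed with placeholder functions; in practice each would be a real LLM API call, and the judge would itself be an LLM scoring accuracy and relevance.

```python
# Minimal sketch of the "LLM as judge" pattern. Model and judge calls
# are stubbed out; in practice each would hit a real LLM API.

def stub_model(name):
    """Hypothetical stand-in for a call to model `name`."""
    canned = {
        "model_a": "Paris is the capital of France.",
        "model_b": "The capital of France is Paris.",
        "model_c": "France's capital city is Paris, on the Seine.",
    }
    return lambda prompt: canned[name]

def judge_score(prompt, answer):
    """Hypothetical judge: here it simply rewards longer, more specific
    answers. A real judge would be another LLM grading each candidate."""
    return len(answer.split())

def best_of_n(prompt, models):
    """Ask every model, then keep the candidate the judge scores highest."""
    candidates = [(model(prompt), name) for name, model in models.items()]
    return max(candidates, key=lambda c: judge_score(prompt, c[0]))

models = {name: stub_model(name) for name in ("model_a", "model_b", "model_c")}
answer, source = best_of_n("What is the capital of France?", models)
print(source, "->", answer)
```

The same shape works with any number of candidate models; only the stubs and the judge's scoring function change.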

Another part of the secret sauce? Retrieval Augmented Generation (RAG), which enables agents to store and reuse knowledge efficiently, is getting better. Imagine a travel agent bot that not only plans trips, but books flights and hotels in real time based on updated preferences and budgets.
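
The retrieval step behind RAG can be sketched minimally, assuming a hypothetical travel-preference knowledge store and using simple word overlap in place of the vector embeddings a production system would use:

```python
# Minimal RAG sketch: retrieve the most relevant stored fact and prepend
# it to the prompt. The knowledge entries are invented for illustration.

knowledge = [
    "User prefers window seats and vegetarian meals.",
    "User budget for hotels is $150 per night.",
    "User home airport is SFO.",
]

def retrieve(query, docs):
    """Score docs by word overlap with the query (embedding stand-in)."""
    query_words = set(query.lower().split())
    return max(docs, key=lambda d: len(query_words & set(d.lower().split())))

def build_prompt(query, docs):
    """Prepend the retrieved context so the model can ground its answer."""
    context = retrieve(query, docs)
    return f"Context: {context}\nQuestion: {query}"

print(build_prompt("What hotels fit the budget per night?", knowledge))
```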

The takeaway: Businesses need to identify use cases where agents can deliver a high return on investment, whether in customer service, sales or internal workflows. Tool use and advanced reasoning capabilities will determine the winners in this space.

2. Evals: the foundation of reliable AI

Evaluations, or “evals,” are the backbone of any robust AI deployment. An eval is the process of choosing which LLM – among the hundreds now available – to use for your task. This matters for accuracy, but also for aligning AI output with enterprise goals. A good eval ensures that a chatbot strikes the right tone, a recommendation system surfaces relevant options, and a predictive model avoids costly mistakes.

For example, a company’s evaluation of a customer support chatbot might include metrics on average resolution time, response accuracy, and customer satisfaction scores.
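
That kind of eval can be sketched as a small scoring function over support-ticket logs. The field names and sample data below are hypothetical, chosen to mirror the three metrics named above:

```python
# Sketch of a customer-support chatbot eval over hypothetical ticket logs.
# Field names are illustrative, not a real schema.

tickets = [
    {"resolution_minutes": 4, "answer_correct": True,  "csat": 5},
    {"resolution_minutes": 9, "answer_correct": True,  "csat": 4},
    {"resolution_minutes": 6, "answer_correct": False, "csat": 2},
]

def evaluate(tickets):
    """Aggregate average resolution time, response accuracy and CSAT."""
    n = len(tickets)
    return {
        "avg_resolution_minutes": sum(t["resolution_minutes"] for t in tickets) / n,
        "accuracy": sum(t["answer_correct"] for t in tickets) / n,
        "avg_csat": sum(t["csat"] for t in tickets) / n,
    }

print(evaluate(tickets))
```

The same scaffold lets you compare two candidate LLMs head to head: run both over the same tickets and pick the one with the better aggregate numbers.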

Many companies invest a lot of time in massaging inputs and outputs to fit company expectations and workflows, but this can consume significant time and resources. As the models themselves improve, many companies are saving that effort by relying more on the models to do the job well out of the box, which makes choosing the right one even more important.

And this process demands clear communication, which leads to better decisions. When you “become much more aware of how to evaluate the output of something, and what you actually want, it doesn’t just make you better with LLMs and AI, it actually makes you better with people,” Witteveen said. “When you can clearly articulate to a person: this is what I want, this is what it should look like, this is what I’m going to expect from them. When you get really specific about it, people suddenly perform a lot better.”

Witteveen noted that company managers and developers have told him: “Oh, you know, I’ve gotten a lot better at giving direction to my team, just from getting good at prompt engineering, or getting good at, you know, writing proper evals for models.”

By writing clear evals, companies force themselves to clarify goals – a win for both humans and machines.

The takeaway: Creating high-quality evals is essential. Start with clear metrics: response accuracy, time to problem resolution, and alignment with business goals. This ensures that your AI not only works but also aligns with your brand values.

3. Cost-effectiveness: scaling AI without breaking the bank

AI is getting cheaper, but strategic deployment remains key. Improvements at every layer of the LLM stack are driving dramatic cost reductions, and intense competition among LLM vendors, as well as from open-source rivals, is leading to regular price cuts.

Meanwhile, post-training software techniques make LLMs more efficient.

Competition from new hardware vendors such as Groq’s LPU and improvements from legacy GPU vendor Nvidia are dramatically reducing inference costs, making AI accessible to more use cases.

The real breakthroughs come from optimizing the way models are put to work in applications, that is, at inference time, not at training time, when models are first built from data. Other techniques such as model distillation, along with hardware innovation, mean companies can achieve more with less. It’s no longer a question of whether you can afford AI – you can do most projects much cheaper this year than even six months ago – but how to scale it.
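
The core idea of model distillation mentioned above can be sketched briefly: a small “student” model is fit to the softened output distribution of a larger “teacher,” rather than to hard labels. The toy sketch below, in plain Python with invented logits and temperatures, shows the distillation loss itself, not a full training loop:

```python
# Toy sketch of the model-distillation objective: the student is trained
# to minimize the KL divergence to the teacher's softened probabilities.
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature = softer."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): the distillation loss between teacher and student."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher_logits = [2.0, 1.0, 0.1]  # invented example values
student_logits = [1.8, 1.1, 0.3]

# Raising the temperature softens both distributions, exposing the
# teacher's relative preferences among less likely answers.
for temperature in (1.0, 4.0):
    loss = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    print(f"T={temperature}: distillation loss = {loss:.4f}")
```

In a real training loop this loss would be backpropagated through the student’s parameters while the teacher stays frozen.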

The takeaway: Run a cost-effectiveness analysis of your AI projects. Compare hardware options and explore techniques like model distillation to reduce costs without compromising performance.

4. Memory personalization: tailoring AI to your users

Personalization is no longer optional – it’s expected. In 2025, memory-enabled AI systems are making this a reality. By remembering user preferences and past interactions, AI can deliver more tailored and efficient experiences.

Memory personalization isn’t yet widely or openly discussed, as users often feel uneasy about AI applications storing personal information to improve service. There are privacy concerns, and an ick factor when a model spits out answers that show it knows a great deal about you – like how many kids you have, what you do for a living, and your personal tastes. OpenAI, for one, stores ChatGPT user information in its Memory feature, which can be turned off and its contents deleted, although it is on by default.

While companies building on OpenAI’s and other vendors’ models can’t access that stored memory themselves, what they can do is create their own memory systems using RAG, ensuring the data is both secure and impactful. However, businesses must tread carefully, balancing personalization with privacy.
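
One way such a business-owned memory system might look is sketched below, with a hypothetical `UserMemory` class: data stays in your own store, retrieval uses simple keyword overlap (embeddings in practice), and deletion is supported for the privacy balance just described:

```python
# Sketch of a per-user memory layer kept in the business's own store.
# Only retrieved snippets would ever be sent to the model.

class UserMemory:
    def __init__(self):
        self._store = {}  # user_id -> list of remembered facts

    def remember(self, user_id, fact):
        self._store.setdefault(user_id, []).append(fact)

    def recall(self, user_id, query, top_k=1):
        """Keyword-overlap retrieval; production systems use embeddings."""
        query_words = set(query.lower().split())
        facts = self._store.get(user_id, [])
        ranked = sorted(
            facts,
            key=lambda f: len(query_words & set(f.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]

    def forget(self, user_id):
        """Deletion on request: essential for the privacy trade-off above."""
        self._store.pop(user_id, None)

mem = UserMemory()
mem.remember("u1", "prefers economy flights under $400")
mem.remember("u1", "allergic to peanuts")
print(mem.recall("u1", "book a cheap economy flight"))
mem.forget("u1")
print(mem.recall("u1", "any preferences?"))
```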

The takeaway: Develop a clear memory-personalization strategy. Opt-in systems and transparent policies can build trust while delivering value.

5. Test-time compute: the new frontier of performance and reasoning

Inference is where AI meets the real world. In 2025, the focus is on making this process faster, cheaper and more powerful. Chain-of-thought reasoning – where models break down tasks into logical steps – is revolutionizing how businesses approach complex problems. Tasks requiring deeper reasoning, such as strategic planning, can now be handled effectively by AI.
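
Chain-of-thought prompting can be sketched as a template plus a parser for the final answer. The prompt wording and sample model output below are illustrative, not from any specific model:

```python
# Sketch of chain-of-thought prompting: ask the model to reason in
# explicit steps, then extract the final answer from its output.

def cot_prompt(question):
    """Build a prompt that asks for step-by-step reasoning first."""
    return (
        f"Question: {question}\n"
        "Let's think step by step, then give the final answer "
        "on a line starting with 'Answer:'."
    )

def parse_answer(model_output):
    """Pull the final answer from the last 'Answer:' line, if any."""
    for line in reversed(model_output.splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return None

# Illustrative model output, invented for the example.
sample_output = (
    "Step 1: 12 widgets at $3 each is $36.\n"
    "Step 2: Shipping adds $5, so $41 total.\n"
    "Answer: $41"
)
print(parse_answer(sample_output))  # $41
```

Separating the reasoning steps from the final answer also makes the intermediate logic auditable, which matters for the enterprise workflows described above.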

For example, OpenAI’s o3-mini model is expected to be released later this month, followed by the full o3 model at a later date. Both introduce advanced reasoning capabilities that break complex problems into manageable chunks, reducing hallucinations and improving decision-making accuracy. These reasoning gains show up in areas such as math, coding and scientific applications, where extended thinking helps – though in other areas, such as language synthesis, progress may be limited.

However, these improvements also come with increased computational requirements, and therefore higher operating costs. o3-mini is designed to offer a middle ground, containing costs while maintaining high performance.

The takeaway: Identify workflows that can benefit from advanced inference techniques. Applying your company’s own domain-specific chain-of-thought steps and choosing optimized models can give you an edge here.

Conclusion: Turning insights into action

AI in 2025 isn’t just about adopting new tools; it’s about making strategic choices. Whether it’s deploying agents, refining evals or scaling cost-effectively, the path to success lies in careful implementation. Businesses should embrace these trends with a clear, focused strategy.

For more details on these trends, check out the full video podcast between Sam Witteveen and myself here:

