Galileo launches Agentic Evaluations to fix AI agent mistakes before they cost you




Galileo, a San Francisco-based startup, is betting that the future of artificial intelligence depends on trust. Today the company launched a new product, Agentic Evaluations, to address a growing challenge in the world of AI: making sure that increasingly complex systems known as AI agents actually work as intended.

AI agents – autonomous systems that perform multi-step tasks such as generating reports or analyzing customer data – are gaining popularity across industries. But their rapid adoption raises a crucial question: How can companies verify that these systems remain reliable after deployment? Galileo’s CEO, Vikram Chatterjee, believes his company has found the answer.

“In the last six to eight months, we’ve started to see some of our customers try to adopt agent systems,” Chatterjee said in an interview. “LLMs can now be used as an intelligent router to select the correct API calls to actually perform a task. Going from just generating text to actually doing a task was a very big gap that was unlocked.”

Diagram showing how Galileo evaluates AI agents at three key stages: tool selection, error detection, and task execution. (Credit: Galileo)

AI agents show promise, but enterprises demand accountability

Large enterprises such as Cisco and Emma (the latter founded by Coinbase’s former COO) have already adopted the Galileo platform. These companies are using AI agents to automate tasks from customer support to financial analysis and are reporting significant productivity gains.

“A sales rep trying to do outreach and outbound would otherwise take maybe a week of their time to do that, compared to some of these AI-enabled agents, they do it in two days or less,” Chatterjee explained, emphasizing the return on investment for businesses.

Galileo’s new framework evaluates the quality of tool selection, detects errors in tool calls, and tracks overall session success. It also monitors cost and latency, two key metrics for large-scale AI deployment.

Dashboard showing how Galileo evaluates AI agents at three key stages: tool selection, error detection, and task execution. (Credit: Galileo)

$68 million in funding fuels Galileo’s push into enterprise AI

The launch builds on Galileo’s recent momentum. The company raised $45 million in Series B funding led by Scale Venture Partners last October, bringing its total funding to $68 million. Industry analysts predict that the AI operations tools market could reach $4 billion by 2025.

The stakes are high as AI deployment accelerates. Studies show even advanced models like GPT-4 can hallucinate about 23% of the time during basic question-and-answer tasks. Galileo’s tools help businesses identify these issues before they impact operations.

“Before we release this thing, we really, really need to know that this thing works,” Chatterjee said, describing customers’ concerns. “The bar is really high. So there we gave them that toolchain so they could just use our metrics as a basis for those tests.”

Addressing AI hallucinations and challenges at enterprise scale

The company’s focus on reliable, production-ready solutions positions it well in a market increasingly interested in AI safety. For technical leaders deploying AI in the enterprise, Galileo’s platform provides essential safeguards to ensure AI agents perform as intended while controlling costs.

As enterprises expand the use of AI agents, performance monitoring tools become key infrastructure. Galileo’s latest offering aims to help businesses deploy AI responsibly and efficiently at scale.

“2025 will be the year of the agent. It will be very fruitful,” noted Chatterjee. “However, what we’ve also seen is that many companies simply releasing these agents without good testing has led to negative consequences… The need for proper testing and evaluations is greater than ever.”

