Building Reliable Agents - Agent Evaluations
Harrison Chase from LangChain discusses the critical role of evaluations in moving AI agents from prototype to production.
Exploring the challenges of evaluating agent reliability and LLM performance.
Nick Ung, who leads data science for safety and customer care at Lyft, explains how his team builds evals for AI Assist, Lyft's customer care AI agent product.
An engineer at Chime (the US consumer fintech with 9.5 million members) describes how the team created 'Jade,' an always-on agentic financial co-pilot built on deep agents, and …