4 docs tagged with "llm-as-judge"

Building Reliable Agents - Agent Evaluations

Harrison Chase from LangChain discusses the critical role of evaluations in moving AI agents from prototype to production.

Building Reliable Agents - Evaluation Challenges

Explores the challenges of evaluating agent reliability and LLM performance.

How Lyft Builds Evals That Actually Matter in Production – Nick Ung

Nick Ung, who leads data science for safety and customer care at Lyft, explains how his team builds evals for AI Assist, Lyft's customer care AI agent product.

Make Legal Write Your Evals – Chime

An engineer at Chime (the US consumer fintech with 9.5 million members) describes how the team built 'Jade,' an always-on agentic financial co-pilot based on deep agents, and …