Building Developer Support Agents – Evan Kormos (Coinbase)
Speaker(s): Evan Kormos (Engineering Manager, Coinbase)
Session: Interrupt 2026 · Day 2 (May 14) · ~3:10 PM PT
Source: in-person audio recording, transcribed locally with Whisper large-v3.
Summary
Evan Kormos, an engineering manager at Coinbase, tells the story of how a small developer-support engineering team went from zero to one on a self-improving, customer-facing AI support system for crypto developers. The Coinbase Developer Platform serves developers building on crypto APIs, wallets, payments, and staking; the original support model was people posting and answering in a Discord server, with a v1 hosted chat service that worked but offered no visibility into how it was performing beyond manually tallying thumbs up/down. The team rebuilt around an agentic foundation: delivery channels in Discord and on the web, Python services as first-class infrastructure, self-hosted LangSmith for tracing, a remote MCP server for developer documentation backed by a RAG fallback (vector DB) for reliability and testing, and LLM-judge evaluation of responses for accuracy and risk using a lightweight model. He demos several agents (Discord AI Chat, a Slack triage agent surfacing Discord activity with links to LangSmith traces, and an internal React-based support assistant) and describes an intent-based workflow that goes from traces and signal search ('just looking at the traces') through tech grooming, testing, cross-model evaluation, and human approval into production. His three takeaways: treat agent engineering as its own discipline, build the glass box before the agent, and the team is the multiplier.
Key Points
- The Coinbase Developer Platform powers a community building on crypto APIs, wallets, payments, staking, and infrastructure; the original support model was people posting and answering in a Discord server, later augmented with AI chat
- The v1 hosted Discord chat service 'was fine' and answered questions, but docs went stale and the team had no visibility — manually tallying thumbs up/down and 'really had no idea what was wrong'
- Rebuilt system design: keep customers where they are (Discord and web), provide quality tools for support on the backend, and make Python services a first-class infrastructure (the team's first service)
- Paired with self-hosted LangSmith for tracing; a remote MCP server for developer documentation was used by the agent, with a RAG fallback (basic knowledge pipeline + vector DB) added because a remote MCP tool can be unreliable for customer-facing agents; the fallback is also a way to test the tool
- Guardrails combine deterministic statements with an LLM judge using a lightweight model to evaluate responses for both accuracy and risk; this was the starting point for safe public auto-response
- Agents demoed: Discord AI Chat (built by team member Giovanni), a Discord support triage in Slack that surfaces internal messages with links to LangSmith trace data, and a locally hosted React-based internal support assistant that controls chat context by user access
- Intent-based workflow: traces and real-time feedback feed signal search ('I'm calling it backtesting... Just looking at the traces') using the CLI/code, then tech grooming with MCP tools, code artifacts (e.g., system prompts), rigorous testing, cross-model evaluation, and human approval before production
- Three takeaways: (1) treat agent engineering as its own discipline needing both product-engineering and ML mindsets across a group, (2) build the glass box before you build the agent — observability is the base layer not an add-on, (3) the team is the multiplier
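The MCP-plus-fallback design in the key points can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Coinbase's implementation: `search_docs_with_fallback`, `flaky_mcp_docs_tool`, and `local_vector_search` are hypothetical names, and both tools are plain stand-in callables rather than real MCP or vector DB clients.

```python
from typing import Callable


def search_docs_with_fallback(
    query: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> tuple[str, str]:
    """Return (source, answer), trying the primary docs tool first."""
    try:
        answer = primary(query)
        if answer.strip():
            return "mcp", answer
    except Exception:
        # Any failure of the remote tool (timeout, network error, bad
        # payload) falls through to the local retriever below.
        pass
    return "rag", fallback(query)


# Hypothetical stand-ins for the remote MCP docs tool and the vector DB search.
def flaky_mcp_docs_tool(query: str) -> str:
    raise TimeoutError("remote MCP server did not respond")


def local_vector_search(query: str) -> str:
    return f"[vector DB] top passage for: {query}"


source, answer = search_docs_with_fallback(
    "How do I create a wallet?", flaky_mcp_docs_tool, local_vector_search
)
print(source, answer)
```

Returning the source alongside the answer is one way to use the fallback as a test of the tool: the same query can be sent down both paths and the results compared offline.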
Notable Quotes
build the glass box before you build the agent. Observability isn't a feature you add later. It's really the base layer, the starting point for everything else
The team is the multiplier. AI lets a small team do extraordinary things, but only if they are aligned with the collective execution.
I'm calling it backtesting. I don't know. Just looking at the traces.
Full Transcript
Timestamped transcript (auto-generated; lightly cleaned)
[00:00] Please welcome to the stage Evan from Coinbase. I thought that was a cool presentation, but he
didn't. He said he was going to talk about how it was traceable, but he didn't actually talk about
that. What a great event so far. I want to try to set up a strong commission. Yeah, so am I. My name
is Evan Kormos. I'm a builder and engineer at Coinbase. Today I'm excited to talk to you about our
developer support engineering team and describe
[00:32] the transformation in how we shifted to scale AI support for crypto developers. It's a story of how a
small team was able to go from zero to one on a self-improving, customer-facing, and debt-taking
system. I believe that. First, we'll frame up the support challenge and how we're scaling for
Coinbase developers. Next, I'll discuss our approach for monitoring agent behavior. Then we'll dive
into some technical details and describe and show some of the capabilities
[01:06] and how they were produced. Throughout, I'll be referencing team members, our agent engineers, to
describe how this transformation unfolded, supported by Coinbase infrastructure, leadership, and
partners. First, a bit of setup so you know where we're coming from. The Coinbase developer platform
powers a community of developers building crypto APIs, building on crypto APIs, wallets, payments,
staking, infrastructure, power, and blockchain
[01:39] account. For a long time, our support model was people posted in our Discord server, people answered
in the Discord server. We added AI chat, and we'll talk a little bit more about that. Of course,
this team is building with AI. I know there's a lot of attention around us, especially with our
company. But this is great, and I think it demonstrates our conviction to increase product velocity
and quality. Okay, let's dive into the challenge.
[02:12] In order to scale, we need to keep being great for a developer who is working a shift. And that's
where we're going to be. So, let's get started. Okay. So, we started as a side project for a trading
bot. But now, we have new challenges as we have new markets and new expectations into our scope. For
this team, working together on a common goal to serve the customer is focused by our PM, Harry, who
helped shape this work and prioritize. So, to recap, our goal was to keep customer satisfaction high
while increasing automation.
[02:43] So, we're going to talk about that. Let's get into the story. As I mentioned, we started with our
version one Discord chat. We started playing a lot of teams and integrated a hosted chat service.
And honestly, the first version was fine. It answered developer questions. Sometimes the docs got
stale. We had no way of knowing how people were using it, if it was doing a good job. We sort of had
to manually tally the downloads and the thumbs up and the thumbs down. I guess it's a little bit of
a challenge. But we had to do it. So we really had no idea what was wrong. So, a familiar situation.
[03:15] How do you stand up a new service that's as good as the one that you have, do it better, and get
better every time? It did require some new capabilities. I'll talk about paired with our existing
paid throws but the goal was to harness agent behavior. Here's where we landed on system design. For
delivery channels we still meet the
[03:50] customers where they are, in Discord and on the web, and on the backend we provide quality tools for
support. With the agentic foundation we could build more than partner Slack channels or provide the
LCP channels to generate in the UX a lot as possible with this stuff. A huge unlock for us was
including Python services as first-class infrastructure. This is the team's first service at this
time. We paired it with
[04:21] self-hosted LangSmith for tracing. A big driver of this transformation was Susan who is here today
to chat with us. There's him. There he is. Alright. A big driver. So he set this up from a proof
of concept and brought these skills onto the team. For more stories like this I'll give a plug to
the Coinbase engineering blog. Also a big thank you to our AI platform team who operates LangSmith
and other infrastructure for our and other teams. Finally we have our own set of shared tools. A remote
[05:02] MCP server for developer documentation was used by the agent. For customer-facing agents, a remote MCP
tool can be a bit unreliable. So what we did was add a RAG fallback. This is also a way for us to
test our tool. So behind the MCP we have a pretty basic knowledge pipeline, vector DB, which is the
RAG fallback. Lastly hooked up to a variety of internal services, some event driven, some proxy to
[05:37] external services. Bringing it all together you have on-demand and ambient agent flows with our
overall pulse on the agent behavior. So building full-stack and now adding AI, this team already
shipped agents within this framework with more fun. The first three we'll showcase starting with
Discord chat and Slack triage. See how both of these are helping shape progress towards this for public
[06:11] response. The support engineer assistant is an exciting new surface for helping customers. We'll
take a look at that if you promise lot. We've also built or are building a handful of compliance
risk reduction or service quality related agents. Okay, so this is our Discord support bot and
there's a menu where users can launch AI Chat. Once open, they can open case, they can also open
case management other than Zoom to bot
[06:44] where we automatically will be able to respond to the channel. Starting in Discord AI Chat, a
developer can get expert responses to guide them on technical documentation or how to get further
agent support. Alright, I'm excited to tell you all about Giovanni. He's our team member that picked
these skills up and built a wonderful starting point for Discord AI Chat and so on. So we'll dive
into a little bit more detail on how we handle the customer and the state.
[07:17] So even though our agent doesn't have access to tools to transact or access sensitive data, you can
see there's still a lot of care that went into detecting and preventing issues. Alright. So when you
get the term guardrails, you can see there are deterministic statements in place. You can also see
the developer documentation tool that we mentioned, which is needed for the initial burden. LLM
judgment was applied to the output using a lightweight model to evaluate the response for both
accuracy and risk.
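A rough sketch of the two-layer guardrail described in this segment: deterministic checks run first, then a judge scores the draft for accuracy and risk. Everything named here is an assumption for illustration: the blocked pattern, the `Verdict` fields, the thresholds, and `stub_judge`, which stands in for a call to a lightweight model.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical deterministic rule: never auto-send a reply that asks
# the user to hand over secrets.
BLOCKED_PATTERNS = [
    re.compile(r"share your (seed phrase|private key)", re.IGNORECASE),
]


@dataclass
class Verdict:
    accuracy: float  # 0.0 to 1.0, higher is better
    risk: float      # 0.0 to 1.0, lower is better


def passes_deterministic_checks(text: str) -> bool:
    return not any(p.search(text) for p in BLOCKED_PATTERNS)


def should_auto_respond(
    question: str,
    draft: str,
    judge: Callable[[str, str], Verdict],
    min_accuracy: float = 0.8,
    max_risk: float = 0.2,
) -> bool:
    # Cheap rule-based checks first; only call the judge if they pass.
    if not passes_deterministic_checks(draft):
        return False
    verdict = judge(question, draft)
    return verdict.accuracy >= min_accuracy and verdict.risk <= max_risk


def stub_judge(question: str, draft: str) -> Verdict:
    # Stand-in for a lightweight LLM call that returns structured scores.
    return Verdict(accuracy=0.9, risk=0.1)


print(should_auto_respond("How do I create a wallet?", "Use the SDK quickstart.", stub_judge))
```

A draft that fails either layer would be routed to a human rather than posted publicly; that routing is left out of the sketch.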
[07:52] This design was our starting point for public auto response but also is a way for us to safely
handle customer queries going forward. So under our next agent, the Discord support triage in Slack.
So I'll give you a second to check out the screenshots, but you can see within the Slack triage
channel, we have messages surfaced internally based on what's happening in Discord. We also have
links that you can provide directly into LangSmith trace data on the initial version as the
classifiers were used.
[08:28] In the future, this can also serve as a lens and control plane to flag signals for improvement as
part of the workflow or to sort of view and keep track of the agency motion. Alright. On to our AI
assistant. So you can see here within our service console, we have a new support assistant that
we're trying out, and it runs internally on the local server. It's a very simple support assistant.
[08:59] It's based internally on a locally hosted React site. It allows the user to control the chat context
based on what they can access in the underlying system. Initially, we only set up the building
blocks and human in the loop, and I decided to add functions like sending us a response in the near
future. Because this could be popped out of a new window, we can have a responsive UX and really
control the experience very well. I'm excited to see where this goes and potentially for broader
adoption and feedback.
[09:32] Okay. So everything we do is intent-based. This is a little bit about the way we work. I'll give a
plug actually for the Linear Slack bot. It does a great job of converting our conversational context
to intent-based work. Okay. So these tools may vary across teams.
[10:05] This is the way our team works. But everything we do is intent-based, like I said. Once we get the
intent driven, we then go through the tech grooming process using Claude Code and a number of MCP
tools. Really go deep research into the planning and execute work. We produce a number of code
artifacts. Some could be system prompts, things like that. They go through rigorous testing,
cross-model evaluation, and human approval before outputting and resulting in a production system.
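The last step of this workflow, cross-model evaluation plus human approval, could be expressed as a simple release gate. This is a sketch under assumptions: `ready_for_production`, the model labels, and the 0.8 threshold are all hypothetical, and the talk does not say how scores are computed.

```python
def ready_for_production(
    scores_by_model: dict[str, float],
    human_approved: bool,
    min_score: float = 0.8,
) -> bool:
    """Gate an artifact (e.g. a system prompt) on eval scores and sign-off."""
    if not scores_by_model:
        # No evaluation results means nothing to approve.
        return False
    every_model_passes = all(s >= min_score for s in scores_by_model.values())
    return human_approved and every_model_passes


# Hypothetical eval scores for one candidate system prompt on two models.
candidate_scores = {"model-a": 0.91, "model-b": 0.84}
print(ready_for_production(candidate_scores, human_approved=True))
```

Requiring every model to clear the bar, rather than averaging, keeps a prompt that only works on one model from slipping through.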
[10:41] From the production system, we get traces and real-time feedback as we showed from the prior graph.
We can then use LangSmith, the CLI, and Claude Code to look at the traces and find signals. And then
drive more intent from that. So let's dive deeper. I'm calling it backtesting. I don't know. Just
looking at the traces. I think it's a great way to begin to improve startup evals go live.
[11:12] And it's a great way to use natural language to investigate your working system. For us to perform a
signal search in our Discord chat, we plug in Claude Code. We prompt it in a basic process. We look at
the returns of the chat a certain way, profile them both randomly and deterministically. One
example, we recently double-checked our security dashboards. What we found was applicable more to
security. Deep diving on specific threads, we were able to discover signals to improve or harden
the system in some way.
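One way to picture profiling traces "both randomly and deterministically" is a rule-based filter next to a seeded random sample. The trace schema below (`id`, `feedback`, `risk`) is invented for illustration and is not the LangSmith data model; in practice the records would come from exported trace data.

```python
import random


def deterministic_signals(traces: list[dict], max_risk: float = 0.5) -> list[dict]:
    """Rule-based pass: keep traces with a thumbs down or a high risk score."""
    return [
        t for t in traces
        if t.get("feedback") == "down" or t.get("risk", 0.0) > max_risk
    ]


def random_sample(traces: list[dict], k: int, seed: int = 0) -> list[dict]:
    """Random pass: a reproducible sample for manual review."""
    rng = random.Random(seed)
    return rng.sample(traces, min(k, len(traces)))


# Hypothetical exported trace records.
traces = [
    {"id": "t1", "feedback": "up", "risk": 0.1},
    {"id": "t2", "feedback": "down", "risk": 0.2},
    {"id": "t3", "feedback": None, "risk": 0.7},
    {"id": "t4", "feedback": "up", "risk": 0.0},
]

flagged = deterministic_signals(traces)
sampled = random_sample(traces, k=2)
print([t["id"] for t in flagged])
```

The deterministic pass catches known failure shapes, while the random sample can surface problems nobody wrote a rule for yet.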
[11:44] On the topic of security testing, we thought an adversarial data set would be good to have. But it
turns out they're not easy to generate, actually, due to the terms and
conditions on some of the . Also, in an observation, we initially had, immediately had multi-level
conversations going on in our Discord. And Discord was something we wanted to prepare for. But with
regard to adversaries, consider multi-level ads.
[12:17] We still have some work to do on our automation numbers. I think that at the end of the loop, we'll
have a lot of work to do. But we'll really
drive efficiency and provide a core level of tracking, including speccing out what common tokens
were spent to solve the customer case. Looking at this customer, Trace Ammonial, I was excited to
learn that we're helping customers build. So it's incredible what that enables a small team to
build, truly.
[12:47] The hard part of making it work for the customer and balances them is the input of human and machine
efforts. So I'd really take three things away. One, treat agent engineering as its own discipline.
This is something Harrison, I think, really hit home on at last year's Interrupt, and something we
took to heart. You really need a balance of product engineering and ML mindset strengths across a
group of people, not just an individual.
[13:17] And so I think hiring and growing with balance is super important. Two, build the glass box before
you build the agent. Observability isn't a feature you add later. It's really the base layer, the
starting point for everything else that materialize out. Three, the team is the multiplier. The team
is the multiplier. AI lets a small team do extraordinary things, but only if they are aligned with
the collective execution.
[13:48] All right, I'd like to say thank you to everyone at LangChain and all the Interrupt sponsors for
this great event. To close out our sessions, please welcome to the stage from Berkeley,