Breakthrough Agents - Building Our Digital Workforce with LangGraph

Assaf Elovic

Summary
The journey of building an AI agent for campaign creation involved several key steps and insights, ultimately leading to a hierarchical multi-agent architecture. Here's a structured summary of the process and reflections:
Initial Approach:
- Started with a simple ReAct agent, which was easy to implement but produced mediocre results on the more complex parts of campaign creation.

Workflow-Based Approach:
- Transitioned to a directed-graph workflow for better output quality, though it became cumbersome to change as the product evolved.

Multi-Agent Architecture:
- Implemented a hierarchical model with one supervisor node and several specialized sub-agents (e.g., researcher, positioning report generator, LinkedIn message writer, email writer). This approach broke complex tasks into manageable parts and kept the flexibility of the ReAct agent while improving performance.

Key Insights and Lessons:
- Simplicity is key: complex structures can hinder long-term efficiency.
- Prompt engineering: essential for fine-tuning agent performance.
- Tools over skills: agents should be equipped with the right tools and clear instructions rather than being made overly smart.
- Human-like collaboration: viewing agents as part of a team of coworkers is a useful mental model for design.

Integration and Scalability:
- The agent was integrated into the product, likely through a dedicated interface or seamless user flow.
- The architecture is scalable; additional sub-agents or machine learning integration could enhance capabilities for more complex tasks.

Future Considerations:
- The current setup may need adjustments as needs evolve, particularly as AI models and tools advance.
In conclusion, the journey underscores the importance of balancing simplicity with structured approaches, ensuring adaptability for future growth.
Transcript
Please welcome Assaf, Head of AI at Monday. Assaf will discuss how Monday is building their digital workforce with LangGraph. Speaker B: Hey everyone, so great to meet you all. Today I'm going to talk to you about how we're building our digital workforce at monday.com. So very quickly about myself: I'm Head of AI at Monday, I'm a scout at Sequoia, I currently live in Israel, and I'm super jet lagged right now. I've also been building AI products for the past decade, including some you may know, like GPT Researcher and Tavily. So, monday.com. Maybe some of you know a bit about us, but basically we are a public company. We build a work OS where you can manage and do all your work in one single platform, whether that's CRM, dev, service, or work management. We actually crossed 1 billion dollars in ARR just this year. And there's one important factor that I think is worth noting: we're actually processing around 1 billion tasks per year. I want you to think about this for a second, because when you think about 1 billion work tasks per year, just think about the opportunity for agents and AI that can actually take on those tasks. This is the huge opportunity we see at Monday when we think about AI. We actually launched our first AI feature around September last year and we've seen insane hyper growth; we've been growing 100% month over month with our AI usage. And just recently we launched our digital workforce. So when you think about what a digital workforce is, think about agents working around the clock, whether you're an SMB looking to scale up or an enterprise. Imagine agents working within the Monday ecosystem on any given task you can think of. And what I'm going to show you today is very powerful lessons learned from our experience building agents. It was said earlier here today by the Harvey team, and I think others, that to build very successful agents you have to focus on product and user experience. We have a saying at Monday that the biggest barrier for adoption is distrust; it's actually not technology. And I want to show you a few examples of things that we learned. So when we think about autonomy, I think, you know, we're all engineers and we love to think about autonomous agents, agents doing everything around the clock. But actually the opposite is true when we see how our users are using agents and what they think. Every company, every user has a different risk appetite, and when you build AI agents, you should give users that control. What we've learned by applying this is that we've actually increased adoption, simply by giving users the control to decide how much autonomy they want to give their agents. Secondly is entry points. Now, if you're building a startup from scratch, that's something else. But at a huge company like Monday, one thing that we've learned is: don't rebuild a new user experience. Try to think how you can create these experiences within your existing products. So when you think about how agents can work at Monday, we already have people working in Monday, and in Monday we just assign people, so we can do the same with agents. Just think about how you can assign digital workers or agents to actual tasks. And by doing that, our users have no new habits that they have to learn. It's seamless within their experience.
Another super important thing that we've learned: originally when we released those agents, you could ask in the chat and say things like, create this board, create this project, modify this item. For our users, Monday boards are production data. A very good example I like to give is to think about Cursor AI, which is an amazing product. We all vibe code, as Andrew Ng said earlier. But imagine if with Cursor AI, instead of you as developers seeing the code, it was pushed straight to production. I assume that most of you would not have used it, right? And that is just how important user experience is versus technology, because technologically we could do that; Cursor could have done that. And what we saw is users onboarding, testing the agents out, and once it came time to actually push content to the board, they simply froze. So we introduced a preview, and that increased adoption dramatically, because users now have that guardrail before they actually save, and they know what the output is going to be before it's saved. So when you think about building AI experiences, think about previews, think about human in the loop, think about how users can have that control and understanding before the AI releases to production. And lastly is explainability. Now, explainability, we've heard a lot about, and I think for many people it kind of feels like a nice to have. But explainability is much more than that. Think about explainability as a way for your users to learn how to improve their experience with the AI over time, because when they have an understanding of why the outputs happen, they have the ability to change the outcomes. So these four are super important components that we've actually introduced into the product, and they increased adoption very, very nicely. Now let's talk about the tech. We actually built our entire ecosystem of agents on LangGraph and LangSmith. We've tested out various frameworks and we found LangGraph to be the number one by far. Just a few examples: what's great about LangGraph is that it's not overly opinionated, but it still handles everything you don't want to deal with as an engineer, like interrupts and checkpoints, persistent memory, human in the loop. Those are critical components that we don't want to build ourselves, and we get them out of the box. On the other hand, we have really good options to customize it for exactly what we need; I'll show you an example in a second. And additionally, native integrations. We now process millions of requests per month using LangGraph and it's proven to be super scalable. So let's take a look at what this looks like under the hood. We have LangGraph at the center of everything we're building, and around our LangGraph engine, which also uses LangSmith for monitoring, we have what we call AI blocks, which are basically internal AI actions that we developed at Monday. We've actually built our own evaluation framework, because we believe that evaluation is one of the most important aspects of building AI, and there's been a lot said about evaluation today as you've seen, so I'm not going to dive into that. And then we also have our AI gateway, which is our way of controlling what kinds of inputs and outputs are allowed in the system.
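To make the preview and human-in-the-loop ideas above concrete, here is a minimal sketch (not Monday's actual code) of how a LangGraph graph can pause before anything is written to a board, using a checkpointer plus interruptBefore; the node names and state fields are hypothetical.

```typescript
// Minimal human-in-the-loop sketch: pause before the "save" step so the user
// sees a preview. Node names and state fields are hypothetical, not Monday's code.
import { Annotation, StateGraph, START, END, MemorySaver } from "@langchain/langgraph";

const CampaignState = Annotation.Root({
  draft: Annotation<string>(),     // content the agent wants to push to the board
  approved: Annotation<boolean>(), // set by the human after reviewing the preview
});

const graph = new StateGraph(CampaignState)
  .addNode("generateDraft", async () => {
    // In a real system this would call an LLM; stubbed for the sketch.
    return { draft: "Proposed board items..." };
  })
  .addNode("saveToBoard", async (state) => {
    if (!state.approved) return {}; // guardrail that lives outside the LLM
    // push state.draft to the board via an API call here
    return {};
  })
  .addEdge(START, "generateDraft")
  .addEdge("generateDraft", "saveToBoard")
  .addEdge("saveToBoard", END)
  // The checkpointer persists state so the run can stop here and resume later.
  .compile({ checkpointer: new MemorySaver(), interruptBefore: ["saveToBoard"] });

// First run stops right before "saveToBoard", so the UI can show a preview.
const config = { configurable: { thread_id: "demo" } };
await graph.invoke({ draft: "", approved: false }, config);
// After the user approves in the UI, update state and resume from the checkpoint.
await graph.updateState(config, { approved: true });
await graph.invoke(null, config);
```

Nothing reaches production data until the paused run is explicitly resumed, which is exactly the preview-style guardrail described above.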
Now let's take a look at our first digital worker that we released, the Monday Expert. What you see here is a conversational agent using the supervisor methodology. Our system holds four different agents. We have a supervisor; we have a data retrieval agent, which is in charge of retrieving all data across Monday, for example knowledge base and board data, and we also use web search. Then we have our board actions agent, which performs actual actions on Monday. And lastly we have the answer composer, which, based on the user, the past conversations, tone of voice, and all kinds of other parameters defined by the Monday user, actually composes the final answer. We've even added a really awesome tool called undo: we give the supervisor the ability to dynamically decide what to undo within the actions based on user feedback, which, by the way, has proven to be one of the coolest things we've built. And I want to share a bit of our lessons learned as we built this agent and what we're seeing. So when you're building conversational agents, assume that 99% of user interactions are ones you won't know how to handle. And it's proven statistically, right? When you think about the infinite number of things users can ask, you've probably only handled 1%. And for this we learned to start with a fallback: what happens in the 99% of interactions that we don't know how to handle? For example, what we did was, if we detect that the user is asking for some action we don't know how to handle, we search our knowledge base and give them an answer for how they can do it themselves. That's one way of resolving the fallback. Evals: we've talked about them so much today, so I won't dive into it, but I think the bottom line with evals is that evals are your IP. Models change, technology is going to change so much over the next few years, but if you have very strong evaluation, that is your IP; that will allow you to move much faster than your competitors. Human in the loop is critical. We've talked about this a lot. As I said at the beginning, for those who have really shipped AI to production, I think we've seen that it's one thing to bring AI to 80%, but then it takes another year to get to 99%. And this is a very important rule, because we felt really confident when we were working locally; once we shipped to production, we realized how far we were from an actual product. I can see some of the audience resonating with me. And guardrails: we highly recommend that you build them outside the LLM. We've seen things like LLM as a judge, and even going back to the Cursor idea, by the way, I think Cursor is such a great example of how to build a good product experience. Because I don't know if you've noticed, especially with vibe coding, after 25 runs it stops, right? That's an external guardrail they put in: no matter whether it's actually running successfully, after 25 rounds it stops. Just think about how you can create those guardrails outside the LLM. And then lastly, and this is a very interesting one: it might be obvious that it's smart to break your agent into sub agents, right? Obviously we have specialized agents; they work better. But what we've learned is that there is a very important balance, because when you have too many agents, what happens is what we like to call compound hallucination. Basically, it's a mathematical problem. Right? 90% accuracy times 90% accuracy for the second agent, times a third, times a fourth. Even if they're all at 90%, you're now down to around 70%. And it's mathematical; it's proven mathematically. Right.
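To put that compounding in numbers, assuming roughly independent errors, chaining agents that are each about 90% reliable multiplies out like this:

```latex
0.9^2 = 0.81, \qquad 0.9^3 \approx 0.73, \qquad 0.9^4 \approx 0.66
```

So three or four chained 90%-accurate agents already land in the 65-73% range end to end, which is the trade-off being described.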
So I think there's a very important balance between how many agents you want in your multi agent system, versus having too many or too few. And it's something where I think there's no rule of thumb; it's something you have to iterate on based on your use case. So let's talk about the future of work. We believe that the future of work, as we see it at Monday, is all about orchestration, and I want to give you an example. This is a real use case that we tried to work on internally. We just had our earnings report a few days ago, and for those of you working in large public companies, or if you've been involved in earnings reports, it's a tedious process. There's so much data and narrative information across the company that we have to gather, and so many people involved. So we said, what if we automate this? What if we had a way to create an entire workflow that would automatically produce everything we need for earnings? That would be a dream, right? But there's one problem with this, and the problem is that it will only. Speaker A: Run once a quarter. Speaker B: We'd invest entire months building an amazing workflow, and then we'd run it once, and by the next time we run it, AI is going to change dramatically. New models are going to come out, everything's going to change in the world, and then we're going to have to rebuild everything, right? So it got us thinking about how we can solve this. I want you to imagine: what if there was a finite set of agents that could do an infinite number of tasks? Now the irony is that this is not some big dream. This is exactly how we work as humans. When you think about us, we each have our specialized skills. Some are engineers, some are data analysts. And every time there is a task at work, some of us do A and some of us do B. So there's no reason why we shouldn't work this way with agents and AI. So when we think about the future, we think about what we see here. Imagine that for the same task I showed you earlier, we had a dynamic way to orchestrate a dynamic workflow, with dynamic edges and dynamic rules, choosing very specific agents that are perfect for the task, running the task, and then dynamically resolving. This is super exciting, and it's one of the things we're working on with LangChain, and we really want to see it come to life in the future. So lastly, we're actually opening our marketplace of agents to all of you, and we'd love to see you join the waitlist and join us in building and trying to tackle these 1 billion tasks we are trying to complete. So thank you very much everyone. This was a pleasure. Speaker A: Thank you, Assaf. And please welcome Sherwood and Keith from 11x. They built their agents Alice and Julian on LangGraph and the LangGraph Platform. They'll share their lessons learned. So let's give them a warm welcome. Speaker B: Hey everyone. How's it going? My name is Sherwood. I am one of the tech leads here at 11x, leading engineering for our Alice product. And today I'm joined by Keith, our head of growth, who is the product manager for the Alice project. Now, 11x, for those of you who are unfamiliar, is a company that's building digital workers. We have two digital workers today. The first is Alice, she's our AI SDR. The second is Julian, he's an AI voice agent. And we've got more workers on the way. I want to take everybody back to September 2024. For most people, that's not long ago. For us, it's half the company's history.
We had just crossed 10 million in ARR. We had just announced our Series A, then our Series B 15 days later. With all this chaos going on, we relocated our whole team and company from London to San Francisco, to our beautiful new office with our beautiful new CTO. At the same time, we also bought a rocket, because we're 11x. During all this chaos, we chose this moment to rebuild our core product from the ground up. And the reason we did that is because we truly felt at the time, and it proved to be true, that agents were the future. So in today's talk, I want to first tell you why we felt the need to rebuild Alice from scratch. Hopefully, I think everyone is probably in agreement about agents being the future. Then I'll tell you how we did it; we built this enterprise grade AI SDR in just three months. Then I want to talk you through one of the main challenges we ran into, our agent architecture. And I'll wrap up with some reflections on building agents and some closing thoughts. So let's start with the decision to rebuild. Why did we feel like we needed to rebuild our core product from scratch at such a critical moment? Well, to understand that question, you first need to understand Alice 1. Alice 1 was our original AI SDR product. The main thing that you could do with Alice was create these custom AI powered outreach campaigns. There were five steps involved in campaign creation. The first step is defining your audience; that's when you identify the people that you'd like to sell to. In the second step you describe your offer; this is the product or service that you're hoping to sell. Then in the third and fourth steps, you construct your sequence and tweak the AI generated messaging. And finally, when everything is to your liking, you move on to the last step, which is launching the campaign. That's when Alice will begin sourcing leads that match your ICP, researching them, writing those customized emails, and in general just executing the sequence that you've built for every lead that enters the campaign. Now, Alice 1 was a big success by a lot of different metrics, but we wouldn't really consider her a true digital worker, and that's for a lot of reasons. For one, there was a lot of button clicking, more than you would probably expect of a digital worker. You also probably saw there was a lot of manual input, especially on that offers page. Our lead research was also relatively basic; we weren't doing deep research or scraping the web or anything like that, and downstream that would lead to relatively uninspiring personalization in our emails. On top of that, Alice wasn't able to handle replies automatically; she wasn't able to answer customers' questions. And finally, there was no real self learning component; she wasn't getting better over time. Meanwhile, while we were building Alice 1, the industry was evolving around us. In March of 2023 we got GPT-4, we got the first Claude model, and we got the first agent frameworks. Then later that year we got Claude 2 and we got function calling in the OpenAI API. Then in January of 2024 we got a more production ready agent framework in the form of LangGraph. In March we got Claude 3, in May we got GPT-4o, and finally in September we got the Replit agent, which for us was the first example of a truly mind blowing agentic software product. And just to double click into the Replit agent a little bit: it really blew our minds. It convinced us of two things.
First, that agents were going to be really powerful; they could build entire apps from scratch. And second, that they're here today; they're ready for production. So with that in mind, we developed a new vision for Alice centered on seven agentic capabilities. The first one was chat. We believe that users should mostly interact with Alice through chat, the way they would interact with a human team member. Secondly, users should be able to upload internal documents, their websites, and meeting recordings to a knowledge base, and in doing so they would train Alice. Third, we should use an AI agent for lead sourcing that actually considers the quality and fit of each lead, rather than a dumb filtered search. Number four, we should do deep research on every lead, and that should lead to number five, which is true personalization in those emails. Then, we believed that Alice should be able to handle inbound messages automatically, answering questions and booking meetings. And finally, she should be self learning; she should incorporate the insights from all of the campaigns she's running to optimize the performance of your account. So that was our vision, and with that in place, we set out to rebuild Alice from scratch. In short, this was a pretty aggressive push for the company. It took us just three months from start to migrating our last business customer. We initially staffed just two engineers on building the agent; after developing the POC, we brought in more resources. We had one project manager, our one and only Keith, and we had about 300 customers that needed to be migrated from our original platform to the new one, and that number was growing by the day; our go to market team was just not slowing down. There were a few key decisions that we made at the outset of this project. The first is that we wanted to start from scratch. We didn't want Alice 2 to be encumbered by Alice 1 in any way: new repo, new infrastructure, new team. We also didn't want to reinvent the wheel. We were going to be taking on a lot of risk with some unfamiliar technologies like the agent and the knowledge base, and we didn't want to add additional risk through technologies that we didn't understand, so we chose a very vanilla tech stack. And number three, we wanted to leverage vendors as much as possible to move really quickly; we didn't want to be building non essential components. This is the tech stack that we went with. I won't go into too much detail here, but I thought people would be interested to see it. And here are some of the vendors that we chose to leverage and work with. I can't go into detail on every one of these vendors, but they were all essential to our success, and I wanted to shout out everyone that has been helpful. Of course, one of the most important vendors we chose to work with was LangChain, and we knew that we were going to need a really good partner from the start if we were going to pull this off. LangChain was a very natural choice for us. They were a clear industry leader in AI dev tools and AI infrastructure. They had an agent framework ready to go. That agent framework had cloud hosting and observability, so we knew we were going to be able to get to production, and that once our agent was in production, we would understand how it was performing. We also had some familiarity from Alice 1; we were using the core SDK with Alice 1. And then LangChain also had TypeScript support, which is important to us as a TypeScript shop.
And last but not least, the customer support from the LangChain team was just incredible. They really felt like an extension of our team. They ramped us up on LangGraph and the LangChain ecosystem, and on agents in general. We are so grateful to them for that. In terms of the products that we use today, we use pretty much the entire suite. Now I want to talk you through one of the main challenges that we encountered while building Alice 2, which was finding the right agent architecture. You'll remember the main feature of Alice was campaign creation. So we wanted the Alice agent to guide users through campaign creation the same way the Replit agent would guide you through creating an app. We tried three different architectures for this. The first was ReAct, the second was a workflow, and finally we ended on a multi agent system. Now I'm going to talk you through each of these, how it works in detail, and why it didn't work for our use case, until we arrived at multi agent. Let's start with ReAct. Well, React is a JavaScript framework for building user interfaces, but that's not what I mean here. I mean the ReAct model of an AI agent, which I think other people have talked about earlier today. This is a model that was introduced by Google researchers back in 2022, and it stands for reason and act. Basically, what these researchers observed is that if you include reasoning traces in the conversation context, the agent performs better than it otherwise would. With a ReAct agent, the execution loop is split into three parts. There's reason, where the agent thinks about what to do. There's act, where the agent actually takes an action, for example performing a tool call. And then finally there's observe, where the agent observes the new state of the world after performing the action. I guess ReActO wasn't a very good name. As I mentioned, reasoning traces lead to better performance in the agent. This is our implementation of our ReAct agent. It consists of just one node and 10 or 20 tools. It's not very impressive looking, I know, but this simplicity is actually one of the main benefits of the ReAct architecture, in my opinion. Why do we have so many tools? Well, there are lots of different things that need to happen in campaign creation. We need to fetch leads from our database, we need to insert new DB entities and draft emails, and all of those things become a tool. The ReAct loop that I mentioned on the previous slide is implemented inside the assistant node, and when the assistant actually performs an action, that is manifested in the form of a tool call, which then gets executed by the tool node. One thing to note about the ReAct agent is that it runs to completion for every turn. So if the user says hello, and then they say I'd like to create a campaign, that would be two turns, and the ReAct agent runs to completion each time. That's going to become relevant later. Here are some of the tools that we implemented and attached to our agent. Unfortunately, Alice 2 predated MCP, so we didn't use an MCP server or any third party tool registries.
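As a rough illustration of that single-node setup, here is a minimal sketch of a prebuilt ReAct agent in LangGraph's TypeScript SDK with two hypothetical campaign-creation tools; this is not 11x's actual implementation, and the tool names, schemas, and model choice are assumptions.

```typescript
// Minimal ReAct sketch: one prebuilt agent node plus a couple of tools.
// Tool names, schemas, and the model are hypothetical, not Alice's real code.
import { ChatOpenAI } from "@langchain/openai";
import { tool } from "@langchain/core/tools";
import { z } from "zod";
import { createReactAgent } from "@langchain/langgraph/prebuilt";

// One of the "10 or 20 tools": fetch leads matching an ICP from a database.
const fetchLeads = tool(
  async ({ icp }) => {
    // Query your own leads database here; stubbed for the sketch.
    return JSON.stringify([{ name: "Jane Doe", title: "VP Sales", icp }]);
  },
  {
    name: "fetch_leads",
    description: "Fetch leads from the database that match the given ICP.",
    schema: z.object({ icp: z.string().describe("Ideal customer profile") }),
  }
);

const draftEmail = tool(
  async ({ lead, offer }) => `Hi ${lead}, quick note about ${offer}...`,
  {
    name: "draft_email",
    description: "Draft an outreach email for a lead and an offer.",
    schema: z.object({ lead: z.string(), offer: z.string() }),
  }
);

// createReactAgent wires up the reason -> act -> observe loop: the model decides
// which tool to call, the tool node executes it, and the observation is appended
// to the conversation until the model stops calling tools.
const agent = createReactAgent({
  llm: new ChatOpenAI({ model: "gpt-4o" }),
  tools: [fetchLeads, draftEmail],
});

const result = await agent.invoke({
  messages: [{ role: "user", content: "Create a campaign targeting VPs of Sales at fintech startups." }],
});
```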
A few things I want to tell you about tools before we move on. The first is that tools are necessary to take action. Anytime you want your agent to do anything in the outside world, for example call an API or write a file, you're going to need a tool to do that. They're also necessary to access information beyond the context window. If you think about it, what your agent knows is limited to three things: the conversation context, the prompt, and the model weights. If you want it to know anything beyond that, you need to give it a tool, for example a web search tool. And that's essentially the concept behind RAG. Tools can also be used to call other agents; this is one of the easiest and simplest ways to get started with a multi agent system. And last but not least, tools are preferable over skills. This is a framework I came up with. Essentially, if someone asks you to do something like perform a complex calculation, you can do that either through a tool, like a calculator, or maybe you have the skill of mental arithmetic required to perform that calculation. In general it's better to use a tool than a skill, because this minimizes the number of tokens you're using in the context to accomplish that goal. What are the strengths of the ReAct architecture? Well, I mentioned one already: it is extremely simple. We basically never needed to revise our infrastructure later on. It was also great at handling arbitrary user inputs over multiple turns. Because the graph runs to completion each time, it allows the user to say something in step three that's related to step one without the agent getting confused; it's actually robust to that, so that was a great advantage. But it had some issues. For example, the ReAct agent was kind of bad at tools. We attached a lot of tools, and as you know, what sometimes happens when you do that is the agent struggles with which tool to call and in what order. This would sometimes lead to infinite loops, where the agent is repeatedly trying to accomplish some part of campaign creation but not succeeding. And when those infinite loops would go on for a while, we would get a recursion limit error, which is effectively the agent equivalent of a stack overflow. Also, the outputs that we were getting from this version of the agent were relatively mediocre. The audiences, the sequences, the emails, they just weren't that good. Our hypothesis was that because there's just one agent, and really one set of prompts, responsible for the entire campaign creation process, it wasn't really good at any particular step. So what can we do? How can we address these issues? In our case, we chose to add structure, which led us to implementing a workflow. A workflow is defined by Anthropic as a system where LLMs and tools are orchestrated through predefined code paths. This screenshot and quote both come from an excellent blog post by Anthropic called Building Effective Agents; I highly recommend checking it out, and I shamelessly lifted it. Importantly, workflows are different from agents, and this is one of the things the agent community has been debating a lot on Twitter. It's the reason we sometimes use the term agentic for describing a system, as opposed to agent: a system can be agentic without being an agent per se. Workflows are highly structured, as you can probably infer from those predefined code paths. The LLM is not choosing how to orchestrate the code; the LLMs are just being called within these predefined code paths. And last but not least, workflows are not really a new technology. We've had them for a long time in other forms, and the most famous form is probably the data engineering tool Airflow.
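Here is a minimal sketch of a workflow in that sense, with LLM calls wired into fixed code paths so the model never chooses the control flow; the node names loosely mirror the campaign-creation steps but are hypothetical, not the actual Alice 2 graph.

```typescript
// Minimal workflow sketch: LLM calls orchestrated through predefined code paths.
// Node names and prompts are hypothetical, not the real Alice 2 implementation.
import { Annotation, StateGraph, START, END } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ model: "gpt-4o" });

const State = Annotation.Root({
  offer: Annotation<string>(),
  positioning: Annotation<string>(),
  email: Annotation<string>(),
});

const workflow = new StateGraph(State)
  .addNode("writePositioning", async (s) => {
    const res = await llm.invoke(`Write a short positioning report for this offer: ${s.offer}`);
    return { positioning: String(res.content) };
  })
  .addNode("writeEmail", async (s) => {
    const res = await llm.invoke(`Write a cold outreach email based on this positioning: ${s.positioning}`);
    return { email: String(res.content) };
  })
  // The edges are fixed in code; the LLM never decides what runs next.
  .addEdge(START, "writePositioning")
  .addEdge("writePositioning", "writeEmail")
  .addEdge("writeEmail", END)
  .compile();

const out = await workflow.invoke({ offer: "AI SDR for B2B SaaS teams" });
```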
Okay, there we go. This was our implementation of a workflow campaign creation agent. It's obviously a lot more complex than our ReAct agent; we now have 15 nodes, split across five different stages. These stages correspond to the different steps of campaign creation that I mentioned before. Interestingly, this graph, unlike the ReAct agent, doesn't run to completion for every turn. It only runs to completion once for the entire campaign creation process, and the way we get user input or feedback at certain points within the graph execution is through something called node interrupts, which is a LangGraph feature. There were a number of strengths to the workflow architecture. It basically solved all of the problems we observed with ReAct. For one, we no longer had issues with tools, because we just didn't have tools; we replaced them with specialized nodes, like a write email node. We also got a clearly defined execution flow with a fixed number of steps, so no more infinite loops and no more recursion limit errors. On top of that, we got much better outputs. The emails and sequences that we were getting from this version of the agent were much better, and that's because we forced the agent to go through these particular steps. But the workflow architecture did have issues. For one, it was extremely complex, and now our front end campaign creation experience was coupled with the architecture of our agent: we would have to change that architecture and that graph structure anytime we wanted to make changes to the campaign creation experience. So, super laborious and annoying. It also didn't support jumping around within the campaign creation flow. That's because the graph doesn't run to completion every time; when you get to step three and you stop, using a node interrupt to collect feedback on that step, you can really only respond to what's happening in step three. You can't jump back to step one. So clearly workflows were not going to cut it for our use case. What else could we try? Well, after some soul searching, we came across a blog post by LangChain that explains how to build a customer support agent using a multi agent architecture, and this is the blog post that gave us the insight we needed for our use case. A multi agent system is one that takes a hierarchical approach to building an AI agent. In this pattern, there's one supervisor and there are many specialized sub agents. The supervisor is responsible for interacting with the user and for routing tasks to sub agents; the sub agents then fulfill those tasks and escalate back to the supervisor when they're complete. We really devoured this blog post by LangChain. We went a little crazy in the process, but ultimately found a version of this that works for our use case. And here's what that looks like. We have a multi agent graph that accomplishes all of campaign creation except for audience creation, which we kept separate for different reasons. You can see here at the top is our supervisor node, close to the start of the graph, and then we have four specialist sub agents. We have a researcher; we have something that generates what we call a positioning report, which is how you should position your product or service for this particular lead; then we have a LinkedIn message writer; and finally we have an email writer.
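A minimal sketch of that hierarchical supervisor pattern follows, with a routing supervisor and stubbed specialist nodes that hand control back when they finish; the node names follow the talk, but the code is illustrative rather than 11x's implementation.

```typescript
// Minimal supervisor sketch: one router node plus specialist sub agents that
// escalate back when done. Specialist internals are stubbed; names are illustrative.
import { Annotation, MessagesAnnotation, StateGraph, START, END } from "@langchain/langgraph";
import { AIMessage } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";

const State = Annotation.Root({
  ...MessagesAnnotation.spec,   // running conversation
  next: Annotation<string>(),   // supervisor's routing decision
});

const router = new ChatOpenAI({ model: "gpt-4o" }).withStructuredOutput(
  z.object({
    next: z.enum(["researcher", "positioning", "linkedin_writer", "email_writer", "FINISH"]),
  })
);

// The supervisor only routes; it never does the specialist work itself.
const supervisor = async (state: typeof State.State) => {
  const { next } = await router.invoke([
    { role: "system", content: "Pick who should act next: researcher, positioning, linkedin_writer, email_writer, or FINISH." },
    ...state.messages,
  ]);
  return { next };
};

// Stub specialist: in reality each of these is its own sub agent (or subgraph).
const specialist = (name: string) => async () => ({
  messages: [new AIMessage({ content: `[${name} output]`, name })],
});

const graph = new StateGraph(State)
  .addNode("supervisor", supervisor)
  .addNode("researcher", specialist("researcher"))
  .addNode("positioning", specialist("positioning"))
  .addNode("linkedin_writer", specialist("linkedin_writer"))
  .addNode("email_writer", specialist("email_writer"))
  .addEdge(START, "supervisor")
  // The supervisor dynamically chooses the next specialist, or finishes.
  .addConditionalEdges("supervisor", (s) => (s.next === "FINISH" ? END : s.next))
  // Every specialist reports back to the supervisor when it completes its task.
  .addEdge("researcher", "supervisor")
  .addEdge("positioning", "supervisor")
  .addEdge("linkedin_writer", "supervisor")
  .addEdge("email_writer", "supervisor")
  .compile();
```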
This multi agent architecture gave us the best of both worlds: we got the flexibility of the ReAct agent, and we got the performance of the workflow. Now I want to share a couple of reflections on building agents from this experience. The first is that simplicity is key. All of that structure and scaffolding can provide performance gains in the short term, but over the long term it locks you into a structure that can be counterproductive. Related to this is that a new model release can really change everything. Amjad from Replit told us this about the Replit agent: he said it wasn't really working until Sonnet 3.5 came out, and then they dropped it in and everything was magic. And that's really true. It's also useful to think of your agent as a human coworker or a team of coworkers. In our case we had different mental models: we thought of the agent as a user flow within our product, or as a directed graph. Those were the wrong mental models, and they led us to implement the wrong architecture. You should also break big tasks down into small tasks. In our case, the big task was campaign creation, but there were small tasks, like writing an email, within that, and it became easier to implement the agent once we broke it down into those smaller component tasks. Tools are preferable over skills. Don't try to make your agent too smart; just give it the right tools and tell it how to use them. And then, last but not least, don't forget about prompt engineering. It's easy to forget that your agent is just a series of LLM calls within a while loop. If your agent isn't performing well, you should consider going back and doing some prompt engineering.
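As a closing illustration of that last point, here is a stripped-down sketch of an agent as nothing more than LLM calls inside a while loop, with a single hypothetical tool; purely illustrative, not production code.

```typescript
// An agent, reduced to its essence: call the model, run any requested tools,
// feed the observations back, and repeat until the model stops calling tools.
import { ChatOpenAI } from "@langchain/openai";
import { ToolMessage, type BaseMessageLike } from "@langchain/core/messages";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

const search = tool(async ({ query }) => `results for ${query}`, {
  name: "web_search",
  description: "Search the web.",
  schema: z.object({ query: z.string() }),
});

const model = new ChatOpenAI({ model: "gpt-4o" }).bindTools([search]);

const messages: BaseMessageLike[] = [
  { role: "system", content: "You are a helpful SDR assistant." }, // prompt engineering lives here
  { role: "user", content: "Research Acme Corp and summarize what they do." },
];

while (true) {
  const ai = await model.invoke(messages);
  messages.push(ai);
  if (!ai.tool_calls?.length) break; // no more tool calls means we have the final answer
  for (const call of ai.tool_calls) {
    // Only one tool in this sketch, so no dispatch by tool name is needed.
    const observation = await search.invoke(call.args as { query: string });
    messages.push(new ToolMessage({ content: observation, tool_call_id: call.id! }));
  }
}
```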