State of Agents

Andrew Ng


Harrison Chase
Summary
The conversation between Andrew Ng and Harrison Chase covers a range of topics, from the term "vibe coding" to advice for starting AI-focused startups. Here's a structured summary:
- Naming: Andrew doesn't like the term "vibe coding" and suggests finding a better name, emphasizing clarity and accuracy.
- AI Fund and Startups: Andrew announces a new fund focused on AI companies. He advises potential entrepreneurs to prioritize speed and technical expertise as the key factors in startup success.
- Startup Success Factors:
  - Speed: the ability to execute quickly is crucial in the fast-moving tech industry.
  - Technical Knowledge: understanding the technology well enough to make quick decisions is a rare and valuable skill.
- AI in Coding: AI tools like coding assistants are changing the landscape, with Python a preferred language, while JavaScript and TypeScript are rising for broader applications.
- Future Applications: there is strong interest in voice applications and AI tool companies, suggesting exploration of AI applications beyond coding.
- Evaluations: the importance of evaluating AI systems through human assessment, complemented by simple automated evals, is stressed to ensure reliability and effectiveness.
In conclusion, the conversation blends specific recommendations with broader insights into the tech industry, emphasizing the role of speed, technical skills, and responsible AI development.
Transcript
Speaker A: For this next section, we'll be doing a fireside chat with Andrew Ng. Andrew probably doesn't need any introduction to most folks here; I'm guessing a lot of people have taken some of his classes on Coursera or DeepLearning.AI. But Andrew has also been a big part of the LangChain story. I met Andrew a little over two years ago at a conference, when we had just started talking about LangChain, and he graciously invited us to do a course on LangChain with DeepLearning.AI. I think it must have been the second or third short course they ever did, and a lot of people here probably watched that course or got started with LangChain because of it. So Andrew has been a huge part of the LangChain journey, and I'm super excited to welcome him on stage for a fireside chat. Let's welcome Andrew.

Speaker B: Harrison is really kind. I think Harrison and his team have taught six short courses so far on DeepLearning.AI, and by our metrics Harrison's courses are among our most highly rated. So you should go take all of Harrison's courses; his LangChain explanations are very clear.

Speaker A: They've definitely helped make our own courses and explanations better, so thank you for that as well. You've obviously touched on and thought about so many things in this industry, but one of your takes that I cite a lot, and that people have probably heard me repeat, is about discussing the agenticness of an application rather than debating whether something is an agent. Since we're here at an agent conference, maybe we should rename it an agentic conference. Would you mind clarifying that take? It was almost a year and a half or two years ago that you said it, and I'm curious whether your thinking has changed since then.

Speaker B: I remember Harrison and I spoke at a conference over a year ago, and at that time I think both of us were trying to convince other people that agents are a thing and worth paying attention to. That was before, I think around mid-summer last year, a bunch of marketers got hold of the term "agent" and started sticking that sticker on everything. But to your question: about a year and a half ago, I saw a lot of people arguing, is this an agent, is this not an agent, is it truly autonomous enough to count as an agent? I felt it was fine to have that argument, but that we would succeed better as a community if we instead said there are degrees to which something is agentic. Then, whether you want to build an agentic system with a little autonomy or a lot of autonomy, it's all fine; there's no need to spend time arguing over whether something is truly an agent. Let's call all of these things agentic systems, with different degrees of autonomy. I think that hopefully reduced the amount of time people wasted arguing whether something is an agent: just call it all agentic and get on with the work. I think it has actually worked out.

Speaker A: Where on that spectrum, from a little autonomy to a lot of autonomy, do you see people building these days?

Speaker B: My team routinely uses LangGraph for the harder problems, with complex flows and so on. But I'm also seeing tons of business opportunities that frankly are fairly linear workflows, or linear with occasional branches. In businesses, there are opportunities where, right now, a person looks at a form on the website, does a web search, and checks a database to see whether there's a compliance issue, or whether this is someone we should offer services to. It's: take something, copy-paste it, do a web search, paste the result into a different form. So in business processes there are actually a lot of fairly linear workflows, or linear with a very small number of branches, and I see a lot of opportunity there. But one challenge I see businesses face is that it's still pretty difficult to look at work being done in a business and figure out how to turn it into an agentic workflow. What is the granularity at which you should break the work down into micro-tasks? And after you build a prototype, if it doesn't work well enough, which of those steps do you work on to improve performance? That whole bag of skills, looking at the work people are doing, breaking it into sequential steps with a small number of branches, and putting evals in place, is still far too rare. Of course, the much more complex agentic workflows, with very complex loops, are very valuable as well, but in terms of the sheer number of opportunities and their value, I see much more in the simpler, mostly linear workflows.
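To make the shape of these workflows concrete, here is a minimal sketch in LangGraph (which Andrew mentions above) of a mostly linear flow with one small branch, in the spirit of the form-reading, web-search, compliance-check example. All node logic, state fields, and names are hypothetical placeholders rather than anything from the talk:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class LeadState(TypedDict):
    form_data: str
    search_results: str
    compliance_ok: bool
    outcome: str

def read_form(state: LeadState) -> dict:
    # Stand-in for parsing a form submitted on the website.
    return {"form_data": state["form_data"].strip()}

def web_search(state: LeadState) -> dict:
    # Stand-in for calling a search API about the lead.
    return {"search_results": f"results for {state['form_data']}"}

def compliance_check(state: LeadState) -> dict:
    # Stand-in for checking an internal compliance database.
    return {"compliance_ok": "blocked" not in state["search_results"]}

def route(state: LeadState) -> str:
    # The one small branch in an otherwise linear flow.
    return "qualify" if state["compliance_ok"] else "reject"

def qualify(state: LeadState) -> dict:
    return {"outcome": "route to sales"}

def reject(state: LeadState) -> dict:
    return {"outcome": "flag for compliance review"}

graph = StateGraph(LeadState)
for name, fn in [("read_form", read_form), ("web_search", web_search),
                 ("compliance_check", compliance_check),
                 ("qualify", qualify), ("reject", reject)]:
    graph.add_node(name, fn)
graph.add_edge(START, "read_form")
graph.add_edge("read_form", "web_search")
graph.add_edge("web_search", "compliance_check")
graph.add_conditional_edges("compliance_check", route,
                            {"qualify": "qualify", "reject": "reject"})
graph.add_edge("qualify", END)
graph.add_edge("reject", END)

app = graph.compile()
print(app.invoke({"form_data": " Acme Corp inquiry "}))
```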
Speaker A: Let's talk about some of those skills. A lot of DeepLearning.AI courses are in pursuit of helping people build agents. What are some of the skills you think agent builders, across that whole spectrum, should master to get started?

Speaker B: Boy, that's a good question; I wish I knew the answer off the top of my head. I've been thinking a lot about this, actually. There are all the challenges of how to ingest the data, whether through a LangGraph integration or an MCP server, and then how to prompt and process it in multiple steps to build the overall system. One thing I see a lot is the importance of putting the right evals framework in place, not only to understand the performance of the overall system, but to trace the individual steps, so you can hone in on the one step, or the one prompt, that's broken and work on it. I find that a lot of teams wait longer than they should, relying only on human evals, where every time you change something you sit there and look at a bunch of outputs; most teams are probably too slow to put systematic evals in place. And I find that having the right instincts for what to do on a given project is still really difficult. Teams still learning these skills often go down blind alleys, spending a few months trying to improve one component, whereas more experienced people will say: you know what, I don't think this component can ever be made to work, so let's just find a different way around the problem. I wish I knew how to pass on more efficiently that almost tactile knowledge experienced builders have: look at the output, look at the trace, look at the judge's output, and just make a sound decision about what to do next. That's still very difficult.

Speaker A: And is this tactile knowledge mostly about LLMs and their limitations, or more about the product framing, that skill of taking a job and breaking it down?

Speaker B: That's something we're all still working out.
I think it's all of the above, actually. Over the last couple of years, AI tool companies have created an amazing set of tools: how to think about RAG, how to think about chatbots, many different ways of approaching memory, how to do evals, how to do guardrails. There's this wide, sprawling array of really useful tools. One picture I often have in my head: if all you have are purple LEGO bricks, you can't build much that's interesting. I think of these tools as akin to LEGO bricks. As you get more differently colored and shaped bricks, not just purple ones but red, black, yellow, and green ones, you can very quickly assemble them into really cool things. So I think of all the tools we've been rattling off as different types of LEGO bricks, and when you're trying to build something, sometimes you need that one squiggly, oddly shaped brick; some people recognize it, plug it in, and just get the job done. But if you've never built evals of a certain type, you could end up spending, say, three months on something where someone who has done it before would say: oh, we should just build evals this way, here's an LLM-as-judge, go through that process and get it done much faster. One of the realities of AI is that it's not just one tool; when I'm coding, I use a whole bunch of different pieces of software. I'm not a master of every one of them, but learning enough of the tools lets me move quickly, and that practice with different tools also enables much faster decision making.

Things also keep changing. For example, because LLMs have been getting longer and longer context windows, a lot of the best practices for RAG from a year and a half ago are much less relevant today. Harrison was really early to apply these things; I played with the early LangChain RAG frameworks, recursive summarization and all that. As context windows got longer, we now just dump a lot more into the context. It's not that RAG has gone away, but the hyperparameter tuning has gotten way easier: there's a huge range of hyperparameters that work just fine. So as LLMs keep progressing, the instincts we formed two years ago may no longer be current.
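As an aside on that last point, here is a minimal sketch (an editorial illustration, not from the talk) of the easier long-context regime Andrew describes: rather than carefully tuning chunk sizes and top-k, a first pass can simply pack the most relevant documents into the prompt up to a generous token budget. The scoring heuristic and the budget are placeholder assumptions:

```python
def rough_token_count(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return len(text) // 4

def build_context(query: str, docs: list[str], budget_tokens: int = 100_000) -> str:
    # Trivial relevance score: term overlap with the query.
    def score(doc: str) -> int:
        return sum(term in doc.lower() for term in query.lower().split())

    context, used = [], 0
    for doc in sorted(docs, key=score, reverse=True):
        cost = rough_token_count(doc)
        if used + cost > budget_tokens:
            break
        context.append(doc)
        used += cost
    # With a huge budget, exact chunking and top-k matter far less:
    # a wide range of settings still lands the right material in context.
    return "\n\n---\n\n".join(context)

docs = ["LangGraph supports linear and branching workflows.",
        "Voice applications are latency sensitive.",
        "MCP standardizes tool and data integrations."]
print(build_context("how do I build a linear workflow?", docs))
```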
Speaker A: You've mentioned a lot of things I want to talk about. So, what are some of the LEGO bricks that are underrated right now, that people aren't talking about? Not evals; we had three people talk about evals, so I think that's top of people's minds. But what are some things most people maybe haven't thought of?

Speaker B: Good question. Even on evals, for some reason many people talk about them but don't do them. I think it's because people often, and I saw a post on this recently, think of writing evals as this huge thing you have to do. I think of an eval as something I'm going to throw together really quickly, in a few minutes; it won't be very good, but it starts to complement my human eyeball evals. What often happens is I'll build a system and there's one problem where I keep getting regressions: I thought I made it work, then it breaks again, and it just gets annoying. Then I put in place a very simple eval, maybe with five examples and a very simple LLM-as-judge, to check for that one regression, that one thing. I'm not swapping out human evals for automated evals; I'm still looking at outputs myself. But when I change something, I run this eval to check that one thing so I don't have to think about it. And then what happens is, just as with the way we write code, once you have a slightly helpful but clearly broken, imperfect eval, you realize: you know what, I can improve my eval to make it better. Just as when we build applications we start with something quick and dirty that doesn't work and keep improving it, a lot of my evals start out really awful, barely helpful; then when you look at what the eval is doing, you go, this eval's broken, I can fix it and make it better. So that's one thing I'll mention.
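Here is a minimal sketch of that kind of quick, single-regression eval: a handful of examples plus a simple LLM-as-judge that checks one failure mode. The model name, the judged property, and the client wiring are assumptions for illustration; swap in whatever stack you actually use:

```python
from openai import OpenAI

client = OpenAI()

# Five examples that previously triggered the one regression being tracked.
EXAMPLES = [
    "Cancel my subscription",
    "What's your refund policy?",
    "I was double charged",
    "Close my account",
    "Dispute this invoice",
]

def call_model(prompt: str) -> str:
    # The system under test (assumed here to be a plain chat completion).
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def judge(prompt: str, answer: str) -> bool:
    # LLM-as-judge scoped to a single yes/no regression check
    # (the checked property is a hypothetical example).
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Question: {prompt}\nAnswer: {answer}\n"
                       "Did the answer offer a concrete next step? Reply YES or NO.",
        }],
    )
    return verdict.choices[0].message.content.strip().upper().startswith("YES")

if __name__ == "__main__":
    passed = sum(judge(p, call_model(p)) for p in EXAMPLES)
    print(f"{passed}/{len(EXAMPLES)} examples passed the regression check")
```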
One thing people haven't talked about much that I think is really underrated is the voice stack. I'm actually very excited about voice applications; a lot of my friends are very excited about voice applications, and I see a bunch of large enterprises really excited about voice applications: very large enterprises, very large use cases. For some reason, while there are some developers in this community doing voice, the amount of developer attention on voice applications is much smaller than the enterprise demand I see and the applications coming down the pipe. And not all of this is real-time, speech-to-speech native audio models; I find those models very hard to control. A lot of it uses a more agentic voice pipeline. I'm working with a ton of teams on voice-stack work, some of which will hopefully be announced soon, and I'm seeing a lot of results there. As for other underrated things, here's one that maybe is not underrated: I think many of you have seen that developers who use AI coding assistants are so much faster. It's been interesting to see that some CIOs and CTOs still have policies against AI-assisted coding, sometimes maybe for good reasons, but I think we have to get past that, because frankly my teams and I would hate to ever have to code without AI assistance again. Something I do think businesses underrate is the idea that everyone should learn to code. One fun fact about AI Fund: everyone at AI Fund, including the person who runs our front desk, our CFO, and our general counsel, actually knows how to code. It's not that I want them to be software engineers; they're not. But in their respective job functions, having learned a bit about how to tell a computer what they want it to do, they're driving productivity gains across all of these non-engineering job functions. That's pretty exciting.

Speaker A: Speaking of AI coding, what tools are you using for that personally?

Speaker B: Well, we're working on some things that I won't announce yet.

Speaker A: Exciting.

Speaker B: But yes, I use Cursor, Windsurf, and some other things.

Speaker A: All right, we'll come back to that. You talked about voice. If people here want to get into voice and they're familiar with building agents with LLMs, how similar is it? Are a lot of the ideas transferable, and what's new that they'd have to learn?

Speaker B: There are a lot of applications where voice makes interactions much easier. It turns out that, from an application perspective, a text prompt is kind of intimidating: for many applications, saying "tell me what you think" over a big empty text box is actually very intimidating for users. Part of it is that with text, people can use backspace, so they're slower to respond; they keep editing. With voice, time only moves forward, so you just keep talking; you can change your mind and say, "oh, I changed my mind, forget that other thing," and the models are actually pretty good at handling that. So for many applications, the friction of getting users to engage is lower: you ask them to tell you what they think, and they just respond in voice.

In terms of engineering requirements, the one biggest difference with voice is latency: if someone says something, you really want to respond in under about a second. Under half a second is great, but really aim for under a second. Yet we have a lot of agentic workflows that run for many seconds. When DeepLearning.AI worked with RealAvatar to build an avatar of me, our initial version had something like five to nine seconds of latency, and it was just a bad user experience: you say something, there are nine seconds of silence, then the avatar responds. So we built in what we call a pre-response: just as, if you ask me a question, I might say "hmm, that's interesting" while I think, we do that to cover the latency, and it actually seems to work great. There are other tricks too: for a voice customer-service chatbot, it turns out that if you play background noise, like a contact center, instead of dead silence, people are much more forgiving of a pause. So a lot of these things are different from text-based applications. But where voice is a natural modality and users are comfortable just talking, it really reduces friction and gets us the information we need from them. When we talk, we don't feel we need to deliver perfection the way we do when we write, so it's somehow easier to just start sharing ideas, change your mind, and go back and forth.
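A minimal sketch of the pre-response idea just described, assuming an async Python service where `synthesize_speech` and `run_agent_pipeline` are hypothetical stand-ins for a TTS call and a multi-second agentic workflow: speak a filler phrase immediately while the real answer is computed concurrently.

```python
import asyncio
import random

async def synthesize_speech(text: str) -> None:
    # Stand-in for a TTS call that streams audio to the caller.
    print(f"[speaking] {text}")

async def run_agent_pipeline(question: str) -> str:
    # Stand-in for a multi-second agentic workflow (search, tools, etc.).
    await asyncio.sleep(5.0)
    return f"Here's what I found about {question!r}..."

PRE_RESPONSES = [
    "Hmm, that's interesting.",
    "Good question, one moment.",
    "Let me think about that.",
]

async def answer(question: str) -> None:
    # Start the slow pipeline and speak the filler concurrently, so the
    # user hears something well under a second after they stop talking.
    pipeline = asyncio.create_task(run_agent_pipeline(question))
    await synthesize_speech(random.choice(PRE_RESPONSES))
    await synthesize_speech(await pipeline)

asyncio.run(answer("my refund status"))
```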
Speaker A: Huh, that's interesting. One of the new things out there that you mentioned briefly is MCP. How are you seeing it transform how people build apps and what types of apps they build, and what does it mean for the ecosystem?

Speaker B: I think it's really exciting. Just this morning we released a short course on MCP. I had seen a lot of material on the internet about MCP that I thought was quite confusing, so, working with Anthropic, we wanted to put out a really good short course that explains it clearly. I think MCP is fantastic; it filled a very clear market gap, and the fact that OpenAI adopted it as well speaks to its importance. I think the MCP standard will continue to evolve. MCP makes it much easier for agents, and frankly for other types of software, to plug into data. When I'm using LLMs myself, or when we're building applications, we all spend so much time on plumbing. These models are pretty darn intelligent and can do a lot when given the right context, so my teams and I spend a lot of time building data integrations to get the right context to the LLM so it can make sensible decisions. MCP is a fantastic way to have a standardized interface to a lot of tools, API calls, and data sources. It still feels a bit like the Wild West, though: a lot of the MCP servers you find on the internet do not work, and the authentication systems are shaky; even for very large companies, it's often unclear whether an MCP server's authentication tokens actually work as advertised. The MCP protocol itself is also early. Right now MCP gives you a long list of resources and tools; eventually I think we'll need more hierarchical discovery. Imagine you wanted to build, say, an MCP interface to LangGraph: LangGraph has so many API calls that you just can't present a flat list of everything under the sun. So MCP is a really fantastic first step, and I definitely encourage you to learn about it; finding a good MCP server implementation for your data integrations will probably make your life easier. And this idea is important: when you have N models or agents and M data sources, it should not take an N times M effort to do all the integrations; it should take N plus M. MCP is a fantastic first step toward that type of data integration.
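To ground that, here is a minimal sketch of an MCP server in Python, assuming the official `mcp` SDK's FastMCP helper; the CRM lookup tool and resource are hypothetical examples. The point is the N-plus-M economics: write the integration once, and any MCP-capable model or agent can connect to it.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crm-lookup")

@mcp.tool()
def lookup_customer(email: str) -> str:
    """Return basic CRM info for a customer (stubbed for illustration)."""
    # In a real server this would query your CRM or database.
    fake_db = {"ada@example.com": "Ada Lovelace, enterprise plan, renews in 30 days"}
    return fake_db.get(email, "no record found")

@mcp.resource("customers://{email}")
def customer_record(email: str) -> str:
    """Expose the same data as a readable resource."""
    return lookup_customer(email)

if __name__ == "__main__":
    # Runs over stdio, so any MCP client (a desktop app, an agent
    # framework, etc.) can connect without bespoke integration work.
    mcp.run()
```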
Speaker A: Another type of protocol that has less buzz than MCP is the agent-to-agent stuff. I remember when we were at a conference a year or so ago, you were talking about multi-agent systems, which this would enable. How do you see the multi-agent or agent-to-agent stuff?

Speaker B: I think agentic AI is still so early that most of us, me included, struggle to even make our own code work. So making my agent work with someone else's agent feels like it requires two miracles. What I see is that when one team builds a multi-agent system, it often works, because they built all the agents themselves. But right now, at least at this moment in time, the number of examples I've seen where one team's agents successfully engage a totally different team's agents is small. I'm sure we'll get there, but I'm not personally seeing a lot of success stories of that yet.

Speaker A: No, I agree; I think it's super early. If MCP is early, I think agent-to-agent is even earlier. Another thing that's top of people's minds right now is vibe coding and all that. You touched on it a little earlier with how people are using AI coding assistants, but how do you think about vibe coding? Is it a different skill than before, and what purpose does it serve in the world?

Speaker B: Yeah. Many of us now code while barely looking at the code, and I think that's a fantastic thing to be doing. I think it's unfortunate that it's called vibe coding, because the name misleads a lot of people into thinking you just go with the vibes. Frankly, when I've been coding for a day with vibe coding, or whatever AI coding assistance, I'm exhausted by the end of the day; it's a deeply intellectual exercise. So the name is unfortunate, but the phenomenon is real, it's taking off, and that's great. Over the last year, some people have advised others not to learn to code on the basis that AI will automate coding. I think we'll look back at that as some of the worst career advice ever given, because over the last many decades, as coding became easier, more people started to code. When we went from punch cards to keyboards and terminals, and when programming went from assembly language to COBOL, people argued back then: we have COBOL, it's so easy, we don't need programmers anymore. Instead, every time coding got easier, more people learned to code. So with AI coding assistance, a lot more people should code. One of the most important skills of the future, for developers and non-developers alike, is the ability to tell a computer exactly what you want so it will do it for you, and understanding at some level how a computer works lets you prompt or instruct a computer much more precisely. That's why I still advise everyone to learn one programming language; learn Python or something. Maybe some of you know this, but I'm personally a much stronger Python developer than, say, JavaScript developer. Yet with AI-assisted coding I now write a lot more JavaScript and TypeScript code than I ever used to. Even when generating JavaScript with AI, really understanding, for example, what an error case is still matters.

Speaker A: If you don't like the name vibe coding, do you have a better name in mind?

Speaker B: Oh, that's a good question.

Speaker A: One of the things you announced recently is a new fund for AI Fund, so congrats on that. For people in the audience who are thinking of starting a startup, what advice would you have?

Speaker B: AI Fund is a venture studio, so we build companies and invest exclusively in companies we co-found. Thinking back on lessons learned: the number one predictor of a startup's success is speed. I know it sounds like a cliché, but a lot of people have never seen the speed with which a skilled team can execute; it's just so much faster than what most businesses know how to do. The number two predictor, also very important, is technical knowledge. With startups, there are things like how you market, how you sell, how you price; all of that is important, but that knowledge has been around for a while and is a bit more widespread. The knowledge that's really rare is how this technology actually works, because the technology is evolving so quickly. I have deep respect for good go-to-market people: pricing is hard, marketing is hard, positioning is hard. But that knowledge is more diffuse.
The rarest asset is someone who really understands how the technology works. So at AI Fund, including our leadership team, we want technical people with good instincts: do this, don't do that, this won't work, moving fast. Alongside that, the business knowledge is very important too, but it's usually easier to figure out.

Speaker A: All right, that's great advice for starting something. We're going to wrap this up and go to a break now, but before we do, please join me in giving Andrew a big hand.