Breakthrough Agents - Learnings from Building AI Research Agents

Connor Heggie

Kunal Rai

Summary

The conversation focuses on enhancing an agent's capabilities through the development and implementation of new tools aimed at improving Internet research and browser access. The goal is to enable more effective and accurate information retrieval, which in turn improves the agent's performance.

Key Information:

  • Attendees: Only Speaker A is mentioned; others are unclear.
  • Topics Discussed: Development of tools such as deep Internet research and browser access to enhance agent capabilities. These tools aim to improve how agents conduct searches and handle online information, addressing issues like misinterpretation and the need for more comprehensive data ingestion.
  • Decisions Made: Implementation of these new tools (deep Internet research and browser access) to improve agent effectiveness.
  • Action Items: Invest in evals to highlight issues found through traces, with a focus on making the process more repeatable and scalable. No specific deadlines are mentioned.

Sentiment Analysis:
The tone is professional and focused on problem-solving, indicating progress and a proactive approach towards enhancing capabilities and recruiting.

Transcript
Be as effective in tool calling or reflecting or in different parts of your agent workflow, so you probably want to do some kind of node-based eval. So the second axis was building more tools, and we needed to think about what tools we needed to build. We thought about, okay, what use cases can we not support today that we really want to turn on with additional tools? Which customers can we power new workflows for by adding just a single new tool? The four tools we decided to add because of this were deep Internet research, browser access, searching HTML, and dataset access. I'll go through a couple of these.

Why we started with deep Internet research: Internet search is still hard, between SEO articles on Google and search grounding for LLMs. With things like OpenAI and Perplexity, a lot of the result quality is out of your hands. We also saw in our agentic usage that calls to Internet search were not being used the way we would use them. So we thought about how we conduct research on the Internet ourselves. We're pretty good at doing Internet research, but we do it fundamentally differently than our agents were doing it initially. When you do a search on Google, you might search for a query, look at the top 10 links, implicitly filter out probably five of those just based on the source, open a couple of new tabs, and maybe read through a couple of sentences on each before deciding that you need a different search query or that you've found your answer. We saw that our agents were not mimicking this common behavior and wanted to adjust course to improve agent result quality.

So we upgraded our initial Pydantic model. Initially we had a very naive structure that was just a query term. We flipped that to include a bunch of other arguments: things like a category, whether we want a live crawl, whether to include the full text and a summary, and optionally constraining the domain or even the publish date. By changing all these parameters, we're changing the trajectory of the search as well: from first reviewing just the preview from an Internet search output, which is what we have on the left here, to getting both the URL and the actual page content in one tool call. What this allows us to do is pull in all of this content at once and sidestep the issue we were seeing of agents picking an answer based on just a Google search preview, which, as we know, isn't always reliable or accurate.

The second main tool we built was browser access. Again, there's a lot of rich data online that scraping isn't able to capture. Between online data sources or datasets that require you to enter a query, interactive search experiences, or even things like Google Maps or images, you can't really capture that content with scraping. So we wanted to allow our Unify agent to use the browser the same way we would. We built browser access as a sub-agent: we gave this tool, which is basically browser access, to the agent, and what it does is decompose the task into a browser trajectory and then use Computer Use Preview to actually act on it. We evaluated Browser Use, the open source alternative, and found that while it was marginally faster, it struggled on more complex browser tasks, which led us to use Computer Use Preview instead.
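
To make that schema change concrete, here is a minimal sketch of what the before-and-after search arguments could look like as Pydantic models. The field names and types (category, livecrawl, include_text, include_summary, include_domains, published_after) are illustrative guesses based on the parameters mentioned above, not Unify's actual schema.

```python
from datetime import date
from typing import Optional
from pydantic import BaseModel, Field

class NaiveSearchArgs(BaseModel):
    """The original shape: the agent only controls a query string."""
    query: str

class DeepSearchArgs(BaseModel):
    """Richer arguments that let the agent steer the whole search trajectory."""
    query: str = Field(description="Search query to run")
    category: Optional[str] = Field(
        default=None, description="Result category to bias toward, e.g. 'news' or 'company'")
    livecrawl: bool = Field(
        default=False, description="Fetch pages live instead of relying on a cached index")
    include_text: bool = Field(
        default=True, description="Return full page text, not just a preview snippet")
    include_summary: bool = Field(
        default=True, description="Return a short summary alongside the text")
    include_domains: Optional[list[str]] = Field(
        default=None, description="Restrict results to these domains")
    published_after: Optional[date] = Field(
        default=None, description="Only return pages published after this date")
```

Returning the URL together with the full page text in the same tool call is what lets the agent skip the preview-only judgment described above.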
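
And here is a rough sketch of the browser-access sub-agent pattern described above: the main agent sees a single browser_access tool, while the sub-agent behind it decomposes the task into a browser trajectory and executes it step by step. All of the helper functions are hypothetical stubs; the talk mentions using OpenAI's Computer Use Preview for execution, but the wiring shown here is an assumption, not Unify's implementation.

```python
from dataclasses import dataclass

@dataclass
class BrowserStep:
    """One step of a planned browser trajectory."""
    action: str  # e.g. "open_url", "click", "type", "read_page"
    target: str  # URL, element description, or text, depending on the action

def plan_browser_trajectory(task: str) -> list[BrowserStep]:
    """Decompose the task into browser steps with a planning model (hypothetical stub)."""
    raise NotImplementedError("call your planning model here")

def execute_step(step: BrowserStep) -> str:
    """Run one step in a real browser via a computer-use model (hypothetical stub)."""
    raise NotImplementedError("call your computer-use model / browser driver here")

def summarize(task: str, observations: list[str]) -> str:
    """Condense what the sub-agent saw into an answer for the main agent (hypothetical stub)."""
    raise NotImplementedError("call your summarization model here")

def browser_access(task: str) -> str:
    """The single tool exposed to the main agent; the loop below is the sub-agent."""
    observations: list[str] = []
    for step in plan_browser_trajectory(task):
        observations.append(execute_step(step))
    # Returning only a condensed answer keeps screenshots and click-by-click
    # detail out of the main agent's context window.
    return summarize(task, observations)
```

In the EV-parking example that follows, a call like browser_access("Does Google have EV charging in its parking lot?") would expand into the Google Maps and Street View trajectory described next.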
You can see an example of this here, where we try to find out whether Google has EV parking on site. The agent eventually ends up using the browser tool: it goes to Google Maps, ends up using Street View to look for an EV charging station in the parking lot, and then flips to a new tab in the browser to check whether there is EV charging. On that last page, it does actually confirm, between Google Maps and that page, that there is an EV charging station.

We learned a lot from these tools. One thing we learned was that we can't use this kind of naive approach to Internet search. Internet search and Google are great, but you still need to empower your agent to look at the data, ingest the right content into its context, and then act based on that context. Deep search and this pivot to pulling in content all at once massively reduced the amount of misinterpretation we had in Internet search and changed how we conducted research. And the other tools, like browser use and searching HTML, unlocked completely new use cases for our agent. As a result, the new champion we have in prod is the Kunal browser agent. As you can see, we've kept up the theme of naming our agents.

A couple of quick next steps based on these changes in tools: we want to invest a little more time in evals to highlight some of the issues we found just by looking through traces and outputs, to make this process a little more repeatable and scalable. Awesome. We're solving a lot of interesting agent problems, so if you also want your name in our codebase as an agent, come chat with us after or apply online. We are hiring tons of engineers. Thank you guys.