Agents in the Enterprise – Aaron Levie (Box) & Harrison Chase

Speaker(s): Aaron Levie (CEO, Box); Harrison Chase (Co-Founder & CEO, LangChain)
Session: Interrupt 2026 · Day 1 (May 13) · ~4:10 PM PT
Source: in-person audio recording, transcribed locally with Whisper large-v3.

Summary

In a fireside chat with Harrison Chase, Box CEO Aaron Levie lays out why agents are far easier in coding than in the rest of knowledge work: code is verifiable so ROI is provable, coding users are hyper-technical, and codebase permissions are mostly wide open—whereas knowledge work has messy, per-person permission permutations, less verifiable output, and non-technical users who can't navigate access limits the way a human can. He argues this means years of deployment, change management, and adoption, and that 'doers' will be wrong on the take-off because of slow diffusion into organizations. Levie explains Box's horizontal position (domain expertise in enterprise content, file systems, and access controls), how decade-old architectural bets—single file system, single canonical ID, single governance model—now pay off for agents, and his view that software should be both a world-class product experience and world-class headless APIs, accepting that volume will skew headless (citing Salesforce's headless endorsement). He champions building on a coding-agent harness even for non-coding knowledge work ('an unlimited supply of engineers'), and predicts cost will become the dominant story over the next three years, pushing enterprises toward multi-model setups as token budgets tighten.

Key Points

  • Coding agents work well because the output is verifiable (you know if the code worked), users are hyper-technical and can fix or grant access, and codebase permissions are generally wide open—so getting context to the agent is easy.
  • Knowledge work is the opposite: output is less verifiable, models are still catching up on non-coding tasks, users are less technical, and every individual/team has a different data-access permutation that agents inherit, multiplying complexity when an agent hits an access limit it can't navigate.
  • Levie expects 'years and years of deployment and change management and adoption,' a cognitive dissonance between hyperbolic coding-agent capability and slower knowledge-work progress, and that 'doers' will be wrong on the take-off due to the rate of diffusion into organizations.
  • Emerging patterns: vertical/domain-specific agent startups (purpose-built context, MCP connectors, forward-deployed engineers) and enterprises building internal 'AI agent forward-deployed engineer' roles to bridge agents into the org.
  • Box is horizontal with deep expertise in enterprise content—document conversion pipelines turning content into agent-friendly formats (an API converting any content type to markdown), plus security/access controls; the Box agent is an expert at enterprise content and plugs into broader agents via its MCP server.
  • Decade-old architectural decisions now benefit agents: a single canonical ID per object, a single file system, single governance model, and single permissions architecture (versus legacy competitors with multiple ways data shows up); they did have to retune the search engine because an agent can consume ~100x the search results of a human.
  • On product strategy, Levie says expect to be both world-class as a product experience and world-class as headless APIs—volume will likely skew headless—citing Salesforce 'volunteering to headless' as a symbolic moment and the Jevons-paradox effect of using systems 10x-100x more for tasks no human would have been assigned.
  • He advocates building on a coding-agent harness even for non-coding knowledge work (agents will always be best at code, hence good at tool use/MCPs/writing code on the fly) layered with vertical/horizontal domain expertise—Box tunes its agent on file-system navigation (when to search vs. browse folders).
  • On models, Box is multi-model 'with some asterisks' for capability, performance, and cost differentiation; Levie predicts cost becomes the big story over the next three years. Chase points to Uber's CTO and ServiceNow reportedly blowing through token budgets, and Levie adds that public companies with EPS targets can't suddenly spend $10M more on an AI build, which forces efficient model mixtures.
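
The multi-model point above can be illustrated with a minimal sketch of cost-aware routing. This is an editorial illustration, not Box's implementation: the model labels, eval scores, per-task costs, and the `pick_model` helper are all hypothetical, and it assumes each workload already has an eval score per model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str             # hypothetical label, not a real product name
    quality: float        # assumed eval score for this workload, 0..1
    cost_per_task: float  # assumed average dollar cost per task


def pick_model(models, min_quality):
    """Return the cheapest model whose eval score meets the quality bar."""
    eligible = [m for m in models if m.quality >= min_quality]
    if not eligible:
        # Nothing clears the bar: fall back to the highest-quality model.
        return max(models, key=lambda m: m.quality)
    return min(eligible, key=lambda m: m.cost_per_task)


# Illustrative numbers only.
MODELS = [
    Model("frontier", 0.95, 0.40),
    Model("frontier-minus-one", 0.88, 0.10),
    Model("small-fast", 0.70, 0.01),
]

# A light task (summarizing a document) tolerates a lower bar than a
# heavy-duty workflow (an insurance-claims pipeline).
print(pick_model(MODELS, min_quality=0.65).name)
print(pick_model(MODELS, min_quality=0.90).name)
```

Under these assumed numbers the light task routes to the cheapest model and the heavy workflow to the frontier one, which is the "peel workloads off to cheaper models" pattern described in the talk.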

Notable Quotes

the work is not as verifiable as code

it's also why the do-ers are generally going to be wrong on the take-off because of the sort of rate of diffusion of how this is going to actually enter organizations

what if every knowledge worker or domain in a business had like an unlimited supply of engineers

Full Transcript

Timestamped transcript (auto-generated; lightly cleaned)
[00:00] You know, kind of old, uh, I'm not gonna talk about internet, Box, the work itself is verifiable,
so you kind of know if the code worked, and so that helps the whole ROI process, and the actual, kind
of, when you're using the agent to do some work. Um, the users are hyper-technical, by
definition, and so, like, when the agent does something, and it goes wrong, you're like, you know
how to fix it. Or if it says I need access to a data source, you generally have the ability to, you
know, you know, get the right MCP server involved in that workflow. Um, uh, you know that, like,

[00:30] when you're in a coding agent, and it's like, hey, I need to download this CLI from this vendor,
this package, you have the wherewithal to say yes or no based on that situation. Um, uh, I was, uh,
as a brief aside, my dad has gotten into coding, but he's not, you know, a perfect coder. He's kind
of good at Excel and data science stuff. And I was just watching him use Claude Code, and he was just
like, yes, download that package, and yes. And it's like, for sure, for sure, his computer now is
full of malware. Like, there's no chance.

[01:00] Every NTP I've been attacking him to see in the past, like, three days, like, he's got every one of
those. So, so, but like, that's, that's what, and I'm sorry, Dad, wherever you are, but like, um,
this is what a lot of, uh, you know, teams adjacent to engineers are going to experience. And then
imagine if you're even more adjacent, and you're like, just in, you know, you're in marketing, and
you're in sales, all that technical complexity. Um, so that's a bunch of issues. And then you
actually have a kind of pesky problem that is super, so boring to talk about,

[01:30] but it's just so real. When you're an engineer, you generally, via the codebase, have access to most
of the, most of the data and the context to work within for whatever you're working on. Um, and, and
like, permission structures on codebases are generally, apparently, wide open because you just need
access to the codebase to do whatever you're doing. In knowledge work, this is not the case. Like,
every single individual and every single team within an enterprise has a different permutation to
what they, what data they can access.

[02:00] And so by definition, agents also have those same permutations, which just multiplies the complexity
of when an agent goes and does a task. So, think about all those things we just said, right? So, so,
the work is not as verifiable as code. The models are still working on getting better at, at these
non-coding domain tasks. Users are not as technical. So, kind of getting their whole, the whole
universe of, of things they need to plug in to make the agents work is obviously harder. And so, you
know,

[02:30] the more complex the data they need, often to do the work that they need. And so agents, again, are
at that same level of lack of access. Which means that, you know, as a knowledge worker, we can just
go around and we can go and say, hey, you know, Sally, can you give me access to that thing? But an
agent doesn't have that same ability to navigate that, that complex paradigm of like, well, what
does an agent do when it runs into a limit of what it has access to? So this is the gap. And so then
the question is, how are we all going to, I think you're starting to see some of the makings

[03:00] of examples of where this might work. So first of all, like highly domain and vertical specific
agent startups are emerging. And that's one great thing, because now you can basically say, you know
what, we can deliver this in a really kind of purpose-built way. So we can dramatically reduce all
the issues on how do you get the context to the agent? Well, we know the domain, so we're already at
a leg up on that. How do you get the right data to the agent? So there's a set of MCP connectors

[03:30] that we're already going to work with. There's usually some version of a forward-deployed engineer
in that workflow. So already you have people that are going to be wired to that vertical that can
help support that. So I think on the startup side, I think this creates an incredible opportunity if
you're doing domain and vertical specific agent work. And then on the enterprise side, it's another
sort of advantage, which is you understand your enterprise better than any outside vendor. So it
also creates an opportunity for how do you go and bridge

[04:00] these agents and this technology into your organization. And that's another set of, I think, roles
and interesting opportunities that we're starting to see emerge, which is like, okay, does your
technology team have an AI agent sort of internal forward-deployed engineer, and how are they
setting things up? So that's a little bit of the early things that we're seeing, but I think we have
to be in for many, many years of the use of this technology. And it would be interesting if we're
going to have this huge

[04:30] cognitive dissonance between the, again, extreme capability of AI coding agents that just are
continuing to go hyperbolic versus what we're going to do with the rest of the knowledge work
because of all those issues that I mentioned. And we're just going to be in for years and years of
deployment and change management and adoption, and that's what we all have to kind of question and
sign up for. And, oh, by the way, just a brief asterisk, it's also why the do-ers are generally
going to be wrong on the take-off because of the sort of

[05:00] rate of diffusion of how this is going to actually enter organizations. How are you guys at Box
thinking of pulling that with Box Agent? So that seems to me to be horizontal, but you're building
upon a pretty solid data layer. And so how are you guys thinking about that, and how are you
thinking about bridging the gap? Yeah, so I think we're a little bit of a quasi-interesting character
because we're a horizontal, so we don't have the domain name level of domain specifics of

[05:30] insurance and banking and legal, et cetera. The thing we have domain understanding on is enterprise
content at a horizontal level, and how to work with things like file systems and how to handle all
the ways that agents and models need to understand working with our file system and working with
enterprise content. And how do you create a pipeline of making sure your data is sort of ready for
agents in a secure way, in a well-governed way, with the right access controls and the right sort of
security and governance on that? So we do a lot of things even

[06:00] well before we even think about AI. We get enterprise content or unstructured data ready for AI. So
we have a whole document conversion pipeline that turns your content into a variety of file formats
that agents work well with. We do a lot in the security access controls area. So that's a lot of the
kind of core plumbing. And then with the Box agent, it's really, its job is to be an expert at
working with enterprise content, and it needs, obviously, the context of that workflow. So it's
going to need either skills

[06:30] or custom agent kind of knowledge, which is the form of, like, what is your business process? What
are you trying to automate? And that looks like, I think, anybody else's agent that you could go
build. Or we'll plug into a broader agent. So, you know, via our MCP server, we'll plug into a Harvey or
a Claude for legal, and then that will obviously take on the domain specific sort of understanding.
You've had a lot of this core plumbing around permissions and governance before. Have you seen that
it's

[07:00] changed because of the agent? Yeah, so I think there's a bunch of things that we've benefited from.
Just maybe sometimes we got lucky, like we made an architectural decision ten years ago that we
obviously made it for people. But we equally get the same kind of gains within a world of agents.
We've been, like, you know, most companies, you will have lots and lots of these deeply religious
battles on things like, you know, which JavaScript framework to go with. And which, you know, which
database to use.

[07:30] And that's obviously, like, you know, the classic engineering battle that would happen. We had years
and years of those same kind of incredibly intense battles on things like, how do you design a file
system? And how should the permissions architecture work? And so we benefited from incredibly
intense sort of processes and ways of making decisions to ensure that we had the most amount of
flexibility. Again, more for people use cases, sometimes for application use cases. But now we're
benefiting from the fact that agents get all of those

[08:00] benefits as well. So within Box as an example, there's not two ways you can store files. There's
only one way you can store files. There's only one canonical ID for every object in the system.
There's only one governance model. So we, you know, versus many of our current legacy competitors
where there's like three different ways that data could show up in the system. Single file system,
single governance model, single permissions architecture. So agents get the benefit of all that.
Then there's some areas where we realize, oh, like, we probably do have to evolve a file.

[08:30] Our search engine needs to take in different signals because the agent can now consume a hundred
times the search results of the human. So like if it can consume a hundred times the search results
of the person, then you might need to rank things differently. Or maybe ranking is less important
than just context in the search results because then the agent can do its own ranking. So we've had
to do a bunch of tweaks to our search engine as one example. We've had to do a bunch of things where
in our doc conversion pipeline we convert data into an agent-ready format. So we have an

[09:00] API now that will take any, basically any content type and turn it into markdown. So some things
that just make it so our agent or any external agent can work with this data much better. You
mentioned before that you have an agent inside of Box and you expose that as an MCP and you guys
also have just regular search endpoints over MCP as well. I was just thinking about the company,
like, strategy. Do you have a strong opinion of how much you want people to be consuming Box AI in
the Box AI interface versus via MCP?

[09:30] We have sort of like what we wish would happen and then we're also realists. So, and we, you know,
by being in enterprise software, anybody in enterprise software, you kind of know that ultimately
the customer's going to win. There's very few companies that can kind of hold out from overall
customer trends and whatever that kind of mega trend is. So I think, you know, and this is probably
something that every single one of the software players has to face right now, which is when does
your agentic

[10:00] experience show up versus when are you headless? And even headless takes on two forms because it
could be our AI being headless or it could just be our regular APIs being headless. So that has its
own kind of, you know, set of dimensions to it. But I think the way that we kind of think about it
is we have to get insanely good in our harness at things where, you know, working with content is
the most important thing. Where being, you know, kind of token efficient is the most important

[10:30] thing. Where having the right choice of models is the most important thing. We've got to be the best
at that when it deals with enterprise content. So we have, for instance, an agent that just does
document extraction. And we like to think that that agent doing document extraction is going to beat
any kind of horizontal system that you could send that problem to because we're just hyper-wired to
that. We have our own evals for that. We only work on that problem. Conversely, if you're just
working with, you know, Claude Cowork or ChatGPT and you want to have content show

[11:00] up inside of that experience, then we know that that's going to be 100% headless. And that'll be
really driven by the customer's, you know, sort of interaction and whatever they decide to roll out
to their organization. And we are completely indifferent sort of philosophically to either of those
approaches that the customer wants to do. I think that's what we all are going to have to sign up
for if you're building software is you should just expect that you should create a world-class
experience within your product and you should create world-class APIs that any external system can
interact with.

[11:30] And then by volume, you know, it might be that by volume of your system usage, it's mostly going to
be headless. And then, but for people, you know, kind of doing a bunch of work, it will often be
that they'll need your interface or your agent for that. Yeah, and I think we saw Salesforce the
other week or the other month explicitly volunteering to headless. And that was actually a pretty
big symbolic moment. So, you have basically the world's largest SaaS company fully endorsing that
headless is the future. I've already used

[12:00] their headless MCP a ton of times for ways that previously, interestingly, this is the bull case for
all of this. I've used their MCP server now for tasks that were both more complex and much higher
volume than what I ever would have had an internal analyst go do. And this is sort of the Jevons
paradox kind of whole thing on this, which is there's lots of things where when you're inside of a
workplace, you kind of like, hey,

[12:30] that's a person that's going to have to go spend five hours on this project or three hours on this
project. Is that really the most important thing for them to go do versus everything else that
they're going to work on? Whereas an agent, I don't care other than the token costs. So I give tasks
to a variety of agents all the time that I wouldn't have asked a person to do. But they end up
doing useful work for me. And so the Salesforce MCP server is a great example where I'll just hand
off a thing of some sort of research and it'll go do maybe 20 minutes of work. That would be
the equivalent of hours of human work

[13:00] that I just absolutely would not have ever had somebody else go and do. And that's the example of
now the new surplus of what we're going to start to see is we'll just use these systems 10x or 100x
more depending on how valuable that data is or where it shows up as being valuable to the
organization. Yeah, that's what you were going to say, because I think there is a bunch of fear
around being headless and getting disintermediated or having other people drive it. It seems like
it's just driving more people. I think there's some areas where I can appreciate

[13:30] the fear. There's not like a perfect quadrant. Maybe you guys can publish one so we can all know
what's going to happen. But there's probably something of like, is this type of data, does it have a
bunch of unsaturated use cases in the enterprise? If that data could be unleashed for everything,
how much more would it get used? Versus there are some data sources where, no, even if you could
throw any amount of

[14:00] compute at that, I just don't think you could do more with that information. And I think that that's
a thing, you know, we just have to run and get to like how do we parse, what types of systems could
you really use a ton more versus which ones are already meeting their set of needs. And that will
help us I think, you know, kind of figure out what things will have this be a huge opportunity for
versus maybe just control. You talked a little bit about your harness for kind of like knowledge
work and document intelligence. We also talked about coding agents and how those are

[14:30] taking off pretty rapidly. How much do you look at like coding agent harnesses for inspiration
around what your harness should look like and how similar do you think they'll end up being? Yeah, I
mean I think, and what you guys are doing with the agent is sort of, to me, like right on point
because if you think about it, and I was late to this conclusion and I've only sort of seen the light
maybe in the past six months on this, you know, but I think you guys were super early and a bunch of
folks were early, which is like these agents will

[15:00] always be better at code than probably everything else. So if they're always better at code than
everything else, then we should probably start to use that as an advantage in everything that we're
doing agentically. And guess what? Like it turns out that if you're really good at code, then you're
probably also really good at tool use and using MCPs and writing code on the fly to perform some
kind of action. So we should probably like take advantage of that. And it's like an interim analogy,
but like to some extent it's sort of like what if every knowledge worker or domain in a business had
like

[15:30] an unlimited supply of engineers for right next to them to do whatever they are trying to do now,
you know, essentially for free. You know, for free. Turns out we gotta be like trying to work with
other people. But like, we thought it was free. So we need to figure that out. So, you know, so God
bless Nippos for that problem. But basically if you imagine like your your social media manager or
your performance marketer

[16:00] or your sales system analyst, if they had an engineer full time next to them, what would they have
them go do? Well you'd probably be like, I'm gonna wire up the most crazy, automated, you know,
marketing campaign integrated strategy. I'm gonna get the best customer intelligence from all of my
data systems and then be able to sort of surface them up to my sales rep. So what are those? Those
are coded tasks that you could have like if we had expendable engineering resources previously,

[16:30] put an engineer on that and you just never would have because you don't have expendable coders in
organizations. But now you do. And so the idea of take a code harness, make it really, really good
now at non-coding tasks. I mean, they're coding, but they're not for engineers. They're for
knowledge work. That kind of becomes an interesting primitive that sort of makes sense across all of
knowledge work. You know, a lot of the way we're now seeing this, Codex as a super app or seeing this
as Claude Cowork. And so I think until proven otherwise,

[17:00] I think as a harness, you know, harnesses now make sense to be these expert coders. They should have
a computer, they should have a sandbox, they can write code and run code in. They actually obviously
need connectors to all the different systems that they're using. And then there's an overlay of what
makes them now not just coders. And that's like the vertical or horizontal domain expertise. So in
our world, we have, you know, we spend a lot of time in our system with the command prompt and the
variety

[17:30] of the harness. We spend time making sure that our agent knows about file systems and like how do
you peruse a file system and when should you use a search tool versus when should you navigate the
folder path. And it's like a really interesting problem because if you just went to search every
single time, then often search doesn't tell you the surrounding environment that something might be
within. But clicking a couple folders deep, you all of a sudden get that information. And so you
actually need to know like what is the type of task I'm being given and should I be doing a search
to find this single document

[18:00] or should I be going through a folder tree to find this sort of workspace that everything lives
within. And so we spend a lot of time to make our agent really good at those types of things. And
then obviously that needs to then show up in a vertical specific way because you can build across
the page and so on. But I think everybody's going to have some form of that but we basically built
on the backbone of a coding harness. And if you're going to think about building the agents, how do
you think about model selection choice for what you want to do? And do you use multiple models

[18:30] in one agent run when interacting with the Box agent or call multiple on it? We do with some
asterisks. So I mean there are some kind of penalties you get like caching and so it's like, I think
we're all trying to figure out what's our best version of hybrid on that. But I do think this is
going to be this one back and forth ping pong right now in the industry which is on one hand you
could go to a harness that is sort of purpose built for a model and by definition of purpose built
for a model

[19:00] it's going to be, you know, the model provider on the market is going to be more oriented toward
token maximization, token cost maximization, not even in any kind of like maybe explicit way as the
built harness is just sort of like they wouldn't inherently care to make it efficient. That wouldn't be
core to their business model. And obviously it's going to be wired up against effectively one model.
And so for everybody who's not a frontier lab, you're probably doing things intentionally

[19:30] to be multi-model. And if you're multi-model, what are the benefits? One, different capabilities
from the models that become useful. So like one model is better at coding, another is better at
legal contracts, another model is better at looking at insurance claim documents. Okay, once you
have evals you kind of know that. So one advantage of being neutral is sort of capability
differentiation. There's a performance differentiation because I can be like, okay, Gemini, you
know, 3 Flash is faster than Opus

[20:00] 4.7. And so there are some things that you just want different types of workloads being applied
against. And then there's finally cost. And cost might become the big story in the next three years.
And so if cost becomes the really big story in the next three years, then what's going to happen is
actually sending these tasks to the most general purpose system is going to basically average out
your cost. Like that means like sometimes, you know, you'll

[20:30] do it in a performant way because it just happens to kind of align with that system and sometimes
you'll be spending way too much money for that task. And so I don't know when and how this happens,
but like there's some threshold, another graph you guys should publish. But it's a great idea. Yeah.
I don't know. But there's probably some threshold where like, you know, the cost savings of the task
become worth having a dedicated harness. And we're all kind of figuring out where those points
are, right?

[21:00] So if you were to go through an insurance claims process and you were going to go and like automate
that as much as possible and either generate $50 million more per year or cost $50 million per year,
but it's like a big upside or big cost, then probably there's some point in the threshold where it
would be better for you to be hyper tuned to the right model, the right capability, the right custom
instructions for that workload. And because that could net-pens and you could get millions of
dollars of either more revenue

[21:30] or more cost savings. And so not every workload will have that. I mean just chatting and trying to
figure out like how do I summarize a document doesn't really have that component. But a really heavy
duty workflow will. And so I think that will mean that over time as basically you have like frontier
intelligence that will always be expensive and you can kind of peel off you know, just frontier
minus one intelligence from like three months ago, then all of a sudden it starts to make sense to
peel those kind of workloads probably

[22:00] to cheaper models, where there's some kind of capability maybe focused. And then ultimately we want
a harness that could kind of handle that level of sort of neutrality. And that I think will be an
interesting trend to watch with what happens, what parts of the industry will that happen in, etc. I
think this is already happening over the past few months. I think there was the Uber CTO who said
they blew through all their token budget. I want to say ServiceNow recently said something similar.
I think you had a tweet on like token budgeting.

[22:30] I think it's happening with coding agents first because I think adoption is first and foremost. But
then once everyone starts adopting these frontier models costs explode. What do you think the end
game for that is in coding in particular? Are you guys running into that at Box at all? Are you
seeing this pretty commonly in enterprises? Yeah, it was so funny because there was like I think the
token maxing meme only lasted three days. Because it was like, you know, first of all only Facebook
can actually afford to do that. And so it was like,

[23:00] I'm sure, you know, this price is very well on the startups here as well that can do it. But like
for us mere-mortal companies we have to like have budgets and we have to like, you know, have
planning cycles and how much can we use. And so we have not yet gone into like major existential
issues on the coding side. We're still more on the ascent, like where we're just continuing to say,
yeah, use it as much as you can. Certainly it's productive, but like we're not like rewarding you
for using lots of tokens because you just get weird kind of obviously like the set of comments on
that.

[23:30] So it's just like, use it productively. And we use our roadmap really as a means of pushing beyond
work. So we just, we want to stack more things into the roadmap and that should actually control and
drive the token utilization. But I do think that as this sort of translates into enterprises and
this is again another maybe just like different than a Curious Silicon Valley funded startup, like
you can basically convert $60 into tokens in a way that like if you go to a Wall Street bank or a
public company

[24:00] and name your industry, they can't do that. They have like EPS targets they have to hit every single
quarter. And so they can't all of a sudden spend $10 million more on an AI build. And so that's,
once that pressure starts to build, all of a sudden you're going to have now the, okay, we have to
do things more efficiently, which means we probably need a mixture of models in these types of
workloads. And that's just going to become, it'll have to become a big part of this conversation.

[24:30] Even though we're seeing the early signs of the totem card. We're wrapping up with one last quick
question. Can you give us a sneak preview of your next tweet? What's top of mind? What do you know
about it? Any ideas out there? What should we talk about? Anybody got anything? Okay. Don't respond
to my tweets with large audiences. It's a completely different... Okay, there you go. We'll see how
often it does. No, I mean it's, I'm incredibly excited to be doing

[25:00] anything with AI right now. I think we're all fortunate to be in sort of building positions, whether
you're in a larger organization or within the startup world. And ultimately I think it's a moment of
like how do we bridge this technology into organizations. So yeah, super excited to be in this
moment. Congrats on all the awesome announcements today. So yeah, great work. I think we all are.
Let's give it up for Aaron. Thank you.