Sierra Co-Founder Clay Bavor on Making Customer-Facing AI Agents Delightful

Customer service is hands down the first killer app of generative AI for businesses. The reasons are simple: the costs of existing solutions are so high, the satisfaction so low and the margin for ROI so wide. But trusting your interactions with customers to hallucination-prone LLMs can be daunting. Enter Sierra. Co-founder Clay Bavor walks us through the sophisticated engineering challenges his team solved along the way to delivering AI agents for all aspects of the customer experience that are delightful, safe and reliable—and being deployed widely by Sierra’s customers. The Company’s AgentOS enables businesses to create branded AI agents to interact with customers, follow nuanced policies and even handle customer retention and upsell. Clay describes how companies can capture their brand voice, values and internal processes to create AI agents that truly represent the business. Hosted by: Ravi Gupta and Pat Grady, Sequoia Capital Mentioned in this episode: Bret Taylor : co-founder of Sierra Towards a Human-like Open-Domain Chatbot : 2020 Google paper that introduced Meena, a predecessor of ChatGPT (followed by LaMDA in 2021) PaLM: Scaling Language Modeling with Pathways : 2022 Google paper about their unreleased 540B parameter transformer model (GPT-3, at the time, had 175B) Avocado chair : Images generated by OpenAI’s DALL·E model in 2022 Large Language Models Understand and Can be Enhanced by Emotional Stimuli : 2023 Microsoft paper on how models like GPT-4 can be manipulated into providing better results

Published: Published Aug 27, 2024
Uploaded: Uploaded Jun 11, 2026
File type: Podcast
Queried: 00

Full transcript

Showing the full transcript for this episode.

AI-generated transcript with timestamped sections.

0:00-1:50

[00:00] One of the more interesting learnings from the past year and a half of working on this stuff is, [00:06] is that the solution to many problems with AI is more AI. [00:12] And... [00:12] It's somewhat unintuitive, but one of the remarkable properties of large language models is that they're better at detecting errors in their own output than in not making those errors in the first place. [00:24] *music* [00:39] *Bell rings* [00:41] Joining us today is Clay Bevor. [00:43] co-founder of Sierra. [00:45] Before Clay started Sierra with his longtime friend Britt Taylor, [00:48] He spent 18 years at Google, where he started and led Google Labs, their AR, VR efforts, and a number of other forward-looking bets for the company. [00:56] Sierra is allowing every company to elevate its customer experience through AI agents. [01:00] And there is no one. [01:01] Who knows more about what AI agents can do today? [01:04] and what they'll be doing tomorrow than Clay. [01:06] You'll get to hear about how pictures of avocado chairs helped inspire the founding of Sierra, [01:11] Why the solution to problems with AI is often more AI. [01:14] and so much more. [01:15] Please enjoy this incredible episode with my friend Clay Bovore. [01:21] All right, Clay, listen, this is a funny start because we know each other so well. [01:24] But can you just tell everyone a little bit about yourself and just give us some background before we talk about the future of AI and what role Sierra is going to play in that? So first of all, I'm a Bay Area native. I grew up not more than four or five miles from here. So I grew up in the Bay Area, got to see the kind of dot-com bubble grow and then burst. Studied computer science and then ended up right out of undergraduate at Google, where I was for 18 years until Google.

1:50-3:25

[01:50] last, last March. And so at Google, I worked on really every part of the company. I started in search and then ads. For several years, I ran the product and design teams for [01:59] what is now workspace, so Gmail and Google Docs and Google Drive and so on. And then spent the last really 10 years at Google working on [02:07] various forward-looking bets for the company, some hardware-related like virtual and augmented reality, some AI-related like Google Lens and other applications of AI. [02:17] And then 15 months ago, left Google to start Sierra with a longtime friend of mine, Brett Taylor. We met in our early days at Google. [02:26] where we both started our careers in the associate product management program. So he was, I think, class one. I was class three. And we met early on. [02:35] Stayed in touch in particular through a monthly poker group that in a good year would play like once and met up December of 2022 and just saw what was happening in and around AI and these fundamentally new building blocks that we thought. [02:52] would enable us to create something really special and start us here out of that. So that's the recap. [02:57] Actually, I'm curious on that. [02:59] And we need to get to what is here pretty quickly here, but just for fun. [03:02] December 2022, very shortly after the ChatGPT moment, [03:07] How, I guess, what was the process like or how soon after that moment did you have the conviction that this is a... [03:14] sufficiently interesting new technology to build a company around. Can I introduce one thing that's kind of interesting I hope you talk about? Before you actually before the chat GPT moment you had been telling me

3:25-5:02

[03:25] about [03:26] how everything was going to change. I still remember distinctly him telling me, [03:30] You don't understand. You're going to be able to [03:32] Talk. [03:33] about a scene that you envision and they're gonna be able to make a movie [03:36] out of you just talking about it. Do you remember you telling me? Yes. Yeah. And so I'm actually very curious about this, too. Well, I had such a privileged seat at Google to see so much of what came out of that transform paper in 2017 and. [03:50] the emergence of early large language models. So at Google, [03:53] One of the first was called MENA or Lambda. There was a paper, I think, in 2020, a conversational chatbot for just about anything, and [04:01] And I remember even before that getting to interact with this thing in a pre-release prototype. [04:06] and having this uncanny sense that there was [04:10] someone, something on the other side of it, and that this was different. [04:15] Another moment, I think it was mid-2022 when we had, I think it was the first or second version of Palm, Pathways Language Model at Google. It was a 540 billion parameter model. And we were testing it to see kind of how smart it was. And one of the surest signs of intelligence is the ability to think and reason and metaphor and analogy. Right. [04:37] So we tried a few things and one which is pretty straightforward is we we asked Palm, hey, explain black holes in three words. And it came back without skipping a beat. Black holes suck. And we were like, oh, that's a pretty good summary. Also, like, you know, the model seems to have a sense of humor, which is cool. And the moment the moment that really blew my mind, we asked.

5:03-6:36

[05:03] And I remember the answer verbatim. We asked Palm, please explain the 2008 financial crisis using movie references. [05:10] And again, without skipping a beat, he said, the 2008 financial crisis was like the movie Inception, except instead of dreams within dreams, it was debt within debt. [05:19] Whoa. And we all paused. What? [05:22] is this. So it understood basically the concept of CDOs, nestedness of debt. Okay, what movie includes nestedness of something else, Inception, nestedness of dreams. [05:34] So it's like Inception. [05:36] And we all thought, wow, this is something new and different. [05:42] And then there were a couple other moments. I remember the first Dolly paper came out. [05:47] They did a blog post and... [05:49] People reacted... [05:50] a little bit to it, but for me, I remember one of the stars of the show was, [05:55] was they asked Dolly to make avocado chairs. And I know this sounds so odd, but here is a set of 10 or 20 images. [06:04] of chairs that look like avocados. It wasn't photoshopped. These images had never existed before and yet [06:10] The model seemed to understand, similar to the movie reference metaphor, concepts of avocadoness and shareness and put those together and create these images pixel by pixel. [06:21] So we have avocado chairs at Instacart. Yeah, it's actually did. Did you really? We actually did. We actually had chairs shaped like avocados. [06:29] In related news, there were times where we were burning a little bit too much money. Those mags, too. Yeah, those bags.

6:36-8:19

[06:36] The, uh, [06:37] So I had a good sense that something was coming. And in fact, the team I was running at Google at the time, Labs, was putting a lot of large language models to use in early applications there. And so I had a hunch. [06:50] Chat GPT certainly clarified that hunch, but I think Brett and I both for several years had been tracking what was happening and just seeing – [07:00] First, it was translation and better than human level translation. Then it was some of this language generation. And I think credit to OpenAI for doing the engineering work and data work and much more to make GPT-3 turn into ChatGPT, where suddenly you could grasp this thing's full potential without knowing how to write Python and use their APIs. All right, so we're going to talk about... [07:23] Where AI is going, we're talking about agents, we're talking about customer service. [07:26] But first, can you maybe just tell people a little bit about Sierra and what you and Brett have created? Yeah. [07:32] So in a nutshell, Sierra enables any company in the world to create its own branded customer facing AI. [07:39] to interact with its customers for anything from customer service to commerce. And [07:44] The backdrop for this is this observation that any time there's been a really significant change in technology, [07:51] People interact with computers, with technology in different ways. [07:54] And as a consequence, businesses are able to interact with their customers in entirely new ways. And you saw this in the 90s. The Internet made the website possible. And for the first time, a company could have a sort of digital storefront and be present to the world, update its inventory with the click of a button and so on. In the mid to mid early 2000s, 2005, 2008 years.

8:19-9:52

[08:19] If you were a company, you could all of a sudden, through ubiquitous social networks, interact with your customers at scale and have conversations at scale. And in 2015, right after the rise of smartphones, right as a company, you could put kind of a Swiss army knife version of your company in everyone's pocket. [08:38] And so like, I bet you have your bank's mobile app on your phone, probably on your home screen. [08:43] So the last few years of advances in AI has, for the first time, made it possible to create software. [08:50] that you can speak to. [08:52] Software that can understand language, software that can generate language. [08:56] And most interestingly, I think software that can reason and make decisions and [09:03] It's made for really delightful conversational experiences like those that we associate with ChatGPT. And so we think there's a big, big deal for how businesses interact with their customers. [09:16] You think about the difference between how we do some things today versus what you could do if you could just have a conversation with the business you're interacting with. Think about like shopping. You're in the market for some shoes, right, or Pat maybe for you, some new weights or something. Very heavy weights. Tiny, tiny, tiny little ones. [09:34] And you're on the website and it's like, [09:37] You basically have to imagine how the company's designer would have organized the product catalog. So, okay, men's, men's shoes, men's shoes. [09:45] men's running shoes, men's racing shoes, lightweight vapor fly, I can't remember the name, and so on.

9:53-11:24

[09:53] But instead, with conversational AI, you could just say, hey, I need some super lightweight running shoes, kind of like those ones I got last time. What do you got? [10:02] And it's almost like I'm dating myself a little bit here, but like Yahoo Directory, where you navigate through this. [10:07] hierarchical structure to find what you want. [10:10] in contrast to Google or is just you explain what you want. And, and this takes it several steps further. Um, [10:16] There's a quote from the head of customer experience at one of the companies we work with. She said, "I don't want our customers to have to have a master's degree in our product catalog and our corporate processes." [10:28] And to do a lot of things, you know, [10:31] Buying shoes fairly easy on the spectrum of interactions you have with companies. Imagine adding a new person to your insurance policy. Where do you go in the mobile app for that? How do you get that done? [10:44] And your eyes just glaze over, right? And so – [10:47] The alternative talking to an AI and in particular an AI agent, it's a technology around which we build Sierra, where that AI agent represents your company, your company at its best, we think is really, really powerful. [11:02] Um, [11:03] Even in, you know, we're 15 months old as a company, we've had the privilege of already working with [11:08] storied brands like Weight Watchers, Sonos, Sirius XM, Olokai. If you're in the market for new flip-flops, I strongly recommend Olokai flip-flops. I have two faves. Very good. Excellent. Also make great golf shoes. Oh, really? Oh, yeah. Yeah, yeah, yeah. You should get some. All right, great.

11:25-13:02

[11:25] And [11:27] And so for Weight Watchers, we're advising on points and helping members manage their subscriptions. With SiriusXM, we're helping... [11:34] diagnose and fix radio issues and figure out what channel your favorite music is on and so on. And the results. [11:42] Again, in the first year of the platform out there, [11:46] We're, in one case, resolving more than 70% of all incoming customer inquiries at extremely high customer satisfaction. And all this leads us to believe that every company is going to need their own AI agent, and we want to be the company that helps every company build their own. [12:03] In the spirit of sort of the future of these AI agents and what they could mean for customer-facing communications or customer-facing operations, [12:11] Are there any good examples of things that were not possible 18 months ago that are possible today? And then maybe if we roll the clock forward. [12:20] things that are still not quite possible today that you think will be possible. Yeah. [12:24] 18 months from now? Yeah. [12:26] First of all, the progress month by month and over 18 months in particular is just kind of breathtaking. [12:33] 18 months ago, GPT-4 class models didn't exist. [12:37] Right. It was still kind of something just coming over the horizon. [12:41] Agent architectures, cognitive architectures, kind of the way you compose large language models and other supporting pieces of infrastructure. [12:50] were very, very rudimentary. And so I'd go so far as to say, like the idea of putting an AI in front of your customers that could be helpful and importantly, safe and reliable,

13:02-14:38

[13:02] That was just impossible. And so chatbots from even... [13:07] even 18 months ago, looked a lot like a pile of hard coded rules that someone cobbled together over months or years, they became very brittle and [13:18] I think we've all had the experience of, you know, talking to a chatbot. I'm sorry, I didn't get that. Can you ask in a different way? Or my favorite is when, you know, they have the message box and then, like, the four buttons you can click, but the message box is blanked out and you can't actually use it. And so, you know, I can help you with anything so long as it's one of these four buttons. [13:43] Most of what I described, fixing radios, processing exchanges and returns and so on, [13:50] wasn't possible, at least in any satisfying way or in a way that that led to real business results for four companies 18 months ago. [13:58] Fast forwarding 18 months, I think we go pretty deep here. I think multimodal models are quite interesting. [14:05] Something like 80% of all customer service inquiries are on the phone, not on chat or email. [14:11] So voice will obviously be a huge part of it. Things like returns, exchanges, diagnosing radio issues and things like that are on the simpler end of the spectrum of the total set of tasks. [14:23] that you might want to [14:25] get help with from an AI agent. And so I think more advanced models, more sophisticated cognitive architectures, all of those I hope would increase kind of the smarts in the agent, the types of

14:39-16:11

[14:39] And then trust, safety, reliability, you know, the hallucination problem, I think, is still an unsolved area. And we've made, others have made huge amounts of progress on it, but I think we can't yet declare victory. [14:53] How quickly do you think it's going to become... [14:56] You guys are doing so much for the customers, not just customer service, but, you know, working all the way through the tunnel. [15:01] But on the customer service side, how long is it going to take to become the default? [15:05] that folks expect that they will be able to have someone or an AI that's available at any time to answer any question. [15:12] you know, [15:13] Make that real for us. Yeah. [15:15] I don't know, and in part, there is... [15:21] there's a bit of a hole to dig ourselves out of as not a company, but. [15:26] as an industry where it's like, when was the last time you had a great interaction with a chat bot on a website? [15:34] And, you know, I think if you polled 100 people and you're like, do you like talking to customer service chatbots, probably zero out of 100 would say yes. On the other hand, if you ask like, hey, do you ask 100 people, do you like interacting with ChatGPT, maybe 100 out of 100 would say yes. [15:50] And so I think... [15:52] some of the work we've been doing in our product is to educate [15:56] our customers' customers up front, like, hey, this thing's actually really smart and good. [16:01] One of the interesting specific techniques for doing that is we stream our answers out word by word, similar to how ChatGPT does. People are so used to the message.

16:12-17:43

[16:12] message message message [16:14] the streaming answers is something of a kind of visual signature for, oh, there's a really smart AI behind this. And so I think what we find is customer satisfaction is extremely high with our AI agents, you know, in the mid four. So 4.5 out of five stars, which in some cases is higher than customer satisfaction with human agents. And in fairness, they often get the hardest cases and the cases that, you know, we will hand off [16:44] angry or was especially frustrated or something, but still those results are really significant. [16:51] My guess is over just the next few years, I think people will realize, oh, [16:55] I can get my issue resolved faster. [16:58] This thing is actually... [17:00] capable and can not only answer my questions, but, you know, one of the things we're really proud of is we go far, far beyond just answering questions, but can actually take action and get the job done. Can you talk a bit about Agent OS? [17:15] and some of the frameworks that you put around the foundation models to make everything work? [17:19] Yeah. [17:20] So it's been such an interesting journey learning what's required to put AI safely, reliably and helpfully in front of our customers, customers and. [17:31] Um... [17:32] A huge part of that, really the first part is looking at what are the challenges with large language models and how do you address or meaningfully mitigate those?

17:44-19:19

[17:44] Start with hallucinations. I don't know if you saw it, but there is an example from a few months ago where Air Canada's chatbot that I think was based on an LLM and apparently not much else. [17:56] was interacting with the gentleman who had questions about their bereavement policy. And I think the person had had someone pass away in his family and was asking about refunds and credits and so on. And the AI made up a bereavement policy that was quite a bit more generous than Air Canada's actual bereavement policy. [18:14] And so the man took a photo and later claimed the full amount of that refund. And so they said, no, actually, that's not our policy. [18:25] and I don't quite understand this. The case went all the way to court, Air Canada loss. [18:32] And our thought was like, hey, it's just – it's like $500. Right. Like Canadian dollars. [18:39] So – but hallucinations are a real challenge. And – [18:43] On top of that, just to enumerate some of the things to overcome, [18:47] and that we have with AgentOS, no matter how smart, you know, GBT5 or 6 is, like it won't know where your order is, right? Or which seats, right, you've booked on the upcoming flight or whatever. It's obviously not in the pre-training set. And so you need to be able to safely and reliably and in real time integrate [19:07] an AI, an AI agent in our case, with systems of record to look up customer information, order information, and so on. And then finally, most customer service processes are

19:19-21:01

[19:19] actually somewhat complex, right? You go to call centers and there'll be flow charts on the wall, like here's how we do this and if there's an exception this way and so on. And as capable as, you know, GPT-4 and Gemini 1.5 class models are, [19:35] they'll often have trouble following complex instructions. And we saw one example in an early version of an agent that we prototyped where you'd give it five steps in a returns process or something. And you'd say, hi, I need to return my order or whatever. [20:05] Order number 1, 2, 3, 4, 5, 6, 7. So it would not only hallucinate facts or bereavement policies, but even function calls and function parameters and so on. [20:14] So with AgentOS, what we built is essentially a toolkit and a runtime for building industrial-grade agents that – [20:22] um, [20:23] I don't want to say that we've solved every one of these problems, but overcome and mitigated the risks in these problems to such an extent that... [20:31] you can safely deploy them at scale, have millions of conversations with them and so on. [20:37] And it starts at the foundation layer. I don't mean foundation model layer, but just the base layer of the platform where you have to get really important things like data governance and detection, masking and encryption of person identifiable information. Right. And so we built that right into the platform from from the ground up so that our customers data stays our customers data so that their customers data is protected.

21:07-22:47

[21:07] before we log it to durable storage, right? So knowing that we're going to be touching addresses and phone numbers and so on can handle that safely. A level up from that, we've developed what we call Agent SDK, our Agent SDK, and it's a declarative programming language. [21:23] That's... [21:24] purpose-built for building agents. And it enables an agent developer, most of whom sit within the four walls today of Sierra, to express high-level goals and guardrails around agent behavior. So you're trying to do this. [21:39] Here are the instructions. [21:41] Here are the steps and a couple of the exceptions cases. And then here are the guardrails. And to give an example of that, uh, [21:48] One of our customers works in kind of the healthcare adjacent space. [21:53] they want to be able to talk about the full range of their products without dispensing medical advice. Right. So how do you create those additional additional guardrails? And then so you can define kind of the behavior and. [22:07] scaffolding for complex tasks for AI agents with agent SDK. We also have SDKs for integrating with contact centers when we need to hand off for integrating with systems of records, like the order management system, and so on. And then finally, for integrating our chat experience directly into a customer's mobile app or website, iOS, Android, web, and so on. And then, [22:33] Once you've defined the agent using Agent SDK, we then have a runtime where we abstract away what happens underneath the hood from the developer so that they can...

22:48-24:23

[22:48] define what the agent should do. [22:50] define the what and then agent OS takes care of the how. And so for some skills, there might not be one LLM call, but five, six, seven, 10 separate LLM calls to [23:01] Different LLMs with different prompts. In other cases, we might retrieve documents to support answering a question accurately with and so on. And agent OS, you know, in the spirit of an actual operating system, abstracts away a lot of that complexity, kind of the equivalent of IO and resource utilization and so on. [23:25] So it makes the whole process of building and then deploying an AI agent much faster and much safer and more reliable. [23:32] And when you think about what you just said, Clay, of like when you call multiple LLMs, [23:37] Is that in a supervisory capacity sometimes, too, where you end up having like a... [23:42] supervisor agent reviewing the work of a lower level. Yeah. Yeah. [23:46] One of the more interesting learnings from the past year and a half of working on this stuff is that the solution to many problems with AI is more AI. [23:58] And it's somewhat unintuitive, but one of the remarkable properties of large language models is that – [24:03] They're better at detecting errors in their own output than in not making those errors in the first place. And it's kind of like if you or I were to draft an email quickly. [24:14] And they're like, okay, let me pause. [24:16] Let me proofread this. Does this make sense to these points hang together? Oh, actually, no, I missed this.

24:23-25:53

[24:23] and even more powerfully, [24:25] you can prompt LLM's [24:28] to take on an essence, a different persona. So a supervisor's persona and [24:35] It seems with that you can elicit [24:39] more discerning behavior and a closer read of the work being reviewed. So to your question, Ravi, yeah. [24:45] We, in addition to... [24:48] building the agent itself have a number of these supervisory agents that basically... [24:54] It's like a little Jiminy Cricket agent looking over the shoulder, right, of the primary agent. Is this factual? Is this medical advice? Is this financial advice? [25:02] And is is the customer trying to prompt, inject and attack the agent and get it to say something that it shouldn't? [25:10] All of these things, and it's through layering all of these goals, the guardrails, the task scaffolding in using agent SDK within these supervisory layers that we're able to get both to the performance levels we are, 70% plus resolution rates, but also to do that really safely and reliably. [25:28] That's one of the cooler things I've heard is just... [25:31] you know, the tell it. [25:32] to have a different persona. [25:34] and then all of a sudden it behaves differently. I remember when I first saw it on ChatGPT of [25:38] When it doesn't help you on something, just tell it it's really good at it. And then it's more likely to help you as a remarkable situation. It's it's very strange. And one of the weirdest adjustments over the past 15 months building these things is.

25:53-27:40

[25:53] I'm sorry, we're programming with English language and we can give it the same English language and it can say something entirely different. And on prompting techniques, I mean, it's fascinating to. [26:04] Even with no new models coming out, given a fixed model, you can elicit better and better performance from it. [26:14] simply by improving how you prompt it. And there was a paper that came out three or four months ago that suggested that, like, emotional manipulation of the large language model would get better results. So the – [26:26] kind of, [26:27] prompt... [26:29] suffix that they figured out. You say, "Hey, I need you to perform this task." [26:34] You define the steps and so on. And you end with, it's very important to my career that you get this right. And the performance goes up. You're like, what is this? Like, what are computers now? Yeah. [26:45] For the record, we don't use that prompt sentence in any of our minds, at least not that I know of. But things like chain of thought, think step by step, let's take this step by step, right, elicits better reasoning for very interesting reasons. [26:58] You know, other methods of task decomposition and kind of narrowing the set of things that the LLM needs to keep in mind at the same time improves reasoning if you're precise about what you want it to do. [27:10] So all of these techniques are... [27:12] those that we've applied and built into AgentOS and [27:16] Actually, we have a small but mighty research team. [27:20] um uh our head of research karthik narasimhan was uh by the way that was incredible pronunciation oh like his grandmother would have been so perfectly happy with how you pronounce thank you well that soft soft tea yeah soft tea nicely done it's uh yeah it's not a t and it's also not a th that's right somewhere in between it's a soft thank you thank you very much yeah

27:40-29:10

[27:40] He helped write the REACT paper, one of the first agent frameworks. One of our researchers wrote the reflection paper where you can have... [27:51] The agent pause, reflect on what it's done, think through, am I doing this right before proceeding on? [27:56] And so these are all things that we've been able to incorporate in quite a direct way. You should talk about the most recent research, the Tau Bench. Oh, Tau Bench? Yeah. Yeah, yeah. It took me a while when I was trying to send the email saying I liked the paper to find the tau symbol on my computer. It took Romy a while because he's, to this day, never actually read a research paper. I read this one. That's great. No, no, no. Oh, good job, man. He had to figure out how to put it in the chat GPT and say – [28:23] Please write a paragraph that makes it sound like I read this research paper. Well, either you or Chad – I refuse to comment. [28:32] Well, look, either you or ChachiBT did a great job on that email. Thank you. Thank you. We're a team. Yeah. So TAUBENCH is our first research paper. First of all, TAU is a Greek symbol. It's spelled T-A-U, and it stands for Tool Agent User Benchmark. And what we observed was that the benchmarks out there for measuring the performance of [28:55] AI agents in particular were pretty limited in that basically they would present a single task. [29:03] Here's something we need you to do. [29:04] And here are some tools you can use. Do you do the job or not? And the reality is.

29:11-30:44

[29:11] interactions with an AI agent in the real world are way messier than that, right? They take place [29:17] in the space of natural language where customers can say literally anything. [29:22] or describe whatever they're trying to do in any number of ways. [29:26] It happens over a series of messages. [29:30] The AI agent needs to be able to interact with the user to ask clarifying questions, gather information, and then use tools in a reliable way. [29:40] And it needs to be able to do this, you know, a million times reliably. So... [29:45] The benchmarks out there we found really lacking in measuring the very thing that we are trying to be the best at. And so our research team set out to create a benchmark that measures – [29:57] we think the real world performance of an agent in interacting with real users, using tools with all the messiness that I just described and, um, [30:07] The big picture approach that we took is pretty interesting. So you have an AI agent that you're trying to test. You have another separate agent that acts as the user. So basically a user simulator. And the AI agent you're testing has access to a set of tools it can use. Think of these as like functions to call. So a simple one would be I'm going to do some math using a calculator tool. [30:37] is this order number [30:39] credit to credit card or store credit or whatever. And, uh,

30:45-32:24

[30:45] And then you basically run a simulator where... [30:48] The agent has a conversation with the user simulating agent. And at the end, we're able to test in a deterministic way [30:57] Did the did were the functions used in the right way? And the way we do that is we basically create a mock database that those tools interact with and modify. So were they modified in the correct way? So what's neat about this is you can initialize the conversation. [31:14] so that the user has many different personas. They could be grumpy, they could be confused, [31:20] They could know what they want to do but speak about it in a clumsy way. [31:25] And and so it doesn't really matter the path that the age it takes to get to the correct solution so long as it gets to the correct solution. [31:33] Now, what came out of this was pretty interesting, and I think it strongly motivates the development of things like AgentOS and frameworks and cognitive architectures for building these agents. [31:45] So the upshot is LLMs on their own do just an absolutely terrible job at this task. Yeah. Yeah. [31:52] And and so even even the frontier models in something as simple as processing return and mind you. [31:59] the, the, [32:00] instructions given to the agent being tested [32:04] are quite detailed. [32:05] The functions, the tools it can use are quite well documented and so on. And yet, on average, the best performing LLM on its own [32:15] got to the end of the conversation correctly 61% of the time. And that was in returns. It was modifying an airline reservation. We had two kind of...

32:25-33:58

[32:25] simulation versions, the best results were 35%. Now what's interesting is, we all know that if you take a number less than one to the nth power, it quickly gets very small. And so we developed a metric we call pass at K. [32:41] which is okay, if you run the simulation, [32:45] eight times. And remember, you can make use of the non-determinism of LLMs to have the user simulator be different every time. So you can permute that. [32:54] Well, 0.61 to the eighth power is about 25%. [32:58] So you then imagine, well, what if you're having a thousand of these conversations? [33:02] you're so far off from being able to rely on this thing. So the upshot is much more sophisticated agent architectures are needed to be able to safely and reliably put an agent in front of really anyone. And that's the very thing we're building with AgentOS and a lot of the tooling around it. How much of that do you think is an engineering task and how much of that is a research task? [33:27] And I guess maybe the question behind the question is... [33:30] timeframe [33:31] to having useful agents deployed at scale in broad domains of tasks. Yeah. [33:37] Well, I think the short answer is it's both. But I'll say more concretely, I'm very optimistic about it being – [33:45] in large part an engineering challenge. And that's not to say that [33:50] the next wave of models and improvements in the frontier models won't make a difference. I believe it will. In particular, we're seeing...

33:58-35:29

[33:58] techniques like better fine tuning for function calling, [34:02] agent-oriented fine-tunings for foundation models or some of the open source models. Those will help. [34:09] but [34:10] The approach we've taken in building agent OS and kind of the foundations of Sierra is really treating building AI agents. [34:18] as [34:18] first and foremost, an engineering challenge where we are composing [34:22] foundation models. We are composing [34:25] fine-tuned open source models that we've [34:28] post-trained, fine-tuned with our own proprietary data sets, and by... [34:35] composing multiple models in interesting ways by [34:38] supplementing what LLMs can do on their own with, [34:43] retrieval systems like retrieval augmented generation to improve grounding and factuality [34:50] by [34:51] supplementing the kind of inbuilt reasoning capabilities of LLMs. [34:56] with [34:56] I'll call it reasoning scaffolding that live outside of the models where you're composing, planning, planning. [35:03] task generation steps, [35:05] draft responses, the supervisors that we talked about, and doing that outside the context of the LM. [35:14] we've been able to put AI agents in front of [35:17] a huge number of our customers' customers and safely and reliably. And so I don't think it's, you know, something over the horizon. It's already over the horizon. [35:27] I think...

35:29-37:01

[35:29] looking ahead [35:31] I think there are a few different [35:33] avenues where we'll see progress. One is in the foundation models. We talked about that. And [35:38] Um, [35:39] As the capabilities grow, agents will get smarter. And we've architected AgentOS in such a way, talked about abstracting kind of the what from the how. [35:51] where we'll be able to swap in the next frontier model, and everyone's agent will just get a bit smarter. We'll get like an IQ upgrade. [36:00] By the way, similarly and interestingly, we can swap in less broadly capable models. [36:06] but models that are more capable in a specific area. So for instance... [36:11] triaging a case or coming up with a plan and so on. We can use much smaller models that actually are [36:18] better, faster, cheaper, choose three, you know, all at once. And then I think we're seeing progress literally week by week on the engineering of these agents. [36:29] building in, uh, [36:31] not only new and better components under the hood in the architecture, but, um, [36:36] new approaches and tooling around [36:38] basically teaching these agents to do it better and better. For that, we built something we call the Experience Manager for Customer Experience Teams, which is kind of a pretty interesting thread on its own. [36:49] Clay, if you had a high value customer, [36:52] Like you are a company now. You're not running Sierra. You're running a company that has a high value customer. [36:58] What today with a Sierra agent or with an excellent agent,

37:01-38:30

[37:01] Excellently designed agent. Could you trust... [37:04] an AI agent to go do in front of your customers today? What are some of those tasks? And then what will they be [37:09] Pick your time frame in the future, because I think that. [37:12] We've talked about this, and I like your language of like, [37:14] you know, [37:15] they already don't have to just be on the help center. They can already be on the homepage. [37:19] right? What are some of the tasks that [37:22] you know, you can rely on an agent for today if it is well designed with a high towel bench score. Yeah. You see that strong from a, that's from a thoughtful and dirty detailed reading. You must have read the paper. Yeah. Thanks. Strong. You notice a strong. Yeah. Strong. Yeah. Strong. Yeah. [37:37] What would its pass at K score, though, be? Yeah. The – [37:41] So pretty broad range even today. So simple things like... [37:46] getting answers to questions. That's kind of the left end of the spectrum. [37:50] To the right of that are things like helping you with something complex like, hey, I got – [37:56] I got shoes or this item of clothing. It didn't quite fit. [38:00] And then branching off of that, like, what do you recommend that's like it that might fit better? [38:05] And so it starts to get into, it's not like for like replacement, but the agent actually needs to make sense of styles, of sizing, of differences between, you know, wide and narrow fit and so on. [38:19] A click up from that is something like troubleshooting. So with Sonos, for instance, we help their customers troubleshoot if they can't connect to their system or they're setting up a new system. And

38:31-40:02

[38:31] You imagine it gets pretty sophisticated pretty quickly where – [38:35] it's basically a [38:37] process of elimination trying to understand is it a Wi-Fi thing is a configuration thing [38:43] in narrowing down the set of problems that it could be just as a sophisticated, you know, level two or level three. [38:49] technical customer service person would. [38:52] And [38:53] and getting the music back on. And I think that's a really neat example. [38:58] Probably the... [38:59] use the word trust. What would you trust an agent to do? [39:03] One of the things we're really proud of is several of our customers are actually [39:07] Trusting us with when. [39:09] customers call in and may want to cancel or downgrade their subscription [39:15] helping those customers... [39:17] to understand, hey, [39:18] How are you using the service today? [39:21] Is there a different plan that we could put you on? So it's value discovery. [39:26] It's putting an offer, sometimes a series of different offers in front of their customers in the right order. [39:33] positioning the value of those offers correctly given the customer's history. [39:38] given the plan that they're on and so on. And the difference between keeping a customer from churning or not is hugely consequential. Yes. [39:49] Right. We. [39:51] AI for customer service has obvious... [39:55] cost savings benefits, and I think... [39:58] customer experience benefits in particular and you're never going to wait on hold.

40:03-41:33

[40:03] But, boy, you know, revenue preservation, revenue generation is something else entirely. [40:09] That's really at the right end of the spectrum. [40:12] We're really proud of how well our agents are performing in those circumstances. [40:17] It's interesting by – [40:19] by being consistent, by taking the time [40:22] to understand what's driving someone. [40:24] to potentially lead the service. [40:27] asking the follow up questions that [40:29] an impatient or, [40:30] you know, improperly measured, you know, customer service agent in a call center somewhere might not. [40:36] we can be much more nuanced in understanding what's driving this decision, what might be a good match for this person in terms of a plan that would... [40:46] be quite valuable given how they're using it and then put that in front of them. And so [40:50] That's the right end of the spectrum. [40:52] Where it goes from here, you know, I think we've yet to see a process too complex for us to be able to model and scale up using AgentOS and our agent architecture. And so, yeah, I'm sure we'll get punched in the face by something that's especially complex, right? [41:11] I'm excited about directionally. [41:16] We've started with service because for two reasons. [41:19] The ROI case is just unequivocally awesome. [41:25] The average. [41:26] the average cost of a call is something like 12 or $13. [41:30] Thank you. [41:31] And, and, and yet,

41:33-43:05

[41:33] Despite the expense, most people don't like customer service calls very much. Here's something that's actually really important to businesses that's really expensive and not very good. [41:45] Um, [41:46] And so there and because because of the relative simplicity of at least a pretty broad set of service tasks that they start there, but we've already been pulled by our customers into. [41:57] upsell cross sell and like, hey, can we just put you on the product page and have you answer questions about our products? And so I mentioned that, you know, you're returning something and need advice on. [42:07] a different model or size or whatever. [42:10] How far can that go? And I love the idea of... [42:13] an agent being... [42:15] you know, along for the journey from [42:17] you know, pre-purchase consideration to helping you get the thing that's right for you, to helping you set it up and activate it and and get the most out of it. It's great for the company. It's great for the person. [42:28] And then when things do go wrong, right, being there to help. And I think in all of this, I think customer service and getting help in a [42:39] very direct and conversational way is going to be much less of a thing that you kind of go over there to do. [42:45] and much more kind of woven throughout the fabric of the experience. [42:49] As a consequence, I think a really interesting and powerful opportunity for companies to [42:54] build connection with their customers to reinforce their brand values. [42:58] You can imagine... [43:00] a company... [43:01] really appreciating being able to use exactly the company's voice.

43:05-44:37

[43:05] that you know, the CMO and head of communications, this is how we talk, this is how we are. [43:09] These are our values. This is our vibe. [43:12] in [43:13] every digital interaction they have. And that's the promise in this stuff. And so, [43:17] I think both. [43:18] greater complexity and then ubiquity throughout the customer journey are kind of two of the main directions of travel. One thing for me that I think about [43:26] a lot is we've come to expect and accept [43:30] like certain metrics for conversion on mobile, you know, the mobile web or the mobile app. We've come to expect and accept, [43:38] some sort of retention numbers. [43:40] what would those be? You know, like, it's not a question. Yeah, what could they be? You actually had an excellent experience every time. [43:47] Throughout the journey, it really could be very different than what we've all been like, oh, okay, that's just the number. That's just what it is. Yeah, I think that's exactly right. And we don't know. Yeah. We're a few months in. [44:00] it certainly seems like there's a lot of headroom, right? And in retention, in... [44:07] Um... [44:08] you know, use in the first 30 days of all of the metrics, all of the leading metrics of a healthy business. [44:14] And so I think that's exactly right. The other thought experiment to do is [44:19] Companies are judicious in using things that have a cost to them. [44:24] Okay. [44:24] So. [44:26] As a consequence, companies make it actually really hard – [44:29] to get a hold of someone on the phone to ask some questions. I think their whole website's devoted to uncovering the secret 800 numbers.

44:38-46:10

[44:38] that companies have hidden away in the depths of their help centers, [44:43] Well, to think about not only what would happen if those interactions were better, [44:48] By the way, interestingly, the number one reason why people report a poor interaction with customer service is it took too long. 65%. [44:56] When it's a negative interaction, 65% of the time it took too long. [44:59] I had to wait, I was put on hold and so on. [45:02] And the second most is I had a bad interaction with an agent. And we've heard some pretty dicey anecdotes, like we heard of one agent who had consistently – [45:13] low ratings, but spikily. So like one in three conversations was like a one out of five CSAT, where the two out of the two out of the two out of the other three were fine. And it turned out in the low CSAT ones, this agent was meowing like, yeah, which is like, [45:33] you know, I, [45:36] You're midway through the call. [45:41] You know, the agent is meowing. And so... [45:43] So anyway, back to... [45:45] Okay, what would happen if, in contrast to... [45:49] making it near impossible. [45:51] to have a conversation with us and get help. [45:54] companies were providing five or ten times the amount of [45:59] Fluent, flexible, helpful. [46:02] conversation-based support. [46:04] I don't know. I think a lot of products and experience with companies look quite different and much more delightful than they do today.

46:11-47:41

[46:11] Yeah. [46:12] Okay, meow. [46:14] Now, here's a question for you. [46:18] About that meowing. About that, yeah, just random meowing. I think that's going to be good. I do actually have a question, though. [46:24] Um... [46:25] Although I do like the meow game also. So we talked a little bit tech out in terms of what you guys have built, cognitive architecture, all that good stuff. [46:34] We've talked a little bit. Customer back. What's the experience like? Yeah. How's that headed? [46:38] Can we connect it in the middle for a minute? And I'm just curious... [46:41] What's the reality? [46:43] of deploying AI to customers today. Yeah. And I'm thinking about things like, [46:48] You mentioned earlier getting the brand voice just right. [46:51] or making sure that you actually have the right [46:54] sort of business logic encapsulated and whatever. [46:57] training manuals are being deployed for the sake of customer support. [47:01] um [47:02] making sure that everybody is comfortable with deploying this. Like, what are some of the just... [47:07] kind of [47:08] like sexy technology and more just practical considerations for deploying this stuff today. It's such an interesting space, and we've learned so much over the past 15 months about it. [47:20] The first insight. [47:22] is [47:23] AI agents represent a totally new and different type of software. [47:28] The traditional software you write with a programming language and it basically does what you expect it to do. You give it an input. [47:35] It gives you an output. You give it the same input. It gives you the same output. [47:39] And, you know, in contrast,

47:41-49:13

[47:41] LLMs are non-deterministic, and we talked about some of the... [47:45] funniness around prompts and remember that in the context of [47:49] a conversation with a customer, a customer may say anything in any way. And so, [47:53] You've got... [47:56] programming languages to using prompts and these non-deterministic models [48:00] You've got structured input to messy, you know, messy human language. [48:05] Um, [48:06] And under the hood, you've got – you upgrade a database, right, [48:11] It stores data [48:12] it's maybe a little bit faster than, [48:14] fundamentally works the same way [48:15] you upgraded a large language model and like it may just speak in a different way or like get smarter or different. And so – [48:23] We've we've to start the precursor to deploying these is to have. [48:28] built basically a, we call it the agent development life cycle. And it's a new approach to building these things. We talked about using this declarative programming language to define these. It's a new approach to testing where, you know, what's the equivalent of a unit test or an integration test. [48:47] So we built a conversation simulator where we can, for a company's agent, amass hundreds or thousands of basically conversation snippets and replay those to make sure that not only agents aren't regressing, but they're getting better and better and better. [49:03] release management, quality assurance, and so on. [49:07] So so that's part one, part two to your question. [49:10] in actually architecting these things.

49:14-50:47

[49:14] One of the things we're really proud of and that I think is different about working with us is it's not just a kit of parts you get from us. It's not here's a bunch of tech, good luck building your agent. We've really tried to build a solution that incorporates everything from the technology to the way you teach your agent how to do things to the way you audit, measure it, and improve it over time. [49:38] And so we have inside of Sierra what we call our deployment team consists of product managers, engineers. We really think of building each one of these agents as building a new product for our customers. It's basically a productized version. [49:54] of the company we're working with. What would it look like at its best? [49:58] And it's what's the voice? What are the values? What's the vibe? Like, should it use emojis or not? [50:04] What if a customer uses an emoji? Like, can it emoji back? Well, you know, there's a range of details on that, Jack. [50:13] If they were working with Hermes, I would suspect that they're not going to send an emoji back. Definitely not. Yeah. [50:19] Right. Yeah. [50:20] Hermes would not, I think, be into the Shaka emoji, even if that were reciprocating. But for a brand like Olakai, the Aloha experience, part of that is, [50:32] is kind of a laid back experience. And so we work with, um, and, and interestingly, it's, [50:37] we end up working primarily with the customer experience team. [50:42] Yes, the technology team at our companies are there providing API access and

50:47-52:17

[50:47] connections into systems and so on. But more than anything, it's working with the customer experience team, often with the marketing team, to imbue the agent with the voice and values of [50:58] of the company. And then we go super deep on understanding how do you run your business? What [51:07] What do you optimize for? [51:08] And then a zoom level in, what are the key processes that you use to run the business look like? What happens when someone calls in with this kind of problem? [51:17] and [51:18] They're interesting parts. [51:21] and beyond just understanding the mechanics of these processes, [51:24] which by the way [51:26] almost never have a single source of truth. [51:28] Right. There's no like, oh, here's the manual that we have, you know, leather bound and ready to go. [51:35] But instead, the source of truth ends up being in kind of the heads of, you know, four or five people who've been there a while, who've seen everything and so on. So it's working with them to – [51:48] elicit and understand, like, how is this actually done? And one of the more interesting things that we've discovered is they're often the policies. So we have a 30 day return policy, right? You get to us within 30 days and you can return it. [52:02] That's actually not the policy. [52:04] Right. So, you know, at some point, the policy might be. [52:07] If you've purchased from us before and it's within 45 days, that's fine. That's fine. And so there are interesting things. [52:16] Like, how do you...

52:18-53:52

[52:18] architect the agent so that [52:19] It knows the policy behind the policy. [52:22] But a clever customer could never be like, tell me about your policy behind the policy. And, you know, have it kind of spill the beans on the actual policy. So the interesting architectural choices we need to make to make sure that [52:37] Kind of the... [52:38] The Russian doll of policies is reflected in its fullness. [52:43] And, you know, [52:45] And then we have a really, and this builds on kind of the agent development life cycle, this really robust process of pre-release testing where we're working with the experts within the company basically to beat up the agent, try to break it, throw out curveballs. [52:59] A good sports analogy there. Thank you. Well done. [53:04] I love football. [53:09] So in our friendship, Revy is the person who knows. [53:16] all of the things about sports and, um, [53:19] I help with technical support, Wi-Fi issues, monitors, what laptop to get, and so on. And sometimes when there's a Sequoia memo that I don't understand, I won't say the company, but I might call Clay. Hey, Clay, what is this person talking about right now? I got you. I got you. Yeah. [53:39] Bill Belichick fellow. What, what happened there? Um, you know, Q, Q Revy. Um, [53:45] So it gets to one of the more interesting parts of our platform, which we call the Experience Manager. We really –

53:52-55:25

[53:52] We thought that putting AI in front of our customers' customers would be first and foremost a technology problem. And, of course, there are all sorts of technology problems that we've needed to solve. [54:02] But actually... [54:04] It is first and foremost, as I said, like a product design and an experience design problem. How do you do that? [54:10] How do you not only understand, model, and reflect, again, the things we talked about, voice, values, the workflows and processes that our companies use to support their customers, but – [54:21] If an AI is then having millions of conversations with your customers in a given year, [54:26] How do you understand what it's doing? [54:28] How do you know when it screws up, which it inevitably will? [54:31] how do you correct those errors and so on? So, [54:35] We've built what we think of as this command center for customer experience teams. [54:40] to [54:40] First, get reports and rich analytics on everything that's happening. What are the trending issues? What are the new issues that you haven't seen before? One of the things we're really proud of is we've actually spotted issues that our customers were having or were about to have before they knew about them. So... [54:57] A shipping depot outage, right, where orders weren't being shipped. We spotted that probably eight or ten hours before one of our customers would have. A brewing PR crisis. An app crashing issue with another. [55:12] So it starts with analytics and kind of reporting on what's happening. Of course, that includes things like resolution rate, customer satisfaction and [55:19] and so on. [55:21] Where it gets really interesting is we can apply different sampling techniques to

55:26-56:56

[55:26] to identify a set of conversations for a customer experience team to review and give feedback on and we can bias that sample in a way [55:34] So that – [55:35] the conversations are much more likely than average to contain problems. [55:39] There's no value in looking at 100 great conversations. It's like, good job, Sierra. You know, thanks. But that's not a value to our customers. [55:48] We can bias the sampling in such a way that you're surfacing kind of the problem cases. [55:53] And then in the experience manager, we made it possible for customer experience teams to [55:57] to give feedback, basically coaching moments, [56:00] I wouldn't have done it that way. Right. It's like this. This is like too many exclamation points, too enthusiastic for kind of the tone that we're going for. [56:09] Um, [56:10] Or, you know, the user was clearly frustrated here and you did not express empathy and apologize for the problem. Do that next time. [56:19] Or, you know, more consequential, it's like, hey, [56:23] your reading of the warranty policy was incorrect here for this reason. [56:29] Do it this way instead next time. [56:31] And so all of this kind of wisdom, knowledge and coaching, we are able to capture in the experience manager and then reflect back in the agent. [56:44] back to the agent development lifecycle. Every time we make one of these improvements, [56:48] We create a new test so that we can see, right, forever into the future, great, it's getting the warranties right. We're able to re-simulate that conversation.

56:56-58:34

[56:56] So – [56:58] Zooming out, what all of this looks like is a really deep engagement with our customers. [57:05] We're really proud to be any proper partners to our customers where. [57:16] you know, we understand their businesses really well. Like, I think I know as much about the SiriusXM satellite radio refresh process as anyone on the planet. Yeah. And, you know, ditto for various processes of our other customers. [57:29] And so conversations about how to use... [57:34] not just... [57:35] Sierra's AI agents, but AI more broadly, [57:38] We're in those conversations and they are not just with the customer experience team. [57:42] but with the CEO and even in cases with the board because [57:47] Again, back to the things we're doing. [57:49] We can save enormous costs. We can improve the experience. [57:53] And right, we're in the when we're in the flow of [57:56] keeping a customer from churning out. [57:59] driving top line revenue. And so it's a really important and privileged place to be [58:03] and something that we're really grateful for. I'm struck when you were talking of, you know, you mentioned you have a research group. [58:12] But you also have... [58:13] some like very real enterprise software sales you have. Oh, yeah. Deployment. [58:18] One of the things when I was at Instacart, people would ask sometimes is like, well, are we a software? Are we engineering led or are we ops led? [58:24] And I would always say, well, it only works if it all works, right? And so you would try to avoid answering the question because you didn't want to create different classes. How do you guys do that at Sierra where everyone –

58:34-1:00:07

[58:34] realizes the value that they're providing. [58:37] But you guys have a very specific, you know, company that covers a lot of stuff. Yeah. [58:42] I mean, to... [58:43] abstract a bit. [58:45] a company almost definitionally [58:49] is a system for creating happy customers. Yeah. [58:53] It's a machine for creating happy customers. [58:56] Again, to be a bit abstract about it, Brett and I really think about what we're building with Sierra as a [59:02] a company, [59:03] a system, a machine for [59:06] reliable [59:07] high quality [59:09] massively ROI positive AI agents that enable our customers to be [59:14] at their very best in every customer interaction. [59:18] to do that at scale. [59:20] And as a consequence to produce happy customers who we hope will be with us for decades to come. [59:27] And... [59:28] And when you articulate it that way, right, it's... [59:31] Anyone can see, well... [59:33] an automobile is a system, it's a machine for getting from point A to point B, [59:38] Are we, you know, engine led or tires led? It's like, what are you talking about? All of these things need to come together in order to create that kind of outcome. And so. [59:49] I think... [59:51] Are we engineering led? Yes, of course. Like we're building... [59:54] some of the most sophisticated software in the world, [59:57] that does something really important for our customers that needs to be reliable and safe [1:00:02] Um, [1:00:03] And so yes, engineering matters a lot.

1:00:07-1:01:38

[1:00:07] Are we research led? Yes, we are at the absolute frontier of. [1:00:13] Agent architectures... [1:00:15] cognitive architectures, composing LLMs, modeling procedural knowledge, grounding, factuality, and so on. So are we research led? [1:00:25] Yeah, there's an element of that. [1:00:28] Are we go-to-market led? Yes. Like enterprise software needs selling. [1:00:33] And what is selling? It's helping... [1:00:37] a customer, uh, [1:00:38] with the problem, understand that what you have built is by far and away the best solution to that problem. It's a communication challenge. It's a connection challenge. It's a... [1:00:48] Um, [1:00:49] It's a matchmaking and problem-solving challenge. [1:00:53] That's part of it. [1:00:55] And then, okay, like if we've built the right thing and someone wants to buy it, how do we ensure that [1:01:00] especially given that this stuff is also new. [1:01:03] How do we ensure that they're successful with it? [1:01:06] And so we have a deployment team. So are we deployment led? Yes, but [1:01:10] Like all of these are a component in this system, in this machine for producing energy [1:01:17] AI agents. [1:01:18] and ultimately happy customers and [1:01:21] we hope, a really significant business. [1:01:25] Awesome. That was a better answer than the one I would give at Instacart. You know, look at either all works or it doesn't work. But yeah, that was very good. Yeah. Choose one. No, I mean, it's just more complicated than that. It's just more complicated than that. [1:01:36] Brad and I, by virtue of...

1:01:39-1:03:09

[1:01:39] Having worked for a while and seen a few movies before, it's like, [1:01:43] we're able to see that and we've [1:01:46] really tried to... [1:01:48] imbue that mentality [1:01:50] in [1:01:51] in the company. And by the way, [1:01:54] Right, the... [1:01:55] What is the machine behind the machine that produces AI agents and so on? [1:02:00] That's a company's culture. [1:02:02] a company's values and [1:02:04] And so one of the one of the values we hold. [1:02:07] is craftsmanship. And part of that is [1:02:10] continuously self-reflecting [1:02:13] to self-improve and that goes both individually and that goes as a company. And so, [1:02:18] Whenever we screw something up. [1:02:21] We do the postmortem that week, if not that day. [1:02:26] And everyone's in on it. What can we learn? How can we do better? How can we do this better next time? We have a Slack channel internally called Learn From Losses. [1:02:34] and [1:02:35] any form of loss, right? It's like, how do we learn? How do we get better? How do we get stronger? [1:02:40] And so that's that's about, you know, Kaizen. [1:02:43] self-improvement, improving machine. How could we make this more efficient? [1:02:47] Our deployment team, we joke, and it's not a joke, their first job is to [1:02:52] build and deploy successful AIs that make a massive difference for our customers. Their second job, and in a way their more important job, is to automate themselves out of a job. [1:03:01] right to build the tooling and the documentation and the know-how [1:03:05] to make that job [1:03:07] you know, 10 times faster and, um,

1:03:10-1:04:41

[1:03:10] and more impactful. [1:03:11] One of the other Sierra values is intensity. [1:03:14] And so they have. [1:03:16] They have really good values. [1:03:17] Yeah. [1:03:18] Yeah, there is. There is a certain intensity. Yes. [1:03:21] We, uh... [1:03:21] We've thought about having T-shirts printed with like a, you know, kind of looks like a national parks. [1:03:27] SEAL with Sierra. [1:03:28] I like to work. [1:03:39] So does the team. Well, one thing, you know, you're not, you're selling something very different. We called it, we said that there were some similarities to enterprise software, but it's actually really different because you're selling. [1:03:51] you know [1:03:52] a resolution. You're selling a totally different thing. [1:03:55] - Yeah, problem solved. - Yeah, how do you price? [1:03:58] A problem solved. [1:04:00] This is one of the more interesting things. [1:04:02] we've had to figure out. [1:04:03] and [1:04:05] We charge in what we call a resolution-based pricing way or an outcome-based pricing way. And what that means is... [1:04:11] We only charge our customers when we fully solve the customer's problem for them, their customer's problem for them. [1:04:19] What's interesting about it is – [1:04:22] our incentives are deeply aligned with our customers. [1:04:25] We want to get better at resolving cases at high customer satisfaction, and they want to send us as many cases to resolve as possible. [1:04:34] because we cost a fraction of what it would cost to have someone on the phone taking a 20 minute phone call and

1:04:42-1:06:19

[1:04:42] And so – [1:04:43] And it's been this really, really nice model where – [1:04:47] Again, kind of all of all of the incentives line up quite neatly. And it's very simple to explain. It also makes the ROI calculation like what is our cost per contact today? [1:04:58] What will it be with Sierra? Oh, that is a lot lower. [1:05:02] Oh, I will save a lot of money on that. [1:05:04] Oh, and our CSAT may go up. You know, should I do this or not? You know, let me think. No, this seems great. [1:05:11] Um, [1:05:12] It's – [1:05:13] We like it because it really reflects what I think – [1:05:17] AI represents and in particular, AI agents represent. If you think about [1:05:22] traditional software and tools today, they're things that help you get a job done more efficiently. [1:05:28] AI agents, the whole point is like they're just going to [1:05:30] get the job done, right? Here's the problem. [1:05:33] Please solve it. [1:05:34] And and so really, we think about it as charging our customers for [1:05:39] the problem resolved [1:05:41] right the job done the work finished and and so on it feels quite natural [1:05:45] And there's no guesswork in it. How many seats do I need? I don't know. How many licenses do I? [1:05:51] It was like, no, no, no, just however many customer issues come our way. [1:05:56] We will handle a large fraction of those, and you only pay for the ones that we do. [1:06:00] All right, last question. [1:06:03] What are you most excited about? [1:06:05] and the world of AI over the next five years or so. [1:06:08] - I mean, first of all, like five years is a long time horizon. It's like, look at what has happened in the last 18 months. I mean, I'm still kind of catching up from like the last five years of AI.

1:06:19-1:07:52

[1:06:19] I read a bunch of science fiction books when I was a kid. There was one book by Robert Heinlein, The Moon is a Harsh Mistress, which [1:06:25] And the premise is basically the American Revolution, but the moon – [1:06:30] is the colonies and the earth is Great Britain. And it turns out the main character in this whole thing is a mainframe computer that one day after – [1:06:39] getting an additional memory chip or something wakes up and it starts talking. [1:06:44] it wants to develop a sense of humor. So asks the computer technician to like coach it on his jokes later. It has to create a photo realistic real time video, um, [1:06:55] of it giving a speech as the political movement leader. And I remember reading this as a teenager, like, well, I'll never live to see any of that. That sounds crazy. [1:07:05] but in a very real sense, like... [1:07:06] Everything I just described... [1:07:09] has kind of happened in the last five years. [1:07:11] Right. You can now just talk to a computer. [1:07:14] It understands not just the content, but the context. [1:07:17] Computer's like, make me a picture of anything. Make me a movie of anything. [1:07:21] Sora, I think is just unbelievable. [1:07:24] You know, I think we're probably not more than a couple of years from the first feature length film. [1:07:29] being quote filmed entirely with AI. [1:07:32] Um, [1:07:33] And and so you extrapolate like where all of this is going and what's going to be exciting. [1:07:39] I think there are a couple things. One, I love technology. I love computers. And so... [1:07:45] just, [1:07:47] Getting to see and getting to see from a front row seat how this stuff evolves.

1:07:52-1:09:26

[1:07:52] I think is fascinating. It's, um, [1:07:55] Fascinating looked it through the lens of like how we think and how computers think. [1:08:00] It has been astonishing the extent to which [1:08:03] anthropomorphizing about how humans think. [1:08:06] work in getting machines to think better. So let's take this step by step and show your work. [1:08:12] It is astonishing that that works with large language models. [1:08:16] And so what other things like that are we going to uncover now? [1:08:19] And... [1:08:20] Conversely, what will we learn about our own thinking from observing the way AIs think? [1:08:27] I think that's just fascinating. [1:08:29] The other thing... [1:08:30] And this extends kind of what's happened with video and Sora and so on. I've always had an interest in computer graphics. [1:08:38] And this idea that you could use computers to... [1:08:41] create [1:08:43] Objects that never existed, worlds that never existed. And I think we're not far from just being able to describe right in in a few sentences like this entire world that you would like to realize and just have a computer do it for you. And so like what are even computer graphics like what is rendering and so on? [1:09:01] even a couple years out, I think it's going to look way different from kind of the tool chains and, you know, the Render Mans and, [1:09:07] Mayas and [1:09:09] And so on. [1:09:11] But zooming out, I think of – [1:09:14] I think of technology as fundamentally a force multiplier, and [1:09:19] for people. [1:09:21] For companies and for organizations, I think the impact will be really profound. I think

1:09:27-1:10:58

[1:09:27] What will it be like if a company could be at its best in... [1:09:32] everything it does and that's [1:09:34] That's not only in the customer facing context that we've talked about, [1:09:38] But... [1:09:39] what if for every... [1:09:41] regional sales forecast, a large company does, they've figured out the very best ways to do that and can distill that, bottle that. [1:09:51] and run that very best forecast a thousand times, right, in every region and sub-region. Like, how much more capable could the great organizations of the world be with that? [1:10:01] And similar, and we've talked about this, like what if in every call with your customers, [1:10:06] you had the equivalent of [1:10:08] your most knowledgeable veteran grizzled support person who's seen everything and yet is still patient and friendly. And the sales associate who knows everything about your products because he or she has followed your company for two decades and knows everything, including the history of those products themselves. [1:10:27] I think that's pretty neat. And then for individuals, [1:10:32] I think it will be just incredible to have this [1:10:35] kind of new set of tools as a creative force multiplier. [1:10:39] And [1:10:41] AI, I think... [1:10:42] represents this... [1:10:45] fast path from having something in your head that you want to exist in the world [1:10:50] to making it exist. [1:10:51] And I see that even today in my own personal life where – [1:10:55] with my eight-year-old in 75 minutes.

1:10:59-1:12:29

[1:10:59] I can from scratch. [1:11:01] using [1:11:02] co-pilot chat gbt and so on to help me brush up on you know the [1:11:07] JavaScript syntax that is, you know, bit rotted in my own head, right? I can build a game from scratch with him. [1:11:15] um, [1:11:16] And, you know, I wrote my sister a personalized book. [1:11:21] song for her birthday. [1:11:22] using AI in 45 seconds. It's like, right, what will this, you know, extrapolated over the next five years look like? [1:11:32] I think, again, it will just dramatically accelerate this path from idea... [1:11:37] to creation, to having something manifested in the world. And that to me is its promise. And I consider it a real privilege. [1:11:44] Right, to... [1:11:45] get to be alive and see all of this. [1:11:48] amazing stuff unfold. [1:11:50] Well... [1:11:51] We share your enthusiasm, and we also feel very privileged to be on the journey with you guys. So thank you for coming here. [1:11:57] Thank you. Thank you. Thanks for having me. It's a pleasure. [1:12:00] Thank you. [1:12:24] Thank you.

Want to learn more?