Episode 26
· 01:00:01
Nitay (00:01.884)
Rohan and Raghav, thanks for joining us today. We're very excited to have you guys. Why don't we start with you giving us a brief intro on your experience and what led you to what you're doing today. Rohan, you want to start us off?
Rohan Katyal (00:13.793)
Yeah, absolutely. So thanks for having us. Quick introduction: I grew up in India, went to college there, and I've known Raghav for a long time. We've been friends for like 10, 15 years at this point. We'd been talking about starting companies for a long time, so it's finally happening. That's exciting. I spent some time at Georgia Tech studying computer science, and then a bunch of different companies: Yahoo, Yelp and Meta. Throughout these, I noticed a very interesting trend. A lot of the growth work that people focus on is very operational. The part where you come up with insights is very scientific, but figuring out solutions to a problem was more art than science. It seemed like a bunch of people get in a room and they brainstorm and start thinking about, say, some people dropping off on a particular screen. That part is very scientific: people have numbers, they walk into a room with metrics and data. But coming up with solutions is just spitballing a couple of different ideas, and then you go A-B test and figure out which one's working, in a scientific manner as well. This got me thinking about potentially building an AI growth team, because the process of coming up with solutions to fix all of these drop-off points and problems people were discovering didn't seem scientific at all, and it didn't increase velocity. And that's when I got talking to Raghav, who was at the same time thinking about a different idea, which I'll let him give his version of: the story of how our ideas came together and we ended up at Milana.
Raghav Sethi (01:41.962)
Yeah, thanks for having us. Excited to be here. I think where I started was: I've been building apps and websites for a long time, since I was a kid, and kind of did that all the way through undergrad. I got to the point where I'm like, I can do this, but I was building on top of technologies that I couldn't build myself, like App Engine and some of these databases. So I decided to go deep on distributed systems, did that during grad school, and then went to Meta, where I was working on Presto, whose code is now in Trino and Starburst. It was this big open source SQL query engine that was running on hundreds of thousands of machines at Meta. So that was a lot of fun. But I never liked SQL as the interface into it. I was always annoyed, like, hey, this is such a powerful thing, and I hate that I have to learn this really obscure language to actually query it. That's what drew me to a small startup called Airtable at the time, where I was the 20th engineer. I did a bunch of infrastructure, then product stuff, then management, and built out a bunch of really great products there. And one of the things that kept coming up was that it was really hard to understand why users were doing various things. We, Airtable as a company, spent a lot of time trying to do user research and data science, really trying to understand what makes users successful, what makes users unsuccessful, how do we improve this product, what are the parts of the product that are really challenging to use.
And this is one of those things that you kind of lose empathy with over time, because everybody at the company is really good at using their own product and they understand all the downfalls and all the complex bits. So you just lose touch with customers over time. I always really hated this, so that's been on my mind for a while. Then I was doing AI stuff at Airtable; I shipped a bunch of really great AI features with GPT-3.5 and 4, so a couple of years ago.
Then I was playing with an idea, an AI journaling app, which was a B2C app. The idea was that everybody would have an AI therapist or coach, and we would onboard people in person. We would really pay attention to people onboarding in person. And I don't know if you guys have ever had the opportunity to do user research, but it is horrifying. It is a horrifying experience because you're just like,
Raghav Sethi (04:08.588)
I thought I understood my users, but I really don't. They're just way weirder and more interesting than I thought. And then you go into a bit of a pit of despair, and then you get out of it and you make the product better, and the product then starts to work. So user research was always really important to me, really valuable. And even when we got to some scale, we just couldn't do it anymore. And that really bothered me. I really wanted to understand the customer experience at scale, and logs and events are just not cutting it, because they can't capture the richness of what's going on. So that's what got me thinking about session replays as a concept, which is of course a very old concept, but there was potentially an opportunity to put some AI into session replays and build something great. Build something that understands what parts of the product aren't great, what parts of the product need to be improved. And as Rohan said, our ideas came together really neatly, which is: I was like, hey, I think we have an opportunity now to understand what about any product is broken, is problematic, is confusing. And we also have the opportunity to actually go fix it, or ship an A-B test to try and improve it, automatically. So that's how we started working together. It came together quite neatly.
Nitay (05:29.411)
Very cool. And I want to jump on a couple of the things that you guys said, because there were some very interesting tidbits there. Rohan, you were talking about making solutioning a science, which we'll come back to in a minute. But first, Raghav, it sounds like you have some interesting stories around the most surprising things you've seen from user research. Do you have an example or two?
Raghav Sethi (05:44.974)
Yeah, for sure. I'll do an easy one that's easy for everybody to understand. So we built this onboarding flow in LightPage, which was just: tell us who you are, tell us a little bit about yourself so we can personalize this AI assistant to you. And then we were doing end-to-end encryption and so on and so forth. So there were just three or four steps in this onboarding, which seems like a very straightforward and linear onboarding. But we found, when we were doing this research, that everybody who hit the page about end-to-end encryption would immediately pause. They would just stop on that page and be like, okay, wait, I don't understand this. And they would X out of the onboarding. And we hadn't built the experience for when you X out of the onboarding, because we didn't expect anybody to do that. It was just this blank screen with nothing on it. So that was one of those things that would have been hard to predict ahead of time.
And this was one of many, many things that we learned over the course of building this product, particularly because this one was a bit more privacy sensitive. But at Airtable, and I keep going back to this, I don't know if you guys are familiar with Airtable, but Airtable is basically an app building platform, and to be very reductive, it looks a little bit like a spreadsheet, and the spreadsheet has these columns in it. And if a user starts typing in a column name, if they start typing in status or, I don't know, note or something else, you know what they want to do. If you're an AI, you can look at this and be like, I know exactly what this user wants to do. And you can actually compare whether they were successful at doing the thing they wanted to do with their stated intent. So you can figure out the intent from the screen, and you can also figure out the success from that same recording. So that was a really powerful concept to me, which is: hey, we actually can define success. We don't have to have these proxies for success anymore. And we don't really have to have proxies for intent anymore. And this is a bit of a tangent, but AI products are incredible in many ways. One of the ways they're incredible is that the intent is really clear. Anytime that you put a text box in your product, I know what you mean. I know what that user wants, because the user will type it in. So the opportunity is immense: okay, can I compare what the user wanted with what the user actually got?
Nitay (08:09.07)
And I love those examples. The first one I find is always an interesting one, around how and where you put security into the user experience. Obviously everybody builds their products to be as secure as possible. But especially in onboarding, I've found it to always be this tricky thing, because on the one hand you want to give users the trust and confidence that the product is secure. On the other hand, if you give almost too much information, you actually scare them away, because they're like, wait, I'm sharing my data with this? That seems bad. It's actually this interesting fine line.
Raghav Sethi (08:34.507)
Exactly.
That's exactly right.
Nitay (08:38.668)
And the other example you gave there is very fascinating as well. It sounds like that feeds into what you were saying, Rohan, in terms of making solutioning a science, because this is where some of these signals come out of. So why don't you guys tell us a bit about why this hasn't been done already? As Raghav, I think, mentioned, session replays and that idea of capturing every click and mouse movement is not a new thing; it's been done for a while. What is it that's new now under the sun, so to speak?
Rohan Katyal (09:07.263)
Yeah, so I worked on experimentation at Yelp. We built out the experimentation infrastructure, the platform, and the north star of the entire program was to increase the volume of shipped experiments. Because there was a lot of tooling and infrastructure in place to just run experiments, and Yelp had played around with a bunch of them, but the overall decision velocity was not actually increasing. The vast majority of experiments were just not being shipped, and the average number of experiments being run by every single team was also not increasing. What was an interesting learning while building the experimentation platform was that it wasn't the teams with the most data resources, or the most data-savvy teams, who were running better experiments that got shipped. The teams who were just running more experiments were more likely to hit their goals, period. It was more about execution velocity and momentum, rather than let me go dig into the data and come up with the best insight. Is this hypothesis well written? Does the p-value, does all of this make sense? Does this correlation make sense? It was more about getting more done, more shots at goal. So we set up the entire experimentation program around this idea of getting teams to increase their velocity of shipping experiments. And you could just see teams who were not hitting their goals start crushing their goals, just from tooling that gave them more shots at goal. We didn't give them better data tools. We just had them increase overall velocity, and that led to more experiments being shipped. I think that's what led to this idea of what we're exploring right now: how do we give this superpower to all the new, AI-native teams who are using coding agents and now have abundant capacity?
And to your point about session replays, I think there's two things. One, if you had asked us about this 18 months ago, we just couldn't have done it. Vision LLMs weren't at a point where you could point them at a session replay and extract interesting insights: whether a user is struggling, ask it to take the role of a user and analyze the session, see where users struggle. We literally get LLMs to extract user emotions out of sessions from a bunch of different perspectives.
Rohan Katyal (11:24.127)
And this was just not possible before. Eighteen months ago, we would not have been able to do it. Only now are we in a spot where we can use session replays as a data asset that does not feel like CCTV footage. How many session replays can you truly watch, right? You have to watch hundreds to find one meaningful thing. As a concept, it wasn't usable before. So one, vision LLMs, I think, unlocked this capability for us. And secondly, I think the beauty of using session replays, and the fact that it's such a mature technology, is that it takes minutes to install and your entire product is instrumented, in some sense. You don't need to talk to a data scientist or an engineer when you're launching a new product to decide what parts of the product you have to measure: let's go add events now, let's go add logs now. Once you put in this pixel, we start ingesting all your sessions, and everything, end to end, is now logged, always. So I think it's a combination of timing, that it became possible, and that now we can use this data asset without getting teams to put in more work every time they launch something.
Raghav Sethi (12:26.988)
Yeah.
Nitay (12:27.339)
I'm curious about one thing, actually, because you reminded me of a funny story. I spent some time at Facebook, and while I was there I worked for a while on the A/B testing platform. Facebook was famous for having, at least for its time, thousands, tens of thousands of experiments happening at any given time. In fact, there was even a, quote, bug slash outage, I don't know how you want to call it, it wasn't technically an outage. You guys may remember, for a while, every single experiment was enabled for every single user, because the actual flag system, the A/B testing system, the thing that puts people into the right cohorts and buckets, had a bug in it. And the amazing thing was a couple of things. One, the site didn't crash. Traffic went 10x, because this was at a time when one of the experiments was making videos autoplay and all this stuff, so suddenly the page was 10 times more dynamic for everyone. You reminded me of that funny story.
The question that leads me to is: from a science perspective, say you enable this, and we'll come back to the actual details around the session replays, because I think there's a lot of interesting nuance there. But from the output perspective, say you do this right. The output is that I can be a Facebook or a Yelp or an Airtable and run these thousands and thousands of experiments and move super productively. How do you actually think about that? Because I remember from our days at Facebook, for example, it started to get hard for even a human to understand: wait a minute, what is the dependency or the relationship between this experiment and these five others? Are they also going to be in this group? We literally had PhD statisticians coming into meetings regularly just to say, no, no, no, this is actually okay because here we have the law of large numbers and it'll be all right, but actually in this case we need to do a long-term test, and so on and so forth. So as you push this boundary of making it much more of a science, do you end up with scale issues around the number of experiments and how they work together?
Raghav Sethi (14:27.222)
Yeah, I can take that. And Rohan, please chime in; I think you've probably done a bit more thinking on parallel experiments than I have. The first thing I would say is that A-B testing is overrated in some ways and also underrated in some ways. Rohan will tell you a little bit about what the result of most A-B tests is. But at Airtable, for example, we ran a bunch of A-B tests. And Airtable is obviously an overall very successful company and doing well, but one of the questions is, how much do you target your A-B test? If you, for example, are trying to improve Airtable for marketing or legal use cases, or engineering, or product management, you actually want to target a smaller audience and give them a specific experience that is customized and tailored to them. But there's a tension and a trade-off here, which you were touching on a little bit, which is: if you reduce the audience size, you need to run a longer experiment. And there's a point at which you actually slow yourself down, because you are waiting for experiment results. And then maybe you're like, oh, actually I need to run multiple experiments. But then if you're running multiple experiments, you have to think very carefully about overlap and the effects of overlap, and so on and so forth. So I don't think A-B testing is a panacea by any means. You want to run A-B tests, and you want to make it very cheap and easy to run them. But the goal for any A-B test should be that it gives you signal very quickly. A couple of years ago, I would have said it should give you signal in a week or two. And now, with the pace of software development, you want it to give you signal very, very quickly, within a few days, because you're able to move so much faster on actually implementing things that the A-B testing needs to keep up. So you need to generate new hypotheses, generate new A-B tests, run them, get results, and then decide what you want to do. And all of this should be compressed from months to weeks, and ideally shorter. But you need a large enough audience to actually get that time down and still be statistically significant.
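A rough back-of-the-envelope sketch of that audience-size versus duration tradeoff, assuming a simple two-sided, two-proportion test with hardcoded z-scores; the traffic numbers, baseline conversion rate, and lift below are made up purely for illustration.

```typescript
// Rough sketch: how audience size drives A/B test duration.
// Assumes a two-sided two-proportion z-test; all numbers are illustrative.

const Z_ALPHA = 1.96; // ~95% confidence (two-sided)
const Z_BETA = 0.84;  // ~80% power

// Per-variant sample size needed to detect an absolute lift from p1 to p1 + lift.
function sampleSizePerVariant(p1: number, lift: number): number {
  const p2 = p1 + lift;
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((Z_ALPHA + Z_BETA) ** 2 * variance) / lift ** 2);
}

// Days until both variants reach that sample size, given what share of
// daily traffic the targeted segment represents (assuming a 50/50 split).
function daysToSignal(dailyUsers: number, segmentShare: number, p1: number, lift: number): number {
  const perVariantPerDay = (dailyUsers * segmentShare) / 2;
  return Math.ceil(sampleSizePerVariant(p1, lift) / perVariantPerDay);
}

// Whole audience vs. a narrow segment: same product, very different wait.
console.log(daysToSignal(5_000, 1.0, 0.10, 0.02));  // broad audience: ~2 days
console.log(daysToSignal(5_000, 0.05, 0.10, 0.02)); // 5% segment: ~31 days
```

Shrinking the targeted share of traffic stretches the wait roughly in proportion, which is the slowdown being described: the same lift that shows up in about two days on the whole audience takes about a month on a 5% segment.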
Kostas (16:31.607)
Hey guys, I want to ask you something before we continue this very interesting conversation. I'm sure most of our audience has heard the term A-B testing. But we mentioned some other stuff, like session recording and replaying, and experimentation platforms. Let's talk a little bit about
Raghav Sethi (16:31.671)
Yeah.
Kostas (16:58.509)
these things and give some definitions, right? Because not everyone necessarily knows what we're recording. First of all, what is a session? People might instinctively think of sessions as what I see in Google Analytics; there's a session there, right, and it's defined in a very specific way. Maybe it's something different here, I don't know. And why do we need whole platforms? Like Rohan was saying, you had, I guess, a whole team building and maintaining a platform to be able to do these experiments. That sounds like a lot of investment, right? So what does it mean, an experimentation platform? What is this thing, and how do these relate, if they relate: session recordings and experiments?
Rohan Katyal (17:53.377)
I have a great story about platforms, which I will come back to when Raghav is done. I'll come back to that.
Raghav Sethi (17:53.517)
Yeah.
Raghav Sethi (17:57.827)
Yeah. Yeah, I can talk a bit about session replays, and then, Rohan, you should chime in on experimentation platforms. So let's start with the session. I think a session is basically what you expect: it's a user trying to do something on, let's say, a web page or a web app, for simplicity. And obviously the concept of a session is encoded into the HTTP spec: there's a concept called a session in a browser, there's session storage, there are session headers, there's a bunch of stuff around that. But a session is basically a user trying to do something on a web app for a certain amount of time. And there's no single definition of when a session ends. For example, in our platform, we consider a session ended when a user has been inactive for about 30 minutes. But a session basically encompasses everything that a user does on a website or a web app for as long as they want, as long as they keep doing it without more than a 30-minute break.
Other people define this slightly differently, but the concept is basically that. And a session replay is, it's very interesting. People who have never looked at a session replay are just like, are you recording my computer screen? I've been asked by candidates, do you ask people for permission to record the screen? And I'm like, no, no, no, this is just logging. It's just really specific logging. The first log line is basically a snapshot of the entire HTML DOM.
So everything that's rendered in a web page is in the DOM. If at any point you snapshot the DOM, if you copy the DOM into another tab or something, you can render that exact web page as it was at that point in the session. So you start with that: you capture the entire web page. And then you capture all the changes made to the web page. So for example, you capture a scroll. You capture a click. You capture a button changing from blue to green, or a form being submitted. All of these things are just log events that are captured.
And this, again, is a very old technology. I tell people, if you have visited 10 websites today, probably three of them had session replays. This is very old technology. And really, you should just think of it as really, really high-visibility, high-granularity logging that doesn't miss anything. It's a way of logging that doesn't miss anything. And when you take all these logs and put them together, you can basically produce what looks like a video of what the user is doing. It's not actually a video, but it looks like a video.
Raghav Sethi (20:22.102)
And one of the pieces of technology that we use: we actually have a vision engine that encodes part of this and extracts frames. There's a bunch of stuff that we do to make this tractable. But at the end of the day, this is just a log. You can press a play button and it looks like a video, but it's not. But you can do video-like things with it because of that. And this feeds into experiments in a clean way, which is: this is all about understanding user behavior. Every product company is obsessed with understanding user behavior and is obsessed with trying to change user behavior in certain ways. So I think of session replays as one of the ways you understand user behavior, and experiments as one of the ways you take these hypotheses about what improvement or change would affect user behavior and actually test statistically whether you were right or not. Yeah, so Rohan, you should chime in.
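To make that concrete, here is a minimal sketch of the kind of log being described: one DOM snapshot up front, then timestamped mutation and interaction events. This is illustrative only; production recorders (rrweb-style tools) serialize nodes, inputs, iframes, and much more.

```typescript
// Minimal sketch of a session-replay log: one initial DOM snapshot,
// then timestamped mutation/interaction events. Illustrative only.

type ReplayEvent =
  | { kind: "snapshot"; t: number; html: string }
  | { kind: "mutation"; t: number; summary: string }
  | { kind: "click"; t: number; target: string }
  | { kind: "scroll"; t: number; x: number; y: number };

const log: ReplayEvent[] = [];
const now = () => performance.now();

// 1. First "log line": a snapshot of the entire DOM as rendered.
log.push({ kind: "snapshot", t: now(), html: document.documentElement.outerHTML });

// 2. Every change to the page afterwards, via MutationObserver.
new MutationObserver((mutations) => {
  for (const m of mutations) {
    const name = (m.target as Element).tagName ?? m.target.nodeName;
    log.push({ kind: "mutation", t: now(), summary: `${m.type} on <${name}>` });
  }
}).observe(document.documentElement, {
  childList: true, subtree: true, attributes: true, characterData: true,
});

// 3. Interactions: clicks and scrolls with timestamps.
document.addEventListener("click", (e) =>
  log.push({ kind: "click", t: now(), target: (e.target as Element).tagName })
);
document.addEventListener("scroll", () =>
  log.push({ kind: "scroll", t: now(), x: window.scrollX, y: window.scrollY })
);

// Replaying = re-rendering the snapshot and applying events in time order,
// which is why it looks like a video without being one.
```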
Rohan Katyal (21:13.537)
Yeah, coming back to the question of why you need a platform. I remember the conversation at Yelp before we started making an investment in this. One of the execs was asking: why do we need this many people? Do we need to invest so much in building a custom platform? I kind of had a hunch about what the answer was going to be. So I went back and dug in. We had these product reviews where teams would walk in to the execs with metrics and talk about experiments. Everyone quotes active users, impact on revenue. And I went back and pulled the SQL query behind everyone's definition of active. Ten teams had used eight different definitions. They had completely different definitions of what a daily active or a monthly active was. At the metric level in the deck it all looks the same, but everyone's counting different things. Someone says you have to create five reviews. Someone says you have to create X number of bookings or views or impressions. So we were comparing apples to oranges to make trade-offs across the organization, which I think was super helpful for the exec team in understanding why you need a platform that is consistent. Beyond the core idea that you provide people with shared capabilities that they don't have to build again and again, and everyone can leverage what everyone else has built, the shared knowledge piece of having a platform, especially in a large organization, is super high value. Everyone now has the same definition of active. Everyone has the same definition of what churn means. Everyone's working towards the same goal. This just makes it super easy for execs to make trade-offs. I think that's also where experimentation platforms, and platforms generally, help: they accelerate not just the actual execution, but also good decision-making at the end of the day.
Kostas (23:13.475)
Right, that's super useful. Going back to what Raghav was saying about session recording, there are two things there. There is the instrumentation of the application: let's say I'm a product manager, I want to ship a new feature, and there are metrics I want to measure about this thing. So I'll create my spec for it, and I will also include the measurements I need, right? Every time someone clicks here, I want to send this event, blah, blah, blah. And this way, in theory at least, you end up with some analytical capabilities to understand the behavior. Now, what you're describing is a little bit different. In my mind, when I hear it, it's like: okay, if this is the case, we just record the whole DOM and we keep everything. We don't really need to define our metrics, or what we have to track, before the fact. We can do it after the fact, because we have the whole state, right? We don't miss any information. But these two things co-exist, and why is that? To me, it sounds like recording the whole DOM is superior at the end of the day. So why do people still go through the whole process of defining these events, keeping track of them, sending them back to the data warehouse, modeling them, and all these things?
Raghav Sethi (25:01.71)
Yeah, so I'll answer the question in two different ways. The first is: there is this concept of auto-capture that has also existed for four or five years, if not more. I don't know if you're all familiar with the company Heap Analytics, but they kind of pioneered this idea of auto-capture. Anytime you click any button on the screen, it will be captured. So in theory, at a high level, it's the same thing as what we're doing. But the problem, with rich applications and single-page applications and all of this really dynamic behavior, is that what you would see in something like Heap is: submit button clicked. But which submit button? Which modal was I in? What feature was I interacting with? So there's a lot of context that's typically missed with auto-capture tools. So that's one point. The other question you had was basically: why doesn't everybody do session replays? If session replays are a superior log, why don't people use that?
It's in some sense the same problem: because the data is so dense and it's not specified, it's really hard to answer questions. Another way to think about this is: if you were the product manager building a feature, what typically happens is, when you're four weeks away from launch, you and the data scientist and the engineer have this meeting where you're like, hey, we know that after we launch this, we will have to tell the execs how this feature is doing. Then you try to define some reasonable-sounding metrics for it. And then you have the engineer go off and add three or four or five or ten events that will capture what you think you need. And this works, right? If you have a really strong and correct thesis about what good looks like and what you're trying to measure, you can 100% measure it. The problem, though, is that you're missing everything you hadn't thought about. Because you're doing a bunch of logging, but there is a huge amount of bias in, A, what you choose to log, and actually even more, even if you had auto-capture, in what you choose to look at afterwards. And now remember, in a session or a user journey you have hundreds, thousands, maybe millions of events. Which events, in what order, are indicative of a certain kind of behavior? There are literally infinite permutations of these. Which ones do you pick? So the whole exercise is extremely biased.
Raghav Sethi (27:24.206)
And to your point, Kostas, about what we were talking about before, the people who are most successful are people who have a lot of tribal knowledge. One big source of tribal knowledge in a product company is people who have been around long enough, and interacted with enough people on the go-to-market side and enough customers, that they have this mental model of what they should look at, what they should track. And part of the reason these people are successful is that they're looking for the right things. They're instrumenting the right things.
The power of session replays, and especially session replays where you can extract arbitrary things out, is that you can find things you did not expect. You can look for things like confusion. You can look for things like people hunting too long to find a certain menu option. If you imagine the SQL query it would take to find users who are having to dig through menus to find a menu option, it would blow your mind. It's an impossibly difficult query to write, and nobody would ever actually write it.
Kostas (28:26.159)
No, 100%... Sorry, Rohan.
Rohan Katyal (28:26.305)
And to add something to that, to your point about why people use events: there are certain things where events just make more sense. Let's say the CFO wants to walk into an investor call and say, we make X million dollars and have Y million users, and you just need significantly higher accuracy. Sure, get an engineer to go into the product, invest the time, make sure these logs are always up to date, every time someone makes a change, every time your definition of activation changes. Whatever, you get an engineer to do this and maintain super high accuracy. But for 99% of the use cases, you don't need that accuracy. The entire industry has assumed that 95% is the confidence level, the p-value you have to chase, but you don't really need to chase that for everything. It really depends on how critical the decision is. If you're 80% confident, that's fine. If the entire confidence interval is above zero, you're fine. And also, the physics of building something has changed. Before, the bar for every single insight was significantly higher, because engineering time was really expensive and not plentiful. Right now, teams have abundant capacity to build, which they're actually not consuming. So you don't have to get to, this is plus 23.78%. That's not the accuracy you need right now. You just need to know this moves the needle and it's a real problem.
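A tiny sketch of that "is the whole confidence interval above zero" check at a confidence level you pick, rather than defaulting to 95%. The z-scores are standard constants; the arm sizes and conversion counts are invented for illustration.

```typescript
// Sketch: decide at a chosen confidence level whether a measured lift is
// "real enough". Z-scores are hardcoded; the inputs are illustrative.

const Z = { 0.80: 1.282, 0.90: 1.645, 0.95: 1.960 } as const;

interface Arm { conversions: number; users: number; }

// Confidence interval for the absolute lift (treatment - control).
function liftInterval(control: Arm, treatment: Arm, confidence: keyof typeof Z) {
  const p1 = control.conversions / control.users;
  const p2 = treatment.conversions / treatment.users;
  const se = Math.sqrt(
    (p1 * (1 - p1)) / control.users + (p2 * (1 - p2)) / treatment.users
  );
  const z = Z[confidence];
  return { low: p2 - p1 - z * se, high: p2 - p1 + z * se };
}

const ci = liftInterval(
  { conversions: 500, users: 5_000 },
  { conversions: 550, users: 5_000 },
  0.80
);
// "If the entire confidence interval is above zero, you're fine."
console.log(ci, ci.low > 0 ? "ship it" : "keep watching");
```

With these invented numbers, the interval clears zero at 80% confidence but not at 95%, which is exactly the kind of call being described: good enough to ship when the decision isn't critical.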
Kostas (29:47.471)
Yeah, 100%. By the way, just to add something from my experience, because I worked for a while at a platform that primarily does instrumentation, RudderStack. We did have, from the beginning, the capability for people to just record everything, like what you said about Heap. From a growth perspective, it was good to have, because it helped onboard people really, really fast onto the product. They didn't have to reason about what they should go and track; you just click a button and everything flows into your data warehouse. But
Raghav Sethi (30:38.722)
Exactly.
Kostas (30:41.581)
The problem you had then was that you were introducing a lot of churn, because people were like, shit, what is this mess of data that I have here? It's pretty much trying to represent the DOM in a relational model, which is unbelievably hard. Plus you have the time dimension, right? So you get yourself into the worst possible scenario: super noisy, hierarchical data without a strict schema, trying to put it into a relational model and then using SQL to go and model it. People are just never going to do that. So people were going back to instrumenting with proper events at the end, right? Plus, I would say there is an engineering problem there, which is: what kind of stores and query engines do I need to be able to work with this type of data? These are not star schemas anymore, where I just have slowly changing dimensions. That's a completely different thing. And it reminds me a little bit, guys, of something anyone who has talked with cybersecurity folks might have experienced. In cybersecurity you have this reality where you have a lot of data coming in, and obviously a lot of noise, and you have kind of two systems, right? You have the systems that need to be very specific about what they are looking for. But then you also keep all your raw logs on something like S3, and you use something like Presto when the time comes to go and do these really expensive, long queries, to go and find the needle in the haystack, because you don't want to lose any information. You can't afford that when you need it. So it kind of reminds me of a similar scenario where, yes, you have the very well-defined tracking, to your point, Rohan. If you want to track signups, yes, of course, go and make sure you don't miss any signups. You shouldn't.
Kostas (33:08.301)
But then you can also keep the whole DOM, because storage is cheap, and you can go and do these things. My next question is: if I understand correctly, something has changed, and that's LLMs. Why do we need LLMs that can process visual information, when what we are storing at the end of the day is mutations on the DOM? Which is, let's say, code; at the end it's a data structure, right, and its mutations.
Raghav Sethi (33:40.687)
Yeah, great. This is a very fair question, and I think it's a really good question; it's at the heart of what we do. So I'll say a couple of things. Actually, let's start here: what do LLMs give you? LLMs basically give you a way to put structure over what you can think of as dense data.
Rohan Katyal (33:42.341)
You're questioning our entire existence, Kostas. Questioning our entire existence right now. This is what Raghav and I talk about every day.
Raghav Sethi (34:09.856)
Right? So your question, maybe, is: do you need video, or do you just need logs, just the DOM? What is the actual data representation that an LLM can reason over? For what it's worth, LLMs are surprisingly good at reasoning about the DOM. You can try this experiment at home: take a DOM and put it into, actually GPT-4 could do this, put it into GPT-4, and GPT-4 can draw you a picture of what the screen would look like.
So due to their training, LLMs have the ability to reason about some of these things. But add the temporal aspect, and I think they start to struggle a little bit. That's why we think visual is good, visual with timestamps, and then you get into video, because you can extract not just snapshots but behaviors. To go back to the Heap example: the user clicked Confirm, but what did they click Confirm on? To understand that, you have to go backwards in the log. You have to understand all the events that led up to that. So video and vision is actually a clean way of doing that. So that's one bit of this. And the other question, I think, was: okay, you have all of this, you have these logs, what is the best or most efficient way to represent them?
So again, an experiment you can run: take a session replay, literally the DOM plus that entire log. That has more tokens in it than a video of that same interaction. And the reason is that the video is actually a really good and efficient encoding of a user interaction, because it's only capturing the relevant changes. The fact that you're actually drawing the screen helps the LLM reason about what the software designer intended and what the user was seeing. So it is a very good representation, a very compact representation, even more so than text. So for both of those reasons, I think video is good. We don't always use video. We use video plus screenshots, plus we do a bunch of other fun stuff. We actually do fusion
Raghav Sethi (36:29.942)
of the video with DOM transcripts, to extract certain things from the DOM. So it is overall a complex thing, but conceptually, I think video is a very, very good and powerful medium for getting these kinds of things out. And Kostas, you said something very interesting right before: you have all this data, how do you understand what it means? And the way that you actually do this is, in a sense, just-in-time structuring of the data.
So one example that we talk about, and we actually do something different in production, is: let's say you ran a prompt on every session where you're like, as a user researcher, tell me all the things that a user researcher would flag as potentially being problems in this session. And instead of asking the LLM to just give you text, you give it a response schema. You have it output structured information about what happened. So you'll say: highlight frustration, highlight confusion, highlight various other things. So now you've taken the session replay and you have structured it. You have taken this dense thing and gotten structured data out of it. And now you can do all the other stuff that you wanted to do. But you never had to change the representation. And tomorrow, if you want to come back and say, hey, I'm a growth person, go look at this session replay and understand what should change, you can do that as well. So you do have to structure the data in some sense to actually produce statistical and aggregate results. We also invest a bunch in semantic clustering so that we can do this. But at the end of the day, yes, you do have to structure it somehow. The magic is that you can structure the data just in time for the question you asked, rather than having to think about the question first, produce only those events and logs, and then only be able to answer the question you thought about earlier.
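A sketch of what that just-in-time structuring can look like: pick a response schema for today's question and have a vision-capable model fill it in per session. The field names, enum values, and the analyzeSession helper below are hypothetical, not the actual schema or API being described.

```typescript
// Sketch of just-in-time structuring: choose a response schema for today's
// question and have an LLM fill it per session. Field names and the
// analyzeSession() helper are hypothetical placeholders.

interface SessionFinding {
  kind: "frustration" | "confusion" | "bug" | "upsell_moment" | null; // enum part: cheap, reliable
  description: string;          // free text: the specific behavior, clustered later
  featureArea: string;          // e.g. "onboarding", "payment flow"
  evidenceTimestampMs: number;  // where in the replay this was observed
}

interface SessionAnalysis {
  findings: SessionFinding[];
  overallSentiment: "satisfied" | "neutral" | "struggling";
}

// Hypothetical helper: sends replay frames / DOM transcript plus the schema
// to a vision-capable LLM and asks for output conforming to SessionAnalysis.
declare function analyzeSession(
  sessionId: string,
  role: "user researcher" | "growth PM"
): Promise<SessionAnalysis>;

// Tomorrow you can ask a different question of the same raw replays:
// no new instrumentation, just a different role and schema.
async function findConfusion(sessionId: string): Promise<SessionFinding[]> {
  const analysis = await analyzeSession(sessionId, "user researcher");
  return analysis.findings.filter((f) => f.kind === "confusion");
}
```

The enum field is the cheap, reliable part; the free-text description is what gets semantically clustered across sessions later.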
Nitay (38:16.618)
There's a bunch of really interesting stuff you said there. One thing I want to touch on is this notion that if you have the DOM, then to really understand the context of, say, the submit button and what they clicked, what you need to understand is actually the bounding boxes and all these different things, which you don't get purely out of reading the text and doing some pattern match. You get it because you essentially render it and you visually see: okay, this is where it lands. It's almost like one of these game engines doing physics collisions, right? You have to actually act it out to see where it lands. And so one of the things that made me think of is: given that you're doing this just-in-time structuring and this video analysis, which parts of your engine do you find need determinism? As one concrete example, I imagine the stitching together of the DOMs into the render is done mostly by code, meaning you're not actually going to the LLM and saying, make me a video of this. But I imagine there are other parts where you do want an LLM to have its level of probabilistic, fuzzy matching, to say, hey, go look for these patterns and so on. So how do you break down this engine into which parts need determinism and are static code, if you will, and which parts you want to be dynamic and LLM-based?
Raghav Sethi (39:36.193)
Yeah, so you want to capture as much deterministic information as possible and then use that to inform the non-deterministic bits. Obviously video algorithms are getting good; there's a bunch of really interesting people, even at South Park Commons, inventing new techniques to extract data out of video. But I'll give you a quick example. We found this really interesting niche that we are very powerful at, which is analyzing agent behaviors.
Now, there are a bunch of companies that do agent tracing. But because we capture the entire session, we can actually understand the entire context around it. So for example, let's say the user is interacting with an agent on some website. Take Figma, for example: there's some agent in Figma, and you go in and you ask the agent to do various things. Then the agent does some stuff, and then you go over to the right side, the actual canvas, and you make more tweaks to what the agent did. Now, what does this indicate? If you only looked at the agent trace, you wouldn't realize that the agent didn't get it exactly right. If you only looked at the canvas trace, you can't actually make sense of it; just imagine the logging it would take to make sense of the canvas trace. But if you have the entire session and you have the DOM, you can look at the whole picture. We can look at exactly what the user typed into the text box, because of course that's a DOM change, right? That's deterministic. We can capture that. We can reliably capture button clicks, we can reliably capture scrolling, we can reliably capture what part of the screen was visible and what part wasn't. And we can have the LLM interpret the things it does see, in that context. So the interpretation, I would say, is where the LLMs become particularly useful: a bunch of the projection, the just-in-time understanding, that is what we use the LLMs for.
But you want to rely on deterministic data as much as possible, because you want to produce a system that reliably produces the same result, and good results, and the more determinism you have, the better. So you rely on the LLMs to do the minimum work, in some sense, that allows you to produce the data that you want.
Nitay (41:48.859)
And we'll come back in a second, because I want to touch more on the just-in-time structured data handling and all that. But there's one thing you said that I do want to ask about. There's this interesting use case it sounds like you found, of using session recording and replay analytics on agents, to find out what they're doing and to optimize them. Is there anything fundamentally different that you've noticed when you're doing this on an agent versus on human behavior?
Raghav Sethi (42:13.71)
Sorry, the thing I was speaking about was humans using an agent: humans using a chat agent in a piece of software that also has other parts to it. So a user interacting with the Figma agent and also interacting with the Figma canvas, for example. That's what I was talking about. But to your point, there's a lot of interest now in understanding how computer-use agents use things, because they look very different.
Nitay (42:30.139)
Okay, so it's the user doing both actions. Ah, okay, got it.
Nitay (42:39.685)
Mm-hmm.
Raghav Sethi (42:41.888)
Right? When we look at traces and logs from computer-use agents, they look nothing like users. They don't look anything like humans, because they're just clicking around. You can see them click on random parts of the page that seem reasonable. And I think this is actually a very important area to optimize. Folks we've talked to in e-commerce, customers and potential customers, are very invested in figuring this out, because they want their stores and so on to be easily accessible by agents, understandable by agents. Some of this is like GEO, generative engine optimization: yes, you want your store to rank highly on ChatGPT, but if you ask your agent to go buy you a bag or, I don't know, some clothes, it should be able to actually go off and do that on your website, or, if you have an API, using your API. So there's definitely a bunch of interest in understanding how computer-use agents use websites. And there's a bunch of optimizations that you might want to make purely for computer-use agents.
Nitay (43:46.426)
Yeah, that's fascinating. I can see that being an entire use case or product line of its own, almost, just to do that. Coming back to the structured data stuff: talk to us a little bit more about the nuances and what you guys have solved there, because it does seem like a very thorny problem. At the end of the day, when you get down to the bits and bytes of it, you have some row in some database that has the entire DOM, or some sort of document model, and you need to make sense of all this, you need to be able to query it, and you need that LLM to be able to utilize it, and so on. So how do you make that stuff actually work?
Raghav Sethi (44:20.928)
Yeah, Rohan should talk about this, I think. This is, I would say, like 30% of Rohan's job: really understanding how to do this right. It is definitely more of an art than a science. I think it will maybe become more of a science as the base models get better. But yeah, Rohan, you want to talk about the schema stuff and semantic clustering?
Rohan Katyal (44:42.721)
Yeah, I hope it remains more art than science. I want it to be really hard for people. But coming to this: we've spent quite a few hours just watching sessions and figuring this out. It's a combination of a bunch of different data that exists when you're looking at a session: there are DOMs, there's some metadata, there are some events, and there's some LLM-generated output.
All of this exists at the individual session level, and then there's also an aggregation across sessions. Let's talk about both, starting with the individual session. At an individual session, extracting information is a combination of looking at all the hard facts that you have: this user was in Korea, on a web browser, with whatever sign-up date, whatever metadata, whatever experiment feature flags you have access to. So you have that hard context that you know for a fact. Then we get the LLM to analyze these sessions from that perspective. Also, given the number of sessions, the millions of sessions that have run through our system now, and the bunch of them we've run evals against, we have a good understanding of what the boundaries of all of these LLMs are, where they mess up, what they're good at. So for the things we know they're not great at, we prompt in a way where the LLM knows to look for the right things, and we encode videos in a certain way as well. Also, when we onboard an initial customer, there's an entire process around getting the customer's context into the Vintara agent. What does the customer care about? What are their priorities? What are they trying to do right now? Not every bug that can be found should be found; sometimes it's just not worth it. That's one of the things. And at the level of every single session, the agent can mess up. It can say that a problem occurred when it didn't, or that something looked like confusion when it was really frustration. But once you start to semantically cluster across all the sessions, if something happens in, let's say, 10% of the sessions, it's definitely happening. So at a very trivial level, we will get an LLM to
Rohan Katyal (47:08.413)
ingest all the hard facts, ingest all the priorities, and design the response schema in a certain way where the session is first analyzed in an unbiased manner and then in a very targeted manner, and then semantically cluster a couple of different ideas from this schema, which I can come back and talk about more, across a bunch of different sessions. This aggregation essentially helps you figure out what the true obstructions are, the true pain points, the things users are struggling with. We get LLMs to take on multiple different roles. They take on the role of a user, they take on the role of the founder of the customer, the company we're working with, they take on the role of a PM, and they go analyze these sessions. And all of them end up analyzing sessions from a different perspective and extracting structured information, which we semantically cluster and aggregate to find things that are worth fixing.
Raghav Sethi (48:02.006)
Yeah, another way to think about this, Nitay, and this is very simplistic: you're either extracting an enum value, where you say this is either frustration or confusion or a bug or an upsell or something, and you can just have the LLM produce one of these enum values, or null if it doesn't know. That's the bare minimum way to get structured data out of something unstructured. And that's fine. For some kinds of things we do that, because we have a really clear definition, the LLMs are good at it, and the enums are exhaustive in some sense. But the risk is that the enums are not exhaustive, that you're not actually finding some of these more complex behaviors that just aren't captured. Confusion, for example, is much harder to reliably extract than bug or upsell.
So for those, what we do is have the LLM produce strings, phrases in a certain format, describing the behavior. And again, to Rohan's point, this is slightly noisy. At a particular session level, it's somewhat noisy. The enums will be reliable, but the actual output text, the description of a behavior, will vary across sessions. It's an LLM at the end of the day.
But we have invented and perfected some techniques to then take that and cluster it. And the clustering is actually the process that separates the noise in this noisy process from the truth. If you get really good at semantic clustering, and we have now gotten very good at semantic clustering, all the stuff that's real, all the stuff that happens to enough users, just rises to the top. And then you only look at the top 10, top 20, top 50 things. And those, we've just found over the last three or four months, are getting more and more accurate. They were like 40% right, now they're 70% right, and we think we can get them to 90% right. Does that make sense?
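A toy sketch of that clustering step, as one way it could work: embed each per-session description, greedily group descriptions whose embeddings are similar, and rank clusters by how many distinct sessions they cover, so one-off noise stays in tiny clusters while real issues rise to the top. The embed function is a placeholder for whatever embedding model you use, and the similarity threshold is arbitrary.

```typescript
// Toy semantic clustering over per-session findings: embed each description,
// greedily group similar ones, rank clusters by distinct session count.
// embed() is a placeholder for a real embedding model; threshold is arbitrary.

declare function embed(text: string): Promise<number[]>;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Finding { sessionId: string; description: string; }
interface Cluster { centroidText: string; centroid: number[]; sessions: Set<string>; }

async function clusterFindings(findings: Finding[], threshold = 0.82): Promise<Cluster[]> {
  const clusters: Cluster[] = [];
  for (const f of findings) {
    const v = await embed(f.description);
    const best = clusters
      .map((c) => ({ c, sim: cosine(v, c.centroid) }))
      .sort((x, y) => y.sim - x.sim)[0];
    if (best && best.sim >= threshold) {
      best.c.sessions.add(f.sessionId); // same underlying issue, noisy wording
    } else {
      clusters.push({ centroidText: f.description, centroid: v, sessions: new Set([f.sessionId]) });
    }
  }
  // Issues that hit enough users rise to the top; one-off noise stays at the bottom.
  return clusters.sort((a, b) => b.sessions.size - a.sessions.size);
}
```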
Rohan Katyal (50:05.641)
Yeah, and the beauty of this clustering approach across different sessions is that you obviously do it at a global level, but you can also do it for every single feature, every single aspect of the product that you care about: just the payment flow, only users in Korea. So there are no limitations of time and space. If you could find the top three problems in literally every part of the product, for every cohort or user segment that you care about, that's essentially what Milana does.
Nitay (50:34.712)
For the semantic clustering that you're doing, then, what does the iteration on that look like? Like you said, you've gone from 40% to 70%, et cetera. Is that just tuning prompts? Is that tuning the definition of the enum, as you said? I remember back when we were defining some of these things for some experiments, what was the term used? I think it was MECE: mutually exclusive, collectively exhaustive, right? Meaning each enum uniquely describes something that the other values don't, but if you take them all together, it describes the whole world, and that's your perfect definition. So is it moving towards more things like that? What are you doing to improve the system?
Raghav Sethi (51:04.14)
right.
Raghav Sethi (51:13.518)
I don't want to get too much into it, because I feel like this is part of our secret sauce, but yeah, the MECE bit is important. Definitely the MECE bit is important, but MECE is still in enum land. It still assumes I can actually enumerate all of the things I care about ahead of time. For the cases where you can do that, you should definitely be MECE, and you should try to use enums and structure it that way. But a lot of the stuff that we find is interesting, organic user behavior that would actually be impossible to predict. Maybe it falls into one of these buckets, but just the fact that it falls into the bucket doesn't tell you what you care about. You want to actually learn the behavior itself. So the thing we cluster on is not, is it a bug or is it a frustration? We cluster on the particular frustration, the particular bug. And you can't do MECE on that. Getting good at that was definitely a big part of what we've been working on, and is part of the secret sauce.
Kostas (52:13.931)
Okay, I have two questions. The first one is: we've been talking about all this data and all this processing that needs to happen, different modalities of data. What does that mean in terms of the infrastructure you need to work with it efficiently? Because from what I hear, we're talking about very, very different things. You start from the DOM, and somehow you need to store that and process it. Then you have the rendering of it, and then you end up with structured outputs from your LLMs. Tell us a little bit about how you build a system like that as a startup, where you have limited resources.
Raghav Sethi (53:08.686)
Yeah, I think honestly it gets easier every year. Some of this infrastructure gets more mature and more reliable, and you just get better and better primitives. To some degree, we didn't take advantage of that, very deliberately, because our goal is to go after bigger customers, customers that have strong security requirements, compliance requirements, and so on. So we did a bunch of stuff that sets us up well to go after those customers, including potentially in-VPC deployments, a really strong security model, a really strong data ownership model. We did some of these things that are harder but that we think are important for going after bigger customers. And we made those investments early, because it's very, very difficult to migrate later from one of these newfangled stacks. I love Cloudflare, for example; they have a bunch of really cool stuff, but there's no way I would be able to run anything like Cloudflare in a way that I can prove to you is secure in your VPC. So we made a bunch of these decisions. Now that we're there, we're going with established clouds, we're going with reliable stacks, stacks that we think we can actually run securely, potentially even in customer environments. Then what do you have? The goal is to build a system that is really decoupled and can really scale horizontally.
A big advantage, for example, is that we've built all of this on top of queues. You can imagine a really long processing pipeline. We have dynamic processing, we have backfills, we have a bunch of this infrastructure that we've built, but we built it on top of very simple primitives that we think can go into a customer VPC, so that if we had to move to a different cloud provider tomorrow to work with a customer, we could very easily do that. So we have tried not to take these dependencies on some of the cool, new, and frankly amazing, useful pieces of technology, so that we can go after these bigger customers that have more requirements. But overall, it's a long, complex ingestion pipeline that's all strung together with queues. Each of the components scales horizontally. We have a query processing system that is able to do all of these things reliably over time.
Raghav Sethi (55:28.172)
And it's all built on technology that we could, in theory, take and move to another cloud in a few weeks. So that's how we've made some of these decisions.
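A rough sketch, under stated assumptions, of the queue-decoupled pipeline shape Raghav describes: each stage pulls from one queue and pushes to the next behind a thin queue interface, so stages scale independently and the queue backend can be swapped per deployment (a managed cloud queue, or something running inside a customer VPC). The interface, stage names, and payloads are all illustrative, not the actual system.

```python
# Sketch of a queue-decoupled ingestion pipeline: each stage reads from one queue,
# transforms the message, and writes to the next queue. Stage names are made up.
from abc import ABC, abstractmethod
from typing import Callable, Optional

class Queue(ABC):
    """Thin abstraction so the workers don't depend on any one cloud's primitives."""
    @abstractmethod
    def pull(self) -> Optional[dict]: ...
    @abstractmethod
    def push(self, message: dict) -> None: ...

class InMemoryQueue(Queue):
    """Local stand-in; a real deployment would back this with a managed queue."""
    def __init__(self) -> None:
        self._items: list[dict] = []
    def pull(self) -> Optional[dict]:
        return self._items.pop(0) if self._items else None
    def push(self, message: dict) -> None:
        self._items.append(message)

def stage_worker(inbox: Queue, outbox: Queue, transform: Callable[[dict], dict]) -> None:
    """Generic worker: run one pipeline stage over everything currently queued."""
    while (msg := inbox.pull()) is not None:
        outbox.push(transform(msg))

# Example wiring: raw session -> parsed events -> LLM-annotated summary (all fake data).
raw, parsed, annotated = InMemoryQueue(), InMemoryQueue(), InMemoryQueue()
raw.push({"session_id": "abc", "dom_snapshots": ["<html>...</html>"]})
stage_worker(raw, parsed, lambda m: {**m, "events": ["click", "scroll"]})
stage_worker(parsed, annotated, lambda m: {**m, "summary": "user scrolled, then clicked"})
```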
Kostas (55:39.631)
That makes total sense. All right, one last question from me, then I'll give it back to Nitay. So there is a thing I keep coming back to, ever since I started working with data, which is: when we track all these events from our users, at the end of the day, as you said, we're trying to extract behavior, right? And there is,
or at least in my mind, this concept that there is some kind of extracted, ideal behavior for my customer, what my users are doing on my application. And I can figure out, let's say, a model that, on average at least, models my user. And then of course I can use that for many different things, right? Like I can
predict things like, I don't know, is this person going to churn if this and that happens, and all that stuff. The first thing that I ever thought of, and actually tried to build, was a Markov model over all the events that you collect. I can't say that it worked very well, but that's probably because I can build something that trains a Markov model, but that doesn't mean that
I'm a statistician or an ML engineer who knows how to fine-tune these Markov models. Now we live in a world where we have these LLMs that are kind of like universal simulators, right? They can simulate whatever you ask them to. And my question to you guys is: do you see a world where doing what you are doing, distilling all this signal at scale
from all these users, can end up using the LLMs in a way that they can simulate our users accurately enough that we can experiment against that, instead of having to go and recruit more users, right?
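For reference, here is a minimal sketch of the kind of first-order Markov model Kostas mentions: estimate transition probabilities by counting consecutive event pairs across sessions and normalizing each row. The event names and sessions are made up.

```python
# Sketch: fit a first-order Markov model over user event sequences by counting
# transitions and normalizing each row into probabilities. Event names are made up.
from collections import defaultdict

def fit_markov(sessions: list[list[str]]) -> dict[str, dict[str, float]]:
    counts: defaultdict = defaultdict(lambda: defaultdict(int))
    for events in sessions:
        for prev, nxt in zip(events, events[1:]):
            counts[prev][nxt] += 1
    return {
        state: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for state, nexts in counts.items()
    }

sessions = [
    ["open_app", "view_pricing", "signup"],
    ["open_app", "view_pricing", "drop_off"],
    ["open_app", "signup"],
]
model = fit_markov(sessions)
print(model["view_pricing"])  # e.g. {'signup': 0.5, 'drop_off': 0.5}
```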
Raghav Sethi (57:59.855)
Yeah, please chime in if you want, but I have an interesting thesis on this. There's a bunch of stuff in our industry that is all predicated on the idea that it is very expensive to ship software. That's why the decision bar is high. That's why there's so much planning. That's why there's so much resourcing. We do a lot of things under the assumption that it will take three to six months and a team of three to six very highly paid people to execute on this thing.
I think that simulation is very interesting. I think we're definitely in a very good position to understand how users behave. I think potentially we could even build a model that is pretty good at predicting user behavior and actually simulating it. I think we could build that. And it would not be incredibly challenging to say, would you like to try this feature or this feature flag against simulated users? I think that we could do that. My question is, I have two questions. One, would you believe me
if I told you that my simulated users are accurate? How do I convince you that this is strong enough signal to actually make the change that you want, or not? So that's one: do we meet the burden of proof? The second is, now that it is cheap to ship, why would you not rather just ship and see in production, instead of leaving this to chance? And our thesis is, if you make it really easy to ship, if you make it really easy to generate good hypotheses
that are backed by real data, by sessions and user feedback that you can watch and look at and understand, and then make it as easy to generate an A-B test as clicking three buttons, then wouldn't you rather do that? Because at the end, you would have 100% confidence in the results.
Rohan Katyal (59:48.033)
Also, I think the beauty of Milana is that we're finding insights from your perspective, with your company's context, not some general-purpose "tell me everything that's broken." People know things are broken; they don't care about that. Similarly, if we were to simulate: right now, I've seen a bunch of simulation products out there where you can throw in copy and ask something like, for users between 30 and 50 in San Francisco, how would they react to it?
I think that's pretty generic. Even if it's accurate, it's not particularly useful. Your context is very different. And even if you were to simulate, I think someone like us is in a significantly better position to do it than a company that's using generic data sets, especially because what we would simulate is in-product decisions and in-product behaviors. And we have a very strong understanding of the different cohorts, how different changes have moved your metrics, and what frustrations your users deal with.
Nitay (01:00:51.41)
I want to maybe tie this to something you said before, Raghav, because now that we're talking about simulation, we're kind of talking about the future of where this world goes, right? And you said something interesting there: everybody's coding with agents now. That's obviously creating a lot of productivity, which touches on what you said, it doesn't take three months anymore. You can undo this first-principles assumption that it's hard to ship software. And if you undo that, everybody now cares about productivity, cares about how do I get the most out of my agents.
So how do you guys kind of view this world of like, how do you make these things the most productive? How do you use Milana? And like, where do you see this kind of in the future? How do you see it playing out?
Raghav Sethi (01:01:30.05)
Yeah, I think the key bit is: can you generate hypotheses that are backed by data quickly enough? To us, that is the key question. Because right now, we talk to startups, and these startups are really advanced, they're at the bleeding edge, they're using every tool known to mankind, they have the Devin subscription, they have the Codex subscription, they have the Claude Code subscription, but they don't have wins, they don't have potential wins stacked up.
So they are going off and looking for potential wins. They are interviewing customers, they're recruiting user panels, they're hiring user researchers, they are digging through their logs and events to try and find, hey, what is going on? And a lot of these people are very sophisticated. Today, I would say, obviously, every customer we have has a funnel. They know how many people came into the top, how many activated, how many converted, how many retained. They have this high-level data.
Going from a drop-off in a funnel to understanding what theses I should go after, that is now the most expensive part. So if we can make this easier, now you have a lot more theses. But if you then have to take these theses and assign them, make a resourcing decision about, hey, which engineer should go do this? Which product manager should go do this? Which team should do this? Where should this exist in my OKRs? Then we have moved the needle, but only by a little bit.
If you can actually, from that point, click a few buttons and kick off the bug fix, kick off the change to the product, start an A-B test, now you have actually completed the loop and actually improved the velocity. And in some sense, you have lowered the barrier for what thesis you are willing to explore. The whole goal is to generate lots of good theses. They have to be good, they can't be bad, but they don't have to be as great as they were before, because now it is very easy to evaluate every thesis.
So that's how we think about this. And we think that companies that get good at this, whether it's with Milana or something else, will win: the companies who are shipping faster, understanding their users, and fixing the problems that are affecting their metrics the fastest, those people will win. And those people are the people who are using their coding capacity to its maximum.
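As a tiny illustration of the funnel arithmetic referenced above (top of funnel, activated, converted, retained), here is a sketch that computes step-to-step conversion; the stage names and counts are hypothetical.

```python
# Sketch: step-to-step conversion in a simple acquisition funnel.
# Stage names and counts are hypothetical.
funnel = [("visited", 10_000), ("activated", 4_000), ("converted", 800), ("retained", 500)]

for (prev_name, prev_n), (name, n) in zip(funnel, funnel[1:]):
    print(f"{prev_name} -> {name}: {n / prev_n:.0%} ({prev_n - n:,} dropped off)")

# The expensive part, as discussed, is explaining *why* each drop-off happens,
# which is where session-level behavior data comes in.
```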
Rohan Katyal (01:03:51.336)
Also, my hot take is that analytics companies are going to die. The bar is not insights anymore. No one cares about having a dashboard. The expectations are changing: people expect you to be driving outcomes for them, not just delivering insights. And the future we want to move towards is where Milana owns one OKR for you. You just come and tell Milana, this is the OKR, I want you to drive up signups by X percent. Don't come back to me with five ways to drive up signups;
you actually drive the signups and then come back to me. I think we are somewhere in the transition point where people just don't ship something because an AI said so. There's a high bar of proof, which is good for us: session replays are inherently transparent, you can see them for yourself. But we will reach a point where everything is going to be so good, including human trust in these agentic systems, that you will trust an agent to own an OKR for you.
Nitay (01:04:46.049)
That's a fascinating take and I completely agree with you. I think that's where the world is going to go. And Raghav, I think you said it very well: the main thing startups have, their main power and advantage, is the iteration cycle. Everybody all the way up to Elon Musk with SpaceX has shown this, right? If you can iterate faster and faster, you end up winning. So I couldn't agree more. Cool. We are unfortunately running out of time here, but this has been a super fascinating conversation.
Any last thoughts you guys have? I think there have been some amazing takeaways here in terms of, you know, challenging some first-principles assumptions around how software gets built and what holds you back, being able to push the envelope on ideation and coming up with more experiments, enabling your agents, and even, over time, enabling full automation where things like Milana can fix issues and completely own KPIs for you. Anything else you would
like to see happen in the ecosystem, or would like people to understand, as we close out here?
Raghav Sethi (01:05:47.224)
I think the crazy thing about what we're doing is that the puck is moving so fast. Constantly, literally almost on a per-model-release basis, you have to update your assumptions about what product you should build, because the kinds of products you build and how you build them will change dramatically. In fact, I would say the even more interesting thing is that who is building the product is going to change dramatically. For example,
there's pro software and consumer software developers. Imagine the term "consumer software developers." Consumers don't produce software, but now they do. What do these people need? What do the creators and software builders of the future need? I think it's just really fascinating. We honestly have updated our assumptions on this at least half a dozen times, and I think we will keep doing that. But it's just a wild time to be alive and be building this. We really love what we're doing because I think it's a good foundation:
understanding the user is the foundation to improving the software. So we are just asking, where do we point? How do we get to that point when the models and the users get there? How do you prepare for that? It's a thing that we think about a lot and enjoy talking about.
Rohan Katyal (01:07:00.991)
Yeah, the new category of creators is something I'm particularly excited about and, you know, following very, very closely. This new category of creators coming up on the Lovables and Replits of the world, who are now starting to build very, very serious businesses, the one-person billion-dollar companies that are being talked about a bunch. These people can't have dashboards. One person can build, but they can't also be looking at dashboards. These people will need agentic systems like ours to help them grow their business as well. So pretty excited about that trend.
Nitay (01:07:29.039)
Absolutely. Well, with that, we'll definitely have to bring you guys back on and see how the assumptions have changed next time you guys come on. There's definitely a lot more conversation here to be had. Thank you guys for joining us. This has been really a delight.
Listen to Tech on the Rocks using one of many popular podcasting apps or directories.