Episode 25
· 52:40
Nitay (00:02.254)
All right, welcome Tom, Roei and Yasmin. We're excited to have all three of you, the full Wild Moose team, on the pod with us today. It's very exciting. Why don't we start by going around: just give us some of your background, experience, and kind of what led you to where you are today. Yasmin, you want to go first?
Yasmin Dunsky (00:21.006)
Sounds good. So great to be here. I'm Yasmin. I'm a founder and CEO of Wild Moose. I started my career in the special forces of the intelligence unit, doing special ops as an analyst and then as a commander, and went to study computer science as an undergrad. I actually worked a few months at Google as a software engineer, but then quit to start an NGO in Israel teaching girls to code, which I ran for a few years as the CEO before moving to the board,
and then moved to the US to do my MBA at Stanford. And basically in my days at Stanford, I started chatting with Roei and Tom about doing something together. And I'll let them introduce themselves and tell our backstory. We have a cool backstory together.
Nitay (01:07.629)
Cool, okay, you wanna go next?
Roei (01:09.789)
Yes, so I started my career in 8200 after doing an undergrad at the Technion. I met Tom there on, I think, my first day, if I remember correctly. And we had a good few years there. And right after I got out, I...
basically continued studying and went on to do a PhD. I moved to New York, to Cornell Tech, to complete it there. And during the PhD, like in 8200, we did a lot of cybersecurity. That's where my PhD started, but then it veered off into AI mostly, and specifically AI security. And I specifically focused on NLP,
because I had this idea that human language is really interesting. And if I'm doing a PhD, I might as well deal with the most intellectually interesting thing for me. And so natural language processing was it. And little did I know that we would solve all the problems in NLP while still not knowing how human language works, so nothing of the sort is that intellectually interesting nowadays,
even though everything is solved. But in the PhD, which I ended up doing because of natural language processing, I ended up doing a lot of work with transformers, what later became better known as LLMs. And I did a quick postdoc, but at the beginning of my postdoc, which I did in Toronto, I was already talking to Tom and Yasmin
quite seriously about starting something, and very quickly, after like half a year, I quit the postdoc and we started Wild Moose.
Nitay (03:11.915)
Very cool. Tom?
Tom Tytunovich (03:14.378)
Yeah, hi. So I'm Tom, co-founder and VP R&D of Wild Moose. I lead the engineering team here in Tel Aviv. And yeah, I'm a software developer, and have been for maybe close to 25 years now. I actually started programming on one of those old-school text-based online multiplayer games, way back. And yeah, after completing my bachelor's degree
at the Open University, I also joined 8200. And in the years since then, I haven't done anything academic, nothing since the bachelor's degree. It's nothing compared to Roei, for sure. But I did work for a lot of different types of companies, startups, larger corporations, nothing really that they share in common, I would say, except for the fact that all of them had a lot of issues in production, a lot of incidents. Like, I did the math recently, I've probably spent like...
Tom Tytunovich (04:14.104)
I don't even know, it was something like 10,000 hours at this point, probably, just on handling production issues. Also, at every company that I joined, I somehow ended up having to draft their incident response process. I will say the space has improved in recent years, in the sense that at least now, with the emergence of some companies that address at least the logistics aspects of it, things have gotten slightly more systematized. But it always struck me as odd that the actual core work,
of how you actually debug an issue, of what's actually going wrong with the system, has always been to some extent more art than science, which is very unusual, I think, within the software engineering discipline. Yeah, so that made this area very interesting to me. And so we started this company to address it.
Nitay (05:06.879)
So why don't we start there a little bit? I mean, it sounds like you guys had an interesting coming-together story, each kind of from different backgrounds, but overlapping areas. So what was the spark that said, okay, there's an opportunity here, and I'm gonna go invest all my blood, sweat and tears? Startups are hard, but this is the one.
Yasmin Dunsky (05:26.39)
Yeah, so when we started chatting, it was very crazy. Roei was living in New York but doing his postdoc in Toronto. Tom was in Israel back then, having just moved back from New York. And I was based in California. And the three of us made the decision that we might want to start working together. I actually didn't know Roei and Tom almost at all before we started talking about doing a startup together; they had been very close friends for many years.
And we were chatting about, like, maybe we can do something together, and started the ideation process, and we made the very crazy decision of testing it. What we decided to do was live together for three months in an Airbnb in the middle of nowhere in California. And we said, if by the end of this time we have an idea that the three of us are really excited about,
and we've stress-tested our relationship and everything is working well, then Roei is quitting his postdoc and we're actually going for it. So those were very crazy days. And when we did this ideation and started to think about what could be a good fit for us, what could be something that the three of us are really passionate about and where we see a big opportunity, it was clear that we were all coming with a very technical background, especially Roei and Tom.
But all of us were coming with a technical background, and I think the first decision that we made was that we want to do a startup for technical people. And when we thought about what's happening for the technical persona, we looked at the lifecycle of the developer. And what we noticed is that everything around writing code and testing code was already starting to really change. And as we mentioned, Roei was deep into LLMs from, you know...
Back then, transformers were the thing. And he constantly told us, listen, I'm telling you, this is our why-now. There is something happening here. Just to emphasize this, it was before the launch of ChatGPT. So things were already changing and happening in that space, but it wasn't as clear as it is today that this is a revolution. Still, it was possible to note early that there was going to be something huge here. And the three of us looked at what's happening in the lifecycle again.
Yasmin Dunsky (07:49.182)
I really remember a specific conversation where Tom was saying, okay, if I'm thinking 10 years from today, there is no way debugging is going to look the same. And as opposed to writing code and test automation, all of that, which is text generation, a more classic kind of problem, what we thought was really interesting about production debugging is the fact that it's actually more of a search problem. So I think all of us connected to this, from it being both a very technical,
challenging problem to solve, with a very clear why-now, because transformers were going to change everything in this world. It wasn't clear at all who was going to go after this opportunity, or how difficult it was going to be to solve. But we definitely recognized that the opportunity there is huge, and yeah, we started focusing on it really early on. We were probably the only ones; it was like us and maybe Flip.ai that were the only companies already working on something in this area. Maybe we're missing someone, but that's
to our knowledge. Ever since then, AI agents have become a very obvious thing to do, and the category has been so crowded, exploding with a lot of different kinds of approaches to tackle this. But yeah, we were very early.
Nitay (09:04.021)
And I'm curious to hear, since you touched on starting this before the whole wave of ChatGPT, when LLMs were early. So you identified, okay, debugging is going to change, 10 years from now it's going to look completely different, and at its core it's a search problem. But the whole world of agents and so on, that didn't even exist yet. I mean, the word obviously existed and AI agent is a thing, but what people think of it as now, today, didn't really exist. So back then, what was kind of your perspective of, like...
Nitay (09:30.746)
Okay, given that we view it as a search problem, which maybe others don't, what does that mean for how debugging should look, or for what it will become?
Roei (09:39.721)
So people forget, but even before ChatGPT came out there was GPT-2, and GPT-2 was pretty good at writing code, good enough that there were a few startups using it to do code completion. And I remember seeing their demo sometime, maybe 2020, 2019, and it was very impressive. And then we knew that code writing was gonna change significantly.
Before ChatGPT, they were already very good at code completion. So that's kind of what triggered our thinking about, okay, code writing is gonna change significantly, but what about debugging? And yeah, the fact that it's a search problem makes it, I mean, obviously debugging is harder, and arguably more of the work that software engineers do is debugging
rather than writing code. Also, we talked with a bunch of CTOs and a bunch of VPs of R&D, and they told us we have two problems. One problem is migration: we don't know how to migrate from one cloud to another, or it's hard, it takes a long time. And when companies grow, that's very important. And the second problem is production incidents, which were a complete menace and still are.
Tom Tytunovich (11:10.05)
Yeah, and to add to that, I think when you're dealing with a text generation problem like writing code, part of what makes it easier, as we said, is that if you have an approximate solution, it's generally not bad. It's something that an engineer can then take and potentially iterate on and tweak as needed and get to a place that works well. And even if you gave something that isn't quite right, you've still delivered value. And I think a lot of the early AI tools in the space of code generation were sort of
that kind of value proposition. And in our space, that's not true. When you're talking about handling an issue in production, throwing a bunch of text at you, half of which may be wrong, or even, generously speaking, is close to correct but still ultimately wrong, is much, much worse than saying nothing at all. That kind of noise when you're handling an issue in production, which is already stressful, leading you down some rabbit hole that has nothing to do with the actual problem,
is worse than useless. And we saw this a lot with our early customers: they had very high thresholds for what kind of precision we needed to be able to show, and in what percent of cases, and they were measuring us and evaluating us from very early on, in what percent of cases we got to the actual correct root cause, not just some signals or things that get you maybe somewhere near the direction. And I think that characterized a lot of the way that we built the system, where from the early days we had to
show things that were correct. Maybe it was in limited scope, maybe it was under certain circumstances. But whenever we would say anything, it had to be the right thing, that just gets you the information that you need and that analyzed the problem correctly. And I think today, the code generation tools have also improved a bit. Autonomy has improved a lot. Agents nowadays are able to do things that are a lot more correct from the start in a lot of ways, and reason about things more correctly. But still, when you're thinking about something at the
edge of how things work, and production issues fundamentally are about how things break, how they deviate from their design, it's still quite a challenge, even in those cases, to not fall into all these rabbit holes and to really get to the correct solution consistently.
Kostas (13:21.825)
So when we're talking about generating code, I think one of the, let's say, catalysts that made this part one of LLMs' first big successes is that there is a very short and tight validation feedback loop. You have the compiler, right? Okay, you generate a piece of code; at the end of the day, you are going to compile it in, let's say, a few seconds, and
you will know whether this is wrong or right. And there's no dispute on that, right? It either compiles or it doesn't compile. And even if the language, let's say, doesn't have a compiler to help you, there are a lot of tools that the industry has built for decades now to help with this. But when you're talking about production, and when you're also talking about search and retrieval, because...
And correct me if I'm wrong, but I think the problem here is that something happens, you have a signal, but that something is, I don't know, an error somewhere. And then you have a lot of information spread across many different places that you somehow have to process to figure out what's going on. But you don't have a compiler there, right? So how do you deal with that?
Yasmin Dunsky (14:50.766)
Yeah, I'll start, and then you can talk about microagents. I realize now that we also didn't properly introduce what we do at Wild Moose, so maybe just a word about that. The way our product works at Wild Moose is basically, whenever there is an issue in production, our platform automatically gets triggered, understands what kind of problem there is, and then automatically starts collecting data from
logs, metrics, traces. It could also be, definitely, code changes, depending on how much access we have in a particular production environment. It could also be their code and Slack channels. Wherever a human would look for information, we would want to have read access as well. And then, once there's an issue, it will collect all this data and analyze it. And we put a very high bar here because of our approach, and we can elaborate more about it.
We come back with a result within one minute. That's really important for us, because we believe that on many occasions, when an engineer has to wait more than four or five minutes, they will just go to their tools and start their own investigation. We believe the bar here should be an automatic, less-than-a-minute result that actually advances them significantly, and in many cases simply identifies the actual cause of the problem.
And we come back with not only a recommendation about the root cause, but also about the impact of the issue and what the mitigation step should be. So to your point, yes, that is exactly the problem, and going back to it being a search problem: there is a ton of data that we can use here to analyze and bring back a result. But if you throw all the data at the model, or give a generic AI agent access to all the data in the company,
then in real time it has to figure out which data is relevant and do all this work. If you think of how an engineer sitting in an on-call shift thinks about it, they won't start from scratch every time to try and understand where the important data is. In reality, if you give an AI agent this kind of open-ended freedom, what will happen is that it will take a ton of time, it will burn tokens, and in many cases it also will not be accurate.
Yasmin Dunsky (17:11.694)
Because again, it's very different from how a human thinks. A human has this muscle memory of, okay, here's the issue, where should I go next in my investigation? And what we're trying to do in our product is basically an approach that tries to compile and encode those workflows, this muscle memory, in a way that the AI can then replicate and execute in real time, the same way you would expect a senior engineer in the organization to operate.
And maybe you can share more about our microagents approach.
Roei (17:48.793)
Right, so you did say a lot, and I think specifically addressed Kostas's question, but yes. So, I mean, there are two things here. One is iterating on an investigative agent, which is basically what we're building. We're building investigative agents for every one of our customers. And to build that, of course, the customers also participate.
You really, really want, as you said, Kostas, some sort of criterion for when it succeeds or fails. A compilation error, unfortunately, we don't have. What we do have is that we can run it on past incidents. And so that's the first thing I kind of want to emphasize: when you run an investigative agent on past incidents and you get that it found the root
cause for 20 of them, then very likely, for the next five, it will find the root cause in a high percentage of them. Because whatever happened in the past is likely to happen in the future; that's the whole premise of machine learning. And here we are actually machine-learning the agents themselves, in a sense. So that's how our customers iterate on their investigative agents: they try to make them work on past incidents.
And the second thing that Yasmin touched on was that our investigative agents are not agents in the ordinary sense, which is what people expect when they hear the word nowadays: a loop where you call an LLM and the LLM decides what to do, whether to get more input from the user, or to call a tool, or whatever. They are instead a series of operations, and then a series of layers of LLMs
analyzing the operations, and analyzing the analysis, and basically exporting a very high-SNR report of what happened. We call them microagents because they run really, really fast; that's their goal. And they are heavily optimized to repeat workflows that we found are in the muscle memory, so to speak,
Roei (20:12.617)
of investigators, of debugging developers. So those are the two points I kind of wanted to touch on.
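A minimal sketch of the kind of regression loop Roei describes here: replaying an investigative agent over past incidents with known root causes and scoring the hits. Every name in it (Incident, the investigate callable, the exact-match check) is illustrative, not Wild Moose's actual API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Incident:
    alert: str                 # the alert that fired
    telemetry: dict            # logs/metrics/traces captured at the time
    known_root_cause: str      # the human-confirmed root cause

def score_agent(investigate: Callable[[str, dict], str],
                past_incidents: list[Incident]) -> float:
    """Replay the agent over past incidents and return its hit rate.

    If the agent recovers the root cause on, say, 20 past incidents,
    the premise is it will likely recover it on the next five too.
    """
    hits = sum(
        1 for inc in past_incidents
        # a real system would need a fuzzier match than string equality
        if investigate(inc.alert, inc.telemetry) == inc.known_root_cause
    )
    return hits / len(past_incidents)
```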
Tom Tytunovich (20:21.26)
Yeah. And I would say that's also one of the things that makes this really cool: with code, you already have the compilers, all the tools for the SDLC, but in our case, you sort of have to build all of that. You have to build the machine learning, as Roei was describing. You have to build the way to do tests, and the different layers of that. You have to build the entire ecosystem here, which, yeah, makes our job really interesting, I think.
Kostas (20:47.047)
Yeah, so you touched on a couple of very interesting things here. This muscle memory, Yasmin, that you mentioned, it makes me think of what, in some organizations, people call institutional knowledge, right? There is some knowledge that is not explicit anywhere, but people have
figured out some things and they just repeat doing them. And it's not just the people, it's also the systems, right? At the end of the day, maybe we all use Elastic, or we use, I don't know, Splunk, but the way these systems are set up and operate in our environment is always kind of unique, because our systems are unique in the way that they have evolved, right? So,
when you're talking about these, do your agents, these microagents, or the whole platform, this swarm of agents, have to learn from these people and adapt and become part of the organization? Or have you figured out, let's say, some kind of recipe that just works for everyone out there?
Yasmin Dunsky (22:09.08)
So definitely. We believe the best way of thinking about it is that even the most senior engineer, when they join a new company, working with a new production environment, it will take them months until they'll be able to sit in an on-call shift. Every company has its uniqueness, and it's our job to figure out how to get this information in the most frictionless way possible. Does it involve humans or not?
What we're doing is building the tooling that allows us to start with a company where they're at and build the knowledge for our agents to then operate in an optimal way. So if there's a human that really has all the information in their mind and can interact with our platform, then in a day we can basically take all the information from that person's mind, and it will be embedded in the agents' minds as well,
by the way they interacted with it. And we built tooling for that. Our agents are basically built as code, and like Roei mentioned, our approach is really test-driven. We make sure that once an agent is deployed, it has already been tested on previous issues, and we know that in real time it will behave in a predictable and very secure way.
And if there's a company that has a lot of information about previous issues and the ground truth of what the root cause was, then we don't need any human in the loop. Our AI would be able to basically learn the agents and the agent architecture that will then operate together in real time. So there will be some generic issue, one agent of ours, a more generic health agent, will start running, and then it will call a different agent that is optimized for performance issues,
and chain those agents one after another to solve the issue. And no human was needed in the loop to create those agents, because we had enough information that our AI could use to create and learn them. But even in those cases, we would want a human in the loop to read through those agents, look at the results, look at whether the expectations we managed to extract from their Slack channels, postmortems, Notion pages, whatnot, actually fit what they expected it to do.
Yasmin Dunsky (24:34.452)
And in most cases, what actually ends up happening is that the information that exists in writing is not holistic enough, and there is a need for interaction with a human who will help fill those gaps. And our tooling is meant for them to very intuitively communicate where the gaps are, see where things stand, and see which data they need to add. Let's say for one company, one team is actually really well documented and we didn't need
a human there, but we see that for a bunch of other triggers and alerts, there isn't any information. So either we understand from that that something is off in the monitoring itself, that potentially these alerts are not well documented because no one actually looks at them and they could be removed completely, or we understand that there is some documentation gap here and we can help. We basically become the new source of truth through them interacting with us. If that makes sense.
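To make the chaining Yasmin describes concrete, a hedged sketch: a generic health agent runs first and, based on what it sees, hands off to a specialized agent. The agent names, thresholds, and context keys are all made up for illustration.

```python
from typing import Callable, Optional

# A microagent here is just a callable: it takes the incident context and
# returns its findings plus, optionally, the name of a more specialized
# agent to run next. None of this is Wild Moose's actual design.
MicroAgent = Callable[[dict], tuple[str, Optional[str]]]

def health_agent(ctx: dict) -> tuple[str, Optional[str]]:
    """Generic first responder: classify the issue, then delegate."""
    if ctx.get("p99_latency_ms", 0) > 2000:
        return "latency regression detected", "performance_agent"
    return "no broad health signal", None

def performance_agent(ctx: dict) -> tuple[str, Optional[str]]:
    """Specialized agent, pre-tested on past performance issues."""
    return f"slowest endpoint: {ctx.get('slowest_endpoint', 'unknown')}", None

AGENTS: dict[str, MicroAgent] = {
    "health_agent": health_agent,
    "performance_agent": performance_agent,
}

def run_chain(ctx: dict, start: str = "health_agent") -> list[str]:
    """Chain agents one after another until one returns no follow-up."""
    findings, current = [], start
    while current:
        result, current = AGENTS[current](ctx)
        findings.append(result)
    return findings
```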
Tom Tytunovich (25:29.954)
Yeah.
Nitay (25:30.08)
Yeah, go ahead Tom.
Tom Tytunovich (25:33.87)
Yeah, I was just going to say, even where the documentation does exist, I think we've literally never worked with a single company that told us, yeah, all of our runbooks and playbooks are completely up to date, we're super happy with them, and they capture reality perfectly. And I think that kind of goes back to the nature of how these things have been managed to date, where you're expecting someone, in a vacuum, somewhat after issues happen, to write down their tribal knowledge in an unstructured way, when they're already outside the context of the problem. And it makes sense that these things go out of date and that they don't capture reality. And with the approach that Yasmin and I were describing, where we're able to make it not just a lot more structured, but a lot more tested, a lot more in touch with the ground truth of, as we were saying, like 20 past issues, and making sure it actually produces the right results for all of them,
making that connection to the actual results, you're able to produce something that is a lot better than this documentation at capturing your actual processes, and that is therefore actually effective.
Nitay (26:38.717)
Yeah, so you guys made some really interesting points there that I want to double-click on a little bit, about the workflow, how the humans are part of this, and where and how the workflow changes, right? Because if I take this back to, you know, you were saying ultimately it's a search problem. And in search, in my experience, it basically comes down to three things: there's the corpus, how good is the data you're working with; there's the ranking algorithm; and there's the queries coming from the people.
And the thing that got me thinking as we were talking about this is, well, in your case, in many cases, I imagine, with the workflow, the queries aren't necessarily coming from the people, right? The agents themselves are autonomous already, but then in other cases the people are jumping in and having some sort of insight or some sort of steering of the workflow. So tell us a bit more about how you then optimize that system, given the queries, right? Like if I'm using traditional Google
search, people learned over time, and arguably now they're unlearning, how to search in keyword-ese. But in particular, they've learned that, like, I might not get it on the first result; maybe I go and edit the query and so on. There's kind of a natural loop that forms there. And so what is the loop that you see forming around debugging, and how does it then feed back into, I need to go make the corpus better, or actually, I change the workflow or the query in this way?
Yasmin Dunsky (27:59.118)
So our experience is very similar to how you use ChatGPT, or whatever tooling you're using, when you're creating some sort of a draft. As a CEO, most of what I do is emails, and actually I'm going to use the email example.
Sometimes I come with a very, like, okay, this is exactly what I want in my email, just tweak this. Sometimes I come with, I have this situation where I need to email something about this and blah, and it's kind of more complex, can you give me some advice on what my email approach to the situation can be? And you can use the AI assistant in many ways and work on this together until you have something you feel happy with. And this is how
our experience of building the agents with the customers is right now. Some customers come very opinionated. They know exactly what they want the architecture of the agents to be, they can say it in one or two sentences, they have all the documentation, or a very good start there. And they can come with that, say it to our AI environment, basically, and it will generate the agents for them. And then they can test it and see it, and they are very much taking
the wheel there, the driving seat, I think is the way to say it. And in some other cases, it's much more of, hey, AI, I need your help in creating agents. Here are my last 10 alerts, these were the root causes, can you help me write a playbook for this? Can you help me create the agent for this, and then test the agent? So
the process can look different for different kinds of personas, and it's very much similar to how we're using text generation tools today.
Tom Tytunovich (29:55.34)
Yeah. And then, to the point of how that corpus improves over time, I guess you could divide it into three components here. Ultimately you have the system itself, the system running in production. You have the tooling and instrumentation around that, whether that's your observability tooling, alerting, you know, your logs, your metrics, what have you, and the processes around that. And then you have
what we sort of encapsulate, which is the process of how you use those tools to investigate a problem with that system. And you need to draw on an understanding of all of these to be able, at the end of the day, to correctly search and find what you need. And so when people interact with the system, it does feed back into each of these. The most obvious one is, of course, that we take feedback about what went well and what did not, and improve our own agents and optimize them and all of that. That's the obvious part.
But more recently, we're also starting to look into how to improve those other two parts: how to improve places where maybe your logging could be better, and that would make these searches easier, or maybe some documentation, maybe even the code itself. Like, if you have enough repeating root causes that turn out to be a certain type of problem, or some component in your system, maybe that's also something that you should do better with. And ultimately, I think for
a system like this to be a real AI SRE, the way the category is named (mostly a name at the moment), you really need to get to the point where you're improving that entire cycle, and not just the how-do-you-investigate-the-production-issue at the end.
Kostas (31:37.591)
I think there's another dimension to the search problem here that makes it quite interesting, because you're not just searching across vast amounts of data, you're also searching across very different modalities of information. And again, if I'm wrong, correct me, right? But it might not be, let's say, what people usually think when they
hear the word multimodal, where it's, I can search among pictures, and I can also search among text, blah blah blah, and video. But if we take it into our industry, it is a multimodal situation in a way. A log, at the end of the day, and a piece of code are different in the way that you would handle them, both from a systems perspective but also from
how you interpret them, right? The semantics, or what is explicit and what is implicit, are very, very different. And at the end of the day, when you are in operations, what humans do, and they are really good at it, is fuse these different things into one representation, right? How does this work with models? Because, yeah, okay, when I use code, when I write code, at the end of the day I'm just writing code. And
to be honest, it's even more specific than that, right? If I'm writing, let's say, something in TypeScript, my whole context will be around TypeScript. Now, if I'm writing TypeScript, and then I also have a database out there and there's an ORM in between, and then I have logs, and then I have metrics and traces and all these different things, how do you
deal with that? Because I feel like that's something that, from my experience at least, doesn't get enough exposure when we are talking about the type of systems you guys are building, and that in my opinion adds exponentially more complexity to what the models have to do there.
Kostas (33:56.768)
So how do you deal with that, right?
Tom Tytunovich (33:59.48)
Yeah, so there are a few things. I'll leave it to Roei to speak to some of this as well, in terms of how we actually use the models in different ways. But I will say, one of the first things that we built to address this kind of inherent complexity in different types of data, these different modalities: we created an abstraction layer that allows you to carve out certain parts of what needs to be done, so that you can use the right tools for the right jobs. For instance, you know,
one of the classic things that you need to do when you're looking at production issues is look at the dashboard and see if anything stands out. Look if you're seeing any spikes and drops, anything that seems correlated to the problem. So of course, we could have just gone with the sort of monolithic approach of, give the LLM all of that data, encode it in some way that it can consume, and just hope for the best. But we found that that doesn't actually produce the best results. Instead, for this example, for the dashboards, we created
a sort of primitive that is able to use vision models, is able to use some heuristics and traditional algorithms from statistics, is able to combine a few different things in a predictable way, in order to really use the best tools to detect what is ultimately interesting in these dashboards. And of course, if you're looking at logs, then you have other primitives that look for what kinds of errors seem to be standing out and what patterns you're finding within them. So at the end of the day, of course, you want
the agent to be able to take all of this into account, and you want it to use it intelligently, but giving it this layer of primitives, of building blocks that are really good at working with the specific data types, does go a long way towards improving the accuracy of the results.
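As a rough illustration of the primitives Tom describes, a sketch where each data type gets its own specialized building block, and the agent only ever sees their condensed findings. The statistics-based spike check below stands in for the mix of vision models and classic heuristics mentioned above; none of this is Wild Moose's actual code.

```python
import statistics
from collections import Counter

def dashboard_primitive(series: dict[str, list[float]]) -> list[str]:
    """Flag metrics whose latest value is a >3-sigma outlier vs. history."""
    findings = []
    for name, values in series.items():
        if len(values) < 4:
            continue
        history = values[:-1]
        mean, stdev = statistics.mean(history), statistics.stdev(history)
        if stdev and abs(values[-1] - mean) > 3 * stdev:
            findings.append(f"{name}: spike or drop at the latest point")
    return findings

def log_primitive(lines: list[str]) -> list[str]:
    """Surface the error lines that stand out by frequency."""
    errors = Counter(line for line in lines if "ERROR" in line)
    return [f"{count}x {line}" for line, count in errors.most_common(3)]

# The agent reasons over these short findings, not over raw dashboards and
# raw logs, which keeps its context small and its behavior predictable.
```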
Roei (35:42.345)
Right. So the APIs that observability frameworks expose are not necessarily built for humans to ingest. They might be built for code to ingest, and then for doing a lot of processing on the results. So the abstraction layer that Tom mentioned is more about the operations a human would do, and data chunks
of the kind of bite size a human could deal with. So that's one thing, but it doesn't exactly address the different types of data. And the different types of data is a deep question that I feel maybe the world hasn't completely gotten into, because we are now starting to see that there are models that are better at some things. Like right now,
as of this date, February 2nd, Claude is really good with code, whereas OpenAI's models are better at math. And so which one is smarter? What is smarter, code or math? We don't know. But we know that the type of data you feed a model and your training procedure really matter in terms of what tasks the model will be best at. And we still don't know which.
For our domain, for debugging at large and for production debugging specifically, we still don't know what the best training procedures are going to be. And it's going to be fascinating. We really can't wait to start training models and basically tackling the challenge of what sort of intelligence is the one most important for debugging. Because I can make the argument that, you know,
you need to be able to think very rigorously, so maybe math models would be the best at it. But also, obviously, you need to know a lot of code, so maybe code models would be the best at it. And those are just two types; there are many others. And it's very interesting.
Nitay (37:53.786)
Since you touched on it a little bit: how are you guys thinking about adapting to, and making the best use of, this ever-changing world, right? I imagine, even to your point, just in the last couple of years we've obviously seen a lot of changes. So how are you structuring your product? How do the workflows change? How do the underlying technologies?
Roei (38:13.545)
The best practices around all of that also keep changing. Sorry, Yasmin.
Yasmin Dunsky (38:13.582)
This is actually.
Yasmin Dunsky (38:17.634)
No, I want to say that this is actually one of the biggest things that we are starting to see as a benefit when we are working with companies, because everything is changing so quickly. We are constantly changing, in a very specific way: we compile our microagents so that the model is optimal per task. Like we said, sometimes, in the same workflow, you actually need a different model, both in terms of cost and
in terms of reliability and efficiency: which model is the right one for each task. So it's been really interesting working with customers when everything is changing so quickly. And one of the things that we're seeing recently is that this is something we can basically turn into an abstraction layer for our customers
by doing it behind the scenes. For us, yeah, we can address more how we're thinking about it on an everyday level. But I will say it's one of the biggest challenges in the space today. I think for everyone that is working with AI, things are just changing so quickly that for us as founders, and for companies building with AI, this is just a very interesting and challenging aspect of it.
Roei (39:35.533)
Yes. And so obviously we try, and everybody tries, to keep their stack model-agnostic and provider-agnostic, and that's one key thing that you really want to do. But we, more than others, also have the challenge of optimizing our code for speed and efficiency, and we felt that sometimes that means
selecting the one best possible model for a very specific flow of code that runs. And then that model becomes very hard to replace. Even when other models are smarter, they're not necessarily as fast. Because, as usual, optimization breaks abstraction; runtime optimization breaks abstraction. So this is a tension that we're dealing with as well, because we respond to,
how many now? Tens of thousands of alerts per month. It's just not going to be viable if every alert costs us tens of dollars, which is completely realistic. So what we did was highly optimize everything for speed. We use very cheap models, but we use a pretty elaborate stack that orchestrates them so that we get intelligent results. And
there is a tension between that and being agnostic to the model and to the provider, being generic and being adaptable to changes. So it's been difficult, for sure.
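One way to picture the tension Roei describes, pinning each microagent step to the cheapest model that passes its tests while keeping a provider-agnostic escape hatch, is a small routing table like this. The model IDs and task names are hypothetical.

```python
# Each task is pinned to the cheapest, fastest model that passes that task's
# regression tests, with a fallback on another provider. When a new model
# ships, you re-run the tests and update the table, not the code.
MODEL_TABLE = {
    "classify_alert": {"primary": "provider-a/tiny-fast", "fallback": "provider-b/tiny-fast"},
    "summarize_logs": {"primary": "provider-a/tiny-fast", "fallback": "provider-b/mini"},
    "write_report":   {"primary": "provider-b/mid",       "fallback": "provider-a/mid"},
}

def route(task: str, prompt: str, llm_call) -> str:
    """Send a prompt to the model pinned for this task; fall back on failure.

    `llm_call(model_id, prompt)` is whatever client wrapper is in use.
    """
    entry = MODEL_TABLE[task]
    try:
        return llm_call(entry["primary"], prompt)
    except Exception:
        return llm_call(entry["fallback"], prompt)
```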
Nitay (41:14.539)
And you raise an interesting point there, right? Because with a lot of AI companies, what you end up seeing is the famous triangle, right, of speed, quality, cost. And in most use cases, certainly from the customer perspective, what I've seen is that basically everybody, at least today, mostly just cares about quality; they just throw money at it, and the latency may not necessarily be that relevant. And you guys have this very interesting and unique use case where the latency matters a lot, but obviously the accuracy also matters. And so how are you
walking that triangle, and, to your point, when you go and specialize in something? I assume, if I gather what you're saying correctly, you're taking some small model, some 7B or whatever, rather than some, you know, 500B whatever, and maybe fine-tuning it or playing with it, and focusing on that one because you know it can reach the speed you want, the latency you want, and the accuracy is good enough. So how do you then iterate on that going forward, and how do you think about that triangle?
Roei (42:11.241)
Exactly. So that's why I keep saying the word microagents; it's just something I'm trying to make happen. But in either case, yes.
I don't know if I have a lot of meaty things to add on top of what you already said, which is that there is that triangle. We are at a point where we can't ignore it, we can't just throw money at it, and we don't want to either. Some people choose that approach of, basically, you plug in a bunch of skills, you plug in a few MCPs, and you tell the model that among the skills it has are the things you want to run when a production incident happens.
That is a reasonable approach if you want to spend a lot of money and a lot of time per incident. And we are not in a place where we want to do that.
Tom Tytunovich (43:07.352)
Yeah.
Yasmin Dunsky (43:08.302)
I think what's interesting here, and to my point before, is that everything is changing so quickly and we are all learning; there are not a lot of companies with AI SREs in production today. And like you say, there is a high sensitivity to accuracy in this space. And I think a lot of companies still don't even understand the costs it will take to just let AI randomly
go over their production environment every time there is an issue and investigate it. So it looks really good in a demo at a hackathon, and then when you try to actually operationalize it, you start monitoring the costs and the time and the accuracy, and you see that things are not as pretty anymore. And we are very deep into this space, working with dozens of companies and enriching hundreds of thousands of alerts per month.
We believe, and I think that's our approach in general, that the customer should have the choice of where they want to compromise and why. They should have control over what is extremely important, where you can't compromise, where the standard has to be really high on quality and therefore it will cost more. And in general, when we encounter these kinds of questions, we believe this should be the customer's choice.
But to be able to provide that, we need the ability to plug and play different models for different tasks at a very granular level. So it's not like we're trying to find the one perfect model that will be the one for every kind of debugging issue. It's actually at a very, very tactical level: when you take data and analyze it, and you do that many, many times per second when you operate at scale,
then you need to think about those things that maybe sound a little bit, I don't know, boring or something, but end up being the most important considerations when you actually think about what is sustainable and what actually makes sense for an enterprise company.
Kostas (45:22.498)
All right, I'd like to spend a little time on this microagents concept that you talked about, guys. For a while now, in the industry, there have been, I'd say, almost two extreme opinions. And by the way, both sides publish research trying to
prove their points, right? There are the people who say that you just need one event loop for your agent and you don't need multi-agent systems, that actually going into multi-agent systems degrades the performance. Then there's the other side that says no, you need multi-agent systems, and there are multiple different architectures that have emerged. I mean,
anyone who is using Claude Code can see that it spawns subagents to do things, blah blah, all that stuff. There was also a publication from Google recently that did some interesting investigation there. They tried to quantify when a multi-agent system makes sense and when it doesn't make sense, which I found quite interesting, because...
I'll rant a little bit about the industry from my side here. It's kind of funny that, you know, engineering by definition is all about trade-offs, right? Finding the right trade-offs for the problem that you are solving. But as an industry, as a guild of engineers, we also love to be polarized. Like, no, it's this or that, there's nothing
in between, which is kind of crazy. And I think there's a lot about human nature there. But anyway, tell us about your experience, right? What made you decide to embrace one specific architecture? What are the benefits that you see there? And what's your advice to people who design and try to build systems like that?
Kostas (47:45.923)
Because at the end of the day, there's a lot of noise out there. I think the best signal you can get is from people who have successfully deployed things that are working out there, and who are honest about what the trade-offs are.
Tom Tytunovich (48:02.154)
Yeah, I think as much as dogma is comforting, I do think engineering principles and thinking in trade-offs is the superior approach. So in that sense, we also use those different approaches for different kinds of problems. Just to give an example, when we're working on the step where we're building and optimizing an agent, there it can make sense, in a lot of cases, to do kind of a traditional multi-agent
architecture, where you isolate the context and do those abstractions and so forth. And it works, and it's expedient, and it can be helpful in some ways. Whereas in other cases, when you're talking about our microagents that actually run in real time when an issue happens, a more linear approach, or in some ways an approach even more constrained than linear, can make more sense, because again, it needs to be optimized and everything.
And so it's not so much that we fall into one of these camps, but rather that we try to take the best from each of them. I don't know if we can give general advice for anyone out there building these kinds of systems, but in terms of our approach to this, I think that's how we see it.
Yasmin Dunsky (49:12.854)
I will add to this that, from my perspective, working with Roei and Tom has been very interesting from day one, because we are obviously a team with two very strong technical personas who are, I don't know if you noticed, very opinionated and don't always think the same. And I think that was very beneficial for the company from day one. More specifically, in our case, Roei comes with a very academic AI background,
a big believer, I would say, in what AI can do. And Tom is a little bit more on the skeptical, classic engineering approach of: things need to work. And I think from day one we had these two perspectives, of how can we make our customers happy tomorrow, while also holding the fact that AI is changing the world. And those two perspectives really helped us
get to an approach that I think is very much a combination of the two. And like Tom says, for every specific problem we found the solution, but in most cases that's after deep conversations, bringing both of these approaches to the table.
Nitay (50:31.732)
Well, that makes a lot of sense. And it's interesting to go through those trade-offs, as you're saying, and kind of constantly be arguing, in a good way, trying out different approaches and maybe testing them, almost. I want to slightly change topic, perhaps to add in a little spice and fun, if you will.
I'd love to hear from each of you: one, what's your favorite outage story? It can be your own outage, it can be a customer's outage, you don't have to name names or anything. But being in AI SRE, I imagine you guys see a lot of fun stuff. And two, actually going all the way back to the beginning, you said something interesting: you guys were living together for three months. What's the most interesting thing you learned about each other in those three months?
Yasmin Dunsky (51:20.564)
Amazing. We have a video proving that I am a more decent boxer than Roei is. I will say it's a video game, though. It's not real boxing.
Roei (51:20.795)
Yasmin is really good at boxing.
Roei (51:37.715)
At some point, Tom brought a VR set and we were boxing against Ugly Joe, was that his name?
This one was much better than me.
Nitay (51:50.14)
Amazing.
Yasmin Dunsky (51:53.866)
I'm sorry to tell the world that it wasn't a high bar, though. Yeah, no, that's a great question. I'm sure Roei and Tom have much spicier outage stories. For mine, maybe one of the recent ones: it was during Roei's birthday. We were celebrating in Brooklyn, and it was very late, maybe 4 a.m., and we were still celebrating, and there was an outage. That was
a fun little experience to handle at that time of the night. But yeah, Roei and Tom probably have better stories.
Roei (52:32.129)
My favorite outage story is when I got called in on a Saturday and something was not working. Essentially, I saw in the logs that the system was outputting logs for a while, then it stopped outputting logs for a while, and then it reset and started outputting logs again. Because when it crashes,
the watchdog brings it back up, and then everything works fine. And pretty quickly I realized that there was a memory leak. So when the system comes up, it starts leaking memory. When the memory is too clogged, it stops working altogether for like 20 minutes, until it resets, because it crashes and the watchdog brings it up. So...
The reason this is my favorite outage is the way I ended up solving it, which was not a solution at all. Basically, I explained what happened to the incident commander, and the incident commander said, okay, so if it crashes faster, this will solve the incident, right? And so what I did was write a very small program called watchcat that makes the system
crash every 10 minutes, and then the watchdog brings it up. And then it worked pretty smoothly, and there was very little loss of processing data. So that's my story.
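For fun, a guess at what a watchcat-style program might boil down to: kill the leaky process on a schedule, well before the leak clogs it, and let the existing watchdog do the restart. The service name here is obviously made up.

```python
import subprocess
import time

LEAKY_SERVICE = "leaky-service"   # hypothetical name of the leaking process
CRASH_INTERVAL = 10 * 60          # crash it every 10 minutes, per the story

while True:
    time.sleep(CRASH_INTERVAL)
    # Kill the process hard; the watchdog notices and brings it back up
    # before the memory leak can clog it for 20 minutes.
    subprocess.run(["pkill", "-9", "-f", LEAKY_SERVICE], check=False)
```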
Nitay (54:10.93)
You know, before we move on, I have to tell you, the thing that makes me particularly laugh about this is that we wrote almost exactly the same thing at my company. We didn't call it watchcat. We called it, what was it, something like Thor's hammer or mighty hammer. And it was because the watchdog service didn't restart things fast enough, and we wanted something that really killed things, kill -9, like really, really quickly. It was basically the hackiest duct-tape thing you could imagine, but it did the job for long enough.
Roei (54:34.441)
Exactly, exactly.
Nitay (54:40.754)
You just reminded me of that.
Tom Tytunovich (54:46.286)
Yeah, I don't know what my favorite outage story is. I just remember the most traumatic one, I guess, which was, I think, all the way back in the army. So a lot of the details I probably shouldn't mention, but I will say that, basically, we were, let's say, releasing a version of something, and it just absolutely would not work. Production went down. I mean, nothing was functioning
as it should. And I thought, well, okay, but the changes here are ultimately pretty small. There was some hardware involved, there were some details, but it seemed like it should be okay. And I think I was telling my commander at the time, we're going to have this worked out in five minutes, this shouldn't be a problem. And so I started, I mean, we rolled back, we started debugging to understand, okay, what's the difference, what's the difference? And we kept trying for the next three months,
and, at the end of three months, we'd tried everything to understand why this doesn't work. And it was really important for this to work, because there was this new hardware that we had to be able to work with, and it was important for national security and whatnot. At the end of all that, it turned out that there was something in one standard, one protocol that we were working with. There was one bit that in the standard is marked as a spare bit. But if you send
a one there, that is not an okay spare bit. The spare bit has to be a zero, or else the entire thing just collapses irrecoverably. Yeah. So, I mean, in the years since, I've taken down production systems a lot. Not at this company, of course; here we're 100% up all the time, we've never had any issues. But I think that one still sticks out the most, just the sheer absurdity and the level of attention to detail that you had to get into
to get to that, like literally a single bit.
Nitay (56:47.92)
That's amazing. I love that story. Some of the best and most famous outages are all these heisenbugs and things that are very, very detailed, very nitpicky. And the other fun part that I think you're both alluding to, which always amazes me: one of my own personal stories was very, very early on in a startup, and I basically had to come in, to your point, and work the whole weekend on something. And I like that your story started with, my favorite outage was when I had to come in on a Saturday,
because afterwards, that Monday, my boss was so apologetic. He's like, I'm so sorry you had to come in on the weekend, if you want to take some time off. And I was like, that was some of the best time, it was so fun. And I think a lot of people don't realize that even things like SRE and on-call and so on can actually be a lot of fun; getting into it and finding the issue and tracking things down is incredibly rewarding and satisfying. Okay, so with that, we're probably just about running out of time here. So maybe we close it out by,
Tell us kind of maybe in a couple of words, what's your vision for the future and what's the biggest thing you would like to see happen from the larger ecosystem, from the market overall.
Yasmin Dunsky (57:59.212)
Yeah, I mean, to Tom's point, I think right now, with AI SRE, people are still looking at this category as something that is very reactive: there is an issue coming in, and how do you automate the investigation? We believe the big power of AI tooling today is taking a step back and looking at the system at a much higher level, creating visibility that wasn't available before into how things operate, and,
you know, taking the AI SRE category from a junior engineer perspective to a senior engineer that actually looks holistically at the system, understands where the gaps are, in the observability coverage, in components in your systems that are constantly creating issues, and how you can leverage AI to actually create a self-healing reliability environment. We believe that with the data that you have from
enriching and responding to every issue, and the observability data that is already there, the combination can allow you to take a step back and actually think much more holistically about the system, and create something that heals itself. So that's 100% the vision. And yeah, we're already seeing steps and signs that it's getting there, which is pretty exciting. We've been here for three years, so I think we have definitely seen different phases. At the beginning,
we were very early, very early to this space, where it was still unclear at all that AI would be able to solve this, to a stage where everyone thinks it's super trivial that it will, to then again being disillusioned, and seeing the cycle in different waves of this category. It's honestly really interesting for us to go through it and learn so much from it.
Tom Tytunovich (59:53.528)
Yeah, tying into that, I hope that with this sort of approach, and with this maturity of the space, we'll be able to get to the point, alluding to what I mentioned at the start, where dealing with production issues doesn't have to continue being seen as this last bit of software engineering that is, you know, an art. Hopefully with these kinds of tools and these kinds of approaches, it can be properly systematized, it can be properly
structured, and it can become a real discipline, where every organization understands it needs to be taken on with the same level of effort, intentionality, and tooling as writing the code, testing it, deploying it, and everything else. Which should also save engineers a lot of heartache, and a lot of pulling their hair out at 3 a.m., randomly trying to dig around and figure out why things are broken.
Nitay (01:00:48.121)
Fantastic. That's a great way to sum it up, and making SRE go from an art to a science, I think, will be an absolutely beautiful thing. And I think we can all certainly see that with this huge wave of code everybody's generating, SRE only becomes 10x more important. And so I think the need for this kind of great tooling and help is going to be very well received by the market. Cool. So with that,
Yasmin, Roei, Tom, thank you very much. It's been an absolutely insightful and great conversation. Really enjoyed it. I wish you all the best, and thank you for joining us.