In this episode, we sit down with the full founding team of Wild Moose — CEO Yasmin Dunsky, CTO Roei, and VP R&D Tom Tytunovich — to explore how they’re transforming production debugging from an art into a science using AI.
The trio shares their unconventional founding story — from meeting across three different cities to living together for three months in a California Airbnb to stress-test both their idea and their working relationship. They discuss how they identified production debugging as a massive unsolved problem before ChatGPT even launched, recognizing that while code generation is fundamentally a text-generation problem, debugging is a search problem that demands a completely different approach.
We dive deep into Wild Moose’s “microagents” architecture — fast, highly optimized AI agents that replicate the muscle memory of senior engineers to automatically investigate production incidents in under a minute. The team explains why accuracy trumps everything in their space (wrong answers are worse than no answers when you’re debugging at 3 AM), how they navigate the speed-cost-quality triangle, and why they built a test-driven approach to validate agents against past incidents.
We also get into the multi-agent vs. single-agent debate, handling multimodal observability data (logs, metrics, traces, dashboards, code), and how the rapidly evolving LLM landscape creates both opportunities and challenges for production AI systems. Plus, the team shares their favorite outage war stories — including a “WatchCat” hack and a three-month hunt for a single rogue bit.
Topics covered:
- The Wild Moose origin story and the California Airbnb experiment
- Why production debugging is a search problem, not a text generation problem
- Microagents: fast, specialized AI agents for incident investigation
- Building institutional knowledge into AI — capturing engineering muscle memory
- The speed-cost-quality triangle in real-time AI systems
- Multi-agent vs. single-agent architectures: when to use what
- Handling multimodal observability data with LLMs
- The future of AI SRE and self-healing production environments
- Favorite outage war stories from the trenches
Chapters:
00:00 Introduction to the Wild Moose Team
04:12 The Spark Behind Wild Moose
08:41 Understanding the Debugging Landscape
12:45 The Role of AI in Debugging
17:31 Building Investigative Agents
21:55 Optimizing Workflows and Feedback Loops
29:12 Navigating Complexity in Software Systems
33:42 Adapting to Rapid Changes in AI Technology
40:02 Microagents: The Future of AI Architecture
44:46 Outage Stories: Lessons from the Trenches
50:49 Vision for the Future of AI in Production