· 58:45
Nitay Joffe (00:03.804)
Ciro and Jacopo of Bauplan. It's great to have you guys on with us today. Thanks for joining. Why don't we start with giving us some of your background and what led you to creating Bauplan. Jacopo, you want to start?
Jacopo (00:17.646)
Yes. So thanks so much, guys, for having me. 10 out of 10 for the Jacopo pronunciation, which is very good. I'm not going to ask you my last name, though, because we didn't rehearse that — that would be very, very hard. So I'm the CTO and founder of Bauplan. I was previously the CTO and founder of Tooso — we're going to talk about that — with the same CEO, Ciro, here, and Mattia.
So the same company, you know, doing the same thing — doing the adventure twice. And as they say: fool me once, shame on you, but fool me twice, shame on me. So I guess it's all my fault. I have dabbled up and down the stack for the last 10 years of my career — I mean the entire stack, from the Linux kernel to agents, and literally everything in between.
I'm an open source and open science contributor — thousands of stars on open source, millions of downloads, 50 papers in top conferences. And I teach machine learning at NYU with my friend Ethan; it's a course a year, and it's mostly notable because it's the only job that my parents actually understand. So that's actually good. That's it. Super excited to be here and talk about data and AI and all those buzzwords with all of you.
Ciro Greco (01:36.13)
All right, so yeah, a lot of what he just said, because we've been together now for a while — we built Tooso. It was an information retrieval company applying NLP in those days when NLP was still hard. And then we did a lot of recommendation systems, a lot of machine learning. I also ended up being a little bit of a part-time professor at Columbia, teaching AI engineering. There's a lot of work that we did in building, essentially, data stacks to support those use cases. And that's pretty much how we moved from our first experience — which was mostly around AI and ML in the application layer, where what we were building was something that people would use at the end, and my user, my client, might not be a technical person at all — into going deeper into the stack and into data preparation, as that became increasingly hard the more complicated the use cases got. And also, I think, the models became much better, but the data part never really became easy. And that's how we ended up working in data infrastructure.
Nitay Joffe (02:56.688)
So tell us a bit — maybe going back to your previous company together. As you said, clearly you guys have worked together for a while now, and you were working on NLP back when NLP was hard. What do you mean by that?
Ciro Greco (03:07.618)
When you couldn't just, like, call up an API, mostly. That's what it is.
Jacopo (03:12.718)
Way before that. It was before OpenAI, before you could call OpenAI, before you could import Hugging Face. And it was actually even before you could use Keras to do it. It was conditional random fields — those were the days. There was no MLOps; none of this was a thing. You had to somehow figure out all of this by yourself. And your models were actually incredibly hard to build, and they were not working very well. It's the opposite now: they're super easy to build and they kind of work out of the box.
Ciro Greco (03:20.95)
Hahaha
Jacopo (03:42.542)
Those were fun days.
Ciro Greco (03:42.922)
Yeah, exactly. A lot of our work was — I think it was very interesting work, in the sense of what it took to have a model. So our job basically was: hey, you have a search engine, and a search engine is somewhat dumb, and your search engine works on your website. Our clients were, to give you an idea, large retailers, for instance — so it's kind of on-site search. And, you know, the search engine is not particularly smart, so we're going to bring NLP — natural language processing and understanding capabilities — to the search engine. So a lot of our work was actually engineering an end-to-end software application that could gather the data we needed, put the data in the right shape with the right modeling, then finally train a model and do all the hyperparameter optimization, the evaluation, and so on, and then serve the model and engineer a feedback loop, all in all. So the model was somewhat important, but at that moment in history the model just didn't work. It wasn't like now. So 90% of your job was actually figuring out how to data-engineer your way to a model that works.
Nitay Joffe (05:03.068)
You said some interesting things there, because you alluded to the data problem never having been fixed while the AI has gotten substantially better. So is it that the data problem was never fixed, but the ML side and the NLP side were so much harder that it almost didn't matter, and it's just the same problem now? Or is it that the balance somehow shifted with all the latest technologies? What did you guys find?
Ciro Greco (05:11.017)
yeah.
Ciro Greco (05:29.082)
I found that the balance shifted pretty hardcore. A good example to explain that is what I see pretty consistently now when I talk to prospective clients: the cycle. Before, it was: we're going to build a data platform, or do something around data; then we're going to do analytics; and then maybe — and I say maybe — we're going to do natural language processing and ML, in the end. The application is the last thing that you do. And today, just because you have these inferential capabilities that are insane and just an API away, it's completely flipped. People start with the product, and then, when the product actually works and has users and is out there, now they have to figure out the data. But they start with the model; they don't start with the data. So the first person who is going to do an AI thing is actually a software engineer — it's not going to be your data platform guy. Which is completely different from what it used to be: the first person you'd hire was somebody who understands Spark, and then way, way, way down the line you'd have somebody that can do notebooks, probably. And now it's completely flipped.
Nitay Joffe (06:42.684)
So it makes sense then, because you mentioned that at your first company you guys were targeting the business users and trying to make things easier, and now it's flipped, given what you just said — it makes sense that you target that new user. I'm curious: what were the learnings from the previous company, in terms of even just founding a company and what it takes to find product-market fit, all those kinds of things that you took into this next venture?
Ciro Greco (07:10.21)
You want to start, Jacopo? Like, I have a lot of learnings here.
Jacopo (07:14.11)
I mean, I know you went through a few companies yourself, so I'm sure you also have some stories — I think we have some common friends, a common Italian friend in San Francisco from your previous adventure. There's a few things. One thing that keeps constant, and I think is worth saying, cheesy as it is, is that at the end of the day companies, especially small companies, are made by people. And I think we're very proud of the people that we brought along from our previous company — not just the founders, but also the first engineer, who's been with us for a long time. We're very proud of the people building both companies. I think a lot of whatever success we got, and whatever we're having, is definitely because we like to work with each other, and we sort of prize this kind of camaraderie — this team of ninjas against incumbents that are 10,000 times bigger than you are, literally, in some sense. So this has always been fun. Then, bad things that we shouldn't do anymore — and hopefully we're not going to. The first company had many good things but, for real, also some basic mistakes. One was the market. It was a very tough market: the search market at the time — information retrieval — was slightly less hot than now, and so it was hard to make a lot.
Ciro Greco (08:35.968)
It was small.
Jacopo (08:39.894)
It was hard to grow a lot without much investment. At that time we were just new in Silicon Valley, so it was much harder for us to raise the type of round that we can raise now — very different awareness, access, and all of that. So the input-output balance was not super easy. And related to that: we prioritized early revenue a lot in our first company, because we thought it was the easiest way for us to get funding and start the flywheel. But when you have a complex product, sometimes it's good to have an initial phase in which you — I wouldn't say chase revenue, but — you kind of build the platform that you think you should be building, and then you start adapting afterwards. I think in the first company we adapted a bit too early, which ended up making things harder as you try to scale. And so now we're trying to balance this a bit differently — but, you know, God knows if we're able to do it, or if we're just shifting the mistakes one year down the line.
Nitay Joffe (09:34.652)
And you said that search was not a great business back then. What makes you say that? Because I would imagine — I don't recall what exact years we're talking about here, but I imagine there was, obviously, Google on the consumer side, but on the enterprise side you had Algolia, Elastic, and folks like that. What did you guys find in terms of the search market, and how did it change now with AI?
Ciro Greco (09:40.77)
Hmm.
Ciro Greco (09:53.386)
Yeah, the year is 2018. I mean, technologically speaking it feels like another era, but it's not that long ago. I think the problem we found there was that, nominally, the market was insanely large, right? Because you do have things like Amazon, and you do have Walmart — very large players that process insane volumes, and there's a remarkable amount of revenue attached to those search engines. But then, yeah —
Jacopo (10:28.952)
Sorry, Ciro — I think, because nobody knows this: Tooso was doing NLP and search just for e-commerce. So it was very vertical. You mentioned Elastic, which is a very horizontal infrastructure company that actually makes most of its money from logs. Our competitors would be much more narrow solutions, like Bloomreach — and Algolia wasn't even that much into e-commerce at that time.
Ciro Greco (10:38.08)
Yeah, I think I mentioned that before.
Ciro Greco (10:49.836)
Yeah, but at a certain point pretty much everybody is drawn to that vertical, because it's one of the places where you actually make money out of the search engine. So pretty much everybody, sooner or later, ends up trying to sell to retailers. And the truth of the matter is that the very large guys will build it themselves, and the very small guys have way more basic problems to figure out: they're trying to attract traffic to their website, they're not really trying to optimize the conversions on the website just yet. And so you're kind of stuck with this mid-sized market that varies a lot — some of the players are technologically advanced, some of them are not, because the DNA of the company is just an old-school retailer, and so on. So the market, again, nominally is very large, but practically it's not insanely big, and your company gets pushed to become more horizontal. You can't really focus only on this one thing; you tend to become something that does more and more things for your personas. You're going to do the merchandising tools, and then you're going to do advertisement, and then you're going to do recommenders. So you become an e-commerce platform. Elastic is different, because Elastic really is infrastructure — but there is no such thing as an Elastic search engine that works out of the box. It's a very different animal. Our product was just: hey, this is an API, you call the API, and we guarantee you never have to think about search ever again.
Kostas (12:21.723)
Yeah, I think you're bringing up a really good point here, because even if we focus on Elastic specifically — and of course they started with search, with Lucene and all these things — at the end of the day, at some point they had to go into the observability space. So they also had, in a way, to pivot, right? Because...
Ciro Greco (12:33.024)
Right.
Kostas (12:48.431)
apparently the market that you're describing is not big enough to justify a public company.
Ciro Greco (12:55.938)
By the way — not a super popular opinion, and it doesn't matter how many VCs have called me to do due diligence — I don't think that RAG fundamentally changed that. The market is still kind of small. I don't have very strong reasons to believe that it really changed the economics of the market. It certainly made things much easier to build, and the final experience for the user very different.
Kostas (13:10.011)
Yeah.
Kostas (13:15.783)
Yeah.
Ciro Greco (13:24.82)
And one thing that I believe is truly different: back in the day, a search problem that was fundamentally impossible to solve was enterprise search. Because in enterprise search you have a variety of stuff you have to search on, it's super fragmented, and you have very, very little behavioral data from the users that you can use to do ranking and relevance. And e-commerce is flipped — it's the other way around. You don't have a million documents to search on, unless you're something like Walmart,
Kostas (13:34.449)
Mm-hmm.
Ciro Greco (13:54.412)
but you're going to have a lot of behavioral data that you can use to orient your efforts. So enterprise search was historically just bad. You had a million different players trying to do it, and it was always bad. And I think that actually changed, because now the experience of the final user is much more pleasant: I can actually converse with it — the NLP part is offloaded to the LLM. And so the final experience for the users is very different. I think companies like Glean show that that market — which is not super small, actually; the enterprise search market, I think, is reasonably large — became available all of a sudden because of the LLMs.
Kostas (14:37.337)
Yeah. Do you think — I mean, okay, these were kind of existing markets; probably some were served well enough, and some were underserved, like enterprise search. By the way, I think I remember, Nitay, Google had an appliance at some point for enterprise search, right?
Nitay Joffe (14:55.686)
They did — the Google Search Appliance. They did. Like a physical box you literally would take to your data center and say: here's my search box. Yeah. And there were a few other companies; they weren't the only one that tried it. I obviously came from an NLP search company, so I saw some of this myself as well. I'm trying to remember the name of the company, but there were a few other companies that tried to build a true search appliance too, because the idea was that it was so complicated to even set these things up.
Jacopo (14:55.8)
Google search appliance. Yeah.
Kostas (14:58.587)
Yeah, what happened with that?
Kostas (15:21.789)
Yeah.
Nitay Joffe (15:25.314)
Even with Elastic, as you just mentioned, Ciro — you had to do so much on top of it — whereas this was supposed to be a drop-in: just connect the power and the ethernet and whatever, and here you go, boom, we'll scan everything and it'll just magically work. As far as I know, it never really took off.
Kostas (15:41.757)
Yeah.
Ciro Greco (15:42.592)
No, but as I said, the intuition behind our company I think was ultimately a profound one. We looked at stuff like Algolia at the time, and we were just like: I think there's a missed opportunity here in collecting the data. I don't quite know why they weren't collecting the data. It might just be that it was too early to understand that search could be treated as an AI or ML problem, or maybe it was just the DNA of the company. But our company was looking at that like: yo, we could probably just plug a cookie into this website, gather an insane amount of behavioral data, and solve the search problem as an AI problem — rather than "this is going to be a better Elastic." That, I think, was the fundamental intuition of the company, which I stand by.
Kostas (16:39.739)
Yeah. So you said something interesting that I would like to ask you about. When we're talking about search, the first thing that comes to mind, especially for engineers, is an information retrieval problem, right? You have a query, you have a bunch of data out there — obviously it's not structured data, and that's what complicates things — so you need to somehow structure this data in a way that you can match stuff. But you said something about the behavioral data. You mentioned a couple of times that the lack of behavioral data is what made enterprise search broken, and that what makes, let's say, e-commerce search easier is that you have a ton of behavioral data there. Can you help us make the connection? Why, in the end, do we need behavioral data to solve information retrieval problems?
Jacopo (17:41.516)
I mean, there are a few technical reasons. One: most people on a shoe website are going to go to the search bar, and you know what they're going to search? What is the most common search query on a shoe website? Shoes. Which doesn't really narrow it down. So there's an entire class of queries that are very, very broad, and any good result will come from the second phase of information retrieval, which is called re-ranking. So, for people who don't know search — simplifying a bit — there are two main aspects. One is query understanding: understanding what the user says, as a linguistic problem or a vector problem, whatever. And the second is ranking: given that I have a bunch of products that all match what the user wants, which are the ones that Nitay or Kostas or Ciro is most likely to buy, because of their previous preferences? For the second part it's obvious why behavioral data is important. Nitay likes Nike, Ciro likes Adidas, Kostas likes, I don't know, Armani —

more fancy than all of us. And so, by knowing this behavioral data, we can kind of tune that. There's a huge part of search that basically goes into the re-ranking science, and the quicker and more responsive you are, the more incredible things you can do. By the end of our career in information retrieval, we were able to personalize at the very moment of people clicking on a website their entire — what we called the GPS of products. We had vector spaces of users, and we updated their position in real time based on what they were clicking. Meaning: if you browse golf stuff and then you search for gloves, you're going to get golf gloves. If Nitay is browsing boxing stuff and then he searches for gloves — the same exact query, the same exact understanding — it will actually move toward boxing stuff. And the only way you can achieve that is by having this virtuous loop of injecting behavioral data, changing the ranking, feeding the user some results, and then getting feedback. And you can imagine, in enterprise search, where you search once a week "how does my 401k work?", the amount of feedback and the amount of precision you can get is completely different than on an e-commerce website where people browse all the freaking time.
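The real-time loop Jacopo describes — update a user vector on every click, then re-rank by similarity — can be sketched in a few lines. This is a toy illustration, not Tooso's actual system: the two-dimensional embeddings, the moving-average update rule, and the dot-product scoring are all assumptions made up for the example.

```python
import numpy as np

# Toy catalog: each product has an embedding along two latent axes
# (golf-ness, boxing-ness). In a real system these would be learned.
CATALOG = {
    "golf club":     np.array([1.0, 0.0]),
    "golf gloves":   np.array([0.9, 0.1]),
    "boxing gloves": np.array([0.1, 0.9]),
    "punching bag":  np.array([0.0, 1.0]),
}

def update_user_vector(user_vec, clicked_item, alpha=0.5):
    """Move the user's position toward the clicked item's embedding
    (a simple exponential moving average over clicks)."""
    return (1 - alpha) * user_vec + alpha * CATALOG[clicked_item]

def rerank(candidates, user_vec):
    """Order candidate products by similarity (dot product) to the user vector."""
    return sorted(candidates, key=lambda p: -(CATALOG[p] @ user_vec))

# Two users issue the same ambiguous query, which retrieval resolved
# to the same candidate set: "gloves".
candidates = ["golf gloves", "boxing gloves"]

golfer = np.zeros(2)
for item in ["golf club", "golf club"]:        # browsing golf stuff
    golfer = update_user_vector(golfer, item)

boxer = np.zeros(2)
for item in ["punching bag", "punching bag"]:  # browsing boxing stuff
    boxer = update_user_vector(boxer, item)

print(rerank(candidates, golfer))  # golf gloves ranked first
print(rerank(candidates, boxer))   # boxing gloves ranked first
```

With this loop, the same exact query resolves differently per user purely because their clicks moved their vectors — the feedback loop that a once-a-week enterprise search query can't sustain.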
Kostas (20:00.263)
Yeah.
Ciro Greco (20:00.514)
It's a data problem. Ultimately it's a data problem, which is how we ended up doing data infrastructure, right? Because to have a real, effective application out there, you ultimately have to solve a lot of data problems even before you start asking the more information-retrieval-specific questions. And one thing that remains true is that your data management will be the most important part.
I think one thing that was also very important for us is that NLP, or ML, is just one component of your end-to-end search and recommendation application, right? For us it was particularly important, because that's essentially what we were going around saying: this is what makes us special. However, I do remember those years as years where most people I knew, with the same kind of education or professional interests as myself, were doing data science, which was a very different animal at the time. It was basically a lot of complicated analysis in a notebook. Often there wasn't a real notion of what production means. You know what I'm saying? It's not really part of a software application.
There was this data team, and the data team does data stuff, and the data stuff is done on data tools, and it's kind of a separate thing compared to everything else. We never really had that, because our journey through ML and data was always: this is a search engine, and there's going to be somebody who buys or not — this needs to work. If the search engine is down, I am going to get sued by my client, like, hardcore, because it means that basically the entire website is down. They literally lose money. And I think that is the biggest shift that got us really excited in starting Bauplan. You go: okay, things changed radically because of AI. More and more of these pipelines are now part of applications, which also means there's a different population of developers that needs to be able to touch this stuff. And a lot of times it's just software.
Ciro Greco (22:22.274)
Teams are just a mix and match of data scientists, data engineers, and software people. And so you kind of have to rethink the whole tooling stack. We need something here that is a common language, common abstractions. Because my problem back in the day was that I had a bunch of tools that only the data people understood, and everybody else didn't. And a lot of my enterprise processes internally were about translating between these two languages, which made my life as the VP of AI hard, because I essentially had a long cycle to bring things to production. Because the fact that the data people did something didn't mean by any means that it could go to production as it was — it was done in a notebook, it was done on different tools.
Kostas (23:05.981)
Yeah, yeah, 100%. You're touching on something that I've been... I'll give the example I use to communicate this to people who are not coming from the data world. It's the equivalent of, let's say, building a SaaS application or a web application, right? Having your front-end developer go and build whatever they build, and then you hand it to the back-end developer, and the back-end developer has to rewrite this thing in order to put it in production.
Ciro Greco (23:36.514)
100 %
Kostas (23:36.677)
This is what happens in data, right? You have the data scientists, you have the ML folks, you have the analysts — whoever is the downstream person that works on the data and creates an app at the end of the day. You do have to go back to whatever team you have for data and give them the assets, like a notebook, and they will hate it. Because, you know, words, in my opinion, matter a lot: there's a reason we have data engineers and data scientists — science and engineering, two completely different disciplines, right? And this is reflected in the tooling. A notebook is an environment that's great for experimentation — yeah, exactly — but putting things into production is not a science experiment. It's all about...
Ciro Greco (24:25.462)
The focus is different, exactly, because the focus is different, right? Yeah.
Kostas (24:35.759)
reliability is all about predictability, like things that whoever is like, experimenting doesn't have and shouldn't have like to care about, right? So it's a pretty hard problem, actually, in my opinion, to solve and like bridge these two worlds, but they have to be bridged, right? Because otherwise, yeah.
Ciro Greco (24:51.052)
Yeah, I agree. Also because — I think one important piece is that people talk a lot about experiments, right? So the experiment gets a little bit of a bad rap; it seems like you're just playing around, but that's not the case. Business-wise, the most important part is actually the experiment, because the experiment is the part that is about the business logic — about finding a model that solves a business problem. The data scientist is usually closer to the business problem than the engineer. But, to your point, then stuff needs to run in production. We're not playing, right? And so there's a disconnect. My personal preference here would be that the person who is closer to the business problem has simpler ways to go to production autonomously, rather than the other way around. Because ultimately the production aspect of it, the infrastructure, all of that is very important — but it's always a means to an end. While the model is actually the thing: the recommender system needs to work because the recommender system sells our products for us. That is the part that is important, the part that's going to impact our top line. So my personal preference is to enable
Kostas (26:19.067)
Yeah.
Ciro Greco (26:21.858)
the person who writes the business logic to be faster, to move more securely, and so on. And — I know we share the same feelings about Spark — there's also the fact that, just for historical reasons, data infrastructure is insanely hard. And that's because, in the very beginning, it was developed by companies that had very complicated problems. Then they open-sourced the stuff, and everybody else was using the same tools — the same tools that, you know, Google uses. And it's like: yeah man, you're not Google. So now you have the Google infrastructure for something that is actually much, much, much easier, if you wanted it to be. And now you have an entire team that needs to learn how to use all this, and so on. There was also this historical distortion towards big data, I think. At a certain moment it made
Kostas (27:06.865)
Yeah.
Ciro Greco (27:17.558)
this whole thing super hard and super esoteric.
Nitay Joffe (27:21.807)
And.
Kostas (27:22.077)
Yeah, 100%. No, I totally agree with that. I think it's a very interesting problem to solve. I'd look at it in the sense that, okay, of course whoever figures it out is going to make a lot of money, but the money is also a result of the value delivered. The way that data teams are designed right now is kind of a huge single point of failure. You have these centralized data platform and data engineering teams; they tend to be very small, and everything has to go through them, right? They have to deal with operations — which is important, because that's what they should be doing; they have to make sure these things run and keep running — but at the same time they also have to bring things into production. And as a result of that, yeah, you can see some crazy monstrosities. And the problem with data, by the way — something people don't always appreciate — is that there's a big difference between building infrastructure and tooling for data versus for software. Data obviously has bugs, let's say, problems that you can detect with your compilers or your interpreters or your query engines or whatever. But that's just the tip of the iceberg. The biggest problem with data is that bugs are usually silent: you might have a problem in your logic. And that's why what you said here is, I think, really important — because to debug these kinds of problems you need the downstream person, the person who's closer to the actual value and the context of the problem.
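A minimal sketch of the kind of silent failure Kostas is describing — the data and the join are hypothetical, made up for illustration, and note that nothing ever raises an error even though the number is wrong:

```python
# A pipeline step that "succeeds" (no exception, types check) yet silently
# produces a wrong number: the orders/refunds join below duplicates rows
# when a key repeats, and drops orders that have no refund.
orders  = [("o1", 100), ("o2", 50)]
refunds = [("o1", 10), ("o1", 10)]   # duplicated refund record upstream

# Naive inner join on order id.
joined = [(oid, amt, r_amt)
          for (oid, amt) in orders
          for (r_oid, r_amt) in refunds
          if oid == r_oid]

net = sum(amt - r_amt for (_, amt, r_amt) in joined)
# Net revenue should be (100 - 10) + 50 = 140, but the duplicated key
# makes o1 count twice and the inner join drops o2 entirely,
# so we get 180 -- and no tool ever complains.
print(net)  # 180
```

A compiler or query engine is perfectly happy with this; only someone with the business context — who knows net revenue should be 140 — can tell that 180 is wrong.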
Ciro Greco (29:07.584)
Yeah — because that's the person who knows why we need this piece of data and not that other piece of data, why we're doing this pipeline and not that one. The person who understands why we're doing what we're doing, rather than how we're going to run it reliably. Yeah, I agree with that.
Kostas (29:23.227)
Yeah. Yeah. Okay. So, let's talk a little bit more about what you're building. I find it really fascinating that you are people who come, as you said, from the data science or ML world, and you decided to go into infrastructure — data infrastructure. And I'd love, first of all, to hear from you guys how that experience has been, and what is different, right? Going from building recommenders or search engines to building infrastructure that is much more horizontal — about processing data, doing that at scale, and all these things. So tell us a little bit about that experience.
Jacopo (30:17.722)
I think there are good things and bad things. The good thing is that you are the first users. You know, the first company was selling a technical product — it's very hard to build NLP, as we went through, in those days — but the people who bought it didn't know much about NLP or anything; they were people running an e-commerce, often coming from a very traditional business background. Now the people using our product are people like us. Maybe like us five years ago, in many cases; sometimes they're interns, like us, let's say, 10 or 12 years ago. And so you sort of know, already, before asking people, what a good way to solve the problem is, how you should design a system that makes the good things easy to do. The right thing should be the first thing you can do in the system — and I know what the right thing is, because I have years of experience at literally all possible scales, from garage to IPO, in this. So that's been good. I go into a customer call and I know how to solve their problem. I literally know how to solve their problem; I can literally solve it for them, as fast as or sometimes even faster than they can. So that's great, and it gives you a lot of... Sometimes it's a bit biased, because of course the way I do data may not be what everybody thinks. But you always come across as a very expert person, and you can be very helpful to your initial customers. And you're very receptive to feedback, because you know exactly how to interpret that feedback — because you know how this works. So that's the good part. The bad part is that building infrastructure is kind of a drag. It's kind of a drag — it's really a drag. You know what they say?
Ciro Greco (31:50.07)
Yeah, it is.
Jacopo (31:51.404)
You know what they say? We don't do this because it's easy; we do this because we thought it was easy. That's exactly building infrastructure. We were used to working on things like a recommender system: I have an idea, I want to test it out. Say I run an e-commerce with millions of customers, millions of users. I'm going to build it on my laptop, ship it to production, and divert 1% of the traffic to my new recommender system. If it doesn't work,
Ciro Greco (31:56.31)
Yes.
Ciro Greco (32:13.506)
Hmm
Jacopo (32:16.654)
I'm going to roll it back. Honestly, nobody even notices. My client won't notice. It's a very fun thing. In infrastructure, I have an idea, and then it takes a week of testing all the possible edge cases, how it interfaces with the four different languages we use across literally the entire stack. Going from a good idea to "I'm comfortable sending this to my customers" is a much longer process. So the feedback loop is much longer than what I'm used to. And the stakes are higher, because
when I expose an API, my attack surface matters. A recommender system is complex, but really you just send me a product and I send you back some products. The combinatorics are large, but it's a very finite amount. When you sell infrastructure and people can write whatever function they want in Bauplan, the number of things they can do is technically countably infinite. There's actually an infinity of things they can do. So it's very hard for me, unless I test really well, to make sure you don't break it in the first week.
So in that sense it's been a very different feeling. If I ship something that is half-assed, the customer will break it in the first hour. It's almost mathematical. And that's part of why, I guess, it's so expensive to build an infrastructure company compared to an application-level company; I know everybody in this chat is very familiar with the concept. But because it's also horizontal, the hope, as you all know, is that the market is much bigger if you break through
Ciro Greco (33:16.95)
Yeah.
Ciro Greco (33:32.577)
Yeah
Jacopo (33:44.408)
to escape velocity. I don't know if that makes sense.
Ciro Greco (33:46.376)
It changes the economics of the company. I think it changes the way you should budget for your funding and your go-to-market. It takes a longer time to go to market, and there's no such thing as a lean-startup playbook for infrastructure. I think if you do SaaS, you can somewhat hide behind your endpoint for a while, and whatever you return can be great technology, or it could just be you doing the math by hand.
The client doesn't care.
Kostas (34:16.033)
Yeah, I guess you can't really fake it until you make it with infrastructure.
Ciro Greco (34:21.11)
Yeah, very much so. Which is an interesting thing, because from a strategic point of view, the way you develop your business sits somewhere in between a true SaaS company and robotics, right? It's not as hard as doing physical stuff, but it's not as easy as doing a CRM in the cloud. It's very much somewhere in between.
So it's less explosive when you begin, but then the market is insanely large. And the other thing I think is very interesting is: damn, it's sticky. When people start using an infrastructure product... some products are designed on purpose to be sticky in a malicious way; they're going to lock you in. But even products that are designed
to be open. And I think a very good thing about the last wave of data infrastructure is that the whole industry is buying into concepts like open formats and open standards, like Iceberg or Arrow and so on. So you have, in theory, more interoperability. But the truth is that it becomes so mission-critical that if you run something on some piece of infrastructure, even if that
infrastructure is open, it's going to be hard to get out. It takes a while. You have to be careful about switching infrastructure providers, which makes your go-to-market harder to a certain extent.
Nitay Joffe (36:03.868)
So I want to put together two things that you guys said that I find really, really interesting. The first is, if I heard you right, and it's something I've heard many folks talk about: in business there's this constant pendulum between decentralization and centralization, consolidation versus an explosion of tooling, and so on. And typically, and I think this most recent AI wave is very much an example, you get some new, amazing technology.
Ciro Greco (36:18.444)
Mm-hmm.
Nitay Joffe (36:32.632)
Immediately everybody within the business, and I mean SMB and up, so certainly enterprise and so on; I'm not talking about the hot Silicon Valley tech startups, I'm talking about what we think of as the businesses that buy software. Immediately they look at the CTO, maybe the CIO, and say: hey, what are we doing with this AI stuff? Make me AI-first, make me the intelligent company, whatever. And then inevitably you have this phase two,
where either the CTO doesn't fully deliver, or they deliver but don't meet all the needs. They meet some of the needs, and the CEO may be happy, but some marketing person over here says, well, they didn't really help me. And some customer-success person says, well, I don't have an agent doing my stuff; I still have an army of people, whatever it is, right? And so then you have this shift toward the other end of the pendulum, where every business unit starts saying, basically: you know what? It's great, we respect what you're doing, but we're going to do our own thing over here.
Ciro Greco (37:13.911)
Yeah.
Nitay Joffe (37:31.91)
Right. And you kind of have the same thing, you alluded to it, with data scientists, the engineers and so on, where the power is shifting toward the data scientists and the practitioners, who say: I'm just going to put in this model myself, I'm going to figure it out. I'm not going to wait months, or whatever, until all the data infrastructure looks like this beautiful thing.
Ciro Greco (37:49.203)
Like, in my experience, that's a recipe for disaster, but yes, go on.
Nitay Joffe (37:53.936)
So that's the part I was going to get to exactly. That's exactly the question I was going to come to: oftentimes that also fails, in different ways. And then you get to what you pointed out a minute ago, this phase three of: okay, we want to empower everybody, but let's all do the same stuff, right? Let's all use Iceberg. Let's all use Arrow. Let's all be friends, have standards, and do things in nice ways. So certainly with the AI wave,
Ciro Greco (37:56.243)
Yes.
Ciro Greco (38:16.065)
Right, right, right, right.
Nitay Joffe (38:23.484)
it's sort of a cycle of evolution, right? It's happening faster. And you guys talked about talking to the practitioners and the regular day-to-day users. So how do you get them to care about this kind of data infrastructure stuff, or to know that this is the right way to do it, as opposed to other means? From the customer conversations you guys have, how do you find they think about it?
Ciro Greco (38:48.29)
Okay. And then I'm going to also have to say something about that, but first I'm going to say this. I remember a sentence I read once, written by a person I respect a lot, Erik Bernhardsson, the founder of Modal. He was describing a bunch of the data infrastructure out there, and one of his blog posts said, more or less: every company that I know ended up building their own data platform,
which seems wasteful. That really resonated with me at the time, which was a few years ago. Yeah, every company I know did build their own little platform, and none of them really worked super well, by the way. They have a lot of sharp edges. It becomes hard to do things in a standardized way, and so on. So what I think is happening today is that the need for standardization
happens at different levels. And it happens because this thing used to be fairly niche, and now it's becoming part of data applications that touch the business more systematically. And I think there's an enormous opportunity in making this simpler, because we need more velocity. And this was not true five years ago. It's a little bit like...
an analogy would be DevOps. Before the cloud became really ubiquitous, we were okay with changing our deployment every once in a while, and we'd have very specialized people who could do that. But then the cloud opened up so many opportunities that people needed a certain amount of velocity, needed ways to decentralize
part of the infrastructure, make it easier, and make it code, which is something all developers understand. And I think what is happening in data today is exactly this. It used to be something that could sit in that data science or data infrastructure team that Kostas was referring to: it's small, it's in the middle, everybody has to go through them. But this is just not sustainable anymore, because everybody wants data workflows embedded in their
Ciro Greco (41:14.73)
applications, because they want AI to do a bunch of things in their applications. And if I think about what the landscape is today, which I think is the big opportunity in what we do: the incumbents are very powerful and very well designed for large enterprises, but they're not very developer-friendly. And I think the next very large market
will not be more data scientists. It will be developers that need to do some data stuff now. Because there are millions of these people, there are so many of them. And I just struggle to see the AI revolution happening in notebooks. The people who are now part of this industrial cycle are not the same people as before. They do software, and they have standards
for how they do software. They need Git, they need code, they need APIs, they need ways to do this in a standardized way that they understand, because we know how to do software at scale.
Nitay Joffe (42:26.779)
And so how do you?
Ciro Greco (42:26.914)
That, I think, is the thing that resonates the most in the market for us: going to companies that don't think of the data team as something separate.
Nitay Joffe (42:40.828)
So then let me ask you, you brought up some interesting points there. How do you help these people do data right? Which parts should they be doing versus others? What are the standards that should emerge in the future? You mentioned APIs, code, Git, et cetera; these are well-known, standard things. Do they exist in data as well, right? So that's my question to you: are those the things that need to exist on the data side?
Ciro Greco (42:55.964)
But they don't exist in data. They don't.
Nitay Joffe (43:05.564)
Or is it other things that still need to be developed? Which things do you see, medium to long term, still having to be owned within the company? You mentioned something interesting there, that every company built their own data infrastructure. If you zoom out and look forward, with Bauplan and the rest of the market shaping up, what are the things companies should be doing themselves, versus the things a vendor, or a standard, should take care of for them? How do you think about that?
Ciro Greco (43:33.217)
Bye,
Jacopo (43:33.806)
I mean, for what people should be doing: they should write their own code, either them, or their agents, or their mom, or whatever. Data pipelines are inherently about a business context that people from the outside cannot know. There's no way to know, in general, what's going on in your company. There's no way to know that your field text_3
is the field that actually matters for this thing. So you should write your own code, or have your own LLM or whatever, implementing your own business logic. And Bauplan does allow you to write your own code with whatever library you want, and we think that's hopefully a good idea. People definitely like not being forced to think in PySpark, or in yet another data frame library. God knows if we have to learn another data frame library after pandas, Polars,
Ibis, PySpark, Snowflake, and whatever. So that, I think, is what they need to do. What they shouldn't be doing is storage. That's for gigantic players that are doing it very well. Storage may be a very thin layer, a bucket, maybe becoming a bit smarter with S3 Tables. There's an argument to be made about storage becoming so intelligent that it does a bit of that, but that seems like an implementation detail, and it stays similar if we keep the
open-format type of thing. So that part is definitely for a cloud provider or a vendor, depending again on where you sit between application and infrastructure. For most companies, again, not Pinterest, but the vast majority of companies that buy software, infrastructure is kind of a drag. That's why people like all of us invest a lot of time, money, and knowledge into making it proper. And I think for the vast, vast majority of use cases,
the truth is people drastically overestimate how special they are and drastically underestimate how much a good infrastructure company will actually make the pain go away. And I think that was true in the MLOps craziness of 2020 to 2022, and I think it's still true in data. The number of people that manually spin up Spark clusters to run a pandas function and then spin them down is still
Jacopo (45:49.29)
in the thousands. Some people get very rich out of this, of course; somebody's benefited from all this. But this is all nonsense and we know it. This is all nonsense and we know it. And going back to Ciro's point, I would be very surprised if, five years from now, this is still the way we do many of the pipeline jobs we're doing today.
Ciro Greco (46:09.676)
Yeah.
I think an example of that: these days we talk a lot with companies that are, I'd say, mid-size, a few hundred people, maybe a thousand. Some enterprises are more forward-looking, but that's roughly the size of where we are. I was talking to these guys and I had a bit of an out-of-body experience. I'm talking with the engineering team, and okay,
they have to do a bunch of pipelines and reporting and some machine learning stuff. And they're good. They're decent engineers, they're very cautious, they know what they're doing. They describe their stack, and the stack makes sense. The stack is: they have data on S3 in Parquet, then they have Hive tables with a Hive Metastore, then they do dbt over Trino, then Polars and Docker to
run their Polars functions, and then everything is orchestrated with Airflow. Then they have Jupyter and Superset as an interface for the developers, and then they have Terraform. They were going through this and I was like, yeah, this is a reasonable stack. And then for a second I kind of stepped outside myself and went: there's a lot of stuff here. Ultimately these guys need to run maybe 50 pipelines.
There's a stupid amount of different technologies stitched together to do something like this. And as I said, they were being frugal; I've seen way worse than that. It's not a bad design, it's a good design, and still it's seven, eight, nine pieces. Come on, man. It just feels too complicated for what it is.
Kostas (48:04.987)
Yeah. So what's the solution? What's the answer to that?
Ciro Greco (48:09.036)
Well, a lot of what we try to do is this. First of all, part of it is your runtime. Let's put it this way: whatever you do, your data platform will essentially need to support three use cases, three scenarios. One is you're running a query and you're just looking at the results. It's interactive; you're exploring the data, you're looking into it.
The second is you're training a model or you're building a pipeline, but you're still in development mode. You're doing something more complicated than a single query, but you're not quite at the point where you ask yourself what to do on a schedule, what to do at scale. And then you have the scheduled, at-scale scenario, where you put things in production. Now, today these three things are three different cases, three different infrastructures. In one case you have a SQL engine.
In the second case you probably have single-node Spark. In the third case you're going to have a cluster. They have different interfaces: one is a SQL editor, another is a notebook, the third is probably VS Code. They have different abstractions, different concerns. So the first thing we tried to do is: can we find a way that, for most pipelines (not all of them; the very large ones are different), for everything at a reasonable scale,
this becomes uniform, and all you have is: whatever you do is going to be a function, and we run that function in the cloud for you. If you run a query, that's going to be a function: it spins up, queries the data directly in a stream, streams your data back, and then it dies. If you're building pipelines and iterating, you just write functions and you chain them, and that's a pipeline. When you want to run them, the runtime runs these functions as ephemeral functions, one after the other.
When you want to put them in production on a schedule, you're going to have an orchestrator that kicks off functions. So: can we standardize the runtime in a way that becomes super simple? On the one hand because everybody knows what a function is, and if you do your job correctly in abstracting away the infra, literally every person who knows enough Python to be dangerous will be able to pick this up in an hour. And the platform, which is my job,
Ciro Greco (50:31.638)
doesn't have to run three completely different infrastructures under the hood; it just runs functions. That's the first part. And as I said, it doesn't work for everything: if you run petabyte-scale pipelines, you should probably think of something else. But for the vast majority of use cases, this is actually pretty good. And then you have the data management part, which is the other part that is super annoying:
people spending a lot of time dealing with stuff like, is this a Parquet file? Is this a Hive table? Is this an Iceberg table? These are all implementation details. You should have a way to expose these tables in a declarative way. Whether this thing is a pandas data frame or an Iceberg table, ultimately, who cares? It has rows, it has columns, it's a table.
You can declare that in your code. So the solution we bring is really radical simplicity, which is a concept we put at the center of our company culture. The design principle is: if a concept is not one you can expect a freshly graduated CS student to know, you need a very good reason to introduce it into your framework.
So yeah, the solution we bring to the table is: abstract away as much as possible, in abstractions that everybody understands.
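A minimal sketch of the idea above, that a query, a development pipeline, and a scheduled job can all reduce to chained functions. The `pipeline` helper and the step names here are hypothetical illustrations, not Bauplan's actual API; in a real runtime each step would run as an ephemeral cloud function rather than an in-process call.

```python
# Hypothetical sketch: a data pipeline as a chain of plain Python functions.
from typing import Callable

def pipeline(*steps: Callable) -> Callable:
    """Compose functions left to right; each step's output feeds the next."""
    def run(data):
        for step in steps:
            data = step(data)  # in a cloud runtime, each call would be an ephemeral function
        return data
    return run

# Three "steps": load, filter, aggregate. All just functions over rows.
def load(_):
    return [{"item": "a", "qty": 2}, {"item": "b", "qty": 5}]

def keep_big(rows):
    return [r for r in rows if r["qty"] > 3]

def total(rows):
    return sum(r["qty"] for r in rows)

etl = pipeline(load, keep_big, total)
print(etl(None))  # -> 5
```

The point of the shape is that the same three functions could be run interactively, chained during development, or kicked off by an orchestrator on a schedule, without changing their code.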
Kostas (52:04.719)
Yeah. Yeah.
Yeah, that makes total sense. I have a question on that. I can see how
simplicity can be something really nice for someone who, as you described, is a freshly graduated CS student, right? But humans tend to resist change, right? So you have all these people that have already been using these stacks of 10, 15 tools. They write on their resume that they know Flink, they know
Spark, they know Pinot, they know how to set up and operate every crazy thing. And then you come and you say: you know what, sure, that's all great, but there is a world where you don't need all that stuff. But that's also a little bit existential for these people, right? So how do you... and this is a question, I would say, more about product, because
Ciro Greco (53:06.676)
yeah.
Kostas (53:14.813)
from an engineering perspective it makes a lot of sense, and from a product perspective it kind of makes sense. But when you hit the market, you need to penetrate it, and now you have people who built careers on that stuff.
Ciro Greco (53:28.684)
Yeah, yeah. No, I hear you, I hear you. I think it's very important for us to focus on a segment of the market that is really under pressure to build. That's really the answer. Because if you go to any major bank in the US today, you're going to find IBM mainframes in production. And that means there's a guy there who knows how to operate that stuff, and that guy would not...
Kostas (53:39.271)
Mm-hmm.
Ciro Greco (53:55.946)
And that's okay, I guess, but that's definitely not the right company for me to pitch right now. They will be, ultimately. We will go after them when an entire cycle of technology has gone by and they have to update, right? In fact, all these companies have generations of technologies in production at the same time, right? They have the IBM mainframe, then they have the Cloudera cluster, then they have Snowflake, then they have Databricks, and so on. And in the end, they will have...
Kostas (54:02.802)
Yeah.
Kostas (54:12.029)
Mm-hmm.
Ciro Greco (54:27.084)
But today I focus on people that are under pressure to build, more than anything else. And it works reasonably well. There are a lot of teams out there that are really pressured by the business to build a lot of AI, and they don't necessarily know how to satisfy that demand. The other thing that I think helps a lot here is:
I really, really like the lakehouse as an architecture, because it unbundles the stack in a way that lets you compose different things on top of object storage. It makes it easier for me to get in. Because it's true: if I find a guy who built his whole career on "I'm really good at Spark", yeah, not a great fit.
Kostas (55:00.231)
Mm-hmm.
Ciro Greco (55:25.078)
But it's a different story if the stack is open. Maybe there's a guy sitting next to that guy who says: not only did I not build it, I don't want to do Spark. And ultimately, you do your own thing, I do my own thing; we can plug into a more open infrastructure built on object storage. Which is very different from trying to do that when data is siloed. My experience is that it's very hard to move data from one place to another. The most powerful lock-in you
Kostas (55:30.012)
Mm-hmm.
Kostas (55:35.942)
Yeah.
Ciro Greco (55:54.87)
can think of in data is: put your data into my warehouse. You're never going to get it out.
Kostas (56:00.987)
Yeah, yeah, 100%. So from your experience so far, like what's the
Kostas (56:09.765)
What resonates the most with these people out there when you present Bauplan, and the simplicity of the system, to them?
Ciro Greco (56:21.834)
I think today the abstractions around Git-for-data, and the way they are exposed as code, are definitely the thing people appreciate the most. Because it's useful throughout the entire lifecycle of the data, and it unlocks a lot of value.
Kostas (56:27.271)
Mm-hmm.
Ciro Greco (56:46.348)
During the development phase, it's great to have a way for developers to just branch the data from production and work safely, because it cuts an entire cycle out of your processes. If the data scientists can just branch the data, do their thing, and then merge it, they don't have to work on a sample in a notebook and then hand the notebook off to DevOps, which may take
weeks or months. So it makes the entire cycle much, much faster. And then the same abstractions of Git turn out to be very useful when you run things on a production schedule. Do you want your system to be robust? Do you want to import data into your data lake in a way that makes sure your lake is never broken, never has faulty data? Well, open a branch, import the data into the branch,
run your tests, and if everything makes sense, merge it. Otherwise, page your on-duty engineer and debug. So that abstraction really resonates with both development and running things on a schedule.
And I think it's not that surprising, because ultimately that's how we use traditional Git on code. That's why we use Git: it's both a way for us to iterate faster and a way to not screw things up when you ship.
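The branch, import, test, merge pattern described above (often called write-audit-publish) can be sketched in a few lines. The `Lake` class here is a toy illustration of the idea, not the API of Bauplan or any real lakehouse tool; real systems make the branch zero-copy rather than a dict copy.

```python
# Toy sketch of "branch, import, test, merge" (write-audit-publish) for data.
class Lake:
    def __init__(self):
        self.main = {"orders": [1, 2, 3]}      # production tables
        self.branches = {}

    def branch(self, name):
        self.branches[name] = dict(self.main)  # shallow copy stands in for zero-copy
        return self.branches[name]

    def merge(self, name):
        self.main.update(self.branches.pop(name))

lake = Lake()
dev = lake.branch("nightly-import")
dev["orders"] = dev["orders"] + [4, -1]        # new data lands on the branch only

# The data-quality test runs against the branch, never against production.
if all(qty > 0 for qty in dev["orders"]):
    lake.merge("nightly-import")               # audit passed: publish to main
else:
    print("bad rows; production untouched:", lake.main["orders"])
```

Because the quality check runs on the branch, a failed import leaves production exactly as it was.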
Kostas (58:10.503)
Yeah.
Kostas (58:13.885)
Yeah.
Nitay Joffe (58:14.352)
And I imagine this is something like what the guys at Neon, which Databricks bought, famously did, right? Where you can branch the entire database, and they do clever copy-on-write to do it efficiently, and all the MVCC where you can do time travel and things like that. Are all those things enabled, do you think, strictly because of cloud storage and the lakehouse and all that, and you couldn't do it before?
Ciro Greco (58:29.623)
Yeah, yeah.
Nitay Joffe (58:42.236)
Is it some innovation that you guys developed in-house that others don't have? Or is this a growing standard, to tie back to the previous part of the conversation? How do you think about this Git-for-data thing?
Jacopo (58:54.53)
I mean, the necessary condition was the decoupling of storage and compute. Neon can do that, compared to other Postgres providers, because it doesn't cost them anything to basically map new ephemeral compute to the data; those two things are completely separate. So I think that's at least a necessary condition, but it's not sufficient.
In fact, if you just take Iceberg's snapshotting and point-in-time travel, which are some of the capabilities you mentioned, they will not work for data pipelines in the Pythonic world, because you don't really have the guarantee of multi-table versioning and multi-table transactions, and these distributed Python processes may or may not actually give you a consistent view of the data. So there are things you need to do on top of the
open-source standard (building blocks that are definitely coming for everybody, that everybody is building on top of) to guarantee the type of semantics we provide. Plus, because users don't care about semantics, users care about usability, we had to expose all of this, and I think this is where the company has put a lot of work, in a way that resonates with people.
At the end of the day, people learn how to use Bauplan in literally 15 minutes, and a lot of them would never be able to learn Iceberg snapshotting in 15 days. So there's a lot of translation that takes what is good about these abstractions and puts it into APIs you can understand, hiding the complexity you don't want to see. So it's a combination of the two things: improving the formal semantics, but also working on APIs in a way people can relate to without a degree in distributed systems,
or without going to VLDB to listen to our talk. If you are at VLDB, please do come and listen to our talk, but that's not exactly our market.
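The storage-compute decoupling discussed here rests on one property: branching copies a pointer, not data. A few lines can illustrate the copy-on-write idea behind Neon-style or Iceberg-style snapshots; the `snapshots` and `refs` dicts are just illustrative stand-ins for a catalog's metadata.

```python
# Copy-on-write over immutable files: a "branch" is a pointer to a snapshot.
snapshots = {"s1": ("orders_v1.parquet",)}   # snapshot id -> immutable file set
refs = {"main": "s1"}

# Branching copies a reference, not terabytes of data.
refs["dev"] = refs["main"]

# A write on dev creates a NEW snapshot reusing old files plus one new one.
snapshots["s2"] = snapshots[refs["dev"]] + ("orders_v2.parquet",)
refs["dev"] = "s2"

print(refs["main"])            # still "s1": main's view is untouched
print(snapshots[refs["dev"]])  # the old file is shared, not copied
```

Time travel falls out almost for free: any snapshot that is still referenced can be read as of that version.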
Nitay Joffe (01:00:41.788)
Yeah, absolutely. I think this is a fantastic abstraction, and one that has so much applicability in so many fields. I know various folks in the CAD and design space working on visual diffs, so you can do Git-style red/green, where I have my part and I see the change. So I think this general notion of versioning, being able to have the safety and governance that it gives you, and yet the speed and
Ciro Greco (01:00:57.249)
Right.
Nitay Joffe (01:01:09.118)
experimentation that it provides, is spot on.
Ciro Greco (01:01:10.71)
This is one of those things that you start doing when you build a company, and then halfway through you realize, oh, this is different from what we thought, which happens all the time. So, one thing: it turns out that exposing everything as code, running everything natively in Python, and having branches that can isolate
the perimeter in which a developer can impact data, is very, very good for agents. Because that's what people today are worried about: that agents can destroy your production environment, which totally makes sense. So today we start seeing, and I don't think this is what we had in mind when we started, unique affordances that Bauplan provides for
agentic workflows; it's very well designed for agents. Because agents ultimately look a lot like developers. That, I think, is the important message. The other thing is, it turns out that most of the Git-for-data literature out there is pretty much a metaphor, and a shallow one. You read stuff, you look into it, and on a superficial level it kind of makes sense.
And then you look into the edge cases, which are not so edge, actually. You ask: okay, what happens if three people commit at the same time on partially overlapping data artifacts? Everybody's answer is "we don't know", which is not a good answer. To this day, to our knowledge, we are the only ones
pursuing this a little further than the metaphor and trying to do formal semantics. Because I think, ultimately, this is going to become something that people use really reliably only if we don't have to try and see; if we have a principled way to know, given a scenario, what our system allows as an outcome and what is a bug. Which is super hard.
Ciro Greco (01:03:29.89)
Don't get me wrong, apparently this was a problem with Git as well for a long time. In 2018, there were a bunch of behaviors that the Git we all love and use produced, and people were just like, wait, is this a bug or is this what we expected? And everybody was like, we don't know. It was like, oh, do we have a formal semantics of this thing? Actually, we don't. Oh, that's not good. And so they actually had to develop one. So late in the game, in 2018, that's the first time people really
dove into this. And I think Bauplan is very serious about this, so we're going to do that for data. Because the metaphor is great, but we now have enterprise customers that use this and do 40,000 tags per week. At that point we can't just say, hey, try it out, worst case scenario it does something we don't expect and we learn. It's like, well, yeah.
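The overlapping-commit scenario described above can be made concrete with a toy model. This is an illustrative sketch of the general "Git for data" idea, not Bauplan's actual API or semantics: a branch is a pointer to an immutable catalog snapshot (table name to version id), and a three-way merge refuses to guess when both sides modified the same table, rather than silently picking a winner.

```python
# Toy "Git for data" model: snapshots map table names to version ids.
# A three-way merge against a common base is clean only when the two
# branches touched disjoint tables; overlapping writes become explicit
# conflicts instead of being resolved by accident.

def merge(base: dict, ours: dict, theirs: dict):
    """Three-way merge of catalog snapshots; returns (merged, conflicts)."""
    merged, conflicts = dict(base), []
    for table in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(table), ours.get(table), theirs.get(table)
        if o == t:            # both sides agree (or neither changed it)
            merged[table] = o
        elif o == b:          # only "theirs" changed it: take theirs
            merged[table] = t
        elif t == b:          # only "ours" changed it: take ours
            merged[table] = o
        else:                 # both changed it: no defined winner
            conflicts.append(table)
    return merged, conflicts

base   = {"orders": "v1", "users": "v1"}
ours   = {"orders": "v2", "users": "v1"}   # we rewrote orders
theirs = {"orders": "v3", "users": "v2"}   # they rewrote orders AND users

merged, conflicts = merge(base, ours, theirs)
print(conflicts)        # ['orders'] -- overlapping write flagged
print(merged["users"])  # 'v2'      -- clean merge for the disjoint change
```

The point of a formal semantics is exactly to pin down, for every such scenario, which branch of this decision tree the system is allowed to take, so "is this a bug?" has a principled answer.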
Nitay Joffe (01:04:25.336)
Yeah, you can make it provably correct. Very cool. Very cool. OK, well, coming up on time here, there's certainly a whole lot more we could talk about. But perhaps one last question to close this out. Since you talked a bit about agents, I want to actually kind of tie it back to something you said early on. You were talking about search. And one of the things you made me think of is this kind of trade-off between latency and quality. And you talked about latency a lot before.
I find that in this day of LLMs, that's one of the number one problems people run into. Once they have a full system that's really up and running, and they have their evals and all these kinds of things, even the very simple decision of, I can run Llama 70B or I can run Llama 7B. That's a 10x speedup, basically, right? Not exactly, necessarily, but essentially. And it also might be roughly a 10x quality difference, right? How do you guys think about that?
And how do you find your customers thinking about this kind of latency-quality trade-off in the agents and things that people are using you guys for?
Ciro Greco (01:05:30.698)
The short answer for us right now is that we don't see a lot of that, because Bauplan is not really used for real-time use cases. What you typically do with our platform is you create pipelines, you manage your data, and then you make your data available for more time-sensitive processes like retrieval. So the typical integration that we have is with products like Mongo or Pinecone, things like this.
You know, that last mile is not something that we do immediately. On the agent side, I think it kind of depends again on what the use cases are. We're really in the infancy of that. I think agents are not really good at doing work on data today. I'm a great believer in the idea that a lot of ETL and data engineering will be automated by agents.
But if I have to take that out of the box today, it just doesn't work right now. And I know Jacopo can bitch about this for hours, because he really tried to build one. So I think latency, for the use cases that we will solve with agents, is not yet really a concern; consistency, and just not hallucinating when operating on things, comes first. We're still earlier than that.
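For context on the 7B-versus-70B trade-off raised in the question, the "basically 10x" follows from a standard back-of-envelope: a dense decoder spends roughly 2 FLOPs per parameter per generated token, so at a fixed hardware budget, per-token latency scales about linearly with parameter count. The numbers below are rough approximations (ignoring attention cost and memory-bandwidth effects), not benchmarks:

```python
# Back-of-envelope: dense transformer decode costs ~2 * N FLOPs per
# generated token, where N is the parameter count. So a 70B model is
# ~10x the compute (and, roughly, the latency) of a 7B model.

def flops_per_token(n_params: float) -> float:
    return 2.0 * n_params

for name, n in [("Llama-7B", 7e9), ("Llama-70B", 70e9)]:
    print(f"{name}: ~{flops_per_token(n) / 1e9:.0f} GFLOPs/token")

ratio = flops_per_token(70e9) / flops_per_token(7e9)
print(f"compute ratio: {ratio:.0f}x")  # 10x, matching the rough claim
```

In practice small-batch decoding is often memory-bandwidth bound rather than compute bound, which is one reason the speedup is "not exactly, necessarily, but essentially" 10x.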
Nitay Joffe (01:07:00.124)
That makes sense. Cool. OK, so with that, I think we'll leave it there. We'd certainly love to have you guys back on again and dive a bit deeper as you continue building out Bauplan and getting more and more use cases and interesting stories to share. So thank you guys both for coming. It was fantastic.
Ciro Greco (01:07:14.774)
Yeah, anytime. It was a lot of fun.
Jacopo (01:07:16.707)
Turn.
Ciro Greco (01:07:20.758)
Thank you very much, guys.
Jacopo (01:07:20.866)
Thanks for having us.
Kostas (01:07:20.967)
Thank you guys.
Listen to Tech on the Rocks using one of many popular podcasting apps or directories.