Episode 8

Evolving Data Infrastructure for the AI Era: AWS, Meta, and Beyond with Roy Ben-Alta

November 21, 2024 · 01:03:28

Nitay Joffe (00:03.608)
All right, it's great to have you join us today here with Kostas. Very excited to dive in. You want to start maybe by telling us a bit of the background by yourself?

RBA (00:14.39)
Yeah, sure. Hey, hey guys. Good to see you and meet you today. So I have my friend ChatGPT here. Two weeks ago they released this advanced video voice mode and you know, let me ask him like, hey ChatGPT, I'm now sitting with my good friends Nitay and Kostas. Can you tell them about me? I'm Roy Ben-Alta. So just give them a short bio. Just speak faster. Okay.

Absolutely Roy. Here we go. Hey Aytan and Kostas. I'm here with Roy Ben-Alta. He's a seasoned expert in artificial intelligence with over 20 years in the tech industry. As the co-founder of Oak Miner AI, he's leading the charge in business process automation using generative AI. Before this he was a director at Meta AI Research and an executive... Hey, ChatGPT, you're too official. Can you make it more casual, but speak way faster?

Got it, Roy. Hey, Nitay and Kostas, I'm with Roy Ben-Alta. He's a wizened AI with over 20 years in tech, co-founded Oak Miner AI diving into business process automation. Okay, guys, it's just an example. Like the reason I'm bringing it up, I was delivering a keynote in Vegas early this week. So I went on stage and I used that live to introduce myself. But I was born and raised in Israel.

I've been in the data and the machine learning space for over 20 years. I came to America in 2004. We lived in Atlanta, Georgia. I did my MBA at the University of Georgia. It was an interesting time back in the days. We did a student exchange in China. We met Warren Buffett. We have great stories to tell from these days. Then in 2012, I moved to New York. I joined...

RBA (02:10.626)
Amazon Web Services. I still have emails from colleagues. They told me, are you crazy? You're moving to a company where no one will move their data to the cloud. Like that was the thing. And I was fortunate to be one of the early adopters, where we actually did a lot of crazy things back in the days with big data and machine learning on the cloud. And remember these days when many enterprise companies didn't know what S3 is. You know, I was there almost

nine years, then I joined Meta AI Research. One of the leadership principles at Amazon was Learn and Be Curious, and joining Meta AI Research, you know, in the early days when they were building large language models, was a great experience, working with Yann LeCun, our research teams, their great product engineering team. And last year I decided to do something that I still sometimes ask

if I'm crazy to do. We started Oak Miner. We are three co-founders, still in stealth, and building a product for business professionals to use AI and automate work that is really important for them. And it's been amazing to build it over the last year plus, and trying to take my experience into that. Besides that, I am a dad. I have two kids and a dog. And yeah, that's about me.

Nitay Joffe (03:37.144)
Thank you, that was great. It was interesting to see. ChatGPT got it mostly right. Even after the years of me trying to teach it my name, it still can't get that piece, but everything else, it got right.

RBA (03:46.328)
Yeah, man, I could use for their training team and their, you know, they crawl nicely LinkedIn or other sources. Yeah. It's pretty impressive how, you know, this technology is evolving. You know, in just two years, I think, I see the leap. Like you probably all remember Sundar when he came in, I think it was 2017, about, you know, calling a restaurant to order. It's still not working, by the way. But you see now.

Nitay Joffe (04:05.986)
Mm-hmm.

Nitay Joffe (04:14.39)
Yeah, Duplex. We're talking about Google Duplex.

RBA (04:15.704)
Yeah. And by the way, like OpenAI on their dev day, they did a really nice demo where they actually order live strawberry chocolates. It was nice to see the action connecting with Twilio and so yeah, but yeah, thanks.

Nitay Joffe (04:31.084)
Very cool. Let's back up to the beginning a little bit. So you said you had a Warren Buffett story. What's the Warren Buffett story?

RBA (04:37.292)
Wow, we don't know the true story. So I did my MBA. One of the reasons I went to study for the MBA: you know, you're coming as a technology geek to America. You're working in companies. You're doing your day to day, writing code, solving problems. I actually blame my sister-in-law's husband. He went to NYU. He was a professor.

He did his postdoc at Princeton and one day decided to go to NYU and do his MBA, and he told me, you must do the MBA, you need to understand how corporate America works, you need to understand finance, and it will open doors, et cetera. And I said, okay, I will go and pursue it. Like, you know, my wife also did her master's degree, and I said, okay, I will acquire another degree. And because English is not my first language, I think it was also a good experience to learn fully in English.

And remember, it's Georgia, it's not Ivy League. And I went to this fair of universities, and there was Georgia Tech, et cetera. Then I went to the booth of UGA and they were really cool. They talked about the university in Athens, and I love music. R.E.M. is one of my favorite bands and they came from Athens. I said, okay, I'm going to study there. So I

started the MBA. It was one of the best experiences I had. You work with people who are not in your field. You know, there was a commercial airline pilot, or a partner in a large, Deloitte, sorry, a consulting company. And one of my colleagues, his name is Jeff Cole, was a brain surgeon. And he actually was one of the early Warren Buffett fans. You know, he told me about The Intelligent Investor book, et cetera. So I read

all these books, and one day he wrote a letter to Warren Buffett and asked him, hey, can a group of MBA students come to visit you? Two weeks later we got the reply: you can come. So we went, a group of 15 students, to Omaha, Nebraska to meet Warren Buffett. The funny story was that we were invited to their headquarters, but the day before they did like a nice, you know, cocktail at the hotel, and he came with his car

RBA (06:59.832)
by himself and we all had the chance to talk to him, et cetera. And some people said, hey, can we take a picture with you? And I took this opportunity and asked Warren Buffett, I said, can I take a picture with you? But I want to make this picture in a different way. Can I hold your wallet when I take the picture? Because no one would believe me that I met one of the richest people in the world, that people are paying money to meet him on eBay.

And I see the professor on the side looking at my request and like holding his hand and saying, he's crazy, what he's asking. And Warren Buffett was laughing and we took this picture, and I can share it with you, Nitay, but then there was a line and other students took the same picture with the same pose. So it was an interesting experience. And I don't know how many opportunities like that you have, but I learned a lot about

the code of honor in US. Like you cannot tell your professor that he's wrong. Like for example, I learned the hard way. Again, maybe it's part of the cultures of being Israeli. So we're trying to be blunt as much as we can, not because we want to prove that we are right. It's because, you know, we are data-driven and data-informed. But I learned a lot. I hope my English improved because of that, but I'm...

You know, if I think about my imposter syndrome, it's always because English is not my first language. So sometimes you, you, you know, there is a much better phrase, but today thanks to, you know, Gen AI, I think this gap is vanished by the way. So, as long as you can be still real and authentic and creative so you can leverage this tool. So, so that's back to the beginning. That was my MBA experience.

Nitay Joffe (08:43.382)
That's very cool. I wonder how much money he had left in the wallet after everybody took pictures of it.

RBA (08:47.886)
Man, I can tell it was thick, but I think he had some coupons to local places in Omaha. No, but it was interesting to hear his perspective about the importance of family and life and how he looks at businesses and why he's not investing in tech, for example. And hearing it, you know, firsthand is always different because you can get the small nuggets there. And especially, I think they...

I found out, and I was an executive in Amazon, when you meet with students, people tend to be much more authentic, right? You want to speak in the same level of them. so you actually try to get a genuine approach of opinion of someone about a topic.

Nitay Joffe (09:30.69)
What's the best or most memorable nugget for you? Either from meeting him or the end of your holiday, like you've taken to your day to day.

RBA (09:38.33)
Wow, nuggets. So one of the VPs that I worked with at Amazon, his name is Anurag Gupta. He created Shoreline. I think that it was acquired by Nvidia. You know, we had peaks and valleys in the days of AWS. And one day he sent an email to the team about the importance of, you know, having a purpose and working on hard problems, and

that's something that I took away about being bold, like really sticking to the mission. I think that's something that I recall from Anurag, and he's one of the great leaders. Like he's one of these VPs that, you know, he will go and write a blog post and he will go and read the developers' code and really set a high bar as a leader. But that's something I recall. I also had Ken, who is not at Amazon anymore, but you know,

you know, making sure that when you're a leader or you're building a company, it's not like a pyramid where you're on the top. You're actually at the bottom and you lift people up. Like you want to hire people that are smarter than you and they can bring, you know... Every place I was working, I think I got different things. You know, at Meta, there was this phrase of, like, if you want to go fast, you go alone. If you want to go far, you go together. And

You know, and you can see difference in the cultures between companies like Amazon and Meta, like where it's more engineering-led, this is more like a product-led, of course, different businesses, but just learning about the cultures and observing it. You know, I joined Meta with, you know, with the executive titles, you come with perspective, but things I think I changed over the years about myself that I learned is that, you know, we as humans have the habits of forming opinion about something.

after five minutes. And in the workplace it is dangerous, because you can sometimes interact with very senior leaders. Let's say they give you like 10 minutes of their time and they're driving in their car, et cetera. And sometimes they make this mistake of forming an opinion about a person that they just talked to. I learned myself that I never judge, like you need to give the opportunity to every human to operate. I do have the

three strikes rule of thumb: just don't make the same mistake more than three times, because, you know, if you ask Elon Musk, he will tell you maybe it's one strike. I just think that, you know, to correct the system you need to do a retry and then you need to set up your threshold, right? So there's a lot of gotchas, and sometimes you learn from, you know... If you are joining a new role in a company and you're a manager or leader, there are two tricks.

You know what, I don't call them tricks, but two habits I developed and have seen. One: meet not just the senior peers that you work with, but talk to the engineers, talk to the people who are doing the work day to day. And set up the first intro meeting with them at 4 p.m. or 5 p.m. Don't do it in the beginning of the day, maybe end of day, like half an hour. And ask only one question

that is important in my opinion. And I always like to ask this. And if I ask you, Nitay, and ask you, Kostas, I wonder what you will answer. But: can you describe to me what you did today at your work, just in plain English, and how do these activities impact the bottom line of the business? You will be surprised how many people don't know how to address this question. I think that's the core of the problem in big companies today. People forget the purpose. Yeah, they come with a lot of

visions and missions, but if you cannot say it out clear... Now, product managers will always be able to give you these answers. This is what they are trained for, but I think this is also what differentiates good data engineers and okay data engineers. The best data engineers really understand what they do. And in the context of geeking out in tech, what I see today with large language models is that it's even more important, because now English is your new programming language.

Nitay Joffe (13:37.72)
Thank

RBA (13:57.53)
You must understand the context of what you're dealing with. And it's much harder because we are not experts in the terms and conditions of a contract, or I'm looking now at a drawing of a chip, which is a thousand pages of design. If I'm a knowledge engineer, an AI engineer, I don't have the domain subject matter expertise. So I think we're going to see the spheres of business professional experts and engineers coming together. So.

Again, I have the tendency to talk a lot, so stop me when you want.

Kostas (14:31.444)
No, that's great actually, but I want to ask you something. You mentioned at some point in your introduction about joining AWS, and it was very early times where everyone was saying to you, are you crazy? Who's going to send their data up there on the cloud, right? Tell us a little bit more about how it was to work in these early days

on something that we take for granted today, but back then it wasn't. And I'd love to hear the connection also with what you just said, the impact to the business, because I'm sure back then in a company like AWS, hopefully the connection to the impact was much more direct and much easier to understand. So tell us a little bit more about that.

RBA (15:02.34)
Yes.

RBA (15:26.714)
Yeah, and I can tell you, again, everything I say, take it with a grain of salt. This is based on my experience and observation. But, you know, back in the day, the AWS business started before I joined in 2012, but the early adopters of AWS were like, you know, the Netflixes of the world and Airbnb and startups, like the idea that you can now,

you know, SQS was the first service, but then S3 and then, you know, EC2. I think one of the pivotal moments for AWS was in 2009, when they came up with EMR, the Elastic MapReduce, right? The idea is that, you know... And when we started talking to customers, you go to, I will not mention the names, but a big pharma company, I will not forget these days, and they did nothing on the cloud. So they were like a big.

data on this, Teradata, massive data warehouse running on premise. You know, they were just crazy happy that they moved to Tableau because Tableau was kind with, and before that, bear in mind that I was in the data space. If you remember the old days, when the business wanted a new report or dashboard, they had to go to the IT department and ask them, hey, can you design me a universe, that was like BusinessObjects. So you had the team

actually build the reports, and after three months you would have a nice report and dashboard. What happened, Tableau came and they brought a laptop to the business and said, hey, you can do it yourself using your tools. So the same concept happened with the data warehouse when AWS came with Redshift, right? You remember, yeah, ParAccel, et cetera, but Redshift was indeed the first cloud data warehouse that existed there, et cetera. And companies were...

It was new and the early adopters really enjoyed it, but when you go to a big pharma company, it's a journey. First you need to tackle security because everybody is concerned. If you are coming to a discussion and you talk to them about how you can run Spark jobs and they don't know what S3 is, we have a gap. It means that for them to even start using it, it will take another three, four months because they need the landing zone. They need IAM, the VPC. And back in the days when I joined AWS,

RBA (17:44.482)
We had 67 services. Also, I think the bar of the technical people back in the days was much higher. You know, a manager like a solution architect manager was hands on person. They were actually go to the console and they can show a customer how it works, troubleshoot. you know, the first people we hired in AWS, they couldn't talk to a customer before three to six months training and understanding the platform. So you need to end.

And also like early days, it was for Amazon. Like back in the days when I joined, even Amazon itself, they didn't completely use all of AWS. They were even using Oracle instances. I was involved in a project where we were migrating Oracle to DynamoDB. Like, think about these old days. So with this journey with the companies, and I think this is what reminds me of what happens today with Gen AI, in the first three years it's all about training and learning what

I do think that companies need to be cautious is not to put in investment and build another Hadoop. I've seen some new vendors that coming with like this big platform, no hallucination, know, bring your own model, but still it's a closed platform. But the technology space moving so quick that, you know, I still see businesses that still have a Hadoop cluster on their own premise and it actually slows them down. Like you probably...

I don't know what you see in your space, et cetera. you know, and that was part of the people processes problem that customers were not, but also the technology that evolves. Like today, Amazon probably has like 250 plus services that exist there. Also the competition, you know, it was early days of Azure. think GCP came after, you know, with BigQuery also tried to challenge AWS in this space. But today...

I still see AWS as a leading cloud provider. I think Azure is number two, then you have GCP, but I think with Gen AI, I actually think the cloud war is going to be totally different than what we see today. I actually look at the LLM vendors as new cloud utilities. You know, we tend to say, hey, I'm using OpenAI GPT-4o or o1. No, you are using

RBA (20:08.376)
GPT system. It's not a foundation model only. It's a crazy system with APIs, and the fact that OpenAI acquired Rockset, and you probably know the team from Meta. They invest a lot in this data infrastructure that is important for you to serve. So really early days, let's see how it evolves. Then re:Invent is coming, so I don't know what AWS will announce, et cetera. Yeah.

It's the same cycle that I've seen. It's just happening faster. And I think now what you will need to do is to be much closer to the end user, the business. And this is where you see the value chain of AI quicker, but because this technology also impacts, you know, your engineers, like, you know, some of our engineers, they use Cursor, right? But you don't want an inexperienced engineer that only uses Cursor, because in the end they will become like tab engineers. They will tab, tab, tab, but

Kostas (20:47.204)
Yeah.

RBA (21:05.188)
You know, what happens when things go down? So you have, you know, the folks from Cognition that are building Devin, et cetera. So you'll have autonomous software engineers, et cetera. So every time I'm saying autonomous, I always say, hey, we all love Elon Musk. I still don't see a Cybertruck driving in Bangalore in India without a driver. Right? So, you know, because the world and humans are more complex than what we think.

Kostas (21:27.802)
100%. That makes total sense. Before we get into the more recent things, I have a question about Redshift specifically, because you mentioned something and you're very accurate. Redshift was pretty much the first cloud data warehouse out there. And for a while it was the only thing and everyone was trying to use it.

But something happened at some point and it lost its position as the cloud data warehouse, especially after Snowflake came out. What happened? Why Redshift, while being the first mover and actually being kind of dominating, right, in your opinion, lost its edge at that point? What happened? It's probably...

I don't know, notes.

RBA (22:24.196)
It's a very subjective opinion, probably. And if you talk to even Andy Jassy, trust me, the leaders there know this topic very well. You know, I think two things that I can relate to what you say. First is that the first mover is not necessarily the winner. And I think we all need to learn this lesson, you know, even in Gen AI, et cetera. And you say things happened. I think two things happened. One is BigQuery. Second is Snowflake. These are the two things that happened.

Kostas (22:29.455)
Mm-hmm.

RBA (22:54.424)
And I think, if you think about the business of AWS, it's not like AWS is a Redshift business, right? The idea, and the person that actually coined this the best was Charlie Bell, he's now at Microsoft, but the AWS vision from the first time it was built is to build this everything store for IT. I want you as IT to be able to run everything, Windows workloads, Oracle workloads.

You know, and look at the partnership they have with Snowflake. So you want to be customer obsessed. You want to give your customers options. So think about Snowflake: it's a company dedicated to just one thing, which is the data warehouse business. So they have a dedicated sales team, a dedicated go-to-market motion, et cetera. It's really hard for AWS to put like, you know, 500 people just selling Redshift, right? Like it's not.

And you want to be customer obsessed. So the customer obsessions come from the leadership of the product as they're building the product, et cetera. think where I see Snowflake really did a good job was one on user experience. The fact that you can create a cluster in one click or two clicks and start querying, I think that was overwhelming. Really, the onboarding was much quicker than what you can do with Redshift. So that helps them to tackle this new

Kostas (24:02.426)
Yes.

RBA (24:18.774)
startups, these new companies to start and having that. Also, were like investments from Capital One, think at some point, invested them. So there's always like business relationship into this technology. But I think that if I need to nail it, I think it's about the pace of innovation that you couldn't run fast. And bear in mind that Redshift had really big customers running like large clusters in large volume. And you don't compete against

you know, workloads that comes from startup, but you have mission critical systems that running like, if I have an SLA and I need to reports to be ready by 6 a.m. and if I'm not, I'm going to pay a penalty of million dollar, you must trust us. So Redshift we're focusing on that. I think another challenge that I've seen is that, you know, we were focusing a lot on migrations, right? So now you have migrations, know, migrate Teradata, Oracle, whatever data warehouse.

It's a six to nine month project minimum and you cannot automate it, because once you have a BI layer, you know, if someone wrote T-SQL or a stored procedure 15 years ago with all the business logic of the company, you're scared to touch it and change it. So that was also a barrier, that this migration takes more time. And I think the last one, if I'm looking at it, you know, Redshift was always focusing on...

giving you the price performance and et cetera. think the connectors and the ecosystem that actually like I think Snowflake did it much quicker than anyone. Like you can go to Snowflake and now you can have Rivery or you can have Confluent and you can integrate with all the data ingestion. They actually figure out this partnership really close. So it really was timing and coming and I still remember Snowflake and I met.

In 2013, like just previewed, they showed me for the first time and I said, wow, this is really cool. And I talked to some peers. They said, eh, don't worry. You know, there are many startups trying to do it, et cetera. I said, guys, they came from Oracle. They know this business. But they are a great partner with AWS. You know, when I left AWS, we were quite agnostic. Like I want you to use the best tools for your job. I think the reason for you to use just one vendor, if you decided that you're going to build something and,

RBA (26:36.078)
You don't need to be cloud agnostic, etc. I personally, I'm trying not to use many vendors as part of the software stack that I'm building just to mitigate the risk, but you don't want to have one single point of failure to do that. And also internally, think AWS also did experiment, right? At some point we came up with Athena and that was like, okay, now we have a managed Presto. I still remember we actually went as a team to Facebook offices to meet with the Presto team.

In the early days. So, you know, you brought back interesting memories. But then you're coming to a client and he says, okay, I have Athena, I have Redshift, when do I use what? And if you need to start explaining to your customers when to use different parts of your product, you start having issues. I think, by the way, Snowflake has issues today because, you know, they have competition from Databricks. I can tell you about my philosophy about data warehouses and SQL.

I don't think five years from now we will need this system in the way that we need it today. So it will become like this data infrastructure, like we have AS/400 and COBOL engineers. You know, there might be people that hate me saying that, but I think the technology is evolving in a way that you can really ask it in plain English and get an answer today. But anyway, hopefully it answered the questions about Redshift.

Nitay Joffe (28:02.552)
Yeah, there's an interesting point there that we'll certainly come back to. I've heard a few other folks kind of mention this, the idea of the generative AI world and what the future of data lakes looks like. But before we go there, one last question, because you touched on something a few times in terms of AWS being focused on complex industries, migrating people to the cloud, I think a lot of regulated environments and things like that. And, you know, it's funny, you mentioned Facebook. I remember even back in my time at Facebook,

we had so many data systems within Facebook, many of which we built in-house, some from outside, that we started having spreadsheets of which data system to use for which use case and which capabilities, you need low latency, throughput, and so on. And then eventually we had so many different things that we had spreadsheets of which spreadsheet to use. Like I guess to a point, it's like this multi-level, like the little... So I'm curious, because you must have seen this to like a tenth degree at AWS with all the different systems you had, and then seeing this on the customer side.

RBA (28:51.844)
Yeah!

Nitay Joffe (28:59.916)
So tell us a bit more about like, what was it like kind of wrangling all these different things together and then what the picture looks

RBA (29:05.036)
Yeah, no, there are two angles I can look at it from. One is, if you're an SDE or software developer and you're building a service, you're very isolated to your specific service, et cetera. But even if you're building a service on top of AWS services, it can be complex, right? You're maybe using a new service. Like, you know, I remember when we built Kinesis, for example, that was...

And Kinesis started as an internal project in Amazon to be able to process our billing because you had to process the telemetry and the billing logs. So you started with MapReduce, then you had to do something like Kafka in real time. And that was like a first service. I think we called it Canal. And that become like, and now we have a new services that actually using Kinesis as the backbone to build their service. So they depends on that. think for Redshift,

you know, the dependency was not for the service, but now you as a builder, you need to have prescriptive guidance, like, you know, this is my use case, this is the architecture pattern that I need to use, and this is how to connect the services. You know, when I joined Meta, the first week, what I did, I connected all my WhatsApp and Instagram and I just followed the data. I wanted to know which system is used when the

you know, the WhatsApp logs, again, we are not storing any of the locales on the Meta servers, but in the systems, you know, we have systems like Scribe and Daiquery and the queries, you're still familiar with them, but they build everything internally. And the problem is that when you're building your internal tools, you know, it's job security for the people who build it because, it's our tool, but

you don't see what happens in the industry, and sometimes you can see these companies that are building these internal tools when, hey, if you take a Databricks notebook, you don't really need to build it yourself. However, you know, Meta has a unique problem that not many companies have, like the scale, et cetera. So the principle was, you know, how do you mitigate it? So yes, a spreadsheet was one of them. Second, this is why you have solution architects or you have specialists or

RBA (31:23.256)
you have experts to talk to and ask, and we have this ambassador for each part of the business. Like we just launched SageMaker. So you need to bring SageMaker experts to help people in the field. Like, if you are now a solution engineer at Databricks, I feel bad for you because you need to know so many things. You need to know MLOps. You need to understand MosaicML, which they just acquired. You need to understand Spark ETL. And guess what? People just want to build a,

you know, data mart. So now you also need to understand slowly changing dimensions and those topics. And so, you know, you need to bring a mixture of experts in each part. The usual suspect of what you do is to standardize by job roles, right? You know, one of the biggest mistakes in my opinion is the term data scientist. I think it created so much, you know, I know DBAs that just took

Coursera class and they become like data scientists. Some of them are really good. it's a, but do you need to have a PhD? So I always have this joke when I talk to customer, I'm a fake data scientist. You know, I train and build models, but I don't have the PhDs. That's why I'm saying I'm fake. But, and then we came up like data engineer, have ML engineer, AI engineers, et cetera. So I insisted when I built the group that we are going to combine the role. Like it will be ML and data engineers. I think we should not look at them separately.

And I will continue saying that. I think OpenAI couldn't have built what they built without a really good data infrastructure and data engineering around how you build these models. And just the complexity of training, this process, it's a big data problem, if you think about it. So yeah, there are different techniques. I still think knowledge management in companies sucks.

RBA (33:16.416)
Orphan wiki pages, and I've seen many attempts by startups and companies to fix the enterprise search problem. And I will quote Edo Liberty from Pinecone. He told me, like, there is no one RAG solution that can solve everything. It's like you tell me, I have a search for everything. There is no such thing, right? Like if we had one database that could solve all the problems, we wouldn't have

know, that could be in a click store and all the different things, including, like, what you built, Nitay, you know, for CDP, right? Because there is a domain, there is a specific domain. And I actually think the future is personalized. I think with the new versions of SaaS and software, companies will actually be able to build it themselves rather than just taking it off the shelf. I need software for just this

problems with contracts. Like I think you will have these tools that you will be able to build it without writing code in the future with agents and everything, but time will tell.

Kostas (34:21.688)
Yeah, that's some great stuff. Roy, so after AWS, you went to Meta, and at Meta, you mentioned you joined the AI and LLM stuff that was happening there. Tell us a little bit more about that, because after you do that, I want to go back to what you just said about how important the data foundations are for actually going and building these models and all these things.

Tell us a little bit of, you know, I don't think like many people understand what it takes like to build an LLM or how these things were approached. I mean, at the end of the day, how many companies have done that anyway, right? So you've been there, seen that. Please tell us what it takes, how it looks like.

RBA (35:02.692)
Yeah. Yeah.

RBA (35:07.311)
Yeah.

Yeah, yeah, and one thing, it's a large team effort. It involves engineering, it's all about hardware, it involves technical product managers, technical program managers, it involves, like, researchers. So at Meta, there were two groups when I joined. Since then things have probably changed, but they had what we call FAIR, which was Facebook AI Research.

Back in the days, Jerome Pesenti, who was one of the creators of Watson, ran this organization. The vision was, I want to bring the top researchers in the world to come and innovate around AI. And they built the first supercomputer cluster just for the researchers. That means that they didn't train on Facebook data, but on public data sets, to progress academic research. And when they built

you know, the first version of Llama, I remember like a project called No Language Left Behind. The idea was, can we take like two or three hundred languages that you don't have a lot of content about, like, you know, for one tribe, and train a model that can help people where, you know, you cannot translate English to this tribe's language, etc. So you see how powerful it is, what you can build. The problem, one, is the cost. You need to make sure that

During the training, and this training can take like months sometimes, if a job fails, right, you want to be able to recover quickly. Sometimes it can be a hardware failure. In fact, if you read the Llama 3.1 paper, they actually wrote about a really nice improvement that they did on the failovers and the hardware. I think the rate of failures was improved by 90%. These failures, if you translate it, it's like tens of millions of dollars because these GPUs are expensive to...

RBA (37:04.858)
to use. Then you need to look at what I call the apply. Okay, you have models, you built them. But how do you actually take them and implement them in production? So today you go to WhatsApp and you can use Meta AI, and their inference improvement with Llama 3.2 is amazing. It auto-completes your image generation. You can say, generate an image, I want to see a boy riding a bicycle, and as you type

you see the video, or the image, sorry, the images change on the fly. This requires a lot of engineering effort to run inference at such a scale, et cetera. So I think the strength for Meta there is that first, they own their own destiny, their own data centers. These are their own. You know, they have access to a plethora of data, whether it's from Facebook or Instagram. And, you know, of course you're using it in advertising, you know, to...

Increase the number of seconds that people watch Reels, right? You know, the same as the TikTok game, etc. Basically you use deep learning to do that. But, you know, I remember one of the efforts at Meta that I was involved in was this thing called system cards, where you go to Instagram and you want to explain, why am I getting this commercial, like why am I getting this advertising? So you had to build a system

that shows why you're actually seeing it. It's a hard problem because it's not like a query. You can look at a query and see, okay, because I ran this query, I put in this information, I did the join with this table, or I actually used XGBoost. XGBoost you can explain, right? But with weights, et cetera, it's hard for you to go backward to why the model selected something, because you might have, I don't know how many parameters and weights, et cetera. So,

Meta had this expertise and, you know, back in the days when I joined, there was also the challenge, like they have their own data centers, you know, using ASIC network, et cetera. Now you need to add GPUs into the game. So you need to plan, and it takes time and money. So in my time at Meta, just seeing and working with these researchers and amazing engineers to see how you progress and how it works, it was a great learning experience. And what I like about what Meta did, they are really

RBA (39:29.388)
sharing everything in papers. You know, people underestimate that this content is not GPT-written. Like these are researchers, and this is their core work. Now one can ask the question, okay, why does Meta need to invest in protein folding? But, you know, there are the Silicon Valley wars, where Mark wants to show, hey, Google has DeepMind, I have Meta AI research. Very respectful. And I think

You know, I'm a big fan of what they do with open source. They just announced today, I don't know if you saw, their version of Sora for generating text to video. Amazing, impressive. When I joined, I actually worked with a team in Israel and they did the, you know, text to image. And I remember when I joined, it was taking you like 12 seconds and probably cost you like three cents per call.

Like if I, you know, recall. And now I think we are like 5x faster probably and probably much cheaper. Don't forget that Meta also invests in their own custom chip, MTIA. Like inference is the king. Like this is most of the cost. My assumption is that going forward, in the next five to 10 years, price will go down and performance will go up, and, you know, this is again patterns from AWS.

AWS reduced prices for customers more than 100 times. Imagine a vendor that comes every year to their clients and tells them, hey, we cut the cost of your compute by 10%. Why? This is part of the customer obsession, but you also have the economies of scale and improvements that you can make, and that was the philosophy of Amazon, right? The virtuous cycle. If I can reduce my operational costs and I can pass the value to the customer, I can reduce their cost,

they will be happy, they will come back to us again and again. So there's still a lot of room to improve performance. Like look what Groq is doing, like CoreWeave, like all these companies where you can buy cheaper GPUs, et cetera. I think the biggest winner is Nvidia, of course. They are enjoying it the most. I think last year there was a certain point in time when we started the company.

RBA (41:51.918)
We were still with this thing: we can do, we are doing fine-tuning, we're doing training. Eventually we decided to, but we couldn't get GPUs. Like there were no GPUs to be had at some point. I almost got GPUs from Oracle, even, just to do some training. But luckily the foundation models and the open source are so advanced today. Like with few-shot, and you have LoRA and you have other techniques to make these things cheaper, I just honestly don't see companies needing to train on their

company data anymore with GPUs. Like you can really rely on some foundation commercial or open source model. Yeah, so.

Kostas (42:31.534)
Yeah, that makes a lot of sense. So you've seen the evolution of data platforms literally from when, even before MapReduce up to today with the LLMs. And I want to ask you how, because, okay, I think what is happening with data platforms in general and how we deal with data is that it's driven a lot by the needs that we have.

Right. The data warehouse exists as a concept because BI is important. Companies need to analyze their data in order to understand what they do well, or maybe not, and how they can do better. And it might sound like a simple case, but actually it's a huge part of the industry out there. And it can become very complicated when you are talking about really big companies.

We had BI, and BI drove all this innovation in the past 30 years in the industry. And then we had machine learning, because now we also want to build products on top of the data that we have. We have recommenders, for example, right? And today we have LLMs and we still need data, these huge...

models at the end of the day, they are nothing without the data that comes in and generates these weights there. Right? So how you've seen the evolution of the infrastructure needed to build this. I'm not talking about the hardware right now, but more about the data systems in place.

RBA (44:16.398)
Yeah, it's like this data lake evolution, right? Like, you know, we started having this one source of truth, like a data lake, but what happens with data lake, I think over the years, I think one of the common mistakes I've seen is like companies build it like as an ODS, like the operational data store, right? Like you put your data, it helps at least for the IT, at least to...

monitor the access control and put in all the security, like PII, et cetera. But once you were able to build, you know, the first S3 folder where, okay, this data is already cleaned and processed, do I really need it in a dashboard, in a database, when I can just, you know, point an LLM at it and ask questions? If it's structured data, yeah, sure, SQL can do it, but

you can leverage an LLM to write the query for you. Like writing text to SQL now is not the complex problem that it used to be, because it's a deterministic language, right? It becomes more complex if you are doing more advanced techniques with SQL. But the challenge of the old systems, it comes to me as a people and process problem,

because I still remember when we used to look at Informatica PowerCenter and people wrote their pipelines with the ETL in Informatica. Great, you have a tool, you can... But actually, if you read the code itself, they implement business logic. Like, I want you to classify this field as a shipment address, and every time this shipment address is updated, I need to update this marketing field in this system.

Why? So that eventually someone can trigger an email campaign, because maybe the customer is moving house. Sounds like, yeah, you need an ETL, you need a pipeline. I think this problem is solved, right? You have dbt today. I think seeing the motion of OLTP now, when you have Postgres that can offer you everything in the box, et cetera, like even vector search, for example, but...

RBA (46:34.712)
I think that the truth for the data system is that you as a builder will actually write to one place, and whether it's OLTP or OLAP, the system will know exactly how to route it and set you up. But the opportunity that happens now, and this is where I see data lakes evolving, I like to call it a knowledge house, because, you know, in university 101, when I studied, I still remember this deck that they showed us: data,

information, knowledge, to wisdom, right? So just think about the role. Information is above the data, right? It's the idea that you can convert data to information that the business can do something with. And that's the job of the CIO. You're the chief information officer, right? I think now, more than ever, what's important is how you actually maintain your knowledge, the know-how in your company. You have runbooks. Some people wrote this runbook, but

Eventually when you have a problem, you need to ask a person, and that's something that's hard to encode into a database. And this is one of the evolutions that I think we will see with LLMs: how you can build an ETL for unstructured information with a human in the loop embedded, so that it can auto-correct or have a human correct it. Today in the data warehouse, if I ask you about this problem, you'll say, yeah, you can do a, you know, create table as, or an update, or.

Like you will talk to me in the data engineering terms and you know, and you will build, the data engineer will build a script to deal with that. And you need to move away from building an ad-hoc, you know, scripts to deal with this problem. But it's all to me comes to a people processes problem. And I think where we are heading is that being more business outcome, like understand what is the goal of what I need to build. But still you need to have this infrastructures and foundations. So.

You know, you will still need a place to store the data. You need a way to query the data. But the layers of the business intelligence system, I don't know if they will exist 10 years from now. Right now, I think Tableau and all these companies are becoming like a chatbot, right? Like you will ask questions, you will see an insight. I actually met an interesting startup where what they do is take a screenshot or a video of someone using Tableau.

Kostas (48:37.774)
Mm-hmm.

Kostas (48:51.281)
Yeah.

RBA (49:01.73)
and they actually reverse engineer to show like the flow of what, because I'm telling you, go to every company that you use and they have Tableau users, et cetera. They probably use 20 % of all the features that you have in Tableau today. Like literally, like even Zoom, like every time they have a new features with AI companion that it's, so many companies are adding AI features, but no one really asked themselves.

RBA (49:27.884)
If I have a blank canvas, will I build Tableau the way I build it now? Like no one asks this question. So I think one thing, in my founder mode, is asking: if today is day one of my company, will I do the same? How would I build it? You need to ask yourself, especially in an evolving technology phase, right? Like these technologies are changing. Every week you have something new that you need to adapt to. Last thing about data.

I think now more than ever, the winners are the ones that will also focus on real time. You know, Databricks invests a lot in Structured Streaming and real-time information, et cetera. It also connects to inference, because now with the footprint of memory, et cetera, like look at DuckDB, for example, you can put in like billions of records, running super fast, and it operates like a data warehouse. Think about it. So we'll see like.

Mini data warehouses that are all vetted and curated with an LLM around them, but we still need to have the data governance. These things are not going away, but this to me is always a people and process problem. And I don't think you need to take a generic solution that will fit every problem, because every company is different. And I think that's the nice thing, how you can make it generalize

but give them the tools so that they can customize their solution for their own needs. You know, even how you build the data lake in S3, right? Like, I always like to look at the DMZ approach, right? Like I have the red part, where no human can access this part. This is only for someone with privilege, and I need to anonymize this information, et cetera. You know, 10 years ago, it was hard. You had to build it yourself. Today you have like, probably, Nitay, you can give me like two.

two vendors in this space where with two clicks, you have that. Like one of my good buddies is the founder of BigID, right? They have the idea to classify all your data, like in a click of a button, et cetera, like things that companies used to build. Who needs to build it, right? You know, build versus buy is always a struggle. I like to take the pragmatic approach. I prefer not to build anything to get to the outcome.

RBA (51:50.042)
The only reason you need to build is if it's something where you need much more control for your product, et cetera. It's not about the defense of the IP, right? Like I'm a five-person startup, right? There are probably thousands of startups that are doing the same thing that I do with much more funding, et cetera, right? It's not about being smart, but having that experience and understanding how these companies operate, how the real life of...

a finance analyst works, who needs to get his reports done, and they need to send an email, and they actually have a process where they need to update this database. It's there for a reason, right? It's about risk mitigation. But I think we are going to see a big shift in this domain. I would not be surprised if dbt announces soon: we are now an LLM pipeline, right?

So it's evolved a lot, right? Like I remember when I joined AWS, there was a service called Data Pipeline. I think I was one of the few real users of this service, right? Because it was built on SWF. I love workflows. So I'm a big fan of Luigi, if you remember. And Simple Workflow Service was one of my favorite services on AWS, by the way. But then, you know, they built Glue. And Glue is a great service. I hear...

good things from customers who are building on it, et cetera. On the other hand, okay, I have Glue, I have Databricks, which one should I use, et cetera. You know, you have a platform, choose. Again, I'm not an expert with Databricks. I was using it when it was just Spark ETL, but you also need to train the market, right? The data engineers, what skill sets they have.

Kostas (53:37.272)
Yeah, 100%. Okay. I have a question on that. Actually, I found it great, this pyramid that you described, going from data to information and then to knowledge. And I think, I mean, everyone will agree that this is what we are trying to do here. I would say data is probably, from an engineering perspective, the most concrete asset in terms of how to manage it.

We have databases, we have structured data, we have unstructured data. Sure, there are problems with all these things, but at the end of the day, we have a path to efficiently solve them and make the right trade-offs. Then we have information. Things get a little bit more blurry there. What exactly is the information in this data? But then we have knowledge and knowledge...

It's not just information, it's also information that fits in, in my opinion, specific context. And this frame of reference now is very relative. The same thing for one company might mean something and for another company might mean something completely different. Right? So sure, we have LLMs. LLMs can probably help us both in, let's say,

represent knowledge, but also reason about this information and retrieve knowledge. But how can we engineer solutions on top of that? Because engineering at the end of the day, as a discipline, is supposed to be very deterministic. We have trade-offs, we decide about these trade-offs, and we implement.

And we can, if something goes wrong, we can reason about that. Right. So what's your take on that? Because I have a feeling talking about knowledge and the LLM is not just another relational model that we're going to throw out there. Right. So I'd love to hear your thoughts on that.

RBA (55:42.5)
You know it.

RBA (55:48.868)
Yeah, by the way, the same problem also exists in big data systems. If you remember, you know, HyperLogLog and approximate distinct, right? Like I'm okay to have 1% of errors if I'm querying like billions of records, et cetera, and I can get the performance gain. So it's really about the tolerance of your business to mistakes. And I'm, you know, I'm still like,

thinking about, you remember the CAP theorem in distributed systems? I think we have a new CAP theorem in LLMs, right? But it's not about consistency, availability, and partition tolerance. It's about the complexity, the trust, and the accuracy. Like maybe only two can coexist, right? I started developing, at least in my mindset, that we'll also need to pick the trade-offs. And from the angle of engineers,

I see it day to day with engineers that are struggling to work with fuzzy logic, right? Like, yeah, you have cool things like Pydantic, and now with OpenAI you can have a JSON schema in the API, because we like to work in a structured way, and you can still work in a structured way. Like we are using, for example, prompt optimization. There's a library called DSPy. It came from Stanford. By the way, Databricks, I think they...

took all the founders and now they're going to sponsor DSPy. It's an interesting library, because the idea is that we don't want people to write prompts in the long run, right? Like you want to have a programming language, et cetera. But I think in terms of knowledge, it's this transition, if you think about data interpretation and contextualization. You know, the LLM can interpret raw data and place it in a meaningful context.

If you train it on your own data, on your own knowledge that you have in your company, you're good, because you can prove where the data comes from. So, you know, the developers were okay. Okay. So I think we need to have this verified symbol about the answer, and companies will need to enforce this in all their systems, the same rules that apply to security, right? Like you all have been in security reviews and privacy policy. I think we need to take some of these disciplines.
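
Editor's sketch of the approximate-distinct trade-off mentioned a moment ago: a compact HyperLogLog, assuming 64-bit hashes from `hashlib.blake2b` and 2^14 registers. The class name, the `user-N` keys, and the 100,000-item workload are illustrative assumptions, not something from the conversation, and production systems use tuned library or engine implementations rather than hand-rolled code like this.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counter using 2**p small registers."""

    def __init__(self, p: int = 14) -> None:
        self.p = p
        self.m = 1 << p                  # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")            # 64-bit hash
        idx = h >> (64 - self.p)                     # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Position of the leftmost 1-bit in the remaining bits (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        # Bias-correction constant; this formula is valid for m >= 128.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


TRUE_DISTINCT = 100_000
hll = HyperLogLog()
for i in range(2 * TRUE_DISTINCT):       # every key is seen twice
    hll.add(f"user-{i % TRUE_DISTINCT}")

estimate = hll.count()
relative_error = abs(estimate - TRUE_DISTINCT) / TRUE_DISTINCT
print(f"estimate={estimate:.0f} relative_error={relative_error:.2%}")
```

With 2^14 registers the expected standard error is about 1.04 / sqrt(16384), roughly 0.8%, and the memory used does not grow with the number of records. That is the trade being described: a small, bounded error in exchange for speed and a fixed footprint.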

RBA (58:09.292)
Also, when we work with these systems, and even with data problems, like for data engineers, the problem still exists today, like data quality problems, right? So one thing that you want to mitigate, and this is something that we invest in a lot, for example, is an eval framework. You must have an eval framework. If you're building any GenAI-based system and you don't have an eval framework, you're just playing in the casino right now.

because basically every time you change a system prompt, you will impact your system. And the same thing happens with RAG, which is a different problem. You know, one of my friends told me, if you want to be very rich, just go build a startup where what they do is RAG eval. Like just that: how can you prove that this RAG system is better than that RAG system? Think about you as a buyer. I'm in enterprise IT. I'm the CIO of a company.

These two vendors tell me, our platform is not hallucinating, we built a state-of-the-art RAG. The best answer I have is, let's throw in some data, ask some questions, get the answers, and understand. And you will see that both of these companies, at the end of the day, will need someone to do some prompt engineering tweaks to make sure that it works right. So the last mile needs to have the human in the loop. And I think the company that did it the best is Palantir.

They just called it, early in the days, the forward deployed engineers. But what they actually did, they had engineers coming to the customer, using the Palantir platform, to fix the problem for the customer. It doesn't matter, like fix it, but this fix of the problem is now embedded into the know-how on the platform. And today Palantir has a secret weapon called ontology, right? Like you all know how hard and how important it is to have this ontology. So.

Same things you will need to build for the LLM. So, eval is one example. The second, the ability for you to work with multiple models, is important because you don't want to rely on one vendor, right? I think lock-in is overused, because I think you are locked in with your first engineer that writes code. Because engineers are different, they have different ways of doing things, but you don't want to be locked in on your innovation.
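
Editor's sketch of the eval point above: a minimal harness, assuming each system under test can be wrapped as a plain function from question to answer. The names `EvalCase`, `run_eval`, `vendor_a`, and `vendor_b` are hypothetical, the two vendor functions return canned strings in place of real RAG or LLM calls, the questions and answers are made-up sample data, and keyword matching is only one simple way to score an answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class EvalCase:
    question: str
    required_terms: Tuple[str, ...]  # facts a correct answer must mention


def passes(answer: str, case: EvalCase) -> bool:
    """An answer passes if it mentions every required term (case-insensitive)."""
    text = answer.lower()
    return all(term.lower() in text for term in case.required_terms)


def run_eval(system: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases the system answers acceptably."""
    passed = sum(passes(system(case.question), case) for case in cases)
    return passed / len(cases)


# Made-up sample questions with the facts we expect back.
CASES = [
    EvalCase("When does the lease for property 17 expire?", ("2026-03-31",)),
    EvalCase("Who is the tenant of property 17?", ("Acme",)),
]


def vendor_a(question: str) -> str:
    # Canned stand-in for a real system call.
    canned = {
        CASES[0].question: "The lease expires on 2026-03-31.",
        CASES[1].question: "The tenant is Acme Corp.",
    }
    return canned.get(question, "I don't know.")


def vendor_b(question: str) -> str:
    # Canned stand-in that gets the second question wrong.
    canned = {
        CASES[0].question: "It runs until 2026-03-31.",
        CASES[1].question: "The tenant is Globex.",
    }
    return canned.get(question, "I don't know.")


systems: Dict[str, Callable[[str], str]] = {"vendor_a": vendor_a, "vendor_b": vendor_b}
scores = {name: run_eval(fn, CASES) for name, fn in systems.items()}
print(scores)
```

Because every system sits behind the same function signature, the same cases can be rerun after each system-prompt change, or against a second model or vendor, and the scores compared side by side.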

RBA (01:00:32.856)
So when I say I want to be using multiple LLMs, I want to have the benefit of using advanced voice mode now with ChatGPT, connected to my APIs, because it's available today and I can have a better user experience or better features for clients. And also, with the advancements they're making, it might be that Claude will be better at generating code. And you know, with OpenAI, they are much better at...

doing something that is not code related, et cetera. But it's harder for engineers if they're not adapting today. When I interview now, the engineers that we are hiring, I ask them how many hours a week they spend with AI in their day-to-day work. So the answer must be over five hours a week. Like they need to use it. And you will be surprised, there are many engineers that are not using it yet. So...

And again, this is why: because if you don't use these systems, you don't understand their limitations and you don't understand how they really work. So sometimes it's funny. I see all these LinkedIn posts by executives from, like, big companies, and they're standing on stage and talking about GenAI. And I can promise you they don't use AI on a day-to-day basis, et cetera. And it's hard. It's complex. I don't know all the answers to how

the data ecosystem will look like, but I do strongly believe that we are moving into the knowledge verse world because, you know, I'll give you an example. One of our clients, you know, has like 5,000 contract documents and they want to ask questions about leases, like you have a lease and an expiration date. And you'd say they probably have a database that knows, for all their 10,000, 20,000 properties, the lease expiration date. Apparently no, someone needs to go

into the system, pull some data from this system and connect it together. Now you can throw in all this data. You can ask this question and create a nice dashboard that shows you the distribution by month of when the leases expire. So now you skip this data warehouse BI, and you can actually get straight, source-of-truth answers. Now the answer you get, where do you store it? You can still store it in the warehouse. You can store it as knowledge units, but

RBA (01:02:53.314)
I think we will need to have new systems that allow us to capture these knowledge units, et cetera, whether it will be like a new form of database, whether it's just an embedding or, you know, a knowledge graph. Everybody is fascinated by knowledge graphs. I found knowledge graphs are cool. I used Neo4j so many years ago, and people forgot about it. Now it's becoming cool again, but...

You also need to take into consideration the price and the cost of processing. You don't need a graph database for every problem, by the way. But it's really powerful for some use cases.
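
Editor's sketch of the lease example from this answer: once expiration dates have been pulled out of the contracts, the month-by-month distribution is a small aggregation. The records below are made-up sample data standing in for whatever an extraction step would return, and the field names `property_id` and `lease_expiration` are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Dict, List

# Made-up records standing in for fields extracted from contract documents.
extracted: List[Dict[str, str]] = [
    {"property_id": "P-001", "lease_expiration": "2025-03-31"},
    {"property_id": "P-002", "lease_expiration": "2025-03-15"},
    {"property_id": "P-003", "lease_expiration": "2025-07-01"},
    {"property_id": "P-004", "lease_expiration": "2026-01-31"},
]


def expirations_by_month(records: List[Dict[str, str]]) -> Dict[str, int]:
    """Count leases per expiration month, keyed as YYYY-MM in sorted order."""
    counts: Counter = Counter()
    for record in records:
        expires = date.fromisoformat(record["lease_expiration"])
        counts[f"{expires.year}-{expires.month:02d}"] += 1
    return dict(sorted(counts.items()))


distribution = expirations_by_month(extracted)
print(distribution)
```

The resulting mapping is the kind of answer that could be stored back in a warehouse table or kept as a knowledge unit, as discussed above.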

Nitay Joffe (01:03:31.32)
That's very fascinating stuff and we're probably coming up on time here. So maybe last question, You mentioned a few interesting areas around like model explainability, real-time streaming, eval framework, kind of, you know, if somebody is starting a company today around data AI, what are the things that they should be putting in the bucket of like, okay, I need to build this myself or maybe you as building your company, what are the things where you're like this, I have to build versus the things where you're like, I can just.

get some off the shelf thing. And I think there's almost a third bucket in this day and age to your point of things where like it's actually better off to basically just wait because in six months or 12 months, like the answer will be there. How do you think about that?

RBA (01:04:07.236)
Yeah. Listen, first, I do believe that we will have the new Salesforce and the new SAP, companies that are being built right now. Like it happened before and we will see it happen again. You know, as a founder, I think it depends on what you're building. First understand who the end users are, like not ICP, but are you building a platform where the end users are software engineers, are DevOps?

And so once you understand that, you start. But I always go to the data 101: if my system is creating new data for my users, I want to control my own destiny. This is where I will probably try to build the tooling myself, similar to how Meta's infrastructure is built, et cetera. However, if I just started my company,

Shall I go and now build a data warehouse? No, you can run your company on a spreadsheet today with Gemini and ask questions, like that should be your data warehouse. Do I need Apollo now for my go-to-market? No, I will use Clay to find leads and Twain to generate emails with two clicks, et cetera. So build versus buy is about where you are in your journey with the company. I do like to try new stuff like

You know, we are using a cool, you know, the guys like from PromptLayer. I love this team. you know, I need to build my own observability for my prompts, but you know, I'm a new, I'm a young startup, et cetera. It's secure, you know, it's in the VPC. I like to help our local New York businesses and they're doing a fantastic job. Now, you know, observability is so important thing. So that was a decision, like we decided to go with PromptLayer in the beginning. Now.

you know, in the future, maybe I'll need to build my own and such. Who knows, right? Like, you know, PromptLayer might have an offering that you can run in your own VPC and I will pay them a little more, but you know, as long as I can control it. So with this build versus buy, I always look at how much it costs me to keep the lights on for a system. To me, that's like: what happens if the engineer that built it decides tomorrow that he's going to Hawaii for two years and he's gone? Like,

RBA (01:06:29.412)
How much technical debt do I have to maintain, et cetera? So these are the questions you ask as a builder. But you need to ask yourself every time. Some companies just don't even ask, like AFDAT, but it's also related to ego and job security, and there are other aspects, like if at a company your promotion is based on how you make an impact.

Of course, as an engineer, I want to build a first-of-a-kind something. And I'm not doing it just because it's important for the business, but because it's important for my career. Make the decisions based on the right answers. This is where models like OKRs come in, but tied to your business goal. Like that's how you should do it in general, in my opinion. You know, I'll give you an example. You know, we were debating building on Vercel, like v0, like the entire UI, et cetera. We decided not to go because

you know, in early tries it was a little, you know. Now we are debating, we might migrate next year and change the entire thing. There is another company, like it's rippling right now. Like see, there are so many things happening that I always like to make, it's something you learn at Amazon, two-way door decisions. So always try to make two-way door decisions. If you're making a one-way door decision, make sure that

It's your bet, right? You make a bet. So this is where you get some consulting, talk to people. Like me as a founder, I'm not an AI expert by any means, right? There are so many unknowns. So I have my own, you know, I know this researcher from this company and that researcher, like, hey, what do you think? I always get consulting from someone just to be a sounding board, right? And rather than, you know, telling people that you are an expert or something, you know, be humble,

we still don't know many things, right? So anyway, it's a fun journey and I really thank you for having me. I'm doing this podcast with you. It felt like sitting with my friends in a bar and it's fun. So next time, hopefully we can do it live over a tea or beer.

Nitay Joffe (01:08:43.03)
Likewise, definitely. It's been a real pleasure. Some great, very sage advice there for people and very interesting to see the roots and history of the data world and AI and where it's going in the future. Yeah, so we thoroughly enjoyed the conversation. I'm looking forward to the next one. Thank you.

RBA (01:08:57.764)
Great.


Listen to Tech on the Rocks using one of many popular podcasting apps or directories.
