Episode 17 · Incremental Materialization: Reinventing Database Views with Gilad Kleinman of Epsio


52:19


Nitay (00:02.392)
Gilad, thanks for joining us today here on Tech on the Rocks. Why don't we start with some brief background about yourself and some of the things you've been working on.

Gilad Kleinman (00:10.734)
Sure. So I'm Gilad, Gilad Kleinman, one of the founders of Epsio. I actually started learning to code together with my co-founder Maor back in high school. Originally I worked mostly on backend-related stuff; I was very much in the PHP world, worked a couple of freelance jobs and at a Silicon Valley startup. As time progressed, I shifted more towards the low-level development world.

In Israel, in 8200, I worked on low-level development, then after a while did a bit of kernel development and some security research, and had a really fun time looking at everything from the bits layer up to Python code, doing fun backend stuff. And for the past couple of years I've been working together with my co-founder

Maor on Epsio, a new and exciting incremental views engine, which I'm sure we'll talk about a lot today.

Nitay (01:18.562)
Before we dive into the Epsio stuff, how does one go from working on PHP to working on kernels? Typically people move up the stack, in my experience. So how did you get into that, and what was it about it?

Gilad Kleinman (01:31.404)
Yeah, that's actually a really good question. I know the classic path is moving up the stack, but there's something magical I always enjoyed about going down the stack, because you start from the use case and then you understand how the infrastructure can support it. Partly it was just because of the things I had to work on, which forced me to go a bit more low level, but it's also fundamental.

That's also the kind of approach we take at Epsio: there's something healthy about starting from the high level and then breaking down the fundamental blocks you need to support it. Because at the end of the day, your kernel is running a lot of Python code and communicating over the network, and there's a benefit to understanding the higher layers of the stack when working on the lower ones. Same when going into hardware:

it's important to understand the kernel code that runs, and then build hardware that makes it easier to run stuff on top of it. So it was a combination: the things I had to do were more low level, but also, as time progressed, I just went lower and lower, and I think that's a healthy process.

Nitay (02:50.488)
What kinds of projects or things did you do? Or perhaps tell us some of the things people often miss by not understanding the lower levels when they go and program these PHP things. Coming from that world and going to the kernel, I'm curious: what do people often just not understand about the lower levels, where they make assumptions that are wrong or do things that are inefficient?

Gilad Kleinman (03:20.14)
I think that's a good question. There are a couple of things that always surprised me as I went lower. The first is just how many options you have and how many trade-offs these lower systems offer you, whether it's, again, going from the Python code to the kernel, whether it's scheduling,

deciding which scheduler your program runs on or changing its niceness and giving priority to specific tasks, how memory allocation works, and a lot of pretty beautiful things you can do to optimize the upper layers. And going down that path, I

saw how many options and better trade-offs the lower systems offer; every time I'm surprised. Again, going from the kernel to the hardware layer, it's similar: you're surprised by how many different trade-offs the hardware can offer that you didn't think about. And in the database space too: you start off in backend, you use Postgres for everything, and then as you dive deep you understand that also within Postgres, how

bloat works, how there are a lot of index types you've probably never heard about that offer different trade-offs. And then you realize there's not just Postgres, there are Postgres flavors that also offer better trade-offs for other situations. So yeah, I think that's the number one thing: just learning how many trade-offs the lower systems can offer.

Nitay (05:08.718)
And to add onto that, I think there's a certain art, if you will, to taming the hardware and the operating system. What I mean is that oftentimes the calls you have at the application layer are hints or advice at best. In Java, you tell the system, I want you to run GC; you're really asking, can you run GC now? It might choose not to. When you nice a process and change the value, the kernel might ignore you completely.

When you do memory mapping and call madvise, you're giving it hints; it might ignore you. And so to a lot of people these feel like "I'm going to do this, and it's going to be so much better," and then they do it, basically nothing happens, and it's essentially the same program. Any learnings or experience you have on how you actually tame the system, and some of the understanding or things you've done that show that?

Gilad Kleinman (06:07.95)
That's a good question. I think it's a combination of trying to understand and playing with it a lot. That's another thing I learned: it's really important to understand things, but also really important to play with them and see whether something actually helps or not. Right now we're doing a lot of work on performance at Epsio, and

a lot of that has to do with how disk access is used, a lot of kernel-related stuff, how the CPU caches memory requests, and things like that. On the one hand, it's really important to understand the theory behind it. But originally, when we started, we were very focused on what would be fastest on paper, which is a good starting point. But I think

we made some mistakes pretty early on by relying on that too much and not just getting our hands dirty: iterating fast, doing micro-benchmarks, real benchmarks, actually putting it in a customer's environment and seeing if it actually helps. Because as you say, a lot of the time you change niceness or do something that looks really good on paper, based on the theory,

but these are very complex systems, and it's really hard to know at the end of the day if it's actually going to help. Sometimes it even helps, but hurts another component you have. So you change the niceness, and the component you care about runs fast, but it screws up other things you hadn't thought about. You just need to try and iterate.

Learning and understanding how these lower systems work is the first step, and just playing without understanding is probably a bad idea, but it's not enough. Even people who know the thing very well still need to get their hands dirty and prove their claims.

Kostas (08:19.604)
So, Gilad, can you tell us a little bit about Epsio and what Epsio is?

Gilad Kleinman (08:25.484)
Yeah, happy to. Epsio basically popped up from something we saw on pretty much every backend-related project we worked on. My co-founder, Maor, has worked on database-related stuff most of his life, whether as a backend engineer or at the Technion here in Israel, working on a couple of new theories in the data processing world. And when we, a couple

of years ago, started chatting about interesting stuff we saw in the database world, we saw that, in a very fundamental way, two things happen in pretty much every project we touched. Both of them are pretty obvious: there was more data as the company grew, and the queries became more complex. Although these two things are very simple, from a philosophical point of view,

when these two things happen, a gap opens up between the data we collected in our databases and the results of the queries we run. If, for example, I'm collecting salaries in my database, whether it's Postgres, Snowflake, or whatever it is, and at the end of the day I want to show my users the sum of salaries per department, then the more salaries I have in my database, the more work my database needs to do

to get from "here's a list of salaries" to "here's the sum of salaries per department." And the more complex the query I'm running, the more work my database needs to do; if I want to correlate that to last year, there's more execution to do. As developers, it was really weird for us that even though we knew these are really important queries, core to our business, that are going to run multiple times a minute, a day, or whatever it is,

pretty much every database on the market today is agnostic to the application running above it and basically just executes these queries from scratch every time you ask. And just as users, we really wanted to shake the database and tell it: hey, this is a really important query, I know it's going to run again, please save the previous result. And then, when there's a change to the underlying data, just adjust that previous result based only on the change,

Gilad Kleinman (10:50.742)
without running a full refresh. So for example, for the sum of salaries, we would save the original result of that query, and when Dave joins engineering, that theoretical engine could just take the new salary and add it to the previous result. That way you get something similar to a cached result: it's instant, because it's pre-calculated. But unlike a cache or a materialized view in Postgres,

it's always up to date, meaning you never need to refresh it. You never need to run a full calculation to get instant and up-to-date results for the queries you know are important for your company. And that's basically what we do at Epsio; obviously that materializes into a specific product, architecture, and tech, but fundamentally we help companies make their queries faster and cheaper

using these incremental strategies.
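
To make the example concrete, here is a minimal SQL sketch (the table and column names are illustrative, not taken from Epsio's docs): the query the application cares about, and the kind of tiny adjustment an incremental engine can conceptually make when one row changes, instead of rescanning the table.

```sql
-- The "important" query the application runs repeatedly.
SELECT department, SUM(salary) AS total_salary
FROM salaries
GROUP BY department;

-- A new row arrives: Dave joins engineering.
INSERT INTO salaries (employee, department, salary)
VALUES ('Dave', 'engineering', 100000);

-- Conceptually, an incremental engine does NOT rescan `salaries`.
-- It adjusts only the affected group in the stored result:
UPDATE salary_totals
SET total_salary = total_salary + 100000
WHERE department = 'engineering';
```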

Kostas (11:53.16)
Okay, that's great. Before we get deeper into the tech and the business, I want to take a step back and connect with your background and your journey until you started Epsio. As Nitay was saying, you have this very interesting journey of starting, let's say, higher on the stack and going lower in the stack. And you did go pretty deep, right?

You almost scratched the silicon with the operating system stuff you were doing. From this whole spectrum of things you worked on, how did you end up in data systems and databases, and why was that part of the stack, in the end, what convinced you to dedicate a big part of your life to building Epsio?

Gilad Kleinman (12:51.49)
Yeah, so at the end of the day, when I talk with companies and ask them where their biggest bottleneck is, usually it's the databases, at least for the companies and people I talk with. And I think there's something really interesting in the database space, also as a problem space that's really interesting to solve,

because, honestly, I love the Linux kernel, but if you build a 2x better Linux kernel, I'm not sure how much that would change things for companies, because it's already pretty well fitted to what companies use it for. In databases, I feel like there are so many trade-offs: incremental views, compute-disk

separation, OLAP, OLTP; there are so many different use cases and things to optimize that I feel there's still a real gap from the ideal, and closing it is very possible. Unlike with the Linux kernel, I really believe that if we talk in 20 years, you'll be able to run things at least 20 times faster for a fraction of the cost. I think the trend is much more interesting there. And also,

fundamentally, it's a fun place that combines all the worlds. In a single place you can talk about how to write an ORM migration, and about CPU caches and ARM architecture, and that's a single block of code.

Kostas (14:25.437)
Yeah.

Gilad Kleinman (14:44.534)
Well, not a single block of code, but a single product that does both. And I think there's something really magical about how it touches pretty much everything in the software world and how core it is to what companies do.

Kostas (14:59.956)
Yeah. And I think, at some point, we should do a little bit of a historical review on that. Historically there was always a little bit of friction between the database and the operating system folks, right? With the database folks being: okay, whatever, I'll manage my stuff on my own, I don't need you, you'll never accommodate my needs. And I think

things today, or in the future, will be even more interesting. There have been a couple of publications lately where database researchers are trying to build every component of a database on top of unikernels, stripping down the operating system as much as possible and creating, in the end, a nice fusion between a database and an operating system.


Kostas (15:56.916)
Anyway, that's probably a conversation for another episode, with probably very interesting things to talk about. Let's go back to Epsio. You described the need for materialization and why materialization is important. I think people, depending on where they come from in terms of

interacting with database systems, might have a slightly different perception of what a materialized view is. Someone coming from data warehouses, for example: in Snowflake, for a very long time, a materialized view was a very, very simple thing; you couldn't even join between tables in one. And that's probably different from what a materialized view has traditionally been in the Postgres world.

And for a good reason, probably, because people just have different needs, right? When they're doing analytical work, the workload also affects, let's say, what kind of requirements we have around materialized views. So tell us a little bit more about how Epsio understands or defines a materialized view. And with that in mind, who is the user of Epsio? Who should

really care, at the end of the day, about what Epsio brings when it comes to materialization?

Gilad Kleinman (17:34.498)
Yeah, so I really relate to what you said about a materialized view meaning different things in different places, and I always try to understand the context when somebody talks with us about materialized views. For us, the definition is basically: materialized, meaning something that sits on disk and holds the result of a query you defined,

meaning it always has to represent a SQL query, not some custom code or something like that. It sits within a table, and it's accessible instantly. That's the core definition. Other properties beyond that can vary a lot. For example, we like to talk about ourselves as offering incremental materialized views. So,

definition-wise, for me a materialized view is just a saved, captured result of a specific query. Then there are other properties: whether it's incremental, whether it's always up to date, what operators it supports. For example, you mentioned that some databases have materialized views but don't really support any SQL query you might throw at them. So there are a lot of different variations, and

what we try to do at Epsio is focus only on that and offer the best materialized views we can: which means they're incremental, they're always up to date, you can throw any SQL you want at them, whether it's joins, CTEs, and, as we're supposed to announce next week, recursive queries, and also that they can handle your scale.

Because even if your materialized view supports joins and does them incrementally, if it's not effective when you have a lot of changes on your database, that's another property that really matters. So, long answer to the question: I think a materialized view is a snapshot of data, and then there are a lot of variations on top of that, and we just try to offer the best

Gilad Kleinman (19:56.63)
variation of materialized view we think is out there. Regarding who it's relevant for and who would find it interesting to learn about and use materialized views, one of the things I'm excited about is that the use cases are pretty wide. In the long term, like you mentioned

Snowflake, for example, whether it's analytics and BI use cases, more on the data engineering side, I think there are a lot of applications there, for example running dbt models and materializing every stage incrementally using materialized views. Or, in the space we're currently more focused on,

the transactional world of Postgres, MySQL, and MSSQL, which is where we decided to start. There, basically, what we offer is incremental materialized views for Postgres, MySQL, and MSSQL. And given these databases are usually customer-facing and backend-related, it's usually helpful for backend developers

or backend organizations trying to ship complex features on transactional databases.

Nitay (21:27.87)
So tying back to what you said a little earlier, help our listeners understand: if I'm a developer, I've got an app, I'm hitting my database, it's hammering it. I don't have up-to-date results, so I say, okay, I'm going to make a view. I know it needs to be a materialized view, because otherwise I'm just running the query every single time. So I create a materialized view, and it seems to solve the problem. Time goes by, and eventually something gets too slow, the lag is too big, it's not behaving the way I want it to.

What's the next step? How do I then navigate, to your point, all the different nuances of different kinds of materialized views? Where do I go, what are the things I should know about, what should I think about, to then say maybe I need an incremental materialization engine such as Epsio?

Gilad Kleinman (22:15.756)
Yeah, so specifically in the world of Postgres and MySQL, the variance in materialized views isn't that high; MySQL doesn't even have materialized views, it's just a view or nothing. So usually the flow would be: you have a heavy query, probably doing a couple of joins, because that's usually what makes queries slower. Things progress, you get more data, you add more joins, it gets slow. Maybe you try a

materialized view, or in MySQL you just dump the query result into a table and refresh it manually once in a while. But then, as time passes, either you get more data and the refreshes take too long, or your customers complain: hey, I changed an item but I don't see it change in my dashboard. And that's usually the point where we talk with companies, usually after they've tried a couple of

iterations, whether it's materialized views or some other pre-calculation mechanism or optimization they had.
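
For readers who haven't hit this wall yet, the pattern Gilad describes looks roughly like this in plain Postgres (table and view names are illustrative): the view is fast to read, but every refresh recomputes the whole query.

```sql
-- Classic Postgres approach: snapshot the heavy query once.
CREATE MATERIALIZED VIEW department_totals AS
SELECT department, SUM(salary) AS total_salary
FROM salaries
GROUP BY department;

-- Reads are now instant, but the data is frozen at creation time.
SELECT * FROM department_totals;

-- Keeping it fresh means re-running the full query, typically on a schedule.
-- (CONCURRENTLY avoids blocking readers but requires a unique index on the view.)
REFRESH MATERIALIZED VIEW CONCURRENTLY department_totals;
```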

Nitay (23:28.908)
Any sense of where that line is? Where do you typically find the limits of a homegrown solution, the "I'm just going to recompute this myself every once in a while" approach? At what point do you cross the threshold of needing an incremental materialized view engine that manages the whole thing?

Gilad Kleinman (23:51.95)
Yeah, so it's totally up to the specific use case. One of the learnings for us at Epsio is that when a company talks with us, we just ask them: what have you tried so far to improve this query's performance? And if the answer is "we just added an index" or "we just did something small," sometimes it's too early.

If you just need to add an index and that solves the performance issue, it's probably a little too early. It usually gets interesting when you've spent hours, days, or long stretches of time on it and you feel like you're hitting a wall; then it's probably a smart time to talk with us. Because if a materialized view does the job, great.

I'm happy about that; Postgres materialized views solve a lot of use cases, and that's great. We're more relevant for use cases where freshness is important, where refreshing takes a lot of resources, or where companies just can't go down that path. But I don't think there's a specific point in time where I'd actively say to a company,

"using a materialized view is wrong." If it works for your customers, for the use cases you're using it for, great. I'm a big fan of Postgres, and I'm not going to encourage anybody to use Epsio anywhere I know it's not valuable.

Nitay (25:31.31)
I'm curious, tying back to where we started, tuning the depths of the system: do you guys treat different kinds of query patterns as different kinds of incremental materializations, or is it all the same? Meaning, for example, to your earlier point, a query where data is just getting appended in huge volumes and I need to do regular roll-ups and plus-ones, versus a query that's doing

a lot of deltas and replaces, or actually doing updates and deletions of past data, versus, to your point, one that's doing 10 joins every time. Do you actually detect that and treat it differently, or is it all one system, one type of materialization for you? How do you think about it?

Gilad Kleinman (26:15.598)
So internally we talk a lot about how we give better trade-offs for different queries, different patterns, things like that. The way we like to look at it is that there are trade-offs that are always good to make for specific scenarios, and there are trade-offs where it's not a win-win situation.

But yes, we do try to optimize. Yesterday, for example, we published a blog post on the way we treat symmetric and asymmetric joins differently. If you're joining two big tables versus joining one big table to a small table, there are a lot of different things you can do to optimize each case. And there, for example, it is a win-win in terms of trade-offs, because you can very easily detect which case you're in

and know that for option A you want one join strategy and for the other a different join strategy. So we do try to adjust based on the query and also based on the workload. For example, we usually treat freshness as a first-class citizen, meaning we'll always try to work on small batches, so the results

of an insert are reflected as fast as possible. But if we see that we're not keeping up with the changes (there's per-batch overhead), we slowly make the batches bigger and bigger, which hurts latency a little but improves throughput, which is what you want if you're close to not handling the incoming rate. So again, everything in databases is a trade-off; we

try to fine-tune the trade-offs for the specific situation, whether it's the SQL query, the data (say, joins between very different data volumes), or the insert pattern we're receiving, and just offer the best trade-offs for whatever a company throws at us.
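
A rough illustration of the symmetric versus asymmetric join point (table names invented for the example): the shape of the two inputs changes what an incremental engine should keep around.

```sql
-- Asymmetric join: huge fact table, tiny dimension table.
-- On a change to `orders`, an engine only needs a fast lookup into `countries`,
-- so keeping the small side fully indexed (or cached) is cheap and effective.
SELECT o.id, o.amount, c.region
FROM orders o
JOIN countries c ON c.code = o.country_code;

-- Symmetric join: two large tables.
-- A change on either side can match many rows on the other side, so state
-- for BOTH inputs has to be maintained, and the strategy looks different.
SELECT o.id, s.shipped_at
FROM orders o
JOIN shipments s ON s.order_id = o.id;
```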

Kostas (28:33.844)
Gilad, can you tell us a little bit about how Epsio works with existing databases? I find it very interesting because materialization is one of those things that usually touches storage and a lot of the internals of the database, right? So someone would assume that to build, let's say, a new way of doing materialization, you'd probably come up with your own

database engine, or you'd need to be a plugin or extension to Postgres. But you mentioned a couple of different vendors you support. So tell us more about that. The reason I want to start there is that I'd love to hear more about the technology, how you solve the problem.

Incremental materialization is one of those things that clicks for everyone: whenever I have new data, I just add something to the result. It's the first thing we'd all think of, but depending on the operators involved it might actually be a super complex problem to solve. And there might be limitations, which, by the way, would be great to hear about: operators that

cannot be supported, or that can be supported but hurt performance a lot. So tell us more about that, but let's start with how Epsio integrates and works with existing database engines.

Gilad Kleinman (30:14.274)
Yeah, sure, happy to. To shed a little light on why we chose this integration, before I dive deep into it: as general context, as you said, materialization has been around for a long time, and slow database queries have been around a long time. But

to your point about supporting SQL operators and doing it for real use cases, there have been a couple of theoretical advancements in the last few years: projects like Differential Dataflow and Timely Dataflow, papers and projects that lay the theoretical groundwork

for doing incremental calculations at scale for real-world queries, and leapfrog that into something that could be used for enterprise and big use cases. Having looked at that, we saw all these new technologies pop up, and when talking with companies and asking ourselves why they wouldn't adopt them, we

thought that the gap had shifted from a theoretical one to an engineering or productization one. Meaning, if seven years ago the major gap for wide adoption of incremental materialized views was theoretical (it wasn't clear how to do these things, how to treat changes, how to consolidate them), now these are mostly solved and there are good frameworks for dealing with them.

It's more a question of how easy it is to implement. Does it use memory? Does it use disk? How does it work with my existing database? How do I trust it? Where is it deployed? Et cetera. So in building Epsio, our general approach was to focus on that: how do we make it extremely easy to use, easy to trust, and easy to integrate.

Gilad Kleinman (32:36.802)
With that, the first thing is to work with the existing database a company has. There's a quote I really like from one of our early customers, that migrating a database is like changing a car's engine while driving, which I really relate to. So we plug into the customer's existing database and consume the replication stream from it, each database in its specific way:

Postgres logical replication, MySQL's binlog, and MSSQL's CDC. We consume the stream of changes and then write back to result tables that are materialized within the original database. So we kind of hide behind the database and expose an interface such that the backend isn't even necessarily aware there's an Epsio there. Whether it's view creation, which is basically just calling a stored procedure within the original database the

company already has, or querying the results, which is just a result table sitting in the database. We try to make it as seamless as possible to integrate. And yeah, that's pretty much it.
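
On the Postgres side, the building blocks Gilad mentions (logical replication out, ordinary tables back in) are standard Postgres features. Here is a minimal sketch of what that wiring looks like, with invented object names; this is not Epsio's actual setup procedure.

```sql
-- Publish the source tables so an external engine can stream their changes.
CREATE PUBLICATION epsio_pub FOR TABLE salaries, departments;

-- A logical replication slot tracks the engine's position in the WAL,
-- so it can resume from where it stopped after a restart.
SELECT pg_create_logical_replication_slot('epsio_slot', 'pgoutput');

-- Results come back as a plain table in the same database;
-- the application just reads it like any other table.
SELECT * FROM department_totals WHERE department = 'engineering';
```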

Nitay (33:52.046)
There are a few things you said that make this a very interesting approach, and perhaps counter to the one many people take, but I know you've thought about this a lot, so I'm curious to hear more. What you didn't say is that you sit in between and the application queries you directly. You're sitting off to the side, and you're not, for example, a Postgres extension, right? Famously, you put pg_ in front of anything and it probably exists; Postgres has an extension for everything.

So why not take one of those approaches that you see many vendors typically take?

Gilad Kleinman (34:24.44)
Yeah, those are great questions. I think "easy to adopt" doesn't necessarily mean the fewest steps; it's also about how trustworthy, or how scary, the action is. And what we found is that going down the extension route, first of all, most managed databases like RDS don't support installing extensions that aren't officially supported.

Beyond that, it's a pretty scary operation, because if you install an extension and it crashes, your entire database goes down. Or if you put a proxy between your backend and your database and it crashes, all your database traffic goes down. So although it's not the classic, as you said, pg_incremental_views approach, there's something really appealing in our approach, because at the end of the day,

first of all, we don't have permissions to anything other than the result tables we maintain, plus read permissions on the base tables. So if Epsio does anything funky, we can't really mess up anything other than the Epsio views. Even more than that, since Epsio is just responsible for updating the results, if Epsio crashes, the results are still available; they're just not up to date. And when Epsio restarts, we just continue where we stopped,

and you get the results refreshed a couple of seconds after the Epsio instance restarts. So although it's not as quote-unquote easy as installing an extension on your Postgres, I think it's fundamentally easier, because it's easier to approve internally, easier to trust, easier to iterate. We talked about playing with things and seeing how they work: you can plug it into your database, and it's not

that scary, because we can't really mess up anything, and that lets companies iterate much faster with this architecture.

Kostas (36:31.092)
So, okay, you consume the feed from the database with the changes that happen on the tables you care about, right? What happens next? What's going on inside the mind of Epsio before the results are written back to the database?

Gilad Kleinman (36:52.312)
Yeah, so that's where the magic happens. Let me try to break it down into the relevant components. As you said, we consume the stream of changes. We're a Rust shop, so we use Rust for pretty much everything. Every view has its own execution engine, and when you define a query,

it's broken down into a lot of logical operations that can be performed incrementally. Even a query that's a thousand lines of SQL is, at the end of the day, still a lot of joins, filters, order-bys: a lot of simple operations. So we break it down into a dataflow of very simple operations, and when CDC events arrive, we stream all the changes through that dataflow graph. Now, one of the interesting things is that

some operators are stateless, for example a filter. If your entire view is "select from table where id = 7," you don't need any context to know how to materialize based on a change: if a new row gets added, you check whether it equals 7; if it does, you pass it along, and if not, you drop it. But there are operators that are stateful. The easiest example is group by, where, say, you want to know the count of elements in the table.

So another layer on top of that: for stateful operators we also have disk backing them. Specifically, we use a flavor, a small fork we maintain, of RocksDB to save data efficiently for these operators. So for a count group-by, for example, we'd save the number of elements, and if a new row gets inserted, we just read the last result, add one, and send that update

further down the dataflow graph. That propagates until you hit the sink operator that writes back to the database.
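
Two tiny view definitions make the stateless/stateful distinction concrete (illustrative SQL, not Epsio syntax):

```sql
-- Stateless operator: a filter. A change can be handled with no stored context.
-- If the new row has id = 7, pass it downstream; otherwise drop it.
SELECT * FROM events WHERE id = 7;

-- Stateful operator: a group-by aggregate. To process one inserted or deleted
-- row, the engine must read the previously stored count for that group
-- (kept in disk-backed state, e.g. a RocksDB-style store), adjust it by +1/-1,
-- and emit the updated row downstream toward the sink.
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department;
```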

Kostas (38:58.58)
Are there any operators that are not supported?

Gilad Kleinman (39:03.052)
Recursive queries are something we're releasing soon. Other than that, Postgres, MySQL, and these other databases mostly have a lot of functions; more than operators, they have a lot of functions. We try to support as many as we can, but that's still the place where, if a company tries to define a query and it's not supported, it's usually a function we don't yet support.

We try to iterate on those as fast as possible. I'd say the functions are probably the hardest part, just because Postgres and MySQL have so many of them.

Kostas (39:41.46)
Yeah, what about UDFs? The user defines a function that can be pretty opaque from the outside in terms of what's going on inside. Is that something that can be handled or not?

Gilad Kleinman (39:57.08)
Yeah, that's a great question. Right now we don't support UDFs. It's something we've been thinking about, and I'm pretty sure we will support it later down the road. Whether it's regular Postgres or MySQL UDFs, or, there are a couple of other vendors doing really cool stuff that, for example, let you compile Rust code as a UDF, and then you can

do pretty fun and interesting things, like calling APIs. I think UDFs are a really interesting path to go down. We just know that if we want to go down that path, we want to do it really well, so we haven't started yet, but that is a limitation we have today.

Kostas (40:43.442)
Yeah. So how important are UDFs in the transactional world you're operating in? Because, again, if there's something I really love about the conversation we're having today, it's how many times you've used the word trade-offs, right? And it's true, that's what engineering is in the end, right?

I think in the database world, the workload is kind of the abstraction around these big sets of common trade-offs. So how important are UDFs, actually, in the transactional world compared to something like the OLAP world, for example?

Gilad Kleinman (41:30.818)
Yeah, so, honestly, I don't think they're that core. One of the reasons we decided not to go down that path is that most of the companies we talk with don't really use them that widely, and even if they do, it's usually simple SQL operations that companies can replace. I do think that as we expand to additional use cases, whether it's OLAP, whether it's more the

Kostas (41:42.665)
Mm-hmm.

Gilad Kleinman (41:56.494)
KSQL-style streaming use cases, these might become more relevant. But so far we haven't seen it as a big blocker, and once we do, we'll know to shift focus onto it.

Kostas (42:14.044)
Yeah. Is there any operator you'd say was the trickiest to implement? For me, for example, anything that has to do with windows is always hard to reason about. I don't know how you'd even do that incrementally, or how you'd debug it if something goes wrong, right? So tell us a little more about the

hard and fun things you end up having to solve when doing incremental materialization.

Gilad Kleinman (42:51.234)
Yeah, so every operator is easy, quote-unquote, if you don't have to make it performant; the real question is performance. For example, with window functions we started from a pretty naive implementation, and then it got more and more complex as we wanted to improve the performance. So the tricky thing is doing something incrementally, quote-unquote, well:

it's a spectrum. You can do something incrementally in a very naive way, and you can do it very efficiently, whether it's parallelization, making sure you're not doing redundant work, or how you work with RocksDB. That's where the spectrum lies. Specifically, window functions were, I think, the most fun. Hopefully we'll publish a blog post on how we did it.

I don't know why, but it's just very logically confined and fun to work on. I think recursive queries are probably the hardest. They're also fundamentally different from the other operators because they're iterative, meaning you reference previous results, which is something almost no other operator has,

which really makes you think about edge cases, and doing it efficiently is significantly harder. And as you said, the edge cases and debugging, and what happens if you have infinite recursion; you can't have an infinite window function, so it's a whole different world to go into. I think that was the hardest to think about,

though there's also something fun about it being hard.
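
For readers less familiar with recursive queries, this is the kind of construct being discussed: standard SQL whose output feeds back into itself, which is what makes incremental maintenance (and termination) tricky. The table is illustrative.

```sql
-- A recursive CTE: walk an org chart from the CEO down.
-- The recursive term references the CTE's own previous results,
-- so new output rows can produce further output rows on the next iteration,
-- unlike a join or group-by, which only looks at its inputs.
WITH RECURSIVE reports AS (
    SELECT id, manager_id, name, 0 AS depth
    FROM employees
    WHERE manager_id IS NULL          -- anchor: the CEO
    UNION ALL
    SELECT e.id, e.manager_id, e.name, r.depth + 1
    FROM employees e
    JOIN reports r ON e.manager_id = r.id
)
SELECT * FROM reports;
```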

Nitay (44:48.886)
Are there any corner cases where the performance of the incremental computation is so poor that it's almost not worth it? Meaning, are there edge cases where a particular operator, with a particular type of data, a particular recursiveness, et cetera, as you said, there's just not that much optimization you can do, period? And yes, you might get a little bit of savings, but for the amount of compute and effort and the caching with RocksDB and whatever else you do, it just doesn't end up being worth it?

Gilad Kleinman (45:19.63)
So fundamentally, the logical question, or the way we like to think about it, is: how much of your data set changes, and how frequently? As long as the answer is that most of the data set doesn't change most of the time, or doesn't change on shorter intervals than you query it, logically that usually means doing it incrementally, done correctly, would be more efficient,

because if most of the data doesn't change, then when you do a full read you're scanning the whole data set anyway, so compare against that. Obviously nothing is perfect and there are some small edge cases, maybe where we're not handling a specific left join in the most optimal way and enough of your data set changes, where it would be more efficient not to run incrementally. But

fundamentally, that's the way I like to think about it. I'll also say it's not always a question of efficiency; it's a combination of cost and speed. There are companies that open a dashboard once a week, and the whole data set changes within that week, but they want the dashboard to be instant when somebody opens it. So even though it's not the most efficient thing to do, they still want to use an incremental view,

because it offers a better trade-off for their use case: always instant results, with the compromise of using more compute because the whole data set changes all the time.

Nitay (46:57.658)
So to that point, tell us a bit about what the operator or user side of this looks like. I've got Epsio, I imagine, as some process in a K8s pod or something in my VPC or somewhere; it's reading my change log, it's writing back some tables. Do I register a query with you guys instead of querying my database? Do I take that same query, send it to you, and then go read from the result table?

Do I configure or tune anything? In what order do things happen? Can I see what the lag is, is that something you provide? What is the interface to the user?

Gilad Kleinman (47:34.894)
Yeah, so on that front, too, we try to be as compatible with Postgres and MySQL as possible. For example, to create a view, instead of doing CREATE VIEW in normal SQL, you call epsio.create_view, a stored procedure that gets installed within your database. You give the name of the result table you want Epsio to write the results to, and then you pass, as a string, the normal SQL query you want Epsio to maintain.

You run that, Epsio builds the intermediate state and dumps the result into the result table you defined. From that point onward, you can just query that result table like any other table in your Postgres. You can index it using regular Postgres, MySQL, or MSSQL indexes, partition it if you want. On the monitoring side, similarly to the way

you can query pg_stat_activity or another Postgres view that exposes Postgres's stats, we expose a couple of functions, an Epsio "list views" function for example, that inside your Postgres gives you all the views you've defined, their status, and their latency, meaning how far behind they are from the live data. We also have a Datadog integration and the like to feed that

into other systems if and when you want to monitor them. But again, for the basic interface, we try to be as compatible as possible with the way companies already use and monitor Postgres today.
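
Putting that flow together, it looks roughly like this from the application's side. The exact procedure names and signatures below are assumptions based on how Gilad describes the interface, not taken from Epsio's documentation.

```sql
-- Define an incrementally maintained view: a result table name plus the
-- query to keep up to date, passed as a string to a stored procedure.
CALL epsio.create_view(
    'department_totals',
    'SELECT department, SUM(salary) AS total_salary
       FROM salaries
      GROUP BY department'
);

-- From here on, the app just reads (and can index or partition) an ordinary table.
CREATE INDEX ON department_totals (department);
SELECT * FROM department_totals WHERE department = 'engineering';

-- Monitoring: analogous to pg_stat_activity, a function listing the views,
-- their status, and how far behind the live data they are.
CALL epsio.list_views();
```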

Kostas (49:18.228)
So there's always a little bit of overlap, I would say, or at least people who have been exposed to both might think about both solutions. You have the materialization you describe: traditionally it was something someone had to go and trigger, to recalculate the materialization, but now you

come along and say, hey, we can do it incrementally, so as soon as the data arrives your view is updated. And then you have the streaming systems, right? Streaming systems that can manage state, so they're not just stateless operators: Flink, for example. And it kind of feels like they

fit the same space, right? Or at least they expose similar functionality. Again, it might be different users, different workloads, but how does something like Epsio compare to these systems? Should anyone consider them as a potential alternative? And the reason I'm talking specifically about something like Flink is that

there are, let's say, use cases, or actually not so much Flink, let's consider something more like Pinot, for example, right? Something where you want to expose the richness of SQL to people, but you also want to keep very low latency between when the data arrives and when the data is available. And

I think the persona might be similar; we're talking again about people who are more toward product engineering, not about OLAP or whatever happens in the back office of the organization. So tell us a little about that. How is the landscape of these technologies changing? How do you see Epsio comparing to them? Is it something that coexists? Something that competes?

Kostas (51:38.228)
And what do you think is the future of these technologies? I mean, Flink has been around for, I don't know, a decade, maybe more. So tell us a little bit more about that.

Gilad Kleinman (51:52.27)
Sure, yeah, happy to. I'll connect this to things we talked about previously, going back to the importance of understanding the higher layers of the usage. A system like Flink, for example, is a good example of how we've seen things sometimes not work out when different levels of abstraction don't play well together.

Flink does have incremental SQL. But one of the issues we saw with companies trying to implement Flink is consistency. In Postgres you have transactions: you do a bunch of changes in a single transaction, and when you read, you usually expect a similar transactional promise, even if it's eventually consistent like a read replica, where you know it's not

necessarily the case that you do an insert and see it immediately, but you know the transactional chunks still stay intact. Options like Flink do incremental calculations, but they're very separate from the layer on top. For example, they don't promise internal consistency, first of all, meaning if, for example, you take 50 bucks out of your balance and you add 60 bucks,

or sorry, no, you add 60 bucks and remove 50 bucks, it might show you that you have minus 50 bucks before it processes the event for the plus 60 bucks. That doesn't make sense on a normal Postgres system, and it's something that would break things; we've seen companies try to implement similar systems and have things break, because the system wasn't built for the abstraction of the layer above it.

And there are even things like: when writing back to the database, you need to do things in a single transaction. The classical setup would be using Debezium to read from Postgres, push it to Kafka, have Flink consume that, and then either push it back to Kafka or Postgres, or use the built-in Flink JDBC connector to write to Postgres.

Gilad Kleinman (54:16.343)
Even if the engine is internally consistent, it's still not writing in the same transactional batches you had. For example, if I move Kostas from R&D to marketing in a single transaction, I want my results to reflect that in a single transaction too, and not delete from R&D, commit, insert into marketing, commit. These things, I think, are really important

when connecting problems from different levels. Some of those technologies are great, but I've seen a lot of effort go into stitching them together, learning how the different layers play together, and merging them. I think that's the major gap in the market today, because there are amazing technologies,

doing great things, but it's really a question of how well it fits my use case, and whether I'm building a completely new system or just adding something that helps with what I already do. Having said that, there are use cases: if you're mostly doing stream processing, doing it very heavily, and your focus is not databases and things like that, that's great. These systems, if they're the main focus and that's

where you're spending most of your time, that's also great. But they're mostly built to be the center of focus, and I think a lot of the time that's not what you want.
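
The consistency point is easiest to see with a concrete transaction (illustrative schema): an internally and transactionally consistent view should never expose the state between the two updates, or split one source transaction into several commits downstream.

```sql
-- One application transaction: +60 and -50 on the same balance.
BEGIN;
UPDATE accounts SET balance = balance + 60 WHERE id = 42;
UPDATE accounts SET balance = balance - 50 WHERE id = 42;
COMMIT;

-- A pipeline that processes the two change events independently (e.g. via
-- separate Kafka topics or batches) can momentarily materialize balance - 50
-- without the +60, a state that never existed in Postgres. Preserving the
-- original transaction boundaries when writing results back avoids that.
```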

Nitay (55:55.502)
And I'm curious, last question, because you mentioned something very interesting there. It sounded like you're saying that architecturally speaking, basically, Epsio will guarantee you that you'll never get something out of order in terms of the events, the changes that are happening to the underlying database, whereas some of these streaming SQL systems and others don't have that guarantee. Is that a particular...

design decision they made, do you think, or is it something inherent in the architecture, as soon as you slap Debezium in and do streaming joins and these things in Kafka and so forth, that you just have to bite that bullet? How do you see that?

Gilad Kleinman (56:36.684)
I think it's kind of a combination. First of all, Flink, for example, really doesn't promise internal consistency, so by definition they don't try or claim to solve that issue. There are other incremental processing engines that are internally consistent but still don't do what we talked about regarding transactions, just because it's really hard

when you're doing it through Kafka, for example. If you're reading from Postgres with Debezium and then reading from Kafka later on, you need to synchronize: Debezium pushes each table's changes to a different topic, and it's really hard to synchronize between topics when you're doing it like that. You want to natively integrate with the database, because otherwise I'm not sure it's even possible

to synchronize across topics. So I think the truth is in the middle: it's really hard to do with the existing tooling while acting as a very separate stream processor. It might be possible, and they might be thinking about it, but I think it's pretty far off when done that way.

Nitay (57:56.526)
Cool, makes sense. OK, we have to wrap, we're out of time. Any last thoughts, anything else you want our listeners to be thinking about, or perhaps where you're going in the future? Where do you see the future of Epsio and materialized views heading?

Gilad Kleinman (58:13.076)
Yeah, nothing specific as far as other thoughts. I think the incremental world and incremental stream processing are a really fundamental change in how companies can approach and query data, treating queries as first-class citizens instead of just trying to be as fast as possible for any query you throw at the database. So whether it's in the world we're focused on,

transactional workloads, whether it's the world of Flink and stream processing, or whether it's OLAP, I think exciting years are coming. I really think we're going to see these technologies boom and help companies build better products, letting them focus on the things they're supposed to do instead of optimizing database queries or spending a lot on compute. Yeah, and that's it.

Nitay (59:12.446)
I couldn't agree more. There's lots of value to be had in making databases manage data even better while not having to worry about it as a user. So, Gilad, thank you very much for joining us. This was a fantastic conversation. I really enjoyed it, and I'm looking forward to having you back with us again.

Gilad Kleinman (59:29.678)
Thank you very much. It was great being here.

Kostas (59:31.06)
Thank you so much for that.
