
Proving Code Correctness: FizzBee and the Future of Formal Methods in Software Design with FizzBee's creator JP


Kostas (00:02.168)
Hello, JP. It's so nice to have you here. I'm very excited today, and I'm sure Nitay is also excited; we'll see later why. He had some ideas in the past that are very, very related to what you are doing. And we are going to be talking today about some things that everyone in the industry has heard of at some point, like formal methods, proofs about the correctness of code, and all that stuff.

But I think very, very few have seen how these things are used in practice, and most people are probably a little bit scared of them. Now, you are working on a project that is related to that, and I think we have a lot to learn from you, very exciting things. That's going to be the main focus of this episode. But before we do that, let's do a quick introduction. Can you introduce yourself to our audience?

Jayaprabhakar Kadarkarai (01:02.597)
Yeah, thanks, Kostas and Nitay. I'm also excited to be here. I'm Jayaprabhakar; you can call me JP. For the last eight months or so, I've been working on this product called FizzBee. It's a formal methods tool. Before that, I worked at other companies, startups and big companies like Lyft. Clumio was my previous job, where I was for four years.

Before that, I was at Google for almost 12 years, and a few years at another company before that. So most of my time has been on server-side applications, predominantly back-end applications, and a little bit of ML here and there. That's primarily my background.

Kostas (01:56.364)
Sounds good. Okay, before we go to FizzBee and formal methods, I want to ask you about your experience at Google. You joined quite early and were there for a while. So I'd love to hear how you experienced the growth and the transformation of a company like Google, which is pretty unique, and how that also impacted you as an engineer. And the reason I'm asking is because I'm trying to trace

the path that got you to what you're doing today. So tell us a little bit more about that.

Jayaprabhakar Kadarkarai (02:33.104)
Yeah, so Google is actually a really interesting place, and Google had a huge impact on most of the modern development practices we talk about. Whether we talk about big data or cloud, pretty much it started there. Now everyone talks about Kubernetes, and Google was doing that right from its early days. With Docker, the container services, Google open-sourced the underlying pieces as part of the

kernel, and that's where Docker used it to make containers usable and popular. So a lot of things came from Google, and I definitely had some experience working with and talking to the people who actually built them, and reading the initial design docs and why something was done the way it is. Some of it may not be ideal from today's perspective, but back in those days it was really good.

One thing I found interesting at Google was that a lot of engineers were actually really good at actual engineering, not just programming. People would talk about a lot of math and probability, a lot of advanced statistics, and then back to coding they could work in many languages; it wasn't purely coding only, data structures only. Exceptions were pretty rare. But during that time, one thing I learned really

well was how to think about fault tolerance and consistency, especially eventual consistency, which is hard for most people to reason about and hard to actually test in a distributed system. So you have to build it in at the design phase; you have to say, okay, by design things will work correctly. Which I thought would be the norm,

until I quit Google and joined other companies, where it was actually harder. It was not actually the case.

Kostas (04:34.55)
Okay, that's super interesting. A question here, because you said something that triggered my curiosity a lot. You mentioned engineering and coding and you made a distinction there. And you said that at Google you saw many people who were not just coders; they were really good engineers. They could design things, engineer things, and then of course they could code too.

What's the difference? How do you differentiate an engineer from a coder or programmer, whatever term we want to use here? Tell us a little bit more about that. What makes the difference?

Jayaprabhakar Kadarkarai (05:11.313)
Okay.

Jayaprabhakar Kadarkarai (05:20.112)
Yeah, I don't know if there is any standard definition, but for me, think of an analogy. Let's say you're building a bridge. You have an engineer in the construction world. The engineer does some work, and the construction worker does other work, right? In software development, there is not much of a difference between the actual construction work, digging and operating the machinery, and designing things to ensure

it can withstand a certain earthquake in that zone, or wind, or how it would perform under a certain load, all those kinds of things. There's a clear distinction between that part and, on the other end, the actual construction work. That distinction is not that clear in software engineering, is what I feel.

There are multiple reasons that could have led to this. One of them could be that the number of software engineers is huge, so not many are actually going to do engineering. For constructing a bridge, you probably need only a few engineers for the different areas and then a lot of construction workers. Here we all have the title of software engineer, but not many have to work that way. And the growth is also slightly different.

And there is always a debate about a popular statement, I don't know who said it, questioning whether software engineering should count as engineering at all. So yeah, to me it's engineer versus developer. A developer is about building things. An engineer is about ensuring things don't go wrong

when you build things. It's all about what can happen when things don't work the way you expect them to.

Nitay Joffe (07:24.081)
There's a really interesting point there. What you're highlighting, which I've heard and seen elsewhere, is that the notion of engineering is a focus on rigor and quality, essentially. There's a certain distinction where the result you produce is super high quality and you take a lot of pride in that. And I think the approach I've seen many, many companies take at that is: okay, we have a bunch of, as you said, programmers, coders, et cetera, whatever it is.

We want to have that level of engineering rigor, so what's the natural next step? Well, let me bring in a bunch of QA people and they will help me solve that problem. And I think that's great, QA people certainly help and they provide a very critical role, but it seems like oftentimes it requires more than that. And I think Google really showed that very well: it's not just a distinct role, where there's a person over here who can make sure everything's high quality, but it has to go into the engineers' day-to-day ethos and the process of how you build software and so forth.

So I'm curious about what you saw at Google versus the other companies you mentioned, in terms of the entire SDLC and code process around building systems, to your point about building these resilient systems, thinking about eventual consistency and so forth.

Jayaprabhakar Kadarkarai (08:39.848)
Yeah, that's a valid point about QA. It's definitely needed, and you can't avoid it. But one thing I noticed, since you asked about the distinction compared to the other companies I worked at, is that the ratio of QA to developers was much lower at Google than at other companies.

I don't know why. I mean, I assume, or I want to say, it's because we had good development practices. But that doesn't really address the point completely, because the type of products they were building was also slightly different. That said, I'm not sure if I'm answering the question correctly.

Jayaprabhakar Kadarkarai (09:39.464)
The thing is, as you said before, engineers and developers take pride in quality. And when we say quality, there are different aspects. One is functionality-wise: everyone wants the product to do well. Functionality-wise, it looks beautiful, it gets a lot of adoption and all those things. On the other side, as an engineer, specifically as a backend engineer,

one thing we feel really proud about is: can we scale it? What are some new cool technologies coming in? We want to be ahead of others. And another thing was working with great people. Even if we don't actually work with them closely, you always hear them or read their papers, their design docs, their internal discussions

during development, and that brings in a huge amount of experience. They have learned a lot of things that worked and that did not work before, which at many other companies may not be true. Another aspect is that QA, whatever we think about it, and we may not all agree, actually has a

return on investment at the lower levels. You can quickly find most of the easy-to-find bugs. It's the harder-to-find bugs that become harder with QA too, because most of the time they are going to do the same things. That's where many other approaches come in, like formal methods, or automated testing with fuzzing, and a lot of other techniques.

Nitay Joffe (11:28.981)
So let's talk about that then. To your point, once you go beyond the initial set of basic things that can go wrong and you get into complicated race conditions and starvation and all these kinds of distributed systems problems, I'm curious: what was your first exposure, if you remember, to some of these more complicated, nuanced problems, and maybe the first time you tried some of these formal methods, TLA+ or whatever it was?

What was that like? Because it's obviously a very different world, a very different way of thinking about problems than traditional programming.

Jayaprabhakar Kadarkarai (12:04.346)
Yeah, actually for me, on the consistency and concurrency issues, I don't think I had any real production issues directly on my product, mainly because I had good theoretical computer science knowledge. So this is an area we were all familiar with. Also, in the older days, before big data became popular, people were actually focusing on atomicity,

consistency, isolation, durability, all those kinds of things; ACID properties were actually popular. It's only when things started to scale beyond that point that the first thinking became: okay, let's not focus on strong consistency. It's okay to be inconsistent for a while, as long as we can guarantee consistency eventually. So it was a different way of thinking compared to today, where

eventual consistency is kind of the norm. Many times now it's like, okay, everything is asynchronous, and then we try to add consistency to it. Whereas we learned it the other way, trying to keep everything strongly consistent and then gradually reducing consistency. Most of my exposure to eventual consistency actually came within Google. We always had: okay, these are the best practices, these are the bad practices.

And at least in the initial days, we always had a good design review process, because you always had many senior engineers who had experienced it. When I was starting my career, I had theoretical knowledge, but I didn't have the experience to know which parts of the theory are actually relevant in the real world. Because of that, I've faced scalability-related issues, but I rarely faced consistency-related issues.

And even at Google, in the initial days, we were always building systems where cross-region replication of data was present. Around 2009 to 2010, it started to become obvious that people were spending a lot of time ensuring consistency across data centers. That's where Spanner and other things came in. That made a lot of

Jayaprabhakar Kadarkarai (14:31.536)
development simpler. Whereas in the outside world, we still don't have enough usage of Spanner or related database solutions. With DynamoDB, you put it in one region; recently there are global databases or global tables, but they still don't guarantee the strong consistency that Spanner can guarantee. So development became a little bit easier within Google. That said, since we still have to

store data across multiple places, the single biggest thing I found valuable is that we cannot figure out everything that can go wrong and try to address each of those things. Instead, we always have to look at what is the minimum thing that must go right occasionally to ensure our system is correct most of the time.

That was a big difference I noticed compared to other places. This is where QA helps: in finding out all the things that can go wrong. But the problem is that addressing each of those things becomes incredibly hard. If you instead focus on only the minimum thing that has to go right for your system to be correct, then it becomes a lot easier. We can have a much smaller amount of testing and a much smaller number of

bugs, and even if there are bugs, the system is going to correct itself. Then it's only about the duration for which things can be inconsistent, and that can be optimized gradually. Yeah, that was a big learning for me.
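
To make that idea concrete, here is a minimal sketch in plain Python (not FizzBee syntax, and not anything shown in the episode). Instead of preventing every way two replicas can diverge, the design relies on a single repair step that must eventually run; the primary/replica dictionaries and the 20% sync rate are invented purely for illustration.

```python
import random

primary, replica = {}, {}

def write(key, value):
    # Writes land on the primary only; the replica is allowed to lag behind.
    primary[key] = value

def sync():
    # The one thing that "must go right once in a while": anti-entropy repair.
    replica.update(primary)

def converged():
    return primary == replica

random.seed(0)
for i in range(100):
    write(f"k{i % 5}", i)
    if random.random() < 0.2:     # the repair step only happens occasionally
        sync()

sync()                            # fairness: the repair step eventually runs
print("replicas converged:", converged())   # -> True
```

In formal methods terms, this roughly corresponds to a fairness assumption on the sync action plus an eventual-consistency liveness property, rather than a safety requirement on every individual write.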

Kostas (16:15.864)
Okay, that's great, JP. I want to go back to the metaphor you used for engineering. You mentioned building bridges, right? You have the civil engineer, and you have the workers who go and construct based on the plan the civil engineer has created. And the civil engineer has some specifications they need to meet, right? For example, if you are in California, where you have earthquakes,

you probably have to reinforce your bridges, right? But civil engineering also has very mature tooling, foundations, and best practices. The difference, in my opinion, between engineering and science is that engineering always has trade-offs, and we always have to try to balance those trade-offs. It's not like

we prove that something exists, and once we've done that it's a fact and there's nothing else to do, like in science, right? So the tooling there is known; it has matured over the hundreds of years that we've been doing civil engineering. What's the tooling in software and computer engineering? What's the equivalent tool set? And I would say best practices too, because I want to emphasize that best practices are very important

for engineering; it's not only, let's say, the tooling or the math behind it, the proofs. So in your mind, if you had to enumerate a few of them, the most important ones, because we could probably talk about these things for a very long time, what would that tool set be?

Jayaprabhakar Kadarkarai (18:06.63)
Yeah, that's actually a good point. Civil engineering has been around for many, many centuries, right from the time humans started to construct huts. But software is new, and it is evolving much faster than we can even keep up with. And yeah, definitely there are trade-offs. That said,

how do we ensure things are correct? That brings us directly to formal methods, because that's where I would default for that question. But it addresses one part of the problem. And even within formal methods, there are different techniques, different categories of things, right?

Ultimately, we need to ensure the system that we built meets the specification, and obviously there will be trade-offs. But there are other things too, like static analysis tools and automated testing. They also help catch a certain amount of bugs.

Automated testing, like unit tests, helps in finding regressions, so that is actually a huge tool. Static analysis tools, or fuzz testing, can help find bugs we haven't seen. Some of them, specifically static analysis, can prevent bugs before they happen: okay, this variable you did not mark as synchronized, or it's not locked

but it's accessed across multiple threads, that kind of thing, right? But those kinds of tools are not really available for distributed systems. Most of the static analysis we have today works on a single process in a single code base. Yes, it works. But the moment you talk about multiple processes, static analysis doesn't work, especially when the data is going to live in storage. So this is where another type of

Jayaprabhakar Kadarkarai (20:32.21)
formal methods comes in. Another one, traditionally the most popular one, is called proofs. This is way more popular in the hardware industry and in space, rocket science literally, right? But then you have to prove things, and it is way more mathematical. You need to ensure that, one, your design meets the specification.

Your implementation too; you can actually do what's called refinement. At each level, you prove that your implementation meets the design itself. But since it's so complicated, traditionally it has been available only to a certain class of industries and certain types of problems, and only very few people can use it. With advances in computing itself,

it enables a different category of tools called model checking, where we can actually evaluate all the things that can happen, explore the state space, every possible state, and then figure out whether it would still meet the specification. So there are multiple tools for the multiple approaches. There are some, like Antithesis.

Antithesis does it at the implementation level itself. It runs simulations with different thread schedulings and randomizations and different types of errors to figure it out. And the other class, like TLA+ and FizzBee, which I'm working on, and a few other tools, are actually suitable for the design level,

where you hide a lot of implementation details. Because with Antithesis, you're testing after you've built it. So if you find some bugs, you've already spent a significant amount of time on it. In that case, people are not going to throw away the old design and rewrite everything; most likely we are going to do patches. But with these tools, we are focusing on the design level, so that we can quickly

Jayaprabhakar Kadarkarai (22:58.362)
ensure that our high-level designs meet our specification, and then we can see how to implement them. Most likely there will always be some bugs that we miss. But if the design is correct and people are genuinely trying to implement the system to meet the specification, then usually, even if there are bugs,

they can be fixed, because it's going to be an implementation bug, not a design bug. Traditionally it's also expected that if a bug found at design time costs 1x, one found at coding time is around 8 to 10x as expensive, whereas post-release, in maintenance, it can go up to 100x, because sometimes you may have to deal with refunds, customer issues, and a lot of PR issues as well. We had some issues where

we overcharged the customer. So one, we have to refund, which creates more issues. Then during the next renewal cycle, they're going to come and complain, and especially if you're a startup, sales then have to give an even steeper discount. So it affects revenue as well. Finding issues early avoids all these expenses.

That's where these tools come in: TLA+, P, and FizzBee. There are a few other tools that can also do performance evaluation, which TLA+ doesn't do; we are including that in FizzBee too. The goal is to verify your design is correct. And you mentioned best practices as a tool, right?

One is best practices. Another thing that we bring in as senior, experienced engineers is our past experience. The problem with past experience is that different people's past experience is different. And with current timelines, for most people, evaluating various scenarios mentally when doing a design review is not really feasible.

Jayaprabhakar Kadarkarai (25:19.624)
We generally look for common patterns and anti-patterns, and that purely comes from experience. And when our development practices change significantly every few years, most of our past experience is kind of useless. We are left with only two or three years of experience on current technology at any point in time, so we cannot really build too much on our past experience. This is another issue I noticed

while working at my previous job. It causes another tension, too, where you point out some design flaw and now there are debates: hey, this is the modern practice, this has this issue, and the trade-offs themselves come with issues; this is more important versus that is more important. If we can verbalize it and get a commitment, then it is easy. But sometimes it is after the fact: no, this would be important. But yeah, that was not part of the specification.

Many of those issues could be avoided with formal methods, where you can actually prove first whether it meets all these requirements. And there are formal methods techniques for performance evaluation, where you can ask: what is going to be your tail latency distribution? What is the expected availability? Can you promise five nines of availability, or six nines? There are many such things we can find out. So that's another thing

that many formal methods tools don't support. But again, I have that also in FizzBee.
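
To make the model checking JP described a moment ago concrete, enumerating every reachable state of a design and checking a property in each one, here is a hand-rolled sketch in plain Python. It is only an illustration of the idea, not FizzBee's or TLA+'s actual machinery or syntax, and the little lock-and-counter system with its invariant is invented for the example.

```python
from collections import deque

def next_states(state):
    """Return every possible successor; several successors = non-determinism."""
    count, locked = state
    succ = []
    if not locked:
        succ.append((count, True))             # acquire the lock
    else:
        if count < 3:
            succ.append((count + 1, True))     # increment while holding it
        succ.append((count, False))            # release the lock
    return succ

def invariant(state):
    count, _ = state
    return 0 <= count <= 3                     # the safety property to check

def model_check(init):
    seen, frontier = {init}, deque([init])
    while frontier:
        state = frontier.popleft()
        if not invariant(state):
            return f"invariant violated in state {state}"
        for s in next_states(state):
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return f"invariant holds in all {len(seen)} reachable states"

print(model_check((0, False)))   # -> invariant holds in all 8 reachable states
```

Real model checkers add much more, fairness and liveness checking, counterexample traces, state-space reductions, but the core loop of exploring successors and checking assertions in every reachable state has this same shape.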

Nitay Joffe (26:53.947)
There are a lot of fascinating things in what you said there. First off, in particular, this idea that the market will have a shift left, if you will, similar to what we saw in the security space and DevOps and so forth. Because I agree with you: where I've seen formal methods used the most, it tends to be after the fact. It tends to be that the whole program's already written.

And then somebody comes in who has a PhD in this stuff and says, wait, I can actually give us a formal specification that describes exactly what our algorithm is doing, as opposed to doing it at the design phase in the beginning. And tying that to something interesting you said before: this whole notion of design specification languages and model checkers has really been enabled by the expansion of compute we've gotten, being able to explore the state space in a way that we wouldn't have been able to before.

So maybe tying those pieces together, I think it would be helpful to give our listeners a sense, and I realize it's maybe hard to do over an audio podcast, but first give us a sense of what a design specification language is. Most of our listeners will be familiar with what programming and coding are, but not necessarily with what writing TLA+ or FizzBee is like. So give them a sense of what that is like, and then, tying to your point about compute and so on, what are the right kinds of problems where they should think: okay, for this, maybe I should actually use this FizzBee tool?

Jayaprabhakar Kadarkarai (28:18.578)
Yeah, okay, so there are two questions. One is, what is a design specification language, and the other is, what type of problems can these tools be used for, right? For the first one: a design is basically a higher-level specification. When you say design specification, whatever you're writing in a design doc is a kind of design specification. But the problem with

traditional design docs is that they're predominantly prose, free-form text, mostly unstructured. The only structure you would find is occasionally an API specification or something like a database schema. Those are the only two things you would see as structured content, plus a few diagrams, like sequence diagrams and some block diagrams for explanation, right? But

most of the logic is going to be specified in free-form text, and being free-form text, it's not suitable for any automated analysis. Beyond grammar checking, you pretty much can't do anything with that document. And if you look at the last 25 years, every part of our software development practice has changed except the design doc.

If anything, there is one change: previously we used to send a design doc as an attachment in an email, and now it's a cloud document. That is the only change that has happened, nothing else, right? Everyone seems to understand the value of design, because finding bugs early is important; at the same time, there isn't much tooling. So when you ask what the language is, it's just building structure into

the design specification. Even today, for many problems, we end up describing them as some kind of pseudocode: okay, you make a call to this server, or if that doesn't work, you aggregate the data from here; there is going to be some amount of pseudocode-like description. Instead of having it completely as pseudocode, now we have a defined language.

Jayaprabhakar Kadarkarai (30:42.696)
That's why TLA+ and its cousin called PlusCal actually call it an algorithm language. It's effectively that: you can specify what core data you have, how the data is going to be processed, how many processes there can be, and, when you change the data, which changes should happen atomically

versus which steps need not be atomic, so that we can specify: okay, this happens in a single transaction versus two different transactions, and in that case, how do we ensure it stays in sync? Previously I mentioned what must go right once in a while; that's called fairness. You can specify: okay, this is the specific action that must happen

for the system to be correct. Other actions merely can happen, which means things can go wrong; those kinds of things we can also specify. Once we specify that, it pretty much addresses every concern in the design. And you can also say what assertions you are going to make. Some things, for example: you will never overcharge a customer. That is an important thing. But there are other things, like a liveness constraint: eventually we still will charge the customer. It's not just

never overcharge; we also have to charge eventually. So that comes in; it's a slightly different kind of problem. We can specify all of those in the design, in the specification itself. Now, where different tools differ is how much of this information can be encoded in their language, and in the type of the language itself. TLA+ started

I mean, it was created by Leslie Lamport, the Turing Award winner. He created it in, what, the 1990s? And that was actually for publishing papers. It was designed for researchers and mathematicians; computer scientists used to be mathematicians. When they publish a paper, they can actually write their design in

Jayaprabhakar Kadarkarai (33:06.612)
a precise language. So it looked more like math. There are other languages; P, for example, uses a kind of C-sharp-ish language, but it has its own drawbacks. For me, with FizzBee, I found that for most software engineers, the language that is easy for anyone to understand is Python or JavaScript.

So I wanted to define this specification language as close to one of those as possible, and I found Python to be a lot more concise for this particular use case. So I have a modified version of Python which removes a lot of libraries that are not relevant and a lot of features that are not needed for this use case. And for non-determinism we need some additional language constructs, but

the concepts are so simple that most people should be able to understand them quickly. So yeah, at a high level, that's a design specification language. Basically, you precisely describe the things that can go wrong, the things that can happen to the system, the things that must happen to the system, and what the safety and liveness properties are. Those are the things we have to specify in the language.
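
A small illustration of the two property shapes JP describes, written as plain Python predicates over a trace (again, not FizzBee syntax; the billing states are made up): the safety property "never overcharge a customer" must hold in every state, while the liveness property "eventually the customer is charged" only has to hold at some point.

```python
def never_overcharges(trace, amount_due):
    """Safety: in no state did we charge more than what is owed."""
    return all(state["charged"] <= amount_due for state in trace)

def eventually_charges(trace, amount_due):
    """Liveness: at some point the customer actually gets charged in full."""
    return any(state["charged"] == amount_due for state in trace)

# One possible behavior of a billing workflow: a failed attempt, then a retry.
# A model checker would enumerate every such trace; here we check just one.
trace = [
    {"charged": 0},     # request received
    {"charged": 0},     # first charge attempt failed
    {"charged": 100},   # retry succeeded
]

print(never_overcharges(trace, amount_due=100))   # True: safety holds
print(eventually_charges(trace, amount_due=100))  # True: liveness holds
```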

Again, technically the language can be any language, but different implementations have different languages of choice. As for the type of problems where these become suitable: all these languages, like TLA+, which is predominantly maintained at Microsoft, P, which is developed and maintained at AWS, and FizzBee, are all targeting distributed systems.

Even though, technically, each one would claim it's applicable to everything else, the primary goal is distributed systems. Anytime you have multiple services storing data across multiple places, most likely you're going to get consistency issues, and this is where these tools can actually help. Systems where you have simple

Jayaprabhakar Kadarkarai (35:30.792)
CRUD APIs, create, read, update, delete, on a single database? Yeah, this is going to be overkill. But if you have multiple databases, multiple services, then almost always this will be helpful. Even so, people always talk about trade-offs, and that's going to matter here, right? With TLA+, it's so complicated that only people who

have very high stakes end up using it. If the stakes are low, the value prop may not make sense. But if the language is easier to use, people are going to use it. 10, 15 years ago, most people were not doing unit testing: unit testing, why should I do it? Even today, test-driven development is not that popular, no matter how much we preach it. But now

people do unit testing. If I go to any company and say, okay, I don't want to do unit testing, it doesn't help, they will laugh me out of the company. But 15 years ago, people were saying: no, I don't want to do it. What made the difference is, one, the value proposition became obvious, and then the tooling became more approachable. Continuous builds are so popular

that you don't have to run anything on your own, and every new language comes with a unit testing framework. Go, right from the get-go, came with unit testing; I assume Rust also came with it. The same way, once we have tooling and things are easier, the type of problems for which we can use this solution will expand. So,

going back to the first question, sorry, the second question: what type of problems is this going to be useful for? It's simple: if you have data across multiple databases, processed by multiple services, and by multiple services I mean multiple processes or multiple threads. Even if it's exactly the same service but you have two replicas, there is a chance it's going to mess up. Yeah, that's the thing.

Kostas (37:59.606)
Yeah, JP, what would be the hello world of FizzBee? You know, like we have for every introduction to a computer language, right? We have hello world as the way to show the syntax. Here I would say it's a little bit more complicated, because it's not just the syntax that we want to demonstrate, but also some concepts. But

okay, if someone wants to go and play with FizzBee, I don't think everyone can come up with a distributed system to play around with, right? So if someone wants to get a taste of what FizzBee can do without diving into really complicated systems, what's a good example for that, in your mind?

Jayaprabhakar Kadarkarai (38:56.528)
Yeah, actually, if I knew the answer it would be better, because I'm also exploring that hello world for TLA+ and P and all of these. If you look at all these systems, each one demonstrates the system; for example, I also have a huge tutorial where each piece of syntax is explained, but that is with small examples. They don't really show the value either, of why it has to be used.

But obviously, it should be simple. Hello world technically is about getting started, so you just have to be able to build something. The first example I generally use is a single action, like a coin toss. With a coin toss, you have two possible values. Now you can show that it's non-deterministic.

We can say: if you repeatedly toss a coin, eventually you are going to get a head, or eventually you are going to get a tail. You just want to have assertions like: if you say it's always going to be a head or always going to be a tail, it's going to fail; but if you say eventually we are going to get a head or a tail, it's going to pass. And that is another kind of property, a stability kind of thing. There are a few examples we can point out with that one.

And I also use this example for performance evaluation. We can ask: what is the probability of each of these happening? In this example, by default it's obviously going to be two choices, 50-50, but we can specify different ratios. And the reason the coin toss example is also interesting from a performance and probability estimation perspective is that,

let's say, head equals RPC success, the API call succeeded, and tail is a failure. Now your ratio is not 50-50, but maybe 99% success and a 1% error rate. If you do retries, what is the expected number of coin tosses? And to ensure, say, five nines of availability, how many coin tosses would be needed? Now you can add the latency

Jayaprabhakar Kadarkarai (41:19.438)
with a different distribution graph. So I use that example to start with, and then add different properties to it in the description. That's the hello world, but it doesn't really describe the benefits, like why someone would use it. For something with an obvious benefit, I generally take two-phase commit as an example. For example, let's say

you're booking travel. You need to book a flight, a hotel, maybe a car rental, and the payment through a bank. Now all four have to agree. You can't say two of them succeeded and one failed; that's bad. Either everyone books or completes the transaction, or no one does. But they are distributed across multiple services, across multiple entities, right? So the standard technique is a two-phase commit.

I generally use that as an example and show how it can be implemented. The description looks very close to the pseudocode. And then I'll also show the visualization: look, these are the sequence diagrams of each event; if one of them rejects, these are all the things that can happen. Now you can specify properties like: if one of them

committed, no one else will abort; that we can guarantee. And the same way, if one of them aborted, of course no one else can commit. That is a safety property which we can actually prove. Now, on the liveness side: basically, the two-phase commit solution is that you do a pre-auth kind of thing, like how you block an amount on a credit card, and then you release it, right? If you abort, you can release it, or after completing it,

you can say, okay, it's done, and you charge for it. Here we can also prove liveness properties: you did a pre-auth, you blocked it, but if your transaction aborted, it should not stay in that state; you either commit it or abort it. Those we can demonstrate too, and what is the minimum that would be needed, those kinds of things.

Jayaprabhakar Kadarkarai (43:40.87)
I generally take two-phase commit as a slightly more meaningful example because it's very simple and most people are familiar with the problem in the first place, because almost everyone has booked travel using a travel website.
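
To ground that, here is a hedged sketch in plain Python (not FizzBee syntax) that exhaustively explores a deliberately simplified two-phase-commit model and checks the safety property JP states: if any participant commits, no participant ever aborts. The three-participant model and its transition rules are simplifications invented for this illustration.

```python
from collections import deque

N = 3  # participants: say, flight, hotel, and payment

def next_states(state):
    succ = []
    for i, s in enumerate(state):
        if s == "working":
            # A participant may vote yes (prepare) or refuse (abort unilaterally).
            succ.append(state[:i] + ("prepared",) + state[i+1:])
            succ.append(state[:i] + ("aborted",) + state[i+1:])
        elif s == "prepared":
            if all(p in ("prepared", "committed") for p in state):
                # Coordinator saw unanimous yes votes: this participant may commit.
                succ.append(state[:i] + ("committed",) + state[i+1:])
            if any(p == "aborted" for p in state):
                # Coordinator saw a no vote: the abort propagates.
                succ.append(state[:i] + ("aborted",) + state[i+1:])
    return succ

def safe(state):
    # Safety property: never one booking committed while another one aborted.
    return not ("committed" in state and "aborted" in state)

init = ("working",) * N
seen, frontier = {init}, deque([init])
while frontier:
    state = frontier.popleft()
    assert safe(state), f"safety violated in {state}"
    for s in next_states(state):
        if s not in seen:
            seen.add(s)
            frontier.append(s)
print(f"safety holds in all {len(seen)} reachable states")
```

The liveness side JP mentions (a pre-auth must not stay blocked forever) needs fairness assumptions and is where a real model checker earns its keep; this sketch only covers the safety half.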

Nitay Joffe (43:54.113)
Those are interesting examples, JP, because one of the things I've seen a lot, and I'm curious if you've seen this as well, is that a lot of engineers today, when they run into this problem, assume it's going to be solved for them already by some Paxos or Raft system. I'm going to bring in some core layer like that, just talk to it, have it do all the coordination, distributed consensus, et cetera, and then I'm fine. But what I often find happens is,

you start with a system that seems simple: I'm talking to some data store, I have this distributed consensus, I have a bunch of nodes, okay, it works. And then over time you start adding another data store, some other thing that you call, and the engineers themselves don't realize that, hey, you actually have a distributed transaction now. You actually have something you need to coordinate across multiple nodes, even though you have that distributed consensus system helping you.

You're still talking to multiple nodes, you still have multiple servers writing to multiple places, and you're still going to start to have these sorts of source-of-truth problems and so forth. And so I think it's this interesting thing where most engineers today are not themselves going and writing a Raft implementation or even having to write a two-phase commit, right? Most problems are not that. But given the current state of all the different cloud offerings and all the different data stores and all the different ways to do

distributed systems and event-based systems in real time and so forth, you very quickly end up with this kind of distributed transaction need, right? And then you do the saga pattern or various other things. And it seems like, more and more these days, those are the kinds of applications where you should be like, wait a minute, let's bring in something like FizzBee, have a design specification here that actually makes sure that once we have all these different moving pieces, we're actually doing it right.

Does that kind of align with some of the stuff you've seen?

Jayaprabhakar Kadarkarai (45:46.544)
Yes, yeah, that's true. Of course, if I'm using MongoDB, I'm not going to model Paxos or Raft into it; Mongo uses Raft. But if I'm building applications, that's what I mentioned: if you have two databases, some things are going to go out of sync. That's why people use one of these protocols, or, like you mentioned, the saga pattern, where

we first write to one, and once it succeeds, we publish to another one, and so on. Now we have to guarantee that if the first one succeeded, the second one must eventually succeed; it cannot be indefinitely incorrect. Then interesting questions come in, like how much delay is acceptable from the first step to the second step? Those are where performance modeling and other things also come in. So yeah,

for most of these architectural patterns, whenever you're going to use them, most likely FizzBee is going to be helpful. And when I say formal methods, I mean FizzBee, TLA+, and the other tools; it's just that I feel FizzBee is easier to use. So when I say FizzBee, there are other tools that also do it.

Nitay Joffe (47:04.497)
Touching on that point, there's something you mentioned earlier that's a really interesting point: with TLA+, historically, to your point, I think one of the things that has maybe kept some folks away from it is exactly the language itself. It's not the most intuitive thing to understand and get your hands on, and it's a very different way of looking at things. So I wonder, you mentioned earlier in our talk

the wave of compute enabling more usage of formal methods; I wonder if the other aspect is actually the language itself, the UX or DX, if you will, the experience for the person. And so there's a very interesting conscious decision that you made, which is making it Python-based, Python obviously being a lot more accessible to most engineers. I'd love to hear more about that and how you weave it in. I know you're using the Starlark language, if I'm not mistaken.

So, how you build on Python, why Python, and how that plays out.

Jayaprabhakar Kadarkarai (48:05.544)
Yeah, obviously, as you also mentioned, with TLA+ the language being complex means people find it hard to get started. And almost always, if you look at the design phase, even for the most complex systems, a few people work on it for two or three weeks; after that, the majority of the time is on actual implementation. So if the learning curve itself is two or three weeks,

people are not going to use it unless there is a real top-down mandate, which effectively means people are not going to use it. And this kind of tooling usually has a network effect: we need some people to use it, then other people become familiar with it, and then more people will use it. And one more thing: I have used TLA+ before, but if I come back to it just six months later,

I get lost in a lot of the syntax because it's so alien. That's why I believe an easier-to-use language means that's one thing off your mind; now you can actually focus on the problem you are solving. And obviously Python: if I look at any interview problems or anything, almost everything is in Python.

If you look at the sample code on most of the design interview preparation sites, it's obviously more accessible. JavaScript was the other one I considered; I mentioned before that I was choosing between JavaScript and Python. My first preference was actually JavaScript, because then I could do a lot more tooling in the browser itself. I could use a V8 engine and put it into, what's it called,

isolates, which is a lot cheaper than running a model checker in the cloud. But the language itself was still not as good a fit for the problem we are solving; Python seemed much easier to use. Two things here: when you say easier to use, there's easier to write and easier to read. If I take reading, Python becomes even easier

Jayaprabhakar Kadarkarai (50:30.62)
compared to JavaScript. A lot of the language choices in Python are about conciseness. And it makes it easier to embed a Python-ish script into the design doc, where I can say: look, this is how I'm modeling this particular segment, this is the algorithm. So that was the reason I chose Python, or a Python-ish language, as the base.

And Starlark is a subset of Python, so that makes it even easier. It has very good APIs in Go, and the Bazel build system uses it. Facebook also has its own build system, I think Buck, right? If I remember right, that build system also uses a variant of, sorry, not Bazel,

a variant of the Starlark language, a Starlark implementation. So it has very good support. And since it cuts out a lot of libraries and a lot of other things I don't need, it becomes even easier to ensure security: I don't have to worry about something randomly being able to access the file system or other things when I'm running it in the cloud. So yeah, that was another reason to choose Starlark instead of

using CPython or a standard Python implementation.

Kostas (52:03.98)
JP, we talked about when is, let's say, the right time to engage with creating a model and formally trying to prove the validity of the model. But what's the life cycle of it? Is it something that, let's say, someone designs at some point, runs, everything looks amazing, there's no problem, we've

built the perfect distributed system based on the requirements we have, and then we forget about it, we just go and implement and that's it? Or is there some iterative work that happens, or do we revisit this thing in the future? And I'm asking because I think

people easily understand, for example, how testing fits into their everyday work, right? They probably cannot relate to how something like a formal method would do that. So tell us a little bit more about that. How often do we have to go back to it?

How many people should have access to it? Is it only the architects of the system, for example, who consume and work on these, and the rest of the workers out there don't need access? I'd love to hear from you both what you've seen happening out there and, if it's different, what you would like to see out there, right?

Jayaprabhakar Kadarkarai (53:49.992)
Yeah, you made a valid point, right? In the initial phase, people get excited. Usually one person in a team or a company decides: okay, this is a cool thing, I've seen it being used, and it's useful. They build it once. After that, they don't use it often. There are multiple reasons. Usually it's that

the person tried to convince others to use it and couldn't. I've seen that multiple times. And then, as I said previously, there's the value prop: for incremental additions, they don't want to go back, because even the person who implemented it will have forgotten the syntax and the nuances, so it's like, okay, screw it. Another reality of the situation is that,

if you look at the software development life cycle, the design phase is short and at the front. Most of the time is spent on development; people say maintenance, but it's usually post-launch continuous improvement. During that phase it's tempting not to go back. Even today, how often do people go back to their design docs and update them? It's not that frequent. So that is the reality in whatever I've seen. People do use TLA+,

and at both Amazon and MongoDB, if I remember right, they go back and keep updating designs; they have full-time engineers actually dedicated to this. AWS is the biggest proponent of formal methods in the tech industry. In most of their cloud teams, pretty much everyone is familiar with P and TLA+; they had a lot of training.

So most people there, if they're making any meaningful design change, go back, update, and re-run it. This is one good thing compared to traditional design docs, because there is little value in updating a design doc; after the initial design and design review, most people don't go back. Whereas here, it at least proves the value that, with your design changes, the previous assumptions are still valid.

Jayaprabhakar Kadarkarai (56:15.4)
So those kinds of things can be done. But with a traditional design doc, if I'm making any incremental design change, the changes end up in a completely different design doc. There are two reasons for that. One is practical: no one wants to touch the old one. And there's a political reason: if I'm going to apply for a promotion, I want to show a complex design doc that I wrote, rather than an update to an existing one, right? So these two factors exist, whereas with TLA+ or FizzBee and these kinds of tools,

it gets checked in with the code, so people are going to use it; they have to update it. So what I'd say is, I want folks to continuously update their design whenever their design changes. But at the current stage, it's technically not easy to enforce, except by someone saying you must do it, and then people are not going to do it.

That means we need to find other value-adds beyond just model checking alone. One thing I want to note, and I'll get to it after another point: you mentioned how many people in the company should have access to it. I believe every engineer working on a given component should have access to these, because

if they are going to implement it, they need to know the design and the assumptions, right? We need to know: if I'm updating two variables, do they have to be in the same transaction, or can they be in different transactions in the first place? Even those kinds of things have to be specified, and now they are captured in the spec, so it becomes possible. Anyone who is writing code should have access to the design; otherwise they're not going to do it correctly.

The additional value-add would be coming up with automated test case generation. Ideally, code generation would be nice, with more and more systems like, not NLP, what's it called, LLMs, large language models, coming in. And I'm extensively using Copilot and ChatGPT for a lot of code generation, right?

Jayaprabhakar Kadarkarai (58:42.93)
But they make mistakes. This is where, if we have a design spec, even for a complex design we can actually say: this is my design, with everything specified clearly, and the generation becomes a bit easier. But even then, whether another engineer implements it or an automated tool implements it, I would need a way to validate that

whatever was implemented matches the specification. That part is another hard part; there is very little research into it. The traditional approach is called refinement: basically, you take one piece and prove that that particular piece of code matches the specification. Then you build

the next layer, and it's more like peeling an onion, where at each layer you prove the next-level subsystem matches the specification. But that's actually not feasible in practice; it's too complicated. So this is where automated test case generation comes in. Different companies have experimented with this, where from the spec we can say what the properties are,

what scenarios we need to test, and from that we can actually validate our implementation, whether it is correct, whether it matches the specification. Again, we can't define correctness in the abstract, but against the spec we can actually validate. So yeah, that's another area. There is not much research, but there is some. MongoDB has experimented with a couple of approaches related to that, and they found success in one of them,

with test case generation. And some have also tried things related to observability. You have the spec, so you know all the states the system can be in; it's called model-based trace checking. You have logs, you have data stored across multiple places, you have a lot of observability metrics. Can we validate whether the system violated any of the constraints?

Jayaprabhakar Kadarkarai (01:01:04.156)
In that case, you can actually get alerted and fix it before the customer notices or before the damage is too big. So there are other areas. I believe it would be ideal if we could do both test case generation and observability to ensure our implementation is correct. And things can go

further as well. Antithesis is an example where they run simulations on the actual implementation; what if we could feed the spec into that to check whether the entire system, the actual implementation, matches the spec? Those are other areas I want to expand into. But test case generation is something that, after visualization, I will be actively working on.
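
A small sketch of the model-based trace checking idea JP mentions, in plain Python. The allowed-transition set stands in for what a spec would define (it is hard-coded and hypothetical here, not FizzBee output), and the checker replays an observed log against it.

```python
# Legal order-lifecycle transitions, as a design spec might define them.
ALLOWED = {
    ("created", "authorized"),
    ("authorized", "charged"),
    ("authorized", "cancelled"),
    ("charged", "refunded"),
}

def check_trace(events):
    """events: the ordered list of states observed in the logs for one order."""
    for before, after in zip(events, events[1:]):
        if (before, after) not in ALLOWED:
            return f"violation: {before} -> {after} is not allowed by the spec"
    return "trace conforms to the model"

# A trace that skips authorization gets flagged before a customer notices.
print(check_trace(["created", "authorized", "charged", "refunded"]))
print(check_trace(["created", "charged"]))
```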

Nitay Joffe (01:01:58.491)
That's really interesting, and I'll come back to the test case generation and the LLM stuff in a second, because there are some interesting points there. But just one more question on the workflow side. Let's say you have somebody who's using FizzBee or formal methods properly and using it early on in the design phase. At that point, do they still need a design doc? If they're still writing a design doc, what goes there versus what is now in

the model language? How do you see the two of them working together, or does it completely replace the need?

Jayaprabhakar Kadarkarai (01:02:32.85)
Yeah, no, it's not going to be replaced, because this is a specification, whereas a design doc is very good at explanation, at describing. You can describe why you do what you do, but here you say: this is what I'm going to do. So these two are going to be different. In the design doc you may want to give more explanation about a scenario, these are things that can happen, so we need to address them, or

what user requirement, either an explicit or an implicit requirement, a segment of the design is going to address. Those are things that have to be captured in the design doc. In the spec, it's purely what we are going to do; the why is not usually captured

in it, right? It's like pretty much any paper: there will always be a specification of an algorithm, but a lot of description before it, right? So they both add value. Some things can be captured here, but for now I believe design docs are not going to go away. They have to be there.

Nitay Joffe (01:03:59.451)
That makes sense. So then, going to the testing and LLM topic. There's an interesting point you raised there, which is that a lot of the potential errors that can come about are essentially in these different translations that are happening, right? Meaning some engineer translating the algorithm in their head to the FizzBee spec, and then translating from the FizzBee spec, or TLA+ or whatever, to, you know, C++, Java, Scala, Rust, Go, et cetera, code.

And furthermore, to your point, going and writing test cases in that same language to verify that their Rust code, et cetera, is operating properly, but then also matching the model. So there are multiple levels of translation here. I'd love to hear anything more about some of the explorations you've done around test case generation, and then tying it to LLMs. One of the things I'm curious about is,

do you foresee more and more utilization of LLMs to help do those translations? Like, will engineers be writing a formal spec and then the LLM will take that and write the code for you, for example?

Jayaprabhakar Kadarkarai (01:05:08.552)
I know it's a little bit of a touchy topic; people don't want to say they can be replaced. But it has happened in every industry. So many jobs have been taken away by automation, yet there are still people; it's just that the nature of the job changes, and people move up to a different job. We are not going to say, I'm only going to do a manufacturing job that has

gone away. Automation is real, so this will happen. Now, it's not going to be a complete replacement of everyone all on the same day; there will be a small segment that gets replaced at a time. It's like using libraries and SaaS: at some point in the past, people would have built all their monitoring tools in-house, or their own build tools, except for make. But then

most people would have built lots of wrappers around them to make things work. Now you bring in something else, you don't even manage it; for us, it's an external system. The same with coding. I see it as different layers. You have the machine layer; no one writes assembly code anymore. With compilers, you describe things in a slightly more meaningful language and then it converts. Now, a compiler is a fixed set of rules; it's just one

conversion from one language to another. I see LLMs could sit in that place. Then again, it's a bit naive to say it's exactly that, just language translation, because where software engineers come in is actual intelligence: your specification is not complete, but

humans use a lot of heuristics to translate it into working code. So can LLMs do it? Based on the trajectory, I feel it can happen. It's just that it may be slow, and it's not going to replace all software engineers in the next few years or even decades; it's going to take a long time. And even otherwise, it's very close to how

Jayaprabhakar Kadarkarai (01:07:34.416)
the outsourcing industry used to work. I don't know if it still works that way, but you have a few people designing, and then you have a huge army of programmers who take that design and meticulously translate it. There are also different layers of design: there is a high-level design doc, then a low-level specification, and then the actual coding. So it can happen at the different layers; things may get

replaced, and LLMs should get that option. Ideally, yeah, it will happen, but I don't know when or how long it is going to take.

Kostas (01:08:13.996)
Hopefully we'll be here to see what happens and have more opportunities to talk about that stuff, JP. We're close to the end here, and I want to ask you two questions. First of all, are there some projects out there that use FizzBee that people might be aware of? I think you mentioned at some point something like

Jayaprabhakar Kadarkarai (01:08:16.52)
Yeah, we're...

Kostas (01:08:40.384)
the Iceberg specification, which people can go and see, what it means and how it looks in a real, complex system. So that's the first question, and then I'll ask the final one. So give us a few examples.

Jayaprabhakar Kadarkarai (01:08:58.632)
The largest systems modeled are Iceberg and Paimon, by a Confluent engineer, Jack Vanlightly. He blogged about it and tweeted about it. And interestingly, he actually found an issue in Iceberg: it was in the design doc, where something was not explained, but in the code it was handled. So he was able to model it and then raise a bug, and the answer was: okay, no, in the implementation we have it,

it's just not specified. So yeah, he did find kind of an issue, at least with the spec. It also gives an indication that if someone else does another implementation from that design doc, they might miss it, whereas if we have a spec, it would be obvious. So from the large systems, those are the

two big ones that were modeled. And there's a Shopify engineer who modeled another production system, but it's a much smaller system, and he didn't blog about it; it's kind of a proprietary thing. Other than that, people haven't reached out to me, but there are some bloggers who have said, okay, I'm creating a blog post about it. That kind of thing has happened.

Kostas (01:10:25.366)
Yeah. So people, please reach out to JP if you have built something. We need to put everything in one place so people can go and see and learn and, most importantly, get inspired by this stuff; I think that's the most important part. All right, and one last question: what's next for FizzBee, and where can people learn more about it?

Jayaprabhakar Kadarkarai (01:10:29.579)
yeah.

Jayaprabhakar Kadarkarai (01:10:48.38)
Yeah, fizzbee.io, as I said. It has a lot of tutorials, simple examples, and some bigger examples, all meant to be approachable. And the first thing is that it comes with an online playground, so you don't have to install anything; you can just play around online and get a feel for the language.

With the online playground, since it's not a production system, you're not going to run FizzBee in your production, right? It's going to be like a design doc. So as long as you're modeling a small one, you don't have to install anything. But if you are going to model more complex systems, right now you need to install it, because I have an explicit time limit of 20 seconds of model checking; I don't want someone to run an infinite model and run up my bill, at least.

The other thing is visualization, which is what I'm currently focused on. When someone is writing a design doc, they're going to create lots of visualizations and explanations and all those things. I want to generate as much of that as possible from the spec, and then have an LLM component where you can ask questions, like, why do I need this?

It can answer that if you don't have it, this safety violation would happen; those kinds of things it can actually answer. That is going to be next; that way it helps with the design review and all. Performance evaluation, again, is something you need to install it for; it's not part of the online playground yet. But for getting started, fizzbee.io and the online playground are the

easiest way, and if you face any issues, just ping me on LinkedIn or at jp@fizzbee.io. Yeah, that's it. I will probably soon start a Discord or a Slack channel; I'm still deciding between the two.

Nitay Joffe (01:12:57.841)
Yeah, I was going to add to that. You have a GitHub project as well, right? So if folks are looking to contribute or give feedback on the project, how should they do that?

Kostas (01:12:58.784)
Yeah, that's a -

Jayaprabhakar Kadarkarai (01:13:08.136)
Yes, of course. GitHub: raising a GitHub issue is the first step, for any suggestions or any help. That's another avenue.

Kostas (01:13:24.248)
That's awesome. JP, thank you so much. This was an amazing conversation, and I hope to repeat it really soon. Both Nitay and I will for sure keep an eye on the project, and hopefully play around with it too. And yeah, I think the future is bright, and I'm looking forward to what's next for you and FizzBee. So thank you so much.

Jayaprabhakar Kadarkarai (01:13:50.729)
Thank you. Thank you, Kostas. I had a good time. Thank you.

Nitay Joffe (01:13:50.821)
Thank you, JP. It's been a pleasure.

Nitay Joffe (01:13:55.953)
Thank you.
