Episode 29
· 51:45
**Nitay (00:03.981)**
Sergey, great to have you today with us here with Kostas. Thanks for joining. We'd love to start off by having you tell us a bit about your background and work experience and how it brought you to what you're doing today.
**Sergey Arkhangelskiy (00:16.77)**
Thank you, Nitay. Thank you, Kostas. Thank you for having me here. So yeah, about myself, I used to work at Google for 10 years. Half of the time I spent in search and search ranking, doing very cool and interesting projects. And after I left Google, if you wish, I co-founded with my friends a company called WANNA.
We've been doing computer vision and AR stuff. Essentially the idea is that you point the camera of a smartphone to your feet and you see sneakers on top of them. Later we did the same thing for watches. So you point to the wrist, and then for clothes even later. We worked with multiple cool companies, including high fashion — Gucci, Louis Vuitton. So, you know, I'm an engineering-roots guy, really a lot of engineering. And when I found myself the second or third time in a high fashion house, I was thinking what am I doing here? Like I don't care too much about fashion, but the business pushed me that way. So I had to visit Louis Vuitton, Gucci offices, like many others. So yeah, in 2022 we sold the company to Farfetch, which is the international high luxury marketplace,
where I worked for a couple of years. And then early 2024, I left Farfetch and since then, I guess, took a break and was thinking what are the next challenges that could be interesting for me, and robotics drew my attention partially because this is a really big topic, which I think we as humanity still haven't solved yet. And there is a lot to do.
And my experience in AI products building and computer vision gives me some advantage in this area. So this is how I came to building Positronic.
**Nitay (02:32.119)**
Very cool. And take us back a little bit. I think you said it very quickly, but for many of the folks here that might not know, ranking is kind of, I've always viewed it as the elite of the elite of Google, right? I mean, this is like the heart of what Google does and the heart of like what makes things actually really relevant. And everything else is kind of almost built on top of that. And I remember from my own personal experience, it was like this kind of...
**Sergey Arkhangelskiy (02:52.514)**
Yeah, you know it.
**Nitay (02:55.779)**
Bring off in the corner that nobody even knew, like how do you get into that team? What does that team even do? So take us into kind of those days of like what you were doing, how you got into that team, what the kind of work was, what the kind of work is today maybe that they're doing, like how do you, yeah.
**Sergey Arkhangelskiy (03:10.944)**
Yeah, I agree on the elite of the elite. At least I was thinking so when I joined Google, and you know, I didn't even think that I was qualified enough to work there, although I really wanted to. So when I started, one particular thing struck me. In Google, we had an internal website which showed you which quantile you are in, as an employee, within a particular group. So for example, how many people joined Google after you and how many people joined search after you, and all of that.
So when I was in search ranking, I think I was around 80%. So around 80% of Googlers joined after me, but in search I was only at the median. So it means most people...
So there were very many elders, people who joined like pre-IPO and still, I think this is the organization that has the most of them. So for example, my boss who I used to work with in 2017 is still there. And when he was already there, he was — not like 20, but I mean 15 years at Google. So definitely the elite.
In ranking, actually one thing which is very relevant to what I do right now, which I really liked about Google and Google search in particular, is the data-driven approach. So essentially any launch you had, with even the smallest change, you brought through a very well-organized evaluation,
including side-by-side evaluation and live-metrics evaluation. So this discipline has taught me a lot for the future. Regarding the projects, I mean, it's like with the Secret Service — you cannot talk too much even 10 years afterwards. But it was different kinds of ranking projects. So I was doing —
**Sergey Arkhangelskiy (05:35.712)**
As my project before I left Google, we built the ranking layer that combined different types of results — the web search results, the image results, maps results. Before our project, which was called Tetris, these different verticals used their own logic and applied "I want to be second, I want to be third." And we built the infrastructure and the ranking layer
that arranged them together. Not just the ten blue links, but the ten blue links and all the other result types out there. So for my fellow Googlers and ex-Googlers who know Tetris: yes, I was one of the first engineers working on it, and I even TL'd this project for some time.
**Nitay (06:27.395)**
And you know what's funny that you brought me back to? So I was at Google in, what was it, 2005, 2006, something like this. And even back then, one of the biggest projects they had that was always their vision was, I think they called it at the time, it wasn't one box, it was like uni search or something like this. And the whole vision was we want to have, like you shouldn't have to search and then press a tab of like, you know.
Texts, images, maps, et cetera, news they had. And then there was Google Shopping or something at that time as well, as an option. And even back then, I remember them having a project of like, we want to get rid of the tabs and just have one search. And this ties, I think, very closely to what you're saying. Why has that never been done? Why is that so hard? Why after decades is that still kind of an unsolved problem? Maybe tell us a bit about some of the stuff that you were doing there.
**Sergey Arkhangelskiy (07:19.774)**
You know, I don't... So essentially what we did is we helped to — I mean, when we started, Google started to have a lot of different universals. So yes, the name "universal" was used during my time. We tried to organize that, I would say, because later we got the instant answers. So you type the question — I would say what we currently have with AI-assisted answers, but
the version which happened before the neural networks, and many different types of results. The results page became, you know, like a Christmas tree to some extent. And you had to organize it and rank it as a whole. So this was the whole idea. I mean, the biggest challenge of Google search, which I think many can still underappreciate, is that there are
hundreds of very talented engineers working on these problems alongside each other. So this is a really big organization which focuses on one page. And I mean, this is just hard infrastructure-wise. It's hard organizationally, product-wise. It's tricky. But I wouldn't say it's not solved there.
I mean, some decisions — for example, even right now when I search for a particular business on Maps and when I search in Google, but I want to go into Maps to see the direction, it's not like, you know, see where it's relative to other points. I cannot go to Maps in one click. I mean, why? So, yeah, still not.
**Nitay (09:14.691)**
How did the very data-driven approach tie to infrastructure level things that you were doing? Because I can imagine and understand how it ties to A-B testing experiments, like we're going to try this feature flag, we're going to go and test and see the lift and the deltas and so on. How did you find the changes of that all the way down to the infrastructure layer and the changes that you were looking to make there?
**Sergey Arkhangelskiy (09:43.608)**
So if you look at what I do at Positronic Robotics, we recently — essentially last week — released a physical AI leaderboard. So the idea here is that we tested four open-source vision-language-action models on one particular task, which is pick-and-place, which is a very popular task in warehouse logistics and industry, and compared them on production metrics like units per hour — the throughput of the model — and mean time between failures, which is the reliability. And the reason why I started doing that is that when I drew my attention to robotics and to ML robotics in particular, I realized that there are many, many cool videos on GitHub, on Twitter, about what
a robot is capable of doing, and the cool advances in vision-language-action models. But I mean, it is still not clear: are these models ready for going into production and serving the factory floor? How far are we from that point, and which model is the best one for my case, for my task? It just felt to me that
nobody actually seriously looked at the evaluation per se. And my mantra from Google times is that if you cannot measure, you cannot iterate, you cannot improve on it. So this is why we built a physical AI leaderboard. And we also had to, while building this leaderboard, build some infrastructural solution as well.
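The two production metrics mentioned here, units per hour and mean time between failures, can be sketched in a few lines. This is only an illustration of how such numbers are typically aggregated, not the leaderboard's actual code: the `Episode` record, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One evaluation run on a real robot (hypothetical record format)."""
    duration_s: float   # wall-clock length of the run, in seconds
    units_placed: int   # objects successfully picked and placed
    failures: int       # times the run failed or needed a human assist


def units_per_hour(episodes):
    """Throughput: successful units divided by total run time, scaled to one hour."""
    total_s = sum(e.duration_s for e in episodes)
    if total_s <= 0:
        raise ValueError("no recorded run time")
    return 3600.0 * sum(e.units_placed for e in episodes) / total_s


def mtbf_minutes(episodes):
    """Reliability: total run time divided by the number of failures, in minutes."""
    total_s = sum(e.duration_s for e in episodes)
    total_failures = sum(e.failures for e in episodes)
    if total_failures == 0:
        return float("inf")  # no failure observed in the recorded time
    return total_s / total_failures / 60.0


# Invented sample data: three runs totalling one hour of robot time.
episodes = [Episode(600, 40, 2), Episode(1200, 90, 1), Episode(1800, 110, 3)]
print(units_per_hour(episodes))  # 240.0 units per hour
print(mtbf_minutes(episodes))    # 10.0 minutes between failures
```

Pooling time and counts across runs, rather than averaging per-run rates, keeps short runs from dominating the result.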
**Nitay (11:49.219)**
Sorry, I think there was a delay on my side. OK, so it sounds like you were taking a page out of Google in terms of: what you don't have the data for, you cannot measure, and what you can't measure, you don't understand.
What's been the status quo that you've seen when you entered the space in terms of what people were doing on the robotics side with models and so on? I think a lot of folks here will be familiar with some of the benchmarking happening in LLMs and some of the imaging models and things like that, but less so in the robotics and physical world. And why do you then need some of the physical infrastructure that you're talking about in order to get to some of the quality and the kind of measurements that you're seeking?
**Sergey Arkhangelskiy (12:37.902)**
Well, I think the overall sentiment in the area is that we are approaching, or we are at, a GPT-3 moment in physical AI. What we mean by that is that we can take a model which is trained, not programmed, and it can do stuff for you. It can fold your laundry. It can put the dishes into the dishwasher, et cetera.
One year ago, I think the opinion was that, okay, we have some really nice papers from DeepMind, for example, about Open X-Embodiment, and that there is a cross-transfer between different kinds of robots. So if you take data from different robots, you can put them into the same basket and it will improve the quality. And everybody learned the
"scale is all you need" mantra from LLM models. So I think the overall idea is that physical AI is coming. Maybe it's already there, and we will automate everything and this will be a huge market. On the other hand, I think it's more — it's more still expectation rather than the real case. So I don't think we passed the GPT-3 moment
in robotics, and the reasons — I mean, fundamentally there are two big reasons why we lag and why we think we still will be lagging, and the area will need more and more investment. One is that we don't have the internet data for training robotic models. So nothing to scrape, nothing to train on. So currently there's a big effort
in collecting egocentric data from humans doing operations with hands. And it should give some boost, but definitely it's not the one which solves everything, because you need data on the robot in order to really perform operations reliably. And the second big reason is that it's a real world. I mean, you want stuff to work here in the physical world and it's much slower
just because of that, and much more expensive, because usually you need a human who monitors this robot. And even when you test something, you need to test it on the real hardware to understand whether it works or not. So these two reasons just — I cannot say slow down the progress, because they were here all the time and they are just given for us. So it's not something you can argue with, but
yeah, at the end of the day, you realize that you still need to understand how good these models are. And this is why I focused on building the infrastructure which allows you to compare these models with each other and bring this really
focused evaluation of how the models perform.
**Nitay (16:00.963)**
It seems like almost part of the thing we're missing is the ImageNet moment, these large accepted datasets with known labels and everything that we can use for robotics. Do you think we're getting there with that? Do you think that your benchmarking and testing platform becomes part of that? How do you see the world solving the data problem, as you said?
**Sergey Arkhangelskiy (16:29.378)**
Yeah, I mean, to some extent — ImageNet was much easier than what we would have in robotics, because ImageNet was about classification, which is much easier to evaluate. When we speak about robotics, in order to understand whether something works or not, you really have to run the policy and to see how it performs. Otherwise you're just guessing.
I think philosophically, yes, the vision and the overall idea of what I do is to build the trustworthy yardstick, which can be used by model builders and by people who want to apply these models in their business, to really understand where we are and what is the gap and whether we will have the ROI on our factory floor or not.
It's partially a dataset — and with the release of PhAIL, we released the fine-tuning dataset, which we used for the pick-and-place operations. And we will expand that. But obviously, in order to have good evaluation, you need to have the closed evaluation — for example, objects which have never been seen before by the trained model.
So it's not only the dataset, it's the whole protocol and the whole measurement methodology, which makes it trustworthy for the industry and for the potential customers.
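The held-out idea, evaluating on objects the fine-tuned model has never seen, can be sketched as a split over object IDs plus a leakage check. This is a minimal illustration under assumed names: the object IDs, the 25% held-out fraction, and the episode format are hypothetical and are not taken from the released dataset.

```python
import random


def split_objects(object_ids, held_out_fraction=0.25, seed=0):
    """Split object IDs into a fine-tuning set and a held-out evaluation set."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = sorted(object_ids)  # sort first so input order does not matter
    rng.shuffle(shuffled)
    n_held_out = max(1, round(len(shuffled) * held_out_fraction))
    return set(shuffled[n_held_out:]), set(shuffled[:n_held_out])


def leaked_objects(train_episodes, held_out):
    """Return held-out objects that wrongly appear in fine-tuning episodes."""
    seen = {obj for ep in train_episodes for obj in ep["objects"]}
    return seen & held_out


objects = ["mug", "tape", "bottle", "box", "sponge", "can", "brush", "cable"]
train, held_out = split_objects(objects)

# A fine-tuning episode that accidentally includes a held-out object is flagged.
bad_episode = {"objects": ["mug", next(iter(held_out))]}
print(leaked_objects([bad_episode], held_out))
```

Splitting by object rather than by episode is the design choice that matters: an episode-level split would let the same object appear on both sides.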
**Kostas (18:12.484)**
Sergey, you mentioned that people say, like the industry says, that with robotics you are close to the GPT-3 moment. And I want to ask you if you believe that in the physical world and with robotics, things will, let's say, happen and progress in the same way that it happened with software.
And the reason I'm asking is, and I'm not talking about the technical side of things, I'm talking more about the product and the adoption and how things will happen. The reason I'm asking is because, in my opinion at least, having experienced a little bit of, let's say, autonomous cars and the process of building them, which is a robotic
application at the end of the day, right? Like an autonomous car is a robot. And it's something that has been getting built for quite some time now, right? It's working, but it's not like you go to Bangalore and you get into a robotic car and it drives you around, right? And I feel like we as an industry tend to project the
qualities and the experiences of software onto other adjacent industries, and that's not actually true, like you can't really do that. It's one thing to have the luxury, let's say, of everyone having a mobile phone and building a web application that is a chat where you can have a whoa moment in just a few seconds, because suddenly you start speaking to a machine and it's like, oh, this is science fiction, right?
And it's pretty safe. And it's another thing like to need to procure a robot, get the robot in your house, make sure that the robot doesn't kill your dog and also folds like your clothes all the time. Like the process to get to that point is like completely different, right? And if we want like to accelerate the progress in the same pace, like it's probably going to happen in a different way.
**Kostas (20:35.352)**
Tell me a little bit about that. I'm very curious. I've never been, I was always in the software world. I was always interested in hardware because it's such a different world, although it coexists with software, right? But I have this feeling that things will probably happen a little bit differently. So tell us about that. What do you think is going to happen there?
**Sergey Arkhangelskiy (21:01.182)**
Yeah, 100% things will happen differently. And I came from the software background as well, right? So the hardware and robotics is still relatively new to me. And I see a lot of people currently coming from software.
And many of us don't appreciate that hardware is hard. I mean, really you have to — with more time, you have to understand how the gearboxes and the electrical engines work and what are the limitations there. Because these things matter at the end of the day. One interesting thing which I see in the industry is that
particularly in the venture-backed industry and in robotics, many companies rush to the household usages. And I mean, I can buy that. It's probably easier to sell as a venture business with high valuations because, yes, you can definitely claim that this thing can be in every house. Yeah, I totally agree. The problem is I don't agree with the timeline. And to an extent, your analogy of the self-driving car
I think is very relevant here. I have a couple of friends who worked firsthand on robotics driving, on self-driving. And they say for 10 years we've had the same mantra that self-driving is one, two years away from now. And yes, it appeared to be much, much harder. So my personal take, and the thesis with which I bet
in Positronic, is that the main usage of physical AI — general physical AI — will start with the commercial applications. Due to the reasons which you mentioned: first it's safer. It can guarantee more safety just by restricting the robot. The operations you work with here are much larger in volume. If you speak about pick-and-place on every single
**Sergey Arkhangelskiy (23:18.968)**
factory floor, it happens hundreds of times, hundreds of times at least per day, right? And while you're folding laundry, a couple of times per week, okay, maybe three, five times. So yeah, I don't fully buy these household applications. I wish I could have a house robot, but I still think we are quite far from that. And the other reason is, like I told you, the hardware
breaks, and I mean, I don't fully believe in the mantra that AI will figure this out. Some things, for example the gearboxes: if you lose some signal, say you turn the motor but the end effector, the finger which you control with it, just doesn't move because there is some lag inside the gearboxes,
these things, yeah, AI won't figure it out, because the information is just lost. If you speak in software terms, information is just completely lost in this transmission. So yeah, I would expect that many companies who raised the big money and the big valuation will, not disappear, but let's say become more humble. And we see similar stories with the AR companies, for example, right?
But many will probably go for that. We will particularly — I would bet on the companies who are focusing on manufacturing, logistics, production, and more on the vertical cases. But long term, I'm still a big believer in general physical AI. We will have it. It's just a matter of time.
**Kostas (25:06.156)**
Yeah. And okay, robotics in industry has been around for a long time, right? There are industries that are super automated through robotics, like the automotive industry. For example, I remember many years ago, I was in Turin in Italy. There was a company called, if I remember correctly, Comau. It's a big manufacturer of robotics for car manufacturing.
It was a very interesting experience for me, coming from software, seeing that, primarily because I saw things that in their, let's say, engineering discipline are much more profound compared to mine. And one of them is safety. I remember walking around and hearing alarms, all the time.
around, like, because there were robots that they were testing and they had to make sure how close you get to that thing, because obviously if it can manage a whole car, it's something that can crush you, right? And you see there, and the same with autonomous cars, that it's really, really important to build trust. And the problem with trust is that it takes a lot to build and it's very easy
to ruin it, right? One accident is enough to destroy it, regardless of many, many metrics out there saying that overall, at the end of the day, it's safer than, let's say, having a human drive. But there are industries that are, let's say, more aware of the value of having robotics, and there are other industries that probably
don't have that yet. In your opinion, do you think that in what Positronic is doing, or let's say this wave of new companies, the growth will come from commercial applications that are, let's say, more greenfield, that didn't have robotics before? Or is it more about, okay, let's go to the automotive industry, for example, they know about robotics, they have been building robotics for a long time, but transform the way that robotics is done.
**Kostas (27:32.356)**
And we can do it like in a very different way because now we have like, let's say this new like large language or whatever like models. We call them like that. They do robotics in a much more efficient or like more, I don't know, like better way.
**Sergey Arkhangelskiy (27:49.563)**
Yeah, very good question. And what I particularly like in this question is that you put AI robotics in the context of automation, which is already there. And I like to think about this from the economic perspective. So why are these operations automated? The reason is it just makes economic sense, right?
On the upside, you have the number of operations you have to perform — putting the door into the car, right? This is done with robots. And if you have thousands of them and the added value of the operation is quite big, because the car is quite expensive and the doors and the body are also quite expensive.
So this is the profit which you get through automation. On the downside, you have the cost of the machine and the cost of automation; these are the costs which you have. So you need to have your upside larger than your downside, so you can break even at least in some time. On top of that, there is also another concern, which is the precision and the
distribution: humans usually make more errors, so an operation that is fine-tuned gives you reliability. What physical AI changes in this equation is the cost side, actually. The upside doesn't change, but because we decrease the cost of automation, you can now automate the operations which
are not that frequent and have less added value. And why does AI robotics decrease the cost of automation? Because you can use the same hardware. You don't have to have specialized hardware for different operations. Yes, obviously you need to have a robotic arm which can handle five kilos or a robotic arm which can handle one kilo — it's different tiers — but overall we can reuse the same hardware for different operations.
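The break-even argument above can be written down as a simple payback calculation. This is a back-of-the-envelope sketch, not a model used by Positronic: the function, its parameters, the 250 working days per year, and every number in the example are invented for illustration.

```python
def payback_years(ops_per_day, value_per_op, upfront_cost,
                  running_cost_per_day, working_days_per_year=250):
    """Years until the value added by automation covers its upfront cost."""
    daily_net = ops_per_day * value_per_op - running_cost_per_day
    if daily_net <= 0:
        return float("inf")  # the operation never pays for the automation
    return upfront_cost / (daily_net * working_days_per_year)


# Same operation and same upside; only the cost of automation changes.
specialized = payback_years(400, 0.5, upfront_cost=200_000, running_cost_per_day=40)
general = payback_years(400, 0.5, upfront_cost=80_000, running_cost_per_day=40)
print(specialized, general)  # 5.0 2.0

# A less frequent operation fails at one running cost and clears the bar at a lower one.
print(payback_years(50, 0.5, upfront_cost=80_000, running_cost_per_day=40))  # inf
print(payback_years(50, 0.5, upfront_cost=20_000, running_cost_per_day=5))   # 4.0
```

The upside term stays fixed in each pair of calls, which mirrors the point being made: lowering the cost side is what moves infrequent, lower-value operations past break-even.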
**Sergey Arkhangelskiy (30:15.202)**
And the second thing, which actually might be even more important in this equation, is the cost of development. So currently, in order to automate all of these, these are really big integration projects with tens, sometimes hundreds of people working on making everything work together. And now AI brings the promise, which we already know is fulfilled by looking at the LLM world and the coding world, that with training, with data, maybe some additional fine-tuning,
you can reach the quality which meets the bar there. So this is why I think people need to focus on the manufacturing and logistics tasks which happen more. And getting back to your question of where it starts, whether with a more automated industry or with a less automated one.
**Sergey Arkhangelskiy (31:09.846)**
I mean, probably both. It's hard to say. Everybody is looking into automation right now. Even globally, the population is declining and this pushes people to look into that. And the only thing which prevents companies from getting robotics inside is still this ROI, the return on the investments, which we have to —
So we need to decrease, and what physical AI brings is a decrease of the costs, and the automation will fill itself. So these are the details. Let me answer — I don't know which one starts, but this is a detail. The trend is to decrease the cost of automation.
**Kostas (31:54.668)**
Yeah, I find it very interesting, and very important, that you bring the economics of the whole thing into the conversation. And I would like to ask you, because, okay, there are places and cases where human labor is still much cheaper, right?
And let's say, I'm not talking about here, like, okay, like people might think about like exploitation and like, blah, blah, blah, like all these things. I'm not even talking about that. Right. Like there are like situations where it's still like cheap, like to get humans to go do like a physical task. Right. So in an ideal world, let's say we wouldn't need like humans to go and do this like really hard, not very fulfilling.
tasks, like they are probably dangerous also, and put a strain on the human body and all that stuff. But at the end of the day, in order to substitute the human with a machine, right, there has to be the ROI, as you say. This machine, the cost it has, the maintenance it has, and the operation of this thing should cost less than having a human there.
Do you think that this is like something that it's like, what it will take for this like to happen? Because obviously it's not happening yet, right? Like you can go see like in the fields out there, for example, there are like thousands of like people that they just like migrate for the period of time, like to go and pick up, let's say like fruits, right? Why not like do that like with machines?
And the reason is the economics of that. So what's the inflection point and what is needed in order to be able to do that?
**Nitay (34:19.555)**
I think we lost him. All right, let's, we can continue and he'll come back for his question. Yeah, go ahead.
**Sergey Arkhangelskiy (34:24.494)**
I think I can try to answer the question. So yeah, clearly the different jobs will be automated at different times. But one thing which my career also taught me is that there is no magical full automation overnight, or even, if not overnight, within one year.
So I am a big believer in the human-in-the-loop situation. And actually this is the reason why on our PhAIL benchmark, we measure mean time between failures, or assists. This is a well-known metric in the automation industry, with the idea of how often does a human need to intervene. And if this metric is big enough, it means, for example, for 100 robots, you need one human, right? If it's a metric where
anything breaks one time during the shift. And if you look at the history of all automations, this is how usually things evolved. So yes, we had manual sewing, then we got the sewing machines. And maybe even simultaneously, many people lost their jobs, but they still found other jobs eventually.
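The link between mean time between assists and staffing can be made concrete with a rough capacity estimate. This is a sketch under stated assumptions: the five-minute assist duration is invented, failures are treated as evenly spread, and queueing effects (several robots failing at once) are ignored.

```python
import math


def robots_per_operator(mtba_minutes, assist_minutes, operator_utilization=1.0):
    """Rough number of robots one human can supervise.

    Each robot needs `assist_minutes` of human time every `mtba_minutes`
    (mean time between assists), so it consumes assist/mtba of an operator.
    """
    if mtba_minutes <= 0 or assist_minutes <= 0:
        raise ValueError("times must be positive")
    return math.floor(operator_utilization * mtba_minutes / assist_minutes)


# One assist per 8-hour shift, 5 minutes to fix: roughly the "100 robots, one human" case.
print(robots_per_operator(8 * 60, 5))  # 96

# One assist every 10 minutes: the same human can cover only 2 robots.
print(robots_per_operator(10, 5))      # 2
```

In practice the usable number would be lower, since an operator cannot be busy every minute of the shift; the `operator_utilization` parameter is a crude way to account for that.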
Well, I just think we should be patient and things will evolve naturally. So some jobs — for example, if you look at warehouse fulfillment, already many operations are automated, but there are still some which are not. And I think this might happen in the next five years or even sooner.
And with other jobs where we particularly need a very gentle touch and more tactile feedback, which are harder for machines, this will take longer. I mean, being very cynical, it's still the amount of operations, the value added, and the cost of automation, the cost of running the automation. So yeah,
**Sergey Arkhangelskiy (36:48.866)**
these things — the cost of automation, the cost of running AI — won't drop overnight. It will take many years for these to decrease. And with that, we will see more and more adoption.
**Nitay (37:04.386)**
And you said some, some things I'll come back to this cause I'm curious to hear from you some of the kind of details of where like on the timeline we are in terms of like the technical capabilities enabling some of these things. But, but I want to go back to a point you said before that I found very interesting, which was you were saying that the primary driver around all of these kinds of use cases tends to be cost. As opposed to like, I don't know, throughput or consistency, or even this like time between failures as you were talking about.
Why, how are folks thinking about it when they're looking at these manufacturing, logistics, et cetera, use cases, and why is that the main driver?
**Sergey Arkhangelskiy (37:44.142)**
Well, I mean, this is pretty much the case with the coding agents as well, right? So as soon as we have the quality, which is there, and the cost, which is there, the usage explodes. If you think about it more generally, and I like how OpenAI puts it,
they're saying this is the cost of intelligence, right? So previously, in order to have the same results, you had to have many software engineers, humans working on that. And now you can have similar things for much cheaper. So this is the Jevons paradox, right? When we say, as soon as the cost of something goes down, the usage jumps up. And I think this is pretty much the same with automation. As soon as the cost of automation goes down,
the usage rate will explode.
**Nitay (38:44.515)**
So I agree with that. The reason it was interesting to me is that typically the cost part comes after, as you said, like you reach a certain quality, right? Like same with LLMs, right? Like initially they were insanely expensive. Even today, many people would argue like some of these subscriptions and the amount of actual token spend is still pretty high, but people didn't care for many years, I would argue even since kind of the early days of LLMs.
They didn't care to pour more and more money into it because they saw the quality was going up and they see that the quality is already interesting and, holy moly, if it gets even 10x that, that will be amazing. Right. And so more and more businesses are pouring money into tokens because they see the impact of the quality. And then in terms of the true cost savings aspect,
from what I've seen, that's only really starting now. Like literally in the last, I don't know, maybe year or so, now even businesses are really thinking like, okay, should I be using more local models, small models? How do I trim my context so that I pour less into it but still get the same result? Can I switch out different vendors? All these different kinds of things,
from a cost lens. So from my experience, cost usually comes after the quality is achieved. Now I ask you this because it seems to me like a lot of the robotics use cases you're talking about from listening to you speak that a lot of the quality hasn't necessarily reached it yet. Is that correct? Or is that like, how do you think about it for some of these users?
**Sergey Arkhangelskiy (40:16.942)**
Yeah, yeah, totally. At least from my experience, for the real applications, the quality is not here yet. And the leaderboard shows that quite clearly. When I'm thinking about the cost, it's not the cost of running the models themselves. It's more the cost of what it's worth to switch from the standard workflow which I have
to using these models, and whether these models are cheap enough in order to make economical sense. And so yes, there is a scaling law, and there are some — at least people claim they found or discovered some scaling law in robotics. But I mean, we can talk separately. Maybe there's a next question about this. I think there are still
problems unsolved in robotics from the algorithmic side. It's not like in the LLM world, where we know we need more data and we put RL on top of that. But for robotics, we don't have — yes, we can pull more data, but it's a bit different data than in the LLM. And we still seem not to have solved the RL part, like learning from experience. But this is a different conversation. Getting back to what you asked —
I think yes, so quality needs to pass some threshold and then people start switching, like really — LLMs and coding engines teach us that. So when I speak about cost, it's not the cost of running the model itself, but more the cost of switching to the new technology, so that the whole economy makes sense compared to humans.
**Nitay (42:19.905)**
Yeah, that makes sense. How do you think about it from the variety of use cases, to your point? So you're creating this benchmark and then the underlying physical infrastructure. Do you foresee it expanding use cases and taking over more and more different gripper scenarios, different kinds of
arms and all these different kinds of mediums and use cases of robotics? Or do you foresee there being multiple different benchmarks, each focused on a particular use case, a particular environment? How do you see that growing?
**Sergey Arkhangelskiy (43:02.275)**
Yeah, we definitely will — as we speak, we work on making the benchmark better. So our current roadmap has both adding new tasks. In particular, we see the next step as insertion, which also happens a lot during, for example, assembly and different other operations. And also
bringing in more hardware. So currently the robotic arm and the setup we have is a pretty standard research setup called DROID, which is quite popular across researchers. And we were able to find a lot of pre-trained models, which we fine-tuned. But the next step is a bimanual setup,
where it's not one arm, but two working synchronously, which is also a popular setup, and many AI labs and physical AI labs showcase their policies on that. But the overall idea, and the overall design which we have behind the leaderboard, is
to answer the question from the industry: where is physical AI rating? So all our decisions are driven by helping the industry to understand and at least to get the rough sense of what the ROI should be. And this brought us to working on tasks which are commercially viable and commercially interesting, working with real hardware and not the simulator, because the simulator doesn't
persuade anyone on the factory floor — they need it in the real world. And the third one is measuring metrics like throughput and reliability, not the success rate, which many research labs measure. Yeah, so this is our roadmap for the leaderboard.
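To make the distinction between success rate and throughput concrete, here is a minimal sketch. The `Episode` record and the metric definitions are assumptions chosen for illustration, not the leaderboard's actual formulas. It shows two hypothetical policies with the same success rate but very different throughput.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    succeeded: bool    # did the robot complete the task on this attempt?
    duration_s: float  # wall-clock seconds spent on the attempt


def success_rate(episodes: List[Episode]) -> float:
    """Fraction of attempts that succeeded, ignoring how long they took."""
    return sum(e.succeeded for e in episodes) / len(episodes)


def throughput_per_hour(episodes: List[Episode]) -> float:
    """Completed tasks per hour of robot time (failed attempts still cost time)."""
    total_s = sum(e.duration_s for e in episodes)
    completed = sum(e.succeeded for e in episodes)
    return completed / total_s * 3600.0


def successes_per_failure(episodes: List[Episode]) -> float:
    """A crude reliability proxy: how many successes occur per failure."""
    failures = sum(not e.succeeded for e in episodes)
    successes = len(episodes) - failures
    return successes / failures if failures else float("inf")


# Two hypothetical policies: same success rate, one is three times slower.
fast = [Episode(True, 10.0)] * 9 + [Episode(False, 10.0)]
slow = [Episode(True, 30.0)] * 9 + [Episode(False, 30.0)]

for name, eps in (("fast", fast), ("slow", slow)):
    print(name, success_rate(eps), round(throughput_per_hour(eps), 1))
```

Under these assumed definitions both policies report the same success rate, while the throughput numbers differ by a factor of three, which is the kind of gap a factory operator would care about.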
**Kostas (45:25.617)**
How does it work for someone to submit something on this leaderboard? I think people are quite aware of how it works in software. You usually get your agent or your harness or whatever. And it's almost standardized right now how a new benchmark running on
something like Hugging Face works, right? But as you said, you want to focus primarily on the hardware, using the hardware itself. So that puts another constraint there, right? For whoever works on that, you need to at least have the hardware and do things. But at the same time, how does this work from your side as the evaluator, right? Do you need to have all the different hardware
to go and test that in the real world? So for the leaderboard, how does this work? It sounds like a much harder constraint than it is in evaluating software agents.
**Sergey Arkhangelskiy (46:46.637)**
Yeah, that's right. It's much harder. But I think this is the reason why we still don't have a good leaderboard and a good benchmark — it's just because people don't want to bother with working with the hardware. But at the end of the day, if you really want to build a good benchmark, you have to do that. There's no other way around it. So you have to admit it and work with that. In terms of how to submit —
even now, working with four different models, each model has its own repository, its own training code, its own data format with some peculiarities. We had to build the infrastructure for that. And part of this infrastructure is the inference API. So soon we're going to release a very clear API boundary so that even the
private participants who don't want to expose their weights could just expose the endpoint which implements this API, and we could use this API to run their models on our hardware. I think our main value proposition here for the market is that we are a neutral player. So we don't train our own models. We don't develop our own hardware.
Our own business is to understand how good the AI works on this hardware and on this particular task, which we think is important for the industry. So right now, if you want to participate, just write to me at hi@phail.ai and we will talk and find a solution.
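The idea of an API boundary, where a participant keeps the weights private and exposes only an endpoint, can be sketched roughly as follows. Every name here (`Observation`, `Action`, `PolicyEndpoint`, `act`, `reset`) is hypothetical and chosen for illustration. The actual API had not been released at the time of the conversation, so this is only one plausible shape for such a boundary.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass
class Observation:
    image_jpeg: bytes                # encoded camera frame (hypothetical field)
    joint_positions: Sequence[float] # current arm state (hypothetical field)
    instruction: str                 # natural-language task description


@dataclass
class Action:
    joint_targets: Sequence[float]   # commanded joint positions


class PolicyEndpoint(Protocol):
    """What the evaluator needs from a participant; weights stay behind it."""

    def reset(self) -> None: ...

    def act(self, obs: Observation) -> Action: ...


class HoldStillPolicy:
    """Stand-in for a participant's model: commands the current joint positions."""

    def reset(self) -> None:
        pass

    def act(self, obs: Observation) -> Action:
        return Action(joint_targets=list(obs.joint_positions))


def run_episode(policy: PolicyEndpoint, observations: List[Observation]) -> List[Action]:
    """Evaluator-side loop: it only ever talks to the policy through the interface."""
    policy.reset()
    return [policy.act(obs) for obs in observations]


actions = run_episode(
    HoldStillPolicy(),
    [Observation(image_jpeg=b"", joint_positions=[0.1, 0.2], instruction="pick")],
)
print(actions[0].joint_targets)
```

In a real deployment the `act` call would presumably go over the network to the participant's server. The point of the sketch is only that the evaluator depends on the interface, not on any one model's repository or data format.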
**Sergey Arkhangelskiy (48:38.03)**
Am I back?
**Kostas (48:42.043)**
Yeah, yeah, no, it's all good, it's all good. We never lost you. Okay, that makes sense. So, go ahead, go ahead, sorry.
**Sergey Arkhangelskiy (48:46.37)**
because it's, it's, yeah.
**Sergey Arkhangelskiy (48:55.81)**
No, I was just saying — write to me at hi@phail.ai and we will find something right now, or just wait several weeks and we will release the API very clearly so that you can participate easily.
**Kostas (49:09.085)**
Okay. How are these models different compared to the LLMs that we are using in software? Okay, you talked about the datasets. That's a big difference out there. The training material that is like...
different and obviously much more constrained compared to what we use for the language models. But what are the differences in terms of their architecture and the hardware that they need to run on? Are we still talking about GPUs here? Is the infrastructure out there to host and run these models the same?
Can someone use something like vLLM to host and do inference for these models? Tell us a little bit more about that, how a model that operates on a robot differs from the LLMs in the architecture of the models themselves, but also in the infrastructure around them to actually operate and use them.
**Sergey Arkhangelskiy (50:40.11)**
Yeah, very good question.
I mean, there are many, many differences. I think number one is that the result of inference of this model works on the hardware and it manages the hardware. And different hardware has different, let's say, APIs, drivers, et cetera. And so there's still an open question of compatibility between different robots.
There's a partial solution that we can think about — I can think of a robot as an end effector with some location in free space and you just need to control this end effector. But this is only part of the answer. And also the fact that it runs on the hardware creates the second difference, which is that latency matters.
I mean, it matters for the LLM world as well, but not that much. Because yes, you generate the answer by streaming and the user wants to see that, but you know, currently the big LLM providers batch your requests together and definitely latency suffers. So I think
longer term, we will come to the solution where we have something which runs locally on the robot or near the robot, and something which runs in the cloud. And we will have different configurations of that in the future. So even now, for example, we have OpenPI 0.5 by Physical Intelligence on our leaderboard, and this model runs in the cloud. So we run it in the cloud and we have the full back-and-forth to do it.
**Sergey Arkhangelskiy (52:36.888)**
Some smaller models run on the local GPU next to the robot and the turnaround is much faster. And I think these are the two main major examples of how things are different for physical AI. And we're still quite early, so we don't know what's going to be the final answer for that, what's going to work. I mean, even right now we have some techniques of how we can hide the latency,
so that the robot has a fluid, continuous trajectory, but still if you run in the cloud, you cannot react to things in 100 milliseconds. It's at least half a second before you can react to the new information which happens there.
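One common way to hide inference latency, assumed here since the specific technique is not named in the conversation, is to have the model return a chunk of future actions and keep executing them while the next request is in flight. The small sketch below computes how many buffered actions are needed to span a given latency. The 20 Hz control rate is an illustrative assumption, and `chunk_actions_to_cover` is a hypothetical helper.

```python
def chunk_actions_to_cover(latency_ms: int, control_hz: int) -> int:
    """Smallest number of buffered actions whose playback spans the latency.

    Uses integer ceiling division so the result is exact.
    """
    return (latency_ms * control_hz + 999) // 1000


CONTROL_HZ = 20  # assumed control rate, for illustration only

# Cloud inference at roughly half a second versus a local GPU at roughly 100 ms.
for label, latency_ms in (("cloud", 500), ("local", 100)):
    n = chunk_actions_to_cover(latency_ms, CONTROL_HZ)
    print(f"{label}: buffer at least {n} actions to keep the motion continuous")
```

Buffering keeps the trajectory smooth, but every buffered action was computed from an older observation, so the robot still cannot respond to new information faster than the round trip itself.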
**Kostas (53:31.09)**
That makes sense. If I understand correctly, it feels almost like the inverse of what is happening with LLMs, where you have the really big, huge models that you probably need to distribute across, I don't know, probably many different GPUs just to serve them. And you start from the cloud because you need to have the data center to do that.
Right? Like you can't just run it locally. And then you have all the attempts to get, let's say, to AI on the edge, right? Which doesn't have, let's say, the same capabilities as these really big models out there, the frontier models. But it feels to me that with robotics, you kind of have to do the inverse, right? Like you need to start
from local, especially if you are talking about manufacturing, right? Like, okay, if throughput is important there because you need to generate a specific number of products per unit of time, you can't really be like, okay, now I have a fluctuation in my latency here because, I don't know, the data center in the east of AWS is...
weird today, right? Like you can't really do that. Is this the case? Does it make sense that it's more about starting locally and then trying to get into the cloud? Or is the industry still trying to figure out if it is even feasible to do that?
**Sergey Arkhangelskiy (55:17.354)**
Yeah, I think it will be similar to what we see in the LLM world, and similar in the sense that first we will have heavy, expensive models which run in the cloud, just because they get to the threshold of the quality which Nitay talked about, in order to perform operations there. And then as a second step, we will have optimizations
which can bring things locally — or maybe, what I expect myself, that we will have a cheaper model which runs more high-frequency near the robot and some less-frequent models which run in the cloud, which can do more like planning and longer-term computations.
Another problem, which is actually very interesting and many don't anticipate, is that if you have multiple robots working on the same factory floor and all of them have to send their videos, I don't know, 10 FPS to the cloud, the internet throughput is taking this down. Essentially the internet is down on the whole factory, and you either have to bring your own internet or
change things significantly there. So I would think that we're still — we would love to see a really good physical AI which works if you have one machine with eight H100s on the factory floor first, and then we will start optimizing as a community of researchers.
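The bandwidth problem is easy to see with back-of-the-envelope arithmetic. In the sketch below only the 10 FPS figure comes from the conversation. The robot count, cameras per robot, and compressed frame size are illustrative assumptions, and `uplink_mbps` is a hypothetical helper.

```python
def uplink_mbps(robots: int, cameras_per_robot: int, fps: int, bytes_per_frame: int) -> float:
    """Sustained upstream bandwidth in megabits per second for streaming all cameras."""
    bytes_per_second = robots * cameras_per_robot * fps * bytes_per_frame
    return bytes_per_second * 8 / 1_000_000


# Assumed factory: 20 robots, 3 cameras each, 10 FPS, about 100 KB per compressed frame.
print(uplink_mbps(robots=20, cameras_per_robot=3, fps=10, bytes_per_frame=100_000))
```

With these assumed numbers the cell needs on the order of several hundred megabits per second of sustained upload, which helps explain why a shared factory connection would struggle.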
**Kostas (57:13.873)**
What's the impact of, or let's say how important is open source in robotics and the AI use cases in robotics? We see a lot of things happening in software because of open source. And obviously we have the whole thing there with open-source and open-weight models, but also all the software infrastructure around that, right?
Is there something similar in hardware, and what is its shape? Like, is there an open-source robot? What if I want, as a hobbyist, let's say, to go and experiment with manufacturing or assembly? How does it work, if it works at all? And what's your opinion on the importance of open source, specifically in this case?
**Sergey Arkhangelskiy (58:13.4)**
Yeah, very good question. I think actually in the robotics world, open source is even more important than in the software world. And I think the big success story is the Robot Operating System, ROS, which has been around 10 or 15 years. Actually, I know that many, many companies use it. Not everyone is happy, but you know, it's
better to have something which you're not happy with but really helps you do your business than being completely happy. And I also see a lot of very interesting open-source hardware projects. So one notably interesting one is OpenArm, which is the design of robotic arms — a really full-scale design starting from the servo motors.
And I see more and more efforts there. Though again, hardware is hard. And the reason it's hard is because you have to iterate on the hardware, and in order to iterate, you need to run stuff. You need to find where it breaks and you need to fix that. And it all takes time. And definitely many companies have years of experience with creating robotic products. And it's not something very easy to replace overnight.
But I mean, I see very, very interesting open-source projects, particularly in open-source hardware, which is happening right now.
**Nitay (59:57.335)**
Yeah, that's very fascinating. And it'll be interesting to see kind of where the open source versus vendor world ends up in the robotics landscape. I'm curious kind of tying off that and perhaps last question kind of to close us out since we're coming up on time here. What would you want to see the ecosystem doing in the next few years? What's kind of the biggest asks or the biggest gaps that you see that the ecosystem needs to fill alongside kind of the landscape that we've talked about here?
**Sergey Arkhangelskiy (01:00:30.998)**
Well, definitely the measurement side, right? I think we need — definitely it would be great if it's PhAIL, the leaderboard which I built, but any other credible evaluation there will really help, because it will bring this mantra of "we cannot improve what we don't measure" there. And I think for physical AI, this is instrumental.
This is very instrumental. I see actually a lot of papers publishing, overall research is happening more or less openly, though not totally openly, which is still fine. So in this sense, I'm quite confident of the great future of physical AI.
I mean, probably that's it for now. Overall, I'm a big believer and I don't think we have, let's say, very big instrumental roadblocks on the way forward. Probably people just need to be more patient in their expectations. At some point physical AI will break away and start to consume the factory floors and logistics, but it's inevitable.
**Nitay (01:02:01.377)**
Yeah, hardware takes time. As you said, hardware is hard. So the cycles and the investments required are obviously bigger. Cool. Well, with that, it was an absolute pleasure having you with us here, Sergey. Thank you for joining us. And we'll definitely have to check in again in the future and sync on how the robotics ecosystem is evolving and catch up with Positronic again.
**Sergey Arkhangelskiy (01:02:27.704)**
Thank you guys very much for having me.
Listen to Tech on the Rocks using one of many popular podcasting apps or directories.