LLMs Are Giving Robots A Cheap, Flexible Language Interface: Former OpenAI Head Of Research Bob McGrew

AI seems to be a few years ahead of robotics at the moment, but progress in AI is helping spur progress in robotics.

On a recent podcast, Bob McGrew, the former Head of Research at OpenAI, articulated a pivotal shift in the field of robotics, attributing it to the rise of large language models (LLMs). McGrew, a key figure who witnessed the evolution of groundbreaking models like GPT-3, suggests that LLMs are providing a much-needed, cost-effective, and adaptable language interface for robots, a development that is dramatically accelerating their capabilities.

McGrew’s insight highlights the confluence of advanced language understanding and powerful vision systems, which together are enabling robots to tackle a wider array of generalized tasks. He explains, “I think what’s really changed is that now that you have LLMs, you have this language interface to the robot so that now you can describe the tasks much more cheaply and you have really strong vision encoders that are tied into that intelligence.” This potent combination, he adds, “gives the robots really a headshot at doing generic tasks.”

To illustrate this paradigm shift, McGrew contrasts the painstaking, years-long effort required to teach a robot a single, specific skill with the rapid, versatile learning now being demonstrated by companies at the forefront of this new approach. “So we spent years solving one specific problem teaching a robot to manipulate a Rubik’s cube. And now a company like Physical Intelligence can spend months solving a huge variety of problems like laundry folding and cardboard and packing egg crates.”

He emphasizes that this rapid advancement is not happening in a vacuum. Instead, it’s the direct result of building upon the foundational technologies developed over the past decade. “And that’s something that they can only have because they’re building on top of existing frontier models. And the entire tech and research stack that we’ve built over the last 10 years,” McGrew concludes.

The implications of McGrew’s assessment are profound. The ability to instruct robots using natural language drastically lowers the barrier to entry for programming and deploying robotic systems. This “cheap” and “flexible” interface means that businesses may no longer need teams of highly specialized robotics engineers for every new task. Instead, a generalist robot could potentially be adapted to a variety of roles simply by describing the new requirements in plain language. Meanwhile, if the humanoid form factor becomes the standard, economies of scale should bring down the production cost of each robot. This new wave of robotics, powered by LLMs and a convergence on the humanoid form, promises a future where robots are not just powerful but also accessible and easily adaptable to the changing needs of the physical world.