Google's latest AI gives robots the skill to fold origami and zip bags delicately
Google’s New AI Models: A Leap Towards Smarter Robots
Google DeepMind is making waves with its latest announcement of two innovative AI models, Gemini Robotics and Gemini Robotics-ER. These models are designed to give robots a better grasp of the world around them, allowing them to interact with objects more delicately than before. Think of tasks like folding origami or sealing zipper bags without causing any damage—these are now within reach for robots.
A New Era for Robotic Intelligence
Robotics hardware is advancing steadily, but building AI that can guide robots safely and accurately through unfamiliar situations has been a tough nut to crack. This problem sits at the heart of a field called "embodied AI," one that many in the tech world, including Nvidia, are racing to solve. Google's new models are a step towards making robots versatile helpers in the real world.
The Gemini Robotics model is built on Google's Gemini 2.0 large language model and adds "vision-language-action" capabilities: it can perceive a scene, interpret natural-language commands, and translate them directly into physical actions. Gemini Robotics-ER, meanwhile, focuses on "embodied reasoning," sharpening a robot's spatial understanding and letting roboticists connect it to existing low-level control systems.
Robots That Understand and Act
Imagine telling a robot to "pick up the banana and put it in the basket," and it does just that by recognizing the banana through a camera and guiding its arm to complete the task. Or asking it to "fold an origami fox," and it uses its knowledge to fold paper with care. These scenarios are now possible with Gemini Robotics.
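To make that see-understand-act pipeline concrete, here is a minimal sketch of what a vision-language-action control loop might look like. Every name in it (MockVLAModel, Action, joint_deltas) is a hypothetical stand-in for illustration; Google has not published Gemini Robotics' actual interface.

```python
# Minimal sketch of a vision-language-action (VLA) control loop.
# All names here are hypothetical illustrations, not Google's API.
from dataclasses import dataclass
from typing import List

@dataclass
class Action:
    """A low-level command for a robot arm (hypothetical schema)."""
    joint_deltas: List[float]  # small joint-angle adjustments, in radians
    gripper: float             # 0.0 = open, 1.0 = closed

class MockVLAModel:
    """Stand-in for a VLA model: maps (image, instruction) -> actions."""
    def plan(self, image: bytes, instruction: str) -> List[Action]:
        # A real model would ground the instruction in the camera image;
        # this mock just returns a canned reach-grasp-lift sequence.
        return [
            Action(joint_deltas=[0.10, -0.05, 0.0], gripper=0.0),  # reach
            Action(joint_deltas=[0.00, 0.00, 0.0], gripper=1.0),   # grasp
            Action(joint_deltas=[-0.10, 0.05, 0.0], gripper=1.0),  # lift
        ]

def control_loop(model: MockVLAModel, camera_frame: bytes, command: str) -> None:
    # Perceive -> plan -> act: the model sees the scene and the instruction,
    # then emits actions for the robot's controller to execute in order.
    for step, action in enumerate(model.plan(camera_frame, command)):
        print(f"step {step}: joints {action.joint_deltas}, gripper {action.gripper}")

if __name__ == "__main__":
    control_loop(MockVLAModel(),
                 camera_frame=b"<jpeg bytes>",
                 command="pick up the banana and put it in the basket")
```

The point of the sketch is the shape of the interface, not the details: a single model consumes pixels and plain English together and emits motor commands, with no hand-written task logic in between.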
Back in 2023, Google's RT-2 model made strides by drawing on internet data to help robots understand and adapt to new situations. Gemini Robotics goes further: it doesn't just understand tasks, it performs dexterous manipulations that RT-2 couldn't, such as folding origami and packing snacks into zipper bags.
Generalization: The Key to Versatile Robots
A standout feature of Gemini Robotics is its ability to generalize, meaning it can tackle new tasks it wasn't specifically trained for. Google claims it more than doubles performance on generalization benchmarks compared to other leading vision-language-action models. This adaptability is crucial for robots to function effectively in unpredictable environments.
Despite these advancements, skepticism lingers about the true capabilities of humanoid robots. Tesla's Optimus Gen 3, for instance, faced scrutiny over how autonomous its demonstrations really were. Google is addressing this by developing a "generalist robot brain" and partnering with Apptronik to build the next generation of humanoid robots.
Safety First
Safety is a top priority for Google. The company is implementing a "layered, holistic approach" to robot safety, inspired by Isaac Asimov's Three Laws of Robotics. They've even released a dataset, aptly named "ASIMOV," to help researchers assess the safety implications of robotic actions.
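Google hasn't detailed how those layers work, but layered safety filtering is a common pattern in robot control: every candidate action must pass a stack of independent checks before it executes. The sketch below is purely illustrative; the constraint names and limits are assumptions, not Google's specifications.

```python
# Illustrative layered safety filter for robot actions.
# Limits and layer names are invented for this example.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Action:
    joint_velocities: List[float]  # rad/s
    gripper_force: float           # newtons

# Each layer is an independent predicate; an action runs only if
# every layer approves it.
def within_speed_limit(a: Action) -> bool:
    return all(abs(v) <= 1.0 for v in a.joint_velocities)

def within_force_limit(a: Action) -> bool:
    return a.gripper_force <= 20.0

def is_safe(action: Action, layers: List[Callable[[Action], bool]]) -> bool:
    return all(layer(action) for layer in layers)

if __name__ == "__main__":
    layers = [within_speed_limit, within_force_limit]
    print(is_safe(Action([0.2, -0.3], gripper_force=5.0), layers))  # True
    print(is_safe(Action([2.5, 0.0], gripper_force=5.0), layers))   # False: too fast
```

The appeal of this design is that each layer stays simple and auditable on its own, while datasets like ASIMOV aim to test the harder question of whether a model's planned actions are safe in the first place.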
These models are still in the research phase, and no specific commercial applications have been announced, but Google's demo videos hint at significant progress. How the systems will perform outside controlled environments, however, remains an open question.
In conclusion, Google's new AI models are pushing the boundaries of what robots can do, moving us closer to a future where robots not only understand commands but execute them with precision and care.