Gemini Robotics 2.0 Unlocks New On-Device AI Dexterity With 150+ Tasks

Summary

➡ Google’s new AI breakthrough, Gemini Robotics On-Device, allows robots to perform complex tasks independently, without relying on cloud-based data networks. The model is designed to adapt quickly to new tasks and can operate in places with limited or no connectivity. It also keeps sensitive data on the device, improving security. The model runs on a variety of robotic platforms, demonstrating its versatility and efficiency, even on tasks involving new objects and instructions.

Transcript

Google just revealed its new AI breakthrough with the launch of Gemini Robotics On-Device, a new vision-language-action (VLA) model that brings advanced AI capabilities directly onto robotic hardware. So what can it do now? To start, this development means robots can now perform complex, dexterous tasks while operating independently of cloud-based data networks, potentially transforming the way robots are used in homes, factories, and unpredictable real-world environments. But even more important is that, at its core, Gemini Robotics On-Device is designed to deliver general-purpose dexterity and rapid task adaptation, unlike conventional robotic models that rely heavily on remote servers for processing.

Instead, Gemini Robotics On-Device is optimized to run locally on the robot itself, delivering several critical benefits: low-latency responses, increased reliability in settings where connectivity may be limited or non-existent, and local data handling, meaning sensitive data never leaves the device. And for even more intelligence, Gemini Robotics On-Device builds on the architecture of Gemini 2.0, the company’s most advanced multimodal reasoning system. By integrating vision, language, and action capabilities, the model enables robotic systems to understand and interpret complex environments, follow natural-language instructions, and manipulate a wide range of objects, even those they have never encountered before.
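
To make the vision-language-action idea concrete, here is a minimal, self-contained sketch of the kind of closed control loop such a model runs on the robot; the dummy policy, fake camera frames, and joint-state handling are illustrative assumptions, not the actual Gemini Robotics On-Device API.

```python
# Minimal, self-contained sketch of an on-device vision-language-action loop.
# DummyPolicy and the fake camera/state tensors are hypothetical stand-ins;
# the real Gemini Robotics On-Device interface is not public in this form.
import numpy as np

class DummyPolicy:
    """Stand-in for a locally hosted VLA model: (image, text, state) -> actions."""
    def predict(self, image, instruction, state):
        # A real model would fuse vision and language; here we return a small
        # random chunk of joint deltas just to show the data flow.
        return np.random.uniform(-0.01, 0.01, size=(8, state.shape[0]))

def control_loop(policy, instruction, steps=50, num_joints=7):
    state = np.zeros(num_joints)                          # fake proprioceptive state
    for _ in range(steps):
        image = np.zeros((224, 224, 3), dtype=np.uint8)   # fake camera frame
        action_chunk = policy.predict(image, instruction, state)
        for action in action_chunk:                       # execute the chunk locally,
            state = state + action                        # no cloud round-trip needed
    return state

final_state = control_loop(DummyPolicy(), "fold the shirt on the table")
print("final joint state:", final_state)
```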

Because of this, the model demonstrates strong generalization abilities across a spectrum of dexterous tasks, with internal evaluations having already shown that robots powered by Gemini Robotics On-Device can complete multi-step instructions that historically challenged the adaptability of previous robotic models. And while initial training was conducted on ALOHA robots, the Gemini Robotics team has demonstrated that the on-device model can adapt to several robotic platforms, including the bi-arm Franka FR3 and the Apollo humanoid robot by Apptronik. Importantly, this means the same foundational AI can now guide different robot forms across a range of tasks and environments, whether they’re two-armed industrial robots or fully mobile humanoids.

And the results are next level, as Gemini Robotics On-Device has already outperformed other on-device VLA models on a range of challenging tasks, particularly those requiring adaptation to new objects, scenes, and instructions. But the real-world evaluations are what really highlight the strength of Gemini Robotics On-Device. For instance, on the Franka FR3 platform, the model performed complex assembly operations and handled previously unseen objects, while on the Apollo humanoid it demonstrated generalized manipulation skills, from folding garments to following intricate verbal commands. Furthermore, Gemini Robotics On-Device is the first VLA model from the company to support user fine-tuning.

As for speed, Gemini Robotics On-Device has been optimized for efficiency so that it requires minimal computational resources. This allows standard robotic platforms to harness sophisticated AI-driven behaviors without the need for expensive or bulky hardware upgrades. On top of this, the model’s compact architecture supports rapid iteration and experimentation, allowing developers to fine-tune the model for a variety of tasks and domains while adapting to new requirements with as few as 50 to 100 demonstration examples. This flexibility speeds up deployment in both research and commercial settings, making it easier for teams to prototype, test, and scale up advanced robotics applications.
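
As a rough illustration of what adapting a policy from 50 to 100 demonstrations can look like, here is a minimal behavior-cloning fine-tuning loop in PyTorch; the tiny MLP policy, synthetic demonstration tensors, and hyperparameters are assumptions for the sketch, not details of Google’s fine-tuning pipeline.

```python
# Illustrative behavior-cloning fine-tune on ~100 demonstrations.
# The small policy network and synthetic data are assumptions; Google's
# actual fine-tuning workflow is exposed through its own SDK.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

obs_dim, act_dim, num_demos, steps_per_demo = 32, 7, 100, 50

# Synthetic stand-in for (observation, expert action) pairs from teleoperated demos.
obs = torch.randn(num_demos * steps_per_demo, obs_dim)
acts = torch.randn(num_demos * steps_per_demo, act_dim)
loader = DataLoader(TensorDataset(obs, acts), batch_size=256, shuffle=True)

policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))
optim = torch.optim.Adam(policy.parameters(), lr=3e-4)

for epoch in range(10):
    for batch_obs, batch_acts in loader:
        loss = nn.functional.mse_loss(policy(batch_obs), batch_acts)
        optim.zero_grad()
        loss.backward()
        optim.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```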

And to accelerate adoption and experimentation, the Gemini Robotics team is launching a software development kit alongside the new model, with the SDK allowing developers to evaluate Gemini Robotics On-Device on their own tasks and environments. Plus, they can run simulations using the MuJoCo physics engine and adapt the model to new domains with minimal extra training data. And by gathering real-world feedback from a select group of early adopters, the team aims to further refine the model’s performance and safety profile before a broader release. At the same time, a new milestone has been reached in AI humanoid control with the introduction of LeVERB, a new framework for vision-language whole-body control in humanoid robots.
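
For readers unfamiliar with MuJoCo, the snippet below shows the general shape of stepping a simulation with the open-source `mujoco` Python bindings; the toy single-pendulum scene is our own stand-in, and the Gemini Robotics SDK would wrap far richer scenes than this.

```python
# Stepping a toy MuJoCo scene with the open-source `mujoco` Python bindings
# (pip install mujoco). The pendulum XML is an illustrative stand-in, not a
# scene shipped with the Gemini Robotics SDK.
import mujoco

PENDULUM_XML = """
<mujoco>
  <worldbody>
    <body name="pole" pos="0 0 1">
      <joint name="hinge" type="hinge" axis="0 1 0"/>
      <geom type="capsule" fromto="0 0 0  0 0 -0.5" size="0.02" mass="1"/>
    </body>
  </worldbody>
  <actuator>
    <motor joint="hinge" gear="1"/>
  </actuator>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(PENDULUM_XML)
data = mujoco.MjData(model)

for step in range(500):
    data.ctrl[:] = 0.1            # constant torque; a policy would set this instead
    mujoco.mj_step(model, data)   # advance the physics by one timestep

print(f"sim time: {data.time:.2f}s, hinge angle: {data.qpos[0]:.3f} rad")
```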

Incredibly, the Berkeley researchers behind LeVERB have addressed a long-standing challenge: bridging the gap between high-level semantic understanding and the dynamic, agile whole-body behaviors that real-world tasks demand. In fact, until recently, most vision-language-action models have operated under the assumption that robots possess a reliable, pre-defined set of actions, often limited to basic movements such as controlling end-effector positions or root velocities. But while these approaches have enabled robots to perform quasi-static or simple manipulation tasks, they’ve fallen short when it comes to executing more complex, dynamic, full-body behaviors, which are essential for tasks ranging from athletic motion to collaborative assistance in human environments.

To surpass these constraints, the researchers have introduced LeVERB (Latent Vision-language-Encoded Robot Behavior), the first hierarchical, latent instruction-following system for humanoid VLA specifically designed to support full-body, closed-loop control. Plus, LeVERB operates within a new, sim-to-real-ready benchmark, which features over 150 photorealistic tasks grouped into 10 distinct categories. These tasks span a variety of real-world scenarios, and they’re all rendered within NVIDIA’s state-of-the-art robotic simulation platform, Isaac Sim. But the heart of the framework is its two-tiered architecture, inspired by what cognitive science often calls the System 1 / System 2 model. At the top level, System 2, LeVERB-VL employs a transformer-based vision-language backbone with 102 million parameters.

This model interprets instructions and visual context, translating them into a latent verb, which is basically an abstract, compressed representation of the intended action. Next, at the lower level, System 1, LeVERB-A, a 1.1-million-parameter transformer, decodes these latent verbs into fine-grained, dynamics-level commands suitable for whole-body motion. On top of this, the underlying dataset for the benchmark is particularly robust: it consists of over 1.8 million rendered frames, 15,400 episodes, and more than 150 tasks, offering a comprehensive testbed for evaluating and advancing vision-language whole-body control policies. These scenarios were created by retargeting human motion-capture data and rendering each scene with randomized camera angles, assets, and materials.
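
To illustrate the two-tier idea, here is a toy System 2 / System 1 pipeline in PyTorch: a slow vision-language module compresses an instruction and an image into a latent verb, and a fast actor decodes that latent plus proprioception into joint commands. The layer sizes, latent dimension, and command format are illustrative assumptions and do not reproduce the actual LeVERB-VL or LeVERB-A networks.

```python
# Toy sketch of a two-tier (System 2 / System 1) latent-verb pipeline in PyTorch.
# All dimensions are illustrative assumptions, not the real LeVERB architecture.
import torch
import torch.nn as nn

class System2VL(nn.Module):
    """Slow loop: fuses vision + language into a compact 'latent verb'."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.vision = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
        self.text = nn.Embedding(1000, 256)           # toy token embedding
        self.head = nn.Linear(512, latent_dim)

    def forward(self, image, token_ids):
        v = self.vision(image)
        t = self.text(token_ids).mean(dim=1)           # mean-pool the instruction tokens
        return self.head(torch.cat([v, t], dim=-1))    # latent verb

class System1Actor(nn.Module):
    """Fast loop: decodes (latent verb, proprioception) into joint commands."""
    def __init__(self, latent_dim=64, proprio_dim=45, num_joints=23):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + proprio_dim, 256), nn.ReLU(),
            nn.Linear(256, num_joints),
        )

    def forward(self, latent_verb, proprio):
        return self.net(torch.cat([latent_verb, proprio], dim=-1))

vl, actor = System2VL(), System1Actor()
image = torch.randn(1, 3, 64, 64)
tokens = torch.randint(0, 1000, (1, 8))                # e.g. "walk to the red chair"
verb = vl(image, tokens)                               # computed at a low rate
command = actor(verb, torch.randn(1, 45))              # computed at control rate
print(command.shape)                                   # torch.Size([1, 23])
```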

This diversity ensures the benchmark remains relevant across a wide array of real-world conditions. Furthermore, to enhance the diversity of training data, the researchers augmented the 17.1 hours of rendered motion trajectories with an additional 2.7 hours of text-only trajectory data. From a technical perspective, LeVERB-VL leverages techniques from physics-based animation, such as variational autoencoders and trajectory-reconstruction objectives, to learn a highly expressive latent space for whole-body skills. This latent space serves as a compact vocabulary of robot behaviors, which can be dynamically combined and interpolated to generate novel actions. On top of this, a privileged encoder aids in learning, while a discriminator ensures that both visual and text-augmented data contribute to a unified latent behavior space.
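
To unpack the variational-autoencoder idea in that paragraph, here is a minimal trajectory VAE objective (reconstruction plus KL regularization); the trajectory shapes, network sizes, and loss weights are assumptions for illustration, not LeVERB’s actual training recipe.

```python
# Minimal trajectory VAE: encode a motion snippet to a latent, decode it back,
# and train with reconstruction + KL losses. Shapes and weights are illustrative
# assumptions, not the LeVERB training configuration.
import torch
import torch.nn as nn

traj_dim = 23 * 16           # e.g. 23 joints x 16 timesteps, flattened
latent_dim = 32

encoder = nn.Sequential(nn.Linear(traj_dim, 256), nn.ReLU(), nn.Linear(256, 2 * latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, traj_dim))
optim = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

trajectories = torch.randn(1024, traj_dim)   # stand-in for retargeted mocap snippets

for step in range(200):
    batch = trajectories[torch.randint(0, 1024, (128,))]
    mu, logvar = encoder(batch).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
    recon = decoder(z)
    recon_loss = nn.functional.mse_loss(recon, batch)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon_loss + 1e-3 * kl             # small KL weight keeps the latent expressive
    optim.zero_grad()
    loss.backward()
    optim.step()

print(f"final loss: {loss.item():.4f}")
```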

And in closed-loop evaluations, LeVERB has demonstrated significant gains, achieving an 80% success rate on simpler visual navigation tasks and a 58.5% overall success rate across the benchmark, outperforming more naive hierarchical approaches by 7.8 times. Anyways, for more AI news, make sure to like and subscribe and check out this video here.
