Transcript
And as for dexterity, the D9’s dual robotic arms each have seven degrees of freedom and can handle payloads of over 20 kg each, while proprietary artificial intelligence and multimodal sensors let the robot execute delicate tasks.
On top of this, Unitree’s G1 robots also just revealed several new humanoid abilities thanks to a groundbreaking framework called ExBody2, a new system that allows robots to perform dynamic movements like running, dancing and crouching with unprecedented stability and precision. And with its reinforcement learning-based development, ExBody2 even bridges the gap between simulated training and real-world execution, enabling robots to replicate human motions with remarkable fidelity.
Furthermore, the framework is unique in introducing a novel key-point tracking and velocity control system, coupled with what’s known as a privileged teacher policy, which distills complex motion skills into a deployable student policy. Incredibly, this approach not only enhances motion accuracy, but also allows seamless adaptation across different humanoid platforms. In fact, ExBody2 outperformed existing state-of-the-art methods in both qualitative and quantitative benchmarks after being tested on two distinct humanoid robots. And with its ability to handle intricate expressive motions, ExBody2 is finally providing a viable, cost-effective path forward when deploying robotics across applications including entertainment, healthcare, and beyond.
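To make the idea of key-point tracking a little more concrete, here is a minimal sketch of the kind of tracking reward often used in reinforcement learning for motion imitation. It is an illustration only, not the framework's actual code: the function name `keypoint_tracking_reward`, the exponential form and the `sigma` scale are all assumptions made for this example.

```python
import math


def keypoint_tracking_reward(robot_keypoints, reference_keypoints, sigma=0.25):
    """Return a reward in (0, 1] for how closely the robot follows a reference pose.

    Both arguments are equal-length lists of (x, y, z) positions in metres.
    The exponential shape and `sigma` are illustrative assumptions, not
    values taken from the framework described in the transcript.
    """
    if not robot_keypoints or len(robot_keypoints) != len(reference_keypoints):
        raise ValueError("key-point lists must be non-empty and of equal length")

    # Mean squared Euclidean distance between matching key points.
    total = 0.0
    for p, q in zip(robot_keypoints, reference_keypoints):
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    mean_sq_error = total / len(robot_keypoints)

    # 1.0 for a perfect match, decaying smoothly as the error grows.
    return math.exp(-mean_sq_error / sigma ** 2)


# Hypothetical reference pose: a head and a hand key point.
reference = [(0.0, 0.0, 1.0), (0.2, 0.0, 0.5)]
slightly_off = [(0.05, 0.0, 1.0), (0.2, 0.0, 0.5)]
far_off = [(0.5, 0.0, 1.0), (0.2, 0.0, 0.5)]

reward_perfect = keypoint_tracking_reward(reference, reference)
reward_near = keypoint_tracking_reward(slightly_off, reference)
reward_far = keypoint_tracking_reward(far_off, reference)
```

A learning policy that maximises a reward like this is pushed to keep its key points close to the human reference motion, which is the general intuition behind tracking-based imitation.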
But that’s just the beginning, as robots also just unlocked several other new human-level skills with the development of ARMOR, a novel egocentric perception system that integrates cutting-edge hardware and software but with a serious twist. In fact, by incorporating wearable-like depth sensors and advanced motion-planning algorithms, ARMOR even allows robots to navigate complex environments with unprecedented precision and agility. But what really makes ARMOR special is its distributed perception approach, which incorporates lightweight time-of-flight lidar sensors to generate comprehensive point cloud data around the robot. Impressively, this setup outperforms traditional head-mounted or externally mounted depth cameras by achieving a 63.7% reduction in collisions and a 78.7% improvement in task success rates during testing.
Plus, the system’s low-profile design is scalable and commercially viable, making it a practical solution for real-world applications. Furthermore, in addition to its hardware innovations, ARMOR also uses a transformer-based imitation learning policy trained on 86 hours of human motion data from the AMASS dataset. This approach enables dynamic collision avoidance and significantly outperforms traditional planning systems like cuRobo, with 31.6% fewer collisions, a 16.9% higher success rate, and a 26-fold reduction in computational latency. Together, ARMOR’s perception system and imitation learning policy set a new benchmark for humanoid motion planning. As for results, ARMOR was successfully tested on the GR1 humanoid robot by Fourier Intelligence and demonstrated its real-world potential. The developers also plan to release the system’s source code, hardware descriptions, firmware, 3D CAD files and training pipelines on GitHub soon, enabling the robotics community to build further.
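As a rough illustration of what a distributed perception setup involves, the sketch below fuses readings from several body-mounted depth sensors into a single point cloud in the robot's own frame. Everything here is a simplifying assumption for the example: the function names, the dictionary keys, and the idea that each sensor's pose is just a rotation about the vertical axis plus a mounting offset. The real system's data formats are not described in the transcript.

```python
import math


def sensor_to_robot_frame(points, yaw, offset):
    """Rotate sensor-frame points about the z axis by `yaw` (radians),
    then shift them by the sensor's mounting `offset` on the robot."""
    c, s = math.cos(yaw), math.sin(yaw)
    ox, oy, oz = offset
    return [(c * x - s * y + ox, s * x + c * y + oy, z + oz) for x, y, z in points]


def merge_point_clouds(sensors):
    """Fuse every sensor's readings into one cloud in the robot frame.

    `sensors` is a list of dicts with the hypothetical keys
    "points", "yaw" and "offset".
    """
    cloud = []
    for sensor in sensors:
        cloud.extend(
            sensor_to_robot_frame(sensor["points"], sensor["yaw"], sensor["offset"])
        )
    return cloud


def nearest_obstacle_distance(cloud):
    """Distance in metres from the robot origin to the closest fused point."""
    return min(math.sqrt(x * x + y * y + z * z) for x, y, z in cloud)


# Two hypothetical sensors: one facing forward on the chest,
# one facing left on the side of the body.
sensors = [
    {"points": [(0.5, 0.0, 0.0)], "yaw": 0.0, "offset": (0.1, 0.0, 0.0)},
    {"points": [(1.0, 0.0, 0.0)], "yaw": math.pi / 2, "offset": (0.0, 0.3, 0.0)},
]

cloud = merge_point_clouds(sensors)
closest = nearest_obstacle_distance(cloud)
```

The appeal of spreading small sensors over the body is visible even in this toy version: each sensor covers a direction the others cannot see, and the merged cloud gives the planner one consistent view of nearby obstacles.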
Overall, this system brings humanoid robots one step closer to achieving general-purpose human-level intelligence.
But AI just had a massive breakthrough in 4D simulation too, as NVIDIA plus over 20 other research labs just unveiled the Genesis project, a revolutionary generative physics engine capable of creating 4D dynamic worlds for general-purpose robotics and physical AI applications. And it’s extremely fast: the Genesis physics engine is written entirely in Python, yet it outperforms leading GPU-accelerated platforms like Isaac Gym and MJX by a staggering 10 to 80 times. But that’s just the beginning, because it also achieves a simulation speed that is roughly 430,000 times faster than real-time, making it possible to train a robotic locomotion policy in just 26 seconds on a single RTX 4090 GPU.
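To put those two numbers side by side, here is a quick back-of-the-envelope calculation. It assumes, purely for illustration, that the engine sustained the quoted 430,000 times real-time rate for the whole 26-second training run, which the transcript does not actually claim, so the result is best read as an upper bound. The speed-up in a GPU simulator typically comes from running many copies of the environment in parallel, so the figure represents total simulated experience rather than one robot's continuous timeline.

```python
SPEEDUP_VS_REAL_TIME = 430_000   # figure quoted in the transcript
TRAINING_WALL_CLOCK_S = 26       # figure quoted in the transcript
SECONDS_PER_DAY = 86_400


def simulated_seconds(wall_clock_s, speedup):
    """Simulated time covered in `wall_clock_s` seconds of real time,
    assuming the speed-up holds for the entire run (an assumption)."""
    return wall_clock_s * speedup


sim_seconds = simulated_seconds(TRAINING_WALL_CLOCK_S, SPEEDUP_VS_REAL_TIME)
sim_days = sim_seconds / SECONDS_PER_DAY

print(f"{sim_seconds:,} simulated seconds, about {sim_days:.1f} days of experience")
```

Under that assumption, 26 seconds of wall-clock time would cover roughly 129 days of simulated robot experience, which gives a sense of why a locomotion policy can be trained so quickly.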
This makes Genesis an incredibly efficient tool for developing real-world transferable robotic behaviours. Plus, the engine is fully open source and available on GitHub, with plans to roll out its generative framework in stages. And Genesis is built from scratch to integrate cutting-edge physics solvers, enabling the simulation of highly realistic virtual physical worlds, while providing a unified platform for a wide range of tasks. But it’s even more than just a physics engine, as Genesis also aims to become a universal data engine that autonomously generates diverse datasets. This includes creating dynamic environments, camera motions, reward functions, robot policies, character motions, interactive 3D scenes, and open-world articulated assets.
On top of this, Genesis’s physics engine features a VLM-based generative agent, giving it the ability to leverage APIs provided by the simulation infrastructure to autonomously create its 4D dynamic worlds. And central to Genesis’s vision is the use of its generative agent and physics engine to automatically generate robotic policies and demonstration data for a variety of skills across different scenarios. Importantly, this capability aims to eliminate the need for labour-intensive manual data collection, making it easier to develop and test robotic behaviours in diverse environments. Plus, Genesis’s approach builds on insights from earlier work such as RoboGen.
But the most impressive AI breakthrough may be Google DeepMind’s release of Veo 2, a new video generation model that promises to transform creative industries with its advanced capabilities. In fact, Veo 2 delivers state-of-the-art results in video generation, outperforming its predecessors and leading competitors in both quality and realism. Moreover, Veo 2 also demonstrates an impressive understanding of real-world physics, human movement and cinematic principles. Importantly, Veo 2 even offers a sophisticated approach to storytelling, allowing it to generate videos across a wide range of subjects and styles. Plus, the model even excels at interpreting complex user prompts, enabling precise customisation. Users can request specific cinematic techniques, such as a low-angle tracking shot or a close-up of a scientist peering into a microscope, and Veo 2 delivers high-quality outputs at up to 4K resolution.
The model also accommodates specific lens preferences, such as wide-angle shots with an 18mm lens or shallow depth-of-field effects for focused visuals. However, a persistent issue with AI video generation has been model hallucination, where models inadvertently generate incorrect or extraneous details like extra limbs or misplaced objects. Veo 2 addresses this with significant advancements in accuracy and realism to make its outputs more reliable and polished.