Transcript
You’ll also have access to community hackathons with prizes and employers, plus an expanding repository of AI tools and courses. So if you’re really serious about AI, click the link below to become a founding member of our official AI Academy experts and insiders community today. Now back to CMU’s release of OmniH2O, a learning-based system that lets anyone control full-sized humanoid robots in a variety of ways. By using kinematic pose as a universal control interface, the system supports multiple control methods, including real-time teleoperation through virtual reality headsets, verbal instructions, and RGB camera inputs.
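To make the idea of kinematic pose as a shared interface concrete, here is a minimal Python sketch assuming hypothetical adapter classes; the names and array shapes are illustrative, not OmniH2O’s actual API:

```python
# Sketch: every input modality reduces to the same kinematic pose target,
# so one low-level whole-body policy can serve all control methods.
from dataclasses import dataclass
from typing import Protocol
import numpy as np

@dataclass
class KinematicPose:
    """Target body keypoints (e.g. head and hands) in the robot frame."""
    keypoints: np.ndarray  # shape (K, 3): xyz per tracked keypoint

class PoseSource(Protocol):
    def next_pose(self) -> KinematicPose: ...

class VRTeleop:
    """Adapter: VR headset/controller tracking -> pose target (stub)."""
    def next_pose(self) -> KinematicPose:
        return KinematicPose(keypoints=np.zeros((3, 3)))

class RGBPoseEstimator:
    """Adapter: human pose estimated from an RGB camera (stub)."""
    def next_pose(self) -> KinematicPose:
        return KinematicPose(keypoints=np.zeros((3, 3)))

def control_step(source: PoseSource, policy) -> np.ndarray:
    # The policy never knows which modality produced the target.
    return policy(source.next_pose())
```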
Plus, OmniH2O not only facilitates real-time teleoperation but also enables full autonomy. The system achieves this by learning from teleoperated demonstrations or by integrating with advanced models such as GPT-4. This dual capability showcases OmniH2O’s versatility and dexterity, making it suitable for a wide range of real-world tasks. Whether it’s playing sports, moving and manipulating objects, or interacting with humans, OmniH2O proves its potential through both teleoperation and autonomous operation. Another advantage is the reinforcement-learning-based sim-to-real pipeline, which involves large-scale retargeting and augmentation of human motion datasets. By imitating a privileged teacher policy, the system learns a real-world-deployable policy that runs on sparse sensor input.
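As a rough illustration of this teacher-student distillation, the following sketch regresses a sparse-observation student onto an RL-trained privileged teacher; the network sizes, observation dimensions, and DAgger-style note are assumptions, not the paper’s exact recipe:

```python
# Privileged teacher (full sim state) -> deployable student (sparse sensors).
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 19))  # privileged obs
student = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 19))   # sparse obs only

opt = torch.optim.Adam(student.parameters(), lr=3e-4)

def distill_step(privileged_obs: torch.Tensor, sparse_obs: torch.Tensor) -> float:
    """One supervised step: regress student actions onto teacher actions."""
    with torch.no_grad():
        target = teacher(privileged_obs)  # RL-trained teacher's action
    loss = nn.functional.mse_loss(student(sparse_obs), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# In a DAgger-style loop the student acts in simulation and the teacher
# relabels the visited states, so the student learns to recover from its
# own mistakes before real-world deployment.
```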
Carefully designed reward mechanisms further enhance the system’s robustness and stability. CMU also released the first humanoid whole-body control dataset, named OmniH2O-6, which includes six everyday tasks and provides a foundation for learning humanoid whole-body skills from teleoperated data. The process involves three main steps. First, OmniH2O retargets large-scale human motions and filters out motions that are infeasible for humanoids. Next, the sim-to-real policy is distilled through supervised learning from a reinforcement-learning-trained privileged policy. Finally, the universal design of OmniH2O supports versatile human control interfaces, including VR headsets and RGB cameras.
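A minimal sketch of the first step, retargeting followed by feasibility filtering, might look like this; the joint-limit and velocity thresholds and the retarget() stub are hypothetical:

```python
# Retarget human motion clips, then drop clips a humanoid cannot execute.
import numpy as np

JOINT_LIMITS = (-2.0, 2.0)   # rad, illustrative joint range
MAX_JOINT_VEL = 10.0         # rad/s, illustrative bound

def retarget(human_motion: np.ndarray) -> np.ndarray:
    """Map human joint trajectories onto the humanoid skeleton (stub);
    a real retargeter solves inverse kinematics per frame."""
    return human_motion

def is_feasible(q: np.ndarray, dt: float = 1 / 30) -> bool:
    """Reject clips that violate joint limits or velocity bounds.
    q has shape (T, J): T frames of J joint angles."""
    within_limits = np.all((q >= JOINT_LIMITS[0]) & (q <= JOINT_LIMITS[1]))
    vel_ok = np.all(np.abs(np.diff(q, axis=0)) / dt <= MAX_JOINT_VEL)
    return bool(within_limits and vel_ok)

def build_dataset(human_clips):
    robot_clips = (retarget(c) for c in human_clips)
    return [c for c in robot_clips if is_feasible(c)]
```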
The sim-to-real policy also allows control by autonomous agents like GPT-4o, or by diffusion policies trained on teleoperation datasets, to generate motion goals for activities involving sequential contacts, which are crucial for complex robotic interactions and operations. Carnegie Mellon researchers also released WoCoCo, a unified framework for learning whole-body humanoid control with sequential contacts. WoCoCo’s edge is its ability to separate tasks into a series of contact stages, facilitating a simple and general policy-learning pipeline. This approach uses task-agnostic reward and sim-to-real designs, requiring only one or two task-specific reward terms per task.
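To illustrate the contact-stage idea, here is a toy reward in the spirit of WoCoCo’s decomposition; the stage goals, distance threshold, and reward terms are invented for illustration and are not the paper’s actual reward design:

```python
# A task becomes a sequence of contact stages; the reward tracks progress
# toward the current stage's contact goal and advances on completion.
import numpy as np

STAGE_GOALS = [np.array([0.5, 0.0]), np.array([1.0, 0.3])]  # illustrative contact targets

def stage_reward(contact_pos: np.ndarray, stage: int) -> tuple[float, int]:
    """Task-agnostic shaping: dense progress toward the current contact
    goal, plus a sparse bonus that advances the stage counter."""
    goal = STAGE_GOALS[stage]
    dist = float(np.linalg.norm(contact_pos - goal))
    reward = -dist                          # dense progress term
    if dist < 0.05:                         # contact achieved
        reward += 1.0                       # sparse completion bonus
        stage = min(stage + 1, len(STAGE_GOALS) - 1)
    return reward, stage
```

Because the staging logic is generic, swapping tasks mostly means swapping the list of contact goals rather than redesigning the reward.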
The effectiveness of WoCoCo has been demonstrated through several challenging real-world tasks, including versatile parkour jumping, box loco-manipulation, dynamic clap-and-tap dancing, and cliffside climbing. Impressively, these tasks are accomplished without relying on any motion priors. The researchers further demonstrated WoCoCo’s versatility by applying it to a 22-degree-of-freedom dinosaur robot for loco-manipulation tasks. With these advancements, Carnegie Mellon University is pushing the boundaries of humanoid robot control and paving the way for more sophisticated and adaptable robots capable of performing complex tasks in diverse environments. And in another AI-powered leap, Fourier Intelligence’s GR-1 is now using six cameras to achieve an impressive 360-degree view.
This is a standout feature because, unlike most humanoid robots that use a combination of cameras, lidar, radar, and infrared sensors to navigate their surroundings, Fourier has opted for a vision-only approach. The GR-1’s six strategically placed cameras provide a comprehensive view of its environment. By combining the camera feeds, the robot can generate a bird’s-eye view to map its surroundings accurately. This top-down perspective allows the GR-1 to create virtual 3D models, helping it avoid collisions with precision. During outdoor tests, the robot demonstrated its ability to detect cars and pedestrians on sidewalks, showcasing the effectiveness of its vision-only system.
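A rough sketch of how several camera feeds could be fused into a bird’s-eye-view obstacle grid follows; it assumes each camera already yields 3D points in the robot frame (for example from learned depth), which is an assumption about Fourier’s unpublished pipeline:

```python
# Fuse per-camera 3D points into a top-down occupancy grid for planning.
import numpy as np

GRID = 100   # 100 x 100 cells
CELL = 0.1   # 10 cm per cell -> a 10 m x 10 m map around the robot

def bev_from_cameras(points_per_cam: list[np.ndarray]) -> np.ndarray:
    """Drop the height axis and rasterize obstacle points into a
    bird's-eye grid centered on the robot."""
    grid = np.zeros((GRID, GRID), dtype=bool)
    for pts in points_per_cam:  # pts: (N, 3) xyz in the robot frame
        # Keep points in the obstacle band; ignore floor and overhead clutter.
        obstacles = pts[(pts[:, 2] > 0.1) & (pts[:, 2] < 2.0)]
        ij = (obstacles[:, :2] / CELL + GRID / 2).astype(int)
        ok = (ij >= 0).all(axis=1) & (ij < GRID).all(axis=1)
        grid[ij[ok, 0], ij[ok, 1]] = True
    return grid
```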
Fourier has also equipped the GR-1 with advanced AI capabilities, including a ChatGPT-like multimodal large language model for natural language processing and logical reasoning. According to Fourier, this makes the GR-1 suitable for various applications such as medical rehabilitation, family services, security, emergency rescue, and industrial manufacturing. One of the major benefits of the GR-1’s vision-only system is its cost-effectiveness. Fourier claims that relying solely on cameras reduces hardware costs while enhancing the robot’s perception capabilities. Given that most humanoid robots are priced similarly to entry-level cars, this cost reduction could make the GR-1 more accessible and mainstream.
And while the robot has yet to see widespread commercial use, its latest vision-only update could attract more companies to explore its potential applications. By cutting down on costs and maintaining high performance, the GR-1 could become a more affordable and appealing option for businesses and households alike. And finally, Menteebot just demoed its ability to navigate complex environments using a combination of a 3D model of the world plus a dynamic obstacle map. “What can I do for you?” “Please face me.” “Sure, I’ll face towards you.” In a new location, Menteebot creates a 3D model by following someone around the space, guided by voice.
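One plausible way to combine a static 3D model with a dynamic obstacle map is sketched below, with dynamic detections decaying over time so that moving people do not leave permanent ghosts behind; all names and parameters are illustrative, not Menteebot’s software:

```python
# Static layer from the 3D scan of the space; dynamic layer from live
# detections that fade unless re-observed.
import numpy as np

class ObstacleMap:
    def __init__(self, shape=(100, 100), decay=0.9):
        self.static = np.zeros(shape, dtype=bool)    # scanned walls, furniture
        self.dynamic = np.zeros(shape, dtype=float)  # recent moving obstacles
        self.decay = decay

    def update(self, detections: np.ndarray) -> None:
        """detections: boolean grid of obstacles seen this frame."""
        self.dynamic = np.maximum(self.dynamic * self.decay,
                                  detections.astype(float))

    def blocked(self) -> np.ndarray:
        """Cells the planner must avoid right now."""
        return self.static | (self.dynamic > 0.5)
```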
The demo also showed the robot turning in place to maintain a centered view of the human addressing it. “I’m waiting for your instructions.” “Please walk backwards.” “Sure, I’m going to walk backwards.” It likewise showcased the robot’s LLM operating system in action across various gaits. Menteebot leverages the latest sim-to-real AI learning for human-like gait and hand movement, NeRF-based real-time 3D mapping and localization, dynamic navigation in complex environments, and large language models to build a cognitive map of the world and execute advanced tasks. The robot also automatically modifies its gait when carrying heavy loads, mimicking human behavior in similar situations.
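To illustrate load-aware gait modification, here is a toy parameterization that blends toward a shorter, slower, lower gait as payload grows; the scaling rules and numbers are assumptions, not Menteebot’s controller:

```python
# Blend nominal gait parameters toward a conservative gait under load,
# roughly mimicking how humans shorten and slow their steps when carrying.
from dataclasses import dataclass

@dataclass
class GaitParams:
    step_length: float   # m
    step_freq: float     # Hz
    torso_height: float  # m

NOMINAL = GaitParams(step_length=0.40, step_freq=1.8, torso_height=0.95)

def adjust_for_load(payload_kg: float, max_payload: float = 20.0) -> GaitParams:
    """Linearly interpolate toward a conservative gait as load increases."""
    a = min(payload_kg / max_payload, 1.0)
    return GaitParams(
        step_length=NOMINAL.step_length * (1 - 0.4 * a),  # shorter steps
        step_freq=NOMINAL.step_freq * (1 - 0.3 * a),      # slower cadence
        torso_height=NOMINAL.torso_height - 0.05 * a,     # lower center of mass
    )
```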
Cutting-edge sim-to-real learning technologies endow Menteebot with remarkable agility, enabling human-like movements such as walking in any direction, running, turning in place, balancing, and squatting. Menteebot offers a complete end-to-end cycle from verbal command to complex task completion, encompassing navigation, locomotion, scene understanding, object detection and localization, grasping, and natural language understanding. And last, Tokyo Robotics demonstrated its Torobo robot using impedance control to cut a piece of wood like a human.
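For a sense of what impedance control means here, the sketch below implements the classic spring-damper law F = K(x_ref − x) + D(v_ref − v) in one dimension, so contact forces from the wood deflect the blade compliantly rather than being fought rigidly; the gains and setup are illustrative, not Torobo’s controller:

```python
# 1-DoF impedance control: the end effector behaves like a spring-damper
# around a reference sawing trajectory.
import numpy as np

K = 300.0   # stiffness, N/m (deliberately softer than a stiff position loop)
D = 40.0    # damping, N*s/m

def impedance_force(x: float, v: float, x_ref: float, v_ref: float) -> float:
    """Commanded force F = K * (x_ref - x) + D * (v_ref - v)."""
    return K * (x_ref - x) + D * (v_ref - v)

# Example: the reference saws back and forth; when the blade lags because
# the wood resists, the controller pushes proportionally harder.
t = np.linspace(0.0, 2.0, 5)
x_ref = 0.05 * np.sin(2 * np.pi * t)                # desired blade position
v_ref = 0.05 * 2 * np.pi * np.cos(2 * np.pi * t)    # desired blade velocity
for xr, vr in zip(x_ref, v_ref):
    print(impedance_force(x=0.0, v=0.0, x_ref=xr, v_ref=vr))
```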