Transcript
Plus, the robot utilizes advanced integrated joints with a peak torque density of 207 newton-meters per kilogram for highly precise motion control. To power Zero-One’s intelligence, CASBOT relies on a hierarchical end-to-end AI model, enabling sophisticated awareness and decision-making. Additionally, its RGB-D cameras, LiDAR, and array of other sensors allow the robot to engage in both visual and auditory interaction when encountering humans. And for improved dexterity, its bionic hand weighs only 800 grams. The expected cost of this humanoid falls somewhere between $20,000 and $80,000 upon release. But that’s only the beginning, as the Tiangong general-purpose humanoid robot showcased its speed and pushed the envelope with several other next-gen capabilities.
In fact, this full-sized humanoid is 163 centimeters tall and weighs 43 kilograms, operating on a 48-volt battery with a 15-amp-hour capacity. Importantly, the Tiangong robot uses state memory-based predictive reinforcement imitation learning algorithms, which enable it to run at a steady speed of about 12 kilometers per hour. Furthermore, Tiangong can navigate slopes, stairs, and uneven surfaces without relying on its 3D vision sensors, operating instead in a blind mode driven by predictive force modeling. And Tiangong’s ability to adapt its gait and balance in real time is supported by an array of inertial and six-axis force sensors.
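To make that concrete, here is a minimal Python sketch of what a state-memory, proprioception-only control loop of this kind could look like. The class, dimensions, and stand-in policy are illustrative assumptions for the sake of the example, not Tiangong’s actual software.

```python
import numpy as np

# A minimal sketch of a proprioception-only ("blind") locomotion step:
# no cameras, just a rolling memory of IMU, joint, and six-axis foot-force
# readings fed to a learned policy that outputs the next joint targets.
# Every name and dimension below is an illustrative assumption.

HISTORY_LEN = 50        # number of past sensor frames kept as "state memory"
NUM_JOINTS = 20         # placeholder joint count

class BlindLocomotionController:
    def __init__(self, policy):
        # `policy` is any callable mapping a flat state-history vector
        # to joint position targets (e.g. a trained neural network).
        self.policy = policy
        self.history = []

    def step(self, imu, joint_pos, joint_vel, foot_wrench):
        # One proprioceptive frame: base angular rates/accelerations from
        # the IMU plus joint states and per-foot six-axis force/torque.
        frame = np.concatenate([imu, joint_pos, joint_vel, foot_wrench.ravel()])
        self.history.append(frame)
        if len(self.history) > HISTORY_LEN:
            self.history.pop(0)

        # Zero-pad until enough history has accumulated, then flatten.
        padded = self.history + [np.zeros_like(frame)] * (HISTORY_LEN - len(self.history))
        state = np.concatenate(padded)

        # The policy predicts joint targets from state memory alone, which
        # is what lets the robot keep walking with vision switched off.
        return self.policy(state)

# Shape check with a stand-in linear "policy" (random weights only).
rng = np.random.default_rng(0)
frame_dim = 6 + 2 * NUM_JOINTS + 12
W = rng.normal(size=(NUM_JOINTS, HISTORY_LEN * frame_dim))
controller = BlindLocomotionController(policy=lambda s: W @ s)

targets = controller.step(
    imu=np.zeros(6),                 # angular velocity + linear acceleration
    joint_pos=np.zeros(NUM_JOINTS),
    joint_vel=np.zeros(NUM_JOINTS),
    foot_wrench=np.zeros((2, 6)),    # six-axis force/torque per foot
)
print(targets.shape)                 # (20,)
```

The design point is simply that the policy never sees camera or LiDAR data; everything it needs to keep the robot balanced comes from the rolling buffer of proprioceptive readings.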
Those inertial and force sensors, combined with its predictive algorithms, allow the robot to maintain stability across changing environments. As for its brain, the robot carries onboard compute capable of 550 trillion operations per second, along with Wi-Fi connectivity. This gives Tiangong the computation required for complex tasks as well as potential integration into connected environments, with manufacturing and logistics as its initial target settings. But another important aspect of Tiangong’s development is its open-source nature: the Beijing Humanoid Robot Innovation Center is making the robot’s design and architecture available to the public, allowing researchers, developers, and companies to access, modify, and expand upon the technology, and encouraging the creation of customized versions and innovative use cases that address industry-specific needs.
And in another demonstration of real-world robotics, Paxini just released its new dual-modal robotic hand, now upgraded to combine a tactile sensor system with a high-definition AI vision algorithm so it can interact with and manipulate objects in varying environments. Incredibly, this robotic hand is equipped with an 8-megapixel AI-powered camera plus a zero-shot pose estimation vision algorithm, allowing the system to accurately identify and estimate the position and shape of objects in real time in dynamic settings. The hand offers a maximum gripping diameter of 15 centimeters and is rated for millions of open-close cycles for long-term durability.
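To illustrate how a camera-side pose estimate and a tactile array might work together, here is a small hedged sketch of a dual-modal grasp routine; every function it calls is a hypothetical placeholder rather than Paxini’s SDK.

```python
import numpy as np

# Hedged illustration of a dual-modal grasp routine: the camera's zero-shot
# pose estimate guides the approach, and the tactile array gates finger
# closure on contact. All functions here are illustrative placeholders.

CONTACT_THRESHOLD_N = 0.5   # assumed per-taxel force for declaring contact

def approach_and_grasp(estimate_object_pose, read_tactile_array,
                       move_hand_to, close_fingers, steps=20):
    # 1. Vision: a 4x4 object pose from the zero-shot estimator.
    object_pose = estimate_object_pose()

    # 2. Pre-grasp: back the hand off 5 cm along the pose's approach (z) axis.
    pre_grasp = object_pose.copy()
    pre_grasp[:3, 3] -= 0.05 * object_pose[:3, 2]

    # 3. Tactile-gated approach: step toward the object and stop as soon as
    #    any fingertip taxel reports contact, then close the fingers.
    for alpha in np.linspace(0.0, 1.0, steps):
        target = pre_grasp.copy()
        target[:3, 3] = (1 - alpha) * pre_grasp[:3, 3] + alpha * object_pose[:3, 3]
        move_hand_to(target)
        if np.max(read_tactile_array()) > CONTACT_THRESHOLD_N:
            break
    close_fingers()

# Toy stand-ins so the sketch runs end to end.
approach_and_grasp(
    estimate_object_pose=lambda: np.eye(4),
    read_tactile_array=lambda: np.random.default_rng(1).uniform(0, 1, size=12),
    move_hand_to=lambda pose: None,
    close_fingers=lambda: print("fingers closed"),
)
```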
In terms of strength, it can lift loads of up to 5 kilograms, with fingertip forces reaching up to 15 newtons each. Its advanced force control is as fine as 0.01 newtons at the fingertip, allowing it to perform delicate tasks that require extremely high precision. Furthermore, with its bionic design featuring 13 degrees of freedom, it can carry out a range of complex movements, including grasping, pinching, and rotating. Plus, its flexible design supports a variety of communication protocols, allowing for easy integration into different systems and devices.
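Force control at that resolution is typically a closed loop around the fingertip force sensor. Below is a minimal sketch assuming a plain proportional-integral controller and a toy finger model; the gains, loop rate, and actuator interface are illustrative, not Paxini’s actual control law.

```python
# Minimal sketch of closed-loop fingertip force regulation, assuming a
# plain PI controller around a fingertip force sensor.

DT = 0.001          # 1 kHz control loop (assumed)
KP, KI = 0.8, 5.0   # illustrative gains

def regulate_fingertip_force(read_force, send_motor_current, target_n, steps=2000):
    """Drive the measured fingertip force toward `target_n` newtons."""
    integral = 0.0
    for _ in range(steps):
        error = target_n - read_force()          # e.g. a 0.01 N resolution sensor
        integral += error * DT
        # Map force error to a motor current command (placeholder scaling).
        send_motor_current(KP * error + KI * integral)

# Toy first-order "finger" model standing in for real hardware.
class ToyFinger:
    def __init__(self):
        self.force = 0.0
    def read_force(self):
        return self.force
    def send_motor_current(self, i):
        # Pretend motor current maps to fingertip force with some lag.
        self.force += 0.5 * (i - 0.1 * self.force)

finger = ToyFinger()
regulate_fingertip_force(finger.read_force, finger.send_motor_current, target_n=2.0)
print(round(finger.force, 3))   # settles near 2.0 N
```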
And with a weight of 1.3 kilograms and a compact size, this powerful robotic hand is suited to high-precision, high-intensity environments and real-world applications. But in yet another advance, Galbot just unveiled DexGraspNet, a breakthrough dataset supercharging how robots interact with diverse objects using three new approaches. Number one, balancing. At the core of DexGraspNet is its ability to synthesize stable, diverse grasps on a massive scale. By utilizing an advanced optimizer, Galbot achieves force-closure conditions, as sketched below, and high grasping scores across 1.32 million ShadowHand grasps, significantly surpassing all previous datasets.
Importantly, this capability not only enriches the dataset, but also ensures superior performance in cross-dataset experiments. Number two, generalizing. By introducing UniDexGrasp++, Galbot also tackles the challenge of learning dexterous object-grasping strategies from real point-cloud data. Furthermore, GeoCurriculum learning and geometry-aware iterative generalist-specialist learning both enhance the generalizability of vision-based strategies, demonstrating generalized grasping on over 3,000 object instances in Isaac Gym, with success rates of 85.4% on training sets and 78.2% on test sets. Number three, scaling. DexGraspNet 2.0 takes dexterous grasping to new heights in cluttered environments with a 90.7% real-world success rate, and to further accelerate advancements, Galbot has also developed a simulation test environment using Isaac Sim and NVIDIA Isaac Lab.
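The force-closure condition mentioned under “balancing” has a standard geometric reading: a grasp achieves force closure when the convex hull of its discretized contact wrenches contains the origin, and the origin’s distance to the hull boundary (the Ferrari-Canny epsilon metric) scores grasp quality. Here is a minimal sketch of that check, assuming point contacts with a discretized friction cone; this is the textbook formulation, not Galbot’s actual optimizer.

```python
import numpy as np
from scipy.spatial import ConvexHull

def contact_wrenches(points, normals, mu=0.5, cone_edges=8):
    """Discretize each contact's friction cone into unit-force 6D wrenches."""
    wrenches = []
    for p, n in zip(points, normals):
        n = n / np.linalg.norm(n)
        # Build an orthonormal tangent basis at the contact.
        t1 = np.cross(n, [1.0, 0.0, 0.0])
        if np.linalg.norm(t1) < 1e-6:
            t1 = np.cross(n, [0.0, 1.0, 0.0])
        t1 /= np.linalg.norm(t1)
        t2 = np.cross(n, t1)
        for k in range(cone_edges):
            a = 2 * np.pi * k / cone_edges
            f = n + mu * (np.cos(a) * t1 + np.sin(a) * t2)   # friction cone edge
            f /= np.linalg.norm(f)
            wrenches.append(np.concatenate([f, np.cross(p, f)]))
    return np.array(wrenches)

def epsilon_quality(points, normals, mu=0.5):
    """Ferrari-Canny style score: > 0 means the grasp is force-closure
    (the origin lies strictly inside the hull of the contact wrenches)."""
    W = contact_wrenches(np.asarray(points, float), np.asarray(normals, float), mu)
    hull = ConvexHull(W)
    # hull.equations rows are [a, b] with a.x + b <= 0 inside the hull,
    # so the origin's signed distance to each facet is just b.
    return float(-np.max(hull.equations[:, -1]))

# Four fingertip contacts around the equator of a unit sphere (toy example),
# with contact forces pointing into the object.
pts = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]]
nrm = [[-1, 0, 0], [1, 0, 0], [0, -1, 0], [0, 1, 0]]
print(epsilon_quality(pts, nrm) > 0)    # True for a force-closure grasp
```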
Galbot’s open-source simulation framework expedites the exploration and scaling of dexterous models, paving the way for faster implementations in practical applications. But AI is also transforming how 3D assets are developed for video game design, film production, and simulation, since creating 3D models has traditionally involved time-consuming work and specialized skills. Now, with Edify 3D, NVIDIA has streamlined the process to take less than two minutes. The secret to Edify 3D is its combination of multi-view diffusion models and transformer-based reconstruction. These systems can generate complex 3D assets from text prompts or images, resulting in detailed meshes, organized quad topology, and 4K physically-based rendering textures.
Importantly, this technology not only cuts down production time, but also allows for seamless integration into existing workflows, providing designers with editable and adaptable models. And one of the standout applications of Edify 3D is its ability to create full 3D scenes based on simple text inputs. By integrating a large language model, Edify 3D positions objects, sizes them accurately, and builds coherent, ready-to-edit scenes. It works by first generating multi-angle RGB images from a text prompt using a diffusion model. These images are then analyzed by a multi-view ControlNet that synthesizes surface normals and predicts a neural 3D representation.
Lastly, isosurface extraction and mesh post-processing help shape the final asset, while an upscaling ControlNet enhances the texture resolution to create sharp, high-quality outputs. And unlike traditional models, Edify 3D’s multi-view diffusion technology adapts text-to-image capabilities with pose-aware conditioning, ensuring consistent and accurate visuals from all viewpoints, which is key for realistic 3D reconstruction.
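To recap the pipeline just described, here is a hedged Python outline in which every stage, including the LLM layout step for full scenes, is a stub standing in for a large model or geometry routine; none of these function names come from NVIDIA’s actual Edify 3D API.

```python
# Hedged outline of the staged text-to-3D pipeline described above. Every
# function below is a placeholder stub for one stage of the pipeline.

from dataclasses import dataclass

@dataclass
class Asset:
    mesh: str       # placeholder for a quad-topology mesh
    textures: str   # placeholder for 4K PBR texture maps

# --- Stage stubs (each would be a large model or geometry routine) -------
def multiview_diffusion(prompt, num_views=4):
    return [f"rgb view {i} of '{prompt}'" for i in range(num_views)]

def multiview_controlnet_normals(views):
    return [v.replace("rgb", "normal") for v in views]

def reconstruct_neural_field(views, normals):
    return "neural 3D representation"

def isosurface_extraction(field):
    return "raw triangle mesh"

def retopologize_quads(mesh):
    return "quad-topology mesh"

def upscale_textures(mesh, views, resolution=4096):
    return f"{resolution}px PBR textures"

def llm_propose_layout(scene_prompt):
    # Stub for the large-language-model step that proposes objects,
    # positions, and sizes for a full scene (contents are made up here).
    return [("wooden table", (0, 0, 0), 1.0), ("ceramic vase", (0, 0, 0.8), 0.3)]

# --- The pipeline itself, following the order given in the transcript ----
def text_to_3d_asset(prompt: str) -> Asset:
    views = multiview_diffusion(prompt)                       # 1. multi-view RGB
    normals = multiview_controlnet_normals(views)             # 2. surface normals
    field = reconstruct_neural_field(views, normals)          #    neural 3D field
    mesh = retopologize_quads(isosurface_extraction(field))   # 3. mesh + cleanup
    textures = upscale_textures(mesh, views)                  # 4. 4K PBR textures
    return Asset(mesh=mesh, textures=textures)

def text_to_scene(scene_prompt: str):
    # Scene generation: generate each proposed object and keep its placement.
    return [(text_to_3d_asset(obj), pose, scale)
            for obj, pose, scale in llm_propose_layout(scene_prompt)]

print(text_to_3d_asset("a weathered bronze statue"))
print(text_to_scene("a rustic dining corner")[0])
```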