AI News

DOBB-E: 6D General AI Robot Breakthrough (109 TASKS 5620 TRAJECTORIES 1500000 FRAMES)

Posted on January 2, 2024 by AI News


Summary

➡ The new framework Dobb-E, designed for training domestic robots, features an ergonomic data collection tool, a comprehensive dataset of real home environments, a pretrained perception model that helps the robot adapt to diverse settings, and efficient deployment and testing in real homes. Dobb-E could streamline the deployment of household robots, and ongoing work aims to improve the robot's ability to perform long-duration tasks.

Transcript

The future of general domestic robots is revealed, Nvidia releases its newest 3D modeling AI, and more. But first, a breakthrough has just been achieved with the introduction of Dobb-E, a comprehensive framework for training multiskilled robots to do household tasks. Four key features underlie this pioneering technology and make it possible to bring generalist robots into home environments, the first being an ergonomic tool. At the heart of Dobb-E is an innovative data collection tool designed to simplify the gathering of robot training data and make it more efficient.

This tool, known as the Stick, combines a reacher-grabber tool, 3D-printed mounts, and an iPhone Pro. The smartphone records high-resolution video, depth, and movement information, providing rich data for robot training. This cost-effective, accessible tool is a novel approach in the field, making advanced data collection feasible in a variety of domestic settings. Standing head and shoulders above the rest, though, is Dobb-E's dataset, its second crucial feature.

Using the Stick, the researchers compiled the Homes of New York (HoNY) dataset, a collection of footage from 216 different real home environments featuring 13 hours of interaction, over 1.5 million frames, a rich variety of scenes, and 5,620 different trajectories. HoNY offers RGB-D (color plus depth) videos at 30 fps, along with detailed action annotations, including the 6D pose of the gripper and its opening angle normalized between zero and one.
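To make the annotation format concrete, here is a minimal sketch of what one per-frame action record could look like. The class and field names, the roll/pitch/yaw rotation encoding, and the raw measurement range used for normalization are all hypothetical; the actual HoNY file format may differ. It only illustrates the two facts stated above: a 6D gripper pose (3 translation plus 3 rotation values) and an opening value normalized to the range 0 to 1, recorded at 30 fps.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FrameAction:
    """Hypothetical per-frame action record; field names are illustrative only."""
    timestamp_s: float                        # seconds since the start of the trajectory
    position_m: Tuple[float, float, float]    # gripper translation (x, y, z)
    rotation_rad: Tuple[float, float, float]  # gripper orientation (assumed roll, pitch, yaw)
    gripper_opening: float                    # 0.0 = fully closed, 1.0 = fully open

    def as_vector(self) -> Tuple[float, ...]:
        """Flatten to 7 numbers: 6D pose (3 translation + 3 rotation) plus opening."""
        return (*self.position_m, *self.rotation_rad, self.gripper_opening)


def normalize_opening(raw: float, raw_min: float, raw_max: float) -> float:
    """Map a raw opening measurement (e.g. jaw distance) into [0, 1], clamping outliers."""
    if raw_max <= raw_min:
        raise ValueError("raw_max must be greater than raw_min")
    return min(1.0, max(0.0, (raw - raw_min) / (raw_max - raw_min)))


def timestamp_for_frame(index: int, fps: float = 30.0) -> float:
    """Convert a frame index to seconds for a fixed-rate (here 30 fps) recording."""
    return index / fps
```

For example, a raw jaw distance of 45 mm on a gripper assumed to open from 0 to 90 mm normalizes to 0.5, and frame 45 of a 30 fps recording sits at 1.5 seconds.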

Importantly, one of the most crucial components of Dobb-E, and its third feature, is its pretrained perception model. Named Home Pretrained Representations (HPR), this model is trained on the HoNY dataset with MoCo v3, a state-of-the-art self-supervised learning algorithm. HPR enables Dobb-E to scale across the vastly different scenes found in different homes, making the robot more adaptable and responsive to diverse domestic settings. The fourth feature is deployment and testing: to assess how well Dobb-E works in practice, the researchers tested the framework in real home environments using the Hello Robot Stretch, a multifunctional mobile home robot, and the results were striking.
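MoCo v3 belongs to the contrastive family of self-supervised methods: two augmented views of the same image should produce similar embeddings, while views of different images should not. The sketch below shows only an InfoNCE-style contrastive loss of that kind, written in NumPy; the encoders, the momentum-updated key encoder, the augmentations, and the temperature value of 0.2 are left out or assumed, so treat it as an illustration of the objective rather than HPR's training code.

```python
import numpy as np


def info_nce(q: np.ndarray, k: np.ndarray, temperature: float = 0.2) -> float:
    """InfoNCE-style contrastive loss over a batch.

    q, k: (N, D) embeddings of two augmented views; row i of q and row i of k
    come from the same image (positive pair), every other row is a negative.
    """
    # L2-normalize so dot products become cosine similarities
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    k = k / np.linalg.norm(k, axis=1, keepdims=True)
    logits = (q @ k.T) / temperature                   # (N, N) similarity scores
    logits -= logits.max(axis=1, keepdims=True)        # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives lie on the diagonal: minimize their negative log-probability
    return float(-np.mean(np.diag(log_probs)))


def symmetric_info_nce(q1, k1, q2, k2, temperature: float = 0.2) -> float:
    """Symmetrized loss: view-1 queries vs view-2 keys, and vice versa."""
    return info_nce(q1, k2, temperature) + info_nce(q2, k1, temperature)
```

When matching views are mapped to matching embeddings the loss approaches zero; when every positive pair is misaligned it grows large, which is the pressure that shapes the representation.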

In these experiments, the robot learned to complete 109 different household tasks, each requiring on average only five minutes of new video data to fine-tune the model. This result shows Dobb-E's potential for efficiently training robots on a wide array of domestic tasks and supports the feasibility of deploying robotic agents in a range of home settings. Looking ahead, the researchers envision further developing Dobb-E, including integrating a higher-level planner or policy that chains skills together for long-duration tasks, and improving the robot's sensory capabilities.
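Assuming new demonstrations are recorded at the same 30 fps as HoNY, five minutes of video is about 9,000 frames (5 × 60 × 30). The sketch below illustrates, in a deliberately simplified form, why a strong pretrained representation makes that little data useful: it keeps the visual features frozen and fits only a linear action head by closed-form ridge regression. This is a swapped-in simplification for illustration, not Dobb-E's actual fine-tuning procedure, and the function names and ridge value are hypothetical.

```python
import numpy as np


def fit_action_head(features: np.ndarray, actions: np.ndarray, ridge: float = 1e-3) -> np.ndarray:
    """Closed-form ridge regression from frozen visual features to actions.

    features: (N, d) embeddings from a pretrained encoder (kept frozen in this sketch)
    actions:  (N, a) target actions, e.g. 6D pose plus gripper opening (a = 7)
    returns:  (d + 1, a) weight matrix, with the last row acting as a bias
    """
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # append bias column
    d = X.shape[1]
    # Solve (X^T X + ridge * I) W = X^T Y instead of forming an explicit inverse
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ actions)


def predict_actions(weights: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Apply the fitted head to new feature vectors."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ weights
```

Because only the small head is estimated, the number of free parameters stays tiny relative to a full network, which is the intuition behind needing only minutes of task-specific data.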

These enhancements aim to create more sophisticated and user-friendly domestic robots capable of performing meaningful long-horizon tasks in homes. With Dobb-E's data collection tool, dataset, and pretrained model publicly released, other research teams could soon build on these resources, potentially accelerating the development of domestic robot systems. Overall, the work has so far yielded promising results, which highlight the effectiveness of the framework's components in improving the performance of domestic robots.

Meanwhile, Nvidia, the University of Toronto, and MIT have unveiled a revolutionary AI system capable of turning text descriptions into dynamic 3D animations. This new AI, named Align Your Gaussians (AYG), could transform the way we interact with digital content, offering promising applications in various creative and technical fields. AYG represents 3D shapes as collections of 3D Gaussian functions. It then models their motion with deformation fields, which dictate how these Gaussians evolve over time to create animations.
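The idea of "Gaussians plus a deformation field" can be sketched in a few lines. In the minimal version below, each Gaussian has a centre, per-axis scale, color, and opacity, and a deformation function returns a displacement for every centre at normalized time t. The deformation here is a hand-written toy standing in for the learned network AYG optimizes; real 3D Gaussian splatting also uses rotations and a rasterizer, both omitted, and all names are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Gaussians3D:
    means: np.ndarray    # (N, 3) centres
    scales: np.ndarray   # (N, 3) per-axis standard deviations (axis-aligned in this sketch)
    colors: np.ndarray   # (N, 3) RGB values in [0, 1]
    opacity: np.ndarray  # (N,) opacity in [0, 1]


def toy_deformation(means: np.ndarray, t: float) -> np.ndarray:
    """Hand-written stand-in for a learned deformation field.

    Returns a displacement for every Gaussian centre at normalized time t in [0, 1]:
    a steady drift along x plus a vertical bob that is zero at t = 0 and t = 1.
    """
    offset = np.zeros_like(means)
    offset[:, 0] = 0.5 * t
    offset[:, 1] = 0.1 * np.sin(np.pi * t) * np.cos(means[:, 0])
    return offset


def gaussians_at(g: Gaussians3D, t: float) -> Gaussians3D:
    """Deform the canonical Gaussians to time t; only the centres move here."""
    return Gaussians3D(g.means + toy_deformation(g.means, t), g.scales, g.colors, g.opacity)


def density(g: Gaussians3D, x: np.ndarray) -> float:
    """Opacity-weighted sum of unnormalized Gaussian bumps evaluated at point x."""
    z = (x[None, :] - g.means) / g.scales
    return float(np.sum(g.opacity * np.exp(-0.5 * np.sum(z ** 2, axis=1))))
```

Keeping one canonical set of Gaussians and moving it with a time-dependent field means shape and appearance are shared across every frame, so only the motion has to vary.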

This approach positions 3D Gaussians as a potential successor to the widely used NeRFs (neural radiance fields) for generating realistic 3D environments. What makes AYG unique is that it fuses several AI models to achieve visual accuracy. AYG uses the Stable Diffusion text-to-image model to render realistic appearance in individual frames. In addition, a text-to-video model, trained on extensive video datasets, provides the necessary temporal coherence, ensuring that the motion in the animations is fluid and natural.

To keep the geometry consistent across viewpoints, AYG also incorporates a multiview 3D model that adapts to different 3D shapes. This coordinated training process lets AYG optimize both the 3D shape representation and the deformation fields. As a result, it can produce animations directly from text prompts, such as a horse galloping across a meadow, that are not only visually appealing but also show lively motion and realistic textures. Yet another remarkable feature of AYG is its ability to generalize its learning to new concepts it did not encounter during training.

This flexibility demonstrates the AI's potential for understanding and visualizing a vast array of scenarios and subjects. In practice, AYG introduces techniques that extend and connect animations over longer periods than existing text-to-video models currently allow. The team demonstrated this capability with animations in which dogs transition smoothly from walking to barking. Looking ahead, these methods could be pivotal in generating extended 4D scenes and simulations, potentially revolutionizing creative tools and synthetic data generation.
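The exact technique AYG uses to join animations is not described here, so the sketch below swaps in a generic, widely used alternative: a linear crossfade. Given two per-frame displacement sequences for the same set of Gaussians (for example "walking" and then "barking"), it blends the end of the first into the start of the second over an overlap window so the motion does not jump. The function name and array layout are hypothetical.

```python
import numpy as np


def crossfade(seq_a: np.ndarray, seq_b: np.ndarray, overlap: int) -> np.ndarray:
    """Join two per-frame displacement sequences of shape (T, N, 3).

    The last `overlap` frames of seq_a are linearly blended into the first
    `overlap` frames of seq_b, so the combined motion changes smoothly.
    """
    if overlap < 1 or overlap > min(len(seq_a), len(seq_b)):
        raise ValueError("overlap must be between 1 and the shorter sequence length")
    # Blend weight rises from 0 (all seq_a) to 1 (all seq_b) across the overlap
    w = np.linspace(0.0, 1.0, overlap)[:, None, None]
    blended = (1.0 - w) * seq_a[-overlap:] + w * seq_b[:overlap]
    return np.concatenate([seq_a[:-overlap], blended, seq_b[overlap:]], axis=0)
```

The result has `len(seq_a) + len(seq_b) - overlap` frames. A plain crossfade can still look unnatural if the two motions disagree strongly, which is one reason learned approaches are attractive for this step.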

Synthetic data, which is crucial in areas where training data is limited, could improve significantly with AYG. On top of all this, AYG can also combine multiple animated objects in a single scene. The researchers illustrated this with a scene featuring various animated characters gathering around a campfire. From enhancing visual effects in filmmaking to reshaping video game design and aiding complex simulations for autonomous systems, AYG opens up a wide range of possibilities.
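Because each object is just a set of Gaussians, placing several of them in one scene can be pictured as applying a rigid transform (rotation plus translation) to each object's Gaussian centres and merging the results. The sketch below shows that placement step only; it assumes each object's own animation has already been applied in its local coordinates, and it does not claim to reproduce how AYG composes scenes internally. Function names are hypothetical.

```python
import numpy as np


def rotation_z(theta: float) -> np.ndarray:
    """3x3 rotation matrix about the vertical z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def compose_scene(objects):
    """Merge several objects into one scene.

    objects: list of (means (N_i, 3), rotation (3, 3), translation (3,)) tuples,
             where means are Gaussian centres in the object's local frame.
    returns: (all centres in scene coordinates, object id for every Gaussian)
    """
    all_means, ids = [], []
    for obj_id, (means, R, t) in enumerate(objects):
        all_means.append(means @ R.T + t)        # rigidly place the object in the scene
        ids.append(np.full(len(means), obj_id))  # remember which object each Gaussian came from
    return np.concatenate(all_means), np.concatenate(ids)
```

Keeping an object id per Gaussian makes it easy to animate each character independently before merging, for instance several figures arranged around a campfire at the origin.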

Finally, in another notable development, Alibaba has unveiled DreaMoving, a system designed to create personalized dance videos. This tool, which resembles a TikTok video generator, lets users generate dance videos from either text or image prompts. DreaMoving is built on diffusion models with two key components: the Video ControlNet and the Content Guider. The Video ControlNet steers the generation process so that the video follows the specified animation.

Meanwhile, the Content Guider controls the video content, dictating the appearance of the characters and backgrounds to produce a tailored result. A notable aspect of DreaMoving is that it integrates motion blocks into the U-Net and the ControlNet, which improves the temporal consistency and motion fidelity of the videos. The system's capabilities come from extensive training on over 1,000 dance videos.

These videos were segmented into clips of 8 to 10 seconds, free of transitions and special effects, so that the images flow smoothly. MiniGPT-v2 was used to caption each frame, supporting the system's multimodal training. Thanks to this training and architecture, DreaMoving excels at generating lifelike videos from text, images, or a combination of both. Users can, for instance, create videos featuring a specific individual dressed in attire of their choice, provided via an image.
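How DreaMoving's authors actually cut their videos is not described here, so the following is only a hypothetical sketch of the length constraint: it walks through a video of known duration, emits consecutive clips of at most 10 seconds, and drops a final remainder shorter than 8 seconds. Detecting and excluding transitions or special effects, which the text also mentions, is not shown.

```python
def split_into_clips(duration_s: float, min_len: float = 8.0, max_len: float = 10.0):
    """Cut [0, duration_s) into consecutive (start, end) clips with lengths in [min_len, max_len].

    Greedy strategy: take max_len-second chunks, and keep the final chunk only if
    it is at least min_len seconds long.
    """
    if not 0 < min_len <= max_len:
        raise ValueError("need 0 < min_len <= max_len")
    clips, start = [], 0.0
    while duration_s - start >= min_len:
        end = min(start + max_len, duration_s)
        clips.append((start, end))
        start = end
    return clips
```

For example, a 35-second video yields three 10-second clips and discards the last 5 seconds, while a 28-second video keeps a final 8-second clip.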

