Transcript
But an even more important feature of the new Atlas robot is its ability to adapt in real time. For example, when a part is initially positioned too high, Atlas encounters resistance; it then adjusts its approach until the part is placed correctly, demonstrating both adaptability and autonomy. In fact, this adaptability is central to Boston Dynamics’ new fully autonomous mode, which minimizes the ongoing need for human intervention. To demonstrate, Boston Dynamics contrasts the autonomy of Atlas with other robots that still require human oversight, such as Tesla’s Optimus humanoids, which have been remotely operated during events, whereas Atlas now operates completely independently of human instruction while performing various complex tasks in changing environments.
Furthermore, the fifth-generation Atlas is fully electric and features multiple hardware and software upgrades. For one, the robot is equipped with multiple cameras that allow it to observe its surroundings by detecting the colors, shapes, and distances of objects. By processing this visual data, Atlas builds 3D models of its environment to identify, prioritize, and interact with objects within its workspace. Using this perception system, including its color and depth cameras, Atlas can pinpoint and recognize the objects essential to each task. To plan its motion, Boston Dynamics uses an approach called Model Predictive Control, in which the robot’s control system plans its movements over a short time horizon while accounting for the forces at play.
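To give a rough sense of what Model Predictive Control means in practice, here is a minimal sketch that assumes nothing about Boston Dynamics’ actual controller: a toy 1-D point mass where, at every step, many candidate force sequences are simulated over a short horizon, the cheapest is chosen, and only its first force is applied before re-planning.

```python
import numpy as np

# Minimal receding-horizon sketch of the Model Predictive Control idea for a
# 1-D point mass. This is an illustrative toy, not Boston Dynamics' controller:
# at each step we simulate many candidate force sequences over a short horizon,
# score them, and apply only the first force of the best sequence before
# re-planning from the new state.

DT, MASS = 0.05, 1.0           # timestep (s) and mass (kg), arbitrary values
HORIZON, CANDIDATES = 20, 256  # planning horizon and random candidate count
TARGET = 1.0                   # desired position of the "part" being placed

def rollout_cost(pos, vel, forces):
    """Simulate one force sequence and return its tracking + effort cost."""
    cost = 0.0
    for f in forces:
        vel += (f / MASS) * DT
        pos += vel * DT
        cost += (pos - TARGET) ** 2 + 0.01 * f ** 2
    return cost

def mpc_step(pos, vel, rng):
    """Evaluate random force sequences and return the first force of the best one."""
    candidates = rng.uniform(-5.0, 5.0, size=(CANDIDATES, HORIZON))
    costs = [rollout_cost(pos, vel, seq) for seq in candidates]
    return candidates[int(np.argmin(costs))][0]

rng = np.random.default_rng(0)
pos, vel = 0.0, 0.0
for _ in range(100):
    force = mpc_step(pos, vel, rng)   # plan over the whole horizon...
    vel += (force / MASS) * DT        # ...but execute only the first action
    pos += vel * DT
print(f"final position after 100 steps: {pos:.2f} (target {TARGET})")
```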
Plus, its articulated head with integrated LED lights improves 3D spatial awareness, while a suite of sensors, including LIDAR and stereo vision, helps Atlas further process environmental data. Atlas also uses all-electric actuators and a custom computing system to further enhance its control capabilities, with everything powered by a battery pack for extended operation in industrial settings. What’s more, Atlas’s software integrates with Boston Dynamics’ fleet-management platform, Orbit, to facilitate coordinated tasks alongside a fleet of other robots while requiring minimal human input.
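As a generic illustration of how depth data from stereo cameras or LIDAR can become a 3D model of the workspace, the sketch below back-projects a depth image through an assumed pinhole camera and bins the points into a coarse occupancy grid; the intrinsics and voxel size are made-up values, not Atlas’s actual sensor parameters.

```python
import numpy as np

# Toy illustration of turning depth data into a 3D model: back-project a depth
# image through a pinhole camera model into a point cloud, then bin the points
# into a coarse voxel occupancy grid. The intrinsics (FX, FY, CX, CY) and the
# 0.1 m voxel size are made-up values, not Atlas's actual sensor parameters.

FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0  # assumed pinhole intrinsics
VOXEL = 0.1                                   # meters per occupancy-grid cell

def depth_to_points(depth):
    """Back-project an HxW depth image (meters) into an (N, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - CX) * depth / FX
    y = (v - CY) * depth / FY
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # keep only pixels with valid depth

def occupancy_grid(points):
    """Count points per voxel to get a crude 3D occupancy model."""
    cells = np.floor(points / VOXEL).astype(int)
    voxels, counts = np.unique(cells, axis=0, return_counts=True)
    return dict(zip(map(tuple, voxels), counts))

# Fake depth frame: a flat wall 2 m away with a box-shaped object at 1 m.
depth = np.full((480, 640), 2.0)
depth[200:280, 300:380] = 1.0
grid = occupancy_grid(depth_to_points(depth))
print(f"{len(grid)} occupied voxels")
```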
And as for its cost, while it remains officially undisclosed, it will most likely be in excess of $100,000 until Boston Dynamics ramps up production to bring down the per-unit price. In the meantime, Boston Dynamics is continuing to refine the Atlas robot for prime-time use, aiming to enhance operational efficiency and reduce human labor costs for repetitive or hazardous tasks. And now, thanks to NVIDIA’s latest breakthrough, Boston Dynamics can train Atlas over 10,000 times faster than before using a brand-new neural network named Hover, which outperforms specialized control systems while requiring much less computation.
In fact, Hover operates with just 1.5 million parameters, in stark contrast to the hundreds of billions used by typical large language models. Despite its small size, Hover expertly manages complex robot movements, showcasing its potential for streamlined AI solutions. To achieve this, Hover was trained in NVIDIA’s Isaac Gym simulator, where robot movements were accelerated 10,000 times, condensing a full year’s worth of training into just 50 minutes on a single GPU and highlighting the extreme efficiency of this approach. Plus, Hover transitions seamlessly from simulation to physical robots without requiring fine-tuning, all while supporting various input methods, such as head and hand tracking from XR devices, full-body motion capture, joint angles from exoskeletons, and traditional joystick controls.
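Those figures are easy to sanity-check; the snippet below works out the wall-clock time implied by a 10,000x simulation speedup and the size gap between Hover and a representative 100-billion-parameter language model (the 100 billion is an assumed stand-in for "hundreds of billions").

```python
# Sanity check on the quoted figures; the speedup and Hover's parameter count
# come from the transcript, while the 100B LLM size is an assumed comparison point.
SIM_SPEEDUP = 10_000
simulated_minutes = 365 * 24 * 60                                   # one simulated year
print(f"{simulated_minutes / SIM_SPEEDUP:.1f} wall-clock minutes")  # ~52.6, close to the ~50 quoted

hover_params, llm_params = 1.5e6, 100e9
print(f"Hover is roughly {llm_params / hover_params:,.0f}x smaller than a 100B-parameter LLM")
```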
But what’s most remarkable is that Hover even surpasses systems tailored to individual input types, with researchers attributing this to the model’s comprehensive understanding of physical principles, such as balance and precise limb control, which it applies universally across different control mechanisms. And by building on the open-source H2O and OmniH2O projects, Hover is compatible with any humanoid robot that can operate within the Isaac simulator. Additionally, NVIDIA has made the system’s examples and code available on GitHub, encouraging community engagement and further development.
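To make the idea of one controller handling many input types concrete, here is a minimal sketch that assumes nothing about Hover’s real interface: every control mode is packed into one fixed-size command vector plus a mask marking which entries are active, so a single policy can be trained and run across all of them. The slot names and sizes are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of one way a single policy can accept many control modes: pack
# every input type into one fixed-size command vector plus a mask that marks
# which entries are actually commanded. The slot names and sizes below are
# illustrative assumptions, not Hover's real interface.

SLOTS = {
    "head_pose": 7,       # xyz + quaternion from an XR headset
    "hand_poses": 14,     # two 7-DoF hand targets from hand tracking
    "joint_angles": 23,   # mocap or exoskeleton joint targets
    "root_velocity": 3,   # joystick-style planar velocity + yaw rate
}
OFFSETS, TOTAL = {}, 0
for name, size in SLOTS.items():
    OFFSETS[name] = (TOTAL, TOTAL + size)
    TOTAL += size

def unified_command(**inputs):
    """Pack whichever input modes are present into (command, mask) vectors."""
    cmd, mask = np.zeros(TOTAL), np.zeros(TOTAL)
    for name, values in inputs.items():
        lo, hi = OFFSETS[name]
        cmd[lo:hi] = values
        mask[lo:hi] = 1.0
    return cmd * mask, mask  # entries that are not commanded stay zeroed

# Joystick-only control: only the root-velocity slot is active.
cmd, mask = unified_command(root_velocity=[0.5, 0.0, 0.1])
print(cmd.shape, int(mask.sum()), "active command dimensions")
```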
But there’s another AI tool that’s getting lots of buzz, as Autodesk just announced the beta launch of Wonder Animation, which aims to change 3D filmmaking by letting creators transform live-action scenes into animated sequences with far less effort. By using any camera in any location, artists can generate 3D environments with CG characters, broadening the scope for creative expression. The secret to Wonder Animation lies in its video-to-3D scene technology, which allows sequences to be filmed and edited with a variety of shots: wide, medium, and close-up. The AI then reconstructs these scenes into a cohesive 3D space, accurately matching each camera’s position and movement relative to the characters and environment. The result is a virtual representation of the live-action scene, encompassing all camera setups and character animations. Artists can even edit various elements, including animation, characters, environments, lighting, and camera tracking, all within their preferred software, such as Maya, Blender, or Unreal, giving them increased flexibility and control.
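Conceptually, that cohesive 3D space boils down to treating each shot’s tracked camera as a camera-to-world transform, so a CG character anchored in world coordinates lines up across every shot. The sketch below illustrates the idea with invented poses; it is not Autodesk’s actual pipeline or API.

```python
import numpy as np

# Conceptual sketch of the "cohesive 3D space" idea: each shot's tracked camera
# becomes a camera-to-world transform, so a CG character anchored in world
# coordinates lines up across every shot. Poses and positions are invented
# for the example; this is not Autodesk's pipeline or API.

def camera_pose(yaw_rad, position):
    """Build a simple camera-to-world matrix from a yaw angle and a position."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    pose = np.eye(4)
    pose[:3, :3] = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    pose[:3, 3] = position
    return pose

def to_world(cam_to_world, point_in_camera):
    """Transform a camera-frame point into the shared world frame."""
    return (cam_to_world @ np.append(point_in_camera, 1.0))[:3]

wide = camera_pose(0.0, [0.0, 1.6, 5.0])         # tracked camera of the wide shot
close = camera_pose(np.pi / 2, [3.0, 1.6, 0.0])  # tracked camera of the close-up

# The character sits 5 m in front of the wide camera; anchoring it in world
# space tells us exactly where it must appear in the close-up's camera frame.
character_world = to_world(wide, np.array([0.0, -0.4, -5.0]))
in_close = (np.linalg.inv(close) @ np.append(character_world, 1.0))[:3]
print("world position:", character_world, "| close-up frame:", np.round(in_close, 2))
```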
Next, here’s a peek into a sample project that demonstrates what mixed reality will soon become, thanks to the power of generative artificial intelligence, as users will be able to walk through spaces and seamlessly switch between styles and locations. In fact, these simulated environments may be shared between many players, with objects and environments transforming to create unique experiences for each user while maintaining the core essence of the setting. Because of this, generative AI may become a crucial layer of the metaverse, enabling humans and NPCs to interact across environments without clear distinctions between real and virtual entities.
In this near-future scenario, generative AI doesn’t just enhance the visuals; it dynamically shapes narrative elements and interactive possibilities, allowing users to co-create or modify the storyline as they explore. And finally, Meta AI just announced LongVU, a multimodal large language model designed for efficient long-video analysis. Where traditional models struggle with the data demands of lengthy videos, LongVU overcomes this challenge by using spatiotemporal adaptive compression, which reduces the number of video tokens while keeping essential details. Additionally, the included DINOv2 encoder helps to identify and remove redundant frames so that long-form videos can be processed without sacrificing critical content. The result is that LongVU’s spatial pooling mechanism efficiently handles hour-long videos within an 8,000-token context, making it valuable for real-time video analysis in fields like security, sports, and education.
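As a hedged sketch of what spatiotemporal compression can look like, assuming nothing about LongVU’s actual implementation, the snippet below drops frames whose features barely differ from the last kept frame and then spatially pools each kept frame’s patch tokens; the synthetic features stand in for the DINOv2 embeddings the real model relies on.

```python
import numpy as np

# Hedged sketch of spatiotemporal adaptive compression: drop frames whose
# features barely differ from the last kept frame (temporal redundancy), then
# average-pool each kept frame's patch tokens (spatial pooling). The synthetic
# "features" stand in for DINOv2 embeddings, and the threshold and sizes are
# illustrative, not LongVU's actual settings.

def keep_frames(frame_feats, sim_threshold=0.95):
    """Keep a frame only if it differs enough from the previously kept frame."""
    kept = [0]
    for i in range(1, len(frame_feats)):
        a, b = frame_feats[kept[-1]], frame_feats[i]
        cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        if cosine < sim_threshold:
            kept.append(i)
    return kept

def pool_tokens(patch_tokens, factor=4):
    """Reduce a frame's token count by averaging groups of `factor` patch tokens."""
    n, d = patch_tokens.shape
    n -= n % factor
    return patch_tokens[:n].reshape(n // factor, factor, d).mean(axis=1)

# One hour at 1 fps, made of 10 near-static "scenes" of 360 frames each.
rng = np.random.default_rng(0)
scenes = rng.normal(size=(10, 256))
frames = np.repeat(scenes, 360, axis=0) + 0.01 * rng.normal(size=(3600, 256))
kept = keep_frames(frames)
pooled = pool_tokens(rng.normal(size=(576, 256)))  # e.g. 24x24 patches -> 144 tokens
print(f"{len(kept)} of {len(frames)} frames kept; {pooled.shape[0]} tokens per kept frame")
```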
Anyways, like and subscribe to AI News and check out these bonus clips. Thanks for watching!