Transcript
1X has already collected thousands of hours of data from its Eve humanoids performing various mobile manipulation tasks in homes and offices, as well as interacting with people. This combined video and action data was then used to train a world model that predicts future video frames from observations and actions. Number two, action control. The world model can also produce a range of outcomes from different action commands. For instance, 1X demonstrates several generated scenarios by conditioning the world model on four distinct trajectories, all starting from the same initial frames. Incredibly, these scenarios weren’t even part of the original training data.
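The action-conditioning idea can be sketched abstractly: a world model maps the current frame (or its latent embedding) and an action command to the next frame, so rolling it out from one shared start frame under different action trajectories yields different predicted futures. The toy linear dynamics below are invented purely to illustrate that interface; they are not 1X's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "world model" over a flattened frame embedding z, conditioned on
# an action vector a. Only the interface z_next = f(z, a) matters here;
# the weights are random stand-ins, not learned parameters.
DIM_Z, DIM_A = 16, 4
W_z = rng.normal(scale=0.1, size=(DIM_Z, DIM_Z))  # hypothetical weights
W_a = rng.normal(scale=0.1, size=(DIM_Z, DIM_A))

def step(z, a):
    """Predict the next latent frame from the current one and an action."""
    return np.tanh(z @ W_z.T + a @ W_a.T)

def rollout(z0, actions):
    """Unroll the model over a sequence of action commands."""
    frames = [z0]
    for a in actions:
        frames.append(step(frames[-1], a))
    return np.stack(frames)

z0 = rng.normal(size=DIM_Z)           # one shared initial frame
traj_a = rng.normal(size=(5, DIM_A))  # two distinct action plans
traj_b = rng.normal(size=(5, DIM_A))

out_a, out_b = rollout(z0, traj_a), rollout(z0, traj_b)
# Same start, different actions -> different predicted futures.
print(np.allclose(out_a[0], out_b[0]))    # True: identical first frame
print(np.allclose(out_a[-1], out_b[-1]))  # False: the rollouts diverge
```

The same loop, run with one fixed action plan for many steps, is the long-horizon generation described below.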
Instead, the model generated them entirely on its own. Additionally, one of the world model’s primary strengths is its ability to simulate object interactions. In further demos, 1X provides the model with identical initial frames and three different action sets for grasping boxes. In each case, the box being grasped is lifted and moved based on the motion of the gripper, while the other boxes stay in place. Even without explicit actions, the model generates realistic video while recognizing that people and obstacles should be avoided when moving. Number three, long-horizon tasks. The AI can even generate long-duration videos, with examples simulating an entire t-shirt folding process, which has traditionally been extremely challenging in rigid-body simulators.
But now, even deformable objects like clothes can be simulated. Number four, physics. The world model has also developed an implicit understanding of physics to some degree, which can be observed when a spoon drops to the table after being released by the robot’s gripper. However, there are still instances where the model fails to adhere to physics, such as when a plate it has placed remains hanging in mid-air. Number five, object coherence. The world model still sometimes struggles to preserve the shape and color of objects during interactions, with objects even vanishing entirely in some instances. Furthermore, when objects are occluded or viewed from challenging angles, their appearance can become distorted over the course of the generated sequence.
Number six, self-recognition. 1X placed its Eve robot in front of a mirror to determine whether the model would generate mirrored actions and thus demonstrate an understanding of its own existence. However, the model failed to recognize these reflections or show any form of self-understanding, which indicates a limitation in the model’s ability to interpret and respond the way a human might in some cases. Nevertheless, 1X is showing that these AI-powered world models have the potential to transform general-purpose simulation and evaluation, allowing humanoids to safely operate with general intelligence across a wide range of environments.
And to speed up progress on world models for robotics, 1X is making over 100 hours of vector-quantized video available, along with pre-trained baseline models. Plus, they’re launching the 1X World Model Challenge, a three-phase competition with prizes to encourage further innovation. Meanwhile, researchers from Stanford and MIT have introduced WonderWorld as a new standard in virtual environment creation, allowing real-time rendering and rapid scene generation. With this technology, users can explore existing content and generate new scenes on the fly, simply by specifying what they want to create and where. At the heart of WonderWorld is its ability to take a single image as input and generate expansive, diverse 3D scenes that form an interconnected virtual world.
Users can define the content and style of these scenes via text commands while controlling the camera to determine where the new content should appear. And thanks to the system’s real-time rendering capability, scenes are generated in under 10 seconds, made possible by WonderWorld’s fast layered Gaussian surfels (FLAGS) representation. FLAGS combines two key innovations that set it apart from traditional scene generation methods. First, its layered design requires only one image to create a 3D scene, whereas most other methods need to generate multiple views progressively. Second, its surfel design uses a geometry-based initialization that drastically reduces optimization time, essentially fine-tuning the scene rather than building it from scratch.
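The geometry-based initialization can be illustrated in a few lines: each pixel of an estimated depth map is unprojected through a pinhole camera to seed one surfel, so optimization only needs to refine these seeds rather than grow a scene from nothing. The intrinsics and per-surfel attributes below are illustrative assumptions, not WonderWorld's exact formulation.

```python
import numpy as np

# Assumed pinhole camera intrinsics for a tiny 4x6 "image".
H, W = 4, 6
fx = fy = 50.0
cx, cy = W / 2, H / 2

depth = np.full((H, W), 2.0)  # stand-in for a predicted depth map

# Unproject the pixel grid into camera-space 3D points: these become
# the initial surfel centers.
u, v = np.meshgrid(np.arange(W), np.arange(H))
x = (u - cx) / fx * depth
y = (v - cy) / fy * depth
points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Each surfel also gets an initial scale from its depth (farther pixels
# cover more world-space area) and a normal facing the camera.
scales = depth.reshape(-1) / fx
normals = np.tile([0.0, 0.0, -1.0], (points.shape[0], 1))

print(points.shape)  # (24, 3): one surfel seed per pixel
```

Because every seed already sits on the estimated surface, subsequent optimization is a short fine-tuning pass, which is where the speedup over view-by-view reconstruction comes from.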
This approach is significantly faster than alternatives like NeRF and Gaussian splatting, which often struggle with speed due to the need for multiple views and depth maps. But another major hurdle in 3D scene generation is ensuring connected and coherent geometry across scenes. WonderWorld solves this with its guided depth diffusion, which allows partial conditioning of depth estimation to ensure seamless transitions between generated scenes. Running on a single A6000 GPU, WonderWorld generates each 3D scene in less than 10 seconds, enabling real-time user interaction and virtual exploration. This opens up exciting possibilities for user-driven content creation in virtual worlds, bringing us closer to fully immersive digital experiences.
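Partial conditioning of depth estimation can be pictured as an inpainting-style constraint: at every denoising step, the pixels whose depth is already fixed by the existing scene are clamped to their known values, which drags the newly generated region into geometric agreement with them. The toy neighbor-averaging "denoiser" below is a stand-in invented for illustration; WonderWorld's guided depth diffusion uses a learned diffusion model, not this.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 8, 8
known = np.zeros((H, W), dtype=bool)
known[:, :4] = True                   # left half: depth fixed by the
known_depth = np.full((H, W), 3.0)    # existing scene (3.0 everywhere)

x = rng.normal(size=(H, W))           # start the unknown depth from noise
for t in range(100):
    # Stand-in "denoiser": pull every pixel toward its neighbors' mean.
    blur = (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
            np.roll(x, 1, 1) + np.roll(x, -1, 1)) / 4
    x = 0.5 * x + 0.5 * blur
    x[known] = known_depth[known]     # the partial-conditioning step

# Generated pixels bordering the known region inherit its depth, so the
# two scenes meet without a seam.
print(abs(x[:, 4].mean() - 3.0) < 0.5)
```

The key line is the clamp inside the loop: conditioning is applied at every step of the sampler rather than once at the end, which is what keeps the free region consistent with the constraint throughout generation.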
And finally, Microsoft just released its latest AI update for Microsoft 365 Copilot, which introduces several powerful features aimed at enhancing workplace productivity and collaboration. Key highlights include Copilot Pages, a dynamic workspace for real-time content creation and AI-assisted collaboration, allowing teams to create, share, and edit documents seamlessly. In terms of app integration, Excel now supports unformatted data analysis, advanced formulas, and improved visualization. The addition of Python in Excel allows complex data analysis through natural language commands, removing the need for coding skills. PowerPoint introduces the narrative builder for streamlined presentations, while Brand Manager ensures consistent company branding. Teams now analyzes both spoken conversations and chat messages, offering meeting insights.
Outlook features a Prioritize My Inbox tool and smart email summarization to manage email overload. Word enhances document creation with better references and inline AI collaboration, while OneDrive adds tools for file analysis and document comparison. The introduction of Copilot agents allows businesses to automate specific processes, from simple tasks to autonomous operations. With the agent builder, users can quickly develop custom AI agents within BizChat or SharePoint, connecting them to various data sources. Performance improvements are significant, with faster response times and higher user satisfaction. This update follows feedback from 1,000 customers, resulting in 700 product updates and 115 new features over the past year.
Microsoft plans to incorporate even more advanced AI models in future updates, keeping Microsoft 365 at the forefront of workplace AI solutions. And as the workplace becomes more digitally integrated, these advancements could redefine how teams collaborate and achieve goals. The only question remaining is how much of this workflow won’t be completely automated by AI, and what humans will do with all of their spare time.