Transcript
Then you can enable it to use page content and let AI finish your task. For example, we’re going to ask our copilot to extract the key points from this website. Additionally, you can create custom AI models to handle your communications with professionalism and efficiency. And to refine your writing, BrowserCopilot instantly corrects grammar, rephrases sentences, translates languages, and summarizes lengthy content. But it’s also a powerful research tool that analyzes documents, websites, and even YouTube videos in seconds to save you valuable time. So click the link below and try browsercopilot.ai today. And as for the latest in robot tech, Xpeng just unveiled its first-ever humanoid, named Iron, with several new abilities.
To start, Iron stands 5 feet 8 inches tall and weighs 154 pounds. Plus, it’s already operational on production lines assembling the company’s upcoming P7 electric vehicle. Impressively, Iron is equipped with an end-to-end large AI model integrated with Xpeng’s Eagle computer vision system, allowing autonomous movement and human-like postures such as standing, sitting, and lying down. And with over 60 joints and 200 degrees of freedom, Iron packs enough dexterity for complex tasks, using its 15-degree-of-freedom hands to feel, grasp, hold, and place objects. Furthermore, the robot’s brain is powered by Xpeng’s proprietary Turing AI chip, a 40-core processor enabling reasoning and decision-making that lets Iron think on the fly and adapt to scenarios ranging from quality checks in factories to potential integration into homes and offices.
And to prove it, Iron is already working alongside traditional automation equipment in Xpeng’s factories to help assemble electric vehicle components. But Xpeng has even more ambitious plans for Iron in the near future, aiming to extend its role beyond factories into retail environments, offices, and homes. With human-like hands, voice interaction via Xpeng’s Tianji AI OS, and the flexibility for various activities, Iron is intended to become a versatile companion in everyday life too. This move aligns with Xpeng’s broader AI ambitions, including an expanded product line of AI-driven and electric solutions, new vehicle platforms, extended-range systems, and robo-taxi models planned for 2026.
This means that by developing Iron, Xpeng will not only diversify into robotics to enhance its competitive edge, but also become a direct contender on the world stage alongside companies like Tesla, Figure, and Boston Dynamics. And as robot intelligence reaches new levels, the very first diffusion transformer AI model crafted for both generating and interactively controlling open-world game videos has just been released. Named GameGen-X, this innovative model excels at high-quality open-domain generation by simulating a wide range of game engine features, including unique characters, dynamic environments, complex actions, and diverse events. But what’s most striking about GameGen-X is its interactive controllability, which allows it to predict and alter future content based on the current clip, enabling realistic gameplay simulation.
And central to this achievement is the creation of OGameData, the first and largest open-world video game dataset of its kind, which includes over 1 million diverse gameplay video clips from more than 150 games, each paired with informative captions generated by GPT-4o. GameGen-X accomplishes this feat with a special two-stage training process whereby the model is first pre-trained on text-to-video generation and video continuation tasks, equipping it with the ability to produce long-sequence, high-quality open-domain game videos. Next, to further enhance interactive control, the InstructNet component integrates game-related multimodal control signals, allowing the model to modify latent representations based on user inputs, thus harmonizing character interaction and scene content control, a first-time accomplishment in AI-powered video generation.
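To make that two-stage setup concrete, here is a minimal PyTorch sketch of how an InstructNet-style control branch could sit on top of a frozen, pre-trained backbone; the FoundationModel and InstructNet classes, shapes, and training target below are simplified illustrative assumptions, not the paper’s actual architecture:

```python
import torch
import torch.nn as nn

class FoundationModel(nn.Module):
    """Stand-in for the pre-trained video generation backbone."""
    def __init__(self, dim=512):
        super().__init__()
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4,
        )

    def forward(self, latents):
        return self.backbone(latents)

class InstructNet(nn.Module):
    """Stand-in for the control branch that injects multimodal signals."""
    def __init__(self, dim=512, ctrl_dim=64):
        super().__init__()
        self.proj = nn.Linear(ctrl_dim, dim)

    def forward(self, latents, control_signals):
        # Modify latent representations based on user control inputs.
        return latents + self.proj(control_signals)

foundation = FoundationModel()
instruct = InstructNet()

# Freeze the foundation model so generation quality and diversity are preserved.
for p in foundation.parameters():
    p.requires_grad = False

# Only InstructNet's parameters are optimized during instruction tuning.
optimizer = torch.optim.AdamW(instruct.parameters(), lr=1e-4)

latents = torch.randn(2, 16, 512)   # (batch, tokens, dim) video latents
controls = torch.randn(2, 16, 64)   # keyboard/text control signals
target = torch.randn(2, 16, 512)    # placeholder training target

pred = foundation(instruct(latents, controls))
loss = nn.functional.mse_loss(pred, target)
loss.backward()   # gradients flow only into InstructNet
optimizer.step()
```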
Importantly, during this instruction tuning phase, only InstructNet is updated, while the pre-trained foundation model remains frozen. This method ensures that interactive controllability is achieved without compromising the diversity or quality of the generated video content. Overall, GameGen-X highlights the potential for AI to complement traditional rendering techniques and seamlessly merge creative generation with interactive capabilities, delivering a breakthrough advancement in generative open-world video gaming. But that’s just the beginning of generative AI gaming, as Decart just unveiled a new feature for its Oasis AI game generator, allowing users to upload images to create playable game worlds.
This custom worlds feature moves the company closer to its goal of transforming any image, even real-world photos, into interactive environments. The catch is that, for now, the generated worlds can quickly lose their resemblance to the original images, but Decart is already planning improvements. In fact, Oasis is the first real-time AI-generated game that is actually playable, with players able to move, jump, collect items, and break blocks in a Minecraft-like environment. Furthermore, Decart’s system processes player inputs instantly, generating gameplay elements like physics, rules, and visuals. But where Oasis really stands out is its speed compared to other AI video generators.
While models like OpenAI’s Sora or Runway Gen-3 require 10 to 20 seconds to generate one second of video, Oasis delivers 20 frames per second with zero latency, producing a new frame roughly every 50 milliseconds. The game uses a custom AI architecture based on transformer technology, enabling it to create images instantly and respond to player actions. It combines a vision transformer with a diffusion model for image processing, and players control the game using standard WASD keys and mouse input. And to address the challenge of maintaining high-quality visuals, the developers use dynamic noising, which intentionally adds randomness to image data early in frame generation.
This technique prevents error build-up and allows the AI to refine images by gradually removing noise, filling in sharp details while maintaining consistency. As for improvements, Decart believes that using even larger models and datasets will resolve many of the current issues. In fact, Decart’s partner Etched is developing the Sohu AI chip to handle larger models with up to 100 billion parameters at 4K resolution. Plus, Decart and Etched have released the code and weights for a 500-million-parameter model on GitHub, along with a playable demo of a larger version, enabling users to explore the capabilities of Oasis locally.
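To give a flavor of the dynamic-noising trick described above, here is a minimal PyTorch sketch; the noise level, step count, and the `toy_denoise` stand-in are illustrative assumptions, not Decart’s implementation:

```python
import torch

def dynamic_noising_step(prev_frame, denoise, noise_level=0.3, steps=4):
    # Deliberately re-noise the conditioning frame early in generation:
    # small errors accumulated over previous frames get drowned in known
    # Gaussian noise the denoiser was trained to remove, instead of being
    # amplified frame after frame.
    noisy = (1 - noise_level) * prev_frame + noise_level * torch.randn_like(prev_frame)

    # Gradually remove the noise again, refining sharp details while
    # staying consistent with the (re-noised) previous frame.
    frame = noisy
    for t in reversed(range(steps)):
        frame = denoise(frame, t)
    return frame

# Toy denoiser stand-in: a real system would use the trained diffusion model.
def toy_denoise(frame, t):
    return 0.9 * frame

frame = torch.randn(1, 3, 64, 64)   # previous generated frame (B, C, H, W)
next_frame = dynamic_noising_step(frame, toy_denoise)
print(next_frame.shape)             # torch.Size([1, 3, 64, 64])
```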
And to address the scarcity of real-world 4D data, Microsoft has introduced an innovative data curation pipeline that extracts camera poses and object movements from ordinary videos, resulting in the CamVid-30K dataset. This large-scale dataset provides the foundation for the GenXD framework to generate 3D and 4D scenes, effectively tackling the challenge of limited real-world 4D data. And even more impressive is GenXD’s introduction of multi-view temporal modules that disentangle camera and object movements, allowing seamless learning from both 3D and 4D data.
Furthermore, the framework employs masked latent conditioning to accommodate a variable number of conditioning views, enabling it to generate videos that follow camera trajectories as well as consistent 3D views that can be lifted into 3D representations. And extensive evaluations across real-world and synthetic datasets highlight GenXD’s superior effectiveness and versatility over prior 3D and 4D generation methods. The framework’s combination of a masked latent conditioned diffusion model with multi-view temporal modules and alpha-fusing techniques ensures precise generation of 3D and 4D samples while effectively managing multi-view and temporal data. Because of all this, GenXD represents a significant leap forward in 3D and 4D visual generation.
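As a rough illustration of masked latent conditioning, here is a short PyTorch sketch; the view layout, mask convention, and extra mask channel are assumptions made for clarity rather than GenXD’s exact design:

```python
import torch

num_views, c, h, w = 8, 4, 32, 32
latents = torch.randn(num_views, c, h, w)   # clean latents for all views

# A binary mask marks which views are observed conditioning inputs (1)
# and which must be generated from noise (0); here views 0 and 4 are given.
mask = torch.zeros(num_views, 1, 1, 1)
mask[[0, 4]] = 1.0

noise = torch.randn_like(latents)
# Conditioning views keep their clean latents; the rest start as pure noise,
# which is how one model can accept a variable number of conditioning views.
model_input = mask * latents + (1 - mask) * noise

# Concatenate the mask as an extra channel so the denoiser knows which
# views are trustworthy conditions and which it must synthesize.
mask_channel = mask.expand(num_views, 1, h, w)
model_input = torch.cat([model_input, mask_channel], dim=1)
print(model_input.shape)   # torch.Size([8, 5, 32, 32])
```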