Transcript
We hope that this moment, the Cosmos world foundation model being open, will do for the world of robotics and industrial AI what Llama 3 has done for enterprise AI. There's a small, a medium, and a large: very fast models, mainstream models, and also teacher models, basically knowledge transfer models. The magic happens when you connect Cosmos to Omniverse. And the reason, fundamentally, is this: Omniverse is physics grounded. Not physically grounded, but physics grounded; it's an algorithmic, principled, physics-simulation-grounded system. It's a simulator. When you connect that to Cosmos, it provides the grounding, the ground truth, that can control and condition the Cosmos generation.
As a result, what comes out of Cosmos is grounded in truth. This is exactly the same idea as connecting a large language model to RAG, a retrieval-augmented generation system: you want to ground the AI's generation on ground truth. And so the combination of the two gives you a physically simulated, physically grounded multiverse generator.
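To make the analogy concrete, here is a minimal sketch of that grounding pattern in Python. Everything in it is hypothetical: PhysicsSim, WorldModel, and their methods are placeholders, not Omniverse or Cosmos APIs. It only illustrates generation conditioned on simulator ground truth, the way a RAG pipeline conditions an LLM on retrieved documents.

```python
class PhysicsSim:
    """Stands in for a physics-grounded simulator (the Omniverse role)."""
    def render(self, scene: str) -> list[str]:
        # Return ground-truth frames for the scene; stubbed for illustration.
        return [f"{scene}: frame {i}" for i in range(4)]

class WorldModel:
    """Stands in for a generative world foundation model (the Cosmos role)."""
    def generate(self, prompt: str, conditioning: list[str]) -> str:
        # A real model would generate video conditioned on the ground-truth
        # frames; here we only show that generation is anchored to them.
        return f"generated '{prompt}' variations anchored to {len(conditioning)} grounded frames"

sim = PhysicsSim()
model = WorldModel()
ground_truth = sim.render("warehouse, forklift, pallet stack")
print(model.generate("nighttime, wet floor, crowded aisles", conditioning=ground_truth))
```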
And the applications and the use cases are really quite exciting. And of course, for robotics, for industrial applications, it is very, very clear: Omniverse plus Cosmos represents the third computer that's necessary for building robotic systems. Every robotics company will ultimately have to build three computers. The robotic system could be a factory, the robotic system could be a car, it could be a robot. You need three fundamental computers. One computer, of course, to train the AI; we call it the DGX computer. Another, of course, when you're done, to deploy the AI; we call that AGX. That's inside the car, in the robot, in an AMR, in a stadium, whatever it is. These computers are at the edge, and they're autonomous. But to connect the two, you need a digital twin. And this is all the simulations that you were seeing.
The digital twin is where the AI that has been trained goes to practice, to be refined. And so it's the digital twin of the AI. These three computers are going to be working interactively. The ChatGPT moment for general robotics is just around the corner. And in fact, all of the enabling technologies that I've been talking about are going to make it possible for us, in the next several years, to see very rapid, surprising breakthroughs in general robotics. Now, the reason general robotics is so important is this: whereas robots with tracks and wheels require special environments to accommodate them, there are three robots that we can make that require no greenfields.
They're perfect for brownfield adaptation. If we could possibly build these amazing robots, we could deploy them in exactly the world that we've built for ourselves. These three robots are, number one, agentic robots, agentic AI, because, you know, they're information workers; as long as they can accommodate the computers that we have in our offices, it's going to be great. Number two, self-driving cars. And the reason for that is we've spent 100-plus years building roads and cities. And then number three, humanoid robots. If we have the technology to solve these three, this will be the largest technology industry the world has ever seen.
And so we think the robotics era is just around the corner. The critical capability is how to train these robots. In the case of humanoid robots, the imitation information is rather hard to collect. In the case of cars, you just drive them; we're driving cars all the time. But for humanoid robots, the imitation information, the human demonstration, is rather laborious to do. And so we need to come up with a clever way to take hundreds of demonstrations, thousands of human demonstrations, and somehow use artificial intelligence and Omniverse to synthetically generate millions of motions, and from those motions, the AI can learn how to perform a task.
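A minimal sketch of that multiplication step, assuming a trajectory is just an array of joint positions over time: perturb each recorded demonstration into many synthetic variants. This is illustrative only; it is not the Isaac or Omniverse pipeline, and a real system would re-simulate each variant and keep only the ones that still complete the task.

```python
import numpy as np

def augment_demo(demo: np.ndarray, n_variants: int, noise_scale: float = 0.02,
                 rng: np.random.Generator | None = None) -> np.ndarray:
    """Expand one demonstration (T timesteps x D joints) into n_variants
    perturbed trajectories by adding small Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, noise_scale, size=(n_variants, *demo.shape))
    return demo[None, :, :] + noise

demo = np.zeros((50, 7))                 # placeholder: 50 timesteps, 7 joints
variants = augment_demo(demo, n_variants=100)
print(variants.shape)                    # (100, 50, 7)
# Scaled up, hundreds of demos times thousands of variants each yields
# millions of motions: e.g. 300 demos * 3500 variants = 1,050,000.
```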
But now artificial intelligence is everywhere. It's not just in research labs and startups. This is now the new way of doing computing. This is the new way of doing software. Every software engineer, every engineer, every creative artist, everybody who uses computers today as a tool will need an AI supercomputer. Project DIGITS. Here's the amazing thing: this is an AI supercomputer. It runs the entire NVIDIA AI stack. All of NVIDIA's software runs on this. DGX Cloud runs on this. It sits there, wireless or connected to your computer.
It's even a workstation if you'd like it to be. And you can access it, reach it, like a cloud supercomputer, and NVIDIA's AI works on it. If you would like double DIGITS, you connect them together with ConnectX, and it has NCCL, GPUDirect, all of that out of the box. It's like a supercomputer; our entire supercomputing stack is available. We've created three things to help the ecosystem build agentic AI. First, NVIDIA NIMs, which are essentially AI microservices, all packaged up: we take all of this really complicated CUDA software and the model itself, package it up, optimize it, put it into a container, and you can take it wherever you like. And these AI models run in every single cloud.
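Because a NIM is a containerized model behind an OpenAI-compatible HTTP endpoint, a standard client can talk to it. A minimal sketch, assuming a NIM container serving locally on port 8000 and an illustrative model id (substitute whatever your NIM actually serves):

```python
from openai import OpenAI

# Point a standard OpenAI client at the NIM container's local endpoint.
client = OpenAI(
    base_url="http://localhost:8000/v1",   # assumed local NIM address
    api_key="not-needed-for-local",        # local NIMs typically don't check this
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",    # illustrative model id
    messages=[{"role": "user", "content": "Summarize what a NIM is in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```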
The next layer is what we call NVIDIA NeMo. It's essentially a digital employee onboarding, training, and evaluation system. In the future, these AI agents are essentially a digital workforce working alongside your employees, doing things for you, on your behalf. That entire pipeline, the digital employee pipeline, is called NeMo.
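Purely to pin down the flow being described (onboard an agent with company knowledge, then evaluate it before it works alongside the team), here is a conceptual sketch. None of these classes or functions correspond to actual NeMo APIs; they only mirror the onboarding-and-evaluation shape of the pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalEmployee:
    role: str
    knowledge: list[str] = field(default_factory=list)
    evaluated: bool = False

def onboard(agent: DigitalEmployee, docs: list[str]) -> None:
    """Give the agent the company-specific context it needs (onboarding)."""
    agent.knowledge.extend(docs)

def evaluate(agent: DigitalEmployee, min_docs: int = 2) -> bool:
    """Gate deployment on an evaluation step, stubbed as a simple check."""
    agent.evaluated = len(agent.knowledge) >= min_docs
    return agent.evaluated

agent = DigitalEmployee(role="support engineer")
onboard(agent, ["product manual", "escalation policy", "tone guidelines"])
if evaluate(agent):
    print(f"{agent.role} agent is onboarded and ready to work alongside the team")
```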
And on top of that, we provide a whole bunch of blueprints that our ecosystem can take advantage of. All of this is completely open source, so you can take it and modify the blueprints. We have blueprints for all kinds of different types of agents. We're also announcing a whole family of models based on Llama: the NVIDIA Llama Nemotron language foundation models.

It's 72 Blackwell GPUs, or 144 dies. This one chip here is 1.4 exaflops. It has 14 terabytes of memory, but here's the amazing thing: the memory bandwidth is 1.2 petabytes per second. That's basically the entire internet's traffic happening right now. And so the thing that we would like to do is have your AI, in the future, basically become your AI assistant.
And instead of just the 3D APIs and the sound APIs and the video APIs, you would have generative APIs: generative APIs for 3D, generative APIs for language, generative APIs for sound, and so on and so forth. And we need a system that makes that possible while leveraging the massive investment that's in the cloud. And so if we could figure out a way to make the Windows PC a world-class AI PC, it would be completely awesome. And it turns out the answer is Windows: Windows WSL2. Basically, it's two operating systems within one. It works perfectly. It's developed for developers, so that you can have access to bare metal. WSL2 has been optimized, and it supports CUDA perfectly out of the box.
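A quick way to verify that claim on a given machine, using PyTorch purely as a convenient probe from inside a WSL2 distribution (this assumes a CUDA-capable GPU, a current Windows NVIDIA driver, and a CUDA build of PyTorch installed in WSL2):

```python
import torch

if torch.cuda.is_available():
    # The GPU is visible from inside WSL2 through the Windows driver.
    print("CUDA is available inside WSL2:", torch.cuda.get_device_name(0))
    # Run a tiny computation on the GPU to prove the path works end to end.
    x = torch.randn(1024, 1024, device="cuda")
    print("matmul checksum:", (x @ x).sum().item())
else:
    print("CUDA not visible; check the Windows NVIDIA driver and WSL2 setup.")
```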
We used GeForce to enable artificial intelligence, and now artificial intelligence is revolutionizing GeForce. Everyone, today we're announcing our next generation: the RTX Blackwell family. 92 billion transistors. 4,000 AI TOPS, four petaflops of AI, three times the last generation, Ada. 380 ray-tracing teraflops, so that for the pixels we have to compute, we can compute the most beautiful image we possibly can.
And of course, 125 shader teraflops. There are actually concurrent shader teraflops, as well as an integer unit of equal performance: dual shaders, one for floating point, one for integer. GDDR7 memory from Micron, 1.8 terabytes per second, twice the performance of our last generation. And we now have the ability to intermix AI workloads with computer graphics workloads.