Nvidia releases DreamDojo, a robot ‘world model’ trained on 44,000 hours of human video

A team of researchers led by Nvidia has released DreamDojo, a new AI system designed to teach robots how to interact with the physical world by watching tens of thousands of hours of human video — a development that could significantly reduce the time and cost required to train the next generation of humanoid machines.

The research, published this month and involving collaborators from UC Berkeley, Stanford, the University of Texas at Austin, and several other institutions, introduces what the team calls “the first robot world model of its kind that demonstrates strong generalization to diverse objects and environments after post-training.”

At the core of DreamDojo is what the researchers describe as “a large-scale video dataset” comprising “44k hours of diverse human egocentric videos, the largest dataset to date for world model pretraining.” The dataset, called DreamDojo-HV, is a dramatic leap in scale — “15x longer duration, 96x more skills, and 2,000x more scenes than the previously largest dataset for world model training,” according to the project documentation.

A simulated robot places a cup into a cardboard box in a workshop setting, one of thousands of scenarios DreamDojo can model after training on 44,000 hours of human video. (Credit: Nvidia)

Inside the two-phase training system that teaches robots to see like humans

The system operates in two distinct phases. First, DreamDojo “acquires comprehensive physical knowledge from large-scale human datasets by pre-training with latent actions.” Then it undergoes “post-training on the target embodiment with continuous robot actions” — essentially learning general physics from watching humans, then fine-tuning that knowledge for specific robot hardware.

For enterprises considering humanoid robots, this approach addresses a stubborn bottleneck. Teaching a robot to manipulate objects in unstructured environments traditionally requires massive amounts of robot-specific demonstration data, which is expensive and time-consuming to collect. DreamDojo eases this problem by leveraging existing human video for pre-training, so the model acquires its general physical knowledge from observation and robot-specific data is needed only for the post-training step.

One of the technical breakthroughs is speed. Through a distillation process, the researchers achieved “real-time interactions at 10 FPS for over 1 minute” — a capability that enables practical applications like live teleoperation and on-the-fly planning. The team demonstrated the system working across multiple robot platforms, including the GR-1, G1, AgiBot, and YAM humanoid robots, showing what they call “realistic action-conditioned rollouts” across “a wide range of environments and object interactions.”
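To put the reported speed in concrete terms, the short Python sketch below works out what "10 FPS for over 1 minute" implies. The function name and the 60-second lower bound are illustrative assumptions; only the frame rate and the "over 1 minute" figure come from the researchers.

```python
from typing import Tuple

FPS = 10          # frame rate reported by the researchers
DURATION_S = 60   # "over 1 minute", so treat this as a lower bound


def rollout_budget(fps: int, duration_s: int) -> Tuple[int, float]:
    """Return (frames to generate, time budget per frame in milliseconds)."""
    frames = fps * duration_s      # total frames in the rollout
    budget_ms = 1000.0 / fps       # time available to produce each frame
    return frames, budget_ms


frames, budget_ms = rollout_budget(FPS, DURATION_S)
print(f"at least {frames} frames, about {budget_ms:.0f} ms per frame")
```

In other words, the distilled model has to produce each predicted frame in roughly a tenth of a second and stay coherent for at least 600 consecutive frames, which is the regime where a human teleoperator or a planner can react to its output.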

Why Nvidia is betting big on robotics as AI infrastructure spending soars

The release comes at a pivotal moment for Nvidia’s robotics ambitions — and for the broader AI industry. At the World Economic Forum in Davos last month, CEO Jensen Huang declared that AI robotics represents a “once-in-a-generation” opportunity, particularly for regions with strong manufacturing bases. According to Digitimes, Huang has also stated that the next decade will be “a critical period of accelerated development for robotics technology.”

The financial stakes are enormous. Huang told CNBC’s “Halftime Report” on February 6 that the tech industry’s capital expenditures — potentially reaching $660 billion this year from major hyperscalers — are “justified, appropriate and sustainable.” He characterized the current moment as “the largest infrastructure buildout in human history,” with companies like Meta, Amazon, Google, and Microsoft dramatically increasing their AI spending.

That infrastructure push is already reshaping the robotics landscape. Robotics startups raised a record $26.5 billion in 2025, according to data from Dealroom. European industrial giants including Siemens, Mercedes-Benz, and Volvo have announced robotics partnerships in the past year, while Tesla CEO Elon Musk has claimed that 80 percent of his company’s future value will come from its Optimus humanoid robots.

How DreamDojo could transform enterprise robot deployment and testing

For technical decision-makers evaluating humanoid robots, DreamDojo’s most immediate value may lie in its simulation capabilities. The researchers highlight downstream applications including “reliable policy evaluation without real-world deployment and model-based planning for test-time improvement” — capabilities that could let companies simulate robot behavior extensively before committing to costly physical trials.
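The two applications the researchers name can be illustrated with a toy Python sketch. Everything in it is a hypothetical stand-in: `toy_world_model` is a one-line function rather than a learned video model, the state is a single number, and the planner uses random shooting, a generic textbook method that is not necessarily what DreamDojo uses. The point is only the shape of the idea: both scoring a policy and choosing actions happen inside the model, with no physical robot involved.

```python
import random
from typing import Callable, List

# Hypothetical 1-D setup: the state is a gripper position, the action a commanded step.
Model = Callable[[float, float], float]
Policy = Callable[[float, float], float]


def toy_world_model(state: float, action: float) -> float:
    """Stand-in for a learned world model: predicts the next state."""
    return state + 0.9 * action  # assume only 90% of the commanded step is realized


def rollout(model: Model, state: float, actions: List[float]) -> float:
    """Imagine the outcome of an action sequence without touching hardware."""
    for a in actions:
        state = model(state, a)
    return state


def evaluate_policy(model: Model, policy: Policy, start: float,
                    goal: float, horizon: int) -> float:
    """Policy evaluation in the model: score is negative final distance to goal."""
    state = start
    for _ in range(horizon):
        state = model(state, policy(state, goal))
    return -abs(goal - state)


def plan_random_shooting(model: Model, start: float, goal: float, horizon: int,
                         n_candidates: int, rng: random.Random) -> List[float]:
    """Model-based planning: sample action sequences, keep the best imagined one."""
    best: List[float] = []
    best_score = float("-inf")
    for _ in range(n_candidates):
        candidate = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        score = -abs(goal - rollout(model, start, candidate))
        if score > best_score:
            best, best_score = candidate, score
    return best


def proportional_policy(state: float, goal: float) -> float:
    """Move toward the goal, with the step clipped to [-1, 1]."""
    return max(-1.0, min(1.0, goal - state))


def idle_policy(state: float, goal: float) -> float:
    """Baseline that never moves."""
    return 0.0


good = evaluate_policy(toy_world_model, proportional_policy, 0.0, 2.0, 10)
idle = evaluate_policy(toy_world_model, idle_policy, 0.0, 2.0, 10)
plan = plan_random_shooting(toy_world_model, 0.0, 2.0, 5, 200, random.Random(0))
print("policy scores:", good, idle)
print("planned end state:", rollout(toy_world_model, 0.0, plan))
```

A real deployment would replace the toy function with the learned model and the scalar state with predicted video frames, but the workflow is the same: compare candidate policies or action sequences in imagination first, and send only the most promising ones to physical trials.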

This matters because the gap between laboratory demonstrations and factory floors remains significant. A robot that performs flawlessly in controlled conditions often struggles with the unpredictable variations of real-world environments: different lighting, unfamiliar objects, unexpected obstacles. By training on 44,000 hours of diverse human video that, by the team’s account, covers 96x more skills and 2,000x more scenes than the previously largest dataset, DreamDojo aims to build the kind of general physical intuition that makes robots adaptable rather than brittle.

The research team, led by Linxi “Jim” Fan, Joel Jang, and Yuke Zhu, with Shenyuan Gao and William Liang as co-first authors, has indicated that code will be released publicly, though a timeline was not specified.

The bigger picture: Nvidia’s transformation from gaming giant to robotics powerhouse

Whether DreamDojo translates into commercial robotics products remains to be seen. But the research signals where Nvidia’s ambitions are heading as the company increasingly positions itself beyond its gaming roots. As Kyle Barr observed at Gizmodo earlier this month, Nvidia now views “anything related to gaming and the ‘personal computer’” as “outliers on Nvidia’s quarterly spreadsheets.”

The shift reflects a calculated bet: that the future of computing is physical, not just digital. Nvidia has already committed to invest up to $10 billion in Anthropic and signaled plans to invest heavily in OpenAI’s next funding round. DreamDojo suggests the company sees humanoid robots as the next frontier where its AI expertise and chip dominance can converge.

For now, the 44,000 hours of human video at the heart of DreamDojo represent something more fundamental than a technical benchmark. They represent a theory — that robots can learn to navigate our world by watching us live in it. The machines, it turns out, have been taking notes.
