Topping Two Prestigious International Rankings! Haidian-Based AI Company Unveils New General World Action Model

SOURCE: Beijing Haidian

TIME: 2026.05.12

For a long time, embodied intelligent robots have faced a core industry pain point: the fragmentation of perception, planning, and execution modules. Robots either understand the environment but struggle with precise actions, or perform simple operations but fail to anticipate environmental changes, making it difficult to complete complex, continuous tasks stably in real-world scenarios.

Recently, Haidian-based AI company ShengShu Technology officially launched its general world action model, Motubrain. As a key milestone in the company's world model development, Motubrain is designed as the "universal brain" for embodied intelligent robots. With features like multi-robot adaptation, multi-task generalization, and long-horizon task execution, it enables robots to perform continuous and complex tasks more stably in real-world scenarios such as households, factories, and commercial spaces, offering a new technical path for embodied intelligence to transition from lab to reality.

Core Technological Breakthrough: Unified Modeling Breaks Down Barriers Between Perception and Action

Motubrain's core innovation lies in its unified modeling of "the world as seen" and "the actions to take" within a single model. This allows robots not only to understand their environment but also to imagine and predict changes in it, generating executable action strategies.

Specifically, Motubrain builds on ShengShu's self-developed UniDiffuser framework to model two continuous modalities, video and action, in a unified way. This enables the model to simultaneously learn the relationships between environmental changes, action execution, and task outcomes. With a single training run, Motubrain supports multiple capabilities, including Vision-Language-Action (VLA) tasks, video generation, inverse dynamics modeling, and video-action joint prediction, eliminating the need for separate models to handle perception, prediction, planning, and execution.
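How one trained model can serve several of these capabilities is easiest to see in the published UniDiffuser idea, where each modality carries its own diffusion timestep: a modality held at timestep 0 acts as a clean condition, one held at the maximum timestep is effectively ignored, and one that follows the running timestep is being generated. The Python sketch below illustrates that mapping for video and action. It is an illustration under stated assumptions, not Motubrain's actual implementation: the names Mode, T_MAX, and modality_timesteps are hypothetical, and the article does not say how the current observation or the language instruction is supplied.

```python
from enum import Enum

# Hypothetical maximum diffusion timestep; the article gives no value.
T_MAX = 1000


class Mode(Enum):
    POLICY = "policy"                      # actions only (VLA-style control)
    VIDEO_GENERATION = "video_generation"  # future video only
    INVERSE_DYNAMICS = "inverse_dynamics"  # actions from a given clean video
    JOINT_PREDICTION = "joint_prediction"  # video and actions together


def modality_timesteps(mode: Mode, t: int) -> tuple[int, int]:
    """Return (t_video, t_action) for the running denoising step t.

    0      -> the modality is clean and serves as a condition
    T_MAX  -> the modality is pure noise, i.e. marginalized out
    t      -> the modality is currently being denoised
    """
    if not 0 <= t <= T_MAX:
        raise ValueError(f"t must lie in [0, {T_MAX}], got {t}")
    if mode is Mode.POLICY:
        return T_MAX, t
    if mode is Mode.VIDEO_GENERATION:
        return t, T_MAX
    if mode is Mode.INVERSE_DYNAMICS:
        return 0, t
    if mode is Mode.JOINT_PREDICTION:
        return t, t
    raise ValueError(f"unknown mode: {mode}")
```

Under this scheme, switching capability at inference time only changes which timesteps are fed to the network; the weights stay the same, which is what makes a single training run sufficient.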

Based on this, Motubrain further adopts a three-stream MoT (Motion Transformer) architecture that integrates video, action, and language. By combining capabilities from existing multi-modal pre-trained models and expert models, it can simultaneously perform scene understanding, language instruction following, outcome prediction, and action generation. Unlike traditional fragmented approaches where perception, planning, and execution are handled separately, Motubrain's unified architecture connects the full task pipeline, resulting in stronger semantic understanding, instruction following, and end-to-end action capabilities.

More importantly, unified modeling enables Motubrain to continuously learn from a broader range of data. It can ingest not only complete robot task trajectory data but also video data without action annotations, task-agnostic data without language instructions, and video, action, and language data from different robot platforms. Unlike traditional VLA models, which primarily rely on task trajectory data from specific robot hardware, Motubrain breaks down "data silos" by leveraging large-scale heterogeneous data, resulting in greater scalability and generalization capabilities.
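One plausible way to train on such a mixed corpus is to decide, sample by sample, which loss terms and conditions are available, so that a video clip without action labels still contributes to the video objective. The Python sketch below shows that bookkeeping only. The Sample fields and the supervision_plan function are hypothetical names introduced for illustration; the article does not describe Motubrain's actual data pipeline.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    """One training item; actions and instruction may be missing."""
    video: Sequence[float]
    actions: Optional[Sequence[float]] = None
    instruction: Optional[str] = None


def supervision_plan(sample: Sample) -> dict:
    """Decide which training signals this sample can provide."""
    return {
        # Every sample has video, so the video objective always applies.
        "video_loss": True,
        # The action objective applies only when action labels exist.
        "action_loss": sample.actions is not None,
        # Language conditioning is used only when an instruction exists.
        "language_conditioning": sample.instruction is not None,
    }


corpus = [
    Sample(video=[0.1, 0.2], actions=[1.0], instruction="pick up the cup"),
    Sample(video=[0.3, 0.4]),                 # video without action labels
    Sample(video=[0.5, 0.6], actions=[0.5]),  # task-agnostic, no instruction
]
plans = [supervision_plan(s) for s in corpus]
```

In this sketch no sample is discarded for being incomplete; it simply supervises fewer parts of the model.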

Four Core Capabilities: Enabling Universal Robotic Action Across All Scenarios

Motubrain does not merely "teach robots to perform actions"—it equips them with the ability to understand, predict, and interact with the world. To achieve this goal, the model is built around four key capabilities:

One Brain, Multiple Tasks: Handle a Wide Range of Tasks

Motubrain delivers consistent performance across multi-task scenarios, eliminating the need for single-task training. As the number of tasks grows, the shared world knowledge across tasks increases, driving up the model's average task success rate and demonstrating superior multi-task integration and generalization capabilities.

One Brain, Multiple Platforms: Adapt to Diverse Robot Hardware

Motubrain is not tailored for a single robot model but designed as a unified intelligent foundation for multi-robot systems. Its multi-platform adaptation capability breaks the traditional "one robot, one model" paradigm. The model makes good use of heterogeneous data, and as the ecosystem expands with more robot types, scenarios, and data, its capabilities continue to evolve—enhancing overall versatility while improving performance across all robot platforms in the ecosystem.

One Brain, Continuous Flow: Complete Long-Horizon Tasks in One Go

Motubrain learns full task pipelines directly, eliminating the need for high-level planning, task decomposition, fast-slow dual systems, or stitching multiple models together. This delivers higher success rates in complex long-horizon tasks. A single World Action Model can complete complex tasks involving up to 10 atomic actions, moving beyond the typical demos of 2–3 atomic actions. For robots, tasks are no longer isolated actions but continuous, closed-loop processes that require sustained execution.

One Brain, Foresight: Enable Dynamic Decision-Making

Beyond simply executing commands, Motubrain understands the world and predicts environmental changes, reasoning to determine optimal actions and motion paths. By combining how it understands the world, makes predictions, and carries out actions, the model continuously evaluates, adjusts, and acts in changing situations, achieving the ability to "predict the world and drive action."

These capabilities are not limited to a single environment but extend to a wide range of real-world scenarios. In households, Motubrain supports continuous tasks such as meal preparation, tidying, and care services; in industrial settings, it adapts to complex process operations like sorting, handling, and assembly; in commercial environments, it enables multi-step tasks including guided tours, pick-and-delivery, inventory organization, and coordinated service workflows.

Motubrain has already claimed first place in two prestigious international benchmarks, WorldArena and RoboTwin 2.0. This validates the feasibility of unified modeling for "predicting the world and driving action," marking a significant step forward for ShengShu Technology in advancing general physical intelligence from technical exploration to real-world application.

Topping Dual Benchmarks: International Recognition of Technical Strength

The most notable achievement of Motubrain's launch is its simultaneous top ranking in two international benchmarks, long regarded as "poles of distinct capabilities." WorldArena focuses on world model capabilities, measuring whether a model truly understands and predicts physical laws. RoboTwin 2.0 evaluates robotic execution capabilities, assessing task performance and generalization in complex, randomized environments.

While these benchmarks appear to target different areas, they address the two most critical capabilities of embodied intelligence: understanding and predicting the world, and interacting with and acting upon the world.

In WorldArena testing, Motubrain secured first place across key metrics, including Motion Quality, Flow Score (trajectory coherence), and Motion Smoothness, demonstrating a deep understanding of real-world physical motion dynamics.

In RoboTwin 2.0, Motubrain achieved an average score of 96.0 across 50 complex tasks, becoming the only model on the benchmark with an average score above 95 in randomized environments. This highlights exceptional task execution stability and cross-scenario generalization capabilities. Motubrain's leadership is not limited to isolated technical breakthroughs—it systematically unifies "understanding the world" and "driving action" within a single model framework, bridging the historical gap between models that "see but cannot act" and those that "act without foresight."

Ecosystem Collaboration: Moving Technology from Lab to Industrial Application

From Motus to Motubrain: The World Action Model (WAM) has emerged as a new path for embodied intelligence. In the evolution of world model technology, ShengShu Technology has firmly chosen a more cutting-edge yet challenging direction: the WAM.

"The ceiling of technology determines potential, while the depth of deployment determines scale," noted a representative from ShengShu Technology. "Motubrain's significance lies not only in proving the feasibility of a 'universal robot brain' but also in its progress toward real-world industrial applications."

Recently, ShengShu Technology has entered into strategic partnerships with leading embodied intelligence companies including Anyverse Dynamics (Wujie Power), Simple AI (Shenpu Intelligence), and Astribot (Stardust Intelligence). These collaborations focus on advancing the development of universal embodied intelligent brains, driving key progress in base model evolution, multi-modal and embodied data fusion, high-quality data infrastructure, and integrated hardware-software optimization.

Through ongoing collaboration with robot hardware, data, scenario, and application ecosystem partners, ShengShu Technology is redefining the technical foundation of embodied intelligence with its general world model. The company is driving deep integration between world models and robotic systems to build an open ecosystem tailored for real-world applications.

If Motubrain answers the question "Can a universal robot brain work?", these deep partnerships with embodied intelligence companies address the next critical question: "How can such a universal brain be deployed in real-world scenarios?"

This means ShengShu Technology is accelerating the development of a complete pipeline—from general world models to robot hardware adaptation, and ultimately to real-world deployment. Motubrain is more than just a product launch or benchmark score update; it marks a key milestone for ShengShu Technology in advancing from capability validation to ecosystem building, and from technical breakthroughs to industrial-scale application in the world model space.
