China's Trillion-Parameter Open-Source Multimodal Model Takes Aim at Enterprise AI

YuanLab.ai's Yuan3.0 Ultra is a trillion-parameter multimodal foundation model for enterprise AI agents.

YuanLab.ai releases Yuan3.0 Ultra with a novel expert-pruning method that improves pre-training compute efficiency by 49%, targeting enterprise agent workflows.

YuanLab.ai has officially open-sourced Yuan3.0 Ultra, a trillion-parameter multimodal foundation model — and one of only three open-source models in the world at that scale. Designed from the ground up for enterprise applications, the model delivers strong performance on multimodal document understanding, retrieval-augmented generation (RAG), tabular data analysis, content summarization, and tool calling — the exact capabilities that enterprise AI agents need most.

These strengths allow Yuan3.0 Ultra to handle the complex information formats found in real business environments: documents mixing text and graphics, multi-layered structured tables, and cross-document knowledge retrieval. The model is positioned as a core engine for building multimodal, data-driven enterprise agents on frameworks like OpenClaw.

Architecture: Unified Multimodal Design with MoE Backbone

Yuan3.0 Ultra adopts a unified multimodal architecture that enables joint modeling of visual and linguistic information. Its language backbone is built on a Mixture of Experts (MoE) design, starting at 1,515 billion parameters and optimized down to 1,010 billion through the team's LAEP method — achieving a **49% improvement in pre-training compute efficiency**. The model's active parameter count is 68.8 billion.

The architecture also introduces a **Localized Filtering Attention (LFA)** mechanism that strengthens semantic relationship modeling, delivering higher accuracy than classic attention structures.
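The article doesn't reproduce the exact LFA formulation. As a rough, unofficial illustration, "localized" attention can be sketched as standard scaled dot-product attention restricted to a sliding window around each position; the window size and masking scheme below are assumptions, not the paper's design:

```python
import math

def local_window_attention(q, k, v, window=2):
    """Toy single-head attention where each query position attends only to
    keys within +/- `window` positions -- a stand-in for localized filtering.
    q, k, v: lists of equal-length float vectors (one per position)."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        lo, hi = max(0, i - window), min(len(k), i + window + 1)
        # Scaled dot-product scores over the local window only.
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
                  for j in range(lo, hi)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Weighted sum of the values inside the window.
        out.append([sum(w * v[j][t] for w, j in zip(weights, range(lo, hi)))
                    for t in range(d)])
    return out

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
result = local_window_attention(seq, seq, seq, window=1)
print(len(result), len(result[0]))  # 4 positions, 2 dims each
```

Restricting each query to a local neighborhood is one plausible way to strengthen nearby semantic relationships while cutting the quadratic cost of full attention.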

Yuan3.0 Ultra is now fully open-source, with model weights and code available for free download on [GitHub](https://github.com/Yuan-lab-LLM/Yuan3.0-Ultra).

Enterprise-Grade Multimodal Capabilities

Enterprise AI agents typically need to process documents, tables, and databases simultaneously while completing tasks through multi-step reasoning and tool calling. Yuan3.0 Ultra was purpose-built around these real-world enterprise information processing and task execution requirements.
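To make the tool-calling workflow concrete, here is a minimal, hypothetical agent loop: a model emits tool calls as JSON, the runtime executes them and feeds results back, and the loop ends at a final answer. The message format and tool names are invented for illustration and are not Yuan3.0 Ultra's actual protocol:

```python
import json

# Invented tool registry for illustration only.
TOOLS = {
    "sum_column": lambda rows, col: sum(r[col] for r in rows),
}

def run_agent(steps):
    """`steps` stands in for model outputs: tool-call messages, then an answer."""
    observations = []
    for step in steps:
        msg = json.loads(step)
        if msg["type"] == "tool_call":
            result = TOOLS[msg["name"]](*msg["args"])
            observations.append(result)  # would be fed back to the model
        else:
            return msg["answer"], observations

rows = [{"amount": 10}, {"amount": 32}]
steps = [
    json.dumps({"type": "tool_call", "name": "sum_column",
                "args": [rows, "amount"]}),
    json.dumps({"type": "final", "answer": "Total is 42"}),
]
answer, obs = run_agent(steps)
print(answer, obs)  # Total is 42 [42]
```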

Complex Document and Chart Understanding

In enterprise settings, critical information lives in technical proposals, financial reports, and industry research — documents packed with mixed text-graphic layouts, complex tables, and cross-page information dependencies. This is where building enterprise knowledge systems gets hardest.

Yuan3.0 Ultra leads Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.2 on benchmarks including DocMatix and MMTab for multimodal document understanding, demonstrating superior capabilities in text-graphic structure parsing and table semantic comprehension.

This enables the model to accurately parse mixed-format document structures, extract key data metrics, and support agent systems in completing document understanding, data extraction, and report summarization tasks — powering use cases like financial report analysis, contract review, and technical documentation parsing.

Multi-Source Information Retrieval and Integration

Enterprise knowledge is typically scattered across document repositories, knowledge bases, and business databases — fragmented sources with inconsistent structures. Effective information retrieval in this environment requires not just search capabilities but semantic integration and comprehensive analysis across multiple sources. Traditional retrieval systems return fragmented results that rarely form complete conclusions.

Yuan3.0 Ultra outperforms Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.2 on RAG benchmarks including ChatRAG and SummEval, demonstrating strong ability to perform deep semantic integration and generate coherent answers from retrieved results.

This capability enables complete information processing workflows — retrieval, comprehension, and synthesis — within enterprise knowledge environments, effectively supporting agents built on frameworks like OpenClaw in completing complex tasks using proprietary enterprise knowledge.
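One common way to fuse ranked results from heterogeneous sources is reciprocal rank fusion (RRF). The sketch below uses that standard heuristic, not Yuan3.0 Ultra's actual integration method, and the source names are invented:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked result lists from multiple sources (document store,
    knowledge base, database) into a single ranking. Items ranked highly
    by several sources accumulate the largest scores."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

doc_store = ["contract_2024.pdf", "policy_v3.md", "q3_report.xlsx"]
knowledge_base = ["policy_v3.md", "faq.md"]
database = ["q3_report.xlsx", "policy_v3.md"]

fused = reciprocal_rank_fusion([doc_store, knowledge_base, database])
print(fused[0])  # policy_v3.md appears in all three sources, so it ranks first
```

Fusion like this only solves the ranking half of the problem; the semantic integration the article describes is what the model itself contributes on top of the retrieved set.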

Data Analysis and Business Decision Support

Many enterprise decisions depend on database queries, report analysis, and cross-system data integration. These scenarios require translating business questions into database queries, then analyzing and summarizing the results — a process that traditionally demands manual SQL writing and report compilation.

Yuan3.0 Ultra excels on Text-to-SQL benchmarks including Spider and BIRD, outperforming leading models like Kimi K2.5 and DeepSeek V3.2 on Spider, demonstrating strong natural language understanding and structured query generation.
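A minimal Text-to-SQL round trip looks like the following, using Python's built-in sqlite3. The schema, the business question, and the "generated" query are all invented for illustration; they are not drawn from the Spider or BIRD benchmarks:

```python
import sqlite3

# Hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("north", 120.0), ("north", 90.0), ("south", 200.0)])

# Question: "What is total order revenue per region, highest first?"
# A Text-to-SQL model would be expected to emit something like:
generated_sql = """
    SELECT region, SUM(amount) AS revenue
    FROM orders
    GROUP BY region
    ORDER BY revenue DESC
"""
rows = conn.execute(generated_sql).fetchall()
print(rows)  # [('north', 210.0), ('south', 200.0)]
```

The hard part the benchmarks measure is the middle step: mapping an ambiguous business question onto the right tables, joins, and aggregations; executing the result is mechanical.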

This powers agent-driven data querying, operational analysis, and report generation, enabling enterprises to build business analytics and decision systems on frameworks like OpenClaw.

Not More Experts — More Effective Experts

The research team discovered that expert load evolution during LLM pre-training follows two distinct phases:

**Phase 1: Initial transition.** During early pre-training, expert loads fluctuate wildly under the influence of random initialization. Token counts received by the same expert can differ by orders of magnitude.

**Phase 2: Stabilization.** Expert token loads converge, with each expert receiving relatively stable token volumes with minor fluctuations.

During the stable phase, token loads remain highly unbalanced — a few experts handle the bulk of computation while others sit chronically underloaded, wasting compute resources. The team found that the gap between the highest- and lowest-loaded experts can reach **nearly 500×**.
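The load gap can be quantified directly from per-expert token counts. The numbers below are illustrative; only the roughly 500x extreme ratio comes from the article:

```python
# Token counts per expert for one MoE layer (illustrative numbers only).
token_loads = {f"expert_{i}": n
               for i, n in enumerate([48_000, 21_000, 9_500, 4_100,
                                      1_800, 600, 250, 96])}

total = sum(token_loads.values())
imbalance = max(token_loads.values()) / min(token_loads.values())
shares = {e: n / total for e, n in token_loads.items()}

print(f"imbalance ratio: {imbalance:.0f}x")
print(f"top-2 experts handle "
      f"{shares['expert_0'] + shares['expert_1']:.0%} of tokens")
```

With a skew like this, most experts consume parameter memory and routing capacity while contributing little computation, which is exactly the redundancy pruning can reclaim.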

From a learning-mechanism perspective, this phenomenon reflects **Functional Specialization** — different experts gradually develop stable preferences for specific patterns, semantic structures, or task types during training, spontaneously forming specialized division of labor within the model. This mirrors how the human brain organizes cognition: neuroscience research shows that the cerebral cortex doesn't allocate neurons equally across all tasks but instead develops functionally specialized regions (visual cortex, language areas, motor cortex) to dramatically improve information processing efficiency.

The key question for large-scale MoE models becomes: how to identify and remove redundant structures that solidify during training while preserving specialized capabilities and maximizing compute efficiency.

LAEP: Layer-Adaptive Expert Pruning

To solve this, Yuan3.0 Ultra introduces **Layer-Adaptive Expert Pruning (LAEP)**, an algorithm designed for pre-training. LAEP uses expert load statistics formed during pre-training to dynamically identify low-contribution experts, then adaptively prunes and rearranges the model structure to concentrate compute on experts that actually matter.
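LAEP's exact pruning criterion isn't spelled out in this article. The sketch below is a simple load-based stand-in: per layer, keep the smallest expert set covering most of the routed tokens, so skewed layers prune more aggressively than balanced ones; the coverage threshold is an assumption:

```python
def prune_low_load_experts(layer_loads, keep_fraction_of_load=0.95):
    """For each layer, keep the smallest set of experts that together
    handle `keep_fraction_of_load` of routed tokens; prune the rest.
    A load-based heuristic standing in for LAEP's actual criterion."""
    kept = {}
    for layer, loads in layer_loads.items():
        total = sum(loads.values())
        keep, covered = [], 0.0
        for expert, n in sorted(loads.items(), key=lambda kv: -kv[1]):
            keep.append(expert)
            covered += n / total
            if covered >= keep_fraction_of_load:
                break
        kept[layer] = keep
    return kept

layer_loads = {
    0: {"e0": 9000, "e1": 700, "e2": 250, "e3": 50},     # heavily skewed layer
    1: {"e0": 2600, "e1": 2500, "e2": 2500, "e3": 2400}, # balanced layer
}
kept = prune_low_load_experts(layer_loads)
print({layer: len(experts) for layer, experts in kept.items()})  # {0: 2, 1: 4}
```

The layer-adaptive property falls out naturally: the skewed layer drops half its experts while the balanced layer keeps all four, concentrating parameters where specialization actually formed.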

From a neuroscience perspective, this resembles how the brain optimizes and reorganizes neural connections during long-term learning — retaining efficient information pathways, weakening inefficient connections, and improving overall cognitive efficiency while maintaining functional specialization.

**Results: 33.3% parameter reduction, 49% overall pre-training efficiency improvement.**

This research reveals an important insight: scaling LLMs shouldn't simply mean adding more parameters. Models should evolve into "cognitive systems" with structural specialization. Leveraging the expert differentiation that naturally emerges during training and using structural optimization to further improve learning and compute efficiency will be a key direction for future foundation model design.

Not Longer Thinking — More Effective Thinking

Yuan3.0 Ultra's training strategy focuses on a **Fast-thinking reinforcement learning paradigm**. Rather than simply extending reasoning chains, the model defaults to efficient short-path reasoning, prioritizing compute for high-information-gain steps over unconstrained reflective expansion.

During large-scale reinforcement learning, the team systematically optimized a **Reflection Inhibition Reward Mechanism (RIRM)** that constrains the number of reflections through reward signals. This causes the model to proactively reduce unnecessary reflection after reaching reliable answers while preserving necessary reasoning depth for complex problems — effectively mitigating the "overthinking" problem in fast-thinking mode.
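RIRM's actual reward design isn't detailed in this article. A toy reward in the same spirit, with every constant invented for illustration, might penalize reflections beyond a small allowance:

```python
def rirm_reward(answer_correct, num_reflections, max_free_reflections=2,
                penalty=0.1):
    """Toy reward shaping in the spirit of RIRM: correct answers earn +1,
    and each reflection beyond an allowance is penalized, discouraging
    reflection loops after a reliable answer is reached. All constants
    are illustrative, not from the paper."""
    base = 1.0 if answer_correct else 0.0
    excess = max(0, num_reflections - max_free_reflections)
    return base - penalty * excess

print(rirm_reward(True, 1))   # 1.0 -- brief reasoning, full reward
print(rirm_reward(True, 6))   # 0.6 -- correct but over-reflective
print(rirm_reward(False, 0))  # 0.0
```

Because the penalty only activates above the allowance, hard problems can still spend reflection steps where they raise the odds of a correct answer; the mechanism targets reflection that no longer changes the outcome.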

Training results show that under this controlled fast-thinking strategy, model accuracy improved significantly while the number of tokens generated during inference continued to decline — achieving simultaneous optimization of accuracy and compute efficiency.

Open-Source for Real-World Deployment

Yuan3.0 Ultra is fully open-source, including model weights (16-bit and 4-bit versions), a technical report, the complete training methodology, and evaluation results, which the community can use for further training and industry-specific customization.

The LAEP method represents YuanLab.ai's latest exploration of next-generation foundation model architecture, offering the industry a new path for MoE model structural innovation and pre-training compute efficiency.

The team hopes the open-source release will push LLMs from capability demonstrations toward large-scale real-world deployment, providing enterprise users with a deeply optimized multimodal foundation model purpose-built for agent applications.

The Yuan3.0 foundation model family will include Flash, Pro, and Ultra versions at 40B, 200B, and 1T parameter scales respectively, with additional releases forthcoming.

**Links:**

- Code: https://github.com/Yuan-lab-LLM/Yuan3.0-Ultra
- Paper: https://github.com/Yuan-lab-LLM/Yuan3.0-Ultra/blob/main/Docs/Yuan3.0_Ultra%20Paper.pdf
- Hugging Face: https://huggingface.co/YuanLabAI/Yuan3.0-Ultra-int4
- ModelScope: https://modelscope.cn/models/YuanLabAI/Yuan3.0-Ultra-int4
- WiseModel: https://www.wisemodel.cn/models/YuanLabAI/Yuan3.0-Ultra-int4