X-trader NEWS
Open your market's potential
NVIDIA pushes into intelligent agents: the open-source Nemotron 3 Super model has 120 billion parameters and boosts throughput 5×

By Li Dan
Source: Wallstreetcn
Nemotron 3 Super activates only 12 billion of its parameters during inference and natively supports a 1-million-token context window. Its performance leap comes from three architectural innovations: a hybrid Mamba-Transformer backbone, a latent Mixture of Experts (latent MoE), and Multi-Token Prediction (MTP). Running at NVFP4 precision on the Blackwell platform, the model delivers up to 4× faster inference than FP8 on the Hopper platform with no loss in accuracy. Perplexity is the first partner to adopt the model for agent tasks.
NVIDIA is stepping up its push in the autonomous agent infrastructure race, marking a strategic shift for the chip giant from being a hardware supplier to a deep player in the model layer of the artificial intelligence (AI) competition.
On Wednesday, March 11 (ET), NVIDIA announced the launch of Nemotron 3 Super, a new-generation open-source large language model designed specifically for enterprise-grade multi-agent systems. Featuring an all-new Mixture of Experts (MoE) architecture, it boosts inference throughput by more than 5× over the previous generation. The model has a total of 120 billion parameters, with only 12 billion activated during inference, and natively supports a 1‑million-token context window.
According to NVIDIA, Nemotron 3 Super ranks first on Artificial Analysis in terms of efficiency and openness, leading in accuracy among models of comparable scale. It also powers NVIDIA’s AI-Q research agent to top positions on both the DeepResearch Bench and DeepResearch Bench II leaderboards.
NVIDIA unveiled the first wave of partners for Nemotron 3 Super. AI search firm Perplexity is the first partner to integrate the model for agent tasks, providing users with multi-agent orchestration services across its search and Computer products. Enterprise software leaders including Palantir, Siemens, Cadence, Dassault Systèmes, and Amdocs have also announced plans to deploy the model for workflow automation in telecommunications, cybersecurity, semiconductor design, manufacturing, and other fields.
Nemotron 3 Super is now available to developers via NVIDIA’s build.nvidia.com, Hugging Face, and OpenRouter.
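Since the model is listed on OpenAI-compatible gateways such as OpenRouter, calling it from code would presumably look like a standard chat-completion request. The sketch below is an assumption-laden illustration: the model ID `nvidia/nemotron-3-super` and the endpoint path are guesses based on OpenRouter's usual conventions, not confirmed identifiers.

```python
import json
import urllib.request

# Hypothetical model ID -- check build.nvidia.com or OpenRouter for the real one.
MODEL_ID = "nvidia/nemotron-3-super"

def build_chat_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-compatible chat-completion payload."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send(payload: dict, api_key: str) -> dict:
    """POST the payload to an OpenRouter-style chat-completions endpoint."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("Summarize this codebase's module layout.")
print(payload["model"])
```

Because the gateway speaks the OpenAI wire format, existing agent frameworks should be able to swap the model in by changing only the model string and base URL.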
## Two Bottlenecks Spawn a New Architecture
In a blog post, NVIDIA noted that enterprises face two core constraints when moving from chatbots to multi-agent applications.
First, **“context explosion”**: each interaction in a multi-agent workflow requires retransmitting the full history (including tool outputs and intermediate reasoning steps), generating up to 15× more tokens than a standard conversation. As tasks extend, this massive context not only raises costs but also causes “goal drift”—where agents gradually deviate from the original objective.
Second, **“thinking tax”**: complex agents must reason at every step. If large models are called for every subtask, multi-agent applications become impractical due to high costs and slow responses.
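The first constraint is easy to quantify: if every agent turn re-sends the full prior history, cumulative token traffic grows quadratically with the number of turns. A minimal back-of-envelope sketch (the per-turn token count is an illustrative assumption, not a figure from NVIDIA):

```python
def tokens_retransmitted(turn_tokens: list[int]) -> int:
    """Total tokens sent when every turn re-sends the full prior history.

    turn_tokens[i] is the number of NEW tokens produced at turn i
    (user message, tool output, or intermediate reasoning).
    """
    total, history = 0, 0
    for new in turn_tokens:
        history += new      # the history grows by this turn's new tokens
        total += history    # ...and the whole history is sent each turn
    return total

# Ten agent turns of 1,000 new tokens each:
turns = [1000] * 10
print(tokens_retransmitted(turns))  # 55,000 sent for only 10,000 new tokens
```

At ten turns the inflation is already 5.5×; because the growth is quadratic, longer workflows quickly reach the 15× range the article cites.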
Nemotron 3 Super directly addresses context explosion with its native 1-million-token context window, helping agents maintain state coherence and avoid goal drift over long tasks. The hybrid architecture is what tackles the thinking tax.
## Three Architectural Innovations Enable 5× Speedup
NVIDIA’s blog revealed that Nemotron 3 Super’s performance jump stems from three core architectural innovations.
- **Hybrid Mamba-Transformer backbone**: The model interleaves Mamba-2 layers and Transformer attention layers. Mamba layers handle most sequential tasks, delivering 4× improvements in memory and compute efficiency with linear time complexity, making the million-token context window practically feasible. Transformer layers are inserted at critical depths to preserve precise relational recall.
- **Latent Mixture of Experts (latent MoE)**: Before routing decisions, token embeddings are compressed into a low-rank latent space. Expert computations are performed in this smaller dimension and then projected back to the full dimension. NVIDIA states this design allows the model to activate 4× more experts at the same inference cost, enabling finer-grained specialized routing—such as activating distinct experts for Python syntax and SQL logic.
- **Multi-Token Prediction (MTP)**: The model predicts multiple future tokens in parallel within a single forward pass, rather than generating token-by-token. NVIDIA says this design strengthens the model’s internalization of long-range logical dependencies during training and enables built-in speculative decoding at inference, delivering up to 3× speedups for structured generation tasks like code and tool calls—without requiring an extra draft model.
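Of the three innovations, latent MoE is the easiest to sketch in code. The toy layer below compresses a token embedding into a low-rank latent space, routes and runs the experts there, and projects back, as the bullet describes. All dimensions, the top-k value, and the weight initialization are illustrative assumptions; NVIDIA has not published the layer's exact equations.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL, D_LATENT = 512, 128   # full vs. latent width (illustrative sizes)
N_EXPERTS, TOP_K = 16, 4       # experts available / activated per token

# Projections into and out of the latent space, plus per-expert weights.
W_down = rng.standard_normal((D_MODEL, D_LATENT)) / np.sqrt(D_MODEL)
W_up = rng.standard_normal((D_LATENT, D_MODEL)) / np.sqrt(D_LATENT)
W_router = rng.standard_normal((D_LATENT, N_EXPERTS)) / np.sqrt(D_LATENT)
W_experts = rng.standard_normal((N_EXPERTS, D_LATENT, D_LATENT)) / np.sqrt(D_LATENT)

def latent_moe(x: np.ndarray) -> np.ndarray:
    """One latent-MoE layer for a single token embedding x of shape (D_MODEL,)."""
    z = x @ W_down                          # compress to the latent space
    logits = z @ W_router                   # routing happens in latent space
    top = np.argsort(logits)[-TOP_K:]       # pick the top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                    # softmax over the selected experts
    out = sum(g * (z @ W_experts[e]) for g, e in zip(gates, top))
    return out @ W_up                       # project back to full width

y = latent_moe(rng.standard_normal(D_MODEL))
print(y.shape)  # (512,)
```

Because each expert multiply costs D_LATENT² rather than D_MODEL², shrinking the routing space is what lets the model afford more active experts at the same inference cost, which is the mechanism behind NVIDIA's "4× more experts" claim.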
Running on NVIDIA’s Blackwell platform at NVFP4 precision, the model achieves up to 4× faster inference versus FP8 on the Hopper platform, with no accuracy loss per NVIDIA.
## Open Weights and Multi-Layer Ecosystem Layout
Unlike most cutting-edge models today, which offer API-only access, NVIDIA has chosen to release Nemotron 3 Super's weights, datasets, and training scripts under a permissive license, allowing developers to freely deploy and customize the model on workstations, in data centers, or in the cloud.
NVIDIA simultaneously released full training and evaluation pipelines covering pre-training to alignment, along with over 10 trillion tokens of pre-training and post-training datasets, 21 reinforcement learning environments, and evaluation protocols. During pre-training, the model was trained natively at NVFP4 precision on 25 trillion tokens, learning accuracy under 4-bit floating-point constraints from the first gradient update—rather than being quantized post-training.
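"Learning accuracy under 4-bit floating-point constraints from the first gradient update" is the quantization-aware-training pattern: the forward pass uses weights snapped to the low-precision grid. The toy below fake-quantizes to an e2m1-style 4-bit grid; the level set and per-tensor max scaling are simplifying assumptions and not the real NVFP4 recipe, which uses block-wise scaling.

```python
import numpy as np

def fake_quantize_fp4(w: np.ndarray) -> np.ndarray:
    """Snap weights to a tiny FP4-like grid (1 sign, 2 exponent, 1 mantissa bit).

    Illustrative only: per-tensor scaling to the max value, then round each
    weight to the nearest representable level. A real trainer would pair this
    with a straight-through estimator so gradients flow past the rounding.
    """
    # Representable magnitudes of a toy e2m1 format.
    levels = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
    scale = np.abs(w).max() / levels[-1]
    if scale == 0:
        scale = 1.0
    mag = np.abs(w) / scale
    idx = np.abs(mag[..., None] - levels).argmin(axis=-1)  # nearest level
    return np.sign(w) * levels[idx] * scale

w = np.array([0.7, -0.2, 1.3, -0.05])
print(fake_quantize_fp4(w))
```

Training against the quantized forward pass from the start is what lets the released checkpoint run natively at 4-bit precision instead of losing accuracy to post-hoc quantization.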
On the ecosystem side, NVIDIA has partnered with major cloud providers and hardware vendors including Google Cloud Vertex AI, Oracle Cloud Infrastructure, Dell Technologies, and HPE. Integrations with Amazon AWS Bedrock and Microsoft Azure are also in progress. Software development agent firms such as CodeRabbit, Factory, and Greptile, as well as life science institutions Edison Scientific and Lila Sciences, have announced plans to integrate the model into their agent workflows.
## “Super + Nano” Combined Deployment
NVIDIA’s blog also outlined the collaborative deployment logic for the Nemotron 3 series. The Nano version of the Nemotron 3 model, launched last December, is suited for targeted single-step tasks in agent workflows, while Nemotron 3 Super is built for complex multi-step tasks requiring deep planning and reasoning.
In software development, for example, NVIDIA suggests: simple merge requests can be handled by Nano, complex coding tasks requiring deep codebase understanding by Super, and expert-level tasks can further call third-party proprietary models. This layered architecture helps enterprises strike the optimal balance between cost and capability.
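The layered Nano/Super/third-party routing NVIDIA suggests amounts to a cheapest-adequate-tier dispatcher. A minimal sketch, in which the difficulty scoring, thresholds, and model identifiers are all illustrative assumptions rather than NVIDIA's actual routing logic:

```python
# Assumed tier thresholds and model names -- illustrative only.
TIERS = [
    (2, "nemotron-3-nano"),              # targeted single-step tasks
    (7, "nemotron-3-super"),             # multi-step planning and reasoning
    (10, "third-party-frontier-model"),  # expert-level escalation
]

def pick_model(difficulty: int) -> str:
    """Route a task (difficulty scored 1-10) to the cheapest adequate tier."""
    for ceiling, model in TIERS:
        if difficulty <= ceiling:
            return model
    raise ValueError("difficulty must be between 1 and 10")

print(pick_model(1))   # simple merge request -> nemotron-3-nano
print(pick_model(5))   # deep codebase change -> nemotron-3-super
print(pick_model(10))  # expert-level task    -> third-party escalation
```

The design choice is the usual cost/capability trade: every task falls through to the cheapest model whose ceiling covers it, and only the hardest ones pay for a proprietary frontier model.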
For specific use cases:
- Software development agents can load entire codebases into context in one go for end-to-end code generation and debugging.
- Financial analysis can load thousands of pages of reports into memory, eliminating repeated reasoning across long conversations.
- Autonomous security orchestration in cybersecurity benefits from high-precision tool calls, avoiding execution errors in high-risk environments.
## Extending the Hardware Moat to the Model Layer
Behind NVIDIA’s open-model strategy lies clear commercial logic. Previously, NVIDIA built its AI dominance primarily by selling GPUs to model providers such as OpenAI and Google. Now, if Nemotron becomes the dominant base model for enterprise agent AI, the GPU infrastructure needed to run it at scale will still rely on NVIDIA—opening up the model layer while locking in demand at the hardware layer.
Nemotron 3 Super is already packaged and delivered via NVIDIA NIM microservices, supporting flexible deployment from on-premises to cloud.
Whether performance holds up under production-grade workloads and how enterprise customers trade off open flexibility against the capabilities of competitors’ proprietary models will be key variables in measuring the success of this strategy.
---
### Risk Warning and Disclaimer
Markets carry risk; invest with caution. This article does not constitute personal investment advice and does not take into account the specific investment objectives, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article suit their particular circumstances. Any investment decision made on the basis of this article is the sole responsibility of the user.
Contact: Sarah
Phone: +1 6269975768
Email: xttrader777@gmail.com
Address: 250 Consumers Rd, Toronto, ON M2J 4V6, Canada