X-trader NEWS
Ming-Chi Kuo: Integrating into the NVIDIA ecosystem will increase LPU production by 10 times, which will have a significant impact on the PCB supply chain

By Zhao Ying
Source: Wall Street CN
At NVIDIA’s GTC conference, Jensen Huang officially integrated the Groq 3 LPU into the Rubin platform. Ming-Chi Kuo immediately released a supply chain survey: LPU shipments are expected to reach **4–5 million units** in 2026–2027, a **10x surge** from historical annual output. Rack density jumps from 64 to 256 units, driving a new cycle in the PCB supply chain — **WUS Printed Circuit** could emerge as the biggest winner.
NVIDIA’s inclusion of Groq LPU technology in the Rubin platform is sparking a profound transformation at the supply-chain level.
At NVIDIA’s GTC conference, CEO Jensen Huang announced the **Nvidia Groq 3 LPU** chip, formally adding it to the Vera Rubin platform as a core inference acceleration component for next‑generation AI data centers.
Well‑known Apple supply‑chain analyst Ming‑Chi Kuo quickly followed with a supply‑chain report. After NVIDIA’s investment in Groq, LPU shipment forecasts have been sharply revised upward. He estimates total LPU shipments will reach **4–5 million units** in 2026–2027, roughly **10 times** historical annual volumes.
Kuo believes this explosive growth is driven by two core forces:
1. Deep integration between the LPU and NVIDIA’s CUDA ecosystem drastically lowers development barriers.
2. Rapid expansion of ultra‑low‑latency inference scenarios such as AI agents, real‑time consumer applications, and physical AI.
He also noted that large‑scale mass production of LPU/LPX racks will heavily impact the PCB supply chain, with **WUS Printed Circuit** poised to be a key beneficiary.
## Jensen Huang at GTC: LPU officially becomes the 7th pillar of the Rubin platform
In his GTC keynote, Huang detailed how NVIDIA has integrated IP from last year’s Groq acquisition into the Rubin platform.
As an inference‑accelerator chip, the **Nvidia Groq 3 LPU** becomes Rubin’s seventh core building block, joining:
- Rubin GPU
- Vera CPU
- NVLink 6 Switch
- ConnectX 9 SmartNIC
- Bluefield 4 DPU
- Spectrum‑X Switch
Architecturally, the Groq 3 LPU takes a distinct path from mainstream AI accelerators.
While most AI accelerators rely on HBM as working memory, each Groq 3 LPU features **500MB of on‑chip SRAM** — the same high‑speed memory used in CPU and GPU caches.
Although this is far smaller than the 288GB of HBM4 on Rubin GPUs, its bandwidth reaches **150 TB/s**, nearly seven times the GPU’s 22 TB/s HBM bandwidth.
For bandwidth‑sensitive AI decoding workloads, the Groq 3’s extreme bandwidth provides decisive advantages in inference deployments — especially for cutting‑edge AI models requiring high throughput, low latency, and highly interactive output.
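A rough roofline sketch shows why decoding is called “bandwidth‑sensitive”: in the bandwidth‑bound regime, single‑stream decode throughput is capped by memory bandwidth divided by the bytes of weights streamed per token. The 150 TB/s and 22 TB/s figures come from the article; the model size (70B parameters at 1 byte each after 8‑bit quantization) is a purely illustrative assumption.

```python
# Back-of-the-envelope roofline for bandwidth-bound decoding.
# Bandwidth figures are from the article; the model size is an
# illustrative assumption, not a claim about any shipping product.

def decode_tokens_per_s(mem_bw_tb_s: float, model_bytes: float) -> float:
    """Upper bound on single-stream decode rate when every generated
    token must stream all weights from memory once."""
    return mem_bw_tb_s * 1e12 / model_bytes

MODEL_BYTES = 70e9  # assumed: 70B parameters, 8-bit weights

hbm = decode_tokens_per_s(22, MODEL_BYTES)    # Rubin GPU HBM figure
sram = decode_tokens_per_s(150, MODEL_BYTES)  # Groq 3 LPU SRAM figure

print(f"HBM-bound ceiling : {hbm:,.0f} tokens/s")
print(f"SRAM-bound ceiling: {sram:,.0f} tokens/s")
print(f"ratio: {sram / hbm:.1f}x")
```

Real deployments batch requests and shard models across chips, so these ceilings are only directional; the point is that the bandwidth ratio, not compute, sets the gap for this workload.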
## Supply Chain Survey: 4–5 million LPUs expected in 2026–2027
According to Ming‑Chi Kuo’s latest supply chain research, LPU shipment forecasts have seen a material upgrade following NVIDIA’s stake in Groq.
He projects **4–5 million total LPU units** in 2026–2027:
- 30–40% in 2026
- 60–70% in 2027
This represents a roughly **10x jump** from historical annual production.
At the rack level, NVIDIA plans to quadruple LPU density **from 64 to 256 units per rack**, preserving ultra‑low latency during inference and decoding while accommodating the growing KV‑cache demands of long‑context inference.
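The KV‑cache pressure mentioned above can be made concrete with the standard transformer sizing formula. All model dimensions below are hypothetical, chosen only to show the order of magnitude relative to the 500MB of SRAM per LPU cited earlier.

```python
# KV-cache size for a transformer decoder: 2 (keys and values)
# * layers * kv_heads * head_dim * dtype bytes, per token of context.
# Model dimensions are illustrative assumptions only.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

# Assumed: 80 layers, 8 grouped KV heads of dim 128, fp16 values.
per_seq = kv_cache_bytes(80, 8, 128, 128_000)
print(f"{per_seq / 1e9:.1f} GB per 128k-token sequence")
```

Even under these modest assumptions, one long‑context sequence needs tens of gigabytes of KV cache, far beyond a single LPU’s on‑chip SRAM, which is one plausible reading of why density per rack has to rise.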
Kuo expects the new rack architecture to enter mass production between **Q4 2026 and Q1 2027**.
Rack shipments are projected to surge from **300–500 units in 2026** to **15,000–20,000 units in 2027**.
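The figures above can be cross‑checked against each other. The sketch below compares the 2027 unit volume implied by Kuo’s year split with the volume implied by rack shipments at the new 256‑unit density; all inputs are taken directly from the article.

```python
# Sanity-checking the forecast figures as reported in the article.
units_total = (4e6, 5e6)        # total LPUs, 2026-2027
share_2027 = (0.60, 0.70)       # share shipped in 2027
racks_2027 = (15_000, 20_000)   # rack shipments in 2027
density = 256                   # LPUs per rack (new architecture)

# Units implied by the 2027 share of the total forecast:
lo = units_total[0] * share_2027[0]
hi = units_total[1] * share_2027[1]
print(f"2027 units from share: {lo / 1e6:.1f}M - {hi / 1e6:.1f}M")

# Units implied by rack shipments at the new density:
rack_lo = racks_2027[0] * density
rack_hi = racks_2027[1] * density
print(f"2027 units from racks: {rack_lo / 1e6:.2f}M - {rack_hi / 1e6:.2f}M")
```

The rack‑implied range (3.84–5.12 million) sits somewhat above the share‑implied range (2.4–3.5 million); the article does not reconcile the two, so this is best read as an order‑of‑magnitude consistency check rather than an exact match, for instance if not all 2027 racks ship at the full 256‑unit density.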
## Ecosystem integration is key: three technical nodes determine adoption speed
Kuo emphasizes that the rapid growth in LPU demand is fundamentally driven by **deep integration with the NVIDIA ecosystem**.
Compatibility with CUDA drastically lowers the barrier for application development and deployment, allowing developers to use LPU compute without rebuilding existing workflows.
Meanwhile, fast‑growing ultra‑low‑latency inference use cases — including AI agents (e.g., coding assistants), real‑time consumer applications, and physical AI — further boost LPU demand.
He highlights three critical technical integration milestones to watch:
1. **Network architecture**: whether rack‑scale interconnection works smoothly via NVLink Fusion and RealScale.
2. **Developer interface**: whether Nvidia NIM allows workload deployment without distinguishing between GPU and LPU.
3. **Compiler**: whether TensorRT‑LLM supports the LPU’s “compile‑first” architecture.
The pace of these three integrations will directly determine how quickly and deeply LPUs scale commercially.
## PCB supply chain enters a new cycle: WUS Printed Circuit as core beneficiary
Kuo stresses that large‑scale production of LPU/LPX racks is highly significant for the PCB supply chain.
These racks represent the **first large‑scale commercial deployment of M9‑grade CCL (copper‑clad laminate)** materials, in which WUS Printed Circuit plays a critical role.
M9‑grade CCL demands extremely high manufacturing standards, involving breakthroughs in high‑layer‑count boards using quartz glass fabric.
If LPU/LPX racks ramp successfully, Kuo believes this will:
- Substantially contribute to WUS’s 2027 earnings
- Validate the company’s technological capabilities in high‑end manufacturing
- Potentially catalyze a new growth cycle for the entire PCB industry
### Risk Warning & Disclaimer
Markets are risky and investments require caution. This article does not constitute personalized investment advice and does not account for any user’s specific investment objectives, financial situation, or needs. Users should consider whether any views, opinions, or conclusions in this article suit their particular circumstances. Any investment decisions made based on this material are the sole responsibility of the user.