San Jose is buzzing this morning. NVIDIA’s GPU Technology Conference kicked off today, March 16, and runs through March 19 across the McEnery Convention Center and nearby venues, with thousands of developers, researchers, and executives in town alongside a global virtual audience. This is the event where roadmaps are revealed, ecosystems align, and the market often reacts in real time. At its core is CEO Jensen Huang’s keynote, set for 11 a.m. PT at the SAP Center, preceded by a live pregame show starting at 8 a.m. featuring voices like Sarah Guo of Conviction, Gavin Baker of Atreides Management, and Alfred Lin of Sequoia Capital dissecting the year ahead in AI infrastructure.
If you’re not in the Bay Area, catching it is straightforward: the full keynote streams free on NVIDIA’s site at nvidia.com/gtc/keynote, with the pregame available there too, and YouTube will also carry the stream in most regions. Just load the page early; demand spikes right at the top of the hour, though the stream has held up reliably in past years. The two-hour address will blend the usual big-picture vision with concrete product signals, and this time the spotlight sits squarely on the next phase of AI economics.
The narrative has shifted noticeably since last year’s event. Training frontier models on ever-larger GPU clusters drove NVIDIA’s explosive growth through the H100, H200, and Blackwell generations. Now the bottleneck, and the bigger opportunity, lies in inference: running those models at scale for chat, agents, search, robotics, and enterprise applications where latency, cost per token, and sustained throughput actually matter to customers. Hyperscalers and large enterprises are no longer just buying accelerators for one-off training runs; they’re building persistent, gigawatt-scale inference capacity. That change in focus explains why so much attention surrounds NVIDIA’s December move with Groq.
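To make “cost per token” concrete, a quick back-of-envelope calculation helps. The sketch below uses made-up inputs (the hourly accelerator cost and sustained throughput are illustrative assumptions, not vendor figures), but it shows why throughput per dollar, rather than peak training FLOPS, drives inference purchasing decisions.

```python
# Back-of-envelope inference economics. All inputs are illustrative
# assumptions, not published specs or prices.
GPU_HOURLY_COST = 3.00     # assumed $/hour to rent or amortize one accelerator
TOKENS_PER_SECOND = 1_500  # assumed sustained decode throughput per accelerator

cost_per_second = GPU_HOURLY_COST / 3600
cost_per_million_tokens = cost_per_second / TOKENS_PER_SECOND * 1_000_000

print(f"${cost_per_million_tokens:.3f} per million output tokens")
# With these assumed numbers: ~$0.556 per million tokens. Doubling sustained
# throughput halves the figure -- which is why inference efficiency, not raw
# compute, anchors the buying decisions described above.
```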
The deal, valued at roughly $20 billion in cash for assets and technology, wasn’t a full company takeover. NVIDIA secured a non-exclusive perpetual license to Groq’s LPU inference architecture, optimized for fast, low-cost decode phases, while bringing on founder Jonathan Ross (a Google TPU veteran), president Sunny Madra, and a significant portion of the engineering team. Groq itself continues independently, but the transaction effectively folded its specialized inference IP into NVIDIA’s broader stack. Analysts have described it as the piece that completes NVIDIA’s transition from GPU-centric training leadership to full-spectrum infrastructure provider. The first tangible outputs from that integration are widely expected on stage today, potentially in the form of a new inference-focused platform, hybrid server designs, or disaggregated accelerators that pair Groq-derived decode hardware with NVIDIA’s existing strengths in prefill and networking.
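A bit of background makes the logic of that pairing clearer. Serving a large language model splits into two phases: a compute-bound prefill pass that digests the whole prompt at once, and a memory-bandwidth-bound decode loop that emits one token at a time while rereading a growing key-value cache. The toy sketch below illustrates the split; the class names and routing logic are hypothetical stand-ins for the kind of disaggregated designs anticipated here, not a description of NVIDIA’s or Groq’s actual systems.

```python
# Toy sketch of disaggregated prefill/decode serving. The two "pools" stand in
# for separate hardware tiers (e.g., GPUs for prefill, a decode-optimized
# accelerator for generation); all names and logic are illustrative.

class PrefillPool:
    def run(self, prompt_tokens):
        # Compute-bound: one large parallel pass builds the KV cache.
        return list(prompt_tokens)  # toy "KV cache"

class DecodePool:
    eos_token = 0
    def step(self, token, kv_cache):
        # Memory-bandwidth-bound: each step rereads the whole cache to emit
        # a single token. This sequential loop dominates serving cost.
        kv_cache.append(token)
        next_token = (token + 1) % 5  # toy next-token rule
        return next_token, kv_cache

def serve(prompt_tokens, max_new_tokens, prefill, decode):
    kv_cache = prefill.run(prompt_tokens)  # phase 1: prefill
    token, output = prompt_tokens[-1], []
    for _ in range(max_new_tokens):        # phase 2: sequential decode
        token, kv_cache = decode.step(token, kv_cache)
        output.append(token)
        if token == decode.eos_token:
            break
    return output

print(serve([1, 2, 3], 8, PrefillPool(), DecodePool()))  # -> [4, 0]
```

Because the decode loop is sequential and cache-heavy, it rewards exactly the kind of deterministic, high-bandwidth silicon Groq built, while prefill plays to existing GPU strengths.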
That development arrives alongside deeper details on Vera Rubin, the post-Blackwell architecture already teased in recent quarters. Rubin brings HBM4 memory, a generational leap in transistor count, and optimizations tailored to the mixture-of-experts models and long-context workloads that now dominate real-world use. Early signals point to 3.3x or greater inference performance gains in flagship configurations like the VR200 NVL72, along with variants such as CPX for prefill-heavy tasks. Expect Huang to outline availability timelines (volume likely ramping in late 2026 or 2027), ecosystem readiness, and how the platform fits into the emerging category of “AI factories” that treat inference as production infrastructure rather than occasional compute.
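The mixture-of-experts angle is worth unpacking, since it explains the memory emphasis. In an MoE layer, a router activates only a few experts per token, so total parameters (and memory footprint) grow far faster than per-token compute, shifting the bottleneck toward capacity and bandwidth, which is where HBM4 and rack-scale NVL configurations come in. Here is a minimal routing sketch; the dimensions and weights are random placeholders.

```python
import numpy as np

# Minimal mixture-of-experts routing sketch; all dimensions are made up.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 16, 2

x = rng.normal(size=d_model)                              # one token's hidden state
router = rng.normal(size=(d_model, n_experts))            # router weights
experts = rng.normal(size=(n_experts, d_model, d_model))  # expert weights

logits = x @ router
chosen = np.argsort(logits)[-top_k:]           # only top_k experts fire
weights = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()

# Per-token compute touches 2 of 16 experts, but all 16 must sit in memory --
# the capacity/bandwidth trade-off Rubin-class hardware is pitched at.
y = sum(w * (x @ experts[e]) for w, e in zip(weights, chosen))
print(y.shape)  # (8,)
```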
The broader context here is telling. NVIDIA’s market position remains formidable, with a valuation hovering near the $4 trillion mark, but the economics of AI deployment are pressuring margins and opening doors for challengers. Custom silicon from the hyperscalers, AMD’s competing accelerators, and the remaining independent inference specialists are all gaining traction in niches where general-purpose GPUs face efficiency trade-offs. By integrating Groq’s strengths and accelerating Rubin, NVIDIA aims to close those gaps while reinforcing the CUDA software moat that keeps developers locked in.
Sessions across the week will expand on agentic AI, physical AI for robotics, sovereign AI builds, and the full five-layer stack (energy, chips, systems, models, and applications) that Huang has increasingly emphasized. The message is consistent: AI is no longer an experiment; it’s infrastructure every company and nation will need to build or access. GTC has always served as the annual checkpoint for that buildout, but 2026 feels particularly pivotal because the money and attention are moving downstream from training to deployment.
Whether today’s announcements deliver the expected inference breakthroughs or simply set the table for the next 12 months, the event underscores a larger truth. The companies that master efficient, cost-effective inference at scale will capture the real value in the AI era. NVIDIA, fresh off its biggest deal to date and rolling out its next flagship architecture, intends to stay at the center of that transition. The next few hours on stage will show how far along that path it has come.