AMD Challenges NVIDIA with MI350 Series, ROCm 7, and Free Developer Cloud
There was a time when AMD was the underdog, not just in CPUs but across the board. Here we are in 2025, though, and the company’s annual Advancing AI event makes one thing clear: AMD is no longer playing catch-up. It’s building a future that covers the entire AI stack, from the silicon to the rack and, increasingly, to the software running on top of it.
At this year’s showcase, AMD rolled out a wide suite of announcements that together form its most comprehensive AI strategy yet. The launch of the Instinct MI350 Series, the ROCm 7 software stack, and a refreshed AMD Developer Cloud, all backed by a new 2030 energy-efficiency goal, signals a company that wants to go toe-to-toe with NVIDIA in the data centre. And if you’re looking at the competitive landscape right now, AMD isn’t just catching up; it’s starting to define its own playbook.
CDNA 4 and AMD Instinct MI350: Leapfrogging the MI300
The AMD Instinct MI350 Series is based on the new CDNA 4 architecture and represents a key pivot point in AMD’s AI ambitions. The MI350X and MI355X GPUs are designed not just for training massive models, but also for inference, a critical aspect of AI deployment that has historically favoured competitors.
CDNA 4 brings a fresh compute layout with up to 1024 matrix cores, native support for ultra-low precision data types like FP4 and FP6, and 288 GB of HBM3E memory delivering 8 TB/s of bandwidth. AMD claims a 4x uplift in inference performance and up to 35x increase in throughput compared to the MI300X. That’s a bold claim, but it comes backed by real-world deployments at Oracle Cloud Infrastructure, Meta, and Microsoft Azure.
Also notable is the flexibility in deployment. The MI355X is built for high-density, liquid-cooled environments and slots into AMD’s own rack systems, while the MI350X is air-cooled and can be dropped into existing infrastructure more easily. With 64 GPUs per air-cooled rack, and up to 128 in a liquid-cooled setup, scale is no longer just theoretical.
The MI350 also supports partitioning via multi-instance GPU modes (NPS1 and NPS2), which means you can run multiple Llama 3.1 70B-parameter models on a single accelerator. For enterprises fine-tuning multiple models or running a fleet of chatbots, that’s a serious value proposition.
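A rough back-of-envelope check makes the value proposition concrete. The quantisation choice below (FP8, i.e. one byte per parameter) and the two-way split are illustrative assumptions, not AMD-published figures:

```python
# Back-of-envelope check: can one MI350-class GPU host two copies of a
# 70B-parameter model when partitioned? FP8 quantisation (1 byte per
# parameter) is an illustrative assumption, not an AMD-published figure.
PARAMS_BILLIONS = 70
BYTES_PER_PARAM = 1          # FP8: ~1 byte per parameter (assumption)
HBM_GB = 288                 # MI350 Series HBM3E capacity per GPU
PARTITIONS = 2               # e.g. an NPS2-style split into two instances

weights_gb = PARAMS_BILLIONS * BYTES_PER_PARAM   # ~70 GB of weights
per_partition_gb = HBM_GB / PARTITIONS           # 144 GB per partition
fits = per_partition_gb >= weights_gb            # one model per partition?
headroom_gb = per_partition_gb - weights_gb      # left for KV cache etc.

print(fits, headroom_gb)     # True, 74 GB to spare per partition
```

On these assumptions each half of the card holds a full 70B model with roughly 74 GB left over for KV cache and activations, which is why running “a fleet of chatbots” on one accelerator is plausible.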
ROCm 7: Up to 3.5x performance uplift
All the silicon in the world means little without a software stack that developers actually want to use. AMD knows this, and ROCm 7 is its answer. It’s a massive upgrade: AMD says ROCm 7 delivers up to a 3.5x uplift in inference performance and a 3.1x uplift in training over its predecessor. Key features include native support for distributed inference engines like vLLM and SGLang, prefill disaggregation, support for mixture-of-experts models, and a big leap in datatype flexibility.
But perhaps the bigger story is ROCm’s new cadence. It now updates every two weeks—a tacit acknowledgement that staying developer-relevant means moving at the pace of the open-source AI world. And in a sign that AMD is finally broadening its ecosystem, ROCm now works with Windows and Radeon hardware, and even supports development inside WSL. That opens the door to a whole new class of hobbyists, researchers, and smaller dev teams who don’t have data centre access but want to build or fine-tune models locally.
For enterprises, the new ROCm Enterprise AI edition adds telemetry, resource quotas, and secure MLOps capabilities. It’s now ready for containerised deployments using Kubernetes or Slurm. In short, AMD is trying to make ROCm the CUDA for everyone else.
Developer Cloud: Instant access to AMD GPUs
If AMD wants to attract developers, it’s going to have to put some GPUs on the table. The refreshed AMD Developer Cloud does exactly that. Developers can now get free access to MI300X and MI350-based systems via a simple JupyterLab interface. Whether you want to run a single MI300X with 192 GB of memory or an eight-GPU setup with 1.5 TB of pooled memory, the entry barrier is dramatically lower.
This isn’t just a sandbox for demos. It’s full-fledged, production-level hardware with pre-configured environments for PyTorch, Hugging Face, Triton, and more. Open-source contributors, students, and startups now have a direct path to evaluate AMD hardware for real-world AI use cases.
Beyond Chips: Energy Efficiency as a Strategy
While NVIDIA continues to lead with raw compute, AMD is making a different kind of bet: efficiency. AMD has already surpassed its 30×25 goal (a 30x improvement in the energy efficiency of accelerated compute nodes over a 2020 baseline), achieving a 38x node-level energy efficiency uplift compared to 2020. That’s a 97% energy reduction for equivalent workloads. Now, it’s setting its sights on a new target: a 20x rack-scale energy efficiency gain by 2030.
The idea is that a typical AI model that currently needs over 275 racks to train could, by 2030, run on a single rack powered by AMD silicon. The implications? Dramatic savings in electricity, cooling, and carbon emissions. It’s a long-term play, but it speaks volumes about how AMD is positioning itself in a future where power budgets are a key constraint.
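The “97%” figure follows directly from the 38x uplift, and the 2030 target implies a similar calculation at rack scale. A quick sanity check of the arithmetic:

```python
# Sanity-check the efficiency claims: an Nx efficiency gain means the same
# work takes 1/N of the energy, i.e. a (1 - 1/N) reduction.
node_uplift = 38                    # achieved: 38x node-level efficiency vs 2020
energy_saved = 1 - 1 / node_uplift
print(f"{energy_saved:.1%}")        # 97.4%, matching the ~97% reduction quoted

rack_goal = 20                      # 2030 target: 20x rack-scale efficiency
rack_saved = 1 - 1 / rack_goal
print(f"{rack_saved:.0%}")          # hitting 20x would mean 95% less energy
```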
Looking Ahead: Helios and the MI400 Era
One final teaser from the event was the Helios AI Rack, set to launch in 2026. It will be built around the MI400 Series GPUs, using HBM4 (up to 432 GB per GPU) and delivering 19.6 TB/s of memory bandwidth per GPU. Paired with EPYC “Venice” CPUs and Pensando NICs, AMD says a single Helios rack will house 72 GPUs and offer a staggering 260 TB/s of internal interconnect bandwidth. That’s the kind of architecture you need for trillion-parameter models and next-gen recommender systems. Moreover, with the MI400 Series, AMD is aiming to hit a 10x compute performance improvement. With partners like Meta and OCI already committing to future deployments, and the MI355X powering clusters as large as 131,000+ GPUs, AMD is laying down serious AI infrastructure roots.
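Scaling the per-GPU specs up to a full rack puts those numbers in perspective. The totals below are simple arithmetic derived from the figures above, not AMD-announced aggregates (and note the 260 TB/s interconnect figure is a separate fabric metric, not memory bandwidth):

```python
# Aggregate Helios rack figures derived from the per-GPU specs in the text.
# These totals are plain multiplication, not AMD-published numbers.
GPUS_PER_RACK = 72
HBM4_PER_GPU_GB = 432         # up to 432 GB HBM4 per MI400 GPU
MEM_BW_PER_GPU_TBS = 19.6     # 19.6 TB/s memory bandwidth per GPU

pooled_hbm_tb = GPUS_PER_RACK * HBM4_PER_GPU_GB / 1000
aggregate_mem_bw_tbs = GPUS_PER_RACK * MEM_BW_PER_GPU_TBS

# ~31.1 TB of pooled HBM4 and ~1,411 TB/s of aggregate memory bandwidth
print(f"{pooled_hbm_tb:.1f} TB HBM4, {aggregate_mem_bw_tbs:.0f} TB/s")
```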
NVIDIA’s answer?
This year’s Advancing AI wasn’t about catching up. It was about maturity. AMD has laid out a roadmap that’s not only credible but also differentiated. It’s no longer just the alternative to NVIDIA; it’s building its own lane. With the MI350 Series covering inference and training, ROCm 7 finally becoming a robust developer platform, and rack-level efficiency goals that address the AI industry’s long-term sustainability, AMD is now in a position to serve not just today’s AI workloads, but tomorrow’s as well.
Mithun Mohandas
Mithun Mohandas is an Indian technology journalist with 14 years of experience covering consumer technology. He is currently employed at Digit in the capacity of a Managing Editor. Mithun has a background in Computer Engineering and was an active member of the IEEE during his college days. He has a penchant for digging deep into unravelling what makes a device tick. If there's a transistor in it, Mithun's probably going to rip it apart till he finds it. At Digit, he covers processors, graphics cards, storage media, displays and networking devices aside from anything developer related. As an avid PC gamer, he prefers RTS and FPS titles, and can be quite competitive in a race to the finish line. He only gets consoles for the exclusives. He can be seen playing Valorant, World of Tanks, HITMAN and the occasional Age of Empires or being the voice behind hundreds of Digit videos.