Monday, 10 February 2020

AMD’s Radeon Instinct MI100 Leaks, Hint at Massive, 8192-core GPU

Back in 2018, AMD launched the MI50 and MI60, Vega-based accelerators built on TSMC’s 7nm process node. Now, there are rumors of a massive new chip coming in that family, in a relatively svelte power envelope. “Arcturus” is a codename that’s been floating around for a while, having first been mentioned by an AMD staffer in late 2018, but it’s never been clear whether the GPU was based on Navi or Vega. Thus far, AMD has maintained a divide between its Vega products, which are built for the AI and HPC markets, and Navi, which is built for gaming.

This rumor suggests that Arcturus is a Vega-derived GPU, though it may use the same Vega refinements that are coming to the Ryzen Mobile 4000 series. I’ll get into why after we talk about the specs. The leak, by @Komachi_Ensaka, indicates a 32GB HBM card (the same maximum memory loadout as the current Instinct family), with up to 8192 GPU cores, a boost clock of 1.33GHz, and a base clock of 1GHz.

All of this, supposedly, in just a 200W TDP.
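To put the rumored specs in perspective, here’s a back-of-the-envelope peak FP32 calculation. It’s a sketch, not a benchmark: it assumes each core retires one fused multiply-add (two floating-point operations) per clock, and the MI60 comparison figures (4096 cores, 1.8GHz peak clock, 300W TDP) come from that card’s spec sheet rather than from the leak.

```python
# Back-of-the-envelope peak FP32 throughput for the rumored MI100 specs.
# Assumption: each GPU core ("stream processor") retires one fused
# multiply-add per clock, which counts as 2 floating-point operations.

FLOPS_PER_CORE_PER_CLOCK = 2  # one FMA = one multiply + one add


def peak_tflops(cores: int, clock_ghz: float) -> float:
    """Theoretical peak TFLOPS = cores * FLOPs per clock * clock (GHz) / 1000."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * clock_ghz / 1000


def gflops_per_watt(tflops: float, tdp_watts: float) -> float:
    """Peak efficiency in GFLOPS per watt of rated TDP."""
    return tflops * 1000 / tdp_watts


# Rumored MI100 figures from the leak: 8192 cores, 1.33GHz boost, 200W TDP.
mi100 = peak_tflops(8192, 1.33)
# Shipping MI60 for comparison (assumed: 4096 cores, 1.8GHz peak, 300W TDP).
mi60 = peak_tflops(4096, 1.8)

print(f"MI100 (rumored): {mi100:.1f} TFLOPS, {gflops_per_watt(mi100, 200):.0f} GFLOPS/W")
print(f"MI60:            {mi60:.1f} TFLOPS, {gflops_per_watt(mi60, 300):.0f} GFLOPS/W")
```

Taken at face value, the leak works out to roughly 21.8 TFLOPS and about 109 GFLOPS/W, versus about 14.7 TFLOPS and 49 GFLOPS/W for the MI60. That’s more than double the peak efficiency, which is why the 200W figure raises eyebrows.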

Is it possible to cram this kind of performance into a data center GPU? The answer might be yes. Let’s look at a few pieces of evidence.

First, we know AMD has gotten much better at clocking Vega. One of the surprising things about the Ryzen Mobile 4000 announcement was that the GPUs in these parts are clocked much higher than those in the old 12nm APUs: up to 1750MHz on the Ryzen 7 4800U. It’s possible that AMD’s latest 7nm Vega designs are more efficient than the initial ones, even though the company hasn’t technically moved to a new node.

Second, we know that binning can yield significant improvements. The Radeon R9 Nano was a 28nm card with better power efficiency than the 14nm Polaris cards AMD launched after it. AMD got the Nano’s efficiency gains over the regular Fury X by reducing its clock slightly and binning for the best chips. CPU and GPU power consumption doesn’t scale linearly with clock speed: typically the last few hundred MHz cost far more power per unit of performance gained. By falling back to an earlier sweet spot on the curve, AMD may be able to maximize the chip’s power efficiency.
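The nonlinearity is easy to see with a toy model. Dynamic CMOS power scales roughly as P = C·V²·f, and near the top of the voltage/frequency curve, voltage has to climb along with clock speed. Assuming, as a simplification, that voltage is proportional to frequency gives P ∝ f³. Real chips also burn leakage power and have flatter regions on their V/f curves, so treat these numbers as illustrative only.

```python
# Toy model of why the last few hundred MHz are expensive.
# Dynamic CMOS power scales roughly as P = C * V^2 * f. Near the top of the
# voltage/frequency curve, voltage must rise roughly in step with clock, so as
# a simplifying assumption we take V proportional to f, giving P ~ f^3.
# Static (leakage) power and the flat low end of the V/f curve are ignored.


def relative_power(clock_ratio: float) -> float:
    """Dynamic power relative to baseline when clock is scaled by clock_ratio."""
    return clock_ratio ** 3


for ratio in (1.0, 0.9, 0.8, 0.7):
    perf_lost = 1 - ratio                   # performance assumed to track clock
    power_saved = 1 - relative_power(ratio)
    print(f"clock x{ratio:.1f}: lose {perf_lost:.0%} performance, save {power_saved:.0%} power")
```

Under this model, giving up 10 percent of clock speed saves roughly 27 percent of dynamic power, which is the kind of trade described above.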

I mean, heck. I literally just published a review of the Ryzen Threadripper 3990X, a CPU that doubles the 3970X’s core count and keeps it running in the same 280W power envelope by… reducing the clock speed modestly. So clearly we’re not dealing with some crazy idea here. With that said, cutting power consumption by a full third (from the MI50/MI60’s 300W to 200W) while doubling the core count is about the maximum we’d expect AMD to achieve. That’s a very aggressive power improvement without a node shift.
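The same toy P ∝ f³ model can be pointed at the MI100 rumor itself. This sketch asks how far clocks would need to fall to double the core count while cutting power by a third. These assumptions are illustrative and do not come from the leak: power scales linearly with core count, leakage is ignored, and the baseline is the MI60 at 4096 cores, a 1.8GHz peak clock, and 300W.

```python
# How far must clocks fall to double core count on two-thirds the power?
# Illustrative assumptions (not from the leak): per-core power ~ f^3,
# total power scales linearly with core count, leakage is ignored, and the
# MI60 (4096 cores, 1.8GHz peak, 300W) is the baseline.

BASE_CORES, BASE_CLOCK_GHZ, BASE_TDP_W = 4096, 1.8, 300
NEW_CORES, NEW_TDP_W = 8192, 200


def clock_for_budget(core_ratio: float, power_ratio: float) -> float:
    """Clock scale factor r such that core_ratio * r**3 == power_ratio."""
    return (power_ratio / core_ratio) ** (1 / 3)


r = clock_for_budget(NEW_CORES / BASE_CORES, NEW_TDP_W / BASE_TDP_W)
print(f"clock scale {r:.3f} -> about {BASE_CLOCK_GHZ * r:.2f}GHz")
```

The model lands at roughly 1.25GHz, in the same neighborhood as the leaked 1.33GHz boost clock; the small remaining gap is the sort of thing better binning and refined Vega clocking would have to cover.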

It would be nice to know which customers AMD is selling these products to. Navi OpenCL support was still MIA on Linux as of late December, according to Phoronix, and AMD has to provide a path for CUDA software via efforts like ROCm. It’s not clear whether ROCm is actually being used in any projects; AMD is very quiet about this side of the business.

We’ve suspected Google had a lot to do with AMD’s decision to commercialize the MI50 and MI60 product families, but a chip like MI100 would clearly be developed with different goals in mind. Memory bandwidth is uncertain; the chip could stick with the same 1TB/s of memory bandwidth as the MI50 and MI60 or push up to 1.2TB/s with higher-speed HBM2.
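The 1TB/s and 1.2TB/s figures fall out of simple HBM2 arithmetic. This sketch assumes four HBM2 stacks with a 1024-bit interface each (a 4096-bit bus, as on the MI50 and MI60), running at 2.0Gbit/s per pin for current parts and 2.4Gbit/s per pin for faster HBM2; the leak itself says nothing about memory speed.

```python
# Where the 1TB/s and 1.2TB/s figures come from.
# Assumptions: four HBM2 stacks with a 1024-bit interface each (4096 bits
# total, as on the MI50/MI60); 2.0 Gbit/s per pin for current parts and
# 2.4 Gbit/s per pin for faster HBM2.

BUS_WIDTH_BITS = 4 * 1024


def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s = bus width (bits) * per-pin rate (Gbit/s) / 8."""
    return bus_width_bits * gbps_per_pin / 8


for pin_rate in (2.0, 2.4):
    print(f"{pin_rate} Gbit/s per pin -> {bandwidth_gb_s(BUS_WIDTH_BITS, pin_rate):.0f} GB/s")
```

At 2.0Gbit/s per pin the bus delivers 1024GB/s (the MI50/MI60’s 1TB/s); at 2.4Gbit/s it reaches about 1229GB/s, or roughly 1.2TB/s.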

It’s unlikely we’d ever see Arcturus come to the consumer market. AMD’s Big Navi is expected to answer the needs of those customers, while a card like this will be reserved for data centers or HPC.
