Monday, 15 March 2021

AMD’s Milan Brings Zen 3 to Epyc, With Mostly Positive Results

AMD has released its Zen 3 “Milan” refresh for servers, bringing its latest architecture to the last market segment that lacked it. AMD has been working to gain share in this segment since launching first-generation Epyc nearly four years ago. Milan brings a suite of changes that should help that effort, though power consumption is a weak spot.

AMD isn’t boosting core counts with this generation of products. The Zen 3 CPUs AMD is launching today take over from their predecessors at equivalent core counts. They are also generally more expensive than the AMD chips they replace, which is in line with how AMD positioned its Zen 3 Ryzen 5000 CPUs. AMD’s new Milan CPU model numbers end in 3 (7763, 7643, 7313, and so on), while previous-generation Epyc CPUs end in 2.

Here’s a quick refresher on the CPU changes AMD made from Zen 2 to Zen 3:

Zen 3 has a larger L1 branch target buffer (BTB) and multiple latency-reducing improvements throughout the core, with a tweaked design compared with Zen 2 that’s intended to improve bandwidth. The CPU can perform three loads and two stores per cycle, compared with Zen 2’s 2L+1S design. Zen 3 / Milan also keeps the full 256-bit SIMD datapaths that arrived with Zen 2; first-generation Zen used 128-bit units and split 256-bit ops into two halves.

Zen 3 also includes a larger core complex, with eight cores and a unified 32MB of L3 per chiplet as opposed to the 2×4 configuration of previous chips. These changes can be particularly meaningful in servers, given the need to control intra-core bandwidth and the impact of large caches on overall performance.

This image shows the physical topology of the CPU with an emphasis on the I/O die. The I/O die may look the same (it’s still a 14nm-class chip built at GlobalFoundries), but it’s been redesigned to deliver new benefits and capabilities. According to Anandtech’s review of the CPU, some of these changes may have raised power consumption. AMD has increased the speed of its Infinity Fabric to 18Gbps, up from 16Gbps, and now supports 6-way memory channel interleaving to boost performance in configurations where not every DIMM slot is available or used.

Performance Improvements, Power Consumption

Anandtech’s compile time for the LLVM suite shows Milan holding roughly a 10 percent advantage over its competitors. This appears to be roughly in line with the CPU’s overall performance, with gains ranging from 6 percent to 25 percent depending on how threaded the test is. Single-threaded and moderately threaded tests show larger uplifts of up to 20-25 percent.

Image by Anandtech

Anandtech’s review notes that idle power consumption on Zen 3 chips is much higher than on their Zen 2 counterparts, rising from 65-72W to 100-110W. The increase is apparently tied to some of the changes AMD made to the I/O die, which substantially cut intra-core latency and improved overall performance at the cost of higher idle power.

Image by Anandtech

Anandtech compared power efficiency between Rome and Milan by limiting all four CPUs tested to a 225W TDP, then comparing both their relative performance and how much power was allocated to the CPU cores versus the entire package. The core figure counts only power consumed by the CPU cores and their L2 caches, while the package figure covers the whole socket.

Higher power consumption from its I/O die places AMD’s Milan at a disadvantage compared with Rome in certain settings. In compute-heavy benchmarks that don’t depend on memory bandwidth, Rome can actually outperform Milan. SPECint2017 and SPECfp2017 both show Milan-based systems delivering worse performance per watt than Rome (Zen 2). Zen 3 stretches its legs more on memory bandwidth-centric workloads, where its new cache architecture and higher IPC put it ahead of its predecessor, but SPEC still shows a decline in power efficiency.
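To see why a hungrier I/O die matters under a fixed power cap, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (the 225W limit and the idle power ranges) and makes one big simplifying assumption that is ours, not Anandtech’s: that idle package power approximates the fixed non-core overhead under load. Real uncore power varies with the workload, so treat the output as illustrative rather than measured.

```python
# Figures quoted in this article from Anandtech's review.
PACKAGE_CAP_W = 225
IDLE_W = {
    "Rome (Zen 2)": (65, 72),
    "Milan (Zen 3)": (100, 110),
}


def core_budget(cap_w, idle_range_w):
    """Return (min, max) watts left for the cores under the package cap.

    ASSUMPTION: idle package power stands in for the fixed non-core
    overhead under load. This is a simplification for illustration.
    """
    low_idle, high_idle = idle_range_w
    return cap_w - high_idle, cap_w - low_idle


budgets = {name: core_budget(PACKAGE_CAP_W, rng) for name, rng in IDLE_W.items()}

for name, (low, high) in budgets.items():
    print(f"{name}: roughly {low}-{high} W left for the cores")
```

Under that assumption, Milan’s cores would have a few dozen fewer watts to work with than Rome’s at the same 225W cap, which fits the pattern of compute-bound tests favoring the older chip.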

Conclusion

AMD’s server market share hasn’t grown as quickly as its desktop and laptop businesses have. But it now holds between 7 and 11 percent of the market, depending on whether you count the entire server space (including edge and 4P, where AMD doesn’t currently compete) or just the 1P and 2P market. AMD prefers the latter, which is also used by major industry analyst firms like IDC. When you consider that the company barely had a server business back in 2017, this represents a reasonable growth rate in an intrinsically conservative business.

Milan generally improves performance across the board, though the gains at peak load are significantly smaller than the lightly threaded uplift. Its higher idle power may be something AMD can address in the future, possibly through an I/O die shrink or other manufacturing changes.

Intel will have its own answer to Milan on the market later this year when Ice Lake-SP goes on sale. ICL-SP’s core counts are still uncertain: Intel has publicly confirmed 32 with its own benchmark results, 36 has been consistently rumored for years, and recent reports have suggested we might see 38-core and 40-core SKUs as well. AMD will retain a density advantage this generation with up to 64 cores per socket. Effective per-socket performance between the two companies will come down to power management and efficiency, as well as overall IPC.
