AMD’s 6700 XT launch last week gives us a bit of an unusual opportunity. Typically, generational GPU comparisons are a bit limited because core counts, texture units, and ROP configurations don’t align perfectly between the families. AMD’s Radeon RX 6700 XT is an exception to this rule, and it allows for a tighter comparison of RDNA versus RDNA2 than would otherwise be possible.
The Radeon 6700 XT and 5700 XT both feature 40 CUs, 160 texture mapping units, and 64 render outputs (2560:160:64). The 5700 XT has a wider memory pipeline, with a 256-bit memory bus and 14Gbps GDDR6, which works out to 448GB/s of main memory bandwidth. The 6700 XT, in contrast, has a 192-bit memory bus, 16Gbps GDDR6, 384GB/s of memory bandwidth, and a 96MB L3 cache. Today, we’ll be examining the 5700 XT against the 6700 XT at the same clock speed to measure the performance and efficiency impacts of the new architecture, smaller memory bus, and L3 cache.
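The bandwidth figures follow directly from bus width and per-pin data rate. Here is a minimal sketch of that arithmetic; `peak_bandwidth_gbs` is a hypothetical helper name used only for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: every pin on the bus moves data_rate_gbps
    gigabits per second, and 8 bits make a byte."""
    return bus_width_bits * data_rate_gbps / 8

# Memory configurations as described above.
cards = {
    "RX 5700 XT": (256, 14.0),  # 256-bit bus, 14Gbps GDDR6
    "RX 6700 XT": (192, 16.0),  # 192-bit bus, 16Gbps GDDR6
}

for name, (bus, rate) in cards.items():
    print(f"{name}: {peak_bandwidth_gbs(bus, rate):.0f} GB/s")
```

The same numbers show that the 6700 XT gives up roughly 14 percent of the 5700 XT’s raw bandwidth (384GB/s vs. 448GB/s), the deficit the 96MB L3 cache is meant to offset.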
According to AMD, switching to a huge L3 cache allowed them to shrink the memory bus while improving performance. There’ve been concerns from readers that this limited memory bus could prove a liability in gaming, given that a 192-bit memory bus on a $479 card is highly unusual.
Comparing both GPUs at the same clock allows us to look for any additional IPC (instructions per clock cycle) improvements between the 5700 XT and 6700 XT. RDNA is capable of issuing one instruction per clock cycle, compared with one instruction every four cycles for GCN. This allowed AMD to claim a 1.25x IPC improvement from GCN to RDNA, and while the company hasn’t claimed an equivalent increase from RDNA to RDNA2, we may see signs of low-level optimizations or just the overall impact of the L3 itself.
We’re comparing the performance of the 5700 XT and 6700 XT today, with both cards approximately locked to a 1.85GHz clock speed. We’ll also compare against the 6700 XT at full speed (SAM disabled) to see the card’s native performance and power consumption. A full review of this card, with Nvidia comparison data, will be arriving shortly.
Test Setup, Configuration, and a New Graphing Engine
We’re shifting to a new, more capable graphing engine here at ET. The graph below shows our results in 11 titles for the 5700 XT at 1.85GHz, the 6700 XT at 1.85GHz, and the 6700 XT at stock clocks. Clicking on any of the color buttons next to a given card will remove that card from the results, allowing you to focus on the others. Click on the button again to restore the data. Data is broken up by tabs, with one resolution per tab.
Game results were combined for the three Total War: Troy benchmark maps (Battle, Campaign, and Siege), leading to the “Combined” score. Similarly, results from Hitman 2’s Miami and Mumbai maps were averaged to produce a single result. Gaps between the cards in these maps were proportional and this averaging does not distort the overall comparison between the three cards in those titles. We’ve still used our classic graphs for a few results that didn’t map neatly into the specific result format used in this article, but the new engine is spiffier (a technical term), so we plan to use it for most projects going forward.
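Because the combined scores are simple averages, it is worth seeing why proportional gaps matter. This minimal sketch uses made-up frame rates (not our measured data) and a hypothetical `combined_score` helper: if one card is faster by the same factor on every map, the ratio of the averages equals that factor.

```python
from statistics import mean

def combined_score(per_map_fps):
    """Collapse several benchmark maps into one score with a simple arithmetic mean."""
    return mean(per_map_fps)

# Made-up frame rates for three maps, for illustration only (not measured data).
card_a = [60.0, 90.0, 45.0]
speedup = 1.10  # assume card B is exactly 10 percent faster on every map
card_b = [fps * speedup for fps in card_a]

ratio = combined_score(card_b) / combined_score(card_a)
print(f"ratio of combined scores: {ratio:.2f}")  # matches the per-map speedup
```

If the per-map gaps were not proportional, an arithmetic mean would weight the maps with the highest frame rates more heavily, which is why proportional gaps matter before combining results this way.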
This presentation method prevents us from giving per-game detail settings in the graph body, so we’ll cover those below:
Ashes of the Singularity: Escalation: Crazy Detail, DX12.
Assassin’s Creed: Origins: Ultra Detail, DX11.
Borderlands 3: Ultra Detail, DX12.
Deus Ex: Mankind Divided: Very High Detail, 4x MSAA, DX12.
Far Cry 5: Ultra Detail, High Detail Textures enabled, DX11.
Hitman 2 Combined: Ultra Detail, with performance measured by the “GPU” frame rate reported by the benchmarking tool. This maintains continuity with our older Hitman results, which were reported the same way. Miami and Mumbai results are combined. Tested in DX12.
Metro Exodus: Extreme Detail, with HairWorks and Advanced Physics disabled. Extreme Detail activates 2x SSAA, effectively rendering the game at 4K, 5K, and 8K when testing at 1080p, 1440p, and 4K, respectively. Tested in DX12.
Shadow of the Tomb Raider: High Detail, with SMAAT2x enabled, DX12.
Strange Brigade: Ultra Detail, Vulkan.
Total War: Troy Combined: Ultra Detail, DX12.
Total War: Warhammer II: Ultra Detail, Skaven benchmark, DX12.
Of the games we test, Deus Ex: Mankind Divided and Metro Exodus put by far the heaviest load on the GPUs. DXMD’s multisample antialiasing implementation carries a very heavy penalty, and at 4K, Exodus is effectively rendering in 8K due to its use of supersampled antialiasing.
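The supersampling load follows from simple arithmetic. The sketch below assumes, as the 1080p-to-4K mapping above implies, that Metro Exodus’ 2x SSAA doubles the resolution on each axis; `internal_resolution` is a hypothetical helper, not part of any game or driver API.

```python
from typing import Tuple

def internal_resolution(width: int, height: int, ssaa_per_axis: float = 2.0) -> Tuple[int, int]:
    """Resolution actually rendered when supersampling scales each axis by ssaa_per_axis."""
    return round(width * ssaa_per_axis), round(height * ssaa_per_axis)

outputs = {"1080p": (1920, 1080), "1440p": (2560, 1440), "4K": (3840, 2160)}

for label, (w, h) in outputs.items():
    iw, ih = internal_resolution(w, h)
    pixel_factor = (iw * ih) / (w * h)  # 2x per axis means 4x the pixels
    print(f"{label} output -> {iw}x{ih} rendered ({pixel_factor:.0f}x the pixels)")
```

At a 4K output, that works out to 7680x4320, or roughly 33 million pixels per frame.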
All games were tested using an AMD Ryzen 9 5900X on an MSI X570 Godlike equipped with 32GB of DDR4-3200 RAM. AMD’s Radeon 6700 XT launch driver was used to test both the 5700 XT and 6700 XT. ReBAR / SAM was disabled: AMD doesn’t support this feature on the 5700 XT, so we disabled it for our 6700 XT IPC comparison. ReBAR / SAM is also disabled for the 6700 XT “full clock” results, to keep the comparison apples-to-apples. We’ll have results with SAM enabled in our full 6700 XT review.
The 1.85GHz clock speed is approximate. In-game clocks stay near the minimum value, but not always. The 6700 XT was allowed to run between 1.85GHz and 1.95GHz and remained near 1.85GHz. The 5700 XT’s clock ranges from 1.75GHz to 1.95GHz, but it mostly stays between 1.8GHz and 1.9GHz. AMD’s 6700 XT requires a 100MHz GPU clock range, and the 5700 XT didn’t respond to our attempts to manually adjust its clock, so we tuned the 6700 XT to the 5700 XT’s default clock range.
These tests will show any high-resolution / high-detail bottleneck that appears on the 6700 XT versus the 5700 XT. If the 6700 XT’s L3 can’t compensate for the increased memory pressure, the 5700 XT should outperform it. The 6700 XT’s default base clock is 2325MHz, or ~1.26x higher than the 1.85GHz minimum value we defined. Low scaling between the 1.85GHz Radeon 6700 XT and the stock-clocked version may mean memory bandwidth pressure is limiting performance.
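One simple way to quantify “low scaling” is to divide the observed speedup by the clock increase. This sketch assumes that definition (other definitions exist) and uses the 2325MHz and 1.85GHz figures above; the helper names are hypothetical, and actual in-game clocks can differ from the base clock.

```python
BASE_CLOCK_MHZ = 2325    # 6700 XT default base clock, as cited above
LOCKED_CLOCK_MHZ = 1850  # approximate clock used for the clock-for-clock tests

def clock_ratio(high_mhz: float, low_mhz: float) -> float:
    """How much higher one clock is than another."""
    return high_mhz / low_mhz

def scaling_efficiency(observed_speedup: float, clk_ratio: float) -> float:
    """Share of a clock increase that turns into extra performance.

    1.0 means performance rose in lockstep with clock; a value well below 1.0
    hints that something other than core clock (memory bandwidth, for example)
    is limiting performance.
    """
    return observed_speedup / clk_ratio

ratio = clock_ratio(BASE_CLOCK_MHZ, LOCKED_CLOCK_MHZ)
print(f"clock increase: {ratio:.2f}x")  # about 1.26x
print(f"perfect scaling: {scaling_efficiency(ratio, ratio):.2f}")
```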
We’ll also check power efficiency between the cards because AMD claimed a 1.5x increase for RDNA2 over and above RDNA.
Performance Test Results & Analysis
Here’s the good news: There’s no sign that the L3 cache + 192-bit memory bus chokes the 6700 XT in realistic workloads. Only two games show evidence of memory pressure: Metro Exodus and Deus Ex: Mankind Divided. The benchmark settings we use in those two titles make them maximally difficult to render: Deus Ex: Mankind Divided’s MSAA implementation is very expensive, on all GPUs. Metro Exodus’ “Extreme” benchmark preset renders at 2xSSAA. The game may still be output at 4K, but internally the GPU is rendering 8K worth of pixels. Reducing either of these settings to a sane value would immediately resolve the problem.
There is no sign of a memory bottleneck in the 6700 XT versus the 5700 XT in any other game we tested. On the contrary, one game, Far Cry 5, shows sharply improved results at 4K on the 6700 XT compared with the 5700 XT. We confirmed these gains with repeated testing. Either the L3 cache or some other aspect of RDNA2 appears to improve FC5 performance at 4K in particular.
AMD has said the 6700 XT is intended as a 1440p GPU, and our test results suggest that 1440p is also where the gap between the 5700 XT and 6700 XT is greatest when the two are normalized clock-for-clock. Compared against itself, the gap between the 1.85GHz and stock-clocked 6700 XT was also widest at 1440p.
We’ve also included a few quick results in two benchmarks that use different scales than our tests above and were graphed on older templates: Final Fantasy XV and Neon Noir, the latter as a rare ray tracing game that can run on the 5700 XT.
Neither set of results shows dramatically different data. In FFXV, the gap is largest at 1440p, but at 4K the 5700 XT catches (though doesn’t pass) the 6700 XT when both run at 1.85GHz. In Neon Noir, Crytek’s ray tracing benchmark, the two GPUs hold a steady gap.
There’s only limited evidence for IPC gains between RDNA and RDNA2. Allowing for a 2-3 percent margin of error on the basis of GPU clock alone, and 2-3 percent for benchmark-to-benchmark variance, most of the gaps between RDNA and RDNA2 disappear. There are three exceptions at 1440p: Ashes of the Singularity (6700 XT is 15 percent faster), Assassin’s Creed: Origins (10 percent faster), and Total War: Warhammer II (8 percent faster).
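The noise allowance can be written out explicitly. This sketch assumes the two 3 percent margins (the upper end of each range above) compound multiplicatively in the worst case; adding them instead gives a 6 percent band and the same verdict for these three games. `exceeds_noise` is a hypothetical helper.

```python
CLOCK_MARGIN = 0.03  # upper end of the 2-3 percent allowance for clock variation
RUN_MARGIN = 0.03    # upper end of the 2-3 percent allowance for run-to-run variance

# Worst case: both errors push in the same direction and compound.
NOISE_BAND = (1 + CLOCK_MARGIN) * (1 + RUN_MARGIN)

def exceeds_noise(speedup: float, band: float = NOISE_BAND) -> bool:
    """True if a clock-for-clock speedup is larger than the combined error band."""
    return speedup > band

# 1440p clock-for-clock gaps called out above (6700 XT vs. 5700 XT, both at 1.85GHz)
gaps_1440p = {
    "Ashes of the Singularity": 1.15,
    "Assassin's Creed: Origins": 1.10,
    "Total War: Warhammer II": 1.08,
}

print(f"noise band: {NOISE_BAND:.4f}x")
for game, speedup in gaps_1440p.items():
    verdict = "exceeds noise" if exceeds_noise(speedup) else "within noise"
    print(f"{game}: {speedup:.2f}x, {verdict}")
```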
The aggregate data across all games shows the 6700 XT is 3 percent faster than the 5700 XT at 1080p, 6 percent faster at 1440p, and 5 percent faster in 4K when the two GPUs are compared clock-for-clock. When tested at full speed (with SAM disabled), the full-speed 6700 XT is 1.23x faster than the 5700 XT at 1080p, 1.3x faster at 1440p, and 1.28x faster at 4K.
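Because both sets of aggregate figures use the 5700 XT at 1.85GHz as a baseline, dividing one by the other gives a rough estimate of how the stock 6700 XT scales against itself at 1.85GHz. This is back-of-the-envelope arithmetic on rounded percentages, not a separate measurement.

```python
# Aggregate speedups over the 5700 XT at 1.85GHz, as reported above.
clock_for_clock = {"1080p": 1.03, "1440p": 1.06, "4K": 1.05}  # 6700 XT at 1.85GHz
full_speed = {"1080p": 1.23, "1440p": 1.30, "4K": 1.28}       # stock 6700 XT, SAM off

# Both sets share the same baseline, so dividing them cancels the 5700 XT out and
# leaves a rough estimate of stock 6700 XT vs. 6700 XT at 1.85GHz.
self_scaling = {res: full_speed[res] / clock_for_clock[res] for res in clock_for_clock}

for res, factor in self_scaling.items():
    print(f"{res}: ~{factor:.2f}x")
```

With these inputs, the estimate comes out largest at 1440p (roughly 1.23x), in line with the self-comparison described earlier, and each estimate lands a little below the ~1.26x base-clock increase.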
These clock-for-clock performance results don’t look great for RDNA2 versus RDNA, but we haven’t looked at power consumption yet. AMD claimed a 1.5x improvement in performance per watt for RDNA2 versus RDNA, and so far we don’t have much evidence of per-clock performance gains. We measured full-load power consumption during the third run of a three-loop Metro Exodus benchmark at 1080p in Extreme Detail.
This is all sorts of interesting. Clock for clock, RDNA2 is much more power-efficient than RDNA. The 5700 XT and 6700 XT perform virtually identically in Exodus at 1080p, and the 6700 XT is drawing nearly 100W less power to do it while fielding 12GB of RAM (up from 8GB) and a 16Gbps RAM clock (1.14x higher than the 14Gbps on the 5700 XT).
The 5700 XT draws 1.37x as much power as the 6700 XT when they’re measured at the same clock and approximate performance level. That’s an impressive achievement for an iterative new architecture without a new process node involved. Unfortunately, it all goes out the window when the clock turns up. At stock clock, the 6700 XT with SAM disabled is 1.21x faster than the 5700 XT, but it uses about 3 percent more power. Clearly, AMD has a fairly power-efficient chip at lower clocks, but it’s tapping 100 percent of the available clock headroom to compete more effectively.
RDNA2 unquestionably offers AMD better clock scaling than the company’s GPUs have previously enjoyed, but with a heavy impact on power consumption. AMD pays for a 1.24x performance improvement with a 1.41x increase in power consumption: 41 percent more power for 24 percent more performance, not far from a 2:1 ratio. That speaks to the importance of keeping efficiency high and clocks low in GPU architectures. Clock-for-clock, RDNA2 is capable of offering substantial power advantages over RDNA, but AMD has tuned the 6700 XT for performance, not power consumption. A hypothetical 6700 at a lower clock could offer substantially better power consumption, but might not compete effectively with down-market Nvidia cards.
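The efficiency figures above can be cross-checked with a few lines of arithmetic. This sketch treats the two cards as having equal performance at 1.85GHz (they perform “virtually identically” in this test) and uses a hypothetical `perf_per_watt_gain` helper; all inputs are the Metro Exodus numbers quoted above.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance per watt (frames per joule) of one card versus another."""
    return perf_ratio / power_ratio

# 6700 XT relative to the 5700 XT in the Metro Exodus power test.
# Assumption: equal performance at 1.85GHz ("virtually identical" results).
same_clock = perf_per_watt_gain(perf_ratio=1.0, power_ratio=1 / 1.37)  # 5700 XT draws 1.37x
stock_clock = perf_per_watt_gain(perf_ratio=1.21, power_ratio=1.03)    # 1.21x faster, ~3% more power

# Chaining the power figures: stock 6700 XT relative to the 6700 XT at 1.85GHz.
power_increase = 1.37 * 1.03

print(f"perf/W, clock-for-clock: {same_clock:.2f}x")
print(f"perf/W, stock clocks:    {stock_clock:.2f}x")
print(f"power, stock vs. 1.85GHz: {power_increase:.2f}x")
```

Chaining the 1.37x and 1.03x power figures reproduces the 1.41x power increase cited for the stock 6700 XT. In this single test, the perf-per-watt estimates come out to about 1.37x clock-for-clock and 1.17x at stock, versus AMD’s claimed 1.5x.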
When AMD launched RDNA back in 2019, we noted that the company’s efforts to transform its GPUs would take time, and that not nearly enough of it had passed for an equivalent, Ryzen-like transformation of the product family. Looking at RDNA2 versus RDNA, we definitely see the increased power efficiency AMD was chasing in evidence when the 5700 XT and 6700 XT are compared clock-for-clock. The smaller memory bus and large L3 cache do indeed appear to pay dividends. AMD is still tuning its GPUs aggressively for competitive purposes, but it found new efficiencies first with RDNA over GCN and then with RDNA2 over RDNA, and those efficiencies give it room to push performance further.
We’ll examine the competitive and efficiency situation vis-à-vis Nvidia later this week.