Wednesday, 11 November 2020

Apple’s New M1 SoC Looks Great, Is Not Faster Than 98 Percent of PC Laptops

Yesterday, Apple announced its new M1 SoC. It’s an impressive piece of work — I’ll share some thoughts on it here — but I want to get one thing out of the way upfront. Apple made a lot of claims about the M1’s performance relative to x86 that I want to discuss, but the whopper that we’re going to talk about first is this, taken directly from Apple’s own website:

“And in MacBook Air, M1 is faster than the chips in 98 percent of PC laptops sold in the past year.”

Apple footnotes the claim, so it’s only fair to reproduce the footnote:

Testing conducted by Apple in October 2020 using preproduction 13-inch MacBook Pro systems with Apple M1 chip and 16GB of RAM. Performance measured using select industry-standard benchmarks. PC configurations from publicly available sales data over the last 12 months. Performance tests are conducted using specific computer systems and reflect the approximate performance of MacBook Pro.

Which benchmarks? We don’t know. Which PCs, specifically, did Apple compare itself against? We don’t know. We do know that Chromebook sales absolutely boomed this past year, thanks to the surge in remote schooling, and that an awful lot of bottom-end systems have flooded into the PC market as a result.

But, honestly, the 98 percent claim doesn’t even make sense on its face. First of all, no matter how good Apple’s ARM core truly is, there is no way in physics that a fanless M1, with its four high-performance cores, is going to outperform an eight-core Intel Core i9. If Apple thought it could outperform an eight-core Core i9 with the M1 while offering all the additional advantages it claims the chip can offer, it would say so.

Second, we already know that the high-performance PC market is larger than 2 percent. According to IDC, 22.3M gaming notebooks will ship in 2020, along with 14.8M gaming desktops. This doesn’t include people who build their own systems; we’re only looking at OEM and boutique sales. IDC reports ~267M PCs shipped in 2019. If we assume the market grows by 5 percent this year thanks to COVID-19, that’s roughly 280M PCs, and gaming PCs (37.1M units) account for 13.2 percent of the market all on their own. That’s before we factor in workstation PCs and high-end commercial business PCs (Dell, Microsoft, and HP may not sell a lot of top-end systems in absolute terms, but they clearly sell enough to make them profitable). Even if we grant that the M1 is faster than some percentage of gaming notebooks and high-end PCs (and I’m willing to grant it), the math doesn’t work: with gaming PCs alone at 13.2 percent of the market, the M1 would have to beat roughly 85 percent of them for the 98 percent claim to hold.
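The market-share arithmetic is easy to check. The sketch below uses the IDC figures quoted in this paragraph plus the article’s own assumed 5 percent growth rate. It also works out how many gaming PCs the M1 would have to beat for the 98 percent claim to survive, under the simplifying (and Apple-friendly) assumption that every non-gaming PC is slower than the M1.

```python
# Back-of-the-envelope check of the market-share arithmetic.
# Shipment figures are the IDC numbers quoted in the text; the 5 percent
# growth rate is an assumption made in the article, not an IDC forecast.

SHIPMENTS_2019_M = 267.0        # ~267M PCs shipped in 2019 (IDC)
ASSUMED_GROWTH = 0.05           # assumed COVID-19 bump for 2020
GAMING_NOTEBOOKS_2020_M = 22.3  # IDC 2020 forecast, gaming notebooks
GAMING_DESKTOPS_2020_M = 14.8   # IDC 2020 forecast, gaming desktops
CLAIM_FASTER_SHARE = 0.02       # "faster than 98 percent" leaves 2 percent


def gaming_share(total_m: float) -> float:
    """Gaming notebooks plus gaming desktops as a fraction of all PCs."""
    return (GAMING_NOTEBOOKS_2020_M + GAMING_DESKTOPS_2020_M) / total_m


def required_win_rate(share: float, allowed_faster: float) -> float:
    """Fraction of gaming PCs the M1 must beat if only `allowed_faster`
    of the whole market may be faster. Simplification: assumes every
    non-gaming PC is slower than the M1, which flatters Apple."""
    return (share - allowed_faster) / share


total_2020_m = SHIPMENTS_2019_M * (1 + ASSUMED_GROWTH)
share = gaming_share(total_2020_m)
needed = required_win_rate(share, CLAIM_FASTER_SHARE)

print(f"Estimated 2020 shipments: {total_2020_m:.0f}M")
print(f"Gaming share of the market: {share:.1%}")
print(f"Share of gaming PCs the M1 must beat: {needed:.0%}")
```

Run as-is, it reports about 280M shipments, a 13.2 percent gaming share, and an 85 percent required win rate against gaming systems alone.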

Where’s the real cutoff for Apple’s performance? My guess is that it’s the Intel systems Apple hasn’t stopped shipping. After all, given the tremendous advantages Apple is claiming for this chip, it only makes sense that it would roll the M1 out to every segment where it can claim an advantage. Apple is still selling six-core Intel systems with a Radeon Pro 5300M, so for now, I’d treat that as the likely performance crossover point until we have independent benchmarks to compare. If this proves not to be true, the company will have some serious explaining to do to its irritated high-end customers.

This Stupidity Aside, Apple’s New M1 Looks Amazing

I’m echoing some of the same points Gordon Ung made in his latest article over at PCWorld, and I want to acknowledge that I’m not the only one voicing this particular take. Another thing I agree with him on: the M1 actually looks like a really significant leap in CPU performance. That’s part of why I’m annoyed at Apple for the way the company is marketing it.

The Apple M1 is an eight-core chip with four high-performance cores and four high-efficiency cores. Apple didn’t give us much in the way of performance data, but the company claims that the high-efficiency cores “deliver similar performance as the current-generation, dual-core MacBook Air at much lower power.”

That actually gives us a useful place to start when it comes to estimating performance. The CPU in question is the Intel Core i3-1000NG4 — a 2C/4T CPU with a 1.1GHz base clock and a 3.2GHz turbo. Matching the performance of a 2C/4T CPU with a 4C/4T CPU is impressive, but the real story here is the power consumption angle — and, of course, the unknown performance delta between the high-efficiency and high-performance cores.

Apple doesn’t reveal many details of its CPU cores, but Anandtech has run a number of micro-benchmarks against the A14 (the M1 uses the same “Firestorm” CPU core as the A14 SoC) to estimate its construction and capabilities. The diagram below is an informed “best guess” at what Apple’s CPU architecture actually looks like:

Image by Anandtech

Apple’s eight-wide decoder is wider than that of any other design in the industry. For comparison, AMD’s just-launched Zen 3 has a four-wide decoder. Zen 3 is a bit more complicated: it can also deliver up to eight fused instructions per cycle from its op cache, and it dispatches up to six ops per cycle. Even so, we’d expect Apple to dispatch more instructions per cycle, given the sheer width of its front end.
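To make the front-end comparison concrete, here is a toy bottleneck model: a core can’t sustain more ops per cycle than its narrowest front-end stage allows. The Zen 3 widths are the ones quoted above. Firestorm’s dispatch width isn’t given here, so the sketch uses a hypothetical placeholder equal to its decode width. Treat it as an illustration of the reasoning, not a performance model.

```python
# Toy bottleneck model of a CPU front end: peak throughput is capped by
# the narrowest stage. Real sustained IPC is far lower and depends on
# branch prediction, cache misses, dependencies, and more.
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontEnd:
    name: str
    decode_width: int     # instructions decoded per cycle
    op_cache_width: int   # ops supplied per cycle on an op-cache hit (0 = not modeled)
    dispatch_width: int   # ops handed to the back end per cycle

    def peak_ops_per_cycle(self, op_cache_hit: bool = False) -> int:
        """Upper bound on ops per cycle: min(supply, dispatch)."""
        if op_cache_hit and self.op_cache_width > 0:
            supply = self.op_cache_width
        else:
            supply = self.decode_width
        return min(supply, self.dispatch_width)


# Zen 3 widths as quoted in the text.
zen3 = FrontEnd("Zen 3", decode_width=4, op_cache_width=8, dispatch_width=6)

# Firestorm: only the 8-wide decode comes from the text. The dispatch
# width here is a HYPOTHETICAL placeholder that simply matches decode.
firestorm = FrontEnd("Firestorm (hypothetical dispatch)", decode_width=8,
                     op_cache_width=0, dispatch_width=8)

for core in (zen3, firestorm):
    print(core.name,
          "decode path:", core.peak_ops_per_cycle(op_cache_hit=False),
          "op-cache path:", core.peak_ops_per_cycle(op_cache_hit=True))
```

In this model Zen 3’s op cache lifts its ceiling from four to six ops per cycle, where the six-wide dispatch stage becomes the limit.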

The Firestorm core can track far more instructions in its Reorder Buffer (ROB) than any other design on the market, Anandtech reports, with room for ~630 instructions. The ROB holds every instruction the core has in flight, so a bigger ROB lets the core keep executing independent work out of order while older instructions (a cache miss, for example) wait to complete and retire in program order. Apple’s ~630 entries blow both AMD and Intel out of the water: Intel’s ROB holds 352 instructions, while AMD’s holds 256. Other current ARM designs are around 224.
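The ROB’s job is easier to see in code. Below is a deliberately simplified sketch (no register renaming, exceptions, or branch flushes): instructions are allocated in program order, may finish in any order, and retire only from the head. The `window_behind_stall` helper is a hypothetical illustration of why size matters: with one long-latency instruction stuck at the head, the ROB’s capacity caps how much younger work the core can keep in flight.

```python
# Minimal reorder buffer (ROB) sketch: instructions enter in program
# order, may finish out of order, and retire strictly in order.
from __future__ import annotations

from collections import deque


class ReorderBuffer:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: deque[int] = deque()   # instruction ids, oldest first
        self.done: set[int] = set()          # ids that finished executing

    def allocate(self, instr_id: int) -> bool:
        """Add an instruction; return False (front end stalls) if full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(instr_id)
        return True

    def complete(self, instr_id: int) -> None:
        """Mark an instruction as executed (can happen out of order)."""
        self.done.add(instr_id)

    def retire(self) -> list[int]:
        """Retire finished instructions from the head, in program order."""
        retired = []
        while self.entries and self.entries[0] in self.done:
            instr_id = self.entries.popleft()
            self.done.discard(instr_id)
            retired.append(instr_id)
        return retired


def window_behind_stall(capacity: int) -> int:
    """How many younger instructions fit behind one stuck at the head
    (e.g. a load waiting on DRAM) before the core has to stop fetching."""
    rob = ReorderBuffer(capacity)
    rob.allocate(0)          # oldest instruction never completes here
    next_id = 1
    while rob.allocate(next_id):
        next_id += 1
    return next_id - 1


# ROB sizes as quoted in the text (approximate).
for name, size in [("Firestorm", 630), ("Intel", 352), ("AMD Zen 3", 256), ("Other ARM", 224)]:
    print(f"{name}: {window_behind_stall(size)} instructions in flight behind a stall")
```

Note that finishing instructions 1 and 2 early retires nothing until instruction 0 completes; that in-order retirement is what keeps the core’s visible state consistent while execution runs ahead.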

If Apple had a design philosophy with the M1, it appears to have been “go big or go home.” The design is incredibly wide and it packs no fewer than seven execution ports for integer operations. The L1 instruction cache is — you guessed it — huge, at 192KB. AMD’s old Excavator CPUs used to field this much L1 instruction cache, but in their case, the L1i was split between two CPU cores.

As for performance, Anandtech also has a number of results that pit the A14 (not the M1 itself, but the iPhone’s A14) against Intel and AMD x86 CPUs, and the results are… well, they’re kind of incredible. The chart below only captures Intel versus Apple, but AMD is not in a dramatically different position.

Image by Anandtech

The top-level conclusion is that Apple’s rapid yearly improvements have left it with, as Anandtech says, “no choice” but to build its own silicon solutions.

Does this prove that the Apple M1 is faster than 98 percent of PCs, seeing as how Apple’s core is incredibly competitive with AMD and Intel solutions at a fraction of the power? No. The power envelope is irrelevant, since we’re comparing absolute performance, and SPEC is a useful but synthetic test that doesn’t translate directly into performance in any given real-world application. CPU power consumption and performance don’t scale linearly, either: Apple may decisively outperform Intel and AMD at 10W without holding that lead once x86 chips are given far more power to work with. The fact that we don’t see Mac Pros rolling out with Apple silicon today says something about where the company feels its strengths currently are. Don’t expect a fanless M1-equipped system to start outperforming a 5950X in well-threaded applications any time soon. Do expect Apple’s comparisons and highlights to focus specifically on low power consumption, where x86 is weakest versus ARM.

x86 Just Got Put on Notice

Apple’s “faster than 98 percent of laptops” claim is bull, but don’t let that fact blind you to the larger problem for AMD and Intel here. This curve should look familiar:

It looks rather like the one Nuvia is promising it can deliver over and above what x86 has achieved. Nuvia, of course, claims it will come in well above the graphed data point for the A13, but the point is made: Companies with non-x86 designs are claiming — and as of now, shipping — silicon that they argue is faster and more efficient than anything x86 can ship.

Companies like Nuvia and Apple can make this kind of claim because what they are offering is, in essence, a more efficient CPU. CPU power consumption is not linear: it rises faster than clock speed does. The best way for a CPU to improve its absolute efficiency is to boost IPC enough that it can run at a lower clock while still delivering a performance improvement. AMD has rapidly increased the IPC of its Zen-derived CPUs, but it has also leaned on higher clock speeds to deliver its performance. AMD is not in the same position as Intel, but what you see above is a 5W CPU performing within a whisker of a 5950X. That’s not a great position for either x86 manufacturer to be in.
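The efficiency argument can be sketched with the textbook CMOS dynamic-power relation, P = C·V²·f. The model below assumes, purely for illustration, that voltage rises linearly with frequency (so power grows roughly with the cube of clock speed) and compares two hypothetical cores that deliver the same performance, defined as IPC × clock. The IPC, capacitance, and volts-per-GHz values are invented; the point is the shape of the trade-off, not real Apple, AMD, or Intel numbers.

```python
# Toy model of why a wide, lower-clocked core can be more efficient.
# Assumptions (illustrative, not measured): dynamic power P = C * V^2 * f,
# and voltage scales linearly with frequency, so P grows roughly with f^3.
from dataclasses import dataclass

VOLTS_PER_GHZ = 0.25  # hypothetical linear V/f curve: 5 GHz -> 1.25 V


def dynamic_power(c_eff: float, freq_ghz: float) -> float:
    """CMOS dynamic power in arbitrary units, with V = k * f."""
    voltage = VOLTS_PER_GHZ * freq_ghz
    return c_eff * voltage ** 2 * freq_ghz


@dataclass(frozen=True)
class Core:
    name: str
    ipc: float    # instructions per clock (relative)
    c_eff: float  # effective switched capacitance (relative); wider = bigger

    def clock_for(self, perf: float) -> float:
        """Clock (GHz) needed to hit a performance target, perf = IPC * f."""
        return perf / self.ipc

    def power_for(self, perf: float) -> float:
        """Dynamic power needed to hit the performance target."""
        return dynamic_power(self.c_eff, self.clock_for(perf))


# Hypothetical cores: the wide design has 60% more IPC but 50% more capacitance.
narrow = Core("narrow, high clock", ipc=1.0, c_eff=1.0)
wide = Core("wide, low clock", ipc=1.6, c_eff=1.5)

TARGET = 5.0  # same performance for both (IPC x GHz)
for core in (narrow, wide):
    print(f"{core.name}: {core.clock_for(TARGET):.3f} GHz, "
          f"power {core.power_for(TARGET):.2f}")

ratio = wide.power_for(TARGET) / narrow.power_for(TARGET)
print(f"wide/narrow power at equal performance: {ratio:.2f}")
```

Under these assumptions, the wide core runs at 3.125 GHz and needs a bit over a third of the narrow core’s power for the same performance, even after paying a 50 percent capacitance penalty for its extra width. Doubling either core’s clock, by contrast, multiplies its power by eight.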

We won’t know what the true competitive situation is until we’re able to compare benchmarks, but I don’t expect to see Apple losing any of these fights when evaluated core-to-core and clock-to-clock. With both Intel and AMD having recently launched new architectures, it’ll be 2021 before we find out if either company can answer this challenge.

As of right now, we can assume that x86 still wins matchups against the M1 thanks to superior clocks and core counts, but that advantage will last only as long as it takes Apple to scale its SoC design upward. I wouldn’t count on much of a window.
