Thursday, 26 August 2021

Why Lying About Storage Products Is Bad: An IBM DeskStar Story

(Photo: Ken Sallot and Tom Gardner)
A number of storage companies including Adata, Crucial, and Western Digital have recently been caught shipping a superior SSD configuration to tech publications for positive reviews, only to swap these SKUs with lower-performing products once positive review coverage has been obtained. We’ve spoken out against this practice twice recently, but today we’re revisiting the problem from a different angle. Enough time has passed since the events we’re going to discuss that they may no longer be common knowledge, and although it feels absurd to begin a story with the above headline, that’s the most straightforward way to describe the topic.

When Hard Drives Ruled the Earth

In the late 1990s, IBM was one of the dominant players in hard drives, with 35 percent of the total market. By 2002, its market share had collapsed to less than 10 percent. It sold its hard drive business to Hitachi that year for a bit over $2B in a 70/30 ownership split.

IBM’s hard drive business shrank for several reasons. By 2001, PC sales had slumped following the dot-com boom and a Y2K-related upgrade spree. IBM was facing increased competition from multiple companies in the early 2000s, and its new CEO, Sam Palmisano, wanted to focus the company on high-margin business opportunities, not the consumer sector.

But there was one other reason for IBM’s precipitous decline: a sharp loss of user trust. A reputation carefully built up over decades was badly damaged in just two years, thanks to one drive family and one of the worst corporate responses in history.

Meet the IBM DeskStar 75GXP:

A failed IBM DeskStar hard drive after suffering the dreaded “click of death.” (Photo: James Petts/CC BY-SA 2.0)

When the 75GXP debuted, it looked like another slam-dunk for IBM. Initial reviews praised its ATA/100 support and overall performance. The 75GXP built on the well-received 60GXP family and was one of the most recommended consumer drives of 2000. My boss at the time bought six of them.

Then, the drives started dying.

It took months for anyone to notice something might be awry. Most 75GXPs didn’t die instantly and it wasn’t initially clear that early reports of failure represented anything besides a normal bathtub curve. IBM hard drives were popular, which meant any given tech forum was likely to have reports of IBM drive failures. The problem bubbled along quietly for a few months without attracting a lot of attention, but then people who had suffered a single 75GXP failure began reporting their second, third, and sometimes even fourth consecutive dead drive. This raised the collective antennae of the reviewer and enthusiast communities. One click of death is unlucky, two is unlikely, and three or more indicate a problem either elsewhere in the PC or server or a fundamental flaw in the product itself.
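To see why repeat failures set off alarms, here is a rough back-of-the-envelope sketch. It assumes drive failures are independent of one another and uses a purely hypothetical 3 percent failure rate per drive. Neither assumption comes from IBM or from any source cited in this story; the function name is likewise made up for illustration.

```python
def chance_of_consecutive_failures(per_drive_failure_rate: float, drives: int) -> float:
    """Probability that `drives` drives in a row all fail, assuming independence."""
    return per_drive_failure_rate ** drives


# HYPOTHETICAL failure rate, chosen only for illustration.
ASSUMED_RATE = 0.03

for count in (1, 2, 3, 4):
    p = chance_of_consecutive_failures(ASSUMED_RATE, count)
    print(f"{count} consecutive failure(s): about 1 in {round(1 / p):,}")
```

Under these assumptions, the odds shrink by the same factor with every additional dead drive, which is why a cluster of repeat failures points toward something systematic rather than bad luck.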

When reviewers began asking questions, IBM’s response checked every single box on the “How Not to Handle a Problem” flowchart. The company refused to acknowledge a problem with the 75GXP. It refused to give refunds. It refused to say whether the problem with the 75GXP extended to every drive in the family or only specific capacities. It refused to clarify if the then-new 120GXP drives or the older 60GXP drive families had an increased risk of failure as well. Some users were convinced the 60GXP also suffered higher-than-normal failures, though not as much as the 75GXP.

Months into the controversy, IBM published new guidance claiming the 120GXP was only good for 333 hours of power-on time per month, but the company refused to state whether this was due to unexplained problems in the wider 60GXP and 75GXP families. While IBM claimed its guidance was normal and in line with industry estimates, this came on top of months of stonewalling. The only thing IBM would say is that it would honor all warranty claims on the 75GXP family while the warranty was valid. It would not discuss any plans to extend warranties.
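For a sense of scale, the 333-hour figure can be converted into a duty cycle. This is a minimal sketch that assumes a 30-day month; the function names are illustrative and not taken from any IBM document.

```python
HOURS_PER_DAY = 24
ASSUMED_DAYS_PER_MONTH = 30  # assumption: a 30-day month


def duty_cycle(power_on_hours_per_month: float) -> float:
    """Fraction of the month the drive is powered on."""
    return power_on_hours_per_month / (ASSUMED_DAYS_PER_MONTH * HOURS_PER_DAY)


def hours_per_day(power_on_hours_per_month: float) -> float:
    """Average powered-on hours per day."""
    return power_on_hours_per_month / ASSUMED_DAYS_PER_MONTH


print(f"Duty cycle: {duty_cycle(333):.1%}")
print(f"Hours per day: {hours_per_day(333):.1f}")
```

On that assumption, 333 hours a month works out to a drive that is powered on a little less than half the time, or roughly 11 hours a day.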

After months of runaround, multiple websites and companies pulled their recommendations for all IBM hard drives, not just the 75GXP family.

Scott Wasson, Tech Report: “Since I recommended 75GXP drives in this very space before all of this started, I should probably say something more. In case you haven’t gotten it through your head yet, don’t buy IBM hard drives, especially the ATA variety (the GXP line). I’m not sure what’s more disturbing: the drives’ apparently extremely high failure rates or IBM’s lack of communication with customers and the press. Either way, though, IBM doesn’t deserve your business.” (emphasis original)

Me, writing for Sudhian: “I… withdraw any recommendation I’ve ever made to anyone regarding the purchase of an IBM DeskStar HDD. Given the myriad of issues surrounding the drives and the total lack of interest IBM’s shown in explaining/clarifying them, I feel there’s nothing else to do. Those of you looking for a hard drive that doesn’t seem to be aimed at the enterprise market one minute and the dust bucket the next would do well to consider hard drives from WD, Maxtor, or Seagate.”

Web host Pair.com: “During October, we proactively swapped out nearly seventy hard drives on our user servers, based on the unusually high failure rate of this model of drive in our recent experience… The drives in question are part of IBM’s Deskstar 75GXP line… We currently use and recommend hard drives by Maxtor and Western Digital.”

Reader reviews left on websites like CNET make it clear that the company’s customers were anything but happy. The 75GXP’s problems were not fatal to IBM’s business. One analyst I spoke to for this story thought it a relatively minor issue compared with the tightening competition in hard drives, the impact of IBM’s new CEO, and the long-term reduction in HDD profit margins. But it’s a failure nobody forgot at the time. Teardowns of failed 75GXPs showed that the magnetic platters had, in some cases, been scraped clean off. The drives were full of dust where no dust ought to be.

The magnetic media has been almost entirely scraped off the platter. If you check the top image, you can see where magnetic material was redistributed outwards, on the edge of the drive as well. (Photo: Ken Sallot and Tom Gardner)

The streak on the right side is where Ken swiped his finger through the platter dust created when the drive self-destructed. (Photo: Ken Sallot and Tom Gardner)

More recently, multiple Seagate hard drives, including the 7200.11 and ST3000DM001, have had high failure rates and suffered various firmware bugs, but the company didn’t suffer nearly as much blowback for it. The difference in how the two companies responded to the problem explains some of why. Internal IBM documents eventually showed that the company had knowingly sold drives with failure rates 10x higher than normal into the consumer market after its other OEM customers refused delivery due to the poor quality of the merchandise. Refusing to comment on the problem was a deliberate strategy. All this did was lead to further loss of trust.

The comparison between the 75GXP debacle and the behavior of Crucial, Adata, Western Digital, and any other company contemplating this kind of behavior is not exact. The 75GXP product family wasn’t slower than expected like these SSDs are; it outright failed. When I first wrote our coverage of the 75GXP and 120GXP issues, my boss pulled me aside and told me he thought I was wrong. He had purchased six 75GXP hard drives, after all, and all of them were fine. If there was a problem, he’d know about it.

Six to 12 months later (I don’t recall the exact timeline), he sent me a photo of multiple hard drives in a garbage can. Not only had every single one of his 75GXPs died, but the replacements IBM sent had died as well. He’d thrown the entire set of drives away. Nothing of the sort has been alleged about the SSD bait-and-switches that are going on right now, and I do not want to overdraw the analogy. What makes these two situations similar is not the degree of drive failure, but the breach of trust.

A Matter of Trust

Agreeing to review a manufacturer’s product is an extension of trust on all sides. The manufacturer providing the sample is trusting that the review will be of good quality, thorough, and objective. The reviewer is trusting the manufacturer to provide a sample that accurately reflects the performance, power consumption, and overall design of the final product. When readers arrive to read a review, they are trusting that the reviewer in question has actually tested the hardware and that any benchmarks published were fairly run.

When companies ship one product to reviewers and another to consumers, they break trust with both. Customers would collectively go nuclear if AMD, Intel, or Nvidia shipped GPUs or CPUs to reviewers that were 10-15 percent faster than the hardware they shipped to customers. SSDs like Crucial’s QLC-powered P2 are sometimes 50 – 75 percent slower than the TLC drives they replaced once the SLC cache is exhausted. This is not a slightly slower variant. This is a completely different product.
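"Percent slower" and "times faster" describe the same gap in two different ways, and it is easy to underestimate one when reading the other. The short sketch below converts between them; the function names are illustrative, and the only inputs are figures quoted in this article.

```python
def percent_slower_to_speed_ratio(percent_slower: float) -> float:
    """If drive B is N percent slower than drive A, return how many times faster A is."""
    return 1 / (1 - percent_slower / 100)


def speed_ratio_to_percent_slower(ratio: float) -> float:
    """If drive A is `ratio` times faster than drive B, return how much slower B is, in percent."""
    return (1 - 1 / ratio) * 100


for pct in (50, 75):
    print(f"{pct} percent slower means the other drive is {percent_slower_to_speed_ratio(pct):.1f}x faster")

print(f"4.37x faster means the other drive is {speed_ratio_to_percent_slower(4.37):.1f} percent slower")
```

A drive that is 50 percent slower takes twice as long to do the same work, and one that is 75 percent slower takes four times as long.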

I knew the performance of the P2 wasn’t great, but the gap between what I thought I was buying and what I bought is enormous. The TLC-equipped P2 is 4.37x faster than the QLC-equipped P2. (Graph: THG)

I insulated my external enclosure and wrapped it in ice to check and see if the USB controller and/or SSD were overheating. It didn’t occur to me that the company had swapped NAND. The gap between the two flavors of P2 is only slightly smaller than the gap between the P2 TLC and the 980 Pro. (Graph: THG)

My drive is somewhat faster than these results because it’s the 2TB version rather than just 500GB. It’s still much slower than expected. The P2 QLC does match the P2 TLC in some benchmarks, but none of those benchmarks happened to be relevant to my own use case. The above benchmarks are. (Graph: THG)

I am one of the customers impacted by Crucial’s decision to swap the P2’s TLC for QLC. The company appears to have compensated for its abysmal QLC performance by jacking up the amount of SLC cache. This works well until you hit the cache limit, at which point the drive’s performance craters.
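The cache cliff is easy to model. The sketch below is a deliberately simplified illustration: it treats the SLC cache as a fixed-size buffer that never drains during the transfer, and every number in it (cache size, cached speed, native speed) is hypothetical rather than a measured P2 figure.

```python
def transfer_seconds(size_gb: float, cache_gb: float,
                     cached_mbps: float, native_mbps: float) -> float:
    """Time to write `size_gb` when the first `cache_gb` land in fast SLC cache
    and the remainder is written at the NAND's native speed.
    Simplification: the cache is a fixed size and does not refill mid-transfer."""
    fast_gb = min(size_gb, cache_gb)
    slow_gb = max(0.0, size_gb - cache_gb)
    return fast_gb * 1000 / cached_mbps + slow_gb * 1000 / native_mbps


# HYPOTHETICAL drive: 50GB cache, 1,000MB/s while cached, 100MB/s afterward.
for size in (25, 50, 100, 500):
    seconds = transfer_seconds(size, cache_gb=50, cached_mbps=1000, native_mbps=100)
    average = size * 1000 / seconds
    print(f"{size}GB copy: {seconds:.0f} seconds, averaging {average:.0f}MB/s")
```

In this toy model, a copy that fits inside the cache runs at full speed, while anything larger is dominated by the slow portion. That is the pattern a large, sustained workload exposes and a short benchmark can miss.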

While I didn’t intend for things to play out this way, my experience is a perfect example of how this kind of deception harms readers. I bought the P2 back in the spring and was surprised to see it on sale at Amazon for $250. The reviews I consulted indicated it was a TLC drive, and the other 2TB drives available around the $250 price point are QLC drives. Several articles confirmed that although the drive’s performance wasn’t first-rate, it would be reasonably fast even after the SLC cache had been exhausted. I was looking for an SSD with relatively high reliability because I intended to use it for moving hundreds of terabytes of video frames and compiled video. I knew the performance of the P2 was not great, even for a TLC drive, but I also needed sheer capacity. I was willing to accept a performance hit if I knew I was getting TLC NAND and the higher reliability it offered. QLC flash might survive, but when it launched the 5210 Ion, Micron explicitly told me that this type of NAND is best used for cold storage and light workloads, not enormous, repetitive file copying. I was specifically looking for a low-cost TLC drive to give myself some reliability headroom.

I bought the Crucial P2 on the strength of reviews written by my peers and my own trust in the company and came away pleased that I had found a good deal with the right balance of features for my needs. THG’s P2 review, written on July 2, 2020, did not mention that Crucial would swap the drive to QLC at a later date. Neither did our colleagues at PCMag (August 31, 2020). ServeTheHome and StorageReview reviewed the drive as late as December 2020 without mentioning a QLC transition, though THG and StH have since updated their reviews to acknowledge the bait-and-switch. Anandtech mentioned that Micron might swap TLC for QLC when the drives were announced in April 2020. That site also added: “We encourage Micron to be more transparent about their controller and NAND choices and especially any post-launch changes.”

PC Gamer’s review (July 26, 2020), on the other hand, contains the following statement from Crucial:

You may notice an endurance variance between your results and our published specs, as our specs are set to allow for potential product transitions in the future…The Crucial P2 SSD currently relies on Micron TLC 3D NAND technology, but over time may include a mix of Micron’s TLC and QLC NAND technologies. By mixing types of NAND with different capacities, we’re able to make product adjustments and decisions based on emerging and changing technology, preferred capacities, and flexibility to match movement of the overall market.

Only one other website appears to have published this statement in its entirety: TechAdvisor, on June 30, 2020. That site conducted its own investigation into drive endurance and asked Crucial why the Total Bytes Written to the drive was identical between the 250GB and 500GB capacities. According to the author, “I posed this exact question to Crucial, and this [the statement above] was the reply.”

Crucial’s P2 announcement does not mention that performance between TLC and QLC drives would be very different in certain workloads. The press release announcing the launch of the P2 does not contain the statement above either, though it does include a quote from Theresa Kelley, VP of Micron’s Consumer Products Group.

“Whether people are upgrading an existing system or planning a new build, the Crucial P2 gives them power and dependability,” said Kelley. “We are one of the few SSD manufacturers that designs and manufactures our own NAND. Engineered with Micron expertise and rigorously tested at every stage of development, the P2 is built on a 40-year legacy of innovation and high-quality products.”

It is not clear if Crucial only provided a statement regarding its future plans to use both TLC and QLC as part of a default briefing, but the one reviewer I spoke to who covered the drive did not receive this information. When only a few product reviews contain a statement from the company confirming a potentially negative fact, it’s often a sign that the data was either only provided if requested or was buried deep enough that only a few people noticed it. Neither reflects well on Crucial, though at least the company said something to someone. Western Digital does not appear to have made any kind of launch statement at all regarding its plans to downgrade the SN550.

Why Lying About Product Performance Is Bad

Every time a company misleads people about what kind of performance to expect from its hardware, customer trust suffers. Reviewer trust suffers, too. An effectively infinite number of companies would like hardware reviewed. The goal of the reviewer is to find the products that deserve to be surfaced at a given price point so that readers can buy well-built hardware without worrying about being taken advantage of.

It might sound nice to get a so-called “golden sample,” but a golden sample that isn’t identified as such is nothing but a headache. Nobody wants to be the person who praises new hardware for excellent overclocking or fabulous power consumption only to discover that the hardware they tested doesn’t represent the actual product people can buy. Readers have made it very clear over the years that they aren’t fond of this outcome, either.

Not all reviews are particularly difficult, but all reviews are time-consuming. It takes time to generate data, create graphs, reference slideshows, attend briefings, and put copy together. Photos have to be taken and/or downloaded, then edited. Relevant performance data from other products must be gathered. Tests need to be re-done periodically to confirm that a Windows update or driver change hasn’t altered competitive standing. New tests are added and old tests subtracted. Shipping one type of drive (or any other product) to reviewers and another to customers means data from a non-representative product may be inadvertently carried forward for months or years before being detected.

This is work that reviewers undertake because the companies we cover promise that doing it will grant insight into which SSDs, CPUs, and GPUs are better and worse, for any particular value of “better” and “worse” one cares to evaluate. Readers want to know if AMD or Intel makes the faster CPU, or if AMD or Nvidia has the most power-efficient GPU. They want to know fine points of difference between vendors so they can decide whether Asus, EVGA, Gigabyte, or MSI is offering the best deal for them. Sometimes, they want to know which product has the best performance per watt or how large the IPC improvement has been from one product generation to another. The only way we can answer these questions is if the hardware we test and the hardware on the open market are the same hardware.

If we can’t trust that the products we receive for testing are the same as the products readers will buy, we can’t trust that the review data we will generate is accurate. Inaccurate review data is worthless. There is no value in writing coverage of a product if the product’s performance may or may not invisibly crash in the next six months, and no point to writing a review we’re going to have to modify or substantially retract. It doesn’t matter if we run a prominent retraction story; the vast majority of corrections never get as much traffic as the original. There’s no way for us to magically beam new data to everyone who read the original version or bought a product on a recommendation rooted in false premises.

Don’t Go Down This Road

I acknowledge that what WD, Crucial, and Adata (that we know of) are doing is not the same situation as IBM’s catastrophic drive failures, but the idea that it’s acceptable to swap TLC NAND for QLC NAND without explicitly informing both reviewers and customers of the performance difference is rooted in the idea that misrepresenting performance is acceptable if enough people won’t notice. The idea that shipping drives with a high failure rate is fine because people won’t notice is just a nastier application of the same principle.

Invariably, however, someone does notice. Sometimes we notice when a reader contacts us angrily demanding an explanation for why their performance doesn’t look like ours. Sometimes we find out when someone writes a thread asking for feedback and the thread takes off like a rocket. Sometimes, a customer comes along and buys a drive because they think they’re getting a great deal, only to find out the manufacturer is the one having the last laugh. The idea that printing low specifications on the box makes it fine, as Crucial has argued, ignores the fact that the first drives far outperformed these expectations. Some websites even noticed this. Without knowing that Crucial published very low numbers to justify its future bait and switch, the P2 sometimes looked like a drive that was much faster than it claimed to be. Odd, to be sure, but also more likely to make people think they were getting a sneaky good deal.

The message from the company when a customer finds themselves in this situation — the message sent by the company’s actions, rather than its PR department — is that you’re the customer whose use case didn’t matter. You’re the chump whose need for an honest product was less important than Adata, Crucial, or Western Digital’s need to make money at a time when nearly every Silicon Valley company on Earth is enjoying record profit windfalls courtesy of the COVID-19 pandemic.

This article has focused more on Crucial than WD because I happened to buy a Crucial drive, but the WD SN550 Blue was my second choice and I never thought WD would stoop to this, either. The company that shipped some of the highest-performing 7200 RPM drives with the first 8MB caches? The manufacturer who once defined the high-performance consumer hard drive segment? Apparently Western Digital has been reduced to bait and switching customers like a two-bit con with a protractor who makes a living scratching new serial numbers on top of ICs. I expected better. I thought Western Digital did, too. The company’s earnings for its fiscal year 2021 (released August 4) indicated a 4.1 percent profit margin improvement in 2021 and very healthy finances.

When a company ships one product to reviewers and a different, inferior product to consumers, it is using journalists to add a veneer of trustworthiness to benchmark results while simultaneously planning to render those results invalid. This is no different than a manufacturer self-publishing benchmark results in applications or system configurations that ludicrously favor their own products. It treats the independent press as if we are extensions of its marketing department whose results can be manipulated indirectly to achieve similar ends.

We don’t work for you. Don’t treat us as if we do.

There is a way for the companies in question to make this right. Adata, Crucial, WD, and any other company engaging in this behavior need to admit it, launch their QLC products as separate SKUs with different brand names, replace the low-performing SSD of any customer who requests it with the higher-performing variant for no charge, and pledge not to engage in this kind of deception again. I don’t expect any of these things to happen, but that’s what it would take.

No reviewer I have ever met enjoys discovering that they wasted a dozen hours or more generating data that doesn’t reflect the shipping product. No journalist likes it when readers feel they were scammed by positive coverage written in good faith. No one enjoys finding out they spent hundreds of dollars that they’d never have spent had they been aware of what they were buying. I do not speak for all reviewers and I cannot say where different publications will draw the line, but that doesn’t mean the line won’t get drawn.

The relationship between hardware manufacturers, reviewers, and readers is built on mutual trust. When genuine, this trust allows for mistakes and miscommunications on both sides of the reviewer/manufacturer relationship. It allows for accidents. It allows for the real-world impact of everything from low initial yields to pandemic-related shortages. One thing it will not survive, however, is sabotage. Intentional or not, that’s what this is.
