> Overall, I haven’t seen many issues with the drives, and when I did, it was a Linux kernel issue.
Reading the linked post, it's not a Linux kernel issue. Rather, the Linux kernel was forced to disable queued TRIM and maybe even NCQ for these drives, due to issues in the drives.
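One quick way to see whether the kernel ended up reining in NCQ for a given drive is to check the queue depth it settled on; a depth of 1 means NCQ is effectively off. This is just a generic sysfs check (the device name is a placeholder), not something specific to these drives:

    # NCQ queue depth the kernel is actually using for this drive (1 = NCQ effectively disabled)
    cat /sys/block/sdX/device/queue_depth

The queued TRIM side isn't exposed as neatly, so the kernel log (dmesg) around drive probe time is usually the place to look.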
Since it's kind of related, here's my anecdote/data point on the bit rot topic: I did a 'btrfs scrub' (checksum verification) on my two 8 TB Samsung 870 QVO drives. One of them has been always on (10k hours), while the other hasn't been powered on at all in the last 9 months, and only once in the last 16 months.
No issues were found on either of them.
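For reference, the scrub itself is nothing exotic; it amounts to something like the following, with the mount point being a placeholder:

    # kick off a scrub of the filesystem mounted at /mnt/archive
    btrfs scrub start /mnt/archive
    # check progress and the error counters once it finishes
    btrfs scrub status /mnt/archive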
How much has been written to each of them over their lifetimes?
Very little: about 25 TB written on the always-on one. The offline one just receives diffs, so probably <12 TB. Both are basically data dumps, which is outside their designed use case. That's why I included data integrity checks in my backup script before the actual rsync backup runs. But again, no issues so far.
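Roughly speaking, the check before the rsync boils down to something like this minimal sketch (the paths and the checksum file name are placeholders):

    #!/bin/sh
    set -e
    SRC=/mnt/data      # source to back up (placeholder)
    DST=/mnt/backup    # backup target (placeholder)
    # verify previously recorded checksums; abort the backup if anything fails
    ( cd "$SRC" && sha256sum --check --quiet checksums.sha256 )
    # only then run the actual rsync backup
    rsync -a --delete "$SRC"/ "$DST"/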
I wonder how long those drives can be powered off before they start losing data, and how long until they lose all functionality once the critical bookkeeping data disappears.
This would depend on how worn they are. Here's an article describing a test a YouTuber did[1] that I watched some time ago. The worn drives did not fare that well, while the fresh ones did ok. Those were TLC drives though, for QLC I expect the result is overall much worse.
[1]: https://www.tomshardware.com/pc-components/storage/unpowered...
I remember that post. Typical Tom's quality (or lack thereof).
The only insight you can glean from that is that bad flash is bad, and worn bad flash is even worse, and even that's frankly a stretch given the lack of sample size or a control group.
The reality is that it's non-trivial to determine data retention/resilience in a powered-off state, at least as it pertains to arriving at a useful and reasonably accurate generalization of "X characteristics/features result in poor data retention/endurance when powered off in Y types of devices," and being able to provide the receipts to back that up. There are far more variables than most people realize going on under the hood with flash, and in how different controllers and drives are architected (hardware) and programmed (firmware). Thermal management is a huge factor that is often overlooked or misunderstood, and it has a substantial impact on flash endurance (and performance). I could go into more specifics if there's interest (storage at scale/speed is my bread and butter), but this post is long enough.
All that said, the general mantra remains true: more bits per cell generally means the data in each cell is more fragile/sensitive, but that's mostly in the context of write cycle endurance.
This is the first time I've heard such negativity about tomshardware. The only time I actually looked at one of their tests in detail was their series testing burn-in on consumer OLED TVs and displays, but the other reviews I glanced at in that context looked pretty solid at a casual glance.
Can you elaborate on the reason for your critique, considering they're pretty much just testing from the perspective of the consumer? I thought their explicit goal is not to provide highly technical analysis for niche preferences, but instead to look at it for a John Doe who's thinking about buying X and what it would mean for his use cases. From that perspective, their reporting seemed pretty spot on and not shoddy, but I'm not an expert on the topic.
The article I linked to is basically just a very basic retelling of the video by some YouTuber. I decided to link to it as I prefer linking to text sources rather than videos.
The video isn't perfect, but I thought it had some interesting data points regardless.
As someone who has read Tom's since it was run by Thomas, I find the quality of the articles a lot lower than it was almost 30 years ago. I don't remember when I stopped checking it daily, but I guess it was over 15 years ago.
Maybe the quality looks good to you, but perhaps you don't know what it was like 25 years ago to compare against. It may be a problem of the wrong baseline.
I've had enough consumer SSDs fail on me that I ended up building a NAS with mirrored enterprise ones... but 2nd-hand ones. I figured that between mirroring and enterprise-grade drives, that's an OK gamble.
Still to be seen how that works out in the long run, but so far so good.
You can't trust SSDs or HDDs; fundamentally, they still have high failure rates regardless. Modern filesystems with checksums, scrub cycles, etc. are going to be necessary for a long time yet.
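Those scrub cycles only help if they actually run, of course. As one hypothetical example, a monthly btrfs scrub scheduled via /etc/crontab could look like this (the mount point is a placeholder):

    # run a blocking scrub of /mnt/data at 03:00 on the first of every month
    0 3 1 * * root /usr/bin/btrfs scrub start -B /mnt/data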
For data storage, I just avoid SSDs outright. I only use them for games and my OS. I've seen too many SSDs fail without warning into a state where no data is recoverable, which is extremely rare for HDDs unless they're physically damaged.
SSDs are worth it to me because the restore and rebuild times are so much faster. Larger HDDs can take several days to rebuild a damaged array, and the other drives have a higher risk of failure while they're being thrashed by IO and running hot. And if additional drives do fail during the rebuild, it takes even longer to restore from backup. I'm much happier to just run lots of SSDs in a configuration where they can be quickly and easily replaced.
I just don't have the patience for HDDs anymore. Mirrored arrays and backups are going to have to do as protection against data loss.
That said, I only have a couple of TBs... much more than that and HDDs do become unavoidable.
I'm using an HDD with an SSD cache for /home; anything that isn't stale gets cached by the SSD.
I wonder what's the best SATA SSD (M.2 2280) one could get now?
I have an old Asus with an M.2 2280 slot that only takes SATA III.
I recall the 840 EVO M.2 (if my memory serves me right) is the current drive, but finding a new replacement seems not to be straightforward, as most SATA drives are 2.5 in., and if it is the right M.2 2280 form factor, it's usually NVMe.
Most companies stopped making and selling SATA M.2 drives years ago.
> The reported SSD lifetime is reported to be around 94%, with over 170+ TB of data written
Glad for the guy, but here's a bit of a different view on the same QVO series:
NB: you need to look at the first decimal number in attribute 177 Wear_Leveling_Count to get the 'remaining endurance percent' value, i.e. 59 and 60 here.
While overall that's not terrible, losing ~40% after 4.5 years, it means that in another 3-4 years they would be down to ~20% if the usage pattern doesn't change and the system doesn't start hitting write amplification. Sure, someone had that "brilliant" idea ~5 years ago to use desktop-grade QLC flash as ZFS storage for PVE...
Have a look at the SSD Statistics page of the device statistics log (smartctl -l devstat). This has a "Percentage Used Endurance Indicator" value, which is 5 for three of these disks and 6 for one of them. So based on that, the drives still have ~95% of their useful life left.
As I understand it, the values in the device statistics log have standardized meanings that apply to any drive model, whereas any details about SMART attributes (as in the meaning of a particular attribute or any interpretation of its value apart from comparing the current value with the threshold) are not. So absent a data sheet for this particular drive documenting how to interpret attribute 177, I would not feel confident interpreting the normalized value as a percentage; all you can say is that the current value is > the threshold so the drive is healthy.
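For anyone who wants to compare the two views themselves, assuming smartmontools is installed (the device path below is a placeholder):

    # standardized Device Statistics log, includes "Percentage Used Endurance Indicator"
    smartctl -l devstat /dev/sdX

    # vendor-specific SMART attributes, includes 177 Wear_Leveling_Count on these Samsung drives
    smartctl -A /dev/sdX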