> Overall, I haven’t seen many issues with the drives, and when I did, it was a Linux kernel issue.
Reading the linked post, it's not a Linux kernel issue. Rather, the Linux kernel was forced to disable queued TRIM and maybe even NCQ for these drives, due to issues in the drives.
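One quick way to see whether the kernel ended up reining in NCQ for a given drive is to check the queue depth it settled on; a depth of 1 means NCQ is effectively off. This is just a generic sysfs check (the device name is a placeholder), not something specific to these drives:

    # NCQ queue depth the kernel is actually using for this drive (1 = NCQ effectively disabled)
    cat /sys/block/sdX/device/queue_depth

The queued TRIM side isn't exposed as neatly, so the kernel log (dmesg) around drive probe time is usually the place to look.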
Since it's kind of related, here's my anecdote/data point on the bit rot topic: I did a 'btrfs scrub' (checksum verification) on my two 8 TB Samsung 870 QVO drives. One of them has been always on (10k hours), while the other hasn't been powered on at all in the last 9 months, and only once in the last 16 months.
No issues were found on either of them.
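For reference, the scrub itself is nothing exotic; it amounts to something like the following, with the mount point being a placeholder:

    # kick off a scrub of the filesystem mounted at /mnt/archive
    btrfs scrub start /mnt/archive
    # check progress and the error counters once it finishes
    btrfs scrub status /mnt/archive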
How much has been written to each of them over their lifetimes?
Very little: about 25 TB written on the always-on one. The offline one just receives diffs, so probably <12 TB. Both are basically data dumps, which is outside their designed use case. That's why I included data integrity checks in my backup script before the actual rsync backup runs. But again, no issues so far.
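Roughly speaking, the check before the rsync boils down to something like this minimal sketch (the paths and the checksum file name are placeholders):

    #!/bin/sh
    set -e
    SRC=/mnt/data      # source to back up (placeholder)
    DST=/mnt/backup    # backup target (placeholder)
    # verify previously recorded checksums; abort the backup if anything fails
    ( cd "$SRC" && sha256sum --check --quiet checksums.sha256 )
    # only then run the actual rsync backup
    rsync -a --delete "$SRC"/ "$DST"/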
I wonder how long those drives can be powered off before they start losing data, and how long until they lose all functionality once the critical bookkeeping data disappears.
This would depend on how worn they are. Here's an article describing a test a YouTuber did[1] that I watched some time ago. The worn drives did not fare that well, while the fresh ones did ok. Those were TLC drives though, for QLC I expect the result is overall much worse.
[1]: https://www.tomshardware.com/pc-components/storage/unpowered...
I remember that post. Typical Tom's quality (or lack thereof).
The only insight you can glean from that is that bad flash is bad, and worn bad flash is even worse, and even that's frankly a stretch given the lack of sample size or a control group.
The reality is that it's non-trivial to determine data retention/resilience in a powered-off state, at least as it pertains to arriving at a useful and reasonably accurate generalization of "X characteristics/features result in poor data retention/endurance when powered off in Y types of devices," and being able to provide the receipts to back that up. There are far more variables than most people realize going on under the hood with flash, and in how different controllers and drives are architected (hardware) and programmed (firmware). Thermal management is a huge factor that is often overlooked or misunderstood, and it has a substantial impact on flash endurance (and performance). I could go into more specifics if there's interest (storage at scale/speed is my bread and butter), but this post is long enough.
All that said, the general mantra remains true: more bits per cell generally means the data in each cell is more fragile/sensitive, but that's mostly in the context of write cycle endurance.
This is the first time I've heard such negativity about tomshardware. The only time I actually looked at one of their tests in detail was their series testing burn-in on consumer OLED TVs and displays, but the other reviews I glanced at in that context looked pretty solid at a casual glance.
Can you elaborate on the reason for your critique, considering they're pretty much just testing from the perspective of the consumer? I thought their explicit goal is not to provide highly technical analysis for niche preferences, but instead to look at it for a John Doe who's thinking about buying X and what it would mean for his use cases. From that perspective, their reporting seemed pretty spot on and not shoddy, but I'm not an expert on the topic.
The article I linked to is basically just a very basic retelling of the video by some YouTuber. I decided to link to it as I prefer linking to text sources rather than videos.
The video isn't perfect, but I thought it had some interesting data points regardless.
As someone who has read Tom's since it was run by Thomas, I find the quality of the articles a lot lower than it was almost 30 years ago. I don't remember when I stopped checking it daily, but I guess it was over 15 years ago.
Maybe the quality looks good to you, but perhaps you don't know what it was like 25 years ago to compare against. It may be a problem of the wrong baseline.
I've had enough consumer SSDs fail on me that I ended up building a NAS with mirrored enterprise ones... but 2nd-hand ones. I figured that between mirroring and enterprise-grade drives, that's an OK gamble.
Still to be seen how that works out in the long run, but so far so good.
You can't trust SSDs or HDDs; fundamentally, they still have high failure rates regardless. Modern filesystems with checksums, scrub cycles, etc. are going to be necessary for a long time yet.
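Those scrub cycles only help if they actually run, of course. As one hypothetical example, a monthly btrfs scrub scheduled via /etc/crontab could look like this (the mount point is a placeholder):

    # run a blocking scrub of /mnt/data at 03:00 on the first of every month
    0 3 1 * * root /usr/bin/btrfs scrub start -B /mnt/data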
For data storage, I just avoid SSDs outright. I only use them for games and my OS. I've seen too many SSDs fail without warning into a state where no data is recoverable, which is extremely rare for HDDs unless they're physically damaged.
SSDs are worth it to me because the restore and rebuild times are so much faster. Larger HDDs can take several days to rebuild a damaged array, and the other drives have a higher risk of failure while they're being thrashed by IO and running hot. And if additional drives do fail during the rebuild, it takes even longer to restore from backup. I'm much happier to just run lots of SSDs in a configuration where they can be quickly and easily replaced.
I just don't have the patience for HDDs anymore. Mirrored arrays and backups are going to have to do as protection against data loss.
That said, I only have a couple of TBs... much more than that and HDDs do become unavoidable.
I'm using an HDD with an SSD cache for /home; anything that isn't stale gets cached by the SSD.
I wonder what's the best SATA SSD (M.2 2280) one could get now?
I have an old Asus with an M.2 2280 slot that only takes SATA III.
I recall the 840 EVO M.2 (if my memory serves me right) is the current drive, but finding a new replacement seems not to be straightforward, as most SATA drives are 2.5 in., and if it is the right M.2 2280 form factor, it's usually NVMe.
Most companies stopped making and selling SATA M.2 drives years ago.
> The reported SSD lifetime is reported to be around 94%, with over 170+ TB of data written
Glad for the guy, but here's a bit of a different view on the same QVO series:
NB: you need to look at the first decimal number in attribute 177 Wear_Leveling_Count to get the 'remaining endurance percent' value, i.e. 59 and 60 here.
While overall that's not terrible, losing ~40% after 4.5 years, it means that in another 3-4 years they would be down to ~20% if the usage pattern doesn't change and the system doesn't start hitting write amplification. Sure, someone had that "brilliant" idea ~5 years ago to use desktop-grade QLC flash as ZFS storage for PVE...
Have a look at the SSD Statistics page of the device statistics log (smartctl -l devstat). This has a "Percentage Used Endurance Indicator" value, which is 5 for three of these disks and 6 for one of them. So based on that, the drives still have ~95% of their useful life left.
As I understand it, the values in the device statistics log have standardized meanings that apply to any drive model, whereas any details about SMART attributes (as in the meaning of a particular attribute or any interpretation of its value apart from comparing the current value with the threshold) are not. So absent a data sheet for this particular drive documenting how to interpret attribute 177, I would not feel confident interpreting the normalized value as a percentage; all you can say is that the current value is > the threshold so the drive is healthy.
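For anyone who wants to compare the two views themselves, assuming smartmontools is installed (the device path below is a placeholder):

    # standardized Device Statistics log, includes "Percentage Used Endurance Indicator"
    smartctl -l devstat /dev/sdX

    # vendor-specific SMART attributes, includes 177 Wear_Leveling_Count on these Samsung drives
    smartctl -A /dev/sdX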