File Server Builder's Guide
by Zach Throckmorton on September 4, 2011 3:30 PM EST
Hard drives
One of the most frequently asked questions I hear is 'what's the most reliable hard drive?' The answer to this question is straightforward - the one that's backed up frequently. Home file servers can be backed up with a variety of devices, from external hard drives to cloud storage. As a general guideline, RAID enhances performance but it is not a backup solution. Some RAID configurations (such as RAID 1) provide increased reliability, but others (such as RAID 0) actually decrease reliability. A detailed discussion of different kinds of disk arrays is not within the scope of this guide, but the Wikipedia page is a good place to start your research if you're unfamiliar with the technology.
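To make the RAID 0 versus RAID 1 point concrete, here is a rough back-of-the-envelope sketch. It assumes each drive fails independently with the same probability over some period, and it ignores rebuild windows, controller faults, and correlated failures, so treat the numbers as illustrative only. The 5% figure is a made-up input, not a measured failure rate.

```python
def raid0_loss_probability(p: float, drives: int) -> float:
    """A stripe (RAID 0) loses data if ANY one of its drives fails."""
    return 1.0 - (1.0 - p) ** drives


def raid1_loss_probability(p: float, drives: int) -> float:
    """A mirror (RAID 1) loses data only if ALL of its drives fail."""
    return p ** drives


if __name__ == "__main__":
    p = 0.05  # hypothetical per-drive failure probability (assumption)
    print(f"single drive : {p:.4f}")
    print(f"2-disk RAID 0: {raid0_loss_probability(p, 2):.4f}")
    print(f"2-disk RAID 1: {raid1_loss_probability(p, 2):.4f}")
```

Under these simplifying assumptions the two-disk stripe is roughly twice as likely to lose data as a single drive, while the two-disk mirror is far less likely to. Neither protects against accidental deletion, which is why a separate backup is still needed.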
As for hard drive reliability, every hard drive can fail. While some models are more likely to fail than others, there are no authoritative studies that implement controlled conditions and have large sample sizes. Most builders have preferences - but anecdotes do not add up to data. There are many variables that all affect a drive's long-term reliability: shipping conditions, PSU quality, temperature patterns, and of course, specific make and model quality. Unfortunately, as consumers we have little control over shipping and handling conditions until we get a drive in our own hands. We also generally don't have much insight into a specific hard drive model's quality, or even a manufacturer's general quality. However, we can control PSU quality and temperature patterns, and we can use S.M.A.R.T. monitoring tools.
One of the most useful studies on hard drive reliability was presented by Pinheiro, Weber, and Barroso at the 2007 USENIX Conference on File and Storage Technologies. Their paper, Failure Trends in a Large Disk Drive Population, relied on data gleaned from Google. So while the controls are not perfect, the sample size is enormous, and it's about as informative as any research on disk reliability. The PDF is widely available on the web and is definitely worth a read if you've not already seen it and you have the time (it's short at only 12 pages with many graphs and figures). In sum, they found that SMART errors are generally indicative of impending failure - especially scan errors, reallocation counts, offline reallocation counts, and probational counts. The take-home message: if one of your drives reports a SMART error, you should probably replace it, and send the old one in under warranty if it's still covered. If one of your drives reports multiple SMART errors, you should almost certainly replace it as soon as possible.
From Pinheiro, Weber, and Barroso 2007. Of all failed HDDs, more than 60% had reported a SMART error.
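The take-home rule above can be written down as a small triage function. This is only a sketch: the dictionary keys are hypothetical labels for the four signals named in the text, not the attribute names any particular SMART utility prints, and reading "multiple errors" as a combined count of two or more is an assumption.

```python
from typing import Mapping

# The four signals singled out above. These key names are hypothetical
# labels chosen for this sketch, not output from a real SMART tool.
STRONG_SIGNALS = (
    "scan_errors",
    "reallocation_count",
    "offline_reallocation_count",
    "probational_count",
)


def triage(counts: Mapping[str, int]) -> str:
    """Map SMART error counts to the article's rule of thumb."""
    # Assumption: "multiple errors" means a combined count of 2 or more.
    total = sum(counts.get(name, 0) for name in STRONG_SIGNALS)
    if total >= 2:
        return "replace as soon as possible"
    if total == 1:
        return "probably replace"
    return "no strong signal"


if __name__ == "__main__":
    print(triage({"scan_errors": 1}))
    print(triage({"reallocation_count": 5}))
```

Note that "no strong signal" is not an all-clear: per the figure above, a sizeable share of failed drives never reported a SMART error, so backups remain the real safety net.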
Pinheiro, Weber, and Barroso also showed how temperature affects failure rates. They found that drives operating at low temperatures (i.e. less than 75F/24C) actually have the highest (by far) failure rates, even greater than drives operating at 125F/52C. This is likely an irrelevant point to many readers, but for those of us who live further up north and like to keep our homes at less than 70F/21C in the winter, it's an important recognition that colder is not always better for computer hardware. Of use to everyone, the study showed that the pinnacle of reliability occurs around 104F/40C, from about 95F/35C to 113F/45C.
From Pinheiro, Weber, and Barroso 2007. AFR: Annualized Failure Rate - higher is worse!
Given the range of temperatures that hard drives appear to function most reliably at, it might take some experimentation in any given case to get a home file server's hard drives in an ideal layout.
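When experimenting with drive placement and airflow, it can help to check logged temperatures against the ranges quoted above. The following is a minimal sketch using only the thresholds from the text (below 24C as the worst range, 35C to 45C as the most reliable); the function names and the category labels are made up for illustration, and how you obtain the temperature reading is left out entirely.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (deg_f - 32) * 5 / 9


def classify_drive_temp(deg_c: float) -> str:
    """Bucket a drive temperature using the ranges quoted in the text."""
    if deg_c < 24:
        return "too cold"             # highest failure rates in the study
    if 35 <= deg_c <= 45:
        return "ideal"                # the most reliable range in the study
    return "outside ideal range"      # 24C-35C or above 45C


if __name__ == "__main__":
    for reading_f in (68, 104, 125):
        deg_c = fahrenheit_to_celsius(reading_f)
        print(f"{reading_f}F = {deg_c:.1f}C -> {classify_drive_temp(deg_c)}")
```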
So rather than answering what specific hard drive models are the most reliable, we recommend you do everything you can to prevent catastrophic failure by using quality PSUs, maintaining optimal temperatures, and paying attention to SMART utilities. With the handful of drives in a typical home file server, the most important factor in long-term HDD reliability is probably luck.
Pragmatically, low-rpm 'green' drives are the most cost-effective storage drives. Note that many of the low-rpm drives are not designed to operate in a RAID configuration - be sure to research specific models. The largest drives currently available are 3TB, which can now be found for as little as $110. The second-largest capacity drives at 2TB generally offer the best $/GB ratio, and can regularly be found for $70 (and less when on sale or after rebate). 1TB drives are fine if you don't need much space, and can sometimes be found for as little as $40.
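The $/GB comparison is easy to check. This sketch uses the low-end prices quoted above and assumes the marketing convention of 1TB = 1000GB; actual street prices vary, so plug in whatever you find when shopping.

```python
# (capacity in GB, price in USD) using the low-end prices quoted above.
# Assumes 1TB = 1000GB, the convention drive makers use.
DRIVES = {
    "3TB": (3000, 110.0),
    "2TB": (2000, 70.0),
    "1TB": (1000, 40.0),
}


def dollars_per_gb(capacity_gb: float, price_usd: float) -> float:
    """Cost of one gigabyte of raw capacity."""
    return price_usd / capacity_gb


def best_value(drives: dict) -> str:
    """Return the label of the drive with the lowest cost per gigabyte."""
    return min(drives, key=lambda label: dollars_per_gb(*drives[label]))


if __name__ == "__main__":
    for label, (capacity_gb, price_usd) in DRIVES.items():
        print(f"{label}: ${dollars_per_gb(capacity_gb, price_usd):.4f}/GB")
    print("best value:", best_value(DRIVES))
```

At these prices the 2TB drive comes out at about 3.5 cents per GB, slightly ahead of the 3TB drive and comfortably ahead of the 1TB drive.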
152 Comments
noxipoo - Sunday, September 4, 2011 - link
i'm looking for alternatives to drobo or the more expensive NAS devices so some raid card recommendations along with all the things that one needs would have been nice.
bamsegrill - Sunday, September 4, 2011 - link
Yeah, some raidcard-recommendations would be nice.
Rick83 - Sunday, September 4, 2011 - link
MY RAID card recommendation is a mainboard with as many SATA ports as possible, and screw the RAID card.
For anything but high end database servers, it's a waste of money.
With desktop boards offering 10 to 12 SATA ports these days, you're unlikely to need additional hardware, if you choose the right board.
Otherwise, it's probably wisest to go with whatever chipset is best supported by your OS.
PCTC2 - Sunday, September 4, 2011 - link
But there's the fact that software RAID (which is what you're getting on your main board) is utterly inferior to those with dedicated RAID cards. And software RAIDs are extremely fickle when it comes to 5400 RPM desktop drives. Drives will drop out and will force you to rebuild... over 90 hours for 4 1.5TB drives. (I'm talking about Intel Storage Matrix on Windows/ mdadm on Linux).
You could run FreeNAS/FreeBSD and use RAID-Z2. I've been running three systems for around 5 months now. One running Intel Storage Matrix on Windows, one running RAID-Z2 on FreeBSD, and one running on a CentOS box on a LSI2008-based controller. I have to say the hardware has been the most reliable, with the RAID-Z2 in a close second. As for the Intel softRAID, I've had to rebuild it twice in the 5 months (and don't say it's the drives because these drives used to be in the RAID-Z2 and they were fine. I guess Intel is a little more tight when it comes to drop-out time-outs).
A good RAID card with an older LSI1068E for RAID 5 is super cheap now. If you want a newer controller, you can still get one with an LSI2008 pretty cheap as well. If you want anything other than a giant RAID 0 stripe (such as RAID 5/6/10), then you should definitely go for a dedicated card or run BSD.
Rick83 - Sunday, September 4, 2011 - link
I've been using 5400 rpm disks and mdadm on linux for quite a while (6 years now?) and never had a problem, while having severely sufficient performance.
If disks drop, that's the kernel saying that the device is gone, and could be just a bad controller.
I've been on three boards and never had that kind of issue.
Windows is a bit more annoying.
Also, your rebuild time is over the top.
I've resynced much faster (2 hours for 400GB - so 10x faster than what you achieved. While also resyncing another array. Sounds like you may have a serious issue somewhere)
The compatibility advantage of software RAID outweighs any performance gain, unless you really need those extra 10 percent, or run extreme arrays (RAID-6 over 8 disks, and I might consider going with a dedicated card)
I think it might be the Intel/Windows combination that is hurting you - you might want to try the native windows RAID over the Intel Matrix stuff. Using that is the worst of both worlds, as you have the vendor lock-in of a dedicated card and the slightly lower performance of a software solution.
Of course, you also mentioned mdadm, but I've never had a problem with that, with a bunch of different disks and controllers and boards.
I guess in two to three years, when I upgrade my machine again, I will have to look at a SATA controller card, or maybe sooner, should one of my IDE disks fail without me being able to replace it.
I think you may just have been unlucky with your issues, and can't agree with your assessment :-/
Flunk - Sunday, September 4, 2011 - link
I agree, I've used the Windows soft raid feature a lot and it trumps even hardware raid for ease of use because if your raid controller dies you can just put your drives in any windows system and get your data off. You don't need to find another identical controller. Performance is similar to matrix raid, good enough for a file server.
vol7ron - Monday, September 5, 2011 - link
Wouldn't a RAID card be limited to the PCI bus anyhow? I would suspect you'd want the full speed that the SATA ports are capable of
vol7ron - Monday, September 5, 2011 - link
Even with 5400RPM drives, if you have a lot of data you're copying/transferring, you could probably saturate the full bandwidth, right?
Rick83 - Monday, September 5, 2011 - link
PCI is wide enough to support gigabit ethernet, so if you don't have too many devices on the bus, you'll be fine until you have to build a RAID array.
With PCI-X and PCIe these limitations are no longer of relevance.
Jaybus - Monday, September 5, 2011 - link
There are plenty of PCI-E x4 and x8 RAID cards.