Re: RAID 1 or Hammer

From: Bjørn Vermo <bv@xxxxxxxxx>
Date: Wed, 14 Jan 2009 08:58:14 +0100

On 13. jan. 2009, at 04.15, Matthew Dillon wrote:

> I've seen uncaught data corruption on older machines, but not in the
> last few years. Ah, the days of IDE cabling problems, remembered
> fondly (or not). I've seen bad data get through TCP connections
> uncaught! Yes, it actually does happen, even more so now that OSes
> are depending more and more on CRC checking done by the Ethernet device.
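
One well-known reason bad data can get past TCP is that its checksum is only the 16-bit ones'-complement sum described in RFC 1071. That sum is order-independent, so some corruptions (such as two 16-bit words trading places) leave it unchanged. When checksumming is offloaded to the network card, corruption that happens on the host side of the card is not covered by this check at all. The sketch below is an illustration of the first point only. It is not taken from the thread, and the function name `internet_checksum` and the sample data are hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071 (as used by TCP)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in (end-around carry)
    return ~total & 0xFFFF  # ones' complement of the sum


# Hypothetical payload, and a "corrupted" copy with its first two 16-bit words swapped.
original = b"ABCDEFGH"
corrupted = original[2:4] + original[0:2] + original[4:]

print(original != corrupted)                                       # the data differs...
print(internet_checksum(original) == internet_checksum(corrupted))  # ...but the checksum does not
```

The swapped buffer passes because addition does not care about word order. Changing a single byte in this example (say `b"ACCDEFGH"`) does alter the checksum, which is why such errors are rare rather than common.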

Modern drives (meaning anything with an ATA or SCSI controller built in) do so much internal error checking and recovery that the time between the first externally noticeable failure and total breakdown will be very short.

I have a number of 7-8-year-old hand-me-down IBM Netfinity servers for testing, and the combination of the processing done by the ServeRAID controllers and the Datastar Ultra320 drives makes it next to impossible for an error to slip through to the operating system. I will probably find out soon enough how the eventual breakdown happens: one drive in a system I'm stress testing has had its yellow warning light on for about half a year now. It does not help to have hot-swappable drives when you have run out of spares...

I have nevertheless seen errors detected by JFS or ReiserFS, but they were not caused by disk problems. On desktop systems, my first suspects are power supplies and bad capacitors on the motherboard. Another suspect is software bugs, and on the servers that is the most plausible cause.

Bjørn Vermo
Core networking
Opera Software ASA

