The hardware in the SAN I've built is turning out to be a mixed bag of good inexpensive parts and disappointing expensive ones. While this may be contrary to the popular belief that expensive parts work best, experience has shown time and again that the price tag has little to do with performance or stability.
The motherboard's iKVM/BMC had to be completely reset, something the board's manufacturer couldn't help me with. A quick search of the BMC chip manufacturer's site turned up a tool that updates the firmware and erases the chip in one pass. That turned out to be exactly what I needed, and it worked perfectly. I've since notified Tyan of the utility, in case this ever happens to another customer.
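For anyone who ends up in a similar spot, the first sanity check I'd suggest before reaching for a flash utility is simply seeing whether the BMC will answer over IPMI at all, and asking it for a cold reset if it does. A minimal sketch in Python, assuming ipmitool is installed and the host-side IPMI modules are loaded (it obviously won't help once the controller is as thoroughly wedged as mine was):

import subprocess

def bmc_responds():
    # "mc info" queries the BMC over the local interface; a wedged
    # controller typically times out or errors here.
    return subprocess.run(["ipmitool", "mc", "info"],
                          capture_output=True).returncode == 0

if bmc_responds():
    # A cold reset restarts the BMC firmware without touching the host OS.
    subprocess.run(["ipmitool", "mc", "reset", "cold"], check=True)
else:
    print("BMC not answering over IPMI -- time for the vendor's flash tool.")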
The hard drives that I initially believed to be DOA actually test out fine with the motherboard's onboard LSI SAS HBA/RAID chipset. In fact, the onboard chipset seems to behave better and more predictably in nearly every way! This is disappointing because the Areca RAID card cost over $1300, and the onboard LSI chipset is essentially a throw-away part. It's so cheap to put on the board that it's not worth removing in newer hardware revisions!
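My "is this drive actually bad?" test has been nothing fancy; it amounts to the sketch below (run as root), which assumes smartmontools is installed and that the drives show up as plain /dev/sd* devices behind the onboard LSI chipset (behind some controllers smartctl needs an extra -d option to reach the disk):

import glob
import subprocess

for dev in sorted(glob.glob("/dev/sd?")):
    # SMART overall health check; a genuinely dead drive usually fails
    # here or never enumerates at all.
    health = subprocess.run(["smartctl", "-H", dev], capture_output=True)
    print(dev, "SMART:", "ok" if health.returncode == 0 else "FAILED")

    # Quick 1GB sequential read to confirm the drive actually moves data.
    read = subprocess.run(["dd", "if=" + dev, "of=/dev/null", "bs=1M", "count=1024"],
                          capture_output=True)
    print(dev, "read test:", "ok" if read.returncode == 0 else "FAILED")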
The HBA/RAID driver story is more of the same. The LSI chipset uses the "mptsas" kernel driver, which has had many contributors offering fixes and enhancements over the years. The Areca card seems to suffer from the opposite in a bad way. Areca themselves have published a newer version of their driver than the one included in the 3.0.x Linux kernel, but it has some pretty big flaws that make it wonky. The first and most noticeable is that a SAS/SATA device that stops responding causes the driver to lock up. To me, this is totally unacceptable. I'd rather see the device drop from the bus and be marked "offline" than have the whole driver freeze. A second big blemish is a lingering reference to the "Big Kernel Lock", which became optional around kernel 2.6.37 and was removed entirely in 2.6.39, well before the 3.x kernels distributions are shipping today.
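If you want to see which driver actually owns which controller on your own box, the kernel reports it through sysfs; a quick sketch, assuming a reasonably modern sysfs layout:

import glob
import os

# Each SCSI host directory exposes the name of the driver that registered it,
# e.g. "mptsas" for the onboard LSI chipset or "arcmsr" for the Areca card.
for host in sorted(glob.glob("/sys/class/scsi_host/host*")):
    with open(os.path.join(host, "proc_name")) as f:
        print(os.path.basename(host), "->", f.read().strip())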
I've been in contact with Areca support, and the things they've had me try have only further confirmed that there is a problem with the driver. That process has also suggested there may be a pretty big problem with the card itself! For $1300, I'd expect an HBA/RAID card to undergo a serious QA process before shipping. But with more hard drives now showing the same symptoms, at this point I'm pretty sure the batch of disks I purchased is fine and the Areca card is the source of all my issues.
So, I've started the RMA process for the existing RAID card and ordered an LSI 9211-8i. This card has 2 SFF-8087 ports for a total of 8 SAS 6Gb/s lanes. I've also ordered an HP SAS expander card, which will give me 36 total ports and dual 4-lane SAS connectivity back to the HBA, for a total theoretical throughput of 12Gbit/sec. That should work out to somewhere around 50Mbyte/sec per drive if all of them are active simultaneously; in reality that rarely happens in a RAID system, so I'm confident this configuration will be what I need. From the reviews I've read, the card seems like a great solution and is well liked on the benchmarking forums. And it's rated for something like 290,000 IO/s at its peak!
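For the curious, the back-of-the-envelope math behind that per-drive figure looks roughly like this; the encoding overhead is real, but the drive count is my own assumption about how many disks would realistically be busy at once:

link_rate_gbit = 12        # total theoretical SAS throughput quoted above
usable_fraction = 0.8      # 8b/10b link encoding: 10 bits on the wire per 8 bits of data
active_drives = 24         # assumed number of simultaneously busy drives

usable_mbyte_per_sec = link_rate_gbit * 1000 / 8 * usable_fraction  # ~1200 MByte/sec
print(f"~{usable_mbyte_per_sec / active_drives:.0f} MByte/sec per drive")  # ~50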