SuperMicro + Adaptec SAS RAID: Constant Physical and Logical Device Failures

So, I'm just a lurker here -- created an account just to post this, honestly -- and usually I find my answers to problems quickly enough by looking over other HF posts, but I'm currently at my wit's end and have *no* idea what to do except start asking for help. Maybe somebody here will recognize some of these symptoms and have a troubleshooting idea I haven't tried.

Thanks for taking a look, if you have the time! I really appreciate it.


** SYSTEM SPECS **
  • Chassis: SuperMicro 846E16-R1200B (Intel C602)
  • Motherboard: SuperMicro X9DR7/E-LN4F (BIOS R 3.0a)
  • CPU: Dual Xeon E5-2620 v2
  • RAM: 16GB 1333MHz (8x 2GB)
  • GPU: Matrox G200eW
  • RAID Controller: Adaptec 6805 (FW 19147)
  • SAS Backplane: SuperMicro BPN-SAS2-846EL1 (LSI SAS2x36 expander, Rev 0717?)
  • Boot drive: 250GB SATA (direct to mobo)
  • RAID drives: 5x2TB WD Re SAS (WD2001FYYG, mixed FW VR07 & VR02)
  • OS: Windows Server 2008 R2 (fully updated)


** IMMEDIATE ISSUE **

Drives fail off the Adaptec 6805 controller like it's their job.

Literally any I/O pressure or substantial time spent at idle will cause them to spontaneously unmount with the Adaptec error

Physical drive removed: controller: 1 ( Adaptec 6805 #XXXXXXXXXXX Physical Slot: 1 ), channel: 0, deviceID: 9, enclosure ID: 0, slot ID: 1, WWN: XXXXXXXXXXXXXXXX, vendor: WD, model: WD2001FYYG-01SL3, S/N: XXXXXXXXXXXX, firmware level: VR02.

or something very much like it, depending on which slot failed this time, which leads to logical device failures, degraded arrays, and sad faces.
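One way to check whether the removals cluster on particular slots or firmware levels (the drives here are a mix of VR07 and VR02) is to tally the events. This is only a minimal sketch: it assumes the events have been exported as plain text, one per line, in the same format as the message above. The function name `tally_removals` and the regex are illustrative and not part of any Adaptec tool.

```python
import re
from collections import Counter

# Matches the "Physical drive removed" event text quoted above.
# Assumption: every event keeps the same field order as that sample.
EVENT_RE = re.compile(
    r"Physical drive removed:.*?"
    r"slot ID:\s*(?P<slot>\d+),.*?"
    r"model:\s*(?P<model>[^,]+),.*?"
    r"firmware level:\s*(?P<fw>\w+)"
)


def tally_removals(lines):
    """Count drive-removal events per backplane slot and per drive firmware."""
    by_slot = Counter()
    by_fw = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not a drive-removal event; ignore it
        by_slot[int(match.group("slot"))] += 1
        by_fw[match.group("fw")] += 1
    return by_slot, by_fw
```

Calling `tally_removals(open("events.txt"))` (with `events.txt` being a hypothetical export of the event log) returns two counters. If the counts are spread evenly across slots and firmware levels, that points away from a single bad drive or bay and toward something shared, such as the controller, expander, or power.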

When this machine was first built it was running ONLY a boot disk over the Adaptec controller as a Simple Volume, and it BSOD'd pretty frequently, though it did limp along. Troubleshooting it was impossible in that setup -- eventually I deployed a fresh install of Server 2008 to a separate SATA drive on the motherboard so I could at least test the RAID, and have since discovered that *nothing* will stay alive on this controller / expander.

I have tried updating every piece of firmware on every piece of hardware I can find, have tried different combinations of drive models, tried every PCI slot, disabled every piece of hardware not directly in line with the SAS drives, replaced cabling, replaced drives, replaced the 6805 controller with another unit (under warranty) etc. I have no idea what is left to test, short of buying a new brand of controller or a new backplane (both of which are cost prohibitive for the role this system is being repurposed for).


** SYMPTOMS **

After creating any new logical device (via either the Adaptec BIOS utility or MaxView Storage Manager) performance is extremely slow (<75MB/s read/write on single drives and all types of arrays) and erratic, and if pushed (via ATTO or AJA disk benchmark utilities) the drive will disconnect within minutes. If left relatively idle the drives may stay on for as long as several days but they will disconnect eventually.

This behavior is easier to replicate in Windows (where I can push some I/O) but even if you leave the system in the BIOS utility with a freshly built logical device (no OS partitions), the drives will eventually disconnect.

I have tried creating JBODs, Simple Volumes, RAID 0, RAID 1, and RAID 5 arrays with 5, 4, 3, 2, and 1 disk arrangements, swapping disks in and out of these tests and into different slots on the expander. I have tried different cabling arrangements, using the J1 (Aux) port on the backplane, different cables, different brands of cables, etc.


** ADDITIONAL BACKGROUND **

The machine was built by BOXX in early 2013 and they have not had much luck helping me out (nor has Adaptec, who we don't have a direct support contract with -- I've only been able to speak with them via their "ASK" system. They also recommended getting a replacement 6805 card, which did not help). We bought the machine from BOXX as a renderfarm manager, so it didn't need much local storage and as I said earlier it limped along in that capacity for a year until we decided to repurpose it as a file server.

I'm not an IT / admin or hardware professional, I'm just the studio's resident nerd. I have about 15 years of experience building custom systems for film and animation production and have been running RAIDs on various hardware for a long time, but I am by no means an expert and, quite frankly, this one has defeated me completely.

Any advice from the community would be hugely appreciated. I have logs from the most recent round of testing available if anyone is interested and would like more detail or clarification.

Thanks in advance for your time!
 
Sounds like incompatibility between brands: Adaptec controller vs. LSI backplane.
Replace the Adaptec controller with a cheap used IBM M1015 or M5015 controller. The Adaptec/PMC chipset controllers have subpar performance even when they do work.
 
While the 6805 is okay, my 5805s were pretty crappy overall too. Not that it helps at all, but I have learned not to use Adaptec RAID cards and stick to LSI/Areca now.
 
Sounds like incompatibility between brands: Adaptec controller vs. LSI backplane.

That's kind of what I'm left with as a conclusion -- they just don't wanna work together. It's ridiculous though, *every* component in this setup is listed specifically as being compatible in Adaptec's own testing:

http://download.adaptec.com/pdfs/compatibility_report/arc-sas_cr_02-14-13_series6.pdf

And Boxx must ship plenty of these systems with exactly the same base configuration, it was their standard server build until about a year ago.

Sigh. Thank you for the help though, I will start looking at those models as a test replacement and see if the studio wants to throw any more money at this project.
 
I have not had any issues with my Adaptec 5805 controller in my Supermicro 846E16 case with WD RED's.

I wanted to let people know that they do work well together. I am sorry for the issues OP is having and hope it gets fixed.
 
Thanks for the replies everyone -- I appreciate your time looking at this.

is this SAS1 or SAS2 backplane?

It's an SAS2 backplane -- SuperMicro model BPN-SAS2-846EL1, which is based on the LSI SAS2x36 expander chip.

We actually tried something new yesterday that makes me think it's not the RAID card at fault here but the backplane itself (either the expander chip or the power distribution) -- I went out and jury-rigged a mini-SAS --> 4x SATA fanout cable to 3 random SATA drives we had lying around and powered them from an external PSU outside the rack to try isolating the backplane from the mobo / controller.

The setup looked like kind of a monstrosity but the result was a stable array which I've been hammering with I/O tests for the past 2 days without any errors. So now I'm looking into diagnostics for the backplane. Unfortunately I can't figure out how to get access to it, if it even has a management interface. If Boxx wants to send us a replacement that will be the ultimate test I guess...

Thanks again for your thoughts -- I'll post back here if I learn anything more.
 
I don't know about Windows, but on Linux, sg_ses (from sg3_utils) can query detailed expander information.

I've always had bad luck with Adaptec + LSI SAS expanders :p

My configuration is a SAS2 backplane + IBM M1015 (latest firmware). I've been hammering it for 2 weeks and it has held up well overall.

If you have another non-Adaptec RAID card/HBA, give it a shot! Preferably LSI (or an OEM LSI) with the latest update.
I assume there's nothing wrong with the SAS2 backplane in your case; the two just aren't working together as expected.

All based on my experience.
 