Supermicro X9DRi-F: no POST or beeps

Smoblikat (Limp Gawd, joined May 28, 2020, 444 messages)
Hello all, I'm having an issue and looking for some ideas. Recently I upgraded one of my servers from my old dual LGA1366 setup to a newer LGA2011 setup:

New Board:
https://www.supermicro.com/products/motherboard/Xeon/C600/X9DRi-F.cfm
Old Board:
https://www.supermicro.com/products/motherboard/QPI/5500/X8DTH-iF.cfm

I got the motherboard installed using the dual E5-2620 V1s that came with it. The board booted up perfectly and all my RAM was detected, etc. Afterwards I went to swap in my dual E5-2690 V2s, and now the board will not POST, beep, or output video. I can tell the CPUs aren't really being powered because the heatsinks don't even get slightly warm (I also ran it without a heatsink just to test: definitely no heat).

Things I've done:
Checked for bent pins ten times; there really aren't any that I can see, and I don't think I actually bent any (it was a straightforward CPU install)
Ran the board out of the case on an anti-static surface
Put the original, confirmed-working 2620s back in
Ran only 1 CPU (tried both a 2620 and a 2690)
Reset CMOS
Tried a different power supply
Removed everything that wasn't the CPU power and the 24-pin motherboard power cable
Tried using an external GPU instead of the onboard video
I've asked nicely

The IPMI seems to work, though, and I can get in and turn the server on and off. Its temperature sensors list CPU1 & CPU2 (not CPU0 and CPU1... for reasons?) as having an NA reading and not being detected, but scrolling down to the voltages shows that my 2690s idle at 0.54 V and my 2620s idle at 0.59 V, so they are getting voltage (or the board thinks they are). I'm not actually sure what to think of that. Nothing else was changed except for the CPUs, and I did check the BIOS before swapping in the new procs. It seems to be a high enough version to support the V2 CPUs, though Supermicro has made that unnecessarily complex to determine. The BIOS doesn't list a "BIOS Version"; it's called a "Firmware Version". Mine is listed as 3.4 with a build date of 2020, yet the IPMI reports my "Firmware" as 3.62, and the readme for the BIOS lists it as version 3.0... so I literally don't know what my BIOS version is, but it probably supports V2 CPUs?

Anyone have any suggestions? I was going to go through the manual tonight and see if it's maybe some jumper settings that need to be changed, but the system was working fine before the CPU swap, so unless the jumpers need to be changed to allow a first boot with a new CPU, I don't think it would help much.
 
So I went through the manual and set the VGA jumper to disabled just to force it to use the PCIe card, and I used the clear-CMOS pads/jumper to make sure the CMOS was actually cleared. Still no beep or video on boot. The system also doesn't immediately turn off if you press the power button, so it doesn't seem like it's stuck on a hardware fault; it's doing... something. Another oddity is the IPMI password: even after clearing the CMOS, the password is completely different from what I've seen in any of the documentation. Officially the default login should be ADMIN/ADMIN, yet mine is set to ADMIN/PASSWORD (yes, PASSWORD was my first guess after ADMIN didn't work :p), so I don't know what that's about either.

EDIT: I wasn't sure if the motherboard had some sort of secret "thermal moron" mode that prevents it from running if it doesn't detect a fan (all of mine are Molex powered), so I plugged in a 4-pin fan to test: no difference.
 
Supermicro boards are notoriously temperamental about the CMOS battery being dead or its voltage being too low. If you haven't tried swapping the CR2032, you might try that, or at least test it. If it's much below 3 V, swap it out. I know your board is only a couple of years old, but I've come across lots of dead new CR2032 cells before.
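A tiny sketch of the battery check, for anyone logging multimeter readings. The 2.8 V cut-off is a hypothetical stand-in for "much below 3 V", not a Supermicro or battery-maker figure, and the readings are made-up examples.

```python
# Sketch only: REPLACE_BELOW_V is a hypothetical cut-off standing in for
# "much below 3 V"; it is not a Supermicro or battery-maker specification.
NOMINAL_V = 3.0        # CR2032 nominal cell voltage
REPLACE_BELOW_V = 2.8  # assumption for illustration


def cmos_cell_verdict(measured_v: float) -> str:
    """Suggest what to do with a CR2032 given a multimeter reading in volts."""
    return "replace" if measured_v < REPLACE_BELOW_V else "ok"


for reading in (3.21, 2.95, 2.41):  # example readings, not from the thread
    delta = reading - NOMINAL_V
    print(f"{reading:.2f} V ({delta:+.2f} V vs nominal): {cmos_cell_verdict(reading)}")
```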

Something else to try is reflashing the firmware. I've read that newer SM boards with IPMI are notorious for firmware corruption issues. I don't have any experience with IPMI, but it's apparently possible to flash over the IPMI interface, albeit dangerous. But your board already doesn't work, so you can't make it much worse.

Last option is to RMA it back to SM, if it still has a warranty on it.
 

Great suggestions!

I swapped the CMOS battery with 2 different ones, the one from my old LGA1366 board and a brand-new one: no dice. I was able to reflash the IPMI firmware over the IPMI interface, which immediately set the credentials back to ADMIN/ADMIN like they should have been, so I think that definitely helped. I tried following Supermicro's instructions for reflashing the BIOS on a board that won't boot, which is basically: put the ROM on a flash drive and hold Ctrl + Home on boot-up. But my USB controller doesn't work, so I couldn't do it that way. You can flash the BIOS over IPMI if you purchase a license key; they seem to cost $27 and I was considering just buying one, but they don't sell them for the X9 series boards anymore, just X10 and up. That would have left me in a really bad place if I hadn't found this absolute legend:

https://peterkleissner.com/2018/05/27/reverse-engineering-supermicro-ipmi/

I got my key, flashed the BIOS, and... nothing. Methinks the board is kaput. I might reach out to Supermicro and see if they offer out-of-warranty repairs for it; otherwise I guess it's back to dual X5680s :/
 
The only other thing I can think of is that some IC on the board failed, either from mechanical stress or because the newer CPUs killed something on the board. I've seen both happen.

Back around the RoHS switchover, motherboard death from BGA failures was insane. You could literally push your finger on the board to lightly flex it, and that was enough to separate/crack the BGA on the PCH/chipset. CPU installation was more than enough to cause BGA failure, given the crazy mounting pressure of the heatsink retainer. I learned to just leave the heatsink installed on motherboards, even when removing them from cases, and only change it when the CPU needed to be changed. BGA failure is still a problem today, but much less so because solders have improved somewhat.

Your installation of the newer CPUs could have caused a mechanical failure in some chip. The CPUs themselves could also have caused an electrical failure if there was a problem with them.
 

That's all very true, and Supermicro specifically states right in the manual that they do not recommend removing the heatsink from the motherboard. At this point the board isn't getting any deader; it's about to take a vacation to my oven.
 
I don't recommend that, it'll just destroy the board and everything on it.

If you don't have a hot air station, you can use a heat gun on the low setting to try to reflow the chipset. You WILL need some sort of non-acid, rosin-based flux, and you'll need to flood under the chipset with it before reflowing. You'll also need to protect the rest of the board with a heat shield to keep stuff like plastic parts from burning and capacitors from boiling. Several layers of aluminum foil work great. When using the heat gun, keep it around 4" away from the board and keep moving it in a circular motion to avoid hot spots. Heat it for 60 seconds, then slowly pull the heat gun away from the board, turn it off completely, and let the board cool down until it reaches room temperature. Make sure NOT to move the board around while reflowing, to avoid dislodging parts. Also be careful not to use too high an airflow, so you don't blow parts off the board.

Just tossing it in an oven at some random temperature will cook all of the electrolytic capacitors and make solder joints all over the board brittle and more failure-prone. Plus you'll be offgassing residual PCB fluxes and other cleaners used during the manufacturing process, and they're not something you want to be breathing in.
 
For the Supermicro keys, there are sites that will generate a key. Posting them is probably against the rules, but I think it's based on something like the MAC address plus a conversion string. It should be easy to find, and it would at least let you try a new flash.
 
Yup, no difference at all. The little green LED on the board starts flashing, but nothing happens. You could honestly write a nursery rhyme about how careful I was, too; I let it cool all the way down, etc.

Oh well, it was a cool board for the 2 minutes I was able to spend in BIOS on it :D
 
Not sure what the outcome was here. I've had some shit luck with memory, and a single bad stick onboard can cause a no-boot. Hopefully you can get to the bottom of it.
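The usual way to rule out a single bad stick is to boot with one DIMM at a time. A minimal sketch of that bookkeeping, assuming you record a POST/no-POST result per stick; `boots_with` is a hypothetical stand-in for the manual step, and the stick names and results are made up.

```python
from typing import Callable, Iterable


def find_bad_dimms(dimms: Iterable[str], boots_with: Callable[[str], bool]) -> list[str]:
    """Return every DIMM the board fails to boot with when that DIMM is installed alone.

    `boots_with` is hypothetical: it stands in for powering off, fitting only this
    DIMM in a known-good slot, powering on, and noting whether the board POSTs.
    """
    return [dimm for dimm in dimms if not boots_with(dimm)]


# Hypothetical bench notes: stick names and results are made up for illustration.
bench_results = {"stick A": True, "stick B": False, "stick C": True}
print(find_bad_dimms(bench_results, bench_results.__getitem__))
```

Testing sticks individually rather than in halves costs more reboots, but it also catches the case where more than one stick is bad.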
 
I wound up finding a sub-$100 board on eBay, which I purchased... and, to nobody's surprise, it arrived DOA. For now I've put my old dual-1366 setup back in (which is running beautifully). Once I get my refund for the DOA board, I'm going to try to find another one. Good advice on the RAM, though. I did try a single stick with known-good DIMMs, and all of the RAM gets detected in my old setup (I went from 144 GB to 160 GB now), so maybe I'll take one of the currently working DIMMs out and put it in my 2011 board again. I've got the 2011 board set up on the test bench, so it isn't hard to try a few more things.
 
That's low voltage on the CPUs. The voltage range I've read is 0.65-1.30 V. It's a small difference, but I think your board has power delivery issues.
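A sketch of that comparison using the poster's figures: it flags Vcore sensors outside 0.65-1.30 V. The sample text is hypothetical. The CPU lines reuse the 0.54 V idle reading from the thread in a pipe-separated layout modeled on `ipmitool sdr` output, and the sensor names and the 12V line are assumptions.

```python
# Sketch: flag IPMI Vcore readings outside the 0.65-1.30 V range quoted above.
# SAMPLE_SDR is hypothetical: the CPU lines reuse the 0.54 V idle reading from
# the thread, laid out like pipe-separated `ipmitool sdr` output; the 12V line is made up.
VCORE_MIN_V, VCORE_MAX_V = 0.65, 1.30

SAMPLE_SDR = """\
CPU1 Vcore       | 0.54 Volts        | ok
CPU2 Vcore       | 0.54 Volts        | ok
12V              | 12.06 Volts       | ok
"""


def out_of_range_vcore(sdr_text: str) -> dict[str, float]:
    """Return {sensor name: volts} for Vcore sensors outside the quoted range."""
    flagged: dict[str, float] = {}
    for line in sdr_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2 or "Vcore" not in fields[0]:
            continue  # not a Vcore sensor line
        parts = fields[1].split()
        try:
            volts = float(parts[0])
        except (IndexError, ValueError):
            continue  # empty or non-numeric reading, e.g. "na" or "no reading"
        if not VCORE_MIN_V <= volts <= VCORE_MAX_V:
            flagged[fields[0]] = volts
    return flagged


print(out_of_range_vcore(SAMPLE_SDR))
```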
 
I wonder if I crushed my BGA solder connections. I used an Intel XTX100H Extreme Tower Heatsink Gaming Cooler for LGA 1150, 1151... There are no springs, so when we tighten it down, the engineered distance between the socket and the copper cooler we put heatsink paste on is exact. If the CPU sits slightly higher, standing on the BGA solder balls like Lego studs, it should only be finger-tight and not screwed down with a screwdriver.

Previously they engineered springs into this fastening, so that even if we screwed it down tight with a screwdriver, it would still be at the correct tension.

These Intel heatsinks could be crushing down too tight.

In retrospect, I wish I had bought a slightly more expensive one with springs and that additional hardware. I see that they have thumb screws; just slightly tight is good enough.
 
Did you try running the board outside the case? What PSU are you using?

FWIW, the X8DTH-6F is the king of the X8 boards. I have a couple of them.
 
Yes, the board was run outside the case. I've since been able to get the DOA board returned and found a nice replacement for it (I still have the original one that I broke).

Everything is running great now that I didn't try to install it like an ogre :p

(Attached image: Untitled.png)
 

Entirely possible the socket was damaged, but flexion of the board could have still been a contributing factor.

From my experience trying to remove BGA sockets from motherboards, solder cracking is very possible. Whatever solder they use under the socket is different from anywhere else on the motherboard. It's some insanely hard, insanely high-melting-point solder that's super brittle. On the last one I tried to remove, I was going at the board from both sides with hot air stations after preheating the board. Everything else fell off, the PCB started to burn, and the damn socket just sat there doing absolutely nothing.
 
How about plumber's epoxy to plug up the opening in the center where those ceramic capacitors are? Then, with the two-part epoxy clay, put some around the LGA socket and push it underneath a little to make it tougher and give it a location to set the new LGA/BGA socket, which you can get from eBay already balled.
Use exactly the right temperature, such as 202 °C, with some bottom heat, after putting it in the oven to get the whole motherboard to 100 °C.
There is a CPU socket tester you can order from China to test the connections, for LGA 1150/1151 at least.
BGA reflow stations are available on eBay for $500.00, and there are others costing thousands with cameras and microscopes.
 

Very sure even a multi-thousand dollar BGA rework machine wouldn't have done anything on the board I tried to replace the socket on.

After the double hot air failed, I just chiseled the socket off the board, since it was baked to death anyway. I got out my huge 150 W matchless Hakko 551 and went at the remaining solder balls, and it barely even smeared them around. That iron gets so hot the tip glows red, about 1100 °F. If 1100 degrees barely smears the solder, a BGA rework station wouldn't have done anything.

I have no idea how they even got the socket on the board to begin with. Welding it maybe?
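For scale, here is the conversion of that 1100 °F tip estimate into Celsius, set against the 202 °C reflow figure suggested earlier in the thread. It's just unit arithmetic on the posters' own numbers.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


IRON_TIP_F = 1100      # glowing-tip estimate from the post above
REFLOW_TARGET_C = 202  # reflow temperature suggested earlier in the thread

tip_c = fahrenheit_to_celsius(IRON_TIP_F)
print(f"Iron tip: about {tip_c:.0f} C, roughly "
      f"{tip_c - REFLOW_TARGET_C:.0f} C above the {REFLOW_TARGET_C} C target")
```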
 
Have you tried a manual BIOS recovery? I've had luck recovering Supermicro boards after CPU swaps. There's a certain USB port on the back that can be used for BIOS recovery, which runs on startup. I believe after it flashed I had to pull the CMOS battery, and then it booted. The info should be on their support site.
 
Intel XTX100H Extreme Tower Heatsink Gaming Cooler for LGA 1150, 1151... Is this the heatsink CPU cooler you used? It will work until you take it off later and put it back on again. The damage is already done: the BGA solder connections were pressed down on too hard and cracked. After you reinstall it, it will work until you shut down the system and restart it. You will see the DRAM LEDs lit, and the diagnostics may also say the BIOS is corrupted. It is not the BIOS.

Maybe half a pound of pressure is required pushing down on the CPU from the heatsink. Not more than that, and it should be limited by springs as usual; no design change should have gone through without informing the buyer that the springs are absent. A $50.00 heatsink by a reputable company, and who thought not to screw it down with a screwdriver? The knurled handles show that the design engineer only wanted it to be finger-tight.
 

The board I was trying to repair had mangled socket pins from a CPU being dropped sideways into the socket; it didn't have cracked BGA joints. I bought it from another [H] member, already damaged, for cheap just to see if I could replace the socket.
 
How much can a PCB flex?


IPC-A-600 indicates 0.75% or less bow or twist for PCB assemblies with surface-mount components. 0.75% is practically no flex at all and very difficult to achieve.
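To put 0.75% in physical terms, here is the arithmetic as a sketch. It assumes bow is expressed as maximum deflection divided by the length of the bowed edge, and it assumes a 13-inch (330.2 mm) long edge, the long side of an E-ATX board. Under those assumptions the limit works out to roughly 2.5 mm across the whole board.

```python
MAX_BOW_PERCENT = 0.75    # limit quoted above for SMT assemblies
LONG_EDGE_MM = 13 * 25.4  # assumption: 13 in long edge of an E-ATX board (330.2 mm)


def allowed_bow_mm(edge_mm: float, limit_percent: float = MAX_BOW_PERCENT) -> float:
    """Largest deflection allowed over an edge, assuming bow = deflection / edge length."""
    return edge_mm * limit_percent / 100


print(f"{allowed_bow_mm(LONG_EDGE_MM):.2f} mm allowed over {LONG_EDGE_MM:.1f} mm")
```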

I have a CPU socket tester. It looks like there's no power to the CPU, since none of the LEDs are lit. 95 watts, and the foil traces are very small: how do they carry any wattage? Could a foil trace have blown? Is there a fuse?
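Rough arithmetic on the 95 W question, as a sketch. It assumes a round 1.0 V core voltage (hypothetical) and ignores VRM conversion losses. The same power is a modest current at the 12 V input but close to 100 A at core voltage. That is why core power normally reaches the socket through copper planes and many parallel socket pins rather than through single fine traces.

```python
def current_amps(power_w: float, volts: float) -> float:
    """Current drawn at a given voltage, I = P / V (ignores VRM conversion losses)."""
    return power_w / volts


CPU_POWER_W = 95  # figure from the post
RAILS = {
    "12 V EPS input": 12.0,
    "core rail at an assumed 1.0 V": 1.0,  # hypothetical round number for Vcore
}

for name, volts in RAILS.items():
    print(f"{name}: {current_amps(CPU_POWER_W, volts):.1f} A")
```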
 

(Attached photo: P1040146.JPG)
How effective is it?
A hairline crack in the 3.3 V power feed lines? I cannot get the motherboard to run even after following the manufacturer's suggestion of taking it out of the case, in case of a short to ground. How about an X-ray machine to find a hairline crack or a soldered pad that isn't connecting? So far my CPU tester from China shows no power at all.

You have a defeatist attitude, so you must have a little experience with this.
 
Skimmed through: there's talk of reflowing and removing CPU sockets, but did anyone find out why the board doesn't POST? Or does it POST with a ton of pressure on a chip or the CPU?
I don't see any mention of a PCIe debugger to give the POST check codes.
This is why I sent my non-POSTing open-box Supermicro board back and spent more on an ASRock Rack board with a POST code debugger onboard, and found out it was the used CPUs I was buying and not the boards: Epyc chips program a PROM and burn an eFuse so they never boot in any other brand of server, to checksum the BIOS against unauthorized modifications.

EDIT: Oh, there is a CPU socket tester?
 
The Intel XTX100H Extreme Tower Heatsink Gaming Cooler for LGA 1150, 1151... is what I used, and without tension springs it may have been overtightened, either fracturing the balled connections embedded in plastic or causing a hairline crack in the PCB's fine foil trace lines. The CPU socket tester did not light up, so it looks like there's no power to the CPU. That Procomm motherboard was made to be built for H.P. but did not go into production, and they are still available. Handle with care; I think it was only that... The previous CPU heatsinks had spring tension, but not this one. We previously could not overtighten those thumb screws.
 
That's one thing I like about water cooling: there's no hanging weight. My buddy has this rare-ass 775 board from ASUS, and it only POSTs with heavy manual pressure on the socket or chipset. I've killed GPUs with larger aftermarket air coolers before, too.
 