Any Vega 64 Graphics Card Circuitry Gurus out there?

krushin1 (n00b) · Joined Dec 29, 2021 · Messages: 15
Hi, new to the forum here. I'm hoping there's a graphics card repair expert out there who can point me in the right direction. I have a Vega 64 card that does not boot, and the fan does not spin. After watching several YouTube videos to understand how these cards work, I was hopeful I could revive it, but I'm stuck.

There doesn't seem to be a power short that I can find. When powered up, the 12 phases all measure 900 mV, the HBM gets 1.2 V, the 5 V rails are good on the front and back, the PCIe rail has 800 mV, and the memory controller has 900 mV. All the readings I took are in the attached image. The card gets hot, but the fan does not spin and there is no display output. Does anyone have suggestions on where to look to continue troubleshooting?

Thanks!
 

Attachment: gfx crd.jpg

There is a forum member named RazorWind who has done a bunch of these “graphics card necromancy” threads, where he worked on fixing dead video cards. It might be worth sending him a DM.
 
Well, the answer was in my picture above :whistle:

The resistance to ground on the HBM memory inductor at the bottom should be around 35-40 ohms; I verified this on a good card. With my old leads it read 0.8 ohms (which is still too low and should have clued me in), but with my new super-pointy test leads it actually reads 0.0 ohms, so it is shorted. I'm not sure whether the culprit could be the MOSFET or whether this indicates the memory on the die is bad, so I'm trying to figure out what else I can check. There is 1.2 V present when the card is powered up, but I'm not sure if that exonerates the MOSFET. If anyone knows the answer to that, please chime in! :)
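The method here, measuring the same point on a known-good card and looking for big mismatches, can be sketched as a small helper. This is a minimal sketch under stated assumptions: the name `flag_rails`, the 10x mismatch ratio, and the `vcore` entries are hypothetical and for illustration only; the 35-40 ohm and 0.0 ohm HBM figures come from this thread.

```python
def flag_rails(suspect: dict[str, float], reference: dict[str, float],
               ratio: float = 10.0) -> list[str]:
    """Return the rails whose suspect-card resistance to ground differs from
    the known-good card by more than `ratio` in either direction (ohms)."""
    flagged = []
    for rail, good in reference.items():
        bad = suspect.get(rail)
        if bad is None:
            continue  # this point wasn't measured on the suspect card
        # A DMM showing 0.0 really means "below its resolution"; clamp to
        # 1 milliohm so the ratio stays finite.
        low = max(min(bad, good), 1e-3)
        high = max(bad, good)
        if high / low > ratio:
            flagged.append(rail)
    return flagged


# HBM figures from the thread; the vcore entries are illustrative only.
reference = {"hbm_inductor": 37.5, "vcore": 0.2}
suspect = {"hbm_inductor": 0.0, "vcore": 0.2}
print(flag_rails(suspect, reference))  # ['hbm_inductor']
```

A ratio rather than an absolute difference is used so that the same rule works for both sub-ohm core rails and tens-of-ohms memory rails.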
 
Well... I removed the inductor and the MOSFET for the memory phase. From what I understand, measuring from the left inductor pad to ground should read the memory's resistance. I'm still getting 0 ohms, which I think means the HBM memory portion of the die is shorted. :(

If I'm wrong, someone more educated please enlighten me. Otherwise it looks like I've got a dead duck. :dead:
 

Attachment: hbm.jpg
I couldn't for the life of me figure out how to send a private message, so hopefully quoting your name sends you a notification lol... RazorWind, any chance you can confirm my findings? Not sure if you're familiar with Vega-style cards?
 
Keep in mind that on the output side of a buck converter, there are usually a whole bunch of components in parallel. One of them is the memory logic, but you also have stuff like the noise filtering and bulk capacitors, current sense resistors, sometimes the control IC itself, and so forth. A short to ground at any one of those components could cause this, although it's very strange indeed that you have 1.2V on that rail, even with a 0.0 ohm short to ground.
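A quick sketch of the point about parallel components: the meter sees every branch on the rail at once, and the lowest-resistance branch dominates the reading. The 37.5 ohm figure is the midpoint of the known-good card's 35-40 ohms from this thread; the 0.2 ohm "cracked cap" is hypothetical. The resistance reading alone can't tell you which branch is the low one.

```python
def parallel(*ohms: float) -> float:
    """Equivalent resistance of branches in parallel (ohms)."""
    if any(r == 0 for r in ohms):
        return 0.0  # an ideal dead short dominates everything
    return 1.0 / sum(1.0 / r for r in ohms)


healthy_rail = 37.5   # ohms, HBM rail on the known-good card (thread: 35-40)
cracked_cap = 0.2     # ohms, hypothetical leaky ceramic capacitor on the same rail
print(f"{parallel(healthy_rail):.3f}")               # 37.500
print(f"{parallel(healthy_rail, cracked_cap):.3f}")  # 0.199
```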

Are you sure that, with the inductor removed, you actually have 0.0 ohms to ground on the pad on the left?

I've not had one of these cards in my hands, but my understanding is that the interconnect between the GPU and HBM ICs is extremely fragile and caused a lot of them to fail. I'd bet on that being the cause of your problem, but without more thoroughly hunting for a short, it's hard to say for sure. You also can't rule out the possibility of damage to the PCB itself. Sometimes, the layer that isolates one of the copper layers from the others fails, creating a short internal to the board that you can't detect using the techniques most DIYers have available.
 
RazorWind, thanks for the quick reply!

Yes, when I check the left pad to ground I get either 0.2 ohms or 0.0 ohms, which seems to indicate it's shorted. The 1.2 V puzzles me, because you would think powering up with a short like that would blow something up (though I suppose it may have blown up the HBM modules). Either way, I wouldn't expect the 1.2 V to be present.

I've checked many of the capacitors and resistors in the area on both sides of the PCB looking for shorts but haven't located any. It's difficult to determine every component that connects to the area without completely stripping the card, though. I wonder, if I pulled all the components off, whether that left pad would connect solely to the memory module? If it's dead anyway and I have no other direction, I suppose it wouldn't hurt.
 
Try injecting 1.2V and some sane number of amps into that pad and see what gets hot.
 
So, I hooked 1.2 V up to the pad... My power supply can drive a short at up to 30 A, so I set a current limit first, but it maxed out at a little over 2 A. It's not just sinking endless current or anything, which may explain why the 1.2 V wasn't unhappy about being there. I gave both sides of the card several shots of duster spray to frost them over and hooked up the power. None of the capacitors or ICs I could find were heating up on either side. The only thing I could find that melted the frost was the bottom portion of the die, with the two memory modules. :X3: They didn't get hot to the touch with this voltage applied, though.
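Some quick arithmetic on that injection test, as a sketch: assuming the supply actually held 1.2 V while delivering 2 A (if it was sitting in current limit, the real voltage, and so the power, was lower), the rail was sinking about 2.4 W and looked like roughly 0.6 ohms, including leads and contacts. A couple of watts spread over the HBM stacks would be enough to melt frost without feeling hot. The function name is hypothetical.

```python
def injection_summary(volts: float, amps: float) -> tuple[float, float]:
    """Power dissipated in the rail (W) and the equivalent resistance (ohms)
    it presents, assuming the supply really held `volts` at `amps`."""
    power_w = volts * amps
    r_equiv = volts / amps
    return power_w, r_equiv


power, r_eq = injection_summary(1.2, 2.0)   # "a little over 2 A" rounded down
print(f"{power:.1f} W into about {r_eq:.1f} ohms")  # 2.4 W into about 0.6 ohms
```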

With 1.2 V applied, I found that the chips underneath the GPU in the PLL area have varying voltages on them, mostly a little under 1 V. Since they were still powered with the inductor removed, that circuitry must be linked to the memory modules, either in parallel or in series. Worth noting, I guess, but not entirely helpful since they don't seem to be heating up.

Based on my YouTubing, I haven't seen the "pros" do anything other than remove the inductor on the front of the card when troubleshooting the VRM this way, but there is a pair of black components marked 470 73PPZ on the rear, behind where the inductor is mounted. I believe they may be diodes, but I'm not sure, and I don't know whether they need to be removed as well or if removing the inductor is enough. I get a reading of 0.2 ohms across the two for the memory; the ones on the other inductors all read 0.0... but from what I understand, reading the actual resistance of the core requires a milliohm meter.
 

Attachments: 1.jpg, 2.jpg, 3.jpg
The black components you speak of are capacitors, which are installed in parallel with the memory logic. They're used to store energy for the memory so that the voltage doesn't drop when the card goes from idle to working, before the VRM can respond. It's possible that you have a short through one or both of them, but pretty unlikely. They're easy to remove and install though, so you can pretty easily check that by just removing them and checking the resistance again.
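To put a number on the job these bulk capacitors do, here is a minimal sketch of the usual first-order estimate. While the VRM hasn't reacted yet, the caps alone supply the load step, so the rail sags by roughly dV = I * dt / C. The 10 A step and 5 us response time below are hypothetical round numbers, not Vega 64 specs; only the pair of 470 uF parts comes from this thread. ESR and ESL, which often dominate in practice, are ignored.

```python
def droop_volts(step_amps: float, response_s: float, cap_farads: float) -> float:
    """First-order rail droop while bulk capacitance alone carries a load step,
    before the VRM responds: dV = I * dt / C (ESR and ESL ignored)."""
    return step_amps * response_s / cap_farads


bank_farads = 2 * 470e-6   # the pair of 470 uF parts on the memory rail
# Hypothetical load step and VRM response time, for illustration only.
droop = droop_volts(step_amps=10.0, response_s=5e-6, cap_farads=bank_farads)
print(f"~{droop * 1000:.0f} mV of droop")
```

Doubling the capacitance halves the droop in this model, which is why boards stack several bulk caps in parallel on each rail.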

Chances are, you either have a cracked ceramic cap that you just haven't found yet, or a failure in the memory interposer.
 
I removed the black capacitors and they tested at around 450 uF, so they must be fine. The four to the left of them tested at 23 uF, so they must be OK as well. I still get 0.2 ohms across the pads where they used to be, so the rail is still connected through some other components. Across the pads on the front where the inductor sat, I read 10 megohms.

Just randomly probing from that pad, it has connections to the little driver chip to its left, and also to the 1.8 V, 5 V, 900 mV... pretty much all of the voltage stages over on the far side of the board. I'm not really sure how to verify that I'm probing only from that pad to the memory module. I'm thinking maybe I'll pull the little driver chip, and if that fails, pull the inductors off all the power rails... Beyond that it seems to be sheer guessing, and at that point I feel like I'm going to have a big container of spare chips :woot:

I went through a whole can of duster hoping to find a component that was heating up, but if there is one, I didn't spot it (though I certainly could have missed a small cap somewhere).
 
When you say "Just randomly probing from that pad", are you referring to the switch node (the one on the right), or the negative side of the inductor (the one on the left)?

You have to keep in mind that the card is one huge complex circuit. You can't isolate any one pad from everything else completely. You should have nearly infinite resistance to most of the rest of the board from the switch node, particularly with the FET package removed, but the other pad will show some level of connection to most of the other circuits on the board because it legitimately is connected to them.

You can try a thermal camera in lieu of freeze spray and see if that produces better results. Isopropanol is also a good indicator, but not really better than freeze spray. Still, I think there's a pretty good chance that your problem is the interposer (or some other part of the GPU package). As I understand it, that's pretty common on these cards.
 
I was referring to the pad on the left. The switch node on the right does show high resistance to most of the board components even with the FET package still installed.

I am supposed to be injecting 1.2 V into the pad on the left (on the front of the board, with the inductor removed), correct? If not, I was doing it wrong. Since the HBM was heating up, I'm assuming I was doing it right.

Unfortunately, I don't have a thermal camera. With isopropyl, are you just looking for evaporation? I would imagine freeze spray would be more visible.
 
Interestingly, if I probe between the left pad and some of the tiny resistors placed all around the GPU die, I get 37 ohms, which was the value I got from the memory inductor to ground on the good card.

Probing from those same resistors to ground shows about 31 ohms. All of the resistors around the die show either 31 ohms or 0.2 ohms. I would guess I'm getting 0.2 on the resistors which connect to the GPU core and 31 ohms on the ones that connect to the HBM memory.

I still have some faith there may be a fixable problem here. Whether or not I can find it remains to be seen lol.
 
Well, I borrowed a FLIR camera. I gave the board a pretty thorough going-over and also checked up close with a macro lens, but I was unable to find any areas of the board heating up other than the memory and the area underneath it on the PCB. I'm thinking this one's a parts card. The camera's pretty cool, though! It can just be difficult to differentiate between shiny objects and heated ones; there are too many reflective parts on these cards lol :)
 

Attachments

  • IMG_0002.JPG
    IMG_0002.JPG
    37.2 KB · Views: 10
  • IMG_0001.JPG
    IMG_0001.JPG
    43.5 KB · Views: 10
If you haven't already, maybe give this video a shot:

It goes through various things to check, with close-ups. I need to buy some equipment and do this myself at some point. I got great "deals" on dead Vega 56 GPUs, so now I have a pile to work on.
 
Andrew_Carr Thanks, yeah, I think I watched about two dozen videos on Vega, 1080, and RX style cards. The 'Tech Cemetary' tutorials are very good as well. I did learn a lot, but I didn't see any cards with exactly the same problem as this one. Pretty much all of the cards (at least the ones someone made a YouTube video for) were missing the voltage on one of the rails, or on Vcore or memory. Pretty much all the episodes like that come down to a burnt-up IC or a blown capacitor somewhere. This one has all the voltages present, just a low resistance across the memory.

I am curious: if it is a memory short, would reballing the GPU fix it? I could probably be tempted to pick up one of those $100 BGA rework stations on eBay if it stands a chance.

I'm also curious what the memory would read on a milliohm meter. I can't imagine it's completely shorted, since it doesn't blow out the 1.2 V rail. My meter reads 0.2 ohms on the memory rail, the same reading I get on any of the inductors hooked to the core. But from what I've seen in those videos, 0.2 ohms isn't unusual for a core reading, but is way too low for a memory reading.
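On the milliohm question, the main difference is how the meter deals with its own leads. Here is a minimal sketch assuming idealized meters. A normal 2-wire DMM reads the device plus both test leads and both probe contacts. A 4-wire (Kelvin) milliohm meter pushes current through one pair of leads and senses voltage on the other, so lead and contact drops largely drop out. The values below are hypothetical, but they show why a 2-wire reading of a few hundred milliohms can't distinguish a healthy low-resistance rail from a real short. This would also be consistent with the reading changing from 0.8 to 0.0 just by swapping probes.

```python
def ohmmeter_reading(r_dut: float, r_lead: float, r_contact: float = 0.0,
                     kelvin: bool = False) -> float:
    """Idealized reading in ohms for a device under test (r_dut).

    2-wire: the same two leads carry the test current and sense the voltage,
    so both leads and both probe contacts add in series.
    4-wire (Kelvin): the sense leads carry ~no current, so in this ideal
    model their resistance drops out entirely.
    """
    if kelvin:
        return r_dut
    return r_dut + 2 * (r_lead + r_contact)


# Hypothetical numbers: a 5 milliohm rail, 0.1 ohm leads, 0.05 ohm contacts.
for kelvin in (False, True):
    r = ohmmeter_reading(0.005, 0.1, 0.05, kelvin=kelvin)
    print(f"{'4-wire' if kelvin else '2-wire'}: {r * 1000:.0f} milliohms")
```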
 
Have you thrown the GPU in the bin or kept fighting? I think I'm in the same situation, or close to it. I would love to get an update from you, and I can take measurements on my two cards for you! Something is shorting the caps on the back side of the GPU chip, but I don't know whether it's one of the dozens of caps or some completely different component...

EDIT: I do get 40 ohms on the memory inductor, and the card fires up. But it's unstable and crashes under load. All rails remain on, but VMemory is at 250 mV for some reason.
 
It's still sitting on my table. I got distracted with other things for the last few months and didn't think it was worth sending off for repair work. I'm thinking about just selling it for parts at this point. I did recently have an odd situation with a 5700 XT (bought new, RMA'd twice) that wouldn't work at all in Linux but works fine for gaming in Windows, so I might try switching computers too at some point. I just need to set up more test rigs.
 

Alright. Well, did you find any shorts on the 470 uF capacitors?

I believe I have the same issue, since all of them (except the last two, which I think are for the memory) are shorted to ground on both poles. There are 14 black caps like this, and around 20-30 smaller yellow caps on the back side of the GPU chip with the same symptom. Do I simply have to remove them one by one, or could the short be somewhere else? The card is "working"; it just crashes under load without artifacting. Temps and voltages are good. I have embedded and electrical knowledge, but the PCB is quite complex, especially for someone new to GPU diagnostics.

EDIT: Forgot to ask, what component is the memory interposer?
 
And now I am totally confused, since the marking on the PCB is C533, where C should mean capacitor. Tantalum capacitors?
 
I accidentally found this thread, which says these are not capacitors but rectifiers. Thoughts? They are still shorted!
https://www.techpowerup.com/forums/threads/chips-behind-gpu.232540/
Definitely capacitors. Either tantalum or aluminum polymer. Hard to say without a closer look at the markings.

The memory interposer is part of the GPU package. On this board, the GPU and memory are all manufactured together in one big package with an extra piece of silicon in there called the interposer. The interposer is extremely fragile, and is quite prone to failure on these early HBM cards, usually resulting in symptoms similar to what you're seeing. When it does fail, the only way to repair the board is to replace the entire GPU package, memory, interposer, and all. If I had to guess, based on the symptoms you've reported, I would say that's what you're looking at here.

How are you measuring the 250 mV on the memory rail? That's low enough that it shouldn't even work at all.
 
The card is still sitting on the shelf, but I'm thinking it's likely the interposer. Your issue sounds like it's more likely to be fixable if the card fires up. Keep in mind if you do think you've found a bad capacitor, you can't really check them with a multimeter without removing them from the board.
 
Since I have an identical card, I checked the values on that one and got the same result: the caps on the back side have continuity through them?
Strange, but it works flawlessly. I accidentally fried the IR35217 controller, though, so I have to replace that first. It may be a waste of time and money, but I am not planning on making money here; it's more fun to just try to get it working again! :)
When I fire up the card, all voltages are correct (memory around 1.2 V, if I recall correctly). Once it has crashed, I measure the inductor to the HBM and it reads 250 mV. The GPU VRM inductors read 0 mV, and all other rails output the correct voltage. Since the card CAN be stress tested, I feel like it may be saveable. I haven't heard of an interposer before, so I do not know the state of that component.
One thing that seems a bit off is that the 5V output of the LDR on the back side reads approx. 2 kohm, while the front one reads 3.5 kohm, which is the expected value. So there may be a problem there, but I can not locate anything. Any help appreciated!
 
Yes, I do know that parts have to be desoldered to be tested correctly, but straight-up continuity seems strange to me.
 
When you say they have "continuity" how are you testing that? Are you using the "short detection"/"continuity"/"beep" mode on a handheld DMM?

You can't do that. The static resistance through the GPU core is probably less than one ohm, and when you measure those caps, that's what you're measuring. The threshold to trigger a "beep" in the short detection mode on most meters is like thirty ohms, so it will tell you that you have a short across that capacitor, but it's actually perfectly normal, and the resistance is just really low. Don't ever use beep mode when troubleshooting a big meaty logic circuit like this.
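Here is a tiny sketch of why beep mode misleads here, assuming a threshold of about 30 ohms as described above. Real meters vary, so check your meter's manual. The function name is hypothetical. The point is that beep mode gives the same answer for a dead short and a perfectly normal sub-ohm reading.

```python
def continuity_beeps(r_ohms: float, threshold_ohms: float = 30.0) -> bool:
    """Continuity/beep mode only answers "is it below the threshold?"."""
    return r_ohms < threshold_ohms


readings = {
    "dead short": 0.0,                         # what we're hunting for
    "healthy cap across the GPU core": 0.5,    # normal, per this thread
    "HBM rail on the good card": 37.5,         # thread: 35-40 ohms
}
for label, ohms in readings.items():
    print(f"{label:32} {ohms:5.1f} ohms  beep={continuity_beeps(ohms)}")
```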

You said the card can be "stress tested." What exactly do you mean? You mean it's detected as a Vega 64, the driver installs properly, and it can render 3D graphics for a while, but then it just randomly crashes with a black screen?
 
In resistance mode, the reading across the cap is under 0.5 ohms; I don't have a milliohm meter.

Furmark could be run for 10 minutes; I had to stop at 230+ W because of the makeshift fan cooling I was using for testing. At the same time, it COULD crash while Windows was initializing, during the first couple of seconds, without really doing anything. Something that always crashed the card was opening God of War 4: a few seconds in, it would crash. It's a very demanding game in terms of GPU power. I also managed to play LoL for a few minutes on low settings, but as soon as I changed them to very high, it crashed. But watching YouTube, browsing other websites, using Discord, or anything else like that works great and can run for hours without crashing, as long as I don't open any texture-heavy process.
The card is detected, drivers work fine, no bad BIOS, it runs at x16 PCIe 3.0, all memory is detected (it has never artifacted in any way), no Code 43, temps are good, and the clocks were good (as long as it did not crash). There is nothing really indicating that something is wrong. Once again, I am confident this is not a software fault, but I am happy to try those things as well once the new PWM controller is soldered on.

Since Furmark does not increase VRAM usage that much, I think it may be related to the memory or the VRM for it in some way.
 
That 0.5 ohms is normal. To detect an actual short on a big logic circuit like that, you really need a milliohm scale meter.

Most VRMs have a mechanism where they can monitor their current output, and shut down if they exceed a known-safe value. Many also have a thermal shutdown feature, where they can shut down if certain components exceed a certain temperature. If the card works under light loads but crashes under heavy ones, it's possible that one of these features is getting tripped and causing a cascade of VRM shutdowns. The first thing to do is probably to figure out which one is shutting down. Usually, the memory VRM has to be running for the core VRM to run, so if they're both off (which they probably are, if you have less than 0.7 V on either one), then you can assume that the problem is on the memory power rail and troubleshoot that.
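The reasoning above can be written as a rough first-pass triage. This is a sketch under the assumptions stated in the post: the memory VRM must be up for the core VRM to run, and anything under about 0.7 V counts as "off". The function name and messages are hypothetical. The real power-sequencing rules for this board would have to come from the controller datasheets.

```python
def triage(vcore: float, vmem: float, off_below: float = 0.7) -> str:
    """Which VRM to investigate first, given rail voltages (V) measured in the
    crashed state. Assumes the core VRM depends on the memory VRM being up."""
    core_on = vcore >= off_below
    mem_on = vmem >= off_below
    if core_on and mem_on:
        return "both rails up: the VRMs probably aren't what shut down"
    if not mem_on:
        # With the dependency assumed above, a dead memory rail explains a
        # dead core rail too, so start upstream.
        return "memory rail down: start with the memory VRM"
    return "memory up, core down: start with the core VRM and its protections"


# Crashed-state readings reported in this thread: Vmem ~0.25 V, Vcore ~0.0-0.1 V.
print(triage(vcore=0.05, vmem=0.25))
```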

Time to get datasheets for the controllers and start reading.
 
I have ordered the IR35217, which got accidentally fried, and also the 4C86N, which is the controller for the memory rail, if I figured that out correctly.
To be clear, VCore is around 0.9 V during startup and 0.7-0.8 V in Windows, but around 0.0-0.1 V in the "crashed state".
 
How do you know you "fried" the IR35217? If you did, the card probably wouldn't work at all, although I guess it's possible it might behave like this if you just damaged it. I can't find a datasheet for it immediately, but some of the nicer controllers are able to split their phase outputs into groups and power two or more separate power rails. In the case of many of the Infineon controllers that AMD likes to use, such as the IR3567B that was used on the 200-500 series, there's firmware that gets programmed onto it, so I don't think you can just replace it with a new one. I'd be really surprised if the 35217 is much different.

Also, a 4C86N is not a control IC, but rather a dual N-channel MOSFET package. Looking at a photo of the board, it appears to be the FET package for a single-phase buck converter (I assume for the memory power) that probably gets its gate signals from the large QFN controller on the back of the board.
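For anyone following along, here is a minimal sketch of what a single-phase buck stage like that does, using idealized (lossless, continuous-conduction) formulas. The 12 V input is an assumption about this board. The 0.47 uH inductance and 500 kHz switching frequency are hypothetical placeholders; read the real values off the board and the controller datasheet. The duty cycle is the fraction of each cycle that the high-side FET (presumably one half of the dual package) connects the switch node to the input.

```python
def buck_duty(v_in: float, v_out: float) -> float:
    """Ideal buck converter duty cycle (lossless, continuous conduction)."""
    return v_out / v_in


def ripple_amps(v_in: float, v_out: float, inductance_h: float, f_sw_hz: float) -> float:
    """Peak-to-peak inductor ripple current for an ideal buck converter."""
    d = buck_duty(v_in, v_out)
    return (v_in - v_out) * d / (inductance_h * f_sw_hz)


v_in, v_out = 12.0, 1.2            # assumed 12 V input, 1.2 V HBM output
d = buck_duty(v_in, v_out)
print(f"high-side FET on {d:.0%} of each cycle")
# Hypothetical 0.47 uH inductor and 500 kHz switching frequency.
print(f"ripple ~{ripple_amps(v_in, v_out, 0.47e-6, 500e3):.2f} A p-p")
```

With the gate drive missing, the switch node never toggles and the output pad just sits at whatever the output capacitors discharge to, which is one way a "crashed" rail reading can look.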

If the card works even kind of normally, I doubt either of those components is actually damaged, but it's possible. More likely, you have some corrosion or something on a sense resistor that's throwing off the current readings. What does GPU-Z report when the card is under a light load?

Again, you need datasheets for these components, and you need to read and understand them, and then actually check against what you have on the board. Don't just go replacing stuff - especially a big QFN package, which is likely to be a real pain in the ass to solder correctly. I know Louis Rossmann makes it look like this is mostly firing up the hot air station and yolo'ing a whole tube of flux on there, but 90% of this process is actually diagnostics.

Be very careful working on this board - it sounds like you have a good candidate for repair, but I doubt the interposer silicon will react well to being heated and cooled repeatedly. You need to figure out what's actually wrong, and fix it in as few heat cycles as you can. Diagnose.

Edit: Here's a datasheet for the 4C86N to get you started. Start by figuring out where its control signals come from, and when the card is in the crashed state, whether or not it's getting gate drive (probably not).
 
The ICs (the 4C86N FET and the IR35217) have now been removed: the FET because it had some physical markings on it, and the phase controller because I accidentally fried it by shorting positive voltage to ground, making it give off some not-so-nice smoke. Not in a dramatic way, though; I don't think I have damaged anything else.
The FET seemed to be in working order, and the contacts seem fine with good readings. I can not find anything really wrong with the card. As for corrosion on a sense resistor, that's possible, since some weird temperature spikes occurred during Furmark: from 30 °C up to 70 °C and back to 30 °C within one frame. It happened three times, but it may have been interference between Furmark and GPU-Z running at the same time.

There is, however, definitely one thing that is way off, which I mentioned before, and I noticed it behaves rather strangely (or maybe not; I just haven't seen it before). The front 5V rail LDR reads 3.5 kohm, as it should. But the back-side one reads around 1.5 kohm when first probed, then slowly rises to 2.3 kohm, as if some capacitor is charging up (which does not happen on the front side). If I release the probe and immediately probe it again, the multimeter reads around 6 kohm and drops fairly fast to exactly 3.5 kohm (in around 5 seconds). Then it stutters, drops to 1.5 kohm, and starts rising again to 2.3 kohm. Capacitors are definitely causing this, but I do not understand why, how, or where they are. Any suggestions?
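Here is a minimal sketch of why a reading creeps upward like that. It assumes an idealized meter pushing a constant test current into a resistance with capacitance in parallel, starting fully discharged. The voltage, and so the displayed resistance, follows R(1 - e^(-t/RC)) and settles at the true resistance after a few time constants. The 2.3 kohm settling value is from this post; the 470 uF capacitance is hypothetical. Real meters autorange and don't source a perfectly constant current. The high-then-falling reading after re-probing probably involves leftover charge, which this model doesn't cover. The practical takeaway is to compare settled readings between the two rails.

```python
import math


def apparent_ohms(t_s: float, r_ohms: float, c_farads: float) -> float:
    """Reading of an ideal constant-current ohmmeter on R in parallel with an
    initially discharged C: V(t) / I = R * (1 - exp(-t / RC))."""
    tau = r_ohms * c_farads
    return r_ohms * (1.0 - math.exp(-t_s / tau))


R = 2300.0     # ohms, settling value reported for the back-side rail
C = 470e-6     # farads, hypothetical capacitance on that rail
for t in (0.5, 1.0, 2.0, 5.0):
    print(f"t={t:>3} s  reading ~{apparent_ohms(t, R, C):6.0f} ohms")
```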
 
Update: IR35217 phase controller and 4C86N memory FET replaced.
Afterwards, I was missing VDDCI (the memory controller voltage), and I found a dead 1 ohm resistor that supplies 12 V to Vcc of the corresponding IC on the back side (not sure exactly what it does, since there is one on the front and one on the back).
Fixing that brought the voltage back to 0.900 V, but only for the first couple of seconds until a fault is detected. VCore and VMem are always at 0 V. I am not sure what comes after the memory controller that could trigger fault detection; is it the PWM controller itself? I can not find a datasheet specifically for that IC.
 
Update 2:
All rails are present; 0.9V on the VCore and 1.2V on the VMem. The red LED is turned on, indicating that the GPU is active.
However, no output is present. Does this suggest a dead core or are there some things to check before that?
 