Thunderdolt
I've been having stability problems with three different renderers. All three crash and report memory errors in their logs. All three software makers told me to reach out to Nvidia because that usually means bad hardware. Nvidia agrees that this is probably bad hardware, but they need me to figure out which GPU is at fault, and that has to be done through manual hardware swaps.
Since the GPUs are set up in pairs for cooling, that's where I started. At first I seemed to be getting a clear signal and thought I was about to solve the problem. That turned out to be premature, so now I'm at a loss. Any suggestions on what to try next?
Here is what I've tried so far. Note that at no point has an NVLink bridge been present.
- Configuration:
  - Slot 1: GPU 1
  - Slot 2: GPU 2
  - Slot 3: GPU 3
  - Slot 4: GPU 4
  - Result: Render crash (illegal memory access)
- Configuration:
  - Slot 1: GPU 1
  - Slot 2: GPU 2
  - Slot 3: Empty
  - Slot 4: Empty
  - Result: Render crash (illegal memory access)
- Configuration:
  - Slot 1: GPU 3
  - Slot 2: GPU 4
  - Slot 3: Empty
  - Slot 4: Empty
  - Result: Render completes successfully (approx. 4 hrs)
- Configuration:
  - Slot 1: GPU 1
  - Slot 2: Empty
  - Slot 3: Empty
  - Slot 4: Empty
  - Result: Render completes successfully (approx. 8 hrs)
- Configuration:
  - Slot 1: GPU 2
  - Slot 2: Empty
  - Slot 3: Empty
  - Slot 4: Empty
  - Result: Render completes successfully (approx. 8 hrs)
The motherboard is a Rampage VI Extreme, the GPUs are Titan RTX, and the driver is the current Studio release (442.92), though Nvidia tech support does not think this is driver-related.
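To keep track of which combinations have been covered, the results above can be tabulated with a short script. This is only a rough sketch: the names `tests`, `summarize`, `minimal_failures`, and `untested` are my own bookkeeping and not anything from Nvidia or the renderers. It also assumes that only the set of installed cards matters and ignores which slot each card sits in, which may not be true.

```python
from itertools import combinations

# Results copied from the list above: (GPUs installed, crashed?).
# Assumption: slot position is ignored; only the set of cards is recorded.
tests = [
    ((1, 2, 3, 4), True),   # all four cards: crash
    ((1, 2), True),         # GPU 1 + GPU 2: crash
    ((3, 4), False),        # GPU 3 + GPU 4: completes
    ((1,), False),          # GPU 1 alone: completes
    ((2,), False),          # GPU 2 alone: completes
]

gpus = {1, 2, 3, 4}


def summarize(tests):
    """Return the smallest crashing sets and the single/pair combos not yet tried."""
    failed = [set(cards) for cards, crashed in tests if crashed]
    tested = {frozenset(cards) for cards, _ in tests}

    # A crashing set is "minimal" if no other crashing set is a proper subset of it.
    minimal_failures = [f for f in failed if not any(other < f for other in failed)]

    # Enumerate every single card and every pair, and keep the ones never tested.
    untested = [
        combo
        for size in (1, 2)
        for combo in combinations(sorted(gpus), size)
        if frozenset(combo) not in tested
    ]
    return minimal_failures, untested


minimal_failures, untested = summarize(tests)
print("Smallest crashing sets:", minimal_failures)
print("Untested singles and pairs:", untested)
```

With the five results so far, the smallest crashing set is GPU 1 together with GPU 2, and the untested entries are GPU 3 alone, GPU 4 alone, and the four mixed pairs. Since every single-card run used slot 1, the slot-independence assumption in this sketch is untested as well.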