Weird memory errors

Anthos

Hi all,

I have a Crucial Ballistix 2x16GB 3600MHz C16 kit along with an ASUS Dark Hero and a 5950X. I've had the build since December. For various reasons I recently ran TestMem5 (I had done so in the past for other reasons), and it came back with some errors, kind of randomly. You run it once and get no errors, then again and get 7, then again and get 1, then again and get none, etc. When I run at stock settings (2667) I tend to get fewer errors on average. I've run memtest86+ and Windows Memory Diagnostic extensively, and those never find any errors.
Now the thing is, if I test each memory module individually I get no errors at all with TestMem5 (tested each one 5+ times at DOCP/XMP settings).
I've upgraded to the latest BIOS, and I've also downgraded to an older BIOS from a time when I got no errors. Tested in a fresh Windows installation, and the Event Viewer never really shows anything important. Does anyone have a clue?
 
It's possible, though not likely, that one of the memory slots itself has an issue and is causing the test you're running to report bad memory. If you run the test with a module in a different slot each time, you can probably narrow down which slot may be bad.
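
If it helps to keep track, here is a rough Python sketch of that swap plan: it lists every module/slot pairing to test one stick at a time, and flags a slot only if it showed errors with every module tried in it. The stick labels and the two slot names are placeholders, and `failing_slots` is a made-up helper, not part of any tool.

```python
from itertools import product

def single_dimm_test_plan(modules, slots):
    """Every (module, slot) pairing, to be tested with only that stick installed."""
    return list(product(modules, slots))

def failing_slots(results):
    """results maps (module, slot) -> error count from one test run.
    A slot is suspect only if it produced errors with every module tested in it."""
    by_slot = {}
    for (module, slot), errors in results.items():
        by_slot.setdefault(slot, []).append(errors > 0)
    return sorted(slot for slot, flags in by_slot.items() if all(flags))

# Placeholder labels: two sticks, two slots in use.
plan = single_dimm_test_plan(["stick_1", "stick_2"], ["A2", "B2"])
for module, slot in plan:
    print(f"Test {module} alone in {slot}")
```

If every pairing comes back clean, the slots and sticks are each fine on their own and the problem only shows up with both populated.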

Well, I did think of that, so when I was testing the modules individually I kept them in their regular slots (i.e. my setup is A2/B2: took out B2 and tested the module left in A2, no errors; put the module in B2, took A2 out and tested again, no errors).
 
Bump your RAM voltage to 1.4-1.45V and try testing both again. (Yes, it's safe.)
 
I did try them a few days ago at 1.4V and got the same situation. I was intending to bump them a bit more and see if it makes a difference. Early this year, though, everything was OK at 1.35V, so why would it change now? :/ Unless the few times I ran the tests back then I just happened not to get errors by sheer chance? I don't know.

Btw, how hot is it OK for RAM to get? I know that higher temps mean a higher chance of errors. When I run one module alone it gets up to the high 50s, while with both installed they get up to the high 60s. However, according to Crucial and Micron the operating temp can be up to 70-95C, so in that case it shouldn't really be throwing anything off.
 
Yeah, going up to 1.45V is safe, so I'd try that. Not sure why it would just start; one could be going bad.
Temps are fine. Like you said, it's good till ~95ish. As long as there is a bit of airflow going past them they'll be fine.
 
Check the DRAM termination resistance with Ryzen Master, manually set it one notch higher, and retest. If that results in more errors, try one notch lower than the original value and retest. HTH.
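
Spelled out as a small Python sketch, in case the order of steps is unclear. The list of selectable values below is only an illustrative placeholder (check what your own BIOS actually offers), and `next_termination_to_try` is a made-up helper name.

```python
def next_termination_to_try(options, original, errors_original, errors_higher=None):
    """options: selectable termination values in ascending order (ohms).
    Step 1: with no result for the higher setting yet, go one notch up.
    Step 2: if one notch up gave more errors than the original value,
    go one notch below the original instead; otherwise stay one notch up."""
    i = options.index(original)
    higher = options[min(i + 1, len(options) - 1)]
    lower = options[max(i - 1, 0)]
    if errors_higher is None:
        return higher
    if errors_higher > errors_original:
        return lower
    return higher

# Illustrative placeholder values only, not read from any particular board.
example_options = [36.9, 40.0, 43.6, 48.0, 53.3]
print(next_termination_to_try(example_options, 43.6, errors_original=3))
print(next_termination_to_try(example_options, 43.6, errors_original=3, errors_higher=7))
```

Since the errors are intermittent, it is probably worth running the test several times per setting before comparing counts.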
 
Is TestMem5 a reliable test? It looks like some random Russian program (not saying it's bad... just wondering).
 
You still got errors when you dropped it down to 2666? But only when all 4 DIMMs were installed? Am I reading that right? That's weird. Is your BIOS up to date?
 
Is TestMem5 a reliable test? It looks like some random Russian program (not saying it's bad... just wondering).

Some people swear by it. I tried it and I could pass it for hours with timings that would make HCI Memtest error out in 10 minutes. Could have been user error; it seems finicky to set up.
 
Yeah, going up to 1.45V is safe, so I'd try that. Not sure why it would just start; one could be going bad.
Temps are fine. Like you said, it's good till ~95ish. As long as there is a bit of airflow going past them they'll be fine.

Tried 1.425V, 1.45V, and 1.3V. The last one took the longest to come up with an error. :/

Check the DRAM termination resistance with Ryzen Master, manually set it one notch higher, and retest. If that results in more errors, try one notch lower than the original value and retest. HTH.

I assume that's the ProcODT value, right?

Is TestMem5 a reliable test? It looks like some random Russian program (not saying it's bad... just wondering).

Thought that myself a time or two but apparently lots of RAM overclockers use it often so I'd assume there must be a reason.

You still got errors when you dropped it down to 2666? But only when all 4 DIMMs were installed? Am I reading that right? That's weird. Is your BIOS up to date?

My RAM is 2 DIMMs (installed in A2/B2). Yes, at 2667 I get errors, but it takes longer for them to come up. I am on the latest BIOS but did downgrade to an older one I was using 6 months ago just to be on the safe side. No change. When I run a single DIMM, no matter the BIOS settings (DOCP/XMP, voltages, etc.), it is always error free.
 
Tried 1.425V, 1.45V, and 1.3V. The last one took the longest to come up with an error. :/

So it works fine when there is only 1 DIMM installed, and it works better (but still gives errors) with lower voltage and lower clocks. It sounds almost like your motherboard is struggling to supply the needed current to the DRAM when it's under full load, because anything you do to reduce the current draw is reducing the errors. Also, what kind of PSU are you running? Are all the 8-pin or 4-pin EPS connectors hooked up wherever there is a spot for one?
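
To put rough numbers on the "less current draw" idea, here is a back-of-the-envelope Python sketch using the usual first-order CMOS rule that dynamic power scales with voltage squared times frequency. It ignores static and refresh power entirely, and the voltage/speed pairings in the table are illustrative combinations, not measured settings from this build.

```python
def relative_dynamic_power(volts, mt_per_s, ref_volts=1.35, ref_mt_per_s=3600):
    """First-order estimate: dynamic power ~ V^2 * f, relative to a reference setting."""
    return (volts / ref_volts) ** 2 * (mt_per_s / ref_mt_per_s)

# Illustrative pairings, all compared against 1.35V @ 3600.
settings = {
    "1.35V @ 3600": (1.35, 3600),
    "1.45V @ 3600": (1.45, 3600),
    "1.30V @ 3600": (1.30, 3600),
    "1.35V @ 2667": (1.35, 2667),
}
for label, (volts, rate) in settings.items():
    print(f"{label}: {relative_dynamic_power(volts, rate):.2f}x")
```

Under that approximation, dropping the speed cuts the load noticeably more than shaving 50 mV does, and each populated DIMM adds its own share on top.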
 
EVGA G3 750W

Edit: If it's not actually the RAM acting up, my next possible culprit would probably be the motherboard. I can't imagine the PSU being at fault, struggling with a single DIMM but being fine in games etc. (card is a 780 Ti, for whatever it's worth).
 
I would try to swap in different RAM and a different PSU just to see what happens, but I'd say the motherboard is your prime suspect.
 
That would be ideal, but unfortunately I don't have that option (I live and work abroad with a small social circle and none of them have PCs). I'll most probably start by RMA'ing the memory first, just in case the kit for whatever reason isn't playing along anymore. The only reason I am not rushing to do so is that by the time I got my kit Crucial had already begun switching these from dual rank to single rank modules, which means that if it's not the RAM, I will most likely have gotten myself downgraded by receiving single rank DIMMs back.

I tried to get in contact with Crucial today and the only way was to talk with those live chat agents. It went as tragically as I would have expected (I introduced the situation to him the same way I did in the first post here).

Him: Did you try testing MemTest86+?
Me: Yes, I already said so in the beginning.
Him: And you got errors right?
Me: Can you please re-read my original post? I already answered these things. We are going around in circles. I'll paste it again below.
Him: Ok. And did you try testing each module individually?
Me: *AAAAAHHHHHH-----*

I understand that doing these things is cheaper for them but for fucks sake, how many virgins do I have to sacrifice to be able to chat with an actual competent person that isn't reading things from a script on the other side of the world??

Sorry... I am pissed...
 
RMA'ing the RAM first is probably best. If it does end up being the board, ruling out faulty memory is an automatic first step before you get it replaced. I wouldn't worry too much about single rank vs dual rank. Dual rank is more bandwidth, single rank maybe gets higher clocks, but it's all margin of error stuff. You might find that the single rank sticks play better with your motherboard (even though for what a Dark Hero costs, it should play nice with everything and then do your laundry, walk your dog, and wash your car).
 
Yeah, I know, I agree. I did say to myself multiple times that stable RAM > ranks (even in this case, where I don't really get any crashes... that I know of). Because let's be real, it's not like you will really notice any practical day-to-day difference unless you are chasing benchmark scores. Same with the CPU curves, which after a few months I just gave up on and kept everything at 0 (which might raise the question of why the hell I got a Dark Hero then, heh :p).

I'll probably see how it goes over the near term and take it from there, I guess.
 
Yeap, tried that, no difference.
Maybe it's the RAM temps. I know Samsung B-die is rated for up to 100C, but if you are pushing frequency and tight timings, anything over 50C and it'll start producing errors. Try putting a fan directly over the memory and run the RAM test again.
 
Just a quick update.

I contacted Crucial for advice and they pretty much gave me an RMA number straight away. I got the first module a week ago, and they sent the second one today.
They seem to have been manufactured 1.5 years apart, and one of them has Reference Raw Card B2 while the other has B0. Don't know if that's of any importance. Otherwise they are both Micron E-die (which is surprising, because the new module was built this month and as far as I was aware Micron had switched these kits to a different single rank die; apparently not).
I ran TestMem5 for close to 2 hours and no errors came up. In the past the first one would pop up anywhere from a few minutes to 15-20 minutes in. I'll keep an eye out in case any of this changes in the future, because I am pretty sure my original kit was also working fine at first.
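
For a rough sense of how unlikely a clean 2-hour run would be if the old fault were still there, here is a small Python sketch. It assumes errors arrive at random with a constant average rate (an exponential time to first error), which is only a guess at how these errors behave; the mean times tried are picked from the "few minutes up to 15-20 minutes" range above.

```python
import math

def prob_clean_run(run_minutes, mean_minutes_to_first_error):
    """Chance of seeing no error during the run, assuming a constant error rate
    (exponentially distributed time to first error)."""
    return math.exp(-run_minutes / mean_minutes_to_first_error)

# Assumed mean times to first error, in minutes.
for mean in (5, 10, 20):
    print(f"mean {mean} min -> P(clean 120 min) = {prob_clean_run(120, mean):.1e}")
```

Even with the most forgiving assumption (20 minutes on average), a clean 2-hour pass by pure luck comes out well under 1%, so the new kit very likely did change something. It says nothing about whether the problem creeps back later.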

Aside from that, the only thing I've noticed is that now that I've added the second module, Thaiphoon Burner seems to be acting up. With the previous modules it would always read the info just fine, and the same with the new module on its own. Now that I've added the second one, a lot of the time it either comes up with CRC errors or just lists everything as undefined. If I keep pressing read, eventually it seems to read the info just fine (but it does seem to struggle a bit more reading the newer module than the older one). Can anyone make sense of this?

EDIT: I booted into my secondary fresh Windows installation and Thaiphoon works normally there, so I suppose it's just clashing with some RGB software or something of the sort.
 