Topic: Threadripper machine not stable after few hours of rendering

2019-05-30, 11:46:32

royvaes

Hi Guys,

First of all, I'm not sure this problem is caused by Corona, but hopefully someone can give me some insight. I've had this machine for 5 months now and it worked perfectly up until now.


A week ago my PC started shutting down after a few hours of rendering. It shut down completely and would not turn back on when I pressed the power button; I had to flick the main switch on the power supply to bring it back to life.

The guys from the PC shop who built the system thought it was caused by a faulty power supply, so we swapped it for a new one. The same problem occurred.

So the next step was a hardware test. It turned up a faulty first-gen RTX 2080 Ti, so we replaced it with a new one and tested rendering again overnight. Now the PC no longer shut down, but after a few hours the fans went to 100%, the screens went black and the machine was completely unresponsive. Again I had to reboot with the main switch on the power supply.

After that, we replaced the CPU and all the RAM modules with new ones.

So basically we replaced almost the entire machine except for the motherboard, though we did update the BIOS to the latest version.

Hopefully someone on this forum has some experience with this kind of problem.


System specs:
Windows 10
CPU: AMD Ryzen Threadripper 2990WX
GPU: MSI GeForce RTX 2080Ti GAMING X TRIO
Motherboard: AMD MSI X399 SLI PLUS
RAM: Corsair DDR4 Vengeance LPX 16GB 3000 C15 (128GB total)
Cooler: Noctua CPU Cooler NH-U14S TR4-SP3
Power supply: Corsair PSU Professional Platinum AX1200i DSP

2019-05-30, 12:21:37
Reply #1

agentdark45

I'm almost 100% certain this is due to the VRMs on the motherboard overheating/drawing too much power, tripping the board's auto-shutdown safety systems.

The 2990WX draws a ridiculous amount of power and most X399 motherboards are not equipped for it. In short, you're looking at either the MSI MEG Creation or the Asus ROG Zenith Extreme Alpha to fix this.
Vray who?

2019-05-30, 12:22:17
Reply #2

Juraj

Jesus, that's a solid horror story, sorry to hear that :-(.

Anyway... a few ramblings:

Do you monitor the temperatures of the CPU & VRM? The MSI X399 SLI Plus is not the best board for the 2990WX, but it can work. But 8x16GB memory modules @ 3000MHz + 2990WX put big stress on the power delivery and can overwhelm this board easily.
Stress test your setup with Prime95 (AVX load) and full Interactive Rendering ('IR' from here on) for at least 20 minutes. Write down the temperature spikes/maximums.
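
For writing down the maximums, here is a minimal sketch, assuming your monitoring tool can export a CSV log. The column names (`time`, `cpu_c`, `vrm_c`) and the sample values are made up for illustration; they do not come from any particular tool or from this machine, so adjust them to whatever your logger writes.

```python
import csv
import io

# Hypothetical log excerpt; real tools use their own column names and layout.
SAMPLE_LOG = """time,cpu_c,vrm_c
12:00:00,41.0,48.5
12:05:00,66.5,83.0
12:10:00,68.0,97.5
12:15:00,67.5,95.0
"""


def peak_per_sensor(log_text, time_column="time"):
    """Return {sensor: (peak_value, time_of_peak)} for every non-time column."""
    peaks = {}
    for row in csv.DictReader(io.StringIO(log_text)):
        for name, raw in row.items():
            # Skip surplus cells (key None), the timestamp and empty cells.
            if name is None or name == time_column or not raw or not raw.strip():
                continue
            value = float(raw)
            if name not in peaks or value > peaks[name][0]:
                peaks[name] = (value, row[time_column])
    return peaks


if __name__ == "__main__":
    for sensor, (value, when) in peak_per_sensor(SAMPLE_LOG).items():
        print(f"{sensor}: peak {value:.1f} C at {when}")
```

Keeping the time of each peak is the point here: it lets you see whether the VRM peaks at the same moment as the CPU, or only later once the case has heat-soaked.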

Recent Corona versions turn on the NVIDIA OptiX denoiser by default for IR, so during IR both your CPU & GPU kick in at the same time, dumping a lot of heat into the case (because your GPU is non-blower style). The airflow might struggle with that, and the VRM can potentially overheat and shut down the PC to protect the motherboard. This is only one of several very wild guesses.

If I think of more I'll write it down. Any chance you can return the board and buy one of the 2990WX trifecta? (MSI MEG Creation, ASUS ROG Zenith Extreme "Alpha" (not just the plain Zenith Extreme), GIGABYTE Aorus Xtreme)
Please follow my new Instagram for latest projects, tips&tricks, short video tutorials and free models
Behance  Probably best updated portfolio of my work
lysfaere.com Please check the new stuff!

2019-05-30, 13:26:11
Reply #3

rowmanns

  • Corona Team
  • Corona for 3ds Max QA Team
Hey,

Yeah, this is a really difficult one to diagnose.

It sounds like it could be a few different things:
- There is some thermal protection kicking in and shutting the system down to protect it.
- The PSU doesn't have enough power to run the system under very high stress levels
- There could be some issue with the motherboard's compatibility
- Something else that I haven't thought of yet...

What I'd suggest is the following:
- Make sure all your drivers are up to date
- Double check that there isn't some Windows sleep/hibernate setting enabled
- Run a Prime95 stress test on the system
- Measure system temperatures during the stress test and note them using something like Core Temp
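
To check the thermal-protection theory against the temperatures you note, something like the following could help. It is only a sketch: it assumes you have the readings as (seconds since start, degrees C) pairs, and the 90 C limit and the sample values are placeholders rather than a specification for any real part, so look up the actual limit for your own hardware.

```python
def longest_run_above(readings, limit_c):
    """Return (start, end) of the longest consecutive stretch at or above limit_c.

    readings: list of (seconds_since_start, temperature_c), sorted by time.
    Returns None if no reading reaches the limit.
    """
    best = None
    run_start = None
    for t, temp in readings:
        if temp >= limit_c:
            if run_start is None:
                run_start = t  # a new hot stretch begins here
            if best is None or (t - run_start) > (best[1] - best[0]):
                best = (run_start, t)
        else:
            run_start = None  # stretch ended, reset
    return best


if __name__ == "__main__":
    # Made-up readings, one per minute, purely for illustration.
    readings = [(0, 50.0), (60, 88.0), (120, 92.0), (180, 95.5),
                (240, 93.0), (300, 85.0), (360, 91.0)]
    LIMIT_C = 90.0  # placeholder limit, not taken from any datasheet
    run = longest_run_above(readings, LIMIT_C)
    if run is None:
        print("never reached the limit")
    else:
        print(f"longest hot stretch: {run[0]} s to {run[1]} s")
```

A short spike is usually less telling than a long stretch at the limit, which is why this looks at the longest run rather than the single highest value.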

Thanks,

Rowan


 
Please read this before reporting bugs: How to report issues to us!
Send me your scene!

2019-05-31, 16:04:46
Reply #4

royvaes

Thank you very much for your help, guys! Really appreciate it. On Monday the PC shop opens again, and I'll see if they have one of these motherboards in stock and swap it in.

2019-05-31, 21:14:05
Reply #5

tallbox

  • George Nicola
Sorry to hear that. I had a similar problem with my machine, which has almost the same specs. After I changed the motherboard to an Aorus and installed reliable water cooling, the system stopped shutting down. So my guess is a heat problem. Try the following (I know it might sound ridiculous, but it's worth checking).
Get an external fan (a household one) from the nearest shop, or use one you already have. Open the case on both sides, turn the fan up to 100%, point it at the motherboard at an angle and run the render. The fan won't cool the CPU that much, but it will cool the VRM.

Best of luck
Architectural Visualizations / Deep work practitioner
https://www.tallboxdesign.com

2019-06-14, 10:16:04
Reply #6

royvaes

Hi Guys,

!! PROBLEM SOLVED !!

Thanks for the help. The problems were caused by the motherboard. On the advice of Juraj Talcik, I bought a new motherboard:
Asus ROG Zenith Extreme Alpha. Works like a beast now!

2019-06-14, 11:19:46
Reply #7

rowmanns

Hi,

I'm glad it's fixed! :)

Cheers,

Rowan

2019-06-25, 23:47:27
Reply #8

slowgojoe

You really should monitor temps with that processor (you can download Ryzen Master from AMD's website), and get a good cooler, one that really covers the entire surface area of the chip. I recommend one of the Noctua coolers. Even though they are air coolers, they seem to work the best of all the different solutions I've tried.
I've built 6 Threadripper machines now (four 1950X, one 2950X, and one 2990WX), and cooling is probably my number 1 concern with all of them.

FWIW, I am using the MSI SLI Plus with my 2950X and it doesn't have any issues running Corona, even when overclocked. The 2990WX build uses the Gigabyte Designare mobo and seems to run great as well.
Good luck and enjoy!