Author Topic: CPU FAQ  (Read 4085 times)

2017-07-04, 17:30:36

Ryuu

  • Corona Team
  • Active Users
  • ****
  • Posts: 567
  • Michal
    • View Profile
What is a thread?
In the context of computing, a thread is a piece of executable code with its own state. Apart from maintaining its own state (private data), each thread can also access global data shared by all threads belonging to the same application (process).
 
For example, in Corona we launch as many threads as you have CPU cores and instruct the OS to run each thread on a different CPU core. This way all the threads run in parallel. The private data of each thread is the ray that thread is currently tracing, while the shared data is the scene, textures, etc.
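A minimal sketch of this idea in Python, not Corona's actual code: one worker thread per logical core reported by the OS, each keeping private state (a local counter standing in for the current ray) while reading a shared, read-only scene. The names `SCENE`, `trace_rays` and `render` are made up for illustration. CPython's global interpreter lock keeps these threads from executing Python code truly in parallel, and the sketch does not pin threads to cores; it only illustrates private versus shared data.

```python
import os
import threading

# Shared data: every thread reads the same scene (hypothetical stand-in).
SCENE = {"objects": ["floor", "lamp", "teapot"], "textures": ["wood", "metal"]}


def trace_rays(thread_id, ray_count, results):
    # Private data: these local variables exist separately in every thread.
    hits = 0
    for ray in range(ray_count):
        # Pretend each ray hits one of the shared scene objects.
        obj = SCENE["objects"][(thread_id + ray) % len(SCENE["objects"])]
        if obj:
            hits += 1
    # Each thread writes only to its own slot, so no lock is needed here.
    results[thread_id] = hits


def render(rays_per_thread=1000):
    thread_count = os.cpu_count() or 1  # one thread per logical core
    results = [0] * thread_count
    threads = [
        threading.Thread(target=trace_rays, args=(i, rays_per_thread, results))
        for i in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    per_thread = render()
    print(len(per_thread), "threads traced", sum(per_thread), "rays in total")
```

The scene is only ever read, which is why the threads can share it without locking; anything a thread modifies is kept private to that thread.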
 
What exactly does 100% usage in Windows task manager mean?
This number only says that some threads are scheduled by Windows to be executed on all CPU cores 100% of the time. It does not tell you how much of the physical execution resources on the CPU are actually utilized.
 
How is it possible that there is code being executed 100% of the time and the CPU is not fully utilized?
Modern CPUs have many execution units specialized for different tasks. Most applications are able to utilize only some portion of these units at any given time. For example, MS Word doesn't need to do many floating-point calculations, so the floating-point execution units just sit idle when the CPU is running Word.
 
Even if the application were able to keep all the execution units fully utilized, it may not be able to feed them with data fast enough. Modern CPUs are a few orders of magnitude faster than modern RAM, so when the CPU is able to process the data faster than the memory is able to supply it, the CPU will end up waiting for the data most of the time.
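A rough back-of-the-envelope sketch of what that wait costs. The 3 GHz clock and 80 ns memory latency below are illustrative assumptions, not measurements, and `stall_cycles` is a made-up helper name.

```python
def stall_cycles(cpu_ghz, memory_latency_ns):
    """Clock cycles a core could have executed while waiting for one RAM access."""
    # GHz means cycles per nanosecond, so the product is a cycle count.
    return cpu_ghz * memory_latency_ns


if __name__ == "__main__":
    # Assumed, illustrative numbers: a 3 GHz core and 80 ns of RAM latency.
    print(stall_cycles(3.0, 80.0), "cycles pass during one trip to RAM")
```

With numbers in that range, a single access that misses the caches costs hundreds of cycles, which is why code that cannot hide the latency spends most of its time waiting.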
 
Is there any way to run a different piece of code on the idle execution units?
Yes, this technology is called SMT (Simultaneous Multithreading). Intel calls its implementation of this technology Hyper-Threading. With this technology, a single physical CPU core executes two or more threads at once.
 
Unfortunately, unless the simultaneously running threads have vastly different computation requirements, they'll end up competing for the same execution units most of the time. In normal circumstances, running two threads on a single physical CPU core gives a 10%-20% performance boost over running just a single thread.
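To make the 10%-20% figure concrete, here is a small arithmetic sketch. It assumes the two threads share the core's throughput equally, which is a simplification, and `smt_per_thread_speed` is a made-up helper name.

```python
def smt_per_thread_speed(total_boost, threads_per_core=2):
    """Speed of each SMT thread relative to one thread running alone on the core."""
    # Total core throughput is (1 + boost); assume it is split evenly.
    return (1.0 + total_boost) / threads_per_core


for boost in (0.10, 0.20):
    share = smt_per_thread_speed(boost)
    print(f"{boost:.0%} boost: each thread runs at about {share:.0%} of solo speed")
```

In other words, each of the two threads runs at roughly 55%-60% of the speed it would reach alone, and the gain comes only from their combined throughput.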
 
Why is my CPU temperature higher/fan louder during the denoising compared to the rendering?
The memory access pattern of the rendering code is very random - for example, each ray may hit a completely different object. This makes it very hard for the CPU to predict which parts of the memory will be accessed next, and in the end the CPU ends up waiting for data to be fetched from memory most of the time.
 
When the image is denoised, it is accessed sequentially, one pixel at a time. Also, the processing of each pixel takes long enough that the CPU is able to read the next pixel from memory before it is needed. This is a best-case scenario for the CPU: it does not have to wait for memory, and the execution units are (almost) fully utilized.
 
When the execution units sit idle waiting for memory, the CPU is able to optimize its energy consumption (and therefore heat generation), but when the denoising kicks in, the CPU units are kept busy all the time and so the temperature goes up.
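The difference between the two access patterns can be tried out with a small experiment. This is only a sketch under stated assumptions: it is not how Corona renders or denoises, the function names are made up, and Python's interpreter overhead hides part of the effect. It sums the same list once in sequential order and once in shuffled order and times both.

```python
import random
import time


def sum_in_order(data, order):
    """Sum data[i] for every index i in `order`."""
    total = 0
    for i in order:
        total += data[i]
    return total


def compare_access_patterns(size=1_000_000, seed=0):
    data = list(range(size))
    sequential = list(range(size))  # denoiser-like: one element after another
    scattered = sequential[:]       # renderer-like: jumps all over memory
    random.Random(seed).shuffle(scattered)

    timings = {}
    for name, order in (("sequential", sequential), ("random", scattered)):
        start = time.perf_counter()
        total = sum_in_order(data, order)
        timings[name] = (time.perf_counter() - start, total)
    return timings


if __name__ == "__main__":
    for name, (seconds, _) in compare_access_patterns().items():
        print(f"{name:>10}: {seconds:.3f} s")
```

Both passes do exactly the same amount of arithmetic and produce the same sum; any difference in time comes from how well the memory system can keep up with the access order. The exact ratio varies from machine to machine.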

2017-07-04, 17:31:51
Reply #1

Ryuu

Since we kept answering the same questions over and over again, here's a simple FAQ. Let us know if something isn't clear enough, if anything needs more or less detail, etc.

2017-07-05, 14:08:16
Reply #2

maru

  • Corona Team
  • Active Users
  • ****
  • Posts: 8896
  • Marcin
    • View Profile

2019-03-30, 02:27:39
Reply #3

Smetz

  • Users
  • *
  • Posts: 4
    • View Profile
I'm trying to solve a problem that perhaps someone here can help me out with.
I have a MacBook Pro i7 and just bought an AMD 24-core Threadripper. They both have 16 GB of RAM.

When rendering a simple Cinema 4D animation, they render at about the same speed despite the Threadripper having 3x the cores. What's going on here? Am I bottlenecking the Threadripper with the 16 GB of RAM? Any insight you can give an artist like me would be great. Thank you!

2019-04-02, 17:54:06
Reply #4

maru

What RAM frequency do you have in your Threadripper PC? It has to be at least 2933 MHz to get good performance, and generally the higher the frequency, the better.

2019-04-02, 18:38:01
Reply #5

Smetz

My RAM is 2400 MHz on the Threadripper and 1600 MHz on the MacBook. I'm assuming faster RAM will prepare the scene faster, since these frames are so quick to render. I also think that having only two 8 GB RAM sticks on a CPU with 4 dies is probably causing the bottleneck. Would that make sense?
Thank you,

2019-05-08, 16:42:40
Reply #6

SairesArt

  • Active Users
  • **
  • Posts: 663
  • Pizza | The Cheesen One
    • View Profile
    • SairesArt Portfolio
Quote from Smetz: "faster ram will prepare the scene faster"
No. RAM speed is not the bottleneck in scene parsing / preparation. Your CPU is the sole bottleneck during scene parsing.

Last time I asked, 3ds Max delivered the scene data to the renderer single-threaded. Ohh, you are on C4D, but the point still stands, so there isn't much you can upgrade, besides making sure that the hard drive your textures are pulled from does not cause a slowdown.

Quote from Smetz: "two 8Gb Ram sticks on a CPU with 4 dies"
These two things have nothing to do with each other. As long as the number of RAM sticks is not odd, which would prevent proper use of dual-channel RAM, such relationships don't matter.
Even if you used 3 sticks, which would prevent proper dual-channel operation, you probably wouldn't even notice.
So no, CPU dies (I presume you mean cores, because you probably only have a single CPU die ;] ) do not factor into how many RAM sticks you should have.

If you have scene performance problems, open up a thread about it and give a little more detail about your scene (how many objects, how many textures and maybe a screenshot), so it can be properly addressed.
« Last Edit: 2019-05-08, 16:50:45 by SairesArt »

2019-05-08, 20:45:49
Reply #7

diffuus

  • Active Users
  • **
  • Posts: 17
    • View Profile
Threadripper is QUAD channel, so you do need at least 4 RAM modules. Using only 2 RAM modules will basically halve its performance.

2019-05-08, 22:51:27
Reply #8

SairesArt

Quote from diffuus: "Using only 2 RAM modules will basically halve its performance."
It will halve its maximum theoretical bandwidth, nothing more.
You will be hard pressed to find any non-synthetic benchmark showing a benefit of quad channel over dual channel.
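To put a number on "maximum theoretical bandwidth", here is a small sketch using the usual peak formula for DDR memory: channels times transfers per second times 8 bytes per transfer (each channel is 64 bits wide). It treats the 2400 figure from earlier in the thread as DDR4-2400, i.e. 2400 mega-transfers per second, which is an assumption, and `peak_bandwidth_gb_s` is a made-up helper name.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s for DDR memory with 64-bit channels."""
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


for channels in (2, 4):
    gb_s = peak_bandwidth_gb_s(channels, 2400)
    print(f"{channels} channels at 2400 MT/s: {gb_s:.1f} GB/s peak")
```

This is only the ceiling. Whether a real workload gets anywhere near it is a separate question, and that is the part being argued about here.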

Leaving the realm of practicality and entering overclocking, it can even be the other way around. Because of how Infinity Fabric (jesus, who names these, does Thanos work at AMD?) is set up, much of the CPU is tied to the RAM speed, and two RAM sticks often clock higher than four. Remember the Witcher 3 promotional material for Ryzen, showing how higher clocks achieve higher fps? Shortly afterwards, guides came out showing how 2 RAM sticks clock higher and thus run faster than 4 in benchmarks, because no non-synthetic benchmark is actually bound by bandwidth specifically.

But all in all, these technicalities don't matter in the grand scheme of things, especially since Corona's mantra of simplicity and not getting caught up in technicalities brought a departure from settings-ridden render setup tabs.

2019-05-09, 06:58:56
Reply #9

sprayer

  • Active Users
  • **
  • Posts: 568
    • View Profile
SairesArt,
Have you set up an AMD system? They really do work like magic with faster RAM. The RAM slots have a similar kind of magic: the RAM clock is halved if you put the modules in the wrong slots.

2019-05-09, 13:35:43
Reply #10

SairesArt

Quote from sprayer: "SairesArt, have you set up an AMD system? They really do work like magic with faster RAM. The RAM slots have a similar kind of magic: the RAM clock is halved if you put the modules in the wrong slots."
Yes, I've been an AMD guy since forever :]
Indeed, with how the Zen architecture ties much of its internals to RAM speed, it does get a surprising speedup from faster RAM, which is why I paid a premium for 3200 MHz RAM and OC'd it to 3400. And you are correct, mixing up the RAM slot order (I mean, they are color-coded, so that's hard to do) does prevent multi-channel operation of the RAM, which definitely causes a performance hit.

My point was to counter the "if you don't have Quad channel, you have half the performance", which is untrue in the sense of measuring real world performance.
I don't have a quad-channel system, but I urge someone to do a comparison on the Corona benchmark.
You will not see any performance difference between dual channel and quad channel on the Corona benchmark.