Why is my GPU performance worse with UE 4.1?

I have just tested the latest UE 4.1 and, surprisingly, my frame rate has decreased compared to UE 4.0.2.

From the GPU profiler, I have seen 3 major changes:

  • “ParticleInjection” takes 3.75ms vs 0.02ms before;

  • “Lights” takes 2.95ms vs 2.12ms before;

  • “PostProcessHistogram” is new and takes 1.25ms;

Do you see the same results?
Do you know why ParticleInjection takes so long? I have only 20K GPU particles.

Hi, I’m not aware of any changes to the renderer that would explain those deltas.

“ParticleInjection” takes 3.75ms vs 0.02ms before;

This depends on how many particles are spawned in that frame and changes dramatically over time. You can’t look at a single frame’s result to know the average cost.
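
As a rough illustration of why a single frame is misleading, the sketch below summarizes one pass over several frames. The sample list is hypothetical: only the 0.02ms and 3.75ms values come from this thread, and how often the spike occurs is an assumption.

```python
from statistics import mean, median

def summarize_pass(samples_ms):
    """Return min / median / mean / max for one GPU pass timed over many frames."""
    ordered = sorted(samples_ms)
    return {
        "min": ordered[0],
        "median": median(ordered),
        "mean": mean(ordered),
        "max": ordered[-1],
    }

# Hypothetical per-frame timings (ms) for ParticleInjection.
# Only 0.02 and 3.75 are taken from the thread; the spike frequency is assumed.
particle_injection_ms = [0.02, 0.02, 3.75, 0.02, 0.02, 0.02, 0.02, 0.02]

summary = summarize_pass(particle_injection_ms)
print(summary)
```

With samples like these, the one spiky frame dominates the maximum while the median stays at the cheap value, so a single capture can land on either extreme.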

“Lights” takes 2.95ms vs 2.12ms before;

Can you see any children under Lights that explain the difference?

“PostProcessHistogram” is new and takes 1.25ms;

This is used by eye adaptation / dynamic exposure. That’s been enabled by default in UE4 for a long time. Maybe you profiled with fixed exposure before?

Hi, thank you for your answer. I will try to give you more details below.

Particle

I ran 3 profiling captures with UE 4.0.2 and 3 with UE 4.1, and the results are consistent.

The particle system has a constant spawn rate of 5000 particles per second and the lifespan is between 3s and 5s, so at 60Hz around 83 particles are emitted per frame (with roughly 20K alive at once). I cannot explain this cost from so few spawns. Could it be a GPU stall while I’m profiling?
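
A quick check of the figures in this post, as a minimal sketch. The inputs are the spawn rate, lifespan range and frame rate quoted above; treating the spawn rate as per second and the lifespan as uniformly distributed are assumptions.

```python
spawn_rate_per_s = 5000      # constant spawn rate from the post (assumed per second)
lifetime_s = (3.0, 5.0)      # min / max lifespan from the post
frame_rate_hz = 60           # frame rate from the post

# Particles injected in one frame at a steady spawn rate
spawned_per_frame = spawn_rate_per_s / frame_rate_hz

# Particles alive at steady state = spawn rate * lifetime
alive_min = spawn_rate_per_s * lifetime_s[0]
alive_max = spawn_rate_per_s * lifetime_s[1]
alive_avg = spawn_rate_per_s * sum(lifetime_s) / 2  # assumes uniform lifespans

print(f"spawned per frame: {spawned_per_frame:.1f}")
print(f"alive at steady state: {alive_min:.0f} to {alive_max:.0f} (avg {alive_avg:.0f})")
```

The per-frame injection count is tiny next to the number of live particles, which is why a multi-millisecond ParticleInjection pass looks suspicious.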

Lighting

Here is the sub-log of “lights” with UE 4.0.2:

18.3% 2.12ms   Lights 0 draws 0 prims 0 verts
   18.3% 2.12ms   DirectLighting 0 draws 0 prims 0 verts
       6.7% 0.78ms   NonShadowedLights 0 draws 0 prims 0 verts
          5.3% 0.62ms   StandardDeferredLighting 1 draws 1 prims 3 verts
          1.4% 0.16ms   InjectNonShadowedTranslucentLighting 2 draws 256 prims 512 verts
      11.5% 1.34ms   treasure_map.Swimmer_C_0 1 draws 0 prims 0 verts
          7.5% 0.87ms   ShadowDepthsFromOpaque 1 draws 0 prims 0 verts
             7.4% 0.86ms   WholeScene 68 draws 305726 prims 603205 verts
          0.2% 0.02ms   ShadowProjectionOnOpaque 0 draws 0 prims 0 verts
             0.2% 0.02ms   WholeScene 3 draws 24 prims 16 verts
          0.2% 0.02ms   InjectTranslucentVolume 2 draws 96 prims 192 verts
          3.6% 0.42ms   LightFunction Material=flashlight_mat 2 draws 0 prims 0 verts
          0.0% 0.01ms   StandardDeferredLighting 1 draws 0 prims 0 verts

Here is the sub-log of “lights” with UE 4.1:

16.9% 2.97ms   Lights 0 draws 0 prims 0 verts
   16.9% 2.97ms   DirectLighting 0 draws 0 prims 0 verts
       9.9% 1.75ms   NonShadowedLights 0 draws 0 prims 0 verts
          8.9% 1.57ms   StandardDeferredLighting 2 draws 1 prims 3 verts
          1.0% 0.18ms   InjectNonShadowedTranslucentLighting 4 draws 354 prims 708 verts
       6.9% 1.22ms   treasure_map.Swimmer_C_0 1 draws 0 prims 0 verts
          5.0% 0.88ms   ShadowDepthsFromOpaque 1 draws 0 prims 0 verts
             4.9% 0.87ms   WholeScene 69 draws 305894 prims 603392 verts
          0.1% 0.02ms   ShadowProjectionOnOpaque 0 draws 0 prims 0 verts
             0.1% 0.02ms   WholeScene 3 draws 24 prims 16 verts
          0.1% 0.01ms   InjectTranslucentVolume 2 draws 102 prims 204 verts
          1.7% 0.29ms   LightFunction Material=flashlight_mat 2 draws 0 prims 0 verts
          0.0% 0.01ms   StandardDeferredLighting 1 draws 0 prims 0 verts

There are now 2 draw calls for StandardDeferredLighting under NonShadowedLights.

Exposure

Here is the sub-log of “FinishRendering” with UE 4.0.2:

20.9% 2.42ms   FinishRendering 0 draws 0 prims 0 verts
    0.1% 0.01ms   RenderVelocities 11 draws 39904 prims 57824 verts
    2.2% 0.25ms   BokehDOFRecombine 1 draws 1 prims 3 verts
   14.7% 1.71ms   TemporalAA 2 draws 2 prims 6 verts
    0.1% 0.01ms   PostProcessEyeAdaptation 1 draws 1 prims 3 verts
    0.0% 0.00ms   PostProcessCombineLUTs 1 draws 32 prims 64 verts
    3.8% 0.44ms   Tonemapper#3 1 draws 1 prims 3 verts

Here is the sub-log of “FinishRendering” with UE 4.1:

21.3% 3.75ms   FinishRendering 0 draws 0 prims 0 verts
    0.0% 0.01ms   RenderVelocities 11 draws 39904 prims 57824 verts
    1.4% 0.25ms   BokehDOFRecombine 1 draws 1 prims 3 verts
    9.5% 1.67ms   TemporalAA 2 draws 2 prims 6 verts
    0.8% 0.14ms   Downsample 1 draws 1 prims 3 verts
    7.1% 1.25ms   PostProcessHistogram 1 draws 1 prims 0 verts
    0.3% 0.05ms   PostProcessHistogramReduce 1 draws 1 prims 3 verts
    0.1% 0.01ms   PostProcessEyeAdaptation 1 draws 1 prims 3 verts
    0.0% 0.00ms   PostProcessCombineLUTs 1 draws 32 prims 64 verts
    2.0% 0.36ms   Tonemapper#3 1 draws 1 prims 3 verts

As you can see, PostProcessEyeAdaptation also ran with UE 4.0.2, so exposure was enabled in both captures.
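
To compare two dumps like these without reading them side by side, something along the following lines could work. It is a sketch that assumes every line follows the `pct% ms name N draws N prims N verts` layout seen above; `parse_log` and `diff_logs` are hypothetical helper names, not part of any UE tool.

```python
import re

# One profiler line: "<pct>% <ms>ms   <name> <n> draws <n> prims <n> verts"
LINE_RE = re.compile(
    r"^\s*(?P<pct>[\d.]+)%\s+(?P<ms>[\d.]+)ms\s+(?P<name>.+?)\s+"
    r"(?P<draws>\d+) draws\s+(?P<prims>\d+) prims\s+(?P<verts>\d+) verts\s*$"
)

def parse_log(text):
    """Map pass name -> time in ms. Repeated names overwrite earlier ones."""
    timings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            timings[match["name"]] = float(match["ms"])
    return timings

def diff_logs(before, after):
    """Return {name: (before_ms, after_ms)}; None marks a pass missing on one side."""
    names = sorted(set(before) | set(after))
    return {n: (before.get(n), after.get(n)) for n in names}

UE_4_0_2 = """\
20.9% 2.42ms   FinishRendering 0 draws 0 prims 0 verts
    0.1% 0.01ms   RenderVelocities 11 draws 39904 prims 57824 verts
    2.2% 0.25ms   BokehDOFRecombine 1 draws 1 prims 3 verts
   14.7% 1.71ms   TemporalAA 2 draws 2 prims 6 verts
    0.1% 0.01ms   PostProcessEyeAdaptation 1 draws 1 prims 3 verts
    0.0% 0.00ms   PostProcessCombineLUTs 1 draws 32 prims 64 verts
    3.8% 0.44ms   Tonemapper#3 1 draws 1 prims 3 verts
"""

UE_4_1 = """\
21.3% 3.75ms   FinishRendering 0 draws 0 prims 0 verts
    0.0% 0.01ms   RenderVelocities 11 draws 39904 prims 57824 verts
    1.4% 0.25ms   BokehDOFRecombine 1 draws 1 prims 3 verts
    9.5% 1.67ms   TemporalAA 2 draws 2 prims 6 verts
    0.8% 0.14ms   Downsample 1 draws 1 prims 3 verts
    7.1% 1.25ms   PostProcessHistogram 1 draws 1 prims 0 verts
    0.3% 0.05ms   PostProcessHistogramReduce 1 draws 1 prims 3 verts
    0.1% 0.01ms   PostProcessEyeAdaptation 1 draws 1 prims 3 verts
    0.0% 0.00ms   PostProcessCombineLUTs 1 draws 32 prims 64 verts
    2.0% 0.36ms   Tonemapper#3 1 draws 1 prims 3 verts
"""

changes = diff_logs(parse_log(UE_4_0_2), parse_log(UE_4_1))
for name, (old, new) in changes.items():
    if old is None:
        print(f"{name}: new in 4.1, {new:.2f}ms")
    elif new is not None and abs(new - old) >= 0.05:
        print(f"{name}: {old:.2f}ms -> {new:.2f}ms")
```

The 0.05ms threshold is an arbitrary noise floor chosen for this example. Keying by pass name is enough for a flat log like FinishRendering, but a nested log with repeated names (such as the Lights dump) would need the parent path in the key.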

We are seeing a performance issue in the 4.1 release (related to Win8 and integrated GPUs), and there is a workaround.
Please check whether it solves your problem.

The PostProcessEyeAdaptation pass is always active, but the histogram pass is needed before PostProcessEyeAdaptation when:

  • MinBrightness is less than MaxBrightness in the PP volume
  • Tonemapper is enabled
  • Exposure is set to auto (not fixed) on the editor viewport
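
Read as a predicate, the list above might look like the sketch below. It assumes all three conditions must hold together, which the answer does not state explicitly. The class and field names are hypothetical and do not correspond to engine identifiers.

```python
from dataclasses import dataclass

@dataclass
class ExposureSettings:
    # Hypothetical fields mirroring the three conditions listed above
    min_brightness: float          # MinBrightness in the PP volume
    max_brightness: float          # MaxBrightness in the PP volume
    tonemapper_enabled: bool
    viewport_exposure_fixed: bool  # True when the editor viewport uses fixed exposure

def histogram_pass_needed(s: ExposureSettings) -> bool:
    """Assumption: the histogram runs only when all three conditions hold."""
    return (
        s.min_brightness < s.max_brightness
        and s.tonemapper_enabled
        and not s.viewport_exposure_fixed
    )

# Illustrative values only; they are not engine defaults.
auto_exposure = ExposureSettings(0.03, 2.0, True, False)
pinned_exposure = ExposureSettings(1.0, 1.0, True, False)

print(histogram_pass_needed(auto_exposure))
print(histogram_pass_needed(pinned_exposure))
```

Under this reading, setting MinBrightness equal to MaxBrightness makes the first condition false, so the histogram pass would be skipped.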

8.9% 1.57ms StandardDeferredLighting 2 draws 1 prims 3 verts

This indicates there is one more dynamic light being drawn in your 4.1 capture.

After a new set of benchmarks with the same assets, same settings, same code, and same camera position, it appears that:

  • the ParticleInjection time is very unstable. The same draw call with ~160 prims and ~320 verts can cost anywhere from 0.02ms to 4.41ms with either UE 4.0 or UE 4.1;
  • the “Lights” time is in fact the same with both versions;
  • the “FinishRendering” time is higher with UE 4.1 due to PostProcessHistogram.