gcadet_1
(GillesCadet)
April 24, 2014, 7:35pm
1
I have just tested the latest UE 4.1 and, surprisingly, my frame rate has decreased compared to UE 4.0.2.
From the GPU profiler, I have seen 3 major changes:
“ParticleInjection” takes 3.75ms vs 0.02ms before;
“Lights” takes 2.95ms vs 2.12ms before;
“PostProcessHistogram” is new and takes 1.25ms;
Do you have the same results?
Do you know why ParticleInjection takes so long? I have only 20K GPU particles.
Hi, I’m not aware of any changes to the renderer that would explain those deltas.
“ParticleInjection” takes 3.75ms vs 0.02ms before;
This depends on how many particles are being spawned in that frame and changes dramatically over time. You can’t really look at a single frame's result to know the average cost.
“Lights” takes 2.95ms vs 2.12ms before;
Can you see any children under Lights that explain the difference?
“PostProcessHistogram” is new and takes 1.25ms;
This is used by eye adaptation / dynamic exposure. That’s been enabled by default in UE4 for a long time. Maybe you profiled with fixed exposure before?
gcadet_1
(GillesCadet)
April 25, 2014, 3:31pm
3
Hi, thank you for your answer. I will try to give you more details below.
gcadet_1
(GillesCadet)
April 25, 2014, 3:31pm
4
Particle
I’ve run 3 profiling captures with UE 4.0.2 and 3 with UE 4.1, and the results are consistent.
The particle system has a constant spawn rate of 5000 and the lifespan is between 3s and 5s, so at 60Hz there are only around 83 particles emitted per frame (5000 / 60), with about 20K alive at any time. So I cannot explain this cost. Perhaps a GPU stall while I’m profiling?
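A quick check of the spawn arithmetic, as a minimal sketch. It assumes the spawn rate is expressed in particles per second and that lifetimes average 4 s (the midpoint of 3 s and 5 s). Under those assumptions the emitter spawns roughly 83 particles per frame at 60Hz and keeps about 20K alive, which matches the 20K GPU particles mentioned in the first post.

```python
# Assumptions (not confirmed in the thread): the spawn rate is in particles
# per second, and the average lifetime is the midpoint of the 3 s to 5 s range.
spawn_rate_per_s = 5000
frame_rate_hz = 60
mean_lifetime_s = (3 + 5) / 2

# Particles emitted in one frame at a steady frame rate.
spawned_per_frame = spawn_rate_per_s / frame_rate_hz

# Particles alive at steady state: rate multiplied by mean lifetime.
alive_steady_state = spawn_rate_per_s * mean_lifetime_s

print(f"spawned per frame: {spawned_per_frame:.1f}")
print(f"alive at steady state: {alive_steady_state:.0f}")
```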
gcadet_1
(GillesCadet)
April 25, 2014, 3:32pm
5
Lighting
Here is the sub-log of “lights” with UE 4.0.2:
18.3% 2.12ms Lights 0 draws 0 prims 0 verts
18.3% 2.12ms DirectLighting 0 draws 0 prims 0 verts
6.7% 0.78ms NonShadowedLights 0 draws 0 prims 0 verts
5.3% 0.62ms StandardDeferredLighting 1 draws 1 prims 3 verts
1.4% 0.16ms InjectNonShadowedTranslucentLighting 2 draws 256 prims 512 verts
11.5% 1.34ms treasure_map.Swimmer_C_0 1 draws 0 prims 0 verts
7.5% 0.87ms ShadowDepthsFromOpaque 1 draws 0 prims 0 verts
7.4% 0.86ms WholeScene 68 draws 305726 prims 603205 verts
0.2% 0.02ms ShadowProjectionOnOpaque 0 draws 0 prims 0 verts
0.2% 0.02ms WholeScene 3 draws 24 prims 16 verts
0.2% 0.02ms InjectTranslucentVolume 2 draws 96 prims 192 verts
3.6% 0.42ms LightFunction Material=flashlight_mat 2 draws 0 prims 0 verts
0.0% 0.01ms StandardDeferredLighting 1 draws 0 prims 0 verts
gcadet_1
(GillesCadet)
April 25, 2014, 3:33pm
6
Lighting
Here is the sub-log of “lights” with UE 4.1:
16.9% 2.97ms Lights 0 draws 0 prims 0 verts
16.9% 2.97ms DirectLighting 0 draws 0 prims 0 verts
9.9% 1.75ms NonShadowedLights 0 draws 0 prims 0 verts
8.9% 1.57ms StandardDeferredLighting 2 draws 1 prims 3 verts
1.0% 0.18ms InjectNonShadowedTranslucentLighting 4 draws 354 prims 708 verts
6.9% 1.22ms treasure_map.Swimmer_C_0 1 draws 0 prims 0 verts
5.0% 0.88ms ShadowDepthsFromOpaque 1 draws 0 prims 0 verts
4.9% 0.87ms WholeScene 69 draws 305894 prims 603392 verts
0.1% 0.02ms ShadowProjectionOnOpaque 0 draws 0 prims 0 verts
0.1% 0.02ms WholeScene 3 draws 24 prims 16 verts
0.1% 0.01ms InjectTranslucentVolume 2 draws 102 prims 204 verts
1.7% 0.29ms LightFunction Material=flashlight_mat 2 draws 0 prims 0 verts
0.0% 0.01ms StandardDeferredLighting 1 draws 0 prims 0 verts
There are now 2 draw calls for StandardDeferredLighting.
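To compare the two captures line by line rather than by eye, the pasted log lines can be parsed and diffed. The sketch below is only an illustration: the regular expression is inferred from the log lines shown in this thread, not from any documented profiler format, and `parse` and `diff` are hypothetical helper names. It uses the first few "Lights" rows from each capture and pairs them by position, which works here because both logs list the same passes in the same order.

```python
import re

# Pattern inferred from the log lines in this thread (an assumption, not a
# documented format): "<pct>% <ms>ms <name> <n> draws <n> prims <n> verts".
LINE = re.compile(
    r"(?P<pct>[\d.]+)%\s+(?P<ms>[\d.]+)ms\s+(?P<name>.+?)\s+"
    r"(?P<draws>\d+) draws\s+(?P<prims>\d+) prims\s+(?P<verts>\d+) verts"
)

UE402 = """
6.7% 0.78ms NonShadowedLights 0 draws 0 prims 0 verts
5.3% 0.62ms StandardDeferredLighting 1 draws 1 prims 3 verts
1.4% 0.16ms InjectNonShadowedTranslucentLighting 2 draws 256 prims 512 verts
7.5% 0.87ms ShadowDepthsFromOpaque 1 draws 0 prims 0 verts
"""

UE41 = """
9.9% 1.75ms NonShadowedLights 0 draws 0 prims 0 verts
8.9% 1.57ms StandardDeferredLighting 2 draws 1 prims 3 verts
1.0% 0.18ms InjectNonShadowedTranslucentLighting 4 draws 354 prims 708 verts
5.0% 0.88ms ShadowDepthsFromOpaque 1 draws 0 prims 0 verts
"""

def parse(log):
    """Return (name, milliseconds, draw count) for each recognizable line."""
    rows = []
    for line in log.strip().splitlines():
        m = LINE.match(line.strip())
        if m:
            rows.append((m["name"], float(m["ms"]), int(m["draws"])))
    return rows

def diff(old_rows, new_rows):
    """Pair rows by position and report the change in time and draw calls."""
    result = []
    for (name, old_ms, old_draws), (new_name, new_ms, new_draws) in zip(old_rows, new_rows):
        if name != new_name:
            raise ValueError(f"pass order differs: {name} vs {new_name}")
        result.append((name, round(new_ms - old_ms, 2), new_draws - old_draws))
    return result

deltas = diff(parse(UE402), parse(UE41))
for name, d_ms, d_draws in deltas:
    print(f"{name}: {d_ms:+.2f} ms, {d_draws:+d} draws")
```

On these rows, the largest change is in StandardDeferredLighting, which also gains one draw call.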
gcadet_1
(GillesCadet)
April 25, 2014, 3:33pm
7
Exposure
Here is the sub-log of “FinishRendering” with UE 4.0.2:
20.9% 2.42ms FinishRendering 0 draws 0 prims 0 verts
0.1% 0.01ms RenderVelocities 11 draws 39904 prims 57824 verts
2.2% 0.25ms BokehDOFRecombine 1 draws 1 prims 3 verts
14.7% 1.71ms TemporalAA 2 draws 2 prims 6 verts
0.1% 0.01ms PostProcessEyeAdaptation 1 draws 1 prims 3 verts
0.0% 0.00ms PostProcessCombineLUTs 1 draws 32 prims 64 verts
3.8% 0.44ms Tonemapper#3 1 draws 1 prims 3 verts
gcadet_1
(GillesCadet)
April 25, 2014, 3:33pm
8
Exposure
Here is the sub-log of “FinishRendering” with UE 4.1:
21.3% 3.75ms FinishRendering 0 draws 0 prims 0 verts
0.0% 0.01ms RenderVelocities 11 draws 39904 prims 57824 verts
1.4% 0.25ms BokehDOFRecombine 1 draws 1 prims 3 verts
9.5% 1.67ms TemporalAA 2 draws 2 prims 6 verts
0.8% 0.14ms Downsample 1 draws 1 prims 3 verts
7.1% 1.25ms PostProcessHistogram 1 draws 1 prims 0 verts
0.3% 0.05ms PostProcessHistogramReduce 1 draws 1 prims 3 verts
0.1% 0.01ms PostProcessEyeAdaptation 1 draws 1 prims 3 verts
0.0% 0.00ms PostProcessCombineLUTs 1 draws 32 prims 64 verts
2.0% 0.36ms Tonemapper#3 1 draws 1 prims 3 verts
As you can see, the exposure was also enabled with UE 4.0.2.
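The two "FinishRendering" logs can also be compared by pass name to see which passes exist only in the 4.1 capture and how much of the difference they account for. This is a rough sketch with the child timings copied by hand from the two logs above; the dictionary names are illustrative only.

```python
# Child pass timings in ms, copied from the two FinishRendering logs above.
ue402 = {
    "RenderVelocities": 0.01,
    "BokehDOFRecombine": 0.25,
    "TemporalAA": 1.71,
    "PostProcessEyeAdaptation": 0.01,
    "PostProcessCombineLUTs": 0.00,
    "Tonemapper#3": 0.44,
}
ue41 = {
    "RenderVelocities": 0.01,
    "BokehDOFRecombine": 0.25,
    "TemporalAA": 1.67,
    "Downsample": 0.14,
    "PostProcessHistogram": 1.25,
    "PostProcessHistogramReduce": 0.05,
    "PostProcessEyeAdaptation": 0.01,
    "PostProcessCombineLUTs": 0.00,
    "Tonemapper#3": 0.36,
}

# Passes that appear only in the 4.1 capture, and their combined cost.
new_passes = sorted(set(ue41) - set(ue402))
new_cost = sum(ue41[name] for name in new_passes)

# Overall change in the summed child timings between the two captures.
total_delta = sum(ue41.values()) - sum(ue402.values())

print("new passes:", new_passes)
print(f"cost of new passes: {new_cost:.2f} ms")
print(f"total change: {total_delta:.2f} ms")
```

The passes that are new in 4.1 add up to slightly more than the total change, because TemporalAA and the tonemapper were marginally cheaper in the 4.1 capture.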
We see a performance issue in the 4.1 release (Win8 and integrated GPU related) - there is a workaround.
Please verify if that solves your problem.
The PostProcessEyeAdaptation pass is always active, but the histogram pass is needed before PostProcessEyeAdaptation when:
MinBrightness is less than MaxBrightness in the PP volume;
Tonemapper is enabled;
Exposure is set to auto (not fixed) on the editor viewport.
8.9% 1.57ms StandardDeferredLighting 2 draws 1 prims 3 verts
This indicates there is one more dynamic light being drawn in your 4.1 capture.
gcadet_1
(GillesCadet)
April 28, 2014, 10:17pm
12
After a new set of benchmarks with the same assets, same settings, same code, and same camera position, it appears that:
the ParticleInjection time is very unstable. The same draw call with ~160 prims and ~320 verts can cost anywhere from 0.02ms to 4.41ms with either UE 4.0 or UE 4.1;
the “Lights” time is in fact the same with both versions;
the “FinishRendering” time is higher with UE 4.1 due to the PostProcessHistogram.
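Since a single capture can land anywhere in that range, summarizing several frames gives a fairer picture of a pass like ParticleInjection. The sketch below is illustrative only: apart from the 0.02ms and 4.41ms extremes reported above, the sample values are made up.

```python
import statistics

# Hypothetical per-frame ParticleInjection timings in ms. Only the 0.02 and
# 4.41 extremes come from the thread; the other values are invented.
samples_ms = [0.02, 0.03, 4.41, 0.02, 0.05, 3.90, 0.02, 0.04]

summary = {
    "mean": statistics.mean(samples_ms),
    "median": statistics.median(samples_ms),
    "min": min(samples_ms),
    "max": max(samples_ms),
}

for key, value in summary.items():
    print(f"{key}: {value:.3f} ms")
```

With spiky data like this the mean and median diverge sharply, so reporting both (plus the range) says more than any single-frame number.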