[Closed] Nested material functions and shader code optimization
I noticed that encapsulating two material functions in a larger material function can obstruct shader code optimizations. In such cases, placing the inner material function nodes directly in the material asset yields the expected optimized code.
I have tried to come up with a very simple example to illustrate the issue.
Assume two hypothetical material functions, FEncoder and FDecoder, acting like a multiplexer/demultiplexer pair.
We could now explicitly place and connect both of them in a material, or we could encapsulate them within an FCodec material function and place this abstraction in the material instead.
If we use the FCodec node, as in MTestJoined, and inspect the generated HLSL code, we can see that each of its outputs invokes the FEncoder logic (in this case the Custom Expression within it, but it could be a standard material node network), even though the inputs did not change.
On the other hand, if we place FEncoder and FDecoder directly in the material instead, as in MTestSplit, only one invocation is made, as expected.
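The difference between the two cases can be sketched in plain C++ as an analogy for the shape of the generated shader code. This is only an illustration under assumptions: the packing arithmetic, the function names Encode, Decode, CodecJoined and CodecSplit, and the call counter are all hypothetical stand-ins, not the actual FEncoder/FDecoder logic or the HLSL the engine emits.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Counter used only to observe how often the "expensive" logic runs.
static int g_encoder_calls = 0;

// Hypothetical stand-in for the FEncoder custom expression:
// packs two small non-negative integer values into one float.
float Encode(float a, float b) {
    ++g_encoder_calls;
    return a * 256.0f + b;
}

// Hypothetical stand-in for FDecoder: extracts one channel again.
float Decode(float packed, int channel) {
    assert(channel == 0 || channel == 1);
    float hi = std::floor(packed / 256.0f);
    return channel == 0 ? hi : packed - hi * 256.0f;
}

// Shape of the code observed for MTestJoined (FCodec wrapper):
// every output pin re-evaluates the encoder with identical inputs.
std::array<float, 2> CodecJoined(float a, float b) {
    float out0 = Decode(Encode(a, b), 0);
    float out1 = Decode(Encode(a, b), 1);
    return {{out0, out1}};
}

// Shape of the code observed for MTestSplit (functions placed directly):
// the encoder result is computed once and shared by both outputs.
std::array<float, 2> CodecSplit(float a, float b) {
    float packed = Encode(a, b);
    return {{Decode(packed, 0), Decode(packed, 1)}};
}
```

Both variants return the same values; only the number of Encode evaluations differs (two for the joined shape, one for the split shape), which is the redundancy described above.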
Sure, for such simple material functions the resulting number of machine instructions will be the same, since the shader optimizer does a good job of detecting the redundancies. However, in many of our real, complex materials the shader optimizer is unable to detect the duplication, and we end up paying a heavy performance toll for it (especially if the custom expression logic being invoked contains loops and the like).
Is there any way to assist the shader code generator in this matter?
Thanks in advance.
The question has been closed Sep 16 '16 at 03:00 PM by AndrewHurley for the following reason:
The question is answered, right answer was accepted
Thanks for the detailed explanation; it seems you've already performed a solid investigation into what's happening here. When the translator steps into a material function it creates a temporary expression map to allow better scope control and result sharing, but when we leave the function that map is discarded. Other material pins then re-enter the function and create a new map, without realizing the code has already been translated. This leads to custom expressions incorrectly duplicating their definitions, but it sounds like your caching workaround is already attempting to handle that?
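To make that mechanism concrete, here is a toy model of it in C++. It is a sketch of the behaviour described above, not the engine's translator code: ToyTranslator, its members, the emitted strings, and the keep_function_scope toggle are all hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using ScopeMap = std::map<std::string, int>;  // expression -> emitted index

// Toy translator: emits one line of "shader code" per translated expression.
struct ToyTranslator {
    std::vector<std::string> emitted;            // generated code lines
    bool keep_function_scope = false;            // hypothetical toggle
    std::map<std::string, ScopeMap> saved_scopes;  // used only if toggled on

    // Translate an expression within a scope, reusing an earlier result
    // if this scope has already seen the same expression.
    int Translate(ScopeMap& scope, const std::string& expr) {
        auto it = scope.find(expr);
        if (it != scope.end()) return it->second;
        emitted.push_back(expr);
        int index = static_cast<int>(emitted.size()) - 1;
        scope[expr] = index;
        return index;
    }

    // Translate one output pin of a material function. By default a fresh
    // expression map is created on entry and discarded on exit.
    int TranslateFunctionOutput(const std::string& function, int pin) {
        assert(pin >= 0);
        ScopeMap fresh;
        ScopeMap& scope = keep_function_scope ? saved_scopes[function] : fresh;
        int encoded = Translate(scope, function + ".Encode(inputs)");
        return Translate(scope, "Decode(code" + std::to_string(encoded) +
                                    ", " + std::to_string(pin) + ")");
    }
};

// Count how many emitted lines contain the encoder call.
int CountEncodes(const ToyTranslator& t) {
    int n = 0;
    for (const auto& line : t.emitted)
        if (line.find("Encode") != std::string::npos) ++n;
    return n;
}
```

Translating two output pins with the default settings emits the encoder line twice, because the second entry starts from an empty map. With keep_function_scope set, the second pin finds the earlier result and the encoder line is emitted once. The toggle only illustrates the idea of sharing results across entries; it is not a description of how UE-32897 was or will be resolved.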
I've created issue UE-32897 to track this task internally, to better handle custom expression calls and avoid duplicate inclusion. Additionally, we could improve code sharing between function expression stacks. The main issue is that the code generation step is done by a translator only; it doesn't necessarily understand the underlying code, so removing redundancy is not always trivial. Keeping the graph 'flatter', with fewer recursive function calls, is the best way to avoid this.
If you're working on something more performance-critical, it might be better to write the functions in a .usf file and include that in the MaterialTemplate or Common shader files so they can be used elsewhere. This gives you the greatest control over the final code and prevents the translator from generating multiple copies of the actual functionality, which helps the compiler.
answered Sep 16 '16 at 02:59 PM